Week-1
1. Centering the dataset
2. Covariance matrix
3. Optimization problem
4. Principal components
5. Projections
6. Reconstruction error revisited (for k directions)
7. Variance captured
8. Compression
1. Centering the dataset

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If the dataset is already centered, $\bar{x} = 0$. If $\bar{x} \neq 0$, do the following:

$$x_i \leftarrow x_i - \bar{x}, \qquad X_c = \begin{bmatrix} | & & | \\ x_1 & \cdots & x_n \\ | & & | \end{bmatrix}$$

$X_c$ is the centered data matrix.
Remark: From now on we will work only with the centered data matrix and will call it $X$ (the subscript $c$ will be dropped).
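The centering step above can be sketched in NumPy as follows. This is an illustrative sketch with made-up data and my own variable names; the columns of `X` are the data points, matching the $d \times n$ convention of these notes.

```python
import numpy as np

# Toy data: d = 3 features, n = 5 points (columns are data points).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))

x_bar = X.mean(axis=1, keepdims=True)  # feature-wise mean, shape (d, 1)
X_c = X - x_bar                        # centered data matrix

# After centering, every feature has zero mean.
print(np.allclose(X_c.mean(axis=1), 0))  # True
```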
2. Covariance matrix

Shape: $C \in \mathbb{R}^{d \times d}$

Outer-product form: $C = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T$

Matrix form: $C = \frac{1}{n} X X^T$

Scalar form: $C_{pq} = \frac{1}{n}\sum_{i=1}^{n} x_{ip} x_{iq}$

$C_{pq}$ captures the covariance between the $p$th feature and the $q$th feature. As a special case, $C_{pp} = \frac{1}{n}\sum_{i=1}^{n} x_{ip}^2$ captures the variance of the $p$th feature.

Properties:
- $C^T = C$ (i.e., $C$ is symmetric).
- All eigenvalues of $C$ are non-negative: $\lambda_1 \geq \cdots \geq \lambda_d \geq 0$.
- There is an orthonormal basis for $\mathbb{R}^d$ made up of eigenvectors of $C$: $\{w_1, \ldots, w_d\}$. This comes from the spectral theorem.
Note: If $C$ is a square matrix, then $(\lambda, w)$ is said to be an eigenvalue-eigenvector pair if $Cw = \lambda w$. Note that $w \neq 0$ for it to be an eigenvector.

Remark: $w_i$ will always represent a unit-norm vector in the rest of the document.
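The covariance matrix and its properties can be checked numerically. A minimal sketch, assuming `X` is already centered (columns are data points); `np.linalg.eigh` is used because $C$ is symmetric, and it returns eigenvalues in ascending order, so we reverse to get $\lambda_1 \geq \cdots \geq \lambda_d$.

```python
import numpy as np

# Centered toy data: d = 4 features, n = 100 points.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 100))
X = X - X.mean(axis=1, keepdims=True)

n = X.shape[1]
C = (X @ X.T) / n                       # matrix form: C = (1/n) X X^T

eigvals, eigvecs = np.linalg.eigh(C)    # ascending eigenvalues
eigvals = eigvals[::-1]                 # lambda_1 >= ... >= lambda_d
W = eigvecs[:, ::-1]                    # column i is the i-th eigenvector

print(np.allclose(C, C.T))              # True: C^T = C
print(np.all(eigvals >= -1e-12))        # True: eigenvalues non-negative
print(np.allclose(W.T @ W, np.eye(4)))  # True: orthonormal eigenvectors
```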
3. Optimization problem

Minimizing the reconstruction error:

$$\min_{w : \|w\| = 1} \; \frac{1}{n}\sum_{i=1}^{n} \left\| x_i - (x_i^T w) w \right\|^2$$

Maximizing the variance:

$$\max_{w : \|w\| = 1} \; w^T C w$$
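The equivalence of the two objectives can be seen numerically: for any unit vector $w$, $\|x_i - (x_i^T w)w\|^2 = \|x_i\|^2 - (x_i^T w)^2$ by the Pythagorean theorem, so reconstruction error plus captured variance is a constant independent of $w$. A sketch with made-up data (names are my own):

```python
import numpy as np

# Centered toy data: d = 3, n = 50.
rng = np.random.default_rng(2)
X = rng.normal(size=(3, 50))
X = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]
C = (X @ X.T) / n

w = rng.normal(size=3)
w = w / np.linalg.norm(w)              # an arbitrary unit-norm direction

proj = X.T @ w                         # scalar projections x_i^T w, shape (n,)
recon_err = np.mean(np.sum((X - np.outer(w, proj))**2, axis=0))
variance = w @ C @ w                   # equals (1/n) sum_i (x_i^T w)^2

# Error + variance = mean squared norm of the data, for every w,
# so minimizing the error is the same as maximizing the variance.
print(np.isclose(recon_err + variance, np.mean(np.sum(X**2, axis=0))))  # True
```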
Both forms are equivalent to each other.

4. Principal components

Let $(\lambda_1, w_1), \ldots, (\lambda_d, w_d)$ be the eigen-pairs of $C$, where $\lambda_1 \geq \cdots \geq \lambda_d$ and $\{w_1, \ldots, w_d\}$ is an orthonormal basis for $\mathbb{R}^d$. $w_i$ is termed the $i$th principal component of $C$. To be more precise:

$$Cw_i = \lambda_i w_i, \qquad w_i^T w_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$

$$\lambda_1 = \max_{\|w\|=1} w^T C w, \qquad w_1 = \operatorname*{argmax}_{\|w\|=1} w^T C w$$
$$w_1^T C w_1 = \lambda_1$$

5. Projections

(Vector) projection of $x_i$ onto the $j$th PC: $(x_i^T w_j) w_j$

Scalar projection of $x_i$ onto the $j$th PC (or coordinate of the data point along this direction): $x_i^T w_j$

The projection of a data point $x_i$ onto the top $k$ principal components:

$$\hat{x}_i = (x_i^T w_1) w_1 + \cdots + (x_i^T w_k) w_k$$

To represent the reconstruction and scalar projections in matrix form, let $W \in \mathbb{R}^{d \times k}$:

$$W = \begin{bmatrix} | & & | \\ w_1 & \cdots & w_k \\ | & & | \end{bmatrix}$$

Scalar projections, $\tilde{X} \in \mathbb{R}^{k \times n}$:

$$\tilde{X} = \begin{bmatrix} x_1^T w_1 & \cdots & x_n^T w_1 \\ \vdots & & \vdots \\ x_1^T w_k & \cdots & x_n^T w_k \end{bmatrix}$$
$$\tilde{X} = W^T X$$

Reconstruction, $\hat{X} \in \mathbb{R}^{d \times n}$:

$$\hat{X} = W W^T X$$

6. Reconstruction error revisited (for k directions)

$$\frac{1}{n}\sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2 \;=\; \frac{1}{n}\sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{k} (x_i^T w_j) w_j \right\|^2$$
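Projection, reconstruction, and the resulting error can be sketched as below (made-up data, my own names). A useful fact worth verifying numerically: the reconstruction error for the top $k$ directions equals the sum of the discarded eigenvalues $\lambda_{k+1} + \cdots + \lambda_d$.

```python
import numpy as np

# Centered toy data: d = 5, n = 200.
rng = np.random.default_rng(3)
X = rng.normal(size=(5, 200))
X = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]
C = (X @ X.T) / n

eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

k = 2
W = eigvecs[:, :k]        # W in R^{d x k}, columns w_1, ..., w_k
X_tilde = W.T @ X         # scalar projections, shape (k, n)
X_hat = W @ X_tilde       # reconstruction W W^T X, shape (d, n)

err = np.mean(np.sum((X - X_hat)**2, axis=0))
# The error equals the sum of the discarded eigenvalues.
print(np.isclose(err, eigvals[k:].sum()))  # True
```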
7. Variance captured

Total variance: $\lambda_1 + \cdots + \lambda_d$

Variance along a given direction $w$ (unit vector):

$$\frac{1}{n}\sum_{i=1}^{n} (x_i^T w)^2 \;=\; w^T C w$$

Proportion of variance captured by the top $k$ PCs:

$$\frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d}$$
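The proportion of captured variance, and the smallest $k$ reaching a 95% threshold, can be sketched as follows (data and names are my own; correlated features make the spectrum decay so a small $k$ suffices):

```python
import numpy as np

# Correlated centered toy data: d = 6, n = 500.
rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
X = A @ rng.normal(size=(6, 500))
X = X - X.mean(axis=1, keepdims=True)
C = (X @ X.T) / X.shape[1]

eigvals = np.linalg.eigvalsh(C)[::-1]          # lambda_1 >= ... >= lambda_d
fraction = np.cumsum(eigvals) / eigvals.sum()  # variance captured by top k
k = int(np.argmax(fraction >= 0.95)) + 1       # smallest k reaching 95%

print(fraction[k - 1] >= 0.95)  # True
```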
Heuristic to choose the value of $k$: the smallest value that captures 95% of the variance in the dataset.

8. Compression

Reconstruction (storing the scalar projections and the $k$ principal components, relative to the $dn$ entries of the raw data):

$$\frac{nk + dk}{dn} = \frac{k(d+n)}{dn}$$
Retaining only the scalar projections:

$$\frac{kn}{dn} = \frac{k}{d}$$
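The two storage ratios can be checked with concrete numbers (the values of $d$, $n$, $k$ below are my own example):

```python
# Example sizes: 100 features, 10,000 points, 10 retained components.
d, n, k = 100, 10_000, 10

full = d * n                  # entries in the raw data matrix
with_basis = n * k + d * k    # scalar projections plus the k basis vectors
proj_only = n * k             # scalar projections alone

print(abs(with_basis / full - k * (d + n) / (d * n)) < 1e-12)  # True
print(abs(proj_only / full - k / d) < 1e-12)                   # True
```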