Week-1
1. Dataset
2. Goal and Assumption
3. Centering the dataset
4. Covariance matrix
5. Reconstruction Error
6. Reconstruction Error Minimization
7. Variance Maximization
8. Principal component
9. PCA in Higher Dimensions
10. Dimensionality Reduction
11. Example
12. Reconstruction
13. (*) Principal Components and Change of Basis
14. Choice of k
Starred (*) sections may be mathematically heavy. Handle with care.

1. Dataset

We will work with a sample dataset in R^2, shown below, throughout this document. Note, however, that all the results we derive apply to a dataset in R^d with d features.

2. Goal and Assumption

Assumption: The fundamental assumption in PCA is that the dataset "lies very close" to a linear subspace; that is, the dataset has an approximately "linear structure". Linearity loosely translates to "line" or "plane". Since we are looking for a subspace, and since subspaces pass through the origin, it becomes important to center the dataset at the origin.

Goal: The goal of PCA is to extract this underlying lower-dimensional, linear structure and return a dataset of reduced dimension while preserving maximum information. In the example we will work with, the lower-dimensional subspace is a line passing through the origin in R^2, as the figure demonstrates. Though the original dataset needs two dimensions, one dimension (a line) is sufficient to express the underlying structure. For most of this document, we will devise a technique to find the "best" line, the line "closest" to the dataset. This line is called the first principal component; one can also call it the "best-fit" line. PCA is an algorithm that extracts this line from the dataset; expressed as a flowchart:
Dataset → PCA → First P.C.
This is of course not the complete picture. We will return to this flowchart and update it at the end.
Remark: We can perform PCA even if the linearity assumption does not hold; however, that will not help us achieve the goal of dimensionality reduction.
3. Centering the dataset

The dataset has to be centered before performing PCA. The mean of the data-points is a vector \bar{x} \in R^d:

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

If the dataset is already centered, \bar{x} = 0. If \bar{x} \neq 0, do the following for each data-point:

    x_i \leftarrow x_i - \bar{x}

Inserting the i-th data-point into the i-th column of a matrix X_c, we have:

    X_c = \begin{bmatrix} | & & | \\ x_1 & \cdots & x_n \\ | & & | \end{bmatrix}

X_c is the centered data-matrix of shape d \times n.
Remark: From now on we will work only with the centered data-matrix and will call it X (the subscript c will be dropped).
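As a quick sketch (not part of the original notes), centering can be done in NumPy; the dataset `X` below is synthetic, with one data-point per column as in the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(2, 100))  # d = 2 features, n = 100 points

# Mean of the data-points: average over the columns (axis=1), one value per feature.
x_bar = X.mean(axis=1, keepdims=True)              # shape (2, 1)

# Centered data-matrix X_c: subtract the mean from every column.
X_c = X - x_bar

# After centering, the mean of each feature is (numerically) zero.
print(np.allclose(X_c.mean(axis=1), 0.0))          # True
```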
4. Covariance matrix

The covariance matrix is a d \times d matrix that condenses the information in the dataset by capturing the pairwise relationships among features; it will play an important role in finding the "best" line. In the example shown, C_{11} and C_{22} capture the variance along features 1 and 2 respectively, while C_{12} = C_{21} captures the covariance between these two features. The two features are negatively correlated, as can be seen both from the graph and from the matrix.

Shape: C \in R^{d \times d}

Outer-product form: The covariance matrix can be written as the sum (average) of n matrices, each of shape d \times d and rank 1, of the form x_i x_i^T. Note that the individual factors on the RHS are vectors, combined via an outer product to produce a matrix:

    C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T

Matrix form: In this form, the data-matrix X replaces the sum of outer products:

    C = \frac{1}{n} X X^T

Remark: To be accurate, the covariance matrix is defined as

    C = \frac{1}{n} (X - \mu)(X - \mu)^T

where \mu is the mean of the columns of X. It reduces to \frac{1}{n} X X^T since \mu = 0 for a centered matrix.

Scalar form: C_{pq} captures the covariance between the p-th feature and the q-th feature:

    C_{pq} = \frac{1}{n} \sum_{i=1}^{n} x_{ip} x_{iq}

As a special case,

    C_{pp} = \frac{1}{n} \sum_{i=1}^{n} x_{ip}^2

captures the variance of the p-th feature.

Properties of the covariance matrix:
- The covariance matrix is symmetric, that is, C^T = C.
- All the eigenvalues of C are non-negative. If we denote the eigenvalues of C by \lambda_1, \ldots, \lambda_d, then \lambda_1 \geq \cdots \geq \lambda_d \geq 0.
- There is an orthonormal basis for R^d made up of eigenvectors of C: we can find a set of orthonormal eigenvectors \gamma = \{w_1, \ldots, w_d\} corresponding to the eigenvalues \lambda_1, \ldots, \lambda_d:
  * C w_i = \lambda_i w_i, for 1 \leq i \leq d
  * w_i^T w_j = 0 for i \neq j
  * w_i^T w_i = 1
  \gamma is an orthonormal basis for R^d; the existence of such a basis is guaranteed by the spectral theorem.
Note: If C is a square matrix, then (\lambda, w) is said to be an eigenvalue-eigenvector pair if C w = \lambda w. Note that we require w \neq 0 for w to be an eigenvector.
Remark: w_i will always represent a unit-norm vector in the rest of the document.
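The two forms of C and its spectral properties can be checked numerically; this is an illustrative sketch on synthetic centered data, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
X = X - X.mean(axis=1, keepdims=True)   # center, so C = (1/n) X X^T applies
n = X.shape[1]

# Matrix form: C = (1/n) X X^T
C = (X @ X.T) / n

# Outer-product form: average of n rank-1 matrices x_i x_i^T
C_outer = sum(np.outer(X[:, i], X[:, i]) for i in range(n)) / n
print(np.allclose(C, C_outer))          # True: the two forms agree

# Spectral properties: C is symmetric, has non-negative eigenvalues,
# and an orthonormal set of eigenvectors (np.linalg.eigh, ascending order).
eigvals, eigvecs = np.linalg.eigh(C)
print(np.allclose(C, C.T))              # True
print(np.all(eigvals >= -1e-12))        # True: eigenvalues are non-negative
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))  # True: orthonormal basis
```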
5. Reconstruction Error

The goal is to find the direction w of the line that is "closest" to the dataset. In the figure, the original data-points are in blue while their proxies on the line are in green. Since we are only interested in the direction, we assume ||w|| = 1. Among all possible proxies for a point, the "best" proxy is the projection of the point onto the line. The projection of x onto the line is given by:

    (x^T w) w

By default, whenever we use the term projection, we mean the orthogonal vector projection; the scalar projection would be x^T w. Now, the error in the projection is given by:

    e = x - (x^T w) w

The magnitude of the error, ||e||, gives a sense of how close the data-point is to its proxy on the line. Since it is more convenient to deal with squared lengths, we stick to ||e||^2. This is the reconstruction error: the error incurred if we replace the data-point with its proxy:

    ||e||^2 = ||x - (x^T w) w||^2

The (average) reconstruction error for the entire dataset is therefore:

    \frac{1}{n} \sum_{i=1}^{n} ||x_i - (x_i^T w) w||^2
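A sketch of the average reconstruction error, on a synthetic dataset of our own (the data and the two candidate directions are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic centered dataset in R^2, lying roughly along the direction (1, 1).
t = rng.normal(size=200)
X = np.vstack([t, t + 0.1 * rng.normal(size=200)])
X = X - X.mean(axis=1, keepdims=True)

def avg_reconstruction_error(X, w):
    """Average of ||x_i - (x_i^T w) w||^2 over the columns x_i of X (unit w)."""
    scalar_proj = X.T @ w                 # x_i^T w for every i
    proxies = np.outer(w, scalar_proj)    # columns are (x_i^T w) w
    errors = X - proxies
    return np.mean(np.sum(errors**2, axis=0))

w_good = np.array([1.0, 1.0]) / np.sqrt(2)   # roughly along the data
w_bad = np.array([1.0, -1.0]) / np.sqrt(2)   # orthogonal to the data
print(avg_reconstruction_error(X, w_good) < avg_reconstruction_error(X, w_bad))  # True
```

A direction aligned with the data yields a far smaller error than one orthogonal to it, which is exactly what the minimization in the next section exploits.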
6. Reconstruction Error Minimization

The "best" line, the line "closest" to the dataset, is the one that yields the smallest reconstruction error. So we have to solve the following optimization problem:

    \min_{w : ||w|| = 1} \frac{1}{n} \sum_{i=1}^{n} ||x_i - (x_i^T w) w||^2

The optimization variable is w. Let us take one term of the objective function we wish to minimize: ||x_i - (x_i^T w) w||^2. Going back to the image of the projection, we see that the data-point, its projection, and the error form a right triangle. Applying the Pythagorean theorem to the lengths of these vectors:

    projection^2 + error^2 = data-point^2

Algebraically:

    (x_i^T w)^2 + ||x_i - (x_i^T w) w||^2 = ||x_i||^2

Rearranging, the squared length of the error is:

    ||x_i - (x_i^T w) w||^2 = ||x_i||^2 - (x_i^T w)^2

We can recast the optimization problem as:

    \min_{w : ||w|| = 1} \frac{1}{n} \sum_{i=1}^{n} \left[ ||x_i||^2 - (x_i^T w)^2 \right]
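The Pythagorean identity above can be verified numerically; the point and direction below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=5)                      # a data-point in R^5
w = rng.normal(size=5)
w = w / np.linalg.norm(w)                   # unit direction, ||w|| = 1

proj_len_sq = (x @ w) ** 2                  # (x^T w)^2: squared projection length
err_sq = np.sum((x - (x @ w) * w) ** 2)     # ||x - (x^T w) w||^2: squared error

# Pythagoras: projection^2 + error^2 = data-point^2
print(np.isclose(proj_len_sq + err_sq, np.sum(x**2)))  # True
```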
7. Variance Maximization

Since the term ||x_i||^2 is independent of w, it has no effect on the solution of the problem. We can therefore drop it with impunity to get the following equivalent problem:

    \min_{w : ||w|| = 1} -\frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2

Negating the objective function changes min to max:

    \max_{w : ||w|| = 1} \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2

If we study the quantity \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2, we see that it is the variance of the centered dataset along the direction w. To see why, consider the projections of the points onto w. The quantity x_i^T w is the (signed) distance of the projection of x_i from the origin along the line spanned by w:
The variance along w is therefore the average spread about the mean (zero):

    \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2

So what we want to maximize is the variance of the dataset along the direction w. In other words, we want to find the direction that holds the maximum information (variance) contained in the dataset. We can simplify the objective function further:

    \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2
      = \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)(x_i^T w)
      = \frac{1}{n} \sum_{i=1}^{n} (w^T x_i)(x_i^T w)
      = \frac{1}{n} \sum_{i=1}^{n} w^T (x_i x_i^T) w
      = w^T \left[ \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T \right] w
      = w^T C w

where C is the covariance matrix of the centered dataset. The final optimization problem is:

    \max_{w : ||w|| = 1} w^T C w
Therefore, we see that reconstruction error minimization is equivalent to variance maximization; they are two different perspectives on the same problem.

8. Principal component

The solution of the optimization problem is what concerns us next:

    \max_{w : ||w|| = 1} w^T C w

The quantity w^T C w is called the Rayleigh quotient. For now, we will state the following fact without proof: the maximizer of w^T C w is the eigenvector w_1 corresponding to the largest eigenvalue \lambda_1 of C. At w_1 the value of the objective function becomes \lambda_1:

    w_1^T C w_1 = w_1^T (\lambda_1 w_1) = \lambda_1 (w_1^T w_1) = \lambda_1

We call w_1 the first principal component. Recall that the quantity w^T C w is the variance of the dataset along the direction w, that is:

    \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2 = w^T C w

Therefore, w_1^T C w_1 = \lambda_1 is the variance of the dataset along the first principal component. The eigenpair (\lambda_1, w_1) has this important interpretation: the direction that captures the maximum variance in the dataset is w_1, and this variance is given by \lambda_1. Visually, it should be clear why this is the case:
Remark: Note that the first principal component in this case is unique up to the sign. w1 and -w1 are both valid candidates.
Closing the optimization problem, we have:

    \max_{w : ||w|| = 1} w^T C w = \lambda_1

    \operatorname{argmax}_{w : ||w|| = 1} w^T C w = w_1
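As a numerical sanity check (a sketch on synthetic data, not from the notes), the top eigenvector of C attains the value \lambda_1, equals the variance of the scalar projections, and is not beaten by random unit directions:

```python
import numpy as np

rng = np.random.default_rng(3)
# Centered synthetic dataset with one elongated direction.
X = np.diag([3.0, 0.5]) @ rng.normal(size=(2, 500))
X = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]
C = (X @ X.T) / n

# First principal component: eigenvector of C with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)       # ascending eigenvalue order
lam1, w1 = eigvals[-1], eigvecs[:, -1]

# w1^T C w1 = lambda_1, and it equals the variance of the scalar projections.
print(np.isclose(w1 @ C @ w1, lam1))                       # True
print(np.isclose(np.mean((X.T @ w1) ** 2), lam1))          # True

# No random unit direction beats w1 on the objective w^T C w.
dirs = rng.normal(size=(2, 1000))
dirs = dirs / np.linalg.norm(dirs, axis=0)
vals = np.einsum('ij,jk,ki->i', dirs.T, C, dirs)           # w^T C w per direction
print(np.all(vals <= lam1 + 1e-12))                        # True
```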
9. PCA in Higher Dimensions

So far we have focused on finding the best line given a dataset. Real datasets have tens or hundreds of dimensions (features). While they may have a linear structure, it is rarely going to be just a line passing through the origin. Therefore, we need to see how to extract the entire lower-dimensional structure. If this lower-dimensional structure is a k-dimensional subspace, then we need to look for the best k directions. Consequently, we are hunting for the vectors w_1, \ldots, w_k that span this subspace. As an example, consider data-points in R^3 that actually lie on a plane: we need to hunt for a two-dimensional subspace, which can be defined by the span of two principal components, w_1 and w_2. We can now go back to the flowchart and refine it:
Dataset → PCA → Top k P.C.
We understand the procedure to extract the best direction. What about the other k-1 directions? It turns out that the same recipe works with small changes. To find the second-best line, we look for the direction that captures as much of the remaining information as possible:

    \max_{w : ||w|| = 1, \, w^T w_1 = 0} w^T C w

To emphasize that we are looking for the second-best direction, we have added the constraint w^T w_1 = 0. The solution of this optimization problem is the eigenvector w_2 corresponding to the second-largest eigenvalue, and the variance along that direction is \lambda_2:

    \max_{w : ||w|| = 1, \, w^T w_1 = 0} w^T C w = \lambda_2

    \operatorname{argmax}_{w : ||w|| = 1, \, w^T w_1 = 0} w^T C w = w_2

Continuing in this fashion, the i-th stage yields the i-th principal component w_i:

    \max_{w : ||w|| = 1, \, w^T w_1 = 0, \ldots, w^T w_{i-1} = 0} w^T C w = \lambda_i

    \operatorname{argmax}_{w : ||w|| = 1, \, w^T w_1 = 0, \ldots, w^T w_{i-1} = 0} w^T C w = w_i
Note that we have specified w_i to be orthogonal to the first i-1 principal components. Though the algorithm is presented in this sequential manner, the upshot is the following:
PCA: The k-dimensional subspace that is closest to the centered dataset, X, is the subspace spanned by the k eigenvectors, w1,,wk, corresponding to the k largest eigenvalues, 𝜆1⩾⋯⩾𝜆k, of the covariance matrix, C, of the dataset. The k principal components form an orthonormal set.
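The boxed statement can be illustrated on the plane-in-R^3 example mentioned above; the data below is synthetic, constructed to lie (almost) on a plane through the origin:

```python
import numpy as np

rng = np.random.default_rng(4)
# Points in R^3 that lie almost on a plane through the origin:
# linear combinations of two fixed directions, plus tiny noise.
a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, -1.0])
coeffs = rng.normal(size=(2, 300))
X = np.outer(a, coeffs[0]) + np.outer(b, coeffs[1]) + 1e-3 * rng.normal(size=(3, 300))
X = X - X.mean(axis=1, keepdims=True)

C = (X @ X.T) / X.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)        # ascending order
W = eigvecs[:, ::-1][:, :2]                 # top-2 eigenvectors: w1, w2

# The top-2 principal components form an orthonormal set ...
print(np.allclose(W.T @ W, np.eye(2)))      # True
# ... and their span captures almost all the variance (lambda_3 is tiny).
print(eigvals[0] / eigvals.sum() < 1e-4)    # True
```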
10. Dimensionality Reduction

Now that we have k directions that capture the underlying structure of the dataset, we can see how to obtain the lower-dimensional representation of the dataset. When we had just one direction w_1, the representation of the i-th data-point was (x_i^T w_1) w_1, the projection of x_i onto w_1. When we have a k-dimensional subspace, the representation of the i-th data-point is the projection of x_i onto this subspace. Since \{w_1, \ldots, w_k\} is an orthonormal set, this projection is:

    (x_i^T w_1) w_1 + \cdots + (x_i^T w_k) w_k

We can now do two things depending on the application:
- retain only the scalar projections, or
- retain the d-dimensional projections, the full reconstruction.

We will look at scalar projections first and at the full reconstruction in a later section. To represent the scalar projections in matrix form, we form the following k \times n matrix (call it \tilde{X} to distinguish it from the original X):

    \tilde{X} = \begin{bmatrix} x_1^T w_1 & \cdots & x_n^T w_1 \\ \vdots & & \vdots \\ x_1^T w_k & \cdots & x_n^T w_k \end{bmatrix}

The i-th column is the k-dimensional representation of the i-th data-point. Note how we have retained only the scalar projections. We can decompose this into the product of two matrices:

    \tilde{X} = \begin{bmatrix} w_1^T x_1 & \cdots & w_1^T x_n \\ \vdots & & \vdots \\ w_k^T x_1 & \cdots & w_k^T x_n \end{bmatrix}
              = \begin{bmatrix} \text{--- } w_1^T \text{ ---} \\ \vdots \\ \text{--- } w_k^T \text{ ---} \end{bmatrix}
                \begin{bmatrix} | & & | \\ x_1 & \cdots & x_n \\ | & & | \end{bmatrix}
              = W^T X

where W is the d \times k matrix defined as:

    W = \begin{bmatrix} | & & | \\ w_1 & \cdots & w_k \\ | & & | \end{bmatrix}
We are now ready to present the algorithm. If we want a k-dimensional representation of the dataset, then we have:

    PCA(X, k)
    1  X ← center(X)
    2  C ← (1/n) X X^T
    3  W ← eigen-decompose(C, k)
    4  return W^T X

Here, the function eigen-decompose(C, k) returns the top-k eigenvectors of the covariance matrix. To look at the compression achieved:
- for the new dataset, we need to store k values for each of the n data-points;
- for the original dataset, we need to store d values for each of the n data-points.

The compression ratio, defined as new/old, is:

    \frac{nk}{nd} = \frac{k}{d}
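The pseudocode above translates almost line for line into NumPy; this is a minimal sketch, exercised on synthetic data:

```python
import numpy as np

def pca(X, k):
    """Sketch of the PCA(X, k) procedure from the notes.

    X is a d x n data-matrix (one data-point per column). Returns the
    k x n matrix of scalar projections W^T X and the d x k matrix W.
    """
    # 1. Center the dataset.
    X = X - X.mean(axis=1, keepdims=True)
    n = X.shape[1]
    # 2. Covariance matrix C = (1/n) X X^T.
    C = (X @ X.T) / n
    # 3. Top-k eigenvectors of C (eigh returns ascending order, so reverse).
    eigvals, eigvecs = np.linalg.eigh(C)
    W = eigvecs[:, ::-1][:, :k]
    # 4. Return the k x n reduced representation (scalar projections).
    return W.T @ X, W

rng = np.random.default_rng(5)
X = rng.normal(size=(5, 200))               # d = 5, n = 200
X_reduced, W = pca(X, 2)
print(X_reduced.shape)                      # (2, 200): compression ratio k/d = 2/5
print(W.shape)                              # (5, 2)
```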
11. Example

Suppose we want to classify images of handwritten digits into two classes, 0 and 1. The feature dimension here is 784, obtained by arranging the pixels of each 28 × 28 image into a single row or column.
There are 2000 images, 1000 in each class. The data matrix X is of shape d×n, where d=784 and n=2000. Running PCA on this dataset and retaining the scalar projections on the top two principal components results in this scatter plot for the data:
Green corresponds to one of the two classes and red to the other. Notice how the first two PCs have managed to separate the classes quite well. We can now run a standard classification algorithm on this transformed dataset. This is a typical use-case of PCA: reduce the dimensionality of the dataset before passing it to a downstream task such as classification.

12. Reconstruction

We can now turn to the full reconstruction. Reusing the W computed in the previous section, we have the projection matrix W W^T. This matrix projects vectors in R^d onto the span of the columns of W, which is precisely what our representatives are in a full reconstruction. Applying it to the data-matrix, we get:

    \hat{X} = W W^T X

\hat{X} is a d \times n matrix here. Recall that the reconstruction error is given by:

    \frac{1}{n} \sum_{i=1}^{n} ||x_i - \hat{x}_i||^2

where \hat{x}_i is the reconstruction (projection) of x_i. More explicitly, this error can be expanded as:

    \frac{1}{n} \sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{k} (x_i^T w_j) w_j \right\|^2

Note that the inner summation is over the top k principal components; earlier, it was just over the first component. We can now look at the compression achieved. In this representation, we need to store:
- the k principal components, each of which requires d values;
- the k scalar projections for each of the n data-points.

The total size is kd + nk = k(n + d), while the original dataset requires a storage of nd. The compression ratio, defined as new/old, is:

    \frac{k(n+d)}{nd} = \frac{k}{d} + \frac{k}{n}
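A sketch of the full reconstruction on synthetic data (the dataset is an assumption for illustration); it also checks the standard fact that the average reconstruction error equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(6)
# Centered synthetic data in R^4 with most variance in two directions.
X = np.diag([3.0, 2.0, 0.2, 0.1]) @ rng.normal(size=(4, 300))
X = X - X.mean(axis=1, keepdims=True)
n = X.shape[1]
C = (X @ X.T) / n

k = 2
eigvals, eigvecs = np.linalg.eigh(C)        # ascending order
W = eigvecs[:, ::-1][:, :k]                 # top-k principal components (d x k)

# Full reconstruction: project every column of X onto span{w_1, ..., w_k}.
X_hat = W @ (W.T @ X)                       # same as (W W^T) X, still d x n

# Average reconstruction error (1/n) sum ||x_i - x_hat_i||^2 ...
err = np.mean(np.sum((X - X_hat) ** 2, axis=0))
# ... equals the sum of the discarded eigenvalues lambda_{k+1} + ... + lambda_d.
print(np.isclose(err, eigvals[:-k].sum()))  # True
```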
13. (*) Principal Components and Change of Basis

Instead of stopping with the top k principal components, if we go the full distance, we end up with d principal components: \{w_1, \ldots, w_d\}. This set forms an orthonormal basis for R^d in which each basis vector is an eigenvector of C; note that any two directions here are orthogonal. What does this mean? One way to interpret it is that PCA reorients the original basis in such a way that the new basis captures the maximum information in the first few directions. For instance, in the 2D example we have been working with, \{e_1, e_2\} \to \{w_1, w_2\}: the new basis is reoriented so that w_1 points in the direction with maximum information. Therefore, PCA can be seen as a change-of-basis operation, where the choice of the basis is driven by the data, whose signature is given by the covariance matrix.

The moment we bring in change of basis, a natural question arises: how does a data-point's representation in the original basis get transformed into the new basis? If x = (x_1, \ldots, x_d) is the data-point, its representation in the old basis is just:

    x = x_1 e_1 + \cdots + x_d e_d

That is, the coordinates of the point are (x_1, \ldots, x_d) with respect to the old basis. Using the new basis:

    x = (x^T w_1) w_1 + \cdots + (x^T w_d) w_d

The coordinates of the point with respect to the new basis are (x^T w_1, \ldots, x^T w_d). Equivalently, if W is the d \times d matrix whose columns form the new basis (it is d \times d now as we are letting k = d), we can express the coordinates as:

    \begin{bmatrix} x^T w_1 \\ \vdots \\ x^T w_d \end{bmatrix} = W^T \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}
W^T is the change-of-basis matrix: it takes a representation in the standard basis and transforms it into a representation in the new basis. Finally, what does a principal component mean practically? It can be viewed as a new feature direction that is obtained by linearly combining the old feature directions. What are the coefficients of the combination and where do they come from? They come from the data, and it is the job of PCA to find them. Going back to the example, we have w_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} and w_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}. So:

    w_1 = \left(-\tfrac{1}{\sqrt{2}}\right) e_1 + \left(\tfrac{1}{\sqrt{2}}\right) e_2

    w_2 = \left(\tfrac{1}{\sqrt{2}}\right) e_1 + \left(\tfrac{1}{\sqrt{2}}\right) e_2
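The change of basis can be sketched with the example's basis vectors; the test point x is an arbitrary choice:

```python
import numpy as np

# The example's new basis: w1 = (1/sqrt(2)) (-1, 1), w2 = (1/sqrt(2)) (1, 1).
w1 = np.array([-1.0, 1.0]) / np.sqrt(2)
w2 = np.array([1.0, 1.0]) / np.sqrt(2)
W = np.column_stack([w1, w2])               # d x d; columns are the new basis

x = np.array([3.0, 4.0])                    # a point in the standard basis

# Coordinates in the new basis: (x^T w1, x^T w2) = W^T x.
coords = W.T @ x

# Expanding in the new basis recovers the original point.
x_back = coords[0] * w1 + coords[1] * w2    # equivalently W @ coords
print(np.allclose(x_back, x))               # True
```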
14. Choice of k

We now come to the question of the choice of k in PCA. To understand this, we need to look at the total variance present in the dataset and the proportion of it retained in the lower-dimensional representation. First, what could total variance mean? One way to define it is as the spread of the data around its mean. Since the mean is zero here, we can compute this as:

    \frac{1}{n} \sum_{i=1}^{n} ||x_i||^2

On average, this captures how far (in squared distance) the data-points are from the mean, which is our conventional definition of variance. Now, recall that the principal components \{w_1, \ldots, w_d\} form an orthonormal basis, so we can rewrite ||x_i||^2 as:

    ||x_i||^2 = (x_i^T w_1)^2 + \cdots + (x_i^T w_d)^2

Therefore, the total variance becomes:

    \frac{1}{n} \sum_{i=1}^{n} \left[ (x_i^T w_1)^2 + \cdots + (x_i^T w_d)^2 \right]

Decomposing this into d sums:

    \frac{1}{n} \sum_{i=1}^{n} (x_i^T w_1)^2 + \cdots + \frac{1}{n} \sum_{i=1}^{n} (x_i^T w_d)^2

Recall that the variance along the principal component w_j is given by:

    \frac{1}{n} \sum_{i=1}^{n} (x_i^T w_j)^2 = w_j^T C w_j = \lambda_j

Therefore, the total variance becomes:

    \lambda_1 + \cdots + \lambda_d

The sum of the eigenvalues of the covariance matrix gives us the complete information content of the dataset. The top-k principal components capture the following fraction of it:

    \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d}

We can now use the following heuristic to choose k: find the smallest value of k that captures at least 95% of the variance in the dataset.
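The 95% heuristic can be sketched directly from the eigenvalue spectrum; the spectrum below is a hypothetical example, not from the notes:

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose top-k eigenvalues capture at least `threshold`
    of the total variance (eigvals assumed sorted in descending order)."""
    fractions = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(fractions, threshold) + 1)

# Hypothetical spectrum with most variance in the first two directions.
eigvals = np.array([6.0, 3.0, 0.5, 0.3, 0.2])
# Cumulative fractions: 0.60, 0.90, 0.95, 0.98, 1.00
print(choose_k(eigvals))        # 3: smallest k capturing >= 95% of the variance
```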