Symbol Soup

1. NOTE

This document is a collection of equations and expressions found throughout the course. All context and background information is absent; only the name of each object is given. The reader is advised to proceed with extreme caution while engaging with this document.

2. Week-1

Data matrix

$$X \in \mathbb{R}^{d \times n}$$

Mean of the data-points

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

To center data-points

$$x_i' = x_i - \mu$$

Centered data matrix

$$\begin{bmatrix} | & & | \\ x_1' & \cdots & x_n' \\ | & & | \end{bmatrix}$$

is the centered data matrix of shape $d \times n$, and we will call this $X$ from now on.

Covariance Matrix

We assume a centered dataset.

$$C \in \mathbb{R}^{d \times d}$$

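Outer-product form

$$C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T$$

Matrix form

$$C = \frac{1}{n} X X^T$$

Scalar form, where $x_{ip}$ is the $p$-th feature of $x_i$

$$C_{pq} = \frac{1}{n} \sum_{i=1}^{n} x_{ip} x_{iq}$$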
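The centering step and the three forms of the covariance matrix can be checked against each other numerically. The following is a minimal sketch using NumPy; the toy data values and the names `X_raw`, `mu`, `C_outer`, `C_matrix` and `C_scalar` are assumptions made for illustration, not part of the course material.

```python
import numpy as np

# Toy data matrix of shape d x n (d = 2 features, n = 4 points); values are illustrative.
X_raw = np.array([[1.0, 2.0, 3.0, 6.0],
                  [2.0, 0.0, 4.0, 2.0]])
d, n = X_raw.shape

# Mean of the data-points, kept as a d x 1 column so it broadcasts over columns.
mu = X_raw.mean(axis=1, keepdims=True)

# Centered data matrix: subtract the mean from every data-point (column).
X = X_raw - mu

# Outer-product form: average of x_i x_i^T over the data-points.
C_outer = sum(np.outer(X[:, i], X[:, i]) for i in range(n)) / n

# Matrix form: (1/n) X X^T.
C_matrix = X @ X.T / n

# Scalar form: entry (p, q) is the average of x_ip * x_iq over the data-points.
C_scalar = np.array([[np.mean(X[p, :] * X[q, :]) for q in range(d)]
                     for p in range(d)])
```

All three arrays should agree up to floating-point error, which `np.allclose` can confirm.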

Projection of x onto unit vector w

$$(x^T w) w$$

Scalar projection of x onto unit vector w

$$x^T w$$

Error vector for one data-point

$$e = x - (x^T w) w$$

Reconstruction error for one data-point

$$\|e\|^2 = \|x - (x^T w) w\|^2$$
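As a small worked instance of the projection, error vector and reconstruction error for a single data-point, here is a sketch in NumPy. The particular vectors `x` and `w` are illustrative assumptions.

```python
import numpy as np

x = np.array([3.0, 4.0])      # one (centered) data-point; illustrative values
w = np.array([1.0, 1.0])
w = w / np.linalg.norm(w)     # normalize so that w is a unit vector

scalar_proj = x @ w           # scalar projection x^T w
proj = scalar_proj * w        # projection (x^T w) w
e = x - proj                  # error vector
err = e @ e                   # reconstruction error ||e||^2
```

The error vector is orthogonal to `w`, and `err` equals `x @ x - scalar_proj ** 2`, which is the identity used when the error is minimized over `w`.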

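Reconstruction error for the entire dataset

$$\frac{1}{n} \sum_{i=1}^{n} \|x_i - (x_i^T w) w\|^2$$

Error Minimization

$$\min_{\substack{w \\ \|w\| = 1}} \frac{1}{n} \sum_{i=1}^{n} \|x_i - (x_i^T w) w\|^2$$

is the same as

$$\min_{\substack{w \\ \|w\| = 1}} \frac{1}{n} \sum_{i=1}^{n} \left[ \|x_i\|^2 - (x_i^T w)^2 \right]$$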
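The step between the two minimization problems is not spelled out here. One way to see it, assuming only that $\|w\| = 1$, is to expand the squared norm for a single data-point:

$$
\begin{aligned}
\|x_i - (x_i^T w) w\|^2
&= \|x_i\|^2 - 2 (x_i^T w)(x_i^T w) + (x_i^T w)^2 \|w\|^2 \\
&= \|x_i\|^2 - 2 (x_i^T w)^2 + (x_i^T w)^2 \\
&= \|x_i\|^2 - (x_i^T w)^2
\end{aligned}
$$

Since $\|x_i\|^2$ does not depend on $w$, minimizing this quantity over unit vectors amounts to maximizing the average of $(x_i^T w)^2$.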

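Variance of dataset along w

$$\frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2$$

is the same as

$$w^T C w$$

Variance Maximization

$$\max_{\substack{w \\ \|w\| = 1}} \frac{1}{n} \sum_{i=1}^{n} (x_i^T w)^2$$

is the same as

$$\max_{\substack{w \\ \|w\| = 1}} w^T C w$$

First Principal Component

$$\max_{\substack{w \\ \|w\| = 1}} w^T C w = \lambda_1$$

$$\operatorname*{arg\,max}_{\substack{w \\ \|w\| = 1}} w^T C w = w_1$$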
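The sketch below checks numerically that the variance along a unit vector equals $w^T C w$, and that the largest eigenvalue of $C$ with its eigenvector attains the maximum. That the maximizer is the top eigenvector is a standard linear-algebra fact rather than something stated in this document, and the synthetic data and the helper name `variance_along` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))            # d = 3, n = 200 (synthetic data)
X = X - X.mean(axis=1, keepdims=True)    # center the data-points
n = X.shape[1]
C = X @ X.T / n                          # covariance matrix

def variance_along(w):
    """Average squared scalar projection (1/n) sum_i (x_i^T w)^2 for a unit vector w."""
    return np.mean((X.T @ w) ** 2)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# with the matching unit eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(C)
lambda_1 = eigvals[-1]
w_1 = eigvecs[:, -1]

# Compare against random unit vectors: none should exceed the variance along w_1.
trials = rng.normal(size=(1000, 3))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
best_random = max(variance_along(w) for w in trials)
```

Random search only gives a lower bound on the maximum, so it serves as a sanity check and not as a proof.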

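i-th Principal Component

$$\max_{\substack{w \\ \|w\| = 1 \\ w^T w_1 = 0, \cdots, w^T w_{i-1} = 0}} w^T C w = \lambda_i$$

$$\operatorname*{arg\,max}_{\substack{w \\ \|w\| = 1 \\ w^T w_1 = 0, \cdots, w^T w_{i-1} = 0}} w^T C w = w_i$$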
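In practice the constrained problems are usually not solved one at a time. A common shortcut, assumed here and not taken from this document, is to read all the $\lambda_i$ and $w_i$ off an eigendecomposition of $C$. The sketch below does this on synthetic data and checks the orthogonality constraints; the names `lambdas`, `W_all`, `gram` and `rayleigh` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 300))            # d = 4, n = 300 (synthetic data)
X = X - X.mean(axis=1, keepdims=True)    # center the data-points
C = X @ X.T / X.shape[1]                 # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder so lambda_1 >= ... >= lambda_d
lambdas = eigvals[order]
W_all = eigvecs[:, order]                # column with index i - 1 holds w_i

# The w_i are unit vectors that are mutually orthogonal,
# so W_all^T W_all should be the identity matrix.
gram = W_all.T @ W_all

# w_i^T C w_i should reproduce lambda_i for every i.
rayleigh = np.array([W_all[:, i] @ C @ W_all[:, i] for i in range(4)])
```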

Projection of x on the subspace spanned by the top-k PCs

$$(x^T w_1) w_1 + \cdots + (x^T w_k) w_k$$

Principal component matrix

$$W \in \mathbb{R}^{d \times k}$$

$$W = \begin{bmatrix} | & & | \\ w_1 & \cdots & w_k \\ | & & | \end{bmatrix}$$

Dimensionality reduced dataset

$$X' \in \mathbb{R}^{k \times n}$$

$$X' = W^T X$$

Compression ratio for dimensionality reduction

Defined as new/old:

$$\frac{kn}{dn} = \frac{k}{d}$$
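To make the shapes concrete, the sketch below builds the principal component matrix from an eigendecomposition, forms the reduced dataset, and compares the matrix expression with the sum-of-projections expression for one data-point. The sizes `d`, `n`, `k`, the synthetic data and the name `X_reduced` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 5, 100, 2                      # illustrative sizes
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)    # center the data-points
C = X @ X.T / n                          # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :k]              # d x k matrix with columns w_1, ..., w_k

X_reduced = W.T @ X                      # reduced dataset of shape k x n

# Projection of the first data-point written as a sum over the top-k PCs.
x = X[:, 0]
proj = sum((x @ W[:, j]) * W[:, j] for j in range(k))

# Compression ratio new/old: entries stored after vs. before the reduction.
ratio = X_reduced.size / X.size
```

Here `proj` matches `W @ X_reduced[:, 0]`, and `ratio` works out to `k / d`.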

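Reconstruction in $\mathbb{R}^d$

$$X'' \in \mathbb{R}^{d \times n}$$

$$X'' = W W^T X$$

Reconstruction error

$$\frac{1}{n} \sum_{i=1}^{n} \left\| x_i - \sum_{j=1}^{k} (x_i^T w_j) w_j \right\|^2$$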
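A quick numerical check of the reconstruction and its error is sketched below. It also compares the error with the sum of the discarded eigenvalues, a standard identity that is not stated in this document. The synthetic data and the names `X_rec` and `recon_error` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k = 5, 100, 2                      # illustrative sizes
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)    # center the data-points
C = X @ X.T / n                          # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
lambdas = eigvals[::-1]                  # lambda_1 >= ... >= lambda_d
W = eigvecs[:, ::-1][:, :k]              # d x k matrix of the top-k PCs

X_rec = W @ W.T @ X                      # reconstruction of shape d x n

# Average squared distance between each data-point and its reconstruction.
recon_error = np.mean(np.sum((X - X_rec) ** 2, axis=0))
```

Under these assumptions `recon_error` agrees with `lambdas[k:].sum()` up to floating-point error.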

Compression ratio for reconstruction in $\mathbb{R}^d$

Defined as new/old:

$$\frac{k(n + d)}{nd}$$
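The numerator counts the entries needed to reconstruct: the reduced dataset ($k \times n$) plus the principal component matrix ($d \times k$). A small sketch of the arithmetic follows; the function name `compression_ratio` and the sample sizes are hypothetical.

```python
def compression_ratio(n, d, k):
    """new/old when storing the k x n reduced data and the d x k PC matrix
    in place of the original d x n data matrix."""
    new = k * n + d * k      # entries in the reduced data plus entries in W
    old = d * n              # entries in the original data matrix
    return new / old

# Example with assumed sizes: 1000 points, 100 features, 10 components.
ratio = compression_ratio(n=1000, d=100, k=10)
```

With these sizes the ratio is 11000 / 100000, so about 11% of the original storage is needed.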

Total variance

$$\lambda_1 + \cdots + \lambda_d$$

Choice of k, % variance captured

$$\frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_d} \geqslant 0.95$$
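One way to apply this criterion is to take the smallest k whose cumulative share of the total variance reaches the threshold. The sketch below does so for made-up eigenvalues; the values in `lambdas` and the name `captured` are assumptions for illustration.

```python
import numpy as np

# Eigenvalues of C sorted in descending order (made-up values for illustration).
lambdas = np.array([5.0, 3.0, 1.6, 0.3, 0.1])

# Fraction of the total variance captured by the top-k PCs, for k = 1, ..., d.
captured = np.cumsum(lambdas) / lambdas.sum()

# Smallest k whose captured fraction is at least 0.95.
# argmax returns the index of the first True entry; add 1 to convert to a count.
k = int(np.argmax(captured >= 0.95)) + 1
```

For these values the top two components capture about 80% and the top three about 96%, so `k` comes out as 3.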