Week-2
1. Issues with PCA
2. Addressing complexity ($XX^T$ and $X^TX$)
3. Addressing non-linearity (Feature Transformation)
4. Kernels
5. Kernel PCA
6. Kernel Centering

Note: In Kernel PCA, the notation and approach used are a bit different from the ones discussed in the lecture. Readers are requested to make a note of this.

1. Issues with PCA

There are two issues with PCA. The first has to do with the time complexity of the algorithm. How does PCA scale as the number of features increases? Not so well. A key step in PCA is eigendecomposition, which takes about $O(d^3)$ steps. This can become an issue when $d$ is very large.

The other issue is more serious. If the dataset has a non-linear structure, then PCA will not be effective. Recall that PCA assumes that the dataset lives close to a linear subspace of $\mathbb{R}^d$. We can still run PCA on a dataset with a non-linear structure, but the resulting PCs would not really help us in dimensionality reduction; we would need both PCs to explain the dataset in this case.

It turns out that there is one technique, kernels, that will help us address both these issues to some extent. Before we get there, we will first start with the first issue.

2. Addressing complexity ($XX^T$ and $X^TX$)

We are going to assume that $d \gg n$ and see what can be done in this case. A key fact that we will use is the relationship between the matrices $XX^T$ and $X^TX$. We already know that $XX^T$ leads us to the covariance matrix, a $d \times d$ matrix that encodes the relationship between the features:

$$C = \frac{1}{n} XX^T$$
We shall denote the matrix $X^TX$ by $K$ and call it the gram matrix. This is an $n \times n$ matrix that encodes the pair-wise relationship between data-points. To see why, consider $K$ in more detail:

$$K = X^TX = \begin{bmatrix} -\, x_1^T \,- \\ \vdots \\ -\, x_n^T \,- \end{bmatrix} \begin{bmatrix} | & & | \\ x_1 & \cdots & x_n \\ | & & | \end{bmatrix}$$

Each entry of the matrix is the dot product between two data-points: $K_{ij} = x_i^T x_j$. The dot product gives a measure of similarity between two data-points. For more on this interpretation, refer to cosine similarity.

We will now briefly state some key facts that link the matrices $XX^T$ and $X^TX$, which will be used later in the document. The properties are stated without proof; those interested in detailed proofs can refer to this appendix.

Properties:
- $XX^T$ and $X^TX$ have non-negative eigenvalues (in other words, the two matrices are positive semi-definite).
- $XX^T$ and $X^TX$ have the same non-zero eigenvalues, including multiplicity.
- $\text{rank}(X^TX) = \text{rank}(XX^T) = \text{rank}(X) = r$.

Let $\mu_1 \geq \cdots \geq \mu_r > 0$ be the $r$ non-zero eigenvalues of $X^TX$ and let the corresponding unit-norm eigenvectors be $v_1, \ldots, v_r$. In other words, $(\mu_i, v_i)$ is an eigenpair of $X^TX$ with $\mu_i \neq 0$ and $\|v_i\| = 1$. How does this help us? If we know the eigenpairs of $X^TX$, we can easily compute the eigenpairs of $XX^T$!
From $X^TX$ to $XX^T$: If $(\mu_i, v_i)$ is an eigenpair of $X^TX$ with $\mu_i \neq 0$, then:

$$X^TXv_i = \mu_i v_i$$
$$XX^TXv_i = \mu_i Xv_i$$
$$(XX^T)(Xv_i) = \mu_i (Xv_i)$$

We see that $(\mu_i, Xv_i)$ is an eigenpair of $XX^T$! We can now normalize $Xv_i$ as:

$$\frac{Xv_i}{\sqrt{(Xv_i)^T(Xv_i)}} = \frac{Xv_i}{\sqrt{v_i^T(X^TX)v_i}} = \frac{Xv_i}{\sqrt{\mu_i}}$$

Therefore, the corresponding (unit-norm) eigenpair of $XX^T$ is $\left(\mu_i, \dfrac{Xv_i}{\sqrt{\mu_i}}\right)$.
Since $C$ is a scaled version of $XX^T$, the eigenvalues get scaled while the eigenvectors remain the same. The $i$th principal component and the variance along it are therefore:

$$w_i = \frac{Xv_i}{\sqrt{\mu_i}}, \qquad \lambda_i = \frac{\mu_i}{n}$$
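As a quick numerical check of the eigenpair relationship above, the following NumPy sketch (the matrix sizes and random data are purely illustrative) verifies that if $(\mu_i, v_i)$ is an eigenpair of $X^TX$, then $(\mu_i, Xv_i)$ is an eigenpair of $XX^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 5                      # d >> n, as assumed in this section
X = rng.standard_normal((d, n))   # columns are data-points

K = X.T @ X                       # n x n gram matrix
mu, V = np.linalg.eigh(K)         # eigenpairs of X^T X (ascending order)

# For the largest nonzero eigenvalue mu_i with eigenvector v_i,
# (mu_i, X v_i) should be an eigenpair of X X^T.
i = np.argmax(mu)
u = X @ V[:, i]
assert np.allclose((X @ X.T) @ u, mu[i] * u)
```

Normalizing `u` by `np.sqrt(mu[i])` then gives the unit-norm eigenvector, as in the derivation.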
How has all this helped address issue-1? Rather than performing the eigendecomposition of $XX^T$, which takes $O(d^3)$ steps, we can perform the eigendecomposition of $X^TX$, which takes $O(n^3)$ steps. Since $d \gg n$, this is a significant improvement.
Summary:
- Compute the matrix $K = X^TX$.
- Compute the eigendecomposition of $K$: $(\mu_i, v_i)$ is an eigenpair with $\|v_i\| = 1$, $\mu_i \neq 0$.
- Set $w_i = \dfrac{Xv_i}{\sqrt{\mu_i}}, \quad \lambda_i = \dfrac{\mu_i}{n}$.
- $w_i$ is the $i$th PC and $\lambda_i$ is the variance of the dataset along $w_i$.
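The summary above can be sketched in a few lines of NumPy (a minimal illustration, assuming a centered $d \times n$ data matrix; the dimensions are arbitrary). It cross-checks the variances against the usual covariance-matrix route:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 100, 4
X = rng.standard_normal((d, n))
X = X - X.mean(axis=1, keepdims=True)    # center the dataset

# PCA via the n x n gram matrix K = X^T X
K = X.T @ X
mu, V = np.linalg.eigh(K)
order = np.argsort(mu)[::-1]             # sort eigenvalues in descending order
mu, V = mu[order], V[:, order]

r = int((mu > 1e-8 * mu.max()).sum())    # rank of X (centering drops it by 1)
W = X @ V[:, :r] / np.sqrt(mu[:r])       # columns w_i = X v_i / sqrt(mu_i)
lam = mu[:r] / n                         # variances along the PCs

# Cross-check against the d x d covariance route C = (1/n) X X^T
C = (X @ X.T) / n
lam_direct = np.sort(np.linalg.eigvalsh(C))[::-1][:r]
assert np.allclose(lam, lam_direct)
assert np.allclose(W.T @ W, np.eye(r))   # the PCs are orthonormal
```

The orthonormality check works because $w_i^T w_j = v_i^T (X^TX) v_j / \sqrt{\mu_i \mu_j} = \delta_{ij}$.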
Remark: An interesting observation concerning the PCs. Each $w_i$ with $1 \leq i \leq r$ can be expressed as a linear combination of the data-points, because:

$$w_i = \frac{1}{\sqrt{\mu_i}} Xv_i = \frac{1}{\sqrt{\mu_i}} (v_{i1}x_1 + \cdots + v_{in}x_n)$$

We can also arrive at the above results by starting with this observation. This is what has been done in the lectures.
3. Addressing non-linearity (Feature Transformation)

As stated earlier, non-linearity is a more serious issue. To fix this, we take the help of a concept called feature transformation. Consider a dataset in which all data-points lie on a circle (say of radius 1) and hence satisfy the equation:

$$x_1^2 + x_2^2 = 1$$

Let us now transform the dataset by introducing a new set of features, $X_1 = x_1^2, X_2 = x_2^2$. If we plot the transformed dataset, we see that all the data-points lie on the line:

$$X_1 + X_2 = 1$$

This is called a feature transformation. This feature transformation has helped us convert a non-linear dataset into a linear dataset. In general, a feature transformation can be defined by a function $\phi: \mathbb{R}^d \to \mathbb{R}^D$ that takes a data-point as input and returns a transformed data-point as output:

$$\phi: \mathbb{R}^d \to \mathbb{R}^D, \qquad x \mapsto \phi(x)$$

In the above example, we had:

$$\phi: \mathbb{R}^2 \to \mathbb{R}^2, \qquad \phi(x_1, x_2) = (x_1^2, x_2^2)$$

In this particular transformation, we have lost some information: the signs of $x_1$ and $x_2$, for instance, are lost in the transformation. In general, it is more common to transform points to higher dimensional spaces with minimal loss of information. In such cases $D > d$. $\mathbb{R}^d$ is called the original feature space, while $\mathbb{R}^D$ is called the transformed feature space. Another example is a polynomial transformation $\phi: \mathbb{R}^2 \to \mathbb{R}^6$ defined as:

$$\phi\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_2^2 \\ x_1x_2 \end{bmatrix}$$
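The circle example above can be verified numerically. This small sketch samples points on the unit circle and checks that the map $\phi(x_1, x_2) = (x_1^2, x_2^2)$ sends all of them onto the line $X_1 + X_2 = 1$:

```python
import numpy as np

# Points on the unit circle: x1^2 + x2^2 = 1
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)])   # shape (2, 8)

# The transformation phi(x1, x2) = (x1^2, x2^2)
Z = pts ** 2

# Every transformed point satisfies X1 + X2 = 1
assert np.allclose(Z.sum(axis=0), 1.0)
```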
We add more complex features and transform the data to a higher dimensional space in the hope that the resulting dataset is linear in the transformed space. So the goal of feature transformation is linearization of the dataset. In this transformation, note that we don't lose any information, since $x_1, x_2$ are retained as two of the features in the transformed space as well.

Why does this help solve issue-2? Once we have an approximately linear dataset, we can go ahead and perform PCA on it. However, it is not as simple as it sounds. In the previous example, we ended up with $D = 6$ after starting with $d = 2$. In general, $D$ might be quite large even for moderate values of $d$. Thus, feature transformation might result in a situation where $D \gg n$. This brings us back to issue-1. Fortunately, we already know how to solve it. Let us now proceed with PCA on the transformed dataset using the trick employed in dealing with the first issue. We first form the transformed data-matrix $\phi(X) \in \mathbb{R}^{D \times n}$, where:

$$\phi(X) = \begin{bmatrix} | & & | \\ \phi(x_1) & \cdots & \phi(x_n) \\ | & & | \end{bmatrix}$$
We have the following procedure:
- Compute the matrix $K = \phi(X)^T\phi(X)$.
- Compute the eigendecomposition of $K$: $(\mu_i, v_i)$ is an eigenpair with $\|v_i\| = 1$, $\mu_i \neq 0$.
- Set $w_i = \dfrac{\phi(X)v_i}{\sqrt{\mu_i}}, \quad \lambda_i = \dfrac{\mu_i}{n}$.
- $w_i$ is the $i$th PC and $\lambda_i$ is the variance of the dataset along $w_i$.

What we have done is replace $X$ with $\phi(X)$ wherever needed. Have we solved issue-2 as well? Not quite. Computing $\phi(x_i)$ for each $x_i$ is likely to be a costly operation, so computing $\phi(X)$ is itself quite costly. On top of this, we need to compute $\phi(X)^T\phi(X)$, where each element is of the form $\phi(x_i)^T\phi(x_j)$. Besides, what is a good $\phi$? How do we know what transformations to choose? What follows in the next section is the description of a tool called a kernel that allows us to completely sidestep this problem of dealing with $\phi$.

4. Kernels
Definition: A valid kernel is a function $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ such that there exists a function $\phi: \mathbb{R}^d \to \mathbb{R}^D$ that satisfies $k(x, y) = \phi(x)^T\phi(y)$ for all $x, y \in \mathbb{R}^d$.
A kernel is a tool that computes the dot product of two data-points in the transformed space. Why would such a tool be useful? Recall that computing $K = \phi(X)^T\phi(X)$ requires the pair-wise dot products $\phi(x_i)^T\phi(x_j)$. A kernel computes this very quantity, but without having to explicitly compute $\phi$! This may seem counter-intuitive at first, so let us consider an example of a kernel. Let $k: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ be defined as:

$$k(x, y) = (1 + x^Ty)^2$$

Expanding this with $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$:

$$\begin{aligned}
k(x, y) &= (1 + x_1y_1 + x_2y_2)^2 \\
&= 1 + x_1^2y_1^2 + x_2^2y_2^2 + 2x_1y_1 + 2x_2y_2 + 2x_1x_2y_1y_2 \\
&= 1 \cdot 1 + x_1^2y_1^2 + x_2^2y_2^2 + (\sqrt{2}x_1)(\sqrt{2}y_1) + (\sqrt{2}x_2)(\sqrt{2}y_2) + (\sqrt{2}x_1x_2)(\sqrt{2}y_1y_2)
\end{aligned}$$

Introducing $\phi: \mathbb{R}^2 \to \mathbb{R}^6$:

$$\phi(z) = \phi(z_1, z_2) = \begin{bmatrix} 1 \\ z_1^2 \\ z_2^2 \\ \sqrt{2}z_1 \\ \sqrt{2}z_2 \\ \sqrt{2}z_1z_2 \end{bmatrix}$$

we see that $k(x, y) = \phi(x)^T\phi(y)$. The beauty of the kernel is that it allows us to compute the dot product in the transformed space without explicitly computing $\phi$. In general, we have what are called polynomial kernels of order $p$, defined as $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$:

$$k(x, y) = (1 + x^Ty)^p$$

The transformation corresponding to this kernel maps data-points to a space $\mathbb{R}^D$ where $D = \binom{p+d}{d}$. Why is $D = \binom{p+d}{d}$? This follows from the stars and bars formula.
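The identity $k(x, y) = \phi(x)^T\phi(y)$ for the degree-2 example can be checked directly. This sketch implements both sides for the $\mathbb{R}^2$ case worked out above:

```python
import numpy as np

def k(x, y):
    """Polynomial kernel of order 2: (1 + x^T y)^2."""
    return (1.0 + x @ y) ** 2

def phi(z):
    """The matching feature map R^2 -> R^6 derived in the expansion."""
    z1, z2 = z
    s = np.sqrt(2.0)
    return np.array([1.0, z1**2, z2**2, s * z1, s * z2, s * z1 * z2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.5])

# The kernel computes the dot product in the transformed space
assert np.isclose(k(x, y), phi(x) @ phi(y))
```

Note that evaluating `k` costs a single 2-dimensional dot product, while the right-hand side requires constructing 6-dimensional feature vectors first; this gap widens rapidly with $p$ and $d$.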
Consider $(1 + x^Ty)^p = (1 + x_1y_1 + \cdots + x_dy_d)^p$. We can treat this as the product of $p$ expressions. To get one term in the final expansion, we choose either $1$ or some $x_iy_i$ from each of the $p$ expressions. The resulting term can be characterized as:

$$1^{p_0}(x_1y_1)^{p_1}\cdots(x_dy_d)^{p_d}, \qquad p_0 + p_1 + \cdots + p_d = p, \quad 0 \leq p_i \leq p$$

Each term in the final expansion gives rise to one feature in the transformed space. Therefore, $D$ is the number of terms in the expansion. This is the same as the number of solutions to $p_0 + p_1 + \cdots + p_d = p$ with $0 \leq p_i \leq p$, which is the same as distributing $p$ balls into $d + 1$ buckets, allowing for empty buckets. We can use $d$ sticks and interpret the number of balls between consecutive sticks as the number of balls in a bucket. For example, with $p = 6$ and $d = 3$, we have three sticks; in one possible configuration, bucket-1 has one ball, bucket-2 has two balls, bucket-3 has no balls and bucket-4 has the rest, corresponding to $p_0 = 1, p_1 = 2, p_2 = 0$ and $p_3 = 3$. The number of solutions is therefore the number of arrangements of the $p$ balls and $d$ sticks, which turns out to be $\binom{p+d}{d}$.
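The stars-and-bars count can be verified by brute force for small $p$ and $d$ (a sketch; the values $p = 3, d = 2$ are arbitrary):

```python
from itertools import product
from math import comb

p, d = 3, 2

# Count the solutions of p0 + p1 + ... + pd = p with 0 <= pi <= p
count = sum(1 for ps in product(range(p + 1), repeat=d + 1) if sum(ps) == p)

# Stars and bars: the count is C(p+d, d)
assert count == comb(p + d, d)   # both equal 10 here
```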
Another popular kernel is the Gaussian kernel, $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, defined as:

$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)$$

where $\sigma$ is a parameter chosen by the user. To see what this kernel does, consider a 1D case: as a function of $\|x - y\|$, the kernel's value traces out the bell curve familiar from statistics, equal to $1$ when $x = y$ and decaying towards $0$ as the points move apart.
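A minimal sketch of the Gaussian kernel and its two limiting behaviors (the function name and test points are illustrative):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

x = np.zeros(3)
assert gaussian_kernel(x, x) == 1.0        # identical points give 1
assert gaussian_kernel(x, x + 10) < 1e-8   # distant points give nearly 0
```

Larger values of `sigma` make the kernel decay more slowly, i.e. more distant points are still considered similar.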
Remark: It turns out that the $\phi$ corresponding to the Gaussian kernel transforms points to an infinite dimensional space. This requires some advanced math to prove, so we will just state the fact here. Also, this calls into question the definition of a kernel, in which we had mentioned the existence of a $\phi: \mathbb{R}^d \to \mathbb{R}^D$; this has to be corrected to allow for infinite dimensional spaces. Furthermore, when infinite dimensional spaces come in, the simple dot product has to be replaced by a valid inner product. This is beyond the scope of this course. Interested readers can refer to functional analysis.
Recall that our aim is to compute $\phi(X)^T\phi(X)$ without explicitly having to compute $\phi(x_i)$. We now define the kernel matrix, $K \in \mathbb{R}^{n \times n}$, with each element being:

$$K_{ij} = k(x_i, x_j)$$

Note that this is effectively nothing but $\phi(X)^T\phi(X)$. A quick example for a dataset of two points: let $D = \{(1, 1), (2, 1)\}$ and let $k$ be the polynomial kernel of degree 2. Then one can see that $K$ has to be:

$$K = \begin{bmatrix} 9 & 16 \\ 16 & 36 \end{bmatrix}$$

What does $K$ capture? It captures the relationship between data-points in the transformed space.
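The two-point example above can be reproduced in a few lines (a sketch; the helper `k` is just the degree-2 polynomial kernel):

```python
import numpy as np

def k(x, y):
    """Polynomial kernel of degree 2."""
    return (1.0 + x @ y) ** 2

D = [np.array([1.0, 1.0]), np.array([2.0, 1.0])]
K = np.array([[k(xi, xj) for xj in D] for xi in D])

# k((1,1),(1,1)) = 3^2 = 9, k((1,1),(2,1)) = 4^2 = 16, k((2,1),(2,1)) = 6^2 = 36
assert np.array_equal(K, np.array([[9.0, 16.0], [16.0, 36.0]]))
```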
Remark: In general, $C$ is a $d \times d$ matrix that captures the relationship between features, while $K$ is an $n \times n$ matrix that captures the relationship between data-points.
The next question that arises is this: if we have a function $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, how do we know that it is a valid kernel? One way is to work with the definition of a valid kernel and show that there is a $\phi$ that satisfies $k(x, y) = \phi(x)^T\phi(y)$. Is there an alternative method? Mercer's theorem answers this:
Mercer's Theorem: A kernel $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is valid if and only if:
- $k$ is symmetric, and
- for any set of data-points $\{x_1, \ldots, x_n\}$, the kernel matrix $K$ of shape $n \times n$ is symmetric and positive semi-definite.
One utility of this theorem is that it allows us to build new kernels. For example:
- Let $k$ be a valid kernel; then $\alpha k$ is also a valid kernel whenever $\alpha > 0$. Why? According to Mercer, any kernel matrix $K$ of $k$ is p.s.d. With the new kernel, the corresponding kernel matrix becomes $\alpha K$, and $\alpha K$ with $\alpha > 0$ is p.s.d whenever $K$ is p.s.d.
- Let $k_1$ and $k_2$ be valid kernels; then $k_1 + k_2$ is also a valid kernel. Why? Invoking Mercer again, the kernel matrices $K_1$ and $K_2$ are p.s.d. The kernel matrix corresponding to $k_1 + k_2$ is $K_1 + K_2$, which is p.s.d whenever $K_1$ and $K_2$ are p.s.d.
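These closure properties can be sanity-checked numerically. This sketch (random points, illustrative kernels) builds the kernel matrices of $\alpha k_1$ and $k_1 + k_2$ and confirms that their eigenvalues are non-negative up to floating-point tolerance:

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.standard_normal((5, 2))                  # 5 points in R^2

def k1(x, y):
    return (1.0 + x @ y) ** 2                      # polynomial kernel

def k2(x, y):
    return np.exp(-np.sum((x - y) ** 2) / 2.0)     # Gaussian kernel

def kernel_matrix(k):
    return np.array([[k(x, y) for y in pts] for x in pts])

# Kernel matrices of alpha*k1 (alpha > 0) and of k1 + k2 stay p.s.d.
for K in (3.0 * kernel_matrix(k1), kernel_matrix(k1) + kernel_matrix(k2)):
    assert np.all(np.linalg.eigvalsh(K) >= -1e-8)
```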
Remark: Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is p.s.d (positive semi-definite) if $z^TAz \geq 0$ for all $z \in \mathbb{R}^n$.
5. Kernel PCA

Let us quickly take stock of where we were before we embarked on our kernel adventure. This was the algorithm we had come up with (PCA with explicit feature transformation):
- Compute the matrix $K = \phi(X)^T\phi(X)$.
- Compute the eigendecomposition of $K$: $(\mu_i, v_i)$ is an eigenpair with $\|v_i\| = 1$, $\mu_i \neq 0$.
- Set $w_i = \dfrac{\phi(X)v_i}{\sqrt{\mu_i}}, \quad \lambda_i = \dfrac{\mu_i}{n}$.
- $w_i$ is the $i$th PC and $\lambda_i$ is the variance of the dataset along $w_i$.
We are now ready to replace $\phi(X)^T\phi(X)$ with the kernel matrix $K$. The eigendecomposition step remains unchanged. But we still have a $\phi$ intruding in the last step, when we go from $v_i$ to $w_i$. This is, however, not as big a problem as it might seem. In PCA, what we need in most cases is the final scalar projections onto the PCs, and not the PCs themselves. That is, if $w_i$ is a PC, we are interested in $x_j^Tw_i$. In the transformed space, this becomes $\phi(x_j)^Tw_i$. Using the last step from above:

$$\phi(x_j)^Tw_i = \frac{1}{\sqrt{\mu_i}}\phi(x_j)^T\phi(X)v_i$$
If we peep into $\phi(x_j)^T\phi(X)$, it looks like this:

$$\begin{bmatrix} -\, \phi(x_j)^T \,- \end{bmatrix}\begin{bmatrix} | & & | \\ \phi(x_1) & \cdots & \phi(x_n) \\ | & & | \end{bmatrix} = \begin{bmatrix} \phi(x_j)^T\phi(x_1) & \cdots & \phi(x_j)^T\phi(x_n) \end{bmatrix}$$

The result is nothing but the $j$th row of $K$! Therefore, we can express the scalar projection as:

$$\phi(x_j)^Tw_i = \frac{1}{\sqrt{\mu_i}}(Kv_i)_j$$

where $(Kv_i)_j$ is the $j$th component of the vector $Kv_i$. Including all $n$ data-points, in matrix form, the projections of all of them onto the $i$th PC are:

$$\phi(X)^Tw_i = \frac{1}{\sqrt{\mu_i}}Kv_i$$
Recall that we have $r$ PCs. The final matrix of the dimensionality-reduced dataset, $\tilde{X}$, is going to have shape $r \times n$. We stack each of the above vectors of scalar projections as rows (using $(Kv_i)^T = v_i^TK$, since $K$ is symmetric):

$$\tilde{X} = \begin{bmatrix} -\, \frac{1}{\sqrt{\mu_1}}v_1^TK \,- \\ \vdots \\ -\, \frac{1}{\sqrt{\mu_r}}v_r^TK \,- \end{bmatrix}$$
This can be further simplified by introducing two new matrices: a diagonal matrix $D \in \mathbb{R}^{r \times r}$ and $V \in \mathbb{R}^{n \times r}$:

$$D = \begin{bmatrix} \frac{1}{\sqrt{\mu_1}} & & \\ & \ddots & \\ & & \frac{1}{\sqrt{\mu_r}} \end{bmatrix}, \qquad V = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_r \\ | & & | \end{bmatrix}$$

With all this, $\tilde{X}$ becomes:

$$\tilde{X} = DV^TK$$

We have all the ingredients to specify the algorithm that we call kernel PCA.
kernel-PCA(X, k)
1  K ← compute-kernel-matrix(X, k)
2  D, V ← eigen-decomposition(K)
3  return D V^T K
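The pseudocode above can be sketched in NumPy as follows. This is a minimal, uncentered implementation (names like `kernel_pca` are illustrative); the sanity check uses the linear kernel $k(x, y) = x^Ty$, for which the procedure reduces to PCA on the raw data:

```python
import numpy as np

def kernel_pca(X, k, tol=1e-8):
    """X: d x n data matrix (columns are data-points); k: kernel function.
    Returns the r x n matrix of scalar projections D V^T K."""
    n = X.shape[1]
    # Kernel matrix K_ij = k(x_i, x_j)
    K = np.array([[k(X[:, i], X[:, j]) for j in range(n)] for i in range(n)])
    mu, V = np.linalg.eigh(K)
    order = np.argsort(mu)[::-1]               # descending eigenvalues
    mu, V = mu[order], V[:, order]
    r = int((mu > tol * mu.max()).sum())       # rank of the kernel matrix
    D = np.diag(1.0 / np.sqrt(mu[:r]))         # r x r diagonal matrix
    return D @ V[:, :r].T @ K                  # r x n reduced dataset

# Sanity check with the linear kernel
X = np.random.default_rng(3).standard_normal((4, 6))
X_reduced = kernel_pca(X, lambda x, y: x @ y)
assert X_reduced.shape == (4, 6)               # r = rank(K) = 4 here
```

Note that the per-kernel-matrix cost is $O(n^2)$ kernel evaluations plus an $O(n^3)$ eigendecomposition, and $\phi$ never appears.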
Schematically: $(X, k) \longrightarrow \text{kernel-PCA} \longrightarrow \tilde{X}$.
A $d \times n$ data matrix and a kernel function go in as input; an $r \times n$ matrix is returned as output, where $r$ is the rank of the kernel matrix. In kernel PCA, it may so happen that $r > d$. This may seem counter-productive: instead of a reduction in the number of dimensions, we end up with a larger number of dimensions. This is a trade-off we have to live with if we want to get PCA to work on a non-linear dataset. The increased dimensionality is the price we pay for linearizing the dataset.

6. Kernel Centering

In all the above discussions, we have been silent about one key aspect: the mean of the dataset. Ideally, we should have centered the transformed dataset $\phi(X)$ before running PCA. This is what we will fix now. The mean of the transformed dataset is:

$$\frac{1}{n}\sum_{i=1}^{n}\phi(x_i)$$

We need to subtract this from every column of $\phi(X)$. A nice way to do that is to let:

$$\phi_c(X) = \phi(X) - \phi(X)\mathbf{1}_{n \times n}$$

where $\mathbf{1}_{n \times n}$ is the matrix in which every element equals $\frac{1}{n}$. The reader is encouraged to verify this fact. $\phi_c(X)$ is the centered matrix. With this, the kernel matrix for the transformed, centered dataset becomes:
$$K_c = \phi_c(X)^T\phi_c(X)$$

Expanding this, we have:

$$\begin{aligned}
K_c &= (\phi(X) - \phi(X)\mathbf{1}_{n \times n})^T(\phi(X) - \phi(X)\mathbf{1}_{n \times n}) \\
&= \phi(X)^T\phi(X) - \phi(X)^T\phi(X)\mathbf{1}_{n \times n} - \mathbf{1}_{n \times n}\phi(X)^T\phi(X) + \mathbf{1}_{n \times n}\phi(X)^T\phi(X)\mathbf{1}_{n \times n} \\
&= K - K\mathbf{1}_{n \times n} - \mathbf{1}_{n \times n}K + \mathbf{1}_{n \times n}K\mathbf{1}_{n \times n}
\end{aligned}$$
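The centering identity can be verified numerically using any explicit feature map as a stand-in for $\phi$ (here the degree-2 map from the kernels section; the data is random and illustrative). The formula built from $K$ alone must match the gram matrix of the explicitly centered features:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((2, n))

def phi(z):
    """Explicit degree-2 feature map, used here only for checking."""
    z1, z2 = z
    s = np.sqrt(2.0)
    return np.array([1.0, z1**2, z2**2, s * z1, s * z2, s * z1 * z2])

P = np.stack([phi(X[:, j]) for j in range(n)], axis=1)   # phi(X), D x n
K = P.T @ P                                              # kernel matrix

# Centering formula: Kc = K - K*1 - 1*K + 1*K*1, with 1 = ones((n,n))/n
O = np.ones((n, n)) / n
Kc = K - K @ O - O @ K + O @ K @ O

# Direct route: subtract the mean column of phi(X), then form the gram matrix
Pc = P - P.mean(axis=1, keepdims=True)
assert np.allclose(Kc, Pc.T @ Pc)
```

In practice only the last three lines of algebra are needed: $K_c$ is computed from $K$ alone, without ever forming $\phi(X)$.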
We now replace $K$ with $K_c$ in the kernel-PCA algorithm by introducing one intermediate step:

kernel-PCA-centered(X, k)
1  K ← compute-kernel-matrix(X, k)
2  K_c ← center-kernel(K)
3  D, V ← eigen-decomposition(K_c)
4  return D V^T K_c