Kij=xTixjProperties• XXT and XTX are positive semi-definite (both have non-negative eigenvalues)• XXT and XTX have the same non-zero eigenvalues• • rank(XTX)=rank(XXT)=rank(X)=r• • 𝜆1⩾⋯⩾𝜆r>0• • If (𝜆i,vi) is an eigenpair of K with ||vi||=1• – (𝜆i
n,Xvi
𝜆i) is an eigenpair of C– wi=Xvi
𝜆i is the ith PC of CComplexity in this case is O(n3)3. Addressing non-linearity (Feature Transformation)x1x2X1X2X1=x21X2=x22𝜙:Rd→RDExample of a polynomial transformation𝜙(a
x1
x2
)=a
1
x1
x2
x21
x22
x1x2
Transformed data-matrix𝜙(X)∈RD×n𝜙(X)=a
|
|
𝜙(x1)
⋯
𝜙(xn)
|
|
Transformed dataset might be linear in the transformed feature space. PCA can be run on this transformed dataset in RD. But explicit transformations can be hard. Kernels help here.If there are a lot of features that you are adding, then D≫n, so this would take us back to issue-1 (complexity).4. KernelsKernel measures the similarity between data-points in the transformed space.k:Rd×Rd→Rk(x,y)=𝜙(x)T𝜙(y)Polynomial kernel of degree pk(x,y)=(1+xTy)pThe transformation corresponding to this maps to a space RD where:D=a
p+d
d
Gaussian kernelk(x,y)=exp(-||x-y||2
2𝜎2)1D example for x=0,𝜎=1Kernel matrixFor a dataset D={x1,⋯,xn}K∈Rn×nKij=k(xi,xj)Mercer's TheoremA kernel k:Rd×Rd→R is valid if and only if:• k is symmetric• For any set of data-points {x1,⋯,xn}, the kernel matrix K is symmetric and positive semi-definite.5. Kernel PCAKernel-PCA(D, k)• Compute the kernel matrix K using the kernel k• • Let (𝜆i,vi) be an eigenpair of K with 𝜆i>0 and ||vi||=1• – If r is the rank of K, there are r non-zero eigenvalues.• – 𝜆1⩾⋯⩾𝜆r>0• • Form the following matrices:• – D=a
1
𝜆1
⋱
1
𝜆r
, D∈Rr×r• – V=a
|
|
v1
⋯
vr
|
|
, V∈Rn×r• • The (scalar) projection of the data-points in the transformed space is given by:• – X′∈Rr×n• – X′=DVTK6. Kernel Centering𝜙:Rd→RD1n×n=1
na
⋮
⋯
1
⋯
⋮
𝜙c(X)=𝜙(X)-𝜙(X)1n×nCovariance matrix of transformed dataseta
C
=1
n𝜙c(X)𝜙c(X)T
Let k be a kernel corresponding to the transformation 𝜙:k:Rd×Rd→Rk(x,y)=𝜙(x)T𝜙(y)Kernel matrixK=𝜙(X)T𝜙(X)Centered kernel matrixa
Kc
=𝜙c(X)T𝜙c(X)
Kc=K-K1n×n-1n×nK+1n×nK1n×nWe now replace K with Kc in the kernel-PCA algorithm.