PCA questions: urgent doubt

Q.No1- Prove that samples in the low-dimensional principal subspace are
uncorrelated.

Q.No2- Consider N d-dimensional data points. Derive the expression for the variance/covariance matrix $S_{d\times d}(X)$ in terms of the variance/covariance matrix $S_{N\times N}(X^{T})$, where $d \gg N$.

What will be the solution to these questions?


Try working with the covariance (or correlation) matrix of the projected samples and show that it is diagonal.
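Following that hint, here is a short proof sketch for Q1. It assumes the standard PCA setup (N samples, population covariance S, orthonormal eigenvectors $u_i$ with eigenvalues $\lambda_i$), and the symbols are my own notation rather than the question's. Zero covariance between two coordinates also means zero correlation, so the covariance matrix is enough:

$$
\begin{aligned}
S &= \frac{1}{N}\sum_{n=1}^{N}(x_n-\bar{x})(x_n-\bar{x})^{T} = U\Lambda U^{T},
\qquad U^{T}U = I,\quad \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_d),\\
z_n &= U_k^{T}(x_n-\bar{x}),\qquad U_k = [\,u_1,\dots,u_k\,],\qquad
\bar{z} = \frac{1}{N}\sum_{n=1}^{N} z_n = U_k^{T}(\bar{x}-\bar{x}) = 0,\\
S_z &= \frac{1}{N}\sum_{n=1}^{N} z_n z_n^{T} = U_k^{T} S\, U_k = U_k^{T} U \Lambda U^{T} U_k = \operatorname{diag}(\lambda_1,\dots,\lambda_k),\\
\operatorname{Cov}(z_i,z_j) &= u_i^{T} S\, u_j = \lambda_j\, u_i^{T}u_j = 0 \qquad (i \neq j).
\end{aligned}
$$

So every off-diagonal entry of the covariance (and hence correlation) matrix of the projected samples is zero, which is the statement in Q1.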


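PCA works by rotating the data onto new axes, the eigenvectors of the covariance matrix, ordered by how much variance each one captures; the redundant (low-variance) directions are then dropped. The new axes are not merely less correlated: the coordinates along them are exactly uncorrelated.

Therefore, after PCA keeps only the top few principal directions, the coordinates of the samples in the low-dimensional principal subspace are uncorrelated with each other: their covariance matrix is diagonal.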
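A quick numerical check of that claim, as a sketch on hypothetical random data (the sizes N, d, k and the seed are arbitrary choices of mine): project centered data onto the top-k eigenvectors of its covariance matrix and look at the covariance of the projected coordinates.

```python
import numpy as np

# Hypothetical example data: N samples with d deliberately correlated features.
rng = np.random.default_rng(0)
N, d, k = 200, 5, 2
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))  # mixing makes features correlated

# Center the data and form the population covariance matrix S (d x d).
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / N

# Eigen-decomposition of the symmetric matrix S (eigh returns ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]           # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top-k principal directions (the principal subspace).
U_k = eigvecs[:, :k]
Z = Xc @ U_k                                # N x k coordinates in the subspace

# Covariance of the projected samples.
S_z = Z.T @ Z / N
print(np.round(S_z, 6))
print(np.round(eigvals[:k], 6))
```

Up to floating-point error, `S_z` comes out diagonal, with the top-k eigenvalues of `S` on its diagonal.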

I am not entirely clear about the question, but the covariance is computed very much like the variance.
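Written out for two dimensions j and k, the two formulas differ only in the second factor. This sketch uses the same 1/N normalization as the code in this post; $x_{nj}$ (value of dimension j for point n) is my notation. The second line works the first column, and the first two columns, of the example array `A`:

$$
\begin{aligned}
\operatorname{Var}(x_{j}) &= \frac{1}{N}\sum_{n=1}^{N}(x_{nj}-\bar{x}_{j})^{2},\qquad
\operatorname{Cov}(x_{j},x_{k}) = \frac{1}{N}\sum_{n=1}^{N}(x_{nj}-\bar{x}_{j})(x_{nk}-\bar{x}_{k}),\qquad
\operatorname{Var}(x_{j}) = \operatorname{Cov}(x_{j},x_{j}),\\
\operatorname{Var}(x_{1}) &= \frac{24^{2}+24^{2}+(-6)^{2}+(-6)^{2}+(-36)^{2}}{5} = \frac{2520}{5} = 504,\qquad
\operatorname{Cov}(x_{1},x_{2}) = \frac{24\cdot 30 + (-36)(-30)}{5} = \frac{1800}{5} = 360.
\end{aligned}
$$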

For N points with d dimensions, follow this process to find the covariance matrix:

  • Find the mean of each of the d dimensions (the column means).
  • Subtract the column means from each row.
  • For every pair of columns, multiply the centered values, sum over the N data points, and divide by N.
  • The resulting d×d matrix is the covariance matrix.
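In matrix form the steps collapse into a single product, and the same matrix gives the N×N matrix from Q2. This is a derivation sketch assuming the question's $S(X)$ means $\frac{1}{N}XX^{T}$ for a centered d×N matrix X with one sample per column (so X corresponds to `ADT` in the code in this post); that reading of the notation is my assumption:

$$
\begin{aligned}
S_{d\times d}(X) &= \frac{1}{N} X X^{T}, \qquad S_{N\times N}(X^{T}) = \frac{1}{N} X^{T} X,\\
S_{N\times N}\, v_i = \lambda_i v_i
&\;\Rightarrow\; \frac{1}{N} X X^{T} (X v_i) = X\Big(\frac{1}{N}X^{T}X\, v_i\Big) = \lambda_i\,(X v_i),\\
\|X v_i\|^{2} &= v_i^{T} X^{T} X\, v_i = N\lambda_i
\;\Rightarrow\; u_i = \frac{1}{\sqrt{N\lambda_i}}\, X v_i \text{ is a unit eigenvector of } S_{d\times d},\\
S_{d\times d}(X) &= \sum_{i:\,\lambda_i>0} \lambda_i\, u_i u_i^{T}
= \frac{1}{N}\, X \Big(\sum_{i:\,\lambda_i>0} v_i v_i^{T}\Big) X^{T} = \frac{1}{N} X X^{T}.
\end{aligned}
$$

The last equality holds because $\sum_{\lambda_i>0} v_i v_i^{T}$ projects onto the row space of X. Since centering limits the rank to at most N - 1, the two matrices share all their non-zero eigenvalues; for d ≫ N one can diagonalize the small N×N matrix and map its eigenvectors through X instead of working with the d×d matrix.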

Here is the code:

```python
import numpy as np

# N = 5 data points (rows), each with d = 3 dimensions (columns)
A = np.array([
    [90, 60, 90],
    [90, 90, 30],
    [60, 60, 60],
    [60, 60, 90],
    [30, 30, 30],
])

# 1. Find the mean of each of the d dimensions (column means).
AM = np.mean(A, axis=0)
print(repr(AM))
# array([66., 60., 60.])

# 2. Subtract the column means from each row (centering).
AD = A - AM
print(repr(AD))
# array([[ 24.,   0.,  30.],
#        [ 24.,  30., -30.],
#        [ -6.,   0.,   0.],
#        [ -6.,   0.,  30.],
#        [-36., -30., -30.]])

ADT = AD.T
print(repr(ADT))
# array([[ 24.,  24.,  -6.,  -6., -36.],
#        [  0.,  30.,   0.,   0., -30.],
#        [ 30., -30.,   0.,  30., -30.]])

# 3. Multiply every pair of centered columns, sum over the N data points,
#    and divide by N; the result is the d x d covariance matrix.
COV = np.dot(ADT, AD) / len(A)
print(repr(COV))
# array([[504., 360., 180.],
#        [360., 360.,   0.],
#        [180.,   0., 720.]])

# Cross-check with NumPy: bias=True divides by N (the default divides by N - 1).
assert np.allclose(COV, np.cov(A, rowvar=False, bias=True))
```
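For Q2, the relation can be checked numerically. With X the centered d×N data matrix (one sample per column, my assumption about the notation), each eigenvector v of $S_{N\times N} = X^{T}X/N$ with non-zero eigenvalue λ maps to a unit eigenvector $u = Xv/\sqrt{N\lambda}$ of $S_{d\times d} = XX^{T}/N$ with the same λ. The sizes, seed, and the 1e-10 cut-off for "non-zero" below are arbitrary illustrative choices:

```python
import numpy as np

# Hypothetical example: N = 6 samples in d = 1000 dimensions (d >> N).
rng = np.random.default_rng(42)
d, N = 1000, 6
X = rng.normal(size=(d, N))                 # one sample per column
X = X - X.mean(axis=1, keepdims=True)       # subtract the mean sample from every column

S_d = X @ X.T / N                           # d x d covariance (expensive when d is large)
S_N = X.T @ X / N                           # N x N matrix from Q2 (cheap)

# Eigen-decompose the small N x N matrix and keep only the non-zero eigenvalues;
# centering leaves the rank at most N - 1.
lam, V = np.linalg.eigh(S_N)
keep = lam > 1e-10 * lam.max()
lam, V = lam[keep], V[:, keep]

# Map each eigenvector v_i of S_N to u_i = X v_i / sqrt(N * lam_i), an eigenvector of S_d.
U = X @ V / np.sqrt(N * lam)

print(lam.size)                                 # number of non-zero eigenvalues
print(np.allclose(S_d @ U, U * lam))            # S_d u_i = lam_i u_i
print(np.allclose(U.T @ U, np.eye(lam.size)))   # the u_i are orthonormal
print(np.allclose(S_d, (U * lam) @ U.T))        # S_d rebuilt from S_N's eigenpairs
```

Only an N×N eigenproblem is solved here; the d×d matrix `S_d` is formed solely to verify the result.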