Machine Learning Algorithms: Dimensionality Reduction

LDA (Linear Discriminant Analysis)

For a dataset used to train a classification model, we can use LDA to enhance class separability.

Step 1: Calculate the within-class scatter and the between-class scatter

  1. within-class scatter

    $$S_W = \sum \limits_{\text{classes } c} \sum \limits_{j \in c} P_c (X_j - \mu_c)(X_j - \mu_c)^T$$
  2. between-class scatter

    $$S_B = \sum \limits_{\text{classes } c} (\mu_c - \mu)(\mu_c - \mu)^T$$
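These two scatter matrices can be sketched in NumPy. The 4-sample, 2-class dataset below is an assumption for illustration, and the class weight $P_c$ is taken as 1:

```python
import numpy as np

# Toy data: 4 samples, 2 features, 2 classes (assumed for illustration)
X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

mu = X.mean(axis=0)                      # overall mean
S_W = np.zeros((2, 2))                   # within-class scatter
S_B = np.zeros((2, 2))                   # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    S_W += (Xc - mu_c).T @ (Xc - mu_c)   # sum over samples j in class c
    d = (mu_c - mu).reshape(-1, 1)
    S_B += d @ d.T                       # one term per class, as in the formula

print(S_W)
print(S_B)
```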

Step 2: Find the projection direction $w$ that maximizes the ratio of between-class scatter to within-class scatter

$$S_B w = \lambda S_W w$$

which is equivalent to the eigenvalue problem

$$S_W^{-1} S_B w = \lambda w$$
  • eigenvector $w$: the optimal projection direction (discriminant direction)
  • eigenvalue $\lambda$: reflects the discriminant ability in this direction (the greater the eigenvalue, the better the class separation)
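Numerically, this eigenvalue problem can be solved with NumPy. The scatter matrices below are assumed example values for a 2-class, 2-feature problem; note one eigenvalue comes out as 0, since $S_B$ has rank at most $K-1$ for $K = 2$ classes:

```python
import numpy as np

# Example scatter matrices for a 2-class, 2-feature problem (assumed values)
S_W = np.array([[1.0, 2.0], [2.0, 5.0]])
S_B = np.array([[12.5, 10.0], [10.0, 8.0]])

# Solve S_W^{-1} S_B w = lambda w as an ordinary eigenvalue problem
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# Sort directions by discriminant ability (largest eigenvalue first)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)  # the larger the eigenvalue, the better the separation
```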

Step 3: Select the first k eigenvectors
If there are K classes, at most K-1 discriminant directions can be selected, because $\mathrm{rank}(S_B)$ cannot exceed K-1.

Sort the eigenvectors by eigenvalue in descending order and take the first k to form the projection matrix $W = [w_1, w_2, ..., w_k]$.

Step 4: Project the data into the reduced space

$$X_{projected} = XW$$
  • $X_{projected}$ is the dimensionality-reduced data, which can be used for classification.
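All four steps can be combined into one minimal sketch. The toy dataset, labels, and the helper name `lda_project` are assumptions for illustration; $P_c$ is again taken as 1:

```python
import numpy as np

def lda_project(X, y, k):
    """Project X onto the top-k LDA discriminant directions (minimal sketch)."""
    n_features = X.shape[1]
    mu = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in np.unique(y):                       # Step 1: scatter matrices
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        d = (mu_c - mu).reshape(-1, 1)
        S_B += d @ d.T
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)  # Step 2
    order = np.argsort(eigvals)[::-1]            # Step 3: top-k directions
    W = eigvecs[:, order[:k]]
    return X @ W                                 # Step 4: project

X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])
X_projected = lda_project(X, y, k=1)             # at most K-1 = 1 direction
print(X_projected.shape)  # (4, 1)
```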

PCA

PCA is applied to a training dataset to reduce its dimensionality: it discards the low-variance directions while keeping the principal components of the data.
It takes the training dataset, usually a set of numerical vectors, computes the principal components, and calculates the projections of the original vectors onto them. The principal components should be kept so they can later be applied to the test dataset before prediction.
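This fit-on-train / apply-to-test pattern can be sketched in NumPy. The helper names `pca_fit` / `pca_transform` and the toy data are assumptions for illustration:

```python
import numpy as np

def pca_fit(X_train, k):
    """Compute the mean and top-k principal components from training data."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    C = Xc.T @ Xc / (len(X_train) - 1)       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return mean, eigvecs[:, order[:k]]

def pca_transform(X, mean, components):
    """Apply the stored mean and components to new (e.g. test) data."""
    return (X - mean) @ components

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean, comps = pca_fit(X_train, k=1)          # keep these for the test set
X_new = pca_transform(np.array([[2.0, 3.0]]), mean, comps)
print(X_new)
```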

A simulation of PCA on a concrete 3×2 dataset (3 samples, 2 features).
Given the data matrix X (3 samples × 2 features):

$$X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$

Step 1: Center the Data (Mean Normalization)
Compute the mean of each feature and subtract it:

Mean of $x_1 = \frac{1+3+5}{3} = 3$
Mean of $x_2 = \frac{2+4+6}{3} = 4$

$$X_{centered} = X - \text{Mean} = \begin{bmatrix} 1-3 & 2-4 \\ 3-3 & 4-4 \\ 5-3 & 6-4 \end{bmatrix} = \begin{bmatrix} -2 & -2 \\ 0 & 0 \\ 2 & 2 \end{bmatrix}$$

Step 2: Compute the Covariance Matrix C
Since the features are columns, the covariance matrix is:

$$C = \frac{1}{n-1} X_{centered}^T X_{centered} = \frac{1}{2} \begin{bmatrix} -2 & 0 & 2 \\ -2 & 0 & 2 \end{bmatrix} \begin{bmatrix} -2 & -2 \\ 0 & 0 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}$$
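This computation can be checked in NumPy; `np.cov` with `rowvar=False` treats columns as features and gives the same matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xc = X - X.mean(axis=0)          # center each feature
C = Xc.T @ Xc / (len(X) - 1)     # covariance matrix, equals [[4, 4], [4, 4]]
print(C)
print(np.cov(X, rowvar=False))   # same result via the library routine
```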

Step 3: Eigenvalue Decomposition
Find the eigenvalues ($\lambda$) and eigenvectors ($v$) of C:

$$\det(C - \lambda I) = 0 \Rightarrow \det \begin{bmatrix} 4 - \lambda & 4 \\ 4 & 4 - \lambda \end{bmatrix} = 0$$

Eigenvalues: $\lambda_1 = 8, \lambda_2 = 0$

Eigenvector for $\lambda_1 = 8$:

$$(C - 8I)v = 0 \Rightarrow \begin{bmatrix} -4 & 4 \\ 4 & -4 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0 \Rightarrow v_1 = v_2$$

Normalized eigenvector:

$$v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$

Step 4: Project Data onto Principal Component

Only 1 PC ($v_1$) is meaningful since $\lambda_2 = 0$.
Project $X_{centered}$ onto $v_1$:

$$PC1 = X_{centered} \cdot v_1 = \begin{bmatrix} -2 & -2 \\ 0 & 0 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} -2.828 \\ 0 \\ 2.828 \end{bmatrix}$$
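The whole worked example can be reproduced in NumPy. Note that the eigenvector sign returned by `np.linalg.eigh` is arbitrary, so the projected values may come out with flipped signs:

```python
import numpy as np

# Reproduce the worked example end to end
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xc = X - X.mean(axis=0)                  # Step 1: center the data
C = Xc.T @ Xc / (len(X) - 1)             # Step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # Step 3: eigendecomposition
v1 = eigvecs[:, np.argmax(eigvals)]      # top component (sign is arbitrary)
pc1 = Xc @ v1                            # Step 4: project onto PC1
print(np.sort(eigvals))                  # eigenvalues: approximately 0 and 8
print(pc1)                               # magnitudes 2.828, 0, 2.828
```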