Machine Learning Algorithms: Dimensionality Reduction

LDA (Linear Discriminant Analysis)

For a dataset used to train a classification model, we can use LDA to enhance class separability.

Step 1: Calculate the within-class scatter and the between-class scatter

  1. within-class scatter

    $$S_W = \sum \limits_{\text{classes } c} \sum \limits_{j \in c} P_c (X_j - \mu_c)(X_j - \mu_c)^T$$
  2. between-class scatter

    $$S_B = \sum \limits_{\text{classes } c} (\mu_c - \mu)(\mu_c - \mu)^T$$
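These two scatter matrices can be sketched in NumPy. The 4-sample, 2-class dataset below is an assumption for illustration, and the class weight $P_c$ is taken as 1:

```python
import numpy as np

# Toy data: 4 samples, 2 features, 2 classes (assumed for illustration)
X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

mu = X.mean(axis=0)                      # overall mean
S_W = np.zeros((2, 2))                   # within-class scatter
S_B = np.zeros((2, 2))                   # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    S_W += (Xc - mu_c).T @ (Xc - mu_c)   # sum over samples j in class c
    d = (mu_c - mu).reshape(-1, 1)
    S_B += d @ d.T                       # one term per class, as in the formula

print(S_W)
print(S_B)
```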

Step 2: Find the projection direction $w$ that maximizes the ratio of between-class scatter to within-class scatter

$$S_B w = \lambda S_W w$$

which is equivalent to the eigenvalue problem

$$S_W^{-1} S_B w = \lambda w$$
  • eigenvector $w$: the optimal projection direction (discriminant direction)
  • eigenvalue $\lambda$: reflects the discriminant ability in this direction (the greater the eigenvalue, the better the class separation)
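Numerically, this eigenvalue problem can be solved with NumPy. The scatter matrices below are assumed example values for a 2-class, 2-feature problem; note one eigenvalue comes out as 0, since $S_B$ has rank at most $K-1$ for $K = 2$ classes:

```python
import numpy as np

# Example scatter matrices for a 2-class, 2-feature problem (assumed values)
S_W = np.array([[1.0, 2.0], [2.0, 5.0]])
S_B = np.array([[12.5, 10.0], [10.0, 8.0]])

# Solve S_W^{-1} S_B w = lambda w as an ordinary eigenvalue problem
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# Sort directions by discriminant ability (largest eigenvalue first)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals)  # the larger the eigenvalue, the better the separation
```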

Step 3: Select the first k eigenvectors
If there are K classes, at most K-1 discriminant directions can be selected, because $\mathrm{rank}(S_B)$ cannot exceed K-1.

Sort the eigenvectors by eigenvalue in descending order and take the first k to form the projection matrix $W = [w_1, w_2, ..., w_k]$.

Step 4: Project the data into the reduced space

$$X_{projected} = XW$$
  • $X_{projected}$ is the dimensionality-reduced data, which can be used for classification.
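All four steps can be combined into one minimal sketch. The toy dataset, labels, and the helper name `lda_project` are assumptions for illustration; $P_c$ is again taken as 1:

```python
import numpy as np

def lda_project(X, y, k):
    """Project X onto the top-k LDA discriminant directions (minimal sketch)."""
    n_features = X.shape[1]
    mu = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in np.unique(y):                       # Step 1: scatter matrices
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        d = (mu_c - mu).reshape(-1, 1)
        S_B += d @ d.T
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)  # Step 2
    order = np.argsort(eigvals)[::-1]            # Step 3: top-k directions
    W = eigvecs[:, order[:k]]
    return X @ W                                 # Step 4: project

X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])
X_projected = lda_project(X, y, k=1)             # at most K-1 = 1 direction
print(X_projected.shape)  # (4, 1)
```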

PCA

PCA is applied to a training dataset to reduce its dimensionality: it discards the low-variance directions while keeping the principal components of the data.
It takes the training dataset, usually a set of numerical vectors, computes the principal components, and calculates the projections of the original vectors onto them. The principal components should be kept so they can later be applied to the test dataset before prediction.
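This fit-on-train / apply-to-test pattern can be sketched in NumPy. The helper names `pca_fit` / `pca_transform` and the toy data are assumptions for illustration:

```python
import numpy as np

def pca_fit(X_train, k):
    """Compute the mean and top-k principal components from training data."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    C = Xc.T @ Xc / (len(X_train) - 1)       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return mean, eigvecs[:, order[:k]]

def pca_transform(X, mean, components):
    """Apply the stored mean and components to new (e.g. test) data."""
    return (X - mean) @ components

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean, comps = pca_fit(X_train, k=1)          # keep these for the test set
X_new = pca_transform(np.array([[2.0, 3.0]]), mean, comps)
print(X_new)
```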

A simulation of PCA on a concrete 3×2 dataset (3 samples, 2 features).
Given the data matrix X (3 samples × 2 features):

$$X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$

Step 1: Center the Data (Mean Normalization)
Compute the mean of each feature and subtract it:

Mean of $x_1 = \frac{1+3+5}{3} = 3$
Mean of $x_2 = \frac{2+4+6}{3} = 4$

$$X_{centered} = X - \text{Mean} = \begin{bmatrix} 1-3 & 2-4 \\ 3-3 & 4-4 \\ 5-3 & 6-4 \end{bmatrix} = \begin{bmatrix} -2 & -2 \\ 0 & 0 \\ 2 & 2 \end{bmatrix}$$

Step 2: Compute the Covariance Matrix C
Since the features are columns, the covariance matrix is:

$$C = \frac{1}{n-1} X_{centered}^T X_{centered} = \frac{1}{2} \begin{bmatrix} -2 & 0 & 2 \\ -2 & 0 & 2 \end{bmatrix} \begin{bmatrix} -2 & -2 \\ 0 & 0 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}$$
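This computation can be checked in NumPy; `np.cov` with `rowvar=False` treats columns as features and gives the same matrix:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xc = X - X.mean(axis=0)          # center each feature
C = Xc.T @ Xc / (len(X) - 1)     # covariance matrix, equals [[4, 4], [4, 4]]
print(C)
print(np.cov(X, rowvar=False))   # same result via the library routine
```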

Step 3: Eigenvalue Decomposition
Find the eigenvalues ($\lambda$) and eigenvectors ($v$) of C:

$$\det(C - \lambda I) = 0 \Rightarrow \det \begin{bmatrix} 4 - \lambda & 4 \\ 4 & 4 - \lambda \end{bmatrix} = 0$$

Eigenvalues: $\lambda_1 = 8, \lambda_2 = 0$

Eigenvector for $\lambda_1 = 8$:

$$(C - 8I)v = 0 \Rightarrow \begin{bmatrix} -4 & 4 \\ 4 & -4 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0 \Rightarrow v_1 = v_2$$

Normalized eigenvector:

$$v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$

Step 4: Project Data onto Principal Component

Only 1 PC ($v_1$) is meaningful since $\lambda_2 = 0$.
Project $X_{centered}$ onto $v_1$:

$$PC1 = X_{centered} \cdot v_1 = \begin{bmatrix} -2 & -2 \\ 0 & 0 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} -2.828 \\ 0 \\ 2.828 \end{bmatrix}$$
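The whole worked example can be reproduced in NumPy. Note that the eigenvector sign returned by `np.linalg.eigh` is arbitrary, so the projected values may come out with flipped signs:

```python
import numpy as np

# Reproduce the worked example end to end
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xc = X - X.mean(axis=0)                  # Step 1: center the data
C = Xc.T @ Xc / (len(X) - 1)             # Step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # Step 3: eigendecomposition
v1 = eigvecs[:, np.argmax(eigvals)]      # top component (sign is arbitrary)
pc1 = Xc @ v1                            # Step 4: project onto PC1
print(np.sort(eigvals))                  # eigenvalues: approximately 0 and 8
print(pc1)                               # magnitudes 2.828, 0, 2.828
```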