Radial Basis Functions and Splines / Dimensionality Reduction

Chapter 5 : Radial Basis Functions and Splines

5.1 Receptive fields

Different weight vectors respond to an input vector differently, depending on the distance between them.

5.2 The radial basis function (RBF) network

A radial basis function (RBF) responds to an input according to the distance between the input vector and the weight vector: the closer the two are, the stronger the response, and vice versa.
An RBF can form a layer, or be a component, of a neural network.
The Gaussian function is a commonly used RBF:

g(x, w, \sigma) = \exp\left(\frac{-||x-w||^2}{2\sigma^2}\right)

Or the normalised Gaussians,

g(x, w_i, \sigma) = \frac{\exp(-||x-w_i||^2 / 2\sigma^2)}{\sum_j \exp(-||x-w_j||^2 / 2\sigma^2)}
  • d is the maximum distance between the locations of the hidden nodes
  • \sigma = d / \sqrt{2M}, where M is the number of hidden nodes
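The Gaussian and normalised-Gaussian activations above can be sketched in NumPy as follows (function and variable names are illustrative, not from any library):

```python
import numpy as np

def rbf_activations(x, centres, sigma):
    """Gaussian RBF response of each centre to the input vector x.

    x is a 1-D input vector; centres is an (M, d) array holding the
    M weight (centre) vectors.
    """
    sq_dists = np.sum((centres - x) ** 2, axis=1)   # ||x - w_i||^2
    return np.exp(-sq_dists / (2 * sigma ** 2))

def normalised_rbf_activations(x, centres, sigma):
    """Normalised Gaussians: each response divided by the sum over centres."""
    g = rbf_activations(x, centres, sigma)
    return g / np.sum(g)

centres = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
# sigma = d / sqrt(2M): d is the maximum distance between centres, M = 3
d = max(np.linalg.norm(a - b) for a in centres for b in centres)
sigma = d / np.sqrt(2 * len(centres))

g = rbf_activations(np.array([0.1, 0.1]), centres, sigma)
# the nearest centre gives the strongest response
```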

RBF networks never have more than one layer of non-linear neurons.

  • Training the RBF network

    1. position the RBF nodes
    2. use the activations of those nodes to train the linear outputs
  • How to position the RBF nodes

    1. Randomly picking some of the datapoints to act as basis locations.
    2. Use k-means algorithm (unsupervised learning) to position the nodes which are representative of typical inputs.
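A minimal sketch of the second option, plain k-means used to position representative centres (the helper name and blob data are made up for illustration):

```python
import numpy as np

def kmeans_centres(data, k, n_iters=20, seed=0):
    """Position k centres on an (n, d) data array with plain k-means."""
    rng = np.random.default_rng(seed)
    # initialise the centres as randomly chosen datapoints
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iters):
        # assign each point to its nearest centre
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # move each centre to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centres[j] = data[labels == j].mean(axis=0)
    return centres

# two well-separated blobs: one centre should land near each
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
centres = kmeans_centres(data, 2)
```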

The Radial Basis Function Algorithm

  1. Position the RBF centres by either:
    • Using the k-means algorithm to initialise the positions of the RBF centres Or
• Setting the RBF centres to be randomly chosen datapoints
  2. Calculate the activations of the RBF nodes using the Gaussian equation
  3. Train the output weights by either:
    • Using the perceptron Or
    • Computing the pseudo-inverse of the activations of the RBF centres
  • A problem with the RBF network: as the input dimensionality increases, many more RBF nodes are needed to cover the input space (the curse of dimensionality).
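The whole algorithm can be sketched end-to-end: random datapoints as centres, Gaussian activations, and output weights from the pseudo-inverse, fitted to a toy 1-D regression problem (all names are illustrative):

```python
import numpy as np

def train_rbf(X, y, n_centres=10, seed=0):
    """Random datapoints as centres; linear output weights by pseudo-inverse."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=n_centres, replace=False)]
    # sigma = d / sqrt(2M), with d the maximum distance between centres
    d = max(np.linalg.norm(a - b) for a in centres for b in centres)
    sigma = d / np.sqrt(2 * n_centres)
    # (n, M) matrix of Gaussian activations
    G = np.exp(-((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
               / (2 * sigma ** 2))
    weights = np.linalg.pinv(G) @ y          # train the linear output layer
    return centres, sigma, weights

def predict_rbf(X, centres, sigma, weights):
    G = np.exp(-((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
               / (2 * sigma ** 2))
    return G @ weights

X = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel()
centres, sigma, w = train_rbf(X, y, n_centres=12)
y_hat = predict_rbf(X, centres, sigma, w)
```

The pseudo-inverse gives the least-squares solution for the output weights in one step, which is why no iterative training is needed once the centres are fixed.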

5.3 Interpolation and basis functions

The RBF nodes can be used to interpolate the input data with splines; cubic splines are often effective. Outside the bounds of the data, the fit continues linearly.
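A short cubic-spline interpolation example, assuming SciPy is available (the sample points are made up for illustration):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# fit a cubic spline through a few sample points and interpolate between them
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)

# 'natural' boundary conditions set the second derivative to zero at the
# ends, so the fit straightens out towards the boundaries
spline = CubicSpline(x, y, bc_type='natural')

y_mid = spline(1.5)        # interpolated value between the samples
```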

Chapter 6 : Dimensionality Reduction

Analysing and reducing the input dimensionality is useful for recovering the underlying relationships in the data and for reducing the size of the network.

6.1 Linear discriminant analysis (LDA)

The covariance within each class and the covariance between classes tell us about the scatter of the dataset. The method does not otherwise manipulate the dataset.
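The two scatter (covariance) matrices can be computed as below; this is a sketch with illustrative names and toy data, not a library routine:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (S_W) and between-class (S_B) scatter matrices."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        # scatter of each class around its own mean
        S_W += (Xc - mean_c).T @ (Xc - mean_c)
        # scatter of the class means around the overall mean
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_B += len(Xc) * (diff @ diff.T)
    return S_W, S_B

# two tight, well-separated classes: between-class scatter dominates
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
S_W, S_B = scatter_matrices(X, labels)
```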

6.2 Principal components analysis (PCA)

The Principal Components Analysis Algorithm

  1. Write the input data points as an n×m matrix X, where n is the number of inputs and m is the dimension of each input.
  2. Centre the data by subtracting off the mean of each column, giving the matrix B
  3. Compute the covariance matrix C = \frac{1}{N}B^T B
  4. Compute the eigenvalues and eigenvectors of C, so V^{-1}CV = D, where V holds the eigenvectors of C and D is the m×m diagonal eigenvalue matrix
  5. Sort the columns of D into order of decreasing eigenvalues, and apply the same order to the columns of V
  6. Reject those with eigenvalue less than some \eta, leaving L dimensions in the data.
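The steps above can be sketched in NumPy (illustrative names; the toy data varies mostly along one direction, so one component suffices):

```python
import numpy as np

def pca(X, n_components):
    B = X - X.mean(axis=0)               # centre each column
    C = (B.T @ B) / len(X)               # covariance matrix
    evals, evecs = np.linalg.eigh(C)     # eigh, since C is symmetric
    order = np.argsort(evals)[::-1]      # decreasing eigenvalues
    V = evecs[:, order[:n_components]]   # keep the top L directions
    return B @ V                         # project the centred data

# 2-D data lying almost on a line: one component captures nearly everything
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 0.5 * t + 0.01 * rng.normal(size=100)])
Z = pca(X, 1)
```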

Kernel PCA first projects the input data through a kernel function and then applies PCA; this allows the principal axes to take more complex (non-linear) shapes in the original space.
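A sketch of kernel PCA with an RBF kernel (names and the `gamma` parameter are illustrative assumptions): instead of the covariance matrix, the kernel matrix is centred and eigendecomposed, and the projected coordinates come from its scaled eigenvectors.

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    # RBF kernel matrix: K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq)
    # centre the kernel matrix in feature space
    n = len(X)
    one = np.ones((n, n)) / n
    K = K - one @ K - K @ one + one @ K @ one
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1][:n_components]
    # projected coordinates: eigenvectors scaled by sqrt(eigenvalue)
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Z = kernel_pca(X, 2)
```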

6.3 Factor analysis

Factor analysis is to ask whether the data that is observed can be explained by a smaller number of uncorrelated factors or latent variables.

6.4 Independent components analysis (ICA)

ICA is an approach related to factor analysis, and a well-known solver of the blind source separation problem: it computes the unmixing matrix, defined as the inverse of the mixing matrix A in

x = As

Mutual information and negentropy are used in ICA; FastICA and Infomax ICA are two popular implementations.
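The blind source separation setup can be illustrated directly. Here we cheat by inverting a known mixing matrix A, just to show the relationship x = As and its unmixing; a real ICA algorithm has to estimate W from x alone (the data and matrix below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 1000))     # two independent sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                  # mixing matrix
x = A @ s                                   # observed mixtures

W = np.linalg.inv(A)                        # unmixing matrix = A^{-1}
s_recovered = W @ x
```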

6.5&6.6 Locally linear embedding and ISOMAP

Locally linear embedding (LLE) and ISOMAP are two very powerful algorithms for analysing inputs and reducing dimensionality.
ISOMAP is based on multi-dimensional scaling (MDS).
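A sketch of classical MDS, the building block of ISOMAP (which swaps Euclidean distances for geodesic graph distances): double-centre the squared distance matrix and eigendecompose it to recover coordinates. Names and toy points are illustrative.

```python
import numpy as np

def classical_mds(D, n_components):
    """D is an (n, n) matrix of pairwise Euclidean distances."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:n_components]
    # coordinates: top eigenvectors scaled by sqrt(eigenvalue)
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))

# distances computed from known 2-D points should be reproduced exactly
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Y = classical_mds(D, 2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
```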