Matrix Computation Algorithms

1. Gaussian Elimination and Its Variants

Topics covered:

  1. Matrix Multiplication
  2. Systems of Linear Equations
  3. Triangular Systems
  4. Positive Definite Systems; Cholesky Decomposition
  5. Banded Positive Definite Systems
  6. Sparse Positive Definite Systems
  7. Gaussian Elimination and the LU Decomposition
  8. Gaussian Elimination with Pivoting
  9. Sparse Gaussian Elimination

Cholesky Decomposition

To solve a linear system Ax = b, if the matrix A is positive definite, we can use the Cholesky decomposition.
To solve Ax = b with A = R^TR:

  1. let y = Rx
  2. then R^Ty = b; since R^T is lower triangular, we can solve for y (by forward substitution)
  3. then from Rx = y, since R is upper triangular, we can solve for x (by backward substitution)
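
The two triangular solves can be sketched in plain Python (a minimal illustration, not a library routine; the matrix A = [[4, 2], [2, 5]], its factor R, and the right-hand side b are hand-picked for the example):

```python
# Solve A x = b via the Cholesky factor R (A = R^T R, R upper triangular).

def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][k] * y[k] for k in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def backward_substitution(U, y):
    """Solve U x = y for upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Example: A = [[4, 2], [2, 5]] has Cholesky factor R = [[2, 1], [0, 2]].
R = [[2.0, 1.0], [0.0, 2.0]]
RT = [[R[j][i] for j in range(2)] for i in range(2)]  # R^T, lower triangular
b = [8.0, 9.0]

y = forward_substitution(RT, b)   # step 2: R^T y = b
x = backward_substitution(R, y)   # step 3: R x = y
```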

Positive Definite Matrix

Sylvester’s criterion

Sylvester’s criterion can help determine whether a matrix is positive definite.
A symmetric matrix A is positive definite if and only if all of its leading principal minors are positive.
Example:

A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}

A is positive definite since:

  1. 2 > 0,

  2. det \begin{pmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \end{pmatrix} = 3 > 0
  3. det(A) = 2 \cdot 3 - 2 = 4 > 0
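
As an illustration, the criterion can be checked numerically by computing every leading principal minor (a pure-Python sketch; `det` is a naive cofactor expansion, adequate only for tiny matrices):

```python
# Sylvester's criterion: A symmetric matrix is positive definite iff all of
# its leading principal minors are positive.

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_positive_definite(A):
    n = len(A)
    # check the k x k leading principal minor for k = 1..n
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
# leading minors: 2, 3, 4 -- all positive, so A is positive definite
```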

Properties:

  1. Let M be any n * n nonsingular matrix, and let A = MM^T. Then A is positive definite.
  2. If A is positive definite, then A is nonsingular.
  3. If A is positive definite, then a_{i,i} > 0 for i = 1, 2, ..., n.
  4. If A is positive definite, then A can be decomposed in exactly one way into a product A = R^TR = LL^T \text{, with } R^T = L, such that R^T is lower triangular with r_{i,i} > 0 (all main diagonal entries are positive).

Cholesky Decomposition Algorithm

A = \begin{bmatrix} a_{1,1}&a_{1,2}&a_{1,3}& \text{...} &a_{1,n}\\ a_{2,1}&a_{2,2}&a_{2,3}& \text{...} &a_{2,n}\\ a_{3,1}&a_{3,2}&a_{3,3}& \text{...} &a_{3,n}\\ \vdots&\vdots&\vdots&&\vdots\\ a_{n,1}&a_{n,2}&a_{n,3}& \text{...} &a_{n,n} \end{bmatrix} = \begin{bmatrix} r_{1,1}&0&0&\text{...}&0\\ r_{1,2}&r_{2,2}&0&\text{...}&0\\ r_{1,3}&r_{2,3}&r_{3,3}&\text{...}&0\\ \vdots&\vdots&\vdots&&\vdots\\ r_{1,n}&r_{2,n}&r_{3,n}&\text{...}&r_{n,n} \end{bmatrix} \begin{bmatrix} r_{1,1}&r_{1,2}&r_{1,3}&\text{...}&r_{1,n}\\ 0&r_{2,2}&r_{2,3}&\text{...}&r_{2,n}\\ 0&0&r_{3,3}&\text{...}&r_{3,n}\\ \vdots&\vdots&\vdots&&\vdots\\ 0&0&0&\text{...}&r_{n,n} \end{bmatrix}

Pseudocode

Given A,
for i = 1 : n do
    r_{i,i} = (a_{i,i} - \sum_{k=1}^{i-1} r_{k,i}^2)^{1/2}
    for j = i+1 : n do
        r_{i,j} = (a_{i,j} - \sum_{k=1}^{i-1} r_{k,i} r_{k,j}) / r_{i,i}
    end for
end for
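
A direct transcription of this pseudocode into Python (0-based indices; a sketch rather than a production routine):

```python
import math

# Cholesky factorization: returns the upper-triangular factor R with
# A = R^T R; raises ValueError if A is not positive definite (a
# non-positive value appears under the square root).

def cholesky(A):
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = A[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        R[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            R[i][j] = (A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
R = cholesky(A)
```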

Cost

\frac{n^3}{3} + O(n^2)

Matlab function

chol:

R^T = chol(A, 'lower') produces a lower triangular matrix R^T

From A = R^TR, we obtain det(A) = (det(R))^2

Tridiagonal Matrix

For a tridiagonal matrix which looks like

A = \begin{bmatrix} d_1& e_1 \\ c_1&d_2&e_2 \\ &\ddots&\ddots&\ddots\\ &&c_{n-2}&d_{n-1}&e_{n-1}\\ &&&c_{n-1}&d_n \end{bmatrix}

in which a_{ij} = 0 for all i, j with |i-j| > 1,
so there are only n + (n-1) + (n-1) = 3n-2 nonzero entries in matrix A.

A recurrence computation method can be applied to A for the Cholesky decomposition.

Since A = R^TR, we suppose

R^T = \begin{bmatrix} r_{1,1} \\ r_{1,2}& r_{2,2}\\ &\ddots&\ddots\\ &&r_{n-2,n-1}&r_{n-1,n-1}\\ &&&r_{n-1,n}&r_{n,n} \end{bmatrix}

Partition A of order n as

A = \begin{bmatrix} \hat{A} & a \\ a^T & a_{n,n} \end{bmatrix}

If we are trying to find L such that A = LL^T,
partition L as

L = \begin{bmatrix} \hat{L} & 0 \\ l^T & l_{n,n} \end{bmatrix}

Then we can write A as

A = \begin{bmatrix} \hat{A}& a \\ a^T & a_{n,n} \end{bmatrix} = \begin{bmatrix} \hat{L} & 0 \\ l^T&l_{n,n} \end{bmatrix} \begin{bmatrix} \hat{L}^T& l\\ 0&l_{n,n} \end{bmatrix}

From the above, we have

\hat{A} = \hat{L}\hat{L}^T
a = \hat{L}l \text{ and } a^T = l^T\hat{L}^T
a_{n,n} = l^Tl + l_{n,n}^2

Finally, we can compute l and l_{n,n} with the formulas below,
and then recurse to find the lower-level Cholesky factor \hat{L} (its l_{n-1,n-1} and l) in the same way:

l = \hat{L}^{-1}a
l_{n,n} = \sqrt{a_{n,n} - l^Tl}

where \hat{L} satisfies

\hat{A} = \hat{L}\hat{L}^T

With this recursion, the Cholesky decomposition of a tridiagonal matrix can be computed efficiently.
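
For the tridiagonal case, the recursion collapses to an O(n) loop, since a has only one nonzero entry at each level. Below is a sketch (a hypothetical helper, not from the text), assuming the diagonal d and the sub/super-diagonal c as inputs (by symmetry, e_i = c_i):

```python
import math

# O(n) Cholesky recurrence for a symmetric positive definite tridiagonal
# matrix. Returns the diagonal r of the factor R and its superdiagonal s,
# with A = R^T R.

def tridiag_cholesky(d, c):
    n = len(d)
    r = [0.0] * n        # r[i] = r_{i+1,i+1}
    s = [0.0] * (n - 1)  # s[i] = r_{i+1,i+2}, superdiagonal of R
    r[0] = math.sqrt(d[0])
    for i in range(1, n):
        s[i - 1] = c[i - 1] / r[i - 1]          # off-diagonal: l = c / r
        r[i] = math.sqrt(d[i] - s[i - 1] ** 2)  # diagonal: sqrt(d - l^T l)
    return r, s

# The 3x3 example matrix from earlier: diagonal 2, off-diagonals -1.
r, s = tridiag_cholesky([2.0, 2.0, 2.0], [-1.0, -1.0])
```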

Envelope Storage Scheme

The envelope of a symmetric or upper-triangular matrix A is defined as:
Env(A) = \{(i, j) : i < j \text{ and } a_{k,j} \ne 0 \text{ for some } k \le i\}

The envelope storage scheme stores A in three one-dimensional arrays:

  1. DIAG, with length n, stores the main diagonal of the matrix (a_{i,i}, i = 1, ..., n)
  2. ENV stores the envelope by columns, one after the other.
  3. IENV, with length n+1, stores the position in ENV of the first envelope entry of each column j of the matrix A, so that
    • IENV(1) = 1
    • IENV(j+1) - IENV(j) = number of elements from column j of the matrix that lie in the envelope

For example, for the matrix below

\begin{bmatrix} a_{1,1}& a_{1,2} & 0 & a_{1,4} & 0 & 0 \\ & a_{2,2} & a_{2,3} & a_{2,4}& 0 &0 \\ && a_{3,3} & a_{3,4}& 0 & 0 \\ &&& a_{4,4} & 0 & a_{4,6}\\ &&&& a_{5,5} &a_{5,6}\\ &&&&& a_{6,6} \end{bmatrix}
  • DIAG^T = [a_{1,1}\ a_{2,2}\ a_{3,3}\ a_{4,4}\ a_{5,5}\ a_{6,6}]
  • ENV^T = [a_{1,2}\; a_{2,3}\; a_{1,4}\; a_{2,4}\; a_{3,4}\; a_{4,6}\; a_{5,6}]
  • IENV^T = [1\ 1\ 2\ 3\ 6\ 6\ 8]
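
The three arrays can be built mechanically from a dense symmetric matrix. The sketch below (a hypothetical helper `envelope_storage`, with the symbolic entries a_{i,j} replaced by the numeric stand-ins 10*i + j) reproduces the example’s DIAG, ENV, and IENV:

```python
# Build the DIAG / ENV / IENV arrays of the envelope storage scheme.
# Indices in the text are 1-based; the Python lists are 0-based, but the
# *values* stored in IENV follow the 1-based convention of the text.

def envelope_storage(A):
    n = len(A)
    DIAG = [A[i][i] for i in range(n)]
    ENV, IENV = [], [1]
    for j in range(n):  # column j (0-based)
        # rows above the diagonal holding a nonzero in column j
        rows = [i for i in range(j) if A[i][j] != 0]
        if rows:
            first = rows[0]
            # the envelope of column j runs from its first nonzero to the diagonal
            ENV.extend(A[i][j] for i in range(first, j))
        IENV.append(len(ENV) + 1)
    return DIAG, ENV, IENV

# The 6x6 example above, with a_{i,j} encoded as the number i*10 + j.
A = [[11, 12, 0, 14, 0, 0],
     [12, 22, 23, 24, 0, 0],
     [0, 23, 33, 34, 0, 0],
     [14, 24, 34, 44, 0, 46],
     [0, 0, 0, 46, 55, 56],
     [0, 0, 0, 0, 56, 66]]
DIAG, ENV, IENV = envelope_storage(A)
```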

Cuthill-McKee Algorithm

The Cuthill-McKee algorithm transforms a sparse matrix into a matrix with lower bandwidth.

Theorem
Let A be positive definite with Cholesky factor R. Then
Env(A) = Env(R)

Corollary
If A is a positive definite matrix with semi-bandwidth s, then R^T also has semi-bandwidth s.

With the above theory, we can try to reduce the envelope of a sparse matrix A before the Cholesky decomposition.

With the basic theory of elementary operations and graphs, we arrive at the Cuthill-McKee algorithm.
Goal of the Cuthill-McKee algorithm:
For a given symmetric sparse matrix M_{n,n}, the problem is to reduce its bandwidth B by permuting rows and columns so as to move all the nonzero elements of M into a band as close as possible to the diagonal.

If we consider the given sparse matrix (the input matrix) and the resulting matrix (the output matrix) as adjacency matrices of graphs, the only difference between the permuted graph and the original graph is the node (vertex) labeling (the two graph structures are identical). The bandwidth reduction problem can therefore also be viewed as a graph labeling problem.

Solving the bandwidth reduction problem is thus converted to finding the node labeling that minimizes the bandwidth B of the adjacency matrix of the graph G(n),
where

  • B = \max \left|L_i - L_j\right| over all adjacent i, j = 1, ..., n; L_i is the label of node i and L_j is the label of node j
  • nodes i and j are adjacent, or neighbours, when there is an edge in graph G(n) connecting nodes i and j

On average, the Reverse Cuthill-McKee (RCM) algorithm finds solutions of roughly the same quality as the GPS algorithm, but GPS is up to ten times as fast.

Description of the RCM (Reverse Cuthill-McKee) algorithm:

  1. Prepare an empty queue Q and an empty result array R.
  2. Select the node in G(n) with the lowest degree (ties are broken arbitrarily) that hasn’t previously been inserted in the result array. Let us name it P.
  3. Add P in the first free position of R.
  4. Add to the queue all the nodes adjacent to P in increasing order of their degree.
    1. Extract the first node from the queue and examine it. Let us name it C.
    2. If C hasn’t previously been inserted in R, add it in the first free position and add to Q all the neighbours of C that are not in R, in increasing order of their degree.
  5. If Q is not empty, repeat from Step 4.1.
  6. If there are unexplored nodes (the graph is not connected), repeat from Step 2.
  7. Reverse the order of the elements in R: element R[i] is swapped with element R[n+1-i] (the “Reverse” in the Reverse Cuthill-McKee (RCM) algorithm).

The result array is interpreted as follows: R[L] = i means that the new label of node i will be L.
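
The steps above can be sketched on an adjacency-list graph as follows (0-based node ids; ties are broken by Python’s stable sort, so the labeling may differ from other implementations):

```python
from collections import deque

# Sketch of Reverse Cuthill-McKee. Returns the result array R: R[L] = i
# means node i receives new label L (0-based).

def reverse_cuthill_mckee(adj):
    n = len(adj)
    degree = [len(a) for a in adj]
    in_result = [False] * n
    R = []
    while len(R) < n:
        # Step 2: lowest-degree node not yet in R (handles disconnected graphs)
        p = min((i for i in range(n) if not in_result[i]), key=degree.__getitem__)
        in_result[p] = True
        R.append(p)
        # Step 4: enqueue neighbours of p by increasing degree
        Q = deque(sorted((v for v in adj[p] if not in_result[v]), key=degree.__getitem__))
        while Q:
            c = Q.popleft()
            if in_result[c]:
                continue
            in_result[c] = True
            R.append(c)
            Q.extend(sorted((v for v in adj[c] if not in_result[v]), key=degree.__getitem__))
    R.reverse()  # Step 7: the "Reverse" in RCM
    return R

# Path graph 0-1-2-3: RCM simply reverses the natural ordering here.
order = reverse_cuthill_mckee([[1], [0, 2], [1, 3], [2]])
```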

Gaussian Elimination

There are three types of elementary operations in the Gaussian elimination method.

  1. Add a multiple of one equation to another equation.
  2. Interchange two equations.
  3. Multiply an equation by a nonzero constant.

What are pivots:

  1. The diagonal elements of U are called pivots. (The kth pivot is the coefficient of the kth variable in the kth equation at the kth step of the elimination.)

When pivoting is needed:

  1. If some pivot (in Gaussian elimination without pivoting) is exactly equal to 0, then the elimination fails (division by 0).
  2. If some pivot is very small in magnitude relative to other numbers in the matrix A, then the computation may be numerically unstable.

Gaussian Elimination Without Pivoting (LU Decomposition)

When a matrix A is not positive definite, we cannot apply the Cholesky decomposition algorithm.
But if A is an n*n matrix with leading principal submatrices A_1, A_2, A_3, ..., A_{n-1} nonsingular, then there exists a unique factorization A = LU, where L is unit lower triangular and U is upper triangular.
(the pivots a_{1,1}, a_{2,2}^{(1)}, a_{3,3}^{(2)}, ..., a_{n-1,n-1}^{(n-2)} \ne 0 if and only if A_1, A_2, A_3, ..., A_{n-1} are nonsingular).
LU decomposition is a special form of the Gaussian elimination method (without pivoting); it uses only the first type of elementary operation.

Initialize U = A, L = I
for k =1 : n-1
    for j = k+1:n
        L(j,k) = U(j,k)/U(k,k)
        U(j,k:n) = U(j,k:n) - L(j,k)U(k,k:n)
    end 
end
  1. The operation count is \frac{2}{3} n^3 + O(n^2)
  2. In practice, the factors L and U generated by the LU decomposition are stored in the original matrix A.
  3. LU factorization cannot be guaranteed to be stable.
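
The pseudocode above, transcribed to plain Python (a sketch; with no pivoting it raises ZeroDivisionError when a zero pivot is encountered):

```python
# LU decomposition without pivoting (0-based indices).

def lu_no_pivot(A):
    n = len(A)
    U = [row[:] for row in A]                                  # U starts as a copy of A
    L = [[float(i == j) for j in range(n)] for i in range(n)]  # L starts as the identity
    for k in range(n - 1):
        for j in range(k + 1, n):
            L[j][k] = U[j][k] / U[k][k]          # multiplier
            for c in range(k, n):
                U[j][c] -= L[j][k] * U[k][c]     # eliminate row j
    return L, U

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
L, U = lu_no_pivot(A)
```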

Gaussian Elimination With Pivoting

Because simple Gaussian elimination without pivoting (LU decomposition) may fail or be unstable when pivots are exactly zero or small in magnitude, we need to consider pivoting when applying Gaussian elimination.

For any n*n nonsingular matrix A, there exists a permutation P such that PA has an LU factorization.

PA = LU \text{ or } A = P^T LU
  • Unless matrix A has special properties (e.g., A is positive definite or diagonally dominant), pivoting must be done to ensure stability.
  • A program that reduces A to upper triangular form must keep track of any interchanges made in order to solve Ax = b.
  • As with the LU decomposition, L and U can be stored in one n*n matrix.

Partial Pivoting:

  • Partial Pivoting: exchange only rows, which does not affect the order of the X_i
  • For increased numerical stability, make sure the largest possible pivot element is used. This requires searching in the partial column below the pivot element.
  • Partial pivoting is usually sufficient.
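
Partial pivoting can be sketched by tracking the row order in a permutation list `perm` (so that row i of LU equals row perm[i] of A). This is an illustration, not the Matlab `lu` routine:

```python
# LU with partial pivoting: at each step, the row with the largest pivot
# candidate (in absolute value) in the current column is swapped in.
# Returns L, U and the row order perm such that A[perm[i]] = (L U)[i],
# i.e. PA = LU.

def lu_partial_pivot(A):
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        # search the partial column at and below the pivot position
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]          # columns >= k of L are still zero
        perm[k], perm[p] = perm[p], perm[k]
        for j in range(k + 1, n):
            L[j][k] = U[j][k] / U[k][k]
            for c in range(k, n):
                U[j][c] -= L[j][k] * U[k][c]
    for i in range(n):
        L[i][i] = 1.0                    # unit diagonal
    return L, U, perm

A = [[0.0, 1.0], [1.0, 2.0]]             # zero pivot: fails without pivoting
L, U, perm = lu_partial_pivot(A)
```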

Matlab Function
The Matlab function lu uses Gaussian elimination with partial pivoting.

[L,U,P] = lu(A)

Determines matrices L, U and P such that PA = LU. They can be used to solve Ax = b: with A = P^T LU,

P^T LUx = b, then Ux = L^{-1}Pb

The inverse of an elementary unit lower triangular matrix (a Gauss transform, with nonzeros below the diagonal in a single column only) is obtained by simply negating those lower triangular elements; the inverse of a general unit lower triangular L is the product of such elementary inverses.
If

L = \begin{bmatrix} 1& 0 &0 & 0\\ l_{21} & 1 & 0 & 0\\ l_{31} & 0 & 1 & 0 \\ l_{41} & 0 & 0 & 1 \end{bmatrix}

then

L^{-1} = \begin{bmatrix} 1& 0 &0 & 0\\ -l_{21} & 1 & 0 & 0\\ -l_{31} & 0 & 1 & 0 \\ -l_{41} & 0 & 0 & 1 \end{bmatrix}

Complete (total/full) Pivoting:

  • Complete pivoting (or total pivoting): exchange both rows and columns to put the largest value in the remaining matrix at the pivot, which requires changing the order of the X_i.
  • For increased numerical stability, make sure the largest possible pivot element is used. This requires searching in the pivot row, and in all rows below the pivot row, starting at the pivot column.
  • Full pivoting is less susceptible to roundoff, but the increase in stability comes at a cost of more complex programming and an increase in work associated with searching and data movement.

2. Sensitivity of Linear Systems

Topics covered in textbook:

  1. Vector and Matrix Norms
  2. Condition Numbers
  3. Perturbing the Coefficient Matrix
  4. A Posteriori Error Analysis Using the Residual
  5. Roundoff Errors: Backward Stability
  6. Propagation of Roundoff Errors
  7. Backward Error Analysis of Gaussian Elimination
  8. Scaling
  9. Componentwise Sensitivity Analysis

Sensitivity of Matrices (Condition Number)

Some matrices are very sensitive to small changes in input data. The extent of this sensitivity is measured by the condition number.
Formula for calculating the matrix condition number:

cond(A) = ||A||\, ||A^{-1}||

Formula for applying the condition number:

cond(A) \equiv \max{\frac{||\delta x|| / ||x||}{||\delta A||/||A|| + ||\delta b||/||b||}} = \max{\frac{\text{Relative error of output}}{\text{Relative error of inputs}}}
  • Consider all small changes \delta A and \delta b in A and b and the resulting change, \delta x, in the solution x.
  • Changes in the input data get multiplied by the condition number to produce changes in the outputs.
  • A high condition number is bad: it implies that small errors in the input can cause large errors in the output.
  • A matrix with a condition number near 1 is called ‘well-conditioned’; if its condition number is very large, it is called ‘ill-conditioned’.
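
A small numeric illustration, using the infinity norm (maximum absolute row sum) and the closed-form 2x2 inverse; the nearly singular example matrix is hand-picked:

```python
# cond(A) = ||A|| * ||A^{-1}|| in the infinity norm for a 2x2 example.

def norm_inf(A):
    """Matrix infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def inv_2x2(A):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 1.0], [1.0, 1.0001]]       # nearly singular, hence ill-conditioned
cond = norm_inf(A) * norm_inf(inv_2x2(A))
# cond is on the order of 4 * 10^4: tiny input changes can be amplified ~40000x.
```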

Vector Norm

A vector norm is a tool used to measure the ‘size’ of a vector. More precisely, a vector norm is a function that maps a vector to a non-negative real number, satisfying the following properties:

  1. If X \ne 0, then ||X|| > 0.
  2. ||\alpha X|| = |\alpha|\, ||X|| for all scalars \alpha.
  3. ||X+Y|| \le ||X|| + ||Y|| for vectors X and Y (the triangle inequality).

Different Type of Norms

1-Norm

||X||_1 = \sum_{i=1}^n |X_i|

2-Norm (Euclidean norm)

||X||_2 = \sqrt{\sum_{i=1}^n X_i^2}

Inf-Norm (Maximum norm)

||X||_\infty = \max_{1\le i\le n} |X_i|

Hölder Norms (p-norms), the general form of norms

||X||_p = (\sum\limits_{i=1}^n |X_i|^p)^{1/p}, \quad 1 \le p \le \infty
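
The three common norms are straightforward to evaluate directly (a pure-Python sketch):

```python
import math

# The 1-, 2-, and infinity norms of a vector, evaluated on one example.

def norm_1(x):
    return sum(abs(v) for v in x)

def norm_2(x):
    return math.sqrt(sum(v * v for v in x))

def norm_inf(x):
    return max(abs(v) for v in x)

x = [3.0, -4.0, 0.0]
# norm_1(x) = 7, norm_2(x) = 5, norm_inf(x) = 4
```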

A-Norm
If A is any fixed n*n positive definite matrix, then

||X||_A = \sqrt{X^T A X} is a vector norm.

When A is the identity matrix, the A-norm is just the 2-norm.
The three norm properties are satisfied:

  1. X^T AX > 0 for all X \ne 0 (this is the defining property of a positive definite matrix), so ||X||_A > 0 if X \ne 0
  2. ||\alpha X||_A = \sqrt{(\alpha X)^T A (\alpha X)} = \sqrt{\alpha^2 X^T A X} = |\alpha|\, ||X||_A
  3. ||X+Y||_A^2 \le (\sqrt{X^TAX} + \sqrt{Y^TAY})^2, which is ||X+Y||_A^2 \le (||X||_A + ||Y||_A)^2

Application of Vector Norms

The distance between two vectors X and Y is ||X - Y||.
Convergence of a sequence of vectors:
- For the sequence

X^{(1)} = \begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \\ \vdots \\ x_n^{(1)} \end{bmatrix}, X^{(2)} = \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \\ \vdots \\ x_n^{(2)} \end{bmatrix}, \cdots

Definition of convergence of a vector sequence:

\lim\limits_{k \rightarrow \infty} X^{(k)} = X if and only if \lim\limits_{k \rightarrow \infty} X_i^{(k)} = X_i for all values i = 1, 2, \cdots, n.

A sequence is said to be convergent if it approaches some limit.

\lim\limits_{k \rightarrow \infty} X^{(k)} = X if and only if \lim\limits_{k \rightarrow \infty} ||X^{(k)} - X|| = 0 for any vector norm.

For a given vector norm ||\cdot||, the size of ||X^{(k)} - X|| is used to determine whether the sequence of vectors X^{(k)} is converging to some limit vector X. The choice of norm is arbitrary (all norms are equivalent).

Matrix Norm

A matrix norm has one more property than a vector norm:

  1. \Vert A \Vert > 0 \text{ if } A \ne 0
  2. \Vert \alpha A \Vert = |\alpha| \Vert A \Vert \text{ for all scalars } \alpha
  3. \Vert A + B \Vert \le \Vert A \Vert + \Vert B \Vert for all matrices A and B
  4. \Vert AB \Vert \le \Vert A \Vert \Vert B \Vert

Frobenius Matrix norm

\Vert A \Vert _F = \sqrt{\sum_{i=1}^n \sum_{j=1}^n |a_{i,j}|^2}

Matrix Norm induced by Vector Norm

Given any fixed vector norm \Vert \cdot \Vert_v, the matrix norm induced by \Vert \cdot \Vert_v is defined by

\Vert A \Vert _M = \max\limits_{x\ne 0} \frac{\Vert AX \Vert _v}{ \Vert X \Vert _v}
  • The ratio \frac{\Vert AX \Vert _v}{\Vert X \Vert _v} is the magnification that takes place when X is transformed to AX
  • The number \Vert A \Vert _M is the maximum magnification that A can cause.
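
For the 1-norm and the infinity norm, the induced matrix norms have well-known closed forms (the maximum absolute column sum and the maximum absolute row sum, respectively), which makes the maximum magnification easy to verify on a small example:

```python
# Closed forms of two induced matrix norms, plus a vector that achieves the
# maximum magnification for the infinity norm: take x with +/-1 entries
# matching the signs of the largest-sum row.

def induced_inf_norm(A):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def induced_1_norm(A):
    """Induced 1-norm: maximum absolute column sum."""
    n = len(A[0])
    return max(sum(abs(row[j]) for row in A) for j in range(n))

A = [[1.0, -2.0], [3.0, 4.0]]
# worst row is [3, 4]: x = [sign(3), sign(4)] = [1, 1] gives A x = [-1, 7]
x = [1.0, 1.0]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
mag = max(abs(v) for v in Ax) / max(abs(v) for v in x)
# mag equals induced_inf_norm(A): the maximum magnification is attained
```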