Matrix Computation Algorithms

1. Gaussian Elimination and Its Variants

Topics covered:

  1. Matrix Multiplication
  2. Systems of Linear Equations
  3. Triangular Systems
  4. Positive Definite Systems; Cholesky Decomposition
  5. Banded Positive Definite Systems
  6. Sparse Positive Definite Systems
  7. Gaussian Elimination and the LU Decomposition
  8. Gaussian Elimination with Pivoting
  9. Sparse Gaussian Elimination

Cholesky Decomposition

To solve a linear system Ax = b, if the matrix A is positive definite, we can use the Cholesky decomposition.
To solve Ax = b with A = R^TR:

  1. let y = Rx
  2. then R^Ty = b; since R^T is lower triangular, we can solve for y (by forward substitution)
  3. then from Rx = y, since R is upper triangular, we can solve for x (by backward substitution)
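
The two triangular solves can be sketched in plain Python (a minimal illustration, not a library routine; the matrix A = [[4, 2], [2, 5]], its factor R, and the right-hand side b are hand-picked for the example):

```python
# Solve A x = b via the Cholesky factor R (A = R^T R, R upper triangular).

def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][k] * y[k] for k in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def backward_substitution(U, y):
    """Solve U x = y for upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Example: A = [[4, 2], [2, 5]] has Cholesky factor R = [[2, 1], [0, 2]].
R = [[2.0, 1.0], [0.0, 2.0]]
RT = [[R[j][i] for j in range(2)] for i in range(2)]  # R^T, lower triangular
b = [8.0, 9.0]

y = forward_substitution(RT, b)   # step 2: R^T y = b
x = backward_substitution(R, y)   # step 3: R x = y
```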

Positive Definite Matrix

Sylvester’s criterion

Sylvester’s criterion can help determine whether a matrix is positive definite.
A symmetric matrix A is positive definite if and only if all of its leading principal minors are positive.
Example:

A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}

A is positive definite since:

  1. 2 > 0,

  2. det \begin{pmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \end{pmatrix} = 3 > 0
  3. det(A) = 2 \cdot 3 - 2 = 4 > 0
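
As an illustration, the criterion can be checked numerically by computing every leading principal minor (a pure-Python sketch; `det` is a naive cofactor expansion, adequate only for tiny matrices):

```python
# Sylvester's criterion: A symmetric matrix is positive definite iff all of
# its leading principal minors are positive.

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_positive_definite(A):
    n = len(A)
    # check the k x k leading principal minor for k = 1..n
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
# leading minors: 2, 3, 4 -- all positive, so A is positive definite
```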

Properties:

  1. Let M be any n * n nonsingular matrix, and let A = MM^T. Then A is positive definite.
  2. If A is positive definite, then A is nonsingular.
  3. If A is positive definite, then a_{i,i} > 0 for i = 1, 2, ..., n.
  4. If A is positive definite, then A can be decomposed in exactly one way into a product A = R^TR = LL^T \text{, with } R^T = L, such that R^T is lower triangular with r_{i,i} > 0 (all main diagonal entries are positive).

Cholesky Decomposition Algorithm

A = \begin{bmatrix} a_{1,1}&a_{1,2}&a_{1,3}& \text{...} &a_{1,n}\\ a_{2,1}&a_{2,2}&a_{2,3}& \text{...} &a_{2,n}\\ a_{3,1}&a_{3,2}&a_{3,3}& \text{...} &a_{3,n}\\ \vdots&\vdots&\vdots&&\vdots\\ a_{n,1}&a_{n,2}&a_{n,3}& \text{...} &a_{n,n} \end{bmatrix} = \begin{bmatrix} r_{1,1}&0&0&\text{...}&0\\ r_{1,2}&r_{2,2}&0&\text{...}&0\\ r_{1,3}&r_{2,3}&r_{3,3}&\text{...}&0\\ \vdots&\vdots&\vdots&&\vdots\\ r_{1,n}&r_{2,n}&r_{3,n}&\text{...}&r_{n,n} \end{bmatrix} \begin{bmatrix} r_{1,1}&r_{1,2}&r_{1,3}&\text{...}&r_{1,n}\\ 0&r_{2,2}&r_{2,3}&\text{...}&r_{2,n}\\ 0&0&r_{3,3}&\text{...}&r_{3,n}\\ \vdots&\vdots&\vdots&&\vdots\\ 0&0&0&\text{...}&r_{n,n} \end{bmatrix}

Pseudocode

Given A,
for i = 1 : n do
    r_{i,i} = (a_{i,i} - \sum_{k=1}^{i-1} r_{k,i}^2)^{1/2}
    for j = i+1 : n do
        r_{i,j} = (a_{i,j} - \sum_{k=1}^{i-1} r_{k,i} r_{k,j}) / r_{i,i}
    end for
end for
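
A direct transcription of this pseudocode into Python (0-based indices; a sketch rather than a production routine):

```python
import math

# Cholesky factorization: returns the upper-triangular factor R with
# A = R^T R; raises ValueError if A is not positive definite (a
# non-positive value appears under the square root).

def cholesky(A):
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = A[i][i] - sum(R[k][i] ** 2 for k in range(i))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        R[i][i] = math.sqrt(s)
        for j in range(i + 1, n):
            R[i][j] = (A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))) / R[i][i]
    return R

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
R = cholesky(A)
```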

Cost

\frac{n^3}{3} + O(n^2)

Matlab function

chol:

R^T = chol(A, 'lower') produces a lower triangular matrix R^T

From A = R^TR, we obtain det(A) = (det(R))^2

Tridiagonal Matrix

For a tridiagonal matrix which looks like

A = \begin{bmatrix} d_1& e_1 \\ c_1&d_2&e_2 \\ &\ddots&\ddots&\ddots\\ &&c_{n-2}&d_{n-1}&e_{n-1}\\ &&&c_{n-1}&d_n \end{bmatrix}

in which a_{ij} = 0 for all i, j with |i-j| > 1,
so there are only n + (n-1) + (n-1) = 3n-2 nonzero entries in matrix A.

A recurrence computation method can be applied to A for the Cholesky decomposition.

Since A = R^TR, we suppose

R^T = \begin{bmatrix} r_{1,1} \\ r_{1,2}& r_{2,2}\\ &\ddots&\ddots\\ &&r_{n-2,n-1}&r_{n-1,n-1}\\ &&&r_{n-1,n}&r_{n,n} \end{bmatrix}

Partition A of order n as

A = \begin{bmatrix} \hat{A} & a \\ a^T & a_{n,n} \end{bmatrix}

If we are trying to find L such that A = LL^T,
partition L as

L = \begin{bmatrix} \hat{L} & 0 \\ l^T & l_{n,n} \end{bmatrix}

Then we can write A as

A = \begin{bmatrix} \hat{A}& a \\ a^T & a_{n,n} \end{bmatrix} = \begin{bmatrix} \hat{L} & 0 \\ l^T&l_{n,n} \end{bmatrix} \begin{bmatrix} \hat{L}^T& l\\ 0&l_{n,n} \end{bmatrix}

From the above, we have

\hat{A} = \hat{L}\hat{L}^T
a = \hat{L}l \text{ and } a^T = l^T\hat{L}^T
a_{n,n} = l^Tl + l_{n,n}^2

Finally, we can compute l and l_{n,n} with the formulas below,
and then recurse to find the lower-level Cholesky factor \hat{L} (its l_{n-1,n-1} and l) in the same way:

l = \hat{L}^{-1}a
l_{n,n} = \sqrt{a_{n,n} - l^Tl}

where \hat{L} satisfies

\hat{A} = \hat{L}\hat{L}^T

With this recursion, the Cholesky decomposition of a tridiagonal matrix can be computed efficiently.
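
For the tridiagonal case, the recursion collapses to an O(n) loop, since a has only one nonzero entry at each level. Below is a sketch (a hypothetical helper, not from the text), assuming the diagonal d and the sub/super-diagonal c as inputs (by symmetry, e_i = c_i):

```python
import math

# O(n) Cholesky recurrence for a symmetric positive definite tridiagonal
# matrix. Returns the diagonal r of the factor R and its superdiagonal s,
# with A = R^T R.

def tridiag_cholesky(d, c):
    n = len(d)
    r = [0.0] * n        # r[i] = r_{i+1,i+1}
    s = [0.0] * (n - 1)  # s[i] = r_{i+1,i+2}, superdiagonal of R
    r[0] = math.sqrt(d[0])
    for i in range(1, n):
        s[i - 1] = c[i - 1] / r[i - 1]          # off-diagonal: l = c / r
        r[i] = math.sqrt(d[i] - s[i - 1] ** 2)  # diagonal: sqrt(d - l^T l)
    return r, s

# The 3x3 example matrix from earlier: diagonal 2, off-diagonals -1.
r, s = tridiag_cholesky([2.0, 2.0, 2.0], [-1.0, -1.0])
```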

Envelope Storage Scheme

The envelope of a symmetric or upper-triangular matrix A is defined as:
Env(A) = \{(i, j) : i < j \text{ and } a_{k,j} \ne 0 \text{ for some } k \le i\}

The envelope storage scheme stores A in three one-dimensional arrays:

  1. DIAG, with length n, stores the main diagonal of the matrix (a_{i,i}, i = 1, ..., n)
  2. ENV stores the envelope by columns, one after the other.
  3. IENV, with length n+1, stores the position in ENV of the first envelope entry of each column j of the matrix A, so that
    • IENV(1) = 1
    • IENV(j+1) - IENV(j) = number of elements from column j of the matrix that lie in the envelope

For example, for the matrix below

\begin{bmatrix} a_{1,1}& a_{1,2} & 0 & a_{1,4} & 0 & 0 \\ & a_{2,2} & a_{2,3} & a_{2,4}& 0 &0 \\ && a_{3,3} & a_{3,4}& 0 & 0 \\ &&& a_{4,4} & 0 & a_{4,6}\\ &&&& a_{5,5} &a_{5,6}\\ &&&&& a_{6,6} \end{bmatrix}
  • DIAG^T = [a_{1,1}\ a_{2,2}\ a_{3,3}\ a_{4,4}\ a_{5,5}\ a_{6,6}]
  • ENV^T = [a_{1,2}\; a_{2,3}\; a_{1,4}\; a_{2,4}\; a_{3,4}\; a_{4,6}\; a_{5,6}]
  • IENV^T = [1\ 1\ 2\ 3\ 6\ 6\ 8]
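
The three arrays can be built mechanically from a dense symmetric matrix. The sketch below (a hypothetical helper `envelope_storage`, with the symbolic entries a_{i,j} replaced by the numeric stand-ins 10*i + j) reproduces the example’s DIAG, ENV, and IENV:

```python
# Build the DIAG / ENV / IENV arrays of the envelope storage scheme.
# Indices in the text are 1-based; the Python lists are 0-based, but the
# *values* stored in IENV follow the 1-based convention of the text.

def envelope_storage(A):
    n = len(A)
    DIAG = [A[i][i] for i in range(n)]
    ENV, IENV = [], [1]
    for j in range(n):  # column j (0-based)
        # rows above the diagonal holding a nonzero in column j
        rows = [i for i in range(j) if A[i][j] != 0]
        if rows:
            first = rows[0]
            # the envelope of column j runs from its first nonzero to the diagonal
            ENV.extend(A[i][j] for i in range(first, j))
        IENV.append(len(ENV) + 1)
    return DIAG, ENV, IENV

# The 6x6 example above, with a_{i,j} encoded as the number i*10 + j.
A = [[11, 12, 0, 14, 0, 0],
     [12, 22, 23, 24, 0, 0],
     [0, 23, 33, 34, 0, 0],
     [14, 24, 34, 44, 0, 46],
     [0, 0, 0, 46, 55, 56],
     [0, 0, 0, 0, 56, 66]]
DIAG, ENV, IENV = envelope_storage(A)
```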

Cuthill-McKee Algorithm

The Cuthill-McKee algorithm transforms a sparse matrix into a matrix with lower bandwidth.

Theorem
Let A be positive definite with Cholesky factor R. Then
Env(A) = Env(R)

Corollary
If A is a positive definite matrix with semi-bandwidth s, then R^T also has semi-bandwidth s.

With the above theory, we can try to reduce the envelope of a sparse matrix A before the Cholesky decomposition.

With the basic theory of elementary operations and graphs, we arrive at the Cuthill-McKee algorithm.
Goal of the Cuthill-McKee algorithm:
For a given symmetric sparse matrix M_{n,n}, the problem is to reduce its bandwidth B by permuting rows and columns so as to move all the nonzero elements of M into a band as close as possible to the diagonal.

If we consider the given sparse matrix (the input matrix) and the resulting matrix (the output matrix) as adjacency matrices of graphs, the only difference between the permuted graph and the original graph is the node (vertex) labeling (the two graph structures are identical). The bandwidth reduction problem can therefore also be viewed as a graph labeling problem.

Solving the bandwidth reduction problem is thus converted to finding the node labeling that minimizes the bandwidth B of the adjacency matrix of the graph G(n),
where

  • B = \max \left|L_i - L_j\right| over all adjacent i, j = 1, ..., n; L_i is the label of node i and L_j is the label of node j
  • nodes i and j are adjacent, or neighbours, when there is an edge in graph G(n) connecting nodes i and j

On average, the Reverse Cuthill-McKee (RCM) algorithm finds solutions of roughly the same quality as the GPS algorithm, but GPS is up to ten times as fast.

Description of the RCM (Reverse Cuthill-McKee) algorithm:

  1. Prepare an empty queue Q and an empty result array R.
  2. Select the node in G(n) with the lowest degree (ties are broken arbitrarily) that hasn’t previously been inserted in the result array. Let us name it P.
  3. Add P in the first free position of R.
  4. Add to the queue all the nodes adjacent to P in increasing order of their degree.
    1. Extract the first node from the queue and examine it. Let us name it C.
    2. If C hasn’t previously been inserted in R, add it in the first free position and add to Q all the neighbours of C that are not in R, in increasing order of their degree.
  5. If Q is not empty, repeat from Step 4.1.
  6. If there are unexplored nodes (the graph is not connected), repeat from Step 2.
  7. Reverse the order of the elements in R: element R[i] is swapped with element R[n+1-i] (the “Reverse” in the Reverse Cuthill-McKee (RCM) algorithm).

The result array is interpreted as follows: R[L] = i means that the new label of node i will be L.
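
The steps above can be sketched on an adjacency-list graph as follows (0-based node ids; ties are broken by Python’s stable sort, so the labeling may differ from other implementations):

```python
from collections import deque

# Sketch of Reverse Cuthill-McKee. Returns the result array R: R[L] = i
# means node i receives new label L (0-based).

def reverse_cuthill_mckee(adj):
    n = len(adj)
    degree = [len(a) for a in adj]
    in_result = [False] * n
    R = []
    while len(R) < n:
        # Step 2: lowest-degree node not yet in R (handles disconnected graphs)
        p = min((i for i in range(n) if not in_result[i]), key=degree.__getitem__)
        in_result[p] = True
        R.append(p)
        # Step 4: enqueue neighbours of p by increasing degree
        Q = deque(sorted((v for v in adj[p] if not in_result[v]), key=degree.__getitem__))
        while Q:
            c = Q.popleft()
            if in_result[c]:
                continue
            in_result[c] = True
            R.append(c)
            Q.extend(sorted((v for v in adj[c] if not in_result[v]), key=degree.__getitem__))
    R.reverse()  # Step 7: the "Reverse" in RCM
    return R

# Path graph 0-1-2-3: RCM simply reverses the natural ordering here.
order = reverse_cuthill_mckee([[1], [0, 2], [1, 3], [2]])
```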

Gaussian Elimination

There are three types of elementary operations in the Gaussian elimination method.

  1. Add a multiple of one equation to another equation.
  2. Interchange two equations.
  3. Multiply an equation by a nonzero constant.

What are pivots:

  1. The diagonal elements of U are called pivots. (The kth pivot is the coefficient of the kth variable in the kth equation at the kth step of the elimination.)

When pivoting is needed:

  1. If some pivot (in Gaussian elimination without pivoting) is exactly equal to 0, then the elimination fails (division by 0).
  2. If some pivot is very small in magnitude relative to other numbers in the matrix A, then the computation may be numerically unstable.

Gaussian Elimination Without Pivoting (LU Decomposition)

When a matrix A is not positive definite, we cannot apply the Cholesky decomposition algorithm.
But if A is an n*n matrix with leading principal submatrices A_1, A_2, A_3, ..., A_{n-1} nonsingular, then there exists a unique factorization A = LU, where L is unit lower triangular and U is upper triangular.
(the pivots a_{1,1}, a_{2,2}^{(1)}, a_{3,3}^{(2)}, ..., a_{n-1,n-1}^{(n-2)} \ne 0 if and only if A_1, A_2, A_3, ..., A_{n-1} are nonsingular).
LU decomposition is a special form of the Gaussian elimination method (without pivoting); it uses only the first type of elementary operation.

Initialize U = A, L = I
for k =1 : n-1
    for j = k+1:n
        L(j,k) = U(j,k)/U(k,k)
        U(j,k:n) = U(j,k:n) - L(j,k)U(k,k:n)
    end 
end
  1. The operation count is \frac{2}{3} n^3 + O(n^2)
  2. In practice, the factors L and U generated by the LU decomposition are stored in the original matrix A.
  3. LU factorization cannot be guaranteed to be stable.
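
The pseudocode above, transcribed to plain Python (a sketch; with no pivoting it raises ZeroDivisionError when a zero pivot is encountered):

```python
# LU decomposition without pivoting (0-based indices).

def lu_no_pivot(A):
    n = len(A)
    U = [row[:] for row in A]                                  # U starts as a copy of A
    L = [[float(i == j) for j in range(n)] for i in range(n)]  # L starts as the identity
    for k in range(n - 1):
        for j in range(k + 1, n):
            L[j][k] = U[j][k] / U[k][k]          # multiplier
            for c in range(k, n):
                U[j][c] -= L[j][k] * U[k][c]     # eliminate row j
    return L, U

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
L, U = lu_no_pivot(A)
```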

Gaussian Elimination With Pivoting

Because simple Gaussian elimination without pivoting (LU decomposition) may fail or be unstable when pivots are exactly zero or small in magnitude, we need to consider pivoting when applying Gaussian elimination.

For any n*n nonsingular matrix A, there exists a permutation P such that PA has an LU factorization.

PA = LU \text{ or } A = P^T LU
  • Unless matrix A has special properties (e.g., A is positive definite or diagonally dominant), pivoting must be done to ensure stability.
  • A program that reduces A to upper triangular form must keep track of any interchanges made in order to solve Ax = b.
  • As with the LU decomposition, L and U can be stored in one n*n matrix.

Partial Pivoting:

  • Partial Pivoting: exchange only rows, which does not affect the order of the X_i
  • For increased numerical stability, make sure the largest possible pivot element is used. This requires searching in the partial column below the pivot element.
  • Partial pivoting is usually sufficient.
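
Partial pivoting can be sketched by tracking the row order in a permutation list `perm` (so that row i of LU equals row perm[i] of A). This is an illustration, not the Matlab `lu` routine:

```python
# LU with partial pivoting: at each step, the row with the largest pivot
# candidate (in absolute value) in the current column is swapped in.
# Returns L, U and the row order perm such that A[perm[i]] = (L U)[i],
# i.e. PA = LU.

def lu_partial_pivot(A):
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        # search the partial column at and below the pivot position
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]          # columns >= k of L are still zero
        perm[k], perm[p] = perm[p], perm[k]
        for j in range(k + 1, n):
            L[j][k] = U[j][k] / U[k][k]
            for c in range(k, n):
                U[j][c] -= L[j][k] * U[k][c]
    for i in range(n):
        L[i][i] = 1.0                    # unit diagonal
    return L, U, perm

A = [[0.0, 1.0], [1.0, 2.0]]             # zero pivot: fails without pivoting
L, U, perm = lu_partial_pivot(A)
```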

Matlab Function
The Matlab function lu uses Gaussian elimination with partial pivoting.

[L,U,P] = lu(A)

Determines matrices L, U and P such that PA = LU. They can be used to solve Ax = b: with A = P^T LU,

P^T LUx = b, then Ux = L^{-1}Pb

The inverse of an elementary unit lower triangular matrix (a Gauss transform, with nonzeros below the diagonal in a single column only) is obtained by simply negating those lower triangular elements; the inverse of a general unit lower triangular L is the product of such elementary inverses.
If

L = \begin{bmatrix} 1& 0 &0 & 0\\ l_{21} & 1 & 0 & 0\\ l_{31} & 0 & 1 & 0 \\ l_{41} & 0 & 0 & 1 \end{bmatrix}

then

L^{-1} = \begin{bmatrix} 1& 0 &0 & 0\\ -l_{21} & 1 & 0 & 0\\ -l_{31} & 0 & 1 & 0 \\ -l_{41} & 0 & 0 & 1 \end{bmatrix}

Complete (total/full) Pivoting:

  • Complete pivoting (or total pivoting): exchange both rows and columns to put the largest value in the remaining matrix at the pivot, which requires changing the order of the X_i.
  • For increased numerical stability, make sure the largest possible pivot element is used. This requires searching in the pivot row, and in all rows below the pivot row, starting at the pivot column.
  • Full pivoting is less susceptible to roundoff, but the increase in stability comes at a cost of more complex programming and an increase in work associated with searching and data movement.

2. Sensitivity of Linear Systems

Topics covered in textbook:

  1. Vector and Matrix Norms
  2. Condition Numbers
  3. Perturbing the Coefficient Matrix
  4. A Posteriori Error Analysis Using the Residual
  5. Roundoff Errors: Backward Stability
  6. Propagation of Roundoff Errors
  7. Backward Error Analysis of Gaussian Elimination
  8. Scaling
  9. Componentwise Sensitivity Analysis

Sensitivity of Matrices (Condition Number)

Some matrices are very sensitive to small changes in input data. The extent of this sensitivity is measured by the condition number.
Formula for calculating the matrix condition number:

cond(A) = ||A||\, ||A^{-1}||

Formula for applying the condition number:

cond(A) \equiv \max{\frac{||\delta x|| / ||x||}{||\delta A||/||A|| + ||\delta b||/||b||}} = \max{\frac{\text{Relative error of output}}{\text{Relative error of inputs}}}
  • Consider all small changes \delta A and \delta b in A and b and the resulting change, \delta x, in the solution x.
  • Changes in the input data get multiplied by the condition number to produce changes in the outputs.
  • A high condition number is bad: it implies that small errors in the input can cause large errors in the output.
  • A matrix with a condition number near 1 is called ‘well-conditioned’; if its condition number is very large, it is called ‘ill-conditioned’.
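
A small numeric illustration, using the infinity norm (maximum absolute row sum) and the closed-form 2x2 inverse; the nearly singular example matrix is hand-picked:

```python
# cond(A) = ||A|| * ||A^{-1}|| in the infinity norm for a 2x2 example.

def norm_inf(A):
    """Matrix infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def inv_2x2(A):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 1.0], [1.0, 1.0001]]       # nearly singular, hence ill-conditioned
cond = norm_inf(A) * norm_inf(inv_2x2(A))
# cond is on the order of 4 * 10^4: tiny input changes can be amplified ~40000x.
```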

Vector Norm

A vector norm is a tool used to measure the ‘size’ of a vector. More precisely, a vector norm is a function that maps a vector to a non-negative real number, satisfying the following properties:

  1. If X \ne 0, then ||X|| > 0.
  2. ||\alpha X|| = |\alpha|\, ||X|| for all scalars \alpha.
  3. ||X+Y|| \le ||X|| + ||Y|| for vectors X and Y (the triangle inequality).

Different Type of Norms

1-Norm

||X||_1 = \sum_{i=1}^n |X_i|

2-Norm (Euclidean norm)

||X||_2 = \sqrt{\sum_{i=1}^n X_i^2}

Inf-Norm (Maximum norm)

||X||_\infty = \max_{1\le i\le n} |X_i|

Hölder Norms (p-norms), the general form of norms

||X||_p = (\sum\limits_{i=1}^n |X_i|^p)^{1/p}, \quad 1 \le p \le \infty
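
The three common norms are straightforward to evaluate directly (a pure-Python sketch):

```python
import math

# The 1-, 2-, and infinity norms of a vector, evaluated on one example.

def norm_1(x):
    return sum(abs(v) for v in x)

def norm_2(x):
    return math.sqrt(sum(v * v for v in x))

def norm_inf(x):
    return max(abs(v) for v in x)

x = [3.0, -4.0, 0.0]
# norm_1(x) = 7, norm_2(x) = 5, norm_inf(x) = 4
```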

A-Norm
If A is any fixed n*n positive definite matrix, then

||X||_A = \sqrt{X^T A X} is a vector norm.

When A is the identity matrix, the A-norm is just the 2-norm.
The three norm properties are satisfied:

  1. X^T AX > 0 for all X \ne 0 (this is the defining property of a positive definite matrix), so ||X||_A > 0 if X \ne 0
  2. ||\alpha X||_A = \sqrt{(\alpha X)^T A (\alpha X)} = \sqrt{\alpha^2 X^T A X} = |\alpha|\, ||X||_A
  3. ||X+Y||_A^2 \le (\sqrt{X^TAX} + \sqrt{Y^TAY})^2, which is ||X+Y||_A^2 \le (||X||_A + ||Y||_A)^2

Application of Vector Norms

The distance between two vectors X and Y is ||X - Y||.
Convergence of a sequence of vectors:
- For the sequence

X^{(1)} = \begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \\ \vdots \\ x_n^{(1)} \end{bmatrix}, X^{(2)} = \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \\ \vdots \\ x_n^{(2)} \end{bmatrix}, \cdots

Definition of convergence of a vector sequence:

\lim\limits_{k \rightarrow \infty} X^{(k)} = X if and only if \lim\limits_{k \rightarrow \infty} X_i^{(k)} = X_i for all values i = 1, 2, \cdots, n.

A sequence is said to be convergent if it approaches some limit.

\lim\limits_{k \rightarrow \infty} X^{(k)} = X if and only if \lim\limits_{k \rightarrow \infty} ||X^{(k)} - X|| = 0 for any vector norm.

For a given vector norm ||\cdot||, the size of ||X^{(k)} - X|| is used to determine whether the sequence of vectors X^{(k)} is converging to some limit vector X. The choice of norm is arbitrary (all norms are equivalent).

Matrix Norm

A matrix norm has one more property than a vector norm:

  1. \Vert A \Vert > 0 \text{ if } A \ne 0
  2. \Vert \alpha A \Vert = |\alpha| \Vert A \Vert \text{ for all scalars } \alpha
  3. \Vert A + B \Vert \le \Vert A \Vert + \Vert B \Vert for all matrices A and B
  4. \Vert AB \Vert \le \Vert A \Vert \Vert B \Vert

Frobenius Matrix norm

\Vert A \Vert _F = \sqrt{\sum_{i=1}^n \sum_{j=1}^n |a_{i,j}|^2}

Matrix Norm induced by Vector Norm

Given any fixed vector norm \Vert \cdot \Vert_v, the matrix norm induced by \Vert \cdot \Vert_v is defined by

\Vert A \Vert _M = \max\limits_{x\ne 0} \frac{\Vert AX \Vert _v}{ \Vert X \Vert _v}
  • The ratio \frac{\Vert AX \Vert _v}{\Vert X \Vert _v} is the magnification that takes place when X is transformed to AX
  • The number \Vert A \Vert _M is the maximum magnification that A can cause.
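
For the 1-norm and the infinity norm, the induced matrix norms have well-known closed forms (the maximum absolute column sum and the maximum absolute row sum, respectively), which makes the maximum magnification easy to verify on a small example:

```python
# Closed forms of two induced matrix norms, plus a vector that achieves the
# maximum magnification for the infinity norm: take x with +/-1 entries
# matching the signs of the largest-sum row.

def induced_inf_norm(A):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def induced_1_norm(A):
    """Induced 1-norm: maximum absolute column sum."""
    n = len(A[0])
    return max(sum(abs(row[j]) for row in A) for j in range(n))

A = [[1.0, -2.0], [3.0, 4.0]]
# worst row is [3, 4]: x = [sign(3), sign(4)] = [1, 1] gives A x = [-1, 7]
x = [1.0, 1.0]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
mag = max(abs(v) for v in Ax) / max(abs(v) for v in x)
# mag equals induced_inf_norm(A): the maximum magnification is attained
```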