DefinitionThe pseudoinverse, A + , of an m-by-n matrix, A, (whose entries can be real or complex numbers) is the unique n-by-m matrix satisfying all of the following four criteria:
Here M * is the conjugate transpose of a matrix, M. For matrices whose elements are real numbers instead of complex numbers, M * = MT. PropertiesProofs for some of these relations may be found on the proofs page.
If A and B are such that the product AB is defined and
then
The third case does not cover the first two; an orthonormal (including non-square) matrix must be of full rank, but otherwise there is no assumption made on the matrix it multiplies. Identity transformationsThese relations have in common that the right hand side equals the left hand side multiplied with some expression. They can be used to cancel certain subexpressions or expand expressions. Special casesOrthonormal columns or rowsIf A has orthonormal columns (A * A = I) or orthonormal rows (AA * = I), then A + = A * . Full rankIf the columns of A are linearly independent, then A * A is invertible. In this case, an explicit formula is:[1]
It follows that A + is a left inverse of A: A + A = I. The other product, AA + , is the orthogonal projector on the range of A. If the rows of A are linearly independent, then AA * is invertible. In this case, an explicit formula is:
It follows that A + is a right inverse of A: AA + = I. The other product, A + A, is the orthogonal projector on the range of A * . If both columns and rows are linearly independent (that is, for square regular matrices), the pseudoinverse is just the inverse:
Scalars and vectorsIt is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar x is zero if x is zero and the reciprocal of x otherwise: The pseudoinverse of the null vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude: For a proof, simply check that these definitions meet the defining criteria for the pseudoinverse. Circulant matricesFor a Circulant matrix C the singular value decomposition is given by the Fourier transform, that is the singular values are the Fourier coefficients. Let Block matricesOptimized approaches exist for calculating the pseudoinverse of block structured matrices. Finding the pseudoinverse of a matrixUsing regular inversesLet k be the rank of a
If A has full row rank, so that k = m, then B can be chosen to be the identity matrix and the formula reduces to A + = A * (AA * ) − 1. Similarly, if A has full column rank (that is, k = n), we have A + = (A * A) − 1A * . The QR methodHowever, computing the product AA * or A * A or its inverse explicitly is unnecessary and only causes additional rounding errors and computational cost. Consider first the case when A is full column rank. Then all that is needed is the Cholesky decomposition
where R is an upper triangular matrix. Multiplication by the inverse is then done easily by solving a system with multiple right-hand-side, by back substitution. To compute the Cholesky decomposition without forming A * A explicitly, use the QR decomposition of A, A = QR where Q is a unitary matrix, Q * Q = QQ * = I, and R is upper triangular. Then
so R is the Cholesky factor of A * A. The case of full row rank is obtained by swapping A and A * . The general case and the SVD methodA computationally simpler and more accurate way to get the pseudoinverse is by using the singular value decomposition.[1][5][6] If A = UΣV * is the singular value decomposition of A, then A + = VΣ + U * . For a diagonal matrix such as Σ, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, and leaving the zeros in place. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the Matlab function pinv, the tolerance is taken to be t = ε•max(m,n)•max(Σ), where ε is the machine epsilon. Computing the SVD of a matrix costs several times as much as matrix-matrix multiplication, if state-of-the art implementation is used, such as in LAPACK. In theory, the SVD cannot be computed using a finite number of operations, so iterative methods with stopping test dependent on the rounding precision are used. The result is accurate up to rounding precision. Updating the pseudoinverseFor the cases where A has full row or column rank, and the inverse of the correlation matrix (AA * for A with full row rank or A * A for full column rank) is already known, the pseudoinverse for matrices related to A can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms[7] exist that exploit the relationship. Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.[8][9] Software librariesHigh quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable. ApplicationsThe pseudoinverse provides a least squares solution to a system of linear equations.[10] Given a system of linear equations
in general we cannot always expect to find a vector x which will solve the system; even if there exists such a solution vector, then it may not be unique. We can however always ask for a vector x that brings Ax "as close as possible" to b, i.e. a vector that minimizes the Euclidean norm If there are several such vectors x, we could ask for the one among them with the smallest Euclidean norm. Thus formulated, the problem has a unique solution, given by the pseudoinverse:
Note that AA + is the orthogonal projection onto the image (range) of A (the space spanned by the column vectors of A), while (1 − A + A) is the orthogonal projection onto the kernel (null space) of A. References
External links
| |