Saturday, 18 September 2021

Normal Equations

In this post we discuss normal equations. But first we prove the following result, which will be useful in our subsequent arguments.


Theorem. Let $A\in \mathbb{R}^{m\times n}$. Then,

(i) $N(A)=N(A^TA),$ where $N(B)$ denotes the null space of $B.$

(ii) $r(A)=r(A^TA)=r(AA^T),$ where $r(B)$ denotes the rank of $B.$

(iii) $R(A)=R(AA^T)$ and $R(A^TA)=R(A^T),$ where $R(B)$ denotes the range space of $B.$

proof.  (i) Let $x$ be such that $Ax=0$. Then $(A^TA)x=A^T(Ax)=0.$ Conversely, suppose $A^TAy=0.$ Then $0=y^TA^TAy=(Ay)^T(Ay)=||Ay||_2^{2},$ which implies that $Ay=0.$ Hence, $N(A)=N(A^TA).$


(ii) $A$ and $A^TA$ have $n$ columns. By Rank-Nullity Theorem and (i),                                  \begin{align}rank(A)&=n-nullity(A)\\&=n-nullity(A^TA)\\&=rank(A^TA).\end{align}

Interchanging the roles of $A$ and $A^T$ and recalling that $r(A)=r(A^T),$ we get $r(A)=r(AA^T).$


(iii) Observe that $R(AA^T)\subseteq R(A)$ (since every column of $AA^T$ is a linear combination of the columns of $A$). By (ii), $r(A)=r(AA^T).$ Hence, $R(A)=R(AA^T).$ Interchanging the roles of $A$ and $A^T$, we further get $R(A^T)=R(A^TA).$


What do we mean by normal equations?

The equation $A^TAx=A^Tb$ is called the normal equation associated with $Ax=b.$ It is called normal because any solution $x$ satisfies $A^T(b-Ax)=0$, i.e., the residual vector $b-Ax$ is normal (perpendicular) to the columns of $A$; equivalently, $(b-Ax)\perp R(A).$


The normal equations are always consistent, regardless of whether or not the original system is consistent.

Let $A\in \mathbb{R}^{m\times n}$ and $b\in \mathbb{R}^{m}.$ Then $A^TAx=A^Tb$ is consistent, since $A^Tb\in R(A^T)=R(A^TA).$


$Ax=b$ and $A^TAx=A^Tb$ have the same solution set whenever $Ax=b$ is consistent.

Let $x_0$  be a particular solution of $Ax=b.$ Clearly, $x_0$ is also a particular solution of $A^TAx=A^Tb.$ Now, the general solution of $Ax=b$ is given by $x=x_0+N(A)$. But, $N(A)=N(A^TA)$, therefore $x=x_0+N(A^TA)$, which is the general solution of $A^TAx=A^Tb.$

Conversely, if $y_0$ is a particular solution of $A^TAx=A^Tb,$ then $A^T(b-Ay_0)=0,$ so $b-Ay_0\in N(A^T)=R(A)^{\perp}.$ Since $Ax=b$ is consistent, $b\in R(A),$ and since $R(A)$ is a subspace, $b-Ay_0\in R(A).$ Hence $b-Ay_0\in R(A)\cap R(A)^{\perp}=\{0\},$ i.e., $Ay_0=b.$ Thus $y_0$ is also a particular solution of $Ax=b,$ and the general solutions coincide since $N(A)=N(A^TA).$


When does $A^TAx=A^Tb$ have a unique solution?

For any $A\in \mathbb{R}^{m\times n}$ and $b\in \mathbb{R}^{m},$ the system $A^TAx=A^Tb$ has a unique solution if and only if $N(A^TA)=\{0\}$. But $N(A^TA)=N(A).$ Hence, $A^TAx=A^Tb$ has a unique solution if and only if $N(A)=\{0\},$ equivalently $rank(A)=n$. The unique solution is given by $x=(A^TA)^{-1}A^Tb=A^{\dagger}b.$
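
For readers who want to try this numerically, here is a minimal sketch (assuming NumPy is available; the random matrix is purely illustrative). For a full-column-rank $A$, solving the normal equations agrees with applying the pseudoinverse.

```python
import numpy as np

# Minimal sketch (assumes NumPy): for a full-column-rank A, the normal equations
# have the unique solution (A^T A)^{-1} A^T b, which coincides with A^+ b.
rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))                # almost surely rank(A) = n
b = rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # solve A^T A x = A^T b
x_pinv   = np.linalg.pinv(A) @ b               # x = A^+ b
print(np.allclose(x_normal, x_pinv))           # True
```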


What do the solutions of the normal equations $A^TAx=A^Tb$ represent when the original system $Ax=b$  is inconsistent?

Perhaps this is the most significant question. The solutions of the normal equation $A^TAx=A^Tb$ are exactly the least-squares solutions of the inconsistent system $Ax=b.$ See the post Least-squares solutions.


Drawbacks of normal equations.

The normal equations are often avoided in numerical computations: if the system $Ax=b$ is ill-conditioned (i.e., its solution is very sensitive to small changes in the entries), then $A^TA$ is even worse conditioned (in the $2$-norm, $\kappa(A^TA)=\kappa(A)^2$; see the example below), and the theoretical properties surrounding $A^TA$ and the normal equations may be lost in practical computations. Nevertheless, the normal equations are an important theoretical idea that leads to practical tools of fundamental importance, such as the method of least squares.

Example. Consider the system $\begin{bmatrix}3 &6\\1 &2.01\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}9\\3.01\end{bmatrix}.$

When three-digit floating-point arithmetic is used to solve the above system, the computed solution is $(1,1)^T$, which is also the exact solution. However, if three-digit arithmetic is used to form the normal equations, we obtain the singular system

                              $\begin{bmatrix}10 &20\\20 &40\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}30\\60.1\end{bmatrix},$

which is an inconsistent system.
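
The breakdown can be reproduced with a few lines of code. This is only a sketch (assuming NumPy): instead of carrying out every operation in three-digit arithmetic, it forms $A^TA$ and $A^Tb$ exactly and then rounds each entry to three significant digits, which already produces the singular system above; it also shows how the condition number of the normal equations compares with that of the original system.

```python
import numpy as np

A = np.array([[3., 6.],
              [1., 2.01]])
b = np.array([9., 3.01])

print(np.linalg.cond(A))          # ~1.7e3
print(np.linalg.cond(A.T @ A))    # ~2.8e6, roughly cond(A)**2

def round3(M):
    # round every entry to three significant digits (a stand-in for 3-digit arithmetic)
    return np.vectorize(lambda v: float(f"{v:.3g}"))(M)

N = round3(A.T @ A)               # [[10, 20], [20, 40]]  -> singular
c = round3(A.T @ b)               # [30, 60.1]
print(np.linalg.matrix_rank(N))   # 1
```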

Why does this breakdown happen?

Numbers are represented in a computer by a string of bits of fixed length, whereas most real numbers require infinitely many digits and so cannot be represented exactly; every arithmetic operation may therefore introduce a small rounding error. When $A^TA$ is nearly singular, as above, these rounding errors are enough to make the computed normal equations singular or inconsistent.



SOURCES: 
[1] Ben-Israel, A.; Greville, T. N. E., Generalized Inverses. Theory and Applications, Springer-Verlag, New York, 2003.
[2] Meyer, C. D., Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, 2000.
[3] Burden, R. L.; Faires, J. D., Numerical Analysis, Brooks/Cole, 2011.

Tuesday, 14 September 2021

Least-squares solutions

In this post, we discuss least-squares solutions of a linear system $Ax=b$, where $A\in \mathbb{R}^{m\times n}$, $x\in \mathbb{R}^{n}$ and $b\in \mathbb{R}^{m}$. First we consider the following example:


Example: Find a solution of the system

                                              $$\begin{bmatrix}1 &2\\1 &1\\2 &3\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}3\\10\\2\end{bmatrix}.$$

We observe that it is not possible to find $x$ and $y$ satisfying the above matrix equation, since $\begin{bmatrix}3\\10\\2\end{bmatrix}\notin Range(A)$. So the system is inconsistent. Such a system with $m>n$ is called an overdetermined system, and in general such systems may not have a solution: the vector $b$ has $m$ coordinates (here $m=3$), while $dim(Range(A))\leq n$ (here $rank(A)= n=2$). Clearly, $r=b-Ax\neq 0$ for every $x$. So let us instead make $r$, called the residual vector, as small as possible, where 'smallness' is measured by the Euclidean norm of $r$.

The problem defined above takes the following form

             Given $A\in \mathbb{R}^{m\times n},~m\geq n$, find $x$ such that $||b-Ax||_2$ is minimized.


How do we find $x$ in the above problem?

The least-squares problem asks for vectors $x$ such that $Ax$ is as close to $b$ as possible. But $Ax\in Range(A)$. We claim that $y=Pb$ is the point of $Range(A)$ closest to $b$, where $P$ is the orthogonal projection onto $Range(A)$. Indeed, suppose $z (\neq y)$ is another vector in $Range(A)$. Since $y-z\in Range(A)$ and $b-y=b-Pb\in Range(A)^{\perp}$, we have $(y-z) \perp (b-y)$, and by the Pythagorean theorem, $||b-z||_2^2=||b-y||_2^2+||y-z||_2^2>||b-y||_2^2$. Thus $y=Pb$ is the unique point of $Range(A)$ that minimizes $||b-y||_2$. Hence, finding a least-squares solution of $Ax=b$ is equivalent to solving the system $Ax=Pb$, where $P$ is the orthogonal projection onto $Range(A)$. A numerical illustration is sketched below.
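
A minimal sketch (assuming NumPy and that $A$ has full column rank, so that $P=A(A^TA)^{-1}A^T$ is the orthogonal projector onto $Range(A)$; the data are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1., 2.],
              [1., 1.],
              [2., 3.]])
b = np.array([3., 10., 2.])

P = A @ np.linalg.inv(A.T @ A) @ A.T       # orthogonal projector onto Range(A)
y = P @ b                                  # closest point to b in Range(A)

print(np.allclose(A.T @ (b - y), 0))       # residual is orthogonal to Range(A): True
for _ in range(5):                         # any other point of Range(A) is farther from b
    z = A @ rng.standard_normal(2)
    print(np.linalg.norm(b - y) <= np.linalg.norm(b - z))   # True
```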



Theorem 1.  A vector $x$ is a least-squares solution of $Ax=b$ if and only if $x$ is a solution of                                                                          $$A^TAx=A^Tb,$$

often called the normal equation of $Ax=b$.

proof. We will show that the system $Ax=Pb$ is equivalent to the normal equation $A^TAx=A^Tb$; the conclusion then follows directly from the above discussion. Now,

\begin{align*}&Ax=Pb\\ \iff & PAx=P^2b=Pb,~~\text{ since } P \text{ is a projection and } PAx=Ax \text{ because } Ax\in Range(A)\\ \iff & P(Ax-b)=0\\ \iff & (Ax-b)\in Null(P)=Range(A)^{\perp}=Null(A^T)\\ \iff & A^T(Ax-b)=0\\ \iff & A^TAx=A^Tb.\end{align*}


Example. A small company has been in business for four years and has recorded annual sales (in tens of thousands of dollars) as follows:

         
                     Year:   1    2    3    4
                     Sales:  23   27   30   34

When this data is plotted, we see that although the points do not lie exactly on a straight line, there nevertheless appears to be a linear trend. Predict the sales for any future year if this trend continues.




Solution: We are asked to find the line $f(t)=a+bt$ that best fits the data in the least-squares sense. We write the problem in matrix form as $Ax=b$, where
                                       $A=\begin{bmatrix}1 &1\\1 &2\\1 &3\\1 &4\end{bmatrix},~b=\begin{bmatrix}23\\27\\30\\34\end{bmatrix}, \text{ and } x=\begin{bmatrix}a\\b\end{bmatrix}.$
Then, the previous discussion guarantees that $x$ is the solution of the normal equation $A^TAx=A^Tb$, i.e., 
                                $\begin{bmatrix}4 &10\\10 &30\end{bmatrix} \begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}114\\303\end{bmatrix}.$
Solving this we get $a=19.5$ and $b=3.6$, so we predict that the sales in year $t$ will be $f(t)=19.5+3.6t.$ For example, $f(5)=19.5+(3.6)(5)=37.5$, so the estimated sales for year five are $\$375{,}000.$ A short computational check is sketched below.
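
The same numbers can be reproduced with a short script (a sketch assuming NumPy; the data are the four sales figures above):

```python
import numpy as np

t = np.array([1., 2., 3., 4.])
sales = np.array([23., 27., 30., 34.])

A = np.column_stack([np.ones_like(t), t])        # columns: [1, t]
a, b = np.linalg.solve(A.T @ A, A.T @ sales)     # normal equations A^T A x = A^T sales
print(a, b)                                      # 19.5 3.6
print(a + b * 5)                                 # 37.5 -> about $375,000 in year five
```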


What is $P$ precisely?

Let $A^{(1,3)}\in A\{1,3\}$, i.e., $A^{(1,3)}$ satisfies the first and third Penrose equations $(AA^{(1,3)}A=A~\&~(AA^{(1,3)})^T=AA^{(1,3)})$, then $AA^{(1,3)}$ is an orthogonal projection onto $Range(A)$ as shown below:

(i) $(AA^{(1,3)})^2=AA^{(1,3)}AA^{(1,3)}=AA^{(1,3)};$

(ii) $(AA^{(1,3)})^T=AA^{(1,3)};$

(iii) \begin{align*} Range(AA^{(1,3)})&\subseteq Range(A)\\ &\subseteq Range(AA^{(1,3)}A)\\ &\subseteq Range(AA^{(1,3)})\\ &\subseteq Range(A).\end{align*} 

       So, $Range(AA^{(1,3)})=Range(A).$

Precisely, $P=AA^{(1,3)}$, and notice that $x=A^{(1,3)}b$ satisfies $Ax=Pb$. So we can conclude that $x=A^{(1,3)}b$ is a least-squares solution. But we have learnt that $A^{(1,3)}$ is not unique (see the Moore-Penrose discussion). So there can be infinitely many least-squares solutions of $Ax=b$. The general least-squares solution is

                                                             $x=A^{(1,3)}b+(I_n-A^{(1,3)}A)z$

with $A^{(1,3)}\in A\{1,3\}$ and arbitrary $z\in \mathbb{R}^{n}$. It is unique only when $rank(A)=n$ (since in that case $A^{(1,3)}A=I_n$). A numerical illustration is sketched below.
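
A numerical illustration (a sketch assuming NumPy): the Moore-Penrose inverse is one particular $\{1,3\}$-inverse, so for a rank-deficient $A$ the family $x=A^{\dagger}b+(I_n-A^{\dagger}A)z$ produces infinitely many least-squares solutions, all with the same residual norm.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1., 2.],
              [2., 4.],
              [1., 2.]])        # rank(A) = 1 < n, so the least-squares solution is not unique
b = np.array([3., 1., 2.])

Apinv = np.linalg.pinv(A)       # A^+ is in particular a {1,3}-inverse of A
for _ in range(3):
    z = rng.standard_normal(2)
    x = Apinv @ b + (np.eye(2) - Apinv @ A) @ z
    print(np.linalg.norm(b - A @ x))    # the same residual norm every time
```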


Solutions of minimum norm:

Theorem 2. Let $A\in \mathbb{R}^{m\times n},~b\in \mathbb{R}^{m}$. If $Ax=b$ has a solution, then the unique solution for which $||x||_2$ is smallest is given by

                                                                       $x=A^{(1,4)}b,$

where $A^{(1,4)}\in A\{1,4\}.$ Conversely, if $X\in \mathbb{R}^{n\times m}$ is such that, whenever $Ax=b$ has a solution, $x=Xb$ is the solution of minimum norm, then $X\in A\{1,4\}$.

proof. If $Ax=b$ is consistent, then $b\in Range(A)$ and therefore $AA^{(1,4)}b=b$ (since $AA^{(1,4)}$ is a projection onto $Range(A)$ and $b\in Range(A)$). So $x=A^{(1,4)}b$ is a solution of $Ax=b$, and it lies in $Range(A^T)$: indeed $A^{(1,4)}A$ is the orthogonal projector onto $Range(A^T)$, so $A^{(1,4)}b=A^{(1,4)}Ax_0\in Range(A^T)$ for any solution $x_0$. Since the map $A:Range(A^T)\rightarrow Range(A)$ is bijective, $x=A^{(1,4)}b$ is the unique solution of $Ax=b$ lying in $Range(A^T)$. Now, the general solution is given as

                                                          $x=A^{(1,4)}b+z,~~z\in N(A)$

                                                      $\Rightarrow ||x||_2^{2}=||A^{(1,4)}b||_2^2 + ||z||_2^2,~~\text{ since } A^{(1,4)}b\in Range(A^T)=N(A)^{\perp} \text{ and } z\in N(A)$

                                                     $\Rightarrow ||x||_2>||A^{(1,4)}b||_2, \text{ unless } z=0.$

Thus, $x=A^{(1,4)}b$ is the unique solution of the consistent system $Ax=b$ whose norm is minimum.

Conversely, suppose $X$ is such that, for all $b\in Range(A)$, $x=Xb$ is the minimum-norm solution of $Ax=b$. Setting $b$ equal to each column of $A$ successively, we get $XA=A^{(1,4)}A$. Now we will show that $X\in A\{1,4\}$. Pre-multiplying $XA=A^{(1,4)}A$ by $A$, we get $AXA=AA^{(1,4)}A=A$. So $X\in A\{1\}$. Again, since $XA=A^{(1,4)}A$ and $(A^{(1,4)}A)^T=A^{(1,4)}A,$ we get

\begin{align}(XA)^T&=(A^{(1,4)}A)^T\\&=A^{(1,4)}A\\&=XA.\end{align}

Hence, $X\in A\{1,4\}.$


Example. Consider the system

                                            $$\begin{bmatrix}1 &1\\2 &2 \end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}1\\2\end{bmatrix}.$$ Both $x_1=(0.5,0.5)^T$ and $x_2=(-204/985,1189/985)^T$ satisfy the given system, but $x_1=A^{(1,4)}b=(0.5,0.5)^T$ is the minimum-norm solution. One can verify that $||x_1||_2=0.7071<1.2247=||x_2||_2.$ A numerical check is sketched below.
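
A quick check with NumPy (a sketch; $A^{\dagger}$ is in particular a $\{1,4\}$-inverse, so $A^{\dagger}b$ is the minimum-norm solution):

```python
import numpy as np

A = np.array([[1., 1.],
              [2., 2.]])
b = np.array([1., 2.])

x_min = np.linalg.pinv(A) @ b                 # minimum-norm solution A^+ b
print(x_min)                                  # [0.5 0.5]

x_other = np.array([-204/985, 1189/985])      # another solution of the same system
print(np.allclose(A @ x_other, b))            # True
print(np.linalg.norm(x_min), np.linalg.norm(x_other))   # 0.707... < 1.224...
```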


Least-squares solution of minimum norm:

$x_0$ is called the least-squares solution of minimum norm of the system $Ax=b$ if

(i) $||Ax_0-b||_2\leq ||Ax-b||_2,~~\forall~x;$ and

(ii) $||x_0||_2<||x||_2,$ for any $x\neq x_0$ which gives equality in (i).


Theorem 3. Let $A\in \mathbb{R}^{m\times n},~b\in \mathbb{R}^{m}$. Then, among the least-squares solutions of $Ax=b$, $A^{\dagger}b$ is the one of minimum-norm.

Conversely, if $X\in \mathbb{R}^{n\times m}$ has the property that, for all $b,$ $Xb$ is the minimum-norm least-squares solution of $Ax=b,$ then $X=A^{\dagger}.$

proof.  From the above discussion, we know that the least-squares solutions of $Ax=b$ coincide with the solutions of the consistent system

                                                   $Ax=Pb,~\text{ where } P=AA^{(1,3)}.$

Thus the least-squares solution of minimum-norm of $Ax=b$ is the minimum-norm solution of the consistent system $Ax=AA^{(1,3)}b$. By Theorem 2, the minimum-norm solution of $Ax=AA^{(1,3)}b$ is given by 

                                                 $x=A^{(1,4)}AA^{(1,3)}b.$

Now, one may check that $Z=A^{(1,4)}AA^{(1,3)}$ satisfies the four Penrose equations $AZA=A$, $ZAZ=Z$, $(AZ)^T=AZ$ and $(ZA)^T=ZA$. Thus $A^{(1,4)}AA^{(1,3)}=A^{\dagger}$. Hence, the least-squares solution of minimum-norm of $Ax=b$ is given as $x=A^{\dagger}b$.

The converse is easy, since $X=A^{\dagger}\in A\{1,3\}\cap A\{1,4\}$. As $A^{\dagger}$ is always unique, we conclude that the minimum-norm least-squares solution of a given system $Ax=b$ is always unique.


Example. Consider the system

                                            $$\begin{bmatrix}1 &2\\1 &1\\2 &3\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}3\\10\\2\end{bmatrix}.$$

We wish to find a solution $x$ that is the minimum-norm least-squares solution of the above system, i.e., among all those $x$ for which $||Ax-b||_2$ is least, we aim to find the one whose $||x||_2$ is minimum. From the theory discussed above, we know that such an $x$ is unique and given as $x=A^{\dagger}b$. Here $A$ has full column rank, and in this case $A^{\dagger}=(A^TA)^{-1}A^T$. Hence, after some computation, we get $x=40/3\approx 13.33$ and $y=-7.$ A short computational check is sketched below.
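
A short computational check (a sketch assuming NumPy; `np.linalg.pinv` computes $A^{\dagger}$, and `np.linalg.lstsq` returns the same minimum-norm least-squares solution):

```python
import numpy as np

A = np.array([[1., 2.],
              [1., 1.],
              [2., 3.]])
b = np.array([3., 10., 2.])

x = np.linalg.pinv(A) @ b                       # minimum-norm least-squares solution
print(x)                                        # [13.333... -7.0], i.e. x = 40/3, y = -7
print(np.linalg.lstsq(A, b, rcond=None)[0])     # same result
```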



NOTE: Computing the inverses $A^{(1,3)},$ $A^{(1,4)}$ and $A^{\dagger}$ is not easy in general. In this post we have discussed only the theoretical part and its implications for practical problems. In upcoming posts, we will see different methods for computing different generalized inverses.


SOURCES: 
[1] Ben-Israel, A.; Greville, T. N. E., Generalized Inverses. Theory and Applications, Springer-Verlag, New York, 2003.
[2] Meyer, C. D., Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, 2000.
[3] Trefethen, L. N.; Bau, D. III, Numerical linear algebra. Society for Industrial and Applied Mathematics, Philadelphia, 1997.

Friday, 3 September 2021

The Drazin inverse

Motivation: Consider the matrix $A=\begin{pmatrix}0 &1\\0 &0\end{pmatrix}$; then $A^2=O$. So $Rank(A)=1\neq 0=Rank(A^2)$, and thus the group inverse of $A$ does not exist, because the index of $A$ is not 1. The theory developed for the group inverse therefore fails when the index of the given matrix is not 1. Hence we require another generalized inverse which not only preserves spectral properties of the matrix, like the group inverse, but also exists for every square matrix $A$ of any index and coincides with the usual inverse whenever $A$ is nonsingular.


Definition: Let $A\in \mathbb{R}^{n\times n}$. Then there exists a unique matrix $X$ satisfying the following three equations:

(i) $A^kXA=A^k$, where $k$ is the index of $A$

(ii) $XAX=X$

(iii) $AX=XA$.

The matrix $X$ is called the Drazin inverse of $A$ and is denoted by $A^D$. The Drazin inverse exists for every square matrix and is always unique. When $A$ is nonsingular, $A^D=A^{-1}.$


Properties of the Drazin inverse:

(i) $(A^T)^D=(A^D)^T$.

(ii) $(A^D)^D=A$ if and only if $A$ has index 1.

(iii) $((A^D)^D)^D=A^D$.

(iv) The Drazin inverse preserves similarity, i.e., $A=ZBZ^{-1}\Rightarrow A^D=ZB^DZ^{-1}$.

(v) $AA^D (=A^DA)$ is a projection on $R(A^D)$ along $N(A^D)$.

(vi) $A^D$ has index 1, and $(A^D)^{\#}=A^2A^D$.

(vii) $A^D(A^D)^{\#}=AA^D$.

(viii) If $A$ is nilpotent, then $A^D=O$.

proofs. (i) Let $A^D$ be the Drazin inverse of $A$ with index $k$, so it satisfies the three conditions: $A^kA^DA=A^k$, $A^DAA^D=A^D$ and $AA^D=A^DA$. Let us denote $B=A^T$ and $X=(A^D)^T$. Then, we have to show that $X=B^D$. Now, $BX=A^T(A^D)^T=(A^DA)^T=(AA^D)^T=(A^D)^TA^T=XB$ and using this we get, $XBX=(A^D)^T(A^T(A^D)^T)=(A^DAA^D)^T=(A^D)^T=X$. Finally, 

                     $\begin{align}B^kXB&=(A^T)^k(A^T(A^D)^T)\\&=(A^k)^TA^T(A^D)^T\\&=(A^DAA^k)^T\\&=(A^kAA^D)^T,~\text{ since } AA^D=A^DA\\&=(A^k)^T\\&=(A^T)^k\\&=B^k.\end{align}$


(ii) If part: Let $A$ be of index 1. Then, $(A^D)^D=(A^{\#})^{\#}=A$ (since $A^D=A^{\#}$ and $(A^{\#})^{\#}=A$ proved in the Group inverse post).

Only if: Assume that $A$ is singular, i.e., $0$ is an eigenvalue of $A$. Then, 

$A=Z\begin{bmatrix}J_{(\lambda\neq 0)} &O\\O &J_{(\lambda=0)}\end{bmatrix}Z^{-1},$ where $J_{(\lambda\neq 0)}$ and $J_{(\lambda=0)}$ are the Jordan blocks corresponding to the non-zero and zero eigenvalues of $A$, respectively. Clearly, $J_{(\lambda\neq 0)}$ is nonsingular and $J^{k}_{(\lambda=0)}=O$ with $J^{k-1}_{(\lambda=0)} \neq O$, where $k$ is the index of $A$. Now, $A^D=Z\begin{bmatrix}J^{-1}_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}$ (see Theorem 2 for the proof). It is easy to check that

$(A^D)^D=Z\begin{bmatrix}J_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}$. Now, if $(A^D)^D=A$, then $J_{(\lambda=0)}=O$, a zero matrix and thus, $A=Z\begin{bmatrix}J_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}$. Since $J_{(\lambda\neq 0)}$ is nonsingular, we have $rank(A)=rank(A^2)$ and hence, index of $A$ is 1.


(iii) From (ii), $A=Z\begin{bmatrix}J_{(\lambda\neq 0)} &O\\O &J_{(\lambda=0)}\end{bmatrix}Z^{-1}$; then one checks that $A^D=Z\begin{bmatrix}J^{-1}_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}$ (see Theorem 2) and $(A^D)^D=Z\begin{bmatrix}J_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}$. Applying Theorem 2 again to find the Drazin inverse of $(A^D)^D$, we get $((A^D)^D)^D=Z\begin{bmatrix}J^{-1}_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}=A^D$.


(iv) Let $A=ZBZ^{-1}$ and $X=ZB^DZ^{-1}$, then

                             $\begin{align}AX&=ZBB^DZ^{-1}\\&=ZB^DBZ^{-1}\\&=ZB^DZ^{-1}ZBZ^{-1}\\&=XA;\end{align}$

                     $\begin{align}XAX&=ZB^DZ^{-1}ZB^DBZ^{-1}\\&=ZB^DB^DBZ^{-1}\\&=Z(B^DBB^D)Z^{-1}\\&=ZB^DZ^{-1}\\&=X;\end{align}$


 $\begin{align}A^kXA&=ZB^kZ^{-1}ZB^DZ^{-1}ZBZ^{-1}\\&=ZB^kB^DBZ^{-1}\\&=ZB^kZ^{-1}\\&=A^k.\end{align}$


(v) Let $P=AA^D$. Clearly, $P^2=P$ (since $AA^D=A^DA$). So, $P$ is a projection. Now,

                             $\begin{align}R(AA^D)&=R(A^DA)\\&\subseteq R(A^D)\\&=R(A^DAA^D)\\&\subseteq R(A^DA)\\&=R(AA^D);\end{align}$ 

$N(AA^D)\subseteq N(A^D)=N(A^DAA^D)=N(AA^D)$. Thus, $R(AA^D)=R(A^D)$ and $N(AA^D)=N(A^D)$. Hence, $AA^D (=A^DA)$ is a projection on $R(A^D)$ along $N(A^D)$.


(vi) By Theorem 2, the Drazin inverse of any singular matrix $A$ is of the form $A^D=Z\begin{bmatrix}J^{-1}_{(\lambda\neq 0)} &O\\O &O\end{bmatrix}Z^{-1}$. Clearly, $A^D$ has index 1, since $rank(A^D)=rank((A^D)^2)$, and thus its group inverse exists. Let $X=A^2A^D$; then

                                  $\begin{align}A^DXA^D &=A^DAAA^DA^D\\ &=A^DAA^DAA^D\\ &=A^DAA^D\\ &=A^D;\end{align}$

                                     $\begin{align}XA^DX &=A^2A^DA^DA^2A^D\\ &=(AA^D)^2A^2A^D\\ &=AA^DA^2A^D\\ &=A^2A^DAA^D\\ &=A^2A^D=X;\end{align}$

                             $\begin{align}A^DX &=A^DA^2A^D\\ &=A^2A^DA^D\\ &=XA^D.\end{align}$

 Hence, $A^2A^D=X=(A^D)^{\#}$.


(vii) Using (vi), $A^D(A^D)^{\#}=A^D(A^2A^D)=(A^DA)(AA^D)=(AA^D)(AA^D)=AA^D$.


(viii) Obvious from Theorem 2 (see the proof below).


Theorem 1. Let $A$ be a square singular matrix whose index is $k$. If the system

                                                                        $Ax=b,~~x\in R(A^k)$,

is consistent, then its solution is unique and is given by $x=A^Db$.

proof. Since $x\in R(A^k)$, we can write $x=A^ky$ for some $y$. Therefore $A^{k+1}y=A(A^ky)=b$. Using $A^kA^DA=A^k$ and $AA^D=A^DA$, we get

                                                            $\begin{align}x &=A^ky\\ &=(A^kA^DA)y\\ &=A^DA^{k+1}y\\ &=A^Db.\end{align}$

Now we prove uniqueness. Let $z_1$ and $z_2$ be two solutions of $Ax=b$ with $z_1, z_2\in R(A^k)$. Then $Az_1=b$ and $Az_2=b$ imply $z_1-z_2\in N(A)\subseteq N(A^k)$. Since $k$ is the index of $A$, $R(A^k)\cap N(A^k)=\{0\}$. Thus $z_1-z_2\in R(A^k)\cap N(A^k)=\{0\}$, and hence $z_1=z_2$.


Theorem 2. Let $A\in \mathbb{R}^{n\times n}$ whose Jordan form is given as

                                         $A=Z\begin{bmatrix}J_{(\lambda\neq 0)} &O\\O &J_{(\lambda=0)}\end{bmatrix}Z^{-1},$

where $J_{(\lambda\neq 0)}$ and $J_{(\lambda=0)}$ are the Jordan blocks corresponding to the non-zero and zero eigenvalues of $A$, respectively, $J_{(\lambda\neq 0)}$ is nonsingular, and $J_{(\lambda=0)}^{k}=O$ with $k$ the index of $A$. Then

                                         $A^D=Z\begin{bmatrix}J_{(\lambda\neq 0)}^{-1} &O\\O &O\end{bmatrix}Z^{-1}.$

proof.  We will show that $X=Z\begin{bmatrix}J_{(\lambda\neq 0)}^{-1} &O\\O &O\end{bmatrix}Z^{-1}$ satisfies all three matrix equations $A^{k}XA=A^{k}$, $XAX=X$ and $AX=XA$. It is clear that $A^k=Z\begin{bmatrix}J_{(\lambda\neq 0)}^{k} &O\\O &O\end{bmatrix}Z^{-1}$ (since $J_{(\lambda=0)}^{k}=O$), and thus by block matrix multiplication $A^{k}XA=A^{k}$. The conditions $XAX=X$ and $AX=XA$ hold trivially by block multiplication. Hence the proof.


Example. Let $A=\begin{bmatrix}0 &-2  &0  &-5 &2\\0 &-1 &0 &-2 &0\\0 &0 &0 &2 &0\\0 &1 &0 &2 &0\\-2 &-2 &1 &-8 &4\end{bmatrix}$. Then, the Jordan form of $A$ is given as

$A=Z\begin{bmatrix}1 &0 &0 &0 &0\\0 &2 &1 &0 &0\\0 &0 &2 &0 &0\\0 &0 &0 &0 &1\\0 &0 &0 &0 &0\\\end{bmatrix}Z^{-1}$, where $Z=\begin{bmatrix}1 &-2 &1 &1 &0\\-1 &0 &0 &0 &-2\\2 &0 &0 &2 &0\\1 &0 &0 &0 &1\\2 &-2 &0 &0 &1\\\end{bmatrix}$. The Jordan matrix $J=\begin{bmatrix}1 &0 &0 &0 &0\\0 &2 &1 &0 &0\\0 &0 &2 &0 &0\\0 &0 &0 &0 &1\\0 &0 &0 &0 &0\\\end{bmatrix}$ splits into the two groups of blocks $J_{(\lambda\neq 0)}=\begin{bmatrix}1 &0 &0 \\0 &2 &1 \\0 &0 &2\end{bmatrix}$ and $J_{(\lambda=0)}=\begin{bmatrix}0 &1 \\0 &0 \end{bmatrix}$.

By Theorem 2, the Drazin inverse of $A$ is given as

$A^D=Z\begin{bmatrix} J_{(\lambda\neq 0)}^{-1} &O \\O &O\end{bmatrix}Z^{-1}=Z\begin{bmatrix}1 &0 &0 &0 &0\\0 &\frac{1}{2} &\frac{-1}{4} &0 &0\\0 &0 &\frac{1}{2} &0 &0\\0 &0 &0 &0 &0\\0 &0 &0 &0 &0\\\end{bmatrix}Z^{-1}$, i.e., 

$A^D=\begin{bmatrix}1 &\frac{3}{2} &\frac{-1}{2} &\frac{7}{2} &\frac{-1}{2}\\0 &-1 &0 &-2 &0\\0 &2 &0 &4 &0\\0 &1 &0 &2 &0\\\frac{1}{2} &2 &\frac{-1}{4} &4 &0\\\end{bmatrix}$.
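
For readers who want to check such computations, here is a small sketch (assuming SymPy is available). It carries out the construction of Theorem 2 mechanically: compute an exact Jordan form, invert each Jordan block belonging to a nonzero eigenvalue, replace the nilpotent blocks by zero blocks, and transform back. It can be used to double-check the worked example above.

```python
import sympy as sp

def drazin_via_jordan(A):
    """Drazin inverse via Theorem 2: invert the nonsingular Jordan blocks, zero the rest."""
    A = sp.Matrix(A)
    Z, J = A.jordan_form()                      # A = Z * J * Z^{-1}
    n = J.shape[0]
    JD = sp.zeros(n, n)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and J[j, j + 1] == 1:   # walk to the end of the current Jordan block
            j += 1
        if J[i, i] != 0:                        # block for a nonzero eigenvalue: invert it
            JD[i:j + 1, i:j + 1] = J[i:j + 1, i:j + 1].inv()
        # block for the zero eigenvalue: leave it as the zero block
        i = j + 1
    return Z * JD * Z.inv()

A = sp.Matrix([[0, -2, 0, -5, 2],
               [0, -1, 0, -2, 0],
               [0,  0, 0,  2, 0],
               [0,  1, 0,  2, 0],
               [-2, -2, 1, -8, 4]])
AD = drazin_via_jordan(A)
# sanity checks: A**k * AD * A == A**k (k = index of A), AD * A * AD == AD, A * AD == AD * A
```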


SOURCES: 
[1] Ben-Israel, A.; Greville, T. N. E., Generalized Inverses. Theory and Applications, Springer-Verlag, New York, 2003.

[2] Drazin, M. P., Pseudo-inverses in associative rings and semigroups. The American Mathematical Monthly, 65 (1958), 506-514.

Monday, 23 August 2021

EP matrices

Motivation: We know that when $A$ is nonsingular, $AA^{-1}=A^{-1}A=I$. Consider $A=\begin{pmatrix}1 &1\\0 &0 \end{pmatrix}$; it is easy to check that the Moore-Penrose inverse of this matrix is $A^{\dagger}=\begin{pmatrix}1/2 &0\\1/2 &0 \end{pmatrix}$. Now, $AA^{\dagger}=\begin{pmatrix}1 &0\\0 &0 \end{pmatrix}\neq \begin{pmatrix}1/2 &1/2\\1/2 &1/2 \end{pmatrix}=A^{\dagger}A$. Further, let $B=\begin{pmatrix}1 &0 &1\\1 &0 &1\\ 0 &1 &0\end{pmatrix}$. Then $B^{\dagger}=\begin{pmatrix}0.25 &0.25 &0\\ 0 &0 &1\\ 0.25 &0.25 &0\end{pmatrix}\neq \begin{pmatrix}1 &-1 &1\\1 &-1 &1\\-1 &2 &-1\end{pmatrix}=B^{\#}$, where $B^{\dagger}$ is the Moore-Penrose inverse and $B^{\#}$ is the group inverse of $B$.

Observe that when $AA^{\dagger}=A^{\dagger}A$, the matrix $A^{\dagger}$ satisfies all three equations defining the group inverse (the third one being $AX=XA$), so by uniqueness $A^{\dagger}=A^{\#}$.

It is natural, from the above examples, to look for the class of matrices for which $AA^{\dagger}=A^{\dagger}A$, i.e., for which $A$ and $A^{\dagger}$ commute, or equivalently $A^{\dagger}=A^{\#}$.

Problem statement: Is there a class of matrices for which $A$ and $A^{\dagger}$ commute?

or equivalently, 

for which class of matrices are the group inverse and the Moore-Penrose inverse the same?

Formulation: We know that $AA^{\dagger}=P_{R(A)}$, the orthogonal projection onto $R(A)$, and $A^{\dagger}A=P_{R(A^T)}$, the orthogonal projection onto $R(A^T)$. So if we want $AA^{\dagger}=A^{\dagger}A$, we must have $P_{R(A)}=P_{R(A^T)}$, or equivalently $R(A)=R(A^T)$.

Definition: A square matrix $A$ is called EP (also called Range-Hermitian) matrix if $R(A)=R(A^T).$

NOTE: 1. When $A$ is symmetric, i.e., $A=A^T$, it is obvious that $R(A)=R(A^T)$. But not all EP matrices are symmetric, as the following example shows:

Ex: Consider $A=\begin{pmatrix}1 &2\\1&1\end{pmatrix}$; then $A\neq A^T$ but $R(A)=R(A^T)$. Indeed, every nonsingular matrix is EP, since in that case $R(A)=R(A^T)=\mathbb{R}^n$.

2. When $A$ is normal, i.e., $AA^T=A^TA$, then $R(A)=R(AA^T)=R(A^TA)=R(A^T)$ (by the theorem in the normal equations post). So every normal matrix is EP, but the converse is not true:

Ex: Consider the same matrix $A=\begin{pmatrix}1 &2\\1&1\end{pmatrix}$; then $AA^T=\begin{pmatrix}5 &3\\3&2\end{pmatrix}\neq \begin{pmatrix}2 &3\\3&5\end{pmatrix}=A^TA$, but $R(A)=R(A^T)$ (since $A$ is nonsingular, the columns of $A$ and $A^T$ both span $\mathbb{R}^2$).

3. EP matrices have index 1.

Proof: Let $A$ be an EP matrix, then $R(A)=R(A^T)$. But, $R(A^T)=N(A)^{\perp}$, so $R(A)=N(A)^{\perp}$ and thus, $R(A)\cap N(A)=\{0\}$.

Hence, we can draw the following conclusion:

 set of symmetric matrices $\subset$ set of normal matrices $\subset$ set of EP matrices $\subset$ set of matrices with index 1. A quick numerical check of the EP property is sketched below.
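
A quick numerical check of the EP property (a sketch assuming NumPy): $A$ is EP exactly when $AA^{\dagger}=A^{\dagger}A$, i.e., when the orthogonal projectors onto $R(A)$ and $R(A^T)$ coincide.

```python
import numpy as np

def is_ep(A, tol=1e-10):
    # A is EP exactly when A A^+ = A^+ A (the projectors onto R(A) and R(A^T) coincide)
    Ap = np.linalg.pinv(A)
    return np.allclose(A @ Ap, Ap @ A, atol=tol)

print(is_ep(np.array([[1., 2.], [1., 1.]])))    # True  (nonsingular, hence EP)
print(is_ep(np.array([[1., 1.], [0., 0.]])))    # False (the motivating example above)
```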

Some results: 

Theorem 1. $A$ is an EP matrix if and only if the Moore-Penrose inverse of $A$ is an EP matrix.

proof. Let $A$ be an EP matrix, so $AA^{\dagger}=A^{\dagger}A$. Since $(A^{\dagger})^{\dagger}=A$, this says $A^{\dagger}(A^{\dagger})^{\dagger}=(A^{\dagger})^{\dagger}A^{\dagger}$, i.e., $A^{\dagger}$ is EP. Conversely, if $A^{\dagger}(A^{\dagger})^{\dagger}=(A^{\dagger})^{\dagger}A^{\dagger}$, then using $(A^{\dagger})^{\dagger}=A$ we get $AA^{\dagger}=A^{\dagger}A$, which shows that $A$ is EP.


Theorem 2. $A$ is normal if and only if $A$ is EP matrix and $AA^TA^2=A^2A^TA$.

proof. Let $A$ be EP with $AA^TA^2=A^2A^TA$; we will show that $A$ is normal, i.e., $AA^T=A^TA$ (the other direction is trivial). Pre-multiplying and post-multiplying $AA^TA^2=A^2A^TA$ by $A^{\dagger}$, we get $A^{\dagger}AA^T(A^2A^{\dagger})=(A^{\dagger}A^2)A^TAA^{\dagger}$, but $A^2A^{\dagger}=AA^{\dagger}A=A$ and $A^{\dagger}A^2=AA^{\dagger}A=A$ (since $A$ and $A^{\dagger}$ commute). So $A^{\dagger}AA^TA=AA^TAA^{\dagger}$. Further, $A^{\dagger}AA^T=A^T$ as $R(A^T)\subseteq R(A^{\dagger})$, and $A^TAA^{\dagger}=A^T$ as $N(A^\dagger)\subseteq N(A^T)$. Hence $A^TA=AA^T$, and thus $A$ is normal.


Theorem 3. For any two matrices $A$ and $B$ such that $AB$ exists, the reverse order law

                                                $(AB)^{\dagger}=B^{\dagger}A^{\dagger}$

 holds if and only if $A^TABB^T$ is EP.

proof. Let $W=A^TABB^T$. We prove the result by showing that $W$ is EP if and only if $R(A^TAB)\subseteq R(B)$ and $R(BB^TA^T)\subseteq R(A^T)$; the conclusion then follows from Theorem 1 (in the Moore-Penrose inverse post). Clearly, $R(W)\subseteq R(A^TAB)$. Now, $W(B^T)^{\dagger}=A^TABB^T(B^T)^{\dagger}=A^TA(B^{\dagger}BB^T)^T=A^TAB$, so $R(A^TAB)\subseteq R(W)$, and thus $R(W)=R(A^TAB)$. Using similar arguments, $W^TA^{\dagger}=BB^TA^T$, and it follows that $R(W^T)=R(BB^TA^T)$. If $W$ is EP, then $R(W)=R(W^T)$, so $R(A^TAB)=R(BB^TA^T)\subseteq R(B)$ and $R(BB^TA^T)=R(A^TAB)\subseteq R(A^T)$.

Conversely, assume that $R(W)\subseteq R(B)$ and $R(W^T)\subseteq R(A^T)$, and let $F=A^TA$ and $G=BB^T$ (both of index 1). Then by Theorem 1 (in the group inverse post) applied to $W=FG$, $R(W)=R(A^T)\cap R(B)$. Interchanging the roles of $F$ and $G$ and applying Theorem 1 again to $W^T=GF$, we get $R(W^T)=R(A^T)\cap R(B)$. Thus $R(W)=R(W^T)$, i.e., $W$ is EP. Hence $W$ is EP if and only if $R(W)\subseteq R(B)$ and $R(W^T)\subseteq R(A^T)$, and the conclusion now follows from Theorem 1 (in the Moore-Penrose inverse post).


Theorem 4. Let $A$ and $B$ be two EP matrices. Then $C=A+B$ is an EP matrix if either of the following equivalent conditions holds:

(i) $N(C)\subseteq N(A)$ and $N(C)\subseteq N(B)$.

(ii) $rank\begin{bmatrix} A\\B \end{bmatrix}=rank(C)$.

proof. First we prove that (i) and (ii) are equivalent. Assume that (i) holds, then $N(C)\subseteq N(A)\cap N(B)$. Since $C=A+B$, then $N(A)\cap N(B)\subseteq N(A+B)=N(C)$. Thus, $N(C)=N(A)\cap N(B)=N\left(\begin{bmatrix}A\\B\end{bmatrix}\right)$ and by Rank-Nullity Theorem, $rank(C)=rank\begin{bmatrix}A\\B\end{bmatrix}$. 

Conversely, observe that $N\left(\begin{bmatrix}A\\B\end{bmatrix}\right)=N(A)\cap N(B)\subseteq N(C)$, and if (ii) holds, then clearly $N(C)=N(A)\cap N(B)$, and hence $N(C)\subseteq N(A)$ and $N(C)\subseteq N(B).$ So (i) and (ii) are equivalent.

Now, if $A$ and $B$ are EP, i.e., $N(A)=N(A^T)$ and $N(B)=N(B^T)$, then (i) implies $N(C)\subseteq N(A)\cap N(B)=N(A^T)\cap N(B^T)\subseteq N(C^T)$ (if $A^Tx=0$ and $B^Tx=0$, then $C^Tx=A^Tx+B^Tx=0$). Since $rank(C)=rank(C^T)$, the subspaces $N(C)$ and $N(C^T)$ have the same dimension, so $N(C)=N(C^T)$, i.e., $C$ is EP.


Computing the Moore-Penrose inverse of a matrix is, in general, computationally demanding. Very recently, Ignacio Bajo [3] proposed a method for computing the Moore-Penrose inverse using polynomials in matrices. The following result characterizes, via normality, the EP matrices whose Moore-Penrose inverse is a polynomial in $A^T$. The proof of this result will be added later.

Theorem 5. Let $A$ be an EP matrix. Then, there exists a polynomial $p(x)$ such that $A^{\dagger}=p(A^T)$ if and only if $A$ is normal.


SOURCES:

[1] Ben-Israel, A.; Greville, T. N. E., Generalized Inverses. Theory and Applications, Springer-Verlag, New York, 2003.

[2] Meenakshi, A. R., On sums of EP matrices. Houston Journal of Mathematics, 9 (1983).

[3] Bajo, I., Computing Moore-Penrose inverses with polynomials in matrices. https://doi.org/10.1080/00029890.2021.1886840

Saturday, 14 August 2021

The Group Inverse

In this post, I will discuss another type of generalized inverse, called the group inverse. Unlike the Moore-Penrose inverse, this pseudoinverse is defined only for square matrices and not for rectangular ones, so we restrict ourselves to square matrices.

Consider the matrix $A=\begin{pmatrix}1 &0 &1\\ 0 &1 &1\\ 1 &0 &1\end{pmatrix}$; then the spectrum of $A$ is $\sigma(A)=\{0,1,2\}$. Further, $A^{\dagger}=\begin{pmatrix}1/3 &-1/3 &1/3 \\ -1/6 &2/3 &-1/6\\ 1/6 &1/3  &1/6  \end{pmatrix}$ and $\sigma(A^{\dagger})=\{0,2/3,1/2\}.$ So $1\in \sigma(A)$, but $1^{-1}=1\notin \sigma(A^{\dagger}).$

In our previous discussion of the Moore-Penrose inverse, we remarked that the Moore-Penrose inverse does not retain many properties that the inverse of a nonsingular matrix has. For example, for a singular $A\in \mathbb{R}^{n\times n}$ with $\lambda \in \sigma(A)$, it may happen that $\lambda^{\dagger} \notin \sigma(A^{\dagger})$. However, for a nonsingular matrix $A$, if $\lambda \in \sigma(A)$, then it always holds that $\lambda^{-1} \in \sigma(A^{-1})$. The same remark applies to the eigenvectors of $A$ and $A^{-1}.$

Similarly, $B=X^{-1}AX$ does not imply that $B^{\dagger}=X^{-1}A^{\dagger}X$, although the analogous property does hold for the ordinary inverse when $A$ is nonsingular. So spectral properties are not preserved when we pass to $A^{\dagger}$. This problem is resolved by introducing another interesting generalized inverse, called the group inverse.

Definition: Let $A\in \mathbb{R}^{n\times n}$, and consider a matrix $X$ (if it exists) satisfying the following three matrix equations:

(i) $AXA=A$

(ii) $XAX=X$

(iii) $AX=XA.$

This $X$ is called the group inverse of the matrix $A.$ It exists if and only if the index of $A$ is 1, i.e., $rank(A)=rank(A^2)$, or equivalently $R(A)\cap N(A)=\{0\}$. The group inverse of $A$ is denoted by $A^{\#},$ and it is unique whenever it exists.

proof for uniqueness: Let $X$ and $Y$ both satisfy the above three equations, then

$X=XAX=(XA)X=(AX)X=(AYA)XX=AY(AX)X=AY(XA)X=AY(XAX)=AYX=YAX$

and 

$Y=YAY=Y(AXA)Y=Y(AX)(AY)=Y(XA)(YA)=YX(AYA)=YXA=Y(XA)=Y(AX)=YAX$. Hence $X=Y.$

The name `Group' is given because the set $\{\cdots,(A^{\#})^{2},A^{\#},AA^{\#},A,A^{2},\cdots\}$ together with the usual matrix multiplication forms an abelian group. It is not difficult to notice that the identity element of this group is $AA^{\#}.$

Example: Consider the nilpotent matrix $A=\begin{pmatrix}0 &1\\0 &0 \end{pmatrix}$, then $A^{2}=\begin{pmatrix}0 &0\\0 &0 \end{pmatrix}$. Clearly, $1=rank(A)\neq rank(A^2)=0$. Alternatively, we can see that $(1,0)^T\in R(A)\cap N(A)$.

From the above example, it is easy to see that the group inverse of a nonzero nilpotent matrix cannot exist, because the range and null space of a nonzero nilpotent matrix are never complementary. From elementary linear algebra, we know that a nonzero nilpotent matrix is not diagonalizable. Below we first investigate the group inverse of matrices which are diagonalizable.

Suppose $A$ is a diagonalizable matrix of index 1; then there exist a nonsingular matrix $P$ and a diagonal matrix $D=diag(\lambda_1,\lambda_2,\cdots,\lambda_n)$ ($n$ is the order of $A$) such that $A=PDP^{-1}$. Then one can check that $A^{\#}=PD^{\dagger}P^{-1}$.

Recall that $D^{\dagger}=diag(\lambda_1^{\dagger},\lambda_2^{\dagger},\cdots,\lambda_n^{\dagger}),$ where $\lambda_i^{\dagger}=\lambda_i^{-1}$ if $\lambda_i\neq 0$ and $\lambda_i^{\dagger}=0$ otherwise.

The explicit formula for calculating the group inverse of a matrix $A$ (a computational sketch follows step 3 below):

step 1: Find the full rank decomposition of $A$ as $A=FG$.

step 2: Check whether $(GF)^{-1}$ exists. If $GF$ is singular, then the group inverse of $A$ does not exist.

step 3: Finally, $A^{\#}=F(GF)^{-2}G.$
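
Here is a computational sketch of the three steps (assuming NumPy; the full-rank factorization is obtained from a truncated SVD, and the rank and singularity tolerances are only rough numerical choices):

```python
import numpy as np

def group_inverse(A, tol=1e-12):
    # step 1: full-rank factorization A = F G from the SVD
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol))                 # numerical rank
    F = U[:, :r] * s[:r]                     # m x r
    G = Vt[:r, :]                            # r x n, so that A = F @ G
    # step 2: GF must be nonsingular, otherwise index(A) > 1 and A# does not exist
    GF = G @ F
    if abs(np.linalg.det(GF)) < tol:
        raise ValueError("index(A) is not 1: the group inverse does not exist")
    # step 3: A# = F (GF)^{-2} G
    GF_inv = np.linalg.inv(GF)
    return F @ GF_inv @ GF_inv @ G

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 0., 1.]])
X = group_inverse(A)
# checks: A X A = A, X A X = X, A X = X A (up to rounding)
print(np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X), np.allclose(A @ X, X @ A))
```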

Properties of Group inverse:

(i) Whenever $A$ is nonsingular, $A^{\#}=A^{-1}$

(ii) $A^{\# \#}=A$

(iii) $(A^T)^{\#}=(A^{\#})^{T}$, where $T$ stands for the transpose

(iv) $\lambda(\neq 0)\in \sigma(A)$, then $\lambda^{-1}\in \sigma(A^{\#})$

(v) If $x$ is an eigenvector corresponding to an eigenvalue $\lambda (\neq 0)$ of $A$, then $x$ is also an eigenvector of $A^{\#}$ corresponding to the eigenvalue $\lambda^{-1}.$


Theorem 1. Let $K$ be a square matrix of index 1, and let $L$ be such that $R(KL)\subset R(L)$. Then, $R(KL)=R(K)\cap R(L)$.

SOURCES: 

[1] Ben-Israel, A.; Greville, T. N. E., Generalized Inverses. Theory and Applications, Springer-Verlag, New York, 2003.
