In this post we discuss about normal equations. But, first we prove the following result which will be useful in our subsequent argument.
Theorem. Let $A\in \mathbb{R}^{m\times n}$. Then,
(i) $N(A)=N(A^TA),$ where $N(B)$ denotes the null space of $B.$
(ii) $r(A)=r(A^TA)=r(AA^T),$ where $r(B)$ denotes the rank of $B.$
(iii) $R(A)=R(AA^T)$ and $R(A^TA)=R(A^T),$ where $R(B)$ denotes the range space of $B.$
proof. (i) Let $x(\neq 0)$ such that $Ax=0$. Then, $(A^TA)x=A^T(Ax)=0.$ Conversely, suppose $y(\neq 0)$ with $A^TAy=0.$ Then, $0=y^TA^TAy=(Ay)^T(Ay)=||Ay||_2^{2}$ which implies that $Ay=0.$ Hence, $N(A)=N(A^TA).$
(ii) $A$ and $A^TA$ have $n$ columns. By Rank-Nullity Theorem and (i), \begin{align}rank(A)&=n-nullity(A)\\&=n-nullity(A^TA)\\&=rank(A^TA).\end{align}
Interchanging the role of $A$ and $A^T$ and recalling that $r(A)=r(A^T),$ we get $r(A)=r(AA^T).$
(iii) Observe that $R(AA^T)\subseteq R(A)$ (since every column of $A^TA$ are in some linear combination of the columns of $A.$) By (ii), $r(A)=r(AA^T).$ Hence, $R(A)=R(AA^T).$ Replacing $A$ by $A^T$ and vice-versa, we further get $R(A^T)=R(A^TA).$
What do we mean by normal equations?
$A^TAx=A^Tb$ is called normal equation associated with $Ax=b.$ It is called normal because $A^T(b-Ax)=0$, i.e., the residue vector $b-Ax$ is always normal (perpendicular) to the columns of $A,$ equivalently, $(b-Ax)\perp R(A).$
Normal equations is always consistent, regardless of whether or not the original system is consistent.
Let $A\in \mathbb{R}^{m\times n}$ and $b\in \mathbb{R}^{m},$ then $A^TAx=A^Tb$ is consistent since $A^Tb\in R(A^T)=R(A^TA).$
$Ax=b$ and $A^TAx=A^Tb$ have same solution set whenever $Ax=b$ is consistent.
Let $x_0$ be a particular solution of $Ax=b.$ Clearly, $x_0$ is also a particular solution of $A^TAx=A^Tb.$ Now, the general solution of $Ax=b$ is given by $x=x_0+N(A)$. But, $N(A)=N(A^TA)$, therefore $x=x_0+N(A^TA)$, which is the general solution of $A^TAx=A^Tb.$
Conversely, if $y_0$ is a particular solution of $A^TAx=A^Tb,$ then $b-A^Ty_0\in N(A^T)=R(A)^{\perp}.$ Since, $Ax=b$ is consistent and $R(A)$ is subspace, we have $b-Ay_0\in R(A)$ implying $b-Ay_0\in R(A)\cap R(A)^{\perp}=\{0\}.$ Thus, $Ay_0=b$ is also a particular solution of $Ax=b$ and the general solution is also same as $N(A)=N(A^TA).$
When $A^TAx=A^Tb$ has a unique solution?
For any $A\in \mathbb{R}^{m\times n}$ and $b\in \mathbb{R}^{m},$ the system $A^TAx=A^Tb$ has unique solution if and only if when $N(A^TA)=\{0\}$. But, $N(A^TA)=N(A).$ Hence, $A^TAx=A^Tb$ has a unique solution if and only if $N(A)=\{0\},$ equivalently $rank(A)=n$. The unique solution is given as $x=(A^TA)^{-1}A^Tb=A^{\dagger}b.$
What do the solutions of the normal equations $A^TAx=A^Tb$ represent when the original system $Ax=b$ is inconsistent?
Perhaps the most significant question. The solution of the normal equation $A^TAx=A^Tb$ gives the least-squares solution for an inconsistent system $Ax=b.$ See the post Least-squares solution.
Drawbacks of normal equations.
The normal equations are often avoided in numerical computations because if the system $Ax=b$ is ill-conditioned (i.e., the solutions are very sensitive to small changes in the entries), then $A^TA$ is even worse conditioned (see example below for better understanding), and the theoretical properties surrounding $A^TA$ and the normal equations may be lost in practical applications. Nevertheless, the normal equations are an important theoretical idea that leads to practical tools of fundamental importance such as the method of least squares.
Example. Consider the system $\begin{bmatrix}3 &6\\1 &2.01\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}9\\3.01\end{bmatrix}.$
When three-digit floating-point arithmetic is used to solve the above system, then the solution is $(1,1)^T$ which is also the exact solution. However, if three-digit arithmetic is used to form the normal equations, we get a singular system
$\begin{bmatrix}10 &20\\20 &40\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}30\\60.1\end{bmatrix},$
which is an inconsistent system.
Why is this ambiguity happening?
Real numbers generally require an infinite string of digits and so cannot be represented exactly using a finite string of bits. Numbers are represented in a computer by a string of bits of a fixed length.
Nice 🤘🤘
ReplyDeleteVery informative knowledge with nicely explained .
ReplyDelete