In this post, we discuss about least-squares solutions of a linear system $Ax=b$, where $A\in \mathbb{R}^{m\times n}$, $x\in \mathbb{R}^{n}$ and $b\in \mathbb{R}^{m}$. First we consider the following example:
Example: Find a solution of the system
$$\begin{bmatrix}1 &2\\1 &1\\2 &3\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}3\\10\\2\end{bmatrix}.$$
We observe that it is not possible to find $x$ and $y$ which satisfies above matrix equation since $\begin{bmatrix}3\\10\\2\end{bmatrix}\notin Range(A)$. So, the system is inconsistent. Such system where $m>n$, is called overdetermined system and in general such systems may not have a solution as vector $b$ has $m$ coordinates (here, in this example $m=3$), while $dim(Range(A))\leq n$ (here, $rank(A)= n=2$). Clearly, $r=b-Ax\neq 0$ for any $x$. So, let us instead make $r$ as small as possible i.e., very close to zero in some sense, where the term 'closeness' is measured as minimizing the Euclidean norm of the vector $r$ which is also called the residue vector.
The problem defined above takes the following form
Given $A\in \mathbb{R}^{m\times n},~m\geq n$, find $x$ such that $||b-Ax||_2$ is minimized.
How to find $x$ in the above problem?
The least-squares problem asks for vectors $x$ such that $Ax$ is as close to $b$ as possible. But, $Ax\in Range(A)$. We claim that $y=Pb$ is closest to $b$, where $P$ is the orthogonal projection of $b$ onto $Range(A)$. To show that $y=Pb$ is the unique point that minimizes $||b-y||_2$, suppose $z (\neq y)$ is another vector in $Range(A)$. Clearly, $(y-z) \perp (b-y)$, then by Pythagorean theorem, $||b-z||_2^2=||b-y||_2^2+||y-z||_2^2>||b-y||_2^2$. Thus, $y=Pb$ is the unique point that minimizes $||b-y||_2$. Hence, finding least-squares solution of $Ax=b$ is equivalent to solving the system $Ax=Pb$, where $P$ is the orthogonal projection of $b$ onto $Range(A)$.
Theorem 1. A vector $x$ is a least-squares solution of $Ax=b$ if and only if $x$ is a solution of $$A^TAx=A^Tb,$$
often called the normal equation of $Ax=b$.
proof. We will show that the system $Ax=Pb$ is equivalent to the normal equation $A^TAx=A^Tb$, then the conclusion directly follows by the above discussions. Now,
\begin{align*}&Ax=Pb\\ \iff & PAx=P^2b=Pb,~~\text{ since } P \text{ is a projection }\\ \iff & P(Ax-b)=0\\ \iff & (Ax-b)\in Null(P)=Range(A)^{\perp}=Null(A^T)\\ \iff & A^T(Ax-b)=0\\ \iff & A^TAx=A^Tb.\end{align*}
Example A small company in business for four years and has recoded annual sales (in tens of thousands of dollars) as follows:
|
|
|
|
|
|
Year |
1 |
2 |
3 |
4 |
Sales |
23 |
27 |
30 |
34
|
When this data is plotted as shown in the figure below, we see that although the points do not exactly lie on a straight line, there nevertheless appears to be a linear trend. Predict the sales for any future year if this trend continues.
Solution: We are asked to find the line $f(t)=a+bt$ that best fits the data in the sense of least-squares. We have the matrix representation of this problem as $Ax=b$, where
$A=\begin{bmatrix}1 &1\\1 &2\\1 &3\\1 &4\end{bmatrix},~b=\begin{bmatrix}23\\27\\30\\34\end{bmatrix}, \text{ and } x=\begin{bmatrix}a\\b\end{bmatrix}.$
Then, the previous discussion guarantees that $x$ is the solution of the normal equation $A^TAx=A^Tb$, i.e.,
$\begin{bmatrix}4 &10\\10 &30\end{bmatrix} \begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}114\\303\end{bmatrix}.$
Solving this we get $a=19.5$ and $b=3.6$, so we predict that the sales in year $t$ will be $f(t)=19.5+3.6t.$ For example, $f(5)=19.5+(3.6)(5)=37.5$, so the estimated sales for year five is $\$375000.$
What is $P$ precisely?
Let $A^{(1,3)}\in A\{1,3\}$, i.e., $A^{(1,3)}$ satisfies the first and third Penrose equations $(AA^{(1,3)}A=A~\&~(AA^{(1,3)})^T=AA^{(1,3)})$, then $AA^{(1,3)}$ is an orthogonal projection onto $Range(A)$ as shown below:
(i) $(AA^{(1,3)})^2=AA^{(1,3)}AA^{(1,3)}=AA^{(1,3)};$
(ii) $(AA^{(1,3)})^T=AA^{(1,3)};$
(iii) \begin{align*} Range(AA^{(1,3)})&\subseteq Range(A)\\ &\subseteq Range(AA^{(1,3)}A)\\ &\subseteq Range(AA^{(1,3)})\\ &\subseteq Range(A).\end{align*}
So, $Range(AA^{(1,3)})=Range(A).$
Precisely, $P=AA^{(1,3)}$ and notice that $x=A^{(1,3)}b$ satisfies $Ax=Pb$. So, we can conclude that $x=A^{(1,3)}b$ is the required least-squares solution. But, we have learnt that $A^{(1,3)}$ is not unique (see Moore-Penrose discussion). So, there can be infinite least-squares solutions of $Ax=b$. The general least-squares solution is
$x=A^{(1,3)}b+(I_n-A^{(1,3)}A)z$
with $A^{(1,3)}\in A\{1,3\}$ and arbitrary $z\in \mathbb{R}^{n}$. It is unique only when $rank(A)=n$ (since in that case $A^{(1,3)}A=I_n$).
Solutions of minimum norm:
Theorem 2. Let $A\in \mathbb{R}^{m\times n},~b\in \mathbb{R}^{m}$. If $Ax=b$ has a solution for $x$, the unique solution for which $||x||_2$ is smallest, is given by
$x=A^{(1,4)}b,$
where $A^{(1,4)}\in A\{1,4\}.$ Conversely, if $X\in \mathbb{R}^{n\times m}$ is such that, whenever $Ax=b$ has a solution $x=Xb,$ is the solution of minimum-norm, then $X\in A^{(1,4)}$.
proof. If $Ax=b$ is consistent, then $b\in Range(A)$ and therefore, $AA^{(1,4)}b=b$ (since $AA^{(1,4)}$ is a projection on $Range(A)~ \& ~b\in Range(A)$). So, $x=A^{(1,4)}b$ is a solution of $Ax=b$. Since the map $A:Range(A^T)\rightarrow Range(A)$ is bijective, therefore, $x=A^{(1,4)}b\in Range(A^T)$ is a unique solution $Ax=b$. Now, the general solution is given as
$x=A^{(1,4)}b+z,~~z\in N(A)$
$\Rightarrow ||x||_2^{2}=||A^{(1,4)}b||_2^2 + ||z||_2^2$
$\Rightarrow ||x||_2>||A^{(1,4)}b||_2, \text{ unless } z=0.$
Thus, $x=A^{(1,4)}b$ is the unique solution of the consistent system $Ax=b$ whose norm is minimum.
Conversely, if $X$ is such that, $\forall~b\in Range(A),~x=Xb$ is the solution of $Ax=b$ of minimum-norm. Set $b$ equal to each columns of $A$ successively, then we can say that $XA=A^{(1,4)}A$. Now, we will show that $X\in A\{1,4\}$. Pre-multiplying $XA=A^{(1,4)}A$ by $A$, we get $AXA=AA^{(1,4)}A=A$. So, $X\in A\{1\}$. Again, using the fact that $(A^{(1,4)}A)^T=A^{(1,4)}A,$ we get
\begin{align}A^{(1,4)}A&=A^{(1,4)}AXA\\&=(A^{(1,4)}AXA)^T\\&=(A^{(1,4)}A)^T(XA)^T\\&=(AA^{(1,4)}A)^TX^T\\&=A^TX^T\\&=(XA)^T=XA.\end{align}
Hence, $X\in \{1,4\}.$
Example. Consider the system
$$\begin{bmatrix}1 &2\\1 &2 \end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}1\\2\end{bmatrix}.$$ Both $x_1=(0.5,0.5)^T$ and $x_2=(-204/985,1189/985)^T$ satisfies the given system. But, $x_1=A^{(1,4)}b=(0.5,0.5)^T$ is the minimum-norm solution. One can verify that $||x_1||_2=0.7071<1.2247=||x_2||_2.$
Least-squares solution of minimum norm:
$x_0$ is called the least-squares solution of minimum norm of the system $Ax=b$ if
(i) $||Ax_0-b||_2\leq ||Ax-b||_2,~~\forall~x;$ and
(ii) $||x_0||_2<||x||_2,$ for any $x\neq x_0$ which gives equality in (i).
Theorem 3. Let $A\in \mathbb{R}^{m\times n},~b\in \mathbb{R}^{m}$. Then, among the least-squares solutions of $Ax=b$, $A^{\dagger}b$ is the one of minimum-norm.
Conversely, if $X\in \mathbb{R}^{n\times m}$ has the property that, for all $b,$ $Xb$ is the minimum-norm least-squares solution of $Ax=b,$ then $X=A^{\dagger}.$
proof. From the above discussions, we learnt that the least-squares solution of $Ax=b$ coincides with the solutions of the consistent system
$Ax=Pb,~\text{ where } P=AA^{(1,3)}.$
Thus the least-squares solution of minimum-norm of $Ax=b$ is the minimum-norm solution of the consistent system $Ax=AA^{(1,3)}b$. By Theorem 2, the minimum-norm solution of $Ax=AA^{(1,3)}b$ is given by
$x=A^{(1,4)}AA^{(1,3)}b.$
Now, one may check that $Z=A^{(1,4)}AA^{(1,3)}$ satisfies the four Penrose equations $AZA=A$, $ZAZ=Z$, $(AZ)^T=AZ$ and $(ZA)^T=ZA$. Thus $A^{(1,4)}AA^{(1,3)}=A^{\dagger}$. Hence, the least-squares solution of minimum-norm of $Ax=b$ is given as $x=A^{\dagger}b$.
Converse is easy since $X=A^{\dagger}\in A\{1,3\}\cap A\{1,4\}$. As $A^{\dagger}$ is always unique, we conclude that the least-squares solution of minimum-norm of a given system $Ax=b$ is always unique.
Example. Consider the system
$$\begin{bmatrix}1 &2\\1 &1\\2 &3\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}3\\10\\2\end{bmatrix}.$$
We wish to find a solution $x$ such that it is the least-squares solution of minimum-norm of the above system, i.e., among all those $x$ for which $||Ax-b||_2$ is least, we aim to find such $x$ whose $||x||_2$ is minimum. Thus, from the above discussed theory, we know that such $x$ is unique and give as $x=A^{\dagger}b$. Here, $A$ is full column rank. In this case, $A^{\dagger}=(A^TA)^{-1}A^T$. Hence, after some multiplications, we get $x=13.33$ and $y=-7.$
NOTE: Computing the inverses $A^{(1,3)},$ $A^{(1,4)}$ and $A^{\dagger}$ are not easy in general. In this post we have discussed only the theoretical part and its implications on practical problems. In upcoming posts, we will see different methods for computing different generalized inverses.
SOURCES:
[1] Ben-Israel, A.; Greville, T. N. E., Generalized Inverses. Theory and Applications, Springer-Verlag, New York, 2003.
[2] Meyer, C. D., Matrix Analysis and Applied Linear
Algebra. Society for Industrial and Applied Mathematics,
Philadelphia, 2000.
[3] Trefethen, L. N.; Bau, D. III, Numerical linear algebra. Society for Industrial and Applied Mathematics, Philadelphia, 1997.