18.3. Underdetermined Linear Systems#
The discussion in this section is primarily based on chapter 1 of [30].
Consider a matrix $\Phi \in \mathbb{C}^{M \times N}$ with $M < N$. Define an under-determined system of linear equations:

$$y = \Phi x$$

where $y \in \mathbb{C}^M$ is known and $x \in \mathbb{C}^N$ is unknown. This system has $N$ unknowns and $M$ linear equations. There are more unknowns than equations.

Let the columns of $\Phi$ be given by $\phi_1, \phi_2, \dots, \phi_N$. The column space of $\Phi$ (the vector space spanned by all columns of $\Phi$) is denoted by $\mathcal{C}(\Phi)$; i.e.,

$$\mathcal{C}(\Phi) = \left\{ \sum_{i=1}^{N} c_i \phi_i \; : \; c_i \in \mathbb{C} \right\}.$$

We know that $\mathcal{C}(\Phi) \subseteq \mathbb{C}^M$. Clearly $\Phi x \in \mathcal{C}(\Phi)$ for every $x \in \mathbb{C}^N$. Thus if $y \notin \mathcal{C}(\Phi)$ then we have no solution. But, if $y \in \mathcal{C}(\Phi)$ then we have an infinite number of solutions.

Let $\mathcal{N}(\Phi)$ represent the null space of $\Phi$ given by

$$\mathcal{N}(\Phi) = \{ z \in \mathbb{C}^N \; : \; \Phi z = 0 \}.$$

Let $x^*$ be a solution of $y = \Phi x$ and let $z \in \mathcal{N}(\Phi)$. Then

$$\Phi (x^* + z) = \Phi x^* + \Phi z = y + 0 = y.$$

Hence $x^* + z$ is also a solution of the system $y = \Phi x$. Thus the set

$$S = \{ x^* + z \; : \; z \in \mathcal{N}(\Phi) \}$$

forms the complete set of infinite solutions to the problem $y = \Phi x$.
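As a quick numerical sanity check, the following sketch (assuming NumPy and SciPy are available) builds a small random under-determined system, computes one particular solution and a basis for the null space, and verifies that shifting the solution along any null-space direction leaves $\Phi x$ unchanged.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
M, N = 3, 5                       # more unknowns (N) than equations (M)
Phi = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# A particular solution x* of y = Phi x (the minimum norm one, via pseudo-inverse).
x_star = np.linalg.pinv(Phi) @ y

# Orthonormal basis for the null space N(Phi); its dimension is N - M = 2 here.
Z = null_space(Phi)

# Any x* + z with z in N(Phi) is also a solution.
z = Z @ rng.standard_normal(Z.shape[1])
assert np.allclose(Phi @ (x_star + z), y)
print("x* + z solves the system for every z in the null space")
```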
Example 18.2 (An under-determined system)

As a running example in this section, we will consider a simple under-determined system in $\mathbb{R}^2$. The system is specified by

$$\Phi = \begin{bmatrix} 3 & 4 \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} 12 \end{bmatrix}$$

with $x = (x_1, x_2)$, where

$$\Phi x = 3 x_1 + 4 x_2 = 12 = y$$

or more simply

$$3 x_1 + 4 x_2 = 12.$$

The solution space of this system is a line in $\mathbb{R}^2$, shown in Fig. 18.1.

Fig. 18.1 An underdetermined system#
The specification of the under-determined system as above doesn't give us any reason to pick one particular point on the line as the preferred solution.
Two specific solutions are of interest:

- $x = (4, 0)$ lies on the $x_1$ axis.
- $x = (0, 3)$ lies on the $x_2$ axis.

In both of these solutions, one component is 0, making these solutions sparse.
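A minimal NumPy check (using the system from Example 18.2) confirms that both sparse candidates satisfy the constraint:

```python
import numpy as np

Phi = np.array([[3.0, 4.0]])   # 1 x 2 system matrix from the running example
y = np.array([12.0])

for x in (np.array([4.0, 0.0]), np.array([0.0, 3.0])):
    # Both sparse candidates lie on the solution line 3*x1 + 4*x2 = 12.
    assert np.allclose(Phi @ x, y)
print("both sparse solutions are feasible")
```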
It is easy to visualize sparsity in this simplified 2-dimensional setup, but the situation becomes more difficult when we look at high-dimensional signal spaces. We need well-defined criteria to promote sparse solutions.
18.3.1. Regularization#
Are all these solutions equivalent, or can we say that one solution is better than another in some sense? In order to suggest that some solution is better than the others, we need to define a criterion for comparing two solutions.
In optimization theory, this idea is known as regularization.
We define a cost function $J(x) : \mathbb{C}^N \to \mathbb{R}$ which measures the desirability of a given solution $x$ out of the infinitely many possible solutions; the lower the cost, the more desirable the solution. Thus the goal of the optimization problem is to find a desired $x$ with minimum cost.

We can write this optimization problem as

$$\hat{x} = \underset{x}{\text{arg min}} \; J(x) \quad \text{subject to} \quad y = \Phi x. \tag{18.1}$$

If $J(x)$ is strictly convex, then the problem has at most one solution. If $J(x)$ is convex but not strictly convex, then the minimizer need not be unique.
A variety of such cost function based criteria can be considered.
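As a concrete illustration (the specific cost functions below are standard choices, not prescribed by the source), we can compare a few candidate criteria on the two sparse solutions of the running example:

```python
import numpy as np

# A few candidate cost functions J(x); all are illustrative choices.
costs = {
    "l0 (non-zero count)": lambda x: float(np.count_nonzero(x)),
    "l1 norm":             lambda x: np.sum(np.abs(x)),
    "l2 norm":             lambda x: np.linalg.norm(x),
}

for x in (np.array([4.0, 0.0]), np.array([0.0, 3.0])):
    for name, J in costs.items():
        print(f"x = {x}  J = {name:20s} cost = {J(x):.2f}")
```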
18.3.2. $\ell_2$ Regularization#
One of the most common criteria is to choose the solution with the smallest $\ell_2$ norm.

The problem can then be reformulated as an optimization problem

$$\hat{x} = \underset{x}{\text{arg min}} \; \| x \|_2 \quad \text{subject to} \quad y = \Phi x. \tag{18.2}$$

We can see that minimizing $\| x \|_2$ is equivalent to minimizing $\| x \|_2^2$, since squaring is monotonic on the non-negative reals. Hence an equivalent formulation is

$$\hat{x} = \underset{x}{\text{arg min}} \; \| x \|_2^2 \quad \text{subject to} \quad y = \Phi x.$$
Example 18.3 (Minimum $\ell_2$ norm solution)

We continue with our running example. We can write $x_2$ in terms of $x_1$ as

$$x_2 = \frac{12 - 3 x_1}{4}.$$

With this definition the squared $\ell_2$ norm becomes

$$\| x \|_2^2 = x_1^2 + x_2^2 = x_1^2 + \frac{(12 - 3 x_1)^2}{16}.$$

Minimizing over $x_1$, we differentiate and equate the derivative to zero:

$$2 x_1 - \frac{3 (12 - 3 x_1)}{8} = 0 \implies \frac{25}{8} x_1 = \frac{9}{2}.$$

Since the second derivative is positive, this stationary point is the unique minimum. This gives us

$$x_1 = \frac{36}{25} = 1.44 \quad \text{and} \quad x_2 = \frac{48}{25} = 1.92.$$

Thus the optimal $\ell_2$ norm solution is $\hat{x} = (1.44, 1.92)$ with $\| \hat{x} \|_2 = 2.4$.

We note that the minimum $\ell_2$ norm solution is not sparse: both of its components are non-zero. It is instructive to note that the solution $\hat{x} \propto (3, 4)$ is orthogonal to the null space of $\Phi$, which is spanned by $(4, -3)$.

We can view this solution graphically by drawing $\ell_2$ norm balls of different radii, as in Fig. 18.2. The ball of radius $2.4$ is the smallest one that touches the solution line, and it touches it precisely at $\hat{x}$.

Fig. 18.2 Minimum $\ell_2$ norm solution#
All other norm balls either don’t touch the solution line at all, or they cross it at exactly two points.
Remark 18.1 (Least squares via Lagrangian multipliers)

A formal solution to the minimum $\ell_2$ norm problem can be easily obtained using Lagrangian multipliers. Considering the real case for simplicity, we define the Lagrangian

$$\mathcal{L}(x) = \| x \|_2^2 + \lambda^T (\Phi x - y)$$

with $\lambda \in \mathbb{R}^M$ being the Lagrangian multipliers for the equality constraints. Differentiating $\mathcal{L}(x)$ with respect to $x$, we get

$$\frac{\partial \mathcal{L}(x)}{\partial x} = 2 x + \Phi^T \lambda.$$

By equating the derivative to $0$ we get the optimal value of $x$ as

$$x = -\frac{1}{2} \Phi^T \lambda.$$

Plugging this solution back into the constraint $\Phi x = y$ gives us

$$-\frac{1}{2} \Phi \Phi^T \lambda = y \implies \lambda = -2 (\Phi \Phi^T)^{-1} y.$$

In the above we are implicitly assuming that $\Phi$ has full row rank so that the Gram matrix $\Phi \Phi^T$ is invertible. Putting $\lambda$ back into the expression for $x$, we obtain the well-known closed form least squares solution via the pseudo-inverse:

$$\hat{x} = \Phi^T (\Phi \Phi^T)^{-1} y = \Phi^{\dagger} y.$$
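This closed form is easy to check numerically on the running example; a minimal sketch assuming NumPy:

```python
import numpy as np

Phi = np.array([[3.0, 4.0]])
y = np.array([12.0])

# Closed form minimum l2 norm solution: x = Phi^T (Phi Phi^T)^{-1} y.
x_hat = Phi.T @ np.linalg.inv(Phi @ Phi.T) @ y

# The Moore-Penrose pseudo-inverse gives the same answer.
assert np.allclose(x_hat, np.linalg.pinv(Phi) @ y)
print(x_hat)                   # [1.44 1.92]
print(np.linalg.norm(x_hat))   # 2.4
```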
We would like to mention that there are several iterative approaches to solve the $\ell_2$ norm minimization problem as well. The beauty of $\ell_2$ norm minimization lies in the fact that it admits a unique solution with a simple closed form analytical expression.
18.3.2.1. Convexity#
Convex optimization problems have a unique feature: if a global optimal solution exists, it is possible to find it. The solution space $S = \{ x^* + z : z \in \mathcal{N}(\Phi) \}$ is an affine subspace and hence a convex set. We recall that all $\ell_p$ norms with $p \geq 1$ are convex functions, and that the $\ell_p$ norm balls for $p > 1$ are strictly convex sets. Consequently, for $p > 1$, minimizing the $\ell_p$ norm over the affine solution space yields a unique solution.

In the following section we will attempt to find a unique solution to our optimization problem (18.1) using the $\ell_1$ norm.
18.3.3. $\ell_1$ Regularization#
In this subsection we will restrict our attention to the Euclidean space case where $\Phi \in \mathbb{R}^{M \times N}$, $y \in \mathbb{R}^M$ and $x \in \mathbb{R}^N$.

We choose our cost function $J(x) = \| x \|_1$. The optimization problem (18.1) then becomes

$$\hat{x} = \underset{x}{\text{arg min}} \; \| x \|_1 \quad \text{subject to} \quad y = \Phi x. \tag{18.3}$$
Example 18.4 (Minimum $\ell_1$ norm solution)

We continue with our running example. We can view this solution graphically by drawing $\ell_1$ norm balls of different radii, as in Fig. 18.3. The solution lies on the smallest ball that touches the solution line.

Fig. 18.3 Minimum $\ell_1$ norm solution#
As we can see from the figure, the minimum $\ell_1$ norm solution is $\hat{x} = (0, 3)$: the $\ell_1$ ball of radius $3$ touches the solution line at exactly this corner point. It is interesting to note that the minimum $\ell_1$ norm solution is sparse; only one of its two components is non-zero.
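We can corroborate this numerically by parameterizing the solution line and scanning the $\ell_1$ norm along it; a small sketch assuming NumPy:

```python
import numpy as np

# Points on the solution line 3*x1 + 4*x2 = 12, parameterized by x1.
x1 = np.linspace(-2.0, 6.0, 8001)
x2 = (12.0 - 3.0 * x1) / 4.0
l1 = np.abs(x1) + np.abs(x2)

i = np.argmin(l1)
print(f"minimum l1 norm ~ {l1[i]:.3f} at x ~ ({x1[i]:.3f}, {x2[i]:.3f})")
# minimum l1 norm ~ 3.000 at x ~ (0.000, 3.000)
```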
It is time to have a closer look at our cost function $J(x) = \| x \|_1$. This function is convex, but it is not strictly convex, as the following example shows.
Example 18.5 ($\ell_1$ norm is not strictly convex)

Consider again $\mathbb{R}^2$. For any point $x$ in the (closed) first quadrant we have $\| x \|_1 = x_1 + x_2$. Hence for any two points $x, y$ in the first quadrant and any $\theta \in (0, 1)$:

$$\| \theta x + (1 - \theta) y \|_1 = \theta \| x \|_1 + (1 - \theta) \| y \|_1.$$

Thus, the convexity inequality holds with equality for distinct points, and the $\ell_1$ norm is not strictly convex. Consequently, the minimum $\ell_1$ norm solution need not be unique.
As an example consider the under-determined system

$$x_1 + x_2 = 1.$$

We can easily visualize that the solution line will pass through the points $(1, 0)$ and $(0, 1)$. Moreover, it will clearly be parallel with the edge of the $\ell_1$-norm ball of radius $1$ in the first quadrant. This gives us infinitely many possible solutions to the minimization problem (18.3).
We can still observe that:

- these solutions are gathered in a small line segment that is bounded (a bounded convex set), and
- there exist two solutions $(1, 0)$ and $(0, 1)$ among these solutions which have only 1 non-zero component.
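The non-uniqueness is easy to see numerically: every convex combination of $(1, 0)$ and $(0, 1)$ is feasible and attains the same $\ell_1$ norm. A minimal sketch assuming NumPy:

```python
import numpy as np

# All points x = (t, 1 - t) with t in [0, 1] solve x1 + x2 = 1 ...
t = np.linspace(0.0, 1.0, 11)
X = np.stack([t, 1.0 - t], axis=1)

# ... and every one of them attains the same l1 norm, namely 1.
l1 = np.abs(X).sum(axis=1)
assert np.allclose(l1, 1.0)
print("all 11 sampled solutions have l1 norm", l1[0])
```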
For the general under-determined system $y = \Phi x$, these observations can be generalized as follows:

- the minimum $\ell_1$ norm solutions are gathered in a set that is bounded and convex, and
- among these solutions, there exists at least one solution with at most $M$ non-zeros (as many non-zeros as the number of constraints in the system).
Theorem 18.1 (Existence of a sparse solution for $\ell_1$ minimization)

Let $S$ denote the set of optimal solutions of the optimization problem (18.3). Then $S$ is bounded and convex, and it contains at least one solution $\hat{x}$ with at most $M$ non-zeros; i.e., $\| \hat{x} \|_0 \leq M$.

Proof. We have the following facts:

- $S$ is convex and bounded.
- Every $\hat{x} \in S$ satisfies $\Phi \hat{x} = y$.
- Since $\Phi$ is full rank and $M < N$, any collection of more than $M$ columns of $\Phi$ is linearly dependent.
We proceed as follows.

Let $x^*$ be an optimal solution with $L = \| x^* \|_0 > M$ non-zeros. Consider the $L$ columns of $\Phi$ which correspond to $\operatorname{supp}(x^*)$. Since $L > M$, these columns are linearly dependent. Thus there exists a nonzero vector $h \in \mathbb{R}^N$ with $\operatorname{supp}(h) \subseteq \operatorname{supp}(x^*)$ such that

$$\Phi h = 0.$$

Note that since we are only considering those columns of $\Phi$ which correspond to $\operatorname{supp}(x^*)$, we require $h_i = 0$ whenever $x^*_i = 0$.

Consider a new vector

$$x = x^* + \epsilon h$$

where $\epsilon$ is small enough such that every element in $x$ has the same sign as the corresponding element in $x^*$. As long as

$$|\epsilon| \leq \epsilon_0 = \min_{i \in \operatorname{supp}(h)} \frac{|x^*_i|}{|h_i|},$$

such an $\epsilon$ can be constructed. Note that $x_i = 0$ whenever $x^*_i = 0$. Clearly

$$\Phi x = \Phi (x^* + \epsilon h) = \Phi x^* + \epsilon \Phi h = y.$$

Thus $x$ is a feasible solution to the problem (18.3), though it need not be an optimal solution. But since $x^*$ is optimal, the $\ell_1$ norm of $x$ must be greater than or equal to the $\ell_1$ norm of $x^*$:

$$\| x^* \|_1 \leq \| x^* + \epsilon h \|_1.$$

Now look at $\| x^* + \epsilon h \|_1$ as a function of $\epsilon$ in the region $|\epsilon| \leq \epsilon_0$. In this region, the $\ell_1$ norm is continuous and differentiable (w.r.t. $\epsilon$) since all the vectors $x^* + \epsilon h$ have the same sign pattern. If we define $y^* = | x^* |$ (the vector of absolute values), then

$$\| x^* \|_1 = \sum_{i=1}^N y^*_i.$$

Since the sign patterns don't change, we have

$$| x^*_i + \epsilon h_i | = y^*_i + \epsilon h_i \operatorname{sgn}(x^*_i).$$

Thus

$$\| x^* + \epsilon h \|_1 = \sum_{i=1}^N \left( y^*_i + \epsilon h_i \operatorname{sgn}(x^*_i) \right) = \| x^* \|_1 + \epsilon \, h^T \operatorname{sgn}(x^*).$$

The quantity $h^T \operatorname{sgn}(x^*)$ is a constant. The inequality $\| x^* \|_1 \leq \| x^* \|_1 + \epsilon \, h^T \operatorname{sgn}(x^*)$ applies to both positive and negative values of $\epsilon$ in the region $|\epsilon| \leq \epsilon_0$. This is possible only when the inequality is in fact an equality.

This implies that the addition / subtraction of $\epsilon h$ under these conditions does not change the $\ell_1$ length of the solution. Thus, $x = x^* + \epsilon h$ is also an optimal solution. This can happen only if

$$h^T \operatorname{sgn}(x^*) = 0.$$

We now wish to tune $\epsilon$ such that one entry in $x$ gets zeroed while keeping the solution's length intact. We choose the index $i$ which attains the minimum $\epsilon_0$ (defined above) and pick

$$\epsilon = -\frac{x^*_i}{h_i}.$$

Clearly, for the corresponding $x = x^* + \epsilon h$, the $i$-th entry is zeroed while the others keep their signs, and the $\ell_1$ norm is also preserved. Thus, we have obtained a new optimal solution with at most $L - 1$ non-zeros. It is possible that more than one entry gets zeroed during this operation.

We can repeat this procedure till we are left with at most $M$ non-zero elements. Beyond this we may not proceed, since for $L \leq M$ we can no longer assert that the corresponding columns of $\Phi$ are linearly dependent.

We thus note that $\ell_1$ norm minimization always admits a solution with at most $M$ non-zero entries; i.e., $\ell_1$ regularization promotes sparse solutions.
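The zeroing step at the heart of this proof is easy to sketch in code. The sketch below is an illustration under stated assumptions, not an optimized routine: it starts from a generic feasible solution (not an $\ell_1$-optimal one, so only feasibility, not the norm, is guaranteed to be preserved here), finds a null-space direction $h$ supported on $\operatorname{supp}(x)$, and moves along it until one entry hits zero.

```python
import numpy as np
from scipy.linalg import null_space

def zero_one_entry(Phi, x, tol=1e-12):
    """One zeroing step from the proof of Theorem 18.1: move a feasible x
    along a null-space direction supported on supp(x) until an entry dies."""
    S = np.flatnonzero(np.abs(x) > tol)       # support of x
    Z = null_space(Phi[:, S])                 # null space of the support columns
    if Z.shape[1] == 0:                       # support columns independent:
        return x                              # no further reduction possible
    h = np.zeros_like(x)
    h[S] = Z[:, 0]
    # The smallest |x_i / h_i| over supp(h) tells us which entry hits zero first.
    J = np.flatnonzero(np.abs(h) > tol)
    i = J[np.argmin(np.abs(x[J] / h[J]))]
    x_new = x + (-x[i] / h[i]) * h            # epsilon = -x_i / h_i
    x_new[np.abs(x_new) < tol] = 0.0
    return x_new

# Demo on a random feasible solution with full support.
rng = np.random.default_rng(1)
M, N = 3, 6
Phi = rng.standard_normal((M, N))
x = rng.standard_normal(N)                    # generic x: all entries non-zero
y = Phi @ x

while np.count_nonzero(x) > M:
    x = zero_one_entry(Phi, x)
    assert np.allclose(Phi @ x, y)            # feasibility is preserved

print("reduced to", np.count_nonzero(x), "non-zeros; still feasible")
```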
18.3.4. $\ell_1$ Norm Minimization as a Linear Programming Problem#
We now show that (18.3) in $\mathbb{R}^N$ is in fact a linear programming problem.

Recalling the problem:

$$\hat{x} = \underset{x}{\text{arg min}} \; \| x \|_1 \quad \text{subject to} \quad y = \Phi x.$$

Let us write $x$ as $u - v$, where $u, v \in \mathbb{R}^N$ are both non-negative vectors such that $u$ captures the positive entries in $x$ and $v$ captures its negative entries.
Example 18.6 ($x = u - v$)

Let $x = (3, 0, -2, 0, 1)$. Then

$$u = (3, 0, 0, 0, 1).$$

And

$$v = (0, 0, 2, 0, 0).$$

Clearly $x = u - v$ with $u \geq 0$ and $v \geq 0$.
We note here that by definition

$$\operatorname{supp}(u) \cap \operatorname{supp}(v) = \emptyset;$$

i.e., the supports of $u$ and $v$ do not overlap.

We now construct a vector

$$z = \begin{bmatrix} u \\ v \end{bmatrix} \in \mathbb{R}^{2N}.$$

We can now verify that

$$\| x \|_1 = \sum_{i=1}^N | x_i | = \mathbf{1}^T u + \mathbf{1}^T v = \mathbf{1}^T z.$$

Also

$$\Phi x = \Phi (u - v) = \begin{bmatrix} \Phi & -\Phi \end{bmatrix} z = \Psi z$$

where $\Psi = \begin{bmatrix} \Phi & -\Phi \end{bmatrix} \in \mathbb{R}^{M \times 2N}$.
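A short NumPy check of these identities, using the vector from Example 18.6 and an arbitrary matrix:

```python
import numpy as np

x = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
u = np.maximum(x, 0.0)            # positive part of x
v = np.maximum(-x, 0.0)           # negative part of x
z = np.concatenate([u, v])

rng = np.random.default_rng(2)
Phi = rng.standard_normal((2, x.size))
Psi = np.hstack([Phi, -Phi])      # Psi = [Phi  -Phi]

assert np.allclose(x, u - v)
assert np.isclose(np.abs(x).sum(), z.sum())   # ||x||_1 = 1^T z
assert np.allclose(Phi @ x, Psi @ z)          # Phi x = Psi z
print("decomposition identities verified")
```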
Hence the optimization problem (18.3) can be recast as

$$\hat{z} = \underset{z}{\text{arg min}} \; \mathbf{1}^T z \quad \text{subject to} \quad \Psi z = y \; \text{ and } \; z \geq 0. \tag{18.4}$$

This optimization problem has the classic Linear Programming structure: the objective function is affine, and the constraints are affine as well.
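As a sanity check, the following sketch (assuming SciPy's scipy.optimize.linprog, whose default variable bounds already enforce $z \geq 0$) solves (18.4) for the running example and recovers the sparse minimum $\ell_1$ norm solution $(0, 3)$:

```python
import numpy as np
from scipy.optimize import linprog

Phi = np.array([[3.0, 4.0]])
y = np.array([12.0])
N = Phi.shape[1]

c = np.ones(2 * N)                 # objective 1^T z
Psi = np.hstack([Phi, -Phi])       # equality constraint matrix [Phi  -Phi]

# linprog's default bounds are (0, None), i.e. z >= 0.
res = linprog(c, A_eq=Psi, b_eq=y)
u, v = res.x[:N], res.x[N:]
x_hat = u - v

print(x_hat)                       # approximately [0. 3.]
print(np.abs(x_hat).sum())         # minimum l1 norm: 3.0
```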
Remark 18.2 (Justification for the equivalence of the linear program)

Let $\hat{z} = (\hat{u}, \hat{v})$ be an optimal solution of the linear program (18.4) and let $\hat{x} = \hat{u} - \hat{v}$. In order to show that the two optimization problems are equivalent, we need to verify that our assumption about the decomposition of $\hat{x}$ holds; i.e., that the supports of $\hat{u}$ and $\hat{v}$ do not overlap.

Since $\hat{z} \geq 0$, hence $\hat{u} \geq 0$ and $\hat{v} \geq 0$. If the supports of $\hat{u}$ and $\hat{v}$ don't overlap, then we have $\| \hat{x} \|_1 = \mathbf{1}^T \hat{z}$. And if they do overlap, then $\| \hat{x} \|_1 < \mathbf{1}^T \hat{z}$.

Now, for the sake of contradiction, let us assume that the supports of $\hat{u}$ and $\hat{v}$ do overlap for the optimal solution $\hat{z}$. Let $k$ be one of the indices at which both $\hat{u}_k \neq 0$ and $\hat{v}_k \neq 0$. Since $\hat{z} \geq 0$, hence $\hat{u}_k > 0$ and $\hat{v}_k > 0$. Without loss of generality let us assume that $\hat{u}_k > \hat{v}_k$.

In the equality constraint $\Psi \hat{z} = \Phi \hat{u} - \Phi \hat{v} = y$, both of these coefficients multiply the same column $\phi_k$ of $\Phi$ with opposite signs, giving us the term

$$(\hat{u}_k - \hat{v}_k) \, \phi_k.$$

Now, if we replace the two entries in $\hat{z}$ by

$$\hat{u}_k' = \hat{u}_k - \hat{v}_k$$

and

$$\hat{v}_k' = 0$$

to obtain a new vector $\hat{z}'$, we see that there is no impact on the equality constraint, since

$$\hat{u}_k' - \hat{v}_k' = \hat{u}_k - \hat{v}_k.$$

Also the non-negativity constraint $\hat{z}' \geq 0$ is satisfied, since $\hat{u}_k' > 0$. This means that $\hat{z}'$ is a feasible solution. On the other hand, the objective function value reduces by $2 \hat{v}_k$ for $\hat{z}'$. This contradicts our assumption that $\hat{z}$ is an optimal solution. Hence, for the optimal solution of (18.4), we must have

$$\operatorname{supp}(\hat{u}) \cap \operatorname{supp}(\hat{v}) = \emptyset.$$

Thus

$$\hat{x} = \hat{u} - \hat{v}$$

is indeed the desired solution for the optimization problem (18.3).
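This no-overlap property can also be observed numerically; a self-contained sketch (again assuming scipy.optimize.linprog) on a random under-determined system:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
M, N = 4, 10
Phi = rng.standard_normal((M, N))
y = Phi @ rng.standard_normal(N)   # ensures y lies in the column space

res = linprog(np.ones(2 * N), A_eq=np.hstack([Phi, -Phi]), b_eq=y)
u, v = res.x[:N], res.x[N:]

# The supports of u and v should not overlap (up to solver tolerance) ...
overlap = np.flatnonzero((u > 1e-8) & (v > 1e-8))
print("overlapping indices:", overlap)   # expected: none

# ... and a basic (vertex) solution of the LP has at most M non-zeros,
# matching the sparse solution promised by Theorem 18.1.
x_hat = u - v
print("non-zeros:", np.count_nonzero(np.abs(x_hat) > 1e-8), "<= M =", M)
```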