20.3. Orthogonal Matching Pursuit#
Orthogonal Matching Pursuit is a greedy algorithm for computing sparse approximations of a signal in a suitable dictionary. The goal of this algorithm is to select a small number of atoms from the dictionary which can provide a representation for the signal with low error. The algorithm is iterative in structure. In each step we greedily select one new atom from the dictionary which can maximally reduce the representation error. Once we have enough atoms selected such that the representation error is below an acceptable bound, we stop. The algorithm is presented in Algorithm 20.1.
Algorithm 20.1 (Orthogonal matching pursuit for sparse approximation)

Inputs:
- Signal dictionary $\mathcal{D} \in \mathbb{C}^{N \times D}$ with unit-norm atoms $d_1, \dots, d_D$
- Threshold $\epsilon_0$
- Signal $x \in \mathbb{C}^N$

Outputs:
- A $K$-sparse approximate representation $a \in \mathbb{C}^D$ satisfying $\|x - \mathcal{D} a\|_2 \le \epsilon_0$, where $K$ is the number of iterations performed
- $\Lambda = \operatorname{supp}(a)$, the support for the sparse solution identified by the algorithm

Initialization:
- $k \leftarrow 0$ # Iteration counter
- $a^0 \leftarrow 0$ # Solution vector
- $r^0 \leftarrow x - \mathcal{D} a^0 = x$ # Residual
- $\Lambda^0 \leftarrow \emptyset$ # Solution support

Algorithm:
1. If $\|r^k\|_2 \le \epsilon_0$: break.
2. $k \leftarrow k + 1$. # Next iteration
3. For every $j \notin \Lambda^{k-1}$ compute $\epsilon^2(j) = \|r^{k-1}\|_2^2 - |d_j^H r^{k-1}|^2$. # Sweep
4. Find $j^*$ that maximizes $|d_j^H r^{k-1}|$ (equivalently, minimizes $\epsilon^2(j)$) and set $\Lambda^k \leftarrow \Lambda^{k-1} \cup \{j^*\}$. # Update support
5. Compute $a^k \leftarrow \operatorname*{arg\,min}_{\operatorname{supp}(a) \subseteq \Lambda^k} \|x - \mathcal{D} a\|_2^2$. # Update provisional solution
6. $r^k \leftarrow x - \mathcal{D} a^k$. # Update residual
7. Go to step 1.
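As a concrete illustration, here is a minimal NumPy sketch of Algorithm 20.1 under the notation above. The function name `omp` and the parameters `threshold` and `max_iters` are illustrative choices, not part of the text, and the sketch assumes a real-valued dictionary with unit-norm columns.

```python
import numpy as np

def omp(D, x, threshold=1e-6, max_iters=None):
    """Minimal OMP sketch: greedy sparse approximation of x over the dictionary D.

    Assumes D is a real N x num_atoms array with unit-norm columns.
    """
    N, num_atoms = D.shape
    max_iters = max_iters if max_iters is not None else N
    a = np.zeros(num_atoms)            # solution vector a^0 = 0
    r = x.astype(float)                # residual r^0 = x
    support = []                       # solution support Lambda^0 = {}
    for _ in range(max_iters):
        if np.linalg.norm(r) <= threshold:    # stopping criterion ||r^k||_2 <= eps_0
            break
        # Sweep: correlate the residual with every atom
        correlations = np.abs(D.T @ r)
        correlations[support] = 0.0           # ignore atoms already in the support
        # Update support with the best matching atom
        support.append(int(np.argmax(correlations)))
        # Update provisional solution: least squares over the selected columns
        a_sub, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        a = np.zeros(num_atoms)
        a[support] = a_sub
        # Update residual
        r = x - D @ a
    return a, support
```

Note that the least squares step re-estimates all coefficients on the current support at every iteration; this is what distinguishes orthogonal matching pursuit from plain matching pursuit.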
20.3.1. The Algorithm#
We start with the initial estimate of the solution $a^0 = 0$. We also maintain the support of $a$; i.e., the set of indices for which $a$ is non-zero. We start with an empty support.

In each ($k$-th) iteration we attempt to reduce the difference between the actual signal $x$ and the approximate signal based on the current solution, given by the residual $r^{k-1} = x - \mathcal{D} a^{k-1}$. We do this by choosing a new index in $\mathcal{D}$ for the column $d_j$ which most closely matches our current residual. We include this index in our support for $a$ and estimate the new solution vector $a^k$. We then compute the new residual $r^k = x - \mathcal{D} a^k$. We stop when the residual magnitude $\|r^k\|_2$ is below a threshold $\epsilon_0$ defined by us.
Each iteration of the algorithm consists of the following stages:
[Sweep] We try to find the atom from the dictionary which best matches the current residual. The best matching atom is selected using the least squares error principle. For each column $d_j$ in our dictionary, we measure the projection of the residual from the previous iteration onto the column and compute the magnitude of the error between the projection and the residual. The minimum squared error for $d_j$ is given by

$$\epsilon^2(j) = \min_{z_j} \|z_j d_j - r^{k-1}\|_2^2 = \|r^{k-1}\|_2^2 - |d_j^H r^{k-1}|^2.$$

We can also note that minimizing $\epsilon^2(j)$ over $j$ is equivalent to maximizing the magnitude of the inner product of $d_j$ with $r^{k-1}$.
[Update support] Ignoring the columns which have already been included in the support, we pick the column which most closely resembles the residual of the previous stage; i.e., for which the magnitude of the error is minimum. We include the index $j^*$ of this column in the support set: $\Lambda^k = \Lambda^{k-1} \cup \{j^*\}$.

[Update provisional solution] In this step we find the solution minimizing $\|\mathcal{D} a - x\|_2^2$ over the support $\Lambda^k$ as our next candidate solution vector. By keeping $a_j = 0$ for $j \notin \Lambda^k$ we are leaving out the corresponding columns from our calculations. Thus we pick up only the columns specified by $\Lambda^k$ from $\mathcal{D}$. Let us call this subdictionary $\mathcal{D}_{\Lambda^k}$. The size of this subdictionary is $N \times |\Lambda^k|$. Let us call the corresponding sub-vector $a_{\Lambda^k}$. For instance, if $\Lambda^k = \{j_1, j_2\}$, then $\mathcal{D}_{\Lambda^k} = \begin{bmatrix} d_{j_1} & d_{j_2} \end{bmatrix}$ and $a_{\Lambda^k} = (a_{j_1}, a_{j_2})^T$. Our minimization problem then reduces to minimizing $\|\mathcal{D}_{\Lambda^k} a_{\Lambda^k} - x\|_2^2$. We use the standard least squares estimate for getting the coefficients of $a$ over these indices, and we put $a_{\Lambda^k}$ back into the entries indexed by $\Lambda^k$ (with zeros elsewhere) to obtain our new solution estimate $a^k$. The solution to this minimization problem is given by

$$a_{\Lambda^k} = \mathcal{D}_{\Lambda^k}^{\dagger} x = \left(\mathcal{D}_{\Lambda^k}^H \mathcal{D}_{\Lambda^k}\right)^{-1} \mathcal{D}_{\Lambda^k}^H x.$$

Interestingly we note that

$$\mathcal{D}_{\Lambda^k}^H r^k = \mathcal{D}_{\Lambda^k}^H \left(x - \mathcal{D}_{\Lambda^k} a_{\Lambda^k}\right) = \mathcal{D}_{\Lambda^k}^H x - \mathcal{D}_{\Lambda^k}^H \mathcal{D}_{\Lambda^k} \left(\mathcal{D}_{\Lambda^k}^H \mathcal{D}_{\Lambda^k}\right)^{-1} \mathcal{D}_{\Lambda^k}^H x = 0,$$

which means that the columns in $\mathcal{D}$ which are part of the support $\Lambda^k$ are necessarily orthogonal to the residual $r^k$. This implies that these columns will not be considered in the coming iterations for extending the support. This orthogonality is the reason behind the name of the algorithm: Orthogonal Matching Pursuit.

[Update residual] We finally update the residual vector to $r^k = x - \mathcal{D} a^k$ based on the new solution vector estimate.
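As a quick numerical check of the orthogonality property just described, the following sketch (assuming NumPy, a random real dictionary, and an arbitrarily chosen support; all names are illustrative) solves the least squares sub-problem and verifies that the selected columns are orthogonal to the resulting residual.

```python
import numpy as np

rng = np.random.default_rng(0)
N, num_atoms = 8, 16
D = rng.standard_normal((N, num_atoms))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = rng.standard_normal(N)

support = [2, 7, 11]                     # an arbitrary support Lambda^k
D_sub = D[:, support]                    # subdictionary D_{Lambda^k}
a_sub, *_ = np.linalg.lstsq(D_sub, x, rcond=None)   # least squares coefficients
r = x - D_sub @ a_sub                    # residual r^k

# Columns in the support are orthogonal to the residual (up to rounding error)
print(np.abs(D_sub.T @ r).max())
```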
Example 20.1 (OMP for a two-sparse signal)

Let us consider a dictionary $\mathcal{D}$ of size $N \times D$. You may verify that each of its columns is unit norm. It is known that the coherence and spark of $\mathcal{D}$ are such that if a signal $x$ has a two-sparse representation in $\mathcal{D}$, then the representation is necessarily unique. We now consider a signal $x$, a column vector of size $N$, which is known to have a two-sparse representation in $\mathcal{D}$. Let us go through the steps of OMP and see how it works.
In step 0, $k = 0$, $a^0 = 0$, $r^0 = x$, and $\Lambda^0 = \emptyset$. We now compute the absolute value of the inner product of $r^0 = x$ with each of the columns of $\mathcal{D}$. We quickly note that the maximum occurs at index 7 with value 11. We modify our support to $\Lambda^1 = \{7\}$. We now solve the least squares problem

$$\min_{a_7} \|x - a_7 d_7\|_2^2.$$

The solution gives us the value of $a_7$, and placing it in the (otherwise zero) solution vector we obtain $a^1$. This leaves us the residual $r^1 = x - \mathcal{D} a^1$. We can cross check that the residual is indeed orthogonal to the columns already selected, for $d_7^H r^1 = 0$.

Next we compute the inner product of $r^1$ with all the columns in $\mathcal{D}$ and take absolute values. We quickly note that the maximum occurs at index 13. We modify our support to $\Lambda^2 = \{7, 13\}$. We now solve the least squares problem

$$\min_{a_7,\, a_{13}} \|x - a_7 d_7 - a_{13} d_{13}\|_2^2.$$

This gives us $a_7$ and $a_{13}$, and hence the new solution estimate $a^2$. Finally, the residual we get at step 2 is $r^2 = x - \mathcal{D} a^2$. The magnitude of this residual is very small. We conclude that our OMP algorithm has converged and we have been able to recover the exact two-sparse representation of $x$ in $\mathcal{D}$.
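The specific dictionary and signal of the example are not reproduced here, but a synthetic stand-in (assuming NumPy and scikit-learn's `orthogonal_mp`, with arbitrary coefficient values placed on atoms 7 and 13) shows the same behaviour: two iterations suffice to recover the support.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(1)
N, num_atoms, K = 32, 64, 2
D = rng.standard_normal((N, num_atoms))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms

a_true = np.zeros(num_atoms)
a_true[[7, 13]] = [2.0, -1.5]             # arbitrary two-sparse coefficients
x = D @ a_true

a_hat = orthogonal_mp(D, x, n_nonzero_coefs=K)
print(np.flatnonzero(a_hat))              # recovered support (expected: [ 7 13])
print(np.linalg.norm(x - D @ a_hat))      # residual magnitude: ~ 0
```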
20.3.2. Exact Recovery Conditions#
In this subsection, following [77], we will closely look at some conditions under which OMP is guaranteed to recover the solution of the EXACT-SPARSE problem.

It is given that $x = \mathcal{D} a$, where $a$ contains at most $K$ non-zero entries. Both the support and the entries of $a$ are known and can be used to verify the correctness of OMP; note that $a$ itself won't be given to OMP. Let $\Lambda^{opt} = \operatorname{supp}(a)$; i.e., the set of indices at which the optimal representation $a$ has non-zero entries. Then we can write

$$x = \sum_{j \in \Lambda^{opt}} a_j d_j.$$

From the dictionary $\mathcal{D}$ we can extract a matrix $\mathcal{D}_{opt}$ whose columns are indexed by $\Lambda^{opt}$, where $|\Lambda^{opt}| \le K$. Thus we can also write

$$x = \mathcal{D}_{opt}\, a_{opt},$$

where $a_{opt}$ is a vector of (at most $K$) complex entries. The columns of $\mathcal{D}_{opt}$ are linearly independent; hence $\mathcal{D}_{opt}$ has full column rank. We define another matrix $\Psi$ whose columns are the remaining columns of $\mathcal{D}$. Thus $\Psi$ consists of the atoms or columns which do not participate in the optimum representation of $x$.

OMP starts with an empty support. At every step, it picks up one column from $\mathcal{D}$ and adds it to the support of the approximation. If we can ensure that it never selects any column from $\Psi$, we will be guaranteed that the correct sparse representation is recovered. We will use mathematical induction and assume that OMP has succeeded in its first $k$ steps and has chosen columns only from $\mathcal{D}_{opt}$ so far. At this point it is left with the residual $r^k$. In the $(k+1)$-th iteration, we compute the inner product of $r^k$ with all columns in $\mathcal{D}$ and choose the column which has the highest absolute inner product. We note that the maximum value of the inner product of $r^k$ with any of the columns in $\Psi$ is given by

$$\|\Psi^H r^k\|_{\infty}.$$

Correspondingly, the maximum value of the inner product of $r^k$ with any of the columns in $\mathcal{D}_{opt}$ is given by

$$\|\mathcal{D}_{opt}^H r^k\|_{\infty},$$

since we have shown that $r^k$ is orthogonal to the columns already chosen, hence they will not contribute to this term. In order to make sure that none of the columns in $\Psi$ is selected, we need

$$\|\Psi^H r^k\|_{\infty} < \|\mathcal{D}_{opt}^H r^k\|_{\infty}.$$
Definition 20.3 (Greedy selection ratio)

We define the ratio

$$\rho(r^k) = \frac{\|\Psi^H r^k\|_{\infty}}{\|\mathcal{D}_{opt}^H r^k\|_{\infty}}.$$

This ratio is known as the greedy selection ratio.

We can see that as long as $\rho(r^k) < 1$, OMP will make the right decision at the $(k+1)$-th stage. If $\rho(r^k) \ge 1$ then there is no guarantee that OMP will make the right decision. We will assume pessimistically that OMP makes the wrong decision in such situations.
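The greedy selection ratio is easy to evaluate numerically when the optimal support is known (for instance, in simulations). A small sketch under the notation above, assuming NumPy and a real dictionary; the helper name is illustrative.

```python
import numpy as np

def greedy_selection_ratio(D, opt_support, r):
    """rho(r) = ||Psi^H r||_inf / ||D_opt^H r||_inf for a real dictionary D."""
    mask = np.zeros(D.shape[1], dtype=bool)
    mask[list(opt_support)] = True
    D_opt, Psi = D[:, mask], D[:, ~mask]
    return np.abs(Psi.T @ r).max() / np.abs(D_opt.T @ r).max()
```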
We note that this definition of $\rho(r^k)$ depends on the residual $r^k$ and on the (unknown) optimal support, so it cannot be evaluated directly while the algorithm runs. We now present a condition which guarantees that $\rho(r^k) < 1$ holds at every step.
Theorem 20.17 (A sufficient condition for exact recovery using OMP)

A sufficient condition for Orthogonal Matching Pursuit to resolve $x$ completely in $K$ steps is that

$$\max_{\psi} \left\|\mathcal{D}_{opt}^{\dagger} \psi\right\|_1 < 1, \qquad (20.24)$$

where $\psi$ ranges over the columns of $\Psi$.

Moreover, Orthogonal Matching Pursuit is a correct algorithm for the EXACT-SPARSE problem whenever the condition (20.24) holds for every choice of $K$ atoms from $\mathcal{D}$.
Proof. In (20.24), $\mathcal{D}_{opt}^{\dagger} = \left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1} \mathcal{D}_{opt}^H$ is the pseudo-inverse of $\mathcal{D}_{opt}$. What we need to show is that if (20.24) holds true then $\rho(r^k)$ will always be less than 1.

We note that the projection operator for the column span $\mathcal{C}(\mathcal{D}_{opt})$ of $\mathcal{D}_{opt}$ is given by

$$P_{opt} = \mathcal{D}_{opt} \mathcal{D}_{opt}^{\dagger} = \mathcal{D}_{opt} \left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1} \mathcal{D}_{opt}^H.$$

We also note that, by assumption, since $x \in \mathcal{C}(\mathcal{D}_{opt})$ and the approximation at the $k$-th step, $x^k = \mathcal{D} a^k$, is built only from columns of $\mathcal{D}_{opt}$, it too lies in $\mathcal{C}(\mathcal{D}_{opt})$. Hence $r^k = x - x^k$ also belongs to $\mathcal{C}(\mathcal{D}_{opt})$. Thus

$$P_{opt}\, r^k = \left(\mathcal{D}_{opt}^{\dagger}\right)^H \mathcal{D}_{opt}^H r^k = r^k;$$

i.e., applying the projection operator for $\mathcal{D}_{opt}$ on $r^k$ doesn't change it. Using this we can rewrite $\rho(r^k)$ as

$$\rho(r^k) = \frac{\|\Psi^H r^k\|_{\infty}}{\|\mathcal{D}_{opt}^H r^k\|_{\infty}} = \frac{\left\|\Psi^H \left(\mathcal{D}_{opt}^{\dagger}\right)^H \mathcal{D}_{opt}^H r^k\right\|_{\infty}}{\|\mathcal{D}_{opt}^H r^k\|_{\infty}}.$$

We see the term $\mathcal{D}_{opt}^H r^k$ appearing both in the numerator and the denominator. Now consider the matrix $\Psi^H \left(\mathcal{D}_{opt}^{\dagger}\right)^H = \left(\mathcal{D}_{opt}^{\dagger} \Psi\right)^H$ and recall the definition of the matrix $\infty$-norm from Definition 4.146. Thus

$$\left\|\left(\mathcal{D}_{opt}^{\dagger} \Psi\right)^H \mathcal{D}_{opt}^H r^k\right\|_{\infty} \le \left\|\left(\mathcal{D}_{opt}^{\dagger} \Psi\right)^H\right\|_{\infty} \left\|\mathcal{D}_{opt}^H r^k\right\|_{\infty},$$

which gives us

$$\rho(r^k) \le \left\|\left(\mathcal{D}_{opt}^{\dagger} \Psi\right)^H\right\|_{\infty}.$$

We recall that $\|A\|_{\infty}$ is the max row sum norm while $\|A\|_1$ is the max column sum norm. Hence

$$\left\|\left(\mathcal{D}_{opt}^{\dagger} \Psi\right)^H\right\|_{\infty} = \left\|\mathcal{D}_{opt}^{\dagger} \Psi\right\|_1,$$

which means

$$\rho(r^k) \le \left\|\mathcal{D}_{opt}^{\dagger} \Psi\right\|_1.$$

Now the columns of $\mathcal{D}_{opt}^{\dagger} \Psi$ are nothing but $\mathcal{D}_{opt}^{\dagger} \psi$, where $\psi$ ranges over the columns of $\Psi$. Thus, in terms of the max column sum norm,

$$\rho(r^k) \le \left\|\mathcal{D}_{opt}^{\dagger} \Psi\right\|_1 = \max_{\psi} \left\|\mathcal{D}_{opt}^{\dagger} \psi\right\|_1.$$

Thus, assuming that OMP has made $k$ correct decisions and $r^k$ lies in $\mathcal{C}(\mathcal{D}_{opt})$, we have $\rho(r^k) < 1$ whenever (20.24) holds. The initial residual $r^0 = x$, which always lies in the column space of $\mathcal{D}_{opt}$. By the above logic, OMP will always select an optimal column in each step. Since the residual is always orthogonal to the columns already selected, it will never select the same column twice. Thus in $K$ steps it will retrieve all atoms which comprise $x$.
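For a known candidate support, the exact recovery condition (20.24) can be checked directly. A sketch assuming NumPy and a real dictionary; `exact_recovery_coefficient` is an illustrative name.

```python
import numpy as np

def exact_recovery_coefficient(D, opt_support):
    """max over psi of ||pinv(D_opt) psi||_1; OMP recovery is guaranteed when this is < 1."""
    mask = np.zeros(D.shape[1], dtype=bool)
    mask[list(opt_support)] = True
    D_opt_pinv = np.linalg.pinv(D[:, mask])   # pseudo-inverse of the optimal subdictionary
    Psi = D[:, ~mask]                         # atoms outside the optimal support
    return np.abs(D_opt_pinv @ Psi).sum(axis=0).max()
```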
20.3.3. Babel Function Estimates#
There is a small problem with Theorem 20.17. Since we don't know the support of the optimal representation $a$ in advance, there is no direct way to verify that the condition (20.24) holds. Verifying it for all possible $K$-column subdictionaries $\mathcal{D}_{opt}$ of $\mathcal{D}$ is computationally intractable. It turns out that the Babel function (recall Definition 18.22) can be used to relax the sufficient condition in a manner so that it becomes computationally tractable. We show how the Babel function guarantees that the exact recovery condition for OMP holds.
Theorem 20.18 (Babel function based guarantee for exact recovery using OMP)

Suppose that

$$\mu_1(K - 1) + \mu_1(K) < 1.$$

Then the exact recovery condition (20.24) holds for every superposition of at most $K$ atoms from $\mathcal{D}$. Thus, Orthogonal Matching Pursuit is a correct algorithm for the EXACT-SPARSE problem at sparsity level $K$. In other words, for sufficiently small $K$ (small enough that the Babel function condition above holds), OMP recovers the exact $K$-sparse representation of any such signal.
Proof. We can write

$$\max_{\psi} \left\|\mathcal{D}_{opt}^{\dagger} \psi\right\|_1 = \max_{\psi} \left\|\left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1} \mathcal{D}_{opt}^H \psi\right\|_1.$$

We recall from Lemma 4.90 that the operator norm is subordinate; i.e., $\|A v\|_1 \le \|A\|_1 \|v\|_1$. Thus with $A = \left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1}$ and $v = \mathcal{D}_{opt}^H \psi$ we have

$$\left\|\left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1} \mathcal{D}_{opt}^H \psi\right\|_1 \le \left\|\left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1}\right\|_1 \left\|\mathcal{D}_{opt}^H \psi\right\|_1.$$

With this we have

$$\max_{\psi} \left\|\mathcal{D}_{opt}^{\dagger} \psi\right\|_1 \le \left\|\left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1}\right\|_1 \max_{\psi} \left\|\mathcal{D}_{opt}^H \psi\right\|_1.$$

Now let us look at $\max_{\psi} \left\|\mathcal{D}_{opt}^H \psi\right\|_1$ closely. There are at most $K$ columns in $\mathcal{D}_{opt}$. For each such column we compute its inner product with $\psi$ and then take the absolute sum of these inner products. Also recall the definition of the Babel function:

$$\mu_1(K) = \max_{|\Lambda| = K} \; \max_{\psi \notin \Lambda} \; \sum_{\lambda \in \Lambda} \left|\langle \psi, d_{\lambda} \rangle\right|.$$

Clearly

$$\max_{\psi} \left\|\mathcal{D}_{opt}^H \psi\right\|_1 \le \mu_1(K).$$

We also need to provide a bound on $\left\|\left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1}\right\|_1$, which requires more work. First note that since all columns in $\mathcal{D}$ are unit norm, the diagonal of $\mathcal{D}_{opt}^H \mathcal{D}_{opt}$ contains unit entries. Thus we can write

$$\mathcal{D}_{opt}^H \mathcal{D}_{opt} = I + A,$$

where $A$ contains the off-diagonal terms in $\mathcal{D}_{opt}^H \mathcal{D}_{opt}$. Looking carefully, each column of $A$ lists the inner products between one atom of $\mathcal{D}_{opt}$ and the remaining $K - 1$ atoms. By the definition of the Babel function,

$$\|A\|_1 \le \mu_1(K - 1).$$

Whenever $\|A\|_1 < 1$, the Von Neumann series $\sum_{j=0}^{\infty} (-A)^j$ converges to the inverse $(I + A)^{-1}$. Thus we have

$$\left\|\left(\mathcal{D}_{opt}^H \mathcal{D}_{opt}\right)^{-1}\right\|_1 = \left\|(I + A)^{-1}\right\|_1 \le \sum_{j=0}^{\infty} \|A\|_1^j = \frac{1}{1 - \|A\|_1} \le \frac{1}{1 - \mu_1(K - 1)}.$$

Putting things together we get

$$\max_{\psi} \left\|\mathcal{D}_{opt}^{\dagger} \psi\right\|_1 \le \frac{\mu_1(K)}{1 - \mu_1(K - 1)}.$$

Thus, whenever

$$\mu_1(K - 1) + \mu_1(K) < 1$$

holds, we have

$$\max_{\psi} \left\|\mathcal{D}_{opt}^{\dagger} \psi\right\|_1 \le \frac{\mu_1(K)}{1 - \mu_1(K - 1)} < 1.$$
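The Babel function itself is inexpensive to compute from the Gram matrix of the dictionary, which is what makes Theorem 20.18 practical. A sketch assuming NumPy, unit-norm real atoms, and illustrative helper names:

```python
import numpy as np

def babel(D, k):
    """mu_1(k): for each atom, sum the k largest |inner products| with the other atoms; take the max."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)                   # exclude each atom's inner product with itself
    G_sorted = np.sort(G, axis=1)[:, ::-1]     # each row in decreasing order
    return G_sorted[:, :k].sum(axis=1).max()

def omp_exact_recovery_guaranteed(D, K):
    """Sufficient condition of Theorem 20.18 for sparsity level K."""
    return babel(D, K - 1) + babel(D, K) < 1
```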
20.3.4. Sparse Approximation Conditions#
We now remove the assumption that $x$ has an exact sparse representation in $\mathcal{D}$. In this subsection we will look at conditions under which OMP can successfully solve the sparse approximation problem.

Let $x$ be an arbitrary signal. Suppose that $b$ is an optimal $K$-term approximation representation of $x$; i.e., $b$ is a solution to (18.20), and the optimal $K$-term approximation of $x$ is given by

$$x_{opt} = \mathcal{D} b.$$

We note that $b$ may not be unique. Let $\Lambda^{opt}$ be the support of $b$, which identifies the atoms in $\mathcal{D}$ that participate in the $K$-term approximation of $x$. Let $\mathcal{D}_{opt}$ be the subdictionary consisting of the columns of $\mathcal{D}$ indexed by $\Lambda^{opt}$. We assume that the columns in $\mathcal{D}_{opt}$ are linearly independent. This is easily established: if any atom in this set were linearly dependent on the other atoms, we could always find a sparser solution, which would contradict the fact that $b$ is optimal. Again let $\Psi$ be the matrix of columns which are not indexed by $\Lambda^{opt}$. We note that if $\Lambda^{opt}$ is identified, then finding $b$ is a straightforward least squares problem.
We now present a condition under which Orthogonal Matching Pursuit is able to recover the optimal atoms.
Theorem 20.19
Assume that $\mu_1(K) < \frac{1}{2}$, and suppose that after $k$ iterations OMP has selected atoms only from the optimal support $\Lambda^{opt}$. Then at iteration $k + 1$ OMP recovers another atom from $\Lambda^{opt}$ whenever

$$\left\|x - \mathcal{D} a^k\right\|_2 > \sqrt{1 + \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}}\; \left\|x - \mathcal{D} b\right\|_2. \qquad (20.26)$$
A few remarks are in order.
$\|x - \mathcal{D} a^k\|_2$ is the approximation error norm at the $k$-th iteration. $\|x - \mathcal{D} b\|_2$ is the optimum approximation error after $K$ iterations. The theorem says that OMP makes absolute progress whenever the current approximation error is larger than the optimum error by the factor given in (20.26).

As a result of this theorem, we note that every optimal $K$-term approximation of $x$ contains the same kernel of atoms:

The optimum error is always independent of the choice of atoms in the $K$-term approximation (since it is optimum).
The initial error is also independent of the choice of atoms (since the initial support is empty).
OMP always selects the same set of atoms by design.
Proof. Let us assume that after $k$ steps, OMP has recovered an approximation $x^k$ given by

$$x^k = \mathcal{D} a^k,$$

where $\Lambda^k = \operatorname{supp}(a^k)$ chooses $k$ columns from $\mathcal{D}$, all of which belong to $\Lambda^{opt}$.

Let the residual at the $k$-th stage be

$$r^k = x - x^k.$$

Recalling from the previous section, a sufficient condition for recovering another optimal atom is

$$\rho(r^k) = \frac{\|\Psi^H r^k\|_{\infty}}{\|\mathcal{D}_{opt}^H r^k\|_{\infty}} < 1.$$

One difference from the previous section is that $r^k$ need not lie in the column space of $\mathcal{D}_{opt}$ any more. We can write

$$r^k = x - x^k = (x - x_{opt}) + (x_{opt} - x^k).$$

Note that $x - x_{opt}$ is nothing but the residual left after the optimal $K$-term approximation. We also note that $x_{opt}$ is a least squares approximation of $x$ over the columns of $\mathcal{D}_{opt}$; just as the residual in OMP is always orthogonal to the already selected columns, here

$$\mathcal{D}_{opt}^H (x - x_{opt}) = 0, \quad \text{and therefore} \quad \mathcal{D}_{opt}^H r^k = \mathcal{D}_{opt}^H (x_{opt} - x^k).$$

We will now use these expressions to simplify $\rho(r^k)$. We now define two new terms

$$\rho_{ERC}(r^k) = \frac{\left\|\Psi^H (x_{opt} - x^k)\right\|_{\infty}}{\left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_{\infty}}$$

and

$$\rho_{err}(r^k) = \frac{\left\|\Psi^H (x - x_{opt})\right\|_{\infty}}{\left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_{\infty}}.$$

With these we have

$$\rho(r^k) = \frac{\left\|\Psi^H \left[(x_{opt} - x^k) + (x - x_{opt})\right]\right\|_{\infty}}{\left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_{\infty}} \le \rho_{ERC}(r^k) + \rho_{err}(r^k). \qquad (20.27)$$

Now $x_{opt} - x^k$ has an exact $K$-term representation in $\mathcal{D}$ given by the atoms of $\mathcal{D}_{opt}$ (both $x_{opt}$ and $x^k$ are built from these atoms). Hence $\rho_{ERC}(r^k)$ is nothing but the greedy selection ratio $\rho$ for the corresponding EXACT-SPARSE problem. From the proof of Theorem 20.18 we recall

$$\rho_{ERC}(r^k) \le \frac{\mu_1(K)}{1 - \mu_1(K - 1)} \le \frac{\mu_1(K)}{1 - \mu_1(K)},$$

since $\mu_1(K - 1) \le \mu_1(K)$.

The remaining problem is $\rho_{err}(r^k)$. Let us look at its numerator and denominator one by one. The numerator $\left\|\Psi^H (x - x_{opt})\right\|_{\infty}$ is the maximum (absolute) inner product between any column in $\Psi$ and $x - x_{opt}$. We can write

$$\left\|\Psi^H (x - x_{opt})\right\|_{\infty} = \max_{\psi} \left|\psi^H (x - x_{opt})\right| \le \max_{\psi} \|\psi\|_2 \left\|x - x_{opt}\right\|_2 = \left\|x - x_{opt}\right\|_2,$$

since all columns in $\mathcal{D}$ are unit norm; in between we used the Cauchy-Schwarz inequality. Now look at the denominator $\left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_{\infty}$, where $x_{opt} - x^k$ lies in the column space of $\mathcal{D}_{opt}$ and $\mathcal{D}_{opt}^H (x_{opt} - x^k) \in \mathbb{C}^K$. Now for every $v \in \mathbb{C}^K$ we have

$$\|v\|_{\infty} \ge \frac{1}{\sqrt{K}}\, \|v\|_2.$$

Hence

$$\left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_{\infty} \ge \frac{1}{\sqrt{K}} \left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_2.$$

Since $\mathcal{D}_{opt}$ has full column rank, its singular values are non-zero. Thus

$$\left\|\mathcal{D}_{opt}^H (x_{opt} - x^k)\right\|_2 \ge \sigma_{\min}(\mathcal{D}_{opt}) \left\|x_{opt} - x^k\right\|_2.$$

From Corollary 18.2 we have

$$\sigma_{\min}^2(\mathcal{D}_{opt}) \ge 1 - \mu_1(K - 1) \ge 1 - \mu_1(K).$$

Combining these observations we get

$$\rho_{err}(r^k) \le \frac{\sqrt{K}\, \left\|x - x_{opt}\right\|_2}{\sqrt{1 - \mu_1(K)}\; \left\|x_{opt} - x^k\right\|_2}.$$

Now from (20.27), $\rho(r^k) < 1$ whenever $\rho_{ERC}(r^k) + \rho_{err}(r^k) < 1$. Thus a sufficient condition for $\rho(r^k) < 1$ can be written as

$$\frac{\sqrt{K}\, \left\|x - x_{opt}\right\|_2}{\sqrt{1 - \mu_1(K)}\; \left\|x_{opt} - x^k\right\|_2} < 1 - \frac{\mu_1(K)}{1 - \mu_1(K)} = \frac{1 - 2\mu_1(K)}{1 - \mu_1(K)}.$$

We need to simplify this expression a bit. Multiplying by $\frac{(1 - \mu_1(K))\, \left\|x_{opt} - x^k\right\|_2}{1 - 2\mu_1(K)}$ on both sides we get

$$\left\|x_{opt} - x^k\right\|_2 > \frac{\sqrt{K\,(1 - \mu_1(K))}}{1 - 2\mu_1(K)}\; \left\|x - x_{opt}\right\|_2.$$

We assumed $\mu_1(K) < \frac{1}{2}$, thus $1 - 2\mu_1(K) > 0$, which validates the steps above. We recall that $\mathcal{D}_{opt}^H (x - x_{opt}) = 0$ and thus $x - x_{opt}$ and $x_{opt} - x^k$ are orthogonal to each other. Thus, by applying the Pythagorean theorem, we have

$$\left\|x - x^k\right\|_2^2 = \left\|x - x_{opt}\right\|_2^2 + \left\|x_{opt} - x^k\right\|_2^2.$$

Thus we have

$$\left\|x_{opt} - x^k\right\|_2^2 > \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}\, \left\|x - x_{opt}\right\|_2^2 \iff \left\|x - x^k\right\|_2^2 > \left(1 + \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}\right) \left\|x - x_{opt}\right\|_2^2.$$

This gives us a sufficient condition

$$\left\|x - x^k\right\|_2 > \sqrt{1 + \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}}\; \left\|x - x_{opt}\right\|_2. \qquad (20.28)$$

In other words, whenever (20.28) holds true, we have $\rho(r^k) < 1$, which leads to OMP making a correct choice and choosing an atom from the optimal set. Putting $x^k = \mathcal{D} a^k$ and $x_{opt} = \mathcal{D} b$ we get back (20.26), which is the desired result.
Theorem 20.19 establishes that OMP will keep recovering optimal atoms as long as (20.26) holds for each of the steps from 1 to $K$. What happens to the approximation error if (20.26) breaks down at some intermediate step? This is answered in the following corollary of Theorem 20.19.
Corollary 20.2 (An estimate for the worst case approximation error from OMP)

Assume that $\mu_1(K) < \frac{1}{2}$. Then after $K$ iterations OMP produces a $K$-term approximation $x^K = \mathcal{D} a^K$ which satisfies

$$\left\|x - x^K\right\|_2 \le \sqrt{1 + C(\mathcal{D}, K)}\; \left\|x - x_{opt}\right\|_2,$$

where $x_{opt}$ is the optimal $K$-term approximation of $x$ and

$$C(\mathcal{D}, K) = \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}.$$
Proof. Suppose that OMP runs fine for the first $k$ steps, where $k < K$; thus (20.26) keeps holding for the first $k$ steps. We now assume that (20.26) breaks at step $k + 1$ and OMP is no longer guaranteed to make an optimal choice of column from $\mathcal{D}_{opt}$. Thus at step $k + 1$ we have

$$\left\|x - x^k\right\|_2 \le \sqrt{1 + \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}}\; \left\|x - x_{opt}\right\|_2.$$

Any further iterations of OMP will only reduce the error further (although not in an optimal way). This gives us

$$\left\|x - x^K\right\|_2 \le \left\|x - x^k\right\|_2 \le \sqrt{1 + \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}}\; \left\|x - x_{opt}\right\|_2.$$

Choosing $C(\mathcal{D}, K) = \frac{K\,(1 - \mu_1(K))}{(1 - 2\mu_1(K))^2}$, we can rewrite this as

$$\left\|x - x^K\right\|_2 \le \sqrt{1 + C(\mathcal{D}, K)}\; \left\|x - x_{opt}\right\|_2.$$
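Given the `babel` helper sketched earlier, the worst-case inflation factor of Corollary 20.2 can be evaluated directly. Again an illustrative sketch, valid under the stated assumption $\mu_1(K) < \frac{1}{2}$:

```python
import numpy as np

def omp_error_inflation_factor(D, K):
    """sqrt(1 + C(D, K)) with C(D, K) = K (1 - mu_1(K)) / (1 - 2 mu_1(K))^2."""
    mu = babel(D, K)                   # Babel function of the dictionary at level K
    if mu >= 0.5:
        raise ValueError("Corollary 20.2 requires mu_1(K) < 1/2")
    C = K * (1 - mu) / (1 - 2 * mu) ** 2
    return np.sqrt(1 + C)
```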
This is a very useful result. It establishes that even if OMP is not able to recover the optimum $K$-term representation of $x$, its approximation error is at most a modest factor away from the optimum. If the optimum approximation error $\|x - x_{opt}\|_2$ is small, then $\|x - x^K\|_2$ will also not be too large. If $\|x - x_{opt}\|_2$ is moderate, then OMP may inflate the approximation error to a higher value. But in this case, sparse approximation is probably not the right tool for representing the signal over this dictionary.