21.3. Orthogonal Matching Pursuit#
In this section, we consider the application of Orthogonal Matching Pursuit (OMP) for the problem of recovering a signal from its compressive measurements. This section is largely based on [80].
Please review the notation of the compressed sensing process from Definition 18.24.
We will restrict our attention to the real setting. $x \in \mathbb{R}^N$ is a signal vector. $\Phi \in \mathbb{R}^{M \times N}$ is the real sensing matrix with $M \ll N$. The measurement vector $y \in \mathbb{R}^M$ is given by

$$y = \Phi x$$

where $\mathbb{R}^M$ is our measurement space. $y$ is known, $\Phi$ is known, while $x$ is unknown to the recovery algorithm. $x$ is assumed to be either $K$-sparse or $K$-compressible.
The sparse recovery problem can be written as

$$\begin{aligned} & \underset{x}{\text{minimize}} & & \|x\|_0 \\ & \text{subject to} & & y = \Phi x. \end{aligned}$$

Though the problem looks similar to the sparse approximation problem studied in Orthogonal Matching Pursuit, the sensing matrix here is random. We will adapt the OMP algorithm studied there for the problem of sparse recovery in the compressed sensing framework. In the analysis of OMP for CS we will address the following questions:
- How many measurements are required to recover $x$ from $y$ exactly if $x$ is $K$-sparse?
- What kind of sensing matrices are admissible for OMP to work in the CS framework?
- If $x$ is not $K$-sparse, then how much maximum error is incurred?
21.3.1. The Algorithm#
The OMP algorithm adapted to CS is presented in Algorithm 21.1.
Algorithm 21.1 (Orthogonal matching pursuit for sparse recovery from compressive measurements)

Inputs:
- Sensing matrix $\Phi \in \mathbb{R}^{M \times N}$
- Measurement vector $y \in \mathbb{R}^M$
- Desired sparsity level $K$

Outputs:
- A $K$-sparse estimate $\hat{x} \in \mathbb{R}^N$ for the ideal signal $x$
- An index set $\Lambda^K \subset \{1, \dots, N\}$ identifying the support of $\hat{x}$
- An approximation $y^K$ of $y$
- A residual $r^K = y - y^K$

Initialization:
1. $k \leftarrow 0$ # Iteration counter
2. $x^0 \leftarrow 0$ # Initial estimate of $x$
3. $y^0 \leftarrow 0$ # Approximation of $y$
4. $r^0 \leftarrow y$ # Residual
5. $\Lambda^0 \leftarrow \emptyset$ # Solution support

Algorithm:
1. If $k = K$ or $r^k = 0$: break.
2. Increase the iteration count: $k \leftarrow k + 1$.
3. Sweep (find column with largest inner product): $\lambda^k \leftarrow \underset{1 \leq j \leq N}{\arg \max} \; |\langle r^{k-1}, \phi_j \rangle|$.
4. Update support: $\Lambda^k \leftarrow \Lambda^{k-1} \cup \{\lambda^k\}$.
5. Update provisional solution: $x^k \leftarrow \underset{x}{\arg \min} \; \|\Phi x - y\|_2^2$ subject to $\operatorname{supp}(x) \subseteq \Lambda^k$.
6. Update residual: $y^k \leftarrow \Phi x^k$; $r^k \leftarrow y - y^k$.
7. Go to step 1.

Finalization:
- $\hat{x} \leftarrow x^k$.
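To make the steps above concrete, here is a minimal NumPy sketch of Algorithm 21.1. It is an illustrative implementation under simple assumptions (a dense sensing matrix and a direct least squares solve for the provisional solution); the function name `omp_recover` is not from the text.

```python
import numpy as np

def omp_recover(Phi, y, K):
    """Sketch of Algorithm 21.1: OMP for recovery from compressive measurements.

    Phi : (M, N) sensing matrix
    y   : (M,) measurement vector
    K   : desired sparsity level
    Returns the K-sparse estimate x_hat, its support set and the final residual.
    """
    M, N = Phi.shape
    x_hat = np.zeros(N)        # x^0: initial estimate of x
    r = y.copy()               # r^0: initial residual
    support = []               # Lambda^0: empty support

    for _ in range(K):
        if np.allclose(r, 0):  # residual vanished early: sparsity of x was < K
            break
        # Sweep: find the column with the largest absolute inner product with r
        j = int(np.argmax(np.abs(Phi.T @ r)))
        support.append(j)
        # Update provisional solution: least squares over the chosen columns
        z, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[support] = z
        # Update residual: the part of y not explained by the chosen columns
        r = y - Phi @ x_hat
    return x_hat, support, r
```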
Some remarks are in order.

- The algorithm returns a $K$-term approximation of $x$ given by $\hat{x}$.
- Each step of the algorithm is identified by the iteration counter $k$, which runs from 0 to $K$.
- At each step $x^k$, $y^k$ and $r^k$ are computed, where $x^k$ is the $k$-term estimate of $x$, $y^k = \Phi x^k$ is the corresponding measurement vector and $r^k = y - y^k$ is the residual between the actual measurement vector $y$ and the estimated measurement vector $y^k$.
- The support for $x^k$ is maintained in an index set $\Lambda^k$.
- At each iteration we add one more new index $\lambda^k$ to $\Lambda^{k-1}$, giving us $\Lambda^k$.
- We will use $\Phi_{\Lambda^k}$ to denote the submatrix constructed from the columns of $\Phi$ indexed by $\Lambda^k$. In other words, if $\Lambda^k = \{\lambda^1, \dots, \lambda^k\}$, then $\Phi_{\Lambda^k} = \begin{bmatrix} \phi_{\lambda^1} & \dots & \phi_{\lambda^k} \end{bmatrix}$.
- Similarly, we will use $x^k_{\Lambda^k}$ to denote the vector consisting of only the non-zero entries of $x^k$, i.e. the entries indexed by $\Lambda^k$.
- We note that $r^k$ is orthogonal to the column space of $\Phi_{\Lambda^k}$. This is true due to $x^k$ being the least squares solution in the update provisional solution step.
- This also ensures that in each iteration a new column from $\Phi$, not already indexed by $\Lambda^{k-1}$, will be chosen. OMP will never choose the same column again.
- In case $x$ has a sparsity level less than $K$, then $r^k$ will become zero in the middle. At that point we halt; there is no point going forward.
- An equivalent formulation of the least squares step is $z^k \leftarrow \underset{z}{\arg \min} \; \|y - \Phi_{\Lambda^k} z\|_2$ followed by $x^k_{\Lambda^k} \leftarrow z^k$ (all other entries of $x^k$ remain 0). We solve the least squares problem for the columns of $\Phi$ indexed by $\Lambda^k$ and then assign the entries in the resultant $z^k$ to the entries in $x^k$ indexed by $\Lambda^k$ while keeping the other entries as 0.
- The least squares step can be accelerated by using $x^{k-1}$ as the starting estimate of $x^k$ and carrying out a descent, like Richardson's iteration, from there (a sketch follows below).
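The acceleration remark can be sketched as follows. Instead of solving the least squares problem from scratch in every iteration, an iterative solver can be warm-started from the previous solution. The snippet below is a minimal sketch of a Richardson-type descent on the normal equations; the function name `richardson_lsq`, the fixed step size and the fixed iteration count are assumptions made for this illustration.

```python
import numpy as np

def richardson_lsq(Phi_sub, y, z0, num_iter=50):
    """Approximately solve min_z ||y - Phi_sub z||_2 by Richardson iteration.

    Phi_sub : (M, k) submatrix Phi_Lambda of the currently chosen columns
    z0      : (k,) warm start (previous solution with a zero appended for
              the newly added column)
    """
    # Conservative step size 1 / sigma_max^2 keeps the iteration stable.
    alpha = 1.0 / np.linalg.norm(Phi_sub, 2) ** 2
    z = z0.copy()
    for _ in range(num_iter):
        # Gradient step on 0.5 * ||y - Phi_sub z||^2
        z = z + alpha * (Phi_sub.T @ (y - Phi_sub @ z))
    return z
```

In OMP, `z0` would be the previous $x^{k-1}_{\Lambda^{k-1}}$ with one extra zero entry for the index $\lambda^k$ chosen in the current iteration.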
21.3.2. Exact Recovery using OMP with Random Sensing Matrices#
The objective of this subsection is to prove theoretically that OMP can recover sparse signals from a small set of random linear measurements. In this subsection we discuss the conditions on the random sensing matrix $\Phi$ under which signal recovery through OMP succeeds with high probability.
Definition 21.1 (Admissible sensing matrix)
An admissible sensing matrix for $K$-sparse signals in $\mathbb{R}^N$ is an $M \times N$ random matrix $\Phi$ with the following properties.

- [M0] Independence: The columns of $\Phi$ are stochastically independent.
- [M1] Normalization: $\mathbb{E}\left(\|\phi_j\|_2^2\right) = 1$ for $j = 1, \dots, N$.
- [M2] Joint correlation:
  - Let $\{u^k\}$ be a sequence of $K$ vectors whose $\ell_2$ norms do not exceed one.
  - Let $\phi$ be a column of $\Phi$ that is independent from this sequence.
  - Then $\mathbb{P}\left(\max_k |\langle \phi, u^k \rangle| \leq \epsilon\right) \geq 1 - 2K e^{-c\epsilon^2 M}$.
- [M3] Smallest singular value: Given any $M \times K$ ($K < M$) submatrix $Z$ from $\Phi$, the smallest ($K$-th largest) singular value $\sigma_{\min}(Z)$ satisfies $\mathbb{P}\left(\sigma_{\min}(Z) \geq \frac{1}{2}\right) \geq 1 - e^{-cM}$.

Here $c > 0$ denotes an absolute constant.
It can be shown that Rademacher sensing matrices (Rademacher Sensing Matrices) and Gaussian sensing matrices (Gaussian Sensing Matrices) satisfy all the requirements of admissible sensing matrices for sparse recovery using OMP. Some of the proofs are included in the book. You may want to review the corresponding sections.
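For concreteness, the sketch below draws the two standard examples of admissible sensing matrices mentioned above. Scaling the entries by $1/\sqrt{M}$ gives columns whose expected squared norm equals one, in line with (M1); this is an illustrative construction, not code from the referenced sections.

```python
import numpy as np

def gaussian_sensing_matrix(M, N, rng=None):
    """Gaussian sensing matrix: i.i.d. N(0, 1/M) entries, so E ||phi_j||^2 = 1."""
    rng = np.random.default_rng(rng)
    return rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))

def rademacher_sensing_matrix(M, N, rng=None):
    """Rademacher sensing matrix: i.i.d. +/- 1/sqrt(M) entries, so ||phi_j||^2 = 1."""
    rng = np.random.default_rng(rng)
    return rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
```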
Some remarks are in order to further explain the definition of admissible sensing matrices.
- Typically all the columns of a sensing matrix are drawn from the same distribution, but (M0) doesn't require so. It allows different columns of $\Phi$ to be drawn from different distributions.
- The joint correlation property (M2) depends on the decay of the random variables $\langle \phi, u^k \rangle$; i.e. it needs the tails of these inner products to be small.
- A bound on the smallest (non-zero) singular value of $M \times K$ sub-matrices (M3) controls how much the sensing matrix can shrink $K$-sparse vectors.
- I guess that the idea of admissible matrices came as follows. First, OMP signal recovery guarantees were developed for Gaussian and Rademacher sensing matrices. Then the proofs were analyzed to identify the minimum requirements they imposed on the structure of the random sensing matrices. This was extracted in the form of the notion of admissible matrices. Finally, the proof was reorganized to work for all random matrices which satisfy the admissibility criteria. It is important to understand this process of abstraction; otherwise we just get surprised as to how ideas like admissible matrices came out of the blue.
21.3.3. Signal Recovery Guarantees with OMP#
We now show that OMP can be used to recover the original signal with high probability if the random measurements are taken using an admissible sensing matrix as described above.
Here we consider the case where $x$ is $K$-sparse.
Theorem 21.2
Fix some $\delta \in (0, 1)$ and choose $M \geq C K \ln\left(\frac{N}{\delta}\right)$ where $C$ is an absolute constant. Suppose that $x$ is an arbitrary $K$-sparse signal in $\mathbb{R}^N$ and draw an $M \times N$ admissible sensing matrix $\Phi$ independently from the signal.

Given the measurement vector $y = \Phi x \in \mathbb{R}^M$, OMP can reconstruct the signal $x$ with probability exceeding $1 - \delta$.
Some remarks are in order. Specifically, we compare OMP here with basis pursuit (BP).

- The theorem provides probabilistic guarantees.
- The theorem actually requires more measurements than the results for BP.
- The biggest advantage is that OMP is a much simpler algorithm than BP and runs very fast.
- Results for BP show that a single random sensing matrix can be used for recovering all sparse signals.
- This theorem only guarantees the recovery of a sparse signal that is chosen independently of the sensing matrix.
- Thus this theorem is weaker than the results for BP. It can be argued that for practical situations, this limitation doesn't matter much.
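The qualitative behavior promised by the theorem is easy to observe empirically: for a fixed sparsity level, the fraction of successfully recovered signals rises sharply towards one once $M$ is on the order of $K \ln N$. The sketch below is one way to run such a Monte Carlo experiment; it reuses the hypothetical `omp_recover` and `gaussian_sensing_matrix` helpers from the earlier sketches, and the chosen problem sizes are arbitrary.

```python
import numpy as np

def recovery_rate(M, N=256, K=8, trials=200, rng=None):
    """Empirical probability that OMP recovers a random K-sparse signal
    from M Gaussian measurements (uses omp_recover and
    gaussian_sensing_matrix from the earlier sketches)."""
    rng = np.random.default_rng(rng)
    successes = 0
    for _ in range(trials):
        Phi = gaussian_sensing_matrix(M, N, rng)
        x = np.zeros(N)
        support = rng.choice(N, size=K, replace=False)
        x[support] = rng.normal(size=K)      # random K-sparse signal
        x_hat, _, _ = omp_recover(Phi, Phi @ x, K)
        successes += np.allclose(x_hat, x, atol=1e-6)
    return successes / trials

# Example: recovery_rate(M=64) is typically close to 1 for K=8, N=256,
# while recovery_rate(M=24) is typically much smaller.
```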
Proof. The main challenge here is to handle the issues that arise due to the random nature of $\Phi$.
- We note that the columns that OMP chooses do not depend on the order in which they are stacked in $\Phi$.
- Thus, without loss of generality, we can assume that the first $K$ entries of $x$ are non-zero and the rest are zero.
- If OMP picks up the first $K$ columns, then OMP has succeeded; otherwise it has failed.
- With this, the support of $x$ is given by $\{1, \dots, K\}$.
We now partition the sensing matrix as

$$\Phi = \begin{bmatrix} \Phi_{\text{opt}} & \Psi \end{bmatrix}$$

where $\Phi_{\text{opt}} \in \mathbb{R}^{M \times K}$ consists of the first $K$ columns of $\Phi$ (the columns corresponding to the support of $x$) and $\Psi \in \mathbb{R}^{M \times (N - K)}$ consists of the remaining $N - K$ columns.

We recall from the proof of Theorem 20.17 that in order for OMP to make absolute progress at step $k + 1$ (i.e. to select a column from $\Phi_{\text{opt}}$), the greedy selection ratio for the current residual must satisfy

$$\rho(r^k) \triangleq \frac{\|\Psi^T r^k\|_\infty}{\|\Phi_{\text{opt}}^T r^k\|_\infty} = \frac{\max_j |\langle \psi_j, r^k \rangle|}{\|\Phi_{\text{opt}}^T r^k\|_\infty} < 1$$

(here $\psi_j$ denotes the $j$-th column of $\Psi$).
The proof is organized as follows:

- We first construct a thought experiment in which $\Psi$ is not present and OMP is run only with $\Phi_{\text{opt}}$ and $y$.
- We then run OMP with $\Psi$ present under the condition $\rho(r^k) < 1$.
- We show that the sequence of columns chosen and the residuals obtained in both cases are exactly the same.
- We show that the residuals obtained in the thought experiment are stochastically independent from the columns of $\Psi$.
- We then describe the success of OMP as an event in terms of these residuals.
- We compute a lower bound on the probability of the event of OMP success.
For a moment suppose that there was no $\Psi$ and OMP was run with $\Phi_{\text{opt}}$ and $y$ as inputs for $K$ iterations.

- Let the columns it picks up in each step be indexed by $\omega^1, \omega^2, \dots, \omega^K$.
- Let the residuals before each step be $q^0, q^1, \dots, q^{K-1}$.
- Since $y$ lies in the column span of $\Phi_{\text{opt}}$, the residual after $K$ iterations is $q^K = 0$.
- Since OMP is a deterministic algorithm, the two sequences are simply functions of $y$ and $\Phi_{\text{opt}}$.
- Clearly, we can say that the residuals $q^k$ are stochastically independent of the columns in $\Psi$ (since the columns of $\Phi_{\text{opt}}$ are independent of the columns of $\Psi$).
- We also know that each $q^k$ lies in the column span of $\Phi_{\text{opt}}$.
In this thought experiment we made no assumptions about $\Psi$. We now bring $\Psi$ back into the picture and consider the real execution of OMP with the full matrix $\Phi$.
- The actual sequence of residuals before each step is $r^0, r^1, \dots, r^{K-1}$.
- The actual sequence of column indices is $\lambda^1, \lambda^2, \dots, \lambda^K$.
- OMP succeeds in recovering $x$ in $K$ steps if and only if it selects the first $K$ columns of $\Phi$ in some order.
- This can happen if and only if $\rho(r^k) < 1$ holds for $k = 0, 1, \dots, K - 1$.
- We are going to show inductively that this can happen if and only if $r^k = q^k$ and $\lambda^{k+1} = \omega^{k+1}$ for every $k$.
- At the beginning of step 1, we have $r^0 = q^0 = y$.
- Now OMP selects a column from $\Phi_{\text{opt}}$ at step 1 if and only if $\rho(r^0) < 1$, which is identical to $\rho(q^0) < 1$.
- So it remains to show at step 1 that $\lambda^1 = \omega^1$.
- Because $\rho(q^0) < 1$, the algorithm selects the index of the column from $\Phi_{\text{opt}}$ whose inner product with $r^0$ is the largest (in absolute value).
- Also, since $q^0 = r^0$ with $\rho(q^0) < 1$, $\omega^1$ is the index of the column in $\Phi_{\text{opt}}$ whose inner product with $q^0$ is the largest.
- Thus $\lambda^1 = \omega^1$.
- We now assume that for the first $k$ iterations, real OMP chooses the same columns as our imaginary thought experiment.
- Thus we have $\lambda^j = \omega^j$ for $1 \leq j \leq k$ and $r^k = q^k$.
- This is valid since the residuals at each step depend solely on the set of columns chosen so far and the input $y$, which are the same for both cases.
- OMP chooses a column in $\Phi_{\text{opt}}$ at the $(k+1)$-th step if and only if $\rho(r^k) < 1$, which is the same as $\rho(q^k) < 1$.
- Moreover, since $r^k = q^k$, the column chosen by maximizing the inner product is the same in both situations.
- Thus $\lambda^{k+1} = \omega^{k+1}$.
- Therefore, the criteria for the success of OMP can be stated as $\rho(q^k) < 1$ for all $0 \leq k \leq K - 1$.
- We now recall that $\rho(q^k) = \dfrac{\|\Psi^T q^k\|_\infty}{\|\Phi_{\text{opt}}^T q^k\|_\infty} = \dfrac{\max_j |\langle \psi_j, q^k \rangle|}{\|\Phi_{\text{opt}}^T q^k\|_\infty}$.
- Thus the event on which the algorithm succeeds in sparse recovery of $x$ from $y$ is given by
  $$E \triangleq \left\{ \max_{0 \leq k < K} \rho(q^k) < 1 \right\}.$$
- In a particular instance of OMP execution, if the event $E$ happens, then OMP successfully recovers $x$ from $y$. Otherwise OMP fails.
- Hence the probability of success of OMP is the same as the probability of the event $E$.
- We will be looking for some sort of a lower bound on $\mathbb{P}(E)$.
- We note that we have $\{q^k\}$ as a sequence of random vectors in the column span of $\Phi_{\text{opt}}$, and they are stochastically independent from the columns of $\Psi$.
- It is difficult to compute $\mathbb{P}(E)$ directly. We consider another event
  $$F \triangleq \left\{ \sigma_{\min}(\Phi_{\text{opt}}) \geq \frac{1}{2} \right\}.$$
- Clearly $\mathbb{P}(E) \geq \mathbb{P}(E \cap F)$.
- Using conditional probability we can rewrite $\mathbb{P}(E \cap F) = \mathbb{P}(E \mid F)\,\mathbb{P}(F)$.
- Since $\Phi$ is an admissible matrix, it satisfies (M3), which gives us $\mathbb{P}(F) \geq 1 - e^{-cM}$.
- We just need a lower bound on the conditional probability $\mathbb{P}(E \mid F)$.
- We assume that the event $F$ occurs.
- For each step index $k$, we have $\rho(q^k) = \dfrac{\max_j |\langle \psi_j, q^k \rangle|}{\|\Phi_{\text{opt}}^T q^k\|_\infty}$.
- Since $\Phi_{\text{opt}}^T q^k$ is a $K$-dimensional vector, we have $\|\Phi_{\text{opt}}^T q^k\|_\infty \geq \dfrac{1}{\sqrt{K}}\|\Phi_{\text{opt}}^T q^k\|_2$.
- This gives us $\rho(q^k) \leq \dfrac{\sqrt{K}\,\max_j |\langle \psi_j, q^k \rangle|}{\|\Phi_{\text{opt}}^T q^k\|_2}$.
- To simplify this expression, we define a vector $u^k \triangleq \dfrac{0.5\, q^k}{\|\Phi_{\text{opt}}^T q^k\|_2}$.
- This lets us write $\dfrac{\max_j |\langle \psi_j, q^k \rangle|}{\|\Phi_{\text{opt}}^T q^k\|_2} = 2 \max_j |\langle \psi_j, u^k \rangle|$.
- Thus $\rho(q^k) \leq 2\sqrt{K}\,\max_j |\langle \psi_j, u^k \rangle|$.
- From the basic properties of singular values we recall that $\dfrac{\|\Phi_{\text{opt}}^T q\|_2}{\|q\|_2} \geq \sigma_{\min}(\Phi_{\text{opt}})$ for all vectors $q$ in the range of $\Phi_{\text{opt}}$.
- This gives us $\dfrac{\|q^k\|_2}{\|\Phi_{\text{opt}}^T q^k\|_2} \leq \dfrac{1}{\sigma_{\min}(\Phi_{\text{opt}})}$.
- Since $q^k$ is in the column space of $\Phi_{\text{opt}}$, for $u^k$ defined above we have $\|u^k\|_2 = \dfrac{0.5\,\|q^k\|_2}{\|\Phi_{\text{opt}}^T q^k\|_2} \leq \dfrac{0.5}{\sigma_{\min}(\Phi_{\text{opt}})} \leq 1$. This is valid under the assumption that the event $F$ has happened.
- From the above we get
  $$\mathbb{P}(E \mid F) \geq \mathbb{P}\left( \max_{0 \leq k < K} \max_j 2\sqrt{K}\, |\langle \psi_j, u^k \rangle| < 1 \;\Big|\; F \right) = \mathbb{P}\left( \max_k \max_j |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}} \;\Big|\; F \right).$$
- In the R.H.S. we can exchange the order of the two maxima. This gives us
  $$\mathbb{P}(E \mid F) \geq \mathbb{P}\left( \max_j \max_k |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}} \;\Big|\; F \right).$$
- We also note that the columns of $\Psi$ are independent.
- Thus in the above we require that for each column $\psi_j$ of $\Psi$, the bound $\max_k |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}}$ should hold independently.
- Hence we can say
  $$\mathbb{P}(E \mid F) \geq \prod_{j=1}^{N-K} \mathbb{P}\left( \max_k |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}} \;\Big|\; F \right).$$
- We recall that the event $F$ depends only on the columns of $\Phi_{\text{opt}}$. Hence the columns of $\Psi$ are independent of $F$.
- Thus
  $$\mathbb{P}\left( \max_k |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}} \;\Big|\; F \right) = \mathbb{P}\left( \max_k |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}} \right).$$
- Since the sequence $\{u^k\}$ depends only on the columns of $\Phi_{\text{opt}}$, the columns of $\Psi$ are independent of $\{u^k\}$.
- Thus we can take the help of (M2) to get
  $$\mathbb{P}\left( \max_k |\langle \psi_j, u^k \rangle| < \frac{1}{2\sqrt{K}} \right) \geq 1 - 2K \exp\left(-\frac{cM}{4K}\right).$$
- This gives us the lower bound
  $$\mathbb{P}(E \mid F) \geq \left( 1 - 2K \exp\left(-\frac{cM}{4K}\right) \right)^{N-K}.$$
- Finally, plugging in the lower bound for $\mathbb{P}(F)$ we get
  $$\mathbb{P}(E) \geq \left( 1 - 2K \exp\left(-\frac{cM}{4K}\right) \right)^{N-K} \left( 1 - e^{-cM} \right). \qquad (21.6)$$
- All that remains now is to simplify this expression.
- We recall that we assumed in the theorem statement that $M \geq C K \ln\left(\frac{N}{\delta}\right)$.
- But we assumed that $0 < \delta < 1$. Thus $\frac{\delta}{N} < 1$.
- If we choose the constant $C$ large enough so that $cC \geq 8$, then
  $$\exp\left(-\frac{cM}{4K}\right) \leq \exp\left(-\frac{cC}{4}\ln\frac{N}{\delta}\right) = \left(\frac{\delta}{N}\right)^{cC/4} \leq \left(\frac{\delta}{N}\right)^2 \qquad (21.7)$$
  where we assumed that $\frac{\delta}{N} < 1$, so that raising it to a higher power can only decrease it.
- We recall that $(1 - x)^n \geq 1 - nx$ for $n \geq 1$ and $x \leq 1$.
- Applying it on (21.6) we get
  $$\mathbb{P}(E) \geq 1 - 2K(N - K)\exp\left(-\frac{cM}{4K}\right) - e^{-cM}. \qquad (21.8)$$
  We ignored the 4-th term in this expansion (it is non-negative, so dropping it only weakens the lower bound).
- Now we can safely assume that $K(N - K) \leq \frac{N^2}{4}$, giving us
  $$\mathbb{P}(E) \geq 1 - \frac{N^2}{2}\exp\left(-\frac{cM}{4K}\right) - e^{-cM}.$$
- If we choose $C$ as above, then following (21.7) we can get
  $$\frac{N^2}{2}\exp\left(-\frac{cM}{4K}\right) \leq \frac{N^2}{2}\left(\frac{\delta}{N}\right)^2 = \frac{\delta^2}{2} \leq \frac{\delta}{2}.$$
- Thus $\mathbb{P}(E) \geq 1 - \frac{\delta}{2} - e^{-cM}$.
- Some further simplification can give us $e^{-cM} \leq \frac{\delta}{2}$ as well (for instance, $e^{-cM} \leq \left(\frac{\delta}{N}\right)^{cCK} \leq \frac{\delta}{N} \leq \frac{\delta}{2}$ once $cCK \geq 1$ and $N \geq 2$), so that $\mathbb{P}(E) \geq 1 - \delta$.
- Thus, with a suitable choice of the constant $C$, a choice of $M$ with $M \geq C K \ln\left(\frac{N}{\delta}\right)$ is sufficient to reduce the failure probability below $\delta$.
21.3.4. Analysis of OMP using Restricted Isometry Property#
In this subsection we present an alternative analysis of the OMP algorithm using the Restricted Isometry Property of the matrix $\Phi$.
21.3.4.1. A re-look at the OMP Algorithm#
Before we get into the RIP based analysis of OMP, it would be useful to gain some new insights into the behavior of the OMP algorithm. These insights will help us a lot in performing the analysis later.
- We will assume throughout that whenever $|\Lambda| \leq K$, then $\Phi_{\Lambda}$ is full rank (for the matrices of interest this follows from the RIP).
- The pseudo-inverse is given by $\Phi_{\Lambda}^{\dagger} = \left(\Phi_{\Lambda}^T \Phi_{\Lambda}\right)^{-1}\Phi_{\Lambda}^T$.
- The orthogonal projection operator onto the column space of $\Phi_{\Lambda}$ is given by $P_{\Lambda} = \Phi_{\Lambda}\Phi_{\Lambda}^{\dagger}$.
- The orthogonal projection operator onto the orthogonal complement of $\mathcal{C}(\Phi_{\Lambda})$ (the column space of $\Phi_{\Lambda}$) is given by $Q_{\Lambda} = I - P_{\Lambda}$.
- Both $P_{\Lambda}$ and $Q_{\Lambda}$ satisfy the standard properties of orthogonal projectors, like $P_{\Lambda}^T = P_{\Lambda}$ and $P_{\Lambda}^2 = P_{\Lambda}$.
- We further define $A^{\Lambda} \triangleq Q_{\Lambda}\Phi$.
- We are orthogonalizing the atoms in $\Phi$ against $\mathcal{C}(\Phi_{\Lambda})$, i.e. taking the component of each atom which is orthogonal to the column space of $\Phi_{\Lambda}$.
- The atoms in $A^{\Lambda}$ corresponding to the index set $\Lambda$ would be $0$.
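These operators are easy to form explicitly for small problems. The snippet below builds $P_{\Lambda}$, $Q_{\Lambda}$ and $A^{\Lambda}$ for a random matrix and checks the properties listed above (idempotency, symmetry, and the vanishing of the columns of $A^{\Lambda}$ indexed by $\Lambda$). It is only an illustrative check; the index set and matrix sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
Lambda = [3, 11, 27]                       # an arbitrary small index set

Phi_L = Phi[:, Lambda]                     # submatrix Phi_Lambda
P = Phi_L @ np.linalg.pinv(Phi_L)          # projector onto the column space of Phi_Lambda
Q = np.eye(M) - P                          # projector onto its orthogonal complement
A = Q @ Phi                                # A^Lambda: atoms orthogonalized against Phi_Lambda

assert np.allclose(P @ P, P) and np.allclose(P.T, P)   # idempotent and symmetric
assert np.allclose(Q @ Q, Q) and np.allclose(Q.T, Q)
assert np.allclose(A[:, Lambda], 0)                    # columns indexed by Lambda vanish
```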
We will make some further observations on the behavior of the OMP algorithm [25].

- Recall that the approximation after the $k$-th iteration is given by $y^k = \Phi x^k = \Phi_{\Lambda^k} x^k_{\Lambda^k}$.
- The residual after the $k$-th iteration is given by $r^k = y - y^k$, and by construction $r^k$ is orthogonal to the column space of $\Phi_{\Lambda^k}$.
- We can write $y^k = \Phi_{\Lambda^k}\Phi_{\Lambda^k}^{\dagger} y = P_{\Lambda^k} y$.
- Thus, $r^k = y - P_{\Lambda^k} y = Q_{\Lambda^k} y$.
- In summary, $y^k = P_{\Lambda^k} y$ and $r^k = Q_{\Lambda^k} y$.
- This shows that it is not actually necessary to compute $x^k$ in order to find $r^k$. An equivalent way of writing the OMP algorithm could be as in Algorithm 21.2.
Algorithm 21.2 (Sketch of OMP without the intermediate estimate $x^k$)

Algorithm:
1. If the halting criterion is satisfied, then break.
2. $h^{k+1} \leftarrow \Phi^T r^k$ # Match
3. $\lambda^{k+1} \leftarrow \underset{1 \leq j \leq N}{\arg \max}\; |h^{k+1}_j|$ # Identify
4. $\Lambda^{k+1} \leftarrow \Lambda^k \cup \{\lambda^{k+1}\}$ # Update support
5. $r^{k+1} \leftarrow Q_{\Lambda^{k+1}} y$ # Update residual
6. $k \leftarrow k + 1$. Go back to step 1.

Finalization:
- $\hat{x}_{\Lambda^k} \leftarrow \Phi_{\Lambda^k}^{\dagger} y$.
- $\hat{x}_j \leftarrow 0$ for $j \notin \Lambda^k$.
- In the matching step, we are correlating $r^k$ with the columns of $\Phi$.
- Since $r^k$ is orthogonal to the column space of $\Phi_{\Lambda^k}$, this correlation is identical to correlating $r^k$ with $A^{\Lambda^k} = Q_{\Lambda^k}\Phi$.
- To see this, observe that
  $$\langle \phi_j, r^k \rangle = \langle \phi_j, Q_{\Lambda^k} y \rangle = \langle \phi_j, Q_{\Lambda^k} Q_{\Lambda^k} y \rangle = \langle Q_{\Lambda^k}\phi_j, Q_{\Lambda^k} y \rangle = \langle Q_{\Lambda^k}\phi_j, r^k \rangle.$$
- Thus, $\Phi^T r^k = (A^{\Lambda^k})^T r^k$.
- On similar lines, we can also see that $(A^{\Lambda^k})^T r^k = (Q_{\Lambda^k}\Phi)^T Q_{\Lambda^k} y = \Phi^T Q_{\Lambda^k} y = (A^{\Lambda^k})^T y$.
- In other words, we have
  $$h^{k+1} = \Phi^T r^k = (A^{\Lambda^k})^T r^k = (A^{\Lambda^k})^T y. \qquad (21.9)$$
- Thus, we can observe that OMP can be further simplified and we don't even need to compute $r^k$ in order to compute $h^{k+1}$.
- There is one catch though. If the halting criterion depends on the residual energy, then we certainly need to compute $r^k$. If the halting criterion is simply the number of iterations, then we don't need to compute $r^k$.
- The revised OMP algorithm sketch is presented in Algorithm 21.3.
Algorithm 21.3 (Sketch of OMP without the intermediate estimate $x^k$ and residual $r^k$)

Algorithm:
1. If the halting criterion is satisfied, then break.
2. $h^{k+1} \leftarrow (A^{\Lambda^k})^T y$ # Match
3. $\lambda^{k+1} \leftarrow \underset{1 \leq j \leq N}{\arg \max}\; |h^{k+1}_j|$ # Identify
4. $\Lambda^{k+1} \leftarrow \Lambda^k \cup \{\lambda^{k+1}\}$ # Update support
5. $k \leftarrow k + 1$. Go back to step 1.

Finalization:
- $\hat{x}_{\Lambda^k} \leftarrow \Phi_{\Lambda^k}^{\dagger} y$.
- $\hat{x}_j \leftarrow 0$ for $j \notin \Lambda^k$.
With this the OMP algorithm is considerably simplified from the perspective of analyzing its recovery guarantees.
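A NumPy sketch of this simplified form is given below: the support is identified purely through the orthogonalized matrix $A^{\Lambda^k}$, and the signal estimate is computed only once at the end. The function name `omp_via_projections` and the use of a fixed number of iterations as the halting criterion are assumptions of this illustration.

```python
import numpy as np

def omp_via_projections(Phi, y, K):
    """Sketch of Algorithm 21.3: OMP without intermediate x^k or r^k."""
    M, N = Phi.shape
    support = []
    for _ in range(K):
        if support:
            Phi_L = Phi[:, support]
            P = Phi_L @ np.linalg.pinv(Phi_L)
            A = Phi - P @ Phi              # A^Lambda = (I - P_Lambda) Phi
        else:
            A = Phi                        # for an empty support, A^Lambda = Phi
        h = A.T @ y                        # match: h^{k+1} = (A^Lambda)^T y
        h[support] = 0.0                   # these entries are (numerically) zero anyway
        support.append(int(np.argmax(np.abs(h))))   # identify and update support
    # Finalization: a single least squares solve over the selected columns
    x_hat = np.zeros(N)
    x_hat[support] = np.linalg.pinv(Phi[:, support]) @ y
    return x_hat, support
```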
- Coming back to $A^{\Lambda^k}$, note that the columns of $A^{\Lambda^k}$ indexed by $\Lambda^k$ are all $0$s.
- Thus $h^{k+1}_j = 0$ for every $j \in \Lambda^k$.
- This makes it obvious that $\lambda^{k+1} \notin \Lambda^k$ and consequently $|\Lambda^k| = k$ (inductively).
- Lastly, for the case of the noise free model $y = \Phi x$, we may write
  $$r^k = Q_{\Lambda^k} y = Q_{\Lambda^k}\Phi x = A^{\Lambda^k} x.$$
- Since the columns of $A^{\Lambda^k}$ indexed by $\Lambda^k$ are $0$, hence when $\operatorname{supp}(x) \subseteq \Lambda^k$, then $r^k = 0$.
- In this case $x^k = x$ exactly, since it is a least squares estimate over $\Phi_{\Lambda^k}$ (which is full rank).
- For the same reason, if we construct a vector $\tilde{x}^k$ by zeroing out the entries indexed by $\Lambda^k$, i.e.
  $$\tilde{x}^k_j = \begin{cases} x_j & j \notin \Lambda^k \\ 0 & j \in \Lambda^k \end{cases} \qquad (21.10)$$
  then
  $$r^k = A^{\Lambda^k}\tilde{x}^k. \qquad (21.11)$$
- If $\tilde{x}^k = 0$ (i.e. $\operatorname{supp}(x) \subseteq \Lambda^k$), then $r^k = 0$.
- Lastly, putting $r^k$ back in (21.9), we obtain
  $$h^{k+1} = (A^{\Lambda^k})^T r^k = (A^{\Lambda^k})^T A^{\Lambda^k}\tilde{x}^k. \qquad (21.12)$$
- In this version, we see that $h^{k+1}$ is computed by applying the matrix $(A^{\Lambda^k})^T A^{\Lambda^k}$ to the sparse vector $\tilde{x}^k$.
- We are now ready to carry out the RIP based analysis of OMP.
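Before doing so, the identity $r^k = A^{\Lambda^k}\tilde{x}^k$ for the noise free model can be verified numerically with a quick check like the one below, which builds both sides directly for a random sparse signal and a partially discovered support (an illustrative check; sizes and indices are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 30, 80, 5
Phi = rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))
x = np.zeros(N)
supp = rng.choice(N, size=K, replace=False)
x[supp] = rng.normal(size=K)
y = Phi @ x                                # noise free measurements

Lambda = list(supp[:3])                    # pretend OMP has found 3 correct indices so far
Phi_L = Phi[:, Lambda]
Q = np.eye(M) - Phi_L @ np.linalg.pinv(Phi_L)
A = Q @ Phi                                # A^Lambda

x_tilde = x.copy()
x_tilde[Lambda] = 0.0                      # zero out the entries indexed by Lambda

r = Q @ y                                  # residual r^k = Q_Lambda y
assert np.allclose(r, A @ x_tilde)         # matches (21.11)
```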
21.3.4.2. RIP based Analysis of OMP#
Our analysis here will focus on the case of real signals and real matrices, i.e. $\Phi \in \mathbb{R}^{M \times N}$, $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$.
Some results for matrices that satisfy RIP will be useful in the upcoming analysis. Please refer to Restricted Isometry Property for an extensive treatment of RIP based results.
Theorem 18.64 applies to the approximate preservation of the inner product of sparse signals. Let $u, v \in \mathbb{R}^N$ with $\|u\|_0 + \|v\|_0 \leq K$. Then if $\Phi$ satisfies the RIP of order $K$ with isometry constant $\delta$, we have

$$|\langle \Phi u, \Phi v \rangle - \langle u, v \rangle| \leq \delta \|u\|_2 \|v\|_2. \qquad (21.13)$$

Theorem 18.66 shows that the matrix $A^{\Lambda} = Q_{\Lambda}\Phi$ inherits a modified form of the RIP from $\Phi$ whenever $|\Lambda| < K$. If $\Phi$ satisfies the RIP of order $K$ with isometry constant $\delta < 1$, then

$$\left(1 - \frac{\delta}{1 - \delta}\right)\|u\|_2^2 \leq \|A^{\Lambda} u\|_2^2 \leq (1 + \delta)\|u\|_2^2 \qquad (21.14)$$

holds for all $u \in \mathbb{R}^N$ with $\|u\|_0 \leq K - |\Lambda|$ and $\operatorname{supp}(u) \cap \Lambda = \emptyset$.

From (21.11) recall that the residual vector satisfies $r^k = A^{\Lambda^k}\tilde{x}^k$, where $\tilde{x}^k$ is a sparse vector supported outside $\Lambda^k$. Our interest is in combining the above two results to get some bound on the inner products $h^{k+1}_j = \langle \phi_j, r^k \rangle$. The next result develops these bounds around (21.12).
Lemma 21.1
Let $\Lambda \subset \{1, \dots, N\}$ and suppose $\tilde{x} \in \mathbb{R}^N$ with $\operatorname{supp}(\tilde{x}) \cap \Lambda = \emptyset$. Define

$$h = (A^{\Lambda})^T A^{\Lambda}\tilde{x}. \qquad (21.15)$$

Then if $\Phi$ satisfies the RIP of order $\|\tilde{x}\|_0 + |\Lambda| + 1$ with isometry constant $\delta$, we have

$$|h_j - \tilde{x}_j| \leq \frac{\delta}{1 - \delta}\|\tilde{x}\|_2 \quad \forall\; j \notin \Lambda. \qquad (21.16)$$

Note that $h$ is exactly the matching vector $h^{k+1}$ computed by OMP in (21.12) when $\Lambda = \Lambda^k$ and $\tilde{x} = \tilde{x}^k$.
Proof. We have $\operatorname{supp}(\tilde{x}) \cap \Lambda = \emptyset$ and $\|\tilde{x}\|_0 \leq (\|\tilde{x}\|_0 + |\Lambda| + 1) - |\Lambda|$.

- Thus, from (21.14), we obtain
  $$\left(1 - \frac{\delta}{1 - \delta}\right)\|u\|_2^2 \leq \|A^{\Lambda} u\|_2^2 \leq (1 + \delta)\|u\|_2^2$$
  for every $u$ with $\operatorname{supp}(u) \cap \Lambda = \emptyset$ and $\|u\|_0 \leq \|\tilde{x}\|_0 + 1$.
- We can make a statement saying that $A^{\Lambda}$ satisfies a RIP of order $\|\tilde{x}\|_0 + 1$ (restricted to vectors supported outside $\Lambda$) with a RIP constant $\frac{\delta}{1 - \delta}$ (note that $1 + \delta \leq 1 + \frac{\delta}{1 - \delta}$).
- By the definition of $h$, we have $h_j = \langle A^{\Lambda} e_j, A^{\Lambda}\tilde{x} \rangle$, where $h_j$ is the $j$-th entry in $h$ and $e_j$ denotes the $j$-th vector from the identity basis.
- We already know that $\tilde{x}_j = \langle e_j, \tilde{x} \rangle$ for all $j$.
- Consider some $j \notin \Lambda$ and take the two vectors $\frac{\tilde{x}}{\|\tilde{x}\|_2}$ and $e_j$.
- We can see that $\left\|\frac{\tilde{x}}{\|\tilde{x}\|_2}\right\|_2 = \|e_j\|_2 = 1$ and $\left\|\frac{\tilde{x}}{\|\tilde{x}\|_2}\right\|_0 + \|e_j\|_0 \leq \|\tilde{x}\|_0 + 1$, with both vectors supported outside $\Lambda$.
- Applying (21.13) on the two vectors with $A^{\Lambda}$ as our RIP matrix, we see that
  $$\left| \left\langle A^{\Lambda} e_j, A^{\Lambda}\frac{\tilde{x}}{\|\tilde{x}\|_2} \right\rangle - \left\langle e_j, \frac{\tilde{x}}{\|\tilde{x}\|_2} \right\rangle \right| \leq \frac{\delta}{1 - \delta}.$$
- But $\langle A^{\Lambda} e_j, A^{\Lambda}\tilde{x} \rangle = h_j$ and $\langle e_j, \tilde{x} \rangle = \tilde{x}_j$.
- Noting that the two inner products above are simply $h_j$ and $\tilde{x}_j$ scaled by $\frac{1}{\|\tilde{x}\|_2}$, multiplying both sides by $\|\tilde{x}\|_2$ we get our desired result.
With this bound in place, we can develop a sufficient condition under which the identification step of OMP (which identifies the new index $\lambda^{k+1}$) will pick an index from the support of $\tilde{x}$.

The following corollary establishes a lower bound on the largest entry of $h$ over the support of $\tilde{x}$, together with an upper bound on its entries outside the support.
Corollary 21.1
Suppose that $\Lambda$, $\Phi$, $\tilde{x}$ and $h$ meet the assumptions in Lemma 21.1. If

$$\|\tilde{x}\|_\infty > \frac{2\delta}{1 - \delta}\|\tilde{x}\|_2, \qquad (21.17)$$

we are guaranteed that $\underset{j \notin \Lambda}{\arg \max}\; |h_j| \in \operatorname{supp}(\tilde{x})$.
Proof. If (21.16) is satisfied, then for indices $j \notin \operatorname{supp}(\tilde{x}) \cup \Lambda$, we will have

$$|h_j| \leq \frac{\delta}{1 - \delta}\|\tilde{x}\|_2.$$

We already know that $\tilde{x}_j = 0$ for all $j \notin \operatorname{supp}(\tilde{x})$.

If (21.17) is satisfied, then there exists $j^* \in \operatorname{supp}(\tilde{x})$ with

$$|\tilde{x}_{j^*}| = \|\tilde{x}\|_\infty > \frac{2\delta}{1 - \delta}\|\tilde{x}\|_2.$$

For this particular $j^*$, applying the triangle inequality on (21.16),

$$|h_{j^*}| \geq |\tilde{x}_{j^*}| - \frac{\delta}{1 - \delta}\|\tilde{x}\|_2 > \frac{2\delta}{1 - \delta}\|\tilde{x}\|_2 - \frac{\delta}{1 - \delta}\|\tilde{x}\|_2 = \frac{\delta}{1 - \delta}\|\tilde{x}\|_2.$$

Thus $|h_{j^*}| > |h_j|$ for every $j \notin \operatorname{supp}(\tilde{x}) \cup \Lambda$.

We have established that there exists some $j^* \in \operatorname{supp}(\tilde{x})$ for which $|h_{j^*}| > \frac{\delta}{1 - \delta}\|\tilde{x}\|_2$, and for every $j \notin \operatorname{supp}(\tilde{x}) \cup \Lambda$ we have $|h_j| \leq \frac{\delta}{1 - \delta}\|\tilde{x}\|_2$.

Together, they establish that OMP will indeed choose an index from the correct set $\operatorname{supp}(\tilde{x})$.
All we need to do now is to make sure that (21.17) is satisfied by choosing $\delta_{K+1}$ (and hence the sensing matrix $\Phi$) appropriately. The following theorem gives such a condition.
Theorem 21.3
Suppose that $\Phi$ satisfies the RIP of order $K + 1$ with isometry constant $\delta_{K+1} < \frac{1}{3\sqrt{K}}$. Then for any $x \in \mathbb{R}^N$ with $\|x\|_0 \leq K$, OMP will recover $x$ exactly from the measurements $y = \Phi x$ in $K$ iterations.

The upper bound on $\delta_{K+1}$ is only a sufficient condition; it is not claimed to be the sharpest possible constant.
Proof. The proof works by induction. We show that under the stated conditions, OMP selects an index from the support of $x$ in every iteration.

- For the first iteration, we have $\Lambda^0 = \emptyset$, $\tilde{x}^0 = x$ and $h^1 = \Phi^T y = (A^{\Lambda^0})^T A^{\Lambda^0}\tilde{x}^0$ since $A^{\Lambda^0} = \Phi$.
- Note that $\|x\|_0 \leq K$.
- It is given that $\delta_{K+1} < \frac{1}{3\sqrt{K}}$.
- Thus due to Theorem 18.17: $\|x\|_\infty \geq \frac{\|x\|_2}{\sqrt{K}}$.
- Now $\delta_{K+1} < \frac{1}{3\sqrt{K}}$, or $3\sqrt{K}\,\delta_{K+1} < 1$, implies that
  $$\frac{2\delta_{K+1}}{1 - \delta_{K+1}} < \frac{1}{\sqrt{K}}. \qquad (21.18)$$
  This can be seen as follows. Assuming $K \geq 1$, we have:
  $$3\sqrt{K}\,\delta_{K+1} < 1 \implies 2\sqrt{K}\,\delta_{K+1} < 1 - \sqrt{K}\,\delta_{K+1} \leq 1 - \delta_{K+1}.$$
- Therefore $\|x\|_\infty \geq \frac{\|x\|_2}{\sqrt{K}} > \frac{2\delta_{K+1}}{1 - \delta_{K+1}}\|x\|_2$, so (21.17) is satisfied and $\lambda^1$ will indeed be chosen from $\operatorname{supp}(x)$ due to Corollary 21.1.
- We now assume that OMP has correctly discovered indices up to the $k$-th iteration, i.e. $\Lambda^k \subseteq \operatorname{supp}(x)$ with $|\Lambda^k| = k$.
- We have to show that it will also correctly discover $\lambda^{k+1}$.
- From the definition of $\tilde{x}^k$ in (21.10), we know that $\operatorname{supp}(\tilde{x}^k) = \operatorname{supp}(x) \setminus \Lambda^k$.
- Thus $\|\tilde{x}^k\|_0 + |\Lambda^k| + 1 \leq K + 1$.
- We also know that $h^{k+1} = (A^{\Lambda^k})^T A^{\Lambda^k}\tilde{x}^k$ from (21.12). By assumption, $\Phi$ satisfies the RIP of order $K + 1 \geq \|\tilde{x}^k\|_0 + |\Lambda^k| + 1$.
- Thus Lemma 21.1 and Corollary 21.1 apply to $h^{k+1}$ with $\Lambda = \Lambda^k$, $\tilde{x} = \tilde{x}^k$ and $\delta = \delta_{K+1}$ (the RIP constant at any order up to $K + 1$ is dominated by $\delta_{K+1}$).
- Also due to Theorem 18.17: $\|\tilde{x}^k\|_\infty \geq \frac{\|\tilde{x}^k\|_2}{\sqrt{K - k}} \geq \frac{\|\tilde{x}^k\|_2}{\sqrt{K}}$.
- Using (21.18), we get $\|\tilde{x}^k\|_\infty > \frac{2\delta_{K+1}}{1 - \delta_{K+1}}\|\tilde{x}^k\|_2$.
- This is the sufficient condition for Corollary 21.1 in (21.17), giving us $\lambda^{k+1} \in \operatorname{supp}(\tilde{x}^k) \subseteq \operatorname{supp}(x)$.
- Hence $\Lambda^{k+1} \subseteq \operatorname{supp}(x)$. Proceeding by induction, OMP identifies the complete support of $x$ within $K$ iterations, and the final least squares step then recovers $x$ exactly since $\Phi_{\operatorname{supp}(x)}$ is full rank.