21.3. Orthogonal Matching Pursuit#

In this section, we consider the application of Orthogonal Matching Pursuit (OMP) for the problem of recovering a signal from its compressive measurements. This section is largely based on [80].

Please review the notation of the compressed sensing process from Definition 18.24.

We will restrict our attention to the $N$-dimensional Euclidean space as our signal space for the moment.

  1. $x \in \mathbb{R}^N$ is a signal vector.

  2. $\Phi \in \mathbb{R}^{M \times N}$ is the real sensing matrix with $M \ll N$.

  3. The measurement vector $y \in \mathbb{R}^M$ is given by

     $$
     y = \Phi x
     $$

     where $\mathbb{R}^M$ is our measurement space.

  4. $y$ is known, $\Phi$ is known, while $x$ is unknown to the recovery algorithm.

  5. $x$ is assumed to be either $K$-sparse or $K$-compressible.

The sparse recovery problem can be written as

$$
\underset{x}{\text{minimize}} \; \| y - \Phi x \|_2 \quad \text{subject to} \quad \| x \|_0 \le K.
\tag{21.5}
$$

Though the problem looks similar to the (D,K)-SPARSE approximation problem, there are differences since $\Phi$ is not a dictionary (see Sensing Matrices). In particular, we don't require each column to have unit norm.

We will adapt the OMP algorithm studied in Orthogonal Matching Pursuit to the problem of sparse recovery in the compressed sensing framework. In the analysis of OMP for CS we will address the following questions:

  • How many measurements are required to recover $x$ from $y$ exactly if $x$ is $K$-sparse?

  • What kind of sensing matrices are admissible for OMP to work in the CS framework?

  • If $x$ is not $K$-sparse, then what is the maximum error incurred?

21.3.1. The Algorithm#

The OMP algorithm adapted to CS is presented in Algorithm 21.1.

Algorithm 21.1 (Orthogonal matching pursuit for sparse recovery from compressive measurements)

Inputs:

  • Sensing matrix $\Phi \in \mathbb{R}^{M \times N}$

  • Measurement vector $y \in \mathbb{R}^M$

  • Desired sparsity level $K$

Outputs:

  • A $K$-sparse estimate $\hat{x} \in \Sigma_K \subseteq \mathbb{R}^N$ for the ideal signal $x$

  • An index set $\Lambda^K \subset \{1, \dots, N\}$ identifying the support of $\hat{x}$

  • An approximation $y^K \in \mathbb{R}^M$ of $y$

  • A residual $r^K = y - y^K \in \mathbb{R}^M$

Initialization:

  1. $k \leftarrow 0$ # Iteration counter

  2. $x^0 \leftarrow 0$ # Initial estimate of $x \in \mathbb{R}^N$

  3. $y^0 \leftarrow \Phi x^0 = 0$ # Approximation of $y$

  4. $r^0 \leftarrow y - y^0 = y$ # Residual $r \in \mathbb{R}^M$

  5. $\Lambda^0 \leftarrow \emptyset$ # Solution support $\Lambda = \operatorname{supp}(\hat{x})$

Algorithm:

  1. If $k = K$ or $r^k = 0$: break.

  2. Increase the iteration count

     $$
     k \leftarrow k + 1.
     $$

  3. Sweep (find the column with the largest inner product)

     $$
     \lambda_k = \underset{1 \le j \le N}{\text{arg}\,\max} \; | \langle r^{k-1}, \phi_j \rangle |.
     $$

  4. Update support

     $$
     \Lambda^k \leftarrow \Lambda^{k-1} \cup \{ \lambda_k \}.
     $$

  5. Update provisional solution

     $$
     x^k \leftarrow \underset{x}{\text{arg}\,\min} \; \| \Phi x - y \|_2^2
     \quad \text{subject to} \quad \operatorname{supp}(x) = \Lambda^k.
     $$

  6. Update residual

     $$
     y^k \leftarrow \Phi x^k; \quad r^k \leftarrow y - y^k.
     $$

  7. Go to step 1.

Finalization:

  • $\hat{x} \leftarrow x^k$.
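To make the steps above concrete, here is a minimal NumPy sketch of Algorithm 21.1. It is illustrative rather than definitive: the function name `omp_recover` and the tolerance-based halting check are our own choices, and the provisional solution is computed with a plain call to `numpy.linalg.lstsq` rather than an optimized incremental solver.

```python
import numpy as np

def omp_recover(Phi, y, K, tol=1e-12):
    """Sketch of Algorithm 21.1: K-sparse estimate of x from y = Phi @ x."""
    M, N = Phi.shape
    x_hat = np.zeros(N)              # x^0 <- 0
    residual = y.copy()              # r^0 <- y - y^0 = y
    support = []                     # Lambda^0 <- empty set
    for _ in range(K):
        if np.linalg.norm(residual) <= tol:        # r^k = 0: halt early
            break
        # Sweep: column of Phi with the largest absolute inner product with r^{k-1}
        lam = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(lam)                        # Lambda^k <- Lambda^{k-1} U {lambda_k}
        # Provisional solution: least squares restricted to the selected columns
        z, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat[:] = 0.0
        x_hat[support] = z
        residual = y - Phi @ x_hat                 # r^k <- y - y^k
    return x_hat, np.array(support), y - residual, residual
```

The four return values mirror the outputs listed above: the estimate $\hat{x}$, the support $\Lambda^K$, the approximation $y^K$ and the residual $r^K$.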

Some remarks are in order.

  1. The algorithm returns a $K$-term approximation of $x$ given by $\hat{x}$.

  2. Each step of the algorithm is identified by the iteration counter $k$, which runs from 0 to $K$.

  3. At each step $x^k$, $y^k$ and $r^k$ are computed, where $x^k$ is the $k$-term estimate of $x$, $y^k$ is the corresponding measurement vector and $r^k$ is the residual between the actual measurement vector $y$ and the estimated measurement vector $y^k$.

  4. The support for $\hat{x}$ is maintained in an index set $\Lambda$.

  5. At each iteration we add one new index $\lambda_k$ to $\Lambda^{k-1}$, giving us $\Lambda^k$.

  6. We will use $\Phi_{\Lambda^k} \in \mathbb{R}^{M \times k}$ to denote the submatrix constructed from the columns indexed by $\Lambda^k$. In other words, if $\Lambda^k = \{ \lambda_1, \dots, \lambda_k \}$, then

     $$
     \Phi_{\Lambda^k} = \begin{bmatrix} \phi_{\lambda_1} & \dots & \phi_{\lambda_k} \end{bmatrix}.
     $$

  7. Similarly, we will write $x^k_{\Lambda^k} \in \mathbb{R}^k$ for the vector consisting of the (at most) $k$ non-zero entries of $x^k$, i.e. the entries indexed by $\Lambda^k$.

  8. We note that $r^k$ is orthogonal to the column space of $\Phi_{\Lambda^k}$. This is true because $x^k$ is the least squares solution in the provisional solution update step.

  9. This also ensures that in each iteration a new column from $\Phi$, indexed by $\lambda_k$, will be chosen. OMP will never choose the same column again.

  10. In case $x$ has a sparsity level less than $K$, the residual $r^k$ will become zero before $K$ iterations are completed. At that point we halt; there is no point in going further.

  11. An equivalent formulation of the least squares step is

      $$
      z \leftarrow \underset{v \in \mathbb{R}^k}{\text{arg}\,\min} \; \| \Phi_{\Lambda^k} v - y \|_2^2
      $$

      followed by

      $$
      x^k_{\Lambda^k} \leftarrow z.
      $$

  12. We solve the least squares problem for the columns of $\Phi$ indexed by $\Lambda^k$ and then assign the $k$ entries of the resultant $z$ to the entries of $x^k$ indexed by $\Lambda^k$, while keeping the other entries as 0.

  13. The least squares step can be accelerated by using $x^{k-1}$ as the starting estimate of $x^k$ and carrying out a descent method such as Richardson's iteration from there (a sketch follows below).
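The following is a minimal sketch of the warm-started idea in remark 13, using Richardson (Landweber-style) iterations for the restricted least squares problem. The function name, the step size choice and the fixed iteration count are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def ls_step_richardson(Phi_sub, y, z0, n_iter=20):
    """Approximately solve argmin_v || Phi_sub @ v - y ||_2 from the warm start z0.

    Richardson / Landweber iteration: v <- v + alpha * Phi_sub^T (y - Phi_sub v),
    which converges for 0 < alpha < 2 / sigma_max(Phi_sub)^2.
    """
    alpha = 1.0 / np.linalg.norm(Phi_sub, 2) ** 2   # conservative step size
    v = z0.copy()
    for _ in range(n_iter):
        v = v + alpha * (Phi_sub.T @ (y - Phi_sub @ v))
    return v
```

Inside the OMP loop, `z0` would be the previous solution $x^{k-1}_{\Lambda^{k-1}}$ padded with a zero for the newly added index $\lambda_k$.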

21.3.2. Exact Recovery using OMP with Random Sensing Matrices#

The objective of this subsection is to prove theoretically that OMP can recover sparse signals from a small set of random linear measurements. In particular, we discuss the conditions on the random sensing matrix $\Phi$ under which it is suitable for signal recovery through OMP.

Definition 21.1 (Admissible sensing matrix)

An admissible sensing matrix for $K$-sparse signals in $\mathbb{R}^N$ is an $M \times N$ random matrix $\Phi$ with the following properties.

  • [M0] Independence: The columns of $\Phi$ are stochastically independent.

  • [M1] Normalization: $\mathbb{E}\left( \| \phi_j \|_2^2 \right) = 1$ for $j = 1, \dots, N$.

  • [M2] Joint correlation:

    • Let $\{ u_k \}$ be a sequence of $K$ vectors whose $\ell_2$ norms do not exceed one.

    • Let $\phi$ be a column of $\Phi$ that is independent of this sequence.

    • Then

      $$
      \mathbb{P}\left( \max_k | \langle \phi, u_k \rangle | \le \epsilon \right) \ge 1 - 2 K \exp\left( - c \epsilon^2 M \right).
      $$

  • [M3] Smallest singular value: Given any $M \times K$ ($K < M$) submatrix $Z$ of $\Phi$, the smallest ($K$-th largest) singular value $\sigma_{\min}(Z)$ satisfies

    $$
    \mathbb{P}\left( \sigma_{\min}(Z) \ge 0.5 \right) \ge 1 - \exp(- c M).
    $$

Here $c > 0$ is some positive constant.

It can be shown that Rademacher sensing matrices (Rademacher Sensing Matrices) and Gaussian sensing matrices (Gaussian Sensing Matrices) satisfy all the requirements of admissible sensing matrices for sparse recovery using OMP. Some of the proofs are included in the book; you may want to review the corresponding sections.
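As a quick illustration of the normalization condition (M1): a Gaussian matrix with i.i.d. $\mathcal{N}(0, 1/M)$ entries and a Rademacher matrix with $\pm 1/\sqrt{M}$ entries both satisfy $\mathbb{E}(\| \phi_j \|_2^2) = 1$. The sketch below draws such matrices; the function names are our own.

```python
import numpy as np

def gaussian_sensing_matrix(M, N, rng=None):
    """Entries i.i.d. N(0, 1/M), so E(||phi_j||_2^2) = M * (1/M) = 1."""
    rng = np.random.default_rng(rng)
    return rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

def rademacher_sensing_matrix(M, N, rng=None):
    """Entries +/- 1/sqrt(M) with equal probability, so ||phi_j||_2^2 = 1 exactly."""
    rng = np.random.default_rng(rng)
    return rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
```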

Some remarks are in order to further explain the definition of admissible sensing matrices.

  1. Typically all the columns of a sensing matrix are drawn from the same distribution, but (M0) doesn't require this. It allows different columns of $\Phi$ to be drawn from different distributions.

  2. The joint correlation property (M2) depends on the decay of the random variables $\| \phi_j \|_2$; i.e. it needs the tails of $\| \phi_j \|_2$ to be small.

  3. A bound on the smallest (non-zero) singular value of $M \times K$ submatrices (M3) controls how much the sensing matrix can shrink $K$-sparse vectors.

  4. The notion of admissible matrices presumably came about as follows. First, OMP signal recovery guarantees were developed for Gaussian and Rademacher sensing matrices. Then the proofs were analyzed to identify the minimum requirements they imposed on the structure of the random sensing matrices. This was extracted in the form of the notion of admissible matrices. Finally, the proof was reorganized to work for all random matrices which satisfy the admissibility criteria. It is important to understand this process of abstraction; otherwise the idea of admissible matrices may appear to come out of the blue.

21.3.3. Signal Recovery Guarantees with OMP#

We now show that OMP can be used to recover the original signal with high probability if the random measurements are taken using an admissible sensing matrix as described above.

Here we consider the case where $x$ is known to be $K$-sparse.

Theorem 21.2

Fix some $\delta \in (0, 1)$, and choose $M \ge C K \ln\left( \frac{N}{\delta} \right)$ where $C$ is an absolute constant. Suppose that $x$ is an arbitrary $K$-sparse signal in $\mathbb{R}^N$ and draw an $M \times N$ admissible sensing matrix $\Phi$ independent of $x$.

Given the measurement vector $y = \Phi x \in \mathbb{R}^M$, Orthogonal Matching Pursuit can reconstruct the signal with probability exceeding $1 - \delta$.

Some remarks are in order. Specifically we compare OMP here with basis pursuit (BP).

  1. The theorem provides probabilistic guarantees.

  2. The theorem actually requires more measurements than the results for BP.

  3. The biggest advantage is that OMP is a much simpler algorithm than BP and works very fast.

  4. Results for BP show that a single random sensing matrix can be used for recovering all sparse signals.

  5. This theorem only says that a sparse signal drawn independently of the sensing matrix can be recovered (with high probability).

  6. Thus this theorem is weaker than the results for BP. It can be argued that for practical situations, this limitation doesn’t matter much.
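Since the guarantee in Theorem 21.2 is probabilistic, it is natural to probe it empirically. The sketch below estimates the success rate of OMP over random trials, reusing the hypothetical `omp_recover` and `gaussian_sensing_matrix` helpers sketched earlier; the problem sizes in the comment are arbitrary choices for illustration.

```python
import numpy as np

def empirical_success_rate(M, N, K, trials=200, rng=None):
    """Fraction of random trials in which OMP exactly recovers a K-sparse x."""
    rng = np.random.default_rng(rng)
    successes = 0
    for _ in range(trials):
        Phi = gaussian_sensing_matrix(M, N, rng)
        x = np.zeros(N)
        support = rng.choice(N, size=K, replace=False)
        x[support] = rng.normal(size=K)
        x_hat, _, _, _ = omp_recover(Phi, Phi @ x, K)
        successes += np.allclose(x_hat, x, atol=1e-8)
    return successes / trials

# Example (arbitrary sizes): 4-sparse signals in R^256 from 64 random measurements.
# print(empirical_success_rate(M=64, N=256, K=4))
```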

Proof. The main challenge here is to handle the issues that arise due to the random nature of $\Phi$. We start by setting up some notation for this proof.

  1. We note that the columns that OMP chooses do not depend on the order in which they are stacked in $\Phi$.

  2. Thus, without loss of generality, we can assume that the first $K$ entries of $x$ are non-zero and the rest are zero.

  3. If OMP picks the first $K$ columns, then OMP has succeeded; otherwise it has failed.

  4. With this, the support of $x$ is given by $\Lambda_{\text{opt}} = \{ 1, \dots, K \}$.

We now partition the sensing matrix $\Phi$ as

$$
\Phi = [ \Phi_{\text{opt}} \; | \; \Psi ]
$$

where $\Phi_{\text{opt}}$ consists of the first $K$ columns of $\Phi$ which correspond to $\Lambda_{\text{opt}}$, and $\Psi$ consists of the remaining $(N - K)$ columns of $\Phi$.

We recall from the proof of Theorem 20.17 that in order for OMP to make absolute progress at step $k + 1$ we require the greedy selection ratio $\rho(r^k) < 1$ where

$$
\rho(r^k) = \frac{\| \Psi^H r^k \|_{\infty}}{\| \Phi_{\text{opt}}^H r^k \|_{\infty}}
= \frac{\max_{\psi \in \Psi} | \langle \psi, r^k \rangle |}{\| \Phi_{\text{opt}}^H r^k \|_{\infty}}.
$$

The proof is organized as follows:

  1. We first construct a thought experiment in which $\Psi$ is not present and OMP is run only with $y$ and $\Phi_{\text{opt}}$.

  2. We then run OMP with $\Psi$ present under the condition $\rho(r^k) < 1$.

  3. We show that the sequence of columns chosen and the residuals obtained in both cases are exactly the same.

  4. We show that the residuals obtained in the thought experiment are stochastically independent of the columns of $\Psi$.

  5. We then describe the success of OMP as an event in terms of these residuals.

  6. We compute a lower bound on the probability of the event of OMP success.

For a moment suppose that there is no $\Psi$ and OMP is run with $y$ and $\Phi_{\text{opt}}$ as input for $K$ iterations. Naturally, OMP will choose the $K$ columns in $\Phi_{\text{opt}}$ one by one.

  1. Let the columns it picks in each step be indexed by $\omega_1, \omega_2, \dots, \omega_K$.

  2. Let the residuals before each step be $q^0, q^1, q^2, \dots, q^{K-1}$.

  3. Since $y \in \mathcal{C}(\Phi_{\text{opt}})$, the residual after $K$ iterations is $q^K = 0$.

  4. Since OMP is a deterministic algorithm, the two sequences are simply functions of $x$ and $\Phi_{\text{opt}}$.

  5. Clearly, we can say that the residuals $q^k$ are stochastically independent of the columns of $\Psi$ (since the columns of $\Psi$ are independent of the columns of $\Phi_{\text{opt}}$).

  6. We also know that $q^k \in \mathcal{C}(\Phi_{\text{opt}})$.

In this thought experiment we made no assumptions about $\rho(q^k)$ since $\Psi$ is not present. We now consider the full matrix $\Phi$ and execute OMP with $y$.

  1. The actual sequence of residuals before each step is $r^0, r^1, \dots, r^{K-1}$.

  2. The actual sequence of column indices is $\lambda_1, \dots, \lambda_K$.

  3. OMP succeeds in recovering $x$ in $K$ steps if and only if it selects the first $K$ columns of $\Phi$ in some order.

  4. This can happen if and only if $\rho(r^k) < 1$ holds for each step.

  5. We are going to show inductively that this can happen if and only if $\lambda_k = \omega_k$ and $q^k = r^k$.

  6. At the beginning of step 1, we have $r^0 = q^0 = y$.

  7. Now OMP selects one column from $\Phi_{\text{opt}}$ if and only if $\rho(r^0) < 1$, which is identical to $\rho(q^0) < 1$.

  8. So it remains to show at step 1 that $\lambda_1 = \omega_1$.

  9. Because $\rho(r^0) < 1$, the algorithm selects the index $\lambda_1$ of the column from $\Phi_{\text{opt}}$ whose inner product with $r^0$ is the largest (in absolute value).

  10. Also, since $\rho(q^0) < 1$ with $r^0 = q^0$, $\omega_1$ is the index of the column in $\Phi_{\text{opt}}$ whose inner product with $q^0$ is the largest.

  11. Thus $\omega_1 = \lambda_1$.

  12. We now assume that for the first $k$ iterations, the real OMP chooses the same columns as our imaginary thought experiment.

  13. Thus we have

      $$
      \lambda_j = \omega_j \quad \forall \; 1 \le j \le k
      $$

      and

      $$
      r^j = q^j \quad \forall \; 0 \le j \le k.
      $$

      This is valid since the residuals at each step depend solely on the set of columns chosen so far and the input $y$, which are the same for both cases.

  14. OMP chooses a column in $\Phi_{\text{opt}}$ at the $(k+1)$-th step if and only if $\rho(r^k) < 1$, which is the same as $\rho(q^k) < 1$.

  15. Moreover, since $r^k = q^k$, the column chosen by maximizing the inner product is the same in both situations.

  16. Thus

      $$
      \lambda_{k+1} = \omega_{k+1}.
      $$

  17. Therefore the criterion for the success of OMP can be stated as $\rho(q^k) < 1$ for all $0 \le k \le K - 1$.

We now recall that $q^k$ is actually a random vector (depending upon the random vectors which comprise the columns of $\Phi_{\text{opt}}$).

  1. Thus the event on which the algorithm succeeds in sparse recovery of $x$ from $y$ is given by

     $$
     E_{\text{succ}} \triangleq \left\{ \max_{0 \le k < K} \rho(q^k) < 1 \right\}.
     $$
  2. In a particular instance of OMP execution, if the event $E_{\text{succ}}$ happens, then OMP successfully recovers $x$ from $y$.

  3. Otherwise OMP fails.

  4. Hence the probability of success of OMP is the same as the probability of the event $E_{\text{succ}}$.

  5. We will be looking for a lower bound on $\mathbb{P}(E_{\text{succ}})$.

  6. We note that $\{ q^k \}$ is a sequence of random vectors in the column span of $\Phi_{\text{opt}}$ and they are stochastically independent of the columns of $\Psi$.

  7. It is difficult to compute $\mathbb{P}(E_{\text{succ}})$ directly.

  8. We consider another event

     $$
     \Gamma = \{ \sigma_{\min}(\Phi_{\text{opt}}) \ge 0.5 \}.
     $$

  9. Clearly

     $$
     \mathbb{P}(E_{\text{succ}}) \ge \mathbb{P}\left( \max_{0 \le k < K} \rho(q^k) < 1 \text{ and } \Gamma \right).
     $$

  10. Using conditional probability we can rewrite this as

      $$
      \mathbb{P}(E_{\text{succ}}) \ge \mathbb{P}\left( \max_{0 \le k < K} \rho(q^k) < 1 \;\middle|\; \Gamma \right) \mathbb{P}(\Gamma).
      $$

  11. Since $\Phi$ is an admissible matrix, it satisfies (M3), which gives us

      $$
      \mathbb{P}(\Gamma) \ge 1 - \exp(- c M).
      $$

  12. We just need a lower bound on the conditional probability.

  13. We assume that $\Gamma$ occurs.

  14. For each step index $k = 0, 1, \dots, K - 1$, we have

      $$
      \rho(q^k) = \frac{\max_{\psi} | \langle \psi, q^k \rangle |}{\| \Phi_{\text{opt}}^H q^k \|_{\infty}}.
      $$

  15. Since $\Phi_{\text{opt}}^H q^k \in \mathbb{R}^K$, we have

      $$
      \sqrt{K} \, \| \Phi_{\text{opt}}^H q^k \|_{\infty} \ge \| \Phi_{\text{opt}}^H q^k \|_2.
      $$

  16. This gives us

      $$
      \rho(q^k) \le \frac{\sqrt{K} \max_{\psi} | \langle \psi, q^k \rangle |}{\| \Phi_{\text{opt}}^H q^k \|_2}.
      $$

  17. To simplify this expression, we define a vector

      $$
      u^k \triangleq \frac{0.5 \, q^k}{\| \Phi_{\text{opt}}^H q^k \|_2}.
      $$

  18. This lets us write

      $$
      \rho(q^k) \le 2 \sqrt{K} \max_{\psi} | \langle \psi, u^k \rangle |.
      $$

  19. Thus

      $$
      \mathbb{P}\left( \rho(q^k) < 1 \mid \Gamma \right)
      \ge \mathbb{P}\left( 2 \sqrt{K} \max_{\psi} | \langle \psi, u^k \rangle | < 1 \mid \Gamma \right)
      = \mathbb{P}\left( \max_{\psi} | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \;\middle|\; \Gamma \right).
      $$
  20. From the basic properties of singular values we recall that

      $$
      \frac{\| \Phi_{\text{opt}}^H q \|_2}{\| q \|_2} \ge \sigma_{\min}(\Phi_{\text{opt}}) \ge 0.5
      $$

      for all vectors $q$ in the range of $\Phi_{\text{opt}}$.

  21. This gives us

      $$
      \frac{0.5 \, \| q \|_2}{\| \Phi_{\text{opt}}^H q \|_2} \le 1.
      $$

  22. Since $q^k$ is in the column space of $\Phi_{\text{opt}}$, for $u^k$ defined above we have

      $$
      \| u^k \|_2 \le 1.
      $$

      This is valid under the assumption that the event $\Gamma$ has happened.

  23. From the above we get

      $$
      \mathbb{P}\left( \max_k \rho(q^k) < 1 \mid \Gamma \right)
      \ge \mathbb{P}\left( \max_k \max_{\psi} | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \;\middle|\; \Gamma \right).
      $$

  24. On the R.H.S. we can exchange the order of the two maxima. This gives us

      $$
      \mathbb{P}\left( \max_k \max_{\psi} | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \mid \Gamma \right)
      = \mathbb{P}\left( \max_{\psi} \max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \mid \Gamma \right).
      $$
  25. We also note that the columns of $\Psi$ are independent.

  26. Thus in the above we require that $\max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}}$ holds independently for each column $\psi$ of $\Psi$.

  27. Hence we can say

      $$
      \mathbb{P}\left( \max_{\psi} \max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \mid \Gamma \right)
      = \prod_{\psi} \mathbb{P}\left( \max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \mid \Gamma \right).
      $$

  28. We recall that the event $\Gamma$ depends only on the columns of $\Phi_{\text{opt}}$.

  29. Hence the columns of $\Psi$ are independent of $\Gamma$.

  30. Thus

      $$
      \prod_{\psi} \mathbb{P}\left( \max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \mid \Gamma \right)
      = \prod_{\psi} \mathbb{P}\left( \max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \right).
      $$

  31. Since the sequence $\{ u^k \}$ depends only on the columns of $\Phi_{\text{opt}}$, the columns of $\Psi$ are independent of $\{ u^k \}$.

  32. Thus we can take the help of (M2) to get

      $$
      \prod_{\psi} \mathbb{P}\left( \max_k | \langle \psi, u^k \rangle | < \frac{1}{2 \sqrt{K}} \right)
      \ge \left( 1 - 2 K \exp\left( - \frac{c M}{4 K} \right) \right)^{N - K}.
      $$

  33. This gives us the lower bound

      $$
      \mathbb{P}\left( \max_k \rho(q^k) < 1 \mid \Gamma \right)
      \ge \left( 1 - 2 K \exp\left( - \frac{c M}{4 K} \right) \right)^{N - K}.
      $$

  34. Finally, plugging in the lower bound for $\mathbb{P}(\Gamma)$, we get

      $$
      \mathbb{P}(E_{\text{succ}})
      \ge \left( 1 - 2 K \exp\left( - \frac{c M}{4 K} \right) \right)^{N - K} \left( 1 - \exp(- c M) \right).
      \tag{21.6}
      $$

      All that remains now is to simplify this expression.

  35. We recall that we assumed in the theorem statement that $M \ge C K \ln\left( \frac{N}{\delta} \right)$. This gives us

      $$
      M \ge C K \ln\left( \frac{N}{\delta} \right)
      \iff \frac{M}{C K} \ge \ln\left( \frac{N}{\delta} \right)
      \iff \exp\left( \frac{M}{C K} \right) \ge \frac{N}{\delta}
      \iff \frac{\delta}{N} \ge \exp\left( - \frac{M}{C K} \right)
      \iff \delta \ge N \exp\left( - \frac{M}{C K} \right).
      $$
  36. But we assumed that $0 < \delta < 1$.

  37. Thus

      $$
      N \exp\left( - \frac{M}{C K} \right) < 1.
      $$

  38. If we choose $C \ge \frac{4}{c}$, then

      $$
      \frac{1}{C} \le \frac{c}{4}
      \implies \frac{M}{C K} \le \frac{c M}{4 K}
      \implies \exp\left( - \frac{M}{C K} \right) \ge \exp\left( - \frac{c M}{4 K} \right)
      \implies N \exp\left( - \frac{M}{C K} \right) \ge 2 K \exp\left( - \frac{c M}{4 K} \right)
      \implies 1 > \delta \ge 2 K \exp\left( - \frac{c M}{4 K} \right)
      \tag{21.7}
      $$

      where we assumed that $N \ge 2 K$.

  39. We recall that

      $$
      (1 - x)^k \ge 1 - k x \quad \text{if } k \ge 1 \text{ and } x \le 1.
      $$

  40. Applying this to (21.6) we get

      $$
      \mathbb{P}(E_{\text{succ}})
      \ge 1 - 2 K (N - K) \exp\left( - \frac{c M}{4 K} \right) - \exp(- c M).
      \tag{21.8}
      $$

      We ignored the 4-th (positive) term in this expansion.

  41. Now we can safely assume that $K (N - K) \le \frac{N^2}{4}$, giving us

      $$
      \mathbb{P}(E_{\text{succ}}) \ge 1 - \frac{N^2}{2} \exp\left( - \frac{c M}{4 K} \right) - \exp(- c M).
      $$

  42. If we choose $C \ge \frac{8}{c}$, then following (21.7) we can get

      $$
      N \exp\left( - \frac{M}{C K} \right) \ge N \exp\left( - \frac{c M}{8 K} \right)
      \implies \delta \ge N \exp\left( - \frac{c M}{8 K} \right)
      \implies \delta^2 \ge N^2 \exp\left( - \frac{c M}{4 K} \right)
      \implies \frac{\delta^2}{2} \ge \frac{N^2}{2} \exp\left( - \frac{c M}{4 K} \right)
      \implies 1 - \frac{\delta^2}{2} \le 1 - \frac{N^2}{2} \exp\left( - \frac{c M}{4 K} \right).
      $$

  43. Thus

      $$
      \mathbb{P}(E_{\text{succ}}) \ge 1 - \frac{\delta^2}{2} - \exp(- c M).
      $$

  44. Some further simplification can give us

      $$
      \mathbb{P}(E_{\text{succ}}) \ge 1 - \delta.
      $$

  45. Thus, with a suitable choice of the constant $C$, a choice of $M \ge C K \ln\left( \frac{N}{\delta} \right)$ with $\delta \in (0, 1)$ is sufficient to reduce the failure probability below $\delta$.

21.3.4. Analysis of OMP using Restricted Isometry Property#

In this subsection we present an alternative analysis of the OMP algorithm using the Restricted Isometry Property of the matrix $\Phi$ [25].

21.3.4.1. A re-look at the OMP Algorithm#

Before we get into the RIP based analysis of OMP, it is useful to gain some new insights into the behavior of the OMP algorithm. These insights will help considerably in the analysis later.

  1. We will assume throughout that whenever $|\Lambda| \le K$, $\Phi_{\Lambda}$ is full rank.

  2. The pseudo-inverse is given by

     $$
     \Phi_{\Lambda}^{\dagger} = (\Phi_{\Lambda}^H \Phi_{\Lambda})^{-1} \Phi_{\Lambda}^H.
     $$

  3. The orthogonal projection operator onto the column space of $\Phi_{\Lambda}$ is given by

     $$
     P_{\Lambda} = \Phi_{\Lambda} \Phi_{\Lambda}^{\dagger}.
     $$

  4. The orthogonal projection operator onto the orthogonal complement of $\mathcal{C}(\Phi_{\Lambda})$ (the column space of $\Phi_{\Lambda}$) is given by

     $$
     P_{\Lambda}^{\perp} = I - P_{\Lambda}.
     $$

  5. Both $P_{\Lambda}$ and $P_{\Lambda}^{\perp}$ satisfy the standard properties of orthogonal projectors, namely $P = P^H$ and $P^2 = P$.

  6. We further define

     $$
     \Psi_{\Lambda} = P_{\Lambda}^{\perp} \Phi.
     $$

  7. We are orthogonalizing the atoms in $\Phi$ against $\mathcal{C}(\Phi_{\Lambda})$, i.e. taking the component of each atom which is orthogonal to the column space of $\Phi_{\Lambda}$.

  8. The atoms in $\Psi_{\Lambda}$ corresponding to the index set $\Lambda$ are $0$.

  9. We will make some further observations on the behavior of the OMP algorithm [25].

  10. Recall that the approximation after the $k$-th iteration is given by

      $$
      x^k_{\Lambda^k} = \Phi_{\Lambda^k}^{\dagger} y \quad \text{and} \quad x^k_{{\Lambda^k}^c} = 0.
      $$

  11. The residual after the $k$-th iteration is given by

      $$
      r^k = y - \Phi x^k
      $$

      and by construction $r^k$ is orthogonal to the column space of $\Phi_{\Lambda^k}$.

  12. We can write

      $$
      \Phi x^k = \Phi_{\Lambda^k} x^k_{\Lambda^k} + \Phi_{{\Lambda^k}^c} x^k_{{\Lambda^k}^c} = \Phi_{\Lambda^k} x^k_{\Lambda^k}.
      $$

  13. Thus,

      $$
      r^k = y - \Phi_{\Lambda^k} x^k_{\Lambda^k}
      = y - \Phi_{\Lambda^k} \Phi_{\Lambda^k}^{\dagger} y
      = (I - P_{\Lambda^k}) y
      = P_{\Lambda^k}^{\perp} y.
      $$

  14. In summary

      $$
      r^k = P_{\Lambda^k}^{\perp} y.
      $$

  15. This shows that it is not actually necessary to compute $x^k$ in order to find $r^k$. An equivalent way of writing the OMP algorithm is given in Algorithm 21.2.

Algorithm 21.2 (Sketch of OMP without intermediate $x^k$ computation)

Algorithm:

  1. If the halting criterion is satisfied, then break.

  2. $h^{k+1} \leftarrow \Phi^H r^k$ # Match

  3. $\lambda_{k+1} \leftarrow \underset{j \notin \Lambda^k}{\text{arg}\,\max} \; | h^{k+1}_j |$ # Identify

  4. $\Lambda^{k+1} \leftarrow \Lambda^k \cup \{ \lambda_{k+1} \}$ # Update support

  5. $r^{k+1} \leftarrow P_{\Lambda^{k+1}}^{\perp} y$ # Update residual

  6. $k \leftarrow k + 1$.

  7. Go back to step 1.

Finalization:

  1. $\hat{x}_{\Lambda^k} \leftarrow \Phi_{\Lambda^k}^{\dagger} y$.

  2. $\hat{x}_{{\Lambda^k}^c} \leftarrow 0$.

  1. In the matching step, we are correlating $r^k$ with the columns of $\Phi$.

  2. Since $r^k$ is orthogonal to the column space of $\Phi_{\Lambda^k}$, this correlation is identical to correlating $r^k$ with $\Psi_{\Lambda^k}$.

  3. To see this, observe that

     $$
     r^k = P_{\Lambda^k}^{\perp} y = P_{\Lambda^k}^{\perp} P_{\Lambda^k}^{\perp} y = (P_{\Lambda^k}^{\perp})^H P_{\Lambda^k}^{\perp} y.
     $$

  4. Thus,

     $$
     h^{k+1} = \Phi^H r^k = \Phi^H (P_{\Lambda^k}^{\perp})^H P_{\Lambda^k}^{\perp} y
     = (P_{\Lambda^k}^{\perp} \Phi)^H P_{\Lambda^k}^{\perp} y = (\Psi_{\Lambda^k})^H r^k.
     $$

  5. Along similar lines, we can also see that

     $$
     h^{k+1} = \Phi^H r^k = \Phi^H P_{\Lambda^k}^{\perp} y
     = \Phi^H (P_{\Lambda^k}^{\perp})^H y = (\Psi_{\Lambda^k})^H y.
     $$

  6. In other words, we have

     $$
     h^{k+1} = (\Psi_{\Lambda^k})^H r^k = (\Psi_{\Lambda^k})^H y.
     \tag{21.9}
     $$

  7. Thus, we observe that OMP can be simplified further: we don't even need to compute $r^k$ in order to compute $h^{k+1}$.

  8. There is one catch though. If the halting criterion depends on the residual energy, then we certainly need to compute $r^k$. If the halting criterion is simply the completion of $K$ iterations, then we don't need to compute $r^k$.

  9. The revised OMP algorithm sketch is presented in Algorithm 21.3.

Algorithm 21.3 (Sketch of OMP without intermediate $x^k$ and $r^k$ computation)

Algorithm:

  1. If the halting criterion is satisfied, then break.

  2. $h^{k+1} \leftarrow (\Psi_{\Lambda^k})^H y$ # Match

  3. $\lambda_{k+1} \leftarrow \underset{i \notin \Lambda^k}{\text{arg}\,\max} \; | h^{k+1}_i |$ # Identify

  4. $\Lambda^{k+1} \leftarrow \Lambda^k \cup \{ \lambda_{k+1} \}$ # Update support

  5. $k \leftarrow k + 1$

  6. Go back to step 1.

Finalization:

  1. $\hat{x}_{\Lambda^k} \leftarrow \Phi_{\Lambda^k}^{\dagger} y$.

  2. $\hat{x}_{{\Lambda^k}^c} \leftarrow 0$.
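A small numerical check of the identities above may be helpful. The sketch below forms $P_{\Lambda}$, $P_{\Lambda}^{\perp}$ and $\Psi_{\Lambda}$ explicitly via a pseudo-inverse and verifies that the projector based residual and matching vector agree with their definitions. Forming these matrices explicitly is purely for illustration; it is not how one would implement OMP efficiently.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 20, 50
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
y = Phi @ np.concatenate([rng.normal(size=5), np.zeros(N - 5)])  # y = Phi x, x 5-sparse

Lam = [7, 2, 11]                                  # some index set Lambda^k
Phi_L = Phi[:, Lam]
P = Phi_L @ np.linalg.pinv(Phi_L)                 # P_Lambda = Phi_Lambda Phi_Lambda^dagger
P_perp = np.eye(M) - P                            # P_Lambda^perp = I - P_Lambda
Psi = P_perp @ Phi                                # Psi_Lambda = P_Lambda^perp Phi

# Projector properties: P = P^H and P^2 = P (up to round-off)
assert np.allclose(P, P.T) and np.allclose(P @ P, P)

# Residual via least squares vs. residual via the projector: r^k = P_Lambda^perp y
z, *_ = np.linalg.lstsq(Phi_L, y, rcond=None)
assert np.allclose(y - Phi_L @ z, P_perp @ y)

# Matching vector: Phi^H r^k = Psi_Lambda^H r^k = Psi_Lambda^H y, and zero over Lambda
h = Phi.T @ (P_perp @ y)
assert np.allclose(h, Psi.T @ (P_perp @ y)) and np.allclose(h, Psi.T @ y)
assert np.allclose(h[Lam], 0.0)
```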

With this the OMP algorithm is considerably simplified from the perspective of analyzing its recovery guarantees.

  1. Coming back to $h^{k+1}$, note that the columns of $\Psi_{\Lambda^k}$ indexed by $\Lambda^k$ are all zero.

  2. Thus

     $$
     h^{k+1}_j = 0 \quad \forall \; j \in \Lambda^k.
     $$

  3. This makes it obvious that $\lambda_{k+1} \notin \Lambda^k$ and consequently $|\Lambda^k| = k$ (inductively).

  4. Lastly, for the case of the noise free model $y = \Phi x$, we may write

     $$
     r^k = P_{\Lambda^k}^{\perp} y = P_{\Lambda^k}^{\perp} \Phi x = \Psi_{\Lambda^k} x.
     $$

  5. Since the columns of $\Psi_{\Lambda^k}$ indexed by $\Lambda^k$ are $0$, whenever $\operatorname{supp}(x) \subseteq \Lambda^k$, we have $r^k = 0$.

  6. In this case $x^k = x$ exactly, since it is a least squares estimate over $\Phi_{\Lambda^k}$.

  7. For the same reason, if we construct a vector $\tilde{x}^k$ by zeroing out the entries of $x$ indexed by $\Lambda^k$, i.e.

     $$
     \tilde{x}^k_{\Lambda^k} = 0 \quad \text{and} \quad \tilde{x}^k_{{\Lambda^k}^c} = x_{{\Lambda^k}^c},
     \tag{21.10}
     $$

     then

     $$
     r^k = \Psi_{\Lambda^k} \tilde{x}^k.
     \tag{21.11}
     $$

  8. If $\| x \|_0 = K$, then $\| \tilde{x}^k \|_0 = K - k$.

  9. Lastly, putting $r^k$ back in (21.9), we obtain

     $$
     h^{k+1} = (\Psi_{\Lambda^k})^H \Psi_{\Lambda^k} \tilde{x}^k.
     \tag{21.12}
     $$

  10. In this version, we see that $h^{k+1}$ is computed by applying the matrix $(\Psi_{\Lambda^k})^H \Psi_{\Lambda^k}$ to the $(K - k)$-sparse vector $\tilde{x}^k$.

  11. We are now ready to carry out the RIP based analysis of OMP.

21.3.4.2. RIP based Analysis of OMP#

Our analysis here will focus on the case of real signals and real matrices, i.e. $\Phi \in \mathbb{R}^{M \times N}$ and $x \in \mathbb{R}^N$. We will attack the noise free case.

Some results for matrices that satisfy the RIP will be useful in the upcoming analysis. Please refer to Restricted Isometry Property for an extensive treatment of RIP based results.

Theorem 18.64 applies to the approximate preservation of the inner product of sparse signals $u, v \in \mathbb{R}^N$.

Let $u, v \in \mathbb{R}^N$ and $K \ge \max(\| u + v \|_0, \| u - v \|_0)$. Then

$$
| \langle \Phi u, \Phi v \rangle - \langle u, v \rangle | \le \delta_K \| u \|_2 \| v \|_2.
\tag{21.13}
$$

Theorem 18.66 shows that the matrix $\Psi_{\Lambda}$ also satisfies a modified version of the RIP. Let $|\Lambda| < K$. Then

$$
\left( 1 - \frac{\delta_K}{1 - \delta_K} \right) \| y \|_2^2 \le \| \Psi_{\Lambda} y \|_2^2 \le (1 + \delta_K) \| y \|_2^2
\tag{21.14}
$$

whenever $\| y \|_0 \le K - |\Lambda|$ and $\operatorname{supp}(y) \cap \Lambda = \emptyset$.

In other words, if $\Phi$ satisfies the RIP of order $K$, then $\Psi_{\Lambda}$ acts as an approximate isometry on every $(K - |\Lambda|)$-sparse vector supported on $\Lambda^c$.

From (21.11), recall that the residual vector $r^k$ is formed by applying $\Psi_{\Lambda^k}$ to $\tilde{x}^k$, which is a $(K - k)$-sparse vector supported on ${\Lambda^k}^c$.

Our interest is in combining the above two results to obtain a bound on the inner products $h^{k+1}_j$. Exactly what kind of bound? Once $\Lambda^k$ has been identified, our interest is in ensuring that the next index is chosen from the set $\operatorname{supp}(x) \setminus \Lambda^k$. A useful way to ensure this is to verify whether the entries of $h^{k+1}$ are close to those of $\tilde{x}^k$. If they are, then they will be $0$ over $\Lambda^k$, relatively large over $\operatorname{supp}(x) \setminus \Lambda^k$, and very small over $\operatorname{supp}(x)^c$, which is what we want.

The next result develops these bounds around (21.12).

Lemma 21.1

Let $\Lambda \subset \{ 1, \dots, N \}$ and suppose $\tilde{x} \in \mathbb{R}^N$ with $\operatorname{supp}(\tilde{x}) \cap \Lambda = \emptyset$. Define

$$
h = \Psi_{\Lambda}^T \Psi_{\Lambda} \tilde{x}.
\tag{21.15}
$$

Then, if $\Phi$ satisfies the RIP of order $K \ge \| \tilde{x} \|_0 + |\Lambda| + 1$ with isometry constant $\delta_K$, we have

$$
| h_j - \tilde{x}_j | \le \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2 \quad \forall \; j \notin \Lambda.
\tag{21.16}
$$

Note that $|\Lambda|$ is the number of entries in the discovered part of the support at any iteration of OMP, and $\| \tilde{x} \|_0$ is the number of entries in the not yet discovered part of the support.

Proof.

  1. We have $|\Lambda| < K$ and $\| \tilde{x} \|_0 < K - |\Lambda|$.

  2. Thus, from (21.14), we obtain

     $$
     \left( 1 - \frac{\delta_K}{1 - \delta_K} \right) \| \tilde{x} \|_2^2 \le \| \Psi_{\Lambda} \tilde{x} \|_2^2 \le (1 + \delta_K) \| \tilde{x} \|_2^2.
     $$

  3. We can thus say that $\Psi_{\Lambda}$ satisfies a RIP of order

     $$
     (\| \tilde{x} \|_0 + |\Lambda| + 1) - |\Lambda| = \| \tilde{x} \|_0 + 1
     $$

     with a RIP constant $\frac{\delta_K}{1 - \delta_K}$.

  4. By the definition of $h$, we have

     $$
     h_j = \langle \Psi_{\Lambda} \tilde{x}, \Psi_{\Lambda} e_j \rangle
     $$

     where $h_j$ is the $j$-th entry of $h$ and $e_j$ denotes the $j$-th vector from the identity basis.

  5. We already know that $h_j = 0$ for all $j \in \Lambda$.

  6. Consider $j \notin \Lambda$ and take the two vectors $\tilde{x}$ and $e_j$.

  7. We can see that

     $$
     \| \tilde{x} \pm e_j \|_0 \le \| \tilde{x} \|_0 + 1
     $$

     and

     $$
     \operatorname{supp}(\tilde{x} \pm e_j) \cap \Lambda = \emptyset.
     $$

  8. Applying (21.13) on the two vectors with $\Psi_{\Lambda}$ as our RIP matrix, we see that

     $$
     | \langle \Psi_{\Lambda} \tilde{x}, \Psi_{\Lambda} e_j \rangle - \langle \tilde{x}, e_j \rangle |
     \le \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2 \| e_j \|_2.
     $$

  9. But

     $$
     | \langle \Psi_{\Lambda} \tilde{x}, \Psi_{\Lambda} e_j \rangle - \langle \tilde{x}, e_j \rangle | = | h_j - \tilde{x}_j |.
     $$

  10. Noting that $\| e_j \|_2 = 1$, we get our desired result.

With this bound in place, we can develop a sufficient condition under which the identification step of OMP (which identifies the new index $\lambda_{k+1}$) will succeed.

The following corollary establishes a lower bound on the largest entry of $\tilde{x}$ which ensures that OMP indeed chooses the next index from the support of $\tilde{x}$.

Corollary 21.1

Suppose that $\Lambda$, $\Phi$, $\tilde{x}$ meet the assumptions in Lemma 21.1, and let $h$ be as defined in (21.15). If

$$
\| \tilde{x} \|_{\infty} > \frac{2 \delta_K}{1 - \delta_K} \| \tilde{x} \|_2,
\tag{21.17}
$$

we are guaranteed that

$$
\underset{j \notin \Lambda}{\text{arg}\,\max} \; | h_j | \in \operatorname{supp}(\tilde{x}).
$$

Proof.

  1. If (21.16) is satisfied, then for indices $j \notin \operatorname{supp}(\tilde{x})$, we will have

     $$
     | h_j | \le \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2.
     $$

  2. We already know that $h_j = 0$ for all $j \in \Lambda$.

  3. If (21.17) is satisfied, then there exists $j \in \operatorname{supp}(\tilde{x})$ with

     $$
     | \tilde{x}_j | > \frac{2 \delta_K}{1 - \delta_K} \| \tilde{x} \|_2.
     $$

  4. For this particular $j$, applying the triangle inequality on (21.16),

     $$
     \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2 \ge | h_j - \tilde{x}_j | \ge | \tilde{x}_j | - | h_j |.
     $$

  5. Thus

     $$
     | h_j | \ge | \tilde{x}_j | - \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2
     > \frac{2 \delta_K}{1 - \delta_K} \| \tilde{x} \|_2 - \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2
     = \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2.
     $$

  6. We have established that there exists some $j \in \operatorname{supp}(\tilde{x})$ for which

     $$
     | h_j | > \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2
     $$

     and for every $j \notin \operatorname{supp}(\tilde{x})$

     $$
     | h_j | \le \frac{\delta_K}{1 - \delta_K} \| \tilde{x} \|_2.
     $$

  7. Together, these establish that OMP will indeed choose an index from the correct set.

All we need to do now is to make sure that (21.17) is satisfied by choosing $\delta_K$ small enough. The following result from [25] guarantees that.

Theorem 21.3

Suppose that $\Phi$ satisfies the RIP of order $K + 1$ with isometry constant $\delta < \frac{1}{2 \sqrt{K} + 1}$. Then for any $x \in \mathbb{R}^N$ with $\| x \|_0 \le K$, OMP will recover $x$ exactly from $y = \Phi x$ in $K$ iterations.

The upper bound on $\delta$ can be simplified to $\delta < \frac{1}{3 \sqrt{K}}$.
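As a quick numerical illustration (our own example, not from [25]): for $K = 25$,

$$
\frac{1}{2 \sqrt{K} + 1} = \frac{1}{11} \approx 0.091,
\qquad
\frac{1}{3 \sqrt{K}} = \frac{1}{15} \approx 0.067,
$$

so requiring $\delta < \frac{1}{3 \sqrt{K}}$ is slightly more conservative but implies $\delta < \frac{1}{2 \sqrt{K} + 1}$ for every $K \ge 1$.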

Proof. The proof works by induction. We first show that, under the stated conditions, $\lambda_1 \in \operatorname{supp}(x)$. Then we show that whenever $\lambda_1, \dots, \lambda_k \in \operatorname{supp}(x)$, then $\lambda_{k+1} \in \operatorname{supp}(x)$ as well.

  1. For the first iteration, we have

     $$
     h^1 = \Phi^T \Phi x.
     $$

  2. Note that $\Psi_{\Lambda^0} = \Phi$ and $\tilde{x}^0 = x$ (since $\Lambda^0 = \emptyset$), so this is consistent with (21.12).

  3. It is given that $\| x \|_0 \le K$.

  4. Thus, due to Theorem 18.17:

     $$
     \| x \|_{\infty} \ge \frac{\| x \|_2}{\sqrt{K}}.
     $$
  5. Now $\delta < \frac{1}{3 \sqrt{K}}$ or $\delta < \frac{1}{2 \sqrt{K} + 1}$ implies that

     $$
     \frac{2 \delta}{1 - \delta} < \frac{1}{\sqrt{K}}.
     \tag{21.18}
     $$

  6. This can be seen as follows. Assuming $K \ge 1$, we have:

     $$
     3 \sqrt{K} \ge 2 \sqrt{K} + 1
     \implies \frac{1}{3 \sqrt{K}} \le \frac{1}{2 \sqrt{K} + 1};
     \quad
     \delta < \frac{1}{2 \sqrt{K} + 1}
     \implies 2 \delta \sqrt{K} + \delta < 1
     \implies 2 \delta \sqrt{K} < 1 - \delta
     \implies \frac{2 \delta}{1 - \delta} < \frac{1}{\sqrt{K}}.
     $$

  7. Therefore

     $$
     \| x \|_{\infty} > \frac{2 \delta}{1 - \delta} \| x \|_2
     $$

     and (21.17) is satisfied; hence $\lambda_1$ will indeed be chosen from $\operatorname{supp}(x)$ due to Corollary 21.1.

  8. We now assume that OMP has correctly discovered the indices $\lambda_1, \dots, \lambda_k$, i.e.

     $$
     \Lambda^k \subseteq \operatorname{supp}(x).
     $$

  9. We have to show that it will also correctly discover $\lambda_{k+1}$.

  10. From the definition of $\tilde{x}^k$ in (21.10), we know that $\operatorname{supp}(\tilde{x}^k) \cap \Lambda^k = \emptyset$.

  11. Thus

      $$
      \| \tilde{x}^k \|_0 \le K - k.
      $$

  12. We also know that $|\Lambda^k| = k$. By assumption, $\Phi$ satisfies the RIP of order $K + 1 = (K - k) + k + 1$.

  13. Thus

      $$
      K + 1 \ge \| \tilde{x}^k \|_0 + |\Lambda^k| + 1.
      $$

  14. Also, due to Theorem 18.17:

      $$
      \| \tilde{x}^k \|_{\infty} \ge \frac{\| \tilde{x}^k \|_2}{\sqrt{K - k}} \ge \frac{\| \tilde{x}^k \|_2}{\sqrt{K}}.
      $$

  15. Using (21.18), we get

      $$
      \| \tilde{x}^k \|_{\infty} > \frac{2 \delta}{1 - \delta} \| \tilde{x}^k \|_2.
      $$

  16. This is the sufficient condition of Corollary 21.1 in (21.17), giving us

      $$
      \lambda_{k+1} = \underset{j \notin \Lambda^k}{\text{arg}\,\max} \; | h^{k+1}_j | \in \operatorname{supp}(\tilde{x}^k).
      $$

  17. Hence $\Lambda^{k+1} \subseteq \operatorname{supp}(x)$.