18.7. Compressive Sensing#

In this section we formally define the problem of compressive sensing.

Compressive sensing refers to the idea that for sparse or compressible signals, a small number of nonadaptive measurements carry sufficient information to approximate the signal well. In the literature it is also known as compressed sensing and compressive sampling. Different authors seem to prefer different names.

In this section we will represent a signal dictionary as well as its synthesis matrix as D. We recall the definition of sparse signals from Definition 18.5.

A signal $x \in \mathbb{C}^N$ is $K$-sparse in $D$ if there exists a representation $a$ for $x$ which has at most $K$ non-zero entries; i.e.,

$$x = D a$$

and

$$\| a \|_0 \leq K.$$

The dictionary could be a standard basis, a Fourier basis, a wavelet basis, a wavelet packet dictionary, a multi-ONB or even a randomly generated dictionary.

Real-life signals are typically not exactly sparse, yet they are compressible in the sense that the entries of the signal decay rapidly when sorted by magnitude. As a result compressible signals are well approximated by sparse signals. Note that we are talking about the sparsity or compressibility of the signal in a suitable dictionary. Thus we mean that the signal x has a representation a in D in which the coefficients decay rapidly when sorted by magnitude.

18.7.1. Definition#

../_images/srr_cs.png

Definition 18.24 (Compressive sensing)

In compressive sensing, a measurement is a linear functional applied to a signal

$$y = \langle x, f \rangle.$$

The compressive sensor makes multiple such linear measurements. This can best be represented by the action of a sensing matrix $\Phi$ on the signal $x$ given by

$$y = \Phi x$$

where $\Phi \in \mathbb{C}^{M \times N}$ represents $M$ different measurements made on the signal $x$ by the sensing process. Each row of $\Phi$ represents one linear measurement.

The vector $y \in \mathbb{C}^M$ is known as the measurement vector.

$\mathbb{C}^N$ forms the signal space while $\mathbb{C}^M$ forms the measurement space. We also note that the above can be written as

$$y = \Phi x = \Phi D a = (\Phi D) a.$$

It is assumed that the signal $x$ is $K$-sparse or $K$-compressible in $D$ and $K \ll N$.

The objective is to recover x from y given that Φ and D are known.

We do this by first recovering the sparse representation $a$ from $y$ and then computing $x = D a$.

If $M \geq N$ then the problem is a straightforward least squares problem, so we don't consider it here.

The more interesting case is when $K < M \ll N$; i.e., the number of measurements is much less than the dimension of the ambient signal space, while being larger than the sparsity level $K$ of the signal.

We note that once $a$ has been found, finding $x$ is straightforward.
We can therefore remove the dictionary from consideration and look at the simplified problem:

Recover $x$ from $y$ with

$$y = \Phi x$$

where $x \in \mathbb{C}^N$ itself is assumed to be $K$-sparse or $K$-compressible and $\Phi \in \mathbb{C}^{M \times N}$ is the sensing matrix.

../_images/gaussian_matrix.png

Fig. 18.13 A Gaussian sensing matrix of shape $64 \times 256$. The sensing matrix maps signals from the space $\mathbb{R}^{256}$ to the measurement space $\mathbb{R}^{64}$.#
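
The following minimal sketch (an illustration, not the code used to produce the figures; it assumes NumPy and entries drawn from a Gaussian with variance $1/M$) constructs a sensing matrix of the same shape and applies it to a sparse signal.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 64, 256, 4

# Gaussian sensing matrix; the 1/sqrt(M) scaling keeps column norms near 1.
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

# A K-sparse signal: K randomly placed non-zero entries.
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.normal(0.0, 1.0, size=K)

# Compressive measurements: a dimensionality reduction from R^256 to R^64.
y = Phi @ x
print(y.shape)  # (64,)
```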

18.7.2. The Sensing Matrix#

There are two ways to look at the sensing matrix. The first view is in terms of its columns:

$$\Phi = \begin{bmatrix} \phi_1 & \phi_2 & \dots & \phi_N \end{bmatrix} \tag{18.34}$$

where $\phi_i \in \mathbb{C}^M$ are the columns of the sensing matrix. In this view we see that

$$y = \sum_{i=1}^{N} x_i \phi_i;$$

i.e., $y$ belongs to the column span of $\Phi$ and one representation of $y$ in $\Phi$ is given by $x$.

This view looks very similar to a dictionary and its atoms but there is a difference. In a dictionary, we require each atom to be unit norm. We don’t require columns of the sensing matrix Φ to be unit norm.

../_images/guassian_sensing_matrix_histogram.png

Fig. 18.14 A histogram of the norms of the columns of a Gaussian sensing matrix. Although the histogram is strongly concentrated around 1, there is still significant variation in norm, roughly between 0.75 and 1.3. Compare this with the two-ortho-basis type dictionaries, where by design all columns are unit norm.#
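
As a quick check, one can compute the column norms directly (a sketch assuming NumPy and the same $\mathcal{N}(0, 1/M)$ entry distribution as above):

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 64, 256
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

col_norms = np.linalg.norm(Phi, axis=0)
print(col_norms.min(), col_norms.max())  # spread around 1, not exactly unit norm
# Normalizing the columns would turn Phi into a dictionary-like matrix with unit-norm atoms.
```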

The second view of the sensing matrix $\Phi$ is in terms of its rows. We write

$$\Phi = \begin{bmatrix} f_1^H \\ f_2^H \\ \vdots \\ f_M^H \end{bmatrix} \tag{18.35}$$

where $f_i \in \mathbb{C}^N$ are the conjugate transposes of the rows of $\Phi$. This view gives us the following expression:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} f_1^H \\ f_2^H \\ \vdots \\ f_M^H \end{bmatrix} x = \begin{bmatrix} f_1^H x \\ f_2^H x \\ \vdots \\ f_M^H x \end{bmatrix} = \begin{bmatrix} \langle x, f_1 \rangle \\ \langle x, f_2 \rangle \\ \vdots \\ \langle x, f_M \rangle \end{bmatrix} \tag{18.36}$$

In this view $y_i$ is a measurement given by the inner product of $x$ with $f_i$ ($\langle x, f_i \rangle = f_i^H x$).

We will call $f_i$ a sensing vector. There are $M$ such sensing vectors in $\mathbb{C}^N$ comprising $\Phi$, corresponding to the $M$ measurements in the measurement space $\mathbb{C}^M$.

Definition 18.25 (Embedding of a signal)

Given a signal $x \in \mathbb{R}^N$, the vector $y = \Phi x \in \mathbb{R}^M$ is called an embedding of $x$ in the measurement space $\mathbb{R}^M$.

Definition 18.26 (Explanation of a measurement)

A signal $x \in \mathbb{R}^N$ is called an explanation of a measurement $y \in \mathbb{R}^M$ w.r.t. a sensing matrix $\Phi$ if $y = \Phi x$.

Definition 18.27 (Measurement noise)

In real life, the measurement is not error free. It is affected by the presence of noise during measurement. This is modeled by the following equation:

$$y = \Phi x + e$$

where $e \in \mathbb{R}^M$ is the measurement noise.

Example 18.27 (Compressive measurements of a synthetic sparse signal )

In this example we show:

  • A synthetic sparse signal

  • Its measurements using a Gaussian sensing matrix

  • Addition of noise during the measurement process.

../_images/k_sparse_biuniform_signal.png

Fig. 18.15 A 4-sparse signal of length 256. The signal is assumed to be sparse in itself. Thus, we don’t require an orthonormal basis or a dictionary in which the signal has a sparse representation. We have N=256 and K=4.#

../_images/measurement_vector_biuniform.png

Fig. 18.16 Measurements made by a Gaussian sensing matrix, $y = \Phi x$. The measurements look random; the original sparse structure is not apparent from visual inspection.#

../_images/measurement_vector_biuniform_noisy.png

Fig. 18.17 Addition of measurement noise at an SNR of 15dB.#
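
A minimal sketch of this measurement pipeline (assuming NumPy; the bi-uniform amplitude model and the SNR-based noise scaling are illustrative assumptions, not necessarily what was used to generate the figures):

```python
import numpy as np

rng = np.random.default_rng(7)
M, N, K, snr_db = 64, 256, 4, 15.0

Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

# Synthetic K-sparse signal with amplitudes drawn uniformly from +/-[1, 2].
x = np.zeros(N)
x[rng.choice(N, size=K, replace=False)] = (
    rng.uniform(1.0, 2.0, size=K) * rng.choice([-1.0, 1.0], size=K)
)

y_clean = Phi @ x

# Scale white Gaussian noise so that the measurement SNR is about 15 dB.
signal_power = np.mean(y_clean ** 2)
noise_power = signal_power / (10 ** (snr_db / 10))
e = rng.normal(0.0, np.sqrt(noise_power), size=M)

y = y_clean + e
print(10 * np.log10(signal_power / np.mean(e ** 2)))  # close to 15
```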

In Example 18.30, we show the reconstruction of the sparse signal from its measurements.

In the following we present examples of real life problems which can be modeled as compressive sensing problems.

18.7.3. Error Correction in Linear Codes#

The classical error correction problem was discussed in one of the seminal founding papers on compressive sensing [19].

Example 18.28 (Error correction in linear codes as a compressive sensing problem)

Let $f \in \mathbb{R}^N$ be a "plaintext" message being sent over a communication channel.

In order to make the message robust against errors in the communication channel, we encode it with an error correcting code.

We consider $A \in \mathbb{R}^{D \times N}$ with $D > N$ as a linear code. $A$ is essentially a collection of code words given by

$$A = \begin{bmatrix} a_1 & a_2 & \dots & a_N \end{bmatrix}$$

where $a_i \in \mathbb{R}^D$ are the code words.

We construct the "ciphertext"

$$x = A f$$

where $x \in \mathbb{R}^D$ is sent over the communication channel. Clearly $x$ is a redundant representation of $f$, which is expected to be robust against small errors during transmission.

$A$ is assumed to have full column rank. Thus $A^T A$ is invertible and we can easily see that

$$f = A^{\dagger} x$$

where

$$A^{\dagger} = (A^T A)^{-1} A^T$$

is the left pseudo-inverse of $A$.

The communication channel is going to add some error. What we actually receive is

$$y = x + e = A f + e$$

where $e \in \mathbb{R}^D$ is the error introduced by the channel.

The least squares solution obtained by minimizing the $\ell_2$ norm of the error is given by

$$f' = A^{\dagger} y = A^{\dagger}(A f + e) = f + A^{\dagger} e.$$

Since $A^{\dagger} e$ is usually non-zero (we cannot assume that $A^{\dagger}$ will annihilate $e$), $f'$ is not an exact replica of $f$.

What is needed is an exact reconstruction of $f$. To achieve this, a common assumption in the literature is that the error vector $e$ is in fact sparse; i.e.,

$$\| e \|_0 \leq K \ll D.$$

To reconstruct $f$ it is sufficient to reconstruct $e$, since once $e$ is known we can get

$$x = y - e$$

and from there $f$ can be faithfully reconstructed.

The question is: for a given sparsity level K for the error vector e can one reconstruct e via practical algorithms? By practical we mean algorithms which are of polynomial time w.r.t. the length of “ciphertext” (D).

The approach in [19] is as follows. We construct a matrix $F \in \mathbb{R}^{M \times D}$ which can annihilate $A$; i.e.,

$$F A = O.$$

We then apply $F$ to $y$, giving us

$$\tilde{y} = F y = F(A f + e) = F e.$$

Therefore the decoding problem is reduced to that of reconstructing a sparse vector $e \in \mathbb{R}^D$ from the measurements $F e \in \mathbb{R}^M$, where we would like to have $M \ll D$.

With this, the problem of finding $e$ can be cast as the problem of finding a sparse solution of the under-determined system given by

$$\underset{e \in \Sigma_K}{\text{minimize}} \; \| e \|_0 \quad \text{subject to } \tilde{y} = F e. \tag{18.37}$$
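
A small sketch of this setup (assuming NumPy; choosing $F$ as an orthonormal basis of the left null space of $A$, obtained from the SVD, is one illustrative construction and not necessarily the one used in [19]):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, K = 128, 64, 5

A = rng.normal(size=(D, N))        # code matrix (full column rank with probability 1)
f = rng.normal(size=N)             # plaintext message
e = np.zeros(D)                    # sparse channel error
e[rng.choice(D, size=K, replace=False)] = rng.normal(size=K)

y = A @ f + e                      # received ciphertext

# Rows of F span the left null space of A, so F A = 0. Here M = D - N.
U, _, _ = np.linalg.svd(A, full_matrices=True)
F = U[:, N:].T                     # shape (D - N, D)

print(np.max(np.abs(F @ A)))       # ~ 0 (numerically)
y_tilde = F @ y                    # equals F e: a compressive measurement of the sparse error
print(np.allclose(y_tilde, F @ e))
```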

This now becomes the compressive sensing problem. The natural questions are

  • How many measurements M are necessary (in F) to be able to recover e exactly?

  • How should F be constructed?

  • How do we recover $e$ from $\tilde{y}$?

These problems are addressed in the following chapters as we discuss sensing matrices and signal recovery algorithms.

18.7.4. Piecewise Cubic Polynomial Signal#

Example 18.29 (Piecewise cubic polynomial signal)

This example was discussed in [18]. Our signal of interest is a piecewise cubic polynomial signal as shown below.

../_images/signal.png

Fig. 18.18 A piecewise cubic polynomial signal#

It has a compressible representation in a wavelet basis.

../_images/representation.png

Fig. 18.19 Compressible representation of signal in wavelet basis#

The representation is described by the equation

$$x = \Psi \alpha.$$

The chosen basis $\Psi$ is a Daubechies wavelet basis.

../_images/dictionary.png

Fig. 18.20 Daubechies-8 wavelet basis#

In this example $N = 2048$. We have $x \in \mathbb{R}^N$. $\Psi$ is a complete dictionary of size $N \times N$. Thus we have $D = N$ and $\alpha \in \mathbb{R}^N$.

We can sort the wavelet coefficients by magnitude and plot them in descending order to visualize how sparse the representation is.

../_images/representation_sorted.png

Fig. 18.21 Wavelet coefficients sorted by magnitude#

Before making compressive measurements, we need to decide how many measurements will be sufficient.

Closely examining the coefficients in $\alpha$ we note that $\max(\alpha_i) = 78.0546$. Further, if we apply different thresholds to the magnitudes of the entries in $\alpha$, we can count the number of coefficients whose magnitude exceeds each threshold, as listed below.

Entries in the wavelet representation of the piecewise cubic polynomial signal higher than a threshold

| Threshold | Entries higher than threshold |
|-----------|-------------------------------|
| 1         | 129                           |
| 1E-1      | 173                           |
| 1E-2      | 186                           |
| 1E-4      | 197                           |
| 1E-8      | 199                           |
| 1E-12     | 200                           |

A choice of M=600 looks quite reasonable given the decay of entries in α. Later we shall provide theoretical bounds for choice of M.
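
A sketch of how such a count can be produced from a coefficient vector (assuming NumPy and a coefficient array `alpha` computed elsewhere, e.g. by a wavelet transform; the array below is synthetic and only for illustration):

```python
import numpy as np

# Stand-in for the wavelet coefficient vector of the signal.
rng = np.random.default_rng(3)
alpha = rng.standard_normal(2048) * np.exp(-np.linspace(0, 20, 2048))

thresholds = [1, 1e-1, 1e-2, 1e-4, 1e-8, 1e-12]
for t in thresholds:
    count = int(np.sum(np.abs(alpha) > t))
    print(f"threshold {t:g}: {count} entries")
```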

A Gaussian random sensing matrix Φ is used to generate the compressed measurements.

../_images/sensing_matrix.png

Fig. 18.22 Gaussian sensing matrix Φ#

The measurement process is described by the equation

$$y = \Phi x + e = \Phi \Psi \alpha + e$$

with $x \in \mathbb{R}^N$, $\Phi \in \mathbb{R}^{M \times N}$, and measurement vector $y \in \mathbb{R}^M$. For this example we chose the measurement noise to be $e = 0$.

The compressed measurements are shown below.

../_images/measurements.png

Fig. 18.23 Measurement vector y=Φx+e#

Finally the product of Φ and Ψ given by ΦΨ will be used for actual recovery of sparse representation α from the measurements y.

../_images/recovery_matrix.png

Fig. 18.24 Recovery matrix ΦΨ#

The sparse signal recovery problem is denoted as

$$\hat{\alpha} = \text{recovery}(\Phi \Psi, y, K),$$

where $\hat{\alpha}$ is a $K$-sparse approximation of $\alpha$.

18.7.5. Number of Measurements#

A fundamental question of compressive sensing framework is: How many measurements are necessary to acquire K-sparse signals? By necessary we mean that y carries enough information about x such that x can be recovered from y.

Clearly if M<K then recovery is not possible.

We further note that the sensing matrix $\Phi$ should not map two different $K$-sparse signals to the same measurement vector. Thus we will need $M \geq 2K$, and every collection of $2K$ columns in $\Phi$ must be non-singular.

If the $K$-column submatrices of $\Phi$ are badly conditioned, then it is possible that some sparse signals get mapped to very similar measurement vectors, and recovering the signal becomes numerically unstable. Moreover, if noise is present, stability degrades further.

In [20] Candès and Tao showed that the geometry of sparse signals should be preserved under the action of a sensing matrix. In particular, the distance between two sparse signals shouldn't change much during sensing.

They quantified this idea in the form of the restricted isometry constant of a matrix $\Phi$, defined as the smallest number $\delta_K$ for which the following holds:

$$(1 - \delta_K) \| x \|_2^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_K) \| x \|_2^2 \quad \forall x : \| x \|_0 \leq K. \tag{18.38}$$

We will study more about this property known as restricted isometry property (RIP) in Restricted Isometry Property. Here we just sketch the implications of RIP for compressive sensing.

When $\delta_K < 1$, the inequalities imply that every collection of $K$ columns from $\Phi$ is non-singular. Since we need every collection of $2K$ columns to be non-singular, we actually need $\delta_{2K} < 1$, which is the minimum requirement for the recovery of $K$-sparse signals.

Further, if $\delta_{2K} \ll 1$, then the sensing operator very nearly maintains the $\ell_2$ distance between any two $K$-sparse signals. As a consequence, it is possible to invert the sensing process stably.

It is now known that many randomly generated matrices have excellent RIP behavior. One can show that if $\delta_{2K} \leq 0.1$, then with

$$M = O(K \ln^a N)$$

measurements (for some small constant $a$), one can recover $x$ with high probability.

Some of the typical random matrices which have suitable RIP properties are

  • Gaussian sensing matrices

  • Partial Fourier matrices

  • Rademacher sensing matrices

18.7.6. Signal Recovery#

The second fundamental problem in compressive sensing is: given the compressive measurements $y$, how do we recover the signal $x$? This problem is known as the SPARSE-RECOVERY problem.

A simple formulation of the problem as

$$\underset{x}{\text{minimize}} \; \| x \|_0 \quad \text{subject to } y = \Phi x$$

is hopeless since it entails a combinatorial explosion of the search space.

Over the years, people have developed a number of algorithms to tackle the sparse recovery problem.

The algorithms can be broadly classified into the following categories

  • [Greedy pursuits] These algorithms attempt to build the approximation of the signal iteratively by making locally optimal choices at each step. Examples of such algorithms include OMP (orthogonal matching pursuit), stage-wise OMP, regularized OMP, CoSaMP (compressive sampling matching pursuit) and IHT (iterative hard thresholding).

  • [Convex relaxation] These techniques relax the $\ell_0$ "norm" minimization problem into a suitable convex optimization problem. This relaxation is valid for a large class of signals of interest. Once the problem has been formulated as a convex optimization problem, a number of solvers are available, e.g. interior point methods, projected gradient methods and iterative thresholding.

  • [Combinatorial algorithms] These methods are based on research in group testing and are specifically suited for situations where highly structured measurements of the signal are taken. This class includes algorithms like Fourier sampling, chaining pursuit, and HHS pursuit.

A major emphasis of these notes will be the study of these sparse recovery algorithms. We shall provide some basic results in this section. We shall work under the following framework in the remainder of this section.

  1. Let $x \in \mathbb{R}^N$ be our signal of interest, where $N$ is the number of signal components or the dimension of the signal space $\mathbb{R}^N$.

  2. Let us make $M$ linear measurements of the signal.

  3. The measurements are given by

    $$y = \Phi x.$$
  4. $y \in \mathbb{R}^M$ is our measurement vector in the measurement space $\mathbb{R}^M$ and $M$ is the dimension of our measurement space.

  5. $\Phi$ is an $M \times N$ matrix known as the sensing matrix.

  6. $M \ll N$, hence $\Phi$ achieves a dimensionality reduction over $x$.

  7. We assume that measurements are non-adaptive; i.e., the matrix $\Phi$ is predefined and doesn't depend on $x$.

  8. The recovery process is denoted by

    $$x' = \Delta(y) = \Delta(\Phi x)$$

    where $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ is a (usually nonlinear) recovery algorithm.

We will look at three kinds of situations:

  • Signals are truly sparse. A signal has up to $K$ ($K \ll N$) non-zero values only, where $K$ is known in advance. The measurement process is ideal and no noise is introduced during measurement. We will look for guarantees which can ensure exact recovery of the signal from $M$ ($K < M \ll N$) linear measurements.

  • Signals are not truly sparse but they have $K$ ($K \ll N$) values which dominate the signal. Thus if we approximate the signal by these $K$ values, then the approximation error is not noticeable. We again assume that there is no measurement noise being introduced. When we recover the signal, it will in general not be exact recovery. We expect the recovery error to be bounded (by the approximation error). Also, in special cases where the signal turns out to be $K$-sparse, we expect the recovery algorithm to recover the signal exactly. Such an algorithm with bounded recovery error will be called robust.

  • Signals are not sparse. Also there is measurement noise being introduced. We expect recovery algorithm to minimize error and thus perform stable recovery in the presence of measurement noise.

Example 18.30 (Reconstruction of the synthetic sparse signal from its noisy measurements)

Continuing from Example 18.27, we show the approximate reconstruction of the synthetic sparse signal using 3 different algorithms.

  • Basis pursuit ($\ell_1$ minimization) algorithm

  • Matching pursuit algorithm

  • Orthogonal matching pursuit algorithm

The reconstructions are not exact due to the presence of measurement noise.

../_images/cs_l_1_minimization_solution.png

Fig. 18.25 Reconstruction of the sparse signal using $\ell_1$ minimization (basis pursuit). If we compare with the original signal in Fig. 18.15, we can see that the algorithm has been able to recover the large nonzero entries quite closely. However, there is still some reconstruction error at indices outside the support of the original signal due to the presence of measurement noise.#

../_images/cs_matching_pursuit_solution.png

Fig. 18.26 Reconstruction of the sparse signal using matching pursuit. The support of the original signal has been identified correctly but matching pursuit hasn’t been able to get the magnitude of these components correctly. Some energy seems to have been transferred between the nearby second and third nonzero components. This is clear from the large magnitude in the recovery error at the second and third components. The presence of measurement noise affects simple algorithms like matching pursuit severely.#

../_images/cs_orthogonal_matching_pursuit_solution.png

Fig. 18.27 Reconstruction of the sparse signal using orthogonal matching pursuit. The support has been identified correctly and the magnitude of the nonzero components in the reconstructed signal is close to the original signal. The reconstruction error is primarily due to the presence of measurement noise. It is roughly evenly distributed between all the nonzero entries since orthogonal matching pursuit computes a least squares solution over the selected indices in the support.#
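
A compact orthogonal matching pursuit sketch (assuming NumPy; this is an illustrative implementation of the standard algorithm, not the exact code used to generate the figures):

```python
import numpy as np

def omp(Phi, y, K):
    """Recover a K-sparse approximation x with y ~ Phi x via orthogonal matching pursuit."""
    M, N = Phi.shape
    x = np.zeros(N)
    residual = y.copy()
    support = []
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(Phi.T @ residual)
        correlations[support] = 0.0           # don't pick an index twice
        support.append(int(np.argmax(correlations)))
        # Least squares fit over the selected columns.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x[support] = coef
    return x

# Usage on a small synthetic problem.
rng = np.random.default_rng(0)
M, N, K = 64, 256, 4
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x_true = np.zeros(N)
x_true[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
y = Phi @ x_true + 0.01 * rng.normal(size=M)   # noisy measurements

x_hat = omp(Phi, y, K)
print(np.linalg.norm(x_hat - x_true))
```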

18.7.7. Exact Recovery of Sparse Signals#

The null space of a matrix $\Phi$ is denoted as

$$\mathcal{N}(\Phi) = \{ v \in \mathbb{R}^N : \Phi v = 0 \}.$$

The set of $K$-sparse signals is defined as

$$\Sigma_K = \{ x \in \mathbb{R}^N : \| x \|_0 \leq K \}.$$

Example 18.31 (K sparse signals)

Let N=10.

  • $x = (1,2,1,1,2,3,4,2,2,2) \in \mathbb{R}^{10}$ is not a sparse signal.

  • $x = (0,0,0,0,1,0,0,1,0,0) \in \mathbb{R}^{10}$ is a 2-sparse signal. It is also a 4-sparse signal.

Lemma 18.2

If $a$ and $b$ are two $K$-sparse signals then $a - b$ is a $2K$-sparse signal.

Proof. $(a - b)_i$ is non-zero only if at least one of $a_i$ and $b_i$ is non-zero. Hence the number of non-zero components of $a - b$ cannot exceed $2K$. Hence $a - b$ is a $2K$-sparse signal.

Example 18.32 (Difference of K sparse signals)

Let $N = 5$.

  • Let $a = (0,1,1,0,0)$ and $b = (0,2,0,1,0)$. Then $a - b = (0,-1,1,-1,0)$ is a 3-sparse as well as a 4-sparse signal.

  • Let $a = (0,1,1,0,0)$ and $b = (0,2,-1,0,0)$. Then $a - b = (0,-1,2,0,0)$ is a 2-sparse as well as a 4-sparse signal.

  • Let $a = (0,1,1,0,0)$ and $b = (0,0,0,1,1)$. Then $a - b = (0,1,1,-1,-1)$ is a 4-sparse signal.

Definition 18.28 (Unique embedding of a set)

We say that a sensing matrix $\Phi$ uniquely embeds a set $\mathcal{C} \subseteq \mathbb{R}^N$ if for any $a, b \in \mathcal{C}$ with $a \neq b$, we have

$$\Phi a \neq \Phi b.$$

Theorem 18.38 (Unique embeddings of K sparse vectors)

A sensing matrix $\Phi$ uniquely embeds $\Sigma_K$ if and only if $\mathcal{N}(\Phi) \cap \Sigma_{2K} = \{ 0 \}$; i.e., $\mathcal{N}(\Phi)$ contains no nonzero vectors from $\Sigma_{2K}$.

Proof. We first show that the difference of two distinct sparse signals cannot be in the null space.

  1. Let $a$ and $b$ be two distinct $K$-sparse signals.

  2. Then $\Phi a$ and $\Phi b$ are the corresponding measurements.

  3. Now if $\Phi$ provides a unique embedding of all $K$-sparse signals, then $\Phi a \neq \Phi b$.

  4. Thus $\Phi (a - b) \neq 0$.

  5. Thus $a - b \notin \mathcal{N}(\Phi)$; i.e., no nonzero vector of $\Sigma_{2K}$ (every such vector is a difference of two distinct $K$-sparse signals) belongs to $\mathcal{N}(\Phi)$.

We show the converse by contradiction.

  1. Let $x \in \mathcal{N}(\Phi) \cap \Sigma_{2K}$ be nonzero.

  2. Thus $\Phi x = 0$ and $\| x \|_0 \leq 2K$.

  3. Then we can find $y, z \in \Sigma_K$ with $y \neq z$ such that $x = z - y$.

  4. Thus there exists $m \in \mathbb{R}^M$ such that $m = \Phi z = \Phi y$.

  5. But then $\Phi$ doesn't uniquely embed $y, z \in \Sigma_K$.

There are equivalent ways of characterizing this condition. In the following, we present a condition based on spark.

18.7.7.1. Spark#

We recall from Definition 18.17 that the spark of a matrix $\Phi$ is defined as the minimum number of columns which are linearly dependent.

Theorem 18.39 (Unique explanations and spark)

For any measurement $y \in \mathbb{R}^M$, there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$ if and only if $\text{spark}(\Phi) > 2K$.

Proof. We need to show

  • If for every measurement there is at most one $K$-sparse explanation, then $\text{spark}(\Phi) > 2K$.

  • If $\text{spark}(\Phi) > 2K$, then for every measurement there is at most one $K$-sparse explanation.

Assume that for every $y \in \mathbb{R}^M$ there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$.

  1. Now assume that $\text{spark}(\Phi) \leq 2K$.

  2. Thus there exists a set of at most $2K$ columns which are linearly dependent.

  3. Thus there exists a nonzero $v \in \Sigma_{2K}$ such that $\Phi v = 0$.

  4. Thus $v \in \mathcal{N}(\Phi)$.

  5. Thus $\mathcal{N}(\Phi)$ contains a nonzero vector from $\Sigma_{2K}$.

  6. Hence, by Theorem 18.38, $\Phi$ doesn't uniquely embed each signal $x \in \Sigma_K$. A contradiction.

  7. Hence $\text{spark}(\Phi) > 2K$.

Now suppose that spark(Φ)>2K.

  1. Assume that for some $y$ there exist two different $K$-sparse explanations $x, x'$ such that $y = \Phi x = \Phi x'$.

  2. Then $\Phi(x - x') = 0$.

  3. Thus $x - x' \in \mathcal{N}(\Phi)$ and $x - x' \in \Sigma_{2K}$ with $x - x' \neq 0$.

  4. Hence, there exists a set of at most $2K$ columns in $\Phi$ which is linearly dependent.

  5. Thus $\text{spark}(\Phi) \leq 2K$. A contradiction.

  6. Hence, for every $y \in \mathbb{R}^M$, there exists at most one $x \in \Sigma_K$ such that $y = \Phi x$.

Since $\text{spark}(\Phi) \in [2, M+1]$ and we require that $\text{spark}(\Phi) > 2K$, we must have $M \geq 2K$.
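
For small matrices, the spark can be computed by brute force (a sketch assuming NumPy; the cost grows combinatorially, so this is only feasible for tiny examples such as the matrix from Example 18.34):

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi (brute force)."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            sub = Phi[:, cols]
            if np.linalg.matrix_rank(sub, tol=tol) < k:
                return k
    return N + 1  # all columns independent (cannot happen when N > M)

Phi = np.array([[1.0, 0.0, 1.0, 1.0],
                [1.0, 2.0, 2.0, 3.0]])
print(spark(Phi))  # prints 3: any 3 columns of this rank-2 matrix are dependent
```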

18.7.8. Recovery of Approximately Sparse Signals#

Spark is a useful criterion for characterizing sensing matrices for truly sparse signals, but it doesn't work well for approximately sparse signals. We need more restrictive criteria on $\Phi$ to ensure recovery of approximately sparse signals from compressed measurements.

In this context we will deal with two types of errors:

Approximation error

  • Let us approximate a signal $x$ using only $K$ coefficients.

  • Let us call the approximation $\hat{x}$.

  • Thus $e_a = (x - \hat{x})$ is the approximation error.

Recovery error

  • Let Φ be a sensing matrix.

  • Let Δ be a recovery algorithm.

  • Then $x' = \Delta(\Phi x)$ is the recovered signal vector.

  • The error $e_r = (x - x')$ is the recovery error.

Ideally, the recovery error should not be too large compared to the approximation error.

In the following we will

  • Formalize the notion of null space property (NSP) of a matrix Φ.

  • Describe a measure for performance of an arbitrary recovery algorithm Δ.

  • Establish the connection between NSP and performance guarantee for recovery algorithms.

Suppose we approximate $x$ by a $K$-sparse signal $\hat{x} \in \Sigma_K$; then the minimum error under the $\ell_p$ norm is given by

$$\sigma_K(x)_p = \min_{\hat{x} \in \Sigma_K} \| x - \hat{x} \|_p.$$

One specific $\hat{x} \in \Sigma_K$ for which this minimum is achieved is the best $K$-term approximation (Theorem 18.18).
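
The best $K$-term approximation and $\sigma_K(x)_p$ are easy to compute directly (a minimal sketch assuming NumPy; the function names are illustrative):

```python
import numpy as np

def best_k_term(x, K):
    """Keep the K largest-magnitude entries of x and zero out the rest."""
    x_hat = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-K:]
    x_hat[idx] = x[idx]
    return x_hat

def sigma_k(x, K, p=1):
    """sigma_K(x)_p: the l_p error of the best K-term approximation."""
    return np.linalg.norm(x - best_k_term(x, K), ord=p)

x = np.array([0.1, -3.0, 0.02, 1.5, -0.4])
print(best_k_term(x, 2))   # [ 0.  -3.   0.   1.5  0. ]
print(sigma_k(x, 2, p=1))  # 0.1 + 0.02 + 0.4 = 0.52
```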

In the following, we will need some additional notation.

  1. Let $I = \{1, 2, \dots, N\}$ be the set of indices for the signal $x \in \mathbb{R}^N$.

  2. Let $\Lambda \subseteq I$ be a subset of indices.

  3. Let $\Lambda^c = I \setminus \Lambda$.

  4. $x_{\Lambda}$ will denote the signal vector obtained by setting the entries of $x$ indexed by $\Lambda^c$ to zero.

Example 18.33

  1. Let N = 4.

  2. Then I={1,2,3,4}.

  3. Let Λ={1,3}.

  4. Then Λc={2,4}.

  5. Now let x=(1,1,2,4).

  6. Then xΛ=(1,0,2,0).

$\Phi_{\Lambda}$ will denote the $M \times N$ matrix obtained by setting the columns of $\Phi$ indexed by $\Lambda^c$ to zero.

Example 18.34

  1. Let $N = 4$.

  2. Then $I = \{1, 2, 3, 4\}$.

  3. Let $\Lambda = \{1, 3\}$.

  4. Then $\Lambda^c = \{2, 4\}$.

  5. Now let $x = (1, 1, 2, 4)$.

  6. Then $x_{\Lambda} = (1, 0, 2, 0)$.

  7. Now let

    $$\Phi = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 1 & 2 & 2 & 3 \end{bmatrix}.$$
  8. Then

    $$\Phi_{\Lambda} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 2 & 0 \end{bmatrix}.$$

18.7.8.1. Null Space Property#

Definition 18.29 (Null space property)

A matrix $\Phi$ satisfies the null space property (NSP) of order $K$ if there exists a constant $C > 0$ such that

$$\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}}$$

holds for every $h \in \mathcal{N}(\Phi)$ and for every $\Lambda$ such that $|\Lambda| \leq K$.

  • Let $h$ be a nonzero $K$-sparse vector. Choosing $\Lambda$ as the set of indices on which $h$ is non-zero, we have $|\Lambda| \leq K$ and $h_{\Lambda^c} = 0$. Thus $\| h_{\Lambda^c} \|_1 = 0$ while $\| h_{\Lambda} \|_2 > 0$, so the above condition cannot be satisfied. Thus such a vector $h$ cannot belong to $\mathcal{N}(\Phi)$ if $\Phi$ satisfies the NSP. A small numerical check of the NSP inequality appears after this list.

  • Essentially vectors in N(Φ) shouldn’t be concentrated in a small subset of indices.

  • If Φ satisfies NSP then the only K-sparse vector in N(Φ) is h=0.
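
For a given $h$ in the null space, the NSP inequality is hardest to satisfy when $\Lambda$ indexes the $K$ largest-magnitude entries of $h$ (that choice simultaneously maximizes $\| h_\Lambda \|_2$ and minimizes $\| h_{\Lambda^c} \|_1$). The sketch below (assuming NumPy; it only examines a single null space vector, not the whole null space) estimates the smallest admissible constant for that vector:

```python
import numpy as np

def nsp_constant_for(h, K):
    """Smallest C such that ||h_Lambda||_2 <= C ||h_{Lambda^c}||_1 / sqrt(K)
    holds for this particular h over all |Lambda| <= K (the worst case is the
    set of the K largest-magnitude entries of h)."""
    order = np.argsort(np.abs(h))[::-1]
    Lam, Lam_c = order[:K], order[K:]
    num = np.linalg.norm(h[Lam], 2) * np.sqrt(K)
    den = np.linalg.norm(h[Lam_c], 1)
    return np.inf if den == 0 else num / den

# A vector in the null space of a random Gaussian Phi.
rng = np.random.default_rng(0)
M, N, K = 20, 40, 3
Phi = rng.normal(size=(M, N))
null_basis = np.linalg.svd(Phi)[2][M:]    # rows spanning N(Phi)
h = null_basis.T @ rng.normal(size=N - M)
print(np.linalg.norm(Phi @ h))            # ~ 0
print(nsp_constant_for(h, K))             # finite => h is not concentrated on K entries
```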

18.7.8.2. Measuring the Performance of a Recovery Algorithm#

Let $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ represent a recovery method to recover an approximately sparse $x$ from $y$.

The $\ell_2$ recovery error is given by

$$\| \Delta(\Phi x) - x \|_2.$$

The $\ell_1$ error of the best $K$-term approximation is given by $\sigma_K(x)_1$.

We will be interested in guarantees of the form

$$\| \Delta(\Phi x) - x \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}. \tag{18.39}$$

Why this recovery guarantee formulation?

  • Exact recovery of $K$-sparse signals: $\sigma_K(x)_1 = 0$ if $x \in \Sigma_K$.

  • Robust recovery of non-sparse signals

  • Recovery dependent on how well the signals are approximated by K-sparse vectors.

  • Such guarantees are known as instance optimal guarantees.

  • Also known as uniform guarantees.

Why the specific choice of norms?

  • Different choices of $\ell_p$ norms lead to different guarantees.

  • The $\ell_2$ norm on the LHS is a typical least squares error.

  • An $\ell_2$ norm on the RHS would require a prohibitively large number of measurements.

  • The $\ell_1$ norm on the RHS helps keep the number of measurements low.

If an algorithm Δ provides instance optimal guarantees as defined above, what kind of requirements does it place on the sensing matrix Φ?

We show that NSP of order 2K is a necessary condition for providing uniform guarantees.

18.7.8.3. NSP and Instance Optimal Guarantees#

Theorem 18.40 (NSP and instance optimal guarantees)

Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ denote a sensing matrix and $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ denote an arbitrary recovery algorithm. If the pair $(\Phi, \Delta)$ satisfies the instance optimal guarantee (18.39), then $\Phi$ satisfies the NSP of order $2K$.

Proof. We are given that

  • (Φ,Δ) form an encoder-decoder pair.

  • Together, they satisfy instance optimal guarantee (18.39).

  • Thus they are able to recover all sparse signals exactly.

  • For non-sparse signals, they are able to recover their K-sparse approximation with bounded recovery error.

We need to show that if $h \in \mathcal{N}(\Phi)$, then $h$ satisfies

$$\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{2K}}$$

where $\Lambda$ corresponds to the $2K$ largest magnitude entries of $h$. Note that we have used $2K$ in this expression, since we need to show that $\Phi$ satisfies the NSP of order $2K$.

  1. Let $h \in \mathcal{N}(\Phi)$.

  2. Let $\Lambda$ be the indices corresponding to the $2K$ largest entries of $h$.

  3. Then

    $$h = h_{\Lambda} + h_{\Lambda^c}.$$
  4. Split $\Lambda$ into $\Lambda_0$ and $\Lambda_1$ such that $|\Lambda_0| = |\Lambda_1| = K$.

  5. We have

    $$h_{\Lambda} = h_{\Lambda_0} + h_{\Lambda_1}.$$
  6. Let

    $$x = h_{\Lambda_0} + h_{\Lambda^c}.$$
  7. Let

    $$x' = -h_{\Lambda_1}.$$
  8. Then

    $$h = x - x'.$$
  9. By assumption $h \in \mathcal{N}(\Phi)$.

  10. Thus

    $$\Phi h = \Phi(x - x') = 0 \implies \Phi x = \Phi x'.$$
  11. But since $x' \in \Sigma_K$ (recall that $\Lambda_1$ indexes only $K$ entries) and $\Delta$ is able to recover all $K$-sparse signals exactly, hence

    $$x' = \Delta(\Phi x').$$
  12. Thus

    $$\Delta(\Phi x) = \Delta(\Phi x') = x';$$

    i.e., the recovery algorithm $\Delta$ returns $x'$ for the signal $x$.

  13. Certainly $x$ is not $K$-sparse (for $h \neq 0$): if it were, $\Delta$ would recover it exactly, giving $x = x'$ and hence $h = 0$.

  14. Hence $\Lambda^c$ must be nonempty.

  15. Finally we also have

    $$\| h_{\Lambda} \|_2 \leq \| h \|_2 = \| x - x' \|_2 = \| x - \Delta(\Phi x) \|_2$$

    since $h$ may contain additional non-zero entries outside $\Lambda$.

  16. But as per the instance optimal recovery guarantee (18.39) for the $(\Phi, \Delta)$ pair, we have

    $$\| \Delta(\Phi x) - x \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}.$$
  17. Thus

    $$\| h_{\Lambda} \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}.$$
  18. But

    $$\sigma_K(x)_1 = \min_{\hat{x} \in \Sigma_K} \| x - \hat{x} \|_1.$$
  19. Recall that $x = h_{\Lambda_0} + h_{\Lambda^c}$ where $\Lambda_0$ indexes $K$ entries of $h$ which are (magnitude wise) larger than all entries indexed by $\Lambda^c$.

  20. Hence the best $\ell_1$-norm $K$-term approximation of $x$ is given by $h_{\Lambda_0}$.

  21. Hence

    $$\sigma_K(x)_1 = \| h_{\Lambda^c} \|_1.$$
  22. Thus we finally have

    $$\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} = \sqrt{2} C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{2K}} \quad \forall h \in \mathcal{N}(\Phi).$$
  23. Thus $\Phi$ satisfies the NSP of order $2K$.

It turns out that NSP of order 2K is also sufficient to establish a guarantee of the form above for a practical recovery algorithm.

18.7.9. Recovery in Presence of Measurement Noise#

The measurement vector in the presence of noise is given by

$$y = \Phi x + e$$

where $e$ is the measurement noise or error. $\| e \|_2$ is the $\ell_2$ size of the measurement error.

The recovery error, as usual, is given by

$$\| \Delta(y) - x \|_2 = \| \Delta(\Phi x + e) - x \|_2.$$

Stability of a recovery algorithm is characterized by comparing the variation of the recovery error w.r.t. the measurement error.

NSP is both necessary and sufficient for establishing guarantees of the form

$$\| \Delta(\Phi x) - x \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}.$$

These guarantees do not account for presence of noise during measurement.

We need stronger conditions for handling noise. The restricted isometry property for sensing matrices comes to our rescue.

18.7.9.1. Restricted Isometry Property#

We recall that a matrix $\Phi$ satisfies the restricted isometry property (RIP) of order $K$ if there exists $\delta_K \in (0, 1)$ such that

$$(1 - \delta_K) \| x \|_2^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_K) \| x \|_2^2$$

holds for every $x \in \Sigma_K = \{ x : \| x \|_0 \leq K \}$.

  • If a matrix satisfies RIP of order K, then we can see that it approximately preserves the size of a K-sparse vector.

  • If a matrix satisfies RIP of order 2K, then we can see that it approximately preserves the distance between any two K-sparse vectors since difference vectors would be 2K sparse (see Theorem 18.49) .

  • We say that the matrix is nearly orthonormal for sparse vectors.

  • If a matrix satisfies RIP of order $K$ with a constant $\delta_K$, it automatically satisfies RIP of any order $K' < K$ with a constant $\delta_{K'} \leq \delta_K$. A crude numerical estimate of $\delta_K$ for a given matrix is sketched below.
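
Computing $\delta_K$ exactly is combinatorial, but a rough lower estimate can be obtained by sampling random unit-norm $K$-sparse vectors (a sketch assuming NumPy; random sampling can only under-estimate the true constant):

```python
import numpy as np

def estimate_delta_k(Phi, K, trials=2000, seed=0):
    """Lower estimate of the RIP constant delta_K of Phi via random sampling
    of unit-norm K-sparse vectors (the true delta_K can only be larger)."""
    rng = np.random.default_rng(seed)
    M, N = Phi.shape
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        idx = rng.choice(N, size=K, replace=False)
        x[idx] = rng.normal(size=K)
        x /= np.linalg.norm(x)
        ratio = np.linalg.norm(Phi @ x) ** 2   # ||Phi x||^2 / ||x||^2 since ||x|| = 1
        worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(1)
M, N, K = 64, 256, 4
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
print(estimate_delta_k(Phi, K))
```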

18.7.9.2. Stability#

Informally a recovery algorithm is stable if recovery error is small in the presence of small measurement noise.

Is RIP necessary and sufficient for sparse signal recovery from noisy measurements? Let us look at the necessary part.

We will define a notion of stability of the recovery algorithm.

Definition 18.30 (C stable encoder-decoder pair)

Let $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ be a sensing matrix and $\Delta : \mathbb{R}^M \to \mathbb{R}^N$ be a recovery algorithm. We say that the pair $(\Phi, \Delta)$ is $C$-stable if for any $x \in \Sigma_K$ and any $e \in \mathbb{R}^M$ we have

$$\| \Delta(\Phi x + e) - x \|_2 \leq C \| e \|_2.$$
  • Error is added to the measurements.

  • The LHS is the $\ell_2$ norm of the recovery error.

  • The RHS is a scaled $\ell_2$ norm of the measurement error.

  • The definition says that the recovery error is bounded by a multiple of the measurement error.

  • Thus adding a small amount of measurement noise shouldn't cause an arbitrarily large recovery error.

It turns out that C-stability requires Φ to satisfy RIP.

Theorem 18.41 (Necessity of RIP for C-stability)

If a pair $(\Phi, \Delta)$ is $C$-stable then

$$\frac{1}{C} \| x \|_2 \leq \| \Phi x \|_2$$

for all $x \in \Sigma_{2K}$.

Proof. Remember that any $x \in \Sigma_{2K}$ can be written in the form $x = y - z$ where $y, z \in \Sigma_K$.

  1. Let $x \in \Sigma_{2K}$.

  2. Split it in the form $x = y - z$ with $y, z \in \Sigma_K$.

  3. Define

    $$e_y = \frac{\Phi (z - y)}{2} \quad \text{and} \quad e_z = \frac{\Phi (y - z)}{2}.$$
  4. Thus

    $$e_y - e_z = \Phi(z - y) \implies \Phi y + e_y = \Phi z + e_z.$$
  5. We have

    $$\Phi y + e_y = \Phi z + e_z = \frac{\Phi (y + z)}{2}.$$
  6. Also we have

    $$\| e_y \|_2 = \| e_z \|_2 = \frac{\| \Phi(y - z) \|_2}{2} = \frac{\| \Phi x \|_2}{2}.$$
  7. Let

    $$y' = \Delta(\Phi y + e_y) = \Delta(\Phi z + e_z).$$
  8. Since $(\Phi, \Delta)$ is $C$-stable, we have

    $$\| y' - y \|_2 \leq C \| e_y \|_2.$$
  9. Also

    $$\| y' - z \|_2 \leq C \| e_z \|_2.$$
  10. Using the triangle inequality

    $$\| x \|_2 = \| y - z \|_2 = \| y - y' + y' - z \|_2 \leq \| y - y' \|_2 + \| y' - z \|_2 \leq C \| e_y \|_2 + C \| e_z \|_2 = C ( \| e_y \|_2 + \| e_z \|_2 ) = C \| \Phi x \|_2.$$
  11. Thus we have for every $x \in \Sigma_{2K}$

    $$\frac{1}{C} \| x \|_2 \leq \| \Phi x \|_2.$$

This theorem gives us the lower bound in the RIP inequality (18.38) of order $2K$, with $\delta_{2K} = 1 - \frac{1}{C^2}$, as a necessary condition for $C$-stable recovery algorithms.

Note that the smaller the constant $C$, the lower the bound on the recovery error (w.r.t. the measurement error). But as $C \to 1$, $\delta_{2K} \to 0$; thus reducing the impact of measurement noise requires the sensing matrix $\Phi$ to be designed with tighter RIP constraints.

This result doesn’t require an upper bound on the RIP property in (18.38).

It turns out that if $\Phi$ satisfies RIP, then this is also sufficient for a variety of algorithms to successfully recover a sparse signal from noisy measurements. We will discuss this later.

18.7.9.3. Measurement Bounds#

As stated in the previous section, for a $(\Phi, \Delta)$ pair to be $C$-stable we require that $\Phi$ satisfies RIP of order $2K$ with a constant $\delta_{2K}$. Let us ignore $\delta_{2K}$ for the time being and look at the relationship between $M$, $N$ and $K$. We have a sensing matrix $\Phi$ of size $M \times N$ and expect it to provide RIP of order $2K$. How many measurements $M$ are necessary? We will assume that $K < N/2$. This assumption is valid for approximately sparse signals.

Before we start figuring out the bounds, let us develop a special subset of $\Sigma_K$. Consider the set

$$U = \{ x \in \{0, +1, -1\}^N : \| x \|_0 = K \}.$$

When we say $\| x \|_0 = K$, we mean that exactly $K$ terms in each member of $U$ are non-zero (i.e., $-1$ or $+1$).

Hence $U$ is a set of signal vectors $x$ of length $N$ where each sample takes values from $\{0, +1, -1\}$ and the number of non-zero samples is fixed at $K$. The example below explains it further.

Example 18.35 (U for N=6 and K=2)

Each vector in $U$ has 6 elements, out of which 2 are non-zero. There are $\binom{6}{2}$ ways of choosing the positions of the non-zero elements. Some example vectors are listed below:

$$(+1,+1,0,0,0,0), \; (+1,-1,0,0,0,0), \; (0,-1,0,+1,0,0), \; (0,0,0,0,-1,+1), \; (0,0,-1,-1,0,0).$$

Revisiting

$$U = \{ x \in \{0, +1, -1\}^N : \| x \|_0 = K \},$$

it is now obvious that

$$\| x \|_2^2 = K \quad \forall x \in U.$$

Since there are $\binom{N}{K}$ ways of choosing the $K$ non-zero positions and each non-zero element can take either of the two values $+1$ or $-1$, the cardinality of the set $U$ is given by

$$|U| = \binom{N}{K} 2^K.$$

By definition

$$U \subset \Sigma_K.$$

Further, let $x, y \in U$.
Then $x - y$ has at most $2K$ non-zero elements, and the non-zero elements take values in $\{-2, -1, 1, 2\}$. Thus, if $\| x - y \|_0 = R$ then $R \leq 2K$ and $\| x - y \|_2^2 \geq R$. Hence

$$\| x - y \|_0 \leq \| x - y \|_2^2 \quad \forall x, y \in U.$$

We now state a result which will help us in getting to the bounds.

Lemma 18.3

Let $K$ and $N$ satisfying $K < \frac{N}{2}$ be given. There exists a set $X \subset \Sigma_K$ such that for any $x \in X$ we have $\| x \|_2 \leq \sqrt{K}$, and for any $x, y \in X$ with $x \neq y$,

$$\| x - y \|_2 \geq \sqrt{\frac{K}{2}}$$

and

$$\ln | X | \geq \frac{K}{2} \ln \left( \frac{N}{K} \right).$$

Proof. We just need to find one set X which satisfies the requirements of this lemma. We have to construct a set X such that

  1. $\| x \|_2 \leq \sqrt{K} \quad \forall x \in X$.

  2. $\| x - y \|_2 \geq \sqrt{\frac{K}{2}}$ for every $x, y \in X$ with $x \neq y$.

  3. $\ln |X| \geq \frac{K}{2} \ln \left( \frac{N}{K} \right)$, or equivalently $|X| \geq \left( \frac{N}{K} \right)^{K/2}$.

The first condition states that the set $X$ lies in the intersection of $\Sigma_K$ and the closed ball $B[0, \sqrt{K}]$. The second condition states that the points in $X$ are sufficiently distant from each other. The third condition states that there are at least a certain number of points in $X$.

We will construct $X$ by picking vectors from $U$. Thus $X \subseteq U$.

  1. Since $x \in X \subseteq U$, we have $\| x \|_2 = \sqrt{K} \leq \sqrt{K} \; \forall x \in X$.

  2. Consider any fixed $x \in U$.

  3. How many elements $y$ are there in $U$ such that $\| x - y \|_2^2 < \frac{K}{2}$?

  4. Define

    $$U_x^2 = \left\{ y \in U : \| x - y \|_2^2 < \frac{K}{2} \right\}.$$
  5. Clearly, by the requirements in the lemma, if $x \in X$ then no other vector in $U_x^2$ can belong to $X$.

  6. How many elements are there in $U_x^2$? Let us find an upper bound.

  7. $\forall x, y \in U$ we have $\| x - y \|_0 \leq \| x - y \|_2^2$.

  8. If $x$ and $y$ differ in $\frac{K}{2}$ or more places, then naturally $\| x - y \|_2^2 \geq \frac{K}{2}$.

  9. Hence if $\| x - y \|_2^2 < \frac{K}{2}$ then $\| x - y \|_0 < \frac{K}{2}$, hence $\| x - y \|_0 \leq \frac{K}{2}$ for any $y \in U_x^2$.

  10. So define

    $$U_x^0 = \left\{ y \in U : \| x - y \|_0 \leq \frac{K}{2} \right\}.$$
  11. We have

    $$U_x^2 \subseteq U_x^0.$$
  12. Thus we have an upper bound given by

    $$| U_x^2 | \leq | U_x^0 |.$$
  13. Let us look at $U_x^0$ carefully.

  14. We can choose the $\frac{K}{2}$ indices where $x$ and $y$ may differ in $\binom{N}{K/2}$ ways.

  15. At each of these $\frac{K}{2}$ indices, $y_i$ can take one of the three values $0, +1, -1$.

  16. Thus we have an upper bound

    $$| U_x^2 | \leq | U_x^0 | \leq \binom{N}{K/2} 3^{K/2}.$$
  17. We now describe an iterative process for building $X$ from vectors in $U$.

  18. Say we have added $j$ vectors to $X$, namely $x_1, x_2, \dots, x_j$.

  19. Then no further vector from $U_{x_1}^2 \cup U_{x_2}^2 \cup \dots \cup U_{x_j}^2$ can be added to $X$.

  20. The number of vectors in $U_{x_1}^2 \cup U_{x_2}^2 \cup \dots \cup U_{x_j}^2$ is bounded by $j \binom{N}{K/2} 3^{K/2}$.

  21. Thus we have at least

    $$\binom{N}{K} 2^K - j \binom{N}{K/2} 3^{K/2}$$

    vectors left in $U$ to choose from for adding to $X$.

  22. We can keep adding vectors to $X$ till there are no more suitable vectors left.

  23. We can construct a set of size $|X|$ provided

    $$|X| \binom{N}{K/2} 3^{K/2} \leq \binom{N}{K} 2^K. \tag{18.40}$$
  24. Now

    $$\frac{\binom{N}{K}}{\binom{N}{K/2}} = \frac{(K/2)! \, (N - K/2)!}{K! \, (N - K)!} = \prod_{i=1}^{K/2} \frac{N - K + i}{K/2 + i}.$$
  25. Note that $\frac{N - K + i}{K/2 + i}$ is a decreasing function of $i$.

  26. Its minimum value is achieved for $i = \frac{K}{2}$ and equals $\frac{N}{K} - \frac{1}{2}$.

  27. So we have

    $$\frac{N - K + i}{K/2 + i} \geq \frac{N}{K} - \frac{1}{2} \implies \prod_{i=1}^{K/2} \frac{N - K + i}{K/2 + i} \geq \left( \frac{N}{K} - \frac{1}{2} \right)^{K/2} \implies \frac{\binom{N}{K}}{\binom{N}{K/2}} \geq \left( \frac{N}{K} - \frac{1}{2} \right)^{K/2}.$$
  28. Rephrasing the bound on $|X|$ in (18.40) we have

    $$|X| \left( \frac{3}{4} \right)^{K/2} \leq \frac{\binom{N}{K}}{\binom{N}{K/2}}.$$
  29. Hence we can definitely construct a set $X$ with a cardinality satisfying

    $$|X| \left( \frac{3}{4} \right)^{K/2} \leq \left( \frac{N}{K} - \frac{1}{2} \right)^{K/2}.$$
  30. Now it is given that $K < \frac{N}{2}$. So we have:

    $$K < \frac{N}{2} \implies \frac{N}{K} > 2 \implies \frac{N}{4K} > \frac{1}{2} \implies \frac{N}{K} - \frac{N}{4K} < \frac{N}{K} - \frac{1}{2} \implies \frac{3N}{4K} < \frac{N}{K} - \frac{1}{2} \implies \left( \frac{3N}{4K} \right)^{K/2} < \left( \frac{N}{K} - \frac{1}{2} \right)^{K/2}.$$
  31. Thus we have

    $$\left( \frac{N}{K} \right)^{K/2} \left( \frac{3}{4} \right)^{K/2} < \frac{\binom{N}{K}}{\binom{N}{K/2}}.$$
  32. Choose

    $$|X| = \left( \frac{N}{K} \right)^{K/2}.$$
  33. Clearly this value of $|X|$ satisfies (18.40).

  34. Hence $X$ can have at least this many elements.

  35. Thus

    $$|X| \geq \left( \frac{N}{K} \right)^{K/2} \implies \ln |X| \geq \frac{K}{2} \ln \left( \frac{N}{K} \right),$$

    which completes the proof.

We can now establish the following bound on the number of measurements required to satisfy RIP.

At this moment, we won't worry about the exact value of $\delta_{2K}$. We will just assume that $\delta_{2K}$ is small, in the range $(0, \frac{1}{2}]$.

Theorem 18.42 (Minimum number of required measurements for RIP of order 2K)

Let $\Phi$ be an $M \times N$ matrix that satisfies RIP of order $2K$ with constant $\delta_{2K} \in (0, \frac{1}{2}]$. Then

$$M \geq C K \ln \left( \frac{N}{K} \right)$$

where

$$C = \frac{1}{2 \ln ( \sqrt{24} + 1 )} \approx 0.28173.$$

Proof. Since $\Phi$ satisfies RIP of order $2K$ we have

$$(1 - \delta_{2K}) \| x \|_2^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_{2K}) \| x \|_2^2 \quad \forall x \in \Sigma_{2K}$$

$$\implies (1 - \delta_{2K}) \| x - y \|_2^2 \leq \| \Phi x - \Phi y \|_2^2 \leq (1 + \delta_{2K}) \| x - y \|_2^2 \quad \forall x, y \in \Sigma_K.$$

Also

$$\delta_{2K} \leq \frac{1}{2} \implies 1 - \delta_{2K} \geq \frac{1}{2} \text{ and } 1 + \delta_{2K} \leq \frac{3}{2}.$$

Consider the set $X \subset U \subset \Sigma_K$ developed in Lemma 18.3. We have

$$\| x - y \|_2^2 \geq \frac{K}{2} \implies (1 - \delta_{2K}) \| x - y \|_2^2 \geq \frac{K}{4} \implies \| \Phi x - \Phi y \|_2^2 \geq \frac{K}{4} \implies \| \Phi x - \Phi y \|_2 \geq \sqrt{\frac{K}{4}} \quad \forall x, y \in X, \; x \neq y.$$

Also

$$\| \Phi x \|_2^2 \leq (1 + \delta_{2K}) \| x \|_2^2 \leq \frac{3}{2} \| x \|_2^2 \quad \forall x \in X \subset \Sigma_K \subset \Sigma_{2K} \implies \| \Phi x \|_2 \leq \sqrt{\frac{3}{2}} \| x \|_2 \leq \sqrt{\frac{3K}{2}} \quad \forall x \in X$$

since $\| x \|_2 \leq \sqrt{K} \; \forall x \in X$. So we have a lower bound:

$$\| \Phi x - \Phi y \|_2 \geq \sqrt{\frac{K}{4}} \quad \forall x, y \in X, \; x \neq y, \tag{18.41}$$

and an upper bound:

$$\| \Phi x \|_2 \leq \sqrt{\frac{3K}{2}} \quad \forall x \in X. \tag{18.42}$$

What do these bounds mean? Let us start with the lower bound.

$\Phi x$ and $\Phi y$ are projections of $x$ and $y$ into $\mathbb{R}^M$ (the measurement space).

Construct $\ell_2$ balls of radius $\sqrt{K/4} / 2 = \sqrt{K/16}$ in $\mathbb{R}^M$ around $\Phi x$ and $\Phi y$.

The lower bound says that these balls are disjoint. Since $x, y$ are arbitrary, this applies to every $x \in X$.

The upper bound tells us that all vectors $\Phi x$ lie in a ball of radius $\sqrt{\frac{3K}{2}}$ around the origin in $\mathbb{R}^M$.

Thus the set of all balls lies within a larger ball of radius $\sqrt{\frac{3K}{2}} + \sqrt{\frac{K}{16}}$ around the origin in $\mathbb{R}^M$.

So we require that the volume of the larger ball MUST be greater than the sum of the volumes of the $|X|$ individual balls.

Since the volume of an $\ell_2$ ball of radius $r$ in $\mathbb{R}^M$ is proportional to $r^M$, we have:

$$\left( \sqrt{\frac{3K}{2}} + \sqrt{\frac{K}{16}} \right)^M \geq |X| \left( \sqrt{\frac{K}{16}} \right)^M \implies ( \sqrt{24} + 1 )^M \geq |X| \implies M \geq \frac{\ln |X|}{\ln ( \sqrt{24} + 1 )}.$$

Again from Lemma 18.3 we have

$$\ln |X| \geq \frac{K}{2} \ln \left( \frac{N}{K} \right).$$

Putting this back we get

$$M \geq \frac{K}{2} \cdot \frac{\ln \left( \frac{N}{K} \right)}{\ln ( \sqrt{24} + 1 )}$$

which establishes a lower bound on the number of measurements M.

Example 18.36 (Lower bounds on M for RIP of order 2K)

  • $N = 1000, K = 100 \implies M \geq 65$.

  • $N = 1000, K = 200 \implies M \geq 91$.

  • $N = 1000, K = 400 \implies M \geq 104$.

These values can be computed directly from the bound, as shown below.
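
A minimal sketch evaluating the bound of Theorem 18.42 (assuming Python's standard library; the function name is illustrative):

```python
import math

def min_measurements(N, K):
    """Lower bound M >= C K ln(N/K) from Theorem 18.42, C = 1 / (2 ln(sqrt(24) + 1))."""
    C = 1.0 / (2.0 * math.log(math.sqrt(24) + 1.0))
    return math.ceil(C * K * math.log(N / K))

for K in (100, 200, 400):
    print(K, min_measurements(1000, K))   # 65, 91, 104
```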

Some remarks are in order:

  • The theorem only establishes a necessary lower bound on $M$. It doesn't mean that if we choose an $M$ larger than the lower bound then $\Phi$ will have RIP of order $2K$ with any constant $\delta_{2K} \in (0, \frac{1}{2}]$.

  • The restriction $\delta_{2K} \leq \frac{1}{2}$ is arbitrary and is made for convenience. In general, we can work with $0 < \delta_{2K} \leq \delta_{\max} < 1$ and develop the bounds accordingly.

  • This result fails to capture the dependence of $M$ on the RIP constant $\delta_{2K}$ directly. The Johnson-Lindenstrauss lemma, which concerns embeddings of finite sets of points in low-dimensional spaces, helps us resolve this.

  • We haven’t made significant efforts to optimize the constants. Still they are quite reasonable.

18.7.10. RIP and NSP#

RIP and NSP are connected. If a matrix Φ satisfies RIP then it also satisfies NSP (under certain conditions).

Thus RIP is strictly stronger than NSP (under certain conditions).

We will need the following lemma, which applies to any arbitrary $h \in \mathbb{R}^N$. The lemma will be proved later.

Lemma 18.4

Suppose that $\Phi$ satisfies RIP of order $2K$. Let $h \in \mathbb{R}^N$, $h \neq 0$ be arbitrary. Let $\Lambda_0$ be any subset of $\{1, 2, \dots, N\}$ such that $|\Lambda_0| \leq K$.

Define $\Lambda_1$ as the index set corresponding to the $K$ entries of $h_{\Lambda_0^c}$ with largest magnitude, and set $\Lambda = \Lambda_0 \cup \Lambda_1$. Then

$$\| h_{\Lambda} \|_2 \leq \alpha \frac{\| h_{\Lambda_0^c} \|_1}{\sqrt{K}} + \beta \frac{| \langle \Phi h_{\Lambda}, \Phi h \rangle |}{\| h_{\Lambda} \|_2},$$

where

$$\alpha = \frac{\sqrt{2} \delta_{2K}}{1 - \delta_{2K}}, \quad \beta = \frac{1}{1 - \delta_{2K}}.$$

Let us understand this lemma a bit. If $h \in \mathcal{N}(\Phi)$, then $\Phi h = 0$ and the lemma simplifies to

$$\| h_{\Lambda} \|_2 \leq \alpha \frac{\| h_{\Lambda_0^c} \|_1}{\sqrt{K}}.$$
  • $\Lambda_0$ maps to the initial few ($K$ or fewer) elements we chose.

  • $\Lambda_0^c$ maps to all other elements.

  • $\Lambda_1$ maps to the largest (in magnitude) $K$ elements indexed by $\Lambda_0^c$.

  • $h_{\Lambda}$ contains at most $2K$ non-zero elements.

  • $\Phi$ satisfies RIP of order $2K$.

  • Thus

    $$\sqrt{1 - \delta_{2K}} \| h_{\Lambda} \|_2 \leq \| \Phi h_{\Lambda} \|_2 \leq \sqrt{1 + \delta_{2K}} \| h_{\Lambda} \|_2.$$

We now state the connection between RIP and NSP.

Theorem 18.43

Suppose that $\Phi$ satisfies RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$. Then $\Phi$ satisfies the NSP of order $2K$ with constant

$$C = \frac{\sqrt{2} \delta_{2K}}{1 - (1 + \sqrt{2}) \delta_{2K}}.$$

Proof. We are given that

$$(1 - \delta_{2K}) \| x \|_2^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_{2K}) \| x \|_2^2$$

holds for all $x \in \Sigma_{2K}$, where $\delta_{2K} < \sqrt{2} - 1$. We have to show that

$$\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}}$$

holds for all $h \in \mathcal{N}(\Phi)$ and all $\Lambda$ such that $|\Lambda| \leq 2K$.

  1. Let $h \in \mathcal{N}(\Phi)$.

  2. Then $\Phi h = 0$.

  3. Let $\Lambda_m$ denote the index set of the $2K$ largest (in magnitude) entries of $h$.

  4. Then

    $$\| h_{\Lambda} \|_2 \leq \| h_{\Lambda_m} \|_2 \quad \forall \Lambda : |\Lambda| \leq 2K.$$
  5. Similarly

    $$\| h_{\Lambda^c} \|_1 \geq \| h_{\Lambda_m^c} \|_1 \quad \forall \Lambda : |\Lambda| \leq 2K.$$
  6. Thus if we show that $\Phi$ satisfies the NSP of order $2K$ for $\Lambda_m$; i.e.,

    $$\| h_{\Lambda_m} \|_2 \leq C \frac{\| h_{\Lambda_m^c} \|_1}{\sqrt{K}},$$

    then we would have shown it for all $\Lambda$ such that $|\Lambda| \leq 2K$.

  7. Hence let $\Lambda = \Lambda_m$.

  8. We can divide $\Lambda$ into two components $\Lambda_0$ and $\Lambda_1$ of size $K$ each.

  9. Since $\Lambda$ maps to the largest $2K$ entries of $h$, whatever entries we choose in $\Lambda_0$, the largest $K$ entries of $h_{\Lambda_0^c}$ will be indexed by $\Lambda_1$.

  10. Hence as per Lemma 18.4 above, we have

    $$\| h_{\Lambda} \|_2 \leq \alpha \frac{\| h_{\Lambda_0^c} \|_1}{\sqrt{K}}.$$
  11. Also

    $$\Lambda = \Lambda_0 \cup \Lambda_1 \implies \Lambda_0 = \Lambda \setminus \Lambda_1 = \Lambda \cap \Lambda_1^c \implies \Lambda_0^c = \Lambda_1 \cup \Lambda^c.$$
  12. Thus we have

    $$\| h_{\Lambda_0^c} \|_1 = \| h_{\Lambda_1} \|_1 + \| h_{\Lambda^c} \|_1.$$
  13. We have to get rid of the $\Lambda_1$ term.

  14. Since $h_{\Lambda_1} \in \Sigma_K$, by applying Theorem 18.17 we get

    $$\| h_{\Lambda_1} \|_1 \leq \sqrt{K} \| h_{\Lambda_1} \|_2.$$
  15. Hence

    $$\| h_{\Lambda} \|_2 \leq \alpha \left( \| h_{\Lambda_1} \|_2 + \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} \right).$$
  16. But since $\Lambda_1 \subset \Lambda$, hence $\| h_{\Lambda_1} \|_2 \leq \| h_{\Lambda} \|_2$.

  17. Hence

    $$\| h_{\Lambda} \|_2 \leq \alpha \left( \| h_{\Lambda} \|_2 + \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} \right) \implies (1 - \alpha) \| h_{\Lambda} \|_2 \leq \alpha \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} \implies \| h_{\Lambda} \|_2 \leq \frac{\alpha}{1 - \alpha} \cdot \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} \quad \text{if } \alpha < 1.$$
  18. Note that the middle inequality also holds for $\alpha = 1$; in that case we simply cannot move $1 - \alpha$ to the denominator.

  19. Now

    $$\alpha < 1 \iff \frac{\sqrt{2} \delta_{2K}}{1 - \delta_{2K}} < 1 \iff \sqrt{2} \delta_{2K} < 1 - \delta_{2K} \iff (\sqrt{2} + 1) \delta_{2K} < 1 \iff \delta_{2K} < \sqrt{2} - 1.$$
  20. Putting

    $$C = \frac{\alpha}{1 - \alpha} = \frac{\sqrt{2} \delta_{2K}}{1 - (1 + \sqrt{2}) \delta_{2K}}$$

    we see that $\Phi$ satisfies the NSP of order $2K$ whenever $\Phi$ satisfies RIP of order $2K$ with $\delta_{2K} < \sqrt{2} - 1$.

Note that for $\delta_{2K} = \sqrt{2} - 1$, $C = \infty$.