18.7. Compressive Sensing#
In this section we formally define the problem of compressive sensing.
Compressive sensing refers to the idea that for sparse or compressible signals, a small number of nonadaptive measurements carry sufficient information to approximate the signal well. In the literature it is also known as compressed sensing and compressive sampling. Different authors seem to prefer different names.
In this section we will represent a signal dictionary as $\mathcal{D}$ and its synthesis matrix as $\mathbf{D}$. A signal $x \in \mathbb{R}^N$ is assumed to have a representation $x = \mathbf{D} \alpha$ in the dictionary, where $\alpha$ is its representation vector and is sparse or compressible.
The dictionary could be standard basis, Fourier basis, wavelet basis, a wavelet packet dictionary, a multi-ONB or even a randomly generated dictionary.
Real-life signals are typically not sparse, yet they are compressible in the sense that their entries decay rapidly when sorted by magnitude. As a result, compressible signals are well approximated by sparse signals.
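The following minimal sketch (with arbitrary sizes and a synthetic power-law decay, not taken from this example) illustrates the point: keeping only the $K$ largest-magnitude entries of a compressible signal yields a small approximation error.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 50

# A synthetic compressible signal: coefficients decay like a power law.
x = rng.standard_normal(N) * (np.arange(1, N + 1) ** -1.5)

# Best K-term approximation: keep the K largest-magnitude entries.
idx = np.argsort(np.abs(x))[::-1][:K]
x_k = np.zeros(N)
x_k[idx] = x[idx]

rel_err = np.linalg.norm(x - x_k) / np.linalg.norm(x)
print(f"relative l2 error of best {K}-term approximation: {rel_err:.2e}")
```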
Note that we are talking about the sparsity or compressibility of the signal in a suitable dictionary; i.e., we mean that the signal has a sparse or compressible representation in that dictionary.
18.7.1. Definition#
(Compressive sensing)
In compressive sensing, a measurement is a linear functional applied to a signal. The compressive sensor makes multiple such linear measurements. This can best be represented by the action of a sensing matrix $\Phi$ on the signal $x$ given by

$$y = \Phi x$$

where $\Phi \in \mathbb{R}^{M \times N}$ is the sensing matrix, $x \in \mathbb{R}^N$ is the signal, and $y \in \mathbb{R}^M$ collects the $M$ measurements. The vector $y$ is known as the measurement vector.

It is assumed that the signal $x$ has a sparse or compressible representation $\alpha$ in the dictionary $\mathcal{D}$; i.e., $x = \mathbf{D} \alpha$.

The objective is to recover $x$ from $y$. We do this by first recovering the sparse representation $\alpha$ and then computing $x = \mathbf{D} \alpha$.

If $M \geq N$, the recovery of $x$ from $y$ reduces to an ordinary least squares problem. The more interesting case is when $M \ll N$; i.e., the number of measurements is much smaller than the dimension of the signal space.

We note that given $y = \Phi x = \Phi \mathbf{D} \alpha$, the product $\Phi \mathbf{D}$ acts as a single effective sensing matrix on the sparse vector $\alpha$. We therefore can remove the dictionary from our consideration and look at the simplified problem given as:

Recover $x$ from its measurements $y = \Phi x$,

where $x \in \mathbb{R}^N$ is itself assumed to be sparse or compressible and $\Phi \in \mathbb{R}^{M \times N}$ is the sensing matrix.
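As a hedged illustration of why the dictionary can be folded into the sensing matrix, the sketch below (all sizes, and the names `D` and `Phi`, are assumptions made for this example) checks numerically that measuring the synthesized signal gives the same result as applying the combined matrix to the sparse representation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D_size, M, K = 128, 256, 32, 4             # hypothetical sizes

D = rng.standard_normal((N, D_size))          # dictionary synthesis matrix
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # sensing matrix

alpha = np.zeros(D_size)                      # K-sparse representation
alpha[rng.choice(D_size, K, replace=False)] = rng.standard_normal(K)

x = D @ alpha                                 # signal synthesized from the dictionary
y1 = Phi @ x                                  # measure the signal
y2 = (Phi @ D) @ alpha                        # measure via the combined matrix
print(np.allclose(y1, y2))                    # True: the dictionary can be folded in
```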
18.7.2. The Sensing Matrix#
There are two ways to look at the sensing matrix. The first view is in terms of its columns: the measurement vector is a linear combination of the columns of the sensing matrix, weighted by the entries of the signal. This view looks very similar to a dictionary and its atoms, but there is a difference. In a dictionary, we require each atom to be unit norm. We do not require the columns of the sensing matrix to be unit norm.
The second view of the sensing matrix is in terms of its rows, where each row defines one linear functional on the signal space. In this view, each measurement is the inner product of the signal with the corresponding row of the sensing matrix.
We will call
(Embedding of a signal)
Given a signal $x$, the vector $y = \Phi x$ is called an embedding of $x$ into the measurement space.
(Explanation of a measurement)
A signal $x$ is called an explanation of a measurement $y$ with respect to a sensing matrix $\Phi$ if $y = \Phi x$.
(Measurement noise)
In real life, the measurement is not error free; it is affected by the presence of noise during measurement. This is modeled by the equation

$$y = \Phi x + e$$

where $e$ represents the measurement noise.
(Compressive measurements of a synthetic sparse signal)
In this example we show:
A synthetic sparse signal
Its measurements using a Gaussian sensing matrix
Addition of noise during the measurement process.
In Example 18.30, we show the reconstruction of the sparse signal from its measurements.
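A hedged sketch of such an experiment is given below; the sizes, noise level, and normalization are illustrative assumptions, not the values used in the example.

```python
import numpy as np

rng = np.random.default_rng(42)
N, M, K = 256, 64, 8          # signal length, number of measurements, sparsity
sigma = 0.01                  # measurement noise level (assumed)

# Synthetic K-sparse signal: K randomly placed nonzero entries.
x = np.zeros(N)
support = rng.choice(N, K, replace=False)
x[support] = rng.standard_normal(K)

# Gaussian sensing matrix with a common column normalization.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Noisy compressive measurements: y = Phi x + e.
e = sigma * rng.standard_normal(M)
y = Phi @ x + e
print(y.shape)   # (M,)
```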
In the following we present examples of real life problems which can be modeled as compressive sensing problems.
18.7.3. Error Correction in Linear Codes#
The classical error correction problem was discussed in one of the seminal founding papers on compressive sensing [19].
(Error correction in linear codes as a compressive sensing problem)
Let
In order to make the message robust against errors in the communication channel, we encode the message with an error correcting code.
We consider
where
We construct the “ciphertext”
where
where
is the left pseudo inverse of
The communication channel is going to add some error. What we actually receive is
where
The least squares solution by minimizing the error
Since
What is needed is an exact reconstruction of
To reconstruct
and from there
The question is: for a given sparsity level
The approach in [19] is as follows.
We construct a matrix
We then apply
Therefore the decoding problem is reduced to that of reconstructing
a sparse vector
With this the problem of finding
This now becomes the compressive sensing problem. The natural questions are
- How many measurements are necessary to be able to recover the sparse vector exactly?
- How should the sensing matrix be constructed?
- How do we recover the sparse vector from its measurements?
These problems are addressed in the following chapters as we discuss sensing matrices and signal recovery algorithms.
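The sketch below illustrates the decoding idea under stated assumptions (a random real-valued encoding matrix and `scipy.linalg.null_space` to build a matrix annihilating it); it is not the construction of [19], only a numerical check that applying such a matrix to the received vector turns decoding into a sparse recovery problem.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(7)
k, n, K = 50, 100, 5                  # message length, codeword length, number of errors

A = rng.standard_normal((n, k))       # encoding matrix (full column rank w.h.p.)
x = rng.standard_normal(k)            # message
f = A @ x                             # codeword ("ciphertext")

e = np.zeros(n)                       # sparse channel error
e[rng.choice(n, K, replace=False)] = rng.standard_normal(K)
y = f + e                             # received vector

# F annihilates A: its rows span the left null space of A, so F @ A = 0.
F = null_space(A.T).T                 # shape (n - k, n)
z = F @ y                             # z = F @ e: a compressive measurement of e
print(np.allclose(z, F @ e))          # True

# Once the sparse error e is recovered from z (the compressive sensing problem),
# the message follows from least squares on y - e.
x_hat, *_ = np.linalg.lstsq(A, y - e, rcond=None)
print(np.allclose(x_hat, x))          # True
```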
18.7.4. Piecewise Cubic Polynomial Signal#
(Piecewise cubic polynomial signal)
This example was discussed in [18]. Our signal of interest is a piecewise cubic polynomial signal as shown below.
It has a compressible representation in a wavelet basis.
The representation is described by the equation.
The chosen basis is a Daubechies wavelet basis
In this example
We can sort the wavelet coefficients by magnitude and plot them in descending order to visualize how sparse the representation is.
Before making compressive measurements, we need to decide how many compressive measurements will be sufficient.
Closely examining the coefficients in
Entries in the wavelet representation of the piecewise cubic polynomial signal higher than a threshold

| Threshold | Entries higher than threshold |
|---|---|
| 1 | 129 |
| 1E-1 | 173 |
| 1E-2 | 186 |
| 1E-4 | 197 |
| 1E-8 | 199 |
| 1E-12 | 200 |
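A sketch of how such counts could be produced is shown below; it assumes the PyWavelets package and a `db8` wavelet, and uses an illustrative piecewise cubic signal, so the resulting counts will differ from the table above.

```python
import numpy as np
import pywt

# Illustrative piecewise cubic polynomial test signal.
N = 2048
t = np.linspace(0, 1, N)
x = np.piecewise(
    t,
    [t < 0.3, (t >= 0.3) & (t < 0.7), t >= 0.7],
    [lambda s: 40 * s**3,
     lambda s: -10 * (s - 0.5)**3 + 1,
     lambda s: 20 * (s - 1)**3 - 1],
)

# Wavelet decomposition and flattened coefficient vector.
coeffs = pywt.wavedec(x, "db8")
alpha = np.concatenate([c.ravel() for c in coeffs])

# Count coefficients whose magnitude exceeds each threshold.
for thr in [1.0, 1e-1, 1e-2, 1e-4, 1e-8, 1e-12]:
    count = np.count_nonzero(np.abs(alpha) > thr)
    print(f"threshold {thr:g}: {count} entries")
```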
A choice of
A Gaussian random sensing matrix
The measurement process is described by the equation
with
The compressed measurements are shown below.
Finally the product of
The sparse signal recovery problem is denoted as
where
18.7.5. Number of Measurements#
A fundamental question of compressive sensing framework is:
How many measurements are necessary to acquire
Clearly if
We further note that the sensing matrix
If the
In [20] Candès and Tao showed that the geometry of sparse signals should be preserved under the action of a sensing matrix. In particular, the distance between two sparse signals should not change much during sensing.
They quantified this idea in the form of a restricted isometric constant of a matrix
We will study this property, known as the restricted isometry property (RIP), in more detail in Restricted Isometry Property. Here we just sketch its implications for compressive sensing.
When
Further if
It is now known that many randomly generated matrices have excellent RIP behavior.
One can show that with on the order of $K \log (N/K)$ random measurements, one can recover the original signal with high probability (a rough numerical sketch follows the list below).
Some of the typical random matrices which have suitable RIP properties are
Gaussian sensing matrices
Partial Fourier matrices
Rademacher sensing matrices
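As a rough illustration of this scaling, the sketch below evaluates the rule of thumb $M \approx C \, K \ln(N/K)$; the constant $C$ is an arbitrary illustrative choice, not a value from these notes.

```python
import math

def measurements_needed(N: int, K: int, C: float = 2.0) -> int:
    """Heuristic number of random measurements for K-sparse signals in R^N."""
    return math.ceil(C * K * math.log(N / K))

for N, K in [(1000, 10), (10000, 50), (100000, 100)]:
    print(N, K, measurements_needed(N, K))
```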
18.7.6. Signal Recovery#
The second fundamental problem in compressive sensing is:
Given the compressive measurements $y$, how do we recover the signal $x$?

A simple formulation of the problem is:

$$\text{minimize } \| x \|_0 \text{ subject to } y = \Phi x.$$
Over the years, people have developed a number of algorithms to tackle the sparse recovery problem.
The algorithms can be broadly classified into the following categories (a minimal example of a greedy pursuit is sketched after the list).
[Greedy pursuits] These algorithms attempt to build the approximation of the signal iteratively by making locally optimal choices at each step. Examples of such algorithms include OMP (orthogonal matching pursuit), stage-wise OMP, regularized OMP, CoSaMP (compressive sampling pursuit) and IHT (iterative hard thresholding).
[Convex relaxation] These techniques relax the $\ell_0$ “norm” minimization problem into a suitable convex optimization problem. This relaxation is valid for a large class of signals of interest. Once the problem has been formulated as a convex optimization problem, a number of solvers are available, e.g. interior point methods, projected gradient methods and iterative thresholding.
[Combinatorial algorithms] These methods are based on research in group testing and are specifically suited for situations where highly structured measurements of the signal are taken. This class includes algorithms like Fourier sampling, chaining pursuit, and HHS pursuit.
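As a concrete illustration of the greedy pursuit family, here is a minimal orthogonal matching pursuit (OMP) sketch; it is illustrative code, not the reference implementation used in these notes.

```python
import numpy as np

def omp(Phi: np.ndarray, y: np.ndarray, K: int) -> np.ndarray:
    """Recover a K-sparse x from y = Phi x by greedy atom selection."""
    M, N = Phi.shape
    residual = y.copy()
    support = []
    x_hat = np.zeros(N)
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Least squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat[support] = coef
    return x_hat

# Tiny usage example with a Gaussian sensing matrix.
rng = np.random.default_rng(0)
N, M, K = 128, 40, 4
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_hat = omp(Phi, Phi @ x, K)
print(np.allclose(x, x_hat, atol=1e-8))
```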
A major emphasis of these notes will be the study of these sparse recovery algorithms. We shall provide some basic results in this section. We shall work under the following framework in the remainder of this section.
- Let $x \in \mathbb{R}^N$ be our signal of interest, where $N$ is the number of signal components or the dimension of the signal space.
- Let us make $M$ linear measurements of the signal.
- The measurements are given by $y = \Phi x$. $y \in \mathbb{R}^M$ is our measurement vector in the measurement space and $M$ is the dimension of our measurement space. $\Phi$ is an $M \times N$ matrix known as the sensing matrix. $M \ll N$, hence $\Phi$ achieves a dimensionality reduction over $x$.
- We assume that measurements are non-adaptive; i.e., the matrix $\Phi$ is predefined and doesn't depend on $x$.
- The recovery process is denoted by $\widehat{x} = \Delta(y) = \Delta(\Phi x)$, where $\Delta$ is a (usually nonlinear) recovery algorithm.
We will look at three kinds of situations:
- Signals are truly sparse. A signal has up to $K$ non-zero values only, where $K$ is known in advance. The measurement process is ideal and no noise is introduced during measurement. We will look for guarantees which can ensure exact recovery of the signal from its linear measurements.
- Signals are not truly sparse, but they have a few values which dominate the signal. Thus, if we approximate the signal by these few values, the approximation error is not noticeable. We again assume that there is no measurement noise being introduced. When we recover the signal, it will in general not be exact recovery. We expect the recovery error to be bounded (by the approximation error). Also, in special cases where the signal turns out to be $K$-sparse, we expect the recovery algorithm to recover the signal exactly. Such an algorithm with bounded recovery error will be called robust.
- Signals are not sparse. Also, there is measurement noise being introduced. We expect the recovery algorithm to minimize the error and thus perform stable recovery in the presence of measurement noise.
(Reconstruction of the synthetic sparse signal from its noisy measurements)
Continuing from Example 18.27, we show the approximate reconstruction of the synthetic sparse signal using 3 different algorithms:

- Basis pursuit ($\ell_1$ minimization) algorithm
- Matching pursuit algorithm
- Orthogonal matching pursuit algorithm
The reconstructions are not exact due to the presence of measurement noise.
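For reference, a hedged sketch of basis pursuit as a linear program is shown below; for simplicity it solves the noiseless, equality-constrained problem with `scipy.optimize.linprog` (the noisy case would instead use basis pursuit denoising).

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve min ||x||_1 subject to Phi x = y via the split x = u - v, u, v >= 0."""
    M, N = Phi.shape
    c = np.ones(2 * N)                  # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])       # Phi u - Phi v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(3)
N, M, K = 128, 40, 4
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_hat = basis_pursuit(Phi, Phi @ x)
print(np.linalg.norm(x - x_hat))        # should be tiny for exact recovery
```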
18.7.7. Exact Recovery of Sparse Signals#
The null space of a matrix $\Phi$ is denoted as

$$\mathcal{N}(\Phi) = \{ h \in \mathbb{R}^N : \Phi h = 0 \}.$$

The set of $K$-sparse signals is denoted as

$$\Sigma_K = \{ x \in \mathbb{R}^N : \| x \|_0 \leq K \}.$$
(K sparse signals)
Let
is not a sparse signal. is a 2-sparse signal; it is also a 4-sparse signal.
If $x$ and $y$ are two $K$-sparse signals, then $x - y$ is a $2K$-sparse signal.
Proof.
(Difference of K sparse signals)
Let N = 5.
Let
and . Then is a 3-sparse as well as a 4-sparse signal.
Let and . Then is a 2-sparse as well as a 4-sparse signal.
Let and . Then is a 4-sparse signal.
(Unique embedding of a set)
We say that a sensing matrix
A sensing matrix
Proof. We first show that the difference of sparse signals is not in the nullspace.
Let
and be two sparse signals.Then
and are corresponding measurements.Now if
provides unique embedding of all sparse signals, then .Thus
.Thus
.
We show the converse by contradiction.
Let
.Thus
and .Then we can find
such that .Thus there exists
such that .But then,
doesn’t uniquely embed .
There are equivalent ways of characterizing this condition. In the following, we present a condition based on spark.
18.7.7.1. Spark#
We recall from Definition 18.17 that the spark of a matrix $\Phi$ is the smallest number of columns of $\Phi$ which are linearly dependent.
(Unique explanations and spark)
For any measurement $y \in \mathbb{R}^M$, there exists at most one $K$-sparse explanation $x$ with $y = \Phi x$ if and only if $\mathrm{spark}(\Phi) > 2K$.
Proof. We need to show
If for every measurement, there is only one
-sparse explanation, then .If
then for every measurement, there is only one -sparse explanation.
Assume that for every
Now assume that
.Thus there exists a set of at most
columns which are linearly dependent.Thus there exists
such that .Thus
.Thus
.Hence
doesn’t uniquely embed each signal . A contradiction.Hence
.
Now suppose that
Assume that for some
there exist two different -sparse explanations such that .Then
.Thus
and .Hence, there exists a set of at most
columns in which is linearly dependent.Thus
. A contradiction.Hence, for every
, there exists at most one .
Since the spark of an $M \times N$ matrix is at most $M + 1$, this condition requires at least $M \geq 2K$ measurements.
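A brute-force sketch for computing the spark of a small matrix is shown below (the search is exponential in $N$, so it is only practical for tiny examples); it is illustrative code, not part of the original text.

```python
import numpy as np
from itertools import combinations

def spark(Phi: np.ndarray) -> int:
    """Smallest number of columns of Phi that are linearly dependent."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            if np.linalg.matrix_rank(Phi[:, list(cols)]) < k:
                return k
    return N + 1   # every subset of columns is linearly independent

rng = np.random.default_rng(5)
Phi = rng.standard_normal((4, 8))
s = spark(Phi)               # for a generic 4 x 8 matrix this is 5 (= M + 1)
K = (s - 1) // 2             # unique K-sparse explanations are guaranteed for spark > 2K
print(s, K)
```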
18.7.8. Recovery of Approximately Sparse Signals#
Spark is a useful criterion for the characterization of sensing matrices for truly sparse signals, but it does not work well for approximately sparse signals. We need a more restrictive criterion on the sensing matrix.
In this context we will deal with two types of errors:
Approximation error
Let us approximate a signal
using only coefficients.Let us call the approximation as
.Thus
is approximation error.
Recovery error
Let
be a sensing matrix.Let
be a recovery algorithm.Then
is the recovered signal vector.The error
is recovery error.
Ideally, the recovery error should not be too large compared to the approximation error.
In the following we will
Formalize the notion of null space property (NSP) of a matrix
.Describe a measure for performance of an arbitrary recovery algorithm
.Establish the connection between NSP and performance guarantee for recovery algorithms.
Suppose we approximate
One specific
In the following, we will need some additional notation.
- Let $\{1, 2, \dots, N\}$ be the set of indices for the signal $x$.
- Let $\Lambda \subset \{1, 2, \dots, N\}$ be a subset of indices.
- Let $\Lambda^c = \{1, 2, \dots, N\} \setminus \Lambda$. $x_{\Lambda}$ will denote a signal vector obtained by setting the entries of $x$ indexed by $\Lambda^c$ to zero.
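A tiny illustrative helper for this restriction notation is sketched below, following the convention stated above (keep the entries indexed by $\Lambda$ and set the rest to zero).

```python
import numpy as np

def restrict(x, Lambda):
    """Return x_Lambda: x with entries outside Lambda zeroed out."""
    out = np.zeros_like(x)
    idx = sorted(Lambda)
    out[idx] = x[idx]
    return out

x = np.array([-1.0, 0.5, 3.0, 2.0])      # N = 4
print(restrict(x, {0, 2}))               # [-1.  0.  3.  0.]
print(restrict(x, {1, 3}))               # complement: [0.  0.5 0.  2.]
```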
Let N = 4.
Then
.Let
.Then
.Now let
.Then
.
Let N = 4.
Then
.Let
.Then
.Now let
.Then
.Now let
Then
18.7.8.1. Null Space Property#
(Null space property)
A matrix $\Phi$ satisfies the null space property (NSP) of order $K$ if there exists a constant $C > 0$ such that

$$\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}}$$

holds for every $h \in \mathcal{N}(\Phi)$ and for every index set $\Lambda$ with $|\Lambda| \leq K$.

- Let $h \in \mathcal{N}(\Phi)$ be a non-zero $K$-sparse vector. Choosing $\Lambda$ as the indices on which $h$ is non-zero, we can construct a $\Lambda$ such that $h_{\Lambda^c} = 0$ and hence the right hand side is 0, while the left hand side is positive. Hence the above condition is not satisfied. Thus such a vector should not belong to $\mathcal{N}(\Phi)$ if $\Phi$ satisfies the NSP.
- Essentially, vectors in $\mathcal{N}(\Phi)$ shouldn't be concentrated in a small subset of indices.
- If $\Phi$ satisfies the NSP of order $K$, then the only $K$-sparse vector in $\mathcal{N}(\Phi)$ is $h = 0$.
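A hedged numerical sketch is given below: assuming the standard form of the NSP inequality, $\| h_{\Lambda} \|_2 \leq C \| h_{\Lambda^c} \|_1 / \sqrt{K}$, it estimates a lower bound on the smallest admissible constant $C$ by sampling vectors from the null space and taking $\Lambda$ to be the $K$ largest-magnitude entries (the worst case for a given $h$).

```python
import numpy as np
from scipy.linalg import null_space

def nsp_constant_estimate(Phi, K, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    Z = null_space(Phi)                    # orthonormal basis of N(Phi)
    worst = 0.0
    for _ in range(trials):
        h = Z @ rng.standard_normal(Z.shape[1])
        order = np.argsort(np.abs(h))[::-1]
        Lam, Lam_c = order[:K], order[K:]
        ratio = np.linalg.norm(h[Lam]) * np.sqrt(K) / np.linalg.norm(h[Lam_c], 1)
        worst = max(worst, ratio)          # lower bound on the NSP constant
    return worst

rng = np.random.default_rng(1)
Phi = rng.standard_normal((40, 128)) / np.sqrt(40)
print(nsp_constant_estimate(Phi, K=4))
```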
18.7.8.2. Measuring the Performance of a Recovery Algorithm#
Let
The
We will be interested in guarantees of the form

$$\| \Delta(\Phi x) - x \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}$$

where $\sigma_K(x)_1$ denotes the $\ell_1$ error of the best $K$-term approximation of $x$.
Why this recovery guarantee formulation?
- Exact recovery of $K$-sparse signals: $\Delta(\Phi x) = x$ if $x$ is $K$-sparse.
- Robust recovery of non-sparse signals: the recovery error depends on how well the signals are approximated by $K$-sparse vectors.
- Such guarantees are known as instance optimal guarantees. They are also known as uniform guarantees.
Why the specific choice of norms?
- Different choices of $\ell_p$ norms lead to different guarantees.
- The $\ell_2$ norm on the LHS is a typical least squares error.
- An $\ell_2$ norm on the RHS would require a prohibitively large number of measurements.
- An $\ell_1$ norm on the RHS helps us keep the number of measurements low.
If an algorithm
We show that NSP of order
18.7.8.3. NSP and Instance Optimal Guarantees#
(NSP and instance optimal guarantees)
Let
Proof. We are given that
form an encoder-decoder pair.Together, they satisfy instance optimal guarantee (18.39).
Thus they are able to recover all sparse signals exactly.
For non-sparse signals, they are able to recover their
-sparse approximation with bounded recovery error.
We need to show that if
where
Let
.Let
be the indices corresponding to the largest entries of .Then
Split
into and such that .We have
Let
Let
Then
By assumption
.Thus
But since
(recall that indexes only entries) and is able to recover all -sparse signals exactly, henceThus
i.e., the recovery algorithm
recovers for the signal .Certainly
is not -sparse since recovers every -sparse signal uniquely.Hence
must be nonempty.Finally we also have
since
contains some additional non-zero entries.But as per instance optimal recovery guarantee (18.39) for
pair, we haveThus
But
Recall that
where indexes entries of which are (magnitude wise) larger than all entries indexed by .Hence the best
-norm term approximation of is given by .Hence
Thus we finally have
Thus
satisfies the NSP of order .
It turns out that NSP of order
18.7.9. Recovery in Presence of Measurement Noise#
Measurement vector in the presence of noise is given by
where
Recovery error as usual is given by
Stability of a recovery algorithm is characterized by comparing variation of recovery error w.r.t. measurement error.
NSP is both necessary and sufficient for establishing guarantees of the form:
These guarantees do not account for the presence of noise during measurement.
We need stronger conditions for handling noise. The restricted isometry property for sensing matrices comes to our rescue.
18.7.9.1. Restricted Isometry Property#
We recall that a matrix $\Phi$ satisfies the restricted isometry property (RIP) of order $K$ with constant $\delta_K \in (0, 1)$ if

$$(1 - \delta_K) \| x \|_2^2 \leq \| \Phi x \|_2^2 \leq (1 + \delta_K) \| x \|_2^2$$

holds for every $K$-sparse vector $x$.
- If a matrix satisfies RIP of order $K$, then we can see that it approximately preserves the size of a $K$-sparse vector.
- If a matrix satisfies RIP of order $2K$, then we can see that it approximately preserves the distance between any two $K$-sparse vectors, since their difference vectors would be $2K$-sparse (see Theorem 18.49).
- We say that the matrix is nearly orthonormal for sparse vectors.
- If a matrix satisfies RIP of order $K$ with a constant $\delta_K$, it automatically satisfies RIP of any order $K' < K$ with a constant $\delta_{K'} \leq \delta_K$.
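A hedged Monte Carlo sketch for probing the RIP constant is shown below; sampling random $K$-sparse unit vectors only yields a lower estimate of $\delta_K$, since the true constant requires a combinatorial search over all supports.

```python
import numpy as np

def rip_constant_estimate(Phi, K, trials=5000, seed=0):
    rng = np.random.default_rng(seed)
    M, N = Phi.shape
    delta = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        support = rng.choice(N, K, replace=False)
        x[support] = rng.standard_normal(K)
        x /= np.linalg.norm(x)                       # unit-norm K-sparse vector
        delta = max(delta, abs(np.linalg.norm(Phi @ x) ** 2 - 1.0))
    return delta

rng = np.random.default_rng(2)
M, N, K = 64, 256, 4
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
print(rip_constant_estimate(Phi, K))   # typically well below 1 for these sizes
```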
18.7.9.2. Stability#
Informally a recovery algorithm is stable if recovery error is small in the presence of small measurement noise.
Is RIP necessary and sufficient for sparse signal recovery from noisy measurements? Let us look at the necessary part.
We will define a notion of stability of the recovery algorithm.
Let
Error is added to the measurements.
LHS is
norm of recovery error.RHS consists of scaling of the
norm of measurement error.The definition says that recovery error is bounded by a multiple of the measurement error.
Thus adding a small amount of measurement noise should not cause an arbitrarily large recovery error.
It turns out that
If a pair
for all
Proof. Remember that any
Let
.Split it in the form of
with .Define
Thus
We have
Also we have
Let
Since
is -stable, hence we haveAlso
Using the triangle inequality
Thus we have for every
This theorem gives us the lower bound for RIP property of
order
Note that smaller the constant
This result doesn’t require an upper bound on the RIP property in (18.38).
It turns out that if
18.7.9.3. Measurement Bounds#
As stated in previous section, for a
Before we start figuring out the bounds, let us develop a special subset of
When we say
Hence
Each vector in
Revisiting
It is now obvious that
Since there are
By definition
Further Let
Then
We now state a result which will help us in getting to the bounds.
Let
and
Proof. We just need to find one set
. for every . or equivalently .
The first condition states that the set
We will construct
Since
hence .Consider any fixed
.How many elements
are there in such that ?Define
Clearly by requirements in the lemma, if
then ; i.e., no vector in belongs to .How many elements are there in
? Let us find an upper bound. we have .If
and differ in or more places, then naturally .Hence if
then hence for any .So define
We have
Thus we have an upper bound given by
Let us look at
carefully.We can choose
indices where and may differ in ways.At each of these
indices, can take value as one of .Thus we have an upper bound
We now describe an iterative process for building
from vectors in .Say we have added
vectors to namely .Then
Number of vectors in
is bounded by .Thus we have at least
vectors left in
to choose from for adding in .We can keep adding vectors to
till there are no more suitable vectors left.We can construct a set of size
provided(18.40)#Now
Note that
is a decreasing function of .Its minimum value is achieved for
as .So we have
Rephrasing the bound on
in (18.40) we haveHence we can definitely construct a set
with the cardinality satisfyingNow it is given that
. So we have:Thus we have
Choose
Clearly this value of
satisfies (18.40).Hence
can have at least these many elements.Thus
which completes the proof.
We can now establish following bound on the required number of measurements to satisfy RIP.
At this moment, we won’t worry about exact value of
Let
where
Proof. Since
Also
Consider the set
Also
since
and an upper bound:
What do these bounds mean? Let us start with the lower bound.
Construct
Lower bound says that these balls are disjoint.
Since
Upper bound tells us that all vectors
Thus the set of all balls lies within a larger ball of radius
So we require that the volume of the larger ball MUST be greater than the sum of volumes of
Since volume of an
Again from Lemma 18.3 we have
Putting back we get
which establishes a lower bound on the number of measurements
Some remarks are in order:
- The theorem only establishes a necessary lower bound on $M$. It doesn't mean that if we choose an $M$ larger than the lower bound, then $\Phi$ will have RIP of order $2K$ with any given constant $\delta$.
- The restriction on $\delta$ in the theorem is arbitrary and is made for convenience. In general, we can work with other upper bounds on $\delta$ and develop the bounds accordingly.
- This result fails to capture the dependence of $M$ on the RIP constant $\delta$ directly. The Johnson-Lindenstrauss lemma, which concerns embeddings of finite sets of points in low-dimensional spaces, helps us resolve this.
- We haven't made significant efforts to optimize the constants; still, they are quite reasonable.
18.7.10. RIP and NSP#
RIP and NSP are connected.
If a matrix
Thus RIP is strictly stronger than NSP (under certain conditions).
We will need following lemma which applies to any arbitrary
Suppose that
Define
where
Let us understand this lemma a bit.
If
maps to the initial few ( or less) elements we chose. maps to all other elements. maps to largest (in magnitude) elements of . contains a maximum of non-zero elements. satisfies RIP of order .Thus
We now state the connection between RIP and NSP.
Suppose that
Proof. We are given that
holds for all
holds
Let
.Then
.Let
denote the largest entries of .Then
Similarly
Thus if we show that
satisfies NSP of order for ; i.e.,then we would have shown it for all
such that .Hence let
.We can divide
into two components and of size each.Since
maps to the largest entries in , hence whatever entries we choose in , the largest entries in will be .Hence as per Lemma 18.4 above, we have
Also
Thus we have
We have to get rid of
.Since
, by applying Theorem 18.17 we getHence
But since
, hence .Hence
Note that the inequality is also satisfied for
in which case, we don’t need to bring to denominator.Now
Putting
we see that
satisfies NSP of order whenever satisfies RIP of order with .
Note that for