19.1. Sensing Matrices#

19.1.1. Matrices Satisfying RIP#

This section provides basic results about the construction of matrices which satisfy the restricted isometry property (RIP).

The design of an M×N sensing matrix involves two goals:

  • A stable embedding for signals with as high a sparsity level K as possible (high K)

  • As few measurements as possible (low M)

There are two different approaches:

  • Deterministic approach

  • Randomized approach

Known deterministic approaches so far tend to require M to be very large ($O(K^2 \ln N)$ or $O(K N^{\alpha})$). We can overcome this limitation by randomizing the matrix construction.

Construction process:

  • Input M and N.

  • Generate Φ by choosing Φij as independent realizations from some probability distribution.

Suppose that Φ is drawn from a normal distribution.

It can be shown that the rank of Φ is M with probability 1.

Example 19.1 (Random matrices are full rank.)

We can verify this fact by doing a small computer simulation.

M = 6;
N = 20;
trials = 10000;
numFullRankMatrices = 0;
for i=1:trials
    % Create a random matrix of size M x N with i.i.d. normally distributed entries
    A = randn(M,N);
    % Obtain its rank
    R = rank(A);
    % Check whether the rank equals M or not
    if R == M
        numFullRankMatrices = numFullRankMatrices + 1;
    end
end
fprintf('Number of trials: %d\n',trials);
fprintf('Number of full rank matrices: %d\n',numFullRankMatrices);
percentage = numFullRankMatrices*100/trials;
fprintf('Percentage of full rank matrices: %.2f %%\n', percentage);

This program generates a number of random matrices and measures their ranks. It verifies whether they are full rank or not.

Here is a sample output:

>> demoRandomMatrixRank
Number of trials: 10000
Number of full rank matrices: 10000
Percentage of full rank matrices: 100.00 

Thus if we choose M=2K, any subset of 2K columns will be linearly independent, and the matrix will satisfy the RIP with some $\delta_{2K} > 0$. However, this construction doesn't tell us the exact value of $\delta_{2K}$. To determine $\delta_{2K}$, we must consider all possible K-dimensional subspaces of $\mathbb{R}^N$, which is computationally intractable for reasonably large N and K. What is the alternative?
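To get a sense of the combinatorial explosion, here is a tiny MATLAB computation (the values N=100 and K=10 are purely illustrative) that counts the number of K-column subsets an exhaustive check would have to examine.

% Count the K-column subsets an exhaustive RIP check would have to examine.
% N and K below are illustrative values, not prescribed by the text.
N = 100;
K = 10;
numSubsets = nchoosek(N, K);   % roughly 1.7e13 subsets even for these small sizes
fprintf('Number of K-column subsets for N=%d, K=%d: %.3e\n', N, K, numSubsets);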

We can start with a chosen value of $\delta_{2K}$ and try to construct a matrix which matches it.

Before we proceed further, we should take a detour and review sub-Gaussian distributions in Subgaussian Distributions.

We now state the main theorem of this section.

Theorem 19.1 (Norm bounds on subgaussian vectors)

Suppose that $X = [X_1, X_2, \dots, X_M]$ where each $X_i$ is i.i.d. with $X_i \sim \mathrm{Sub}(c^2)$ and $\mathbb{E}(X_i^2) = \sigma^2$. Then

$$\mathbb{E}(\| X \|_2^2) = M \sigma^2.$$

Moreover, for any $\alpha \in (0,1)$ and for any $\beta \in [c^2/\sigma^2, \beta_{\max}]$, there exists a constant $\kappa^* \geq 4$ depending only on $\beta_{\max}$ and the ratio $\sigma^2/c^2$ such that

$$\mathbb{P}\left(\| X \|_2^2 \leq \alpha M \sigma^2\right) \leq \exp\left(- \frac{M (1 - \alpha)^2}{\kappa^*}\right)$$

and

$$\mathbb{P}\left(\| X \|_2^2 \geq \beta M \sigma^2\right) \leq \exp\left(- \frac{M (\beta - 1)^2}{\kappa^*}\right).$$
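As a quick numerical sanity check, here is a minimal MATLAB sketch (standard Gaussian entries, so $\sigma^2 = 1$; the value of M, the trial count and the deviation thresholds are illustrative choices) that estimates $\mathbb{E}(\| X \|_2^2)$ and the probability of large deviations from $M\sigma^2$.

% Monte Carlo sketch: concentration of ||X||_2^2 for Gaussian entries.
% M, trials, alpha and beta below are illustrative choices.
M = 100;
trials = 100000;
sqNorms = sum(randn(M, trials).^2, 1);   % ||X||_2^2 for each trial
fprintf('Mean of ||X||_2^2: %.2f (theory: %d)\n', mean(sqNorms), M);
alpha = 0.8; beta = 1.2;
fprintf('Empirical P(||X||_2^2 <= alpha*M): %.4f\n', mean(sqNorms <= alpha*M));
fprintf('Empirical P(||X||_2^2 >= beta*M) : %.4f\n', mean(sqNorms >= beta*M));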

19.1.1.1. Conditions on Random Distribution for RIP#

Let us get back to our business of constructing a matrix Φ using random distributions that satisfies the RIP with a given δ. We will impose some conditions on the random distribution.

  • We require that the distribution yields a matrix that is norm-preserving. This requires that

    (19.1)# $\mathbb{E}(\Phi_{ij}^2) = \frac{1}{M}.$

    Hence the variance of the distribution should be $\frac{1}{M}$.

  • We require that the distribution is sub-Gaussian; i.e., there exists a constant $c > 0$ such that

    (19.2)# $\mathbb{E}(\exp(\Phi_{ij} t)) \leq \exp\left(\frac{c^2 t^2}{2}\right).$

    This says that the moment generating function of the distribution is dominated by that of a Gaussian distribution.
    In other words, the tails of the distribution decay at least as fast as the tails of a Gaussian distribution.

  • We will further assume that the entries of Φ are strictly sub-Gaussian; i.e., they must satisfy (19.2) with

    $c^2 = \mathbb{E}(\Phi_{ij}^2) = \frac{1}{M}$

    (a small construction sketch follows this list).
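Here is a minimal MATLAB sketch of two constructions meeting these conditions (Gaussian and Rademacher entries, both zero mean with variance 1/M; the sizes are illustrative):

% Two constructions with zero-mean, variance-1/M, strictly sub-Gaussian entries.
% The sizes M and N are illustrative.
M = 64; N = 256;
PhiGauss = randn(M, N) / sqrt(M);         % Gaussian entries ~ N(0, 1/M)
PhiRad   = sign(randn(M, N)) / sqrt(M);   % Rademacher entries, +/- 1/sqrt(M)
fprintf('Sample entry variance (Gaussian)  : %.4f (theory %.4f)\n', var(PhiGauss(:)), 1/M);
fprintf('Sample entry variance (Rademacher): %.4f (theory %.4f)\n', var(PhiRad(:)), 1/M);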

Under these conditions we have the following result.

Corollary 19.1 (Norm bounds on subgaussian matrix vector product)

Suppose that Φ is an M×N matrix whose entries $\Phi_{ij}$ are i.i.d. with $\Phi_{ij}$ drawn according to a strictly sub-Gaussian distribution with $c^2 = \frac{1}{M}$.

Let $Y = \Phi x$ for $x \in \mathbb{R}^N$. Then for any $\epsilon > 0$ and any $x \in \mathbb{R}^N$,

$$\mathbb{E}(\| Y \|_2^2) = \| x \|_2^2$$

and

$$\mathbb{P}\left(\left| \| Y \|_2^2 - \| x \|_2^2 \right| \geq \epsilon \| x \|_2^2\right) \leq 2 \exp\left(- \frac{M \epsilon^2}{\kappa^*}\right)$$

where $\kappa^* = \frac{2}{1 - \ln(2)} \approx 6.5178$.

This means that the norm of a sub-Gaussian random vector strongly concentrates about its mean.
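A minimal MATLAB sketch of this concentration phenomenon (illustrative sizes and ε, with Gaussian Φ satisfying $c^2 = \frac{1}{M}$):

% Concentration of ||Phi*x||_2^2 about ||x||_2^2 for random Gaussian Phi.
% M, N, trials and epsilon are illustrative choices.
M = 200; N = 400;
trials = 1000;
x = randn(N, 1);                          % an arbitrary fixed vector
relDev = zeros(trials, 1);
for i = 1:trials
    Phi = randn(M, N) / sqrt(M);
    relDev(i) = abs(norm(Phi*x)^2 - norm(x)^2) / norm(x)^2;
end
epsilon = 0.25;
kappa = 2 / (1 - log(2));
fprintf('Empirical P(deviation >= eps): %.4f\n', mean(relDev >= epsilon));
fprintf('Bound 2*exp(-M*eps^2/kappa)  : %.4f\n', 2*exp(-M*epsilon^2/kappa));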

19.1.1.2. Sub-Gaussian Matrices Satisfy the RIP#

Using this result we now state that sub-Gaussian matrices satisfy the RIP.

Theorem 19.2 (Lower bound on required number of measurements)

Fix $\delta \in (0,1)$. Let Φ be an M×N random matrix whose entries $\Phi_{ij}$ are i.i.d. with $\Phi_{ij}$ drawn according to a strictly sub-Gaussian distribution with $c^2 = \frac{1}{M}$. If

$$M \geq \kappa_1 K \ln\left(\frac{N}{K}\right),$$

then Φ satisfies the RIP of order K with the prescribed δ with probability exceeding $1 - 2 e^{-\kappa_2 M}$, where $\kappa_1$ is arbitrary and

$$\kappa_2 = \frac{\delta^2}{2 \kappa^*} - \frac{1}{\kappa_1} \ln\left(\frac{42 e}{\delta}\right).$$

We note that this theorem achieves M of the same order as the lower bound obtained in Theorem 18.42 up to a constant.

This is much better than deterministic approaches.
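As a rough illustration of how to read the bound, the following MATLAB snippet evaluates $\kappa_1 K \ln(N/K)$ for illustrative values of N, K and $\kappa_1$ (the value of $\kappa_1$ here is a placeholder, not one prescribed by the theorem).

% Evaluate the measurement bound M >= kappa_1 * K * ln(N/K).
% N, K and kappa1 are illustrative placeholder values.
N = 10000; K = 50; kappa1 = 10;
M = ceil(kappa1 * K * log(N/K));
fprintf('M >= %d measurements for N = %d, K = %d\n', M, N, K);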

19.1.1.3. Advantages of Random Construction#

There are a number of advantages of the random sensing matrix construction approach:

  • One can show that for random construction, the measurements are democratic. This means that all measurements are equal in importance and it is possible to recover the signal from any sufficiently large subset of the measurements.

    Thus by using a random Φ one can be robust to the loss or corruption of a small fraction of measurements.

  • In general we are more interested in x which is sparse in some basis Ψ. In this setting, we require that ΦΨ satisfy the RIP.

    Deterministic construction would explicitly require taking Ψ into account.

    But if Φ is random, we can avoid this issue.

    If Φ is Gaussian and Ψ is an orthonormal basis, then one can easily show that ΦΨ will also have a Gaussian distribution.

    Thus if M is sufficiently large, ΦΨ will also satisfy the RIP with very high probability (a small numerical sketch follows this list).

    Similar results hold for other sub-Gaussian distributions as well.
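Here is a small MATLAB sketch of the Gaussian case (it assumes the Signal Processing Toolbox function dctmtx to build an orthonormal basis Ψ; the sizes are illustrative): the entries of ΦΨ have the same variance 1/M as those of Φ.

% If Phi is Gaussian and Psi is orthonormal, Phi*Psi is again Gaussian with
% entry variance 1/M. Psi here is a DCT basis (needs dctmtx from the
% Signal Processing Toolbox); sizes are illustrative.
M = 64; N = 256;
Phi = randn(M, N) / sqrt(M);
Psi = dctmtx(N)';                % orthonormal DCT basis
A = Phi * Psi;
fprintf('Entry variance of Phi    : %.5f (theory %.5f)\n', var(Phi(:)), 1/M);
fprintf('Entry variance of Phi*Psi: %.5f (theory %.5f)\n', var(A(:)), 1/M);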

19.1.2. Rademacher Sensing Matrices#

In this subsection, we collect several results related to Rademacher sensing matrices.

Definition 19.1

A Rademacher sensing matrix $\Phi \in \mathbb{R}^{M \times N}$ with M<N is constructed by drawing each entry $\phi_{ij}$ independently from a Rademacher random distribution given by

(19.3)# $P_X(x) = \frac{1}{2}\delta\left(x - \frac{1}{\sqrt{M}}\right) + \frac{1}{2}\delta\left(x + \frac{1}{\sqrt{M}}\right).$

Thus $\phi_{ij}$ takes a value $\pm \frac{1}{\sqrt{M}}$ with equal probability.

We can take the scale factor $\frac{1}{\sqrt{M}}$ out of the matrix Φ by writing

$$\Phi = \frac{1}{\sqrt{M}} X.$$

With that, we can draw the individual entries of X from a simpler Rademacher distribution given by

(19.4)# $P_X(x) = \frac{1}{2}\delta(x - 1) + \frac{1}{2}\delta(x + 1).$

Thus the entries of X take values ±1 with equal probability.

This construction is useful since it allows us to implement the multiplication with Φ in terms of just additions and subtractions. The scaling can be implemented towards the end in the signal processing chain.
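A minimal MATLAB sketch of this implementation idea (illustrative sizes): the product Φx is computed with the ±1 matrix X, which needs only additions and subtractions, and the scaling by 1/√M is applied at the end.

% Multiplication with a Rademacher Phi: +/-1 matrix X, scaling applied last.
% M and N are illustrative sizes.
M = 64; N = 256;
X = sign(randn(M, N));                 % entries are +1 or -1
x = randn(N, 1);
y = (X * x) / sqrt(M);                 % equals Phi*x with Phi = X/sqrt(M)
Phi = X / sqrt(M);
fprintf('Max difference between y and Phi*x: %g\n', max(abs(y - Phi*x)));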

We note that

$$\mathbb{E}(\phi_{ij}) = 0, \quad \mathbb{E}(\phi_{ij}^2) = \frac{1}{M}.$$

Actually we have a better result:

$$\phi_{ij}^2 = \frac{1}{M}.$$

We can write

$$\Phi = \begin{bmatrix} \phi_1 & \dots & \phi_N \end{bmatrix}$$

where $\phi_j \in \mathbb{R}^M$ is a Rademacher random vector with independent entries. We note that

$$\mathbb{E}(\| \phi_j \|_2^2) = \mathbb{E}\left(\sum_{i=1}^M \phi_{ij}^2\right) = \sum_{i=1}^M \mathbb{E}(\phi_{ij}^2) = M \cdot \frac{1}{M} = 1.$$

In this case we also have

$$\| \phi_j \|_2^2 = 1.$$

Thus the squared length of each of the columns of Φ is 1.

Lemma 19.1

Let $z \in \mathbb{R}^M$ be a Rademacher random vector with i.i.d. entries $z_i$ that take a value $\pm \frac{1}{\sqrt{M}}$ with equal probability. Let $u \in \mathbb{R}^M$ be an arbitrary unit norm vector. Then

$$\mathbb{P}(|\langle z, u \rangle| > \epsilon) \leq 2 \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

Representative values of this bound are plotted below.

Fig. 19.1 Tail bound for the probability of the inner product of a Rademacher random vector with a unit norm vector#

Proof. This can be proven using Hoeffding's inequality: $\langle z, u \rangle = \sum_{i=1}^M z_i u_i$ is a sum of independent, zero-mean, bounded random variables with $z_i u_i \in \left[-\frac{|u_i|}{\sqrt{M}}, \frac{|u_i|}{\sqrt{M}}\right]$, and since $\| u \|_2 = 1$ we have $\sum_i \left(\frac{2 |u_i|}{\sqrt{M}}\right)^2 = \frac{4}{M}$, which yields the stated bound. Full details will be elaborated later.

A particular application of this lemma is when u itself is another (independently chosen) unit norm Rademacher random vector.

The lemma establishes that the probability of the inner product of two independent unit norm Rademacher random vectors being large is very small. In other words, independently chosen unit norm Rademacher random vectors are incoherent with high probability. This is a very useful result, as we will see later when measuring the coherence of Rademacher sensing matrices.
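Here is a minimal Monte Carlo sketch in MATLAB (illustrative M, ε and trial count) comparing the empirical tail of $|\langle z, u \rangle|$ for two independent unit norm Rademacher vectors against the bound of Lemma 19.1.

% Empirical tail of |<z,u>| for two independent unit-norm Rademacher vectors
% versus the bound 2*exp(-eps^2*M/2). M, eps and trials are illustrative.
M = 256;
trials = 100000;
ip = zeros(trials, 1);
for i = 1:trials
    z = sign(randn(M, 1)) / sqrt(M);
    u = sign(randn(M, 1)) / sqrt(M);
    ip(i) = abs(z' * u);
end
epsilon = 0.2;
fprintf('Empirical P(|<z,u>| > eps): %.5f\n', mean(ip > epsilon));
fprintf('Bound 2*exp(-eps^2*M/2)   : %.5f\n', 2*exp(-epsilon^2*M/2));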

19.1.2.1. Joint Correlation#

Columns of Φ satisfy a joint correlation property ([80]) which is described in the following lemma.

Lemma 19.2

Let $\{u_k\}$ be a sequence of K vectors (where $u_k \in \mathbb{R}^M$) whose $\ell_2$ norms do not exceed one. Independently choose $z \in \mathbb{R}^M$ to be a random vector with i.i.d. entries $z_i$ that take a value $\pm \frac{1}{\sqrt{M}}$ with equal probability. Then

$$\mathbb{P}\left(\max_k |\langle z, u_k \rangle| \leq \epsilon\right) \geq 1 - 2 K \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

Proof. Let us call $\gamma = \max_k |\langle z, u_k \rangle|$.

  1. We note that if for some $u_k$ we have $\| u_k \|_2 < 1$ and we increase the length of $u_k$ by scaling it, then $\gamma$ will not decrease and hence $\mathbb{P}(\gamma \leq \epsilon)$ will not increase.

  2. Thus if we prove the bound for vectors $u_k$ with $\| u_k \|_2 = 1$ for all $1 \leq k \leq K$, it will be applicable to all $u_k$ whose $\ell_2$ norms do not exceed one.

  3. Hence we will assume that $\| u_k \|_2 = 1$.

  4. From Lemma 19.1 we have

     $$\mathbb{P}(|\langle z, u_k \rangle| > \epsilon) \leq 2 \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

  5. Now the event

     $$\left\{\max_k |\langle z, u_k \rangle| > \epsilon\right\} = \bigcup_{k=1}^K \left\{|\langle z, u_k \rangle| > \epsilon\right\};$$

     i.e., if any of the inner products (in absolute value) is greater than $\epsilon$ then the maximum is greater.

  6. We recall Boole's inequality which states that

     $$\mathbb{P}\left(\bigcup_i A_i\right) \leq \sum_i \mathbb{P}(A_i).$$

  7. Thus

     $$\mathbb{P}\left(\max_k |\langle z, u_k \rangle| > \epsilon\right) \leq 2 K \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

  8. This gives us

     $$\mathbb{P}\left(\max_k |\langle z, u_k \rangle| \leq \epsilon\right) = 1 - \mathbb{P}\left(\max_k |\langle z, u_k \rangle| > \epsilon\right) \geq 1 - 2 K \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

19.1.2.2. Coherence#

We show that the coherence of a Rademacher sensing matrix is fairly small with high probability (adapted from [80]).

Lemma 19.3

Fix $\delta \in (0,1)$. For an M×N Rademacher sensing matrix Φ as defined in Definition 19.1, the coherence statistic satisfies

$$\mu \leq \sqrt{\frac{4}{M} \ln\left(\frac{N}{\delta}\right)}$$

with probability exceeding $1 - \delta$.

Fig. 19.2 Coherence bounds for Rademacher sensing matrices#

Proof. We recall the definition of coherence:

$$\mu = \max_{j \neq k} |\langle \phi_j, \phi_k \rangle| = \max_{j < k} |\langle \phi_j, \phi_k \rangle|.$$
  1. Since Φ is a Rademacher sensing matrix, each column of Φ is a unit norm vector.

  2. Consider some $1 \leq j < k \leq N$ identifying columns $\phi_j$ and $\phi_k$.

  3. We note that they are independent of each other.

  4. Thus from Lemma 19.1 we have

     $$\mathbb{P}(|\langle \phi_j, \phi_k \rangle| > \epsilon) \leq 2 \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

  5. Now there are $\frac{N(N-1)}{2}$ such pairs $(j, k)$.

  6. Hence by applying Boole's inequality

     $$\mathbb{P}\left(\max_{j < k} |\langle \phi_j, \phi_k \rangle| > \epsilon\right) \leq 2 \cdot \frac{N(N-1)}{2} \exp\left(- \frac{\epsilon^2 M}{2}\right) \leq N^2 \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

  7. Thus we have

     $$\mathbb{P}(\mu > \epsilon) \leq N^2 \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

  8. What we need to do now is choose a suitable value of $\epsilon$ so that the R.H.S. of this inequality simplifies.

  9. We choose

     $$\epsilon^2 = \frac{4}{M} \ln\left(\frac{N}{\delta}\right).$$

  10. This gives us

      $$\frac{\epsilon^2 M}{2} = 2 \ln\left(\frac{N}{\delta}\right) \implies \exp\left(- \frac{\epsilon^2 M}{2}\right) = \left(\frac{\delta}{N}\right)^2.$$

  11. Substituting back we get

      $$\mathbb{P}(\mu > \epsilon) \leq N^2 \left(\frac{\delta}{N}\right)^2 = \delta^2.$$

  12. This justifies why we need $\delta \in (0,1)$.

  13. Finally

      $$\mathbb{P}\left(\mu \leq \sqrt{\frac{4}{M} \ln\left(\frac{N}{\delta}\right)}\right) = \mathbb{P}(\mu \leq \epsilon) = 1 - \mathbb{P}(\mu > \epsilon) \geq 1 - \delta^2,$$

      and

      $$1 - \delta^2 > 1 - \delta,$$

      which completes the proof.
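The following MATLAB sketch (illustrative M, N and δ) computes the coherence of one random Rademacher sensing matrix and compares it with the bound of Lemma 19.3.

% Coherence of a Rademacher sensing matrix versus sqrt((4/M)*ln(N/delta)).
% M, N and delta are illustrative values.
M = 256; N = 1024; delta = 0.1;
Phi = sign(randn(M, N)) / sqrt(M);
G = abs(Phi' * Phi);                 % absolute inner products between columns
G(1:N+1:end) = 0;                    % ignore the diagonal (column norms)
mu = max(G(:));
bound = sqrt((4/M) * log(N/delta));
fprintf('Coherence mu = %.4f, bound = %.4f\n', mu, bound);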

19.1.3. Gaussian Sensing Matrices#

In this subsection we collect several results related to Gaussian sensing matrices.

Definition 19.2

A Gaussian sensing matrix $\Phi \in \mathbb{R}^{M \times N}$ with M<N is constructed by drawing each entry $\phi_{ij}$ independently from a Gaussian random distribution $\mathcal{N}\left(0, \frac{1}{M}\right)$.

We note that

$$\mathbb{E}(\phi_{ij}) = 0, \quad \mathbb{E}(\phi_{ij}^2) = \frac{1}{M}.$$

We can write

$$\Phi = \begin{bmatrix} \phi_1 & \dots & \phi_N \end{bmatrix}$$

where $\phi_j \in \mathbb{R}^M$ is a Gaussian random vector with independent entries. We note that

$$\mathbb{E}(\| \phi_j \|_2^2) = \mathbb{E}\left(\sum_{i=1}^M \phi_{ij}^2\right) = \sum_{i=1}^M \mathbb{E}(\phi_{ij}^2) = M \cdot \frac{1}{M} = 1.$$

Thus the expected squared length of each of the columns of Φ is 1.
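A minimal MATLAB sketch (illustrative sizes) verifying that the expected squared column norm of a Gaussian sensing matrix is 1:

% Expected squared column norm of a Gaussian sensing matrix is 1.
% M and N are illustrative sizes.
M = 128; N = 512;
Phi = randn(M, N) / sqrt(M);          % entries drawn from N(0, 1/M)
sqColNorms = sum(Phi.^2, 1);          % squared norm of each column
fprintf('Average squared column norm: %.4f (theory: 1)\n', mean(sqColNorms));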

19.1.3.1. Joint Correlation#

Columns of Φ satisfy a joint correlation property ([80]) which is described in the following lemma.

Lemma 19.4

Let $\{u_k\}$ be a sequence of K vectors (where $u_k \in \mathbb{R}^M$) whose $\ell_2$ norms do not exceed one. Independently choose $z \in \mathbb{R}^M$ to be a random vector with i.i.d. $\mathcal{N}\left(0, \frac{1}{M}\right)$ entries. Then

$$\mathbb{P}\left(\max_k |\langle z, u_k \rangle| \leq \epsilon\right) \geq 1 - K \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

Proof. Let us call $\gamma = \max_k |\langle z, u_k \rangle|$.

  1. We note that if for some $u_k$ we have $\| u_k \|_2 < 1$ and we increase the length of $u_k$ by scaling it, then $\gamma$ will not decrease and hence $\mathbb{P}(\gamma \leq \epsilon)$ will not increase.

  2. Thus if we prove the bound for vectors $u_k$ with $\| u_k \|_2 = 1$ for all $1 \leq k \leq K$, it will be applicable to all $u_k$ whose $\ell_2$ norms do not exceed one.

  3. Hence we will assume that $\| u_k \|_2 = 1$.

  4. Now consider $\langle z, u_k \rangle$.

  5. Since $z$ is a Gaussian random vector, $\langle z, u_k \rangle$ is a Gaussian random variable.

  6. Since $\| u_k \|_2 = 1$, we have

     $$\langle z, u_k \rangle \sim \mathcal{N}\left(0, \frac{1}{M}\right).$$
  7. We recall a well known tail bound for Gaussian random variables: for $X \sim \mathcal{N}\left(0, \frac{1}{M}\right)$,

     $$\mathbb{P}(|X| > \epsilon) = \sqrt{\frac{2M}{\pi}} \int_{\epsilon}^{\infty} \exp\left(- \frac{M x^2}{2}\right) dx \leq \exp\left(- \frac{\epsilon^2 M}{2}\right).$$
  8. Now the event

     $$\left\{\max_k |\langle z, u_k \rangle| > \epsilon\right\} = \bigcup_{k=1}^K \left\{|\langle z, u_k \rangle| > \epsilon\right\};$$

     i.e., if any of the inner products (in absolute value) is greater than $\epsilon$ then the maximum is greater.

  9. We recall Boole's inequality which states that

     $$\mathbb{P}\left(\bigcup_i A_i\right) \leq \sum_i \mathbb{P}(A_i).$$

  10. Thus

      $$\mathbb{P}\left(\max_k |\langle z, u_k \rangle| > \epsilon\right) \leq K \exp\left(- \frac{\epsilon^2 M}{2}\right).$$

  11. This gives us

      $$\mathbb{P}\left(\max_k |\langle z, u_k \rangle| \leq \epsilon\right) = 1 - \mathbb{P}\left(\max_k |\langle z, u_k \rangle| > \epsilon\right) \geq 1 - K \exp\left(- \frac{\epsilon^2 M}{2}\right).$$
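To close, here is a minimal Monte Carlo sketch in MATLAB (illustrative M, K, ε, trial count, and randomly chosen unit norm vectors u_k) comparing the empirical probability in Lemma 19.4 with its bound.

% Empirical check of the joint correlation bound for a Gaussian random vector.
% M, K, epsilon, trials and the choice of u_k are illustrative.
M = 256; K = 50;
trials = 20000;
U = randn(M, K);
U = bsxfun(@rdivide, U, sqrt(sum(U.^2, 1)));   % K unit-norm vectors
epsilon = 0.3;
count = 0;
for i = 1:trials
    z = randn(M, 1) / sqrt(M);                 % i.i.d. N(0, 1/M) entries
    if max(abs(U' * z)) <= epsilon
        count = count + 1;
    end
end
fprintf('Empirical P(max_k |<z,u_k>| <= eps): %.4f\n', count/trials);
fprintf('Bound 1 - K*exp(-eps^2*M/2)        : %.4f\n', 1 - K*exp(-epsilon^2*M/2));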