18.9. Dictionaries II

This section continues the development of dictionaries for sparse and redundant representations.

18.9.1. Spark

We present some more results on the spark of a dictionary.

18.9.1.1. Upper Bounds for Spark

Whenever a set of atoms in a dictionary is linearly dependent, the dependence corresponds to some vector in its null space. Thus, identifying the spark of a dictionary essentially amounts to sifting through the vectors in its null space and finding one with the smallest $\ell_0$-“norm”. This can be cast as an optimization problem:

$$
\underset{v}{\text{minimize}} \quad \| v \|_0 \quad \text{subject to } D v = 0.
\tag{18.57}
$$

Note that the solution $v$ of this problem is not unique. If $v$ is a solution, then $c v$ for any $c \neq 0$ is also a solution. The spark is the optimal value of the objective function $\| v \|_0$. We now define a sequence of optimization problems for $k = 1, \dots, D$:

$$
\underset{v}{\text{minimize}} \quad \| v \|_0 \quad \text{subject to } D v = 0, \; v_k = 1.
\tag{18.58}
$$
  1. The $k$-th problem forces the solution to use the atom $d_k$ from the dictionary.

  2. Since a minimal set of linearly dependent atoms in D contains at least two vectors, $\text{spark}(D)$ corresponds to the optimal value of one (or more) of the problems (18.58).

  3. Formally, if we denote an optimal vector for the problem (18.58) by $v_k^{0,*}$, then

     $$
     \text{spark}(D) = \min_{1 \leq k \leq D} \; \| v_k^{0,*} \|_0.
     $$
  4. Thus, solving (18.57) is equivalent to solving all D problems specified by (18.58) and then finding the minimum $\ell_0$-“norm” among them.

  5. The problems (18.58) are still computationally intractable.

We now change each of the $\ell_0$-“norm” minimization problems (18.58) into an $\ell_1$-norm minimization problem.

$$
\underset{v}{\text{minimize}} \quad \| v \|_1 \quad \text{subject to } D v = 0, \; v_k = 1.
\tag{18.59}
$$
  1. We have a convex objective and convex (linear) constraints. These are tractable problems.

  2. Let us denote an optimal solution of (18.59) by $v_k^{1,*}$.

  3. Since $D v_k^{1,*} = 0$, the vector $v_k^{1,*}$ is feasible for (18.58).

  4. Thus,

     $$
     \| v_k^{0,*} \|_0 \leq \| v_k^{1,*} \|_0.
     $$
  5. This gives us the relationship

     $$
     \text{spark}(D) \leq \min_{1 \leq k \leq D} \; \| v_k^{1,*} \|_0.
     $$

We formally state the upper bound on spark(D) in the following theorem [28].

Theorem 18.70

Let D be a dictionary. Then

$$
\text{spark}(D) \leq \min_{1 \leq k \leq D} \; \| v_k^{1,*} \|_0
$$

where $v_k^{1,*}$ is a solution of the problem (18.59).
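
Each problem in (18.59) is a linear program when the dictionary is real. The following sketch (our own illustration, not part of the referenced work) solves all D of them with `scipy.optimize.linprog` by introducing slack variables $t \geq |v|$ and reports the resulting upper bound on the spark; the function name `spark_upper_bound` and the zero-detection tolerance are assumptions made here.

```python
import numpy as np
from scipy.optimize import linprog

def spark_upper_bound(D, tol=1e-8):
    """Upper bound on spark(D) via Theorem 18.70.

    For each k we solve problem (18.59),
        minimize ||v||_1  subject to  D v = 0,  v_k = 1,
    recast as a linear program in (v, t) with -t <= v <= t,
    and count the nonzero entries of the minimizer.
    Assumes a real dictionary D of shape (N, M).
    Returns np.inf if D has no null space (columns linearly independent).
    """
    N, M = D.shape
    best = np.inf
    # objective: minimize sum(t), the l1 norm surrogate
    c = np.concatenate([np.zeros(M), np.ones(M)])
    # inequality constraints:  v - t <= 0  and  -v - t <= 0
    A_ub = np.vstack([np.hstack([np.eye(M), -np.eye(M)]),
                      np.hstack([-np.eye(M), -np.eye(M)])])
    b_ub = np.zeros(2 * M)
    bounds = [(None, None)] * M + [(0, None)] * M
    for k in range(M):
        # equality constraints:  D v = 0  and  v_k = 1
        A_eq = np.zeros((N + 1, 2 * M))
        A_eq[:N, :M] = D
        A_eq[N, k] = 1.0
        b_eq = np.zeros(N + 1)
        b_eq[N] = 1.0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds)
        if res.success:
            v = res.x[:M]
            best = min(best, int(np.sum(np.abs(v) > tol)))
    return best
```

Each of the D linear programs is tractable, in contrast with the combinatorial problems (18.58).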

18.9.2. Coherence

In this subsection, we develop some more bounds using the coherence of a dictionary. As usual, we consider an overcomplete dictionary $D \in \mathbb{C}^{N \times D}$ consisting of D atoms. The coherence of D is denoted by $\mu(D)$; in short, we will simply write it as $\mu$. A subdictionary will be indexed by an index set Λ consisting of linearly independent atoms.

Theorem 18.71

Suppose that $(K-1)\mu < 1$ and assume that $|\Lambda| \leq K$. Then

$$
\| D_{\Lambda}^{\dagger} \|_{2 \to \infty} \leq \frac{1}{\sqrt{1 - (K-1)\mu}}.
$$

Equivalently, the rows of $D_{\Lambda}^{\dagger}$ have $\ell_2$ norms no greater than $\frac{1}{\sqrt{1 - (K-1)\mu}}$.

Proof. We recall that the operator norm $\| D_{\Lambda}^{\dagger} \|_{2 \to \infty}$ computes the maximum $\ell_2$ norm among the rows of $D_{\Lambda}^{\dagger}$. TODO: complete the proof.

The following definition is due to [22, 28].

Definition 18.33

Let $G = D^H D$ be the Gram matrix of the dictionary D. We define $\mu_{1/2}(G)$ as the smallest number $m$ such that the sum of the magnitudes of some collection of $m$ off-diagonal entries in a single row or column of the Gram matrix $G$ is at least $\frac{1}{2}$.

This quantity was introduced in [28] for developing more accurate bounds than those based on coherence alone. At that time, the idea of the Babel function was not available. A careful examination reveals that $\mu_{1/2}(G)$ can be related to the Babel function.
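
Definition 18.33 translates directly into a small computation: for each row of the Gram matrix, sort the off-diagonal magnitudes in decreasing order and count how many of the largest ones are needed to reach $\frac{1}{2}$. A minimal sketch (our own illustration; the name `mu_half` is an assumption):

```python
import numpy as np

def mu_half(G):
    """mu_{1/2}(G) from Definition 18.33: the smallest number m such that
    some collection of m off-diagonal entries in a single row of G has
    magnitudes summing to at least 1/2 (rows suffice since G is Hermitian)."""
    best = np.inf
    for i in range(G.shape[0]):
        # off-diagonal magnitudes of row i, largest first
        row = np.sort(np.abs(np.delete(G[i], i)))[::-1]
        reached = np.nonzero(np.cumsum(row) >= 0.5)[0]
        if reached.size:
            best = min(best, reached[0] + 1)
    return best
```

For a unit-norm dictionary `D`, calling `mu_half(D.conj().T @ D)` yields the quantity that enters the spark bound of Theorem 18.73 below; the theorem that follows shows it is at least $1/(2\mu)$.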

Theorem 18.72

$$
\mu_{1/2}(G) \geq \frac{1}{2\mu}.
$$

Proof. Since $\mu$ is the maximum absolute value of any off-diagonal entry of $G = D^H D$, the sum of the magnitudes of any $m$ off-diagonal entries, say $T$, is bounded by

$$
T \leq m \mu.
$$

Thus

$$
T \geq \frac{1}{2} \implies m \mu \geq \frac{1}{2} \implies m \geq \frac{1}{2\mu}.
$$

Since $\mu_{1/2}(G)$ is the minimum number of off-diagonal entries whose magnitudes sum to at least $\frac{1}{2}$, we conclude

$$
\mu_{1/2}(G) \geq \frac{1}{2\mu}.
$$

The following result is due to [28].

Theorem 18.73

$$
\text{spark}(D) \geq 2 \mu_{1/2}(G) + 1.
$$

Proof. We proceed as follows.

  1. Let $h \in \mathcal{N}(D)$ be a nonzero vector in the null space of D.

  2. Then

     $$
     D h = 0 \implies G h = D^H D h = 0.
     $$

  3. Subtracting $h$ from both sides, we get

     $$
     G h - h = (G - I) h = -h.
     $$

  4. Let $\Lambda = \operatorname{supp}(h)$.

  5. Since $h$ vanishes outside $\Lambda$, keeping only the rows and columns indexed by $\Lambda$, we can write

     $$
     (G - I)_{\Lambda} h_{\Lambda} = -h_{\Lambda},
     $$

     where $(G - I)_{\Lambda}$ is the submatrix of $G - I$ with rows and columns indexed by $\Lambda$ and $h_{\Lambda}$ contains the corresponding entries of $h$.

  6. Taking the $\ell_{\infty}$ norm on both sides, we get

     $$
     \| h_{\Lambda} \|_{\infty} = \| (G - I)_{\Lambda} h_{\Lambda} \|_{\infty}.
     $$

  7. We know that

     $$
     \| (G - I)_{\Lambda} h_{\Lambda} \|_{\infty} \leq \| (G - I)_{\Lambda} \|_{\infty} \| h_{\Lambda} \|_{\infty},
     $$

     where $\| \cdot \|_{\infty}$ is the operator norm given by the maximum absolute row sum.

  8. It is easy to see that

     $$
     \| h_{\Lambda} \|_{\infty} = \| h \|_{\infty} > 0.
     $$

  9. Thus

     $$
     \| h \|_{\infty} \leq \| (G - I)_{\Lambda} \|_{\infty} \| h \|_{\infty}.
     $$

  10. This gives us

      $$
      \| (G - I)_{\Lambda} \|_{\infty} \geq 1.
      $$

  11. But $\| (G - I)_{\Lambda} \|_{\infty}$ is nothing but the maximum sum of magnitudes of off-diagonal entries of $G$ along a row of $G_{\Lambda}$.

  12. Consider any row of $(G - I)_{\Lambda}$.

  13. One of the entries in the row (the one on the main diagonal of $G - I$) is 0.

  14. Thus, there are at most $|\Lambda| - 1$ nonzero entries in the row.

  15. $\Lambda$ is smallest when $|\Lambda| = \text{spark}(D)$; choose $h$ accordingly.

  16. For such a $\Lambda$, there exists a row of $G$ in which the magnitudes of $\text{spark}(D) - 1$ off-diagonal entries sum to at least 1.

  17. Let $n$ denote the minimum number of off-diagonal entries in a row or column of $G$ whose magnitudes sum to at least one.

  18. Clearly,

      $$
      \text{spark}(D) - 1 \geq n.
      $$

  19. It is easy to see that

      $$
      n \geq 2 \mu_{1/2}(G);
      $$

      i.e., the minimum number of off-diagonal entries summing to 1 or more is at least twice the minimum number of off-diagonal entries summing to $\frac{1}{2}$ or more in any row (or column, by the Hermitian property of $G$).

  20. Thus

      $$
      \text{spark}(D) - 1 \geq 2 \mu_{1/2}(G).
      $$

  21. Rewriting, we get

      $$
      \text{spark}(D) \geq 2 \mu_{1/2}(G) + 1.
      $$
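
As a sanity check of this bound, the sketch below compares $2\mu_{1/2}(G) + 1$ with the spark obtained by exhaustive search on a small dictionary. Both the brute-force search and the example dictionary (identity plus a normalized DCT-II basis) are our own illustrative choices and are only practical for tiny sizes.

```python
import numpy as np
from itertools import combinations

def spark_brute_force(D):
    """Smallest number of linearly dependent columns of D
    (exhaustive search; only viable for very small dictionaries)."""
    N, M = D.shape
    for k in range(1, M + 1):
        for idx in combinations(range(M), k):
            if np.linalg.matrix_rank(D[:, list(idx)]) < k:
                return k
    return np.inf  # all columns are linearly independent

def mu_half(G):
    """mu_{1/2}(G) as in Definition 18.33 (same sketch as above)."""
    best = np.inf
    for i in range(G.shape[0]):
        row = np.sort(np.abs(np.delete(G[i], i)))[::-1]
        reached = np.nonzero(np.cumsum(row) >= 0.5)[0]
        if reached.size:
            best = min(best, reached[0] + 1)
    return best

# Small two-ortho-basis example: identity plus a normalized DCT-II basis.
N = 8
n = np.arange(N)
C = np.cos(np.pi * np.outer(2 * n + 1, n) / (2 * N))
C /= np.linalg.norm(C, axis=0)
D = np.hstack([np.eye(N), C])
G = D.T @ D
print("spark(D)          :", spark_brute_force(D))
print("2*mu_half(G) + 1  :", 2 * mu_half(G) + 1)
```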

18.9.3. Babel Function

In this subsection, we provide a more general development of the Babel function for a pair of dictionaries.

  1. When we consider a single dictionary, we will use D as the dictionary.

  2. When considering a pair of dictionaries of equal size, we will typically label them as Φ and Ψ with $\Phi, \Psi \in \mathbb{C}^{N \times D}$.

  3. We will assume that the dictionaries are full rank, as they span the signal space $\mathbb{C}^N$.

  4. Why a pair of dictionaries?

    1. We consider Φ as a modeling dictionary from which the sparse signals

      $x = \Phi a$

      are built.

    2. Ψ, on the other hand, is the sensing dictionary, which will be used to compute correlations with the signal x in order to estimate the representation a.

    3. Ideally, Φ and Ψ should be the same.

    4. But in real life, we may not know Φ correctly.

    5. Hence, Ψ would be a dictionary slightly different from Φ.

18.9.4. p-Babel Function

See [43] for reference.

Definition 18.34 (p-Babel function over Λ)

Consider an index set $\Lambda \subseteq \{1, \dots, D\}$ indexing a subset of atoms in Φ and Ψ. The p-Babel function over Λ is defined as

$$
\mu_p(\Phi, \Psi, \Lambda) \triangleq \sup_{l \notin \Lambda} \left( \sum_{j \in \Lambda} | \langle \phi_j, \psi_l \rangle |^p \right)^{\frac{1}{p}}.
\tag{18.60}
$$

What is going on here?

  1. Consider the row vector

     $$
     v_l = \psi_l^H \Phi_{\Lambda}.
     $$

  2. This vector contains the inner products of the modeling atoms in Φ indexed by Λ with the sensing atom $\psi_l$.

  3. Now

     $$
     \| v_l \|_p = \left( \sum_i | (v_l)_i |^p \right)^{\frac{1}{p}} = \left( \sum_{j \in \Lambda} | \langle \phi_j, \psi_l \rangle |^p \right)^{\frac{1}{p}}.
     $$

  4. This is the term in (18.60).

  5. Thus

     $$
     \mu_p(\Phi, \Psi, \Lambda) = \sup_{l \notin \Lambda} \| v_l \|_p.
     $$

  6. $\| v_l \|_p$ is a measure of the correlation of the sensing atom $\psi_l$ with the group of modeling atoms in Φ indexed by Λ, measured in the p-norm.

  7. $\mu_p(\Phi, \Psi, \Lambda)$ thus finds the sensing atom from Ψ outside the index set Λ that is most correlated with the group of modeling atoms in Φ indexed by Λ, and returns this maximum correlation value.

  8. Different choices of the p-norm lead to different correlation values.

We can also measure the correlation of sensing and modeling atoms inside the index set Λ.

Definition 18.35 (Complementary p-Babel function over Λ)

A complement to the p-Babel function measures the amount of correlation between atoms inside the support Λ:

$$
\mu_p^{\text{in}}(\Phi, \Psi, \Lambda) \triangleq \sup_{i \in \Lambda} \mu_p(\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{ i \}).
\tag{18.61}
$$

$\mu_p(\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{ i \})$ computes the correlation of the $i$-th sensing atom in Ψ with the modeling atoms in Φ indexed by $\Lambda \setminus \{ i \}$; i.e., all modeling atoms in Λ except the $i$-th one.

Finally, $\mu_p^{\text{in}}(\Phi, \Psi, \Lambda)$ finds the maximum correlation of any sensing atom inside Λ with the modeling atoms inside Λ (leaving out the corresponding modeling atom).
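
Both quantities can be evaluated directly from the cross-correlations $\langle \phi_j, \psi_l \rangle$. The sketch below is our own illustration; the function names `p_babel_support` and `p_babel_in_support` are assumptions, and the dictionaries are passed as NumPy arrays whose columns are the atoms.

```python
import numpy as np

def p_babel_support(Phi, Psi, Lam, p=1):
    """mu_p(Phi, Psi, Lambda) from (18.60): largest p-norm of the
    correlations between a sensing atom psi_l outside Lambda and the
    modeling atoms of Phi inside Lambda."""
    Lam = list(Lam)
    outside = [l for l in range(Phi.shape[1]) if l not in Lam]
    if not Lam or not outside:
        return 0.0
    # rows: sensing atoms outside Lambda; columns: modeling atoms inside Lambda
    corr = np.abs(Psi[:, outside].conj().T @ Phi[:, Lam])
    return np.max(np.linalg.norm(corr, ord=p, axis=1))

def p_babel_in_support(Phi, Psi, Lam, p=1):
    """mu_p^in(Phi, Psi, Lambda) from (18.61): for each i in Lambda,
    correlate the sensing atom psi_i with the modeling atoms indexed by
    Lambda without i, and take the largest p-norm."""
    best = 0.0
    for i in Lam:
        rest = [j for j in Lam if j != i]
        if rest:
            corr = np.abs(Psi[:, i].conj() @ Phi[:, rest])
            best = max(best, np.linalg.norm(corr, ord=p))
    return best
```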

So far, we have focused our attention on a specific index set Λ. We now consider all index sets with $|\Lambda| \leq K$.

Definition 18.36 (p-Babel function)

The Babel function for a pair of dictionaries Φ and Ψ as a function of the sparsity level K is defined as

$$
\mu_p(\Phi, \Psi, K) \triangleq \sup_{|\Lambda| \leq K} \mu_p(\Phi, \Psi, \Lambda).
\tag{18.62}
$$

Correspondingly, the complement of the Babel function is defined as

$$
\mu_p^{\text{in}}(\Phi, \Psi, K) \triangleq \sup_{|\Lambda| \leq K} \mu_p^{\text{in}}(\Phi, \Psi, \Lambda).
\tag{18.63}
$$

It is straightforward to see that

$$
\mu_p^{\text{in}}(\Phi, \Psi, K) \leq \mu_p(\Phi, \Psi, K-1).
\tag{18.64}
$$
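
For small dictionaries, the suprema in (18.62) and (18.63) can be evaluated by brute force over all supports of size at most K, which also lets us check the inequality (18.64) numerically. This sketch is our own illustration (the search is exponential in K); the random pair Φ, Ψ below simulates a slightly perturbed sensing dictionary.

```python
import numpy as np
from itertools import combinations

def p_babel_level(Phi, Psi, K, p=1):
    """Brute-force evaluation of mu_p(Phi, Psi, K) and mu_p^in(Phi, Psi, K)
    as in (18.62)-(18.63), over all supports Lambda with |Lambda| <= K."""
    D = Phi.shape[1]
    A = np.abs(Psi.conj().T @ Phi)   # A[l, j] = |<phi_j, psi_l>|
    mu_out, mu_in = 0.0, 0.0
    for k in range(1, K + 1):
        for Lam in combinations(range(D), k):
            Lam = list(Lam)
            outside = [l for l in range(D) if l not in Lam]
            if outside:
                block = A[np.ix_(outside, Lam)]
                mu_out = max(mu_out, np.linalg.norm(block, ord=p, axis=1).max())
            for i in Lam:
                rest = [j for j in Lam if j != i]
                if rest:
                    mu_in = max(mu_in, np.linalg.norm(A[i, rest], ord=p))
    return mu_out, mu_in

# A modeling dictionary and a slightly perturbed sensing dictionary.
rng = np.random.default_rng(0)
N, D, K = 6, 10, 3
Phi = rng.standard_normal((N, D)); Phi /= np.linalg.norm(Phi, axis=0)
Psi = Phi + 0.05 * rng.standard_normal((N, D)); Psi /= np.linalg.norm(Psi, axis=0)
mu_K, mu_in_K = p_babel_level(Phi, Psi, K)
mu_Km1, _ = p_babel_level(Phi, Psi, K - 1)
print(mu_in_K, "<=", mu_Km1, ":", mu_in_K <= mu_Km1)   # inequality (18.64)
```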

Now consider the special case where $D = \Phi = \Psi$. In other words, the sensing and modeling dictionaries are the same.

We obtain

$$
\mu_p(D, \Lambda) = \sup_{l \notin \Lambda} \left( \sum_{j \in \Lambda} | \langle d_j, d_l \rangle |^p \right)^{\frac{1}{p}}.
\tag{18.65}
$$

$$
\mu_p^{\text{in}}(D, \Lambda) = \sup_{i \in \Lambda} \mu_p(D_{\Lambda}, \Lambda \setminus \{ i \}).
\tag{18.66}
$$

$$
\mu_p(D, K) = \sup_{|\Lambda| \leq K} \mu_p(D, \Lambda).
\tag{18.67}
$$

$$
\mu_p^{\text{in}}(D, K) = \sup_{|\Lambda| \leq K} \mu_p^{\text{in}}(D, \Lambda).
\tag{18.68}
$$

Further, by choosing $p = 1$, we get

$$
\mu_1(D, \Lambda) = \sup_{l \notin \Lambda} \sum_{j \in \Lambda} | \langle d_j, d_l \rangle |.
\tag{18.69}
$$

$$
\mu_1^{\text{in}}(D, \Lambda) = \sup_{i \in \Lambda} \mu_1(D_{\Lambda}, \Lambda \setminus \{ i \}).
\tag{18.70}
$$

$$
\mu_1(D, K) = \sup_{|\Lambda| \leq K} \mu_1(D, \Lambda).
\tag{18.71}
$$

$$
\mu_1^{\text{in}}(D, K) = \sup_{|\Lambda| \leq K} \mu_1^{\text{in}}(D, \Lambda).
\tag{18.72}
$$

Finally, compare this definition of $\mu_1(D, K)$ with the standard definition of the Babel function:

$$
\mu_1(K) = \max_{|\Lambda| = K} \max_{\psi} \sum_{\lambda \in \Lambda} | \langle \psi, d_{\lambda} \rangle |,
\tag{18.73}
$$

where the vector ψ ranges over the atoms indexed by $\Omega \setminus \Lambda$ (Ω being the full index set $\{1, \dots, D\}$).

We also know that $\mu_1(K)$ is an increasing function of K. Thus, replacing $|\Lambda| = K$ with $|\Lambda| \leq K$ doesn't make any difference to the value of $\mu_1(K)$.

Careful observation shows that the definitions of $\mu_1(K)$ in (18.73) and $\mu_1(D, K)$ in (18.71) are exactly the same.
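
For a single dictionary, $\mu_1(K)$ can be computed without enumerating supports: for each atom, the worst-case Λ simply collects the K largest off-diagonal entries of the corresponding row of $|G|$. A minimal sketch (our own illustration, assuming unit-norm atoms in the columns of `D`; the name `babel` is an assumption):

```python
import numpy as np

def babel(D, K):
    """Standard Babel function mu_1(K) of a dictionary D with unit-norm
    columns: maximum over atoms of the sum of the K largest off-diagonal
    entries of the corresponding row of |G|, where G = D^H D."""
    A = np.abs(D.conj().T @ D)
    np.fill_diagonal(A, 0.0)            # remove the unit diagonal
    A = -np.sort(-A, axis=1)            # sort each row in decreasing order
    return float(np.max(np.sum(A[:, :K], axis=1)))
```

Note that `babel(D, 1)` is just the coherence $\mu$.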

18.9.5. Exact Recovery Coefficient

We introduce a measure of similarity between a subdictionary and the remaining atoms of the dictionary, known as the exact recovery coefficient.

Definition 18.37 (Exact recovery coefficient)

The Exact Recovery Coefficient [77, 78, 79] for a subdictionary $D_{\Lambda}$ is defined as

$$
\mathrm{ERC}(D_{\Lambda}) = 1 - \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1.
$$

We will also use the notation $\mathrm{ERC}(\Lambda)$ when the dictionary is clear from the context.

The quantity is called the exact recovery coefficient since, for a number of algorithms, the criterion $\mathrm{ERC}(\Lambda) > 0$ is a sufficient condition for exact recovery of sparse representations.
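
The exact recovery coefficient can be computed directly from this definition using a pseudo-inverse. The sketch below is our own illustration; the function name `erc` is an assumption.

```python
import numpy as np

def erc(D, Lam):
    """ERC(D_Lambda) = 1 - max_{omega not in Lambda} ||pinv(D_Lambda) d_omega||_1."""
    Lam = list(Lam)
    outside = [w for w in range(D.shape[1]) if w not in Lam]
    if not outside:
        return 1.0
    # each column of C is pinv(D_Lambda) d_omega for one omega outside Lambda
    C = np.linalg.pinv(D[:, Lam]) @ D[:, outside]
    return 1.0 - np.max(np.sum(np.abs(C), axis=0))
```

A positive value means that the least-squares coefficients expressing any atom outside Λ over $D_{\Lambda}$ have $\ell_1$ norm strictly less than one.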

18.9.5.1. ERC and Babel Function

We present a lower bound on $\mathrm{ERC}(\Lambda)$ in terms of the Babel function.

Theorem 18.74 (ERC lower bound: Babel function)

Suppose that $|\Lambda| = k \leq K$. A lower bound on the exact recovery coefficient is

$$
\mathrm{ERC}(\Lambda) \geq \frac{1 - \mu_1(K-1) - \mu_1(K)}{1 - \mu_1(K-1)}.
$$

It follows that $\mathrm{ERC}(\Lambda) > 0$ whenever

$$
\mu_1(K-1) + \mu_1(K) < 1.
$$

Proof. We proceed as follows.

  1. Let us expand the pseudo-inverse $D_{\Lambda}^{\dagger}$:

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1
     = \max_{\omega \notin \Lambda} \| (D_{\Lambda}^H D_{\Lambda})^{-1} D_{\Lambda}^H d_{\omega} \|_1
     \leq \| (D_{\Lambda}^H D_{\Lambda})^{-1} \|_{1 \to 1} \max_{\omega \notin \Lambda} \| D_{\Lambda}^H d_{\omega} \|_1.
     $$

  2. For the Gram matrix $G = D_{\Lambda}^H D_{\Lambda}$, we recall from Theorem 18.37 that

     $$
     \| G^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(k-1)} \leq \frac{1}{1 - \mu_1(K-1)},
     $$

     where the second inequality uses the fact that $\mu_1$ is non-decreasing and $k \leq K$.

  3. For the other term we have

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^H d_{\omega} \|_1
     = \max_{\omega \notin \Lambda} \sum_{\lambda \in \Lambda} | \langle d_{\omega}, d_{\lambda} \rangle |
     \leq \mu_1(k) \leq \mu_1(K).
     $$

  4. Thus, we get

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1 \leq \frac{\mu_1(K)}{1 - \mu_1(K-1)}.
     $$

  5. Putting this back into the definition of the exact recovery coefficient:

     $$
     \mathrm{ERC}(\Lambda) = 1 - \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1
     \geq 1 - \frac{\mu_1(K)}{1 - \mu_1(K-1)}
     = \frac{1 - \mu_1(K-1) - \mu_1(K)}{1 - \mu_1(K-1)}.
     $$

  6. This completes the bound on ERC.

  7. Now, we verify the condition for $\mathrm{ERC}(\Lambda) > 0$. Note that $\mu_1(K-1) + \mu_1(K) < 1$ implies $1 - \mu_1(K-1) > 0$, so

     $$
     \mu_1(K) + \mu_1(K-1) < 1
     \iff \mu_1(K) < 1 - \mu_1(K-1)
     \iff \frac{\mu_1(K)}{1 - \mu_1(K-1)} < 1
     \iff 1 - \frac{\mu_1(K)}{1 - \mu_1(K-1)} > 0.
     $$

  8. Thus, if $\mu_1(K) + \mu_1(K-1) < 1$, then the lower bound on $\mathrm{ERC}(\Lambda)$ is positive, which gives $\mathrm{ERC}(\Lambda) > 0$.

18.9.5.2. ERC and Coherence

Along the same lines, we develop a coherence-based bound for ERC.

Theorem 18.75 (ERC lower bound: coherence)

Suppose that $|\Lambda| = k \leq K$. A lower bound on the exact recovery coefficient is

$$
\mathrm{ERC}(\Lambda) \geq \frac{1 - (2K-1)\mu}{1 - (K-1)\mu}.
$$

It follows that $\mathrm{ERC}(\Lambda) > 0$ whenever

$$
K \mu \leq \frac{1}{2}.
$$

Proof. We proceed as follows.

  1. Following the proof of Theorem 18.74, for the Gram matrix $G = D_{\Lambda}^H D_{\Lambda}$ we have

     $$
     \| G^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(K-1)} \leq \frac{1}{1 - (K-1)\mu}.
     $$

  2. For the other term we have

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^H d_{\omega} \|_1 \leq \mu_1(K) \leq K \mu.
     $$

  3. Thus, we get

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1 \leq \frac{K \mu}{1 - (K-1)\mu}.
     $$

  4. Putting this back into the definition of the exact recovery coefficient:

     $$
     \mathrm{ERC}(\Lambda) \geq 1 - \frac{K \mu}{1 - (K-1)\mu} = \frac{1 - (2K-1)\mu}{1 - (K-1)\mu}.
     $$

  5. This completes the bound on ERC.

  6. Now, we verify the condition for $\mathrm{ERC}(\Lambda) > 0$. For the numerator,

     $$
     K \mu \leq \frac{1}{2}
     \implies 2 K \mu \leq 1
     \implies 1 - 2 K \mu \geq 0
     \implies 1 - 2 K \mu + \mu > 0,
     $$

     i.e., $1 - (2K - 1)\mu > 0$.

  7. And for the denominator,

     $$
     K \mu \leq \frac{1}{2}
     \implies 1 - K \mu \geq \frac{1}{2}
     \implies 1 - K \mu + \mu \geq \frac{1}{2} + \mu,
     $$

     i.e., $1 - (K - 1)\mu \geq \frac{1}{2} + \mu > 0$.

  8. Thus, $K \mu \leq \frac{1}{2}$ ensures that both the numerator and the denominator of the coherence lower bound on $\mathrm{ERC}(\Lambda)$ are positive, which gives $\mathrm{ERC}(\Lambda) > 0$.
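
As a quick numerical illustration of this theorem (our own sketch, not part of the cited works), we build a small Dirac-DCT style dictionary, pick a support with $K\mu \leq \frac{1}{2}$, and compare $\mathrm{ERC}(\Lambda)$ with the coherence lower bound; the construction and the chosen support are assumptions made for the example.

```python
import numpy as np

def erc(D, Lam):
    """ERC(D_Lambda) = 1 - max_{omega not in Lambda} ||pinv(D_Lambda) d_omega||_1."""
    Lam = list(Lam)
    outside = [w for w in range(D.shape[1]) if w not in Lam]
    C = np.linalg.pinv(D[:, Lam]) @ D[:, outside]
    return 1.0 - np.max(np.sum(np.abs(C), axis=0))

# Dirac-DCT style dictionary in R^64 (illustrative construction)
N = 64
n = np.arange(N)
C = np.cos(np.pi * np.outer(2 * n + 1, n) / (2 * N))
C /= np.linalg.norm(C, axis=0)
D = np.hstack([np.eye(N), C])

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                      # coherence

K = 2                             # chosen so that K * mu <= 1/2 here
Lam = [0, 1]                      # an arbitrary support of size K
bound = (1 - (2 * K - 1) * mu) / (1 - (K - 1) * mu)
print("K*mu      :", K * mu)
print("ERC(Lam)  :", erc(D, Lam))
print("bound     :", bound)
```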

A more accurate bound on K is presented in the next theorem.

Theorem 18.76

$\mathrm{ERC}(\Lambda) > 0$ holds whenever

$$
K < \frac{1}{2} \left( 1 + \frac{1}{\mu} \right),
$$

where $K = |\Lambda|$.

Proof. We proceed as follows.

  1. Assuming $1 - (K-1)\mu > 0$, we have

     $$
     \frac{1 - (2K-1)\mu}{1 - (K-1)\mu} > 0
     \iff 1 - (2K-1)\mu > 0
     \iff 1 > (2K-1)\mu
     \iff 2K - 1 < \frac{1}{\mu}
     \iff K < \frac{1}{2} \left( 1 + \frac{1}{\mu} \right).
     $$

  2. From Theorem 18.75, we have

     $$
     \mathrm{ERC}(\Lambda) \geq \frac{1 - (2K-1)\mu}{1 - (K-1)\mu}.
     $$

  3. Thus, under the given condition, we have

     $$
     \mathrm{ERC}(\Lambda) > 0.
     $$

  4. We also need to show that under this condition

     $$
     1 - (K-1)\mu > 0.
     $$

  5. We can see that

     $$
     2K - 1 < \frac{1}{\mu}
     \implies 2K - 2 < \frac{1}{\mu} - 1
     \implies 2 (K-1) \mu < 1 - \mu
     \implies -(K-1)\mu > \frac{\mu}{2} - \frac{1}{2}
     \implies 1 - (K-1)\mu > \frac{1}{2} + \frac{\mu}{2}
     \implies 1 - (K-1)\mu > 0.
     $$

18.9.5.3. Geometrical Interpretation of ERC

Definition 18.38 (Antipodal convex hull of a subdictionary)

The antipodal convex hull [78] of a subdictionary $D_{\Lambda}$ is defined as the set of signals given by

$$
\mathcal{A}_1(\Lambda) = \{ D_{\Lambda} x \;|\; x \in \mathbb{C}^{\Lambda} \text{ and } \| x \|_1 \leq 1 \}.
$$

It is the smallest convex set that contains every unit multiple of every atom.

  1. We recall that $P_{\Lambda} = D_{\Lambda} D_{\Lambda}^{\dagger}$ is the orthogonal projector onto the column space of $D_{\Lambda}$.

  2. Therefore, $c_{\omega} = D_{\Lambda}^{\dagger} d_{\omega} \in \mathbb{C}^{\Lambda}$ is a coefficient vector which can be used to synthesize the projection of the atom $d_{\omega}$ onto this column space.

  3. In other words:

     $$
     P_{\Lambda} d_{\omega} = D_{\Lambda} D_{\Lambda}^{\dagger} d_{\omega} = D_{\Lambda} c_{\omega}.
     $$

  4. Thus, the quantity $1 - \| D_{\Lambda}^{\dagger} d_{\omega} \|_1$ measures how far the projected atom $P_{\Lambda} d_{\omega}$ lies from the boundary of $\mathcal{A}_1(\Lambda)$.

  5. If every projected atom lies well within the antipodal convex hull, then it is possible to recover superpositions of atoms from Λ.

  6. This happens because, whenever $\mathrm{ERC}(\Lambda) > 0$, the coefficient associated with an atom outside Λ must be quite large to represent anything in the span of the subdictionary.

18.9.6. Dirac-DCT Dictionary

Theorem 18.77

The p-Babel function of the Dirac-DCT dictionary is given by

$$
\mu_p(k) = k^{\frac{1}{p}} \mu, \quad 1 \leq k \leq \sqrt{N}.
$$

In particular, the standard Babel function is given by

$$
\mu_1(k) = k \mu.
$$

Proof. TODO prove it.
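
The claim can be examined numerically. The sketch below builds a Dirac-DCT dictionary (our assumed construction: identity plus a normalized DCT-II basis), computes $\mu_p(k)$ greedily from the Gram matrix, and prints it next to $k^{1/p}\mu$; this is an empirical comparison, not a proof.

```python
import numpy as np

def p_babel(D, k, p=1):
    """mu_p(D, k): for each atom, take the k largest off-diagonal entries
    of the corresponding row of |G| and return the largest p-norm."""
    A = np.abs(D.conj().T @ D)
    np.fill_diagonal(A, 0.0)
    A = -np.sort(-A, axis=1)                  # rows sorted in decreasing order
    return float(np.max(np.sum(A[:, :k] ** p, axis=1) ** (1.0 / p)))

# Dirac-DCT dictionary in R^N (assumed construction)
N = 64
n = np.arange(N)
C = np.cos(np.pi * np.outer(2 * n + 1, n) / (2 * N))
C /= np.linalg.norm(C, axis=0)
D = np.hstack([np.eye(N), C])

mu = p_babel(D, 1)                            # coherence: mu = mu_1(1)
for p in (1, 2):
    for k in (1, 2, 4, 8):                    # k up to sqrt(N)
        print(p, k, p_babel(D, k, p=p), k ** (1.0 / p) * mu)
```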