# 18.9. Dictionaries II#

This section continues the development of dictionaries for sparse and redundant representations.

## 18.9.1. Spark#

We present some more results on spark of a dictionary.

### 18.9.1.1. Upper Bounds for Spark#

Whenever a set of atoms in a dictionary is linearly dependent, the dependence corresponds to some vector in the null space of the dictionary. Thus, identifying the spark of a dictionary essentially amounts to sifting through the vectors in its null space and finding one with the smallest $$\ell_0$$-“norm”. This can be cast as an optimization problem:

(18.57)#$\begin{split}& \underset{\bv}{\text{minimize}} & & \| \bv \|_0 \\ & \text{subject to } & & \bDDD \bv = \bzero.\end{split}$

Note that the solution $$\bv^*$$ of this problem is not unique. If $$\bv^*$$ is a solution, then $$c \bv^*$$ for any $$c \neq 0$$ is also a solution. The spark is the optimal value of the objective function $$\| \bv \|_0$$. We now define a sequence of optimization problems for $$k = 1, \dots, D$$:

(18.58)#$\begin{split} & \underset{\bv}{\text{minimize}} & & \| \bv \|_0 \\ & \text{subject to } & & \bDDD \bv = \bzero, v_k = 1.\end{split}$
1. The $$k$$-th problem constrains the solution to choose atom $$\bd_k$$ from the dictionary.

2. Any nonzero vector in the null space of $$\bDDD$$ can be scaled so that one of its entries equals 1. Hence $$\spark(\bDDD)$$ corresponds to the optimal value of one (or more) of the problems (18.58).

3. Formally, if we denote $$\bv_k^{0, *}$$ as an optimal vector for the problem (18.58), then

$\spark(\bDDD) = \underset{1 \leq k \leq D} {\text{minimize }}\| \bv_k^{0, *}\|_0.$
4. Thus, solving (18.57) is equivalent to solving all $$D$$ problems specified by (18.58) and then finding the minimum $$\ell_0$$-“norm” among them.

5. The problems (18.58) are still computationally intractable.
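For small dictionaries, the spark can nevertheless be computed directly by searching for the smallest set of linearly dependent columns; the exhaustive search also illustrates why the problem is intractable in general. A minimal NumPy sketch (the function name and the rank tolerance are our own choices):

```python
import itertools

import numpy as np


def spark(D, tol=1e-10):
    """Smallest number of linearly dependent columns of D (brute force)."""
    N, M = D.shape
    for k in range(1, M + 1):
        for idx in itertools.combinations(range(M), k):
            # A set of k columns is dependent iff its rank is below k.
            if np.linalg.matrix_rank(D[:, list(idx)], tol=tol) < k:
                return k
    return np.inf  # all columns are linearly independent
```

For a dictionary with $$D$$ atoms the search examines up to $$2^D$$ subsets, which is exactly the combinatorial blow-up that motivates the convex relaxation discussed next.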

We now relax each of the $$\ell_0$$-“norm” minimization problems (18.58) to an $$\ell_1$$-norm minimization problem.

(18.59)#$\begin{split} & \underset{\bv}{\text{minimize }} & & \| \bv \|_1 \\ & \text{subject to } & & \bDDD \bv = \bzero, v_k = 1.\end{split}$
1. We have a convex objective and convex (linear) constraints. These are tractable problems.

2. Let us indicate an optimal solution of (18.59) as $$\bv_k^{1, *}$$.

3. Since $$\bv_k^{1, *}$$ satisfies both constraints $$\bDDD \bv = \bzero$$ and $$v_k = 1$$, it is feasible for (18.58).

4. Thus,

$\| \bv_k^{0, *}\|_0 \leq \| \bv_k^{1, *}\|_0.$
5. This gives us the relationship

$\spark(\bDDD) \leq \underset{1 \leq k \leq D}{\text{minimize }} \| \bv_k^{1, *}\|_0.$

We formally state the upper bound on $$\spark(\bDDD)$$ in the following theorem [28].

Theorem 18.70

Let $$\bDDD$$ be a dictionary. Then

$\spark(\bDDD) \leq \underset{1 \leq k \leq D}{\text{minimize }} \| \bv_k^{1, *}\|_0$

where $$\bv_k^{1, *}$$ is a solution of the problem (18.59).
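Each of the $$D$$ problems (18.59) can be posed as a linear program by splitting $$\bv$$ into its positive and negative parts. A sketch using SciPy's `linprog` (an assumption: SciPy is available; the function and variable names are ours):

```python
import numpy as np
from scipy.optimize import linprog


def spark_upper_bound(D, tol=1e-8):
    """Upper bound on spark(D) via the D linear programs behind (18.59)."""
    N, M = D.shape
    best = np.inf
    for k in range(M):
        # Split v = vp - vm with vp, vm >= 0, so ||v||_1 = sum(vp) + sum(vm).
        c = np.ones(2 * M)
        row_k = np.zeros(2 * M)
        row_k[k], row_k[M + k] = 1.0, -1.0          # enforces v_k = 1
        A_eq = np.vstack([np.hstack([D, -D]), row_k])
        b_eq = np.zeros(N + 1)
        b_eq[-1] = 1.0
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        if res.success:
            v = res.x[:M] - res.x[M:]
            best = min(best, int(np.count_nonzero(np.abs(v) > tol)))
    return best


# Example: any two of these three atoms are independent; all three are dependent.
D_example = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])
```

Infeasible subproblems (an atom that participates in no dependence) are simply skipped, mirroring the fact that the corresponding problem (18.59) has no solution.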

## 18.9.2. Coherence#

In this subsection we develop some more bounds using coherence of a dictionary. As usual, we will be considering an overcomplete dictionary $$\bDDD \in \CC^{N \times D}$$ consisting of $$D$$ atoms. The coherence of $$\bDDD$$ is denoted by $$\mu (\bDDD)$$. In short we will simply write it as $$\mu$$. A subdictionary will be indexed by an index set $$\Lambda$$ consisting of linearly independent atoms.

Theorem 18.71

Suppose that $$(K - 1) \mu < 1$$ and assume that $$| \Lambda | \leq K$$. Then

$\| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty} \leq \frac{1}{\sqrt{ 1 - ( K - 1) \mu}}.$

Equivalently, the rows of $$\bDDD_{\Lambda}^{\dag}$$ have $$\ell_2$$ norms no greater than $$\frac{1}{\sqrt{ 1 - ( K - 1) \mu}}$$.

Proof. We recall that the operator norm $$\| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty}$$ computes the maximum $$\ell_2$$ norm among the rows of $$\bDDD_{\Lambda}^{\dag}$$.

1. The maximum row $$\ell_2$$ norm of a matrix is dominated by its largest singular value:

$\| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty} \leq \| \bDDD_{\Lambda}^{\dag} \|_{2 \to 2} = \frac{1}{\sigma_{\min}(\bDDD_{\Lambda})}.$
2. The Gram matrix $$\bG = \bDDD_{\Lambda}^H \bDDD_{\Lambda}$$ has unit diagonal entries, and the magnitudes of its off-diagonal entries are bounded by $$\mu$$.

3. By the Gershgorin disc theorem, every eigenvalue $$\lambda$$ of $$\bG$$ satisfies

$\lambda \geq 1 - (|\Lambda| - 1) \mu \geq 1 - (K - 1) \mu > 0.$
4. Since $$\sigma_{\min}^2(\bDDD_{\Lambda}) = \lambda_{\min}(\bG)$$, we conclude

$\| \bDDD_{\Lambda}^{\dag} \|_{2 \to \infty} \leq \frac{1}{\sqrt{\lambda_{\min}(\bG)}} \leq \frac{1}{\sqrt{1 - (K - 1) \mu}}.$
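The bound is easy to check numerically. The sketch below uses a Dirac-DCT dictionary; the dimension $$N$$ and the index set $$\Lambda$$ are arbitrary illustrative choices:

```python
import numpy as np

# Dirac-DCT dictionary: identity plus an orthonormal DCT-II basis.
N = 16
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[:, None] + 1) * n[None, :] / (2 * N))
dct[:, 0] = 1.0 / np.sqrt(N)
D = np.hstack([np.eye(N), dct])

G = D.T @ D
mu = np.max(np.abs(G - np.diag(np.diag(G))))     # coherence

K = 3
assert (K - 1) * mu < 1                           # hypothesis of the theorem

Lam = [0, 5, N + 3]                               # |Lambda| = K
pinv = np.linalg.pinv(D[:, Lam])
row_norms = np.linalg.norm(pinv, axis=1)          # l2 norm of each row
bound = 1.0 / np.sqrt(1 - (K - 1) * mu)
assert np.all(row_norms <= bound)
```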

The following definition is due to [22, 28].

Definition 18.33

Let $$\bG = \bDDD^H \bDDD$$ be the Gram matrix for dictionary $$\bDDD$$. We define $$\mu_{1/2}(\bG)$$ as the smallest number $$m$$ such that the sum of magnitudes of a collection of $$m$$ off-diagonal entries in a single row or column of the Gram matrix $$\bG$$ is at least $$\frac{1}{2}$$.

This quantity was introduced in [28] for developing more accurate bounds compared to bounds based on coherence. At that time the idea of the Babel function was not available. A careful examination reveals that $$\mu_{1/2}(\bG)$$ can be related to the Babel function.

Theorem 18.72

$\mu_{1/2}(\bG) \geq \frac{1}{2\mu}.$

Proof. Since $$\mu$$ is the maximum absolute value of any off-diagonal term in $$\bG = \bDDD^H \bDDD$$, the sum $$T$$ of the magnitudes of any $$m$$ off-diagonal terms is bounded by

$T \leq m \mu.$

Thus

$T \geq \frac{1}{2} \implies m \mu \geq \frac{1}{2} \implies m \geq \frac{1}{2\mu}.$

Since $$\mu_{1/2}(\bG)$$ is the minimum number of off-diagonal terms whose magnitudes sum to at least $$\frac{1}{2}$$, we conclude

$\mu_{1/2}(\bG) \geq \frac{1}{2\mu}.$
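Since the fewest off-diagonal entries reaching a given sum in a row are necessarily its largest entries, $$\mu_{1/2}(\bG)$$ can be computed greedily row by row. A sketch (the function name is ours):

```python
import numpy as np


def mu_half(G):
    """Smallest m such that m off-diagonal magnitudes in one row
    (or column, since G is Hermitian) of G sum to at least 1/2."""
    A = np.abs(G)
    np.fill_diagonal(A, 0.0)
    best = np.inf
    for row in A:
        # Greedy: accumulate the largest magnitudes first.
        csum = np.cumsum(np.sort(row)[::-1])
        hits = np.nonzero(csum >= 0.5)[0]
        if hits.size > 0:
            best = min(best, hits[0] + 1)
    return best
```

If no row can reach the sum $$\frac{1}{2}$$ at all (e.g. an orthonormal basis), the sketch returns infinity.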

The following result is due to [28].

Theorem 18.73

$\spark(\bDDD) \geq 2 \mu_{1/2}(\bG) +1.$

Proof. We proceed as follows.

1. Let $$\bh \in \NullSpace(\bDDD)$$.

2. Then

$\bDDD \bh = \bzero \implies \bG \bh = \bDDD^H \bDDD \bh = \bzero.$
3. Subtracting $$\bh$$ from both sides we get

$\bG \bh - \bh = (\bG - \bI) \bh = -\bh.$
4. Let $$\Lambda = \supp(\bh)$$.

5. Keeping only the columns of $$\bG - \bI$$ indexed by $$\Lambda$$ and the corresponding entries of $$\bh$$ (the remaining entries of $$\bh$$ are zero), we can write:

$(\bG - \bI)_{\Lambda} \bh_{\Lambda} = - \bh.$
6. Taking the $$\ell_{\infty}$$ norm on both sides we get

$\| \bh \|_{\infty} = \| (\bG - \bI)_{\Lambda} \bh_{\Lambda} \|_{\infty}.$
7. We know that

$\| (\bG - \bI)_{\Lambda} \bh_{\Lambda} \|_{\infty} \leq \| (\bG - \bI)_{\Lambda} \|_{\infty} \| \bh_{\Lambda} \|_{\infty}.$
8. It is easy to see that:

$\| \bh_{\Lambda} \|_{\infty} = \| \bh \|_{\infty}.$
9. Thus

$\| \bh \|_{\infty} \leq \| (\bG - \bI)_{\Lambda} \|_{\infty} \| \bh \|_{\infty}.$
10. This gives us

$\| (\bG - \bI)_{\Lambda} \|_{\infty} \geq 1.$
11. We can say a bit more by choosing the row carefully. Since $$\supp(\bh) = \Lambda$$, there is an index $$i \in \Lambda$$ with $$| h_i | = \| \bh \|_{\infty} > 0$$.

12. Reading the $$i$$-th entry of $$(\bG - \bI)_{\Lambda} \bh_{\Lambda} = -\bh$$ and applying the triangle inequality,

$\| \bh \|_{\infty} = | h_i | \leq \sum_{j \in \Lambda \setminus \{ i \}} | G_{i j} | \, | h_j | \leq \left ( \sum_{j \in \Lambda \setminus \{ i \}} | G_{i j} | \right ) \| \bh \|_{\infty}.$
13. The entry of $$\bG - \bI$$ at position $$(i, i)$$ is 0; only the off-diagonal entries $$G_{i j}$$ with $$j \in \Lambda \setminus \{i \}$$ appear in the sum.

14. Thus, there are at most $$|\Lambda | - 1$$ off-diagonal entries in row $$i$$ of $$\bG$$ whose magnitudes sum to at least 1.

15. Choosing $$\bh$$ as a null space vector with minimal support gives $$|\Lambda | = \spark(\bDDD)$$.

16. For such a $$\Lambda$$, there exists a row in $$\bG$$ in which the magnitudes of $$\spark(\bDDD) - 1$$ off-diagonal entries sum to at least 1.

17. Let $$n$$ denote the minimum number of off-diagonal entries in a row or a column of $$\bG$$ whose magnitudes sum to at least 1.

18. Clearly

$\spark(\bDDD) - 1 \geq n.$
19. It is easy to see that

$n \geq 2 \mu_{1/2} (\bG)$

i.e., the minimum number of off-diagonal entries whose magnitudes sum to 1 or more is at least twice the minimum number whose magnitudes sum to $$\frac{1}{2}$$ or more, on any row (or column, by the Hermitian property).

20. Thus

$\spark(\bDDD) - 1 \geq 2 \mu_{1/2} (\bG).$
21. Rewriting, we get

$\spark(\bDDD) \geq 2 \mu_{1/2} (\bG) + 1.$
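The theorem can be checked numerically on a small dictionary by combining a brute-force spark computation with a greedy evaluation of $$\mu_{1/2}(\bG)$$; the three-atom example and all names below are arbitrary illustrative choices:

```python
import itertools

import numpy as np


def spark(D, tol=1e-10):
    """Smallest number of linearly dependent columns of D (brute force)."""
    for k in range(1, D.shape[1] + 1):
        for idx in itertools.combinations(range(D.shape[1]), k):
            if np.linalg.matrix_rank(D[:, list(idx)], tol=tol) < k:
                return k
    return np.inf


def mu_half(G):
    """Smallest m with m off-diagonal magnitudes of a row summing to >= 1/2."""
    A = np.abs(G)
    np.fill_diagonal(A, 0.0)
    best = np.inf
    for row in A:
        csum = np.cumsum(np.sort(row)[::-1])   # largest magnitudes first
        hits = np.nonzero(csum >= 0.5)[0]
        if hits.size > 0:
            best = min(best, hits[0] + 1)
    return best


# Three unit-norm atoms in R^2; the third is a mixture of the first two.
D = np.array([[1.0, 0.0, 1.0 / np.sqrt(2)],
              [0.0, 1.0, 1.0 / np.sqrt(2)]])
assert spark(D) == 3
assert spark(D) >= 2 * mu_half(D.T @ D) + 1
```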

## 18.9.3. Babel function#

In this subsection, we provide a more general development of Babel function for a pair of dictionaries.

1. When we consider a single dictionary, we will use $$\bDDD$$ as the dictionary.

2. When considering a pair of dictionaries of equal size, we would typically label them as $$\Phi$$ and $$\Psi$$ with both $$\Phi, \Psi \in \CC^{N \times D}$$.

3. We will assume that the dictionaries are full rank as they span the signal space $$\CC^N$$.

4. Why a pair of dictionaries?

1. We consider $$\Phi$$ as a modeling dictionary from which the sparse signals

$\bx \approx \Phi \ba$

are built.

2. $$\Psi$$ on the other hand is the sensing dictionary which will be used to compute correlations with the signal $$\bx$$ and try to estimate the approximation $$\ba$$.

3. Ideally, $$\Phi$$ and $$\Psi$$ should be the same.

4. But in real life, we may not know $$\Phi$$ correctly.

5. Hence, $$\Psi$$ would be a dictionary slightly different from $$\Phi$$.

## 18.9.4. p-Babel Function#

See [43] for reference.

Definition 18.34 ($$p$$-Babel function over $$\Lambda$$)

Consider an index set $$\Lambda \subset \{1, \dots, D \}$$ indexing a subset of atoms in $$\Phi$$ and $$\Psi$$. The $$p$$-Babel function over $${\Lambda}$$ is defined as

(18.60)#$\mu_p(\Phi, \Psi, \Lambda) \triangleq \underset{l \notin \Lambda}{\sup} \left (\sum_{j \in \Lambda} |\langle \phi_j, \psi_l \rangle |^p \right )^{\frac{1}{p}}.$

What is going on here?

1. Consider the row vector

$\bv^l = \psi_l^H \Phi_{\Lambda}.$
2. This vector contains inner products of modeling atoms in $$\Phi$$ indexed by $$\Lambda$$ with the sensing atom $$\psi_l$$.

3. Now

$\| \bv^l \|_p = \left ( \sum_{i} | v^l_i |^p \right )^{\frac{1}{p}} = \left ( \sum_{j \in \Lambda} | \langle \phi_j, \psi_l \rangle |^p \right )^{\frac{1}{p}}$
4. This is the term in (18.60).

5. Thus

$\mu_p(\Phi, \Psi, \Lambda) = \underset{l \notin \Lambda}{\sup} \| \bv^l \|_p.$
6. $$\| \bv^l \|_p$$ is a measure of the correlation of the sensing atom $$\psi_l$$ with the group of modeling atoms in $$\Phi$$ indexed by $$\Lambda$$, as measured by the $$p$$-norm.

7. $$\mu_p(\Phi, \Psi, \Lambda)$$ finds the sensing atom in $$\Psi$$ outside the index set $$\Lambda$$ that is most correlated with the group of modeling atoms in $$\Phi$$ indexed by $$\Lambda$$ and returns this maximum correlation value.

8. Different choices of $$p$$-norm lead to different correlation values.
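The steps above translate directly into code: form the rows $$\bv^l = \psi_l^H \Phi_{\Lambda}$$ for $$l \notin \Lambda$$ and take the largest $$p$$-norm. A sketch (the function and variable names are ours):

```python
import numpy as np


def p_babel(Phi, Psi, Lam, p=1.0):
    """p-Babel function mu_p(Phi, Psi, Lambda) as in (18.60)."""
    Lam = list(Lam)
    outside = [l for l in range(Phi.shape[1]) if l not in Lam]
    # Row l of V holds the inner products v^l = psi_l^H Phi_Lambda.
    V = np.conj(Psi[:, outside]).T @ Phi[:, Lam]
    return np.max(np.sum(np.abs(V) ** p, axis=1) ** (1.0 / p))


# Example pair with Phi = Psi; the third atom mixes the first two equally.
D_pair = np.array([[1.0, 0.0, 1.0 / np.sqrt(2)],
                   [0.0, 1.0, 1.0 / np.sqrt(2)]])
```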

We can also measure a correlation of sensing and modeling atoms inside the index set $$\Lambda$$.

Definition 18.35 (Complementary $$p$$-Babel function over $$\Lambda$$)

A complement to the $$p$$-Babel function measures the amount of correlation between atoms inside the support $$\Lambda$$:

(18.61)#$\mu_p^{\text{in}}(\Phi, \Psi, \Lambda) \triangleq \underset{i \in \Lambda}{\sup} \, \mu_p \left (\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{i \} \right).$

$$\mu_p(\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{i \})$$ computes the correlation of $$i$$-th sensing atom in $$\Psi$$ with the modeling atoms in $$\Phi$$ indexed by $$\Lambda \setminus \{i \}$$ i.e. all modeling atoms in $$\Lambda$$ except the $$i$$-th modeling atom.

Finally $$\mu_p^{\text{in}}(\Phi, \Psi, \Lambda)$$ finds the maximum correlation of any sensing atom inside $$\Lambda$$ with modeling atoms inside $$\Lambda$$ (leaving the corresponding modeling atom).

So far, we have focused our attention on a specific index set $$\Lambda$$. We now consider all index sets with $$|\Lambda | \leq K$$.

Definition 18.36 ($$p$$-Babel function)

The Babel function for a pair of dictionaries $$\Phi$$ and $$\Psi$$ as a function of the sparsity level $$K$$ is defined as

(18.62)#$\mu_p(\Phi, \Psi, K) \triangleq \underset{|\Lambda| \leq K}{\sup}\, \mu_p(\Phi, \Psi, \Lambda).$

Correspondingly, the complement of Babel function is defined as

(18.63)#$\mu_p^{\text{in}}(\Phi, \Psi, K) \triangleq \underset{|\Lambda| \leq K}{\sup}\, \mu_p^{\text{in}}(\Phi, \Psi, \Lambda).$

It is straightforward to see that

(18.64)#$\mu_p^{\text{in}}(\Phi, \Psi, K) \leq \mu_p(\Phi, \Psi, K-1).$

Now consider the special case where $$\bDDD=\Phi=\Psi$$. In other words, the sensing and modeling dictionaries are the same.

We obtain

(18.65)#$\mu_p(\bDDD, \Lambda) = \underset{l \notin \Lambda}{\sup} \left (\sum_{j \in \Lambda} |\langle \bd_j, \bd_l \rangle |^p \right )^{\frac{1}{p}}.$
(18.66)#$\mu_p^{\text{in}}(\bDDD, \Lambda) = \underset{i \in \Lambda}{\sup} \, \mu_p \left (\bDDD_{\Lambda}, \Lambda \setminus \{i \} \right).$
(18.67)#$\mu_p(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_p(\bDDD, \Lambda).$
(18.68)#$\mu_p^{\text{in}}(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_p^{\text{in}}(\bDDD, \Lambda).$

Further by choosing $$p=1$$, we get

(18.69)#$\mu_1(\bDDD, \Lambda) = \underset{l \notin \Lambda}{\sup} \left (\sum_{j \in \Lambda} |\langle \bd_j, \bd_l \rangle | \right ).$
(18.70)#$\mu_1^{\text{in}}(\bDDD, \Lambda) = \underset{i \in \Lambda}{\sup} \, \mu_1 \left ( \bDDD_{\Lambda}, \Lambda \setminus \{i \} \right).$
(18.71)#$\mu_1(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_1(\bDDD, \Lambda).$
(18.72)#$\mu_1^{\text{in}}(\bDDD, K) = \underset{|\Lambda| \leq K}{\sup}\, \mu_1^{\text{in}}(\bDDD, \Lambda).$

Finally compare this definition of $$\mu_1(\bDDD, K)$$ with the standard definition of Babel function as

(18.73)#$\mu_1(K) = \underset{|\Lambda| = K}{\max} \; \underset {\psi}{\max} \sum_{\lambda \in \Lambda} | \langle \psi, \bd_{\lambda} \rangle |,$

where the vector $$\psi$$ ranges over the atoms indexed by $$\Omega \setminus \Lambda$$.

We also know that $$\mu_1(K)$$ is an increasing function of $$K$$. Thus, replacing $$|\Lambda| = K$$ with $$|\Lambda| \leq K$$ doesn’t make any difference to the value of $$\mu_1(K)$$.

Careful observation shows that the definitions of $$\mu_1(K)$$ in (18.73) and $$\mu_1(\bDDD, K)$$ in (18.71) are exactly the same.
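For a single dictionary, $$\mu_1(\bDDD, K)$$ can be evaluated without enumerating index sets: for each atom, the worst $$\Lambda$$ consists of the $$K$$ largest off-diagonal magnitudes in its row of the Gram matrix. A sketch (the function name is ours):

```python
import numpy as np


def babel(D, K):
    """Standard Babel function mu_1(K) of a dictionary with unit-norm atoms."""
    G = np.abs(np.conj(D).T @ D)
    np.fill_diagonal(G, 0.0)              # exclude the atom itself
    top = -np.sort(-G, axis=1)[:, :K]     # K largest magnitudes in each row
    return np.max(np.sum(top, axis=1))


# Example: the third atom mixes the first two equally.
D_mix = np.array([[1.0, 0.0, 1.0 / np.sqrt(2)],
                  [0.0, 1.0, 1.0 / np.sqrt(2)]])
```

Note that $$\mu_1(1)$$ computed this way reduces to the coherence $$\mu$$, as it should.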

## 18.9.5. Exact Recovery Coefficient#

We introduce a measure of similarity between a subdictionary and the remaining atoms of the dictionary known as the exact recovery coefficient.

Definition 18.37 (Exact recovery coefficient)

The Exact Recovery Coefficient [77, 78, 79] for a subdictionary $$\bDDD_{\Lambda}$$ is defined as

$\ERC(\bDDD_{\Lambda}) = 1 - \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1.$

We will also use the notation $$\ERC(\Lambda)$$ when the dictionary is clear from context.

The quantity is called the exact recovery coefficient since, for a number of algorithms, the criterion $$\ERC(\Lambda) > 0$$ is a sufficient condition for exact recovery of sparse representations.
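The definition translates directly into code. A sketch (the function name is ours), assuming the atoms indexed by `Lam` are linearly independent so that the pseudo-inverse is well defined:

```python
import numpy as np


def erc(D, Lam):
    """Exact Recovery Coefficient of the subdictionary indexed by Lam."""
    Lam = list(Lam)
    outside = [w for w in range(D.shape[1]) if w not in Lam]
    pinv = np.linalg.pinv(D[:, Lam])
    # 1 minus the largest l1 norm of the coefficients representing
    # the projection of an outside atom onto the span of the subdictionary.
    return 1.0 - max(np.linalg.norm(pinv @ D[:, w], ord=1) for w in outside)
```

For an orthonormal basis every subdictionary has $$\ERC(\Lambda) = 1$$, the best possible value; an atom lying in the span of the subdictionary drives the ERC negative.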

### 18.9.5.1. ERC and Babel Function#

We present a lower bound on $$\ERC(\Lambda)$$ in terms of Babel function.

Theorem 18.74 (ERC lower bound: babel function)

Suppose that $$|\Lambda| = k \leq K$$. A lower bound on Exact Recovery Coefficient is

$\ERC(\Lambda) \geq \frac{1 - \mu_1(K- 1) - \mu_1(K)}{1 - \mu_1(K-1)}.$

It follows that $$\ERC(\Lambda) > 0$$ whenever

$\mu_1(K- 1) + \mu_1 (K) < 1.$

Proof. We proceed as follows.

1. Let us expand the pseudo-inverse $$\bDDD_{\Lambda}^{\dag}$$.

$\begin{split} \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 &= \underset{\omega \notin \Lambda}{\max} \left \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \bDDD_{\Lambda}^H \bd_{\omega} \right \|_1\\ &\leq \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \|_{1 \to 1} \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^H \bd_{\omega} \|_1. \end{split}$
1. For the Gram matrix $$\bG = \bDDD_{\Lambda}^H \bDDD_{\Lambda}$$ we recall from Theorem 18.37 that:

$\| \bG^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(k - 1)} \leq \frac{1}{1 - \mu_1(K - 1)}.$
2. For the other term we have

$\underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^H \bd_{\omega} \|_1 = \underset{\omega \notin \Lambda}{\max} \sum_{\lambda \in \Lambda}| \langle \bd_{\omega}, \bd_{\lambda} \rangle | \leq \mu_1(k) \leq \mu_1(K).$
3. Thus, we get

$\underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 \leq \frac{\mu_1(K)}{1 - \mu_1(K - 1)}.$
4. Putting back in the definition of Exact Recovery Coefficient:

$\ERC(\Lambda) = 1 - \underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 \geq 1 - \frac{\mu_1(K)}{1 - \mu_1(K - 1)}.$
5. This completes the bound on ERC.

6. Now, we verify the condition for $$\ERC(\Lambda) > 0$$.

$\begin{split} \mu_1(K) + \mu_1(K - 1) < 1 &\iff \mu_1(K) < 1 - \mu_1(K - 1) \\ &\iff \frac{\mu_1(K)}{1 - \mu_1(K - 1)} < 1 \\ &\iff 1 - \frac{\mu_1(K)}{1 - \mu_1(K - 1)} > 0. \end{split}$
7. Thus, if $$\mu_1(K) + \mu_1(K - 1) < 1$$, then the lower bound on $$\ERC(\Lambda)$$ is positive leading to $$\ERC(\Lambda) > 0$$.

### 18.9.5.2. ERC and Coherence#

On the same lines we develop a coherence bound for ERC.

Theorem 18.75 (ERC lower bound: coherence)

Suppose that $$|\Lambda| = k \leq K$$. A lower bound on Exact Recovery Coefficient is

$\ERC(\Lambda) \geq \frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu}.$

It follows that $$\ERC(\Lambda) > 0$$ whenever

$K \mu \leq \frac{1}{2}.$

Proof. We proceed as follows.

1. Following the proof of Theorem 18.74, for the Gram matrix $$\bG = \bDDD_{\Lambda}^H \bDDD_{\Lambda}$$ we have:

$\| \bG^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(K - 1)} \leq \frac{1}{1 - (K - 1)\mu}.$
2. For the other term we have

$\underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^H \bd_{\omega} \|_1 \leq \mu_1(K) \leq K \mu.$
3. Thus, we get

$\underset{\omega \notin \Lambda}{\max} \|\bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1 \leq \frac{K \mu}{1 - (K - 1)\mu}.$
4. Putting back in the definition of Exact Recovery Coefficient:

$\ERC(\Lambda) \geq 1 - \frac{K \mu}{1 - (K - 1)\mu} = \frac{1 - (2K - 1)\mu}{1 - (K - 1)\mu}.$
5. This completes the bound on ERC.

6. Now, we verify the condition for $$\ERC(\Lambda) > 0$$.

$K \mu \leq \frac{1}{2} \implies 2 K \mu \leq 1 \implies 1 - 2K \mu \geq 0 \implies 1 - 2K \mu + \mu > 0.$
7. And

$K \mu \leq \frac{1}{2} \implies 1 - K \mu \geq \frac{1}{2} \implies 1 - K \mu + \mu \geq \frac{1}{2} + \mu.$
8. Thus $$K \mu \leq \frac{1}{2}$$ ensures that both numerator and denominator for the coherence lower bound on $$\ERC(\Lambda)$$ are positive leading to $$\ERC(\Lambda) > 0$$.
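The coherence bound can be checked numerically. The sketch below uses a Dirac-DCT dictionary; the dimension, the sparsity level $$K$$, and the index set are arbitrary illustrative choices:

```python
import numpy as np

# Dirac-DCT dictionary: identity plus an orthonormal DCT-II basis.
N = 64
n = np.arange(N)
dct = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[:, None] + 1) * n[None, :] / (2 * N))
dct[:, 0] = 1.0 / np.sqrt(N)
D = np.hstack([np.eye(N), dct])

G = D.T @ D
mu = np.max(np.abs(G - np.diag(np.diag(G))))      # coherence

K = 2
assert K * mu <= 0.5                               # hypothesis of the theorem

Lam = [0, N + 6]                                   # one Dirac atom, one DCT atom
pinv = np.linalg.pinv(D[:, Lam])
erc_val = 1.0 - max(np.linalg.norm(pinv @ D[:, w], 1)
                    for w in range(D.shape[1]) if w not in Lam)

lower = (1 - (2 * K - 1) * mu) / (1 - (K - 1) * mu)
assert erc_val >= lower - 1e-12                    # Theorem 18.75
```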

A more accurate bound on $$K$$ is presented in the next theorem.

Theorem 18.76

$$\ERC(\Lambda) > 0$$ holds whenever

$K < \frac{1}{2} \left (1 + \frac{1}{\mu} \right )$

where $$K = | \Lambda|$$.

Proof. We proceed as follows.

1. Assuming $$1 - (K - 1)\mu > 0$$, we have

$\begin{split} &\frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu} > 0 \\ \iff & 1 - (2K - 1) \mu > 0\\ \iff & 1 > (2K - 1) \mu \\ \iff & 2K - 1 < \frac{1}{\mu}\\ \iff & K < \frac{1}{2} \left (1 + \frac{1}{\mu} \right ). \end{split}$
2. From Theorem 18.75, we have

$\ERC(\Lambda) \geq \frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu}.$
3. Thus under the given conditions, we have

$\ERC(\Lambda) > 0.$
4. We also need to show that under these conditions

$1 - (K - 1)\mu > 0.$
5. We can see that:

$\begin{split} & 2K - 1 < \frac{1}{\mu} \\ \implies & 2K - 2 < \frac{1}{\mu} - 1\\ \implies & 2 (K - 1) \mu < 1 - \mu \\ \implies & - (K - 1) \mu > \frac{\mu}{2} - \frac{1}{2}\\ \implies & 1 - (K - 1) \mu > \frac{1}{2} + \frac{\mu}{2}\\ \implies & 1 - (K - 1) \mu > 0. \end{split}$

### 18.9.5.3. Geometrical Interpretation of ERC#

Definition 18.38 (Antipodal convex hull of a subdictionary)

The antipodal convex hull [78] of a subdictionary $$\bDDD_{\Lambda}$$ is defined as the set of signals given by

$\AAA_1 (\Lambda) = \{\bDDD_{\Lambda} \bx \ST \bx \in \CC^{\Lambda} \text{ and } \| \bx \|_1 \leq 1\}.$

It is the smallest convex set that contains every unit multiple of every atom.

1. We recall that $$\bP_{\Lambda} = \bDDD_{\Lambda} \bDDD_{\Lambda}^{\dag}$$ is the orthogonal projector onto the column space of $$\bDDD_{\Lambda}$$.

2. Therefore $$\bc_{\omega} = \bDDD_{\Lambda}^{\dag} \bd_{\omega} \in \CC^{\Lambda}$$ is a coefficient vector which can be used to synthesize this projection.

3. In other words:

$\bP_{\Lambda} \bd_{\omega} = \bDDD_{\Lambda} \bDDD_{\Lambda}^{\dag} \bd_{\omega} = \bDDD_{\Lambda} \bc_{\omega}.$
4. Thus, the quantity $$1 - \| \bDDD_{\Lambda}^{\dag} \bd_{\omega} \|_1$$ measures how far the projected atom $$\bP_{\Lambda} \bd_{\omega}$$ lies from the boundary of $$\AAA_1 (\Lambda)$$.

5. If every projected atom lies well within the antipodal convex hull, then it is possible to recover superpositions of atoms from $$\Lambda$$.

6. This happens because, whenever $$\ERC(\Lambda) > 0$$, the coefficient associated with an atom outside $$\Lambda$$ must be quite large to represent anything in the span of the subdictionary.

## 18.9.6. Dirac DCT Dictionary#

Theorem 18.77

The $$p$$-Babel function for the Dirac-DCT dictionary is given by

$\mu_p(k) = k^{\frac{1}{p}} \mu \Forall 1\leq k \leq N.$

In particular, the standard Babel function is given by

$\mu_1(k) = k\mu.$

Proof. TODO prove it.