18.9. Dictionaries II

This section continues the development of dictionaries for sparse and redundant representations.

18.9.1. Spark

We present some more results on the spark of a dictionary.

18.9.1.1. Upper Bounds for Spark

Whenever a set of atoms in a dictionary is linearly dependent, the dependence corresponds to some vector in its null space. Thus, identifying the spark of a dictionary essentially amounts to sifting through the vectors in its null space and finding one with the smallest $\ell_0$-“norm”. This can be cast as an optimization problem:

$$
\underset{v}{\text{minimize}} \quad \| v \|_0 \quad \text{subject to } D v = 0.
\tag{18.57}
$$

Note that the solution $v$ of this problem is not unique. If $v$ is a solution, then $c v$ for any $c \neq 0$ is also a solution. The spark is the optimal value of the objective function $\| v \|_0$. We now define a sequence of optimization problems for $k = 1, \dots, D$:

$$
\underset{v}{\text{minimize}} \quad \| v \|_0 \quad \text{subject to } D v = 0, \; v_k = 1.
\tag{18.58}
$$
  1. The $k$-th problem forces the solution to use the atom $d_k$ from the dictionary.

  2. Since a minimal set of linearly dependent atoms in D contains at least two vectors, $\text{spark}(D)$ corresponds to the optimal value of one (or more) of the problems (18.58).

  3. Formally, if we denote an optimal vector for the problem (18.58) by $v_k^{0,*}$, then

     $$
     \text{spark}(D) = \min_{1 \leq k \leq D} \; \| v_k^{0,*} \|_0.
     $$
  4. Thus, solving (18.57) is equivalent to solving all D problems specified by (18.58) and then finding the minimum $\ell_0$-“norm” among them.

  5. The problems (18.58) are still computationally intractable.

We now change each of the $\ell_0$-“norm” minimization problems (18.58) into an $\ell_1$-norm minimization problem.

$$
\underset{v}{\text{minimize}} \quad \| v \|_1 \quad \text{subject to } D v = 0, \; v_k = 1.
\tag{18.59}
$$
  1. We have a convex objective and convex (linear) constraints. These are tractable problems.

  2. Let us denote an optimal solution of (18.59) by $v_k^{1,*}$.

  3. Since $D v_k^{1,*} = 0$, the vector $v_k^{1,*}$ is feasible for (18.58).

  4. Thus,

     $$
     \| v_k^{0,*} \|_0 \leq \| v_k^{1,*} \|_0.
     $$
  5. This gives us the relationship

     $$
     \text{spark}(D) \leq \min_{1 \leq k \leq D} \; \| v_k^{1,*} \|_0.
     $$

We formally state the upper bound on spark(D) in the following theorem [28].

Theorem 18.70

Let D be a dictionary. Then

$$
\text{spark}(D) \leq \min_{1 \leq k \leq D} \; \| v_k^{1,*} \|_0
$$

where $v_k^{1,*}$ is a solution of the problem (18.59).
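
Each problem in (18.59) is a linear program when the dictionary is real. The following sketch (our own illustration, not part of the referenced work) solves all D of them with `scipy.optimize.linprog` by introducing slack variables $t \geq |v|$ and reports the resulting upper bound on the spark; the function name `spark_upper_bound` and the zero-detection tolerance are assumptions made here.

```python
import numpy as np
from scipy.optimize import linprog

def spark_upper_bound(D, tol=1e-8):
    """Upper bound on spark(D) via Theorem 18.70.

    For each k we solve problem (18.59),
        minimize ||v||_1  subject to  D v = 0,  v_k = 1,
    recast as a linear program in (v, t) with -t <= v <= t,
    and count the nonzero entries of the minimizer.
    Assumes a real dictionary D of shape (N, M).
    Returns np.inf if D has no null space (columns linearly independent).
    """
    N, M = D.shape
    best = np.inf
    # objective: minimize sum(t), the l1 norm surrogate
    c = np.concatenate([np.zeros(M), np.ones(M)])
    # inequality constraints:  v - t <= 0  and  -v - t <= 0
    A_ub = np.vstack([np.hstack([np.eye(M), -np.eye(M)]),
                      np.hstack([-np.eye(M), -np.eye(M)])])
    b_ub = np.zeros(2 * M)
    bounds = [(None, None)] * M + [(0, None)] * M
    for k in range(M):
        # equality constraints:  D v = 0  and  v_k = 1
        A_eq = np.zeros((N + 1, 2 * M))
        A_eq[:N, :M] = D
        A_eq[N, k] = 1.0
        b_eq = np.zeros(N + 1)
        b_eq[N] = 1.0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds)
        if res.success:
            v = res.x[:M]
            best = min(best, int(np.sum(np.abs(v) > tol)))
    return best
```

Each of the D linear programs is tractable, in contrast with the combinatorial problems (18.58).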

18.9.2. Coherence

In this subsection, we develop some more bounds using the coherence of a dictionary. As usual, we consider an overcomplete dictionary $D \in \mathbb{C}^{N \times D}$ consisting of D atoms. The coherence of D is denoted by $\mu(D)$; in short, we will simply write it as $\mu$. A subdictionary will be indexed by an index set Λ consisting of linearly independent atoms.

Theorem 18.71

Suppose that $(K-1)\mu < 1$ and assume that $|\Lambda| \leq K$. Then

$$
\| D_{\Lambda}^{\dagger} \|_{2 \to \infty} \leq \frac{1}{\sqrt{1 - (K-1)\mu}}.
$$

Equivalently, the rows of $D_{\Lambda}^{\dagger}$ have $\ell_2$ norms no greater than $\frac{1}{\sqrt{1 - (K-1)\mu}}$.

Proof. We recall that the operator norm $\| D_{\Lambda}^{\dagger} \|_{2 \to \infty}$ computes the maximum $\ell_2$ norm among the rows of $D_{\Lambda}^{\dagger}$. TODO: complete the proof.

The following definition is due to [22, 28].

Definition 18.33

Let $G = D^H D$ be the Gram matrix of the dictionary D. We define $\mu_{1/2}(G)$ as the smallest number $m$ such that the sum of the magnitudes of some collection of $m$ off-diagonal entries in a single row or column of the Gram matrix $G$ is at least $\frac{1}{2}$.

This quantity was introduced in [28] for developing more accurate bounds than those based on coherence alone. At that time, the idea of the Babel function was not available. A careful examination reveals that $\mu_{1/2}(G)$ can be related to the Babel function.
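
Definition 18.33 translates directly into a small computation: for each row of the Gram matrix, sort the off-diagonal magnitudes in decreasing order and count how many of the largest ones are needed to reach $\frac{1}{2}$. A minimal sketch (our own illustration; the name `mu_half` is an assumption):

```python
import numpy as np

def mu_half(G):
    """mu_{1/2}(G) from Definition 18.33: the smallest number m such that
    some collection of m off-diagonal entries in a single row of G has
    magnitudes summing to at least 1/2 (rows suffice since G is Hermitian)."""
    best = np.inf
    for i in range(G.shape[0]):
        # off-diagonal magnitudes of row i, largest first
        row = np.sort(np.abs(np.delete(G[i], i)))[::-1]
        reached = np.nonzero(np.cumsum(row) >= 0.5)[0]
        if reached.size:
            best = min(best, reached[0] + 1)
    return best
```

For a unit-norm dictionary `D`, calling `mu_half(D.conj().T @ D)` yields the quantity that enters the spark bound of Theorem 18.73 below; the theorem that follows shows it is at least $1/(2\mu)$.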

Theorem 18.72

$$
\mu_{1/2}(G) \geq \frac{1}{2\mu}.
$$

Proof. Since $\mu$ is the maximum absolute value of any off-diagonal entry of $G = D^H D$, the sum of the magnitudes of any $m$ off-diagonal entries, say $T$, is bounded by

$$
T \leq m \mu.
$$

Thus

$$
T \geq \frac{1}{2} \implies m \mu \geq \frac{1}{2} \implies m \geq \frac{1}{2\mu}.
$$

Since $\mu_{1/2}(G)$ is the minimum number of off-diagonal entries whose magnitudes sum to at least $\frac{1}{2}$, we conclude

$$
\mu_{1/2}(G) \geq \frac{1}{2\mu}.
$$

The following result is due to [28].

Theorem 18.73

$$
\text{spark}(D) \geq 2 \mu_{1/2}(G) + 1.
$$

Proof. We proceed as follows.

  1. Let $h \in \mathcal{N}(D)$ be a nonzero vector in the null space of D.

  2. Then

     $$
     D h = 0 \implies G h = D^H D h = 0.
     $$

  3. Subtracting $h$ from both sides, we get

     $$
     G h - h = (G - I) h = -h.
     $$

  4. Let $\Lambda = \operatorname{supp}(h)$.

  5. Since $h$ vanishes outside $\Lambda$, keeping only the rows and columns indexed by $\Lambda$, we can write

     $$
     (G - I)_{\Lambda} h_{\Lambda} = -h_{\Lambda},
     $$

     where $(G - I)_{\Lambda}$ is the submatrix of $G - I$ with rows and columns indexed by $\Lambda$ and $h_{\Lambda}$ contains the corresponding entries of $h$.

  6. Taking the $\ell_{\infty}$ norm on both sides, we get

     $$
     \| h_{\Lambda} \|_{\infty} = \| (G - I)_{\Lambda} h_{\Lambda} \|_{\infty}.
     $$

  7. We know that

     $$
     \| (G - I)_{\Lambda} h_{\Lambda} \|_{\infty} \leq \| (G - I)_{\Lambda} \|_{\infty} \| h_{\Lambda} \|_{\infty},
     $$

     where $\| \cdot \|_{\infty}$ is the operator norm given by the maximum absolute row sum.

  8. It is easy to see that

     $$
     \| h_{\Lambda} \|_{\infty} = \| h \|_{\infty} > 0.
     $$

  9. Thus

     $$
     \| h \|_{\infty} \leq \| (G - I)_{\Lambda} \|_{\infty} \| h \|_{\infty}.
     $$

  10. This gives us

      $$
      \| (G - I)_{\Lambda} \|_{\infty} \geq 1.
      $$

  11. But $\| (G - I)_{\Lambda} \|_{\infty}$ is nothing but the maximum sum of magnitudes of off-diagonal entries of $G$ along a row of $G_{\Lambda}$.

  12. Consider any row of $(G - I)_{\Lambda}$.

  13. One of the entries in the row (the one on the main diagonal of $G - I$) is 0.

  14. Thus, there are at most $|\Lambda| - 1$ nonzero entries in the row.

  15. $\Lambda$ is smallest when $|\Lambda| = \text{spark}(D)$; choose $h$ accordingly.

  16. For such a $\Lambda$, there exists a row of $G$ in which the magnitudes of $\text{spark}(D) - 1$ off-diagonal entries sum to at least 1.

  17. Let $n$ denote the minimum number of off-diagonal entries in a row or column of $G$ whose magnitudes sum to at least one.

  18. Clearly,

      $$
      \text{spark}(D) - 1 \geq n.
      $$

  19. It is easy to see that

      $$
      n \geq 2 \mu_{1/2}(G);
      $$

      i.e., the minimum number of off-diagonal entries summing to 1 or more is at least twice the minimum number of off-diagonal entries summing to $\frac{1}{2}$ or more in any row (or column, by the Hermitian property of $G$).

  20. Thus

      $$
      \text{spark}(D) - 1 \geq 2 \mu_{1/2}(G).
      $$

  21. Rewriting, we get

      $$
      \text{spark}(D) \geq 2 \mu_{1/2}(G) + 1.
      $$
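
As a sanity check of this bound, the sketch below compares $2\mu_{1/2}(G) + 1$ with the spark obtained by exhaustive search on a small dictionary. Both the brute-force search and the example dictionary (identity plus a normalized DCT-II basis) are our own illustrative choices and are only practical for tiny sizes.

```python
import numpy as np
from itertools import combinations

def spark_brute_force(D):
    """Smallest number of linearly dependent columns of D
    (exhaustive search; only viable for very small dictionaries)."""
    N, M = D.shape
    for k in range(1, M + 1):
        for idx in combinations(range(M), k):
            if np.linalg.matrix_rank(D[:, list(idx)]) < k:
                return k
    return np.inf  # all columns are linearly independent

def mu_half(G):
    """mu_{1/2}(G) as in Definition 18.33 (same sketch as above)."""
    best = np.inf
    for i in range(G.shape[0]):
        row = np.sort(np.abs(np.delete(G[i], i)))[::-1]
        reached = np.nonzero(np.cumsum(row) >= 0.5)[0]
        if reached.size:
            best = min(best, reached[0] + 1)
    return best

# Small two-ortho-basis example: identity plus a normalized DCT-II basis.
N = 8
n = np.arange(N)
C = np.cos(np.pi * np.outer(2 * n + 1, n) / (2 * N))
C /= np.linalg.norm(C, axis=0)
D = np.hstack([np.eye(N), C])
G = D.T @ D
print("spark(D)          :", spark_brute_force(D))
print("2*mu_half(G) + 1  :", 2 * mu_half(G) + 1)
```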

18.9.3. Babel Function

In this subsection, we provide a more general development of the Babel function for a pair of dictionaries.

  1. When we consider a single dictionary, we will use D as the dictionary.

  2. When considering a pair of dictionaries of equal size, we will typically label them as Φ and Ψ with $\Phi, \Psi \in \mathbb{C}^{N \times D}$.

  3. We will assume that the dictionaries are full rank, as they span the signal space $\mathbb{C}^N$.

  4. Why a pair of dictionaries?

    1. We consider Φ as a modeling dictionary from which the sparse signals

      $x = \Phi a$

      are built.

    2. Ψ, on the other hand, is the sensing dictionary, which will be used to compute correlations with the signal x in order to estimate the representation a.

    3. Ideally, Φ and Ψ should be the same.

    4. But in real life, we may not know Φ correctly.

    5. Hence, Ψ would be a dictionary slightly different from Φ.

18.9.4. p-Babel Function

See [43] for reference.

Definition 18.34 (p-Babel function over Λ)

Consider an index set $\Lambda \subseteq \{1, \dots, D\}$ indexing a subset of atoms in Φ and Ψ. The p-Babel function over Λ is defined as

$$
\mu_p(\Phi, \Psi, \Lambda) \triangleq \sup_{l \notin \Lambda} \left( \sum_{j \in \Lambda} | \langle \phi_j, \psi_l \rangle |^p \right)^{\frac{1}{p}}.
\tag{18.60}
$$

What is going on here?

  1. Consider the row vector

     $$
     v_l = \psi_l^H \Phi_{\Lambda}.
     $$

  2. This vector contains the inner products of the modeling atoms in Φ indexed by Λ with the sensing atom $\psi_l$.

  3. Now

     $$
     \| v_l \|_p = \left( \sum_i | (v_l)_i |^p \right)^{\frac{1}{p}} = \left( \sum_{j \in \Lambda} | \langle \phi_j, \psi_l \rangle |^p \right)^{\frac{1}{p}}.
     $$

  4. This is the term in (18.60).

  5. Thus

     $$
     \mu_p(\Phi, \Psi, \Lambda) = \sup_{l \notin \Lambda} \| v_l \|_p.
     $$

  6. $\| v_l \|_p$ is a measure of the correlation of the sensing atom $\psi_l$ with the group of modeling atoms in Φ indexed by Λ, measured in the p-norm.

  7. $\mu_p(\Phi, \Psi, \Lambda)$ thus finds the sensing atom from Ψ outside the index set Λ that is most correlated with the group of modeling atoms in Φ indexed by Λ, and returns this maximum correlation value.

  8. Different choices of the p-norm lead to different correlation values.

We can also measure the correlation of sensing and modeling atoms inside the index set Λ.

Definition 18.35 (Complementary p-Babel function over Λ)

A complement to the p-Babel function measures the amount of correlation between atoms inside the support Λ:

$$
\mu_p^{\text{in}}(\Phi, \Psi, \Lambda) \triangleq \sup_{i \in \Lambda} \mu_p(\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{ i \}).
\tag{18.61}
$$

$\mu_p(\Phi_{\Lambda}, \Psi_{\Lambda}, \Lambda \setminus \{ i \})$ computes the correlation of the $i$-th sensing atom in Ψ with the modeling atoms in Φ indexed by $\Lambda \setminus \{ i \}$; i.e., all modeling atoms in Λ except the $i$-th one.

Finally, $\mu_p^{\text{in}}(\Phi, \Psi, \Lambda)$ finds the maximum correlation of any sensing atom inside Λ with the modeling atoms inside Λ (leaving out the corresponding modeling atom).
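
Both quantities can be evaluated directly from the cross-correlations $\langle \phi_j, \psi_l \rangle$. The sketch below is our own illustration; the function names `p_babel_support` and `p_babel_in_support` are assumptions, and the dictionaries are passed as NumPy arrays whose columns are the atoms.

```python
import numpy as np

def p_babel_support(Phi, Psi, Lam, p=1):
    """mu_p(Phi, Psi, Lambda) from (18.60): largest p-norm of the
    correlations between a sensing atom psi_l outside Lambda and the
    modeling atoms of Phi inside Lambda."""
    Lam = list(Lam)
    outside = [l for l in range(Phi.shape[1]) if l not in Lam]
    if not Lam or not outside:
        return 0.0
    # rows: sensing atoms outside Lambda; columns: modeling atoms inside Lambda
    corr = np.abs(Psi[:, outside].conj().T @ Phi[:, Lam])
    return np.max(np.linalg.norm(corr, ord=p, axis=1))

def p_babel_in_support(Phi, Psi, Lam, p=1):
    """mu_p^in(Phi, Psi, Lambda) from (18.61): for each i in Lambda,
    correlate the sensing atom psi_i with the modeling atoms indexed by
    Lambda without i, and take the largest p-norm."""
    best = 0.0
    for i in Lam:
        rest = [j for j in Lam if j != i]
        if rest:
            corr = np.abs(Psi[:, i].conj() @ Phi[:, rest])
            best = max(best, np.linalg.norm(corr, ord=p))
    return best
```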

So far, we have focused our attention on a specific index set Λ. We now consider all index sets with $|\Lambda| \leq K$.

Definition 18.36 (p-Babel function)

The Babel function for a pair of dictionaries Φ and Ψ as a function of the sparsity level K is defined as

$$
\mu_p(\Phi, \Psi, K) \triangleq \sup_{|\Lambda| \leq K} \mu_p(\Phi, \Psi, \Lambda).
\tag{18.62}
$$

Correspondingly, the complement of the Babel function is defined as

$$
\mu_p^{\text{in}}(\Phi, \Psi, K) \triangleq \sup_{|\Lambda| \leq K} \mu_p^{\text{in}}(\Phi, \Psi, \Lambda).
\tag{18.63}
$$

It is straightforward to see that

$$
\mu_p^{\text{in}}(\Phi, \Psi, K) \leq \mu_p(\Phi, \Psi, K-1).
\tag{18.64}
$$
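
For small dictionaries, the suprema in (18.62) and (18.63) can be evaluated by brute force over all supports of size at most K, which also lets us check the inequality (18.64) numerically. This sketch is our own illustration (the search is exponential in K); the random pair Φ, Ψ below simulates a slightly perturbed sensing dictionary.

```python
import numpy as np
from itertools import combinations

def p_babel_level(Phi, Psi, K, p=1):
    """Brute-force evaluation of mu_p(Phi, Psi, K) and mu_p^in(Phi, Psi, K)
    as in (18.62)-(18.63), over all supports Lambda with |Lambda| <= K."""
    D = Phi.shape[1]
    A = np.abs(Psi.conj().T @ Phi)   # A[l, j] = |<phi_j, psi_l>|
    mu_out, mu_in = 0.0, 0.0
    for k in range(1, K + 1):
        for Lam in combinations(range(D), k):
            Lam = list(Lam)
            outside = [l for l in range(D) if l not in Lam]
            if outside:
                block = A[np.ix_(outside, Lam)]
                mu_out = max(mu_out, np.linalg.norm(block, ord=p, axis=1).max())
            for i in Lam:
                rest = [j for j in Lam if j != i]
                if rest:
                    mu_in = max(mu_in, np.linalg.norm(A[i, rest], ord=p))
    return mu_out, mu_in

# A modeling dictionary and a slightly perturbed sensing dictionary.
rng = np.random.default_rng(0)
N, D, K = 6, 10, 3
Phi = rng.standard_normal((N, D)); Phi /= np.linalg.norm(Phi, axis=0)
Psi = Phi + 0.05 * rng.standard_normal((N, D)); Psi /= np.linalg.norm(Psi, axis=0)
mu_K, mu_in_K = p_babel_level(Phi, Psi, K)
mu_Km1, _ = p_babel_level(Phi, Psi, K - 1)
print(mu_in_K, "<=", mu_Km1, ":", mu_in_K <= mu_Km1)   # inequality (18.64)
```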

Now consider the special case where $D = \Phi = \Psi$. In other words, the sensing and modeling dictionaries are the same.

We obtain

$$
\mu_p(D, \Lambda) = \sup_{l \notin \Lambda} \left( \sum_{j \in \Lambda} | \langle d_j, d_l \rangle |^p \right)^{\frac{1}{p}}.
\tag{18.65}
$$

$$
\mu_p^{\text{in}}(D, \Lambda) = \sup_{i \in \Lambda} \mu_p(D_{\Lambda}, \Lambda \setminus \{ i \}).
\tag{18.66}
$$

$$
\mu_p(D, K) = \sup_{|\Lambda| \leq K} \mu_p(D, \Lambda).
\tag{18.67}
$$

$$
\mu_p^{\text{in}}(D, K) = \sup_{|\Lambda| \leq K} \mu_p^{\text{in}}(D, \Lambda).
\tag{18.68}
$$

Further, by choosing $p = 1$, we get

$$
\mu_1(D, \Lambda) = \sup_{l \notin \Lambda} \sum_{j \in \Lambda} | \langle d_j, d_l \rangle |.
\tag{18.69}
$$

$$
\mu_1^{\text{in}}(D, \Lambda) = \sup_{i \in \Lambda} \mu_1(D_{\Lambda}, \Lambda \setminus \{ i \}).
\tag{18.70}
$$

$$
\mu_1(D, K) = \sup_{|\Lambda| \leq K} \mu_1(D, \Lambda).
\tag{18.71}
$$

$$
\mu_1^{\text{in}}(D, K) = \sup_{|\Lambda| \leq K} \mu_1^{\text{in}}(D, \Lambda).
\tag{18.72}
$$

Finally, compare this definition of $\mu_1(D, K)$ with the standard definition of the Babel function:

$$
\mu_1(K) = \max_{|\Lambda| = K} \max_{\psi} \sum_{\lambda \in \Lambda} | \langle \psi, d_{\lambda} \rangle |,
\tag{18.73}
$$

where the vector ψ ranges over the atoms indexed by $\Omega \setminus \Lambda$ (Ω being the full index set $\{1, \dots, D\}$).

We also know that $\mu_1(K)$ is an increasing function of K. Thus, replacing $|\Lambda| = K$ with $|\Lambda| \leq K$ doesn't make any difference to the value of $\mu_1(K)$.

Careful observation shows that the definitions of $\mu_1(K)$ in (18.73) and $\mu_1(D, K)$ in (18.71) are exactly the same.
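
For a single dictionary, $\mu_1(K)$ can be computed without enumerating supports: for each atom, the worst-case Λ simply collects the K largest off-diagonal entries of the corresponding row of $|G|$. A minimal sketch (our own illustration, assuming unit-norm atoms in the columns of `D`; the name `babel` is an assumption):

```python
import numpy as np

def babel(D, K):
    """Standard Babel function mu_1(K) of a dictionary D with unit-norm
    columns: maximum over atoms of the sum of the K largest off-diagonal
    entries of the corresponding row of |G|, where G = D^H D."""
    A = np.abs(D.conj().T @ D)
    np.fill_diagonal(A, 0.0)            # remove the unit diagonal
    A = -np.sort(-A, axis=1)            # sort each row in decreasing order
    return float(np.max(np.sum(A[:, :K], axis=1)))
```

Note that `babel(D, 1)` is just the coherence $\mu$.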

18.9.5. Exact Recovery Coefficient

We introduce a measure of similarity between a subdictionary and the remaining atoms of the dictionary, known as the exact recovery coefficient.

Definition 18.37 (Exact recovery coefficient)

The Exact Recovery Coefficient [77, 78, 79] for a subdictionary $D_{\Lambda}$ is defined as

$$
\mathrm{ERC}(D_{\Lambda}) = 1 - \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1.
$$

We will also use the notation $\mathrm{ERC}(\Lambda)$ when the dictionary is clear from the context.

The quantity is called the exact recovery coefficient since, for a number of algorithms, the criterion $\mathrm{ERC}(\Lambda) > 0$ is a sufficient condition for exact recovery of sparse representations.
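
The exact recovery coefficient can be computed directly from this definition using a pseudo-inverse. The sketch below is our own illustration; the function name `erc` is an assumption.

```python
import numpy as np

def erc(D, Lam):
    """ERC(D_Lambda) = 1 - max_{omega not in Lambda} ||pinv(D_Lambda) d_omega||_1."""
    Lam = list(Lam)
    outside = [w for w in range(D.shape[1]) if w not in Lam]
    if not outside:
        return 1.0
    # each column of C is pinv(D_Lambda) d_omega for one omega outside Lambda
    C = np.linalg.pinv(D[:, Lam]) @ D[:, outside]
    return 1.0 - np.max(np.sum(np.abs(C), axis=0))
```

A positive value means that the least-squares coefficients expressing any atom outside Λ over $D_{\Lambda}$ have $\ell_1$ norm strictly less than one.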

18.9.5.1. ERC and Babel Function

We present a lower bound on $\mathrm{ERC}(\Lambda)$ in terms of the Babel function.

Theorem 18.74 (ERC lower bound: Babel function)

Suppose that $|\Lambda| = k \leq K$. A lower bound on the exact recovery coefficient is

$$
\mathrm{ERC}(\Lambda) \geq \frac{1 - \mu_1(K-1) - \mu_1(K)}{1 - \mu_1(K-1)}.
$$

It follows that $\mathrm{ERC}(\Lambda) > 0$ whenever

$$
\mu_1(K-1) + \mu_1(K) < 1.
$$

Proof. We proceed as follows.

  1. Let us expand the pseudo-inverse $D_{\Lambda}^{\dagger}$:

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1
     = \max_{\omega \notin \Lambda} \| (D_{\Lambda}^H D_{\Lambda})^{-1} D_{\Lambda}^H d_{\omega} \|_1
     \leq \| (D_{\Lambda}^H D_{\Lambda})^{-1} \|_{1 \to 1} \max_{\omega \notin \Lambda} \| D_{\Lambda}^H d_{\omega} \|_1.
     $$

  2. For the Gram matrix $G = D_{\Lambda}^H D_{\Lambda}$, we recall from Theorem 18.37 that

     $$
     \| G^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(k-1)} \leq \frac{1}{1 - \mu_1(K-1)},
     $$

     where the second inequality uses the fact that $\mu_1$ is non-decreasing and $k \leq K$.

  3. For the other term we have

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^H d_{\omega} \|_1
     = \max_{\omega \notin \Lambda} \sum_{\lambda \in \Lambda} | \langle d_{\omega}, d_{\lambda} \rangle |
     \leq \mu_1(k) \leq \mu_1(K).
     $$

  4. Thus, we get

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1 \leq \frac{\mu_1(K)}{1 - \mu_1(K-1)}.
     $$

  5. Putting this back into the definition of the exact recovery coefficient:

     $$
     \mathrm{ERC}(\Lambda) = 1 - \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1
     \geq 1 - \frac{\mu_1(K)}{1 - \mu_1(K-1)}
     = \frac{1 - \mu_1(K-1) - \mu_1(K)}{1 - \mu_1(K-1)}.
     $$

  6. This completes the bound on ERC.

  7. Now, we verify the condition for $\mathrm{ERC}(\Lambda) > 0$. Note that $\mu_1(K-1) + \mu_1(K) < 1$ implies $1 - \mu_1(K-1) > 0$, so

     $$
     \mu_1(K) + \mu_1(K-1) < 1
     \iff \mu_1(K) < 1 - \mu_1(K-1)
     \iff \frac{\mu_1(K)}{1 - \mu_1(K-1)} < 1
     \iff 1 - \frac{\mu_1(K)}{1 - \mu_1(K-1)} > 0.
     $$

  8. Thus, if $\mu_1(K) + \mu_1(K-1) < 1$, then the lower bound on $\mathrm{ERC}(\Lambda)$ is positive, which gives $\mathrm{ERC}(\Lambda) > 0$.

18.9.5.2. ERC and Coherence

Along the same lines, we develop a coherence-based bound for ERC.

Theorem 18.75 (ERC lower bound: coherence)

Suppose that $|\Lambda| = k \leq K$. A lower bound on the exact recovery coefficient is

$$
\mathrm{ERC}(\Lambda) \geq \frac{1 - (2K-1)\mu}{1 - (K-1)\mu}.
$$

It follows that $\mathrm{ERC}(\Lambda) > 0$ whenever

$$
K \mu \leq \frac{1}{2}.
$$

Proof. We proceed as follows.

  1. Following the proof of Theorem 18.74, for the Gram matrix $G = D_{\Lambda}^H D_{\Lambda}$ we have

     $$
     \| G^{-1} \|_{1 \to 1} \leq \frac{1}{1 - \mu_1(K-1)} \leq \frac{1}{1 - (K-1)\mu}.
     $$

  2. For the other term we have

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^H d_{\omega} \|_1 \leq \mu_1(K) \leq K \mu.
     $$

  3. Thus, we get

     $$
     \max_{\omega \notin \Lambda} \| D_{\Lambda}^{\dagger} d_{\omega} \|_1 \leq \frac{K \mu}{1 - (K-1)\mu}.
     $$

  4. Putting this back into the definition of the exact recovery coefficient:

     $$
     \mathrm{ERC}(\Lambda) \geq 1 - \frac{K \mu}{1 - (K-1)\mu} = \frac{1 - (2K-1)\mu}{1 - (K-1)\mu}.
     $$

  5. This completes the bound on ERC.

  6. Now, we verify the condition for $\mathrm{ERC}(\Lambda) > 0$. For the numerator,

     $$
     K \mu \leq \frac{1}{2}
     \implies 2 K \mu \leq 1
     \implies 1 - 2 K \mu \geq 0
     \implies 1 - 2 K \mu + \mu > 0,
     $$

     i.e., $1 - (2K - 1)\mu > 0$.

  7. And for the denominator,

     $$
     K \mu \leq \frac{1}{2}
     \implies 1 - K \mu \geq \frac{1}{2}
     \implies 1 - K \mu + \mu \geq \frac{1}{2} + \mu,
     $$

     i.e., $1 - (K - 1)\mu \geq \frac{1}{2} + \mu > 0$.

  8. Thus, $K \mu \leq \frac{1}{2}$ ensures that both the numerator and the denominator of the coherence lower bound on $\mathrm{ERC}(\Lambda)$ are positive, which gives $\mathrm{ERC}(\Lambda) > 0$.
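
As a quick numerical illustration of this theorem (our own sketch, not part of the cited works), we build a small Dirac-DCT style dictionary, pick a support with $K\mu \leq \frac{1}{2}$, and compare $\mathrm{ERC}(\Lambda)$ with the coherence lower bound; the construction and the chosen support are assumptions made for the example.

```python
import numpy as np

def erc(D, Lam):
    """ERC(D_Lambda) = 1 - max_{omega not in Lambda} ||pinv(D_Lambda) d_omega||_1."""
    Lam = list(Lam)
    outside = [w for w in range(D.shape[1]) if w not in Lam]
    C = np.linalg.pinv(D[:, Lam]) @ D[:, outside]
    return 1.0 - np.max(np.sum(np.abs(C), axis=0))

# Dirac-DCT style dictionary in R^64 (illustrative construction)
N = 64
n = np.arange(N)
C = np.cos(np.pi * np.outer(2 * n + 1, n) / (2 * N))
C /= np.linalg.norm(C, axis=0)
D = np.hstack([np.eye(N), C])

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                      # coherence

K = 2                             # chosen so that K * mu <= 1/2 here
Lam = [0, 1]                      # an arbitrary support of size K
bound = (1 - (2 * K - 1) * mu) / (1 - (K - 1) * mu)
print("K*mu      :", K * mu)
print("ERC(Lam)  :", erc(D, Lam))
print("bound     :", bound)
```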

A more accurate bound on K is presented in the next theorem.

Theorem 18.76

$\mathrm{ERC}(\Lambda) > 0$ holds whenever

$$
K < \frac{1}{2} \left( 1 + \frac{1}{\mu} \right),
$$

where $K = |\Lambda|$.

Proof. We proceed as follows.

  1. Assuming $1 - (K-1)\mu > 0$, we have

     $$
     \frac{1 - (2K-1)\mu}{1 - (K-1)\mu} > 0
     \iff 1 - (2K-1)\mu > 0
     \iff 1 > (2K-1)\mu
     \iff 2K - 1 < \frac{1}{\mu}
     \iff K < \frac{1}{2} \left( 1 + \frac{1}{\mu} \right).
     $$

  2. From Theorem 18.75, we have

     $$
     \mathrm{ERC}(\Lambda) \geq \frac{1 - (2K-1)\mu}{1 - (K-1)\mu}.
     $$

  3. Thus, under the given condition, we have

     $$
     \mathrm{ERC}(\Lambda) > 0.
     $$

  4. We also need to show that under this condition

     $$
     1 - (K-1)\mu > 0.
     $$

  5. We can see that

     $$
     2K - 1 < \frac{1}{\mu}
     \implies 2K - 2 < \frac{1}{\mu} - 1
     \implies 2 (K-1) \mu < 1 - \mu
     \implies -(K-1)\mu > \frac{\mu}{2} - \frac{1}{2}
     \implies 1 - (K-1)\mu > \frac{1}{2} + \frac{\mu}{2}
     \implies 1 - (K-1)\mu > 0.
     $$

18.9.5.3. Geometrical Interpretation of ERC

Definition 18.38 (Antipodal convex hull of a subdictionary)

The antipodal convex hull [78] of a subdictionary $D_{\Lambda}$ is defined as the set of signals given by

$$
\mathcal{A}_1(\Lambda) = \{ D_{\Lambda} x \;|\; x \in \mathbb{C}^{\Lambda} \text{ and } \| x \|_1 \leq 1 \}.
$$

It is the smallest convex set that contains every unit multiple of every atom.

  1. We recall that $P_{\Lambda} = D_{\Lambda} D_{\Lambda}^{\dagger}$ is the orthogonal projector onto the column space of $D_{\Lambda}$.

  2. Therefore, $c_{\omega} = D_{\Lambda}^{\dagger} d_{\omega} \in \mathbb{C}^{\Lambda}$ is a coefficient vector which can be used to synthesize the projection of the atom $d_{\omega}$ onto this column space.

  3. In other words:

     $$
     P_{\Lambda} d_{\omega} = D_{\Lambda} D_{\Lambda}^{\dagger} d_{\omega} = D_{\Lambda} c_{\omega}.
     $$

  4. Thus, the quantity $1 - \| D_{\Lambda}^{\dagger} d_{\omega} \|_1$ measures how far the projected atom $P_{\Lambda} d_{\omega}$ lies from the boundary of $\mathcal{A}_1(\Lambda)$.

  5. If every projected atom lies well within the antipodal convex hull, then it is possible to recover superpositions of atoms from Λ.

  6. This happens because, whenever $\mathrm{ERC}(\Lambda) > 0$, the coefficient associated with an atom outside Λ must be quite large to represent anything in the span of the subdictionary.

18.9.6. Dirac-DCT Dictionary

Theorem 18.77

The p-Babel function of the Dirac-DCT dictionary is given by

$$
\mu_p(k) = k^{\frac{1}{p}} \mu, \quad 1 \leq k \leq \sqrt{N}.
$$

In particular, the standard Babel function is given by

$$
\mu_1(k) = k \mu.
$$

Proof. TODO prove it.
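
The claim can be examined numerically. The sketch below builds a Dirac-DCT dictionary (our assumed construction: identity plus a normalized DCT-II basis), computes $\mu_p(k)$ greedily from the Gram matrix, and prints it next to $k^{1/p}\mu$; this is an empirical comparison, not a proof.

```python
import numpy as np

def p_babel(D, k, p=1):
    """mu_p(D, k): for each atom, take the k largest off-diagonal entries
    of the corresponding row of |G| and return the largest p-norm."""
    A = np.abs(D.conj().T @ D)
    np.fill_diagonal(A, 0.0)
    A = -np.sort(-A, axis=1)                  # rows sorted in decreasing order
    return float(np.max(np.sum(A[:, :k] ** p, axis=1) ** (1.0 / p)))

# Dirac-DCT dictionary in R^N (assumed construction)
N = 64
n = np.arange(N)
C = np.cos(np.pi * np.outer(2 * n + 1, n) / (2 * N))
C /= np.linalg.norm(C, axis=0)
D = np.hstack([np.eye(N), C])

mu = p_babel(D, 1)                            # coherence: mu = mu_1(1)
for p in (1, 2):
    for k in (1, 2, 4, 8):                    # k up to sqrt(N)
        print(p, k, p_babel(D, k, p=p), k ** (1.0 / p) * mu)
```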