Basis Pursuit
20.2. Basis Pursuit#
20.2.1. Introduction#
We recall the following sparse approximation problems. Given a signal \(\bx \in \CC^N\) which is known to have a sparse representation in a dictionary \(\bDDD\), the exact-sparse recovery problem is:
When \(\bx \in \CC^N\) doesn’t have a sparse representation in \(\bDDD\), a \(K\)-sparse approximation of \(\bx\) in \(\bDDD\) can be obtained by solving the following problem:
Here \(\bx\) is modeled as \(\bx = \bDDD \ba + \be\) where \(\ba\) denotes a sparse representation of \(\bx\) and \(\be\) denotes the approximation error.
A different way to formulate the approximation problem is to provide an upper bound \(\| \be \|_2 \leq \epsilon\) on the acceptable approximation error and to find the sparsest possible representation within this error bound as
Basis Pursuit (BP) [23] suggests a convex relaxation of (20.4) obtained by replacing the \(\ell_0\)-“norm” with the \(\ell_1\)-norm.
For real signals, it can be implemented as a linear program. For complex signals, it can be implemented as a second-order cone program.
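As a minimal sketch of the linear-programming route for real signals (the dictionary, seed, and sparsity level below are illustrative assumptions, not values from the text): splitting \(\ba = \bu - \bv\) with \(\bu, \bv \succeq \bzero\) turns \(\min \| \ba \|_1\) subject to \(\bDDD \ba = \bx\) into a standard LP.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M = 10, 20                            # signal dimension, number of atoms
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
a_true = np.zeros(M)
a_true[[3, 11]] = [1.0, -2.0]            # a 2-sparse representation
x = D @ a_true

# BP as an LP: minimize 1^T (u + v)  s.t.  [D, -D][u; v] = x,  u, v >= 0
res = linprog(c=np.ones(2 * M), A_eq=np.hstack([D, -D]), b_eq=x,
              bounds=(0, None))
a_hat = res.x[:M] - res.x[M:]
print("recovery error:", np.linalg.norm(a_hat - a_true))
```

For a well-conditioned random dictionary and such low sparsity, the \(\ell_1\) minimizer typically coincides with the sparse representation itself, which is exactly the equivalence studied in the rest of this section.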
In the presence of approximation error (20.6), where \(\bx = \bDDD \ba + \be\) with \(\ba\) being a \(K\)-sparse approximate representation of \(\bx\) in \(\bDDD\), we can formulate the corresponding \(\ell_1\)-minimization problem as:
where \(\epsilon \geq \| \be \|_2\) provides an upper bound on the approximation error. This version is known as basis pursuit with inequality constraints (BPIC).
The dual problem constructed using Lagrange multipliers is
This is known as basis pursuit denoising (BPDN). With an appropriate choice of \(\lambda\), the two problems BPIC and BPDN are equivalent. This formulation minimizes the \(\ell_1\)-norm subject to a penalty term over the approximation error. The Lagrangian constant \(\lambda\) controls how large the penalty due to the approximation error will be.
Note that the constraint \(\| \bx - \bDDD \ba \|_2 \leq \epsilon\) is equivalent to \(\| \bx - \bDDD \ba \|_2^2 \leq \epsilon^2\). We have used the squared version to construct the dual BPDN problem since the term \(\| \bx - \bDDD \ba \|_2^2\) is easier to differentiate and work with.
Efficient solvers are available for the BP, BPIC, and BPDN problems using convex optimization techniques. They typically run in polynomial time but involve sophisticated algorithms. The good part is the guarantee that a global optimum can be found (since the problems are convex). The not so good part is that convex optimization methods are still quite computationally intensive.
An alternative formulation of BPDN is as follows.
The difference between the two formulations is essentially the term with which the Lagrangian constant (\(\lambda\) or \(\gamma\)) is placed. By choosing \(\lambda = 1/ (2 \gamma)\), the two formulations become essentially the same (up to a scale factor in the objective function). This formulation minimizes the approximation error subject to an \(\ell_1\)-norm penalty. Thus, the two formulations differ in which term is minimized and which term is treated as the penalty.
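As an illustration of how such penalized formulations are solved in practice, here is a minimal sketch of ISTA (iterative soft thresholding), one of the simplest proximal-gradient solvers, applied to \(\min_{\ba} \gamma \| \ba \|_1 + \tfrac{1}{2} \| \bx - \bDDD \ba \|_2^2\). The data, penalty, and iteration count are illustrative assumptions; ISTA is one solver choice among many, not the method prescribed by the text.

```python
import numpy as np

def ista(D, x, gamma, n_iter=500):
    """Minimize gamma*||a||_1 + 0.5*||x - D a||_2^2 by proximal gradient."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - (D.T @ (D @ a - x)) / L    # gradient step on 0.5*||x - D a||^2
        a = np.sign(g) * np.maximum(np.abs(g) - gamma / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(1)
D = rng.standard_normal((15, 30))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
a0 = np.zeros(30)
a0[[2, 17]] = [1.5, -1.0]
x = D @ a0 + 0.01 * rng.standard_normal(15)  # noisy observation
a_hat = ista(D, x, gamma=0.05)
```

With a suitable \(\gamma\), the largest entries of `a_hat` typically land on the true support; the soft thresholding also biases their magnitudes slightly toward zero.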
Basis pursuit is not an algorithm but a principle which says that for most real life problems, the solution of the \(\ell_0\)-minimization problem is the same as the solution of the corresponding \(\ell_1\)-minimization problem. Actual algorithms for solving the basis pursuit formulation of the sparse recovery problem come from the convex optimization literature.
We start our discussion with the analysis of the exact-sparse case.
As part of our theoretical analysis, we would like to explore conditions under which the problems (20.4) and (20.7) are equivalent; i.e., there exists a unique solution to both of them and the solutions are identical. Under such conditions, the NP-hard problem (20.4) can be replaced with the tractable problem (20.7), which is convex and solvable in polynomial time.
20.2.2. Two-Ortho Case#
Further simplifying, we consider the case where the dictionary \(\bDDD\) is a two-ortho basis
with \(\Psi\) and \(\Phi\) both being orthonormal bases for \(\CC^N\).
Clearly, \(\bDDD \in \CC^{N \times 2N}\) and \(D = 2N\).
We denote
\[ \Omega = \{ 1, 2, \dots, 2N \} \]as the index set for the representation vectors \(\ba\).
The representation \(\ba\) of a signal \(x\) in \(\bDDD\) can be written as
\[\begin{split} \bx = \bDDD \ba = \begin{bmatrix} \Psi & \Phi \end{bmatrix} \begin{bmatrix} \ba^p \\ \ba^q \end{bmatrix} = \Psi \ba^p + \Phi \ba^q. \end{split}\]We can assign
\[ k_p = \| \ba^{p} \|_0 \quad \text{and} \quad k_q = \| \ba^{q} \|_0. \]Total sparsity of \(\ba\) is given by
\[ K = \| \ba \|_0 = k_p + k_q. \]Whenever \(K \ll N\), we have a sparse representation.
Further, let \(S_p \subseteq \{ 1 , \dots, N \}\) be the support corresponding to \(\ba^p\) part of \(\ba\) (i.e. \(S_p = \supp (\ba^p)\)) and \(S_q \subseteq \{ 1 , \dots, N \}\) be the support corresponding to \(\ba^q\) part of \(\ba\) (i.e. \(S_q = \supp (\ba^q))\).
Clearly, \(| S_p | = k_p\) and \(| S_q | = k_q\).
Note that \(S_p\) and \(S_q\) need not be disjoint.
But, \(S_p\) and \(S_q + N\) are disjoint.
In fact, \(\supp(\ba) = S_p \cup (S_q + N)\).
\(\OneVec_p \in \CC^N\) will denote the indicator vector for \(S_p\); i.e., \(\OneVec_p(i) = 0 \Forall i \notin S_p\) and \(\OneVec_p(i) = 1 \Forall i \in S_p\).
Similarly, \(\OneVec_q \in \CC^N\) will denote the indicator vector for \(S_q\).
\(\OneVec \in \CC^N\) will denote the vector of all ones; i.e., \(\OneVec = (1, \dots, 1)^T\).
Also, \(\OneMat \in \CC^{N \times N}\) will denote a square matrix of all ones.
Note that \(\OneMat = \OneVec \cdot \OneVec^T\).
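The setup above can be made concrete with the classical spikes-and-sines example (an illustrative sketch; the choice \(N = 16\) is arbitrary): take \(\Psi = \bI\) and \(\Phi\) the unitary DFT matrix, for which the coherence is known to be \(1/\sqrt{N}\).

```python
import numpy as np

N = 16
Psi = np.eye(N)                                # spike basis
Phi = np.fft.fft(np.eye(N)) / np.sqrt(N)       # unitary DFT basis: Phi^H Phi = I
D = np.hstack([Psi, Phi])                      # N x 2N two-ortho dictionary

G = np.abs(D.conj().T @ D)                     # magnitudes of Gram-matrix entries
np.fill_diagonal(G, 0.0)                       # ignore self inner products
mu = G.max()
print(mu, 1 / np.sqrt(N))                      # both are ~0.25 here
```

Within each orthonormal block the off-diagonal inner products vanish; the coherence is entirely determined by the cross inner products between the two bases.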
We now state our main result for the equivalence of solutions of (20.4) and (20.7) for the two-ortho case. Going forward, we will simply use \(\mu\) to refer to the coherence of \(\bDDD\) (i.e. \(\mu (\bDDD)\)).
Let \(\bDDD\) be a two-ortho basis dictionary \(\bDDD = \begin{bmatrix} \Psi & \Phi \end{bmatrix}\). Let \(\bx = \bDDD \ba\), where \(\bx\) is known. If a \(K\)-sparse representation \(\ba\) exists with \(k_p \geq k_q\) such that (\(k_p, k_q\)) obey
then \(\ba\) is the unique solution of both problems (20.4) and (20.7).
A weaker condition is: If
then \(\ba\) is a unique (\(K\)-sparse) solution to both (20.4) and (20.7).
Proof. We first show that \(\ba\) is a unique solution of (20.4).
Towards the end of this proof, we show that (20.11) \(\implies\) (20.12).
Due to (20.12),
\[ \| \ba \|_0 = K = k_p + k_q < \frac{\sqrt{2} - 0.5}{\mu} \approx \frac{0.914}{\mu} < \frac{1}{\mu}. \]Thus, if \(\ba\) satisfies (20.11), then it is necessarily the sparsest possible representation of \(\bx\) in \(\bDDD\) due to Theorem 18.8.
All other representations are denser (i.e. have more nonzero entries).
Hence \(\ba\) is a unique solution of (20.4).
We next show that \(\ba\) is also a unique solution of (20.7).
\(\ba\) is a feasible vector to (20.7) since \(\bx = \bDDD \ba\) though it need not be an optimal solution.
We have to find criteria under which \(\ba\) is optimal and no other feasible vector \(\bb\) is optimal.
Since \(\ba\) is the unique solution to (20.4), we have \(\| \bb \|_0 > \| \ba \|_0\) for every other vector \(\bb\) feasible for (20.7).
We consider the set of alternative feasible vectors to (20.7) given by
\[ C = \left \{ \bb \ST \bb \neq \ba, \; \| \bb \|_1 \leq \| \ba \|_1, \; \| \bb \|_0 > \| \ba \|_0 \text{ and } \bDDD (\ba - \bb) = \bzero \right \}. \]This set contains all feasible vectors to (20.7) which are
different from \(\ba\)
have larger support (a larger \(\ell_0\)-“norm”)
satisfy the linear system \(\bDDD \bb = \bx\)
have \(\ell_1\)-norm less than or equal to that of \(\ba\).
If this set is nonempty, then there exists a solution to basis pursuit which is not the same as \(\ba\).
If this set is empty, then the solutions of (20.4) and (20.7) coincide and are both unique.
Writing \(\be = \bb - \ba \iff \bb = \be + \ba\), we have
\[ \| \bb \|_1 \leq \| \ba \|_1 \iff \| \be + \ba \|_1 - \| \ba \|_1 \leq 0. \]Thus, we can rewrite \(C\) as
\[ C_s = \{ \be \ST \be \neq \bzero, \| \be + \ba \|_1 - \| \ba \|_1 \leq 0 \text{ and } \bDDD \be = \bzero\}. \]In order to show that \(C\) is empty, we will show that a larger set containing \(C_s\) is also empty.
Essentially, we wish to consider a larger set whose emptiness can be checked easily against (20.11).
If that larger set is empty due to (20.11), then \(C\) would also be empty and we would have completed the proof.
Emptiness of \(C_s\)
We start with the requirement \(\| \be + \ba \|_1 - \| \ba \|_1 \leq 0\).
Let \(\ba = \begin{bmatrix} {\ba^p} \\ {\ba^q} \end{bmatrix}\) and \(\be = \begin{bmatrix} {\be^p} \\ {\be^q} \end{bmatrix}\) , where \(p\) and \(q\) refer to parts corresponding to the orthonormal bases \(\Psi\) and \(\Phi\) respectively (as described at the beginning of this section).
Note that even if \(\ba^p\) and \(\ba^q\) are sparse, \(\be^p\) and \(\be^q\) need not be.
In fact, support of \(\be^p\) and \(\be^q\) could be very different from \(S_p\) and \(S_q\).
We can now write
\[\begin{split} 0 \geq \| \be + \ba \|_1 - \| \ba \|_1 &= \left ( \sum_{i=1}^N | e^p_i + a^p_i | - | a^p_i | \right ) + \left ( \sum_{i=1}^N | e^q_i + a^q_i | - | a^q_i | \right ) \\ &= \sum_{i \notin S_p} | e^p_i | + \sum_{i \notin S_q} | e^q_i | \\ &+ \left ( \sum_{i \in S_p} | e^p_i + a^p_i | - | a^p_i | \right ) + \left ( \sum_{i \in S_q} | e^q_i + a^q_i | - | a^q_i | \right ). \end{split}\]We split the sum as follows.
We first split the sum into \(\be^p, \ba^p\) and \(\be^q, \ba^q\) parts.
We split the sum on the \(p\) part to sum over indices in \(S_p\) and indices not in \(S_p\).
We split the sum on the \(q\) part to sum over indices in \(S_q\) and indices not in \(S_q\).
For \(i \notin S_p\), \(a^p_i = 0\), leading to \(| e^p_i + a^p_i | - | a^p_i | = | e^p_i |\).
Ditto for \(i \notin S_q\).
We recall from triangle inequality on complex numbers that
\[ | x + y | \geq | y | - | x | \quad \Forall x, y \in \CC \]which implies \(| x + y | - | y | \geq - | x |\).
Thus,
\[ | e^p_i + a^p_i | - | a^p_i | \geq - | e^p_i | \quad \Forall i \in S_p \]and
\[ | e^q_i + a^q_i | - | a^q_i | \geq - | e^q_i | \quad \Forall i \in S_q. \]With this, the above condition can be relaxed as
\[ 0 \geq \| \be + \ba \|_1 - \| \ba \|_1 \geq \sum_{i \notin S_p} | e^p_i | + \sum_{i \notin S_q} | e^q_i | - \sum_{i \in S_p} | e^p_i | - \sum_{i \in S_q} | e^q_i |. \]Every \(\be\) satisfying the condition \(\| \be + \ba \|_1 - \| \ba \|_1 \leq 0\) also satisfies this relaxed inequality.
To simplify notation we can write
\[ \sum_{i \in S_p} | e^p_i | = \OneVec_p^T | \be^p | \text{ and } \sum_{i \in S_q} | e^q_i | = \OneVec_q^T | \be^q |. \]Then we have
\[ \| \be^p \|_1 = \sum_{i \in S_p} | e^p_i | + \sum_{i \notin S_p} | e^p_i | \iff \sum_{i \notin S_p} | e^p_i | = \| \be^p \|_1 - \OneVec_p^T | \be^p |. \]Similarly,
\[ \sum_{i \notin S_q} | e^q_i | = \| \be^q \|_1 - \OneVec_q^T | \be^q |. \]Thus,
\[ \sum_{i \notin S_p} | e^p_i | + \sum_{i \notin S_q} | e^q_i | - \sum_{i \in S_p} | e^p_i | - \sum_{i \in S_q} | e^q_i | = \| \be^p \|_1 - 2 \OneVec_p^T | \be^p | + \| \be^q \|_1 - 2 \OneVec_q^T | \be^q |. \]We can now define the set
\[ C_s^1 = \{ \be \ST \be \neq \bzero, \| \be^p \|_1 + \| \be^q \|_1 - 2 \OneVec_p^T | \be^p | - 2 \OneVec_q^T | \be^q | \leq 0 \text{ and } \bDDD \be = \bzero\}. \]Clearly, \(C_s \subseteq C_s^1\) and if \(C_s^1\) is empty, then \(C_s\) will also be empty.
Note that this formulation of \(C_s^1\) is dependent only on the support of \(\ba\) and not on values in \(\ba\).
We now turn back to the requirement \(\bDDD \be = \bzero\) and relax it further.
We note that,
\[\begin{split} \bDDD \be = \begin{bmatrix} \Psi & \Phi \end{bmatrix} \begin{bmatrix} \be^p \\ \be^q \end{bmatrix} = \Psi \be^p + \Phi \be^q = \bzero. \end{split}\]Multiplying by \(\Psi^H\) we get
\[ \be^p + \Psi^H \Phi \be^q = \bzero \iff \be^p = - \Psi^H \Phi \be^q \]since \(\Psi^H \Psi = \bI\) (\(\Psi\) is unitary).
Similarly, multiplying with \(\Phi^H\), we obtain
\[ \Phi^H \Psi \be^p + \be^q = \bzero \iff \be^q = - \Phi^H \Psi \be^p. \]Note that entries in \(\Psi^H \Phi\) and \(\Phi^H \Psi\) are inner products between columns of \(\bDDD\), hence their magnitudes are upper bounded by \(\mu\) (coherence).
Denote \(\bB = \Psi^H \Phi\) and consider the product \(\bv = \Psi^H \Phi \be^q = \bB \be^q\).
Then
\[ v_i = \sum_{j = 1}^N B_{i j} e^q_j. \]Thus,
\[ | v_i | = \left | \sum_{j = 1}^N B_{i j} e^q_j \right | \leq \sum_{j = 1}^N | B_{i j} | \, | e^q_j | \leq \mu \sum_{j = 1}^N | e^q_j | = \mu \OneVec^T | \be^q |. \]Applying this result to \(\be^p\), we get
\[ | \be^p | = | \Psi^H \Phi \be^q | \preceq \mu \OneMat | \be^q |. \]Similarly,
\[ | \be^q | = | \Phi^H \Psi \be^p | \preceq \mu \OneMat | \be^p |. \]Note that since \(\OneMat = \OneVec \cdot \OneVec^T\), it is a rank-1 matrix.
We now construct a set \(C_s^2\) as
\[\begin{split} C_s^2 = \left \{ \be \left | \begin{aligned} \be \neq \bzero\\ \| \be^p \|_1 + \| \be^q \|_1 - 2 \OneVec_p^T | \be^p | - 2 \OneVec_q^T | \be^q | \leq 0\\ | \be^p | \preceq \mu \OneMat | \be^q | \\ \text { and } | \be^q | \preceq \mu \OneMat | \be^p | \end{aligned} \right. \right \}. \end{split}\]Clearly, \(C_s^1 \subseteq C_s^2\) since for every \(\be \in C_s^1\), \(\bDDD \be = \bzero \implies \be \in C_s^2\).
We now define \(\bf^p = | \be^p |\) and \(\bf^q = | \be^q |\) as the absolute value vectors.
Correspondingly, let us define
\[\begin{split} \bf = | \be | = \begin{bmatrix} \bf^p \\ \bf^q \end{bmatrix}. \end{split}\]Clearly, \(\| \be^p \|_1 = \OneVec^T \bf^p\) and \(\| \be^q \|_1 = \OneVec^T \bf^q\).
Further \(\bf^p \succeq \ZeroVec\); i.e., every entry in \(\bf^p\) is nonnegative.
Similarly, \(\bf^q \succeq \ZeroVec\).
We can then introduce a set \(C_f\) as
\[\begin{split} C_f = \left \{ \bf \left | \begin{aligned} \bf \neq \bzero\\ \OneVec^T \bf^p + \OneVec^T \bf^q - 2 \OneVec_p^T \bf^p - 2 \OneVec_q^T \bf^q \leq 0\\ \bf^p \preceq \mu \OneMat \bf^q\\ \bf^q \preceq \mu \OneMat \bf^p\\ \text { and } \bf^p \succeq \ZeroVec, \bf^q \succeq \ZeroVec \end{aligned} \right. \right \}. \end{split}\]It is easy to see that if \(\be \in C_s^2\), then \(\bf = | \be | \in C_f\).
Thus, if \(C_f\) is empty, then \(C_s^2\) should be empty too.
We note that if \(\bf \in C_f\), then for all \(c > 0\), \(c \bf \in C_f\).
Thus, in order to study (the emptiness of) \(C_f\), it is sufficient to study unit \(\ell_1\)-norm vectors \(\bf \in C_f\); i.e., \(C_f\) is empty if and only if it contains no unit \(\ell_1\)-norm vector.
Now
\[ \ \bf \_1 = \OneVec^T \bf = \OneVec^T \bf^p + \OneVec^T \bf^q \]since \(\bf \succeq \ZeroVec\).
This leads to:
\[ \| \bf \|_1 = 1 \iff \OneVec^T \bf^p + \OneVec^T \bf^q = 1. \]We construct the new set of unit \(\ell_1\)-norm vectors
\[\begin{split} C_r = \left \{ \bf \left | \begin{aligned} \bf \neq \bzero\\ 1 - 2 \OneVec_p^T \bf^p - 2 \OneVec_q^T \bf^q \leq 0\\ \bf^p \preceq \mu \OneMat \bf^q\\ \bf^q \preceq \mu \OneMat \bf^p\\ \OneVec^T \bf^p + \OneVec^T \bf^q = 1 \\ \text { and } \bf^p \succeq \ZeroVec, \bf^q \succeq \ZeroVec \end{aligned} \right. \right \}. \end{split}\]We have \(C_r = \EmptySet \iff C_f = \EmptySet\).
Note that the constraint \(1 - 2 \OneVec_p^T \bf^p - 2 \OneVec_q^T \bf^q \leq 0\) can be rewritten as
\[ \OneVec_p^T \bf^p + \OneVec_q^T \bf^q \geq \frac{1}{2}. \]The set \(C_r\) is much easier to analyze since
It has no explicit dependency on \(\bDDD\); the dictionary is represented only by a single parameter, its coherence \(\mu\).
All constraints are simple linear constraints. Thus, finding the elements of \(C_r\) can be formulated as a linear programming (feasibility) problem.
The order of nonzero entries inside \(\bf^p\) and \(\bf^q\) doesn’t have any influence on the requirements for \(\bf\) to belong to \(C_r\). Thus, without loss of generality, we can focus on vectors in which the first \(k_p\) entries of \(\bf^p\) and the first \(k_q\) entries of \(\bf^q\) are nonzero.
We next verify that \(C_r\) must be empty under (20.11).
Emptiness of \(C_r\)
In order to find vectors in \(C_r\), we can solve the following linear program.
(20.13)#\[\begin{split} & \underset{\bf^p, \bf^q}{\text{maximize }} & & \OneVec_p^T \bf^p + \OneVec_q^T \bf^q \\ & \text{subject to } & & \bf^p \preceq \mu \OneMat \bf^q\\ & & & \bf^q \preceq \mu \OneMat \bf^p\\ & & & \OneVec^T (\bf^p + \bf^q) = 1\\ & & & \bf^p \succeq \ZeroVec, \bf^q \succeq \ZeroVec.\end{split}\]The feasible set of this linear program is nonempty and compact (e.g., the uniform vector with all entries equal to \(1/(2N)\) is feasible since \(\mu \geq 1/\sqrt{N}\) for a two-ortho basis); hence an optimal solution exists.
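This linear program is also easy to solve numerically. A sketch with illustrative parameters (the values of \(N\), \(\mu\), \(k_p\), \(k_q\) are assumptions chosen to satisfy (20.11), not values from the text); the optimum can then be compared with the closed-form value \(\frac{k_p \mu + k_p k_q \mu^2}{1 + k_p \mu}\) derived later in the proof.

```python
import numpy as np
from scipy.optimize import linprog

N, mu, kp, kq = 16, 0.2, 2, 1                  # illustrative; satisfies (20.11)
assert 2 * kp * kq * mu**2 + kp * mu - 1 < 0

one_p = np.r_[np.ones(kp), np.zeros(N - kp)]   # indicator of S_p (first k_p entries)
one_q = np.r_[np.ones(kq), np.zeros(N - kq)]   # indicator of S_q (first k_q entries)

O = np.ones((N, N))
A_ub = np.block([[np.eye(N), -mu * O],         # f_p <= mu * OneMat * f_q
                 [-mu * O, np.eye(N)]])        # f_q <= mu * OneMat * f_p
res = linprog(c=-np.r_[one_p, one_q],          # maximize => minimize the negative
              A_ub=A_ub, b_ub=np.zeros(2 * N),
              A_eq=np.ones((1, 2 * N)), b_eq=[1.0], bounds=(0, None))
opt = -res.fun
closed_form = mu * (kp + kp * kq * mu) / (1 + kp * mu)
print(opt, closed_form, opt < 0.5)
```

For these parameters the numerical optimum agrees with the closed-form value (\(12/35 \approx 0.343\)) and stays below \(1/2\), consistent with the theorem.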
What is interesting is the value of the objective function for the optimal solution.
Let \({\bf^p}^*, {\bf^q}^*\) be (an) optimal solution for this linear program.
If \(\OneVec_p^T {\bf^p}^* + \OneVec_q^T {\bf^q}^* \geq \frac{1}{2}\), then \(\bf^*\) satisfies all the requirements of \(C_r\) and \(C_r\) is indeed not empty.
This doesn’t guarantee that \(C\) will also be nonempty though.
On the contrary, if \(\OneVec_p^T {\bf^p}^* + \OneVec_q^T {\bf^q}^* < \frac{1}{2}\), then \(C_r\) is indeed empty (as one of the requirements cannot be met).
Hence \(C_f\) is also empty, and consequently \(C_s\) and \(C\) are empty too.
Thus, a condition which leads to
\[ \OneVec_p^T {\bf^p}^* + \OneVec_q^T {\bf^q}^* < \frac{1}{2} \]is a sufficient condition for equivalence of (20.4) and (20.7).
Consider a feasible \(\bf\) for (20.13).
Let \(\| \bf^p \|_1 = \OneVec^T \bf^p = c\).
Since \(\OneVec^T (\bf^p + \bf^q) = 1\), we have \(\| \bf^q \|_1 = \OneVec^T \bf^q = 1 - c\).
We note that
\[ \OneMat \bf^p = \OneVec \cdot \OneVec^T \bf^p = \| \bf^p \|_1 \OneVec = c \OneVec. \]Similarly,
\[ \OneMat \bf^q = (1 - c) \OneVec. \]Thus, the first two constraints change into
\[\begin{split} \bf^p \preceq (1 - c) \mu \OneVec \\ \bf^q \preceq c \mu \OneVec. \end{split}\]Since the objective is to maximize \(\OneVec_p^T \bf^p + \OneVec_q^T \bf^q\), it is natural to maximize the nonzero entries in \(\bf^p\) and \(\bf^q\) corresponding to \(S_p\) and \(S_q\).
A straightforward option is to choose the first \(k_p\) entries in \(\bf^p\) to be \((1 - c) \mu\) and the first \(k_q\) entries in \(\bf^q\) to be \(c \mu\).
Other entries can be chosen arbitrarily to meet the requirement that \(\OneVec^T (\bf^p + \bf^q) = 1\).
With this choice, we have
\[ \OneVec_p^T \bf^p + \OneVec_q^T \bf^q = k_p (1 - c) \mu + k_q c \mu = \mu (k_p - c (k_p - k_q)). \]We recall that we have chosen \(k_p \geq k_q\).
Thus, the expression is maximized if \(c\) is chosen to be as small as possible.
The choice of \(c\) must meet the following conditions on \(\ell_1\)-norms (basically, the sum of the first \(k_p\) terms of \(\bf^p\) must not exceed the \(\ell_1\)-norm of \(\bf^p\); same for \(\bf^q\)).
\[\begin{split} \| \bf^p \|_1 = \OneVec^T \bf^p = c \geq k_p (1 - c) \mu \\ \| \bf^q \|_1 = \OneVec^T \bf^q = 1 - c \geq k_q c \mu. \end{split}\]Simplifying these inequalities we get
\[\begin{split} c \geq k_p (1 - c) \mu \implies c \geq \frac{k_p \mu}{ 1 + k_p \mu}\\ 1 - c \geq k_q c \mu \implies c \leq \frac{1}{ 1 + k_q \mu}. \end{split}\]Since both conditions must be satisfied, we require \(k_p, k_q\) to meet
\[ \frac{k_p \mu}{ 1 + k_p \mu} \leq \frac{1}{ 1 + k_q \mu} \implies k_p k_q \leq \frac{1}{\mu^2}. \]We will verify later that this condition is met if (20.11) holds.
Assuming the condition is met, the smallest possible value of \(c\) is \(\frac{k_p \mu}{ 1 + k_p \mu}\).
The maximum value of objective function then becomes
\[\begin{split} \OneVec_p^T \bf^p + \OneVec_q^T \bf^q &= \mu (k_p - c (k_p - k_q)) \\ &= \mu \left (k_p - \frac{k_p \mu}{ 1 + k_p \mu} (k_p - k_q)\right )\\ &= \frac{k_p \mu + k_p k_q \mu^2}{ 1 + k_p \mu}. \end{split}\]Finally, for BP to succeed, we require this expression to be strictly less than half.
This gives us
\[ \frac{k_p \mu + k_p k_q \mu^2}{ 1 + k_p \mu} < \frac{1}{2} \implies 2 k_p k_q \mu^2 + k_p \mu - 1 < 0, \]which is the sufficient condition for BP to succeed in the theorem.
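This final algebraic step can be double-checked numerically: the optimal value is below \(1/2\) exactly when \(2 k_p k_q \mu^2 + k_p \mu - 1 < 0\). A small sketch over an arbitrary grid of illustrative values:

```python
import itertools

# Verify: (kp*mu + kp*kq*mu^2)/(1 + kp*mu) < 1/2  <=>  2*kp*kq*mu^2 + kp*mu - 1 < 0
for mu in (0.05, 0.1, 0.2, 0.3):
    for kp, kq in itertools.product(range(1, 10), repeat=2):
        if kq > kp:                     # the theorem assumes k_p >= k_q
            continue
        value = (kp * mu + kp * kq * mu**2) / (1 + kp * mu)
        cond = 2 * kp * kq * mu**2 + kp * mu - 1 < 0
        assert (value < 0.5) == cond    # the two statements agree everywhere
print("equivalence verified on the grid")
```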
We now show that (20.11) \(\implies\) the weaker condition (20.12).
From (20.11) we can bound \(k_q\) as
\[ 2 k_p k_q \mu^2 + k_p \mu - 1 < 0 \implies 2 k_p k_q \mu^2 < 1 - k_p \mu \implies k_q < \frac{1 - k_p \mu}{2 k_p \mu^2}. \]Thus,
\[\begin{split} \| \ba \|_0 = k_p + k_q &< k_p + \frac{1 - k_p \mu}{2 k_p \mu^2} \\ &= \frac{2 \mu^2 k_p^2 + 1 - \mu k_p}{2 \mu^2 k_p}\\ &= \frac{1}{\mu} \cdot \frac{2 \mu^2 k_p^2 + 1 - \mu k_p}{2 \mu k_p}. \end{split}\]We define \(u = \mu k_p\) and rewrite the above as
\[ \| \ba \|_0 < \frac{1}{\mu} \cdot \frac{2 u^2 - u + 1}{2 u}. \]The weaker condition can now be obtained by minimizing the upper bound on the R.H.S. of this inequality.
We define
\[ f(u) = \frac{2 u^2 - u + 1}{2 u}. \]Differentiating and equating with 0, we get
\[ f'(u) = \frac{2 u^2 - 1}{ 2 u^2} = 0. \]The optimal value is obtained when \(u = \pm \sqrt{0.5}\).
Since both \(\mu\) and \(k_p\) are positive quantities, hence the negative value for \(u\) is rejected and we get \(u = \sqrt{0.5}\).
This gives us
\[ \| \ba \|_0 < \frac{1}{\mu} \cdot \frac{2 - \sqrt{0.5}}{2 \sqrt{0.5}} = \frac{\sqrt{2} - 0.5}{\mu}. \]Lastly, the property that the arithmetic mean is greater than or equal to the geometric mean gives us
\[ k_p k_q \leq \frac{(k_p + k_q)^2}{4} < \frac{(\sqrt{2} - 0.5)^2}{4\mu^2} < \frac{1}{\mu^2}. \]
20.2.3. General Case#
We now consider the case where \(\bDDD \in \CC^{N \times D}\) is an arbitrary (redundant) dictionary. We will require that \(\bDDD\) has full row rank; if it does not, we can restrict attention to the subspace spanned by its atoms.
We develop sufficient conditions under which solutions of (20.4) and (20.7) match for the general case [28, 30].
Let \(\bDDD\) be an arbitrary full rank redundant dictionary. Let \(\bx = \bDDD \ba\), where \(\bx\) is known. If a sparse representation \(\ba\) exists obeying
then \(\ba\) is the unique solution of both (20.4) and (20.7).
Proof. Due to Theorem 18.27, \(\ba\) is the unique solution of (20.4) since \(\ba\) satisfies (20.14). We need to show that it is also the unique solution of (20.7).
For any other vector \(\bb\) feasible for (20.7), we have \(\| \bb \|_0 > \| \ba \|_0\) since \(\ba\) is the unique sparsest solution of \(\bx = \bDDD \ba\).
We start with defining a set of alternative feasible vectors to (20.7):
\[\begin{split} C = \left \{ \bb \left | \begin{aligned} \bb \neq \ba \\ \| \bb \|_1 \leq \| \ba \|_1\\ \| \bb \|_0 > \| \ba \|_0\\ \text{ and } \bDDD (\bb - \ba) = \bzero \end{aligned} \right. \right \}. \end{split}\]This set contains all possible representations that
are different from \(\ba\)
have larger support
satisfy \(\bDDD \bb = \bx\)
have a better (or at least as good) \(\ell_1\)-norm.
We need to show that if (20.14) holds, then the set \(C\) will be empty.
Otherwise, BP would choose a solution different than \(\ba\).
The condition \(\ \bb \_0 > \ \ba \_0\) is redundant under the assumption (20.14).
Following the proof of Theorem 20.6, we define
\[ \be = \bb  \ba. \]We can then rewrite \(C\) as
\[ C_s = \{\be \ST \be \neq \bzero, \| \be + \ba \|_1 - \| \ba \|_1 \leq 0, \text{ and } \bDDD \be = \bzero \}. \]Again, we will enlarge the set \(C_s\) and show that even the larger set is empty when (20.14) holds.
We start with the requirement \(\| \be + \ba \|_1 - \| \ba \|_1 \leq 0\).
A simple permutation of columns of \(\bDDD\) can bring the nonzero entries in \(\ba\) to the beginning.
Thus, without loss of generality, we assume that first \(K\) entries in \(\ba\) are nonzero and the rest are zero.
We can now rewrite the requirement as
\[ \| \be + \ba \|_1 - \| \ba \|_1 = \sum_{j=1}^K \left ( | e_j + a_j | - | a_j | \right ) + \sum_{j > K} | e_j | \leq 0. \]Using the inequality \(| x + y | - | y | \geq - | x |\), we can relax the above condition as
\[ - \sum_{j = 1}^K | e_j | + \sum_{j > K} | e_j | \leq 0. \]Let \(\OneVec_K\) denote a vector with \(K\) ones at the beginning and zeros elsewhere.
Then,
\[ \sum_{j = 1}^K | e_j | = \OneVec_K^T | \be |. \]Further,
\[ \sum_{j > K} | e_j | = \| \be \|_1 - \sum_{j = 1}^K | e_j | = \OneVec^T | \be | - \OneVec_K^T | \be |. \]Thus, we can rewrite the above inequality as
\[ \OneVec^T | \be | - 2 \OneVec_K^T | \be | \leq 0. \]We can now define
\[ C_s^1 = \{\be \ST \be \neq \bzero, \OneVec^T | \be | - 2 \OneVec_K^T | \be | \leq 0, \text{ and } \bDDD \be = \bzero \}. \]Clearly \(C_s \subseteq C_s^1\).
We will now relax the requirement of \(\bDDD \be = \bzero\).
Multiplying by \(\bDDD^H\), we get
\[ \bDDD^H \bDDD \be = \bzero. \]If \(\be \in C_s^1\), it will also satisfy this equation.
Moreover, if \(\be\) satisfies this, then \(\be\) belongs to the null space of \(\bDDD^H \bDDD\).
Since \(\bDDD\) is full rank, \(\be\) has to be in the null space of \(\bDDD\) also.
Thus, the two conditions \(\bDDD \be = \bzero\) and \(\bDDD^H \bDDD \be = \bzero\) are equivalent.
We note that off-diagonal entries in \(\bDDD^H \bDDD\) are bounded in magnitude by \(\mu\), while the main diagonal consists of all ones.
So, we can write
\[\begin{split} & \bDDD^H \bDDD \be = \bzero \\ \iff & (\bDDD^H \bDDD - \bI + \bI ) \be = \bzero \\ \iff & \be = - (\bDDD^H \bDDD - \bI ) \be. \end{split}\]Suppose \(\bv = \bG \bu\). Then \(v_i = \sum_{j} G_{i j} u_j\).
Thus
\[ | v_i | = \left | \sum_{j} G_{i j} u_j \right | \leq \sum_{j} | G_{i j} u_j | = \sum_{j} | G_{i j} | \, | u_j |. \]This gives us \(| \bv | \preceq | \bG | \, | \bu |\) where \(\preceq\) indicates component-wise inequality.
Taking entry-wise absolute values on both sides, we get
\[ | \be | = | (\bDDD^H \bDDD - \bI ) \be | \preceq | \bDDD^H \bDDD - \bI | \, | \be | \preceq \mu (\OneMat - \bI) | \be |. \]The last part is due to the fact that all entries in the vector \(| \be |\) and the matrix \(| \bDDD^H \bDDD - \bI |\) are nonnegative and the entries in \(| \bDDD^H \bDDD - \bI |\) are dominated by \(\mu\).
Further,
\[\begin{split} & | \be | \preceq \mu (\OneMat - \bI) | \be | \\ \iff & (1 + \mu) | \be | \preceq \mu \OneMat | \be | = \mu \| \be \|_1 \OneVec\\ \iff & | \be | \preceq \frac{\mu \| \be \|_1}{1 + \mu} \OneVec. \end{split}\]In the above we used the fact that \(\OneMat | \be | = \OneVec \OneVec^T | \be | = \OneVec \| \be \|_1\).
We can now define a new set
\[\begin{split} C_s^2 = \left \{ \be \left | \begin{aligned} & \be \neq \bzero, \\ &\OneVec^T | \be | - 2 \OneVec_K^T | \be | \leq 0 \\ & \text{ and } | \be | \preceq \frac{\mu \| \be \|_1}{1 + \mu} \OneVec \end{aligned} \right. \right \}. \end{split}\]Clearly, \(C_s^1 \subseteq C_s^2\).
We note that \(C_s^2\) is unbounded since if \(\be \in C_s^2\), then \(c \be \in C_s^2 \Forall c \neq 0\).
Thus, in order to study its behavior, it is sufficient to consider the set of vectors with unit \(\ell_1\)-norm, \(\| \be \|_1 = 1\).
We construct the new set as
\[ C_r = \left \{\be \left | \; \| \be \|_1 = 1, 1 - 2 \OneVec_K^T | \be | \leq 0 \text { and } | \be | \preceq \frac{\mu }{1 + \mu} \OneVec \right. \right \}. \]Note that we replaced \(\OneVec^T | \be | = \| \be \|_1 = 1\) in formulating the description of \(C_r\), and the condition \(\be \neq \bzero\) is automatically enforced since \(\| \be \|_1 = 1\).
Clearly \(C_s^2 = \EmptySet \iff C_r = \EmptySet\).
In order to satisfy the requirement \(1 - 2 \OneVec_K^T | \be | \leq 0\), we need \(\OneVec_K^T | \be |\) to be as large as possible.
Since this quantity only considers first \(K\) entries in \(\be\), hence the energy in \(\be\) should be concentrated inside the first \(K\) entries to maximize this quantity.
However, entries in \(\be\) are restricted by the third requirement in \(C_r\).
We can maximize it by choosing
\[ | e_j | = \frac{\mu }{1 + \mu} \]for the first \(K\) entries in \(\be\).
We then get
\[ 1 - 2 \OneVec_K^T | \be | = 1 - 2 K \frac{\mu }{1 + \mu} \leq 0. \]This gives us
\[\begin{split} & 1 - 2 K \frac{\mu }{1 + \mu} \leq 0 \\ \iff & 1 + \mu \leq 2 K \mu \\ \iff & 2K \geq \frac{1 + \mu}{\mu}\\ \iff & K \geq \frac{1}{2} \left ( 1 + \frac{1}{\mu} \right). \end{split}\]This is a necessary condition for \(C_r\) to be nonempty.
Thus, if
\[ K < \frac{1}{2} \left ( 1 + \frac{1}{\mu} \right), \]then the requirement \(1 - 2 \OneVec_K^T | \be | \leq 0\) cannot be satisfied and \(C_r\) is empty.
Consequently, \(C\) is empty and the theorem is proved.
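The bound is easy to evaluate for a concrete dictionary. A sketch with an arbitrary random unit-norm dictionary (the sizes and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
N, M = 64, 128                          # illustrative dimensions
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                            # coherence of D
K_max = 0.5 * (1 + 1 / mu)              # BP recovery guaranteed for K < K_max
print(f"mu = {mu:.3f}, guaranteed for K < {K_max:.2f}")
```

For random dictionaries of this size the coherence is fairly large, so this worst-case guarantee covers only very small \(K\); dictionaries designed for low coherence yield stronger guarantees.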
We present another result which is based on the \(\mu_{1/2}(\bG)\) measure (Definition 18.33) of the Gram matrix of the dictionary. This result is due to [28].
Let \(\bx = \bDDD \ba\) and \(\| \ba \|_0 < \mu_{1/2}(\bG)\); then \(\ba\) is the unique solution of both (20.4) and (20.7).
Proof. Unique solution of (20.4).
Let \(\Lambda = \supp(\ba)\) and \(K = | \Lambda |\).
We have \(K = \| \ba \|_0 < \mu_{1/2}(\bG)\).

\[ \spark(\bDDD) \geq 2 \mu_{1/2}(\bG) + 1 \implies \spark(\bDDD) > 2 K + 1 \implies K < \frac{1}{2}\spark(\bDDD). \]
By Theorem 18.23, \(\ba\) is the unique sparsest solution.
Unique solution of (20.7)
We need to show that for any \(\ba'\) satisfying \(\bx = \bDDD \ba'\), we must have \(\ \ba' \_1 > \ \ba \_1\).
We will show that any vector in the null space of \(\bDDD\) exhibits less than \(50\)% concentration on \(\Lambda\); i.e., for every \(\bh \in \NullSpace(\bDDD)\)
\[ \sum_{k \in \Lambda} | h_k | < \frac{1}{2} \| \bh \|_1. \]Now
\[ \bDDD \bh = \bzero \implies \bG \bh = \bDDD^H \bDDD \bh = \bzero. \]Subtracting both sides with \(\bh\) we get
\[ \bG \bh - \bh = (\bG - \bI) \bh = - \bh. \]Let \(\bF\) denote a \(K \times D\) matrix formed from the rows of \(\bG - \bI\) corresponding to the indices in \(\Lambda\).
Then
\[ (\bG - \bI) \bh = - \bh \implies \| \bF \bh \|_1 = \sum_{k \in \Lambda} | h_k |. \]Here, \(h_k\) for \(k \in \Lambda\) is the negative of the inner product of the corresponding row of \(\bF\) with \(\bh\).
We know that
\[ \| \bF \bh \|_1 \leq \| \bF \|_1 \| \bh \|_1 \]where \(\| \bF \|_1\) is the max-column-sum norm of \(\bF\).
This gives us
\[ \| \bF \|_1 \| \bh \|_1 \geq \sum_{k \in \Lambda} | h_k |. \]Any column of \(\bF\) has \(K\) entries.
For a column indexed by \(\Lambda\), one of these entries is 0 (corresponding to a diagonal entry in \(\bG\)); leaving it out, at most \(K - 1\) entries remain.
By assumption, \(\mu_{1/2}(\bG) > K\).
Thus, any set of at most \(K\) off-diagonal entries in a column cannot have a magnitude sum reaching \(\frac{1}{2}\).
This gives an upper bound on the maxcolumnsum of \(\bF\); i.e.,
\[ \| \bF \|_1 < \frac{1}{2}. \]Thus, we get
\[ \sum_{k \in \Lambda} | h_k | \leq \| \bF \|_1 \| \bh \|_1 < \frac{1}{2} \| \bh \|_1 \]for every \(\bh \in \NullSpace(\bDDD)\).
The rest follows from the fact that for any other \(\ba'\) such that \(\bx = \bDDD \ba' = \bDDD \ba\), we know that
\[ \| \ba' \|_1 > \| \ba \|_1 \]whenever
\[ \sum_{k \in \Lambda} | h_k | < \frac{1}{2} \| \bh \|_1, \]where \(\bh = \ba - \ba'\) (thus \(\bDDD \bh = \bzero\)).
20.2.4. BPIC#
In this subsection, we present a stability guarantee result for BPIC.
Consider an instance of the (20.8) problem defined by the triplet \((\bDDD, \bx, \epsilon)\). Suppose that a vector \(\ba \in \CC^D\) is a feasible solution to (20.8) satisfying the sparsity constraint
The solution \(\widehat{\ba}\) of (20.8) must satisfy
Proof. As before, we define \(\bb = \widehat{\ba}  \ba\).
Then
\[ \| \bDDD \bb \|_2 = \| \bDDD (\widehat{\ba} - \ba) \|_2 = \| \bDDD \widehat{\ba} - \bx + \bx - \bDDD \ba \|_2 \leq 2 \epsilon. \]We now rewrite the inequality in terms of the Gram matrix \(\bG = \bDDD^H \bDDD\).
\[\begin{split} 4 \epsilon^2 &\geq \| \bDDD \bb \|_2^2 = \bb^H \bG \bb\\ &= \bb^H (\bG - \bI + \bI) \bb \\ &= \| \bb \|_2^2 + \bb^H (\bG - \bI ) \bb. \end{split}\]It is easy to show that:
\[ - | \bb |^T | \bA | \, | \bb | \leq \bb^H \bA \bb \leq | \bb |^T | \bA | \, | \bb | \]whenever \(\bA\) is Hermitian.
To see this, just notice that \(\bb^H \bA \bb\) is a real quantity.
Hence \(\bb^H \bA \bb = \pm | \bb^H \bA \bb |\).
Now, using the triangle inequality, we can easily show that \(| \bb^H \bA \bb | \leq | \bb |^T | \bA | \, | \bb |\).
Since \(\bG - \bI\) is Hermitian, we have
\[ \bb^H (\bG - \bI) \bb \geq - | \bb |^T | \bG - \bI | \, | \bb |. \]Now
\[ | \bb |^T | \bG - \bI | \, | \bb | = \sum_{i, j} | b_i | \, | \bd_i^H \bd_j - \delta_{i j} | \, | b_j | \leq \mu (\bDDD) \sum_{i \neq j} | b_i | \, | b_j | = \mu (\bDDD) \, | \bb |^T (\OneMat - \bI) | \bb |. \]Only the off-diagonal terms of \(\bG\) remain in the sum, and these are all dominated by \(\mu (\bDDD)\).
Thus we get
\[\begin{split} 4 \epsilon^2 &\geq \| \bb \|_2^2 - \mu (\bDDD) \, | \bb |^T (\OneMat - \bI) | \bb | \\ &= (1 + \mu (\bDDD)) \| \bb \|_2^2 - \mu (\bDDD) \, | \bb |^T \OneMat | \bb | \\ &= (1 + \mu (\bDDD)) \| \bb \|_2^2 - \mu (\bDDD) \| \bb \|_1^2. \end{split}\]This is valid since \(| \bb |^T \OneMat | \bb | = \| \bb \|_1^2\).
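The chain of bounds just derived rests on the inequality \(\| \bDDD \bb \|_2^2 \geq (1 + \mu(\bDDD)) \| \bb \|_2^2 - \mu(\bDDD) \| \bb \|_1^2\). A quick numerical sanity check on random real data (the dictionary and test vectors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 32, 64
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms

G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
mu = G.max()                            # coherence of D

for _ in range(100):
    b = rng.standard_normal(M)
    lhs = np.linalg.norm(D @ b) ** 2
    rhs = (1 + mu) * np.linalg.norm(b) ** 2 - mu * np.abs(b).sum() ** 2
    assert lhs >= rhs - 1e-9            # the bound holds (often with large slack)
print("bound verified on 100 random vectors")
```

For dense random vectors \(\| \bb \|_1^2\) is much larger than \(\| \bb \|_2^2\), so the right-hand side is often negative; the bound only becomes tight, and useful, for nearly sparse error vectors, which is exactly the regime the proof works in.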
Since \(\widehat{\ba}\) is the optimal solution of (20.8), we have
\[ \| \widehat{\ba} \|_1 = \| \bb + \ba \|_1 \leq \| \ba \|_1 \implies \| \bb + \ba \|_1 - \| \ba \|_1 \leq 0. \]Let \(\Lambda = \supp(\ba)\) and \(K = | \Lambda |\).
By a simple permutation of columns of \(\bDDD\), we can bring the entries in \(\ba\) to the first \(K\) entries making \(\Lambda = \{1, \dots, K\}\).
We will make this assumption going forward without loss of generality.
Let \(\OneVec_K\) be the corresponding support vector (ones in the first \(K\) places and zeros in the rest).
From our previous analysis, we recall that
\[ \| \bb + \ba \|_1 - \| \ba \|_1 \geq \| \bb \|_1 - 2 \OneVec_K^T | \bb |. \]Thus
\[ \| \bb \|_1 - 2 \OneVec_K^T | \bb | \leq 0 \implies \| \bb \|_1 \leq 2 \OneVec_K^T | \bb |. \]\(\OneVec_K^T | \bb |\) is the sum of the absolute values of the first \(K\) entries of \(\bb\).
Considering \(\bb_{\Lambda}\) as a vector in \(\CC^K\) and using the \(\ell_1\)-\(\ell_2\) norm relation \(\| \bv \|_1 \leq \sqrt{K} \| \bv \|_2 \Forall \bv \in \CC^K\), we get
\[ \OneVec_K^T | \bb | = \| \bb_{\Lambda} \|_1 \leq \sqrt{K} \| \bb_{\Lambda} \|_2 \leq \sqrt{K} \| \bb \|_2. \]Thus,
\[ \| \bb \|_1 \leq 2 \OneVec_K^T | \bb | \leq 2 \sqrt{K} \| \bb \|_2. \]Putting this back in the previous inequality
\[\begin{split} 4 \epsilon^2 &\geq (1 + \mu (\bDDD)) \| \bb \|_2^2 - \mu (\bDDD) \| \bb \|_1^2\\ &\geq (1 + \mu (\bDDD)) \| \bb \|_2^2 - 4 K \mu (\bDDD) \| \bb \|_2^2 \\ &= (1 - (4 K - 1) \mu (\bDDD)) \| \bb \|_2^2. \end{split}\]We note that this inequality is valid only if
\[ 1 - (4 K - 1) \mu (\bDDD) > 0. \]This condition can be reformulated as
\[ \| \ba \|_0 = K < \frac{1}{4} \left (1 + \frac{1}{\mu (\bDDD)} \right). \]Rewriting the bound on \(\| \bb \|_2^2\) we get
\[ \| \bb \|_2^2 \leq \frac{4 \epsilon^2}{1 - (4 K - 1) \mu (\bDDD)}, \]which is the desired result.
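These bounds are easy to evaluate numerically. The following sketch (the Dirac-Fourier dictionary and all variable names are our own illustration, not part of the original argument) computes the coherence \(\mu(\bDDD)\), the sparsity limit \(K < \frac{1}{4}(1 + 1/\mu)\), and the stability constant \(1 - (4K - 1)\mu(\bDDD)\) in the error bound.

```python
import numpy as np

# Toy Dirac-Fourier dictionary D = [I  F] in dimension 64 (our choice).
n = 64
F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT basis, unit-norm atoms
D = np.hstack([np.eye(n), F])

# Coherence mu(D): largest absolute inner product between distinct atoms,
# i.e. the largest off-diagonal entry of |G| with G = D^H D.
G = D.conj().T @ D
mu = np.max(np.abs(G - np.eye(2 * n)))

# Largest sparsity level K admitted by K < (1 + 1/mu)/4.
K_max = int(np.floor((1 + 1 / mu) / 4))

# Stability constant in ||b||_2^2 <= 4 eps^2 / (1 - (4K - 1) mu).
c = 1 - (4 * K_max - 1) * mu
print(round(mu, 6), K_max, round(c, 6))   # 0.125 2 0.125
```

For this dictionary \(\mu = 1/\sqrt{64} = 0.125\), so the guarantee applies up to \(K = 2\) and the error bound reads \(\| \bb \|_2^2 \leq 32 \epsilon^2\).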
20.2.5. BPDN#
In this subsection we will examine the \(\ell_1\) penalty problem (20.10) more closely.
We will focus on the following issues:
Some results from convex analysis useful for our study
Conditions for the minimization of (20.15) over coefficients \(\ba\) supported on a subdictionary \(\bDDD_{\Lambda}\)
Conditions under which the unique minimizer for a subdictionary is also the global minimizer for (20.15)
Application of (20.15) for sparse signal recovery
Application of (20.15) for identification of sparse signals in presence of noise
Application of (20.15) for identification of sparse signals in presence of Gaussian noise
We recall some definitions and results from convex analysis which will help us understand the minimizers for (20.15) problem.
Convex analysis for real valued functions over the vector space \((\CC^n, \RR)\) is developed using the bilinear inner product defined as
The subscript \(B\) is there to distinguish it from the standard inner product for the complex coordinate space \(\langle \bx, \by \rangle = \by^H \bx\). The two inner products are related as
We consider real valued functions over the inner product space \(\XX = (\CC^D, \langle \cdot, \cdot \rangle_B)\). Note that the dimension of \(\XX\) is \(2 D\).
A real valued convex function \(f : \XX \to \RR\) satisfies the standard convexity inequality
The objective function for the problem (20.15) is
Clearly, \(L\) is a real valued function over \(\XX\) and it is easy to see that it is a convex function. Moreover, \(L(\ba) \geq 0\) always.
We suggest the readers to review the material in Subgradients. For any function \(f : \XX \to \RR\), its subdifferential set is defined as
The elements of the subdifferential set are called subgradients. If \(f\) possesses a gradient at \(\bx\), then it is the unique subgradient at \(\bx\); i.e.,
where \(\nabla f(\bx)\) is the gradient of \(f\) at \(\bx\).
For convex functions, the subdifferential of a sum is the (Minkowski) sum of the subdifferentials (Theorem 9.222); i.e.,
By Fermat’s optimality condition (Theorem 9.233), if \(f\) is a closed, proper convex function, then \(\bx\) is a global minimizer of \(f\) if and only if \(\bzero \in \partial f(\bx)\).
We would be specifically interested in the subdifferential for the function \(\| \ba \|_1\).
Let \(\bz \in \XX\). The vector \(\bg \in \XX\) lies in the subdifferential \(\partial \| \bz \|_1\) if and only if
\(| g_k | \leq 1 \text{ whenever } z_k = 0\).
\(g_k = \sgn (z_k) \text{ whenever } z_k \neq 0\).
The proof is skipped.
We recall that the signum function for complex numbers is defined as
The subdifferential for the \(\ell_1\) norm function for \(\RR^n\) is developed in Example 9.67. We note that the \(\ell_1\) norm is differentiable only at vectors with no zero entries. Also, we can see that:
\(\| \bg \|_{\infty} = 1\) when \(\bz \neq \bzero\).
\(\| \bg \|_{\infty} \leq 1\) when \(\bz = \bzero\).
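The two membership conditions above are easy to test numerically. Below is a small sketch (the helper name and the example vectors are ours) that checks whether \(\bg \in \partial \| \bz \|_1\) for complex vectors, using \(\sgn(z) = z / |z|\) for \(z \neq 0\).

```python
import numpy as np

# Check g in the subdifferential of ||.||_1 at z (complex case):
#   |g_k| <= 1        wherever z_k = 0,
#   g_k = z_k / |z_k| wherever z_k != 0.
def in_l1_subdifferential(z, g, tol=1e-12):
    z, g = np.asarray(z), np.asarray(g)
    zero = np.abs(z) < tol
    ok_at_zero = np.all(np.abs(g[zero]) <= 1 + tol)
    ok_elsewhere = np.all(np.abs(g[~zero] - z[~zero] / np.abs(z[~zero])) < tol)
    return bool(ok_at_zero and ok_elsewhere)

z = np.array([3 - 4j, 0, 2], dtype=complex)
g_good = np.array([(3 - 4j) / 5, 0.5, 1], dtype=complex)  # sgn entries + unit disc
g_bad = np.array([(3 - 4j) / 5, 1.5, 1], dtype=complex)   # |g_1| > 1 at a zero of z
print(in_l1_subdifferential(z, g_good))  # True
print(in_l1_subdifferential(z, g_bad))   # False
```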
20.2.5.1. Restricted Minimizers#
Suppose \(\Lambda\) indexes a subdictionary \(\bDDD_{\Lambda}\).
\(\bDDD_{\Lambda}\) is a linearly independent collection of atoms.
Hence a unique \(\ell_2\) best approximation \(\widehat{\bx}_{\Lambda}\) of \(\bx\) using the atoms in \(\bDDD_{\Lambda}\) can be obtained using least squares techniques.
We define the orthogonal projection operator
\[ \bP_{\Lambda} = \bDDD_{\Lambda}\bDDD_{\Lambda}^{\dag}. \]And we get
\[ \widehat{\bx}_{\Lambda} = \bP_{\Lambda} \bx. \]The approximation is orthogonal to the residual; i.e., \((\bx - \widehat{\bx}_{\Lambda}) \perp \widehat{\bx}_{\Lambda}\).
There is a unique coefficient vector \(\bc_{\Lambda}\) supported on \(\Lambda\) that synthesizes the approximation \(\widehat{\bx}_{\Lambda}\).
\[ \bc_{\Lambda} = \bDDD_{\Lambda}^{\dag} \bx = \bDDD_{\Lambda}^{\dag} \widehat{\bx}_{\Lambda}. \]We also have
\[ \widehat{\bx}_{\Lambda} = \bP_{\Lambda} \bx = \bDDD_{\Lambda} \bc_{\Lambda}. \]
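These objects can be computed directly with the pseudo-inverse. The sketch below (the random dictionary and index set are our own toy setup) forms \(\bP_{\Lambda}\), \(\widehat{\bx}_{\Lambda}\) and \(\bc_{\Lambda}\), and confirms that the residual is orthogonal to the approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((10, 20))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms

Lam = [2, 5, 11]                          # subdictionary indices (our choice)
DL = D[:, Lam]                            # almost surely full column rank
x = rng.standard_normal(10)

pinv = np.linalg.pinv(DL)                 # D_Lambda^dagger
P = DL @ pinv                             # orthogonal projector onto range(DL)
c = pinv @ x                              # c_Lambda: unique synthesis coefficients
x_hat = P @ x                             # best l2 approximation of x

print(np.allclose(DL @ c, x_hat))         # x_hat = D_Lambda c_Lambda -> True
print(abs((x - x_hat) @ x_hat) < 1e-10)   # residual _|_ approximation -> True
```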
(Minimization of \(L\) over vectors supported on \(\Lambda\))
Let \(\Lambda\) index a linearly independent collection of atoms in \(\bDDD\) and let \(\ba^*\) minimize the objective function \(L(\ba)\) in (20.16) over all coefficient vectors supported on \(\Lambda\) (i.e. \(\supp(\ba) \subseteq \Lambda\)). A necessary and sufficient condition on such a minimizer is that
where \(\bc_{\Lambda} = \bDDD_{\Lambda}^{\dag} \bx\) and the vector \(\bg\) is drawn from \(\partial \| \ba^* \|_1\). Moreover, the minimizer \(\ba^*\) is unique.
Proof. We restrict \(\ba\) to be supported on \(\Lambda\); i.e., \(\ba \in \CC^{\Lambda}\).
The objective function simplifies to
We define \(\widehat{\bx}_{\Lambda} = \bP_{\Lambda} \bx\).
Now, both \(\widehat{\bx}_{\Lambda}\) and \(\bDDD_{\Lambda} \ba_{\Lambda}\) belong to the column space of \(\bDDD_{\Lambda}\) while \(\bx - \widehat{\bx}_{\Lambda}\) is orthogonal to it.
Hence
\[ \bx - \widehat{\bx}_{\Lambda} \perp \widehat{\bx}_{\Lambda} - \bDDD \ba. \]Thus, using the Pythagorean theorem, we get
\[ \| \bx - \bDDD \ba \|_2^2 = \| \bx - \widehat{\bx}_{\Lambda} + \widehat{\bx}_{\Lambda} - \bDDD \ba \|_2^2 = \|\bx - \widehat{\bx}_{\Lambda} \|_2^2 + \| \widehat{\bx}_{\Lambda} - \bDDD \ba\|_2^2. \]We can rewrite \(L(\ba)\) as
\[ L(\ba) = \frac{1}{2} \|\bx - \widehat{\bx}_{\Lambda} \|_2^2 + \frac{1}{2} \| \widehat{\bx}_{\Lambda} - \bDDD \ba\|_2^2 + \gamma \| \ba \|_1. \]Define
\[ F(\ba) = \frac{1}{2} \| \widehat{\bx}_{\Lambda} - \bDDD \ba\|_2^2 + \gamma \| \ba \|_1. \]Then
\[ L(\ba) = \frac{1}{2} \|\bx - \widehat{\bx}_{\Lambda} \|_2^2 + F(\ba). \]Note that the term \(\|\bx - \widehat{\bx}_{\Lambda} \|_2^2\) is constant. It is the squared norm of the least squares error.
Thus, minimizing \(L(\ba)\) over the coefficient vectors supported on \(\Lambda\) is equivalent to minimizing \(F(\ba)\) over the same support set.
Note that
\[ \bDDD \ba = \bDDD_{\Lambda} \ba_{\Lambda} \text{ and } \| \ba \|_1 = \| \ba_{\Lambda} \|_1. \]We can write \(F(\ba)\) as
\[ F(\ba) = \frac{1}{2} \| \widehat{\bx}_{\Lambda} - \bDDD_{\Lambda} \ba_{\Lambda}\|_2^2 + \gamma \| \ba_{\Lambda} \|_1. \]Note that \(F(\ba)\) depends only on entries in \(\ba\) which are part of the support \(\Lambda\).
We can replace the variable \(\ba_{\Lambda}\) with \(\ba \in \CC^{\Lambda}\) and rewrite \(F(\ba)\) as
\[ F(\ba) = \frac{1}{2} \| \widehat{\bx}_{\Lambda} - \bDDD_{\Lambda} \ba\|_2^2 + \gamma \| \ba \|_1 \Forall \ba \in \CC^{\Lambda}. \]Since the atoms indexed by \(\Lambda\) are linearly independent, \(\bDDD_{\Lambda}\) has full column rank.
Thus, the quadratic term \(\| \widehat{\bx}_{\Lambda} - \bDDD_{\Lambda} \ba\|_2^2\) is strictly convex.
Since \(\ \ba \_1\) is also convex, \(F(\ba)\) therefore is strictly convex and its minimizer is unique.
Since \(F\) is strictly convex and the minimization is unconstrained, \(\bzero \in \partial F(\ba^*)\) is a necessary and sufficient condition for the coefficient vector \(\ba^*\) to minimize \(F(\ba)\).
The gradient of the first (quadratic) term is
\[ \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right) \ba - \bDDD_{\Lambda}^H \widehat{\bx}_{\Lambda}. \]For the second term we have to consider its subdifferential \(\partial \| \ba \|_1\).
Thus, at \(\ba^*\) it follows that
\[ \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right) \ba^* - \bDDD_{\Lambda}^H \widehat{\bx}_{\Lambda} + \gamma \bg = \bzero, \]where \(\bg\) is some subgradient in \(\partial \| \ba^* \|_1\).
Premultiplying with \(\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1}\) we get
\[\begin{split} &\ba^* - \bDDD_{\Lambda}^{\dag} \widehat{\bx}_{\Lambda} + \gamma \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \bg = \bzero\\ &\implies \bDDD_{\Lambda}^{\dag} \widehat{\bx}_{\Lambda} - \ba^* = \gamma \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \bg. \end{split}\]Finally, we recall that \(\bDDD_{\Lambda}^{\dag} \widehat{\bx}_{\Lambda} = \bc_{\Lambda}\).
Thus, we get the desired result
\[ \bc_{\Lambda} - \ba^* = \gamma \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \bg. \]
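The characterization above can be verified numerically. The sketch below solves the restricted problem for a small real-valued instance using ISTA iterations (the solver, data and parameters are our own illustration, not prescribed by the text) and then recovers the implied subgradient \(\bg = \gamma^{-1} (\bDDD_{\Lambda}^H \bDDD_{\Lambda})(\bc_{\Lambda} - \ba^*)\).

```python
import numpy as np

rng = np.random.default_rng(2)
DL = rng.standard_normal((10, 3))         # a small subdictionary (our toy data)
DL /= np.linalg.norm(DL, axis=0)
x = DL @ np.array([2.0, -1.5, 1.0]) + 0.01 * rng.standard_normal(10)
gamma = 0.05

# ISTA: a <- soft(a - t D^T (D a - x), t * gamma), with step t = 1/L.
t = 1.0 / np.linalg.norm(DL.T @ DL, 2)
a = np.zeros(3)
for _ in range(5000):
    v = a - t * (DL.T @ (DL @ a - x))
    a = np.sign(v) * np.maximum(np.abs(v) - t * gamma, 0.0)

c = np.linalg.pinv(DL) @ x                # c_Lambda
g = (DL.T @ DL) @ (c - a) / gamma         # implied subgradient
print(np.round(g, 3))                     # entries match sgn(a_k) here
```

Since every entry of \(\ba^*\) is nonzero in this instance, the recovered \(\bg\) equals \(\sgn(\ba^*)\) entrywise, confirming that it is a valid subgradient of \(\| \cdot \|_1\) at \(\ba^*\).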
Some bounds follow as a result of this theorem. Readers may wish to review the material in Matrix Norms.
Suppose that \(\Lambda\) indexes a subdictionary \(\bDDD_{\Lambda}\) and let \(\ba^*\) minimize the function \(L\) over all coefficient vectors supported on \(\Lambda\). Then the following bounds are in force:
Proof. We start with
We take the \(\ell_{\infty}\) norm on both sides and apply some norm bounds:
\[\begin{split} \| \bc_{\Lambda} - \ba^* \|_{\infty} &= \gamma \| \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \bg \|_{\infty}\\ &\leq \gamma \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \|_{\infty} \| \bg \|_{\infty}\\ &\leq \gamma \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \|_{\infty}. \end{split}\]The last inequality is valid since from Theorem 20.10 we have \(\| \bg \|_{\infty} \leq 1\).
Now let us premultiply (20.18) with \(\bDDD_{\Lambda}\) and apply the \(\ell_2\) norm:
\[\begin{split} \|\bDDD_{\Lambda} ( \bc_{\Lambda} - \ba^*) \|_2 &= \|\gamma \bDDD_{\Lambda} \left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \bg \|_2\\ &= \gamma \| (\bDDD_{\Lambda}^{\dag})^H \bg \|_2 \\ &\leq \gamma \| (\bDDD_{\Lambda}^{\dag})^H \|_{\infty \to 2} \| \bg \|_{\infty} \\ &= \gamma \| \bDDD_{\Lambda}^{\dag} \|_{2 \to 1} \| \bg \|_{\infty} \\ &\leq \gamma \| \bDDD_{\Lambda}^{\dag} \|_{2 \to 1}. \end{split}\]In this derivation we used facts like \(\| \bA \|_{p \to q} = \| \bA^H \|_{q' \to p'}\), \(\| \bA \bx \|_q \leq \| \bA \|_{p \to q} \| \bx \|_p\) and \(\| \bg \|_{\infty} \leq 1\).
20.2.5.2. The Correlation Condition#
So far we have established a condition which ensures that \(\ba^*\) is a unique minimizer of \(L\) given that \(\ba\) is supported on \(\Lambda\). We now establish a sufficient condition under which \(\ba^*\) is also a global minimizer for \(L(\ba)\).
(Correlation condition for global minimizer)
Assume that \(\Lambda\) indexes a subdictionary. Let \(\ba^*\) minimize the function \(L\) over all coefficient vectors supported on \(\Lambda\). Suppose that
where \(\bg \in \partial \| \ba^* \|_1\) is determined by (20.17). Then \(\ba^*\) is the unique global minimizer of \(L\). Moreover, the condition
guarantees that \(\ba^*\) is the unique global minimizer of \(L\).
Proof. Let \(\ba^*\) be the unique minimizer of \(L\) over coefficient vectors supported on \(\Lambda\). Then, the value of the objective function \(L(\ba)\) increases if we change any coordinate of \(\ba^*\) indexed in \(\Lambda\).
What we need is a condition which ensures that the value of the objective function also increases if we change any other component of \(\ba^*\) (not indexed by \(\Lambda\)).
If this happens, then \(\ba^*\) will become a local minimizer of \(L\).
Further, since \(L\) is convex, \(\ba^*\) will also be global minimizer.
Towards this, let \(\omega\) be some index not in \(\Lambda\) and let \(\be_{\omega} \in \CC^D\) be the corresponding unit vector. Let \(\delta \be_{\omega}\) be a small perturbation introduced in the \(\omega\)th coordinate (\(\delta \in \CC\) is a small scalar, though it need not be a positive real number). We need to find a condition which ensures
Let us expand the L.H.S. of this inequality:
\[\begin{split} &L (\ba^* + \delta \be_{\omega}) - L (\ba^*) =\\ &\quad \left [ \frac{1}{2} \| \bx - \bDDD \ba^* - \delta \bd_{\omega}\|_2^2 - \frac{1}{2} \| \bx - \bDDD \ba^* \|_2^2 \right ] \\ &\quad + \gamma\left [\| \ba^* + \delta \be_{\omega}\|_1 - \| \ba^* \|_1\right ]. \end{split}\]Here we used the fact that \(\bDDD \be_{\omega} = \bd_{\omega}\).
Note that since \(\ba^*\) is supported on \(\Lambda\) and \(\omega \notin \Lambda\), we have
\[ \| \ba^* + \delta \be_{\omega}\|_1 = \| \ba^*\|_1 + \|\delta \be_{\omega}\|_1. \]Thus
\[ \| \ba^* + \delta \be_{\omega}\|_1 - \| \ba^* \|_1 = | \delta |. \]We should also simplify the first bracket.
\[\begin{split} \| \bx - \bDDD \ba^* \|_2^2 &= (\bx - \bDDD \ba^*)^H (\bx - \bDDD \ba^*) \\ &= \bx^H \bx + {\ba^*}^H \bDDD^H \bDDD \ba^* - \bx^H \bDDD \ba^* - {\ba^*}^H \bDDD^H \bx. \end{split}\]Similarly
\[\begin{split} \| \bx - \bDDD \ba^* - \delta \bd_{\omega}\|_2^2 &= (\bx - \bDDD \ba^* - \delta \bd_{\omega})^H (\bx - \bDDD \ba^* - \delta \bd_{\omega})\\ &= \bx^H \bx + {\ba^*}^H \bDDD^H \bDDD \ba^* - \bx^H \bDDD \ba^* - {\ba^*}^H \bDDD^H \bx\\ &\quad - (\bx - \bDDD \ba^*)^H \delta \bd_{\omega} - (\delta \bd_{\omega})^H (\bx - \bDDD \ba^*) + \| \delta \bd_{\omega}\|_2^2. \end{split}\]Canceling the like terms we get
\[ \| \delta \bd_{\omega}\|_2^2 - 2 \Re (\langle \bx - \bDDD \ba^*, \delta \bd_{\omega} \rangle). \]Thus,
\[\begin{split} &L (\ba^* + \delta \be_{\omega}) - L (\ba^*) =\\ &\quad \frac{1}{2} \| \delta \bd_{\omega}\|_2^2 - \Re (\langle \bx - \bDDD \ba^*, \delta \bd_{\omega} \rangle) + \gamma | \delta |. \end{split}\]Recall that since \(\ba^*\) is supported on \(\Lambda\), we have \(\bDDD \ba^* = \bDDD_{\Lambda} \ba^*\).
We can further split the middle term by adding and subtracting \(\bDDD_{\Lambda} \bc_{\Lambda}\).
\[\begin{split} \Re (\langle \bx - \bDDD_{\Lambda} \ba^*, \delta \bd_{\omega} \rangle) &= \Re (\langle \bx - \bDDD_{\Lambda} \bc_{\Lambda} + \bDDD_{\Lambda} \bc_{\Lambda} - \bDDD_{\Lambda} \ba^*, \delta \bd_{\omega} \rangle)\\ &= \Re (\langle \bx - \bDDD_{\Lambda} \bc_{\Lambda}, \delta \bd_{\omega} \rangle) + \Re (\langle \bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*), \delta \bd_{\omega} \rangle). \end{split}\]Thus, we can write
\[ L (\ba^* + \delta \be_{\omega}) - L (\ba^*) = \frac{1}{2} \| \delta \bd_{\omega}\|_2^2 - \Re (\langle \bx - \bDDD_{\Lambda} \bc_{\Lambda}, \delta \bd_{\omega} \rangle) - \Re (\langle \bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*), \delta \bd_{\omega} \rangle) + \gamma | \delta |. \]The term \(\frac{1}{2} \| \delta \bd_{\omega}\|_2^2\) is strictly positive, giving us
\[ L (\ba^* + \delta \be_{\omega}) - L (\ba^*) > - \Re (\langle \bx - \bDDD_{\Lambda} \bc_{\Lambda}, \delta \bd_{\omega} \rangle) - \Re (\langle \bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*), \delta \bd_{\omega} \rangle) + \gamma | \delta |. \]Using the lower triangle inequality we can write
\[ L (\ba^* + \delta \be_{\omega}) - L (\ba^*) > \gamma | \delta | - | \langle \bx - \bDDD_{\Lambda} \bc_{\Lambda}, \delta \bd_{\omega} \rangle | - | \langle \bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*), \delta \bd_{\omega} \rangle |. \]Using linearity of the inner product, we can take out \(| \delta |\):
(20.21)#\[ L (\ba^* + \delta \be_{\omega}) - L (\ba^*) > | \delta | \left [ \gamma - | \langle \bx - \bDDD_{\Lambda} \bc_{\Lambda}, \bd_{\omega} \rangle | - | \langle \bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*), \bd_{\omega} \rangle | \right ].\]Let us simplify this expression. Since \(\ba^*\) is the unique minimizer over coefficients in \(\CC^{\Lambda}\), using Theorem 20.11 we have
\[\begin{split} &\bc_{\Lambda} - \ba^* = \gamma (\bDDD_{\Lambda}^H \bDDD_{\Lambda})^{-1} \bg\\ \iff &\bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*) = \gamma \bDDD_{\Lambda}(\bDDD_{\Lambda}^H \bDDD_{\Lambda})^{-1} \bg = \gamma (\bDDD_{\Lambda}^{\dag})^H \bg, \end{split}\]where \(\bg \in \partial \| \ba^*\|_1\).
Thus
\[ \langle \bDDD_{\Lambda} (\bc_{\Lambda} - \ba^*), \bd_{\omega} \rangle = \gamma \langle (\bDDD_{\Lambda}^{\dag})^H \bg, \bd_{\omega} \rangle = \gamma \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle \]using the fact that \(\langle \bA \bx, \by \rangle = \langle \bx, \bA^H \by \rangle\).
Also, we recall that \(\widehat{\bx}_{\Lambda} = \bDDD_{\Lambda} \bc_{\Lambda}\).
Putting these back in (20.21) we obtain:
(20.22)#\[ L (\ba^* + \delta \be_{\omega}) - L (\ba^*) > | \delta | \left [\gamma - \gamma | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle | - | \langle \bx - \widehat{\bx}_{\Lambda}, \bd_{\omega} \rangle |\right ].\]In (20.22), the L.H.S. is nonnegative (our real goal) whenever the term in the bracket on the R.H.S. is nonnegative (since \(| \delta |\) is positive).
Therefore we want that
\[ \gamma - \gamma | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle | - | \langle \bx - \widehat{\bx}_{\Lambda}, \bd_{\omega} \rangle | \geq 0. \]This can be rewritten as
\[ | \langle \bx - \widehat{\bx}_{\Lambda}, \bd_{\omega} \rangle | \leq \gamma \left [ 1 - | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle | \right ]. \]Since this condition should hold for every \(\omega \notin \Lambda\), we maximize the L.H.S. and minimize the R.H.S. over \(\omega \notin \Lambda\).
We get
\[ \underset{\omega \notin \Lambda}{\max} | \langle \bx - \widehat{\bx}_{\Lambda}, \bd_{\omega} \rangle | \leq \underset{\omega \notin \Lambda}{\min} \gamma \left [ 1 - | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle |\right ] = \gamma \left [1 - \underset{\omega \notin \Lambda}{\max} | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle |\right ]. \]Recall that \(\bx - \widehat{\bx}_{\Lambda}\) is orthogonal to the space spanned by the atoms in \(\bDDD_{\Lambda}\).
Hence
\[ \underset{\omega \notin \Lambda}{\max} | \langle \bx - \widehat{\bx}_{\Lambda}, \bd_{\omega} \rangle | = \underset{\omega}{\max} | \langle \bx - \widehat{\bx}_{\Lambda}, \bd_{\omega} \rangle | = \| \bDDD^H ( \bx - \widehat{\bx}_{\Lambda}) \|_{\infty}. \]This gives us the desired sufficient condition
\[ \| \bDDD^H ( \bx - \widehat{\bx}_{\Lambda}) \|_{\infty} \leq \gamma \left [1 - \underset{\omega \notin \Lambda}{\max} | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle |\right ]. \]This condition still uses \(\bg\). We know that \(\| \bg \|_{\infty} \leq 1\).
Let us simplify as follows:
\[\begin{split} | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle | &= | (\bDDD_{\Lambda}^{\dag} \bd_{\omega})^H \bg | \\ &= \| (\bDDD_{\Lambda}^{\dag} \bd_{\omega})^H \bg \|_{\infty}\\ &\leq \| (\bDDD_{\Lambda}^{\dag} \bd_{\omega})^H \|_{\infty} \| \bg \|_{\infty}\\ &= \| \bDDD_{\Lambda}^{\dag} \bd_{\omega}\|_1 \| \bg \|_{\infty}\\ &\leq \| \bDDD_{\Lambda}^{\dag} \bd_{\omega}\|_1. \end{split}\]Another way to understand this is as follows. For any vector \(\bv \in \CC^D\)
\[\begin{split} | \langle \bv, \bg \rangle | &= \left | \sum_{i=1}^D \overline{g_i} v_i \right | \\ &\leq \sum_{i=1}^D |g_i| |v_i| \\ &\leq \left [\sum_{i=1}^D |v_i| \right ] \| \bg \|_{\infty}\\ &\leq \| \bv \|_1. \end{split}\]Thus
\[ | \langle \bDDD_{\Lambda}^{\dag} \bd_{\omega}, \bg \rangle | \leq \|\bDDD_{\Lambda}^{\dag} \bd_{\omega}\|_1. \]Thus, it is also sufficient that
\[ \| \bDDD^H ( \bx - \widehat{\bx}_{\Lambda}) \|_{\infty} \leq \gamma \left [1 - \underset{\omega \notin \Lambda}{\max} \| \bDDD_{\Lambda}^{\dag} \bd_{\omega}\|_1 \right ]. \]
20.2.5.3. Exact Recovery Coefficient#
We recall that Exact Recovery Coefficient for a subdictionary is defined as
Thus, the sufficient condition can be rewritten as
Note that the L.H.S. in both sufficient conditions is always nonnegative.
Hence, if the R.H.S. is negative (i.e. \(\ERC(\Lambda) < 0\)), the sufficient condition is useless.
On the other hand if \(\ERC(\Lambda) > 0\), then a sufficiently high \(\gamma\) can always be chosen to satisfy the condition in (20.20).
At the same time as \(\gamma \to \infty\), the optimum minimizer is \(\ba^* = \bzero\).
How do we interpret the L.H.S. \(\| \bDDD^H ( \bx - \widehat{\bx}_{\Lambda}) \|_{\infty}\)?
Given a nonzero signal \(\bv\) and a dictionary \(\bDDD\), define the function
If \(\bv = \bzero\), then define \(\Maxcor(\bv) = 0\).
This is known as the maximum correlation [79] of a signal with a dictionary.
Essentially, for any signal we normalize it and then find out its maximum inner product (absolute value) with atoms in the dictionary \(\bDDD\). Clearly \(0 \leq \Maxcor(\bv) \leq 1\).
We can see that
We can now interpret
Therefore, the sufficient condition in Theorem 20.13 is strongest when the magnitude of the residual \((\bx - \widehat{\bx}_{\Lambda})\) and its maximum correlation with the dictionary are both small.
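Both quantities are straightforward to compute for a concrete dictionary. In the sketch below (the function names and the toy Dirac-Fourier example are ours), \(\ERC(\Lambda) = 1 - \max_{\omega \notin \Lambda} \| \bDDD_{\Lambda}^{\dag} \bd_{\omega}\|_1\) and \(\Maxcor(\bv) = \max_{\omega} |\langle \bv, \bd_{\omega} \rangle| / \| \bv \|_2\).

```python
import numpy as np

def erc(D, Lam):
    """Exact Recovery Coefficient of the subdictionary indexed by Lam."""
    pinv = np.linalg.pinv(D[:, Lam])
    others = [w for w in range(D.shape[1]) if w not in Lam]
    return 1.0 - max(np.linalg.norm(pinv @ D[:, w], 1) for w in others)

def maxcor(D, v):
    """Maximum correlation of the signal v with the atoms of D."""
    nv = np.linalg.norm(v)
    return 0.0 if nv == 0 else float(np.max(np.abs(D.conj().T @ v)) / nv)

n = 8
F = np.fft.fft(np.eye(n)) / np.sqrt(n)    # unitary DFT basis
D = np.hstack([np.eye(n), F])             # Dirac-Fourier dictionary

print(round(erc(D, [0]), 4))              # 1 - 1/sqrt(8) -> 0.6464
print(round(maxcor(D, np.ones(n)), 4))    # ones align with the DC atom -> 1.0
```

For the single-spike support \(\Lambda = \{0\}\), every Fourier atom has correlation \(1/\sqrt{8}\) with the spike, which gives the printed ERC value.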
Since the maximum correlation of the residual never exceeds one, we obtain the following (much weaker) result.
Let \(\Lambda\) index a subdictionary and let \(\bx\) be an input signal. Suppose that the residual vector \(\bx - \widehat{\bx}_{\Lambda}\) satisfies
Then any coefficient vector \(\ba^*\) that minimizes the function \(L\) must be supported inside \(\Lambda\).
20.2.5.4. Applications of \(\ell_1\) Penalization#
Having put the basic results in place, we can now study the applications of (20.15).
(BPDN reconstruction guarantees using ERC)
Let \(\Lambda\) index a subdictionary \(\bDDD_{\Lambda}\) for which \(\ERC(\Lambda) \geq 0\). Suppose that \(\bx\) is an input signal whose \(\ell_2\) best approximation over \(\Lambda\) satisfies the correlation condition
Let \(\ba^*\) solve the convex program (20.15) with parameter \(\gamma\). We may conclude that:
Support of \(\ba^*\) is contained in \(\Lambda\).
The error between \(\ba^*\) and the optimal coefficient vector \(\bc_{\Lambda}\) satisfies
\[ \| \ba^* - \bc_{\Lambda} \|_{\infty} \leq \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]In particular, \(\supp(\ba^*)\) contains every index \(\lambda\) in \(\Lambda\) for which
\[ | \bc_{\Lambda} (\lambda) | > \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]Moreover, the minimizer \(\ba^*\) is unique.
Proof. Since the correlation condition of Theorem 20.13 is satisfied, the vector \(\ba^*\) which minimizes \(L\) over coefficient vectors in \(\CC^{\Lambda}\) is also a global minimizer of \(L\).
Since \(\ba^* \in \CC^{\Lambda}\), hence \(\supp(\ba^*) \subseteq \Lambda\).
For claim 2, application of Theorem 20.12 gives us
\[ \| \bc_{\Lambda} - \ba^* \|_{\infty} \leq \gamma \|\left (\bDDD_{\Lambda}^H \bDDD_{\Lambda} \right)^{-1} \|_{\infty}. \]Since \(L\) is strictly convex over coefficient vectors supported on \(\Lambda\), the minimizer \(\ba^*\) is unique.
For claim 3, suppose \(\ba^*(\lambda) = 0\) for some index \(\lambda \in \Lambda\) for which
\[ | \bc_{\Lambda} (\lambda) | > \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]Then
\[ | \ba^*(\lambda) - \bc_{\Lambda} (\lambda) | = | \bc_{\Lambda} (\lambda) | > \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]But
\[ \| \ba^* - \bc_{\Lambda} \|_{\infty} \geq | \ba^*(\lambda) - \bc_{\Lambda} (\lambda) |. \]This violates the bound that
\[ \| \ba^* - \bc_{\Lambda} \|_{\infty} \leq \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]Thus, \(\supp(\ba^*)\) contains every index \(\lambda \in \Lambda\) for which
\[ | \bc_{\Lambda} (\lambda) | > \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]
We can formulate a simpler condition in terms of coherence of the dictionary.
(BPDN reconstruction guarantees using coherence)
Suppose that \(K \mu \leq \frac{1}{2}\). Assume that \(| \Lambda | \leq K\); i.e., \(\Lambda\) contains at most \(K\) indices. Suppose that \(\bx\) is an input signal whose \(\ell_2\) best approximation over \(\Lambda\), denoted by \(\widehat{\bx}_{\Lambda}\), satisfies the correlation condition
Let \(\ba^*\) solve the convex program (20.15) with parameter \(\gamma\). We may conclude that:
Support of \(\ba^*\) is contained in \(\Lambda\) and
The distance between \(\ba^*\) and the optimal coefficient vector \(\bc_{\Lambda}\) satisfies
In particular, \(\supp(\ba^*)\) contains every index \(\lambda\) in \(\Lambda\) for which
\[ | \bc_{\Lambda} (\lambda) | > \frac{\gamma}{1 - (K - 1) \mu}. \]Moreover, the minimizer \(\ba^*\) is unique.
Proof. We recall from Theorem 18.75 the coherence bound on the ERC:
\[ \ERC(\Lambda) \geq \frac{1 - (2K - 1) \mu}{1 - (K - 1)\mu}. \]Thus,
\[ \| \bDDD_{\Lambda}^H (\bx - \widehat{\bx}_{\Lambda}) \|_{\infty} \leq \gamma \frac{1 - (2K - 1) \mu}{1 - (K - 1) \mu} \leq \gamma \ERC(\Lambda). \]A direct application of Theorem 20.14 validates claims 1 and 4.
We recall from Theorem 18.37 the upper bound on the norm of the inverse Gram matrix of a subdictionary:
\[ \| \bG^{-1} \|_{\infty} = \| \bG^{-1} \|_{1} \leq \frac{1}{1 - \mu_1(K - 1)} \leq \frac{1}{1 - (K - 1) \mu}. \]Putting this in Theorem 20.14 validates claims 2 and 3.
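As a numerical sanity check (the Dirac-Fourier example and index set are our own), one can compare the exact \(\ERC(\Lambda)\) and \(\| (\bDDD_{\Lambda}^H \bDDD_{\Lambda})^{-1} \|_{\infty}\) against the coherence-based bounds used in this proof:

```python
import numpy as np

n = 64
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
D = np.hstack([np.eye(n), F])             # Dirac-Fourier, coherence 1/8
mu = 1 / np.sqrt(n)

Lam = [0, 1, n]                           # K = 3 atoms: two spikes + DC atom
K = len(Lam)
DL = D[:, Lam]

# Exact quantities.
Ginv = np.linalg.inv(DL.conj().T @ DL)
ginv_norm = np.max(np.abs(Ginv).sum(axis=1))          # induced infinity-norm
pinv = np.linalg.pinv(DL)
erc_exact = 1 - max(np.linalg.norm(pinv @ D[:, w], 1)
                    for w in range(2 * n) if w not in Lam)

# Coherence-based bounds (K mu = 3/8 <= 1/2 holds here).
erc_bound = (1 - (2 * K - 1) * mu) / (1 - (K - 1) * mu)   # = 0.5
ginv_bound = 1 / (1 - (K - 1) * mu)                       # = 4/3
print(erc_exact >= erc_bound, ginv_norm <= ginv_bound)    # True True
```

The exact values are noticeably better than the bounds, which is typical: the coherence bounds are worst-case estimates over all supports of size \(K\).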
20.2.5.5. Exact Sparse Reconstruction Problem#
We now show how one can reconstruct an exactly sparse signal solving the convex program (20.15).
(BPDN exact sparse recovery guarantee)
Assume that \(\Lambda\) indexes a subdictionary for which \(\ERC(\Lambda) \geq 0\). Choose an arbitrary coefficient vector \(\bc_{\text{opt}}\) supported on \(\Lambda\). Fix an input signal \(\bx = \bDDD \bc_{\text{opt}}\). Let \(\ba^*(\gamma)\) denote the unique minimizer of (20.15) with parameter \(\gamma\). We may conclude that
There is a positive number \(\gamma_0\) for which \(\gamma < \gamma_0\) implies that \(\supp(\ba^*(\gamma)) = \Lambda\).
In the limit as \(\gamma \to 0\), we have \(\ba^*(\gamma) \to \bc_{\text{opt}}\).
Proof. Since there is no noise, the best \(\ell_2\) approximation of \(\bx\) over \(\Lambda\) is
\[ \widehat{\bx}_{\Lambda} = \bx \]itself and the corresponding coefficient vector is
\[ \bc_{\Lambda} = \bc_{\text{opt}}. \]Therefore
\[ \| \bDDD_{\Lambda}^H (\bx - \widehat{\bx}_{\Lambda}) \|_{\infty} = 0 \leq \gamma \ERC(\Lambda). \]Thus, the correlation condition is in force for every positive value of \(\gamma\).
Hence, as per Theorem 20.14, the minimizer \(\ba^*(\gamma)\) of the convex program (20.15) must be supported inside \(\Lambda\).
Moreover, we have
\[ \| \ba^*(\gamma) - \bc_{\text{opt}} \|_{\infty} \leq \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]Clearly, as \(\gamma \to 0\), we have \(\ba^*(\gamma) \to \bc_{\text{opt}}\).
Finally, recall that \(\supp(\ba^*(\gamma))\) contains every index \(\lambda\) in \(\Lambda\) for which
\[ | \bc_{\text{opt}} (\lambda) | > \gamma \left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}. \]In order for every index in \(\Lambda\) to be part of \(\supp(\ba^*(\gamma))\), we require
\[ \frac{\underset{\lambda \in \Lambda}{\min} | \bc_{\text{opt}} (\lambda) |} {\left \| \left ( \bDDD_{\Lambda}^H \bDDD_{\Lambda} \right )^{-1} \right \|_{\infty}} > \gamma. \]Choosing the L.H.S. to be \(\gamma_0\), we get an explicit upper bound on \(\gamma\) such that \(\gamma < \gamma_0\) leads to complete discovery of the support.
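For a concrete instance, \(\gamma_0\) can be computed explicitly. The sketch below (the dictionary, support and coefficient values are our own example) evaluates \(\gamma_0 = \min_{\lambda \in \Lambda} | \bc_{\text{opt}}(\lambda) | \, / \, \| (\bDDD_{\Lambda}^H \bDDD_{\Lambda})^{-1} \|_{\infty}\).

```python
import numpy as np

n = 64
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
D = np.hstack([np.eye(n), F])             # Dirac-Fourier dictionary

Lam = [0, 1, n]                           # support: two spikes + the DC atom
c_opt = np.array([2.0, -1.5, 1.0])        # coefficients on Lambda (our choice)
DL = D[:, Lam]

Ginv = np.linalg.inv(DL.conj().T @ DL)
ginv_norm = np.max(np.abs(Ginv).sum(axis=1))   # ||(D_L^H D_L)^{-1}||_inf

gamma_0 = np.min(np.abs(c_opt)) / ginv_norm
print(round(gamma_0, 4))                  # 0.775
```

By the result above, any \(\gamma < \gamma_0\) then recovers the full support \(\Lambda\) for this instance, and the minimizer tends to \(\bc_{\text{opt}}\) as \(\gamma \to 0\).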