(sec:ssm:srr)=
# Sparse and Redundant Representations
## Dictionaries
```{index} Dictionary, Atom
```
````{prf:definition} Dictionary
:label: def-ssm-dictionary
A *dictionary* for $\CC^N$ is a finite collection $\bDDD$ of unit-norm vectors
which span the whole space.
The elements of a dictionary are called *atoms* and they are denoted by $\phi_{\omega}$
where $\omega$ is drawn from an index set $\Omega$.
The dictionary is written as
$$
\bDDD = \{\phi_{\omega} : \omega \in \Omega \}
$$
where
$$
\| \phi_{\omega} \|_2 = 1 \Forall \omega \in \Omega
$$
and every signal $\bx \in \CC^N$ can be expressed as
$$
\bx = \sum_{\omega \in \Omega} c_{\omega} \phi_{\omega}.
$$
We use the letter $D$ to denote the number of elements in the dictionary; i.e.,
$$
D = | \Omega |.
$$
````
This definition is adapted from {cite}`tropp2004greed`.
The indices may have an interpretation, such as the time-frequency or time-scale localization
of an atom, or they may simply be labels without any underlying meaning.
Note that the dictionary need not provide a unique representation for any vector $\bx \in \CC^N$,
but it provides at least one representation for each $\bx \in \CC^N$.
When $D = N$, we have a set of $N$ unit-norm vectors spanning the whole of $\CC^N$;
i.e., a basis (not necessarily an orthonormal one).
A dictionary cannot have $D < N$, since fewer than $N$ vectors cannot span $\CC^N$. The more interesting case is when $D > N$.
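As a concrete illustration, here is a small numerical sketch (Python with NumPy; the Dirac-Fourier construction is just one standard choice and the names are merely illustrative) which builds a dictionary with $D = 2N$ atoms and verifies the two defining properties:
```python
import numpy as np

N = 8

# Dirac-Fourier dictionary: the standard basis together with the
# unitary DFT basis gives D = 2N unit-norm atoms in C^N.
dirac = np.eye(N)
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)   # columns have unit norm
Phi = np.hstack([dirac, fourier])              # an N x D matrix, D = 2N

# Every atom has unit l2 norm ...
assert np.allclose(np.linalg.norm(Phi, axis=0), 1.0)
# ... and the atoms span C^N, so every x has at least one representation.
assert np.linalg.matrix_rank(Phi) == N
```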
## Redundant Dictionaries and Sparse Signals
With $D > N$, clearly there are more atoms than necessary to provide
a representation of every signal $\bx \in \CC^N$.
Thus such a dictionary can provide multiple representations of the same vector $\bx$.
We call such dictionaries *redundant dictionaries* or *over-complete dictionaries*.
In contrast a basis with $D=N$ is called a *complete dictionary*.
A special class of signals consists of those which have a sparse representation in a given
dictionary $\bDDD$.
````{prf:definition} $(\bDDD,K)$-sparse signals
:label: def-ssm-d-k-sparse-signal
A signal $\bx \in \CC^N$ is called $(\bDDD,K)$-sparse if it can be
expressed as a linear combination of at most $K$ atoms from the dictionary
$\bDDD$.
````
Normally, for sparse signals, we have $K \ll D$.
It is usually expected that $K \ll N$ also holds.
Let $\Lambda \subset \Omega$ be a subset of indices with $|\Lambda|=K$.
Let $\bx$ be any signal in $\CC^N$ such that $\bx$ can be expressed as
$$
\bx = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \quad \text{where } b_{\lambda} \in \CC.
$$
Note that this need not be the only representation of $\bx$ in $\bDDD$;
it is just one of the possible representations. The special feature of this representation
is that it is $K$-sparse; i.e., at most $K$ atoms from the dictionary are used.
Now there are $\binom{D}{K}$ ways in which we can choose a set of $K$ atoms from the
dictionary $\bDDD$.
Thus the set of $(\bDDD,K)$-sparse signals is given by
$$
\Sigma_{(\bDDD,K)} = \{\bx \in \CC^N \ST
\bx = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda},
\quad \Lambda \subseteq \Omega, | \Lambda | = K \}.
$$
This set $\Sigma_{(\bDDD,K)}$ is dependent on the chosen dictionary $\bDDD$.
In the sequel, we will simply refer to it as $\Sigma_K$.
````{prf:example} $K$-sparse signals for standard basis
:label: ex-ssm-k-sparse-signal-set-standard-basis
For the special case where $\bDDD$ is the standard basis of $\CC^N$, we have
$$
\Sigma_K = \{ \bx \in \CC^N \ST \| \bx \|_0 \leq K\};
$$
i.e., the set of signals which have $K$ or fewer non-zero elements.
````
````{prf:example} $K$-sparse signals for orthonormal basis
:label: ex-ssm-k-sparse-signal-set-onb
In contrast if we choose an orthonormal basis $\Psi$ such that every
$\bx \in \CC^N$ can be expressed as
$$
\bx = \Psi \ba
$$
then with the dictionary $\bDDD = \Psi$, the set of $K$-sparse signals is given by
$$
\Sigma_K = \{ \bx = \Psi \ba \ST \| \ba \|_0 \leq K\}.
$$
````
We also note that for a specific choice of $\Lambda \subseteq \Omega$
with $|\Lambda| = K$, the set of vectors
$$
\{\bx \in \CC^N \ST
\bx = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \}
= \span \{\phi_{\lambda} \ST \lambda \in \Lambda \}
$$
forms a subspace of $\CC^N$.
So the dictionary $\bDDD$ gives rise to $\binom{D}{K}$ such $K$-sparse subspaces,
and the $K$-sparse signals lie in the *union of these subspaces*.
(sec:ssm:sparse:approximation:problem)=
## Sparse Approximation Problem
In the sparse approximation problem, we attempt to approximate a given signal $\bx \in \CC^N$ as
a linear combination of $K$ atoms from the dictionary $\bDDD$ where $K \ll N$ and
typically $N \ll D$;
i.e., the number of atoms in a dictionary $\bDDD$ is typically much larger
than the ambient signal space dimension $N$.
Naturally, we wish to obtain a best possible sparse representation of $\bx$
over the atoms $\phi_{\omega} \in \bDDD$ which minimizes the approximation error.
Let $\Lambda$ denote the index set of atoms which are used to create a $K$-sparse
representation of $\bx$ where $\Lambda \subset \Omega$ with $|\Lambda| = K$.
Let $\bx_{\Lambda}$ denote an approximation of $\bx$
over the set of atoms indexed by $\Lambda$.
Then we can write $\bx_{\Lambda}$ as
$$
\bx_{\Lambda} = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda}
\quad \text{where } b_{\lambda} \in \CC.
$$
````{div}
We collect all the complex-valued coefficients $b_{\lambda}$ in the sum into a vector $\bb_{\Lambda}$.
The approximation error is given by
$$
\be = \| \bx - \bx_{\Lambda} \|_2.
$$
Clearly we would like to minimize the approximation error over all possible choices of $K$ atoms
and corresponding set of coefficients $\bb_{\Lambda}$.
Thus the sparse approximation problem can be cast as a minimization problem given by
```{math}
:label: eq-ssm-sparse-approximation
\underset{|\Lambda| = K}{\text{min}} \, \underset{\bb_{\Lambda}}{\text{min}}
\left \| \bx - \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \right \|_2.
```
If we choose a particular $\Lambda$, then the inner minimization problem becomes
a straightforward least squares problem.
But there are $\binom{D}{K}$ possible choices of $\Lambda$ and solving the
inner least squares problem for each of them becomes prohibitively expensive.
We reemphasize here that in this formulation we are using a *fixed* dictionary $\bDDD$
while the vector $\bx \in \CC^N$ is *arbitrary*.
This problem is known as the $(\bDDD, K)$-SPARSE approximation problem.
A related problem is the $(\bDDD, K)$-EXACT-SPARSE problem,
where it is known a priori that $\bx$ is a linear combination of at most $K$ atoms
from the given dictionary $\bDDD$; i.e., $\bx$ is a $K$-sparse signal
in the dictionary $\bDDD$ as defined above.
This formulation simplifies the minimization problem {eq}`eq-ssm-sparse-approximation` since
it is known a priori that a $0$ approximation error can be achieved for $K$-sparse signals.
The problem reduces to finding, among the $\binom{D}{K}$ possible $K$-sparse subspaces,
those which provide a $K$-sparse representation of $\bx$, and choosing one of them.
It is imperative to note that even the $K$-sparse representation need not be unique.
Clearly the EXACT-SPARSE problem is simpler than the SPARSE approximation problem.
Thus if the EXACT-SPARSE problem is NP-hard, then so is the harder SPARSE approximation problem.
It is expected that solving the EXACT-SPARSE problem will provide insights into solving the
SPARSE approximation problem.
In {prf:ref}`res-ssm-sparse-unique-2onb` we identified conditions
under which a sparse representation of a given vector $\bx$ in a two-ortho-basis is unique.
It would be useful to obtain similar conditions for general dictionaries; such conditions
would help us guarantee the uniqueness of solutions to the EXACT-SPARSE problem.
````
## Synthesis and Analysis
The atoms of a dictionary $\bDDD$ can be organized into an $N \times D$ matrix as follows:
$$
\Phi \triangleq \begin{bmatrix}
\phi_{\omega_1} & \phi_{\omega_2} & \dots & \phi_{\omega_D}
\end{bmatrix}
$$
where $\Omega = \{\omega_1, \omega_2, \dots, \omega_D\}$ is the index set for the atoms
of $\bDDD$.
We recall that $\phi_{\omega} \in \CC^N$, hence they have a column
vector representation in the standard basis for $\CC^N$.
The order of columns doesn't matter as long as it remains fixed once chosen.
Thus in matrix terminology a representation of $\bx \in \CC^N$ in the dictionary can
be written as
$$
\bx = \Phi \bb
$$
where $\bb \in \CC^D$ is a vector of coefficients to produce a superposition $\bx$
from the atoms of dictionary $\bDDD$.
Clearly, with $D > N$, $\bb$ is not unique, since $\Phi$ has a nontrivial null space.
Indeed, for every vector
$\bz \in \NullSpace(\Phi)$, we have:
$$
\Phi (\bb + \bz) = \Phi \bb + \Phi \bz = \bx + \bzero = \bx.
$$
```{index} Synthesis matrix
```
````{prf:definition} Synthesis matrix
:label: def:ssm:dictionary:synthesis_matrix
The matrix $\Phi$ is called a *synthesis matrix* since $\bx$
is synthesized from the columns of
$\Phi$ with the coefficient vector $\bb$.
````
We can also view the synthesis matrix $\Phi$ as a linear operator from $\CC^D$ to $\CC^N$.
There is another way to look at $\bx$ through $\Phi$.
```{index} Analysis matrix
```
````{prf:definition} Analysis matrix
:label: def-ssm-dict-analysis-matrix
The conjugate transpose $\Phi^H$ of the synthesis matrix $\Phi$ is called the *analysis matrix*.
It maps a given vector $\bx \in \CC^N$ to a list of inner products with the dictionary:
$$
\bc = \Phi^H \bx
$$
where $\bc \in \CC^D$.
````
Note that in general $\bx \neq \Phi (\Phi^H \bx)$ unless $\bDDD$ is an orthonormal basis.
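The following sketch (NumPy again, reusing the Dirac-Fourier dictionary from earlier) illustrates synthesis, analysis, and the non-uniqueness of representations in a redundant dictionary:
```python
import numpy as np

N = 8
Phi = np.hstack([np.eye(N), np.fft.fft(np.eye(N)) / np.sqrt(N)])  # N x D

rng = np.random.default_rng(0)
b = rng.standard_normal(2 * N)      # a coefficient vector in C^D
x = Phi @ b                         # synthesis: x = Phi b
c = Phi.conj().T @ x                # analysis:  c = Phi^H x

# A null-space vector of Phi from the SVD; adding it to b synthesizes
# the same x, so the representation is not unique.
_, _, Vh = np.linalg.svd(Phi)
z = Vh.conj().T[:, -1]              # Phi @ z is (numerically) zero
assert np.allclose(Phi @ (b + z), x)

# In general x != Phi (Phi^H x); this dictionary is a tight frame
# with Phi Phi^H = 2 I, so analysis followed by synthesis doubles x.
assert np.allclose(Phi @ c, 2 * x)
```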
````{prf:definition} $(\bDDD, K)$ EXACT SPARSE problem
:label: def-ssm-d-k-exact-sparse-problem
With the help of synthesis matrix $\Phi$, the $(\bDDD, K)$ EXACT SPARSE problem
can now be written as
```{math}
:label: eq-ssm-d-k-exact-sparse-problem
& \underset{\ba}{\text{minimize}}
& & \| \ba \|_0 \\
& \text{subject to }
& & \bx = \Phi \ba\\
& \text{and }
& & \| \ba \|_0 \leq K
```
````
If $\bx \notin \Sigma_K$, then the EXACT SPARSE problem is infeasible.
Otherwise, we are looking to find the sparsest possible solution.
````{prf:definition} $(\bDDD, K)$ SPARSE approximation problem
:label: def-ssm-d-k-sparse-approx-prob
With the help of synthesis matrix $\Phi$, the $(\bDDD, K)$ SPARSE approximation
problem can now be written as
```{math}
:label: eq-ssm-d-k-sparse-approx-problem
& \underset{\ba}{\text{minimize}}
& & \| \bx - \Phi \ba \|_2 \\
& \text{subject to }
& & \| \ba \|_0 \leq K.
```
````
This problem can be visualized as a projection of $\bx$ onto
the set $\Sigma_K$. Hence, it always has a solution.
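To see how quickly the combinatorial search becomes expensive, here is a brute-force sketch of {eq}`eq-ssm-d-k-sparse-approx-problem` (NumPy; `sparse_approx_brute_force` is an illustrative name, and the approach is only viable for tiny dictionaries). It solves the inner least squares problem once per candidate support:
```python
import numpy as np
from itertools import combinations

def sparse_approx_brute_force(Phi, x, K):
    """Solve min ||x - Phi a||_2 s.t. ||a||_0 <= K by exhaustive search:
    one least squares fit for each of the C(D, K) candidate supports."""
    D = Phi.shape[1]
    best_err, best_a = np.inf, None
    for support in combinations(range(D), K):
        cols = Phi[:, list(support)]                     # restriction Phi_T
        coef, *_ = np.linalg.lstsq(cols, x, rcond=None)  # inner LS problem
        err = np.linalg.norm(x - cols @ coef)
        if err < best_err:
            best_err = err
            best_a = np.zeros(D, dtype=complex)
            best_a[list(support)] = coef
    return best_a, best_err

rng = np.random.default_rng(0)
Phi = rng.standard_normal((4, 8))
Phi /= np.linalg.norm(Phi, axis=0)   # normalize the atoms to unit norm
x = rng.standard_normal(4)
a, err = sparse_approx_brute_force(Phi, x, K=2)
print(np.flatnonzero(a), err)        # chosen support and residual norm
```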
## P-Norms
There are some simple and useful results on relationships between
different $p$-norms listed in this section. We also discuss
some interesting properties of the $\ell_1$ norm specifically.
```{index} Sign vector; complex
```
````{prf:definition} Complex sign vector
:label: def:ssm:sign_vector
Let $\bv \in \CC^N$. Let the entries in $\bv$ be represented as
$$
v_k = r_k \exp (i \theta_k)
$$
where $r_k = | v_k |$ with the convention that $\theta_k = 0$
whenever $r_k = 0$.
The sign vector for $\bv$ denoted by $\sgn(\bv)$ is defined as
$$
\sgn(\bv) = \begin{bmatrix}\sgn(v_1) \\ \vdots \\ \sgn(v_N) \end{bmatrix}
$$
where
$$
\sgn(v_k) = \begin{cases}
\exp (i \theta_k) & \text{ if } r_k \neq 0;\\
0 & \text{ if } r_k = 0.
\end{cases}
$$
````
````{prf:theorem} $\ell_1$ norm as product of vector with its sign
:label: res:ssm:l1_norm_as_inner_product_with_sign_vector
For any $\bv \in \CC^N$:
$$
\| \bv \|_1 = \sgn(\bv)^H \bv = \langle \bv , \sgn(\bv) \rangle.
$$
````
````{prf:proof}
This follows from:
$$
\| \bv \|_1 = \sum_{k=1}^N r_k
= \sum_{k=1}^N \left [r_k e^{i \theta_k} \right ] e^{- i \theta_k}
= \sum_{k=1}^N v_k e^{- i \theta_k} = \sgn(\bv)^H \bv.
$$
Note that whenever $v_k = 0$,
the corresponding $0$ entry in $\sgn(\bv)$ has no effect on the sum.
````
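A quick numerical check of this identity (NumPy; the helper `csgn` is an illustrative name of our choosing):
```python
import numpy as np

def csgn(v):
    """Complex sign vector: exp(i * theta_k) for nonzero entries, else 0."""
    out = np.zeros_like(v, dtype=complex)
    nz = v != 0
    out[nz] = v[nz] / np.abs(v[nz])
    return out

rng = np.random.default_rng(0)
v = rng.standard_normal(6) + 1j * rng.standard_normal(6)
v[2] = 0                             # include a zero entry

# ||v||_1 equals sgn(v)^H v; the imaginary part vanishes.
lhs = np.linalg.norm(v, 1)
rhs = np.vdot(csgn(v), v)            # vdot conjugates its first argument
assert np.allclose(lhs, rhs)
```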
````{prf:theorem} Equivalence of $\ell_1$ and $\ell_2$ norms
:label: lem:ssm:l1_norm_l2_bounds
Suppose $\bv \in \CC^N$. Then
$$
\| \bv \|_2 \leq \| \bv\|_1 \leq \sqrt{N} \| \bv \|_2.
$$
````
````{prf:proof}
For the lower bound, we proceed as follows:
$$
\| \bv \|_2^2
= \sum_{i=1}^N | v_i|^2
\leq \sum_{i=1}^N | v_i|^2 + \sum_{i \neq j} | v_i | | v_j|
= \left ( \sum_{i=1}^N | v_i| \right )^2 = \| \bv \|_1^2.
$$
This gives us
$$
\| \bv \|_2 \leq \| \bv \|_1.
$$
We can write $\ell_1$ norm as
$$
\| \bv \|_1 = \langle \bv, \sgn (\bv) \rangle.
$$
By the Cauchy-Schwarz inequality we have
$$
\langle \bv, \sgn (\bv) \rangle \leq \| \bv \|_2 \| \sgn (\bv) \|_2.
$$
Since $\sgn(\bv)$ can have at most $N$ non-zero values, each with magnitude 1,
$$
\| \sgn (\bv) \|_2^2 \leq N
\implies \| \sgn (\bv) \|_2 \leq \sqrt{N}.
$$
Thus, we get
$$
\| \bv \|_1 \leq \sqrt{N} \| \bv \|_2.
$$
````
````{prf:theorem} Equivalence of $\ell_2$ and $\ell_{\infty}$ norms
:label: res:ssm:l2_upper_bound_max_norm
Let $\bv \in \CC^N$. Then
$$
\| \bv \|_2 \leq \sqrt{N} \| \bv \|_{\infty}.
$$
````
````{prf:proof}
This follows from:
$$
\| \bv \|_2^2
= \sum_{i=1}^N | v_i |^2
\leq N \underset{1 \leq i \leq N}{\max} ( | v_i |^2)
= N \| \bv \|_{\infty}^2.
$$
Thus
$$
\| \bv \|_2 \leq \sqrt{N} \| \bv \|_{\infty}.
$$
````
````{prf:theorem} Relationship between $p$-norms
:label: res:ssm:p_q_norm_bounds
Let $\bv \in \CC^N$.
Let $1 \leq p, q \leq \infty$.
Then
$$
\| \bv \|_q \leq \| \bv \|_p \text{ whenever } p \leq q.
$$
````
````{prf:proof}
Assume $\bv \neq \bzero$; otherwise the result is trivial.
Consider first $q < \infty$.
Let $\bu = \bv / \| \bv \|_p$ so that $\| \bu \|_p = 1$; in particular $| u_i | \leq 1$ for every $i$.
Since $p \leq q$, we have $|u_i|^q \leq |u_i|^p$ and hence
$$
\| \bu \|_q^q = \sum_{i=1}^N |u_i|^q \leq \sum_{i=1}^N |u_i|^p = 1.
$$
Thus $\| \bu \|_q \leq 1 = \| \bu \|_p$, and multiplying through by $\| \bv \|_p$ gives
$\| \bv \|_q \leq \| \bv \|_p$.
For $q = \infty$,
$$
\| \bv \|_{\infty} = \max_{1 \leq i \leq N} | v_i |
\leq \left ( \sum_{i=1}^N |v_i|^p \right )^{1/p} = \| \bv \|_p.
$$
````
````{prf:theorem}
:label: res:ssm:one_vec_l1_norm
Let $\bone \in \CC^N$ be the vector of all ones; i.e., $\bone = (1, \dots, 1)$.
Let $\bv \in \CC^N$ be some arbitrary vector.
Let $| \bv |$ denote the vector of
absolute values of entries in $\bv$;
i.e., $|v|_i = |v_i| \Forall 1 \leq i \leq N$.
Then
$$
\| \bv \|_1 = \bone^T | \bv | = \bone^H | \bv |.
$$
````
````{prf:proof}
This follows from:
$$
\bone^T | \bv | = \sum_{i=1}^N | v |_i = \sum_{i=1}^N | v_i | = \| \bv \|_1.
$$
Finally, since $\bone$ has only real entries, its transpose and Hermitian
transpose coincide.
````
````{prf:theorem}
:label: res:ssm:ones_matrix_l1_norm
Let $\OneMat \in \CC^{N \times N}$ be a square matrix of all ones. Let $\bv \in \CC^N$
be some arbitrary vector.
Then
$$
|\bv|^T \OneMat | \bv | = \| \bv \|_1^2.
$$
````
````{prf:proof}
We know that
$$
\OneMat = \bone \bone^T.
$$
Thus,
$$
|\bv|^T \OneMat | \bv |
= |\bv|^T \bone \bone^T | \bv |
= (\bone^T | \bv | )^T (\bone^T | \bv |)
= \| \bv \|_1 \| \bv \|_1 = \| \bv \|_1^2.
$$
We used the fact that $\| \bv \|_1 = \bone^T | \bv |$.
````
````{prf:theorem} An upper bound on the $k$-th largest value
:label: res:ssm:k_th_largest_entry_l1_norm
The $k$-th largest entry (in magnitude) of a vector
$\bx \in \CC^N$, denoted by $x_{(k)}$, obeys
```{math}
:label: eq:ssm:k_th_largest_entry_l1_norm
| x_{(k)} | \leq \frac{\| \bx \|_1}{k}.
```
````
````{prf:proof}
Let $n_1, n_2, \dots, n_N$ be a permutation of $\{ 1, 2, \dots, N \}$ such that
$$
|x_{n_1} | \geq | x_{n_2} | \geq \dots \geq | x_{n_N} |.
$$
Thus, the $k$-th largest entry in $\bx$ is $x_{n_k}$.
It is clear that
$$
\| \bx \|_1 = \sum_{i=1}^N | x_i | = \sum_{i=1}^N |x_{n_i} |.
$$
Obviously
$$
|x_{n_1} | \leq \sum_{i=1}^N |x_{n_i} | = \| \bx \|_1.
$$
Similarly
$$
k |x_{n_k} | = |x_{n_k} | + \dots + |x_{n_k} |
\leq |x_{n_1} | + \dots + |x_{n_k} |
\leq \sum_{i=1}^N |x_{n_i} |
\leq \| \bx \|_1.
$$
Thus
$$
|x_{n_k} | \leq \frac{\| \bx \|_1}{k}.
$$
````
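This bound is easy to verify numerically (a NumPy sketch):
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(20) + 1j * rng.standard_normal(20)

mags = np.sort(np.abs(x))[::-1]      # entry magnitudes, descending
l1 = np.linalg.norm(x, 1)
for k in range(1, len(x) + 1):
    assert mags[k - 1] <= l1 / k     # |x_(k)| <= ||x||_1 / k
```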
## Sparse Signals
```{div}
In this subsection we explore some useful properties of $\Sigma_K$,
the set of $K$-sparse signals in standard basis
for $\CC^N$.
We recall that
$$
\Sigma_K = \{ \bx \in \CC^N \ST \| \bx \|_0 \leq K \}.
$$
We established before that this set is a union of $\binom{N}{K}$ subspaces of $\CC^N$, each of which
is constructed by an index set $\Lambda \subset \{1, \dots, N \}$ with $| \Lambda | = K$ choosing
$K$ specific dimensions of $\CC^N$.
We first present some results which connect the $\ell_1$, $\ell_2$ and $\ell_{\infty}$ norms of vectors
in $\Sigma_K$.
```
````{prf:theorem} Relation between norms of sparse vectors
:label: lem:u_sigma_k_norms
Suppose $\bu \in \Sigma_K$.
Then
$$
\frac{\| \bu\|_1}{\sqrt{K}} \leq \| \bu \|_2 \leq \sqrt{K} \| \bu \|_{\infty}.
$$
````
````{prf:proof}
Due to {prf:ref}`res:ssm:l1_norm_as_inner_product_with_sign_vector`,
we can write $\ell_1$ norm as
$$
\| \bu \|_1 = \langle \bu, \sgn (\bu) \rangle.
$$
By the Cauchy-Schwarz inequality we have
$$
\langle \bu, \sgn (\bu) \rangle \leq \| \bu \|_2 \| \sgn (\bu) \|_2.
$$
Since $\bu \in \Sigma_K$,
$\sgn(\bu)$ can have at most $K$ non-zero values each with magnitude 1.
Thus, we have
$$
\| \sgn (\bu) \|_2^2 \leq K \implies \| \sgn (\bu) \|_2 \leq \sqrt{K}.
$$
Thus we get the lower bound
$$
\| \bu \|_1 \leq \| \bu \|_2 \sqrt{K}
\implies \frac{\| \bu \|_1}{\sqrt{K}} \leq \| \bu \|_2.
$$
Now $| u_i | \leq \max(| u_i |) = \| \bu \|_{\infty}$.
So we have
$$
\| \bu \|_2^2 = \sum_{i= 1}^{N} | u_i |^2 \leq K \| \bu \|_{\infty}^2
$$
since there are at most $K$ non-zero terms in the expansion of $\| \bu \|_2^2$.
This establishes the upper bound:
$$
\| \bu \|_2 \leq \sqrt{K} \| \bu \|_{\infty}.
$$
````
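A numerical sanity check of both bounds for a random $K$-sparse vector (NumPy sketch):
```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 5

# A random K-sparse vector in C^N.
u = np.zeros(N, dtype=complex)
support = rng.choice(N, size=K, replace=False)
u[support] = rng.standard_normal(K) + 1j * rng.standard_normal(K)

l1 = np.linalg.norm(u, 1)
l2 = np.linalg.norm(u, 2)
linf = np.linalg.norm(u, np.inf)

# ||u||_1 / sqrt(K) <= ||u||_2 <= sqrt(K) * ||u||_inf
assert l1 / np.sqrt(K) <= l2 + 1e-12
assert l2 <= np.sqrt(K) * linf + 1e-12
```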
## Compressible Signals
In this subsection, we first look at some general results and definitions
related to $K$-term approximations of arbitrary signals $\bx \in \CC^N$.
We then define the notion of a compressible signal and study properties related to it.
### K-term Approximation of General Signals
```{index} Signal; restriction, Signal; mask
```
````{prf:definition} Restriction of a signal
:label: def:ssm:signal_restriction
Let $\bx \in \CC^N$.
Let $T \subset \{ 1, 2, \dots, N\}$ be any index set.
Further let
$$
T = \{t_1, t_2, \dots, t_{|T|}\}
$$
such that
$$
t_1 < t_2 < \dots < t_{|T|}.
$$
Let $\bx_T \in \CC^{|T|}$ be defined as
```{math}
:label: eq:ssm:signal_restriction
\bx_T = \begin{pmatrix}
x_{t_1} & x_{t_2} & \dots & x_{t_{|T|}}
\end{pmatrix}.
```
Then $\bx_T$ is a *restriction* of the signal $\bx$ on the index set $T$.
Alternatively let $\bx_T \in \CC^N$ be defined as
```{math}
:label: eq:ssm:signal_mask
\bx_{T}(i) = \begin{cases}
\bx(i) & \text{ if } i \in T;\\
0 & \text{ otherwise}.
\end{cases}
```
In other words, $\bx_T \in \CC^N$ keeps the entries in $\bx$
indexed by $T$ while setting all other entries to 0.
Then we say that $\bx_T$ is obtained by *masking* $\bx$ with $T$.
````
As an abuse of notation, we will use either of the two definitions
whenever we refer to $\bx_T$.
The definition being used should be obvious from the context.
````{prf:example} Restrictions on index sets
:label: ex-ssm-signal-restriction-1
Let
$$
\bx = \begin{pmatrix}
-1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0
\end{pmatrix} \in \CC^{10}.
$$
Let
$$
T = \{ 1, 3, 7, 8\}.
$$
Then
$$
\bx_T = \begin{pmatrix}
-1 & 0 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \in \CC^{10}.
$$
Since $|T| = 4$, sometimes we will also write
$$
\bx_T = \begin{pmatrix}
-1 & 8 & 0 & 0
\end{pmatrix} \in \CC^4.
$$
````
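Both versions of $\bx_T$ are straightforward to implement (a NumPy sketch; note that the code uses 0-based indices, so the index set $\{1, 3, 7, 8\}$ above becomes `[0, 2, 6, 7]`):
```python
import numpy as np

def restrict(x, T):
    """Restriction x_T in C^{|T|}: keep only the entries indexed by T."""
    return x[np.sort(np.asarray(T))]

def mask(x, T):
    """Mask x_T in C^N: keep entries indexed by T, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.asarray(T)
    out[idx] = x[idx]
    return out

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=complex)
T = [0, 2, 6, 7]              # the index set {1, 3, 7, 8}, 0-based
print(restrict(x, T))         # [-1.  8.  0.  0.] (in C^4)
print(mask(x, T))             # [-1.  0.  8.  0.  0.  0.  0.  0.  0.  0.]
```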
```{index} Signal; $K$-term approximation
```
````{prf:definition} $K$-term signal approximation
:label: def:ssm:k_term_signal_approximation
Let $\bx \in \CC^N$ be an arbitrary signal.
Consider any index set $T \subset \{1, \dots, N \}$
with $|T| = K$.
Then $\bx_T$ is a *$K$-term approximation* of $\bx$.
````
Clearly for any $\bx \in \CC^N$
there are $\binom{N}{K}$ possible $K$-term approximations of $\bx$.
````{prf:example} $K$-term approximation
:label: ex-ssm-k-term-approx-1
Let
$$
\bx = \begin{pmatrix}
-1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0
\end{pmatrix} \in \CC^{10}.
$$
Let $T= \{ 1, 6 \}$. Then
$$
\bx_T = \begin{pmatrix}
-1 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 0 & 0
\end{pmatrix}
$$
is a $2$-term approximation of $\bx$.
If we choose $T= \{7,8,9,10\}$,
the corresponding $4$-term approximation of $\bx$ is
$$
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
$$
````
````{prf:definition} $K$-largest entries approximation
:label: def:ssm:largest_entries_signal
Let $\bx \in \CC^N$ be an arbitrary signal.
Let $\lambda_1, \dots, \lambda_N$ be
indices of entries in $\bx$ such that
$$
| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.
$$
In case of ties, the order is resolved lexicographically;
i.e., if $|x_i| = |x_j|$
and $i < j$ then $i$ will appear first in the sequence $\{ \lambda_k \}$.
Consider the index set $\Lambda_K = \{ \lambda_1, \lambda_2, \dots, \lambda_K\}$.
The masking of $\bx$ with $\Lambda_K$, given by $\bx_{\Lambda_K}$,
keeps the $K$ largest entries of $\bx$
while setting all other entries to 0. This is known
as the *$K$ largest entries approximation* of $\bx$.
This signal is denoted henceforth as $\bx|_K$;
i.e.
```{math}
:label: eq-ssm-best-k-term-approx
\bx|_K = \bx_{\Lambda_K}
```
where $\Lambda_K$ is the index set corresponding
to $K$ largest entries in $\bx$ (magnitude wise).
````
````{prf:example} Largest entries approximation
:label: ex-ssm-k-largest-entries-approx-1
Let
$$
\bx = \begin{pmatrix}
-1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0
\end{pmatrix}.
$$
Then
$$
\bx|_1 = \begin{pmatrix}
0 & 0 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
$$
$$
\bx|_2 = \begin{pmatrix}
0 & 5 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
$$
$$
\bx|_3 = \begin{pmatrix}
0 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0
\end{pmatrix}
$$
$$
\bx|_4 = \bx.
$$
All further $K$ largest entries approximations are the same as $\bx$.
````
A pertinent question at this point is:
which $K$-term approximation of $\bx$ is the best one?
Certainly, in order to compare two approximations, we need
some criterion; let us choose the $\ell_p$ norm.
The next theorem characterizes the
best $K$-term approximation in the $\ell_p$ norm sense.
````{prf:theorem} Best $K$-term approximation for $\ell_p$ norms
:label: lem:ssm:best_k_term_approximation
Let $\bx \in \CC^N$.
Let the best $K$ term approximation of $\bx$ be obtained by the
following optimization program:
```{math}
:label: eq:best_k_term_approximation_optimization_problem
& \underset{T \subset \{1, \dots, N\}}{\text{maximize}}
& & \| \bx_T \|_p \\
& \text{subject to }
& & |T| = K.
```
where $p \in [1, \infty]$.
Let an optimal solution for this optimization problem be denoted by
$\bx_{T^*}$.
Then
$$
\| \bx|_K \|_p = \| \bx_{T^*} \|_p;
$$
i.e., the $K$-largest entries approximation of $\bx$
is an optimal solution to
{eq}`eq:best_k_term_approximation_optimization_problem`.
````
````{prf:proof}
For $p=\infty$, the result is obvious.
In the following, we focus on $p \in [1, \infty)$.
We note that maximizing $\| \bx_T \|_p$ is equivalent to maximizing
$ \| \bx_T \|_p^p$.
Let $\lambda_1, \dots, \lambda_N$ be
indices of entries in $\bx$ such that
$$
| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.
$$
Further let $\{ \omega_1, \dots, \omega_N\}$ be any permutation of $\{1, \dots, N \}$.
Clearly
$$
\| \bx|_K \|_p^{p} = \sum_{i=1}^K |x_{\lambda_i}|^{p} \geq \sum_{i=1}^K |x_{\omega_i}|^{p}.
$$
Thus if $T^*$ corresponds to an optimal solution of
{eq}`eq:best_k_term_approximation_optimization_problem`
then
$$
\| \bx|_K \|_p^{p} = \| \bx_{T^*} \|_p^{p}.
$$
Thus $\bx|_K$ is an optimal solution to
{eq}`eq:best_k_term_approximation_optimization_problem`.
````
This result establishes that whenever we are looking for the best $K$-term
approximation of $\bx$ under any $\ell_p$ norm, all we have to do is pick
the $K$ largest (in magnitude) entries of $\bx$.
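In code, the best $K$-term approximation therefore reduces to a magnitude sort (NumPy sketch; the stable sort reproduces the lexicographic tie-breaking used above):
```python
import numpy as np

def best_k_term(x, K):
    """K largest entries (in magnitude) approximation x|_K, mask version."""
    out = np.zeros_like(x)
    # Stable sort on -|x|: ties keep their original order, matching the
    # lexicographic convention in the text.
    idx = np.argsort(-np.abs(x), kind="stable")[:K]
    out[idx] = x[idx]
    return out

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=complex)
print(best_k_term(x, 2))   # keeps the entries 8 and 5, zeros elsewhere
```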
````{prf:definition} Restriction of a matrix
:label: def:ssm:matrix_restriction
Let $\Phi \in \CC^{M \times N}$.
Let $T \subset \{ 1, 2, \dots, N\}$ be any index set.
Further let
$$
T = \{t_1, t_2, \dots, t_{|T|}\}
$$
such that
$$
t_1 < t_2 < \dots < t_{|T|}.
$$
Let $\Phi_T \in \CC^{M \times |T|}$ be defined as
```{math}
:label: eq:ssm:matrix_restriction
\Phi_T = \begin{bmatrix}
\phi_{t_1} & \phi_{t_2} & \dots & \phi_{t_{|T|}}
\end{bmatrix}.
```
Then $\Phi_T$ is a *restriction* of the matrix $\Phi$ on the index set $T$.
Alternatively let $\Phi_T \in \CC^{M \times N}$ be defined as
```{math}
:label: eq:ssm:matrix_mask
(\Phi_{T})_i = \begin{cases}
\phi_i & \text{ if } i \in T;\\
\bzero & \text{ otherwise}.
\end{cases}
```
In other words, $\Phi_T \in \CC^{M \times N}$
keeps the columns of $\Phi$
indexed by $T$ while setting all other columns to $\bzero$.
Then we say that $\Phi_T$ is obtained by *masking* $\Phi$ with $T$.
````
As an abuse of notation, we will use either of the two definitions
whenever we refer to $\Phi_T$. The definition
being used should be obvious from the context.
````{prf:theorem}
:label: lem:ssm:restriction_simplification_sparse_vector
Let $\bx \in \CC^N$ with $\supp(\bx) = \Lambda$. Then
$$
\Phi \bx = \Phi_{\Lambda} \bx_{\Lambda}.
$$
````
````{prf:proof}
This follows from:
$$
\Phi \bx = \sum_{i=1}^N x_i \phi_i
= \sum_{\lambda \in \Lambda} x_{\lambda} \phi_{\lambda}
= \Phi_{\Lambda} \bx_{\Lambda}.
$$
````
The result remains valid whether we use
the restriction or the mask version of the $\bx_{\Lambda}$ notation,
as long as the same version is used
for both $\Phi_{\Lambda}$ and $\bx_{\Lambda}$.
````{prf:corollary}
:label: cor:ssm:matrix_vector_product_disjoint_set_seperation
Let $S$ and $T$ be two disjoint index sets such that
for some $\bx \in \CC^N$
$$
\bx = \bx_T + \bx_S
$$
using the mask version of the $\bx_{\Lambda}$ notation.
Then the following holds
$$
\Phi \bx = \Phi_T \bx_T + \Phi_S \bx_S.
$$
````
````{prf:proof}
Straightforward application of
{prf:ref}`lem:ssm:restriction_simplification_sparse_vector`:
$$
\Phi \bx = \Phi \bx_T + \Phi \bx_S = \Phi_T \bx_T + \Phi_S \bx_S.
$$
````
````{prf:theorem}
:label: lem:ssm:restriction_on_matrix_vector_product
Let $T$ be any index set. Let $\Phi \in \CC^{M \times N}$
and $\by \in \CC^M$.
Then
$$
[\Phi^H \by]_T = \Phi_T^H \by.
$$
````
````{prf:proof}
Note that
$$
\Phi^H \by =
\begin{bmatrix}
\langle \by, \phi_1 \rangle\\
\vdots \\
\langle \by, \phi_N \rangle
\end{bmatrix}
$$
following the convention $\langle \ba, \bb \rangle = \bb^H \ba$ used earlier.
Now let
$$
T = \{ t_1, \dots, t_K \}.
$$
Then
$$
[\Phi^H \by]_T =
\begin{bmatrix}
\langle \by, \phi_{t_1} \rangle\\
\vdots \\
\langle \by, \phi_{t_K} \rangle
\end{bmatrix}
= \Phi_T^H \by.
$$
````
The result remains valid whether we use
the restriction or the mask version of $\Phi_T$
notation.
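Both identities are easy to verify numerically (NumPy sketch, using the restriction versions of $\Phi_T$ and $\bx_T$):
```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 5, 12, 3
Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

# A K-sparse x with support Lambda.
x = np.zeros(N, dtype=complex)
Lam = np.sort(rng.choice(N, size=K, replace=False))
x[Lam] = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# Phi x = Phi_Lambda x_Lambda.
assert np.allclose(Phi @ x, Phi[:, Lam] @ x[Lam])

# [Phi^H y]_T = Phi_T^H y for an arbitrary index set T.
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
T = np.array([1, 4, 9])
assert np.allclose((Phi.conj().T @ y)[T], Phi[:, T].conj().T @ y)
```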
### Compressible Signals
We will now define the notion of a compressible signal in terms of the decay rate
of the magnitudes of its entries when sorted in descending order.
```{index} Signal; compressible
```
````{prf:definition} Compressible signal
:label: def:ssm:p_compressible_signal
Let $\bx \in \CC^N$ be an arbitrary signal.
Let $\lambda_1, \dots, \lambda_N$ be
indices of entries in $\bx$ such that
$$
| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.
$$
In case of ties, the order is resolved lexicographically; i.e., if $|x_i| = |x_j|$
and $i < j$ then $i$ will appear first in the sequence $\{ \lambda_k \}$.
Define
```{math}
:label: eq:x_sorted_in_magnitude_descending
\widehat{\bx} = (x_{\lambda_1}, x_{\lambda_2}, \dots, x_{\lambda_N}).
```
The signal $\bx$ is called *$p$-compressible* with magnitude $R$,
where $p \in (0, 1]$, if
```{math}
:label: eq:p_compressible_signal_entry
| \widehat{x}_i |\leq R \cdot i^{-\frac{1}{p}} \quad \forall i=1, 2,\dots, N.
```
````
````{prf:theorem} $1$-compressible signals
:label: lem:ssm:compressible_p_1
Let $\bx$ be $p$-compressible with $p=1$. Then
$$
\| \bx \|_1 \leq R (1 + \ln (N)).
$$
````
````{prf:proof}
Recalling $\widehat{\bx}$ from
{eq}`eq:x_sorted_in_magnitude_descending`
it is straightforward to see that
$$
\|\bx\|_1 = \|\widehat{\bx}\|_1
$$
since the $\ell_1$ norm doesn't depend on the ordering of entries in $\bx$.
Since $\bx$ is $1$-compressible,
from {eq}`eq:p_compressible_signal_entry` we have
$$
|\widehat{x}_i | \leq R \frac{1}{i}.
$$
This gives us
$$
\|\widehat{\bx}\|_1 \leq \sum_{i=1}^N R \frac{1}{i} = R \sum_{i=1}^N \frac{1}{i}.
$$
The sum on the R.H.S. is the $N$-th Harmonic number
(sum of reciprocals of first $N$ natural numbers).
A simple upper bound on Harmonic numbers is
$$
H_k \leq 1 + \ln(k).
$$
This completes the proof.
````
We now demonstrate how a compressible signal is well approximated by a sparse signal.
````{prf:theorem} Sparse approximation of compressible signals
:label: lem:ssm:compressible_p_sparse_approximation
Let $\bx$ be a $p$-compressible signal with $p \in (0, 1)$ and let
$\bx|_K$ be its best $K$-term approximation.
Then the $\ell_1$ norm of approximation error satisfies
```{math}
:label: eq:compressible_p_sparse_approximation_error_l1_norm
\| \bx - \bx|_K\|_1 \leq C_p \cdot R \cdot K^{1 - \frac{1}{p}}
```
with
$$
C_p = \left (\frac{1}{p} - 1 \right)^{-1}.
$$
Moreover the $\ell_2$ norm of approximation error satisfies
```{math}
:label: eq:compressible_p_sparse_approximation_error_l2_norm
\| \bx - \bx|_K\|_2 \leq D_p \cdot R \cdot K^{\frac{1}{2} - \frac{1}{p}}
```
with
$$
D_p = \left (\frac{2}{p} - 1 \right )^{-1/2}.
$$
````
````{prf:proof}
Expanding the $\ell_1$ approximation error
$$
\| \bx - \bx|_K\|_1 = \sum_{i=K+1}^N |x_{\lambda_i}|
\leq R \sum_{i=K+1}^N i^{-\frac{1}{p}}.
$$
We now approximate the R.H.S. sum with an integral.
$$
\sum_{i=K+1}^N i^{-\frac{1}{p}}
\leq \int_{x=K}^N x^{-\frac{1}{p}} d x
\leq \int_{x=K}^{\infty} x^{-\frac{1}{p}} d x.
$$
Now
$$
\int_{x=K}^{\infty} x^{-\frac{1}{p}} d x =
\left [ \frac{x^{1-\frac{1}{p}}}{1-\frac{1}{p}} \right ]_{K}^{\infty}
= C_p K^{1 - \frac{1}{p}}.
$$
For the $\ell_2$ norm we proceed similarly:
$$
\| \bx - \bx|_K\|_2^2 = \sum_{i=K+1}^N |x_{\lambda_i}|^2
\leq R^2 \sum_{i=K+1}^N i^{-\frac{2}{p}}
\leq R^2 \int_{x=K}^{\infty} x^{-\frac{2}{p}} d x
= \frac{R^2}{\frac{2}{p} - 1} K^{1 - \frac{2}{p}}.
$$
Taking square roots yields the second bound.
````
```{prf:example} Sparse approximation for $\frac{1}{2}$-compressible signals
:label: ex-ssm-k-sparse-approx-half-compressible
Let $p = \frac{1}{2}$. Then
$$
C_p = \left (\frac{1}{p} - 1 \right)^{-1} = 1
\text{ and }
D_p = \left (\frac{2}{p} - 1 \right )^{-1/2} = \frac{1}{\sqrt{3}}.
$$
Hence
$$
\| \bx - \bx|_K\|_1 \leq \frac{R}{K}
$$
and
$$
\| \bx - \bx|_K\|_2 \leq \frac{1}{\sqrt{3}} \frac{R}{K^{3/2}}.
$$
Both the $\ell_1$ and $\ell_2$ approximation error bounds decrease
as $K$ increases, with the $\ell_2$ bound decaying faster.
```
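The decay of both error bounds can be observed numerically (NumPy sketch; the signal below is constructed to sit exactly on the compressibility envelope for $p = \frac{1}{2}$):
```python
import numpy as np

rng = np.random.default_rng(0)
N, R, p = 1000, 1.0, 0.5

# Entries decaying as R * i^{-1/p} with random signs: p-compressible by
# construction, and already sorted in descending magnitude, so x|_K
# simply keeps the first K entries.
i = np.arange(1, N + 1)
x = R * i ** (-1.0 / p) * rng.choice([-1.0, 1.0], size=N)

for K in (10, 20, 40, 80):
    err1 = np.linalg.norm(x[K:], 1)   # ||x - x|_K||_1
    err2 = np.linalg.norm(x[K:], 2)   # ||x - x|_K||_2
    # Bounds for p = 1/2: R/K for l1 and R / (sqrt(3) K^{3/2}) for l2.
    print(K, err1 <= R / K, err2 <= R / (np.sqrt(3) * K ** 1.5))
```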