18.5. Sparse and Redundant Representations#
18.5.1. Dictionaries#
(Dictionary)
A dictionary for \(\CC^N\) is a finite collection \(\bDDD\) of unit-norm vectors which span the whole space.
The elements of a dictionary are called atoms and they are denoted by \(\phi_{\omega}\) where \(\omega\) is drawn from an index set \(\Omega\).
The dictionary is written as

\[
\bDDD = \{ \phi_{\omega} : \omega \in \Omega \}
\]

where

\[
\| \phi_{\omega} \|_2 = 1 \quad \Forall \omega \in \Omega
\]

and every signal \(\bx \in \CC^N\) can be expressed as

\[
\bx = \sum_{\omega \in \Omega} b_{\omega} \phi_{\omega}.
\]

We use the letter \(D\) to denote the number of elements in the dictionary; i.e.,

\[
D = | \Omega |.
\]
This definition is adapted from [77].
The indices may have an interpretation, such as the time-frequency or time-scale localization of an atom, or they may simply be labels without any underlying meaning.
Note that the dictionary need not provide a unique representation for any vector \(\bx \in \CC^N\), but it provides at least one representation for each \(\bx \in \CC^N\).
When \(D=N\) we have a set of \(N\) unit-norm vectors which span the whole of \(\CC^N\); thus we have a basis (not necessarily an orthonormal basis). A dictionary cannot have \(D < N\), since fewer than \(N\) vectors cannot span \(\CC^N\). The more interesting case is when \(D > N\).
18.5.2. Redundant Dictionaries and Sparse Signals#
With \(D > N\), there are clearly more atoms than necessary to provide a representation of every signal \(\bx \in \CC^N\). Such a dictionary can therefore provide multiple representations of the same vector \(\bx\). We call such dictionaries redundant dictionaries or over-complete dictionaries.
In contrast, a basis with \(D=N\) is called a complete dictionary. A special class of signals consists of those which have a sparse representation in a given dictionary \(\bDDD\).
(\((\bDDD,K)\)-sparse signals)
A signal \(\bx \in \CC^N\) is called \((\bDDD,K)\)-sparse if it can be expressed as a linear combination of at most \(K\) atoms from the dictionary \(\bDDD\).
Normally, for sparse signals, we have \(K \ll D\). It is usually expected that \(K \ll N\) also holds.
Let \(\Lambda \subset \Omega\) be a subset of indices with \(|\Lambda|=K\).
Let \(\bx\) be any signal in \(\CC^N\) such that \(\bx\) can be expressed as

\[
\bx = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda}.
\]

Note that this is not the only possible representation of \(\bx\) in \(\bDDD\); it is just one of them. Its special feature is that it is \(K\)-sparse; i.e., at most \(K\) atoms from the dictionary are used.
Now there are \(\binom{D}{K}\) ways in which we can choose a set of \(K\) atoms from the dictionary \(\bDDD\).
Thus the set of \((\bDDD,K)\)-sparse signals is given by

\[
\Sigma_{(\bDDD,K)} = \left \{ \bx \in \CC^N : \bx = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \text{ for some } \Lambda \subset \Omega \text{ with } | \Lambda | = K \right \}.
\]
This set \(\Sigma_{(\bDDD,K)}\) is dependent on the chosen dictionary \(\bDDD\). In the sequel, we will simply refer to it as \(\Sigma_K\).
(\(K\)-sparse signals for standard basis)
For the special case where \(\bDDD\) is the standard basis of \(\CC^N\),

\[
\Sigma_K = \{ \bx \in \CC^N : \| \bx \|_0 \leq K \};
\]

i.e., the set of signals which have \(K\) or fewer non-zero entries.
(\(K\)-sparse signals for orthonormal basis)
In contrast, if we choose an orthonormal basis \(\Psi\) such that every \(\bx \in \CC^N\) can be expressed as

\[
\bx = \Psi \bb,
\]

then with the dictionary \(\bDDD = \Psi\), the set of \(K\)-sparse signals is given by

\[
\Sigma_K = \{ \bx = \Psi \bb : \| \bb \|_0 \leq K \}.
\]
We also note that for a specific choice of \(\Lambda \subseteq \Omega\) with \(|\Lambda| = K\), the set of vectors

\[
\left \{ \bx \in \CC^N : \bx = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \right \}
\]

forms a subspace of \(\CC^N\).
So the dictionary \(\bDDD\) gives rise to \(\binom{D}{K}\) such \(K\)-sparse subspaces, and the \(K\)-sparse signals lie in their union.
18.5.3. Sparse Approximation Problem#
In the sparse approximation problem, we attempt to approximate a given signal \(\bx \in \CC^N\) as a linear combination of \(K\) atoms from the dictionary \(\bDDD\), where \(K \ll N\) and typically \(N \ll D\); i.e., the number of atoms in the dictionary \(\bDDD\) is typically much larger than the ambient signal space dimension \(N\).
Naturally, we wish to obtain a best possible sparse representation of \(\bx\) over the atoms \(\phi_{\omega} \in \bDDD\) which minimizes the approximation error.
Let \(\Lambda\) denote the index set of atoms which are used to create a \(K\)-sparse representation of \(\bx\) where \(\Lambda \subset \Omega\) with \(|\Lambda| = K\).
Let \(\bx_{\Lambda}\) denote an approximation of \(\bx\) over the set of atoms indexed by \(\Lambda\).
Then we can write \(\bx_{\Lambda}\) as

\[
\bx_{\Lambda} = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda}.
\]

We collect the complex-valued coefficients \(b_{\lambda}\) in the sum into a vector \(\bb_{\Lambda} \in \CC^K\). The approximation error is given by

\[
e = \left \| \bx - \bx_{\Lambda} \right \|_2.
\]
Clearly we would like to minimize the approximation error over all possible choices of \(K\) atoms and corresponding set of coefficients \(\bb_{\Lambda}\).
Thus the sparse approximation problem can be cast as the minimization problem

\[
\underset{\substack{\Lambda \subset \Omega \\ | \Lambda | = K}}{\text{minimize}} \;\; \underset{\bb_{\Lambda} \in \CC^K}{\text{minimize}} \;\; \left \| \bx - \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \right \|_2. \tag{18.18}
\]

If we choose a particular \(\Lambda\), then the inner minimization problem becomes a straightforward least squares problem. But there are \(\binom{D}{K}\) possible choices of \(\Lambda\), and solving the inner least squares problem for each of them becomes prohibitively expensive.
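To make this combinatorial cost concrete, here is a minimal NumPy sketch (the function name and demo values are our own, not from the text) that solves the inner least squares problem for every candidate support. It is usable only for tiny \(D\) and \(K\), which is exactly the point.

```python
import itertools

import numpy as np


def sparse_approx_brute_force(Phi, x, K):
    """Exhaustive (D, K)-SPARSE approximation: one least squares
    problem per size-K support, binom(D, K) subproblems in total."""
    N, D = Phi.shape
    best_err, best_support, best_b = np.inf, None, None
    for support in itertools.combinations(range(D), K):
        cols = Phi[:, list(support)]                  # N x K submatrix
        b, *_ = np.linalg.lstsq(cols, x, rcond=None)  # inner least squares
        err = np.linalg.norm(x - cols @ b)
        if err < best_err:
            best_err, best_support, best_b = err, support, b
    return best_support, best_b, best_err


# Tiny demo: random unit-norm dictionary with D > N.
rng = np.random.default_rng(0)
N, D, K = 8, 16, 2
Phi = rng.standard_normal((N, D))
Phi /= np.linalg.norm(Phi, axis=0)            # normalize the atoms
x = Phi[:, [3, 11]] @ np.array([1.0, -2.0])   # an exactly 2-sparse signal
support, b, err = sparse_approx_brute_force(Phi, x, K)
print(support, err)                           # expect (3, 11) with err ~ 0
```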
We reemphasize here that in this formulation we are using a fixed dictionary \(\bDDD\) while the vector \(\bx \in \CC^N\) is arbitrary.
This problem is known as the \((\bDDD, K)\)-SPARSE approximation problem.
A related problem is the \((\bDDD, K)\)-EXACT-SPARSE problem, where it is known a priori that \(\bx\) is a linear combination of at most \(K\) atoms from the given dictionary \(\bDDD\); i.e., \(\bx\) is a \(K\)-sparse signal as defined above for the dictionary \(\bDDD\).
This formulation simplifies the minimization problem (18.18), since it is known a priori that an approximation error of \(0\) can be achieved for \(K\)-sparse signals. The problem reduces to finding, among the \(\binom{D}{K}\) possible \(K\)-sparse subspaces, those which provide a \(K\)-sparse representation of \(\bx\), and choosing one of them. It is imperative to note that even the \(K\)-sparse representation need not be unique.
Clearly the EXACT-SPARSE problem is simpler than the SPARSE approximation problem. Thus if the EXACT-SPARSE problem is NP-hard, then so is the harder SPARSE approximation problem. It is expected that solving the EXACT-SPARSE problem will provide insights into solving the SPARSE problem.
In Theorem 18.8 we identified conditions under which a sparse representation for a given vector \(\bx\) in a two-ortho-basis is unique. It would be useful to obtain similar conditions for general dictionaries; such conditions would help us guarantee the uniqueness of the EXACT-SPARSE problem.
18.5.4. Synthesis and Analysis#
The atoms of a dictionary \(\bDDD\) can be organized into an \(N \times D\) matrix as follows:

\[
\Phi = \begin{bmatrix} \phi_{\omega_1} & \phi_{\omega_2} & \dots & \phi_{\omega_D} \end{bmatrix}
\]

where \(\Omega = \{\omega_1, \omega_2, \dots, \omega_D\}\) is the index set for the atoms of \(\bDDD\). We recall that \(\phi_{\omega} \in \CC^N\); hence each atom has a column vector representation in the standard basis for \(\CC^N\). The order of columns doesn't matter as long as it remains fixed once chosen.
Thus in matrix terminology a representation of \(\bx \in \CC^N\) in the dictionary can be written as

\[
\bx = \Phi \bb
\]

where \(\bb \in \CC^D\) is a vector of coefficients to produce a superposition \(\bx\) from the atoms of dictionary \(\bDDD\). Clearly with \(D > N\), \(\bb\) is not unique. Rather, for every vector \(\bz \in \NullSpace(\Phi)\), we have:

\[
\bx = \Phi (\bb + \bz).
\]
(Synthesis matrix)
The matrix \(\Phi\) is called a synthesis matrix since \(\bx\) is synthesized from the columns of \(\Phi\) with the coefficient vector \(\bb\).
We can also view the synthesis matrix \(\Phi\) as a linear operator from \(\CC^D\) to \(\CC^N\).
There is another way to look at \(\bx\) through \(\Phi\).
(Analysis matrix)
The conjugate transpose \(\Phi^H\) of the synthesis matrix \(\Phi\) is called the analysis matrix. It maps a given vector \(\bx \in \CC^N\) to a list of inner products with the dictionary:

\[
\bc = \Phi^H \bx
\]

where \(\bc \in \CC^D\) with entries \(c_k = \phi_{\omega_k}^H \bx\).
Note that in general \(\bx \neq \Phi (\Phi^H \bx)\) unless \(\bDDD\) is an orthonormal basis.
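The following NumPy sketch (all names are our own) illustrates the synthesis and analysis views, the non-uniqueness of coefficients when \(D > N\), and the fact that analysis followed by synthesis does not reproduce \(\bx\) for an overcomplete dictionary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 4, 10

# Random complex dictionary: columns are unit-norm atoms (N x D).
Phi = rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))
Phi /= np.linalg.norm(Phi, axis=0)

b = rng.standard_normal(D)        # a coefficient vector in C^D
x = Phi @ b                       # synthesis: C^D -> C^N
c = Phi.conj().T @ x              # analysis:  C^N -> C^D

# For an overcomplete dictionary, x != Phi(Phi^H x) in general.
print(np.allclose(Phi @ c, x))    # False (almost surely)

# Coefficients are not unique: add any vector from NullSpace(Phi).
_, _, Vh = np.linalg.svd(Phi)
z = Vh[N:].conj().T @ rng.standard_normal(D - N)  # null-space vector
print(np.allclose(Phi @ (b + z), x))              # True
```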
(\((\bDDD, K)\)-EXACT-SPARSE problem)
With the help of the synthesis matrix \(\Phi\), the \((\bDDD, K)\)-EXACT-SPARSE problem can now be written as

\[
\begin{aligned}
& \underset{\bb \in \CC^D}{\text{minimize}} & & \| \bb \|_0 \\
& \text{subject to } & & \bx = \Phi \bb \\
& \text{and } & & \| \bb \|_0 \leq K.
\end{aligned}
\]

If \(\bx \notin \Sigma_K\), then the EXACT-SPARSE problem is infeasible. Otherwise, we are looking for the sparsest possible solution.
(\((\bDDD, K)\)-SPARSE approximation problem)
With the help of the synthesis matrix \(\Phi\), the \((\bDDD, K)\)-SPARSE approximation problem can now be written as

\[
\begin{aligned}
& \underset{\bb \in \CC^D}{\text{minimize}} & & \| \bx - \Phi \bb \|_2 \\
& \text{subject to } & & \| \bb \|_0 \leq K.
\end{aligned}
\]

This problem can be visualized as a projection of \(\bx\) onto the set \(\Sigma_K\); hence, it always has a solution.
18.5.5. P-Norms#
This section lists some simple and useful results on relationships between different \(\ell_p\) norms. We also discuss some interesting properties of the \(\ell_1\) norm specifically.
(Complex sign vector)
Let \(\bv \in \CC^N\). Let the entries in \(\bv\) be represented as

\[
v_k = r_k e^{i \theta_k} \quad \Forall 1 \leq k \leq N
\]

where \(r_k = | v_k |\), with the convention that \(\theta_k = 0\) whenever \(r_k = 0\).
The sign vector for \(\bv\), denoted by \(\sgn(\bv)\), is defined as

\[
\sgn(\bv) = \begin{bmatrix} \sgn(v_1) & \dots & \sgn(v_N) \end{bmatrix}^T
\]

where

\[
\sgn(v_k) = \begin{cases} e^{i \theta_k} & \text{if } r_k \neq 0; \\ 0 & \text{if } r_k = 0. \end{cases}
\]
(\(\ell_1\) norm as product of vector with its sign)
For any \(\bv \in \CC^N\):

\[
\| \bv \|_1 = \sgn(\bv)^H \bv.
\]

Proof. This follows from:

\[
\sgn(\bv)^H \bv = \sum_{k=1}^{N} \overline{\sgn(v_k)} v_k = \sum_{k : r_k \neq 0} e^{-i \theta_k} r_k e^{i \theta_k} = \sum_{k=1}^{N} r_k = \| \bv \|_1.
\]

Note that whenever \(v_k = 0\), the corresponding \(0\) entry in \(\sgn(\bv)\) has no effect on the sum.
(Equivalence of \(\ell_1\) and \(\ell_2\) norms)
Suppose \(\bv \in \CC^N\). Then

\[
\| \bv \|_2 \leq \| \bv \|_1 \leq \sqrt{N} \| \bv \|_2.
\]

Proof. For the lower bound, we go as follows:

\[
\| \bv \|_1^2 = \left ( \sum_{k=1}^{N} | v_k | \right )^2 \geq \sum_{k=1}^{N} | v_k |^2 = \| \bv \|_2^2.
\]

This gives us

\[
\| \bv \|_2 \leq \| \bv \|_1.
\]

We can write the \(\ell_1\) norm as

\[
\| \bv \|_1 = \sgn(\bv)^H \bv.
\]

By the Cauchy-Schwarz inequality we have

\[
\sgn(\bv)^H \bv \leq \| \sgn(\bv) \|_2 \| \bv \|_2.
\]

Since \(\sgn(\bv)\) can have at most \(N\) non-zero values, each with magnitude 1,

\[
\| \sgn(\bv) \|_2 \leq \sqrt{N}.
\]

Thus, we get

\[
\| \bv \|_1 \leq \sqrt{N} \| \bv \|_2.
\]
(Equivalence of \(\ell_2\) and \(\ell_{\infty}\) norms)
Let \(\bv \in \CC^N\). Then

\[
\| \bv \|_{\infty} \leq \| \bv \|_2 \leq \sqrt{N} \| \bv \|_{\infty}.
\]

Proof. This follows from:

\[
\| \bv \|_{\infty}^2 = \max_{k} | v_k |^2 \leq \sum_{k=1}^{N} | v_k |^2 = \| \bv \|_2^2 \leq N \max_{k} | v_k |^2 = N \| \bv \|_{\infty}^2.
\]

Thus

\[
\| \bv \|_{\infty} \leq \| \bv \|_2 \leq \sqrt{N} \| \bv \|_{\infty}.
\]
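Both norm equivalences, together with the sign identity above, are easy to sanity-check numerically. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)

l1, l2, linf = (np.linalg.norm(v, ord=p) for p in (1, 2, np.inf))
sgn = v / np.abs(v)   # safe here: Gaussian entries are nonzero a.s.

assert np.isclose(sgn.conj() @ v, l1)   # ||v||_1 = sgn(v)^H v
assert l2 <= l1 <= np.sqrt(N) * l2      # l1 - l2 equivalence
assert linf <= l2 <= np.sqrt(N) * linf  # l2 - l_inf equivalence
```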
(Relationship between \(p\)-norms)
Let \(\bv \in \CC^N\) and let \(1 \leq p \leq q \leq \infty\). Then

\[
\| \bv \|_q \leq \| \bv \|_p \leq N^{\frac{1}{p} - \frac{1}{q}} \| \bv \|_q.
\]
Proof. TBD
Let \(\bone \in \CC^N\) be the vector of all ones; i.e., \(\bone = (1, \dots, 1)\). Let \(\bv \in \CC^N\) be an arbitrary vector, and let \(| \bv |\) denote the vector of absolute values of entries in \(\bv\); i.e., \(|v|_i = |v_i| \Forall 1 \leq i \leq N\). Then

\[
\bone^T | \bv | = \bone^H | \bv | = \| \bv \|_1.
\]

Proof. This follows from:

\[
\bone^T | \bv | = \sum_{i=1}^{N} | v_i | = \| \bv \|_1.
\]

Finally, since \(\bone\) consists only of real entries, its transpose and Hermitian transpose are the same.
Let \(\OneMat \in \CC^{N \times N}\) be the square matrix of all ones, and let \(\bv \in \CC^N\) be an arbitrary vector. Then

\[
\OneMat | \bv | = \| \bv \|_1 \bone.
\]

Proof. We know that

\[
\OneMat = \bone \bone^T.
\]

Thus,

\[
\OneMat | \bv | = \bone \bone^T | \bv | = \bone \| \bv \|_1 = \| \bv \|_1 \bone.
\]

We used the fact that \(\| \bv \|_1 = \bone^T | \bv |\).
(An upper bound on the \(k\)-th largest value)
The \(k\)-th largest (magnitude) entry in a vector \(\bx \in \CC^N\), denoted by \(x_{(k)}\), obeys

\[
| x_{(k)} | \leq \frac{\| \bx \|_1}{k}.
\]

Proof. Let \(n_1, n_2, \dots, n_N\) be a permutation of \(\{ 1, 2, \dots, N \}\) such that

\[
| x_{n_1} | \geq | x_{n_2} | \geq \dots \geq | x_{n_N} |.
\]

Thus, the \(k\)-th largest entry in \(\bx\) is \(x_{(k)} = x_{n_k}\). Each of the first \(k\) entries in this ordering has magnitude at least \(| x_{n_k} |\); hence

\[
k | x_{n_k} | \leq \sum_{j=1}^{k} | x_{n_j} |.
\]

Obviously

\[
\sum_{j=1}^{k} | x_{n_j} | \leq \sum_{j=1}^{N} | x_{n_j} | = \| \bx \|_1.
\]

Thus

\[
| x_{(k)} | = | x_{n_k} | \leq \frac{\| \bx \|_1}{k}.
\]
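A quick numerical check of this bound (sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(50)

# The k-th largest magnitude is x_sorted[k-1] after a descending sort.
x_sorted = np.sort(np.abs(x))[::-1]
k = np.arange(1, x.size + 1)
assert np.all(x_sorted <= np.linalg.norm(x, 1) / k)
```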
18.5.6. Sparse Signals#
In this subsection we explore some useful properties of \(\Sigma_K\), the set of \(K\)-sparse signals in the standard basis for \(\CC^N\).
We recall that

\[
\Sigma_K = \{ \bx \in \CC^N : \| \bx \|_0 \leq K \}.
\]

We established before that this set is a union of \(\binom{N}{K}\) subspaces of \(\CC^N\), each of which is constructed by an index set \(\Lambda \subset \{1, \dots, N \}\) with \(| \Lambda | = K\) choosing \(K\) specific dimensions of \(\CC^N\).
We first present some results which connect the \(\ell_1\), \(\ell_2\) and \(\ell_{\infty}\) norms of vectors in \(\Sigma_K\).
(Relation between norms of sparse vectors)
Suppose \(\bu \in \Sigma_K\). Then

\[
\frac{\| \bu \|_1}{\sqrt{K}} \leq \| \bu \|_2 \leq \sqrt{K} \| \bu \|_{\infty}.
\]

Proof. Due to Theorem 18.10, we can write the \(\ell_1\) norm as

\[
\| \bu \|_1 = \sgn(\bu)^H \bu.
\]

By the Cauchy-Schwarz inequality we have

\[
\sgn(\bu)^H \bu \leq \| \sgn(\bu) \|_2 \| \bu \|_2.
\]

Since \(\bu \in \Sigma_K\), \(\sgn(\bu)\) can have at most \(K\) non-zero values, each with magnitude 1. Thus, we have

\[
\| \sgn(\bu) \|_2 \leq \sqrt{K}.
\]

Thus we get the lower bound

\[
\| \bu \|_1 \leq \sqrt{K} \| \bu \|_2 \iff \frac{\| \bu \|_1}{\sqrt{K}} \leq \| \bu \|_2.
\]

Now \(| u_i | \leq \max(| u_i |) = \| \bu \|_{\infty}\). So we have

\[
\| \bu \|_2^2 = \sum_{i : u_i \neq 0} | u_i |^2 \leq K \| \bu \|_{\infty}^2
\]

since there are only \(K\) non-zero terms in the expansion of \(\| \bu \|_2^2\). This establishes the upper bound:

\[
\| \bu \|_2 \leq \sqrt{K} \| \bu \|_{\infty}.
\]
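A sketch verifying these bounds on a random \(K\)-sparse vector:

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 100, 8

# Build a random K-sparse vector in C^N.
u = np.zeros(N, dtype=complex)
support = rng.choice(N, size=K, replace=False)
u[support] = rng.standard_normal(K) + 1j * rng.standard_normal(K)

l1, l2, linf = (np.linalg.norm(u, ord=p) for p in (1, 2, np.inf))
assert l1 / np.sqrt(K) <= l2 <= np.sqrt(K) * linf
```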
18.5.7. Compressible Signals#
In this subsection, we first look at some general results and definitions related to \(K\)-term approximations of arbitrary signals \(\bx \in \CC^N\). We then define the notion of a compressible signal and study properties related to it.
18.5.7.1. K-term Approximation of General Signals#
(Restriction of a signal)
Let \(\bx \in \CC^N\). Let \(T \subset \{ 1, 2, \dots, N\}\) be any index set. Further let

\[
T = \{ t_1, t_2, \dots, t_{|T|} \}
\]

such that

\[
t_1 < t_2 < \dots < t_{|T|}.
\]

Let \(\bx_T \in \CC^{|T|}\) be defined as

\[
(\bx_T)_i = x_{t_i} \quad \Forall 1 \leq i \leq |T|.
\]

Then \(\bx_T\) is a restriction of the signal \(\bx\) on the index set \(T\).
Alternatively, let \(\bx_T \in \CC^N\) be defined as

\[
(\bx_T)_i = \begin{cases} x_i & \text{if } i \in T; \\ 0 & \text{otherwise}. \end{cases}
\]

In other words, \(\bx_T \in \CC^N\) keeps the entries in \(\bx\) indexed by \(T\) while setting all other entries to 0. Then we say that \(\bx_T\) is obtained by masking \(\bx\) with \(T\).
As an abuse of notation, we will use either of the two definitions whenever we refer to \(\bx_T\). The definition being used should be obvious from the context.
(Restrictions on index sets)
Let

\[
\bx = (-1, 5, 8, 0, 0, -3, 0, 0, 0, 0) \in \CC^{10}
\]

and let \(T = \{ 1, 3, 6, 8 \}\). Then the masked version is

\[
\bx_T = (-1, 0, 8, 0, 0, -3, 0, 0, 0, 0) \in \CC^{10}.
\]

Since \(|T| = 4\), sometimes we will also write the restricted version

\[
\bx_T = (-1, 8, -3, 0) \in \CC^4.
\]
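In NumPy terms (0-based indices; values as in the example above), the restriction is fancy indexing while the mask is a scatter into zeros. A sketch:

```python
import numpy as np

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=float)
T = [0, 2, 5, 7]                # 0-based version of T = {1, 3, 6, 8}

x_restricted = x[T]             # restriction: a vector in C^|T|
x_masked = np.zeros_like(x)     # mask: a vector in C^N
x_masked[T] = x[T]

print(x_restricted)             # [-1.  8. -3.  0.]
print(x_masked)                 # [-1.  0.  8.  0.  0. -3.  0.  0.  0.  0.]
```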
(\(K\)-term signal approximation)
Let \(\bx \in \CC^N\) be an arbitrary signal. Consider any index set \(T \subset \{1, \dots, N \}\) with \(|T| = K\). Then \(\bx_T\) is a \(K\)-term approximation of \(\bx\).
Clearly for any \(\bx \in \CC^N\) there are \(\binom{N}{K}\) possible \(K\)-term approximations of \(\bx\).
(\(K\)-term approximation)
Let

\[
\bx = (-1, 5, 8, 0, 0, -3, 0, 0, 0, 0) \in \CC^{10}.
\]

Let \(T = \{ 1, 6 \}\). Then

\[
\bx_T = (-1, 0, 0, 0, 0, -3, 0, 0, 0, 0)
\]

is a \(2\)-term approximation of \(\bx\).
If we choose \(T = \{7, 8, 9, 10\}\), the corresponding \(4\)-term approximation of \(\bx\) is

\[
\bx_T = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0).
\]
(\(K\)-largest entries approximation)
Let \(\bx \in \CC^N\) be an arbitrary signal. Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(\bx\) such that

\[
| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.
\]

In case of ties, the order is resolved lexicographically; i.e., if \(|x_i| = |x_j|\) and \(i < j\), then \(i\) appears first in the sequence \(\{ \lambda_k \}\).
Consider the index set \(\Lambda_K = \{ \lambda_1, \lambda_2, \dots, \lambda_K\}\). The masking of \(\bx\) with \(\Lambda_K\), given by \(\bx_{\Lambda_K}\), keeps the \(K\) largest entries of \(\bx\) while setting all other entries to 0. This is known as the \(K\)-largest entries approximation of \(\bx\).
This signal is denoted henceforth as \(\bx|_K\); i.e.,

\[
\bx|_K = \bx_{\Lambda_K}
\]

where \(\Lambda_K\) is the index set corresponding to the \(K\) largest entries in \(\bx\) (magnitude wise).
(Largest entries approximation)
Let

\[
\bx = (-1, 5, 8, 0, 0, -3, 0, 0, 0, 0).
\]

Then

\[
\bx|_1 = (0, 0, 8, 0, 0, 0, 0, 0, 0, 0),
\]

\[
\bx|_2 = (0, 5, 8, 0, 0, 0, 0, 0, 0, 0),
\]

\[
\bx|_3 = (0, 5, 8, 0, 0, -3, 0, 0, 0, 0),
\]

\[
\bx|_4 = (-1, 5, 8, 0, 0, -3, 0, 0, 0, 0) = \bx.
\]

All further \(K\)-largest entries approximations are the same as \(\bx\).
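Computing \(\bx|_K\) amounts to a stable descending sort on magnitudes; a minimal sketch (the helper name is ours):

```python
import numpy as np


def largest_entries_approx(x, K):
    """Return x|_K: keep the K largest-magnitude entries, zero the rest.

    A stable sort on negated magnitudes reproduces the lexicographic
    tie-breaking convention used in the text."""
    idx = np.argsort(-np.abs(x), kind="stable")[:K]
    out = np.zeros_like(x)
    out[idx] = x[idx]
    return out


x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=float)
print(largest_entries_approx(x, 2))  # [0. 5. 8. 0. 0. 0. 0. 0. 0. 0.]
```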
A pertinent question at this point is: which \(K\)-term approximation of \(\bx\) is the best one? Certainly, in order to compare two approximations, we need some criterion. Let us choose the \(\ell_p\) norm as the criterion. The next result identifies the best \(K\)-term approximation in the \(\ell_p\) norm sense.
(Best \(K\)-term approximation for \(\ell_p\) norms)
Let \(\bx \in \CC^N\). Let the best \(K\)-term approximation of \(\bx\) be obtained by the following optimization program:

\[
\underset{T \subset \{1, \dots, N\}, \; |T| = K}{\text{maximize}} \;\; \| \bx_T \|_p \tag{18.25}
\]

where \(p \in [1, \infty]\). (For finite \(p\) and the mask version of \(\bx_T\), minimizing the error \(\| \bx - \bx_T \|_p\) is equivalent to maximizing \(\| \bx_T \|_p\), since \(\| \bx - \bx_T \|_p^p = \| \bx \|_p^p - \| \bx_T \|_p^p\).)
Let an optimal solution of this optimization problem be denoted by \(\bx_{T^*}\). Then

\[
\| \bx_{T^*} \|_p = \| \bx|_K \|_p;
\]

i.e., the \(K\)-largest entries approximation of \(\bx\) is an optimal solution to (18.25).
Proof. For \(p=\infty\), the result is obvious. In the following, we focus on \(p \in [1, \infty)\).
We note that maximizing \(\| \bx_T \|_p\) is equivalent to maximizing \(\| \bx_T \|_p^p\).
Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(\bx\) such that

\[
| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.
\]

Further, let \(\{ \omega_1, \dots, \omega_N\}\) be any permutation of \(\{1, \dots, N \}\). Clearly

\[
\sum_{k=1}^{K} | x_{\omega_k} |^p \leq \sum_{k=1}^{K} | x_{\lambda_k} |^p = \| \bx|_K \|_p^p.
\]

Thus if \(T^*\) corresponds to an optimal solution of (18.25), then

\[
\| \bx_{T^*} \|_p^p = \| \bx|_K \|_p^p.
\]

Thus \(\bx|_K\) is an optimal solution to (18.25).
This result establishes that whenever we look for a best \(K\)-term approximation of \(\bx\) under any \(\ell_p\) norm, all we have to do is pick the \(K\) largest entries (by magnitude) in \(\bx\).
(Restriction of a matrix)
Let \(\Phi \in \CC^{M \times N}\). Let \(T \subset \{ 1, 2, \dots, N\}\) be any index set. Further let

\[
T = \{ t_1, t_2, \dots, t_{|T|} \}
\]

such that

\[
t_1 < t_2 < \dots < t_{|T|}.
\]

Let \(\Phi_T \in \CC^{M \times |T|}\) be defined as

\[
\Phi_T = \begin{bmatrix} \phi_{t_1} & \phi_{t_2} & \dots & \phi_{t_{|T|}} \end{bmatrix}
\]

where \(\phi_i\) denotes the \(i\)-th column of \(\Phi\). Then \(\Phi_T\) is a restriction of the matrix \(\Phi\) on the index set \(T\).
Alternatively, let \(\Phi_T \in \CC^{M \times N}\) be defined by its columns:

\[
(\Phi_T)_i = \begin{cases} \phi_i & \text{if } i \in T; \\ \bzero & \text{otherwise}. \end{cases}
\]

In other words, \(\Phi_T \in \CC^{M \times N}\) keeps the columns of \(\Phi\) indexed by \(T\) while setting all other columns to \(\bzero\). Then we say that \(\Phi_T\) is obtained by masking \(\Phi\) with \(T\).
As an abuse of notation, we will use either of the two definitions whenever we refer to \(\Phi_T\). The definition being used should be obvious from the context.
Let \(\supp(\bx) = \Lambda\). Then

\[
\Phi \bx = \Phi_{\Lambda} \bx_{\Lambda}.
\]

Proof. This follows from:

\[
\Phi \bx = \sum_{i=1}^{N} x_i \phi_i = \sum_{i \in \Lambda} x_i \phi_i = \Phi_{\Lambda} \bx_{\Lambda}.
\]

The result remains valid whether we use the restriction or the mask version of the \(\bx_{\Lambda}\) notation, as long as the same version is used for both \(\Phi\) and \(\bx\).
Let \(S\) and \(T\) be two disjoint index sets such that for some \(\bx \in \CC^N\)

\[
\bx = \bx_S + \bx_T
\]

using the mask version of the \(\bx_{\Lambda}\) notation. Then the following holds:

\[
\Phi \bx = \Phi_S \bx_S + \Phi_T \bx_T.
\]

Proof. Straightforward application of Theorem 18.19:

\[
\Phi \bx = \Phi (\bx_S + \bx_T) = \Phi \bx_S + \Phi \bx_T = \Phi_S \bx_S + \Phi_T \bx_T.
\]
Let \(T\) be any index set. Let \(\Phi \in \CC^{M \times N}\) and \(\by \in \CC^M\). Then

\[
(\Phi^H \by)_T = \Phi_T^H \by.
\]

Proof. Note that

\[
\Phi^H \by = \begin{bmatrix} \phi_1^H \by & \phi_2^H \by & \dots & \phi_N^H \by \end{bmatrix}^T.
\]

Now let \(T = \{ t_1, t_2, \dots, t_{|T|} \}\). Then

\[
(\Phi^H \by)_T = \begin{bmatrix} \phi_{t_1}^H \by & \dots & \phi_{t_{|T|}}^H \by \end{bmatrix}^T = \Phi_T^H \by.
\]

The result remains valid whether we use the restriction or the mask version of the \(\Phi_T\) notation.
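These restriction identities are easy to verify numerically for real matrices (where \(\Phi^H = \Phi^T\)); a sketch:

```python
import numpy as np

rng = np.random.default_rng(11)
M, N = 6, 12
Phi = rng.standard_normal((M, N))
y = rng.standard_normal(M)

Lam = [1, 4, 9]                 # support of a 3-sparse x (0-based)
x = np.zeros(N)
x[Lam] = rng.standard_normal(3)

# Phi x = Phi_Lam x_Lam (restriction version).
assert np.allclose(Phi @ x, Phi[:, Lam] @ x[Lam])
# (Phi^H y)_T = Phi_T^H y, with Phi^H = Phi^T for real Phi.
assert np.allclose((Phi.T @ y)[Lam], Phi[:, Lam].T @ y)
```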
18.5.7.2. Compressible Signals#
We will now define the notion of a compressible signal in terms of the decay rate of the magnitudes of its entries when sorted in descending order.
(Compressible signal)
Let \(\bx \in \CC^N\) be an arbitrary signal. Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(\bx\) such that

\[
| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.
\]

In case of ties, the order is resolved lexicographically; i.e., if \(|x_i| = |x_j|\) and \(i < j\), then \(i\) appears first in the sequence \(\{ \lambda_k \}\). Define

\[
\widehat{x}_k = x_{\lambda_k} \quad \Forall 1 \leq k \leq N. \tag{18.28}
\]

The signal \(\bx\) is called \(p\)-compressible with magnitude \(R\) if there exists \(p \in (0, 1]\) such that

\[
| \widehat{x}_k | \leq R \cdot k^{-\frac{1}{p}} \quad \Forall 1 \leq k \leq N. \tag{18.29}
\]
(\(1\)-compressible signals)
Let \(\bx\) be \(p\)-compressible with \(p=1\). Then

\[
\| \bx \|_1 \leq R (1 + \ln N).
\]

Proof. Recalling \(\widehat{x}\) from (18.28), it is straightforward to see that

\[
\| \bx \|_1 = \sum_{k=1}^{N} | \widehat{x}_k |
\]

since the \(\ell_1\) norm doesn't depend on the ordering of entries in \(\bx\).
Now, since \(\bx\) is \(1\)-compressible, from (18.29) we have

\[
| \widehat{x}_k | \leq R k^{-1}.
\]

This gives us

\[
\| \bx \|_1 \leq \sum_{k=1}^{N} R k^{-1} = R \sum_{k=1}^{N} \frac{1}{k}.
\]

The sum on the R.H.S. is the \(N\)-th Harmonic number (sum of reciprocals of the first \(N\) natural numbers). A simple upper bound on Harmonic numbers is

\[
H_N \leq 1 + \ln N.
\]

This completes the proof.
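A sketch checking this bound on the worst-case \(1\)-compressible signal, with \(|\widehat{x}_k| = R/k\) exactly:

```python
import numpy as np

N, R = 1000, 5.0
k = np.arange(1, N + 1)
x = R / k                        # worst case allowed by (18.29) for p = 1

assert np.linalg.norm(x, 1) <= R * (1 + np.log(N))
print(np.linalg.norm(x, 1), R * (1 + np.log(N)))  # ~37.43 <= ~39.54
```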
We now demonstrate how a compressible signal is well approximated by a sparse signal.
(Sparse approximation of compressible signals)
Let \(\bx\) be a \(p\)-compressible signal with \(p \in (0, 1)\), and let \(\bx|_K\) be its best \(K\)-term approximation. Then the \(\ell_1\) norm of the approximation error satisfies

\[
\| \bx - \bx|_K \|_1 \leq C_p R K^{1 - \frac{1}{p}}
\]

with

\[
C_p = \left ( \frac{1}{p} - 1 \right )^{-1}.
\]

Moreover, the \(\ell_2\) norm of the approximation error satisfies

\[
\| \bx - \bx|_K \|_2 \leq D_p R K^{\frac{1}{2} - \frac{1}{p}}
\]

with

\[
D_p = \left ( \frac{2}{p} - 1 \right )^{-\frac{1}{2}}.
\]
Proof. Expanding the \(\ell_1\) approximation error:

\[
\| \bx - \bx|_K \|_1 = \sum_{k=K+1}^{N} | \widehat{x}_k | \leq R \sum_{k=K+1}^{N} k^{-\frac{1}{p}}.
\]

We now bound the R.H.S. sum with an integral:

\[
\sum_{k=K+1}^{N} k^{-\frac{1}{p}} \leq \int_{K}^{\infty} t^{-\frac{1}{p}} d t = \left [ \frac{t^{1 - \frac{1}{p}}}{1 - \frac{1}{p}} \right ]_{K}^{\infty}.
\]

Now, since \(1 - \frac{1}{p} < 0\) for \(p \in (0, 1)\),

\[
\left [ \frac{t^{1 - \frac{1}{p}}}{1 - \frac{1}{p}} \right ]_{K}^{\infty} = \left ( \frac{1}{p} - 1 \right )^{-1} K^{1 - \frac{1}{p}} = C_p K^{1 - \frac{1}{p}}.
\]

We can similarly show the result for the \(\ell_2\) norm.
(Sparse approximation for \(\frac{1}{2}\)-compressible signals)
Let \(p = \frac{1}{2}\). Then

\[
C_p = \left ( \frac{1}{p} - 1 \right )^{-1} = 1 \quad \text{and} \quad D_p = \left ( \frac{2}{p} - 1 \right )^{-\frac{1}{2}} = \frac{1}{\sqrt{3}}.
\]

Hence

\[
\| \bx - \bx|_K \|_1 \leq R K^{-1}
\]

and

\[
\| \bx - \bx|_K \|_2 \leq \frac{R}{\sqrt{3}} K^{-\frac{3}{2}}.
\]

Both \(\ell_1\) and \(\ell_2\) approximation error bounds decrease polynomially as \(K\) increases, with the \(\ell_2\) error decaying faster (as \(K^{-3/2}\) versus \(K^{-1}\)).
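A sketch comparing the two decay rates on a synthetic \(\frac{1}{2}\)-compressible signal with \(|\widehat{x}_k| = R k^{-2}\):

```python
import numpy as np

N, R, p = 10_000, 1.0, 0.5
k = np.arange(1, N + 1)
x = R * k ** (-1.0 / p)          # already sorted in decreasing order

for K in (10, 100, 1000):
    tail = x[K:]                 # entries discarded by x|_K
    err1 = np.linalg.norm(tail, 1)
    err2 = np.linalg.norm(tail, 2)
    print(K, err1 <= R / K, err2 <= R / np.sqrt(3) * K ** -1.5)
    # Both bounds hold; err2 shrinks faster than err1 as K grows.
```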