22.1. Introduction
When designing a dictionary for a particular application, we have several options [70]. On the one hand, we can go through the long list of analytically constructed or tunable dictionaries and select one of them as suitable for the application at hand. On the other hand, we can take a number of real example signals from our application and try to construct a dictionary that is optimized for them.
Dictionary learning (DL) [30, 75] attempts to solve the problem of constructing a dictionary directly from a set of example signals.
The atoms of a learnt dictionary come from the empirical data of a training set of example signals for the specific application. While analytically constructed dictionaries are typically meant only for specific applications, the learning method allows one to construct dictionaries for any family of signals which are amenable to the sparse and redundant representations model. This flexibility comes at a cost: learnt dictionaries have to be held completely in memory since they lack any exploitable structure, and thus they do not admit efficient implementations of the analysis and synthesis operations.
We start by formalizing the notation for dictionary learning.
We consider a set of $S$ example signals $\mathbf{x}_1, \dots, \mathbf{x}_S \in \mathbb{R}^N$ put together in a signal matrix $\mathbf{X} \in \mathbb{R}^{N \times S}$. Consider a dictionary $\mathbf{D} \in \mathbb{R}^{N \times D}$ with $D$ atoms. Let $\mathbf{a}_i$ be the sparsest possible representation of $\mathbf{x}_i$ in $\mathbf{D}$ with $\|\mathbf{a}_i\|_0 \le K$. We put all the $\mathbf{a}_i$ together in a representation matrix $\mathbf{A} \in \mathbb{R}^{D \times S}$. Then we have

$$
\mathbf{X} = \mathbf{D} \mathbf{A} + \mathbf{E},
$$

where $\mathbf{E}$ represents the approximation error. We are looking for the best dictionary from the set of possible dictionaries such that we are able to get sparse representations of $\mathbf{X}$ with low approximation error. We can quantify the notion of a good approximation by putting an upper bound on the norm of the approximation error as $\|\mathbf{E}\|_F \le \epsilon$. Combining these ideas, we introduce the notion of a sparse signal model denoted as $(\mathbf{D}, K, \epsilon)$, which consists of a dictionary $\mathbf{D}$ providing $K$-sparse representations for a class of signals (from which the example signals are drawn) with an upper bound $\epsilon$ on the approximation error. The DL problem tries to learn the best model $(\mathbf{D}, K, \epsilon)$ based on the example signals $\mathbf{X}$.
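To make the notation concrete, here is a small sketch in Python/numpy that generates a synthetic data set following this model; the particular dimensions ($N = 64$, $D = 128$, $S = 500$, $K = 4$) are arbitrary choices made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, S, K = 64, 128, 500, 4          # signal length, number of atoms, examples, sparsity

# A dictionary with unit-norm atoms (columns).
Dict = rng.standard_normal((N, D))
Dict /= np.linalg.norm(Dict, axis=0)

# K-sparse representations: each column of A has exactly K nonzero entries.
A = np.zeros((D, S))
for i in range(S):
    support = rng.choice(D, size=K, replace=False)
    A[support, i] = rng.standard_normal(K)

# Example signals with a small error term E so that X = D A + E.
E = 0.01 * rng.standard_normal((N, S))
X = Dict @ A + E

print(np.linalg.norm(E, 'fro'))       # the quantity bounded by epsilon in the model
```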
We can formulate the DL problem as an optimization problem as follows:

$$
\underset{\mathbf{D}, \mathbf{A}}{\text{minimize}} \; \sum_{i=1}^{S} \|\mathbf{a}_i\|_0
\quad \text{subject to} \quad \|\mathbf{X} - \mathbf{D} \mathbf{A}\|_F \le \epsilon.
\tag{22.1}
$$

In this version, we are trying to minimize the total sparsity of $\mathbf{A}$ while using the upper bound on the approximation error as the optimization constraint. We are not enforcing the sparsity constraint $\|\mathbf{a}_i\|_0 \le K$.

Alternatively, we can also write:

$$
\underset{\mathbf{D}, \mathbf{A}}{\text{minimize}} \; \|\mathbf{X} - \mathbf{D} \mathbf{A}\|_F^2
\quad \text{subject to} \quad \|\mathbf{a}_i\|_0 \le K \;\; \text{for } i = 1, \dots, S.
\tag{22.2}
$$

In this version we are trying to minimize the approximation error while keeping $K$-sparsity as the optimization constraint. We are not enforcing the constraint $\|\mathbf{X} - \mathbf{D} \mathbf{A}\|_F \le \epsilon$.

Unfortunately, no computationally tractable algorithm exists to solve either of these optimization problems exactly.
An alternative is to consider a heuristic iterative approach presented in Algorithm 22.1.
Algorithm 22.1 (Dictionary learning: iterative approach)

Initialization:

1. Choose an initial dictionary $\mathbf{D}$.
2. Set the iteration counter $k \leftarrow 0$.

Repeat until a halting criterion is met:

1. Set $k \leftarrow k + 1$.
2. Sparse coding: obtain representations $\mathbf{A}$ for $\mathbf{X}$ with the given $\mathbf{D}$ and sparsity level $K$ using some sparse approximation algorithm.
3. Dictionary update: obtain a new dictionary $\mathbf{D}$ from the signals $\mathbf{X}$ and their representations $\mathbf{A}$.

(A concrete sketch of this loop is given after the initialization options below.)
Some possible halting criteria are:
- Stop after a fixed number of iterations.
- Stop when the approximation error $\|\mathbf{x}_i - \mathbf{D} \mathbf{a}_i\|_2$ falls below $\epsilon$ for every example.
- Stop when the approximation errors stop improving.
For initialization of the algorithm, we have a few options:

- We can start with a randomly generated dictionary.
- We can select $D$ examples from $\mathbf{X}$ and put them together as our starting dictionary.
- We can start with some analytical dictionary suitable for the application domain.
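To make the flow of Algorithm 22.1 concrete, here is a minimal sketch in numpy. The function names (`omp`, `learn_dictionary`) and the `update_dictionary` callback are illustrative choices rather than a fixed API; the sparse coding step uses a basic orthogonal matching pursuit, and the loop combines two of the halting criteria listed above (a fixed iteration budget and lack of improvement in the approximation error). The `update_dictionary` callback is meant to be one of the method-specific updates described in the next subsection.

```python
import numpy as np

def omp(x, Dict, K):
    """Basic orthogonal matching pursuit: a K-sparse representation of x in Dict."""
    residual = x.copy()
    support = []
    a = np.zeros(Dict.shape[1])
    for _ in range(K):
        j = int(np.argmax(np.abs(Dict.T @ residual)))       # atom most correlated with residual
        support.append(j)
        coeffs, *_ = np.linalg.lstsq(Dict[:, support], x, rcond=None)
        residual = x - Dict[:, support] @ coeffs
    a[support] = coeffs
    return a

def learn_dictionary(X, D0, K, update_dictionary, max_iter=30, tol=1e-6):
    """Alternate sparse coding and dictionary update (skeleton of Algorithm 22.1)."""
    Dict = D0.copy()
    prev_err = np.inf
    for _ in range(max_iter):                                # halting: fixed iteration budget
        A = np.column_stack([omp(x, Dict, K) for x in X.T])  # sparse coding step
        Dict, A = update_dictionary(X, Dict, A)              # dictionary update step (MOD, K-SVD, ...)
        err = np.linalg.norm(X - Dict @ A, 'fro')
        if prev_err - err < tol:                             # halting: errors stop improving
            break
        prev_err = err
    return Dict, A
```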
22.1.1. Dictionary Learning Methods
Two popular algorithms implementing this approach are MOD (Method of Optimal Directions) and K-SVD.
22.1.1.1. Method Of Optimal Directions
In MOD [34], the dictionary update is formulated as the least squares problem

$$
\mathbf{D} = \underset{\mathbf{D}}{\text{arg min}} \; \|\mathbf{X} - \mathbf{D} \mathbf{A}\|_F^2.
$$

A straightforward least squares solution is obtained as

$$
\mathbf{D} = \mathbf{X} \mathbf{A}^T \left( \mathbf{A} \mathbf{A}^T \right)^{-1} = \mathbf{X} \mathbf{A}^{\dagger},
$$

where $\mathbf{A}^{\dagger}$ is the Moore-Penrose pseudo-inverse of $\mathbf{A}$.
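A possible implementation of this update is sketched below; the function name `mod_update` is illustrative and matches the loop sketched earlier, and the atom renormalization step is an addition (the plain least squares solution does not keep atoms at unit norm).

```python
def mod_update(X, Dict, A):
    """MOD dictionary update: least squares solution D = X A^+ (pseudo-inverse of A)."""
    Dict = X @ np.linalg.pinv(A)
    Dict /= np.linalg.norm(Dict, axis=0)   # renormalize atoms to unit norm
    return Dict, A
```

For example, `learn_dictionary(X, D0, K, mod_update)` would run the MOD variant of the iterative scheme sketched above.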
22.1.1.2. K-SVD
K-SVD [1] is slightly different. Rather than recomputing the whole dictionary $\mathbf{D}$ at once by solving the LS problem, it updates the atoms of $\mathbf{D}$ one at a time.
Let $j$ be the index of the atom being updated. Consider the error matrix

$$
\mathbf{E}_j = \mathbf{X} - \sum_{i \ne j} \mathbf{d}_i \mathbf{a}^i,
$$

where $\mathbf{a}^i$ refers to the $i$-th row of $\mathbf{A}$; i.e. the entries for atom $\mathbf{d}_i$ in every example. $\mathbf{E}_j$ is the approximation error when the atom $\mathbf{d}_j$ (and the corresponding entries in $\mathbf{A}$) has been dropped from consideration. We can see that

$$
\mathbf{X} - \mathbf{D} \mathbf{A} = \mathbf{E}_j - \mathbf{d}_j \mathbf{a}^j.
$$

Hence

$$
\|\mathbf{X} - \mathbf{D} \mathbf{A}\|_F^2 = \|\mathbf{E}_j - \mathbf{d}_j \mathbf{a}^j\|_F^2.
$$
On the L.H.S. we have the total approximation error. On the R.H.S. we have the same error expressed in terms of $\mathbf{d}_j$ and $\mathbf{a}^j$, where $\mathbf{E}_j$ does not depend on $\mathbf{d}_j$. An optimal pair $(\mathbf{d}_j, \mathbf{a}^j)$ is one which minimizes the R.H.S. It can easily be obtained by a rank-1 approximation of $\mathbf{E}_j$. The rank-1 approximation gives us both the new atom $\mathbf{d}_j$ as well as the entries in the $j$-th row of $\mathbf{A}$ given by $\mathbf{a}^j$.

There is a small catch though. In general, the rank-1 approximation can lead to a dense $\mathbf{a}^j$, meaning the atom $\mathbf{d}_j$ gets used in many signal representations. We wish to avoid that. For this, we identify the representations in which the atom $\mathbf{d}_j$ appears, and let them be indexed by $\Omega_j$. We then restrict $\mathbf{E}_j$ to the signals indexed by $\Omega_j$, giving $\mathbf{E}_j^{\Omega}$, so that the update cannot produce a dense $\mathbf{a}^j$. The rank-1 approximation can easily be obtained using the singular value decomposition.
We perform the SVD

$$
\mathbf{E}_j^{\Omega} = \mathbf{U} \Sigma \mathbf{V}^T.
$$

We then pick $\mathbf{d}_j = \mathbf{u}_1$, the first left singular vector, as the new update for the atom. Further, $\sigma_1 \mathbf{v}_1^T$ becomes the update for the entries of $\mathbf{a}^j$ restricted to $\Omega_j$; the rest of $\mathbf{a}^j$ is left with zero entries. We repeat the process for each of the atoms in $\mathbf{D}$ to obtain the next update of $\mathbf{D}$.
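A possible implementation of this atom-by-atom update is sketched below using numpy's SVD; the function name `ksvd_update` is illustrative and matches the loop sketched earlier. Unused atoms (empty $\Omega_j$) are simply skipped here, though a practical implementation might re-initialize them with a poorly represented signal.

```python
def ksvd_update(X, Dict, A):
    """K-SVD dictionary update: refine atoms one at a time via rank-1 SVD approximations."""
    for j in range(Dict.shape[1]):
        omega = np.nonzero(A[j, :])[0]             # signals whose representation uses atom j
        if omega.size == 0:
            continue                               # atom unused; skip it
        # Error with atom j removed, restricted to the signals indexed by omega.
        Ej = X[:, omega] - Dict @ A[:, omega] + np.outer(Dict[:, j], A[j, omega])
        U, s, Vt = np.linalg.svd(Ej, full_matrices=False)
        Dict[:, j] = U[:, 0]                       # new atom: first left singular vector
        A[j, omega] = s[0] * Vt[0, :]              # updated row entries on the support omega
    return Dict, A
```

As with `mod_update`, this function can be passed as the `update_dictionary` callback of the loop sketched earlier, e.g. `learn_dictionary(X, D0, K, ksvd_update)`.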