12.3. Proximal Mappings and Operators#
Throughout this section, $f : \mathbb{R}^n \to (-\infty, \infty]$ denotes an extended real-valued function, $\| \cdot \|$ denotes the Euclidean norm, and $\langle \cdot, \cdot \rangle$ denotes the standard inner product on $\mathbb{R}^n$.
12.3.1. Proximal Mapping#
Definition 12.1 (Proximal mapping)
For a function $f : \mathbb{R}^n \to (-\infty, \infty]$, the proximal mapping of $f$ is the point to set mapping given by

$$\operatorname{prox}_f(x) = \underset{u \in \mathbb{R}^n}{\operatorname{argmin}} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\}.$$

It is a point to set mapping. It maps each point $x \in \mathbb{R}^n$ to the subset of points in $\mathbb{R}^n$ which minimize the R.H.S. The set of points which minimize the R.H.S. are known as proximal points for a given $x$ w.r.t. the function $f$. The set of proximal points may be empty, a singleton, or have more than one point.

The function

$$g(u) = f(u) + \frac{1}{2} \| u - x \|^2$$

is the objective function for the computation of the proximal points for a given point $x$. The proximal points of $x$ are the minimizers of the problem of unconstrained minimization of $g$.
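The minimization defining the proximal mapping can also be carried out numerically. The sketch below is an illustration only: the helper name `prox_num`, the solver settings and the test function $f(u) = \|u\|^2$ are choices made for this example. For this $f$, the objective $\|u\|^2 + \frac{1}{2}\|u - x\|^2$ has gradient $3u - x$, so the proximal point should be $x/3$.

```python
import numpy as np
from scipy.optimize import minimize

def prox_num(f, x):
    """Numerically approximate prox_f(x) = argmin_u f(u) + 0.5 * ||u - x||^2."""
    x = np.asarray(x, dtype=float)
    objective = lambda u: f(u) + 0.5 * np.sum((u - x) ** 2)
    # Nelder-Mead also copes with nonsmooth (finite-valued) objectives.
    result = minimize(objective, x0=x, method="Nelder-Mead",
                      options={"xatol": 1e-10, "fatol": 1e-12})
    return result.x

f = lambda u: np.sum(u ** 2)            # f(u) = ||u||^2
x = np.array([1.5, -2.0, 0.5])
p = prox_num(f, x)
print(p, x / 3.0)                        # the two should agree closely
assert np.allclose(p, x / 3.0, atol=1e-5)
```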
12.3.1.1. Zero Function#
Example 12.1 (Zero function)
Consider $f : \mathbb{R}^n \to \mathbb{R}$ given by $f(x) = 0$ for every $x$. Then

$$\operatorname{prox}_f(x) = \underset{u}{\operatorname{argmin}} \left\{ \frac{1}{2}\|u - x\|^2 \right\} = \{ x \}.$$

This is a very simple example. The proximal point for every point $x$ is the same point $x$.

Note that the gradient of the zero function is zero everywhere. So every point in the domain of $f$ is a minimizer of $f$; the quadratic term $\frac{1}{2}\|u - x\|^2$ then selects the minimizer closest to $x$, which is $x$ itself.
12.3.1.2. Zero Everywhere Except at One Point#
Let us look at real functions which are zero everywhere except at $x = 0$.
Example 12.2 (Negative value at $x = 0$)

Let $\lambda > 0$ and let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(x) = \begin{cases} -\lambda & x = 0 \\ 0 & x \neq 0. \end{cases}$$

The proximal mapping is given by

$$\operatorname{prox}_f(x) = \begin{cases} \{0\} & |x| < \sqrt{2\lambda} \\ \{0, x\} & |x| = \sqrt{2\lambda} \\ \{x\} & |x| > \sqrt{2\lambda}. \end{cases}$$

This example clearly demonstrates the point to set mapping nature of the proximal mapping. While for most values of $x$ the proximal point is unique, for $|x| = \sqrt{2\lambda}$ there are two proximal points.

Proof. We start by constructing the objective function

$$g(u) = f(u) + \frac{1}{2}(u - x)^2.$$

Consider the case where $x \neq 0$.

- For $u \neq 0$, $g(u) = \frac{1}{2}(u - x)^2$. The minimum value of $0$ is attained at $u = x$ and it is valid since $x \neq 0$.
- The value of $g(0) = -\lambda + \frac{1}{2}x^2$ is attained at $u = 0$.
- There are three possibilities.
- If $-\lambda + \frac{1}{2}x^2 > 0$, then the unique minimizer of $g$ is at $u = x$. The condition can be written as $x^2 > 2\lambda$ and $|x| > \sqrt{2\lambda}$.
- If $-\lambda + \frac{1}{2}x^2 < 0$, then the unique minimizer of $g$ is at $u = 0$. The condition can be written as $|x| < \sqrt{2\lambda}$.
- If $-\lambda + \frac{1}{2}x^2 = 0$, then $u = 0$ and $u = x$ are both minimizers of $g$. The condition can be written as $|x| = \sqrt{2\lambda}$.

Now the case of $x = 0$.

- The term $\frac{1}{2}(u - x)^2$ reduces to $\frac{1}{2}u^2$ at $x = 0$.
- The term $g(u)$ reduces to $\frac{1}{2}u^2 > 0 > -\lambda = g(0)$ for all $u \neq 0$.
- Thus, the minimizer is at $u = 0$.
- We note that the minimizer agrees with the result obtained for the case of $x \neq 0$ above where $|x| < \sqrt{2\lambda}$.

In conclusion, we see that

- For $|x| < \sqrt{2\lambda}$, the minimizer is $0$.
- For $|x| > \sqrt{2\lambda}$, the minimizer is $x$.
- For $|x| = \sqrt{2\lambda}$, there are two minimizers, $0$ and $x$.
Example 12.3 (Positive value at $x = 0$)

Let $\lambda > 0$ and let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(x) = \begin{cases} \lambda & x = 0 \\ 0 & x \neq 0. \end{cases}$$

The proximal mapping is given by

$$\operatorname{prox}_f(x) = \begin{cases} \{x\} & x \neq 0 \\ \varnothing & x = 0. \end{cases}$$

This example illustrates the fact that the proximal mapping may be empty in some cases. In other words, for some points, there are no proximal points.

Proof. We start by constructing the function

$$g(u) = f(u) + \frac{1}{2}(u - x)^2.$$

Consider the case where $x \neq 0$.

- For $u \neq 0$, $g(u) = \frac{1}{2}(u - x)^2$. The minimum value of $0$ is attained at $u = x$ and it is valid since $x \neq 0$.
- The term $g(0) = \lambda + \frac{1}{2}x^2 > 0$ for $x \neq 0$.
- Thus the minimizer is at $u = x$ for all $x \neq 0$.

Now consider the case where $x = 0$.

- The function $g$ reduces to
  $$g(u) = \begin{cases} \lambda & u = 0 \\ \frac{1}{2}u^2 & u \neq 0. \end{cases}$$
- We can see that $\inf_u g(u) = 0$.
- However, $g$ doesn't attain the value $0$ for any $u$.
- Hence, the set of minimizers is empty.

In conclusion, this function $g$

- has a unique minimizer $u = x$ for all $x \neq 0$;
- has no minimizer for $x = 0$.
12.3.1.3. Constant Value Function#
Example 12.4 (Constant value function)
Let $f(x) = c$ for every $x \in \mathbb{R}^n$, where $c \in \mathbb{R}$ is a constant. The objective function

$$g(u) = c + \frac{1}{2}\|u - x\|^2$$

is minimized at $u = x$. Hence

$$\operatorname{prox}_f(x) = \{ x \} \quad \forall x \in \mathbb{R}^n.$$

The constant offset has no effect; the proximal mapping is the same as that of the zero function.
12.3.1.4. 1D Linear Function I#
Example 12.5 (1D Linear function I)
Let $f(x) = x$ for every $x \in \mathbb{R}$.

Define:

$$g(u) = u + \frac{1}{2}(u - x)^2.$$

Differentiating, we get:

$$g'(u) = 1 + u - x.$$

Setting $g'(u) = 0$, we get $u = x - 1$.

The second derivative $g''(u) = 1 > 0$ confirms that this is the minimizer.

Thus,

$$\operatorname{prox}_f(x) = x - 1.$$
12.3.1.5. 1D Linear Function II#
Example 12.6 (1D Linear function II)
Let $f(x) = \lambda x$ for every $x \in \mathbb{R}$, where $\lambda \in \mathbb{R}$.

Define:

$$g(u) = \lambda u + \frac{1}{2}(u - x)^2.$$

Differentiating, we get:

$$g'(u) = \lambda + u - x.$$

Setting $g'(u) = 0$, we get $u = x - \lambda$.

The second derivative $g''(u) = 1 > 0$ confirms that this is the minimizer.

Thus,

$$\operatorname{prox}_f(x) = x - \lambda.$$
12.3.1.6. 1D Affine Function#
Example 12.7 (1D Affine function)
Let $f(x) = \lambda x + c$ for every $x \in \mathbb{R}$, where $\lambda, c \in \mathbb{R}$.

Define:

$$g(u) = \lambda u + c + \frac{1}{2}(u - x)^2.$$

Differentiating, we get:

$$g'(u) = \lambda + u - x.$$

Setting $g'(u) = 0$, we get $u = x - \lambda$.

The second derivative $g''(u) = 1 > 0$ confirms that this is the minimizer.

Thus,

$$\operatorname{prox}_f(x) = x - \lambda.$$

The constant term $c$ has no effect on the proximal operator.
12.3.1.7. Nonemptiness Conditions#
It is imperative to identify conditions under which a proximal mapping is nonempty.
Theorem 12.1 (Nonemptiness under closedness and coerciveness)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper and closed function. Assume that the function

$$g(u) = f(u) + \frac{1}{2}\|u - x\|^2$$

is coercive for any $x \in \mathbb{R}^n$.

Then the set $\operatorname{prox}_f(x)$ is nonempty and compact for any $x \in \mathbb{R}^n$.

Recall from Definition 8.8 that a function $g : \mathbb{R}^n \to (-\infty, \infty]$ is called coercive if $g(u) \to \infty$ whenever $\|u\| \to \infty$.

Proof. Define:

$$g(u) = f(u) + \frac{1}{2}\|u - x\|^2.$$

Then $\operatorname{prox}_f(x) = \underset{u}{\operatorname{argmin}}\, g(u)$.

- $g$ is a sum of two closed functions. Hence $g$ is a closed function due to Theorem 3.95.
- It is given that $g$ is coercive.
- $g$ is proper since $f$ is proper and $\frac{1}{2}\|u - x\|^2$ is real valued.
- Thus, $g$ is proper, closed and coercive.
- Due to the Weierstrass theorem (Theorem 8.9), a proper, closed and coercive function has a nonempty and compact set of minimizers.
12.3.2. Proximal Operator#
Under suitable conditions, the proximal mapping of $f$ is a singleton at every point, and it can then be treated as a function from $\mathbb{R}^n$ to $\mathbb{R}^n$, called the proximal operator.

Theorem 12.2 (First prox theorem)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper, closed and convex function. Then $\operatorname{prox}_f(x)$ is a singleton for every $x \in \mathbb{R}^n$.

Proof. Define:

$$g(u) = f(u) + \frac{1}{2}\|u - x\|^2.$$

Then $\operatorname{prox}_f(x) = \underset{u}{\operatorname{argmin}}\, g(u)$.

- $f$ is a closed and convex function.
- $\frac{1}{2}\|u - x\|^2$ is a closed and strongly convex function (Theorem 9.268).
- Hence, their sum $g$ is a closed and strongly convex function (Theorem 9.270).
- Since $f$ is proper and $\frac{1}{2}\|u - x\|^2$ is real valued, hence $g$ is also proper.
- Thus, $g$ is a proper, closed and strongly convex function.
- Due to Theorem 9.272, there exists a unique minimizer for $g$.
- Thus, the set $\operatorname{prox}_f(x)$ is a singleton.

With this result, we are ready to introduce the proximal operator.

Definition 12.2 (Proximal operator)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper, closed and convex function. The proximal operator of $f$ is the function $\operatorname{prox}_f : \mathbb{R}^n \to \mathbb{R}^n$ which maps every $x \in \mathbb{R}^n$ to the unique minimizer

$$\operatorname{prox}_f(x) = \underset{u \in \mathbb{R}^n}{\operatorname{argmin}} \left\{ f(u) + \frac{1}{2}\|u - x\|^2 \right\}.$$

In other words, we write $y = \operatorname{prox}_f(x)$ where $y$ is the unique proximal point of $x$ w.r.t. $f$.

Thus, for a proper, closed and convex function $f$, the proximal mapping at every point is a singleton, and we identify the singleton with its unique element; whether $\operatorname{prox}_f(x)$ denotes the set or the point will be clear from the context.

Readers are encouraged to look at some of the examples in the Examples section below before proceeding with the proximal calculus rules.
12.3.3. Proximal Calculus#
This section deals with the rules which enable us to compute the proximal mapping of a function if we know the proximal mappings of its underlying building blocks.
12.3.3.1. Separable Functions#
Theorem 12.3 (Proximal mappings of separable functions)
Suppose that $f : \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \dots \times \mathbb{R}^{n_m} \to (-\infty, \infty]$ is separable; i.e., there exist proper functions $f_i : \mathbb{R}^{n_i} \to (-\infty, \infty]$ such that

$$f(x_1, \dots, x_m) = \sum_{i=1}^m f_i(x_i).$$

Then for every $x = (x_1, \dots, x_m) \in \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_m}$,

$$\operatorname{prox}_f(x_1, \dots, x_m) = \operatorname{prox}_{f_1}(x_1) \times \dots \times \operatorname{prox}_{f_m}(x_m).$$

Note that the theorem is presented in terms of sets. The L.H.S. is a subset of $\mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_m}$ while the R.H.S. is a Cartesian product of the sets $\operatorname{prox}_{f_i}(x_i)$.

Proof. We start with the definition of the proximal mapping and expand it in terms of the separable functions.

$$\operatorname{prox}_f(x) = \underset{u}{\operatorname{argmin}} \left\{ f(u) + \frac{1}{2}\|u - x\|^2 \right\} = \underset{u_1, \dots, u_m}{\operatorname{argmin}} \left\{ \sum_{i=1}^m f_i(u_i) + \frac{1}{2} \sum_{i=1}^m \|u_i - x_i\|^2 \right\}.$$

We note that the quadratic term also splits across the blocks: $\|u - x\|^2 = \sum_{i=1}^m \|u_i - x_i\|^2$. Here the objective is a sum of $m$ terms, the $i$-th of which depends only on $u_i$. Hence the minimization can be carried out independently over each block, which gives the stated Cartesian product.
Corollary 12.1 (Proximal operator for separable convex functions)
Let $f : \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_m} \to (-\infty, \infty]$ be separable:

$$f(x_1, \dots, x_m) = \sum_{i=1}^m f_i(x_i).$$

Let each $f_i : \mathbb{R}^{n_i} \to (-\infty, \infty]$ be a proper, closed and convex function.

Further assume that $f$ is proper, closed and convex (which holds since each $f_i$ is).

Then for every $x = (x_1, \dots, x_m)$,

$$\operatorname{prox}_f(x_1, \dots, x_m) = \left( \operatorname{prox}_{f_1}(x_1), \dots, \operatorname{prox}_{f_m}(x_m) \right).$$

In the special case where $n_1 = \dots = n_m = 1$ and $f_1 = \dots = f_m = g$ for a proper, closed and convex $g : \mathbb{R} \to (-\infty, \infty]$,

then

$$[\operatorname{prox}_f(x)]_i = \operatorname{prox}_g(x_i), \quad i = 1, \dots, m.$$

Proof. Since each $f_i$ is proper, closed and convex, $\operatorname{prox}_{f_i}(x_i)$ is a singleton for every $x_i$ (Theorem 12.2). By Theorem 12.3, $\operatorname{prox}_f(x)$ is the Cartesian product of these singletons; identifying the product of singletons with the vector of their unique elements gives the stated formula.
Applications: Example 12.16.
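As a quick numerical sanity check of the separability rule, the sketch below compares a jointly computed proximal point with the concatenation of coordinate-wise proximal points. The helper name `prox_num`, the separable test function $f(u) = \sum_i u_i^4$ and the solver settings are illustrative choices, not part of the theorem.

```python
import numpy as np
from scipy.optimize import minimize

def prox_num(f, x):
    """Numerically approximate prox_f(x) for a finite-valued f."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    obj = lambda u: f(u) + 0.5 * np.sum((u - x) ** 2)
    return minimize(obj, x0=x, method="Nelder-Mead",
                    options={"xatol": 1e-12, "fatol": 1e-14}).x

f = lambda u: np.sum(u ** 4)             # separable across coordinates

x = np.array([1.0, -0.7, 2.3])
joint = prox_num(f, x)                   # prox computed on the whole vector
coordwise = np.array([prox_num(f, np.array([xi]))[0] for xi in x])
print(joint)
print(coordwise)                          # should match the joint computation
assert np.allclose(joint, coordwise, atol=1e-4)
```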
12.3.3.2. Scaling and Translation#
Theorem 12.4 (Scaling and translation of input)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper function. Let $\lambda \neq 0$ and $a \in \mathbb{R}^n$. Define $g(x) = f(\lambda x + a)$.

Then given $x \in \mathbb{R}^n$,

$$\operatorname{prox}_g(x) = \frac{1}{\lambda} \left[ \operatorname{prox}_{\lambda^2 f}(\lambda x + a) - a \right].$$

Proof. Starting from the definition of the proximal mapping,

$$\operatorname{prox}_g(x) = \underset{u}{\operatorname{argmin}} \left\{ f(\lambda u + a) + \frac{1}{2}\|u - x\|^2 \right\}.$$

The objective function of this minimization problem is $h(u) = f(\lambda u + a) + \frac{1}{2}\|u - x\|^2$.

- We introduce a change of variable $z = \lambda u + a$ to construct an equivalent minimization problem. See Proposition 8.2 for the construction of equivalent optimization problems by change of variable.
- Then $u = \frac{1}{\lambda}(z - a)$.
- The objective function changes to
  $$\tilde{h}(z) = f(z) + \frac{1}{2}\left\| \frac{z - a}{\lambda} - x \right\|^2 = \frac{1}{\lambda^2} \left[ \lambda^2 f(z) + \frac{1}{2}\| z - (\lambda x + a) \|^2 \right].$$
- The minimizers of this objective function (over $z$) are given by
  $$\hat{z} \in \operatorname{prox}_{\lambda^2 f}(\lambda x + a).$$
- Note that the scaling term $\frac{1}{\lambda^2}$ in the objective function doesn't impact the set of minimizers. It only impacts the optimal value.
- Since $u = \frac{1}{\lambda}(z - a)$, hence minimizers of the original objective function are given by
  $$\operatorname{prox}_g(x) = \frac{1}{\lambda}\left[ \operatorname{prox}_{\lambda^2 f}(\lambda x + a) - a \right].$$
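The identity can be checked numerically without knowing any closed-form proximal operator. In the sketch below, the base function $f(t) = t^4$ and the constants $\lambda$, $a$ and the query point $x$ are arbitrary choices; both sides of the identity are evaluated with a generic one-dimensional minimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_num(f, x):
    """Numerically approximate prox_f(x) for a scalar x."""
    obj = lambda u: f(u) + 0.5 * (u - x) ** 2
    return minimize_scalar(obj).x

f = lambda t: t ** 4                    # base function
lam, a, x = 2.0, 0.5, 1.3               # scaling, translation, query point (arbitrary)

g = lambda t: f(lam * t + a)            # g(t) = f(lam * t + a)
lhs = prox_num(g, x)
rhs = (prox_num(lambda t: lam ** 2 * f(t), lam * x + a) - a) / lam
print(lhs, rhs)                          # the two sides of Theorem 12.4 agree
assert np.isclose(lhs, rhs, atol=1e-6)
```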
Theorem 12.5 (Proximal mapping for $\lambda f(\cdot / \lambda)$)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper function and let $\lambda > 0$. Define $g(x) = \lambda f(x / \lambda)$.

Then

$$\operatorname{prox}_g(x) = \lambda \operatorname{prox}_{f / \lambda}\left( \frac{x}{\lambda} \right).$$

Proof. Starting from the definition of the proximal mapping,

$$\operatorname{prox}_g(x) = \underset{u}{\operatorname{argmin}} \left\{ \lambda f(u / \lambda) + \frac{1}{2}\|u - x\|^2 \right\}.$$

The objective function is $h(u) = \lambda f(u / \lambda) + \frac{1}{2}\|u - x\|^2$.

- Introduce a change of variable $z = u / \lambda$.
- Then $u = \lambda z$.
- The objective function changes to
  $$\tilde{h}(z) = \lambda f(z) + \frac{1}{2}\|\lambda z - x\|^2 = \lambda^2 \left[ \frac{1}{\lambda} f(z) + \frac{1}{2}\left\| z - \frac{x}{\lambda} \right\|^2 \right].$$
- The minimizers of this objective function are given by $\hat{z} \in \operatorname{prox}_{f/\lambda}(x/\lambda)$.
- The minimizers of the original objective function are then given by
  $$\operatorname{prox}_g(x) = \lambda \operatorname{prox}_{f/\lambda}\left( \frac{x}{\lambda} \right).$$
12.3.3.3. Quadratic Perturbation#
Theorem 12.6 (Quadratic perturbation)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper function. Define

$$g(x) = f(x) + \frac{c}{2}\|x\|^2 + \langle a, x \rangle + \gamma,$$

where $c \geq 0$, $a \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$. Then

$$\operatorname{prox}_g(x) = \operatorname{prox}_{\frac{1}{c+1} f}\left( \frac{x - a}{c + 1} \right).$$

Proof. Starting from the definition of the proximal mapping,

$$\operatorname{prox}_g(x) = \underset{u}{\operatorname{argmin}} \left\{ f(u) + \frac{c}{2}\|u\|^2 + \langle a, u \rangle + \gamma + \frac{1}{2}\|u - x\|^2 \right\}.$$

Completing the square in $u$,

$$\frac{c}{2}\|u\|^2 + \langle a, u \rangle + \frac{1}{2}\|u - x\|^2 = \frac{c+1}{2}\left\| u - \frac{x - a}{c+1} \right\|^2 + \text{constant}.$$

Since the quantities that do not depend on $u$ (the constant above and $\gamma$) do not affect the set of minimizers,

$$\operatorname{prox}_g(x) = \underset{u}{\operatorname{argmin}} \left\{ \frac{1}{c+1} f(u) + \frac{1}{2}\left\| u - \frac{x-a}{c+1} \right\|^2 \right\} = \operatorname{prox}_{\frac{1}{c+1} f}\left( \frac{x - a}{c + 1} \right).$$
Applications: Example 12.10.
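A similar numerical check works for the quadratic perturbation rule. Again, the base function $f(t) = t^4$ and the constants $c$, $a$, $\gamma$ and the query point $x$ below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_num(f, x):
    """Numerically approximate prox_f(x) for a scalar x."""
    obj = lambda u: f(u) + 0.5 * (u - x) ** 2
    return minimize_scalar(obj).x

f = lambda t: t ** 4
c, a, gamma = 1.5, 0.3, 2.0             # quadratic, linear and constant perturbation
x = -0.8

g = lambda t: f(t) + 0.5 * c * t ** 2 + a * t + gamma
lhs = prox_num(g, x)
rhs = prox_num(lambda t: f(t) / (c + 1.0), (x - a) / (c + 1.0))
print(lhs, rhs)                          # both sides of Theorem 12.6
assert np.isclose(lhs, rhs, atol=1e-6)
```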
12.3.3.4. Composition with Affine Mapping#
Theorem 12.7 (Composition with affine mapping)
Let $f : \mathbb{R}^m \to (-\infty, \infty]$ be a proper, closed and convex function. Define

$$g(x) = f(A x + b),$$

where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ are such that

$$A A^T = \alpha I_m$$

for some constant $\alpha > 0$.

Then for any $x \in \mathbb{R}^n$,

$$\operatorname{prox}_g(x) = x + \frac{1}{\alpha} A^T \left( \operatorname{prox}_{\alpha f}(A x + b) - A x - b \right).$$

Proof. Starting from the definition of the proximal mapping,

$$\operatorname{prox}_g(x) = \underset{u \in \mathbb{R}^n}{\operatorname{argmin}} \left\{ f(A u + b) + \frac{1}{2}\|u - x\|^2 \right\}.$$

- Introduce a variable $z = A u + b$.
- The optimization problem transforms to
  $$\min_{u, z} \ f(z) + \frac{1}{2}\|u - x\|^2 \quad \text{subject to } z = A u + b.$$
- Since $A A^T = \alpha I$ with $\alpha > 0$, the rows of $A$ are linearly independent and $A$ is surjective; hence $g$ is proper. $g$ is also closed and convex, being the composition of a closed convex function with an affine mapping. Hence, by Theorem 12.2, this problem has a unique solution.
- Let the optimal solution be given by $(\hat{u}, \hat{z})$.
- For a fixed $z$, the optimal $u$ is the projection of $x$ onto the affine set $\{ u : A u + b = z \}$, namely
  $$u(z) = x + \frac{1}{\alpha} A^T (z - A x - b),$$
  and since $\|A^T v\|^2 = v^T A A^T v = \alpha \|v\|^2$, the corresponding value of the quadratic term is $\frac{1}{2}\|u(z) - x\|^2 = \frac{1}{2\alpha}\|z - A x - b\|^2$.
- Minimizing $f(z) + \frac{1}{2\alpha}\|z - A x - b\|^2$ over $z$, or equivalently $\alpha f(z) + \frac{1}{2}\|z - (A x + b)\|^2$, gives $\hat{z} = \operatorname{prox}_{\alpha f}(A x + b)$.
- Substituting $\hat{z}$ back into $u(z)$ yields
  $$\operatorname{prox}_g(x) = \hat{u} = x + \frac{1}{\alpha} A^T \left( \operatorname{prox}_{\alpha f}(A x + b) - A x - b \right).$$
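The affine composition rule can also be verified numerically on a small instance. In the sketch below, the matrix $A$ (chosen so that $A A^T = 2 I$), the vector $b$, the smooth test function $f$ and the query point are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def prox_num(f, x):
    """Numerically approximate prox_f(x) for a finite-valued f."""
    x = np.asarray(x, dtype=float)
    obj = lambda u: f(u) + 0.5 * np.sum((u - x) ** 2)
    return minimize(obj, x0=x, method="Nelder-Mead",
                    options={"xatol": 1e-12, "fatol": 1e-14, "maxiter": 20000}).x

A = np.array([[1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0]])         # A @ A.T = 2 * I, so alpha = 2
b = np.array([0.2, -0.5])
alpha = 2.0
f = lambda z: np.sum(z ** 4)             # a smooth convex f on R^2

g = lambda x: f(A @ x + b)               # g(x) = f(Ax + b)
x = np.array([0.7, -1.1, 0.4])

lhs = prox_num(g, x)
z_hat = prox_num(lambda u: alpha * f(u), A @ x + b)
rhs = x + (1.0 / alpha) * A.T @ (z_hat - A @ x - b)
print(lhs)
print(rhs)                               # both sides of Theorem 12.7
assert np.allclose(lhs, rhs, atol=1e-4)
```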
12.3.4. Examples#
The remainder of this section is dedicated to the computation of proximal mappings or operators for different functions. Most of these functions are convex (with a few exceptions). The proximal mappings will be computed either from first principles (by solving the minimization problem defining the proximal mapping) or by using the proximal calculus rules developed above.

Some key ideas will be used repeatedly in the computation of the proximal operators in this section. Assume that $g(u) = f(u) + \frac{1}{2}\|u - x\|^2$ denotes the objective function for the proximal mapping of $f$ at a point $x$.

- If $g'(u) = 0$ for some $u$ at which $g$ is differentiable, then $u$ must be one of its minimizers (Theorem 10.49).
- If a minimizer of $g$ exists and is not attained at any point of differentiability, then it must be attained at a point of nondifferentiability (Theorem 10.50).
- Since for a proper, closed and convex function $f$ the existence of a unique proximal point is guaranteed, the minimizer of the objective function $g$ exists.
- Then the minimizer of $g$ is either at a point of differentiability where the derivative vanishes or at the boundary of the domain where $g$ is nondifferentiable.
12.3.5. 1-dim Examples#
12.3.5.1. Indicator Functions#
Example 12.8 (Indicator function for an interval)
Let $a < b$ and let $f = \delta_{[a, b]}$ be the indicator function of the interval $[a, b]$:

$$f(x) = \begin{cases} 0 & x \in [a, b] \\ \infty & \text{otherwise.} \end{cases}$$

The proximal operator is given by

$$\operatorname{prox}_f(x) = P_{[a,b]}(x) = \min \{ \max \{x, a\}, b \} = \begin{cases} a & x < a \\ x & a \leq x \leq b \\ b & x > b. \end{cases}$$

Proof. We construct the objective function

$$g(u) = f(u) + \frac{1}{2}(u - x)^2 = \begin{cases} \frac{1}{2}(u - x)^2 & u \in [a, b] \\ \infty & \text{otherwise.} \end{cases}$$

Let $\hat{u}$ denote the minimizer of $g$. Let $h$ denote the function $h(u) = \frac{1}{2}(u - x)^2$.

First consider the case where $x \in [a, b]$.

- Then $g = h$ over $[a, b]$ and $g = \infty$ otherwise.
- $h'(u) = u - x$. $h'(x) = 0$. The boundary points are $a$ and $b$.
- The minimizer of $h$ over $[a, b]$ is $u = x$.
- Therefore if $x \in [a, b]$, then $\hat{u} = x$.

For the cases where $x \notin [a, b]$, the minimizer must be one of the boundary points.

- If $x < a$, then $h$ is an increasing function over $[a, b]$.
- Hence for $x < a$, $\hat{u} = a$.
- If $x > b$, then $h$ is a decreasing function over $[a, b]$.
- Hence for $x > b$, $\hat{u} = b$.

Thus, if $x \notin [a, b]$, then the proximal operator is given by

$$\operatorname{prox}_f(x) = \begin{cases} a & x < a \\ b & x > b. \end{cases}$$

In other words, for $x \notin [a, b]$, $\operatorname{prox}_f(x) = \min\{ \max\{x, a\}, b\}$.

Combining these two cases, we obtain $\operatorname{prox}_f(x) = P_{[a,b]}(x) = \min\{\max\{x, a\}, b\}$ for all $x$, as claimed.
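Numerically, the proximal operator of the interval indicator is just clipping. The snippet below is a small illustrative check (the interval endpoints, the test points and the brute-force grid resolution are arbitrary choices of this example): it compares `np.clip` with a grid search over the prox objective restricted to $[a, b]$.

```python
import numpy as np

def prox_indicator_interval(x, a, b):
    """prox of the indicator of [a, b]: the projection (clipping) onto [a, b]."""
    return np.clip(x, a, b)

a, b = -1.0, 2.0
grid = np.linspace(a, b, 200001)         # dense grid over [a, b] for a brute-force check

for x in (-3.0, -1.0, 0.4, 2.0, 5.7):
    brute = grid[np.argmin(0.5 * (grid - x) ** 2)]
    assert abs(prox_indicator_interval(x, a, b) - brute) < 1e-4
    print(x, prox_indicator_interval(x, a, b))
```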
12.3.5.2. Linear#
Example 12.9 (Linear over $\mathbb{R}_+$)

Let $\mu \in \mathbb{R}$ and

$$f(x) = \begin{cases} \mu x & x \geq 0 \\ \infty & x < 0. \end{cases}$$

The proximal operator is given by

$$\operatorname{prox}_f(x) = [x - \mu]_+ = \max\{ x - \mu, 0 \}.$$

Proof. The objective function is

$$g(u) = \begin{cases} \mu u + \frac{1}{2}(u - x)^2 & u \geq 0 \\ \infty & u < 0. \end{cases}$$

- $g$ is a proper convex function with $\operatorname{dom} g = [0, \infty)$.
- $g$ is differentiable over $(0, \infty)$.
- $g'(u) = \mu + u - x$ for all $u > 0$.
- Setting it to zero, we get $u = x - \mu$.
- Thus, if $x - \mu > 0$, then the minimizer is $x - \mu$.
- Otherwise, $g$ obtains its minimum value at $u = 0$, which is the only point of nondifferentiability in its domain.
- Thus, if $x - \mu \leq 0$, then the minimizer is $0$.

Combining the two cases, $\operatorname{prox}_f(x) = \max\{x - \mu, 0\} = [x - \mu]_+$.
Example 12.10 (Linear over an interval $[a, b]$)

Let $\mu \in \mathbb{R}$, $a < b$, and

$$f(x) = \begin{cases} \mu x & x \in [a, b] \\ \infty & \text{otherwise.} \end{cases}$$

The proximal operator is given by

$$\operatorname{prox}_f(x) = P_{[a,b]}(x - \mu) = \min\{ \max\{ x - \mu, a \}, b \}.$$

Proof. We note that $f(x) = \delta_{[a,b]}(x) + \mu x$.

From Example 12.8, the proximal operator for the indicator function is given by

$$\operatorname{prox}_{\delta_{[a,b]}}(x) = P_{[a,b]}(x) = \min\{\max\{x, a\}, b\}.$$

We can write $f$ as a quadratic perturbation of $\delta_{[a,b]}$ given by

$$f(x) = \delta_{[a,b]}(x) + \frac{0}{2} x^2 + \mu x + 0.$$

Following Theorem 12.6 (with $c = 0$, linear coefficient $\mu$ and $\gamma = 0$),

$$\operatorname{prox}_f(x) = \operatorname{prox}_{\delta_{[a,b]}}(x - \mu) = P_{[a,b]}(x - \mu).$$
12.3.5.3. Absolute Value#
Example 12.11 (Scaled absolute value)
Let $\lambda > 0$ and $f(x) = \lambda |x|$.

The proximal operator is given by

$$\operatorname{prox}_f(x) = \operatorname{sgn}(x) [|x| - \lambda]_+ = \begin{cases} x - \lambda & x > \lambda \\ 0 & |x| \leq \lambda \\ x + \lambda & x < -\lambda. \end{cases}$$

Proof. The objective function is

$$g(u) = \lambda |u| + \frac{1}{2}(u - x)^2.$$

- $g$ is differentiable everywhere except at $u = 0$.
- At $u \neq 0$, $g'(u) = \lambda \operatorname{sgn}(u) + u - x$.
- If the minimizer is obtained at $u > 0$, then $g'(u) = \lambda + u - x = 0$, giving us $u = x - \lambda$.
- Thus, a minimizer at $u = x - \lambda > 0$ is attained if $x > \lambda$.
- If the minimizer is obtained at $u < 0$, then $g'(u) = -\lambda + u - x = 0$, giving us $u = x + \lambda$.
- Thus, a minimizer at $u = x + \lambda < 0$ is attained if $x < -\lambda$.
- Consequently, if $-\lambda \leq x \leq \lambda$, then the minimizer of $g$ must be at the only point of nondifferentiability, namely $u = 0$.

This gives us

$$\operatorname{prox}_f(x) = \begin{cases} x - \lambda & x > \lambda \\ 0 & -\lambda \leq x \leq \lambda \\ x + \lambda & x < -\lambda. \end{cases}$$

- The conditions $x > \lambda$ and $x < -\lambda$ can be combined as $|x| > \lambda$, with $\operatorname{prox}_f(x) = \operatorname{sgn}(x)(|x| - \lambda)$ in both cases.
- The condition $-\lambda \leq x \leq \lambda$ simplifies to $|x| \leq \lambda$.
- We can see that the three cases for $\operatorname{prox}_f(x)$ can be simplified to the expression

$$\operatorname{prox}_f(x) = \operatorname{sgn}(x) \max\{ |x| - \lambda, 0 \} = \operatorname{sgn}(x) [|x| - \lambda]_+.$$
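The resulting formula is easy to implement and to check. The snippet below is illustrative only; the helper name `soft_threshold`, the value of $\lambda$, the test points and the brute-force grid are choices made for this example.

```python
import numpy as np

def soft_threshold(x, lam):
    """prox of lam * |.| evaluated at x (Example 12.11)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

lam = 0.8
grid = np.linspace(-10.0, 10.0, 2000001)   # brute-force search grid

for x in (-3.0, -0.5, 0.0, 0.3, 2.4):
    brute = grid[np.argmin(lam * np.abs(grid) + 0.5 * (grid - x) ** 2)]
    assert abs(soft_threshold(x, lam) - brute) < 1e-4
    print(x, soft_threshold(x, lam))
```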
12.3.5.4. Cube#
Example 12.12 (Scaled cube over $\mathbb{R}_+$)

Let $\lambda > 0$ and

$$f(x) = \begin{cases} \lambda x^3 & x \geq 0 \\ \infty & x < 0. \end{cases}$$

The proximal operator is given by

$$\operatorname{prox}_f(x) = \frac{-1 + \sqrt{1 + 12 \lambda [x]_+}}{6 \lambda}.$$

Proof. We construct the objective function

$$g(u) = \begin{cases} \lambda u^3 + \frac{1}{2}(u - x)^2 & u \geq 0 \\ \infty & u < 0. \end{cases}$$

- $g$ is differentiable for $u > 0$.
- $g'(u) = 3 \lambda u^2 + u - x$ for $u > 0$.
- Note that if $x \leq 0$ then $g'(u) = 3\lambda u^2 + u - x > 0$ for every $u > 0$.
- Hence $g'$ does not have a positive root for $x \leq 0$.
- For $x > 0$, setting $g'(u)$ to zero, we get $3\lambda u^2 + u - x = 0$, whose positive solution is:
  $$u = \frac{-1 + \sqrt{1 + 12 \lambda x}}{6 \lambda}.$$
- We can see that $g'(u) = 0$ has a positive solution if and only if $x > 0$.
- If $x \leq 0$, then the derivative never vanishes for any $u > 0$.
- Hence the minimum of $g$ is attained at the only point of nondifferentiability, $u = 0$ (the boundary point of the domain).
- Thus, the minimizer of $g$ for $x \leq 0$ is $u = 0$.

Hence

$$\operatorname{prox}_f(x) = \begin{cases} \dfrac{-1 + \sqrt{1 + 12\lambda x}}{6\lambda} & x > 0 \\ 0 & x \leq 0. \end{cases}$$

Note that

$$\frac{-1 + \sqrt{1 + 12 \lambda [x]_+}}{6 \lambda} = 0 \quad \text{for } x \leq 0,$$

so both cases are captured by the single expression above. This concludes the proof.
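The closed-form expression can be checked against a brute-force minimization over a grid of nonnegative values; the value of $\lambda$, the test points and the grid below are arbitrary choices for this illustration.

```python
import numpy as np

def prox_scaled_cube(x, lam):
    """prox of lam * t^3 restricted to t >= 0 (Example 12.12)."""
    return (-1.0 + np.sqrt(1.0 + 12.0 * lam * max(x, 0.0))) / (6.0 * lam)

lam = 0.5
grid = np.linspace(0.0, 10.0, 1000001)    # u >= 0 only; brute-force check grid

for x in (-2.0, 0.0, 0.7, 3.5):
    brute = grid[np.argmin(lam * grid ** 3 + 0.5 * (grid - x) ** 2)]
    assert abs(prox_scaled_cube(x, lam) - brute) < 1e-4
    print(x, prox_scaled_cube(x, lam))
```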
12.3.5.5. Logarithms#
Example 12.13 (Scaled negative logarithm)
Let $\lambda > 0$ and

$$f(x) = \begin{cases} -\lambda \ln x & x > 0 \\ \infty & x \leq 0. \end{cases}$$

The proximal operator is given by

$$\operatorname{prox}_f(x) = \frac{x + \sqrt{x^2 + 4 \lambda}}{2}.$$

Proof. We construct the objective function

$$g(u) = \begin{cases} -\lambda \ln u + \frac{1}{2}(u - x)^2 & u > 0 \\ \infty & u \leq 0. \end{cases}$$

- $g$ is differentiable for $u > 0$.
- $g'(u) = -\frac{\lambda}{u} + u - x$ for $u > 0$.
- $u$ is a minimizer if
  $$g'(u) = 0 \iff u^2 - x u - \lambda = 0 \iff u = \frac{x \pm \sqrt{x^2 + 4\lambda}}{2}.$$
- We note that $\sqrt{x^2 + 4\lambda} > |x|$ for every $x$ since $\lambda > 0$.
- Hence only the root $u = \frac{x + \sqrt{x^2 + 4\lambda}}{2}$ is positive, as desired.
- Hence $\operatorname{prox}_f(x) = \frac{x + \sqrt{x^2 + 4\lambda}}{2}$.

As we can see, $\operatorname{prox}_f(x) > 0$ for every $x \in \mathbb{R}$; the proximal point always lies inside the domain of $f$.
12.3.6. Affine Functions#
Example 12.14 (Affine function)
Let $f(x) = \langle a, x \rangle + b$ where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. The objective function is

$$g(u) = \langle a, u \rangle + b + \frac{1}{2}\|u - x\|^2.$$

- Differentiating $g$, we get $\nabla g(u) = a + u - x$.
- Setting it to zero, we see that $u = x - a$ is the minimizer.

Hence

$$\operatorname{prox}_f(x) = x - a.$$
12.3.7. Convex Quadratic Functions#
Example 12.15 (Convex quadratic)
Let $f(x) = \frac{1}{2} x^T A x + b^T x + c$, where $A \in \mathbb{S}^n_+$ is a symmetric positive semidefinite matrix, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$. The objective function is

$$g(u) = \frac{1}{2} u^T A u + b^T u + c + \frac{1}{2}\|u - x\|^2.$$

Differentiating $g$, we get

$$\nabla g(u) = A u + b + u - x.$$

Setting this to zero, we see that

$$u = (A + I)^{-1}(x - b).$$

Hence

$$\operatorname{prox}_f(x) = (A + I)^{-1}(x - b).$$
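In code, this proximal operator amounts to solving a single linear system. The sketch below builds a random positive semidefinite $A$ (an arbitrary illustrative choice) and verifies the optimality condition $\nabla g(u) = A u + b + u - x = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M.T @ M                               # a symmetric positive semidefinite matrix
b = rng.standard_normal(4)
x = rng.standard_normal(4)

# prox of f(u) = 0.5 u^T A u + b^T u + c  (the constant c does not matter)
u = np.linalg.solve(A + np.eye(4), x - b)

# optimality check: the gradient A u + b + (u - x) of the prox objective vanishes
print(np.linalg.norm(A @ u + b + u - x))  # ~1e-15
assert np.allclose(A @ u + b + u - x, 0.0, atol=1e-10)
```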
12.3.8. Norms#
12.3.8.1. $\ell_1$ Norm#
Example 12.16 (Scaled $\ell_1$ norm)

Let $\lambda > 0$ and $f(x) = \lambda \|x\|_1$ for $x \in \mathbb{R}^n$.

Then, the proximal operator is given by

$$\operatorname{prox}_f(x) = \mathcal{S}_{\lambda}(x),$$

where the univariate soft-thresholding operator $\mathcal{S}_{\lambda} : \mathbb{R} \to \mathbb{R}$ is defined as

$$\mathcal{S}_{\lambda}(t) = \operatorname{sgn}(t) [|t| - \lambda]_+ = \begin{cases} t - \lambda & t > \lambda \\ 0 & |t| \leq \lambda \\ t + \lambda & t < -\lambda \end{cases}$$

and the multivariate soft-thresholding operator $\mathcal{S}_{\lambda} : \mathbb{R}^n \to \mathbb{R}^n$ applies it componentwise:

$$[\mathcal{S}_{\lambda}(x)]_i = \mathcal{S}_{\lambda}(x_i), \quad i = 1, \dots, n.$$

Since $\|x\|_1 = \sum_{i=1}^n |x_i|$ is separable and $\operatorname{prox}_{\lambda |\cdot|}$ was computed in Example 12.11, the result follows from Corollary 12.1.
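A vectorized implementation of the multivariate soft-thresholding operator is a one-liner in NumPy. The sketch below (the value of $\lambda$ and the test vector are arbitrary) also checks the subgradient optimality condition of the prox objective: where $u_i \neq 0$ we need $x_i - u_i = \lambda \operatorname{sgn}(u_i)$, and where $u_i = 0$ we need $|x_i| \leq \lambda$.

```python
import numpy as np

def soft_threshold(x, lam):
    """Multivariate soft-thresholding: prox of lam * ||.||_1 applied componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

lam = 0.6
x = np.array([2.0, -0.3, 0.6, -1.4, 0.0])
u = soft_threshold(x, lam)
print(u)

# Optimality check for g(u) = lam*||u||_1 + 0.5*||u - x||^2.
nz = u != 0
assert np.allclose(x[nz] - u[nz], lam * np.sign(u[nz]))
assert np.all(np.abs(x[~nz]) <= lam + 1e-12)
```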
12.3.8.2. $\ell_0$ “Norm”#
Example 12.17 (Scaled $\ell_0$ “norm”)

Let $\lambda > 0$ and $f(x) = \lambda \|x\|_0$ for $x \in \mathbb{R}^n$,

where $\|x\|_0$ denotes the number of nonzero entries in $x$.

Then, the proximal mapping is given by

$$\operatorname{prox}_f(x) = \mathcal{H}_{\sqrt{2\lambda}}(x_1) \times \dots \times \mathcal{H}_{\sqrt{2\lambda}}(x_n),$$

where the univariate hard-thresholding operator $\mathcal{H}_t$ is defined as the point to set mapping

$$\mathcal{H}_t(s) = \begin{cases} \{0\} & |s| < t \\ \{0, s\} & |s| = t \\ \{s\} & |s| > t. \end{cases}$$

Note that this version of the hard-thresholding operator is a point to set mapping. In this case, the operator leads to two possible values when $|s| = t = \sqrt{2\lambda}$.

Proof. We note that $f$ is separable:

$$f(x) = \lambda \|x\|_0 = \sum_{i=1}^n g(x_i), \quad \text{where } g(t) = \begin{cases} \lambda & t \neq 0 \\ 0 & t = 0. \end{cases}$$

- Since the proximal mapping is not affected by a constant offset, hence $\operatorname{prox}_g = \operatorname{prox}_{g - \lambda}$.
- We can see that $g - \lambda$ is exactly the function of Example 12.2 (zero everywhere except a negative value $-\lambda$ at $t = 0$).
- Following Example 12.2, $\operatorname{prox}_g(t) = \mathcal{H}_{\sqrt{2\lambda}}(t)$.
- It is clear that the proximal mappings are not unique: at $|t| = \sqrt{2\lambda}$ there are two proximal points.
- By Theorem 12.3,
  $$\operatorname{prox}_f(x) = \operatorname{prox}_g(x_1) \times \dots \times \operatorname{prox}_g(x_n) = \mathcal{H}_{\sqrt{2\lambda}}(x_1) \times \dots \times \mathcal{H}_{\sqrt{2\lambda}}(x_n)$$
  as desired.
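Since the proximal mapping is set-valued on the boundary, an implementation has to pick one element; the sketch below keeps $x_i$ whenever $|x_i| \geq \sqrt{2\lambda}$, which is one valid selection. The values of $\lambda$ and the test vector are arbitrary; the check compares, coordinate by coordinate, the objective values of the two candidates $u_i \in \{0, x_i\}$.

```python
import numpy as np

def hard_threshold(x, lam):
    """One selection from prox of lam * ||.||_0: keep entries with |x_i| >= sqrt(2*lam)."""
    t = np.sqrt(2.0 * lam)
    return np.where(np.abs(x) >= t, x, 0.0)

lam = 0.5                                   # threshold t = 1.0
x = np.array([1.7, -0.4, 1.0, -2.5, 0.2])
u = hard_threshold(x, lam)
print(u)                                     # [ 1.7  0.   1.  -2.5  0. ]

# Componentwise check: the chosen u_i is no worse than either candidate in {0, x_i}.
for xi, ui in zip(x, u):
    cost_keep = lam                          # u_i = x_i: quadratic term vanishes
    cost_drop = 0.5 * xi ** 2                # u_i = 0
    cost = (lam if ui != 0 else 0.0) + 0.5 * (ui - xi) ** 2
    assert cost <= min(cost_keep, cost_drop) + 1e-12
```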
12.3.9. Logarithms#
Example 12.18 (Negative sum of logs)
Let $\lambda > 0$ and

$$f(x) = -\lambda \sum_{i=1}^n \ln x_i, \quad x \in \mathbb{R}^n_{++},$$

with $f(x) = \infty$ otherwise.

Then, the proximal operator is given by

$$[\operatorname{prox}_f(x)]_i = \frac{x_i + \sqrt{x_i^2 + 4\lambda}}{2}, \quad i = 1, \dots, n.$$

Since $f$ is separable across the coordinates, the result follows by applying Example 12.13 to each coordinate (Corollary 12.1).
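The componentwise formula is straightforward to vectorize; the sketch below (with arbitrary $\lambda$ and test vector) verifies the stationarity condition $-\lambda / u_i + u_i - x_i = 0$ of the prox objective.

```python
import numpy as np

def prox_neg_sum_log(x, lam):
    """prox of f(x) = -lam * sum(log(x_i)), applied componentwise (Example 12.18)."""
    x = np.asarray(x, dtype=float)
    return (x + np.sqrt(x ** 2 + 4.0 * lam)) / 2.0

lam = 0.3
x = np.array([2.0, -1.0, 0.1])
u = prox_neg_sum_log(x, lam)
print(u)                                    # strictly positive even for negative inputs

# Optimality check: the gradient -lam/u + (u - x) of the prox objective vanishes.
assert np.allclose(-lam / u + u - x, 0.0, atol=1e-12)
```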