Proximal Mappings and Operators
Contents
12.3. Proximal Mappings and Operators#
Throughout this section \(\VV\) represents a Euclidean space; i.e., an \(n\)-dimensional space endowed with an inner product \(\langle \cdot, \cdot \rangle\) and the Euclidean norm \(\| \cdot \| = \sqrt{\langle \cdot, \cdot \rangle}\).
12.3.1. Proximal Mapping#
(Proximal mapping)
For a function \(f : \VV \to \RERL\), the proximal mapping of \(f\) is given by
It is a point to set mapping.
It maps each point \(\bx \in \VV\) to a subset of points in \(\VV\) which minimize the R.H.S..
The set of points which minimize the R.H.S. are known as proximal points for a given \(\bx\) w.r.t. the function \(f\).
The set of proximal points may be empty, singleton or have more than one points.
The function \(h_{\bx} : \VV \to \RR\) given by
\[ h_{\bx}(\bu) = f(\bu) + \frac{1}{2} \| \bu - \bx \|^2 \]is the objective function for the computation of the proximal points for a given point \(\bx \in \VV\).
The proximal points of \(\bx\) are the minimizers of the problem of unconstrained minimization of \(h_{\bx}\).
12.3.1.1. Zero Function#
(Zero function)
Consider \(f : \RR \to \RR\) defined as \(f(x) = 0\).
This is a very simple example. The proximal point for every point is the same point. Note that the gradient of the zero function is zero everywhere. So every point in the domain of \(f\) is an optimal point from the perspective of maximization or minimization. Thus the proximal mapping doesn’t need move to a different point.
12.3.1.2. Zero Everywhere Except at \(x=0\) Functions#
Let us look at real functions which are zero everywhere except at \(x=0\). There are two possibilities. \(f(x)\) is positive or negative at \(x=0\). The two examples below illustrate the point to set nature of proximal mappings. We show different situations where the number of proximal points for a given point is one, two or zero.
\(x=0\))
(Negative value atLet \(t > 0\). Let \(f: \RR \to \RR\) be given as
The proximal mapping is given by
This example clearly demonstrates the point to set mapping nature of the proximal mapping. While for most values of \(x\), the proximal mapping is a singleton, but for \(|x| = \sqrt{ 2 t}\), the proximal mapping consists of a set of two different values.
Proof. We start by constructing the objective function
Consider the case where \(x \neq 0\).
The minimum value of \(\frac{1}{2} (u - x )^2\) is \(0\) attained at \(u=x\) and it is valid since \(u = x \neq 0\).
The value of \(- t + \frac{1}{2} x^2\) is attained at \(u=0\).
There are three possibilities.
If \(0 > - t + \frac{1}{2} x^2\), then the unique minimizer of \(h_x\) is at \(u=0\). The condition can be written as \(|x| < \sqrt{2 t}\) and \(x \neq 0\).
If \(0 < - t + \frac{1}{2} x^2\), then the unique minimizer of \(h_x\) is at \(u=x\). The condition can be written as \(|x| > \sqrt{2 t}\).
If \(0 = - t + \frac{1}{2} x^2\), then \(u=0\) and \(u=x\) are both minimizers of \(h_x\). The condition can be written as \(|x| = \sqrt{2 t}\).
Now the case of \(x=0\).
The term \(- t + \frac{1}{2} x^2\) reduces to \(-t < 0\) at \(u=0\).
The term \(\frac{1}{2}(u - x )^2\) reduces to \(\frac{1}{2} u^2 > 0\) for all \(u \neq 0\).
Thus, the minimizer is at \(u=0\).
We note that the minimizer agrees with the result obtained for the case of \(|x| < \sqrt{2 t}\) above where \(x \neq 0\).
In conclusion, we see that
For \(|x | < \sqrt{2 t}\), the minimizer is \(u=0\).
For \(|x| > \sqrt{2 t}\), the minimizer is \(u=x\).
For \(|x| = \sqrt{2 t}\), there are two minimizers, \(u=0\) and \(u=x\).
\(x=0\))
(Positive value atLet \(t > 0\). Let \(f: \RR \to \RR\) be given as
The proximal mapping is given by
This example illustrates the fact that the proximal mapping may be empty in some cases. In other words, for some points, there are no proximal points.
Proof. We start by constructing the function
Consider the case where \(x \neq 0\).
The minimum value of \(\frac{1}{2} (u - x )^2\) is \(0\) attained at \(u=x\) and it is valid since \(u = x \neq 0\).
The term \(t + \frac{1}{2} x^2 > 0\) for \(u=0\).
Thus the minimizer is at \(u=x\) for all \(x \neq 0\).
Now consider the case where \(x=0\).
The function \(h_x\) reduces to
\[\begin{split} h_x(u) = \begin{cases} \frac{1}{2} u^2 & u \neq 0;\\ t & u = 0. \end{cases} \end{split}\]We can see that \(\inf_{u \in \RR} h_x(u) = 0\).
However, \(h_x\) doesn’t attain the value \(0\) for any \(u \in \RR\).
Hence, the set of minimizers is empty.
In conclusion, this function
has a unique minimizer \(u=x\) for all \(x \neq 0\).
has no minimizer for \(x=0\).
12.3.1.3. Constant Value Function#
(Constant value function)
Let \(f : \RR \to \RR\) defined as \(f(x) = c\) where \(c \in \RR\).
12.3.1.4. 1D Linear Function I#
(1D Linear function I)
Let \(f : \RR \to \RR\) be given as \(f(x) = x\).
Define:
Differentiating, we get:
Setting \(g'(u) = 0\), we get:
The second derivative \(g''(u) = 1\) confirms that it is indeed the minimizer.
Thus,
12.3.1.5. 1D Linear Function II#
(1D Linear function II)
Let \(f : \RR \to \RR\) be given as \(f(x) = \lambda x\) with \(\lambda \in \RR\).
Define:
Differentiating, we get:
Setting \(g'(u) = 0\), we get:
The second derivative \(g''(u) = 1\) confirms that it is indeed the minimizer.
Thus,
12.3.1.6. 1D Affine Function#
(1D Affine function)
Let \(f : \RR \to \RR\) be given as \(f(x) = \alpha x + \beta\) with \(\alpha, \beta \in \RR\).
Define:
Differentiating, we get:
Setting \(g'(u) = 0\), we get:
The second derivative \(g''(u) = 1\) confirms that it is indeed the minimizer.
Thus,
12.3.1.7. Nonemptiness Conditions#
It is imperative to identify conditions under which a proximal mapping is nonempty.
(Nonemptiness under closedness and coerciveness)
Let \(f : \VV \to \RERL\) be a proper and closed function. Assume that the function
for any \(\bx \in \VV\) is coercive.
Then the set \(\prox_f(\bx)\) is nonempty and compact for any \(\bx \in \VV\).
Recall from Definition 8.8 that a function \(h\) is coercive if for every sequence \(\{ \bx_n \}\) such that \( \lim_{k \to \infty} \| \bx_k \| = \infty\), we have \(\lim_{k \to \infty} h(\bx_k) = \infty\).
Proof. Define:
Then \(\prox_f(\bx) = \underset{\bu \in \VV}{\argmin} h(\bu)\).
\(h\) is a sum of two closed functions. Hence \(h\) is a closed function due to Theorem 3.95.
It is given that \(h\) is coercive.
\(h\) is proper since \(f\) is proper and \(\frac{1}{2} \| \bu - \bx \|^2\) is real valued.
Thus, \(h\) is proper, closed and coercive.
Due to the Weierstrass’ theorem (Theorem 8.9), a proper, closed and coercive function \(g\) has a nonempty and compact set of minimizers.
12.3.2. Proximal Operator#
Under suitable conditions, \(\prox_f(\bx)\) is always a singleton for every \(\bx \in \VV\). Then the proximal mapping can be thought of as a function \(\prox_f : \VV \to \VV\) mapping every point in \(\VV\) to a proximal point.
(First prox theorem)
Let \(f : \VV \to \RERL\) be a proper, closed and convex function. Then, \(\prox_f(\bx)\) is a singleton for every \(\bx \in \VV\).
Proof. Define:
Then
\(f\) is a closed and convex function.
\(d_{\bx}(\bu) = \frac{1}{2} \| \bu - \bx \|^2\) is a closed and strongly convex function (Theorem 9.268).
Hence, their sum \(h_{\bx}\) is a closed and strongly convex function (Theorem 9.270).
Since \(f\) is proper, and \(d_{\bx}\) is real valued, hence \(h_{\bx}\) is also proper.
Thus, \(h_{\bx}\) is a proper, closed and strongly convex function.
Due to Theorem 9.272, there exists a unique minimizer for \(h_{\bx}\).
Thus, the set \(\prox_f(\bx) = \argmin h_{\bx}(\bu)\) is a singleton.
With this result, we are ready to introduce the proximal operator.
(Proximal operator)
Let \(f : \VV \to \RERL\) be a proper, closed and convex function. Since the point to set mapping \(\prox_f : \VV \to 2^{\VV}\) maps every point in \(\VV\) to a singleton subset of \(\VV\) due to Theorem 12.2, we abuse the notation and redefine \(\prox_f: \VV \to \VV\) as a point to point mapping. We call this mapping as a proximal operator of \(f\).
In other words, we write \(\prox_f(\bx) = \by\) rather than \(\prox_f(\bx) = \{\by\}\).
Thus, for a proper, closed and convex function \(f\), the operator \(\prox_f: \VV \to \VV\) maps each point \(\bx \in \VV\) to a unique minimizer of the function \(h_{\bx}(\bu) = f(\bu) + \frac{1}{2} \| \bu - \bx \|^2\).
Readers are suggested to look at some of the examples in the Examples below before proceeding with the proximal calculus rules.
12.3.3. Proximal Calculus#
This section deals with the rules which enable us to compute the proximal mappings for a function if we know the proximal mappings of underlying building blocks.
12.3.3.1. Separable Functions#
(Proximal mappings of separable functions)
Suppose that \(f: \VV_1 \oplus \dots \oplus \VV_m \to \RERL\) is given by
Then for every \(\bx_1 \in \VV_1, \dots, \bx_m \in \VV_m\),
Note that the theorem is presented in terms of sets. The L.H.S. \(\prox_f(\bx_1, \dots, \bx_m)\) is a set. Similarly, R.H.S. is a Cartesian product of proximal mappings \(\prox_{f_i}(\bx_i)\) which are individually sets.
Proof. We start with the definition of the proximal mapping and expand it in terms of the separable functions. We note that the quadratic term \(\frac{1}{2} \| \cdot - \bx \|^2\) is also separable.
Here the \(\prod\) symbol denotes the Cartesian product.
(Proximal operator for separable convex functions)
Let \(f: \RR^n \to \RERL\) be a proper, closed and convex function. Assume that \(\RR^n = \RR^{n_1} \oplus \dots \oplus \RR^{n_m}\) with \(n = n_1 + \dots + n_m\).
Let \(f\) be separable over \(\RR^{n_1} \oplus \dots \oplus \RR^{n_m}\) and given by
Further assume that \(f_i\) are proper, closed and convex functions themselves.
Then for every \(\bx_1 \in \RR^{n_1}, \dots, \bx_m \in \RR^{n_m}\),
In the special case, where \(f\) is separable over each coordinate; i.e.
then
Proof. Since \(f\) and \(f_i\) are proper, closed and convex, hence due to Theorem 12.2, the proximal mappings \(\prox_f(\bx)\) and \(\prox_{f_i}(\bx_i)\) are singletons. In this case, the Cartesian product reduces to the concatenation of coordinates.
Applications: Example 12.16.
12.3.3.2. Scaling and Translation#
(Scaling and translation of input)
Let \(g: \VV \to \RERL\) be a proper function. Let \(t \neq 0\) be a scaling parameter and \(\ba \in \VV\) be a translation parameter. Define \(f: \VV \to \RERL\) as
Then given \(\prox_g\), the \(\prox_f\) is
Proof. Starting from the definition of the proximal mapping
The objective function of this minimization problem is
\[ g(t \bu + \ba) + \frac{1}{2} \| \bu - \bx \|^2. \]We introduce a change of variable \(\bz = t \bu + \ba\) to construct an equivalent minimization problem. See Proposition 8.2 for the construction of equivalent optimization problems by change of variable.
Then \(\bu = \frac{1}{t}(\bz - \ba)\).
The objective function changes to
\[\begin{split} & g(\bz) + \frac{1}{2} \left \| \frac{1}{t}(\bz - \ba) - \bx \right \|^2 \\ &= \frac{1}{t^2}\left [ (t^2 g)(\bz) + \frac{1}{2} \left \| \bz - (t\bx + \ba) \right \|^2 \right]. \end{split}\]The minimizers of this objective function (over \(\bz\)) are given by
\[ \bz \in \prox_{t^2 g}(t \bx + \ba). \]Note that the scaling term \(\frac{1}{t^2}\) in the objective function doesn’t impact the set of minimizers. It only impacts the optimal value.
Since \(\bu = \frac{1}{t}(\bz - \ba)\), hence minimizers of the original objective function are given by
\[ \bu \in \frac{1}{t} [\prox_{t^2 g}(t \bx + \ba) - \ba]. \]
\(t g ( \cdot / t)\))
(Proximal mapping forLet \(g: \VV \to \RERL\) be a proper function. Let \(t \neq 0\) be a scaling parameter. Define \(f: \VV \to \RERL\) as
Then
Proof. Starting from the definition of the proximal mapping
The objective function is
\[ t g(\frac{\bu}{t}) + \frac{1}{2} \| \bu - \bx \|^2. \]Introduce a change of variable \(\bz = \frac{\bu}{t}\).
Then \(\bu = t \bz\).
The objective function changes to
\[\begin{split} & t g(\bz) + \frac{1}{2} \| t \bz - \bx \|^2 \\ & = t^2 \left [ \frac{g(\bz)}{t} + \frac{1}{2} \left \| \bz - \frac{\bx}{t} \right \|^2 \right ]. \end{split}\]The minimizers of this objective function are given by
\[ \bz \in \prox_{g / t} (\bx / t). \]The minimizers of the original objective function are then given by
\[ \bu \in t \, \prox_{g / t} (\bx / t). \]
12.3.3.3. Quadratic Perturbation#
(Quadratic perturbation)
Let \(g: \VV \to \RERL\) be a proper function. Define \(f: \VV \to \RERL\) as
where \(c > 0\), \(\ba \in \VV\) and \(d \in \RR\). Then
Proof. Starting from the definition of the proximal mapping
Since the quantities \(\bx\), \(\ba\) and \(d\) are constant w.r.t. \(\bu\), hence we have removed or added terms which solely depend on these quantities from the objective function as and when required in the derivation above.
Applications: Example 12.10.
12.3.3.4. Composition with Affine Mapping#
(Composition with affine mapping)
Let \(g: \RR^m \to \RERL\) be a proper, closed and convex function. Let \(f: \VV \to \RERL\) be given by
where \(\bb \in \RR^m\) and \(\bAAA : \VV \to \RR^m\) is a linear transformation satisfying the property
for some constant \(\alpha > 0\) and \(\bI: \VV \to \VV\) is the identity transformation.
Then for any \(\bx \in \VV\),
Proof. Starting from the definition of the proximal mapping
Introduce a variable \(\bz = \bAAA(\bu) + \bb\).
The optimization problem transforms to
\[\begin{split} & \min_{\bu \in \VV, \bz \in \RR^m} & g(\bz) + \frac{1}{2} \| \bu - \bx \|^2 \\ & \text{ subject to } & \bz = \bAAA(\bu) + \bb$. \end{split}\]Since \(g\) is proper, closed and convex, hence this problem has a unique solution. TODO. HOW?
Let the optimal solution be given by \(\bz^* \bu^*\).
12.3.4. Examples#
Remainder of this section will be dedicated for the computation of proximal mappings or operators for different functions. Most of these functions are convex (with the exception of a few). The proximal mappings will be computed either from the first principles (by solving the minimization problem of \(f(u) + \frac{1}{2}\| \bu - \bx \|^2\)) or by making use of the proximal calculus rules developed above.
Some key ideas that will be repeatedly used in the computation of the proximal operator in this section. Assume that \(f\) is a convex function with \(S = \dom f\) and is differentiable over an open set \(U \subseteq S\). This happens when \(\dom f\) is not an open set, \(f\) is differentiable in the interior of \(\dom f\) but not at the boundary points.
if \(f'(u) = 0\) for some \(u \in U\), then \(u\) must be one of its minimizers (Theorem 10.49).
If a minimizer of \(f\) exists and is not attained at any point of differentiability, then it must be attained at a point of nondifferentiability (Theorem 10.50).
Since for a convex function \(f\), the existence of a unique proximal mapping is guaranteed, hence the minimizer of the function \(h_{\bx}\) exists.
Then the minimizer of \(h_{\bx}\) is either at a point of differentiability where the gradient vanishes or at the boundary where it is nondifferentiable.
12.3.5. 1-dim Examples#
12.3.5.1. Indicator Functions#
(Indicator function for an interval)
Let \(r \in [0, \infty]\). Let \(f : \RR \to \RERL\) be given by
The proximal operator is given by
Proof. .
We construct the objective function
\[ h_x(u) = I_{[0,r] \cap \RR}(x) + \frac{1}{2} (u - x)^2. \]Let \(\tilde{u}\) denote the minimizer of \(h_x(u)\).
Let \(w\) denote the function \(w(u) = \frac{1}{2} (u - x)^2\).
First consider the case where \(r < \infty\).
Then \(h_x(u) = w(u)\) over \([0, r]\) and \(\infty\) otherwise.
\(\dom h_x = [0, r]\). \(\interior \dom h_x = (0, r)\). The boundary points are \(0\) and \(r\).
The minimizer of \(w(u)\) is \(u = x\).
Therefore if \(0 \leq x \leq r\), then \(\tilde{u} = x\).
For the cases where \(x \notin [0, r]\), the minimizer must be one of the boundary points.
If \(x < 0\), then \(w(u)\) is an increasing function over \([0, r]\).
Hence for \(x < 0\), \(\tilde{u} = 0\).
If \(x > r\), then \(w(u)\) is a decreasing function over \([0,r]\).
Hence for \(x > r\), \(\tilde{u} = r\).
Thus, if \(r < \infty\), then the proximal operator is given by
\[\begin{split} \prox_f(x) = \begin{cases} x, & 0 \leq x \leq r \\ 0, & x < 0 \\ r, & x > r \end{cases} = \min \{ \max \{x, 0 \}, r \}. \end{split}\]For \(r = \infty\), \(f(x) = I_{[0, \infty)}(x)\).
Thus, \(h_x(u) = w(u)\) for \(u \geq 0\).
Hence the minimizer \(\tilde{u} = x\) for \(x \geq 0\) and \(\tilde{u} = 0\) for \(x < 0\).
In other words,
\[ \prox_f(x) = [x]_+ = \max\{x, 0\} = \min \{ \max \{x, 0 \}, \infty \}. \]Combining these two cases
\[ \prox_f(x) = \min \{ \max \{x, 0 \}, r \}. \]
12.3.5.2. Linear#
\(\RR_+\))
(Linear overLet \(\mu \in \RR\). Let \(f : \RR \to \RERL\) be given by
The proximal operator is given by
Proof. .
\(\dom f = \RR_+\).
\(f\) is differentiable over \(\RR_{++}\).
Let
\[\begin{split} h_x(u) = = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} \mu u + \frac{1}{2} (u - x)^2, & u \geq 0; \\ \infty, & u < 0. \end{cases} \end{split}\]\(h_x\) is a proper convex function with \(\dom h_x = \RR_+\).
\(h_x\) is differentiable over \(\RR_{++}\).
\(h'_x(u) = \mu + u - x\) for all \(u > 0\).
Setting it to zero, we get \(u = x - \mu\).
Thus, if \(x > \mu\), then the minimizer is \(u = x - \mu\).
Otherwise, \(h_x\) obtains its minimum value at \(u=0\) which is the only point of nondifferentiability in its domain.
Thus, if \(x \leq \mu\), then the minimizer is \(u=0\).
\([0, a]\))
(Linear over an intervalLet \(\mu \in \RR\) and let \(a \in [0, \infty]\). Let \(f : \RR \to \RERL\) be given by
The proximal operator is given by
Proof. We note that
From Example 12.8, the proximal operator for the indicator function is given by
\[ \prox_{I_{[0,a] \cap \RR}}(x) = \min \{ \max \{x, 0 \}, a \}. \]We can write \(f\) as a quadratic perturbation of \(I_{[0,a] \cap \RR}\) given by
\[ f(x) = I_{[0,a] \cap \RR}(x) + \frac{0}{2} x^2 + \mu x + 0. \]Following Theorem 12.6,
\[ \prox_f(x) = \prox_{I_{[0,a] \cap \RR}}(x - \mu) = \min \{ \max \{x - \mu, 0 \}, a \}. \]
12.3.5.3. Absolute Value#
(Scaled absolute value)
Let \(t \in \RR_+\). Let \(f : \RR \to \RR\) be given by
The proximal operator is given by
Proof. .
Let
\[ h_x(u) = t | u | + \frac{1}{2} (u - x)^2. \]\(h_x\) is differentiable everywhere except at \(u = 0\).
At \(u \neq 0\), \(h'_x(u) = \sgn(u) t + u - x\).
If the minimizer is obtained at \(u > 0\), then \(t + u - x = 0\) giving us \(u= x -t\).
Thus, a minimizer at \(u > 0\) is attained if \(x > t\).
If the minimizer is obtained at \(u < 0\), then \(-t + u - x = 0\) giving us \(u = x + t\).
Thus, a minimizer at \(u < 0\) is attained if \(x < -t\).
Consequently, if \(x \in [-t, t]\), then the minimizer of \(h_x\) must be at the only point of nondifferentiability, namely \(u=0\).
This gives us
\[\begin{split} \prox_f(x) = \begin{cases} x - t, & x > t\\ x + t, & x < -t \\ 0, & -t \leq x \leq t. \end{cases} \end{split}\]The conditions \(x > t\) and \(x < -t\) can be combined as \(|x| > t\).
The condition \(-t \leq x \leq t\) simplifies to \(|x| \leq |t|\).
We can see that the three cases for \(\prox_f(x)\) can be simplified to the expression
\[ \prox_f(x) = [|x| - t]_+ \sgn (x). \]
12.3.5.4. Cube#
\(\RR_+\))
(Scaled cube overLet \(t > 0\). Let \(f : \RR \to \RERL\) be given by
The proximal operator is given by
Proof. .
We construct the objective function
\[\begin{split} h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} t u^3 + \frac{1}{2} (u - x)^2, & u \geq 0; \\ \infty, & u < 0. \end{cases} \end{split}\]\(h_x\) is differentiable for \(u > 0\).
\(h'_x(u) = 3 t u^2 + u - x\) for \(u > 0\).
Note that if \(x \leq 0\) then \(h'_x(u) > 0\) for every \(u > 0\).
Hence \(h'_x(u) = 0\) does not have a positive root for \(x \leq 0\).
For \(x > 0\), setting \(h_x(u)\) to zero, we get \(3 u^2 + u - x = 0\) whose positive solution is:
\[ u = \frac{-1 + \sqrt{1 + 12 t x}}{6 t}. \]We can see that \(h'_x(u) = 0\) has a positive solution if and only if \(x > 0\).
If \(x \leq 0\), then the derivative never vanishes for any \(u > 0\).
Hence the minimum of \(h_x\) is attained at the only point of nondifferentiability \(u=0\).
Thus, the minimizer of \(h_x(u)\) for \(x \leq 0\) is \(u=0\).
Hence
\[\begin{split} \prox_f(x) = \begin{cases} \frac{-1 + \sqrt{1 + 12 t x}}{6 t}, & x > 0 \\ 0, & x \leq 0. \end{cases} \end{split}\]Note that
\[\begin{split} \frac{-1 + \sqrt{1 + 12 t [x]_+}}{6 t}= \begin{cases} \frac{-1 + \sqrt{1 + 12 t x}}{6 t}, & x > 0 \\ 0, & x \leq 0. \end{cases} \end{split}\]This concludes the proof.
12.3.5.5. Logarithms#
(Scaled negative logarithm)
Let \(t > 0\). Let \(f : \RR \to \RERL\) be given by
The proximal operator is given by
Proof. .
We construct the objective function
\[\begin{split} h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} - t \ln u + \frac{1}{2} (u - x)^2, & u > 0; \\ \infty, & u \leq 0. \end{cases} \end{split}\]\(h_x\) is differentiable for \(u > 0\).
\(h'_x(u) = - \frac{t}{u} + (u - x)\) for \(u > 0\).
\(\tilde{u} > 0\) is a minimizer if
\[\begin{split} & - \frac{t}{\tilde{u}} + (\tilde{u} - x) = 0 \\ & \implies \tilde{u}^2 - x \tilde{u} -t = 0 \\ & \implies \tilde{u} = \frac{x + \sqrt{x^2 + 4 t}}{2}. \end{split}\]We note that \(x + \sqrt{x^2 + 4 t} > 0\) for every \(x \in \RR\).
Hence \(\tilde{u} > 0\) as desired.
Hence \(\prox_f(x) = \frac{x + \sqrt{x^2 + 4 t}}{2}\).
As we can see, \(\dom h_x\) is the open set \(\RR_{++}\) and \(h_x\) is differentiable at every point in its domain. Since \(h_x\) must have a unique minimizer, hence \(h'_x(u) = 0\) must have a unique positive solution.
12.3.6. Affine Functions#
(Affine function)
Let \(f(\bx) = \langle \bx, \ba \rangle + b\) where \(\ba \in \VV\) and \(b \in \RR\).
Differentiating \(h_{\bx}\), we get \(\nabla h_{\bx}(\bu) = \ba + \bu - \bx\).
Setting it to zero, we see that \(\bu = \bx - \ba\) is the minimizer.
Hence
12.3.7. Convex Quadratic Functions#
(Convex quadratic)
Let \(f : \RR^n \to \RR\) be given by \(f(\bx) = \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c\) where \(\bA \in \SS^n_+\), \(\bb \in \RR^n\) and \(c \in \RR\).
Differentiating \(h_{\bx}\), we get
\[ \nabla h_{\bx}(\bu) = \bA \bu + \bb + \bu - \bx. \]Setting this to zero, we see that
\[\begin{split} & \bA \bu + \bb + \bu - \bx = \bzero\\ \iff & (\bA + \bI) \bu = \bx - \bb \\ \iff & \bu = (\bA + \bI)^{-1}(\bx - \bb). \end{split}\]
Hence
12.3.8. Norms#
12.3.8.1. \(\ell_1\) Norm#
\(\ell_1\) norm)
(ScaledLet \(\gamma > 0\). Let \(f : \RR^n \to \RR\) be given by
Then, the proximal operator is given by
where the univariate soft-thresholding operator \(\st_{\gamma} : \RR \to \RR\) is defined as
and the multivariate soft-thresholding operator \(\st_{\gamma} : \RR^n \to \RR^n\) is defined by the component wise application of the univariate soft thresholding operator
Proof. We note that
where \(g(x) = \gamma | x|\).
By Example 12.11,
By Corollary 12.1,
12.3.8.2. \(\ell_0\) “Norm”#
\(\ell_0\) “norm”)
(ScaledLet \(\gamma > 0\). Let \(f : \RR^n \to \RR\) be given by
where
Then, the proximal operator is given by
where the univariate hard-thresholding operator \(\HHH_{t} : \RR \to 2^{\RR}\) for any \(t > 0\) is defined as
Note this this version of hard thresholding operator is a point to set mapping. In this case, the operator leads to two possible values when \(|x| = t\). The \(\ell_0\)-“norm” is not a convex function. Hence its proximal mapping is not always a singleton.
Proof. We note that \(g(x) = h(x) + \gamma\)
where
Following Example 12.2,
\[\begin{split} \prox_h(x) = \begin{cases} \{ 0 \}, & |x| < \sqrt{2 \gamma},\\ \{ x \} & |x| > \sqrt{2 \gamma}, \\ \{ 0, x \} & | x | = \sqrt {2 \gamma}. \end{cases} \end{split}\]Since the proximal operator is not affected by a constant offset, hence \(\prox_g = \prox_h\).
We can see that \(\prox_g = \HHH_{\sqrt{2 \gamma}}\).
It is clear that the proximal mappings are not unique.
By Theorem 12.3,
\[ \prox_f(\bx) = \prox_g(x_1) \times \dots \times \prox_g(x_n) = \HHH_{\sqrt{2\gamma}}(x_1) \times \dots \times \HHH_{\sqrt{2\gamma}}(x_n) \]as desired.
12.3.9. Logarithms#
(Negative sum of logs)
Let \(\gamma > 0\). Let \(f : \RR^n \to \RERL\) be given by
Then, the proximal operator is given by
Proof. We note that
where
From Example 12.13,
By Corollary 12.1,