12.3. Proximal Mappings and Operators#

Throughout this section $\mathbb{V}$ represents a Euclidean space; i.e., an $n$-dimensional space endowed with an inner product $\langle \cdot, \cdot \rangle$ and the Euclidean norm $\|x\| = \sqrt{\langle x, x \rangle}$.

12.3.1. Proximal Mapping#

Definition 12.1 (Proximal mapping)

For a function $f : \mathbb{V} \to (-\infty, \infty]$, the proximal mapping of $f$ is given by

$$
\operatorname{prox}_f(x) \triangleq \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\} \quad \forall x \in \mathbb{V}.
$$

  • It is a point-to-set mapping.

  • It maps each point $x \in \mathbb{V}$ to the subset of points in $\mathbb{V}$ which minimize the R.H.S.

  • The points which minimize the R.H.S. are known as proximal points of $x$ w.r.t. the function $f$.

  • The set of proximal points may be empty, a singleton, or contain more than one point.

  • The function $h_x : \mathbb{V} \to (-\infty, \infty]$ given by

    $$
    h_x(u) = f(u) + \frac{1}{2} \| u - x \|^2
    $$

    is the objective function for the computation of the proximal points of a given point $x \in \mathbb{V}$.

  • The proximal points of $x$ are the minimizers of the problem of unconstrained minimization of $h_x$ (a small numerical illustration follows below).

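To see the definition operationally, a proximal point can also be approximated numerically by minimizing $h_x$ directly with a general-purpose solver. The following sketch assumes NumPy and SciPy are available; the helper name `numerical_prox` is ours, purely for illustration.

```python
# Approximate a proximal point of f at x by minimizing
# h_x(u) = f(u) + 0.5 * ||u - x||^2 with a general-purpose solver.
import numpy as np
from scipy.optimize import minimize


def numerical_prox(f, x, u0=None):
    """Return an approximate minimizer of h_x(u) = f(u) + 0.5*||u - x||^2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    h = lambda u: f(u) + 0.5 * np.sum((u - x) ** 2)
    u0 = x if u0 is None else np.asarray(u0, dtype=float)  # start the search at x
    return minimize(h, u0, method="Nelder-Mead").x


# Example: f(u) = |u| in one dimension.  The exact proximal operator is
# soft thresholding (Example 12.11), so these should be close to 2.0 and 0.0.
f = lambda u: np.abs(u).sum()
print(numerical_prox(f, 3.0))
print(numerical_prox(f, 0.4))
```
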
12.3.1.1. Zero Function#

Example 12.1 (Zero function)

Consider $f : \mathbb{R} \to \mathbb{R}$ defined as $f(x) = 0$.

$$
\operatorname{prox}_f(x) = \operatorname*{argmin}_{u \in \mathbb{R}} \left\{ 0 + \frac{1}{2} | u - x |^2 \right\} = \{ x \}.
$$

This is a very simple example. The proximal point for every point is the same point. Note that the gradient of the zero function is zero everywhere. So every point in the domain of $f$ is an optimal point from the perspective of maximization or minimization. Thus the proximal mapping doesn't need to move to a different point.

12.3.1.2. Zero Everywhere Except at x=0 Functions#

Let us look at real functions which are zero everywhere except at $x = 0$. There are two possibilities: $f(x)$ is either negative or positive at $x = 0$. The two examples below illustrate the point-to-set nature of proximal mappings. We show different situations where the number of proximal points for a given point is one, two, or zero.

Example 12.2 (Negative value at $x = 0$)

Let $t > 0$. Let $f : \mathbb{R} \to \mathbb{R}$ be given as

$$
f(x) = \begin{cases} 0, & x \neq 0; \\ -t, & x = 0. \end{cases}
$$

The proximal mapping is given by

$$
\operatorname{prox}_f(x) = \begin{cases} \{ 0 \}, & |x| < \sqrt{2 t}; \\ \{ x \}, & |x| > \sqrt{2 t}; \\ \{ 0, x \}, & |x| = \sqrt{2 t}. \end{cases}
$$

This example clearly demonstrates the point-to-set nature of the proximal mapping. While for most values of $x$ the proximal mapping is a singleton, for $|x| = \sqrt{2 t}$ it consists of two distinct points.

Proof. We start by constructing the objective function

$$
h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} \frac{1}{2} (u - x)^2, & u \neq 0; \\ -t + \frac{1}{2} x^2, & u = 0. \end{cases}
$$

Consider the case where $x \neq 0$.

  1. The minimum value of $\frac{1}{2} (u - x)^2$ is $0$, attained at $u = x$, and it is valid since $u = x \neq 0$.

  2. The value $-t + \frac{1}{2} x^2$ is attained at $u = 0$.

  3. There are three possibilities.

  4. If $0 > -t + \frac{1}{2} x^2$, then the unique minimizer of $h_x$ is $u = 0$. The condition can be written as $|x| < \sqrt{2 t}$ and $x \neq 0$.

  5. If $0 < -t + \frac{1}{2} x^2$, then the unique minimizer of $h_x$ is $u = x$. The condition can be written as $|x| > \sqrt{2 t}$.

  6. If $0 = -t + \frac{1}{2} x^2$, then $u = 0$ and $u = x$ are both minimizers of $h_x$. The condition can be written as $|x| = \sqrt{2 t}$.

Now the case of $x = 0$.

  1. The term $-t + \frac{1}{2} x^2$ reduces to $-t < 0$ at $u = 0$.

  2. The term $\frac{1}{2} (u - x)^2$ reduces to $\frac{1}{2} u^2 > 0$ for all $u \neq 0$.

  3. Thus, the minimizer is at $u = 0$.

  4. We note that the minimizer agrees with the result obtained for the case $|x| < \sqrt{2 t}$ above where $x \neq 0$.

In conclusion, we see that

  1. For $|x| < \sqrt{2 t}$, the minimizer is $u = 0$.

  2. For $|x| > \sqrt{2 t}$, the minimizer is $u = x$.

  3. For $|x| = \sqrt{2 t}$, there are two minimizers, $u = 0$ and $u = x$.

Example 12.3 (Positive value at $x = 0$)

Let $t > 0$. Let $f : \mathbb{R} \to \mathbb{R}$ be given as

$$
f(x) = \begin{cases} 0, & x \neq 0; \\ t, & x = 0. \end{cases}
$$

The proximal mapping is given by

$$
\operatorname{prox}_f(x) = \begin{cases} \{ x \}, & x \neq 0; \\ \emptyset, & x = 0. \end{cases}
$$

This example illustrates the fact that the proximal mapping may be empty in some cases. In other words, for some points, there are no proximal points.

Proof. We start by constructing the function

$$
h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} \frac{1}{2} (u - x)^2, & u \neq 0; \\ t + \frac{1}{2} x^2, & u = 0. \end{cases}
$$

Consider the case where $x \neq 0$.

  1. The minimum value of $\frac{1}{2} (u - x)^2$ is $0$, attained at $u = x$, and it is valid since $u = x \neq 0$.

  2. The value at $u = 0$ is $t + \frac{1}{2} x^2 > 0$.

  3. Thus, the minimizer is $u = x$ for all $x \neq 0$.

Now consider the case where $x = 0$.

  1. The function $h_x$ reduces to

    $$
    h_x(u) = \begin{cases} \frac{1}{2} u^2, & u \neq 0; \\ t, & u = 0. \end{cases}
    $$

  2. We can see that $\inf_{u \in \mathbb{R}} h_x(u) = 0$.

  3. However, $h_x$ doesn't attain the value $0$ for any $u \in \mathbb{R}$.

  4. Hence, the set of minimizers is empty.

In conclusion, this function

  1. has a unique minimizer $u = x$ for all $x \neq 0$.

  2. has no minimizer for $x = 0$.

12.3.1.3. Constant Value Function#

Example 12.4 (Constant value function)

Let $f : \mathbb{R} \to \mathbb{R}$ be defined as $f(x) = c$ where $c \in \mathbb{R}$.

$$
\operatorname{prox}_f(x) = \operatorname*{argmin}_{u \in \mathbb{R}} \left\{ c + \frac{1}{2} | u - x |^2 \right\} = \{ x \}.
$$

12.3.1.4. 1D Linear Function I#

Example 12.5 (1D Linear function I)

Let $f : \mathbb{R} \to \mathbb{R}$ be given as $f(x) = x$.

Define:

$$
g(u) = f(u) + \frac{1}{2} | u - x |^2 = u + \frac{1}{2} (u - x)^2.
$$

Differentiating, we get:

$$
g'(u) = 1 + (u - x).
$$

Setting $g'(u) = 0$, we get:

$$
1 + u - x = 0 \iff u = x - 1.
$$

The second derivative $g''(u) = 1 > 0$ confirms that it is indeed the minimizer.

Thus,

$$
\operatorname{prox}_f(x) = \{ x - 1 \}.
$$

12.3.1.5. 1D Linear Function II#

Example 12.6 (1D Linear function II)

Let $f : \mathbb{R} \to \mathbb{R}$ be given as $f(x) = \lambda x$ with $\lambda \in \mathbb{R}$.

Define:

$$
g(u) = f(u) + \frac{1}{2} | u - x |^2 = \lambda u + \frac{1}{2} (u - x)^2.
$$

Differentiating, we get:

$$
g'(u) = \lambda + (u - x).
$$

Setting $g'(u) = 0$, we get:

$$
\lambda + u - x = 0 \iff u = x - \lambda.
$$

The second derivative $g''(u) = 1 > 0$ confirms that it is indeed the minimizer.

Thus,

$$
\operatorname{prox}_f(x) = \{ x - \lambda \}.
$$

12.3.1.6. 1D Affine Function#

Example 12.7 (1D Affine function)

Let $f : \mathbb{R} \to \mathbb{R}$ be given as $f(x) = \alpha x + \beta$ with $\alpha, \beta \in \mathbb{R}$.

Define:

$$
g(u) = f(u) + \frac{1}{2} | u - x |^2 = \alpha u + \beta + \frac{1}{2} (u - x)^2.
$$

Differentiating, we get:

$$
g'(u) = \alpha + (u - x).
$$

Setting $g'(u) = 0$, we get:

$$
\alpha + u - x = 0 \iff u = x - \alpha.
$$

The second derivative $g''(u) = 1 > 0$ confirms that it is indeed the minimizer.

Thus,

$$
\operatorname{prox}_f(x) = \{ x - \alpha \}.
$$

12.3.1.7. Nonemptiness Conditions#

It is imperative to identify conditions under which a proximal mapping is nonempty.

Theorem 12.1 (Nonemptiness under closedness and coerciveness)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper and closed function. Assume that the function

$$
u \mapsto f(u) + \frac{1}{2} \| u - x \|^2
$$

is coercive for every $x \in \mathbb{V}$.

Then the set $\operatorname{prox}_f(x)$ is nonempty and compact for every $x \in \mathbb{V}$.

Recall from Definition 8.8 that a function $h$ is coercive if for every sequence $\{ x_k \}$ such that $\lim_{k \to \infty} \| x_k \| = \infty$, we have $\lim_{k \to \infty} h(x_k) = \infty$.

Proof. Define:

$$
h(u) \triangleq f(u) + \frac{1}{2} \| u - x \|^2.
$$

  1. Then $\operatorname{prox}_f(x) = \operatorname*{argmin}_{u \in \mathbb{V}} h(u)$.

  2. $h$ is a sum of two closed functions. Hence $h$ is a closed function due to Theorem 3.95.

  3. It is given that $h$ is coercive.

  4. $h$ is proper since $f$ is proper and $\frac{1}{2} \| u - x \|^2$ is real valued.

  5. Thus, $h$ is proper, closed and coercive.

  6. Due to the Weierstrass theorem (Theorem 8.9), a proper, closed and coercive function has a nonempty and compact set of minimizers. Applying this to $h$, the set $\operatorname{prox}_f(x)$ is nonempty and compact.

12.3.2. Proximal Operator#

Under suitable conditions, $\operatorname{prox}_f(x)$ is a singleton for every $x \in \mathbb{V}$. Then the proximal mapping can be thought of as a function $\operatorname{prox}_f : \mathbb{V} \to \mathbb{V}$ mapping every point in $\mathbb{V}$ to a proximal point.

Theorem 12.2 (First prox theorem)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and convex function. Then $\operatorname{prox}_f(x)$ is a singleton for every $x \in \mathbb{V}$.

Proof. Define:

$$
h_x(u) \triangleq f(u) + \frac{1}{2} \| u - x \|^2.
$$

Then

$$
\operatorname{prox}_f(x) = \operatorname*{argmin}_{u \in \mathbb{V}} h_x(u).
$$

  1. $f$ is a closed and convex function.

  2. $d_x(u) = \frac{1}{2} \| u - x \|^2$ is a closed and strongly convex function (Theorem 9.268).

  3. Hence, their sum $h_x$ is a closed and strongly convex function (Theorem 9.270).

  4. Since $f$ is proper and $d_x$ is real valued, $h_x$ is also proper.

  5. Thus, $h_x$ is a proper, closed and strongly convex function.

  6. Due to Theorem 9.272, there exists a unique minimizer for $h_x$.

  7. Thus, the set $\operatorname{prox}_f(x) = \operatorname*{argmin}_{u \in \mathbb{V}} h_x(u)$ is a singleton.

With this result, we are ready to introduce the proximal operator.

Definition 12.2 (Proximal operator)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and convex function. Since the point-to-set mapping $\operatorname{prox}_f : \mathbb{V} \to 2^{\mathbb{V}}$ maps every point in $\mathbb{V}$ to a singleton subset of $\mathbb{V}$ due to Theorem 12.2, we abuse the notation and redefine $\operatorname{prox}_f : \mathbb{V} \to \mathbb{V}$ as a point-to-point mapping. We call this mapping the proximal operator of $f$.

In other words, we write $\operatorname{prox}_f(x) = y$ rather than $\operatorname{prox}_f(x) = \{ y \}$.

Thus, for a proper, closed and convex function $f$, the operator $\operatorname{prox}_f : \mathbb{V} \to \mathbb{V}$ maps each point $x \in \mathbb{V}$ to the unique minimizer of the function $h_x(u) = f(u) + \frac{1}{2} \| u - x \|^2$.

Readers are encouraged to look at some of the examples below before proceeding with the proximal calculus rules.

12.3.3. Proximal Calculus#

This section deals with rules which enable us to compute the proximal mapping of a function from the proximal mappings of its underlying building blocks.

12.3.3.1. Separable Functions#

Theorem 12.3 (Proximal mappings of separable functions)

Suppose that $f : \mathbb{V}_1 \times \dots \times \mathbb{V}_m \to (-\infty, \infty]$ is given by

$$
f(x_1, \dots, x_m) = \sum_{i=1}^m f_i(x_i) \quad \forall x_i \in \mathbb{V}_i, \; i = 1, \dots, m.
$$

Then for every $x_1 \in \mathbb{V}_1, \dots, x_m \in \mathbb{V}_m$,

$$
\operatorname{prox}_f(x_1, \dots, x_m) = \operatorname{prox}_{f_1}(x_1) \times \dots \times \operatorname{prox}_{f_m}(x_m).
$$

Note that the theorem is stated in terms of sets. The L.H.S. $\operatorname{prox}_f(x_1, \dots, x_m)$ is a set. Similarly, the R.H.S. is a Cartesian product of the proximal mappings $\operatorname{prox}_{f_i}(x_i)$, which are individually sets.

Proof. We start with the definition of the proximal mapping and expand it in terms of the separable functions. We note that the quadratic term $\frac{1}{2} \| x \|^2$ is also separable.

$$
\begin{aligned}
\operatorname{prox}_f(x)
&= \operatorname*{argmin}_{u} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\} \\
&= \operatorname*{argmin}_{u_1, \dots, u_m} \left\{ \sum_{i=1}^m f_i(u_i) + \frac{1}{2} \sum_{i=1}^m \| u_i - x_i \|^2 \right\} \\
&= \operatorname*{argmin}_{u_1, \dots, u_m} \left\{ \sum_{i=1}^m \left( f_i(u_i) + \frac{1}{2} \| u_i - x_i \|^2 \right) \right\} \\
&= \prod_{i=1}^m \operatorname*{argmin}_{u_i} \left\{ f_i(u_i) + \frac{1}{2} \| u_i - x_i \|^2 \right\} \\
&= \prod_{i=1}^m \operatorname{prox}_{f_i}(x_i).
\end{aligned}
$$

Here the symbol $\prod$ denotes the Cartesian product.

Corollary 12.1 (Proximal operator for separable convex functions)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper, closed and convex function. Assume that $\mathbb{R}^n = \mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_m}$ with $n = n_1 + \dots + n_m$.

Let $f$ be separable over $\mathbb{R}^{n_1} \times \dots \times \mathbb{R}^{n_m}$ and given by

$$
f(x_1, \dots, x_m) = \sum_{i=1}^m f_i(x_i) \quad \forall x_i \in \mathbb{R}^{n_i}, \; i = 1, \dots, m.
$$

Further assume that the $f_i$ are proper, closed and convex functions themselves.

Then for every $x_1 \in \mathbb{R}^{n_1}, \dots, x_m \in \mathbb{R}^{n_m}$,

$$
\operatorname{prox}_f(x_1, \dots, x_m) = (\operatorname{prox}_{f_1}(x_1), \dots, \operatorname{prox}_{f_m}(x_m)).
$$

In the special case where $f$ is separable over each coordinate, i.e.,

$$
f(x) = \sum_{i=1}^n f_i(x_i),
$$

then

$$
\operatorname{prox}_f(x) = (\operatorname{prox}_{f_1}(x_1), \dots, \operatorname{prox}_{f_n}(x_n)).
$$

Proof. Since $f$ and the $f_i$ are proper, closed and convex, due to Theorem 12.2 the proximal mappings $\operatorname{prox}_f(x)$ and $\operatorname{prox}_{f_i}(x_i)$ are singletons. In this case, the Cartesian product reduces to the concatenation of coordinates.

Applications: Example 12.16.

12.3.3.2. Scaling and Translation#

Theorem 12.4 (Scaling and translation of input)

Let $g : \mathbb{V} \to (-\infty, \infty]$ be a proper function. Let $t \neq 0$ be a scaling parameter and $a \in \mathbb{V}$ be a translation parameter. Define $f : \mathbb{V} \to (-\infty, \infty]$ as

$$
f(x) = g(t x + a).
$$

Then, given $\operatorname{prox}_g$, the proximal mapping of $f$ is

$$
\operatorname{prox}_f(x) = \frac{1}{t} \left[ \operatorname{prox}_{t^2 g}(t x + a) - a \right].
$$

Proof. Starting from the definition of the proximal mapping

$$
\operatorname{prox}_f(x)
= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\}
= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ g(t u + a) + \frac{1}{2} \| u - x \|^2 \right\}.
$$

  1. The objective function of this minimization problem is

    $$
    g(t u + a) + \frac{1}{2} \| u - x \|^2.
    $$

  2. We introduce a change of variable $z = t u + a$ to construct an equivalent minimization problem. See Proposition 8.2 for the construction of equivalent optimization problems by change of variable.

  3. Then $u = \frac{1}{t} (z - a)$.

  4. The objective function changes to

    $$
    g(z) + \frac{1}{2} \left\| \frac{1}{t} (z - a) - x \right\|^2
    = \frac{1}{t^2} \left[ (t^2 g)(z) + \frac{1}{2} \| z - (t x + a) \|^2 \right].
    $$

  5. The minimizers of this objective function (over $z$) are given by

    $$
    z \in \operatorname{prox}_{t^2 g}(t x + a).
    $$

    Note that the scaling term $\frac{1}{t^2}$ in the objective function doesn't impact the set of minimizers. It only impacts the optimal value.

  6. Since $u = \frac{1}{t} (z - a)$, the minimizers of the original objective function are given by

    $$
    u \in \frac{1}{t} \left[ \operatorname{prox}_{t^2 g}(t x + a) - a \right].
    $$

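The identity in Theorem 12.4 can be checked numerically on a concrete instance. The sketch below takes $g = |\cdot|$, whose proximal operator is soft thresholding (Example 12.11), and compares the formula against a brute-force minimization of the prox objective; the helper names are our own and purely illustrative.

```python
# Check prox_f(x) = (1/t) * (prox_{t^2 g}(t*x + a) - a) for f(u) = g(t*u + a)
# with g = |.|, whose prox is soft thresholding (Example 12.11).
import numpy as np
from scipy.optimize import minimize_scalar


def prox_abs(x, lam):
    """Proximal operator of lam * |.| at x (soft thresholding)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)


t, a, x = 2.0, 0.5, 1.3
f = lambda u: abs(t * u + a)                       # f(u) = g(t*u + a)
h = lambda u: f(u) + 0.5 * (u - x) ** 2            # prox objective h_x

direct = minimize_scalar(h).x                      # brute-force minimizer
formula = (prox_abs(t * x + a, t ** 2) - a) / t    # Theorem 12.4

print(direct, formula)   # both should be close to -0.25
```
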
Theorem 12.5 (Proximal mapping for $t\, g(\cdot / t)$)

Let $g : \mathbb{V} \to (-\infty, \infty]$ be a proper function. Let $t \neq 0$ be a scaling parameter. Define $f : \mathbb{V} \to (-\infty, \infty]$ as

$$
f(x) = t\, g\left( \frac{x}{t} \right).
$$

Then

$$
\operatorname{prox}_f(x) = t \operatorname{prox}_{g / t}(x / t).
$$

Proof. Starting from the definition of the proximal mapping

$$
\operatorname{prox}_f(x)
= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\}
= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ t\, g\left( \frac{u}{t} \right) + \frac{1}{2} \| u - x \|^2 \right\}.
$$

  1. The objective function is

    $$
    t\, g\left( \frac{u}{t} \right) + \frac{1}{2} \| u - x \|^2.
    $$

  2. Introduce a change of variable $z = \frac{u}{t}$.

  3. Then $u = t z$.

  4. The objective function changes to

    $$
    t\, g(z) + \frac{1}{2} \| t z - x \|^2
    = t^2 \left[ \frac{g(z)}{t} + \frac{1}{2} \left\| z - \frac{x}{t} \right\|^2 \right].
    $$

  5. The minimizers of this objective function are given by

    $$
    z \in \operatorname{prox}_{g / t}(x / t).
    $$

  6. The minimizers of the original objective function are then given by

    $$
    u \in t \operatorname{prox}_{g / t}(x / t).
    $$

12.3.3.3. Quadratic Perturbation#

Theorem 12.6 (Quadratic perturbation)

Let $g : \mathbb{V} \to (-\infty, \infty]$ be a proper function. Define $f : \mathbb{V} \to (-\infty, \infty]$ as

$$
f(x) = g(x) + \frac{c}{2} \| x \|^2 + \langle x, a \rangle + d
$$

where $c > 0$, $a \in \mathbb{V}$ and $d \in \mathbb{R}$. Then

$$
\operatorname{prox}_f(x) = \operatorname{prox}_{\frac{1}{c + 1} g}\left( \frac{x - a}{c + 1} \right).
$$

Proof. Starting from the definition of the proximal mapping

$$
\begin{aligned}
\operatorname{prox}_f(x)
&= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\} \\
&= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ g(u) + \frac{c}{2} \| u \|^2 + \langle u, a \rangle + d + \frac{1}{2} \| u - x \|^2 \right\} \\
&= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ g(u) + \frac{c + 1}{2} \| u \|^2 + \langle u, a - x \rangle + \frac{1}{2} \| x \|^2 + d \right\} \\
&= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ g(u) + \frac{c + 1}{2} \left( \| u \|^2 - 2 \left\langle u, \frac{x - a}{c + 1} \right\rangle \right) \right\} \\
&= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ g(u) + \frac{c + 1}{2} \left\| u - \frac{x - a}{c + 1} \right\|^2 \right\} \\
&= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ \left( \frac{1}{c + 1} g \right)(u) + \frac{1}{2} \left\| u - \frac{x - a}{c + 1} \right\|^2 \right\} \\
&= \operatorname{prox}_{\frac{1}{c + 1} g}\left( \frac{x - a}{c + 1} \right).
\end{aligned}
$$

Since the quantities $x$, $a$ and $d$ are constant w.r.t. $u$, we have freely added or removed terms depending solely on these quantities from the objective function in the derivation above, as they do not affect the set of minimizers.

Applications: Example 12.10.

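Theorem 12.6 can also be checked numerically on a concrete instance. The sketch below uses $g = |\cdot|$, whose proximal operator is soft thresholding (Example 12.11); the helper names are ours, for illustration only.

```python
# Check Theorem 12.6 for f(u) = |u| + (c/2)*u^2 + a*u + d against a
# brute-force minimization of the prox objective.
import numpy as np
from scipy.optimize import minimize_scalar


def prox_abs(x, lam):
    """Proximal operator of lam * |.| at x (soft thresholding)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)


c, a, d, x = 3.0, 0.7, 5.0, 2.4
f = lambda u: abs(u) + 0.5 * c * u ** 2 + a * u + d
h = lambda u: f(u) + 0.5 * (u - x) ** 2                   # prox objective h_x

direct = minimize_scalar(h).x                             # brute-force minimizer
formula = prox_abs((x - a) / (c + 1.0), 1.0 / (c + 1.0))  # Theorem 12.6

print(direct, formula)   # both should be close to 0.175
```
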
12.3.3.4. Composition with Affine Mapping#

Theorem 12.7 (Composition with affine mapping)

Let $g : \mathbb{R}^m \to (-\infty, \infty]$ be a proper, closed and convex function. Let $f : \mathbb{V} \to (-\infty, \infty]$ be given by

$$
f(x) = g(\mathcal{A}(x) + b)
$$

where $b \in \mathbb{R}^m$ and $\mathcal{A} : \mathbb{V} \to \mathbb{R}^m$ is a linear transformation satisfying the property

$$
\mathcal{A} \circ \mathcal{A}^T = \alpha \mathcal{I}
$$

for some constant $\alpha > 0$, where $\mathcal{I} : \mathbb{R}^m \to \mathbb{R}^m$ is the identity transformation.

Then for any $x \in \mathbb{V}$,

$$
\operatorname{prox}_f(x) = x + \frac{1}{\alpha} \mathcal{A}^T \left( \operatorname{prox}_{\alpha g}(\mathcal{A}(x) + b) - \mathcal{A}(x) - b \right).
$$

Proof. Starting from the definition of the proximal mapping

$$
\operatorname{prox}_f(x)
= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ f(u) + \frac{1}{2} \| u - x \|^2 \right\}
= \operatorname*{argmin}_{u \in \mathbb{V}} \left\{ g(\mathcal{A}(u) + b) + \frac{1}{2} \| u - x \|^2 \right\}.
$$

  1. Introduce a variable $z = \mathcal{A}(u) + b$.

  2. The optimization problem transforms to

    $$
    \begin{aligned}
    & \underset{u \in \mathbb{V},\, z \in \mathbb{R}^m}{\text{minimize}}
    & & g(z) + \frac{1}{2} \| u - x \|^2 \\
    & \text{subject to}
    & & z = \mathcal{A}(u) + b.
    \end{aligned}
    $$

  3. Since $\mathcal{A} \circ \mathcal{A}^T = \alpha \mathcal{I}$ with $\alpha > 0$, the transformation $\mathcal{A}$ is surjective. Hence $f$ is a proper, closed and convex function, and by Theorem 12.2 this problem has a unique solution.

  4. Let the optimal solution be denoted by $\tilde{u}$ and $\tilde{z}$.

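The formula of Theorem 12.7 can be verified numerically on a concrete instance. The sketch below (our own illustration) uses $g = \|\cdot\|_1$, for which $\operatorname{prox}_{\alpha g}$ is componentwise soft thresholding, and a random matrix $A$ with orthogonal rows scaled so that $A A^T = \alpha I$.

```python
# Numerical check of Theorem 12.7 with g = ||.||_1 and A A^T = alpha * I.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n, alpha = 2, 3, 1.5

Q, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal columns
A = np.sqrt(alpha) * Q.T                           # now A @ A.T == alpha * I
b = rng.standard_normal(m)
x = rng.standard_normal(n)

soft = lambda z, lam: np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
f = lambda u: np.abs(A @ u + b).sum()              # f(u) = g(A u + b)

# Brute-force proximal point: minimize h_x(u) = f(u) + 0.5*||u - x||^2.
h = lambda u: f(u) + 0.5 * np.sum((u - x) ** 2)
direct = minimize(h, x, method="Nelder-Mead",
                  options={"xatol": 1e-10, "fatol": 1e-10}).x

# Closed form from Theorem 12.7.
formula = x + (1.0 / alpha) * A.T @ (soft(A @ x + b, alpha) - A @ x - b)

print(np.round(direct, 4))
print(np.round(formula, 4))   # the two vectors should agree closely
```
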
12.3.4. Examples#

The remainder of this section is dedicated to the computation of proximal mappings or operators for different functions. Most of these functions are convex (with a few exceptions). The proximal mappings will be computed either from first principles (by solving the minimization problem for $f(u) + \frac{1}{2} \| u - x \|^2$) or by making use of the proximal calculus rules developed above.

Some key ideas will be used repeatedly in the computation of the proximal operators in this section. Assume that $f$ is a convex function with $S = \operatorname{dom} f$ and that $f$ is differentiable over an open set $U \subseteq S$. This happens, for example, when $\operatorname{dom} f$ is not an open set and $f$ is differentiable in the interior of $\operatorname{dom} f$ but not at the boundary points.

  1. If $\nabla f(u) = 0$ for some $u \in U$, then $u$ must be one of its minimizers (Theorem 10.49).

  2. If a minimizer of $f$ exists and is not attained at any point of differentiability, then it must be attained at a point of nondifferentiability (Theorem 10.50).

  3. Since for a proper, closed and convex function $f$ the existence of a unique proximal point is guaranteed (Theorem 12.2), the minimizer of the function $h_x$ exists.

  4. The minimizer of $h_x$ is then either at a point of differentiability where the gradient vanishes, or at a boundary point where $h_x$ is nondifferentiable.

12.3.5. 1-dim Examples#

12.3.5.1. Indicator Functions#

Example 12.8 (Indicator function for an interval)

Let $r \in [0, \infty]$. Let $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$
f(x) = I_{[0, r] \cap \mathbb{R}}(x).
$$

The proximal operator is given by

$$
\operatorname{prox}_f(x) = \min \{ \max \{ x, 0 \}, r \}.
$$

Proof.

  1. We construct the objective function

    $$
    h_x(u) = I_{[0, r] \cap \mathbb{R}}(u) + \frac{1}{2} (u - x)^2.
    $$

  2. Let $\tilde{u}$ denote the minimizer of $h_x(u)$.

  3. Let $w$ denote the function $w(u) = \frac{1}{2} (u - x)^2$.

  4. First consider the case where $r < \infty$.

  5. Then $h_x(u) = w(u)$ over $[0, r]$ and $\infty$ otherwise.

  6. $\operatorname{dom} h_x = [0, r]$ and $\operatorname{int} \operatorname{dom} h_x = (0, r)$. The boundary points are $0$ and $r$.

  7. The unconstrained minimizer of $w(u)$ is $u = x$.

  8. Therefore, if $0 \leq x \leq r$, then $\tilde{u} = x$.

  9. For the cases where $x \notin [0, r]$, the minimizer must be one of the boundary points.

  10. If $x < 0$, then $w(u)$ is an increasing function over $[0, r]$.

  11. Hence for $x < 0$, $\tilde{u} = 0$.

  12. If $x > r$, then $w(u)$ is a decreasing function over $[0, r]$.

  13. Hence for $x > r$, $\tilde{u} = r$.

  14. Thus, if $r < \infty$, then the proximal operator is given by

    $$
    \operatorname{prox}_f(x) = \begin{cases} x, & 0 \leq x \leq r; \\ 0, & x < 0; \\ r, & x > r \end{cases}
    = \min \{ \max \{ x, 0 \}, r \}.
    $$

  15. For $r = \infty$, $f(x) = I_{[0, \infty)}(x)$.

  16. Thus, $h_x(u) = w(u)$ for $u \geq 0$ and $\infty$ otherwise.

  17. Hence the minimizer is $\tilde{u} = x$ for $x \geq 0$ and $\tilde{u} = 0$ for $x < 0$.

  18. In other words,

    $$
    \operatorname{prox}_f(x) = [x]_+ = \max \{ x, 0 \} = \min \{ \max \{ x, 0 \}, \infty \}.
    $$

  19. Combining the two cases,

    $$
    \operatorname{prox}_f(x) = \min \{ \max \{ x, 0 \}, r \}.
    $$

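In other words, the proximal operator of this indicator function is just the projection (clamp) onto the interval $[0, r]$. A one-line NumPy version (our own illustration) is:

```python
# prox of the indicator of [0, r]: clamp each entry of x into [0, r].
import numpy as np


def prox_indicator_interval(x, r):
    """min{max{x, 0}, r}, elementwise; r may be np.inf for the case r = infinity."""
    return np.clip(x, 0.0, r)


print(prox_indicator_interval(np.array([-1.0, 0.3, 2.5]), r=1.0))     # [0.  0.3 1. ]
print(prox_indicator_interval(np.array([-1.0, 0.3, 2.5]), r=np.inf))  # [0.  0.3 2.5]
```
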
12.3.5.2. Linear#

Example 12.9 (Linear over R+)

Let $\mu \in \mathbb{R}$. Let $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$
f(x) = \begin{cases} \mu x, & x \geq 0; \\ \infty, & x < 0. \end{cases}
$$

The proximal operator is given by

$$
\operatorname{prox}_f(x) = [x - \mu]_+.
$$

Proof.

  1. $\operatorname{dom} f = \mathbb{R}_+$.

  2. $f$ is differentiable over $\mathbb{R}_{++}$.

  3. Let

    $$
    h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} \mu u + \frac{1}{2} (u - x)^2, & u \geq 0; \\ \infty, & u < 0. \end{cases}
    $$

  4. $h_x$ is a proper convex function with $\operatorname{dom} h_x = \mathbb{R}_+$.

  5. $h_x$ is differentiable over $\mathbb{R}_{++}$.

  6. $h_x'(u) = \mu + u - x$ for all $u > 0$.

  7. Setting it to zero, we get $u = x - \mu$.

  8. Thus, if $x > \mu$, then the minimizer is $u = x - \mu$.

  9. Otherwise, $h_x$ attains its minimum value at $u = 0$, which is the only point of nondifferentiability in its domain.

  10. Thus, if $x \leq \mu$, then the minimizer is $u = 0$.

Example 12.10 (Linear over an interval $[0, a]$)

Let $\mu \in \mathbb{R}$ and let $a \in [0, \infty]$. Let $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$
f(x) = \begin{cases} \mu x, & x \in [0, a]; \\ \infty, & \text{otherwise}. \end{cases}
$$

The proximal operator is given by

$$
\operatorname{prox}_f(x) = \min \{ \max \{ x - \mu, 0 \}, a \}.
$$

Proof. We note that

$$
f(x) = I_{[0, a] \cap \mathbb{R}}(x) + \mu x.
$$

  1. From Example 12.8, the proximal operator of the indicator function is given by

    $$
    \operatorname{prox}_{I_{[0, a] \cap \mathbb{R}}}(x) = \min \{ \max \{ x, 0 \}, a \}.
    $$

  2. We can write $f$ as a quadratic perturbation of $I_{[0, a] \cap \mathbb{R}}$ given by

    $$
    f(x) = I_{[0, a] \cap \mathbb{R}}(x) + \frac{0}{2} x^2 + \mu x + 0.
    $$

  3. Following Theorem 12.6 (whose derivation remains valid for $c = 0$),

    $$
    \operatorname{prox}_f(x) = \operatorname{prox}_{I_{[0, a] \cap \mathbb{R}}}(x - \mu) = \min \{ \max \{ x - \mu, 0 \}, a \}.
    $$

12.3.5.3. Absolute Value#

Example 12.11 (Scaled absolute value)

Let $t \in \mathbb{R}_+$. Let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$
f(x) = t |x|.
$$

The proximal operator is given by

$$
\operatorname{prox}_f(x) = [|x| - t]_+ \operatorname{sgn}(x).
$$

Proof.

  1. Let

    $$
    h_x(u) = t |u| + \frac{1}{2} (u - x)^2.
    $$

  2. $h_x$ is differentiable everywhere except at $u = 0$.

  3. At $u \neq 0$, $h_x'(u) = \operatorname{sgn}(u)\, t + u - x$.

  4. If the minimizer is attained at some $u > 0$, then $t + u - x = 0$, giving us $u = x - t$.

  5. Thus, a minimizer at $u > 0$ is attained if $x > t$.

  6. If the minimizer is attained at some $u < 0$, then $-t + u - x = 0$, giving us $u = x + t$.

  7. Thus, a minimizer at $u < 0$ is attained if $x < -t$.

  8. Consequently, if $x \in [-t, t]$, then the minimizer of $h_x$ must be at the only point of nondifferentiability, namely $u = 0$.

  9. This gives us

    $$
    \operatorname{prox}_f(x) = \begin{cases} x - t, & x > t; \\ x + t, & x < -t; \\ 0, & -t \leq x \leq t. \end{cases}
    $$

  10. The conditions $x > t$ and $x < -t$ can be combined as $|x| > t$.

  11. The condition $-t \leq x \leq t$ simplifies to $|x| \leq t$.

  12. We can see that the three cases for $\operatorname{prox}_f(x)$ can be simplified to the expression

    $$
    \operatorname{prox}_f(x) = [|x| - t]_+ \operatorname{sgn}(x).
    $$

12.3.5.4. Cube#

Example 12.12 (Scaled cube over $\mathbb{R}_+$)

Let $t > 0$. Let $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$
f(x) = \begin{cases} t x^3, & x \geq 0; \\ \infty, & x < 0. \end{cases}
$$

The proximal operator is given by

$$
\operatorname{prox}_f(x) = \frac{-1 + \sqrt{1 + 12 t [x]_+}}{6 t}.
$$

Proof.

  1. We construct the objective function

    $$
    h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} t u^3 + \frac{1}{2} (u - x)^2, & u \geq 0; \\ \infty, & u < 0. \end{cases}
    $$

  2. $h_x$ is differentiable for $u > 0$.

  3. $h_x'(u) = 3 t u^2 + u - x$ for $u > 0$.

  4. Note that if $x \leq 0$ then $h_x'(u) > 0$ for every $u > 0$.

  5. Hence $h_x'(u) = 0$ does not have a positive root for $x \leq 0$.

  6. For $x > 0$, setting $h_x'(u)$ to zero, we get $3 t u^2 + u - x = 0$, whose positive root is

    $$
    u = \frac{-1 + \sqrt{1 + 12 t x}}{6 t}.
    $$

  7. We can see that $h_x'(u) = 0$ has a positive solution if and only if $x > 0$.

  8. If $x \leq 0$, then the derivative never vanishes for any $u > 0$.

  9. Hence the minimum of $h_x$ is attained at the only point of nondifferentiability, $u = 0$.

  10. Thus, the minimizer of $h_x(u)$ for $x \leq 0$ is $u = 0$.

  11. Hence

    $$
    \operatorname{prox}_f(x) = \begin{cases} \dfrac{-1 + \sqrt{1 + 12 t x}}{6 t}, & x > 0; \\ 0, & x \leq 0. \end{cases}
    $$

  12. Note that

    $$
    \frac{-1 + \sqrt{1 + 12 t [x]_+}}{6 t} = \begin{cases} \dfrac{-1 + \sqrt{1 + 12 t x}}{6 t}, & x > 0; \\ 0, & x \leq 0. \end{cases}
    $$

  13. This concludes the proof.

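The closed form above can be checked against a direct numerical minimization of the prox objective; the sketch below is our own illustration.

```python
# Check the closed form for prox of f(u) = t*u^3 on R_+ (Example 12.12)
# against a bounded numerical minimization of h_x over u >= 0.
import numpy as np
from scipy.optimize import minimize_scalar


def prox_cube(x, t):
    """(-1 + sqrt(1 + 12*t*[x]_+)) / (6*t)."""
    return (-1.0 + np.sqrt(1.0 + 12.0 * t * max(x, 0.0))) / (6.0 * t)


t, x = 0.5, 2.0
h = lambda u: t * u ** 3 + 0.5 * (u - x) ** 2       # h_x on the domain u >= 0
direct = minimize_scalar(h, bounds=(0.0, 10.0), method="bounded").x

print(direct, prox_cube(x, t))   # both should be close to 0.8685
```
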
12.3.5.5. Logarithms#

Example 12.13 (Scaled negative logarithm)

Let $t > 0$. Let $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$
f(x) = \begin{cases} -t \ln x, & x > 0; \\ \infty, & x \leq 0. \end{cases}
$$

The proximal operator is given by

$$
\operatorname{prox}_f(x) = \frac{x + \sqrt{x^2 + 4 t}}{2}.
$$

Proof.

  1. We construct the objective function

    $$
    h_x(u) = f(u) + \frac{1}{2} (u - x)^2 = \begin{cases} -t \ln u + \frac{1}{2} (u - x)^2, & u > 0; \\ \infty, & u \leq 0. \end{cases}
    $$

  2. $h_x$ is differentiable for $u > 0$.

  3. $h_x'(u) = -\frac{t}{u} + (u - x)$ for $u > 0$.

  4. $\tilde{u} > 0$ is a minimizer if

    $$
    -\frac{t}{\tilde{u}} + (\tilde{u} - x) = 0
    \iff \tilde{u}^2 - x \tilde{u} - t = 0
    \iff \tilde{u} = \frac{x + \sqrt{x^2 + 4 t}}{2}.
    $$

  5. We note that $x + \sqrt{x^2 + 4 t} > 0$ for every $x \in \mathbb{R}$.

  6. Hence $\tilde{u} > 0$ as desired.

  7. Hence $\operatorname{prox}_f(x) = \dfrac{x + \sqrt{x^2 + 4 t}}{2}$.

As we can see, $\operatorname{dom} h_x$ is the open set $\mathbb{R}_{++}$ and $h_x$ is differentiable at every point in its domain. Since $h_x$ must have a unique minimizer, $h_x'(u) = 0$ must have a unique positive solution.

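As with the previous examples, the closed form can be checked against a direct numerical minimization over $u > 0$; the sketch below is our own illustration.

```python
# Check the closed form for prox of f(u) = -t*ln(u) (Example 12.13)
# against a bounded numerical minimization of h_x over u > 0.
import numpy as np
from scipy.optimize import minimize_scalar


def prox_neg_log(x, t):
    """(x + sqrt(x^2 + 4*t)) / 2."""
    return (x + np.sqrt(x ** 2 + 4.0 * t)) / 2.0


t, x = 0.8, -1.0
h = lambda u: -t * np.log(u) + 0.5 * (u - x) ** 2   # h_x on the domain u > 0
direct = minimize_scalar(h, bounds=(1e-12, 10.0), method="bounded").x

print(direct, prox_neg_log(x, t))   # both should be close to 0.5247
```
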
12.3.6. Affine Functions#

Example 12.14 (Affine function)

Let $f(x) = \langle x, a \rangle + b$ where $a \in \mathbb{V}$ and $b \in \mathbb{R}$.

$$
h_x(u) = \langle u, a \rangle + b + \frac{1}{2} \| u - x \|^2.
$$

  1. Differentiating $h_x$, we get $\nabla h_x(u) = a + u - x$.

  2. Setting it to zero, we see that $u = x - a$ is the minimizer.

Hence

$$
\operatorname{prox}_f(x) = x - a.
$$

12.3.7. Convex Quadratic Functions#

Example 12.15 (Convex quadratic)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = \frac{1}{2} x^T A x + b^T x + c$ where $A \in \mathbb{S}^n_+$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$.

$$
h_x(u) = \frac{1}{2} u^T A u + b^T u + c + \frac{1}{2} \| u - x \|^2.
$$

  1. Differentiating $h_x$, we get

    $$
    \nabla h_x(u) = A u + b + u - x.
    $$

  2. Setting this to zero, we see that

    $$
    A u + b + u - x = 0
    \iff (A + I) u = x - b
    \iff u = (A + I)^{-1} (x - b).
    $$

Hence

$$
\operatorname{prox}_f(x) = (A + I)^{-1} (x - b).
$$

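Computationally, the proximal operator of a convex quadratic is just a linear solve; the NumPy sketch below (our own illustration) checks the optimality condition $\nabla h_x(u) = 0$.

```python
# prox of f(x) = 0.5*x^T A x + b^T x + c: solve the linear system (A + I) u = x - b.
import numpy as np


def prox_quadratic(A, b, x):
    """Return (A + I)^{-1} (x - b) via a linear solve (no explicit inverse)."""
    return np.linalg.solve(A + np.eye(A.shape[0]), x - b)


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                        # a positive semidefinite matrix
b = rng.standard_normal(4)
x = rng.standard_normal(4)

u = prox_quadratic(A, b, x)
# Optimality check: the gradient A u + b + u - x of h_x should vanish at u.
print(np.allclose(A @ u + b + u - x, 0.0))   # True
```
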
12.3.8. Norms#

12.3.8.1. $\ell_1$ Norm#

Example 12.16 (Scaled $\ell_1$ norm)

Let $\gamma > 0$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$
f(x) = \gamma \| x \|_1 = \sum_{i=1}^n \gamma |x_i|.
$$

Then, the proximal operator is given by

$$
\operatorname{prox}_f(x) = \operatorname{ST}_{\gamma}(x)
$$

where the univariate soft-thresholding operator $\operatorname{ST}_{\gamma} : \mathbb{R} \to \mathbb{R}$ is defined as

$$
\operatorname{ST}_{\gamma}(x) = \begin{cases} x - \gamma, & x > \gamma; \\ x + \gamma, & x < -\gamma; \\ 0, & |x| \leq \gamma; \end{cases}
$$

and the multivariate soft-thresholding operator $\operatorname{ST}_{\gamma} : \mathbb{R}^n \to \mathbb{R}^n$ is defined by the componentwise application of the univariate soft-thresholding operator

$$
\operatorname{ST}_{\gamma}(x) = \left( \operatorname{ST}_{\gamma}(x_j) \right)_{j=1}^n.
$$

Proof. We note that

$$
f(x) = \sum_{i=1}^n g(x_i)
$$

where $g(x) = \gamma |x|$.

By Example 12.11,

$$
\operatorname{prox}_g(x) = [|x| - \gamma]_+ \operatorname{sgn}(x) = \operatorname{ST}_{\gamma}(x).
$$

By Corollary 12.1,

$$
\operatorname{prox}_f(x) = \left( \operatorname{ST}_{\gamma}(x_i) \right)_{i=1}^n = \operatorname{ST}_{\gamma}(x).
$$

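Soft thresholding vectorizes naturally; the following NumPy sketch (our own illustration) implements $\operatorname{ST}_{\gamma}$ using the equivalent form $[|x| - \gamma]_+ \operatorname{sgn}(x)$.

```python
# Multivariate soft thresholding: prox of gamma * ||.||_1 (Example 12.16).
import numpy as np


def soft_threshold(x, gamma):
    """Componentwise [|x_i| - gamma]_+ * sgn(x_i)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)


x = np.array([3.0, -0.2, 0.5, -2.0])
print(soft_threshold(x, gamma=1.0))   # [ 2. -0.  0. -1.]
```
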
12.3.8.2. $\ell_0$ “Norm”#

Example 12.17 (Scaled $\ell_0$ “norm”)

Let $\gamma > 0$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$
f(x) = \gamma \| x \|_0 = \sum_{i=1}^n g(x_i)
$$

where

$$
g(x) = \begin{cases} \gamma, & x \neq 0; \\ 0, & x = 0. \end{cases}
$$

Then, the proximal mapping is given by

$$
\operatorname{prox}_f(x) = H_{\sqrt{2 \gamma}}(x_1) \times \dots \times H_{\sqrt{2 \gamma}}(x_n)
$$

where the univariate hard-thresholding operator $H_t : \mathbb{R} \to 2^{\mathbb{R}}$ for any $t > 0$ is defined as

$$
H_t(x) = \begin{cases} \{ 0 \}, & |x| < t; \\ \{ x \}, & |x| > t; \\ \{ 0, x \}, & |x| = t. \end{cases}
$$

Note that this version of the hard-thresholding operator is a point-to-set mapping. The operator leads to two possible values when $|x| = t$. The $\ell_0$-“norm” is not a convex function; hence its proximal mapping is not always a singleton.

Proof. We note that $g(x) = h(x) + \gamma$

where

$$
h(x) = \begin{cases} 0, & x \neq 0; \\ -\gamma, & x = 0. \end{cases}
$$

  1. Following Example 12.2,

    $$
    \operatorname{prox}_h(x) = \begin{cases} \{ 0 \}, & |x| < \sqrt{2 \gamma}; \\ \{ x \}, & |x| > \sqrt{2 \gamma}; \\ \{ 0, x \}, & |x| = \sqrt{2 \gamma}. \end{cases}
    $$

  2. Since the proximal mapping is not affected by a constant offset, $\operatorname{prox}_g = \operatorname{prox}_h$.

  3. We can see that $\operatorname{prox}_g = H_{\sqrt{2 \gamma}}$.

  4. It is clear that the proximal points are not always unique.

  5. By Theorem 12.3,

    $$
    \operatorname{prox}_f(x) = \operatorname{prox}_g(x_1) \times \dots \times \operatorname{prox}_g(x_n) = H_{\sqrt{2 \gamma}}(x_1) \times \dots \times H_{\sqrt{2 \gamma}}(x_n)
    $$

    as desired.

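Since the $\ell_0$ prox is a point-to-set mapping, a faithful implementation returns sets. The sketch below (our own illustration, with hypothetical helper names) builds the Cartesian product of the componentwise hard-thresholding sets.

```python
# Hard thresholding and the prox of gamma * ||.||_0 as a point-to-set mapping.
import numpy as np
from itertools import product


def hard_threshold(x, t):
    """H_t(x): {0} if |x| < t, {x} if |x| > t, {0, x} if |x| = t."""
    if abs(x) < t:
        return {0.0}
    if abs(x) > t:
        return {x}
    return {0.0, x}


def prox_l0(x, gamma):
    """All proximal points of gamma * ||.||_0 at x, as a set of tuples."""
    t = np.sqrt(2.0 * gamma)
    return set(product(*(hard_threshold(xi, t) for xi in x)))


# With gamma = 0.5 the threshold is sqrt(2*gamma) = 1: the middle entry is
# thresholded to 0 and the last entry sits exactly at the threshold,
# so there are two proximal points.
print(prox_l0(np.array([3.0, 0.1, 1.0]), gamma=0.5))
```
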
12.3.9. Logarithms#

Example 12.18 (Negative sum of logs)

Let $\gamma > 0$. Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be given by

$$
f(x) = \begin{cases} -\gamma \sum_{i=1}^n \ln x_i, & x \succ 0; \\ \infty, & \text{otherwise}. \end{cases}
$$

Then, the proximal operator is given by

$$
\operatorname{prox}_f(x) = \left( \frac{x_i + \sqrt{x_i^2 + 4 \gamma}}{2} \right)_{i=1}^n.
$$

Proof. We note that

$$
f(x) = \sum_{i=1}^n g(x_i)
$$

where

$$
g(x) = \begin{cases} -\gamma \ln x, & x > 0; \\ \infty, & x \leq 0. \end{cases}
$$

From Example 12.13 (with $t = \gamma$),

$$
\operatorname{prox}_g(x) = \frac{x + \sqrt{x^2 + 4 \gamma}}{2}.
$$

By Corollary 12.1,

$$
\operatorname{prox}_f(x) = \left( \operatorname{prox}_g(x_i) \right)_{i=1}^n.
$$
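
Like soft thresholding, this operator is applied componentwise; a short NumPy version (our own illustration) is:

```python
# prox of f(x) = -gamma * sum_i ln(x_i) (Example 12.18), applied componentwise.
import numpy as np


def prox_neg_sum_log(x, gamma):
    """Componentwise (x_i + sqrt(x_i^2 + 4*gamma)) / 2."""
    return (x + np.sqrt(x ** 2 + 4.0 * gamma)) / 2.0


x = np.array([-1.0, 0.0, 2.0])
print(prox_neg_sum_log(x, gamma=1.0))   # every component is strictly positive
```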