10.2. Projection on Convex Sets

We will assume $\mathbb{V}$ to be a real $n$-dimensional vector space with an inner product $\langle \cdot, \cdot \rangle$, the induced norm $\| \cdot \|$, and the corresponding dual norm $\| \cdot \|_*$ for the dual space $\mathbb{V}^*$.

We are interested in mapping a point $x$ to the nearest point in a set $C$. In general, this problem may have zero, one or multiple solutions. However, in the special case where $C$ is a nonempty, closed and convex set, there is exactly one point in $C$ which is nearest to a given point $x$. This nearest point is called the projection of $x$ onto $C$.

Main references for this section are [6, 9].

10.2.1. Projection Theorem

Theorem 10.6 (Projection theorem)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. For every $x \in \mathbb{V}$, there exists a unique vector that minimizes $\| z - x \|$ over all $z \in C$. This vector is called the projection of $x$ on $C$.

Proof. We fix some $x \in \mathbb{V}$ and choose an element $w \in C$.

  1. Consider the function

     $$g(z) = \frac{1}{2} \| z - x \|^2.$$

  2. Minimizing $\| z - x \|$ over all $z \in C$ is equivalent to minimizing $g(z)$ over the set

     $$D = \{ z \in C \,|\, \| z - x \| \leq \| w - x \| \}.$$

  3. We note that $D$ is a compact set and $g$ is an l.s.c., closed and coercive function.

  4. By Weierstrass' theorem (Corollary 8.2), the set of minimizers for $g$ is nonempty and compact.

We now show that the minimizer is unique.

  1. $g$ is a strictly convex function because its Hessian is the identity matrix, which is positive definite.

  2. Hence, the minimizer is unique due to Theorem 10.2.

10.2.2. Orthogonal Projection

Definition 10.6 (Orthogonal Projection Mapping)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The orthogonal projection mapping $P_C : \mathbb{V} \to \mathbb{V}$ is defined by:

$$P_C(x) \triangleq \underset{y \in C}{\operatorname{argmin}} \| y - x \| \quad \forall x \in \mathbb{V}.$$

This mapping is well defined since the projection is unique for a nonempty, closed and convex set $C$ due to Theorem 10.6.

The vector $P_C(x)$ is called the projection of $x$ on the set $C$.
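
To make the mapping concrete, the following sketch shows two projection operators with well-known closed forms: projection onto a Euclidean ball and onto a box. This is a minimal illustration in NumPy; the names `project_ball` and `project_box` are our own, not from the text.

```python
import numpy as np

def project_ball(x, center, radius):
    """Project x onto the closed Euclidean ball B[center, radius]."""
    d = x - center
    norm_d = np.linalg.norm(d)
    if norm_d <= radius:
        return x.copy()              # x is already in C
    return center + radius * d / norm_d

def project_box(x, lower, upper):
    """Project x onto the box {z : lower <= z <= upper} (componentwise clipping)."""
    return np.clip(x, lower, upper)

x = np.array([3.0, 4.0])
print(project_ball(x, np.zeros(2), 1.0))        # [0.6 0.8]
print(project_box(x, np.zeros(2), np.ones(2)))  # [1. 1.]
```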

10.2.3. Characterization

Theorem 10.7 (Orthogonal projection characterization)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. For every vector $x \in \mathbb{V}$, a vector $z \in C$ is its projection if and only if

$$\langle y - z, x - z \rangle \leq 0 \quad \forall y \in C.$$

This result is also known as the second projection theorem [6].

Proof. Assume that for some $z \in C$, $\langle y - z, x - z \rangle \leq 0 \; \forall y \in C$ holds true.

  1. For any $y \in C$

     $$\| y - x \|^2 = \| (y - z) - (x - z) \|^2 = \| y - z \|^2 + \| z - x \|^2 - 2 \langle y - z, x - z \rangle \geq \| z - x \|^2 - 2 \langle y - z, x - z \rangle.$$

  2. Thus, since $\langle y - z, x - z \rangle \leq 0$,

     $$\| y - x \|^2 \geq \| z - x \|^2 \quad \forall y \in C.$$

  3. Thus, $z$ is indeed the projection of $x$ on $C$.

Conversely, assume that $z$ is the projection of $x$ on $C$.

  1. Let $y \in C$ be arbitrary.

  2. For any $t \in [0, 1]$, define $y_t = t y + (1 - t) z$. Since $C$ is convex, $y_t \in C$.

  3. Then, we have

     $$\| x - y_t \|^2 = \| t x + (1 - t) x - t y - (1 - t) z \|^2 = \| t (x - y) + (1 - t) (x - z) \|^2 = t^2 \| x - y \|^2 + (1 - t)^2 \| x - z \|^2 + 2 t (1 - t) \langle x - y, x - z \rangle.$$

  4. Viewing $\| x - y_t \|^2$ as a function of $t$ and differentiating w.r.t. $t$, we have

     $$\frac{d}{dt} \| x - y_t \|^2 \Big|_{t = 0} = -2 \| x - z \|^2 + 2 \langle x - y, x - z \rangle = -2 \langle x - z, x - z \rangle + 2 \langle x - y, x - z \rangle = -2 \langle y - z, x - z \rangle.$$

  5. Since $t = 0$ minimizes $\| x - y_t \|^2$ over $t \in [0, 1]$, we must have

     $$\frac{d}{dt} \| x - y_t \|^2 \Big|_{t = 0} \geq 0.$$

  6. Thus, we require that

     $$\langle y - z, x - z \rangle \leq 0$$

     must hold true for every $y \in C$.

The following is an alternative proof based on results from Constrained Optimization I. This proof is specific to the case where $\mathbb{V} = \mathbb{R}^n$.

Proof. Define a function $f : \mathbb{R}^n \to \mathbb{R}$ as

$$f(y) = \| y - x \|^2.$$

Then, the projection problem can be cast as an optimization problem

$$\begin{array}{ll} \text{minimize} & f(y) \\ \text{subject to} & y \in C. \end{array}$$

Note that the gradient of $f$ is given by

$$\nabla f(y) = \nabla \langle y - x, y - x \rangle = \nabla \left( \langle y, y \rangle - 2 \langle y, x \rangle + \langle x, x \rangle \right) = 2 (y - x).$$

By Theorem 10.47, $z$ is an optimal solution if and only if

$$\nabla f(z)^T (y - z) \geq 0 \quad \forall y \in C.$$

In other words

$$2 (z - x)^T (y - z) \geq 0 \quad \forall y \in C.$$

We can simplify this as

$$\langle x - z, y - z \rangle \leq 0 \quad \forall y \in C.$$
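
As a quick numerical sanity check of this characterization, the following sketch (NumPy; `project_ball` is our own helper for the closed-form projection onto the unit ball) verifies that $\langle y - z, x - z \rangle \leq 0$ for many sampled points $y \in C$.

```python
import numpy as np

rng = np.random.default_rng(42)

def project_ball(x, radius=1.0):
    """Closed-form projection onto the closed ball B[0, radius]."""
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

x = np.array([2.0, -1.0, 0.5])
z = project_ball(x)

# Sample points y in C and check <y - z, x - z> <= 0.
for _ in range(1000):
    y = project_ball(rng.normal(size=3))   # ensures y is in C
    assert np.dot(y - z, x - z) <= 1e-12
print("characterization holds for all sampled y")
```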

Theorem 10.8 (Orthogonal projection on an affine subspace)

Let $C$ be an affine subspace of $\mathbb{V}$. Let $S$ be the linear subspace parallel to $C$. For every vector $x \in \mathbb{V}$, a vector $z \in C$ is its projection if and only if

$$x - z \in S^{\perp}.$$

Proof. Since $C$ is an affine subspace of $\mathbb{V}$, $C$ is nonempty and convex; it is also closed since $\mathbb{V}$ is finite dimensional.

  1. By Theorem 10.7, $z$ is the projection of $x$ on $C$ if and only if for every $y \in C$, we have

     $$\langle y - z, x - z \rangle \leq 0.$$

  2. But $y \in C$ if and only if $y - z \in S$.

  3. Hence the condition is equivalent to

     $$\langle w, x - z \rangle \leq 0 \quad \forall w \in S.$$

  4. But then, it must be an equality since $w$ and $-w$ both belong to $S$. Thus, we have

     $$\langle w, x - z \rangle = 0 \quad \forall w \in S.$$

  5. In other words, $x - z \in S^{\perp}$. A numerical illustration follows.
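
The orthogonality condition is easy to verify numerically. Below is a minimal sketch (NumPy; the helper `project_affine` is our own name) that projects onto the affine set $\{ z \,|\, A z = b \}$ using the standard closed-form expression and checks that the residual $x - z$ is orthogonal to the parallel subspace $S = \{ z \,|\, A z = 0 \}$.

```python
import numpy as np

def project_affine(x, A, b):
    """Project x onto C = {z : A z = b}; A is assumed to have full row rank."""
    # z = x - A^T (A A^T)^{-1} (A x - b)
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))
b = rng.normal(size=2)
x = rng.normal(size=5)

z = project_affine(x, A, b)
print(np.allclose(A @ z, b))                  # z lies in C

# x - z should be orthogonal to the parallel subspace S = null(A).
w = project_affine(rng.normal(size=5), A, np.zeros(2))   # some w in S
print(np.isclose(np.dot(w, x - z), 0.0))      # <w, x - z> = 0
```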

10.2.4. Distance Function

Recall that the distance of a point $x \in \mathbb{V}$ from a set $C$ is defined as

$$d_C(x) \triangleq \inf_{y \in C} \| x - y \|.$$

Theorem 10.9 (Distance function for nonempty, closed and convex set)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Then the function $d_C : \mathbb{V} \to \mathbb{R}$ defining the distance of a point $x \in \mathbb{V}$ from the set $C$ satisfies

$$d_C(x) = \| x - P_C(x) \|.$$

Proof. By Theorem 10.6, there exists a unique point $P_C(x)$ which minimizes the distance between $x$ and $C$. Hence

$$d_C(x) = \| x - P_C(x) \|$$

must hold true.

Theorem 10.10 (Distance function for nonempty, closed and convex set is convex)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Let $d_C : \mathbb{V} \to \mathbb{R}$ be the distance function to the set $C$ as defined in Theorem 10.9. Then, $d_C$ is convex.

Proof. Assume, for contradiction, that $d_C$ is not convex.

  1. Then, there exist $x, y \in \mathbb{V}$ and $t \in (0, 1)$ such that

     $$d_C(t x + (1 - t) y) > t\, d_C(x) + (1 - t)\, d_C(y).$$

  2. Let $u = P_C(x)$ and $v = P_C(y)$. By definition, $u, v \in C$.

  3. Then,

     $$t\, d_C(x) + (1 - t)\, d_C(y) = t \| u - x \| + (1 - t) \| v - y \|.$$

  4. Since $C$ is convex, hence $t u + (1 - t) v \in C$.

  5. Since $d_C(t x + (1 - t) y)$ is the minimum distance between the point $t x + (1 - t) y$ and $C$, hence

     $$\| t u + (1 - t) v - t x - (1 - t) y \| \geq d_C(t x + (1 - t) y).$$

  6. Rewriting,

     $$d_C(t x + (1 - t) y) \leq \| t (u - x) + (1 - t) (v - y) \| \leq t \| u - x \| + (1 - t) \| v - y \|$$

     due to triangle inequality.

  7. But, this leads to the contradiction

     $$t \| u - x \| + (1 - t) \| v - y \| < d_C(t x + (1 - t) y) \leq t \| u - x \| + (1 - t) \| v - y \|.$$

  8. Hence, $d_C$ must be convex.
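
The convexity of $d_C$ can also be probed numerically. A minimal sketch, assuming $C$ is the closed unit ball (so that $d_C$ has the closed form $\max(\| x \| - 1, 0)$), checking the convexity inequality along a segment:

```python
import numpy as np

def dist_ball(x, radius=1.0):
    """Distance from x to the closed ball B[0, radius]."""
    return max(np.linalg.norm(x) - radius, 0.0)

x = np.array([3.0, 0.0])
y = np.array([0.0, -2.0])
for t in np.linspace(0.0, 1.0, 11):
    lhs = dist_ball(t * x + (1 - t) * y)
    rhs = t * dist_ball(x) + (1 - t) * dist_ball(y)
    assert lhs <= rhs + 1e-12          # the convexity inequality
print("d_C is convex along the segment [x, y]")
```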

10.2.5. Nonexpansiveness

Definition 10.7 (Nonexpansiveness property)

Let $\mathbb{V}$ be a normed linear space. An operator $T : \mathbb{V} \to \mathbb{V}$ is called nonexpansive if

$$\| T(x) - T(y) \| \leq \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

In other words, the distance between mapped points in $\mathbb{V}$ is always less than or equal to the distance between the original points in $\mathbb{V}$.

Theorem 10.11 (Nonexpansive operators are Lipschitz continuous)

Let $\mathbb{V}$ be a normed linear space. A nonexpansive operator $T : \mathbb{V} \to \mathbb{V}$ is Lipschitz continuous. Hence, it is uniformly continuous.

Proof. Recall from Definition 3.45 that if $T$ is a Lipschitz map, then there exists $L > 0$ such that

$$\| T(x) - T(y) \| \leq L \| x - y \|$$

for every $x, y \in \mathbb{V}$.

For a nonexpansive operator, $L = 1$ works. Thus, $T$ is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.

Definition 10.8 (Firm nonexpansiveness property)

Let $\mathbb{V}$ be a real inner product space. An operator $T : \mathbb{V} \to \mathbb{V}$ is called firmly nonexpansive if

$$\langle T(x) - T(y), x - y \rangle \geq \| T(x) - T(y) \|^2$$

holds true for every $x, y \in \mathbb{V}$.

Theorem 10.12 (A firmly nonexpansive operator is nonexpansive)

Let $\mathbb{V}$ be a real inner product space. Let $T : \mathbb{V} \to \mathbb{V}$ be a firmly nonexpansive operator. Then, $T$ is nonexpansive.

Proof. For every $x, y \in \mathbb{V}$, we have

$$\| T(x) - T(y) \|^2 \leq \langle T(x) - T(y), x - y \rangle.$$

Applying the Cauchy-Schwarz inequality on the R.H.S., we get

$$\| T(x) - T(y) \|^2 \leq \| T(x) - T(y) \| \| x - y \|.$$

If $T(x) = T(y)$, the nonexpansive property holds trivially. Otherwise, canceling the common factor $\| T(x) - T(y) \|$ on both sides, we get

$$\| T(x) - T(y) \| \leq \| x - y \|$$

which is the nonexpansive property.

Theorem 10.13 (Orthogonal projection is nonexpansive)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Then the orthogonal projection operator $P_C : \mathbb{V} \to \mathbb{V}$ as defined in Definition 10.6 is nonexpansive (and therefore continuous).

In other words,

$$\| P_C(x) - P_C(y) \| \leq \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

Proof. Let $x, y \in \mathbb{V}$.

  1. By Theorem 10.7,

     $$\langle w - P_C(x), x - P_C(x) \rangle \leq 0 \quad \forall w \in C.$$

  2. In particular $P_C(y) \in C$. Hence,

     $$\langle P_C(y) - P_C(x), x - P_C(x) \rangle \leq 0.$$

  3. Similarly, applying the characterization to $y$ with $w = P_C(x)$, we obtain

     $$\langle P_C(x) - P_C(y), y - P_C(y) \rangle \leq 0.$$

  4. Adding these two inequalities, we obtain

     $$\langle P_C(y) - P_C(x), x - P_C(x) - y + P_C(y) \rangle \leq 0.$$

  5. By rearranging the terms, we get

     $$\langle P_C(y) - P_C(x), P_C(y) - P_C(x) \rangle \leq \langle P_C(y) - P_C(x), y - x \rangle.$$

  6. Applying the Cauchy-Schwarz inequality on the R.H.S., we obtain

     $$\| P_C(y) - P_C(x) \|^2 \leq \| P_C(y) - P_C(x) \| \| y - x \|.$$

  7. Thus, $P_C$ is nonexpansive.

  8. Since $P_C$ is nonexpansive, hence $P_C$ is continuous also.

Theorem 10.14 (Orthogonal projection is firmly nonexpansive)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Let $P_C : \mathbb{V} \to \mathbb{V}$ be the orthogonal projection operator as defined in Definition 10.6. Then $P_C$ is a firmly nonexpansive operator.

In other words,

$$\langle P_C(x) - P_C(y), x - y \rangle \geq \| P_C(x) - P_C(y) \|^2$$

holds true for every $x, y \in \mathbb{V}$.

Proof. Recall from Theorem 10.7 that for any $u \in \mathbb{V}$ and $v \in C$

$$\langle v - P_C(u), u - P_C(u) \rangle \leq 0.$$

  1. Substituting $u = x$ and $v = P_C(y)$, we obtain

     $$\langle P_C(y) - P_C(x), x - P_C(x) \rangle \leq 0.$$

  2. Substituting $u = y$ and $v = P_C(x)$, we obtain

     $$\langle P_C(x) - P_C(y), y - P_C(y) \rangle \leq 0.$$

  3. Adding the two inequalities gives us

     $$\begin{aligned} & \langle P_C(x) - P_C(y), y - P_C(y) - x + P_C(x) \rangle \leq 0 \\ \iff & \langle P_C(x) - P_C(y), -(x - y) + (P_C(x) - P_C(y)) \rangle \leq 0 \\ \iff & \| P_C(x) - P_C(y) \|^2 \leq \langle P_C(x) - P_C(y), x - y \rangle \end{aligned}$$

     as desired.
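
Both inequalities are easy to test numerically. The sketch below (NumPy; the unit-ball projection helper as before) checks nonexpansiveness and firm nonexpansiveness at random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(7)

def project_ball(x, radius=1.0):
    """Closed-form projection onto the closed ball B[0, radius]."""
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

for _ in range(1000):
    x, y = rng.normal(size=(2, 4)) * 3
    px, py = project_ball(x), project_ball(y)
    # Nonexpansiveness: ||P(x) - P(y)|| <= ||x - y||
    assert np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12
    # Firm nonexpansiveness: <P(x) - P(y), x - y> >= ||P(x) - P(y)||^2
    assert np.dot(px - py, x - y) >= np.linalg.norm(px - py) ** 2 - 1e-12
print("nonexpansiveness and firm nonexpansiveness hold for all samples")
```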

10.2.6. Squared Distance Function

Definition 10.9 (Squared distance function to a nonempty set)

Let $C$ be a nonempty subset of $\mathbb{V}$. The squared distance to set $C$ function, denoted as $\varphi_C : \mathbb{V} \to \mathbb{R}$, is defined as:

$$\varphi_C(x) \triangleq \frac{1}{2} d_C^2(x).$$

We also define $\psi_C : \mathbb{V} \to \mathbb{R}$ as:

$$\psi_C(x) \triangleq \frac{1}{2} \left( \| x \|^2 - d_C^2(x) \right).$$

Theorem 10.15 (Expression for $\psi_C$)

Let $C$ be a nonempty subset of $\mathbb{V}$. Then, the function $\psi_C$ as defined in Definition 10.9 is given by

$$\psi_C(x) = \sup_{y \in C} \left[ \langle y, x \rangle - \frac{1}{2} \| y \|^2 \right].$$

Proof. We proceed as follows.

  1. Expanding on the definition of $d_C^2$

     $$d_C^2(x) = \inf_{y \in C} \| x - y \|^2 = \inf_{y \in C} \langle x - y, x - y \rangle = \inf_{y \in C} \left( \| x \|^2 - 2 \langle x, y \rangle + \| y \|^2 \right) = \inf_{y \in C} \left( \| x \|^2 - \left( 2 \langle x, y \rangle - \| y \|^2 \right) \right) = \| x \|^2 - \sup_{y \in C} \left( 2 \langle x, y \rangle - \| y \|^2 \right).$$

  2. Thus,

     $$\| x \|^2 - d_C^2(x) = \sup_{y \in C} \left( 2 \langle x, y \rangle - \| y \|^2 \right).$$

  3. Thus,

     $$\psi_C(x) = \frac{1}{2} \left( \| x \|^2 - d_C^2(x) \right) = \sup_{y \in C} \left[ \langle x, y \rangle - \frac{1}{2} \| y \|^2 \right].$$
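
Since Theorem 10.15 holds for an arbitrary nonempty set, it can be verified exactly on a finite set $C$, where the infimum and supremum reduce to a minimum and maximum over finitely many points. A minimal sketch, with $C$ a random finite point set of our choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.normal(size=(20, 4))          # a finite (nonconvex) set of 20 points

def psi_direct(x):
    """psi_C(x) = 0.5 * (||x||^2 - d_C(x)^2), computed from the definition."""
    d2 = np.min(np.sum((C - x) ** 2, axis=1))       # d_C^2(x)
    return 0.5 * (np.dot(x, x) - d2)

def psi_sup(x):
    """psi_C(x) via the supremum formula of Theorem 10.15."""
    return np.max(C @ x - 0.5 * np.sum(C ** 2, axis=1))

x = rng.normal(size=4)
print(np.isclose(psi_direct(x), psi_sup(x)))        # True
```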

Theorem 10.16 ($\psi_C$ is convex)

Let $C$ be a nonempty subset of $\mathbb{V}$. Then, the function $\psi_C$ as defined in Definition 10.9 is convex.

The beauty of this result is that $\psi_C$ is convex irrespective of whether $C$ is convex or not.

Proof. We proceed as follows.

  1. For every $y \in C$, the function $g_y : \mathbb{V} \to \mathbb{R}$, given by

     $$g_y(x) = \langle y, x \rangle - \frac{1}{2} \| y \|^2,$$

     is an affine function of $x$.

  2. $g_y$ is convex for every $y \in C$ due to Theorem 9.68.

  3. Now,

     $$\psi_C = \sup_{y \in C} g_y.$$

  4. Thus, $\psi_C$ is a pointwise supremum of convex functions.

  5. Thus, by Theorem 9.114, $\psi_C$ is convex.

Theorem 10.17 (Squared distance function for nonempty, closed and convex sets)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Then, the squared distance function $\varphi_C$ is given by

$$\varphi_C(x) = \frac{1}{2} \| x - P_C(x) \|^2.$$

This follows directly from Theorem 10.9.

10.2.7. Gradients and Subgradients

Theorem 10.18 (Gradient of the squared distance function)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The gradient of the squared distance function $\varphi_C$ as defined in Definition 10.9 at $x \in \mathbb{V}$ is given by:

$$\nabla \varphi_C(x) = x - P_C(x) \quad \forall x \in \mathbb{V}.$$

Proof. We proceed as follows.

  1. Let $x \in \mathbb{V}$.

  2. Let $z_x = x - P_C(x)$.

  3. Consider the function

     $$g_x(d) = \varphi_C(x + d) - \varphi_C(x) - \langle d, z_x \rangle.$$

  4. If

     $$\lim_{d \to 0} \frac{g_x(d)}{\| d \|} = 0$$

     then $z_x$ is indeed the gradient of $\varphi_C$ at $x$.

  5. By definition of orthogonal projection, for any $d \in \mathbb{V}$,

     $$\| x + d - P_C(x + d) \|^2 \leq \| x + d - P_C(x) \|^2$$

     as $P_C(x + d)$ is the nearest point to $x + d$ in $C$. $P_C(x)$ is just another point in $C$.

  6. Thus, for any $d \in \mathbb{V}$

     $$g_x(d) = \varphi_C(x + d) - \varphi_C(x) - \langle d, z_x \rangle = \frac{1}{2} \| x + d - P_C(x + d) \|^2 - \frac{1}{2} \| x - P_C(x) \|^2 - \langle d, z_x \rangle \leq \frac{1}{2} \| x + d - P_C(x) \|^2 - \frac{1}{2} \| x - P_C(x) \|^2 - \langle d, z_x \rangle.$$

  7. Recall that for a norm induced by the inner product

     $$\| a + b \|^2 = \langle a + b, a + b \rangle = \| a \|^2 + 2 \langle a, b \rangle + \| b \|^2.$$

  8. Thus,

     $$\| x + d - P_C(x) \|^2 = \| d + (x - P_C(x)) \|^2 = \| d \|^2 + \| x - P_C(x) \|^2 + 2 \langle d, x - P_C(x) \rangle.$$

  9. Putting it back and simplifying, we obtain

     $$g_x(d) \leq \frac{1}{2} \| d \|^2 + \langle d, x - P_C(x) \rangle - \langle d, z_x \rangle = \frac{1}{2} \| d \|^2.$$

  10. Proceeding similarly, we also have

      $$g_x(-d) \leq \frac{1}{2} \| d \|^2.$$

  11. Since $\varphi_C$ is convex, hence $g_x$ is also convex.

  12. Thus,

      $$0 = g_x(0) = g_x \left( \frac{1}{2} d + \frac{1}{2} (-d) \right) \leq \frac{1}{2} g_x(d) + \frac{1}{2} g_x(-d).$$

  13. Thus,

      $$g_x(d) \geq - g_x(-d) \geq - \frac{1}{2} \| d \|^2.$$

  14. Combining, we have

      $$- \frac{1}{2} \| d \|^2 \leq g_x(d) \leq \frac{1}{2} \| d \|^2.$$

  15. Or, in terms of absolute values,

      $$| g_x(d) | \leq \frac{1}{2} \| d \|^2.$$

  16. Then,

      $$\frac{| g_x(d) |}{\| d \|} \leq \frac{1}{2} \| d \|.$$

  17. Thus,

      $$\lim_{d \to 0} \frac{g_x(d)}{\| d \|} = 0.$$

  18. Thus, $z_x = x - P_C(x)$ is indeed the gradient of $\varphi_C$ at $x$.
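
The gradient formula can be checked against finite differences. A minimal sketch, assuming $C$ is the closed unit ball, comparing $x - P_C(x)$ with a central-difference approximation of $\nabla \varphi_C(x)$:

```python
import numpy as np

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

def phi(x):
    """phi_C(x) = 0.5 * d_C(x)^2 for C = unit ball."""
    return 0.5 * np.linalg.norm(x - project_ball(x)) ** 2

x = np.array([1.5, -0.7, 2.0])
grad_exact = x - project_ball(x)

# Central finite differences.
eps = 1e-6
grad_fd = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    grad_fd[i] = (phi(x + e) - phi(x - e)) / (2 * eps)

print(np.allclose(grad_exact, grad_fd, atol=1e-5))   # True
```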

Theorem 10.19 (Gradient of $\psi_C$)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The gradient of the function $\psi_C$ as defined in Definition 10.9 at $x \in \mathbb{V}$ is given by:

$$\nabla \psi_C(x) = P_C(x) \quad \forall x \in \mathbb{V}.$$

Proof. We have

$$\psi_C(x) = \frac{1}{2} \left( \| x \|^2 - d_C^2(x) \right) = \frac{1}{2} \| x \|^2 - \varphi_C(x).$$

Hence,

$$\nabla \psi_C(x) = x - \nabla \varphi_C(x) = x - (x - P_C(x)) = P_C(x).$$

Remark 10.5 (Distance function and squared distance function relation)

We note that $\varphi_C = g \circ d_C$ where $g(t) = \frac{1}{2} [t]_+^2$.

$g$ is a nondecreasing real-valued convex differentiable function. We also note that

$$g'(t) = [t]_+.$$

Theorem 10.20 (Subdifferential of the distance function)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The subdifferential of the distance function $d_C$ is given by

$$\partial d_C(x) = \begin{cases} \left\{ \dfrac{x - P_C(x)}{d_C(x)} \right\}, & x \notin C, \\ N_C(x) \cap B[0, 1], & x \in C. \end{cases}$$

$N_C(x)$ denotes the normal cone of all vectors normal to the set $C$ at a point $x \in C$.

Since $\partial d_C(x)$ is a singleton for $x \notin C$, $d_C$ is differentiable at every $x \notin C$.

Proof. We can get the subdifferential of $d_C$ by applying the chain rule.

  1. Recall that $\varphi_C = g \circ d_C$ where $g(t) = \frac{1}{2} [t]_+^2$.

  2. Thus, by the subdifferential chain rule (Theorem 9.229):

     $$\partial \varphi_C(x) = g'(d_C(x)) \partial d_C(x) = [d_C(x)]_+ \partial d_C(x) = d_C(x) \partial d_C(x).$$

     We used the fact that $d_C$ is nonnegative.

  3. Since $\varphi_C$ is differentiable, hence $\partial \varphi_C(x) = \{ x - P_C(x) \}$.

  4. If $x \notin C$, then $d_C(x) > 0$.

  5. Thus, for $x \notin C$

     $$\partial d_C(x) = \left\{ \frac{x - P_C(x)}{d_C(x)} \right\}.$$

  6. For $x \in C$, $d_C(x) = 0$.

  7. We need to show that $\partial d_C(x) = N_C(x) \cap B[0, 1]$ in this case.

  8. Consider any $d \in \partial d_C(x)$.

  9. Then, by the subgradient inequality

     $$d_C(y) \geq d_C(x) + \langle y - x, d \rangle = \langle y - x, d \rangle \quad \forall y \in \mathbb{V}$$

     since $d_C(x) = 0$.

  10. Then, in particular, for any $y \in C$

      $$0 = d_C(y) \geq \langle y - x, d \rangle.$$

  11. Thus, $d \in N_C(x)$.

  12. Further, picking $y = x + d$ in the subgradient inequality gives $\| d \|^2 \leq d_C(x + d) \leq \| (x + d) - x \| = \| d \|$, where the second inequality uses $x \in C$. Thus, $\| d \| \leq 1$, and hence $d \in N_C(x) \cap B[0, 1]$.
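
For $x \notin C$, the formula says $d_C$ is differentiable with gradient $(x - P_C(x)) / d_C(x)$, which is always a unit vector. A minimal finite-difference check, again assuming $C$ is the closed unit ball:

```python
import numpy as np

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

def dist(x):
    return np.linalg.norm(x - project_ball(x))

x = np.array([2.0, -1.0, 1.0])        # a point outside the unit ball
grad_exact = (x - project_ball(x)) / dist(x)
print(np.isclose(np.linalg.norm(grad_exact), 1.0))   # gradient is a unit vector

eps = 1e-6
grad_fd = np.array([
    (dist(x + eps * e) - dist(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(grad_exact, grad_fd, atol=1e-5))   # matches finite differences
```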

10.2.8. Conjugates

Theorem 10.21 (Conjugate of norm squared plus indicator)

Let $C$ be a nonempty subset of $\mathbb{V}$. Let $f : \mathbb{V} \to (-\infty, \infty]$ be given by:

$$f(x) = \frac{1}{2} \| x \|^2 + \delta_C(x).$$

Then, its conjugate is given by:

$$f^*(y) = \frac{1}{2} \| y \|^2 - \frac{1}{2} d_C^2(y) = \psi_C(y).$$

Further, if $C$ is nonempty, closed and convex, then $f^{**} = f$. In other words, $\psi_C^* = f$.

Proof. Recall from Definition 9.78 that

$$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \} \quad \forall y \in \mathbb{V}.$$

Expanding, we obtain

$$\sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \} = \sup_{x \in \mathbb{V}} \left\{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 - \delta_C(x) \right\} = \sup_{x \in C} \left\{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 \right\} = \psi_C(y).$$

The last result is due to Theorem 10.15.

If $C$ is nonempty, closed and convex, then $f$ is proper, closed and convex. Then, due to Theorem 9.242, the biconjugate of $f$ is $f$ itself.

10.2.9. Smoothness

Recall from Definition 9.80 that a function $f : \mathbb{V} \to \mathbb{R}$ is $L$-smooth if

$$\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

Since the norm in this section is induced by the inner product, it is self dual.

Thus, the smoothness criterion becomes:

$$\| \nabla f(x) - \nabla f(y) \| \leq L \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

Theorem 10.22 (Smoothness of the squared distance function)

The squared distance function $\varphi_C$ is 1-smooth.

Proof. Recall the definition of $\varphi_C$ from Definition 10.9.

  1. By Theorem 10.18,

     $$\nabla \varphi_C(x) = x - P_C(x).$$

  2. Hence,

     $$\begin{aligned} \| \nabla \varphi_C(x) - \nabla \varphi_C(y) \|^2 &= \| x - P_C(x) - y + P_C(y) \|^2 \\ &= \| (x - y) - (P_C(x) - P_C(y)) \|^2 \\ &= \| x - y \|^2 - 2 \langle P_C(x) - P_C(y), x - y \rangle + \| P_C(x) - P_C(y) \|^2 \\ &\leq \| x - y \|^2 - 2 \| P_C(x) - P_C(y) \|^2 + \| P_C(x) - P_C(y) \|^2 \quad \text{(firm nonexpansiveness)} \\ &= \| x - y \|^2 - \| P_C(x) - P_C(y) \|^2 \\ &\leq \| x - y \|^2. \end{aligned}$$

  3. Thus,

     $$\| \nabla \varphi_C(x) - \nabla \varphi_C(y) \| \leq \| x - y \|.$$

  4. Thus, $\varphi_C$ is 1-smooth.

Theorem 10.23 (Smoothness of the $\psi_C$ function)

The function $\psi_C$ is 1-smooth.

Proof. We recall from Theorem 10.19 that $\nabla \psi_C(x) = P_C(x)$.

Hence,

$$\| \nabla \psi_C(x) - \nabla \psi_C(y) \| = \| P_C(x) - P_C(y) \| \leq \| x - y \|$$

due to the nonexpansiveness property (Theorem 10.13).
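
A quick numerical check of both smoothness results (a sketch assuming $C$ is the unit ball): the gradient maps $x \mapsto x - P_C(x)$ and $x \mapsto P_C(x)$ should both be 1-Lipschitz.

```python
import numpy as np

rng = np.random.default_rng(11)

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

for _ in range(1000):
    x, y = rng.normal(size=(2, 3)) * 2
    lip = np.linalg.norm(x - y)
    # grad phi_C = x - P_C(x); grad psi_C = P_C(x)
    g_phi = np.linalg.norm((x - project_ball(x)) - (y - project_ball(y)))
    g_psi = np.linalg.norm(project_ball(x) - project_ball(y))
    assert g_phi <= lip + 1e-12 and g_psi <= lip + 1e-12
print("both gradient maps are 1-Lipschitz on all samples")
```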

10.2.10. POCS Problems

In this section, we present some example optimization problems which can be converted into an equivalent projection on a convex set problem.

10.2.10.1. Equality Constrained Quadratic Programming

Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.

Example 10.1 (Equality constrained quadratic programming)

We consider the quadratic programming problem

$$\begin{array}{ll} \text{minimize} & \frac{1}{2} \| x \|^2 + c^T x \\ \text{subject to} & A x = 0 \end{array}$$

where

  1. $x \in \mathbb{R}^n$ is the optimization variable.

  2. $c \in \mathbb{R}^n$ is a given vector.

  3. $A \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix of rank $m$. Assume that $m < n$.

We proceed towards converting this problem into a projection on a convex set problem as follows.

  1. By adding a constant term $\frac{1}{2} \| c \|^2$ to the objective function, we obtain the equivalent problem

     $$\begin{array}{ll} \text{minimize} & \frac{1}{2} \| c + x \|^2 \\ \text{subject to} & A x = 0. \end{array}$$

  2. The set $C = \{ x \,|\, A x = 0 \}$ is the null space of the matrix $A$, which is a linear subspace, hence a nonempty, closed and convex set.

  3. Minimizing $\frac{1}{2} \| c + x \|^2$ is equivalent to minimizing $\| (-c) - x \|$ subject to $x \in C$.

  4. $\| (-c) - x \|$ is $d(-c, x)$, the distance between $-c$ and a point $x$.

  5. Thus, we are minimizing the distance between the point $-c$ and points $x \in C$.

  6. This is nothing but the distance of $-c$ from the set $C$.

  7. Since $C$ is nonempty, closed and convex, there is a unique $x^*$ which minimizes the distance, due to the projection theorem.

  8. Thus, the solution is the projection of the vector $-c$ on the subspace $C$.

  9. By Theorem 10.8, $x^*$ is the unique projection of $-c$ on $C$ if and only if $-c - x^* \in C^{\perp}$.

  10. In other words,

      $$\langle c + x^*, x \rangle = 0 \quad \forall x \in C.$$

  11. A closed form solution to this problem does exist, given by

      $$x^* = -\left( I - A^T (A A^T)^{-1} A \right) c.$$

  12. It is indeed the unique solution to this quadratic programming problem. A numerical sketch follows.
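
The closed form can be cross-checked against a direct solution of the KKT optimality system. A minimal sketch with random data (the matrix sizes and variable names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 6
A = rng.normal(size=(m, n))           # full row rank with probability 1
c = rng.normal(size=n)

# Closed-form solution: x* = -(I - A^T (A A^T)^{-1} A) c,
# i.e., the projection of -c onto null(A).
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
x_star = -P @ c

# Cross-check by solving the KKT system:
# [ I  A^T ] [x]   [-c]
# [ A   0  ] [u] = [ 0]
K = np.block([[np.eye(n), A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, np.zeros(m)]))
print(np.allclose(x_star, sol[:n]))   # True
print(np.allclose(A @ x_star, 0))     # feasibility: x* lies in null(A)
```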