# 10.2. Projection on Convex Sets#

We will assume $$\VV$$ to be real $$n$$-dimensional vector space space with an inner product $$\langle \cdot, \cdot \rangle$$ , the induced norm $$\| \cdot \|$$ and corresponding dual norm $$\| \cdot \|_*$$ for the dual space $$\VV^*$$.

We are interested in mapping a point $$\bx$$ to the nearest point in a set $$C$$. In general, this problem may have zero, one or multiple solutions. However, in the special case where $$C$$ is a nonempty, closed and convex set, then there is exactly one such point in $$C$$ which is nearest to a given point $$\bx$$. This nearest point is called the projection of $$\bx$$ on to $$C$$.

Main references for this section are [6, 9].

## 10.2.1. Projection Theorem#

Theorem 10.6 (Projection theorem)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. For every $$\bx \in \VV$$, there exists a unique vector that minimizes $$\| \bz - \bx \|$$ over all $$\bz \in C$$. This vector is called the projection of $$\bx$$ on $$C$$.

Proof. We fix some $$\bx \in \VV$$ and choose an element $$\bw \in C$$.

1. Consider the function

$g(\bz) = \frac{1}{2} \| \bz - \bx \|^2.$
2. Minimizing $$\| \bz - \bx \|$$ over all $$C$$ is equivalent to minimizing $$g(\bz)$$ over the set

$D = \{ \bz \in C \ST \| \bz - \bx \| \leq \| \bw - \bx \| \}.$
3. We note that $$D$$ is a compact set and $$g$$ is a l.s.c., closed and coercive function.

4. By Weierstrass’ theorem Corollary 8.2, the set of minimizers for $$g$$ is nonempty and compact.

We now show that the minimizer is unique.

1. $$g$$ is a strictly convex function because its Hessian matrix is identity matrix which is positive definite.

2. Hence, the minimizer is unique due to Theorem 10.2.

## 10.2.2. Orthogonal Projection#

Definition 10.6 (Orthogonal Projection Mapping)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. The orthogonal projection mapping $$P_C : \VV \to \VV$$ is defined by:

$P_C(\bx) \triangleq \underset{\by \in C}{\argmin} \| \by - \bx \| \Forall \bx \in \VV.$

This mapping is well defined since the projection is unique for a nonempty, closed and convex set $$C$$ due to Theorem 10.6.

The vector $$P_C(\bx)$$ is called the projection of $$\bx$$ on the set $$C$$.

## 10.2.3. Characterization#

Theorem 10.7 (Orthogonal projection characterization)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. For every vector $$\bx \in \VV$$, a vector $$\bz \in C$$ is its projection if and only if

$\langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C.$

This result is also known as the second projection theorem .

Proof. Assume that for some $$\bz \in C$$, $$\langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C$$ holds true.

1. For any $$\by \in C$$

$\begin{split} \| \by - \bx \|^2 &= \| (\by - \bz) - (\bx - \bz) \|^2 \\ &= \| \by - \bz \|^2 + \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle \\ &\geq \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle. \end{split}$
2. Thus,

$\| \by - \bx \|^2 \geq \| \bz - \bx \|^2 \Forall \by \in C.$
3. Thus, $$\bz$$ is indeed the projection of $$\bx$$ on $$C$$.

Conversely, assume that $$\bz$$ is the projection of $$\bx$$ on $$C$$.

1. Let $$\by \in C$$ be arbitrary.

2. For any $$t \geq 0$$, define $$\by_t = t \by + (1 -t ) \bz$$.

3. Then, we have

$\begin{split} \| \bx - \by_t \|^2 &= \| t \bx + (1-t) \bx + - t \by - (1 -t ) \bz \|^2 \\ &= \| t (\bx - \by) + (1- t) (\bx - \bz) \|^2\\ &= t^2 \| \bx - \by \|^2 + (1-t)^2 \| \bx - \bz \|^2 + 2 t (1-t) \langle \bx - \by, \bx - \bz \rangle. \end{split}$
4. Viewing $$\| \bx - \by_t \|^2$$ as a function of $$t$$ and differentiating w.r.t. $$t$$, we have

$\begin{split} \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} &= -2 \| \bx - \bz \|^2 + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \bx - \bz, \bx - \bz \rangle + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \by - \bz, \bx - \bz \rangle. \end{split}$
5. Since $$t=0$$ minimizes $$\| \bx - \by_t \|^2$$ over $$t \in [0,1]$$, we must have

$\left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} \geq 0.$
6. Thus, we require that

$\langle \by - \bz, \bx - \bz \rangle \leq 0$

must hold true for every $$\by \in C$$.

Following is an alternative proof based on results from Constrained Optimization I. This proof is specific to the case where $$\VV = \RR^n$$.

Proof. Define a function $$f: \RR^n \to \RR$$ as

$f(\by) = \| \by - \bx \|^2.$

Then, the projection problem can be cast as an optimization problem

$\begin{split} & \text{minimize } & & f(\by) \\ & \text{subject to } & & \by \in C. \end{split}$

Note that the gradient of $$f$$ is given by

$\nabla f (\by) = \nabla \langle \by - \bx, \by - \bx \rangle = \nabla (\langle \by, \by \rangle - 2 \langle \by, \bx \rangle + \langle \bx, \bx \rangle) = 2 (\by - \bx).$

By Theorem 10.47, $$\bz$$ is an optimal solution if and only if

$f(\bz)^T (\by - \bz) \geq 0 \Forall \by \in C.$

In other words

$2 (\bz - \bx)^T (\by - \bz) \geq 0 \Forall \by \in C.$

We can simplify this as

$\langle \bx - \bz, \by - \bz \rangle \leq 0 \Forall \by \in C.$

Theorem 10.8 (Orthogonal projection on an affine subspace)

Let $$C$$ be an affine subspace of $$\VV$$. Let $$S$$ be the linear subspace parallel to $$C$$. For every vector $$\bx \in \VV$$, a vector $$\bz \in C$$ is its projection if and only if

$\bx - \bz \in S^{\perp}.$

Proof. Since $$C$$ is an affine subspace of $$\VV$$, hence $$C$$ is nonempty, convex and closed (as $$\VV$$ is finite dimensional).

1. By Theorem 10.7, $$\bz$$ is the projection of $$\bx$$ on $$C$$ if and only if for every $$\by \in C$$, we have

$\langle \by - \bz, \bx - \bz \rangle \leq 0.$
2. But $$\by \in C$$ if and only if $$\by - \bz \in S$$.

3. Hence the condition is equivalent to

$\langle \bw, \bx - \bz \rangle \leq 0 \Forall \bw \in S.$
4. But then, it must be an equality since $$\bw$$ and $$-\bw$$ both belong to $$S$$. Thus, we have

$\langle \bw, \bx - \bz \rangle = 0 \Forall \bw \in S.$
5. In other words, $$\bx - \bz \in S^{\perp}$$.

## 10.2.4. Distance Function#

Recall that the distance of a point $$\bx \in \VV$$ from a set $$C$$ is defined as

$d_C(\bx) \triangleq \underset{\by \in C}{\inf} \| \bx - \by \|.$

Theorem 10.9 (Distance function for nonempty, closed and convex set)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. Then the function $$d_C : \VV \to \RR$$ defining the distance of a point $$\bx \in \VV$$ from the set $$C$$ satisfies

$d_C(\bx) = \| \bx - P_C(\bx) \|.$

Proof. By Theorem 10.6, there exists a unique point $$P_C(\bx)$$ which minimizes the distance between $$\bx$$ and $$C$$. Hence

$d_C(\bx) = \| \bx - P_C(\bx) \|$

must hold true.

Theorem 10.10 (Distance function for nonempty, closed and convex set is convex)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. Let $$d_C : \VV \to \RR$$ be the distance to the set $$C$$ function as defined in Theorem 10.9. Then, $$d_C$$ is convex.

Proof. Assume, for contradiction, that $$d_C$$ is not convex.

1. Then, there exist $$\bx, \by \in \VV$$ and $$t \in (0,1)$$ such that

$d_C(t \bx + (1-t) \by) > t d_C(\bx) + (1-t)d_C(\by).$
2. Let $$\bu = P_C(\bx)$$ and $$\bv = P_C(\by)$$. By definition, $$\bu, \bv \in C$$.

3. Then,

$t d_C(\bx) + (1-t)d_C(\by) = t \| \bu - \bx \| + (1-t) \| \bv - \by \|.$
4. Since $$C$$ is convex, hence, $$t \bu + (1-t) \bv \in C$$.

5. Since, $$d_C(t \bx + (1-t) \by)$$ minimizes the distance of $$C$$ from the point $$t \bx + (1-t) \by$$, hence

$\| t \bu + (1-t) \bv - t \bx - (1-t) \by \| \geq d_C(t \bx + (1-t) \by).$
6. Rewriting,

$d_C(t \bx + (1-t) \by) \leq \| t (\bu - \bx) + (1-t) (\bv - \by) \| \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \|$

due to triangle inequality.

$t \| \bu - \bx \| + (1-t) \| \bv - \by \| < d_C(t \bx + (1-t) \by) \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \|.$
8. Hence, $$d_C$$ must be convex.

## 10.2.5. Nonexpansiveness#

Definition 10.7 (Nonexpansiveness property)

Let $$\VV$$ be a normed linear space. An operator $$T : \VV \to \VV$$ is called nonexpansive if

$\| T (\bx) - T (\by) \| \leq \| \by - \bx \| \Forall \bx, \by \in \VV.$

In other words, the distance between mapped points in $$\VV$$ is always less than or equal to the distance between original points in $$\VV$$.

Theorem 10.11 (Nonexpansive operators are Lipschitz continuous)

Let $$\VV$$ a be normed linear space. A nonexpansive operator $$T : \VV \to \VV$$ is Lipschitz continuous. Hence, it is uniformly continuous.

Proof. Recall from Definition 3.45 that if $$T$$ is a Lipschitz map, then there exists $$L > 0$$ such that

$\| T (\bx) - T (\by) \| \leq L \| \bx - \by \|$

for every $$\bx, \by \in \VV$$.

For a nonexpansive operator, such $$L = 1$$. This $$T$$ is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.

Definition 10.8 (Firm nonexpansiveness property)

Let $$\VV$$ be a real inner product space. An operator $$T : \VV \to \VV$$ is called firmly nonexpansive if

$\langle T(\bx) - T(\by), \bx- \by \rangle \geq \| T(\bx) - T (\by) \|^2$

holds true for every $$\bx, \by \in \VV$$.

Theorem 10.12 (A firmly nonexpansive operator is nonexpansive)

Let $$\VV$$ be a real inner product space. Let $$T : \VV \to \VV$$ be a firmly nonexpansive operator. Then, $$T$$ is nonexpansive.

Proof. For every $$\bx, \by \in \VV$$, we have

$\| T(\bx) - T (\by) \|^2 \leq \langle T(\bx) - T(\by), \bx- \by \rangle.$

Applying Cauchy Schwartz inequality on R.H.S., we get

$\| T(\bx) - T (\by) \|^2 \leq \| T(\bx) - T(\by) \| \| \bx- \by \|.$

Canceling terms, we get:

$\| T(\bx) - T (\by) \| \leq \| \bx- \by \|$

which is the nonexpansive property.

Theorem 10.13 (Orthogonal projection is nonexpansive)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. Let $$P_C : \VV \to \VV$$ be the orthogonal projection operator as defined in Definition 10.6 is nonexpansive (and therefore continuous).

In other words,

$\| P_C (\bx) - P_C (\by) \| \leq \| \bx - \by \| \Forall \bx, \by \in \VV.$

Proof. Let $$\bx, \by \in \VV$$.

1. By Theorem 10.7,

$\langle \bw - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0 \Forall \bw \in C.$
2. In particular $$P_C(\by) \in C$$. Hence,

$\langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0.$
3. Similarly, starting with $$P_C(\bx)$$, we obtain

$\langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0.$
4. Adding these two inequalities, we obtain

$\langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) - \by + P_C(\by) \rangle \leq 0.$
5. By rearranging the terms, we get

$\langle P_C(\by) - P_C(\bx), P_C(\by) - P_C(\bx) \rangle \leq \langle P_C(\by) - P_C(\bx), \by - \bx \rangle.$
1. Applying the Cauchy Schwartz inequality on the R.H.S., we obtain

$\| P_C(\by) - P_C(\bx) \|^2 \leq \| P_C(\by) - P_C(\bx) \| \| \by - \bx \|.$
1. Thus, $$P_C$$ is nonexpansive.

2. Since $$P_C$$ is nonexpansive, hence $$P_C$$ is continuous also.

Theorem 10.14 (Orthogonal projection is firmly nonexpansive)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. Let $$P_C : \VV \to \VV$$ be the orthogonal projection operator as defined in Definition 10.6. Then $$P_C$$ is a firmly nonexpansive operator.

In other words,

$\langle P_C(\bx) - P_C(\by), \bx- \by \rangle \geq \| P_C(\bx) - P_C (\by) \|^2$

holds true for every $$\bx, \by \in \VV$$.

Proof. Recall from Theorem 10.7 that for any $$\bu \in \VV$$ and $$\bv \in C$$

$\langle \bv - P_C(\bu), \bu - P_C(\bu) \rangle \leq 0.$
1. Substituting $$\bu = \bx$$ and $$\bv = P_C(\by)$$, we obtain

$\langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0.$
2. Substituting $$\bu = \by$$ and $$\bv = P_C(\bx)$$, we obtain

$\langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0.$
3. Adding the two inequalities gives us

$\begin{split} & \langle P_C(\bx) - P_C(\by), \by - P_C(\by) - \bx + P_C(\bx) \rangle \leq 0 \\ & \iff \langle P_C(\bx) - P_C(\by), (\by - \bx) + (P_C(\bx) - P_C(\by)) \rangle \leq 0\\ &\iff \| P_C(\bx) - P_C(\by) \|^2 \leq \langle P_C(\bx) - P_C(\by), \bx - \by \rangle \end{split}$

as desired.

## 10.2.6. Squared Distance Function#

Definition 10.9 (Squared distance function to a nonempty set)

Let $$C$$ be a nonempty subset of $$\VV$$. The squared distance to set $$C$$ function denoted as $$\varphi_C : \VV \to \RR$$ is defined as:

$\varphi_C(\bx) \triangleq \frac{1}{2} d_C^2(\bx).$

We also define $$\psi_C : \VV \to \RR$$ as:

$\psi_C(\bx) \triangleq \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right).$

Theorem 10.15 (Expression for $$\psi_C$$)

Let $$C$$ be a nonempty subset of $$\VV$$. Then, the function $$\psi_C$$ as defined in Definition 10.9 is given by

$\psi_C(\bx) = \underset{\by \in C}{\sup} \left [ \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2 \right ].$

Proof. We proceed as follows.

1. Expanding on the definition of $$d_C^2$$

$\begin{split} d_C^2(\bx) &= \inf_{\by \in C} \| \bx - \by \|^2 \\ &= \inf_{\by \in C} \langle \bx - \by, \bx - \by \rangle \\ &= \inf_{\by \in C} (\| \bx \|^2 - 2 \langle \bx, \by \rangle + \| \by \|^2) \\ &= \inf_{\by \in C} (\| \bx \|^2 - (2 \langle \bx, \by \rangle - \| \by \|^2)) \\ &= \| \bx \|^2 - \sup_{\by \in C} (2 \langle \bx, \by \rangle - \| \by \|^2). \end{split}$
2. Thus,

$\| \bx \|^2 - d_C^2 (\bx) = \sup_{\by \in C} ( 2 \langle \bx, \by \rangle - \| \by \|^2 ).$
3. Thus,

$\psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \sup_{\by \in C} \left [\langle \bx, \by \rangle - \frac{1}{2} \| \by \|^2 \right ].$

Theorem 10.16 ($$\psi_C$$ is convex)

Let $$C$$ be a nonempty subset of $$\VV$$. Then, the function $$\psi_C$$ as defined in Definition 10.9 is convex.

Beauty of this result is the fact that $$\psi_C$$ is convex irrespective of whether $$C$$ is convex or not.

Proof. We proceed as follows.

1. For every $$\by \in C$$, the function $$g_y : \VV \to \RR$$, given by

$g_y (\bx) = \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2$

is an affine function.

2. $$g_y$$ is convex for every $$\by \in C$$ due to Theorem 9.68.

3. Now,

$\psi_C (\by) = \sup{\by \in C} g_y.$
4. Thus, $$\psi_C$$ is a pointwise supremum of convex functions.

5. Thus, by Theorem 9.114, $$\psi_C$$ is convex.

Theorem 10.17 (Squared distance function for nonempty, closed and convex sets)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. Then, the squared distance function $$\varphi_C$$ is given by

$\varphi_C(\bx) = \frac{1}{2}\| \bx - P_C(\bx) \|^2.$

This follows directly from Theorem 10.10.

Theorem 10.18 (Gradient of the squared distance function)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. The gradient of the squared distance function $$\varphi_C$$ as defined in Definition 10.9 at $$\bx \in \VV$$ is given by:

$\nabla \varphi_C(\bx) = \bx - P_C(\bx) \Forall \bx \in \VV.$

Proof. We proceed as follows.

1. Let $$\bx \in \VV$$.

2. Let $$\bz_x = \bx - P_C(\bx)$$.

3. Consider the function

$g_x(\bd) = \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle.$
4. If

$\lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0$

then $$\bz_x$$ is indeed the gradient of $$\varphi_C$$ at $$\bx$$.

5. By definition of orthogonal projection, for any $$\bd \in \VV$$,

$\| \bx + \bd - P_C(\bx + \bd) \|^2 \leq \| \bx + \bd - P_C(\bx) \|^2$

as $$P_C(\bx + \bd)$$ is the nearest point to $$\bx + \bd$$ in $$C$$. $$P_C(\bx)$$ is just another point in $$C$$.

6. Thus, for any $$\bd \in \VV$$

$\begin{split} g_x(\bd) &= \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle \\ &= \frac{1}{2} \| \bx + \bd - P_C(\bx + \bd) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle \\ &\leq \frac{1}{2} \| \bx + \bd - P_C(\bx) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle. \end{split}$
7. Recall that for a norm induced by the inner product

$\| \ba + \bb \|^2 = \langle \ba + \bb, \ba + \bb \rangle = \| \ba \|^2 + 2 \langle \ba, \bb \rangle + \| \bb \|^2.$
8. Thus,

$\begin{split} \| \bx + \bd - P_C(\bx)\|^2 &= \| \bd + (\bx - P_C(\bx)) \|^2\\ &= \| \bd \|^2 + \| \bx - P_C(\bx)\|^2 + 2 \langle \bd, \bx - P_C(\bx) \rangle. \end{split}$
9. Putting it back and simplifying, we obtain

$g_x(\bd) \leq \frac{1}{2}\| \bd \|^2 + \langle \bd, \bx - P_C(\bx)\rangle - \langle \bd, \bz_x \rangle = \frac{1}{2}\| \bd \|^2.$
10. Proceeding similarly, we also have

$g_x(-\bd) \leq \frac{1}{2}\| \bd \|^2.$
11. Since $$\varphi_C$$ is convex, hence $$g_x$$ is also convex.

12. Thus,

$0 = g_x(\bzero) = g_x \left (\frac{1}{2} \bd + \frac{1}{2} (-\bd) \right ) \leq \frac{1}{2} g_x(\bd) + \frac{1}{2} g_x(-\bd).$
13. Thus,

$g_x(\bd) \geq - g_x(-\bd) \geq - \frac{1}{2}\| \bd \|^2.$
14. Combining, we have

$- \frac{1}{2}\| \bd \|^2 \leq g_x(\bd) \leq \frac{1}{2}\| \bd \|^2.$
15. Or, in terms of absolute values.

$|g_x(\bd)| \leq \frac{1}{2}\| \bd \|^2.$
16. Then,

$\frac{|g_x(\bd)|}{\| \bd \|} \leq \frac{1}{2}\| \bd \|.$
17. Thus,

$\lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0$
18. Thus, $$\bz_x = \bx - P_C(\bx)$$ is indeed the gradient of $$\varphi_C$$ at $$\bx$$.

Theorem 10.19 (Gradient of $$\psi_C$$.)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. The gradient of the function $$\psi_C$$ as defined in Definition 10.9 at $$\bx \in \VV$$ is given by:

$\nabla \psi_C(\bx) = P_C(\bx) \Forall \bx \in \VV.$

Proof. We have

$\psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \frac{1}{2} (\| \bx \|^2 - \varphi_C(\bx).$

Hence,

$\nabla \psi_C(\bx) = \bx - \nabla \varphi_C(\bx) = \bx - (\bx - P_C(\bx)) = P_C(\bx).$

Remark 10.5 (Distance function and square distance function relation)

We note that $$\varphi_C = g \circ d_C$$ where $$g(t) = \frac{1}{2}[t]_+^2$$.

$$g$$ is a nonincreasing real-valued convex differentiable function. We also note that

$g'(t) = 2 [t]_+.$

Theorem 10.20 (Subdifferential of the distance function)

Let $$C$$ be a nonempty, closed and convex subset of $$\VV$$. The subdifferential of the distance function $$d_C$$ is given by

$\begin{split} \partial d_C (\bx) = \begin{cases} \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)}\right \}, & \bx \notin C\\ N_C(\bx) \cap B[\bzero, 1], & \bx \in C \end{cases}. \end{split}$

$$N_C(\bx)$$ denotes the normal cone of all vectors normal to the set $$C$$ at a point $$\bx \in C$$.

Since $$\partial d_C$$ is a singleton for $$\bx \notin C$$, hence $$d_C$$ is differentiable at $$\bx \notin C$$.

Proof. We can get the subdifferentials for $$d_C$$ by applying the chain rule.

1. Recall that $$\varphi_C = g \circ d_C$$ where $$g(t) = \frac{1}{2}[t]_+^2$$.

2. Thus, by subdifferential chain rule (Theorem 9.229):

$\partial \varphi_C (\bx) = g'(d_C(\bx)) \partial d_C(\bx) = [d_C(\bx)]_+ \partial d_C(\bx) = d_C(\bx) \partial d_C(\bx).$

We used the fact that $$d_C$$ is nonnegative.

3. Since $$\varphi_C$$ is differentiable, hence $$\partial \varphi_C(\bx) = \{\bx - P_C(\bx) \}$$.

4. If $$\bx \notin C$$, then, $$d_C(\bx) > 0$$.

5. Thus, for $$\bx \notin C$$

$\partial d_C(\bx) = \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)} \right \}.$
6. For $$\bx \in C$$, $$d_C(\bx) = 0$$.

7. We need to show that $$\partial d_C(\bx) = N_C(\bx) \cap B[\bzero, 1]$$ in this case.

8. Consider any $$\bd \in \partial d_C(\bx)$$.

$d_C(\by) \geq d_C(\bx) + \langle \by - \bx, \bd \rangle = \langle \by - \bx, \bd \rangle \Forall \by \in \VV$

since $$d_C(\bx) = 0$$.

10. Then, in particular, for any $$\by \in C$$

$d_C(\by) = 0 \geq \langle \by - \bx, \bd \rangle.$
11. Thus, $$\bd \in N_C(\bx)$$.

## 10.2.8. Conjugates#

Theorem 10.21 (Conjugate of norm squared plus indicator)

Let $$C$$ be a nonempty subset of $$\VV$$. Let $$f : \VV \to \RERL$$ be given by:

$f(\bx) = \frac{1}{2} \| \bx \|^2 + \delta_C(\bx).$

Then, its conjugate is given by:

$f^*(\by) = \frac{1}{2}\| \by \|^2 - \frac{1}{2} d_C^2 (\by) = \psi_C(\by).$

Further, if $$C$$ is nonempty, closed and convex, then $$f^{**} = f$$. In other words, $$\psi_C^* = f$$.

Proof. Recall from Definition 9.78 that

$f^*(\by) = \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} \Forall \by \in \VV^*.$

Expanding, we obtain

$\begin{split} \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} &= \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 - \delta_C(\bx)\} \\ &= \underset{\bx \in C}{\sup} \{ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 \} \\ &= \psi_C(\by). \end{split}$

The last result is due to Theorem 10.15.

If $$C$$ is nonempty, closed and convex, then, $$f$$ is proper closed and convex. Then, due to Theorem 9.242, the biconjugate of $$f$$ is $$f$$ itself.

## 10.2.9. Smoothness#

Recall from Definition 9.80 that a function $$f : \VV \to \RR$$ is $$L$$-smooth if

$\| \nabla f(\bx) - \nabla f(\by)\|_* \leq L \| \bx - \by \| \Forall \bx, \by \in \VV.$

Since the norm in this section is induced by the inner product, hence it is self dual.

Thus, the smoothness criteria becomes:

$\| \nabla f(\bx) - \nabla f(\by)\|_* \leq L \| \bx - \by \| \Forall \bx, \by \in \VV.$

Theorem 10.22 (Smoothness of the squared distance function)

The squared distance function $$\varphi_C$$ is 1-smooth.

Proof. Recall the definition of $$\varphi_C$$ from Definition 10.9.

1. $\nabla \varphi_C(\bx) = \bx - P_C(\bx).$
2. Hence,

$\begin{split} & \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \|^2 \\ &= \| \bx - P_C(\bx) - \by + P_C(\by) \|^2 \\ &= \| (\bx - \by) - (P_C(\bx) - P_C(\by)) \|^2 \\ &= \| \bx - \by \|^2 - 2 \langle P_C(\bx) - P_C(\by), \bx - \by \rangle + \| P_C(\bx) - P_C(\by) \|^2\\ &\leq \| \bx - \by \|^2 - 2 \| P_C(\bx) - P_C(\by) \|^2 + \| P_C(\bx) - P_C(\by) \|^2 & \text{ firm nonexpansiveness } \\ &= \| \bx - \by \|^2 - \| P_C(\bx) - P_C(\by) \|^2 \\ &\leq \| \bx - \by \|^2. \end{split}$
3. Thus,

$\| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \| \leq \| \bx - \by \|.$
4. Thus, $$\varphi_C$$ is 1-smooth.

Theorem 10.23 (Smoothness of the $$\psi_C$$ function)

The function $$\psi_C$$ is 1-smooth.

Proof. We recall from Theorem 10.19 that $$\nabla \psi_C(\bx) = P_C(\bx)$$.

Hence,

$\| \nabla \psi_C(\bx) - \nabla \psi_C(\by) \| = \| P_C(\bx) - P_C(\by) \| \leq \| \bx - \by \|$

due to the nonexpansiveness property (Theorem 10.13).

## 10.2.10. POCS Problems#

In this section, we present some example optimization problems which can be converted into an equivalent projection on a convex set problem.

### 10.2.10.1. Equality Constrained Quadratic Programming#

Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.

Example 10.1 (Equality constrained quadratic programming)

We consider the quadratic programming problem

$\begin{split} & \text{minimize } & & \frac{1}{2} \| \bx \|^2 + \bc^T \bx \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}$

where

1. $$\bx \in \RR^n$$ is the optimization variable.

2. $$\bc \in \RR^n$$ is a given vector.

3. $$\bA \in \RR^{m \times n}$$ is an $$m \times n$$ matrix of rank $$m$$. Assume that $$m < n$$.

We proceed towards converting this problem into a projection on a convex set problem as follows.

1. By adding a constant term $$\frac{1}{2} \| \bc \|^2$$ to the objective function, we obtain an equivalent problem.

$\begin{split} & \text{minimize } & & \frac{1}{2} \| \bc + \bx \|^2 \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}$
2. The set $$C = \{ \bx \ST \bA \bx = \bzero \}$$ is the null space of the matrix $$\bA$$ which is linear subspace, hence a nonempty, closed and convex set.

3. Minimizing $$\frac{1}{2} \| \bc + \bx \|^2$$ is equivalent to minimizing $$\| (-\bc) - \bx \|$$ subject to $$\bx \in C$$.

4. $$\| (-\bc) - \bx \|$$ is $$d(-\bc, \bx)$$, the distance between $$-\bc$$ and a point $$\bx$$.

5. Thus, we are minimizing the distance of the point $$-\bc$$ among $$\bx \in C$$.

6. This is nothing but the distance of $$-\bc$$ from the set $$C$$.

7. Since, $$C$$ is nonempty, close and convex, hence, there is a unique $$\bx^*$$ which minimizes the distance due to the projection theorem.

8. Thus, the solution is the projection of the vector $$-\bc$$ on the subspace $$C$$.

9. By Theorem 10.8, $$\bx^*$$ is the unique projection of $$-\bc$$ on $$C$$ if and only if $$-\bc - \bx^* \in C^{\perp}$$.

10. In other words, $$$\langle \bc + \bx^*, \bx \rangle = 0 \Forall \bx \in C.$$$

11. A closed form solution to this problem does exist given by

$\bx^* = - (\bI - \bA^T (\bA \bA^T)^{-1} bA) \bc.$
12. It is indeed the unique solution to this quadratic programming problem.