10.2. Projection on Convex Sets

We will assume $\mathbb{V}$ to be a real $n$-dimensional vector space with an inner product $\langle \cdot, \cdot \rangle$, the induced norm $\| \cdot \|$, and the corresponding dual norm $\| \cdot \|_*$ for the dual space $\mathbb{V}^*$.

We are interested in mapping a point $x$ to the nearest point in a set $C$. In general, this problem may have zero, one or multiple solutions. However, in the special case where $C$ is a nonempty, closed and convex set, there is exactly one point in $C$ which is nearest to a given point $x$. This nearest point is called the projection of $x$ onto $C$.

Main references for this section are [6, 9].

10.2.1. Projection Theorem

Theorem 10.6 (Projection theorem)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. For every $x \in \mathbb{V}$, there exists a unique vector that minimizes $\| z - x \|$ over all $z \in C$. This vector is called the projection of $x$ on $C$.

Proof. We fix some $x \in \mathbb{V}$ and choose an element $w \in C$.

  1. Consider the function

     $$g(z) = \frac{1}{2} \| z - x \|^2.$$

  2. Minimizing $\| z - x \|$ over all $z \in C$ is equivalent to minimizing $g(z)$ over the set

     $$D = \{ z \in C \,|\, \| z - x \| \leq \| w - x \| \}.$$

  3. We note that $D$ is a compact set and $g$ is an l.s.c., closed and coercive function.

  4. By Weierstrass' theorem (Corollary 8.2), the set of minimizers for $g$ is nonempty and compact.

We now show that the minimizer is unique.

  1. $g$ is a strictly convex function because its Hessian is the identity matrix, which is positive definite.

  2. Hence, the minimizer is unique due to Theorem 10.2.

10.2.2. Orthogonal Projection

Definition 10.6 (Orthogonal Projection Mapping)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The orthogonal projection mapping $P_C : \mathbb{V} \to \mathbb{V}$ is defined by:

$$P_C(x) \triangleq \underset{y \in C}{\operatorname{argmin}} \| y - x \| \quad \forall x \in \mathbb{V}.$$

This mapping is well defined since the projection is unique for a nonempty, closed and convex set $C$ due to Theorem 10.6.

The vector $P_C(x)$ is called the projection of $x$ on the set $C$.
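
To make the mapping concrete, the following sketch shows two projection operators with well-known closed forms: projection onto a Euclidean ball and onto a box. This is a minimal illustration in NumPy; the names `project_ball` and `project_box` are our own, not from the text.

```python
import numpy as np

def project_ball(x, center, radius):
    """Project x onto the closed Euclidean ball B[center, radius]."""
    d = x - center
    norm_d = np.linalg.norm(d)
    if norm_d <= radius:
        return x.copy()              # x is already in C
    return center + radius * d / norm_d

def project_box(x, lower, upper):
    """Project x onto the box {z : lower <= z <= upper} (componentwise clipping)."""
    return np.clip(x, lower, upper)

x = np.array([3.0, 4.0])
print(project_ball(x, np.zeros(2), 1.0))        # [0.6 0.8]
print(project_box(x, np.zeros(2), np.ones(2)))  # [1. 1.]
```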

10.2.3. Characterization

Theorem 10.7 (Orthogonal projection characterization)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. For every vector $x \in \mathbb{V}$, a vector $z \in C$ is its projection if and only if

$$\langle y - z, x - z \rangle \leq 0 \quad \forall y \in C.$$

This result is also known as the second projection theorem [6].

Proof. Assume that for some $z \in C$, $\langle y - z, x - z \rangle \leq 0 \; \forall y \in C$ holds true.

  1. For any $y \in C$

     $$\| y - x \|^2 = \| (y - z) - (x - z) \|^2 = \| y - z \|^2 + \| z - x \|^2 - 2 \langle y - z, x - z \rangle \geq \| z - x \|^2 - 2 \langle y - z, x - z \rangle.$$

  2. Thus, since $\langle y - z, x - z \rangle \leq 0$,

     $$\| y - x \|^2 \geq \| z - x \|^2 \quad \forall y \in C.$$

  3. Thus, $z$ is indeed the projection of $x$ on $C$.

Conversely, assume that $z$ is the projection of $x$ on $C$.

  1. Let $y \in C$ be arbitrary.

  2. For any $t \in [0, 1]$, define $y_t = t y + (1 - t) z$. Since $C$ is convex, $y_t \in C$.

  3. Then, we have

     $$\| x - y_t \|^2 = \| t x + (1 - t) x - t y - (1 - t) z \|^2 = \| t (x - y) + (1 - t) (x - z) \|^2 = t^2 \| x - y \|^2 + (1 - t)^2 \| x - z \|^2 + 2 t (1 - t) \langle x - y, x - z \rangle.$$

  4. Viewing $\| x - y_t \|^2$ as a function of $t$ and differentiating w.r.t. $t$, we have

     $$\frac{d}{dt} \| x - y_t \|^2 \Big|_{t = 0} = -2 \| x - z \|^2 + 2 \langle x - y, x - z \rangle = -2 \langle x - z, x - z \rangle + 2 \langle x - y, x - z \rangle = -2 \langle y - z, x - z \rangle.$$

  5. Since $t = 0$ minimizes $\| x - y_t \|^2$ over $t \in [0, 1]$, we must have

     $$\frac{d}{dt} \| x - y_t \|^2 \Big|_{t = 0} \geq 0.$$

  6. Thus, we require that

     $$\langle y - z, x - z \rangle \leq 0$$

     must hold true for every $y \in C$.

The following is an alternative proof based on results from Constrained Optimization I. This proof is specific to the case where $\mathbb{V} = \mathbb{R}^n$.

Proof. Define a function $f : \mathbb{R}^n \to \mathbb{R}$ as

$$f(y) = \| y - x \|^2.$$

Then, the projection problem can be cast as an optimization problem

$$\begin{array}{ll} \text{minimize} & f(y) \\ \text{subject to} & y \in C. \end{array}$$

Note that the gradient of $f$ is given by

$$\nabla f(y) = \nabla \langle y - x, y - x \rangle = \nabla \left( \langle y, y \rangle - 2 \langle y, x \rangle + \langle x, x \rangle \right) = 2 (y - x).$$

By Theorem 10.47, $z$ is an optimal solution if and only if

$$\nabla f(z)^T (y - z) \geq 0 \quad \forall y \in C.$$

In other words

$$2 (z - x)^T (y - z) \geq 0 \quad \forall y \in C.$$

We can simplify this as

$$\langle x - z, y - z \rangle \leq 0 \quad \forall y \in C.$$
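
As a quick numerical sanity check of this characterization, the following sketch (NumPy; `project_ball` is our own helper for the closed-form projection onto the unit ball) verifies that $\langle y - z, x - z \rangle \leq 0$ for many sampled points $y \in C$.

```python
import numpy as np

rng = np.random.default_rng(42)

def project_ball(x, radius=1.0):
    """Closed-form projection onto the closed ball B[0, radius]."""
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

x = np.array([2.0, -1.0, 0.5])
z = project_ball(x)

# Sample points y in C and check <y - z, x - z> <= 0.
for _ in range(1000):
    y = project_ball(rng.normal(size=3))   # ensures y is in C
    assert np.dot(y - z, x - z) <= 1e-12
print("characterization holds for all sampled y")
```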

Theorem 10.8 (Orthogonal projection on an affine subspace)

Let $C$ be an affine subspace of $\mathbb{V}$. Let $S$ be the linear subspace parallel to $C$. For every vector $x \in \mathbb{V}$, a vector $z \in C$ is its projection if and only if

$$x - z \in S^{\perp}.$$

Proof. Since $C$ is an affine subspace of $\mathbb{V}$, $C$ is nonempty and convex; it is also closed since $\mathbb{V}$ is finite dimensional.

  1. By Theorem 10.7, $z$ is the projection of $x$ on $C$ if and only if for every $y \in C$, we have

     $$\langle y - z, x - z \rangle \leq 0.$$

  2. But $y \in C$ if and only if $y - z \in S$.

  3. Hence the condition is equivalent to

     $$\langle w, x - z \rangle \leq 0 \quad \forall w \in S.$$

  4. But then, it must be an equality since $w$ and $-w$ both belong to $S$. Thus, we have

     $$\langle w, x - z \rangle = 0 \quad \forall w \in S.$$

  5. In other words, $x - z \in S^{\perp}$. A numerical illustration follows.
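
The orthogonality condition is easy to verify numerically. Below is a minimal sketch (NumPy; the helper `project_affine` is our own name) that projects onto the affine set $\{ z \,|\, A z = b \}$ using the standard closed-form expression and checks that the residual $x - z$ is orthogonal to the parallel subspace $S = \{ z \,|\, A z = 0 \}$.

```python
import numpy as np

def project_affine(x, A, b):
    """Project x onto C = {z : A z = b}; A is assumed to have full row rank."""
    # z = x - A^T (A A^T)^{-1} (A x - b)
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 5))
b = rng.normal(size=2)
x = rng.normal(size=5)

z = project_affine(x, A, b)
print(np.allclose(A @ z, b))                  # z lies in C

# x - z should be orthogonal to the parallel subspace S = null(A).
w = project_affine(rng.normal(size=5), A, np.zeros(2))   # some w in S
print(np.isclose(np.dot(w, x - z), 0.0))      # <w, x - z> = 0
```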

10.2.4. Distance Function

Recall that the distance of a point $x \in \mathbb{V}$ from a set $C$ is defined as

$$d_C(x) \triangleq \inf_{y \in C} \| x - y \|.$$

Theorem 10.9 (Distance function for nonempty, closed and convex set)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Then the function $d_C : \mathbb{V} \to \mathbb{R}$ defining the distance of a point $x \in \mathbb{V}$ from the set $C$ satisfies

$$d_C(x) = \| x - P_C(x) \|.$$

Proof. By Theorem 10.6, there exists a unique point $P_C(x)$ which minimizes the distance between $x$ and $C$. Hence

$$d_C(x) = \| x - P_C(x) \|$$

must hold true.

Theorem 10.10 (Distance function for nonempty, closed and convex set is convex)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Let $d_C : \mathbb{V} \to \mathbb{R}$ be the distance function to the set $C$ as defined in Theorem 10.9. Then, $d_C$ is convex.

Proof. Assume, for contradiction, that $d_C$ is not convex.

  1. Then, there exist $x, y \in \mathbb{V}$ and $t \in (0, 1)$ such that

     $$d_C(t x + (1 - t) y) > t\, d_C(x) + (1 - t)\, d_C(y).$$

  2. Let $u = P_C(x)$ and $v = P_C(y)$. By definition, $u, v \in C$.

  3. Then,

     $$t\, d_C(x) + (1 - t)\, d_C(y) = t \| u - x \| + (1 - t) \| v - y \|.$$

  4. Since $C$ is convex, hence $t u + (1 - t) v \in C$.

  5. Since $d_C(t x + (1 - t) y)$ is the minimum distance between the point $t x + (1 - t) y$ and $C$, hence

     $$\| t u + (1 - t) v - t x - (1 - t) y \| \geq d_C(t x + (1 - t) y).$$

  6. Rewriting,

     $$d_C(t x + (1 - t) y) \leq \| t (u - x) + (1 - t) (v - y) \| \leq t \| u - x \| + (1 - t) \| v - y \|$$

     due to triangle inequality.

  7. But, this leads to the contradiction

     $$t \| u - x \| + (1 - t) \| v - y \| < d_C(t x + (1 - t) y) \leq t \| u - x \| + (1 - t) \| v - y \|.$$

  8. Hence, $d_C$ must be convex.
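
The convexity of $d_C$ can also be probed numerically. A minimal sketch, assuming $C$ is the closed unit ball (so that $d_C$ has the closed form $\max(\| x \| - 1, 0)$), checking the convexity inequality along a segment:

```python
import numpy as np

def dist_ball(x, radius=1.0):
    """Distance from x to the closed ball B[0, radius]."""
    return max(np.linalg.norm(x) - radius, 0.0)

x = np.array([3.0, 0.0])
y = np.array([0.0, -2.0])
for t in np.linspace(0.0, 1.0, 11):
    lhs = dist_ball(t * x + (1 - t) * y)
    rhs = t * dist_ball(x) + (1 - t) * dist_ball(y)
    assert lhs <= rhs + 1e-12          # the convexity inequality
print("d_C is convex along the segment [x, y]")
```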

10.2.5. Nonexpansiveness

Definition 10.7 (Nonexpansiveness property)

Let $\mathbb{V}$ be a normed linear space. An operator $T : \mathbb{V} \to \mathbb{V}$ is called nonexpansive if

$$\| T(x) - T(y) \| \leq \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

In other words, the distance between mapped points in $\mathbb{V}$ is always less than or equal to the distance between the original points in $\mathbb{V}$.

Theorem 10.11 (Nonexpansive operators are Lipschitz continuous)

Let $\mathbb{V}$ be a normed linear space. A nonexpansive operator $T : \mathbb{V} \to \mathbb{V}$ is Lipschitz continuous. Hence, it is uniformly continuous.

Proof. Recall from Definition 3.45 that if $T$ is a Lipschitz map, then there exists $L > 0$ such that

$$\| T(x) - T(y) \| \leq L \| x - y \|$$

for every $x, y \in \mathbb{V}$.

For a nonexpansive operator, $L = 1$ works. Thus, $T$ is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.

Definition 10.8 (Firm nonexpansiveness property)

Let $\mathbb{V}$ be a real inner product space. An operator $T : \mathbb{V} \to \mathbb{V}$ is called firmly nonexpansive if

$$\langle T(x) - T(y), x - y \rangle \geq \| T(x) - T(y) \|^2$$

holds true for every $x, y \in \mathbb{V}$.

Theorem 10.12 (A firmly nonexpansive operator is nonexpansive)

Let $\mathbb{V}$ be a real inner product space. Let $T : \mathbb{V} \to \mathbb{V}$ be a firmly nonexpansive operator. Then, $T$ is nonexpansive.

Proof. For every $x, y \in \mathbb{V}$, we have

$$\| T(x) - T(y) \|^2 \leq \langle T(x) - T(y), x - y \rangle.$$

Applying the Cauchy-Schwarz inequality on the R.H.S., we get

$$\| T(x) - T(y) \|^2 \leq \| T(x) - T(y) \| \| x - y \|.$$

If $T(x) = T(y)$, the nonexpansive property holds trivially. Otherwise, canceling the common factor $\| T(x) - T(y) \|$ on both sides, we get

$$\| T(x) - T(y) \| \leq \| x - y \|$$

which is the nonexpansive property.

Theorem 10.13 (Orthogonal projection is nonexpansive)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Then the orthogonal projection operator $P_C : \mathbb{V} \to \mathbb{V}$ as defined in Definition 10.6 is nonexpansive (and therefore continuous).

In other words,

$$\| P_C(x) - P_C(y) \| \leq \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

Proof. Let $x, y \in \mathbb{V}$.

  1. By Theorem 10.7,

     $$\langle w - P_C(x), x - P_C(x) \rangle \leq 0 \quad \forall w \in C.$$

  2. In particular $P_C(y) \in C$. Hence,

     $$\langle P_C(y) - P_C(x), x - P_C(x) \rangle \leq 0.$$

  3. Similarly, applying the characterization to $y$ with $w = P_C(x)$, we obtain

     $$\langle P_C(x) - P_C(y), y - P_C(y) \rangle \leq 0.$$

  4. Adding these two inequalities, we obtain

     $$\langle P_C(y) - P_C(x), x - P_C(x) - y + P_C(y) \rangle \leq 0.$$

  5. By rearranging the terms, we get

     $$\langle P_C(y) - P_C(x), P_C(y) - P_C(x) \rangle \leq \langle P_C(y) - P_C(x), y - x \rangle.$$

  6. Applying the Cauchy-Schwarz inequality on the R.H.S., we obtain

     $$\| P_C(y) - P_C(x) \|^2 \leq \| P_C(y) - P_C(x) \| \| y - x \|.$$

  7. Thus, $P_C$ is nonexpansive.

  8. Since $P_C$ is nonexpansive, hence $P_C$ is continuous also.

Theorem 10.14 (Orthogonal projection is firmly nonexpansive)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Let $P_C : \mathbb{V} \to \mathbb{V}$ be the orthogonal projection operator as defined in Definition 10.6. Then $P_C$ is a firmly nonexpansive operator.

In other words,

$$\langle P_C(x) - P_C(y), x - y \rangle \geq \| P_C(x) - P_C(y) \|^2$$

holds true for every $x, y \in \mathbb{V}$.

Proof. Recall from Theorem 10.7 that for any $u \in \mathbb{V}$ and $v \in C$

$$\langle v - P_C(u), u - P_C(u) \rangle \leq 0.$$

  1. Substituting $u = x$ and $v = P_C(y)$, we obtain

     $$\langle P_C(y) - P_C(x), x - P_C(x) \rangle \leq 0.$$

  2. Substituting $u = y$ and $v = P_C(x)$, we obtain

     $$\langle P_C(x) - P_C(y), y - P_C(y) \rangle \leq 0.$$

  3. Adding the two inequalities gives us

     $$\begin{aligned} & \langle P_C(x) - P_C(y), y - P_C(y) - x + P_C(x) \rangle \leq 0 \\ \iff & \langle P_C(x) - P_C(y), -(x - y) + (P_C(x) - P_C(y)) \rangle \leq 0 \\ \iff & \| P_C(x) - P_C(y) \|^2 \leq \langle P_C(x) - P_C(y), x - y \rangle \end{aligned}$$

     as desired.
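
Both inequalities are easy to test numerically. The sketch below (NumPy; the unit-ball projection helper as before) checks nonexpansiveness and firm nonexpansiveness at random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(7)

def project_ball(x, radius=1.0):
    """Closed-form projection onto the closed ball B[0, radius]."""
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

for _ in range(1000):
    x, y = rng.normal(size=(2, 4)) * 3
    px, py = project_ball(x), project_ball(y)
    # Nonexpansiveness: ||P(x) - P(y)|| <= ||x - y||
    assert np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12
    # Firm nonexpansiveness: <P(x) - P(y), x - y> >= ||P(x) - P(y)||^2
    assert np.dot(px - py, x - y) >= np.linalg.norm(px - py) ** 2 - 1e-12
print("nonexpansiveness and firm nonexpansiveness hold for all samples")
```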

10.2.6. Squared Distance Function

Definition 10.9 (Squared distance function to a nonempty set)

Let $C$ be a nonempty subset of $\mathbb{V}$. The squared distance to set $C$ function, denoted as $\varphi_C : \mathbb{V} \to \mathbb{R}$, is defined as:

$$\varphi_C(x) \triangleq \frac{1}{2} d_C^2(x).$$

We also define $\psi_C : \mathbb{V} \to \mathbb{R}$ as:

$$\psi_C(x) \triangleq \frac{1}{2} \left( \| x \|^2 - d_C^2(x) \right).$$

Theorem 10.15 (Expression for $\psi_C$)

Let $C$ be a nonempty subset of $\mathbb{V}$. Then, the function $\psi_C$ as defined in Definition 10.9 is given by

$$\psi_C(x) = \sup_{y \in C} \left[ \langle y, x \rangle - \frac{1}{2} \| y \|^2 \right].$$

Proof. We proceed as follows.

  1. Expanding on the definition of $d_C^2$

     $$d_C^2(x) = \inf_{y \in C} \| x - y \|^2 = \inf_{y \in C} \langle x - y, x - y \rangle = \inf_{y \in C} \left( \| x \|^2 - 2 \langle x, y \rangle + \| y \|^2 \right) = \inf_{y \in C} \left( \| x \|^2 - \left( 2 \langle x, y \rangle - \| y \|^2 \right) \right) = \| x \|^2 - \sup_{y \in C} \left( 2 \langle x, y \rangle - \| y \|^2 \right).$$

  2. Thus,

     $$\| x \|^2 - d_C^2(x) = \sup_{y \in C} \left( 2 \langle x, y \rangle - \| y \|^2 \right).$$

  3. Thus,

     $$\psi_C(x) = \frac{1}{2} \left( \| x \|^2 - d_C^2(x) \right) = \sup_{y \in C} \left[ \langle x, y \rangle - \frac{1}{2} \| y \|^2 \right].$$
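
Since Theorem 10.15 holds for an arbitrary nonempty set, it can be verified exactly on a finite set $C$, where the infimum and supremum reduce to a minimum and maximum over finitely many points. A minimal sketch, with $C$ a random finite point set of our choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.normal(size=(20, 4))          # a finite (nonconvex) set of 20 points

def psi_direct(x):
    """psi_C(x) = 0.5 * (||x||^2 - d_C(x)^2), computed from the definition."""
    d2 = np.min(np.sum((C - x) ** 2, axis=1))       # d_C^2(x)
    return 0.5 * (np.dot(x, x) - d2)

def psi_sup(x):
    """psi_C(x) via the supremum formula of Theorem 10.15."""
    return np.max(C @ x - 0.5 * np.sum(C ** 2, axis=1))

x = rng.normal(size=4)
print(np.isclose(psi_direct(x), psi_sup(x)))        # True
```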

Theorem 10.16 ($\psi_C$ is convex)

Let $C$ be a nonempty subset of $\mathbb{V}$. Then, the function $\psi_C$ as defined in Definition 10.9 is convex.

The beauty of this result is that $\psi_C$ is convex irrespective of whether $C$ is convex or not.

Proof. We proceed as follows.

  1. For every $y \in C$, the function $g_y : \mathbb{V} \to \mathbb{R}$, given by

     $$g_y(x) = \langle y, x \rangle - \frac{1}{2} \| y \|^2,$$

     is an affine function of $x$.

  2. $g_y$ is convex for every $y \in C$ due to Theorem 9.68.

  3. Now,

     $$\psi_C = \sup_{y \in C} g_y.$$

  4. Thus, $\psi_C$ is a pointwise supremum of convex functions.

  5. Thus, by Theorem 9.114, $\psi_C$ is convex.

Theorem 10.17 (Squared distance function for nonempty, closed and convex sets)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. Then, the squared distance function $\varphi_C$ is given by

$$\varphi_C(x) = \frac{1}{2} \| x - P_C(x) \|^2.$$

This follows directly from Theorem 10.9.

10.2.7. Gradients and Subgradients

Theorem 10.18 (Gradient of the squared distance function)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The gradient of the squared distance function $\varphi_C$ as defined in Definition 10.9 at $x \in \mathbb{V}$ is given by:

$$\nabla \varphi_C(x) = x - P_C(x) \quad \forall x \in \mathbb{V}.$$

Proof. We proceed as follows.

  1. Let $x \in \mathbb{V}$.

  2. Let $z_x = x - P_C(x)$.

  3. Consider the function

     $$g_x(d) = \varphi_C(x + d) - \varphi_C(x) - \langle d, z_x \rangle.$$

  4. If

     $$\lim_{d \to 0} \frac{g_x(d)}{\| d \|} = 0$$

     then $z_x$ is indeed the gradient of $\varphi_C$ at $x$.

  5. By definition of orthogonal projection, for any $d \in \mathbb{V}$,

     $$\| x + d - P_C(x + d) \|^2 \leq \| x + d - P_C(x) \|^2$$

     as $P_C(x + d)$ is the nearest point to $x + d$ in $C$. $P_C(x)$ is just another point in $C$.

  6. Thus, for any $d \in \mathbb{V}$

     $$g_x(d) = \varphi_C(x + d) - \varphi_C(x) - \langle d, z_x \rangle = \frac{1}{2} \| x + d - P_C(x + d) \|^2 - \frac{1}{2} \| x - P_C(x) \|^2 - \langle d, z_x \rangle \leq \frac{1}{2} \| x + d - P_C(x) \|^2 - \frac{1}{2} \| x - P_C(x) \|^2 - \langle d, z_x \rangle.$$

  7. Recall that for a norm induced by the inner product

     $$\| a + b \|^2 = \langle a + b, a + b \rangle = \| a \|^2 + 2 \langle a, b \rangle + \| b \|^2.$$

  8. Thus,

     $$\| x + d - P_C(x) \|^2 = \| d + (x - P_C(x)) \|^2 = \| d \|^2 + \| x - P_C(x) \|^2 + 2 \langle d, x - P_C(x) \rangle.$$

  9. Putting it back and simplifying, we obtain

     $$g_x(d) \leq \frac{1}{2} \| d \|^2 + \langle d, x - P_C(x) \rangle - \langle d, z_x \rangle = \frac{1}{2} \| d \|^2.$$

  10. Proceeding similarly, we also have

      $$g_x(-d) \leq \frac{1}{2} \| d \|^2.$$

  11. Since $\varphi_C$ is convex, hence $g_x$ is also convex.

  12. Thus,

      $$0 = g_x(0) = g_x \left( \frac{1}{2} d + \frac{1}{2} (-d) \right) \leq \frac{1}{2} g_x(d) + \frac{1}{2} g_x(-d).$$

  13. Thus,

      $$g_x(d) \geq - g_x(-d) \geq - \frac{1}{2} \| d \|^2.$$

  14. Combining, we have

      $$- \frac{1}{2} \| d \|^2 \leq g_x(d) \leq \frac{1}{2} \| d \|^2.$$

  15. Or, in terms of absolute values,

      $$| g_x(d) | \leq \frac{1}{2} \| d \|^2.$$

  16. Then,

      $$\frac{| g_x(d) |}{\| d \|} \leq \frac{1}{2} \| d \|.$$

  17. Thus,

      $$\lim_{d \to 0} \frac{g_x(d)}{\| d \|} = 0.$$

  18. Thus, $z_x = x - P_C(x)$ is indeed the gradient of $\varphi_C$ at $x$.
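
The gradient formula can be checked against finite differences. A minimal sketch, assuming $C$ is the closed unit ball, comparing $x - P_C(x)$ with a central-difference approximation of $\nabla \varphi_C(x)$:

```python
import numpy as np

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

def phi(x):
    """phi_C(x) = 0.5 * d_C(x)^2 for C = unit ball."""
    return 0.5 * np.linalg.norm(x - project_ball(x)) ** 2

x = np.array([1.5, -0.7, 2.0])
grad_exact = x - project_ball(x)

# Central finite differences.
eps = 1e-6
grad_fd = np.zeros_like(x)
for i in range(len(x)):
    e = np.zeros_like(x)
    e[i] = eps
    grad_fd[i] = (phi(x + e) - phi(x - e)) / (2 * eps)

print(np.allclose(grad_exact, grad_fd, atol=1e-5))   # True
```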

Theorem 10.19 (Gradient of $\psi_C$)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The gradient of the function $\psi_C$ as defined in Definition 10.9 at $x \in \mathbb{V}$ is given by:

$$\nabla \psi_C(x) = P_C(x) \quad \forall x \in \mathbb{V}.$$

Proof. We have

$$\psi_C(x) = \frac{1}{2} \left( \| x \|^2 - d_C^2(x) \right) = \frac{1}{2} \| x \|^2 - \varphi_C(x).$$

Hence,

$$\nabla \psi_C(x) = x - \nabla \varphi_C(x) = x - (x - P_C(x)) = P_C(x).$$

Remark 10.5 (Distance function and squared distance function relation)

We note that $\varphi_C = g \circ d_C$ where $g(t) = \frac{1}{2} [t]_+^2$.

$g$ is a nondecreasing real-valued convex differentiable function. We also note that

$$g'(t) = [t]_+.$$

Theorem 10.20 (Subdifferential of the distance function)

Let $C$ be a nonempty, closed and convex subset of $\mathbb{V}$. The subdifferential of the distance function $d_C$ is given by

$$\partial d_C(x) = \begin{cases} \left\{ \dfrac{x - P_C(x)}{d_C(x)} \right\}, & x \notin C, \\ N_C(x) \cap B[0, 1], & x \in C. \end{cases}$$

$N_C(x)$ denotes the normal cone of all vectors normal to the set $C$ at a point $x \in C$.

Since $\partial d_C(x)$ is a singleton for $x \notin C$, $d_C$ is differentiable at every $x \notin C$.

Proof. We can get the subdifferential of $d_C$ by applying the chain rule.

  1. Recall that $\varphi_C = g \circ d_C$ where $g(t) = \frac{1}{2} [t]_+^2$.

  2. Thus, by the subdifferential chain rule (Theorem 9.229):

     $$\partial \varphi_C(x) = g'(d_C(x)) \partial d_C(x) = [d_C(x)]_+ \partial d_C(x) = d_C(x) \partial d_C(x).$$

     We used the fact that $d_C$ is nonnegative.

  3. Since $\varphi_C$ is differentiable, hence $\partial \varphi_C(x) = \{ x - P_C(x) \}$.

  4. If $x \notin C$, then $d_C(x) > 0$.

  5. Thus, for $x \notin C$

     $$\partial d_C(x) = \left\{ \frac{x - P_C(x)}{d_C(x)} \right\}.$$

  6. For $x \in C$, $d_C(x) = 0$.

  7. We need to show that $\partial d_C(x) = N_C(x) \cap B[0, 1]$ in this case.

  8. Consider any $d \in \partial d_C(x)$.

  9. Then, by the subgradient inequality

     $$d_C(y) \geq d_C(x) + \langle y - x, d \rangle = \langle y - x, d \rangle \quad \forall y \in \mathbb{V}$$

     since $d_C(x) = 0$.

  10. Then, in particular, for any $y \in C$

      $$0 = d_C(y) \geq \langle y - x, d \rangle.$$

  11. Thus, $d \in N_C(x)$.

  12. Further, picking $y = x + d$ in the subgradient inequality gives $\| d \|^2 \leq d_C(x + d) \leq \| (x + d) - x \| = \| d \|$, where the second inequality uses $x \in C$. Thus, $\| d \| \leq 1$, and hence $d \in N_C(x) \cap B[0, 1]$.
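
For $x \notin C$, the formula says $d_C$ is differentiable with gradient $(x - P_C(x)) / d_C(x)$, which is always a unit vector. A minimal finite-difference check, again assuming $C$ is the closed unit ball:

```python
import numpy as np

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

def dist(x):
    return np.linalg.norm(x - project_ball(x))

x = np.array([2.0, -1.0, 1.0])        # a point outside the unit ball
grad_exact = (x - project_ball(x)) / dist(x)
print(np.isclose(np.linalg.norm(grad_exact), 1.0))   # gradient is a unit vector

eps = 1e-6
grad_fd = np.array([
    (dist(x + eps * e) - dist(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(grad_exact, grad_fd, atol=1e-5))   # matches finite differences
```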

10.2.8. Conjugates

Theorem 10.21 (Conjugate of norm squared plus indicator)

Let $C$ be a nonempty subset of $\mathbb{V}$. Let $f : \mathbb{V} \to (-\infty, \infty]$ be given by:

$$f(x) = \frac{1}{2} \| x \|^2 + \delta_C(x).$$

Then, its conjugate is given by:

$$f^*(y) = \frac{1}{2} \| y \|^2 - \frac{1}{2} d_C^2(y) = \psi_C(y).$$

Further, if $C$ is nonempty, closed and convex, then $f^{**} = f$. In other words, $\psi_C^* = f$.

Proof. Recall from Definition 9.78 that

$$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \} \quad \forall y \in \mathbb{V}.$$

Expanding, we obtain

$$\sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \} = \sup_{x \in \mathbb{V}} \left\{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 - \delta_C(x) \right\} = \sup_{x \in C} \left\{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 \right\} = \psi_C(y).$$

The last result is due to Theorem 10.15.

If $C$ is nonempty, closed and convex, then $f$ is proper, closed and convex. Then, due to Theorem 9.242, the biconjugate of $f$ is $f$ itself.

10.2.9. Smoothness

Recall from Definition 9.80 that a function $f : \mathbb{V} \to \mathbb{R}$ is $L$-smooth if

$$\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

Since the norm in this section is induced by the inner product, it is self dual.

Thus, the smoothness criterion becomes:

$$\| \nabla f(x) - \nabla f(y) \| \leq L \| x - y \| \quad \forall x, y \in \mathbb{V}.$$

Theorem 10.22 (Smoothness of the squared distance function)

The squared distance function $\varphi_C$ is 1-smooth.

Proof. Recall the definition of $\varphi_C$ from Definition 10.9.

  1. By Theorem 10.18,

     $$\nabla \varphi_C(x) = x - P_C(x).$$

  2. Hence,

     $$\begin{aligned} \| \nabla \varphi_C(x) - \nabla \varphi_C(y) \|^2 &= \| x - P_C(x) - y + P_C(y) \|^2 \\ &= \| (x - y) - (P_C(x) - P_C(y)) \|^2 \\ &= \| x - y \|^2 - 2 \langle P_C(x) - P_C(y), x - y \rangle + \| P_C(x) - P_C(y) \|^2 \\ &\leq \| x - y \|^2 - 2 \| P_C(x) - P_C(y) \|^2 + \| P_C(x) - P_C(y) \|^2 \quad \text{(firm nonexpansiveness)} \\ &= \| x - y \|^2 - \| P_C(x) - P_C(y) \|^2 \\ &\leq \| x - y \|^2. \end{aligned}$$

  3. Thus,

     $$\| \nabla \varphi_C(x) - \nabla \varphi_C(y) \| \leq \| x - y \|.$$

  4. Thus, $\varphi_C$ is 1-smooth.

Theorem 10.23 (Smoothness of the $\psi_C$ function)

The function $\psi_C$ is 1-smooth.

Proof. We recall from Theorem 10.19 that $\nabla \psi_C(x) = P_C(x)$.

Hence,

$$\| \nabla \psi_C(x) - \nabla \psi_C(y) \| = \| P_C(x) - P_C(y) \| \leq \| x - y \|$$

due to the nonexpansiveness property (Theorem 10.13).
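
A quick numerical check of both smoothness results (a sketch assuming $C$ is the unit ball): the gradient maps $x \mapsto x - P_C(x)$ and $x \mapsto P_C(x)$ should both be 1-Lipschitz.

```python
import numpy as np

rng = np.random.default_rng(11)

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

for _ in range(1000):
    x, y = rng.normal(size=(2, 3)) * 2
    lip = np.linalg.norm(x - y)
    # grad phi_C = x - P_C(x); grad psi_C = P_C(x)
    g_phi = np.linalg.norm((x - project_ball(x)) - (y - project_ball(y)))
    g_psi = np.linalg.norm(project_ball(x) - project_ball(y))
    assert g_phi <= lip + 1e-12 and g_psi <= lip + 1e-12
print("both gradient maps are 1-Lipschitz on all samples")
```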

10.2.10. POCS Problems

In this section, we present some example optimization problems which can be converted into an equivalent projection on a convex set problem.

10.2.10.1. Equality Constrained Quadratic Programming

Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.

Example 10.1 (Equality constrained quadratic programming)

We consider the quadratic programming problem

$$\begin{array}{ll} \text{minimize} & \frac{1}{2} \| x \|^2 + c^T x \\ \text{subject to} & A x = 0 \end{array}$$

where

  1. $x \in \mathbb{R}^n$ is the optimization variable.

  2. $c \in \mathbb{R}^n$ is a given vector.

  3. $A \in \mathbb{R}^{m \times n}$ is an $m \times n$ matrix of rank $m$. Assume that $m < n$.

We proceed towards converting this problem into a projection on a convex set problem as follows.

  1. By adding a constant term $\frac{1}{2} \| c \|^2$ to the objective function, we obtain the equivalent problem

     $$\begin{array}{ll} \text{minimize} & \frac{1}{2} \| c + x \|^2 \\ \text{subject to} & A x = 0. \end{array}$$

  2. The set $C = \{ x \,|\, A x = 0 \}$ is the null space of the matrix $A$, which is a linear subspace, hence a nonempty, closed and convex set.

  3. Minimizing $\frac{1}{2} \| c + x \|^2$ is equivalent to minimizing $\| (-c) - x \|$ subject to $x \in C$.

  4. $\| (-c) - x \|$ is $d(-c, x)$, the distance between $-c$ and a point $x$.

  5. Thus, we are minimizing the distance between the point $-c$ and points $x \in C$.

  6. This is nothing but the distance of $-c$ from the set $C$.

  7. Since $C$ is nonempty, closed and convex, there is a unique $x^*$ which minimizes the distance, due to the projection theorem.

  8. Thus, the solution is the projection of the vector $-c$ on the subspace $C$.

  9. By Theorem 10.8, $x^*$ is the unique projection of $-c$ on $C$ if and only if $-c - x^* \in C^{\perp}$.

  10. In other words,

      $$\langle c + x^*, x \rangle = 0 \quad \forall x \in C.$$

  11. A closed form solution to this problem does exist, given by

      $$x^* = -\left( I - A^T (A A^T)^{-1} A \right) c.$$

  12. It is indeed the unique solution to this quadratic programming problem. A numerical sketch follows.
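
The closed form can be cross-checked against a direct solution of the KKT optimality system. A minimal sketch with random data (the matrix sizes and variable names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 6
A = rng.normal(size=(m, n))           # full row rank with probability 1
c = rng.normal(size=n)

# Closed-form solution: x* = -(I - A^T (A A^T)^{-1} A) c,
# i.e., the projection of -c onto null(A).
P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
x_star = -P @ c

# Cross-check by solving the KKT system:
# [ I  A^T ] [x]   [-c]
# [ A   0  ] [u] = [ 0]
K = np.block([[np.eye(n), A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, np.zeros(m)]))
print(np.allclose(x_star, sol[:n]))   # True
print(np.allclose(A @ x_star, 0))     # feasibility: x* lies in null(A)
```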