9.18. Smoothness

The primary reference for this section is [7].

9.18.1. L-Smooth Functions

Definition 9.80 (L-smooth functions)

For some $L \geq 0$ , a function $f : V \to (- \infty, \infty]$ is called $L$ -smooth over a set $D \subseteq V$ if it is differentiable over $D$ and satisfies

‖ \nabla f (x) - \nabla f (y) ‖_{*} \leq L ‖ x - y ‖ \forall x, y \in D .

The constant $L$ is called the smoothness parameter.

Since $f$ is differentiable over $D$ , hence $D \subseteq int dom f$ .
If $f$ is $L$ -smooth over the entire $V$ , we simply say that $f$ is $L$ -smooth.
$L$ -smooth functions are also known as functions with Lipschitz gradient with constant $L$ .
The class of functions which are $L$ -smooth over a set $D$ is denoted by $C_{L}^{1, 1} (D)$ .
When $D = V$ , then the class is simply denoted as $C_{L}^{1, 1}$ .
The class of functions which are $L$ -smooth for some $L \geq 0$ (but $L$ may not be known), is denoted by $C^{1, 1}$ .
By definition, if a function is $L_{1}$ -smooth, then it is $L_{2}$ -smooth for every $L_{2} \geq L_{1}$ .
Thus, it is often useful to identify the smallest possible value of $L$ for which a function is $L$ -smooth.

Example 9.82 (Zero smoothness of affine functions)

Let $b \in V^{*}$ and $c \in R$ . Let $f : V \to R$ be given by:

f (x) = ⟨ x, b ⟩ + c .

Then, $f$ is $0$ -smooth.

To see this, we note that $\nabla f (x) = b$ . Consequently,

‖ \nabla f (x) - \nabla f (y) ‖_{*} = ‖ b - b ‖_{*} = ‖ 0 ‖_{*} = 0 \leq 0 ‖ x - y ‖ .

Theorem 9.263 (Smoothness of quadratic functions)

Let $A \in S^{n}$ , $b \in R^{n}$ and $c \in R$ . Assume that $R^{n}$ is endowed with $ℓ_{p}$ -norm for some $1 \leq p \leq \infty$ . Let $f : R^{n} \to R$ be given by:

f (x) = \frac{1}{2} x^{T} A x + b^{T} x + c .

Then, $f$ is $L$ -smooth with $L = ‖ A ‖_{p, q}$ where $q \in [1, \infty]$ is the conjugate exponent satisfying

\frac{1}{p} + \frac{1}{q} = 1

and $‖ A ‖_{p, q}$ is the induced norm given by

‖ A ‖_{p, q} = sup {‖ A x ‖_{q} | ‖ x ‖_{p} \leq 1} .

Proof. We note that the dual norm of $ℓ_{p}$ norm is the $ℓ_{q}$ norm. Now,

\begin{aligned} ‖ \nabla f (x) - \nabla f (y) ‖_{*} \\ = ‖ \nabla f (x) - \nabla f (y) ‖_{q} \\ = ‖ A x + b - A y - b ‖_{q} \\ = ‖ A x - A y ‖_{q} \\ \leq ‖ A ‖_{p, q} ‖ x - y ‖_{p} . \end{aligned}

Thus, $f$ is $‖ A ‖_{p, q}$ smooth.

We next show that $‖ A ‖_{p, q}$ is the smallest smoothness parameter for $f$ .

Assume that $f$ is $L$ smooth for some $L$ .
By definition of $‖ A ‖_{p, q}$ , there exists a vector $\tilde{x}$ such that $‖ \tilde{x} ‖_{p} = 1$ and

$‖ A \tilde{x} ‖_{q} = ‖ A ‖_{p, q} ‖ \tilde{x} ‖_{p} = ‖ A ‖_{p, q} .$
Then,

$\begin{aligned} ‖ A ‖_{p, q} & = ‖ A \tilde{x} ‖_{q} \\ = ‖ A \tilde{x} + b - A 0 - b ‖_{q} \\ = ‖ \nabla f (\tilde{x}) - \nabla f (0) ‖_{q} \\ \leq L ‖ \tilde{x} - 0 ‖_{p} = L . \end{aligned}$
Thus, $‖ A ‖_{p, q} \leq L$ .

Thus, $‖ A ‖_{p, q}$ is indeed the smallest smoothness parameter for $f$ .
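
The following is a minimal numerical sketch of the theorem for the special case $p = q = 2$, where the induced norm is the spectral norm. The matrix `A`, the vector `b`, the random seed and the sample size are illustrative assumptions, not part of the text. The script checks that the observed gradient Lipschitz ratios never exceed $‖ A ‖_{2, 2}$ and that the bound is attained along an eigenvector of largest absolute eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2              # symmetric, not necessarily positive semidefinite
b = rng.standard_normal(n)

def grad_f(x):
    # Gradient of f(x) = 0.5 x^T A x + b^T x + c
    return A @ x + b

# Induced (2, 2)-norm of A, i.e. its largest singular value
L = np.linalg.norm(A, 2)

# Largest observed ratio ||grad f(x) - grad f(y)||_2 / ||x - y||_2
max_ratio = 0.0
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    ratio = np.linalg.norm(grad_f(x) - grad_f(y)) / np.linalg.norm(x - y)
    max_ratio = max(max_ratio, ratio)

# Tightness: a unit eigenvector for the eigenvalue of largest magnitude
eigvals, eigvecs = np.linalg.eigh(A)
x_tilde = eigvecs[:, np.argmax(np.abs(eigvals))]
attained = np.linalg.norm(grad_f(x_tilde) - grad_f(np.zeros(n)))

print(max_ratio <= L, np.isclose(attained, L))
```

Since `A` is symmetric, its spectral norm equals the largest absolute eigenvalue, which is why an eigenvector plays the role of $\tilde{x}$ in this sketch.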

9.18.1.1. Descent Lemma

Theorem 9.264 (Descent lemma)

Let $f : V \to (- \infty, \infty]$ be $L$ -smooth for some $L \geq 0$ over some convex set $D$ . Then for any $x, y \in D$ ,

f (y) \leq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{L}{2} ‖ x - y ‖^{2} .

Proof. We proceed as follows:

By the fundamental theorem of calculus

$f (y) - f (x) = \int_{0}^{1} ⟨ y - x, \nabla f (x + t (y - x)) ⟩ d t .$
By adding and subtracting $⟨ y - x, \nabla f (x) ⟩$ , we get:

$f (y) - f (x) = ⟨ y - x, \nabla f (x) ⟩ + \int_{0}^{1} ⟨ y - x, \nabla f (x + t (y - x)) - \nabla f (x) ⟩ d t .$
This gives us

$\begin{aligned} | f (y) - f (x) - ⟨ y - x, \nabla f (x) ⟩ | \\ = | \int_{0}^{1} ⟨ y - x, \nabla f (x + t (y - x)) - \nabla f (x) ⟩ d t | \\ \leq \int_{0}^{1} | ⟨ y - x, \nabla f (x + t (y - x)) - \nabla f (x) ⟩ | d t \\ \leq \int_{0}^{1} ‖ y - x ‖ ‖ \nabla f (x + t (y - x)) - \nabla f (x) ‖_{*} d t & (a) \\ \leq \int_{0}^{1} ‖ y - x ‖ t L ‖ y - x ‖ d t & (b) \\ = \int_{0}^{1} t L ‖ y - x ‖^{2} d t \\ = L ‖ y - x ‖^{2} \int_{0}^{1} t d t \\ = \frac{L}{2} ‖ y - x ‖^{2} . \end{aligned}$
- (a) is an application of the generalized Cauchy-Schwarz inequality (Theorem 4.108).
- (b) is the application of $L$ -smoothness of $f$ (Definition 9.80).
Thus,

$\begin{aligned} | f (y) - f (x) - ⟨ y - x, \nabla f (x) ⟩ | \leq \frac{L}{2} ‖ y - x ‖^{2} \\ ⟹ f (y) - f (x) - ⟨ y - x, \nabla f (x) ⟩ \leq \frac{L}{2} ‖ y - x ‖^{2} \\ ⟹ f (y) \leq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{L}{2} ‖ y - x ‖^{2} . \end{aligned}$
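
As a quick sanity check of the descent lemma, the sketch below uses the function $f (x) = \sum_{i} \log (1 + e^{x_{i}})$ on $R^{n}$ with the Euclidean norm. This choice of function is an assumption made for illustration; its Hessian is diagonal with entries in $(0, 1/4]$, so it is $\frac{1}{4}$ -smooth. The script samples random pairs and records the smallest gap between the quadratic upper bound and $f (y)$.

```python
import numpy as np

def f(x):
    # Sum of softplus terms log(1 + exp(x_i)), computed stably
    return np.sum(np.logaddexp(0.0, x))

def grad_f(x):
    # Componentwise sigmoid
    return 1.0 / (1.0 + np.exp(-x))

L = 0.25  # smoothness parameter w.r.t. the Euclidean norm (assumed example)

rng = np.random.default_rng(1)
worst_slack = np.inf
for _ in range(1000):
    x = 3.0 * rng.standard_normal(4)
    y = 3.0 * rng.standard_normal(4)
    # Right-hand side of the descent lemma
    upper = f(x) + grad_f(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2)
    worst_slack = min(worst_slack, upper - f(y))

print(worst_slack >= 0.0)
```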

9.18.1.2. Characterization of $L$ -Smooth Functions

Theorem 9.265 (Characterization of $L$ -smooth functions)

Let $f : V \to R$ be convex and differentiable over $V$ . Let $L > 0$ . The following claims are equivalent:

1. $f$ is $L$ -smooth.
2. $f (y) \leq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{L}{2} ‖ x - y ‖^{2} \forall x, y \in V$ .
3. $f (y) \geq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{1}{2 L} ‖ \nabla f (x) - \nabla f (y) ‖_{*}^{2} \forall x, y \in V$ .
4. $⟨ x - y, \nabla f (x) - \nabla f (y) ⟩ \geq \frac{1}{L} ‖ \nabla f (x) - \nabla f (y) ‖_{*}^{2} \forall x, y \in V$ .
5. $f (t x + (1 - t) y) \geq t f (x) + (1 - t) f (y) - \frac{L}{2} t (1 - t) ‖ x - y ‖^{2} \forall x, y \in V, t \in [0, 1]$ .

Proof. (1) $⟹$ (2). This is a direct implication of the descent lemma (Theorem 9.264).

(2) $⟹$ (3)

We are given that (2) is satisfied.
If $\nabla f (x) = \nabla f (y)$ , then the inequality is trivial due to the convexity of $f$ . Hence, we consider the case where $\nabla f (x) \neq \nabla f (y)$ .
Fix an $x \in V$ .
Consider a function $g_{x} : V \to R$ given by

$g_{x} (y) = f (y) - f (x) - ⟨ y - x, \nabla f (x) ⟩ .$
Then,

$\nabla g_{x} (y) = \nabla f (y) - \nabla f (x) .$
By hypothesis in property (2), for any $z \in V$

$f (z) \leq f (y) + ⟨ z - y, \nabla f (y) ⟩ + \frac{L}{2} ‖ z - y ‖^{2} .$
Now,

$\begin{aligned} g_{x} (z) \\ = f (z) - f (x) - ⟨ z - x, \nabla f (x) ⟩ \\ \leq f (y) + ⟨ z - y, \nabla f (y) ⟩ + \frac{L}{2} ‖ z - y ‖^{2} - f (x) - ⟨ z - x, \nabla f (x) ⟩ \\ = f (y) - f (x) - ⟨ z - x, \nabla f (x) ⟩ + ⟨ z - y, \nabla f (x) ⟩ - ⟨ z - y, \nabla f (x) ⟩ \\ + ⟨ z - y, \nabla f (y) ⟩ + \frac{L}{2} ‖ z - y ‖^{2} \\ = f (y) - f (x) - ⟨ y - x, \nabla f (x) ⟩ + ⟨ z - y, \nabla f (y) - \nabla f (x) ⟩ + \frac{L}{2} ‖ z - y ‖^{2} \\ = g_{x} (y) + ⟨ z - y, \nabla g_{x} (y) ⟩ + \frac{L}{2} ‖ z - y ‖^{2} . \end{aligned}$
Thus, $g_{x}$ also satisfies the inequality in property (2).
We note in particular that $\nabla g_{x} (x) = \nabla f (x) - \nabla f (x) = 0$ .
Since $g_{x}$ is convex, hence $x$ is the global minimizer of $g_{x}$ .
In other words,

$g_{x} (x) \leq g_{x} (z) \forall z \in V .$
We can also see that $g_{x} (x) = f (x) - f (x) - ⟨ x - x, \nabla f (x) ⟩ = 0$ .
Let $y \in V$ .
Let $v \in V$ be the unit norm vector satisfying $‖ \nabla g_{x} (y) ‖_{*} = ⟨ v, \nabla g_{x} (y) ⟩$ .
Choose

$z = y - \frac{‖ \nabla g_{x} (y) ‖_{*}}{L} v .$
Then,

$0 = g_{x} (x) \leq g_{x} (z) = g_{x} (y - \frac{‖ \nabla g_{x} (y) ‖_{*}}{L} v) .$
Using property (2) on $g_{x} (z)$ , we get

$\begin{aligned} 0 & \leq g_{x} (z) \\ \leq g_{x} (y) + ⟨ z - y, \nabla g_{x} (y) ⟩ + \frac{L}{2} ‖ z - y ‖^{2} \\ = g_{x} (y) - \frac{‖ \nabla g_{x} (y) ‖_{*}}{L} ⟨ v, \nabla g_{x} (y) ⟩ + \frac{L}{2} {‖ \frac{‖ \nabla g_{x} (y) ‖_{*}}{L} v ‖}^{2} \\ = g_{x} (y) - \frac{‖ \nabla g_{x} (y) ‖_{*}}{L} ‖ \nabla g_{x} (y) ‖_{*} + \frac{1}{2 L} ‖ \nabla g_{x} (y) ‖_{*}^{2} \\ = g_{x} (y) - \frac{1}{2 L} ‖ \nabla g_{x} (y) ‖_{*}^{2} \\ = f (y) - f (x) - ⟨ y - x, \nabla f (x) ⟩ - \frac{1}{2 L} ‖ \nabla f (y) - \nabla f (x) ‖_{*}^{2} . \end{aligned}$
Simplifying this, we get

$f (y) \geq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{1}{2 L} ‖ \nabla f (y) - \nabla f (x) ‖_{*}^{2}$

as desired.

(3) $⟹$ (4)

For $x, y$ , the property (3) gives us:

$f (y) \geq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{1}{2 L} ‖ \nabla f (y) - \nabla f (x) ‖_{*}^{2} .$
For $y, x$ , the property (3) gives us:

$f (x) \geq f (y) + ⟨ x - y, \nabla f (y) ⟩ + \frac{1}{2 L} ‖ \nabla f (x) - \nabla f (y) ‖_{*}^{2} .$
Adding the two inequalities and canceling the term $f (x) + f (y)$ gives us

$0 \geq ⟨ x - y, \nabla f (y) - \nabla f (x) ⟩ + \frac{1}{L} ‖ \nabla f (x) - \nabla f (y) ‖_{*}^{2} .$
Rearranging, we get

$⟨ x - y, \nabla f (x) - \nabla f (y) ⟩ \geq \frac{1}{L} ‖ \nabla f (x) - \nabla f (y) ‖_{*}^{2}$

as desired.

(4) $⟹$ (1)

When $\nabla f (x) = \nabla f (y)$ , then the Lipschitz condition in (1) is trivial. Hence, we consider the case where $\nabla f (x) \neq \nabla f (y)$ .
By the generalized Cauchy-Schwarz inequality (Theorem 4.108)

$⟨ x - y, \nabla f (x) - \nabla f (y) ⟩ \leq ‖ x - y ‖ ‖ \nabla f (x) - \nabla f (y) ‖_{*} .$
Thus, combining with hypothesis (4), we obtain

$\frac{1}{L} ‖ \nabla f (x) - \nabla f (y) ‖_{*}^{2} \leq ‖ x - y ‖ ‖ \nabla f (x) - \nabla f (y) ‖_{*} .$
Since $\nabla f (x) \neq \nabla f (y)$ , hence $‖ \nabla f (x) - \nabla f (y) ‖_{*} > 0$ .
Canceling it from both sides, we get

$‖ \nabla f (x) - \nabla f (y) ‖_{*} \leq L ‖ x - y ‖$

as desired.

We have shown so far that (1), (2), (3) and (4) are equivalent statements. We are left with showing that (5) is equivalent to the other statements.

(2) $⟹$ (5)

Pick $x, y \in V$ and $t \in [0, 1]$ .
Let $z = t x + (1 - t) y$ .
By hypothesis (2),

$\begin{aligned} f (x) \leq f (z) + ⟨ x - z, \nabla f (z) ⟩ + \frac{L}{2} ‖ x - z ‖^{2}; \\ f (y) \leq f (z) + ⟨ y - z, \nabla f (z) ⟩ + \frac{L}{2} ‖ y - z ‖^{2} . \end{aligned}$
Note that $x - z = (1 - t) (x - y)$ and $y - z = t (y - x)$ .
Thus, the previous two inequalities are same as

$\begin{aligned} f (x) \leq f (z) + (1 - t) ⟨ x - y, \nabla f (z) ⟩ + \frac{L (1 - t)^{2}}{2} ‖ x - y ‖^{2}; \\ f (y) \leq f (z) + t ⟨ y - x, \nabla f (z) ⟩ + \frac{L t^{2}}{2} ‖ x - y ‖^{2} . \end{aligned}$
Multiplying the first inequality by $t$ , the second by $(1 - t)$ and adding, we get

$t f (x) + (1 - t) f (y) \leq f (z) + \frac{L t (1 - t)}{2} ‖ x - y ‖^{2} .$
Rearranging, we get

$f (t x + (1 - t) y) = f (z) \geq t f (x) + (1 - t) f (y) - \frac{L}{2} t (1 - t) ‖ x - y ‖^{2} .$

(5) $⟹$ (2)

Pick $x, y \in V$ and $t \in (0, 1)$ .
By hypothesis in inequality (5)

$f (t x + (1 - t) y) \geq t f (x) + (1 - t) f (y) - \frac{L}{2} t (1 - t) ‖ x - y ‖^{2} .$
Rearranging the terms, we obtain

$\begin{aligned} (1 - t) f (y) \leq f (t x + (1 - t) y) - t f (x) + \frac{L}{2} t (1 - t) ‖ x - y ‖^{2} \\ ⟺ (1 - t) f (y) \leq f (t x + (1 - t) y) - f (x) + (1 - t) f (x) + \frac{L}{2} t (1 - t) ‖ x - y ‖^{2} \\ ⟺ f (y) \leq f (x) + \frac{f (t x + (1 - t) y) - f (x)}{1 - t} + \frac{L}{2} t ‖ x - y ‖^{2} . \end{aligned}$

Division by $(1 - t)$ is fine since $(1 - t) \in (0, 1)$ .
Recalling the definition of directional derivative (Definition 9.70):

$\begin{aligned} lim_{t \to 1^{-}} \frac{f (t x + (1 - t) y) - f (x)}{1 - t} \\ = lim_{s \to 0^{+}} \frac{f ((1 - s) x + s y) - f (x)}{s} \\ = lim_{s \to 0^{+}} \frac{f (x + s (y - x)) - f (x)}{s} \\ = f^{'} (x; y - x) . \end{aligned}$
Since the previous inequality is valid for every $t \in (0, 1)$ , taking the limit to $t \to 1^{-}$ on the R.H.S., we obtain

$f (y) \leq f (x) + f^{'} (x; y - x) + \frac{L}{2} ‖ x - y ‖^{2} .$
Recall from Theorem 9.203 that $f^{'} (x; y - x) = ⟨ y - x, \nabla f (x) ⟩$ .
Thus, we get:

$f (y) \leq f (x) + ⟨ y - x, \nabla f (x) ⟩ + \frac{L}{2} ‖ x - y ‖^{2} .$

as desired.
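
The equivalent characterizations can be probed numerically. The sketch below is an illustration under assumptions not made in the text: it takes a convex quadratic $f (x) = \frac{1}{2} x^{T} A x + b^{T} x$ with $A ⪰ O$, the Euclidean norm, and $L = λ_{max} (A)$, and evaluates the slack in properties (3) and (4) on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M.T @ M                      # positive semidefinite, so f is convex
b = rng.standard_normal(n)
L = np.linalg.eigvalsh(A)[-1]    # largest eigenvalue of A

def f(x):
    return 0.5 * x @ A @ x + b @ x

def grad_f(x):
    return A @ x + b

min_gap3 = np.inf   # slack in property (3)
min_gap4 = np.inf   # slack in property (4), also called co-coercivity
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    dg = grad_f(x) - grad_f(y)
    gap3 = f(y) - f(x) - grad_f(x) @ (y - x) - (dg @ dg) / (2 * L)
    gap4 = (x - y) @ dg - (dg @ dg) / L
    min_gap3 = min(min_gap3, gap3)
    min_gap4 = min(min_gap4, gap4)

print(min_gap3 >= -1e-9, min_gap4 >= -1e-9)
```

A small negative tolerance is used because both slacks can be arbitrarily close to zero when $x - y$ is nearly aligned with a top eigenvector of `A`.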

9.18.1.3. Second Order Characterization

We now restrict our attention to the vector space $R^{n}$ equipped with an $ℓ_{p}$ norm with $p \geq 1$ .

Theorem 9.266 ( $L$ -smoothness and the boundedness of the Hessian)

Let $f : R^{n} \to R$ be a twice continuously differentiable function over $R^{n}$ . Then, for any $L \geq 0$ , the following claims are equivalent:

1. $f$ is $L$ -smooth w.r.t. the $ℓ_{p}$ -norm ( $p \in [1, \infty]$ ).
2. $‖ \nabla^{2} f (x) ‖_{p, q} \leq L$ for any $x \in R^{n}$ where $q \geq 1$ satisfies $\frac{1}{p} + \frac{1}{q} = 1$ .

Proof. (2) $⟹$ (1)

We are given that $‖ \nabla^{2} f (x) ‖_{p, q} \leq L$ for any $x \in R^{n}$ .
By the fundamental theorem of calculus

$\begin{aligned} \nabla f (y) - \nabla f (x) & = \int_{0}^{1} \nabla^{2} f (x + t (y - x)) (y - x) d t \\ = (\int_{0}^{1} \nabla^{2} f (x + t (y - x)) d t) (y - x) . \end{aligned}$
Taking the (dual)-norm on both sides

$\begin{aligned} ‖ \nabla f (y) - \nabla f (x) ‖_{q} & = {‖ (\int_{0}^{1} \nabla^{2} f (x + t (y - x)) d t) (y - x) ‖}_{q} \\ \leq {‖ \int_{0}^{1} \nabla^{2} f (x + t (y - x)) d t ‖}_{p, q} ‖ y - x ‖_{p} \\ \leq (\int_{0}^{1} ‖ \nabla^{2} f (x + t (y - x)) ‖_{p, q} d t) ‖ y - x ‖_{p} \\ \leq (\int_{0}^{1} L d t) ‖ y - x ‖_{p} \\ = L ‖ y - x ‖_{p} . \end{aligned}$
Thus, $‖ \nabla f (y) - \nabla f (x) ‖_{q} \leq L ‖ y - x ‖_{p}$ as desired.

(1) $⟹$ (2)

We are given that $f$ is $L$ smooth with $ℓ_{p}$ norm.
By fundamental theorem of calculus, for any $d \in R^{n}$ and $s > 0$ ,

$\nabla f (x + s d) - \nabla f (x) = \int_{0}^{s} \nabla^{2} f (x + t d) d d t .$
Taking $q$ norm on both sides

${‖ (\int_{0}^{s} \nabla^{2} f (x + t d) d t) d ‖}_{q} = ‖ \nabla f (x + s d) - \nabla f (x) ‖_{q} \leq L ‖ x + s d - x ‖_{p} = s L ‖ d ‖_{p} .$
Dividing by $s$ on both sides and taking the limit $s \to 0^{+}$ , we get

$‖ \nabla^{2} f (x) d ‖_{q} \leq L ‖ d ‖_{p} .$
Since this is valid for every $d \in R^{n}$ , hence

$‖ \nabla^{2} f (x) ‖_{p, q} \leq L .$

Corollary 9.27 ( $L$ -smoothness and largest eigenvalue of Hessian)

Let $f : R^{n} \to R$ be a twice continuously differentiable convex function over $R^{n}$ . Then $f$ is $L$ -smooth w.r.t. $ℓ_{2}$ -norm if and only if

λ_{max} (\nabla^{2} f (x)) \leq L \forall x \in R^{n} .

Proof. Since $f$ is convex, hence it follows that $\nabla^{2} f (x) ⪰ O$ for every $x$ . Thus,

‖ \nabla^{2} f (x) ‖_{2, 2} = \sqrt{λ_{max} (\nabla^{2} f (x)^{2})} = λ_{max} (\nabla^{2} f (x)) .

From Theorem 9.266, $f$ is $L$ -smooth is equivalent to the condition that

λ_{max} (\nabla^{2} f (x)) = ‖ \nabla^{2} f (x) ‖_{2, 2} \leq L .
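
To illustrate the corollary, consider the function $f (w) = \sum_{i} \log (1 + e^{x_{i}^{T} w})$ for data rows $x_{i}$ of a matrix $X$. This example is an assumption introduced here, not taken from the text. Its Hessian is $X^{T} D (w) X$ with a diagonal $D (w)$ whose entries lie in $(0, 1/4]$, so $λ_{max} (\nabla^{2} f (w)) \leq \frac{1}{4} λ_{max} (X^{T} X)$, with equality at $w = 0$. The sketch checks this on random data.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 4
X = rng.standard_normal((m, n))   # hypothetical data matrix

def hessian(w):
    # Hessian of f(w) = sum_i log(1 + exp(x_i^T w)), namely X^T diag(s(1-s)) X
    s = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (X * (s * (1.0 - s))[:, None])

# Candidate smoothness parameter w.r.t. the l2-norm
L = np.linalg.eigvalsh(X.T @ X)[-1] / 4.0

# Largest Hessian eigenvalue observed over random points
max_eig = max(
    np.linalg.eigvalsh(hessian(rng.standard_normal(n)))[-1] for _ in range(200)
)

# At w = 0 every sigmoid equals 1/2, so the bound is attained
eig_at_zero = np.linalg.eigvalsh(hessian(np.zeros(n)))[-1]

print(max_eig <= L, np.isclose(eig_at_zero, L))
```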

9.18.2. Strong Convexity

Definition 9.81 (Strong convexity)

A function $f : V \to (- \infty, \infty]$ is called $σ$ -strongly convex for $σ > 0$ if $dom f$ is convex and the following holds for any $x, y \in dom f$ and $t \in [0, 1]$ :

(9.10)

f (t x + (1 - t) y) \leq t f (x) + (1 - t) f (y) - \frac{σ}{2} t (1 - t) ‖ x - y ‖^{2} .

9.18.2.1. Strong Convexity $⟹$ Convexity

Strongly convex functions are convex. In fact, we have a stronger result available.

Theorem 9.267 (Strong convexity and convexity)

Assume that the ambient space $V$ is Euclidean.

A function $f : V \to (- \infty, \infty]$ is $σ$ -strongly convex if and only if the function $f (\cdot) - \frac{σ}{2} ‖ \cdot ‖^{2}$ is convex.

Proof. Let us define a function $g : V \to (- \infty, \infty]$ as

g (x) = f (x) - \frac{σ}{2} ‖ x ‖^{2} .

We need to show that $f$ is $σ$ -strongly convex if and only if $g$ is convex.

We first note that $dom g = dom f$ .
Thus, $dom g$ is convex if and only if $dom f$ is convex.
Now, $g$ is convex if and only if $dom g = dom f$ is convex and for any $x, y \in dom f$ and $t \in (0, 1)$

$g (t x + (1 - t) y) \leq t g (x) + (1 - t) g (y) .$
Now,

$\begin{aligned} g (t x + (1 - t) y) \leq t g (x) + (1 - t) g (y) \\ ⟺ & f (t x + (1 - t) y) - \frac{σ}{2} ‖ t x + (1 - t) y ‖^{2} \\ \leq t f (x) + (1 - t) f (y) - \frac{σ}{2} [t ‖ x ‖^{2} + (1 - t) ‖ y ‖^{2}] \\ ⟺ & f (t x + (1 - t) y) \leq t f (x) + (1 - t) f (y) \\ + \frac{σ}{2} [‖ t x + (1 - t) y ‖^{2} - t ‖ x ‖^{2} - (1 - t) ‖ y ‖^{2}] . \end{aligned}$
Since the norm is Euclidean, hence

$\begin{aligned} ‖ t x + (1 - t) y ‖^{2} - t ‖ x ‖^{2} - (1 - t) ‖ y ‖^{2} \\ = ⟨ t x + (1 - t) y, t x + (1 - t) y ⟩ - t ‖ x ‖^{2} - (1 - t) ‖ y ‖^{2} \\ = t^{2} ‖ x ‖^{2} + (1 - t)^{2} ‖ y ‖^{2} + 2 t (1 - t) ⟨ x, y ⟩ - t ‖ x ‖^{2} - (1 - t) ‖ y ‖^{2} \\ = - t (1 - t) ‖ x ‖^{2} - t (1 - t) ‖ y ‖^{2} + 2 t (1 - t) ⟨ x, y ⟩ \\ = - t (1 - t) (‖ x ‖^{2} + ‖ y ‖^{2} - 2 ⟨ x, y ⟩) \\ = - t (1 - t) ‖ x - y ‖^{2} . \end{aligned}$
Thus, the convexity inequality for $g$ is equivalent to

$f (t x + (1 - t) y) \leq t f (x) + (1 - t) f (y) - \frac{σ}{2} t (1 - t) ‖ x - y ‖^{2}$

which is nothing but the $σ$ -strong convexity condition of $f$ .

9.18.2.2. Quadratic Functions

Theorem 9.268 (Strong convexity of quadratic functions)

Let $A \in S^{n}$ , $b \in R^{n}$ and $c \in R$ . Let $f : R^{n} \to R$ be given by:

f (x) = \frac{1}{2} x^{T} A x + b^{T} x + c .

Then $f$ is $σ$ -strongly convex if and only if $A$ is positive definite and $σ \leq λ_{min} (A)$ .

Proof. Due to Theorem 9.267, $f$ is strongly convex with $σ > 0$ if and only if $g (x) = f (x) - \frac{σ}{2} ‖ x ‖^{2}$ is convex.

We note that

$\begin{aligned} g (x) & = \frac{1}{2} x^{T} A x + b^{T} x + c - \frac{σ}{2} ‖ x ‖^{2} \\ = \frac{1}{2} x^{T} (A - σ I) x + b^{T} x + c . \end{aligned}$
As shown in Example 9.40, $g$ is convex if and only if $A - σ I ⪰ O$ .
This is equivalent to $σ \leq λ_{min} (A)$ .
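
A minimal numerical sketch of this theorem follows, under the assumption of the Euclidean norm and a randomly generated positive definite `A` (both choices are illustrative). It checks inequality (9.10) with $σ = λ_{min} (A)$ on random points, and shows that a slightly larger parameter fails along an eigenvector for $λ_{min} (A)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
A = M.T @ M + 0.5 * np.eye(n)     # symmetric positive definite
b = rng.standard_normal(n)

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
sigma = eigvals[0]                     # lambda_min(A)

def f(x):
    return 0.5 * x @ A @ x + b @ x

def slack(x, y, t, s):
    # Right-hand side minus left-hand side of (9.10) with parameter s
    z = t * x + (1 - t) * y
    rhs = t * f(x) + (1 - t) * f(y) - 0.5 * s * t * (1 - t) * np.sum((x - y) ** 2)
    return rhs - f(z)

# (9.10) holds with sigma = lambda_min(A) on random samples
min_slack = min(
    slack(rng.standard_normal(n), rng.standard_normal(n), rng.uniform(), sigma)
    for _ in range(1000)
)

# A larger parameter is violated along the bottom eigenvector
u = eigvecs[:, 0]
violation = slack(u, np.zeros(n), 0.5, 1.1 * sigma)

print(min_slack >= -1e-9, violation < 0)
```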

9.18.2.3. Coerciveness

Theorem 9.269 (Strong convexity and coerciveness)

Assume that the ambient space $V$ is Euclidean. Assume that $f : V \to (- \infty, \infty]$ is a Fréchet-differentiable function. If $f$ is $σ$ -strongly convex, then it is coercive.

Proof. We proceed as follows.

Define

$g (x) = f (x) - \frac{σ}{2} ‖ x ‖^{2} .$
By Theorem 9.267, $g$ is convex.
Since $f$ is differentiable, hence $g$ is also differentiable.
Specifically, $\nabla g (x) = \nabla f (x) - σ x$ .
Fix some $x \in int dom f$ .
Then $\partial g (x) = {\nabla g (x)}$ .
By subgradient inequality, for any $y \in V$ ,

$g (y) \geq g (x) + ⟨ y - x, \nabla g (x) ⟩ .$
Expanding $g$ and $\nabla g$ :

$f (y) - \frac{σ}{2} ‖ y ‖^{2} \geq f (x) - \frac{σ}{2} ‖ x ‖^{2} + ⟨ y - x, \nabla f (x) - σ x ⟩ .$
Let $v = \nabla f (x) - σ x$ .
Rearranging terms

$f (y) \geq \frac{σ}{2} ‖ y ‖^{2} + ⟨ y, v ⟩ + K_{x}$

where $K_{x} = f (x) - \frac{σ}{2} ‖ x ‖^{2} - ⟨ x, v ⟩$ .
We note that the term $K_{x}$ depends solely on $x$ which is fixed. Hence $K_{x}$ is a fixed quantity.
By Cauchy-Schwarz inequality

$⟨ y, v ⟩ \geq - ‖ v ‖ ‖ y ‖ .$
Hence

$f (y) \geq \frac{σ}{2} ‖ y ‖^{2} - ‖ v ‖ ‖ y ‖ + K_{x} .$
It is easy to see that, the R.H.S. goes to $\infty$ as $‖ y ‖ \to \infty$ .
Hence $f$ is coercive.

9.18.2.4. Sum Rule

Theorem 9.270 (Sum of strongly convex and convex functions)

Let $f$ be $σ$ -strongly convex and $g$ be convex. Then $f + g$ is $σ$ -strongly convex.

Proof. Since both $f$ and $g$ are convex, hence their domains are convex. Hence, $dom (f + g) = dom f \cap dom g$ is also convex.

We further need to show that $f + g$ satisfies (9.10).

Let $x, y \in V$ and $t \in (0, 1)$ .
Since $f$ is $σ$ -strongly convex, hence

$f (t x + (1 - t) y) \leq t f (x) + (1 - t) f (y) - \frac{σ}{2} t (1 - t) ‖ x - y ‖^{2} .$
Since $g$ is convex, hence

$g (t x + (1 - t) y) \leq t g (x) + (1 - t) g (y) .$
Then,

$\begin{aligned} (f + g) (t x + (1 - t) y) \\ = f (t x + (1 - t) y) + g (t x + (1 - t) y) \\ \leq t f (x) + (1 - t) f (y) - \frac{σ}{2} t (1 - t) ‖ x - y ‖^{2} + t g (x) + (1 - t) g (y) \\ = t (f + g) (x) + (1 - t) (f + g) (y) - \frac{σ}{2} t (1 - t) ‖ x - y ‖^{2} . \end{aligned}$
Thus, $f + g$ is also $σ$ -strongly convex.

Example 9.83 (Strong convexity of $\frac{1}{2} ‖ \cdot ‖^{2} + I_{C}$ )

Let $V$ be a Euclidean space.

The function $\frac{1}{2} ‖ x ‖^{2}$ is 1-strongly convex due to Theorem 9.268.
Let $C$ be a convex set.
Then, the indicator function $I_{C}$ is convex.
Due to Theorem 9.270, the function

$g (x) = \frac{1}{2} ‖ x ‖^{2} + I_{C} (x)$

is also 1-strongly convex.

9.18.2.5. First Order Characterization

Recall that $dom (\partial f)$ denotes the set of points at which $f$ is subdifferentiable.

Theorem 9.271 (First order characterization of strong convexity)

Let $f : V \to (- \infty, \infty]$ be a proper closed and convex function. For a given $σ > 0$ , the following statements are equivalent.

1. $f$ is $σ$ -strongly convex.
2. For every $x \in dom (\partial f)$ , $y \in dom f$ and $g \in \partial f (x)$ , the following holds true

$f (y) \geq f (x) + ⟨ y - x, g ⟩ + \frac{σ}{2} ‖ y - x ‖^{2} .$
3. For any $x, y \in dom (\partial f)$ and $g_{x} \in \partial f (x)$ , $g_{y} \in \partial f (y)$ , the following holds true:

$⟨ x - y, g_{x} - g_{y} ⟩ \geq σ ‖ x - y ‖^{2} .$

Proof. We shall prove the equivalence of these statements in the following order. $(2) ⟹ (1)$ , $(1) ⟹ (3)$ , $(3) ⟹ (2)$ .

(2) $⟹$ (1)

We assume that (2) is true.
Let $x, y \in dom f$ and $t \in (0, 1)$ .
We need to show that (9.10) holds for $f$ .
Since $f$ is proper and convex, $dom f$ is nonempty and convex; hence its relative interior is nonempty (see Theorem 9.142).
Let $z \in ri dom f$ .
Choose some $α \in (0, 1]$ .
Let $\tilde{x} = (1 - α) x + α z$ .
By the line segment property (Theorem 9.143), $\tilde{x} \in ri dom f$ .
Let $x_{t} = t \tilde{x} + (1 - t) y$ .
Again, by the line segment property, $x_{t} \in ri dom f$ .
Since $f$ is a proper convex function, hence the subdifferential of $f$ at relative interior points is nonempty (Theorem 9.217).
Thus, $\partial f (x_{t}) \neq \emptyset$ and $x_{t} \in dom (\partial f)$ .
Take some $g \in \partial f (x_{t})$ .
By hypothesis (2)

$f (\tilde{x}) \geq f (x_{t}) + ⟨ \tilde{x} - x_{t}, g ⟩ + \frac{σ}{2} ‖ \tilde{x} - x_{t} ‖^{2} .$
Substituting $x_{t} = t \tilde{x} + (1 - t) y$ , we have $\tilde{x} - x_{t} = (1 - t) (\tilde{x} - y)$ . Thus,

$f (\tilde{x}) \geq f (x_{t}) + (1 - t) ⟨ \tilde{x} - y, g ⟩ + \frac{σ (1 - t)^{2}}{2} ‖ \tilde{x} - y ‖^{2} .$
Similarly, by hypothesis (2)

$f (y) \geq f (x_{t}) + ⟨ y - x_{t}, g ⟩ + \frac{σ}{2} ‖ y - x_{t} ‖^{2} .$
$y - x_{t} = y - t \tilde{x} - (1 - t) y = t (y - \tilde{x})$ .
This gives us,

$f (y) \geq f (x_{t}) + t ⟨ y - \tilde{x}, g ⟩ + \frac{σ t^{2}}{2} ‖ y - \tilde{x} ‖^{2} .$
Multiplying the first inequality by $t$ and the second one by $(1 - t)$ and adding them together, we get

$t f (\tilde{x}) + (1 - t) f (y) \geq f (x_{t}) + \frac{σ t (1 - t)}{2} ‖ \tilde{x} - y ‖^{2} .$
Thus,

$f (t \tilde{x} + (1 - t) y) = f (x_{t}) \leq t f (\tilde{x}) + (1 - t) f (y) - \frac{σ t (1 - t)}{2} ‖ \tilde{x} - y ‖^{2} .$
Expanding $\tilde{x}$ ,

$\begin{aligned} t \tilde{x} + (1 - t) y & = t ((1 - α) x + α z) + (1 - t) y \\ = t (1 - α) x + (1 - t) y + t α z . \end{aligned}$
Define $g_{1} (α) = f (t \tilde{x} + (1 - t) y) = f (t (1 - α) x + (1 - t) y + t α z)$ .
Define $g_{2} (α) = f (\tilde{x}) = f ((1 - α) x + α z)$ .
Substituting these into the previous inequality, we obtain

$g_{1} (α) \leq t g_{2} (α) + (1 - t) f (y) - \frac{σ t (1 - t)}{2} ‖ (1 - α) x + α z - y ‖^{2} .$
The functions $g_{1}$ and $g_{2}$ are one dimensional, proper, closed and convex functions.
By Theorem 9.173, both $g_{1}$ and $g_{2}$ are continuous on their domain.
Therefore, taking the limit $α \to 0^{+}$ , it follows that

$g_{1} (0) \leq t g_{2} (0) + (1 - t) f (y) - \frac{σ t (1 - t)}{2} ‖ x - y ‖^{2} .$
Now $g_{1} (0) = f (t x + (1 - t) y)$ and $g_{2} (0) = f (x)$ .
Thus,

$f (t x + (1 - t) y) \leq t f (x) + (1 - t) f (y) - \frac{σ t (1 - t)}{2} ‖ x - y ‖^{2} .$
This establishes that $f$ is indeed $σ$ -strongly convex.

(1) $⟹$ (3)

We are given that $f$ is $σ$ -strongly convex.
Let $x, y \in dom (\partial f)$ .
Pick any $g_{x} \in \partial f (x)$ and $g_{y} \in \partial f (y)$ .
Let $t \in [0, 1)$ and denote $x_{t} = t x + (1 - t) y$ .
By the hypothesis

$f (x_{t}) \leq t f (x) + (1 - t) f (y) - \frac{σ t (1 - t)}{2} ‖ x - y ‖^{2} .$
This is same as

$f (x_{t}) - f (x) \leq (1 - t) [f (y) - f (x)] - \frac{σ t (1 - t)}{2} ‖ x - y ‖^{2} .$
We can see that $(1 - t) \in (0, 1]$ .
Dividing both sides of inequality by $(1 - t)$ , we obtain

$\frac{f (x_{t}) - f (x)}{1 - t} \leq f (y) - f (x) - \frac{σ t}{2} ‖ x - y ‖^{2} .$
Since $g_{x} \in \partial f (x)$ , hence by subgradient inequality

$f (x_{t}) \geq f (x) + ⟨ x_{t} - x, g_{x} ⟩ .$
We can rewrite this as

$\frac{f (x_{t}) - f (x)}{1 - t} \geq \frac{⟨ x_{t} - x, g_{x} ⟩}{1 - t} .$
Note that $x_{t} - x = (1 - t) (y - x)$ .
Thus,

$\frac{f (x_{t}) - f (x)}{1 - t} \geq ⟨ y - x, g_{x} ⟩ .$
Thus,

$⟨ y - x, g_{x} ⟩ \leq f (y) - f (x) - \frac{σ t}{2} ‖ x - y ‖^{2} .$
This inequality holds for every $t \in [0, 1)$ .
Taking the limit $t \to 1^{-}$ , we obtain

$⟨ y - x, g_{x} ⟩ \leq f (y) - f (x) - \frac{σ}{2} ‖ x - y ‖^{2} .$
An identical reasoning by switching the roles of $x$ and $y$ , gives us

$⟨ x - y, g_{y} ⟩ \leq f (x) - f (y) - \frac{σ}{2} ‖ y - x ‖^{2} .$
Adding these two inequalities gives us

$⟨ x - y, g_{y} - g_{x} ⟩ \leq - σ ‖ x - y ‖^{2} .$
Multiplying both sides by $- 1$ (and switching the inequality accordingly), we get

$⟨ x - y, g_{x} - g_{y} ⟩ \geq σ ‖ x - y ‖^{2}$

as desired.

(3) $⟹$ (2)

We are given that (3) is satisfied.
Let $x \in dom (\partial f)$ , $y \in dom f$ and $g \in \partial f (x)$ .
Pick any $z \in ri dom f$ .
Pick some $α \in (0, 1)$ .
Define $\tilde{y} = (1 - α) y + α z$ .
By line segment property $\tilde{y} \in ri dom f$ .
Define $x_{t} = (1 - t) x + t \tilde{y}$ .
Consider the 1D function

$φ (t) = f (x_{t}), \forall t \in [0, 1] .$
Pick any $t \in (0, 1)$ .
Then, by line segment principle $x_{t} \in ri dom f$ .
Due to (Theorem 9.217), $\partial f (x_{t}) \neq \emptyset$ and $x_{t} \in dom (\partial f)$ .
Take some $g_{t} \in \partial f (x_{t})$ .
By subgradient inequality

$f (u) \geq f (x_{t}) + ⟨ u - x_{t}, g_{t} ⟩ \forall u \in V .$
In particular, for $x_{s} = (1 - s) x + s \tilde{y}$ , we have

$\begin{aligned} f (x_{s}) \geq f (x_{t}) + ⟨ (1 - s) x + s \tilde{y} - (1 - t) x - t \tilde{y}, g_{t} ⟩ \\ ⟹ φ (s) \geq φ (t) + ⟨ (s - t) (\tilde{y} - x), g_{t} ⟩ \\ ⟹ φ (s) \geq φ (t) + (s - t) ⟨ \tilde{y} - x, g_{t} ⟩ . \end{aligned}$
Since this is valid for every $s$ , hence $⟨ \tilde{y} - x, g_{t} ⟩ \in \partial φ (t)$ .
Applying the mean value theorem (Theorem 9.234)

$f (\tilde{y}) - f (x) = φ (1) - φ (0) = \int_{0}^{1} ⟨ \tilde{y} - x, g_{t} ⟩ d t .$
Since $g \in \partial f (x)$ and $g_{t} \in \partial f (x_{t})$ , hence applying the hypothesis (3), we get

$⟨ x_{t} - x, g_{t} - g ⟩ \geq σ ‖ x_{t} - x ‖^{2} .$
But $x_{t} - x = t (\tilde{y} - x)$ .
Hence

$t ⟨ \tilde{y} - x, g_{t} - g ⟩ \geq σ t^{2} ‖ \tilde{y} - x ‖^{2} .$
This simplifies to

$⟨ \tilde{y} - x, g_{t} ⟩ \geq ⟨ \tilde{y} - x, g ⟩ + σ t ‖ \tilde{y} - x ‖^{2} .$

Canceling $t$ on both sides doesn’t change the sign of inequality since $t > 0$ .
Applying the inequality to the integral above

$f (\tilde{y}) - f (x) \geq \int_{0}^{1} [⟨ \tilde{y} - x, g ⟩ + σ t ‖ \tilde{y} - x ‖^{2}] d t .$
Integrating, we get

$f (\tilde{y}) - f (x) \geq ⟨ \tilde{y} - x, g ⟩ + \frac{σ}{2} ‖ \tilde{y} - x ‖^{2} .$
Expanding for $\tilde{y}$ for any $α \in (0, 1)$ , we have

$f ((1 - α) y + α z) \geq f (x) + ⟨ (1 - α) y + α z - x, g ⟩ + \frac{σ}{2} ‖ (1 - α) y + α z - x ‖^{2} .$
The 1D function $h (α) = f ((1 - α) y + α z)$ is continuous, again due to Theorem 9.173.
Taking the limit $α \to 0^{+}$ on both sides, we obtain

$f (y) \geq f (x) + ⟨ y - x, g ⟩ + \frac{σ}{2} ‖ y - x ‖^{2}$

which is the desired result.
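
The first order characterization also applies to nonsmooth functions. As a hedged illustration, the sketch below uses $f (x) = \frac{1}{2} ‖ x ‖_{2}^{2} + ‖ x ‖_{1}$ on $R^{n}$ with the Euclidean norm, which is 1-strongly convex by the sum rule. This function, the subgradient selection `x + sign(x)` and the integer-valued test points (chosen so that kinks at zero are actually hit) are assumptions made for this example. It evaluates the slack in properties (2) and (3) with $σ = 1$.

```python
import numpy as np

def f(x):
    return 0.5 * x @ x + np.sum(np.abs(x))

def subgrad(x):
    # One valid subgradient: x + sign(x); np.sign(0) = 0 lies in [-1, 1]
    return x + np.sign(x)

sigma = 1.0
rng = np.random.default_rng(5)
min_slack2 = np.inf   # slack in property (2)
min_slack3 = np.inf   # slack in property (3)
for _ in range(1000):
    # Integer-valued points so that some coordinates are exactly zero
    x = rng.integers(-2, 3, size=4).astype(float)
    y = rng.integers(-2, 3, size=4).astype(float)
    gx, gy = subgrad(x), subgrad(y)
    d2 = np.sum((y - x) ** 2)
    min_slack2 = min(min_slack2, f(y) - f(x) - gx @ (y - x) - 0.5 * sigma * d2)
    min_slack3 = min(min_slack3, (x - y) @ (gx - gy) - sigma * d2)

print(min_slack2 >= 0.0, min_slack3 >= 0.0)
```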

9.18.2.6. Minimization

Theorem 9.272 (Existence and uniqueness of a minimizer of closed strongly convex function)

Let $f : V \to (- \infty, \infty]$ be a proper, closed and $σ$ -strongly convex function with $σ > 0$ . Then,

1. $f$ has a unique minimizer $a \in dom f$ such that $f (x) > f (a)$ for every $x \in dom f$ and $x \neq a$ .
2. The increase in the value of $f$ w.r.t. its minimum satisfies

$f (x) - f (a) \geq \frac{σ}{2} ‖ x - a ‖^{2}$

where $a \in dom f$ is the unique minimizer of $f$ .

Proof. (1) Existence of the minimizer

Since $f$ is proper and convex, hence $dom f$ is nonempty and convex.
Since $dom f$ is nonempty and convex, hence its relative interior is nonempty (Theorem 9.142).
Pick $y \in ri dom f$ .
By Theorem 9.214, $\partial f (y)$ is nonempty.
Pick some $g \in \partial f (y)$ .
Then, by property 2 of Theorem 9.271,

$f (x) \geq f (y) + ⟨ x - y, g ⟩ + \frac{σ}{2} ‖ x - y ‖^{2}$

holds true for every $x \in V$ .
Let $‖ \cdot ‖_{2} ≜ \sqrt{⟨ \cdot, \cdot ⟩}$ denote the Euclidean norm associated with the inner product of the space $V$ . This might be different from the endowed norm $‖ \cdot ‖$ .
Since all norms in a finite dimensional space are equivalent, hence, there exists a constant $C > 0$ such that

$‖ z ‖ \geq \sqrt{C} ‖ z ‖_{2}$

for every $z \in V$ .
Therefore,

$f (x) \geq f (y) + ⟨ x - y, g ⟩ + \frac{σ C}{2} ‖ x - y ‖_{2}^{2} \forall x \in V .$
This in turn is same as

$f (x) \geq f (y) - \frac{1}{2 C σ} ‖ g ‖_{2}^{2} + \frac{C σ}{2} {‖ x - (y - \frac{1}{C σ} g) ‖}_{2}^{2} \forall x \in V .$
Let $S_{t}$ denote the sublevel set ${x | f (x) \leq t}$ .
Consider the sublevel set $S_{f (y)}$ .
Let $x \in S_{f (y)}$ .
Then, $f (x) = f (y) - r$ for some $r \geq 0$ .
But then

$f (y) - r \geq f (y) - \frac{1}{2 C σ} ‖ g ‖_{2}^{2} + \frac{C σ}{2} {‖ x - (y - \frac{1}{C σ} g) ‖}_{2}^{2} .$
This simplifies to

$r \leq \frac{1}{2 C σ} ‖ g ‖_{2}^{2} - \frac{C σ}{2} {‖ x - (y - \frac{1}{C σ} g) ‖}_{2}^{2} .$
Since $r$ must be nonnegative, hence the R.H.S. must be nonnegative also.
Thus, we require that

$\frac{1}{2 C σ} ‖ g ‖_{2}^{2} \geq \frac{C σ}{2} {‖ x - (y - \frac{1}{C σ} g) ‖}_{2}^{2} .$
This simplifies to

${‖ x - (y - \frac{1}{C σ} g) ‖}_{2} \leq \frac{1}{C σ} ‖ g ‖_{2} .$
In other words, $x$ must belong to an $ℓ_{2}$ closed ball given by

$B_{‖ \cdot ‖_{2}} [y - \frac{1}{C σ} g, \frac{1}{C σ} ‖ g ‖_{2}] .$
Since this is valid for every $x \in S_{f (y)}$ , hence

$S_{f (y)} \subseteq B_{‖ \cdot ‖_{2}} [y - \frac{1}{C σ} g, \frac{1}{C σ} ‖ g ‖_{2}] .$
Since $f$ is closed, hence all its sublevel sets are closed.
Since $S_{f (y)}$ is contained in a ball, hence $S_{f (y)}$ is bounded.
Thus, $S_{f (y)}$ is closed and bounded.
Since $V$ is finite dimensional, hence $S_{f (y)}$ is compact.
$S_{f (y)}$ is also nonempty since $y \in S_{f (y)}$ .
Thus, the problem of minimizing $f$ over $dom f$ reduces to the problem of minimizing $f$ over the nonempty compact set $S_{f (y)}$ .
Since $f$ is closed, it is also lower semicontinuous.
By Theorem 3.121, $f$ attains a minimum on $S_{f (y)}$ at some point $a \in S_{f (y)}$ .
Thus, we have established the existence of a minimizer of $f$ at some $a \in S_{f (y)} \subseteq dom f$ .

(1) Uniqueness of the minimizer

To show the uniqueness, for contradiction, assume that $u$ and $v$ are two different minimizers of $f$ with $f (u) = f (v) = p^{*}$ , the optimal value.
Let $w = \frac{1}{2} u + \frac{1}{2} v$ .
We must have $f (w) \geq p^{*}$ .
By strong convexity of $f$ ,

$f (w) \leq \frac{1}{2} f (u) + \frac{1}{2} f (v) - \frac{σ}{2} \frac{1}{2} \frac{1}{2} ‖ u - v ‖^{2} = p^{*} - \frac{σ}{8} ‖ u - v ‖^{2} .$
If $u \neq v$ , then $f (w) < p^{*}$ ; a contradiction.
Hence, the minimizer must be unique.

(2) Increase in value of $f$

Let $a$ be the unique minimizer of $f$ .
By Fermat’s optimality condition $0 \in \partial f (a)$ .
Since $f$ is $σ$ -strongly convex, hence by property (2) in the Theorem 9.271,

$f (x) - f (a) \geq ⟨ x - a, 0 ⟩ + \frac{σ}{2} ‖ x - a ‖^{2} = \frac{σ}{2} ‖ x - a ‖^{2}$

holds true for any $x \in dom f$ .
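
The quadratic growth bound can be checked on a small example. The sketch below assumes the function $f (x) = \frac{1}{2} ‖ x - c ‖_{2}^{2} + λ ‖ x ‖_{1}$ on $R^{n}$, which is 1-strongly convex w.r.t. the Euclidean norm and whose unique minimizer is given coordinatewise by soft-thresholding of $c$ at level $λ$. The function, the vector `c` and the level `lam` are illustrative choices and do not come from the text.

```python
import numpy as np

lam = 0.7
c = np.array([2.0, -0.3, 0.5, -1.5])

def f(x):
    # 0.5 ||x - c||^2 is 1-strongly convex; adding the convex l1 term keeps sigma = 1
    return 0.5 * np.sum((x - c) ** 2) + lam * np.sum(np.abs(x))

# Closed-form minimizer: soft-thresholding of c at level lam
a = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

sigma = 1.0
rng = np.random.default_rng(6)
min_growth_slack = np.inf
for _ in range(1000):
    x = a + rng.standard_normal(c.size)
    # Slack in f(x) - f(a) >= (sigma / 2) ||x - a||^2
    slack = f(x) - f(a) - 0.5 * sigma * np.sum((x - a) ** 2)
    min_growth_slack = min(min_growth_slack, slack)

print(a, min_growth_slack >= -1e-12)
```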

9.18.3. Smoothness and Strong Convexity

9.18.3.1. The Conjugate Correspondence Theorem

The idea of smoothness and strong convexity is connected. Roughly speaking, a function is strongly convex if and only if its conjugate is smooth.

Theorem 9.273 (Conjugate correspondence theorem)

Let $σ > 0$ . Then

1. If $f : V \to R$ is a $\frac{1}{σ}$ -smooth convex function, then $f^{*}$ is $σ$ -strongly convex w.r.t. the dual norm $‖ \cdot ‖_{*}$ .
2. If $f : V \to (- \infty, \infty]$ is a proper, closed $σ$ -strongly convex function, then $f^{*} : V^{*} \to R$ is $\frac{1}{σ}$ -smooth.
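
A simple instance of the correspondence, given here as an illustrative sketch rather than as part of the proof: for $f (x) = \frac{1}{2} x^{T} A x$ with $A$ symmetric positive definite and the Euclidean norm, the conjugate is $f^{*} (y) = \frac{1}{2} y^{T} A^{-1} y$. Here $f$ is $λ_{min} (A)$ -strongly convex, while $\nabla f^{*} (y) = A^{-1} y$ is Lipschitz with constant $λ_{max} (A^{-1}) = 1 / λ_{min} (A)$. The matrix `A` and the sample sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
M = rng.standard_normal((n, n))
A = M.T @ M + 0.5 * np.eye(n)     # symmetric positive definite
A_inv = np.linalg.inv(A)

sigma = np.linalg.eigvalsh(A)[0]          # strong convexity parameter of f
L_conj = np.linalg.eigvalsh(A_inv)[-1]    # smoothness parameter of f*

def f(x):
    return 0.5 * x @ A @ x

def f_conj(y):
    return 0.5 * y @ A_inv @ y

# The supremum defining f*(y) is attained at x_star = A^{-1} y
y = rng.standard_normal(n)
x_star = A_inv @ y
sup_value = x_star @ y - f(x_star)

# Random perturbations of x_star never give a larger value of <x, y> - f(x)
candidates = x_star + rng.standard_normal((500, n))
best_random = max(x @ y - f(x) for x in candidates)

print(np.isclose(L_conj, 1.0 / sigma), np.isclose(sup_value, f_conj(y)))
```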

Proof. (1) Smooth convex to strongly convex conjugate

We are given that $f : V \to R$ is a $\frac{1}{σ}$ -smooth convex function.
Due to Theorem 9.239, $f^{*}$ is closed and convex.
Since $f$ is proper and convex, hence due to Theorem 9.240, $f^{*}$ is proper.
Thus $f^{*}$ is a proper, closed and convex function.
Pick any $y_{1}, y_{2} \in dom (\partial f^{*})$ .
Let $v_{1} \in \partial f^{*} (y_{1})$ and $v_{2} \in \partial f^{*} (y_{2})$ .
Since $f$ is proper and convex, hence by conjugate subgradient theorem (Theorem 9.246)

$y_{1} \in \partial f (v_{1}) and y_{2} \in \partial f (v_{2}) .$
Since $f$ is smooth, hence it is differentiable. Hence due to Theorem 9.220,

$y_{1} = \nabla f (v_{1}) and y_{2} = \nabla f (v_{2}) .$
Following characterization of smoothness (Theorem 9.265), by its property 4,

$⟨ v_{1} - v_{2}, y_{1} - y_{2} ⟩ \geq σ ‖ y_{1} - y_{2} ‖_{*}^{2} .$
Since the last inequality holds for any $y_{1}, y_{2} \in dom (\partial f^{*})$ and any $v_{1} \in \partial f^{*} (y_{1}), v_{2} \in \partial f^{*} (y_{2})$ , the first order characterization of strong convexity in Theorem 9.271 implies that $f^{*}$ is a $σ$ -strongly convex function w.r.t. the dual norm $‖ \cdot ‖_{*}$ .

(2) Strongly convex to smooth conjugate

We are given that $f$ is proper, closed and $σ$ -strongly convex.
Pick any $y \in V^{*}$ .
The conjugate is given by

$f^{*} (y) = sup_{x \in V} {⟨ x, y ⟩ - f (x)} .$
Define $g (x) = f (x) - ⟨ x, y ⟩$ .
We can see that

$f^{*} (y) = - inf_{x \in V} g (x) .$
Due to the sum rule (Theorem 9.270), $g$ is $σ$ -strongly convex.
Due to Theorem 9.272, $g$ has a unique minimizer.
Hence $f^{*} (y)$ is finite.
Since this is valid for any $y \in V^{*}$ , hence $dom f^{*} = V^{*}$ .
This justifies the signature for $f^{*}$ as $f^{*} : V^{*} \to R$ being real valued.
Let’s continue with any $y$ .
Since $dom f^{*} = V^{*}$ , hence $y \in int dom f^{*}$ .
Now, by the second formulation of conjugate subgradient theorem (Corollary 9.26),

$\partial f^{*} (y) = \arg \max_{x \in V} {⟨ x, y ⟩ - f (x)} .$
We can see that

$\partial f^{*} (y) = \arg \min_{x \in V} g (x) .$
Since $g$ has a unique minimizer, hence $\partial f^{*} (y)$ is a singleton.
Due to Theorem 9.220, $f^{*}$ is differentiable at $y$ .
Since $y$ is arbitrary, hence $f^{*}$ is differentiable over entire $V^{*}$ .
We now pick two points $y_{1}, y_{2} \in V^{*}$ and denote $v_{1} = \nabla f^{*} (y_{1}), v_{2} = \nabla f^{*} (y_{2})$ .
By conjugate subgradient theorem (Theorem 9.246), this is equivalent to $y_{1} \in \partial f (v_{1})$ and $y_{2} \in \partial f (v_{2})$ .
Following the first order characterization of strong convexity in Theorem 9.271,

$⟨ v_{1} - v_{2}, y_{1} - y_{2} ⟩ \geq σ ‖ v_{1} - v_{2} ‖^{2} .$
In other words

$⟨ \nabla f^{*} (y_{1}) - \nabla f^{*} (y_{2}), y_{1} - y_{2} ⟩ \geq σ ‖ \nabla f^{*} (y_{1}) - \nabla f^{*} (y_{2}) ‖^{2} .$
By the generalized Cauchy-Schwarz inequality (Theorem 4.108)

$⟨ \nabla f^{*} (y_{1}) - \nabla f^{*} (y_{2}), y_{1} - y_{2} ⟩ \leq ‖ \nabla f^{*} (y_{1}) - \nabla f^{*} (y_{2}) ‖ ‖ y_{1} - y_{2} ‖_{*} .$
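Combining the last two inequalities and dividing by $‖ \nabla f^{*} (y_{1}) - \nabla f^{*} (y_{2}) ‖$ (the bound is trivial when this quantity is zero), we get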

$‖ \nabla f^{*} (y_{1}) - \nabla f^{*} (y_{2}) ‖ \leq \frac{1}{σ} ‖ y_{1} - y_{2} ‖_{*} .$
This establishes that $f^{*}$ is $\frac{1}{σ}$ -smooth.
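As an illustration of the correspondence, consider the hypothetical diagonal quadratic $f (x) = \frac{1}{2} \sum_{i} d_{i} x_{i}^{2}$ on $R^{n}$ with the $ℓ_{2}$ -norm and $d_{i} > 0$ . It is $σ$ -strongly convex with $σ = \min_{i} d_{i}$ , and its conjugate is $f^{*} (y) = \frac{1}{2} \sum_{i} y_{i}^{2} / d_{i}$ with $\nabla f^{*} (y)_{i} = y_{i} / d_{i}$ . The sketch below, under these assumptions, estimates the Lipschitz ratio of $\nabla f^{*}$ on random pairs and also evaluates it along the coordinate with the smallest $d_{i}$ , where the bound $\frac{1}{σ}$ is attained. The values in `d` are arbitrary.

```python
import math
import random

# Hypothetical example: f(x) = 0.5 * sum(d_i * x_i^2), sigma = min(d) (l2 norm).
d = [0.5, 2.0, 4.0]
sigma = min(d)

def grad_conj(y):
    """Gradient of the conjugate f*(y) = 0.5 * sum(y_i^2 / d_i)."""
    return [yi / di for yi, di in zip(y, d)]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def lipschitz_ratio(y1, y2):
    """||grad f*(y1) - grad f*(y2)||_2 / ||y1 - y2||_2 for y1 != y2."""
    g1, g2 = grad_conj(y1), grad_conj(y2)
    num = norm2([p - q for p, q in zip(g1, g2)])
    den = norm2([p - q for p, q in zip(y1, y2)])
    return num / den

# Largest ratio seen over random pairs; it should not exceed 1 / sigma.
rng = random.Random(1)
worst_ratio = 0.0
for _ in range(1000):
    y1 = [rng.uniform(-3, 3) for _ in d]
    y2 = [rng.uniform(-3, 3) for _ in d]
    worst_ratio = max(worst_ratio, lipschitz_ratio(y1, y2))

# Along the coordinate with the smallest d_i the ratio equals 1 / sigma.
tight_ratio = lipschitz_ratio([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(worst_ratio, tight_ratio, 1.0 / sigma)
```

The second check suggests that $\frac{1}{σ}$ cannot be improved in general for this family.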

9.18.4. Examples#

Example 9.84 (Smoothness of $\sqrt{1 + ‖ \cdot ‖_{2}^{2}}$ )

Let $f : R^{n} \to R$ be given by

f (x) = \sqrt{1 + ‖ x ‖_{2}^{2}} .

$f$ is 1-smooth w.r.t. the $ℓ_{2}$ -norm.

Note that for any $x \in R^{n}$ , the gradient is given by

$\nabla f (x) = \frac{x}{\sqrt{1 + ‖ x ‖_{2}^{2}}} .$
The Hessian is given by

$\nabla^{2} f (x) = \frac{I}{\sqrt{1 + ‖ x ‖_{2}^{2}}} - \frac{x x^{T}}{(1 + ‖ x ‖_{2}^{2})^{\frac{3}{2}}} ⪯ \frac{I}{\sqrt{1 + ‖ x ‖_{2}^{2}}} ⪯ I .$
Therefore, $λ_{max} (\nabla^{2} f (x)) \leq 1$ for every $x \in R^{n}$ .
Hence, by Corollary 9.27, $f$ is 1-smooth w.r.t. the $ℓ_{2}$ -norm.
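The claim can be sanity-checked directly against Definition 9.80. The following sketch, with an arbitrarily chosen dimension and sampling range, estimates the ratio $‖ \nabla f (x) - \nabla f (y) ‖_{2} / ‖ x - y ‖_{2}$ on random pairs and on a pair close to the origin, where the Hessian is close to $I$ and the ratio approaches $1$ .

```python
import math
import random

def grad(x):
    """Gradient of f(x) = sqrt(1 + ||x||_2^2)."""
    s = math.sqrt(1.0 + sum(xi * xi for xi in x))
    return [xi / s for xi in x]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def lipschitz_ratio(x, y):
    """||grad f(x) - grad f(y)||_2 / ||x - y||_2 for x != y."""
    gx, gy = grad(x), grad(y)
    num = norm2([p - q for p, q in zip(gx, gy)])
    den = norm2([p - q for p, q in zip(x, y)])
    return num / den

# Random pairs in [-3, 3]^4 (an arbitrary choice of dimension and range).
rng = random.Random(0)
worst_ratio = 0.0
for _ in range(2000):
    x = [rng.uniform(-3, 3) for _ in range(4)]
    y = [rng.uniform(-3, 3) for _ in range(4)]
    worst_ratio = max(worst_ratio, lipschitz_ratio(x, y))

# A pair near the origin, where the ratio is close to the bound 1.
near_origin_ratio = lipschitz_ratio([1e-3, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
print(worst_ratio, near_origin_ratio)
```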

9.18.4.1. Log-Sum-Exp#

Example 9.85 (Smoothness of log-sum-exp)

Consider the log-sum-exp function $f : R^{n} \to R$ given by

f (x) = \ln (\sum_{i = 1}^{n} e^{x_{i}}) .

$f$ is 1-smooth w.r.t. $ℓ_{2}$ and $ℓ_{\infty}$ norms.

Smoothness w.r.t. $ℓ_{2}$ norm

The partial derivatives of $f$ are

$\frac{\partial f}{\partial x_{i}} (x) = \frac{e^{x_{i}}}{\sum_{k = 1}^{n} e^{x_{k}}} .$
The second order partial derivatives are

$\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} (x) = \begin{cases} - \frac{e^{x_{i}} e^{x_{j}}}{(\sum_{k = 1}^{n} e^{x_{k}})^{2}}, & i \neq j; \\ - \frac{e^{x_{i}} e^{x_{i}}}{(\sum_{k = 1}^{n} e^{x_{k}})^{2}} + \frac{e^{x_{i}}}{\sum_{k = 1}^{n} e^{x_{k}}}, & i = j . \end{cases}$
The Hessian can be written as

$\nabla^{2} f (x) = diag (w) - w w^{T}$

where $w_{i} = \frac{e^{x_{i}}}{\sum_{k = 1}^{n} e^{x_{k}}}$ .
We can now see that

$\nabla^{2} f (x) = diag (w) - w w^{T} ⪯ diag (w) ⪯ I .$
Hence $λ_{max} (\nabla^{2} f (x)) \leq 1$ for every $x \in R^{n}$ .
Hence, by Corollary 9.27, $f$ is 1-smooth w.r.t. the $ℓ_{2}$ -norm.

Smoothness w.r.t. $ℓ_{\infty}$ norm

We first show that for any $v \in R^{n}$ and any $x \in R^{n}$

$v^{T} \nabla^{2} f (x) v \leq ‖ v ‖_{\infty}^{2} .$
To see this, we expand the L.H.S. as

$\begin{aligned} v^{T} \nabla^{2} f (x) v & = v^{T} (diag (w) - w w^{T}) v \\ & = v^{T} diag (w) v - (w^{T} v)^{2} \\ & \leq v^{T} diag (w) v \\ & = \sum_{i = 1}^{n} w_{i} v_{i}^{2} \\ & \leq ‖ v ‖_{\infty}^{2} \sum_{i = 1}^{n} w_{i} \\ & = ‖ v ‖_{\infty}^{2} . \end{aligned}$
Since $f$ is twice differentiable over $R^{n}$ , hence by linear approximation theorem (Theorem 5.5), for any $x, y \in R^{n}$ , there exists $z \in [x, y]$ such that

$f (y) - f (x) = \nabla f (x)^{T} (y - x) + \frac{1}{2} (y - x)^{T} \nabla^{2} f (z) (y - x) .$
Let $v = y - x$ .
Then from above,

$(y - x)^{T} \nabla^{2} f (z) (y - x) \leq ‖ v ‖_{\infty}^{2} .$
Putting this back in the approximation, we have

$f (y) \leq f (x) + \nabla f (x)^{T} (y - x) + \frac{1}{2} ‖ y - x ‖_{\infty}^{2} .$
Following characterization of smoothness (Theorem 9.265), $f$ is indeed 1-smooth w.r.t. the $ℓ_{\infty}$ -norm.
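Both smoothness claims for log-sum-exp can be checked against Definition 9.80. The gradient of $f$ is the vector $w$ of normalized exponentials (the softmax map). Since the dual norm of $ℓ_{2}$ is $ℓ_{2}$ and the dual norm of $ℓ_{\infty}$ is $ℓ_{1}$ , the two claims read $‖ \nabla f (x) - \nabla f (y) ‖_{2} \leq ‖ x - y ‖_{2}$ and $‖ \nabla f (x) - \nabla f (y) ‖_{1} \leq ‖ x - y ‖_{\infty}$ . The sketch below estimates both ratios on random pairs; the helper names, dimension, and sampling range are illustrative assumptions.

```python
import math
import random

def softmax(x):
    """Gradient of log-sum-exp, computed stably by shifting by max(x)."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    return [ei / s for ei in e]

def norm(v, p):
    """l_p norm for p in {1, 2, math.inf}."""
    a = [abs(vi) for vi in v]
    if p == 1:
        return sum(a)
    if p == 2:
        return math.sqrt(sum(ai * ai for ai in a))
    return max(a)

def worst_ratio(primal_p, dual_p, trials=2000, n=5, seed=0):
    """Largest observed ||grad f(x) - grad f(y)||_dual / ||x - y||_primal."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-4, 4) for _ in range(n)]
        y = [rng.uniform(-4, 4) for _ in range(n)]
        gx, gy = softmax(x), softmax(y)
        num = norm([p - q for p, q in zip(gx, gy)], dual_p)
        den = norm([p - q for p, q in zip(x, y)], primal_p)
        worst = max(worst, num / den)
    return worst

ratio_l2 = worst_ratio(2, 2)            # l2 norm, dual norm l2
ratio_linf = worst_ratio(math.inf, 1)   # l_inf norm, dual norm l1
print(ratio_l2, ratio_linf)
```

Both printed ratios should stay at or below $1$ ; a random search gives only a lower estimate of the true Lipschitz constants.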

9.18.4.2. Negative Entropy#

Example 9.86 (Strong convexity of negative entropy over the unit simplex)

Let $f : R^{n} \to (- \infty, \infty]$ be given by:

f (x) ≜ \begin{cases} \sum_{i = 1}^{n} x_{i} \ln x_{i}, & x \in Δ_{n}; \\ \infty, & \text{otherwise} . \end{cases}

$f$ is 1-strongly convex for both $ℓ_{1}$ and $ℓ_{2}$ norms.

By Theorem 9.257, its conjugate is given by

$f^{*} (y) = \ln (\sum_{j = 1}^{n} e^{y_{j}})$

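which is the log-sum-exp function.
By Example 9.85, the log-sum-exp function is 1-smooth w.r.t. both $ℓ_{2}$ and $ℓ_{\infty}$ norms.
Since $f$ is proper, closed and convex, we have $f = f^{**}$ ; i.e., $f$ is the conjugate of the log-sum-exp function.
The dual norm of the $ℓ_{2}$ -norm is the $ℓ_{2}$ -norm and the dual norm of the $ℓ_{\infty}$ -norm is the $ℓ_{1}$ -norm.
Hence, by part (1) of the conjugate correspondence theorem (Theorem 9.273) applied to $f^{*}$ , $f$ is 1-strongly convex for both $ℓ_{1}$ and $ℓ_{2}$ norms.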
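The strong convexity claim can also be checked directly from the defining inequality $f (λ x + (1 - λ) y) \leq λ f (x) + (1 - λ) f (y) - \frac{1}{2} λ (1 - λ) ‖ x - y ‖^{2}$ with $σ = 1$ . The sketch below samples random points of the unit simplex (by normalizing exponential variates, an arbitrary sampling choice) and records the smallest slack in this inequality for the $ℓ_{1}$ and $ℓ_{2}$ norms. The helper names are illustrative.

```python
import math
import random

def neg_entropy(x):
    """sum x_i ln x_i with the convention 0 ln 0 = 0 (x assumed in the simplex)."""
    return sum(xi * math.log(xi) for xi in x if xi > 0.0)

def random_simplex_point(rng, n):
    """Normalize positive exponential variates to get a point of the unit simplex."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

def convexity_gap(x, y, lam, p):
    """Slack in the 1-strong convexity inequality w.r.t. the l_p norm, p in {1, 2}."""
    z = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]
    diff = [abs(xi - yi) for xi, yi in zip(x, y)]
    dist = sum(diff) if p == 1 else math.sqrt(sum(t * t for t in diff))
    lhs = lam * neg_entropy(x) + (1 - lam) * neg_entropy(y) - neg_entropy(z)
    return lhs - 0.5 * lam * (1 - lam) * dist ** 2

rng = random.Random(0)
min_gap_l1 = math.inf
min_gap_l2 = math.inf
for _ in range(2000):
    x = random_simplex_point(rng, 4)
    y = random_simplex_point(rng, 4)
    lam = rng.random()
    min_gap_l1 = min(min_gap_l1, convexity_gap(x, y, lam, 1))
    min_gap_l2 = min(min_gap_l2, convexity_gap(x, y, lam, 2))
print(min_gap_l1, min_gap_l2)
```

Since $‖ v ‖_{2} \leq ‖ v ‖_{1}$ , the $ℓ_{1}$ inequality is the stronger of the two, so the $ℓ_{1}$ slack is never larger than the $ℓ_{2}$ slack.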

