9.18. Smoothness#
The primary reference for this section is [7].
9.18.1. L-Smooth Functions#
Definition 9.80 (L-smooth functions)
For some $L \geq 0$, a function $f : \mathbb{V} \to (-\infty, \infty]$ is called $L$-smooth over a set $D \subseteq \mathbb{V}$ if it is differentiable over $D$ and satisfies

$$
\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \| \quad \text{for every } x, y \in D.
$$

The constant $L$ is called the smoothness parameter.

Since $f$ is differentiable over $D$, hence $D \subseteq \operatorname{int} \operatorname{dom} f$.

If $f$ is $L$-smooth over the entire $\mathbb{V}$, we simply say that $f$ is $L$-smooth.

$L$-smooth functions are also known as functions with Lipschitz gradient with constant $L$.

The class of functions which are $L$-smooth over a set $D$ is denoted by $C_L^{1,1}(D)$.

When $D = \mathbb{V}$, the class is simply denoted as $C_L^{1,1}$.

The class of functions which are $L$-smooth for some $L \geq 0$ (where $L$ may not be known) is denoted by $C^{1,1}$.

By definition, if a function is $L$-smooth, then it is $L'$-smooth for every $L' \geq L$.

Thus, it is often useful to identify the smallest possible value of $L$ for which a function is $L$-smooth.
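As a small numerical sketch of the definition (our own illustration; the scalar test function $f(x) = \ln(1 + e^x)$ and the sampling range are arbitrary choices), the snippet below estimates the smallest smoothness parameter by sampling the ratio $|f'(x) - f'(y)| / |x - y|$; for this particular $f$ the smallest value is $1/4$.

```python
import numpy as np

# Illustrative scalar function f(x) = log(1 + exp(x)); its derivative is the
# sigmoid, whose Lipschitz constant (the smallest smoothness parameter) is 1/4.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=100000)
y = rng.uniform(-5, 5, size=100000)

ratios = np.abs(sigmoid(x) - sigmoid(y)) / np.abs(x - y)
print("largest sampled ratio:", ratios.max())   # approaches 0.25 from below
```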
Example 9.82 (Zero smoothness of affine functions)
Let $f : \mathbb{V} \to \mathbb{R}$ be an affine function given by $f(x) = \langle a, x \rangle + b$, where $a \in \mathbb{V}^*$ and $b \in \mathbb{R}$.

Then, $f$ is $0$-smooth.

To see this, we note that $\nabla f(x) = a$ for every $x \in \mathbb{V}$, and hence

$$
\| \nabla f(x) - \nabla f(y) \|_* = 0 \leq 0 \cdot \| x - y \| \quad \text{for every } x, y \in \mathbb{V}.
$$
Theorem 9.263 (Smoothness of quadratic functions)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the quadratic function $f(x) = \frac{1}{2} x^T A x + b^T x + c$, where $A$ is a symmetric $n \times n$ matrix, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$.

Then, $f$ is $\| A \|$-smooth, where $\| A \|$ is the operator norm of $A$ induced by the underlying norm, and $\| A \|$ is the smallest smoothness parameter of $f$.
Proof. We note that the dual norm of
Thus,
We next show that
Assume that
is smooth for some .By definition of
, there exists a vector such that andThen,
Thus,
.
Thus,
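For the Euclidean norm, the operator norm above is the spectral norm, which suggests a simple numerical sanity check (our own sketch; the symmetric matrix $A$ below is an arbitrary choice): the gradient-difference ratio $\| A d \|_2 / \| d \|_2$ never exceeds $\| A \|_2$ and attains it along a leading eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # arbitrary symmetric matrix

# Since grad f(x) - grad f(y) = A (x - y), the smallest Euclidean smoothness
# constant is sup ||A d|| / ||d|| = ||A||_2 (spectral norm).
L = np.linalg.norm(A, 2)

# The bound is attained along an eigenvector of the largest-magnitude eigenvalue.
eigvals, eigvecs = np.linalg.eigh(A)
d = eigvecs[:, np.argmax(np.abs(eigvals))]
print("spectral norm ||A||_2      :", L)
print("ratio along top eigenvector:", np.linalg.norm(A @ d) / np.linalg.norm(d))
```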
9.18.1.1. Descent Lemma#
Theorem 9.264 (Descent lemma)
Let $f : \mathbb{V} \to (-\infty, \infty]$ be an $L$-smooth function over a given convex set $D$. Then, for any $x, y \in D$,

$$
f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| y - x \|^2.
$$
Proof. We proceed as follows:
By the fundamental theorem of calculus
By adding and subtracting
, we get:This gives us
(a) is an application of the generalized Cauchy-Schwarz inequality (Theorem 4.108).
(b) is an application of the $L$-smoothness of $f$ (Definition 9.80).
Thus,
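The inequality is easy to verify numerically (a minimal sketch, assuming the Euclidean norm and our own choice of a convex quadratic test function): the quadratic upper model built at $x$ should majorize $f$ at every sampled point $y$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = B.T @ B                               # positive semidefinite Hessian
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b
L = np.linalg.norm(A, 2)                  # a valid smoothness constant

violations = 0
for _ in range(10000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    upper = f(x) + grad(x) @ (y - x) + 0.5 * L * np.linalg.norm(y - x) ** 2
    if f(y) > upper + 1e-9:               # tolerance for round-off
        violations += 1

print("descent lemma violations:", violations)   # expected: 0
```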
9.18.1.2. Characterization of L-Smooth Functions#
Theorem 9.265 (Characterization of $L$-smooth functions)

Let $f : \mathbb{V} \to \mathbb{R}$ be a convex function which is differentiable over $\mathbb{V}$, and let $L > 0$. Then, the following claims are equivalent:

1. $f$ is $L$-smooth.
2. $f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| y - x \|^2$ for every $x, y \in \mathbb{V}$.
3. $f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2 L} \| \nabla f(x) - \nabla f(y) \|_*^2$ for every $x, y \in \mathbb{V}$.
4. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \geq \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2$ for every $x, y \in \mathbb{V}$.
5. $f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y) - \frac{L}{2} \lambda (1 - \lambda) \| x - y \|^2$ for every $x, y \in \mathbb{V}$ and $\lambda \in [0, 1]$.
Proof. (1) $\Rightarrow$ (2)

This follows directly from the descent lemma (Theorem 9.264).

(2) $\Rightarrow$ (3)
We are given that (2) is satisfied.
If
, then the inequality is trivial due to the convexity of . Hence, we consider the case where .Fix a
.Consider a function
given byThen,
By hypothesis in property (2), for any
Now,
Thus,
also satisfies the inequality in property (2).We note in particular that
.Since
is convex, hence is the global minimizer of .In other words,
We can also see that
.Let
Let
be the unit norm vector satisfying .Choose
Then,
Using property (2) on
, we getSimplifying this, we get
as desired.
(3) $\Rightarrow$ (4)
For
, the property (3) gives us:For
, the property (3) gives us:Adding the two inequalities and canceling the term
gives usRearranging, we get
as desired.
(4) $\Rightarrow$ (1)
When $\nabla f(x) = \nabla f(y)$, then the Lipschitz condition in (1) is trivial. Hence, we consider the case where $\nabla f(x) \neq \nabla f(y)$. By the generalized Cauchy-Schwarz inequality (Theorem 4.108),
Thus, combining with hypothesis (4), we obtain
Since
, hence .Canceling it from both sides, we get
as desired.
We have shown so far that (1), (2), (3) and (4) are equivalent statements. We are left with showing that (5) is equivalent to the other statements.
(2) $\Rightarrow$ (5)
Pick
and .Let
.By hypothesis (2),
Note that
and .Thus, the previous two inequalities are same as
Multiplying the first inequality by
, the second by and adding, we getRearranging, we get
(5) $\Rightarrow$ (2)
Pick
and .By hypothesis in inequality (5)
Rearranging the terms, we obtain
Division by
is fine since .Recalling the definition of directional derivative (Definition 9.70):
Since the previous inequality is valid for every
, taking the limit to on the R.H.S., we obtainRecall from Theorem 9.203 that
.Thus, we get:
as desired.
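These equivalences can also be probed numerically. The following sketch (our own; it assumes the Euclidean norm and uses a convex quadratic as the test function) checks the inequalities in properties (2), (3) and (4) at randomly sampled pairs of points.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n))
A = B.T @ B                                   # convex quadratic: Hessian >= 0
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
L = np.linalg.norm(A, 2)

ok2 = ok3 = ok4 = True
for _ in range(5000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    gx, gy = grad(x), grad(y)
    lin = f(x) + gx @ (y - x)                 # linear model of f at x
    ok2 &= f(y) <= lin + 0.5 * L * np.linalg.norm(y - x) ** 2 + 1e-9
    ok3 &= f(y) >= lin + np.linalg.norm(gx - gy) ** 2 / (2 * L) - 1e-9
    ok4 &= (gx - gy) @ (x - y) >= np.linalg.norm(gx - gy) ** 2 / L - 1e-9

print("property (2) holds:", ok2)
print("property (3) holds:", ok3)
print("property (4) holds:", ok4)
```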
9.18.1.3. Second Order Characterization#
We now restrict our attention to the vector space $\mathbb{R}^n$.
Theorem 9.266 (Second order characterization of $L$-smoothness)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable over $\mathbb{R}^n$. Then, for a given $L \geq 0$, the following claims are equivalent:

1. $f$ is $L$-smooth w.r.t. the $\ell_p$-norm ($p \in [1, \infty]$).
2. $\| \nabla^2 f(x) \|_{p,q} \leq L$ for any $x \in \mathbb{R}^n$, where $q \in [1, \infty]$ satisfies $\frac{1}{p} + \frac{1}{q} = 1$.
Proof. (2) $\Rightarrow$ (1)
We are given that
for any .By the fundamental theorem of calculus
Taking the (dual)-norm on both sides
Thus,
as desired.
(1) $\Rightarrow$ (2)
We are given that
is smooth with norm.By fundamental theorem of calculus, for any
and ,Taking
norm on both sidesDividing by
on both sides and taking the limit , we getSince this is valid for every
, hence
Corollary 9.27 ($L$-smoothness w.r.t. the $\ell_2$-norm)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable over $\mathbb{R}^n$. Then, $f$ is $L$-smooth w.r.t. the $\ell_2$-norm if and only if

$$
\| \nabla^2 f(x) \|_{2,2} \leq L \quad \text{for every } x \in \mathbb{R}^n,
$$

where $\| \cdot \|_{2,2}$ is the spectral norm.
Proof. Since
From Theorem 9.266,
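In practice this corollary is the most convenient route to a smoothness constant: bound the spectral norm of the Hessian. A brief sketch (our own; the logistic test function $f(x) = \ln(1 + e^{a^T x})$ is a hypothetical choice) illustrates the idea.

```python
import numpy as np

# Hypothetical example: f(x) = log(1 + exp(a^T x)).
# Its Hessian is s(1 - s) * a a^T with s = sigmoid(a^T x), so
# ||hess f(x)||_2 = s(1 - s) ||a||^2 <= ||a||^2 / 4 for every x,
# and by Corollary 9.27, f is (||a||^2 / 4)-smooth w.r.t. the l2-norm.
rng = np.random.default_rng(4)
n = 5
a = rng.standard_normal(n)

def hess(x):
    s = 1.0 / (1.0 + np.exp(-(a @ x)))
    return s * (1 - s) * np.outer(a, a)

worst = max(np.linalg.norm(hess(rng.standard_normal(n)), 2) for _ in range(5000))
print("largest sampled ||hess||_2 :", worst)
print("theoretical bound ||a||^2/4:", (a @ a) / 4)
```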
9.18.2. Strong Convexity#
Definition 9.81 (Strong convexity)
A function $f : \mathbb{V} \to (-\infty, \infty]$ is called $\sigma$-strongly convex for $\sigma > 0$ if $\operatorname{dom} f$ is convex and the following holds for any $x, y \in \operatorname{dom} f$ and $\lambda \in [0, 1]$:

$$
f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y) - \frac{\sigma}{2} \lambda (1 - \lambda) \| x - y \|^2.
$$
9.18.2.1. Strong Convexity Implies Convexity#
Strongly convex functions are convex. In fact, we have a stronger result available.
Theorem 9.267 (Strong convexity and convexity)
Assume that the ambient space $\mathbb{V}$ is Euclidean; i.e., the norm is induced by the inner product: $\| x \| = \sqrt{\langle x, x \rangle}$.

A function $f : \mathbb{V} \to (-\infty, \infty]$ is $\sigma$-strongly convex if and only if the function $f(\cdot) - \frac{\sigma}{2} \| \cdot \|^2$ is convex.
Proof. Let us define a function
We need to show that
We first note that
.Thus,
is convex if and only if is convex.Now,
is convex if and only if is convex and for any andNow,
Since the norm is Euclidean, hence
Thus, the convexity inequality for
is equivalent towhich is nothing but the
-strong convexity condition of .
9.18.2.2. Quadratic Functions#
Theorem 9.268 (Strong convexity of quadratic functions)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the quadratic function $f(x) = \frac{1}{2} x^T A x + b^T x + c$, where $A$ is a symmetric $n \times n$ matrix, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Assume that the underlying norm is the $\ell_2$-norm.

Then, $f$ is $\sigma$-strongly convex if and only if $A \succeq \sigma I$.
Proof. Due to Theorem 9.267,
We note that
As shown in Example 9.40,
is convex if and only if .This is equivalent to
.
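In the Euclidean setting, the theorem reduces checking strong convexity of a quadratic to an eigenvalue computation: the largest admissible $\sigma$ is the smallest eigenvalue of $A$. A small sketch (our own; the positive definite matrix below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B + np.eye(n)            # symmetric positive definite

# f(x) = 0.5 x^T A x + b^T x + c is sigma-strongly convex iff A - sigma*I >= 0,
# i.e. iff sigma <= lambda_min(A).
sigma = np.linalg.eigvalsh(A).min()
print("largest admissible sigma:", sigma)

# Spot-check the strong convexity inequality at random points with this sigma.
b = rng.standard_normal(n)
f = lambda x: 0.5 * x @ A @ x + b @ x
x, y, lam = rng.standard_normal(n), rng.standard_normal(n), 0.3
lhs = f(lam * x + (1 - lam) * y)
rhs = (lam * f(x) + (1 - lam) * f(y)
       - 0.5 * sigma * lam * (1 - lam) * np.linalg.norm(x - y) ** 2)
print("strong convexity inequality holds:", lhs <= rhs + 1e-9)
```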
9.18.2.3. Coerciveness#
Theorem 9.269 (Strong convexity and coerciveness)
Assume that the ambient space $\mathbb{V}$ is Euclidean. If $f : \mathbb{V} \to (-\infty, \infty]$ is a $\sigma$-strongly convex function, then $f$ is coercive.
Proof. We proceed as follows.
Define
By Theorem 9.267,
is convex.Since
is differentiable, hence is also differentiable.Specifically,
.Fix some
.Then
.By subgradient inequality, for any
,Expanding
and :Let
.Rearranging terms
where
.We note that the term
depends solely on which is fixed. Hence is a fixed quantity.By Cauchy-Schwarz inequality
Hence
It is easy to see that the R.H.S. goes to
as .Hence
is coercive.
9.18.2.4. Sum Rule#
Theorem 9.270 (Sum of strongly convex and convex functions)
Let $f$ be a $\sigma$-strongly convex function and $g$ be a convex function. Then, $f + g$ is a $\sigma$-strongly convex function.
Proof. Since both
We further need to show that
Let
and .Since
is -strongly convex, henceSince
is convex, henceThen,
Thus,
is also -strongly convex.
Example 9.83 (Strong convexity of $\frac{1}{2} \| \cdot \|^2 + \delta_C$)

Let the ambient space be Euclidean.

The function $\frac{1}{2} \| x \|^2$ is 1-strongly convex due to Theorem 9.268.

Let $C$ be a convex set.

Then, the indicator function $\delta_C$ is convex.

Due to Theorem 9.270, the function $\frac{1}{2} \| x \|^2 + \delta_C(x)$ is also 1-strongly convex.
9.18.2.5. First Order Characterization#
Recall that
Theorem 9.271 (First order characterization of strong convexity)
Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and convex function and let $\sigma > 0$. Then, the following statements are equivalent:

1. $f$ is $\sigma$-strongly convex.
2. For every $x \in \operatorname{dom} \partial f$, $y \in \operatorname{dom} f$ and $g \in \partial f(x)$, the following holds true:
   $$
   f(y) \geq f(x) + \langle g, y - x \rangle + \frac{\sigma}{2} \| y - x \|^2.
   $$
3. For any $x, y \in \operatorname{dom} \partial f$, $g_x \in \partial f(x)$, $g_y \in \partial f(y)$, the following holds true:
   $$
   \langle g_x - g_y, x - y \rangle \geq \sigma \| x - y \|^2.
   $$
Proof. We shall prove the equivalence of these statements in the following order: (2) $\Rightarrow$ (1), (1) $\Rightarrow$ (3), (3) $\Rightarrow$ (2).

(2) $\Rightarrow$ (1)
We assume that (2) is true.
Let
and .We need to show that (9.10) holds for
.Since
is convex, its relative interior is not empty (see Theorem 9.142).Let
.Choose some
.Let
.By the line segment property (Theorem 9.143),
.Let
.Again, by the line segment property,
.Since
is a proper convex function, hence the subdifferential of at relative interior points is nonempty (Theorem 9.217).Thus,
and .Take some
.By hypothesis (2)
Substituting
, we have . Thus,Similarly, by hypothesis (2)
.This gives us,
Multiplying the first inequality by
and the second one by and adding them together, we getThus,
Expanding
,Define
.Define
.Substituting these into the previous inequality, we obtain
The functions
and are one dimensional, proper, closed and convex functions.By Theorem 9.173, both
and are continuous on their domain.Therefore, taking the limit
, it follows thatNow
and .Thus,
This establishes that
is indeed -strongly convex.
(1) $\Rightarrow$ (3)
We are given that
is -strongly convex.Let
.Pick any
and .Let
and denote .By the hypothesis
This is same as
We can see that
.Dividing both sides of inequality by
, we obtainSince
, hence by subgradient inequalityWe can rewrite this as
Note that
.Thus,
Thus,
This inequality holds for every
.Taking the limit
, we obtainAn identical reasoning by switching the roles of
and , gives usAdding these two inequalities gives us
Multiplying both sides by
(and switching the inequality accordingly), we getas desired.
(3) $\Rightarrow$ (2)
We are given that (3) is satisfied.
Let
, and .Pick any
.Pick some
.Define
.By line segment property
.Define
.Consider the 1D function
Pick any
.Then, by line segment principle
.Due to (Theorem 9.217),
and .Take some
.By subgradient inequality
In particular, for
, we haveSince this is valid for every
, hence .Applying the mean value theorem (Theorem 9.234)
Since
and , hence applying the hypothesis (3), we getBut
.Hence
This simplifies to
Canceling
on both sides doesn’t change the sign of inequality since .Applying the inequality to the integral above
Integrating, we get
Expanding for
for any , we haveThe 1D function
is continuous again due to Theorem 9.173.Taking the limit
on both sides, we obtainwhich is the desired result.
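For a differentiable function, the subgradient in property (2) is simply the gradient, which makes the characterization easy to spot-check numerically. The sketch below (our own; the test function $f(x) = \frac{1}{2}\|x\|^2 + \ln(1 + e^{a^T x})$ is a hypothetical choice, and it is 1-strongly convex by the sum rule, Theorem 9.270) samples random pairs of points and verifies properties (2) and (3).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
a = rng.standard_normal(n)
sigma = 1.0                        # strong convexity constant from the sum rule

f = lambda x: 0.5 * x @ x + np.log1p(np.exp(a @ x))
grad = lambda x: x + a / (1.0 + np.exp(-(a @ x)))

ok2 = ok3 = True
for _ in range(5000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    gx, gy = grad(x), grad(y)
    ok2 &= f(y) >= f(x) + gx @ (y - x) + 0.5 * sigma * np.linalg.norm(y - x) ** 2 - 1e-9
    ok3 &= (gx - gy) @ (x - y) >= sigma * np.linalg.norm(x - y) ** 2 - 1e-9

print("property (2) holds:", ok2)
print("property (3) holds:", ok3)
```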
9.18.2.6. Minimization#
Theorem 9.272 (Existence and uniqueness of a minimizer of a closed strongly convex function)
Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and $\sigma$-strongly convex function with $\sigma > 0$. Then:

1. $f$ has a unique minimizer $a$ such that $f(a) < f(x)$ for every $x \in \operatorname{dom} f$ with $x \neq a$.
2. The increase in the value of $f$ w.r.t. its minimum satisfies
   $$
   f(x) - f(a) \geq \frac{\sigma}{2} \| x - a \|^2 \quad \text{for every } x \in \operatorname{dom} f,
   $$
   where $a$ is the unique minimizer of $f$.
Proof. (1) Existence of the minimizer
Since
is proper and convex, hence is nonempty and convex.Since
is nonempty and convex, hence its relative interior is nonempty (Theorem 9.142).Pick
.By Theorem 9.214,
is nonempty.Pick some
.Then, by property 2 of Theorem 9.271,
holds true for every
.Let
denote the Euclidean norm associated with the inner product of the space . This might be different from the endowed norm .Since all norms in a finite dimensional space are equivalent, hence, there exists a constant
such thatfor every
.Therefore,
This in turn is same as
Let
denote the sublevel set .Consider the sublevel set
.Let
.Then,
for some .But then
This simplifies to
Since
must be nonnegative, hence the R.H.S. must be nonnegative also.Thus, we require that
This simplifies to
In other words,
must belong to a closed ball given by
Since this is valid for every
, henceSince
is closed, hence all its sublevel sets are closed. Since
is contained in a ball, hence is bounded.Thus,
is closed and bounded.Since
is finite dimensional, hence is compact. is also nonempty since .Thus, the problem of minimizing
over reduces to the problem of minimizing over the nonempty compact set .Since
is closed, it is also lower semicontinuous.By Theorem 3.121,
attains a minimum on at some point .Thus, we have established the existence of a minimizer of
at some .
(1) Uniqueness of the minimizer
To show the uniqueness, for contradiction, assume that
and are two different minimizers of with , the optimal value.Let
.We must have
.By strong convexity of
,If
, then ; a contradiction.Hence, the minimizer must be unique.
(2) Increase in value of
Let
be the unique minimizer of .By Fermat’s optimality condition
.Since
is -strongly convex, hence by property (2) in the Theorem 9.271,holds true for any
.
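A numerical sketch of part (2) (our own; it assumes the Euclidean norm and uses a strongly convex quadratic whose minimizer can be computed in closed form): the gap $f(x) - f(a)$ is bounded below by $\frac{\sigma}{2}\|x - a\|^2$ at every sampled point.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B + np.eye(n)                 # positive definite
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x
sigma = np.linalg.eigvalsh(A).min()     # strong convexity parameter
a_star = np.linalg.solve(A, -b)         # unique minimizer (grad f = 0)

ok = True
for _ in range(5000):
    x = rng.standard_normal(n)
    ok &= f(x) - f(a_star) >= 0.5 * sigma * np.linalg.norm(x - a_star) ** 2 - 1e-9

print("increase-over-minimum bound holds:", ok)
```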
9.18.3. Smoothness and Strong Convexity#
9.18.3.1. The Conjugate Correspondence Theorem#
The notions of smoothness and strong convexity are connected. Roughly speaking, a function is strongly convex if and only if its conjugate is smooth.
Theorem 9.273 (Conjugate correspondence theorem)
Let $\sigma > 0$. Then:

1. If $f : \mathbb{V} \to \mathbb{R}$ is a $\frac{1}{\sigma}$-smooth convex function, then $f^*$ is $\sigma$-strongly convex w.r.t. the dual norm $\| \cdot \|_*$.
2. If $f : \mathbb{V} \to (-\infty, \infty]$ is a proper, closed and $\sigma$-strongly convex function, then $f^*$ is $\frac{1}{\sigma}$-smooth.
Proof. (1) Smooth convex to strongly convex conjugate
We are given that
is a -smooth convex function.Due to Theorem 9.239,
is closed and convex.Since
is proper and convex, hence due to Theorem 9.240, is proper.Thus
is a proper, closed and convex function.Pick any
.Let
and .Since
is proper and convex, hence by conjugate subgradient theorem (Theorem 9.246)Since
is smooth, hence it is differentiable. Hence, due to Theorem 9.220,
Following the characterization of smoothness (Theorem 9.265), by its property (4),
Since the last inequality holds for any
and any , hence following the first order characterization of strong convexity in Theorem 9.271, is a -strongly convex function.
(2) Strongly convex to smooth conjugate
We are given that
is proper, closed and -strongly convex.Pick any
.The conjugate is given by
Define
.We can see that
Due to the sum rule (Theorem 9.270),
is -strongly convex.Due to Theorem 9.272,
has a unique minimizer.Hence
is finite.Since this is valid for any
, hence .This justifies the signature for
as being real valued.Let’s continue with any
.Since
, hence .Now, by the second formulation of conjugate subgradient theorem (Corollary 9.26),
We can see that
Since
has a unique minimizer, hence is a singleton.Due to Theorem 9.220,
is differentiable at .Since
is arbitrary, hence is differentiable over entire .We now pickup two points
and denote .By conjugate subgradient theorem (Theorem 9.246), this is equivalent to
and .Following the first order characterization of strong convexity in Theorem 9.271,
In other words
By the generalized Cauchy-Schwarz inequality (Theorem 4.108)
Thus the previous inequality simplifies to
This establishes that
is -smooth.
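The correspondence is transparent for quadratics (our own sketch, assuming the Euclidean norm): for $f(x) = \frac{1}{2} x^T A x$ with $A$ positive definite, the conjugate is $f^*(y) = \frac{1}{2} y^T A^{-1} y$, the smoothness constant of $f$ is $\lambda_{\max}(A)$, and the strong convexity constant of $f^*$ is $\lambda_{\min}(A^{-1}) = 1 / \lambda_{\max}(A)$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B + np.eye(n)                        # positive definite

L_f = np.linalg.eigvalsh(A).max()              # smoothness constant of f(x) = 0.5 x^T A x
A_inv = np.linalg.inv(A)
sigma_conj = np.linalg.eigvalsh(A_inv).min()   # strong convexity of f*(y) = 0.5 y^T A^{-1} y

print("L(f)             :", L_f)
print("sigma(f*)        :", sigma_conj)
print("product L * sigma:", L_f * sigma_conj)  # equals 1, as the theorem predicts
```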
9.18.4. Examples#
Example 9.84 (Smoothness of
Let
Note that for any
, the gradient is given byThe Hessian is given by
Therefore,
for every .Hence, by Corollary 9.27,
is 1-smooth w.r.t. the -norm.
9.18.4.1. Log-Sum-Exp#
Example 9.85 (Smoothness of log-sum-exp)
Consider the log-sum-exp function $f : \mathbb{R}^n \to \mathbb{R}$ given by

$$
f(x) = \ln \left( \sum_{i=1}^n e^{x_i} \right).
$$
Smoothness w.r.t. the $\ell_2$-norm
The partial derivatives of
areThe second order partial derivatives are
The Hessian can be written as
where
.We can now see that
Hence
for every .Hence, by Corollary 9.27,
is 1-smooth w.r.t. the -norm.
Smoothness w.r.t. the $\ell_\infty$-norm
We first show that for any
To see this, we expand the L.H.S. as
Since
is twice differentiable over , hence by linear approximation theorem (Theorem 5.5), for any , there exists such thatLet
.Then from above,
Putting this back in the approximation, we have
Following characterization of smoothness (Theorem 9.265),
is indeed 1-smooth w.r.t. the -norm.
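A quick numerical confirmation of the $\ell_2$ result (our own sketch): the Hessian of the log-sum-exp function is $\operatorname{diag}(p) - p p^T$, where $p$ is the vector of softmax weights, and its spectral norm stays below 1 at randomly sampled points.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())            # shift for numerical stability
    return z / z.sum()

def lse_hessian(x):
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(9)
worst = max(np.linalg.norm(lse_hessian(rng.standard_normal(8)), 2) for _ in range(5000))
print("largest sampled ||hess||_2:", worst)    # stays below 1
```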
9.18.4.2. Negative Entropy#
Example 9.86 (Strong convexity of negative entropy over the unit simplex)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be the negative entropy function restricted to the unit simplex $\Delta_n$:

$$
f(x) = \sum_{i=1}^n x_i \ln x_i + \delta_{\Delta_n}(x).
$$

By Theorem 9.257, its conjugate is given by

$$
f^*(y) = \ln \left( \sum_{i=1}^n e^{y_i} \right),
$$

which is the log-sum-exp function.
By Example 9.85, the log-sum-exp function is 1-smooth w.r.t. both the $\ell_2$- and $\ell_\infty$-norms.

Hence, by the conjugate correspondence theorem (Theorem 9.273), $f$ is 1-strongly convex w.r.t. both the $\ell_2$- and $\ell_1$-norms (the respective dual norms).
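As a final sketch (our own), the $\ell_1$ claim can be spot-checked on the unit simplex: for random $x, y$ in the simplex and random $\lambda \in [0, 1]$, the strong convexity inequality from Definition 9.81 with $\sigma = 1$ and the $\ell_1$-norm should hold.

```python
import numpy as np

def neg_entropy(x):
    # negative entropy on the unit simplex (with the convention 0 * log 0 = 0)
    return np.sum(np.where(x > 0, x * np.log(x), 0.0))

rng = np.random.default_rng(10)
n, ok = 6, True
for _ in range(5000):
    x = rng.dirichlet(np.ones(n))
    y = rng.dirichlet(np.ones(n))
    lam = rng.uniform()
    lhs = neg_entropy(lam * x + (1 - lam) * y)
    rhs = (lam * neg_entropy(x) + (1 - lam) * neg_entropy(y)
           - 0.5 * lam * (1 - lam) * np.linalg.norm(x - y, 1) ** 2)
    ok &= lhs <= rhs + 1e-9

print("1-strong convexity w.r.t. the l1-norm holds:", ok)
```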