9.18. Smoothness#

The primary reference for this section is [7].

9.18.1. L-Smooth Functions#

Definition 9.80 (L-smooth functions)

For some $L \geq 0$, a function $f : \mathbb{V} \to (-\infty, \infty]$ is called $L$-smooth over a set $D \subseteq \mathbb{V}$ if it is differentiable over $D$ and satisfies

$$\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \| \quad \forall x, y \in D.$$

The constant $L$ is called the smoothness parameter.

  • Since $f$ is differentiable over $D$, it follows that $D \subseteq \operatorname{int} \operatorname{dom} f$.

  • If $f$ is $L$-smooth over the entire $\mathbb{V}$, we simply say that $f$ is $L$-smooth.

  • $L$-smooth functions are also known as functions with Lipschitz gradient with constant $L$.

  • The class of functions which are $L$-smooth over a set $D$ is denoted by $C_L^{1,1}(D)$.

  • When $D = \mathbb{V}$, the class is simply denoted as $C_L^{1,1}$.

  • The class of functions which are $L$-smooth for some $L \geq 0$ (where $L$ may not be known) is denoted by $C^{1,1}$.

  • By definition, if a function is $L_1$-smooth, then it is $L_2$-smooth for every $L_2 \geq L_1$.

  • Thus, it is often useful to identify the smallest possible value of $L$ for which a function is $L$-smooth.

Example 9.82 (Zero smoothness of affine functions)

Let $b \in \mathbb{V}$ and $c \in \mathbb{R}$. Let $f : \mathbb{V} \to \mathbb{R}$ be given by:

$$f(x) = \langle x, b \rangle + c.$$

Then, $f$ is 0-smooth.

To see this, we note that $\nabla f(x) = b$. Consequently,

$$\| \nabla f(x) - \nabla f(y) \|_* = \| b - b \|_* = 0 = 0 \cdot \| x - y \| \quad \forall x, y.$$

Theorem 9.263 (Smoothness of quadratic functions)

Let $A \in \mathbb{S}^n$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Assume that $\mathbb{R}^n$ is endowed with the $\ell_p$-norm for some $1 \leq p \leq \infty$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by:

$$f(x) = \frac{1}{2} x^T A x + b^T x + c.$$

Then, $f$ is $L$-smooth with $L = \|A\|_{p,q}$, where $q \in [1, \infty]$ is the conjugate exponent satisfying

$$\frac{1}{p} + \frac{1}{q} = 1$$

and $\|A\|_{p,q}$ is the induced norm given by

$$\|A\|_{p,q} = \sup \{ \|A x\|_q \mid \|x\|_p \leq 1 \}.$$

Proof. We note that the dual norm of the $\ell_p$ norm is the $\ell_q$ norm. Since $\nabla f(x) = A x + b$,

$$\| \nabla f(x) - \nabla f(y) \|_* = \| \nabla f(x) - \nabla f(y) \|_q = \| (A x + b) - (A y + b) \|_q = \| A x - A y \|_q \leq \|A\|_{p,q} \| x - y \|_p.$$

Thus, $f$ is $\|A\|_{p,q}$-smooth.

We next show that $\|A\|_{p,q}$ is the smallest smoothness parameter for $f$.

  1. Assume that $f$ is $L$-smooth for some $L$.

  2. By definition of $\|A\|_{p,q}$ (the supremum is attained on the compact unit ball), there exists a vector $\tilde{x}$ such that $\|\tilde{x}\|_p = 1$ and

    $$\| A \tilde{x} \|_q = \|A\|_{p,q} \|\tilde{x}\|_p = \|A\|_{p,q}.$$
  3. Then,

    $$\|A\|_{p,q} = \| A \tilde{x} \|_q = \| (A \tilde{x} + b) - (A \cdot 0 + b) \|_q = \| \nabla f(\tilde{x}) - \nabla f(0) \|_q \leq L \| \tilde{x} - 0 \|_p = L.$$
  4. Thus, $\|A\|_{p,q} \leq L$.

Thus, $\|A\|_{p,q}$ is indeed the smallest smoothness parameter for $f$.
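As a quick numerical sanity check of this theorem (a NumPy sketch; the matrix, vector, seed, and tolerances below are illustrative, not from the text), consider $p = q = 2$, where the induced norm $\|A\|_{2,2}$ is the spectral norm:

```python
import numpy as np

# Sanity check for the theorem with p = q = 2, where the induced
# norm ||A||_{2,2} is the spectral norm of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2            # symmetrize so that A is in S^n
b = rng.standard_normal(5)

L = np.linalg.norm(A, 2)     # spectral norm = ||A||_{2,2}
grad = lambda x: A @ x + b   # gradient of (1/2) x^T A x + b^T x + c

# The Lipschitz bound holds for random pairs of points.
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9

# Tightness: a top right singular vector x_tilde attains ||A x|| = L,
# mirroring the vector x_tilde used in the proof above.
x_tilde = np.linalg.svd(A)[2][0]
assert np.isclose(np.linalg.norm(grad(x_tilde) - grad(np.zeros(5))), L)
```

The last assertion mirrors the tightness argument: along a top singular direction the gradient variation attains $\|A\|_{2,2} \|x - y\|_2$ exactly, so no smaller $L$ works.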

9.18.1.1. Descent Lemma#

Theorem 9.264 (Descent lemma)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be $L$-smooth for some $L \geq 0$ over some convex set $D$. Then for any $x, y \in D$,

$$f(y) \leq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{L}{2} \| x - y \|^2.$$

Proof. We proceed as follows:

  1. By the fundamental theorem of calculus,

    $$f(y) - f(x) = \int_0^1 \langle y - x, \nabla f(x + t(y - x)) \rangle \, dt.$$
  2. By adding and subtracting $\langle y - x, \nabla f(x) \rangle$, we get:

    $$f(y) - f(x) = \langle y - x, \nabla f(x) \rangle + \int_0^1 \langle y - x, \nabla f(x + t(y - x)) - \nabla f(x) \rangle \, dt.$$
  3. This gives us

    $$\begin{aligned}
    | f(y) - f(x) - \langle y - x, \nabla f(x) \rangle |
    &= \left| \int_0^1 \langle y - x, \nabla f(x + t(y - x)) - \nabla f(x) \rangle \, dt \right| \\
    &\leq \int_0^1 | \langle y - x, \nabla f(x + t(y - x)) - \nabla f(x) \rangle | \, dt \\
    &\leq \int_0^1 \| y - x \| \, \| \nabla f(x + t(y - x)) - \nabla f(x) \|_* \, dt && \text{(a)} \\
    &\leq \int_0^1 \| y - x \| \cdot t L \| y - x \| \, dt && \text{(b)} \\
    &= \int_0^1 t L \| y - x \|^2 \, dt = L \| y - x \|^2 \int_0^1 t \, dt = \frac{L}{2} \| y - x \|^2.
    \end{aligned}$$
    • (a) is an application of the generalized Cauchy–Schwarz inequality (Theorem 4.108).

    • (b) is an application of the $L$-smoothness of $f$ (Definition 9.80).

  4. Thus,

    $$| f(y) - f(x) - \langle y - x, \nabla f(x) \rangle | \leq \frac{L}{2} \| y - x \|^2
    \implies f(y) - f(x) - \langle y - x, \nabla f(x) \rangle \leq \frac{L}{2} \| y - x \|^2
    \implies f(y) \leq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{L}{2} \| y - x \|^2.$$
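The descent lemma can be checked numerically on a concrete smooth function (a NumPy sketch; the choice of the logistic function, the sampling range, and the tolerance are illustrative, not from the text). For $f(x) = \ln(1 + e^x)$ we have $f''(x) = s(x)(1 - s(x)) \leq \frac{1}{4}$ where $s$ is the sigmoid, so $f$ is $\frac{1}{4}$-smooth:

```python
import numpy as np

# Numeric check of the descent lemma on f(x) = log(1 + e^x),
# whose gradient f'(x) = sigmoid(x) is Lipschitz with L = 1/4
# (since f''(x) = s(1 - s) <= 1/4).
f = lambda x: np.logaddexp(0.0, x)       # log(1 + e^x), numerically stable
g = lambda x: 1.0 / (1.0 + np.exp(-x))   # f'(x) = sigmoid(x)
L = 0.25

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(-10, 10, size=2)
    upper = f(x) + g(x) * (y - x) + (L / 2) * (y - x) ** 2
    assert f(y) <= upper + 1e-12         # descent lemma upper bound
```

The quadratic upper bound `upper` majorizes $f$ at every sampled pair, which is exactly what the lemma asserts.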

9.18.1.2. Characterization of L-Smooth Functions#

Theorem 9.265 (Characterization of L-smooth functions)

Let $f : \mathbb{V} \to \mathbb{R}$ be convex and differentiable over $\mathbb{V}$. Let $L > 0$. The following claims are equivalent:

  1. $f$ is $L$-smooth.

  2. $f(y) \leq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{L}{2} \| x - y \|^2$ for every $x, y \in \mathbb{V}$.

  3. $f(y) \geq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{1}{2L} \| \nabla f(x) - \nabla f(y) \|_*^2$ for every $x, y \in \mathbb{V}$.

  4. $\langle x - y, \nabla f(x) - \nabla f(y) \rangle \geq \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2$ for every $x, y \in \mathbb{V}$.

  5. $f(t x + (1-t) y) \geq t f(x) + (1-t) f(y) - \frac{L}{2} t (1-t) \| x - y \|^2$ for every $x, y \in \mathbb{V}$ and $t \in [0, 1]$.

Proof. (1) $\Rightarrow$ (2). This is a direct implication of the descent lemma (Theorem 9.264).

(2) $\Rightarrow$ (3)

  1. We are given that (2) is satisfied.

  2. If $\nabla f(x) = \nabla f(y)$, then the inequality reduces to the gradient inequality, which holds due to the convexity of $f$. Hence, we consider the case where $\nabla f(x) \neq \nabla f(y)$.

  3. Fix an $x \in \mathbb{V}$.

  4. Consider the function $g_x : \mathbb{V} \to \mathbb{R}$ given by

    $$g_x(y) = f(y) - f(x) - \langle y - x, \nabla f(x) \rangle.$$
  5. Then,

    $$\nabla g_x(y) = \nabla f(y) - \nabla f(x).$$
  6. By hypothesis in property (2), for any $z \in \mathbb{V}$,

    $$f(z) \leq f(y) + \langle z - y, \nabla f(y) \rangle + \frac{L}{2} \| z - y \|^2.$$
  7. Now,

    $$\begin{aligned}
    g_x(z) &= f(z) - f(x) - \langle z - x, \nabla f(x) \rangle \\
    &\leq f(y) + \langle z - y, \nabla f(y) \rangle + \frac{L}{2} \| z - y \|^2 - f(x) - \langle z - x, \nabla f(x) \rangle \\
    &= f(y) - f(x) - \langle z - x, \nabla f(x) \rangle + \langle z - y, \nabla f(x) \rangle - \langle z - y, \nabla f(x) \rangle + \langle z - y, \nabla f(y) \rangle + \frac{L}{2} \| z - y \|^2 \\
    &= f(y) - f(x) - \langle y - x, \nabla f(x) \rangle + \langle z - y, \nabla f(y) - \nabla f(x) \rangle + \frac{L}{2} \| z - y \|^2 \\
    &= g_x(y) + \langle z - y, \nabla g_x(y) \rangle + \frac{L}{2} \| z - y \|^2.
    \end{aligned}$$
  8. Thus, $g_x$ also satisfies the inequality in property (2).

  9. We note in particular that $\nabla g_x(x) = \nabla f(x) - \nabla f(x) = 0$.

  10. Since $g_x$ is convex, $x$ is a global minimizer of $g_x$.

  11. In other words,

    $$g_x(x) \leq g_x(z) \quad \forall z \in \mathbb{V}.$$
  12. We can also see that $g_x(x) = f(x) - f(x) - \langle x - x, \nabla f(x) \rangle = 0$.

  13. Let $y \in \mathbb{V}$.

  14. Let $v \in \mathbb{V}$ be a unit-norm vector satisfying $\| \nabla g_x(y) \|_* = \langle v, \nabla g_x(y) \rangle$.

  15. Choose

    $$z = y - \frac{\| \nabla g_x(y) \|_*}{L} v.$$
  16. Then,

    $$0 = g_x(x) \leq g_x(z) = g_x \left( y - \frac{\| \nabla g_x(y) \|_*}{L} v \right).$$
  17. Using property (2) on $g_x(z)$, we get

    $$\begin{aligned}
    0 \leq g_x(z) &\leq g_x(y) + \langle z - y, \nabla g_x(y) \rangle + \frac{L}{2} \| z - y \|^2 \\
    &= g_x(y) - \frac{\| \nabla g_x(y) \|_*}{L} \langle v, \nabla g_x(y) \rangle + \frac{L}{2} \left\| \frac{\| \nabla g_x(y) \|_*}{L} v \right\|^2 \\
    &= g_x(y) - \frac{1}{L} \| \nabla g_x(y) \|_*^2 + \frac{1}{2L} \| \nabla g_x(y) \|_*^2 \\
    &= g_x(y) - \frac{1}{2L} \| \nabla g_x(y) \|_*^2 \\
    &= f(y) - f(x) - \langle y - x, \nabla f(x) \rangle - \frac{1}{2L} \| \nabla f(y) - \nabla f(x) \|_*^2.
    \end{aligned}$$
  18. Simplifying this, we get

    $$f(y) \geq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{1}{2L} \| \nabla f(y) - \nabla f(x) \|_*^2$$

    as desired.

(3) $\Rightarrow$ (4)

  1. For $x, y$, property (3) gives us:

    $$f(y) \geq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{1}{2L} \| \nabla f(y) - \nabla f(x) \|_*^2.$$
  2. For $y, x$, property (3) gives us:

    $$f(x) \geq f(y) + \langle x - y, \nabla f(y) \rangle + \frac{1}{2L} \| \nabla f(x) - \nabla f(y) \|_*^2.$$
  3. Adding the two inequalities and canceling the term $f(x) + f(y)$ gives us

    $$0 \geq \langle x - y, \nabla f(y) - \nabla f(x) \rangle + \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2.$$
  4. Rearranging, we get

    $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \geq \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2$$

    as desired.

(4) $\Rightarrow$ (1)

  1. When $\nabla f(x) = \nabla f(y)$, the Lipschitz condition in (1) holds trivially. Hence, we consider the case where $\nabla f(x) \neq \nabla f(y)$.

  2. By the generalized Cauchy–Schwarz inequality (Theorem 4.108),

    $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \leq \| x - y \| \, \| \nabla f(x) - \nabla f(y) \|_*.$$
  3. Thus, combining with hypothesis (4), we obtain

    $$\frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2 \leq \| x - y \| \, \| \nabla f(x) - \nabla f(y) \|_*.$$
  4. Since $\nabla f(x) \neq \nabla f(y)$, we have $\| \nabla f(x) - \nabla f(y) \|_* > 0$.

  5. Canceling it from both sides, we get

    $$\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \|$$

    as desired.

We have shown so far that (1), (2), (3) and (4) are equivalent statements. We are left with showing that (5) is equivalent to the other statements.

(2) $\Rightarrow$ (5)

  1. Pick $x, y \in \mathbb{V}$ and $t \in [0, 1]$.

  2. Let $z = t x + (1-t) y$.

  3. By hypothesis (2),

    $$f(x) \leq f(z) + \langle x - z, \nabla f(z) \rangle + \frac{L}{2} \| x - z \|^2; \quad
    f(y) \leq f(z) + \langle y - z, \nabla f(z) \rangle + \frac{L}{2} \| y - z \|^2.$$
  4. Note that $x - z = (1-t)(x - y)$ and $y - z = t(y - x)$.

  5. Thus, the previous two inequalities are the same as

    $$f(x) \leq f(z) + (1-t) \langle x - y, \nabla f(z) \rangle + \frac{L (1-t)^2}{2} \| x - y \|^2; \quad
    f(y) \leq f(z) + t \langle y - x, \nabla f(z) \rangle + \frac{L t^2}{2} \| x - y \|^2.$$
  6. Multiplying the first inequality by $t$, the second by $(1-t)$ and adding (the inner product terms cancel), we get

    $$t f(x) + (1-t) f(y) \leq f(z) + \frac{L t (1-t)}{2} \| x - y \|^2.$$
  7. Rearranging, we get

    $$f(t x + (1-t) y) = f(z) \geq t f(x) + (1-t) f(y) - \frac{L}{2} t (1-t) \| x - y \|^2.$$

(5) $\Rightarrow$ (2)

  1. Pick $x, y \in \mathbb{V}$ and $t \in (0, 1)$.

  2. By hypothesis in inequality (5),

    $$f(t x + (1-t) y) \geq t f(x) + (1-t) f(y) - \frac{L}{2} t (1-t) \| x - y \|^2.$$
  3. Rearranging the terms, we obtain

    $$\begin{aligned}
    (1-t) f(y) &\leq f(t x + (1-t) y) - t f(x) + \frac{L}{2} t (1-t) \| x - y \|^2 \\
    (1-t) f(y) &\leq f(t x + (1-t) y) - f(x) + (1-t) f(x) + \frac{L}{2} t (1-t) \| x - y \|^2 \\
    f(y) &\leq f(x) + \frac{f(t x + (1-t) y) - f(x)}{1-t} + \frac{L}{2} t \| x - y \|^2.
    \end{aligned}$$

    Division by $(1-t)$ is fine since $(1-t) \in (0, 1)$.

  4. Recalling the definition of the directional derivative (Definition 9.70):

    $$\lim_{t \to 1^-} \frac{f(t x + (1-t) y) - f(x)}{1-t}
    = \lim_{s \to 0^+} \frac{f((1-s) x + s y) - f(x)}{s}
    = \lim_{s \to 0^+} \frac{f(x + s (y - x)) - f(x)}{s}
    = f'(x; y - x).$$
  5. Since the previous inequality is valid for every $t \in (0, 1)$, taking the limit $t \to 1^-$ on the R.H.S., we obtain

    $$f(y) \leq f(x) + f'(x; y - x) + \frac{L}{2} \| x - y \|^2.$$
  6. Recall from Theorem 9.203 that $f'(x; y - x) = \langle y - x, \nabla f(x) \rangle$.

  7. Thus, we get:

    $$f(y) \leq f(x) + \langle y - x, \nabla f(x) \rangle + \frac{L}{2} \| x - y \|^2$$

    as desired.
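Properties (3), (4) and (5) above can be verified numerically on a concrete smooth convex function (a NumPy sketch; the logistic function, sampling range, and tolerances are illustrative, not from the text). In one dimension the dual norm coincides with the absolute value, and $f(x) = \ln(1 + e^x)$ is $\frac{1}{4}$-smooth:

```python
import numpy as np

# Check properties (3), (4), (5) of the characterization theorem for
# f(x) = log(1 + e^x), which is convex and L-smooth with L = 1/4.
f = lambda x: np.logaddexp(0.0, x)
g = lambda x: 1.0 / (1.0 + np.exp(-x))   # f'(x)
L = 0.25

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.uniform(-8, 8, size=2)
    # (3): lower bound with the squared gradient-difference term
    assert f(y) >= f(x) + g(x)*(y - x) + (g(x) - g(y))**2 / (2*L) - 1e-12
    # (4): co-coercivity of the gradient map
    assert (x - y)*(g(x) - g(y)) >= (g(x) - g(y))**2 / L - 1e-12
    # (5): the interpolation lower bound at t = 1/2
    t = 0.5
    assert f(t*x + (1-t)*y) >= t*f(x) + (1-t)*f(y) - (L/2)*t*(1-t)*(x - y)**2 - 1e-12
```

All three inequalities hold simultaneously at every sampled pair, consistent with the claimed equivalence.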

9.18.1.3. Second Order Characterization#

We now restrict our attention to the vector space $\mathbb{R}^n$ equipped with an $\ell_p$ norm with $p \geq 1$.

Theorem 9.266 (L-smoothness and the boundedness of the Hessian)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function over $\mathbb{R}^n$. Then, for any $L \geq 0$, the following claims are equivalent:

  1. $f$ is $L$-smooth w.r.t. the $\ell_p$-norm ($p \in [1, \infty]$).

  2. $\| \nabla^2 f(x) \|_{p,q} \leq L$ for any $x \in \mathbb{R}^n$, where $q \geq 1$ satisfies $\frac{1}{p} + \frac{1}{q} = 1$.

Proof. (2) $\Rightarrow$ (1)

  1. We are given that $\| \nabla^2 f(x) \|_{p,q} \leq L$ for any $x \in \mathbb{R}^n$.

  2. By the fundamental theorem of calculus,

    $$\nabla f(y) - \nabla f(x) = \int_0^1 \nabla^2 f(x + t(y - x)) (y - x) \, dt
    = \left( \int_0^1 \nabla^2 f(x + t(y - x)) \, dt \right) (y - x).$$
  3. Taking the (dual) norm on both sides,

    $$\begin{aligned}
    \| \nabla f(y) - \nabla f(x) \|_q
    &= \left\| \left( \int_0^1 \nabla^2 f(x + t(y - x)) \, dt \right) (y - x) \right\|_q \\
    &\leq \left\| \int_0^1 \nabla^2 f(x + t(y - x)) \, dt \right\|_{p,q} \| y - x \|_p \\
    &\leq \left( \int_0^1 \| \nabla^2 f(x + t(y - x)) \|_{p,q} \, dt \right) \| y - x \|_p \\
    &\leq \left( \int_0^1 L \, dt \right) \| y - x \|_p = L \| y - x \|_p.
    \end{aligned}$$
  4. Thus, $\| \nabla f(y) - \nabla f(x) \|_q \leq L \| y - x \|_p$ as desired.

(1) $\Rightarrow$ (2)

  1. We are given that $f$ is $L$-smooth w.r.t. the $\ell_p$ norm.

  2. By the fundamental theorem of calculus, for any $d \in \mathbb{R}^n$ and $s > 0$,

    $$\nabla f(x + s d) - \nabla f(x) = \int_0^s \nabla^2 f(x + t d) \, d \, dt.$$
  3. Taking the $\ell_q$ norm on both sides,

    $$\left\| \left( \int_0^s \nabla^2 f(x + t d) \, dt \right) d \right\|_q
    = \| \nabla f(x + s d) - \nabla f(x) \|_q
    \leq L \| x + s d - x \|_p = s L \| d \|_p.$$
  4. Dividing both sides by $s$ and taking the limit $s \to 0^+$, we get

    $$\| \nabla^2 f(x) d \|_q \leq L \| d \|_p.$$
  5. Since this is valid for every $d \in \mathbb{R}^n$,

    $$\| \nabla^2 f(x) \|_{p,q} \leq L.$$

Corollary 9.27 (L-smoothness and largest eigenvalue of Hessian)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable convex function over $\mathbb{R}^n$. Then $f$ is $L$-smooth w.r.t. the $\ell_2$-norm if and only if

$$\lambda_{\max}(\nabla^2 f(x)) \leq L \quad \forall x \in \mathbb{R}^n.$$

Proof. Since $f$ is convex, we have $\nabla^2 f(x) \succeq O$ for every $x$. Thus,

$$\| \nabla^2 f(x) \|_{2,2} = \sqrt{\lambda_{\max}(\nabla^2 f(x)^2)} = \lambda_{\max}(\nabla^2 f(x)).$$

From Theorem 9.266, $f$ being $L$-smooth is equivalent to the condition that

$$\lambda_{\max}(\nabla^2 f(x)) = \| \nabla^2 f(x) \|_{2,2} \leq L.$$
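This corollary can be exercised numerically (a NumPy sketch; the least-squares objective, dimensions, and seed are illustrative choices, not from the text). For $f(x) = \frac{1}{2} \|Ax - b\|_2^2$ the Hessian is the constant matrix $A^T A$, so the smallest $\ell_2$ smoothness constant is $\lambda_{\max}(A^T A)$:

```python
import numpy as np

# For the least-squares objective f(x) = (1/2)||Ax - b||^2 the Hessian
# is A^T A everywhere, so by the corollary the smallest 2-norm
# smoothness constant is L = lambda_max(A^T A).
rng = np.random.default_rng(3)
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
H = A.T @ A
L = np.linalg.eigvalsh(H)[-1]    # largest eigenvalue of the Hessian

grad = lambda x: A.T @ (A @ x - b)
for _ in range(200):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
```

Since $H \succeq O$ is symmetric, $\|H(x - y)\|_2 \leq \lambda_{\max}(H) \|x - y\|_2$, which is exactly what the loop verifies.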

9.18.2. Strong Convexity#

Definition 9.81 (Strong convexity)

A function $f : \mathbb{V} \to (-\infty, \infty]$ is called $\sigma$-strongly convex for $\sigma > 0$ if $\operatorname{dom} f$ is convex and the following holds for any $x, y \in \operatorname{dom} f$ and $t \in [0, 1]$:

$$f(t x + (1-t) y) \leq t f(x) + (1-t) f(y) - \frac{\sigma}{2} t (1-t) \| x - y \|^2. \tag{9.10}$$

9.18.2.1. Strong Convexity and Convexity#

Strongly convex functions are convex. In fact, we have a stronger result available.

Theorem 9.267 (Strong convexity and convexity)

Assume that the ambient space V is Euclidean.

A function $f : \mathbb{V} \to (-\infty, \infty]$ is $\sigma$-strongly convex if and only if the function $f(\cdot) - \frac{\sigma}{2} \| \cdot \|^2$ is convex.

Proof. Let us define a function $g : \mathbb{V} \to (-\infty, \infty]$ as

$$g(x) = f(x) - \frac{\sigma}{2} \| x \|^2.$$

We need to show that f is σ-strongly convex if and only if g is convex.

  1. We first note that $\operatorname{dom} g = \operatorname{dom} f$.

  2. Thus, $\operatorname{dom} g$ is convex if and only if $\operatorname{dom} f$ is convex.

  3. Now, $g$ is convex if and only if $\operatorname{dom} g = \operatorname{dom} f$ is convex and for any $x, y \in \operatorname{dom} f$ and $t \in (0, 1)$,

    $$g(t x + (1-t) y) \leq t g(x) + (1-t) g(y).$$
  4. Now,

    $$\begin{aligned}
    & g(t x + (1-t) y) \leq t g(x) + (1-t) g(y) \\
    \iff \ & f(t x + (1-t) y) - \frac{\sigma}{2} \| t x + (1-t) y \|^2 \leq t f(x) + (1-t) f(y) - \frac{\sigma}{2} \left[ t \| x \|^2 + (1-t) \| y \|^2 \right] \\
    \iff \ & f(t x + (1-t) y) \leq t f(x) + (1-t) f(y) + \frac{\sigma}{2} \left[ \| t x + (1-t) y \|^2 - t \| x \|^2 - (1-t) \| y \|^2 \right].
    \end{aligned}$$
  5. Since the norm is Euclidean,

    $$\begin{aligned}
    \| t x + (1-t) y \|^2 - t \| x \|^2 - (1-t) \| y \|^2
    &= \langle t x + (1-t) y, t x + (1-t) y \rangle - t \| x \|^2 - (1-t) \| y \|^2 \\
    &= t^2 \| x \|^2 + (1-t)^2 \| y \|^2 + 2 t (1-t) \langle x, y \rangle - t \| x \|^2 - (1-t) \| y \|^2 \\
    &= -t(1-t) \| x \|^2 - t(1-t) \| y \|^2 + 2 t (1-t) \langle x, y \rangle \\
    &= -t(1-t) \left( \| x \|^2 + \| y \|^2 - 2 \langle x, y \rangle \right) \\
    &= -t(1-t) \| x - y \|^2.
    \end{aligned}$$
  6. Thus, the convexity inequality for $g$ is equivalent to

    $$f(t x + (1-t) y) \leq t f(x) + (1-t) f(y) - \frac{\sigma}{2} t (1-t) \| x - y \|^2,$$

    which is nothing but the $\sigma$-strong convexity condition for $f$.

9.18.2.2. Quadratic Functions#

Theorem 9.268 (Strong convexity of quadratic functions)

Let $A \in \mathbb{S}^n$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by:

$$f(x) = \frac{1}{2} x^T A x + b^T x + c.$$

Then $f$ is $\sigma$-strongly convex if and only if $A$ is positive definite and $\sigma \leq \lambda_{\min}(A)$.

Proof. Due to Theorem 9.267, $f$ is strongly convex with $\sigma > 0$ if and only if $g(x) = f(x) - \frac{\sigma}{2} \| x \|^2$ is convex.

  1. We note that

    $$g(x) = \frac{1}{2} x^T A x + b^T x + c - \frac{\sigma}{2} \| x \|^2 = \frac{1}{2} x^T (A - \sigma I) x + b^T x + c.$$
  2. As shown in Example 9.40, $g$ is convex if and only if $A - \sigma I \succeq O$.

  3. This is equivalent to $\sigma \leq \lambda_{\min}(A)$.
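A numerical illustration of this theorem (a NumPy sketch; the random matrix construction, seed, and tolerance are illustrative, not from the text) checks the strong convexity inequality (9.10) with $\sigma = \lambda_{\min}(A)$:

```python
import numpy as np

# For a positive definite A, the quadratic f(x) = (1/2) x^T A x + b^T x
# should satisfy the sigma-strong convexity inequality with
# sigma = lambda_min(A).
rng = np.random.default_rng(4)
M = rng.standard_normal((6, 6))
A = M.T @ M + np.eye(6)            # symmetric positive definite
b = rng.standard_normal(6)
sigma = np.linalg.eigvalsh(A)[0]   # lambda_min(A)

f = lambda x: 0.5 * x @ A @ x + b @ x

for _ in range(200):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    t = rng.uniform()
    lhs = f(t*x + (1-t)*y)
    rhs = t*f(x) + (1-t)*f(y) - (sigma/2)*t*(1-t)*np.linalg.norm(x - y)**2
    assert lhs <= rhs + 1e-9       # inequality (9.10)
```

For the quadratic the gap between the two sides is exactly $\frac{1}{2} t(1-t) (x-y)^T (A - \sigma I)(x-y) \geq 0$, so the inequality is tight along a $\lambda_{\min}$ eigenvector.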

9.18.2.3. Coerciveness#

Theorem 9.269 (Strong convexity and coerciveness)

Assume that the ambient space V is Euclidean. Assume that f:V(,] is a Fréchet-differentiable function. If f is σ-strongly convex, then it is coercive.

Proof. We proceed as follows.

  1. Define

    $$g(x) = f(x) - \frac{\sigma}{2} \| x \|^2.$$
  2. By Theorem 9.267, $g$ is convex.

  3. Since $f$ is differentiable, $g$ is also differentiable.

  4. Specifically, $\nabla g(x) = \nabla f(x) - \sigma x$.

  5. Fix some $x \in \operatorname{int} \operatorname{dom} f$.

  6. Then $\partial g(x) = \{ \nabla g(x) \}$.

  7. By the subgradient inequality, for any $y \in \mathbb{V}$,

    $$g(y) \geq g(x) + \langle y - x, \nabla g(x) \rangle.$$
  8. Expanding $g$ and $\nabla g$:

    $$f(y) - \frac{\sigma}{2} \| y \|^2 \geq f(x) - \frac{\sigma}{2} \| x \|^2 + \langle y - x, \nabla f(x) - \sigma x \rangle.$$
  9. Let $v = \nabla f(x) - \sigma x$.

  10. Rearranging terms,

    $$f(y) \geq \frac{\sigma}{2} \| y \|^2 + \langle y, v \rangle + K_x$$

    where $K_x = f(x) - \frac{\sigma}{2} \| x \|^2 - \langle x, v \rangle$.

  11. We note that the term $K_x$ depends solely on $x$, which is fixed. Hence $K_x$ is a fixed quantity.

  12. By the Cauchy-Schwarz inequality,

    $$\langle y, v \rangle \geq - \| v \| \| y \|.$$
  13. Hence

    $$f(y) \geq \frac{\sigma}{2} \| y \|^2 - \| v \| \| y \| + K_x.$$
  14. It is easy to see that the R.H.S. goes to $\infty$ as $\| y \| \to \infty$.

  15. Hence $f$ is coercive.
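The quadratic lower bound derived in the proof can be observed numerically (a NumPy sketch; the quadratic test function, seed, and tolerance are illustrative, not from the text):

```python
import numpy as np

# Check the lower bound from the proof: a sigma-strongly convex
# differentiable f satisfies
#   f(y) >= (sigma/2)||y||^2 - ||v|| ||y|| + K_x,
# whose right-hand side goes to infinity with ||y||.
rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M.T @ M + 0.5 * np.eye(4)            # positive definite Hessian
b = rng.standard_normal(4)
sigma = np.linalg.eigvalsh(A)[0]         # strong convexity parameter

f = lambda x: 0.5 * x @ A @ x + b @ x
grad = lambda x: A @ x + b

x = rng.standard_normal(4)               # the fixed point from the proof
v = grad(x) - sigma * x
K = f(x) - (sigma / 2) * np.linalg.norm(x)**2 - x @ v

for _ in range(200):
    y = rng.standard_normal(4) * rng.uniform(0, 50)
    ny = np.linalg.norm(y)
    assert f(y) >= (sigma / 2) * ny**2 - np.linalg.norm(v) * ny + K - 1e-8
```

The bound holds for every sampled $y$, including points far from the origin, mirroring steps 10–13 of the proof.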

9.18.2.4. Sum Rule#

Theorem 9.270 (Sum of strongly convex and convex functions)

Let f be σ-strongly convex and g be convex. Then f+g is σ-strongly convex.

Proof. Since both $f$ and $g$ are convex, their domains are convex. Hence, $\operatorname{dom}(f+g) = \operatorname{dom} f \cap \operatorname{dom} g$ is also convex.

We further need to show that f+g satisfies (9.10).

  1. Let $x, y \in \mathbb{V}$ and $t \in (0, 1)$.

  2. Since $f$ is $\sigma$-strongly convex,

    $$f(t x + (1-t) y) \leq t f(x) + (1-t) f(y) - \frac{\sigma}{2} t (1-t) \| x - y \|^2.$$
  3. Since $g$ is convex,

    $$g(t x + (1-t) y) \leq t g(x) + (1-t) g(y).$$
  4. Then,

    $$\begin{aligned}
    (f+g)(t x + (1-t) y) &= f(t x + (1-t) y) + g(t x + (1-t) y) \\
    &\leq t f(x) + (1-t) f(y) - \frac{\sigma}{2} t (1-t) \| x - y \|^2 + t g(x) + (1-t) g(y) \\
    &= t (f+g)(x) + (1-t) (f+g)(y) - \frac{\sigma}{2} t (1-t) \| x - y \|^2.
    \end{aligned}$$
  5. Thus, $f+g$ is also $\sigma$-strongly convex.

Example 9.83 (Strong convexity of $\frac{1}{2} \| \cdot \|^2 + I_C$)

Let $\mathbb{V}$ be a Euclidean space.

  1. The function $\frac{1}{2} \| x \|^2$ is 1-strongly convex due to Theorem 9.268.

  2. Let $C$ be a convex set.

  3. Then, the indicator function $I_C$ is convex.

  4. Due to Theorem 9.270, the function

    $$g(x) = \frac{1}{2} \| x \|^2 + I_C(x)$$

    is also 1-strongly convex.

9.18.2.5. First Order Characterization#

Recall that dom(f) denotes the set of points at which f is subdifferentiable.

Theorem 9.271 (First order characterization of strong convexity)

Let f:V(,] be a proper closed and convex function. For a given σ>0, the following statements are equivalent.

  1. f is σ-strongly convex.

  2. For every $x \in \operatorname{dom}(\partial f)$, $y \in \operatorname{dom} f$ and $g \in \partial f(x)$, the following holds true:

    $$f(y) \geq f(x) + \langle y - x, g \rangle + \frac{\sigma}{2} \| y - x \|^2.$$
  3. For any $x, y \in \operatorname{dom}(\partial f)$ and $g_x \in \partial f(x)$, $g_y \in \partial f(y)$, the following holds true:

    $$\langle x - y, g_x - g_y \rangle \geq \sigma \| x - y \|^2.$$

Proof. We shall prove the equivalence of these statements in the following order: (2) $\Rightarrow$ (1), (1) $\Rightarrow$ (3), (3) $\Rightarrow$ (2).

(2) $\Rightarrow$ (1)

  1. We assume that (2) is true.

  2. Let $x, y \in \operatorname{dom} f$ and $t \in (0, 1)$.

  3. We need to show that (9.10) holds for $f$.

  4. Since $\operatorname{dom} f$ is convex, its relative interior is not empty (see Theorem 9.142).

  5. Let $z \in \operatorname{ri} \operatorname{dom} f$.

  6. Choose some $\alpha \in (0, 1]$.

  7. Let $\tilde{x} = (1-\alpha) x + \alpha z$.

  8. By the line segment property (Theorem 9.143), $\tilde{x} \in \operatorname{ri} \operatorname{dom} f$.

  9. Let $x_t = t \tilde{x} + (1-t) y$.

  10. Again, by the line segment property, $x_t \in \operatorname{ri} \operatorname{dom} f$.

  11. Since $f$ is a proper convex function, the subdifferential of $f$ at relative interior points is nonempty (Theorem 9.217).

  12. Thus, $\partial f(x_t) \neq \emptyset$ and $x_t \in \operatorname{dom}(\partial f)$.

  13. Take some $g \in \partial f(x_t)$.

  14. By hypothesis (2),

    $$f(\tilde{x}) \geq f(x_t) + \langle \tilde{x} - x_t, g \rangle + \frac{\sigma}{2} \| \tilde{x} - x_t \|^2.$$
  15. Substituting $x_t = t \tilde{x} + (1-t) y$, we have $\tilde{x} - x_t = (1-t)(\tilde{x} - y)$. Thus,

    $$f(\tilde{x}) \geq f(x_t) + (1-t) \langle \tilde{x} - y, g \rangle + \frac{\sigma (1-t)^2}{2} \| \tilde{x} - y \|^2.$$
  16. Similarly, by hypothesis (2),

    $$f(y) \geq f(x_t) + \langle y - x_t, g \rangle + \frac{\sigma}{2} \| y - x_t \|^2.$$
  17. $y - x_t = y - t \tilde{x} - (1-t) y = t (y - \tilde{x})$.

  18. This gives us

    $$f(y) \geq f(x_t) + t \langle y - \tilde{x}, g \rangle + \frac{\sigma t^2}{2} \| y - \tilde{x} \|^2.$$
  19. Multiplying the first inequality by $t$ and the second one by $(1-t)$ and adding them together (the inner product terms cancel), we get

    $$t f(\tilde{x}) + (1-t) f(y) \geq f(x_t) + \frac{\sigma t (1-t)}{2} \| \tilde{x} - y \|^2.$$
  20. Thus,

    $$f(t \tilde{x} + (1-t) y) = f(x_t) \leq t f(\tilde{x}) + (1-t) f(y) - \frac{\sigma t (1-t)}{2} \| \tilde{x} - y \|^2.$$
  21. Expanding $\tilde{x}$,

    $$t \tilde{x} + (1-t) y = t ((1-\alpha) x + \alpha z) + (1-t) y = t (1-\alpha) x + (1-t) y + t \alpha z.$$
  22. Define $g_1(\alpha) = f(t \tilde{x} + (1-t) y) = f(t (1-\alpha) x + (1-t) y + t \alpha z)$.

  23. Define $g_2(\alpha) = f(\tilde{x}) = f((1-\alpha) x + \alpha z)$.

  24. Substituting these into the previous inequality, we obtain

    $$g_1(\alpha) \leq t g_2(\alpha) + (1-t) f(y) - \frac{\sigma t (1-t)}{2} \| (1-\alpha) x + \alpha z - y \|^2.$$
  25. The functions $g_1$ and $g_2$ are one dimensional, proper, closed and convex functions.

  26. By Theorem 9.173, both $g_1$ and $g_2$ are continuous on their domains.

  27. Therefore, taking the limit $\alpha \to 0^+$, it follows that

    $$g_1(0) \leq t g_2(0) + (1-t) f(y) - \frac{\sigma t (1-t)}{2} \| x - y \|^2.$$
  28. Now $g_1(0) = f(t x + (1-t) y)$ and $g_2(0) = f(x)$.

  29. Thus,

    $$f(t x + (1-t) y) \leq t f(x) + (1-t) f(y) - \frac{\sigma t (1-t)}{2} \| x - y \|^2.$$
  30. This establishes that $f$ is indeed $\sigma$-strongly convex.

(1) $\Rightarrow$ (3)

  1. We are given that $f$ is $\sigma$-strongly convex.

  2. Let $x, y \in \operatorname{dom}(\partial f)$.

  3. Pick any $g_x \in \partial f(x)$ and $g_y \in \partial f(y)$.

  4. Let $t \in [0, 1)$ and denote $x_t = t x + (1-t) y$.

  5. By the hypothesis,

    $$f(x_t) \leq t f(x) + (1-t) f(y) - \frac{\sigma t (1-t)}{2} \| x - y \|^2.$$
  6. This is the same as

    $$f(x_t) - f(x) \leq (1-t) [f(y) - f(x)] - \frac{\sigma t (1-t)}{2} \| x - y \|^2.$$
  7. We can see that $(1-t) \in (0, 1]$.

  8. Dividing both sides of the inequality by $(1-t)$, we obtain

    $$\frac{f(x_t) - f(x)}{1-t} \leq f(y) - f(x) - \frac{\sigma t}{2} \| x - y \|^2.$$
  9. Since $g_x \in \partial f(x)$, by the subgradient inequality,

    $$f(x_t) \geq f(x) + \langle x_t - x, g_x \rangle.$$
  10. We can rewrite this as

    $$\frac{f(x_t) - f(x)}{1-t} \geq \frac{\langle x_t - x, g_x \rangle}{1-t}.$$
  11. Note that $x_t - x = (1-t)(y - x)$.

  12. Thus,

    $$\frac{f(x_t) - f(x)}{1-t} \geq \langle y - x, g_x \rangle.$$
  13. Thus,

    $$\langle y - x, g_x \rangle \leq f(y) - f(x) - \frac{\sigma t}{2} \| x - y \|^2.$$
  14. This inequality holds for every $t \in [0, 1)$.

  15. Taking the limit $t \to 1^-$, we obtain

    $$\langle y - x, g_x \rangle \leq f(y) - f(x) - \frac{\sigma}{2} \| x - y \|^2.$$
  16. Identical reasoning with the roles of $x$ and $y$ switched gives us

    $$\langle x - y, g_y \rangle \leq f(x) - f(y) - \frac{\sigma}{2} \| y - x \|^2.$$
  17. Adding these two inequalities gives us

    $$\langle x - y, g_y - g_x \rangle \leq -\sigma \| x - y \|^2.$$
  18. Multiplying both sides by $-1$ (and switching the inequality accordingly), we get

    $$\langle x - y, g_x - g_y \rangle \geq \sigma \| x - y \|^2$$

    as desired.

(3) $\Rightarrow$ (2)

  1. We are given that (3) is satisfied.

  2. Let $x \in \operatorname{dom}(\partial f)$, $y \in \operatorname{dom} f$ and $g \in \partial f(x)$.

  3. Pick any $z \in \operatorname{ri} \operatorname{dom} f$.

  4. Pick some $\alpha \in (0, 1)$.

  5. Define $\tilde{y} = (1-\alpha) y + \alpha z$.

  6. By the line segment property, $\tilde{y} \in \operatorname{ri} \operatorname{dom} f$.

  7. Define $x_t = (1-t) x + t \tilde{y}$.

  8. Consider the 1D function

    $$\varphi(t) = f(x_t), \quad t \in [0, 1].$$
  9. Pick any $t \in (0, 1)$.

  10. Then, by the line segment principle, $x_t \in \operatorname{ri} \operatorname{dom} f$.

  11. Due to Theorem 9.217, $\partial f(x_t) \neq \emptyset$ and $x_t \in \operatorname{dom}(\partial f)$.

  12. Take some $g_t \in \partial f(x_t)$.

  13. By the subgradient inequality,

    $$f(z) \geq f(x_t) + \langle z - x_t, g_t \rangle \quad \forall z \in \mathbb{V}.$$
  14. In particular, for $x_s = (1-s) x + s \tilde{y}$, we have

    $$\begin{aligned}
    f(x_s) &\geq f(x_t) + \langle (1-s) x + s \tilde{y} - (1-t) x - t \tilde{y}, g_t \rangle \\
    \iff \varphi(s) &\geq \varphi(t) + \langle (s - t)(\tilde{y} - x), g_t \rangle \\
    \iff \varphi(s) &\geq \varphi(t) + (s - t) \langle \tilde{y} - x, g_t \rangle.
    \end{aligned}$$
  15. Since this is valid for every $s$, we have $\langle \tilde{y} - x, g_t \rangle \in \partial \varphi(t)$.

  16. Applying the mean value theorem (Theorem 9.234),

    $$f(\tilde{y}) - f(x) = \varphi(1) - \varphi(0) = \int_0^1 \langle \tilde{y} - x, g_t \rangle \, dt.$$
  17. Since $g \in \partial f(x)$ and $g_t \in \partial f(x_t)$, applying the hypothesis (3), we get

    $$\langle x_t - x, g_t - g \rangle \geq \sigma \| x_t - x \|^2.$$
  18. But $x_t - x = t (\tilde{y} - x)$.

  19. Hence

    $$t \langle \tilde{y} - x, g_t - g \rangle \geq \sigma t^2 \| \tilde{y} - x \|^2.$$
  20. This simplifies to

    $$\langle \tilde{y} - x, g_t \rangle \geq \langle \tilde{y} - x, g \rangle + \sigma t \| \tilde{y} - x \|^2.$$

    Canceling $t$ on both sides doesn’t change the direction of the inequality since $t > 0$.

  21. Applying the inequality to the integral above,

    $$f(\tilde{y}) - f(x) \geq \int_0^1 \left[ \langle \tilde{y} - x, g \rangle + \sigma t \| \tilde{y} - x \|^2 \right] dt.$$
  22. Integrating, we get

    $$f(\tilde{y}) - f(x) \geq \langle \tilde{y} - x, g \rangle + \frac{\sigma}{2} \| \tilde{y} - x \|^2.$$
  23. Expanding $\tilde{y}$, for any $\alpha \in (0, 1)$ we have

    $$f((1-\alpha) y + \alpha z) \geq f(x) + \langle (1-\alpha) y + \alpha z - x, g \rangle + \frac{\sigma}{2} \| (1-\alpha) y + \alpha z - x \|^2.$$
  24. The 1D function $h(\alpha) = f((1-\alpha) y + \alpha z)$ is continuous, again due to Theorem 9.173.

  25. Taking the limit $\alpha \to 0^+$ on both sides, we obtain

    $$f(y) \geq f(x) + \langle y - x, g \rangle + \frac{\sigma}{2} \| y - x \|^2,$$

    which is the desired result.

9.18.2.6. Minimization#

Theorem 9.272 (Existence and uniqueness of a minimizer of a closed strongly convex function)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and $\sigma$-strongly convex function with $\sigma > 0$. Then,

  1. $f$ has a unique minimizer $a \in \operatorname{dom} f$; i.e., $f(x) > f(a)$ for every $x \in \operatorname{dom} f$ with $x \neq a$.

  2. The increase in the value of $f$ w.r.t. its minimum satisfies

    $$f(x) - f(a) \geq \frac{\sigma}{2} \| x - a \|^2$$

    where $a \in \operatorname{dom} f$ is the unique minimizer of $f$.

Proof. (1) Existence of the minimizer

  1. Since $f$ is proper and convex, $\operatorname{dom} f$ is nonempty and convex.

  2. Since $\operatorname{dom} f$ is nonempty and convex, its relative interior is nonempty (Theorem 9.142).

  3. Pick $y \in \operatorname{ri} \operatorname{dom} f$.

  4. By Theorem 9.214, $\partial f(y)$ is nonempty.

  5. Pick some $g \in \partial f(y)$.

  6. Then, by property 2 of Theorem 9.271,

    $$f(x) \geq f(y) + \langle x - y, g \rangle + \frac{\sigma}{2} \| x - y \|^2$$

    holds true for every $x \in \mathbb{V}$.

  7. Let $\| \cdot \|_2$ denote the Euclidean norm associated with the inner product of the space $\mathbb{V}$. This might be different from the endowed norm $\| \cdot \|$.

  8. Since all norms in a finite dimensional space are equivalent, there exists a constant $C > 0$ such that

    $$\| z \|^2 \geq C \| z \|_2^2$$

    for every $z \in \mathbb{V}$.

  9. Therefore,

    $$f(x) \geq f(y) + \langle x - y, g \rangle + \frac{\sigma C}{2} \| x - y \|_2^2 \quad \forall x \in \mathbb{V}.$$
  10. Completing the square, this in turn is the same as

    $$f(x) \geq f(y) - \frac{1}{2 C \sigma} \| g \|_2^2 + \frac{C \sigma}{2} \left\| x - \left( y - \frac{1}{C \sigma} g \right) \right\|_2^2 \quad \forall x \in \mathbb{V}.$$
  11. Let $S_t$ denote the sublevel set $\{ x \mid f(x) \leq t \}$.

  12. Consider the sublevel set $S_{f(y)}$.

  13. Let $x \in S_{f(y)}$.

  14. Then, $f(x) = f(y) - r$ for some $r \geq 0$.

  15. But then

    $$f(y) - r \geq f(y) - \frac{1}{2 C \sigma} \| g \|_2^2 + \frac{C \sigma}{2} \left\| x - \left( y - \frac{1}{C \sigma} g \right) \right\|_2^2.$$
  16. This simplifies to

    $$r \leq \frac{1}{2 C \sigma} \| g \|_2^2 - \frac{C \sigma}{2} \left\| x - \left( y - \frac{1}{C \sigma} g \right) \right\|_2^2.$$
  17. Since $r$ must be nonnegative, the R.H.S. must be nonnegative also.

  18. Thus, we require that

    $$\frac{1}{2 C \sigma} \| g \|_2^2 \geq \frac{C \sigma}{2} \left\| x - \left( y - \frac{1}{C \sigma} g \right) \right\|_2^2.$$
  19. This simplifies to

    $$\left\| x - \left( y - \frac{1}{C \sigma} g \right) \right\|_2 \leq \frac{1}{C \sigma} \| g \|_2.$$
  20. In other words, $x$ must belong to the $\ell_2$ closed ball given by

    $$B_2 \left[ y - \frac{1}{C \sigma} g, \frac{1}{C \sigma} \| g \|_2 \right].$$
  21. Since this is valid for every $x \in S_{f(y)}$,

    $$S_{f(y)} \subseteq B_2 \left[ y - \frac{1}{C \sigma} g, \frac{1}{C \sigma} \| g \|_2 \right].$$
  22. Since $f$ is closed, all its sublevel sets are closed.

  23. Since $S_{f(y)}$ is contained in a ball, $S_{f(y)}$ is bounded.

  24. Thus, $S_{f(y)}$ is closed and bounded.

  25. Since $\mathbb{V}$ is finite dimensional, $S_{f(y)}$ is compact.

  26. $S_{f(y)}$ is also nonempty since $y \in S_{f(y)}$.

  27. Thus, the problem of minimizing $f$ over $\operatorname{dom} f$ reduces to the problem of minimizing $f$ over the nonempty compact set $S_{f(y)}$.

  28. Since $f$ is closed, it is also lower semicontinuous.

  29. By Theorem 3.121, $f$ attains a minimum on $S_{f(y)}$ at some point $a \in S_{f(y)}$.

  30. Thus, we have established the existence of a minimizer of $f$ at some $a \in S_{f(y)} \subseteq \operatorname{dom} f$.

(1) Uniqueness of the minimizer

  1. To show uniqueness, assume for contradiction that $u$ and $v$ are two different minimizers of $f$ with $f(u) = f(v) = p$, the optimal value.

  2. Let $w = \frac{1}{2} u + \frac{1}{2} v$.

  3. We must have $f(w) \geq p$.

  4. By strong convexity of $f$,

    $$f(w) \leq \frac{1}{2} f(u) + \frac{1}{2} f(v) - \frac{\sigma}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \| u - v \|^2 = p - \frac{\sigma}{8} \| u - v \|^2.$$
  5. If $u \neq v$, then $f(w) < p$; a contradiction.

  6. Hence, the minimizer must be unique.

(2) Increase in value of f

  1. Let $a$ be the unique minimizer of $f$.

  2. By Fermat’s optimality condition, $0 \in \partial f(a)$.

  3. Since $f$ is $\sigma$-strongly convex, by property (2) in Theorem 9.271,

    $$f(x) - f(a) \geq \langle x - a, 0 \rangle + \frac{\sigma}{2} \| x - a \|^2 = \frac{\sigma}{2} \| x - a \|^2$$

    holds true for any $x \in \operatorname{dom} f$.
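The growth bound of part (2) can be observed numerically on a strongly convex quadratic, whose minimizer is available in closed form (a NumPy sketch; the matrix construction, seed, and tolerance are illustrative, not from the text):

```python
import numpy as np

# Theorem part (2): for sigma-strongly convex f with minimizer a,
#   f(x) - f(a) >= (sigma/2) ||x - a||^2.
# For f(x) = (1/2) x^T A x + b^T x the minimizer is a = -A^{-1} b.
rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)             # symmetric positive definite
b = rng.standard_normal(5)
sigma = np.linalg.eigvalsh(A)[0]    # strong convexity parameter

f = lambda x: 0.5 * x @ A @ x + b @ x
a = -np.linalg.solve(A, b)          # unique minimizer (grad f(a) = 0)

for _ in range(200):
    x = rng.standard_normal(5) * 3
    assert f(x) - f(a) >= (sigma / 2) * np.linalg.norm(x - a)**2 - 1e-9
```

For this quadratic, $f(x) - f(a) = \frac{1}{2}(x-a)^T A (x-a)$, so the bound is tight along a $\lambda_{\min}$ eigenvector of $A$.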

9.18.3. Smoothness and Strong Convexity#

9.18.3.1. The Conjugate Correspondence Theorem#

The notions of smoothness and strong convexity are closely connected. Roughly speaking, a function is strongly convex if and only if its conjugate is smooth.

Theorem 9.273 (Conjugate correspondence theorem)

Let σ>0. Then

  1. If f:VR is a 1σ-smooth convex function, then f is σ-strongly convex w.r.t. the dual norm .

  2. If f:V(,] is a proper, closed σ-strongly convex function, then f:VR is 1σ-smooth.

Proof. (1) Smooth convex to strongly convex conjugate

  1. We are given that $f : \mathbb{V} \to \mathbb{R}$ is a $\frac{1}{\sigma}$-smooth convex function.

  2. Due to Theorem 9.239, $f^*$ is closed and convex.

  3. Since $f$ is proper and convex, due to Theorem 9.240, $f^*$ is proper.

  4. Thus $f^*$ is a proper, closed and convex function.

  5. Pick any $y_1, y_2 \in \operatorname{dom}(\partial f^*)$.

  6. Let $v_1 \in \partial f^*(y_1)$ and $v_2 \in \partial f^*(y_2)$.

  7. Since $f$ is proper and convex, by the conjugate subgradient theorem (Theorem 9.246),

    $$y_1 \in \partial f(v_1) \text{ and } y_2 \in \partial f(v_2).$$
  8. Since $f$ is smooth, it is differentiable. Hence, due to Theorem 9.220,

    $$y_1 = \nabla f(v_1) \text{ and } y_2 = \nabla f(v_2).$$
  9. Following the characterization of smoothness (Theorem 9.265), by its property 4 (with $L = \frac{1}{\sigma}$),

    $$\langle v_1 - v_2, y_1 - y_2 \rangle \geq \sigma \| y_1 - y_2 \|_*^2.$$
  10. Since the last inequality holds for any $y_1, y_2 \in \operatorname{dom}(\partial f^*)$ and any $v_1 \in \partial f^*(y_1), v_2 \in \partial f^*(y_2)$, following the first order characterization of strong convexity in Theorem 9.271, $f^*$ is a $\sigma$-strongly convex function w.r.t. $\| \cdot \|_*$.

(2) Strongly convex to smooth conjugate

  1. We are given that $f$ is proper, closed and $\sigma$-strongly convex.

  2. Pick any $y \in \mathbb{V}^*$.

  3. The conjugate is given by

    $$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \}.$$
  4. Define $g(x) = f(x) - \langle x, y \rangle$.

  5. We can see that

    $$f^*(y) = - \inf_{x \in \mathbb{V}} g(x).$$
  6. Due to the sum rule (Theorem 9.270), $g$ is $\sigma$-strongly convex.

  7. Due to Theorem 9.272, $g$ has a unique minimizer.

  8. Hence $f^*(y)$ is finite.

  9. Since this is valid for any $y \in \mathbb{V}^*$, we have $\operatorname{dom} f^* = \mathbb{V}^*$.

  10. This justifies the signature $f^* : \mathbb{V}^* \to \mathbb{R}$; i.e., $f^*$ is real valued.

  11. Let us continue with an arbitrary $y$.

  12. Since $\operatorname{dom} f^* = \mathbb{V}^*$, we have $y \in \operatorname{int} \operatorname{dom} f^*$.

  13. Now, by the second formulation of the conjugate subgradient theorem (Corollary 9.26),

    $$\partial f^*(y) = \operatorname{argmax}_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \}.$$
  14. We can see that

    $$\partial f^*(y) = \operatorname{argmin}_{x \in \mathbb{V}} g(x).$$
  15. Since $g$ has a unique minimizer, $\partial f^*(y)$ is a singleton.

  16. Due to Theorem 9.220, $f^*$ is differentiable at $y$.

  17. Since $y$ is arbitrary, $f^*$ is differentiable over the entire $\mathbb{V}^*$.

  18. We now pick two points $y_1, y_2 \in \mathbb{V}^*$ and denote $v_1 = \nabla f^*(y_1), v_2 = \nabla f^*(y_2)$.

  19. By the conjugate subgradient theorem (Theorem 9.246), this is equivalent to $y_1 \in \partial f(v_1)$ and $y_2 \in \partial f(v_2)$.

  20. Following the first order characterization of strong convexity in Theorem 9.271,

    $$\langle v_1 - v_2, y_1 - y_2 \rangle \geq \sigma \| v_1 - v_2 \|^2.$$
  21. In other words,

    $$\langle \nabla f^*(y_1) - \nabla f^*(y_2), y_1 - y_2 \rangle \geq \sigma \| \nabla f^*(y_1) - \nabla f^*(y_2) \|^2.$$
  22. By the generalized Cauchy–Schwarz inequality (Theorem 4.108),

    $$\langle \nabla f^*(y_1) - \nabla f^*(y_2), y_1 - y_2 \rangle \leq \| \nabla f^*(y_1) - \nabla f^*(y_2) \| \, \| y_1 - y_2 \|_*.$$
  23. Thus, the previous inequality simplifies to

    $$\| \nabla f^*(y_1) - \nabla f^*(y_2) \| \leq \frac{1}{\sigma} \| y_1 - y_2 \|_*.$$
  24. This establishes that $f^*$ is $\frac{1}{\sigma}$-smooth.
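A 1D numerical illustration of the conjugate correspondence (a NumPy sketch; the choice $f(x) = \frac{\sigma}{2} x^2$, the grids, and tolerances are illustrative, not from the text). Here $f$ is $\sigma$-strongly convex and its conjugate $f^*(y) = \frac{y^2}{2\sigma}$ is $\frac{1}{\sigma}$-smooth:

```python
import numpy as np

# f(x) = (sigma/2) x^2 is sigma-strongly convex; its conjugate should
# be f*(y) = y^2 / (2 sigma), which is (1/sigma)-smooth. We compute f*
# by brute-force maximization over a fine grid.
sigma = 2.0
xs = np.linspace(-10, 10, 20001)
f_vals = (sigma / 2) * xs**2

ys = np.linspace(-5, 5, 101)
conj = np.array([np.max(y * xs - f_vals) for y in ys])

# Matches the closed form y^2 / (2 sigma)...
assert np.allclose(conj, ys**2 / (2 * sigma), atol=1e-4)

# ...whose derivative y / sigma is (1/sigma)-Lipschitz, so the slope
# of the numerical gradient never exceeds 1/sigma (up to grid noise).
num_grad = np.gradient(conj, ys)
assert np.max(np.abs(np.diff(num_grad) / np.diff(ys))) <= 1 / sigma + 1e-2
```

The brute-force supremum is attained at $x = y/\sigma$, which lies well inside the grid for the chosen range of $y$.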

9.18.4. Examples#

Example 9.84 (Smoothness of $\sqrt{1 + \| \cdot \|_2^2}$)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$f(x) = \sqrt{1 + \| x \|_2^2}.$$

$f$ is 1-smooth w.r.t. the $\ell_2$-norm.

  1. Note that for any $x \in \mathbb{R}^n$, the gradient is given by

    $$\nabla f(x) = \frac{x}{\sqrt{1 + \| x \|_2^2}}.$$
  2. The Hessian is given by

    $$\nabla^2 f(x) = \frac{I}{\sqrt{1 + \| x \|_2^2}} - \frac{x x^T}{(1 + \| x \|_2^2)^{3/2}} \preceq \frac{I}{\sqrt{1 + \| x \|_2^2}} \preceq I.$$
  3. Therefore, $\lambda_{\max}(\nabla^2 f(x)) \leq 1$ for every $x \in \mathbb{R}^n$.

  4. Hence, by Corollary 9.27, $f$ is 1-smooth w.r.t. the $\ell_2$-norm.
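Both claims of this example can be spot-checked numerically (a NumPy sketch; the dimension, seed, and tolerances are illustrative, not from the text):

```python
import numpy as np

# Check that the gradient of f(x) = sqrt(1 + ||x||^2) is 1-Lipschitz
# and that every Hessian eigenvalue is at most 1.
grad = lambda x: x / np.sqrt(1 + x @ x)

rng = np.random.default_rng(7)
for _ in range(300):
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    # 1-Lipschitz gradient (1-smoothness w.r.t. the 2-norm)
    assert np.linalg.norm(grad(x) - grad(y)) <= np.linalg.norm(x - y) + 1e-12
    # Hessian from the closed form in the example
    H = np.eye(6) / np.sqrt(1 + x @ x) - np.outer(x, x) / (1 + x @ x)**1.5
    assert np.linalg.eigvalsh(H)[-1] <= 1 + 1e-12
```

The Hessian expression in the loop is exactly the one derived in step 2 of the example.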

9.18.4.1. Log-Sum-Exp#

Example 9.85 (Smoothness of log-sum-exp)

Consider the log-sum-exp function $f : \mathbb{R}^n \to \mathbb{R}$ given by

$$f(x) = \ln \left( \sum_{i=1}^n e^{x_i} \right).$$

$f$ is 1-smooth w.r.t. the $\ell_2$ and $\ell_\infty$ norms.

Smoothness w.r.t. the $\ell_2$ norm

  1. The partial derivatives of $f$ are

    $$\frac{\partial f}{\partial x_i}(x) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}}.$$
  2. The second order partial derivatives are

    $$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \begin{cases}
    - \dfrac{e^{x_i} e^{x_j}}{\left( \sum_{k=1}^n e^{x_k} \right)^2}, & i \neq j; \\
    - \dfrac{e^{x_i} e^{x_i}}{\left( \sum_{k=1}^n e^{x_k} \right)^2} + \dfrac{e^{x_i}}{\sum_{k=1}^n e^{x_k}}, & i = j.
    \end{cases}$$
  3. The Hessian can be written as

    $$\nabla^2 f(x) = \operatorname{diag}(w) - w w^T$$

    where $w_i = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}}$.

  4. We can now see that

    $$\nabla^2 f(x) = \operatorname{diag}(w) - w w^T \preceq \operatorname{diag}(w) \preceq I.$$
  5. Hence $\lambda_{\max}(\nabla^2 f(x)) \leq 1$ for every $x \in \mathbb{R}^n$.

  6. Hence, by Corollary 9.27, $f$ is 1-smooth w.r.t. the $\ell_2$-norm.

Smoothness w.r.t. the $\ell_\infty$ norm

  1. We first show that for any $v \in \mathbb{R}^n$,

    $$v^T \nabla^2 f(x) v \leq \| v \|_\infty^2.$$
  2. To see this, we expand the L.H.S. as

    $$v^T \nabla^2 f(x) v = v^T (\operatorname{diag}(w) - w w^T) v = v^T \operatorname{diag}(w) v - (w^T v)^2 \leq v^T \operatorname{diag}(w) v = \sum_{i=1}^n w_i v_i^2 \leq \| v \|_\infty^2 \sum_{i=1}^n w_i = \| v \|_\infty^2.$$
  3. Since $f$ is twice differentiable over $\mathbb{R}^n$, by the linear approximation theorem (Theorem 5.5), for any $x, y \in \mathbb{R}^n$, there exists $z \in [x, y]$ such that

    $$f(y) - f(x) = \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(z) (y - x).$$
  4. Let $v = y - x$.

  5. Then from the above,

    $$(y - x)^T \nabla^2 f(z) (y - x) \leq \| v \|_\infty^2.$$
  6. Putting this back in the approximation, we have

    $$f(y) \leq f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} \| y - x \|_\infty^2.$$
  7. Following the characterization of smoothness (Theorem 9.265), $f$ is indeed 1-smooth w.r.t. the $\ell_\infty$-norm.
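Both smoothness claims can be checked numerically through the Hessian identity $\nabla^2 f(x) = \operatorname{diag}(w) - w w^T$ (a NumPy sketch; the dimension, seed, and tolerances are illustrative, not from the text):

```python
import numpy as np

# Numeric check of the two smoothness claims for log-sum-exp:
#   lambda_max(Hessian) <= 1      (2-norm smoothness), and
#   v^T H v <= ||v||_inf^2        (infinity-norm smoothness).
def softmax(x):
    e = np.exp(x - x.max())       # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(8)
for _ in range(200):
    x = rng.uniform(-5, 5, size=7)
    w = softmax(x)
    H = np.diag(w) - np.outer(w, w)          # Hessian of log-sum-exp
    assert np.linalg.eigvalsh(H)[-1] <= 1 + 1e-12
    v = rng.standard_normal(7)
    assert v @ H @ v <= np.linalg.norm(v, np.inf)**2 + 1e-12
```

The vector `w` is exactly the softmax weight vector appearing in step 3 of the $\ell_2$ argument.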

9.18.4.2. Negative Entropy#

Example 9.86 (Strong convexity of negative entropy over the unit simplex)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be given by:

$$f(x) = \begin{cases} \sum_{i=1}^n x_i \ln x_i, & x \in \Delta_n; \\ \infty, & \text{otherwise}. \end{cases}$$

$f$ is 1-strongly convex w.r.t. both the $\ell_1$ and $\ell_2$ norms.

  1. By Theorem 9.257, its conjugate is given by

    $$f^*(y) = \ln \left( \sum_{j=1}^n e^{y_j} \right),$$

    which is the log-sum-exp function.

  2. By Example 9.85, the log-sum-exp function is 1-smooth w.r.t. both the $\ell_2$ and $\ell_\infty$ norms.

  3. Hence, by the conjugate correspondence theorem (Theorem 9.273), $f$ is 1-strongly convex w.r.t. both the $\ell_2$ and $\ell_1$ norms (the duals of the $\ell_2$ and $\ell_\infty$ norms, respectively).
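The conjugacy claim in step 1 can be verified numerically (a NumPy sketch; the dimension, seed, and sampling range are illustrative, not from the text). The supremum of $\langle x, y \rangle - \sum_i x_i \ln x_i$ over the simplex is attained at $x = \operatorname{softmax}(y)$:

```python
import numpy as np

# Check that the conjugate of negative entropy over the unit simplex
# is log-sum-exp: the maximizer of <x, y> - sum_i x_i ln x_i over the
# simplex is x = softmax(y), and the maximum value is ln(sum_i e^{y_i}).
def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

rng = np.random.default_rng(9)
for _ in range(100):
    y = rng.uniform(-3, 3, size=5)
    x = softmax(y)                        # maximizer (strictly positive)
    val = x @ y - np.sum(x * np.log(x))   # <x, y> minus negative entropy
    assert np.isclose(val, np.log(np.sum(np.exp(y))))
```

Substituting $x_i = e^{y_i} / \sum_k e^{y_k}$ into the objective collapses it to $\ln \sum_k e^{y_k}$ exactly, which is what the assertion checks.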