9.9. Differentiability and Convex Functions#

9.9.1. First Order Conditions#

Let us look at the special case of real valued functions over Rn which are differentiable.

Theorem 9.98 (First order characterization of convexity)

Let f:RnR be a real valued function which is differentiable at each point in domf which is open.

Then f is convex if and only if domf is convex and

(9.4)#f(y)f(x)+f(x)T(yx)

holds true for all x,ydomf.

Proof. To prove (9.4), we first show that a differentiable real function f:RR is convex if and only if

f(y)f(x)+f(x)(yx)

holds true for all x,ydomf.

Assume that f is convex. Hence, domf is convex too.

  1. Let x,ydomf.

  2. Since domf is convex, hence (1t)x+ty=x+t(yx)domf for all t[0,1].

  3. By convexity of f, we have:

    f(x+t(yx))(1t)f(x)+tf(y).
  4. If we divide by t on both sides, we obtain:

    f(y)f(x)+f(x+t(yx))f(x)t.
  5. Taking the limit as t0+, we obtain:

    f(y)f(x)+f(x)(yx).

For the converse, assume that domf is convex and

f(y)f(x)+f(x)(yx)

holds true for all x,ydomf.

  1. Recall that in R the only convex sets are intervals. Thus, domf is an open interval.

  2. Choose any x,ydomf such that xy.

  3. Choose t[0,1].

  4. Let z=tx+(1t)y.

  5. By hypothesis, we have:

    f(x)f(z)+f(z)(xz)

    and

    f(y)f(z)+f(z)(yz).
  6. Multiplying the first inequality with t and second with (1t) and adding them yields:

    tf(x)+(1t)f(y)f(z)=f(tx+(1t)y).
  7. Thus, f is convex.

We now prove for the general case with f:RnR. Recall from Theorem 9.70 that for any x,ydomf the restriction of f on the line passing through x and y is given by:

g(t)=f(ty+(1t)x)=f(x+t(yx)).

Note that, by chain rule (Example 5.11):

g(t)=f(ty+(1t)x)T(yx)

Assume f is convex.

  1. Let x,ydomf such that xy.

  2. Let g be the restriction of f on the line passing through x,y as described above.

  3. Due to Theorem 9.70, g is convex.

  4. By the argument for real functions above:

    g(t)g(t)+g(t)(tt)

    holds true for all t,tdomg.

  5. In particular, with t=1 and t=0, we have:

    g(1)g(0)+g(0).
  6. But g(0)=f(x)T(yx).

  7. Also, g(1)=f(y) and g(0)=f(x).

  8. Thus, we get:

    f(y)f(x)+f(x)T(yx)

    as desired.

For the converse, assume that this inequality holds for all x,ydomf and domf is convex.

  1. Pick some x,ydomf with xy.

  2. Let g be the restriction of f on the line passing through x,y as described above.

  3. Pick t1,t2domg.

  4. Then, z1=t1y+(1t1)x and z2=t2y+(1t2)x are in domf.

  5. Consider g(t1)=f(t1y+(1t1)x)=f(z1) and g(t2)=f(t2y+(121)x)=f(z2).

  6. Note that g(t2)=f(t2y+(1t2)x)T(yx)=f(z2)T(yx).

  7. By hypothesis, we have:

    f(z1)f(z2)+f(z2)T(z1z2).
  8. But z1z2=(t1t2)(yx).

  9. Thus, we get:

    g(t1)g(t2)+g(t2)(t1t2).
  10. This holds for every t1,t2domg.

  11. But then, g is convex by previous argument for real functions.

  12. Since this is valid for every restriction of f to a line passing through its domain, hence by Theorem 9.70 f is convex.

9.9.2. Second Order Conditions#

For functions which are twice differentiable, convexity can be expressed in terms of the positive-semidefiniteness of their Hessian matrices.

We start with a result on convexity of real functions on open intervals.

Theorem 9.99 (Convexity characterization for twice differentiable real functions on open intervals)

Let f:RR be twice continuously differentiable on an open interval (α,β); i.e., second derivative f exists and is continuous at every point the open interval (α,β).

Then, f is convex if and only if its second derivative f is non-negative for every x(α,β):

f(x)0x(α,β).

Proof. Assume that f is nonnegative on (α,β).

  1. Then, f is nondecreasing on (α,β).

  2. For any x,y(α,β) with x<y and r(0,1), let z=(1r)x+ry.

  3. We have z(x,y); i.e. x<z<y. Consequently,

    f(z)f(x)=xzf(t)dtf(z)(zx);f(y)f(z)=zyf(t)dtf(z)(yz).
  4. Since zx=r(yx) and yz=(1r)(yx), we have

    f(z)f(x)+rf(z)(yx);f(z)f(y)(1r)f(z)(yx).

    We wish to eliminate f(z) from these inequalities.

  5. Multiplying the two inequalities by (1r) and r respectively, and adding them together, we obtain:

    (1r)f(z)+rf(z)(1r)f(x)+rf(y).
  6. But (1r)f(z)+rf(z)=f(z)=f((1r)x+ry).

  7. Thus, f((1r)x+ry)(1r)f(x)+rf(y).

  8. This inequality is valid for the case where x>y also.

  9. Thus, f is convex over (α,β).

For the converse, assume that f is not non-negative on (α,β).

  1. Then, since f is continuous in (α,β), hence f is negative in some subinterval (α,β).

  2. Choose x,y such that α<x<y<β. Choose some r(0,1).

  3. Following an argument parallel to above, we have

    f((1r)x+ry)>(1r)f(x)+rf(y).
  4. Thus, there exist x,y(α,β) where the inequality (9.1) is not valid.

  5. Consequently, f is non-convex.

We continue further with real valued functions over Rn which are twice differentiable.

Theorem 9.100 (Second order characterization of convexity in Euclidean spaces)

Let f:RnR be twice continuously differentiable; i.e., its Hessian or second derivative 2f exists at every point in domf which is open.

Then, f is convex if and only if domf is convex and its Hessian is positive semidefinite for every xdomf:

2f(x)Oxdomf.

Proof. The convexity of f on its domain C=domf is equivalent to the convexity of the restriction of f to each line segment in C due to Theorem 9.70.

We first note that if f is convex then C is convex and if C is not convex, then f is not convex. So, for the rest of the argument, we shall assume that C is convex.

Consequently, for any yC and a nonzero zRn the intersection of the line {x=y+tz|tR} and C is an open line segment as C is open and convex.

  1. Let yC.

  2. Let zRn be an arbitrary (nonzero) direction.

  3. Let L={x=y+tz|tR} be a line passing through y in the direction z.

  4. Consider the open real interval S={t|y+tzC}. Since LC is an open line segment in Rn, hence S is indeed an open interval in R.

  5. Consider the parameterized restriction of f on the open interval S as:

    g(t)=f(y+tz),tS.
  6. A simple calculation shows that

    g(t)=z,2f(x)z

    where x=y+tz.

  7. By Theorem 9.99, g is convex for each yC and nonzero zRn if and only if z,2f(x)z0 for every zRn and xC.

  8. Thus, f is convex if and only if 2f(x)OxC.

For real functions, the Hessian is simply the second derivative f.

Corollary 9.6 (Second order characterization of concavity)

Let f:RnR be twice continuously differentiable; i.e., its Hessian or second derivative 2f exists at every point in domf which is open.

Then, f is concave if and only if domf is convex and its Hessian is negative semidefinite for every xdomf:

2f(x)Oxdomf.

Example 9.40 (Convexity of a quadratic function)

Let PSn be a symmetric matrix. Let qRn and rR. Consider the quadratic functional f:RnR given as:

f(x)=12xTPx+qTx+r.

As shown in Example 5.13, the Hessian of f is:

2f(x)=PxRn.

Thus, f is convex if and only if PO (i.e., it is positive semidefinite).

In fact f is strictly convex if and only if PsuccO.

Example 9.41 (Identity is convex and concave)

Let f:RR be:

f(x)=x.

We have f(x)=1 and f(x)=0.

f is both convex and concave.

Example 9.42 (Exponential is convex)

Let f:RR be:

f(x)=eax

with domf=R.

We have f(x)=aeax and f(x)=a2eax.

For any a,xR, a2eax>0. Thus, f is strictly convex.

Example 9.43 (Powers)

Let f:RR be:

f(x)=xa

with domf=R++.

Now, f(x)=axa1 and f(x)=a(a1)xa2.

  1. We have x>0.

  2. For a1, f(x)0. f is convex for a1.

  3. For a0, a(a1)0. Thus, f(x)0. f is convex for a0.

  4. For 0a1, a(a1)0. Thus, f(x)0. f is concave on 0a1.

Example 9.44 (Reciprocal powers)

Let f:RR be:

f(x)=1xr=xr.

with domf=R++.

Now, f(x)=(r)xr1 and f(x)=(r)(r1)xr2=r(r+1)x(r+2).

  1. We have x>0.

  2. For r0, f(x)0. f is convex for r0.

Example 9.45 (Logarithm is concave)

Let f:RR be:

f(x)=lnx.

with domf=R++.

Now, f(x)=1x and f(x)=1x2.

  1. f(x)<0 for all x>0.

  2. Thus, f is concave for all x>0.

Example 9.46 (Negative entropy is convex)

Let f:RR be:

f(x)=xlnx.

with domf=R++.

Now, f(x)=lnx+1 and f(x)=1x.

  1. f(x)>0 for all x>0.

  2. Thus, f is convex for all x>0.

Example 9.47 (Quadratic over linear form is convex)

Let f:R×RR be given by:

f(x,y)=x2y

with domf={(x,y)|y>0}.

From Example 5.16, the Hessian is:

2f(x,y)=2y3[y2xyxyx2]=2y3[yx][yx]T.

Recall that for any xRn, the matrix xxT is positive semi-definite. Hence,

[yx][yx]T

is positive semi-definite.

For y>0, 2y3>0. Combining:

2f(x,y)O.

Thus, f is convex.

Example 9.48 (Log sum exponential is convex)

Let f:RnR be given by:

f(x)=ln(i=1nexi)

with domf=Rn.

From Example 5.14, we have

2f(x)=1(1Tz)2((1Tz)diag(z)zzT)

where

z=[ex1exn].

To show that 2f(x) is p.s.d., it suffices to show that (1Tz)diag(z)zzT is p.s.d..

Now for any vRn.

vT((1Tz)diag(z)zzT)v=(1Tz)(vTdiag(z)v)vTzzTv=(1Tz)(vTdiag(z)v)(vTz)2=(i=1nzi)(i=1nvi2zi)(i=1nvizi)2.

If we define vectors a and b with ai=vizi and bi=zi, then by Cauchy-Schwartz inequality , we have:

(aTa)(bTb)(aTb)2(aTa)(bTb)(aTb)20.

But this is exactly the expression above. Thus, 2f(x)O.

Hence, f is convex.

Example 9.49 (Log determinant function is concave)

Let f:SnR be:

f(X)=logdetX.

with domf=S++n (the set of symmetric positive definite matrices).

Let any line in Sn be given by:

X=Z+tV

where Z,VSn.

Consider the restriction of f on a line:

g(t)=logdet(Z+tV)

to the interval of values where Z+tVsuccO (since domf=S++n ). In other words,

domg={tR|Z+tVsuccO}.

Without any loss of generality, we can assume that t=0domg; i.e. ZsuccO.

Recall that:

  1. det(AB)=det(A)det(B) for square matrices.

  2. det(A)=i=1nλi for symmetric matrices with λi being their eigen values.

  3. If λi are eigen values of A, then the eigen values of I+tA are 1+tλi.

Now

g(t)=logdet(Z+tV)=logdet(Z12(Z12+tZ12V))=logdet(Z12(I+tZ12VZ12)Z12)=logdet(Z12)+logdet(I+tZ12VZ12)+logdet(Z12)=logdet(Z)+logdet(I+tZ12VZ12).
  1. Let λi be the eigen values of Z12VZ12.

  2. Then, 1+tλi are eigen values of I+tZ12VZ12.

  3. Thus, logdet(I+tZ12VZ12)=i=1nlogdet(1+tλi).

Thus,

g(t)=i=1nlogdet(1+tλi)+logdet(Z).

Note that logdet(Z) doesn’t depend on t. Similarly, λi only depend on Z and V, hence they don’t depend on t.

Differentiating g w.r.t. t, we get:

g(t)=i=1nλi1+tλi.

Differentiating again, we get:

g(t)=i=1nλi2(1+tλi)2.

Since g(t)0, hence f is concave.