9.16. Subgradients#

Primary references for this section are [7, 9].

Throughout this section, we assume that V, W are finite dimensional real vector spaces. Wherever necessary, they are equipped with a norm ‖·‖ or a real inner product ⟨·, ·⟩. They are also equipped with a metric d(x, y) = ‖x − y‖ as needed.

9.16.1. Subgradients#

Definition 9.72 (Subgradient)

Let f : V → (−∞, ∞] be a proper function. Let x ∈ dom f. A vector g ∈ V∗ is called a subgradient of f at x if

(9.6) f(y) ≥ f(x) + ⟨y − x, g⟩ ∀ y ∈ V.

This inequality is known as the subgradient inequality.

Here, we assume that f is an extended value extension whenever required; i.e., f(x) = ∞ for all x ∉ dom f.

As discussed in Theorem 4.106, the vector spaces V and V∗ are isomorphic. Therefore, we follow the convention that both V and V∗ have exactly the same elements. The primary difference between V and V∗ comes from the computation of the norm. If V is endowed with a norm ‖·‖, then V∗ is endowed with the dual norm ‖·‖∗.

In the arguments below, B[a, r] denotes the closed ball of radius r in the normed space (V, ‖·‖). The closed ball of radius r in the dual space (V∗, ‖·‖∗) shall be denoted by B∗[a, r]. Open balls shall be denoted similarly.

Observation 9.9 (Global affine underestimator)

If g is a subgradient of f at some x ∈ dom f, then the affine function a : V → R given by

a(y) = f(x) + ⟨y − x, g⟩ = ⟨y, g⟩ + f(x) − ⟨x, g⟩ ∀ y ∈ V

is a global affine underestimator for f. Note that the term f(x) − ⟨x, g⟩ is a constant since x ∈ V and g ∈ V∗ are fixed. The underestimation property follows from the subgradient inequality:

f(y) ≥ a(y) ∀ y ∈ V.
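As a quick numerical illustration, here is a minimal Python sketch; the function f(x) = |x|, the point x = 1 and the subgradient g = 1 are choices made here purely for illustration.

```python
import numpy as np

# Illustrative choices: f(x) = |x| with subgradient g = 1 at x = 1.
f = np.abs
x, g = 1.0, 1.0

# Affine underestimator a(y) = f(x) + <y - x, g>.
a = lambda y: f(x) + (y - x) * g

ys = np.linspace(-3.0, 3.0, 61)
# The subgradient inequality f(y) >= a(y) should hold everywhere.
assert np.all(f(ys) >= a(ys) - 1e-12)
```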

Observation 9.10 (Subgradient inequality alternative form)

For y ∉ dom f, f(y) = ∞. Thus, the subgradient inequality is trivially satisfied for all y ∉ dom f. An alternate form of the inequality is:

(9.7) f(y) ≥ f(x) + ⟨y − x, g⟩ ∀ y ∈ dom f.

9.16.1.1. Geometric Interpretation#

Observation 9.11 (Subgradient and supporting hyperplane)

Let f : V → (−∞, ∞] be a proper function. Then g is a subgradient of f at x if and only if epi f has a supporting hyperplane at (x, f(x)) with normal (g, −1).

First, let H be a supporting hyperplane of epi f at (x, f(x)) with normal (g, −1).

  1. Then

    H = {(y, t) | ⟨y, g⟩ − t = ⟨x, g⟩ − f(x)}.
  2. For any (y, f(y)) ∈ epi f, we must have

    ⟨y, g⟩ − f(y) ≤ ⟨x, g⟩ − f(x) ⟺ f(y) ≥ f(x) + ⟨y − x, g⟩.
  3. Then g is a subgradient of f at x.

Now let g be a subgradient of f at x.

  1. Let (y, t) ∈ epi f.

  2. Then we have

    t ≥ f(y) ≥ f(x) + ⟨y − x, g⟩.
  3. Rearranging the terms, we have

    ⟨y, g⟩ − t ≤ ⟨x, g⟩ − f(x)

    for every (y, t) ∈ epi f.

  4. Then the hyperplane

    H = {(y, t) | ⟨y, g⟩ − t = ⟨x, g⟩ − f(x)}

    is indeed a supporting hyperplane for epi f.

  5. The normal vector for this hyperplane is (g, −1) and it passes through the point (x, f(x)).

9.16.2. Subdifferential#

At a point x ∈ dom f, it is possible that there is more than one subgradient. It is thus natural to introduce the notion of the set of all subgradients of f at a specific point x ∈ dom f.

Definition 9.73 (Subdifferential set)

Let f : V → (−∞, ∞] be a proper function. The set of all subgradients of f at a point x ∈ dom f is called the subdifferential of f at x and is denoted by ∂f(x):

∂f(x) ≜ {g ∈ V∗ | f(y) ≥ f(x) + ⟨y − x, g⟩ ∀ y ∈ V}.

For all x ∉ dom f, we define ∂f(x) = ∅.

Theorem 9.210 (Subdifferential of norm at origin)

Let f : V → R be given by

f(x) = ‖x‖

where ‖·‖ is the norm endowed on V. Then, the subdifferential of f at x = 0 is given by the dual norm unit ball:

∂f(0) = B∗[0, 1] = {g ∈ V∗ | ‖g‖∗ ≤ 1}.

Proof. g ∈ ∂f(0) if and only if

f(y) ≥ f(0) + ⟨y − 0, g⟩ ∀ y ∈ V.

This reduces to:

‖y‖ ≥ ⟨y, g⟩ ∀ y ∈ V.

Maximizing both sides of this inequality over the set {y | ‖y‖ ≤ 1}, we obtain:

‖g‖∗ = sup_{‖y‖ ≤ 1} {⟨y, g⟩} ≤ sup_{‖y‖ ≤ 1} ‖y‖ = 1.

Thus, ‖g‖∗ ≤ 1 is a necessary condition.

We now show that ‖g‖∗ ≤ 1 is sufficient too. By the generalized Cauchy–Schwarz inequality:

⟨y, g⟩ ≤ ‖y‖ ‖g‖∗ ≤ ‖y‖ ∀ y ∈ V.

Thus, if ‖g‖∗ ≤ 1, then g is a subgradient.

Thus, the vectors that satisfy the subgradient inequality are exactly the same as those in B∗[0, 1].
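A numerical sanity check of this result for the ℓ1 norm (whose dual norm is ℓ∞) is sketched below; the random data is chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# f(x) = ||x||_1; its dual norm is ||.||_inf, so ∂f(0) = {g : ||g||_inf <= 1}.
g_in = rng.uniform(-1.0, 1.0, n)        # a point inside the dual unit ball
ys = rng.standard_normal((1000, n))
# Subgradient inequality at 0: ||y||_1 >= <y, g>.
assert np.all(np.abs(ys).sum(axis=1) >= ys @ g_in - 1e-12)

# A vector with ||g||_inf > 1 violates the inequality along a coordinate axis.
g_out = np.zeros(n); g_out[0] = 1.5
y_bad = np.zeros(n); y_bad[0] = 1.0     # ||y_bad||_1 = 1 < <y_bad, g_out> = 1.5
assert np.abs(y_bad).sum() < y_bad @ g_out
```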

The subdifferential of a function f may be empty at specific points xV.

Definition 9.74 (Subdifferentiability)

A proper function f : V → (−∞, ∞] is called subdifferentiable at some x ∈ dom f if ∂f(x) ≠ ∅.

Definition 9.75 (Domain of subdifferentiability)

The set of points at which a proper function f : V → (−∞, ∞] is subdifferentiable, denoted by dom(∂f), is defined as:

dom(∂f) ≜ {x ∈ V | ∂f(x) ≠ ∅}.

9.16.2.1. Closedness and Convexity#

Theorem 9.211 (Closedness and convexity of the subdifferential set)

Let f : V → (−∞, ∞] be a proper function. Then the set ∂f(x) is closed and convex for any x ∈ V.

Proof. Let x ∈ V be fixed. For any y ∈ V, define the set

Hy = {g ∈ V∗ | ⟨y − x, g⟩ ≤ f(y) − f(x)}.

Note that Hy is a closed half space in V∗.

It is easy to see that

∂f(x) = ⋂_{y ∈ V} Hy:

g ∈ ∂f(x) ⟺ f(y) ≥ f(x) + ⟨y − x, g⟩ ∀ y ∈ V ⟺ ⟨y − x, g⟩ ≤ f(y) − f(x) ∀ y ∈ V ⟺ g ∈ Hy ∀ y ∈ V ⟺ g ∈ ⋂_{y ∈ V} Hy.

Thus, ∂f(x) is an (infinite) intersection of closed and convex sets. Hence ∂f(x) is closed and convex.

9.16.2.2. Subdifferentiability and Convex Domain#

Theorem 9.212 (Subdifferentiability + convex domain ⟹ convexity)

Let f : V → (−∞, ∞] be a proper function. Assume that dom f is convex. If f is subdifferentiable at every x ∈ dom f, then f is convex.

In other words:

∂f(x) ≠ ∅ ∀ x ∈ dom f ⟹ f is convex.

Proof. Let x, y ∈ dom f. Let t ∈ [0, 1]. Let z = (1 − t)x + t y.

  1. Since dom f is convex, hence z ∈ dom f.

  2. By hypothesis, f is subdifferentiable at z.

  3. Thus, there exists g ∈ ∂f(z).

  4. By the subgradient inequality (9.7),

    f(y) ≥ f(z) + ⟨y − z, g⟩ = f(z) + (1 − t)⟨y − x, g⟩,
    f(x) ≥ f(z) + ⟨x − z, g⟩ = f(z) − t⟨y − x, g⟩.
  5. Multiplying the first inequality by t, the second by (1 − t) and adding, we get:

    t f(y) + (1 − t) f(x) ≥ f(z).
  6. Thus,

    f((1 − t)x + t y) = f(z) ≤ t f(y) + (1 − t) f(x)

    holds true for any x, y ∈ dom f and any t ∈ [0, 1].

  7. Thus, f is convex.

A convex function need not be subdifferentiable at every point in its domain. The problem usually occurs at the boundary points of the domain if the domain is not open.

9.16.2.3. Positive Scaling#

Theorem 9.213 (Multiplication by a positive scalar)

Let f : V → (−∞, ∞] be a proper function. Let x ∈ dom f. For any α > 0,

∂(αf)(x) = α ∂f(x).

Proof. Let g ∈ ∂f(x).

  1. By the subgradient inequality (9.7),

    f(y) ≥ f(x) + ⟨y − x, g⟩ ∀ y ∈ dom f.
  2. Multiplying by α, we get:

    (αf)(y) ≥ (αf)(x) + ⟨y − x, αg⟩ ∀ y ∈ dom(αf).
  3. Thus, αg ∈ ∂(αf)(x).

  4. Thus, α ∂f(x) ⊆ ∂(αf)(x).

  5. It is easy to run the same argument backwards to show that

    ∂(αf)(x) = α ∂f(x).

9.16.3. Proper Convex Functions#

In this subsection, we discuss the properties of the subdifferential sets for proper convex functions.

A proper convex function may not be subdifferentiable at every point in its domain. However, it is indeed subdifferentiable at the interior points and relative interior points of its domain.

9.16.3.1. Nonemptiness and Boundedness at Interior Points#

Theorem 9.214 (Nonemptiness and boundedness of the subdifferential at interior points)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f. Let a ∈ int S. Then, ∂f(a) is nonempty and bounded.

In other words, for a proper convex function, the subdifferential at the interior points of its domain is nonempty and bounded. We have

int dom f ⊆ dom(∂f).

Proof. Outline of the proof

  1. Identify a supporting hyperplane for the epigraph of f at (a, f(a)).

  2. Make use of the local Lipschitz continuity of the convex function at its interior points.

  3. Show that the normal to the supporting hyperplane leads to a subgradient at a.

  4. Show that the subgradients are bounded by using the local Lipschitz continuity inequality and the subgradient inequality.

Consider the direct sum vector space V ⊕ R.

  1. epi f ⊆ V ⊕ R.

  2. Since f is convex, hence epi f is convex.

  3. For some a ∈ int S, consider the point (a, f(a)) ∈ V ⊕ R.

  4. Since f is convex, hence (a, f(a)) ∈ bd epi f.

  5. By the supporting hyperplane theorem, there exists a nonzero vector (p, α) ∈ V∗ ⊕ R such that

    ⟨x, p⟩ − tα ≤ ⟨a, p⟩ − f(a)α ∀ (x, t) ∈ epi f.
  6. We shall show that α > 0 must hold true and that g = p/α is indeed a subgradient at a.

  7. We note that (a, f(a) + 1) ∈ epi f. Putting it in,

    ⟨a, p⟩ − (f(a) + 1)α ≤ ⟨a, p⟩ − f(a)α ⟺ −α ≤ 0 ⟺ α ≥ 0.

    Thus, α ≥ 0.

  8. Recall from Theorem 9.174 that f is locally Lipschitz continuous at a ∈ int dom f.

  9. Thus, there exist r > 0 and L > 0 such that B[a, r] ⊆ S and

    |f(x) − f(a)| ≤ L‖x − a‖ ∀ x ∈ B[a, r].
  10. Since B[a, r] ⊆ S, hence (x, f(x)) ∈ epi f for every x ∈ B[a, r].

  11. Plugging t = f(x) into the supporting hyperplane inequality, we get

    ⟨x, p⟩ − f(x)α ≤ ⟨a, p⟩ − f(a)α ∀ x ∈ B[a, r].
  12. Rearranging the terms,

    ⟨x − a, p⟩ ≤ α(f(x) − f(a)) ∀ x ∈ B[a, r].
  13. Using the local Lipschitz property,

    ⟨x − a, p⟩ ≤ αL‖x − a‖ ∀ x ∈ B[a, r].
  14. Recall that the dual norm for p ∈ V∗ is given by

    ‖p‖∗ = sup{|⟨x, p⟩| : x ∈ V, ‖x‖ ≤ 1}.
  15. Let p† ∈ V with ‖p†‖ = 1 be a vector at which the supremum is attained.

  16. Then, ‖p‖∗ = ⟨p†, p⟩ (since V is real).

  17. Since p† is a unit vector, hence a + r p† ∈ B[a, r].

  18. Plugging x = a + r p† into the inequality above, we get

    r⟨p†, p⟩ ≤ αLr‖p†‖.
  19. Simplifying,

    r‖p‖∗ ≤ αLr.
  20. This means that α > 0 must be true.

    1. If α = 0, then this inequality would require ‖p‖∗ = 0 and hence p = 0.

    2. But (p, α) is a nonzero vector describing the supporting hyperplane.

  21. Going back to the supporting hyperplane inequality and putting t = f(x), we have

    ⟨x, p⟩ − f(x)α ≤ ⟨a, p⟩ − f(a)α ∀ x ∈ S.
  22. Rearranging the terms, we get

    α(f(x) − f(a)) ≥ ⟨x − a, p⟩ ∀ x ∈ S.
  23. Letting g = (1/α)p and dividing both sides by α (which is positive), we obtain

    f(x) − f(a) ≥ ⟨x − a, g⟩ ∀ x ∈ S.
  24. Rearranging again,

    f(x) ≥ f(a) + ⟨x − a, g⟩ ∀ x ∈ S,

    which is the subgradient inequality.

  25. Thus, g ∈ ∂f(a).

  26. Thus, ∂f(a) is nonempty.

We next show the boundedness of ∂f(a).

  1. Let g ∈ ∂f(a).

  2. Let g† ∈ V be such that ‖g†‖ = 1 and

    ‖g‖∗ = ⟨g†, g⟩.
  3. Let x = a + r g†.

  4. Applying the subgradient inequality at x, we get:

    f(x) ≥ f(a) + r⟨g†, g⟩ = f(a) + r‖g‖∗.
  5. Thus,

    r‖g‖∗ ≤ f(x) − f(a) ≤ L‖x − a‖ = L r‖g†‖ = L r.
  6. Thus, ‖g‖∗ ≤ L for every g ∈ ∂f(a).

  7. Thus, ∂f(a) is bounded.

If f is a proper convex function, then the only points at which f may not be subdifferentiable (i.e., the subdifferential set is empty) are the points on the frontier of dom f (i.e., dom f ∖ int dom f). f may still be subdifferentiable at some frontier points.

Corollary 9.19 (Subdifferentiability of real valued convex functions)

Let f : V → R be a convex function. Then, f is subdifferentiable over all of V:

dom(∂f) = V.

Proof. We have dom f = V.

  1. Let x ∈ V.

  2. V is open in (V, ‖·‖).

  3. Thus, x ∈ int V = int dom f.

  4. By Theorem 9.214, ∂f(x) is nonempty and bounded since x ∈ int dom f.

  5. Hence, f is subdifferentiable at x.

  6. Since this is valid for every x ∈ V, hence dom(∂f) = V.

9.16.3.2. Nonempty, Convex and Compact Subdifferentials#

Theorem 9.215 (Nonempty, convex and compact subdifferentials for proper convex functions)

Let f : V → (−∞, ∞] be a proper and convex function with S = dom f. Let x ∈ int S. Then ∂f(x) is nonempty, convex and compact.

Proof. Let x ∈ int S.

  1. By Theorem 9.211, ∂f(x) is closed and convex.

  2. By Theorem 9.214, ∂f(x) is nonempty and bounded.

  3. Since ∂f(x) is closed and bounded, it must be compact since V is a finite dimensional normed linear space.

We present an alternative proof based on the min common/max crossing framework developed in Min Common/Max Crossing Duality. This proof can be skipped in first reading.

Proof. This proof is based on the second min common/max crossing theorem Theorem 10.34.

We fix some x ∈ int S. We construct the set M as

M = {(u, t) | u ∈ V, f(x + u) ≤ t}.

We first consider the min common problem.

  1. Note that (0, f(x)) ∈ M since f(x + 0) ≤ f(x).

  2. Further, see that the min common value is

    p∗ = inf_{(0, p) ∈ M} p = f(x).
  3. Hence p∗ is finite.
  4. Note that M = M̄ where

    M̄ = {(x, t) ∈ V ⊕ R | there exists t̄ ∈ R with t̄ ≤ t and (x, t̄) ∈ M}.
  5. M is convex.

    1. Let (u1, t1), (u2, t2) ∈ M and let r ∈ (0, 1).

    2. Let (u, t) = r(u1, t1) + (1 − r)(u2, t2).

    3. We have f(x + u1) ≤ t1 and f(x + u2) ≤ t2.

    4. Now,

      f(x + u) = f(x + ru1 + (1 − r)u2) = f(r(x + u1) + (1 − r)(x + u2)) ≤ r f(x + u1) + (1 − r) f(x + u2) ≤ r t1 + (1 − r) t2 = t.
    5. Hence (u, t) ∈ M.

    6. Hence M is convex.

Next, consider the max crossing problem and strong duality.

  1. We have

    q∗ = sup_{a ∈ V∗} q(a)

    where

    q(a) = inf_{(u, t) ∈ M} {⟨u, a⟩ + t}.
  2. The set of optimal solutions of the max crossing problem is given by

    Q∗ = {a ∈ V∗ | q(a) = q∗}.
  3. For some a ∈ Q∗, strong duality

    p∗ = q∗ = q(a)

    is attained if and only if

    f(x) = inf_{(u, t) ∈ M} {⟨u, a⟩ + t}.
  4. Equivalently,

    f(x) ≤ f(x + u) + ⟨u, a⟩ ∀ u ∈ V.
  5. Equivalently,

    f(x + u) ≥ f(x) + ⟨u, −a⟩ ∀ u ∈ V.
  6. But this is nothing but the subgradient inequality with −a as the subgradient at x.

  7. In other words, strong duality is attained at a as a solution of the max crossing problem if and only if −a is a subgradient of f at x.

  8. Hence, under strong duality, Q∗ is given by −∂f(x).

We next establish the conditions for the second min common/max crossing theorem (Theorem 10.34).

  1. Consider the set

    D = {u ∈ V | there exists t ∈ R with (u, t) ∈ M}.
  2. It is easy to see that D = S − x.

    1. Consider the set T = S − x.

    2. Let u ∈ T.

    3. Then x + u ∈ S.

    4. Hence f(x + u) ≤ f(x + u).

    5. Hence (u, f(x + u)) ∈ M.

    6. Hence u ∈ D.

    7. Hence T = S − x ⊆ D.

    8. Conversely, let u ∉ T.

    9. Then u + x ∉ S.

    10. Hence f(u + x) = ∞.

    11. Hence for every t ∈ R, f(u + x) > t.

    12. Hence (u, t) ∉ M for every t ∈ R.

    13. Hence u ∉ D.

    14. Hence D = T = S − x.

  3. Since xintS, hence 0intD.

  4. We see that all the conditions of the second min common/max crossing theorem are satisfied.

    1. p is finite.

    2. M=M is convex.

    3. The set D contains 0 in its interior.

  5. Hence Q∗ = −∂f(x) is nonempty, convex and compact.

  6. Hence ∂f(x) is also nonempty, convex and compact.

9.16.3.3. Subgradients over a Compact Set#

Theorem 9.216 (Subgradients over a compact set are nonempty and bounded)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f. Let A ⊆ int S be a nonempty and compact subset of the interior of the domain of f. Then, the set of subgradients over A given by

Y = ⋃_{x ∈ A} ∂f(x)

is nonempty and bounded.

Proof. We are given that A is a nonempty and compact subset of the interior of the domain of f.

  1. For any x ∈ A, by Theorem 9.214, ∂f(x) is nonempty and bounded.

  2. Thus, Y is nonempty.

We next prove that Y must be bounded as well.

  1. Let T = V ∖ (int S).

  2. T is closed and A is closed. A ⊆ int S. Hence, A ∩ T = ∅.

  3. Since A is compact, T is closed and A ∩ T = ∅, the distance between A and T is nonzero due to Theorem 3.128:

    r = d(A, T) > 0.
  4. Thus,

    ‖x − y‖ ≥ r ∀ x ∈ A, y ∉ int S.
  5. Let s = r/2.

  6. Let D = B[0, s]. D is a closed and bounded set. Hence, it is compact due to Theorem 4.66.

  7. Let E = A + D. Then E ⊆ int S.

    1. Let y ∈ E.

    2. Then, there are x ∈ A and v ∈ D such that y = x + v.

    3. Thus, y − x = v.

    4. Hence ‖y − x‖ ≤ s < r.

    5. Since every point outside int S is at distance at least r from A, hence y ∈ int S.

  8. Since both A and D are compact, hence E is compact due to Theorem 4.67.

  9. By Theorem 9.174, f is locally Lipschitz continuous at every x ∈ E since E ⊆ int S.

  10. Then, by Theorem 3.79, f is Lipschitz continuous on E.

  11. Thus, there exists L > 0 such that

    |f(x) − f(y)| ≤ L‖x − y‖ ∀ x, y ∈ E.
  12. Let g ∈ Y. Then, there is x ∈ A such that g ∈ ∂f(x).

  13. We can choose g† ∈ V such that

    ⟨g†, g⟩ = ‖g‖∗ and ‖g†‖ = 1.
  14. Now, let y = x + s g†. Then, y ∈ E.

    1. ‖y − x‖ = s‖g†‖ = s.

    2. Thus, s g† ∈ D.

    3. Thus, y ∈ E since x ∈ A.

  15. Also, x ∈ E since x = x + 0 and 0 ∈ D.

  16. Consequently, by Lipschitz continuity,

    |f(y) − f(x)| ≤ L‖y − x‖ = L s.
  17. By the subgradient inequality at x,

    f(y) − f(x) ≥ ⟨y − x, g⟩ = s⟨g†, g⟩ = s‖g‖∗.
  18. Using the Lipschitz bound above, we get

    s‖g‖∗ ≤ L s.
  19. Thus, ‖g‖∗ ≤ L.

  20. Since g was chosen arbitrarily, hence Y is bounded.

We recall from Definition 9.56 that the relative interior of a convex set is given by

ri C = {x ∈ C | ∃ r > 0, B(x, r) ∩ aff C ⊆ C}.

It is the interior of the convex set w.r.t. the subspace topology of its affine hull.

9.16.3.4. Nonempty Subdifferential at Relative Interior Points#

Theorem 9.217 (Nonemptiness of the subdifferential at relative interior points)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f. Let x ∈ ri S. Then, ∂f(x) is nonempty and has the form

∂f(x) = L⊥ + G

where L is the subspace parallel to aff S (so L⊥ is its orthogonal complement) and G is a nonempty and compact set.

In other words, for a proper convex function, the subdifferential at the relative interior points of its domain is nonempty. We have

ri dom f ⊆ dom(∂f).

The proof is based on the min common/max crossing framework developed in Min Common/Max Crossing Duality. It can be skipped in first reading. It follows the proof of Theorem 9.215.

Proof. This proof is based on the second min common/max crossing theorem Theorem 10.34.

We fix some x ∈ ri S. We construct the set M as

M = {(u, t) | u ∈ V, f(x + u) ≤ t}.

We have already established the following in the proof of Theorem 9.215:

  1. M is convex.

  2. M = M̄.

  3. p∗ = f(x).

  4. p∗ is finite.

  5. q∗ = sup_{a ∈ V∗} q(a) where

    q(a) = inf_{(u, t) ∈ M} {⟨u, a⟩ + t}.
  6. Q∗ = {a ∈ V∗ | q(a) = q∗}.

  7. When strong duality holds,

    Q∗ = −∂f(x).
  8. The set

    D = {u ∈ V | there exists t ∈ R with (u, t) ∈ M} = S − x.

Continuing further

  1. Since x ∈ ri S, hence 0 ∈ ri D.

  2. Hence aff D = aff S − x = L.

  3. Hence, by the second min common/max crossing theorem,

    −∂f(x) = Q∗ = L⊥ + Q̃

    where Q̃ is a nonempty, convex and compact set.

  4. Negating both sides, we obtain

    ∂f(x) = L⊥ + G

    where G = −Q̃ is a nonempty, convex and compact set.

Corollary 9.20 (Existence of points with nonempty subdifferential)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f. Then, there exists x ∈ S such that ∂f(x) is nonempty.

Proof. The effective domain of a proper convex function is convex and nonempty.

  1. By Theorem 9.142, the relative interior of S = dom f is nonempty.

  2. Thus, there exists x ∈ ri S.

  3. By Theorem 9.217, ∂f(x) is nonempty.

9.16.3.5. Unbounded Subdifferential#

Theorem 9.218 (Unboundedness condition for subdifferential)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f.

Assume that dim S < dim V. Let x ∈ S. If ∂f(x) ≠ ∅, then ∂f(x) is unbounded.

Proof. We proceed as follows.

  1. Let n = dim V.

  2. Let A = aff S. A is an affine set.

  3. We have x ∈ S ⊆ A.

  4. Then, W = A − x is the subspace parallel to A.

  5. Accordingly, m = dim W < dim V = n.

  6. Then, the orthogonal complement W⊥ of W is a nontrivial subspace of dimension n − m.

  7. Let v ∈ W⊥ be a nonzero vector.

  8. Then, ⟨w, v⟩ = 0 for every w ∈ W.

  9. Now let g ∈ ∂f(x) be an arbitrary subgradient at x.

  10. By the subgradient inequality,

    f(y) ≥ f(x) + ⟨y − x, g⟩ ∀ y ∈ S.
  11. Note that both x ∈ S and y ∈ S.

  12. Hence, y − x ∈ W.

  13. Thus, ⟨y − x, v⟩ = 0.

  14. But then, for any α ∈ R,

    ⟨y − x, g + αv⟩ = ⟨y − x, g⟩ + α⟨y − x, v⟩ = ⟨y − x, g⟩.
  15. Thus, if g ∈ ∂f(x), then g + αv ∈ ∂f(x) for every α ∈ R.

  16. Thus, ∂f(x) is unbounded.

9.16.4. Directional Derivatives#

The directional derivative of a proper convex function is closely linked with its subdifferential. To see this, let x ∈ int dom f, let d ∈ V be a nonzero direction and let t > 0. Let g ∈ ∂f(x) and consider the subgradient inequality

f(x + td) ≥ f(x) + t⟨d, g⟩.

Hence

(f(x + td) − f(x))/t ≥ ⟨d, g⟩.

We saw in Observation 9.8 that (f(x + td) − f(x))/t is nondecreasing in t and

f′(x; d) = inf_{t > 0} (f(x + td) − f(x))/t.

This establishes the basic relation

(9.8) f′(x; d) ≥ ⟨d, g⟩

for every g ∈ ∂f(x). In fact, a stronger result is available in the form of the max formula.

9.16.4.1. Max Formula#

The max formula is one of the key results in this section. It connects subgradients with directional derivatives.

Theorem 9.219 (Max formula)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f. Then for any x ∈ int S and d ∈ V,

f′(x; d) = sup{⟨d, g⟩ | g ∈ ∂f(x)}.

In words, the directional derivative is the supremum of the inner product of the subgradients with the direction.

Proof. Let x ∈ int S and d ∈ V.

  1. Let t > 0. Then, by the subgradient inequality,

    f(x + td) ≥ f(x) + t⟨d, g⟩ ∀ g ∈ ∂f(x).
  2. Thus,

    (f(x + td) − f(x))/t ≥ ⟨d, g⟩ ∀ g ∈ ∂f(x).
  3. Taking the limit,

    f′(x; d) = lim_{t→0+} (f(x + td) − f(x))/t ≥ ⟨d, g⟩

    for every g ∈ ∂f(x).

  4. Taking the supremum over ∂f(x) on the R.H.S., we obtain

    f′(x; d) ≥ sup{⟨d, g⟩ | g ∈ ∂f(x)}.

We now show that the inequality is indeed an equality.

  1. Let h : V → R be given by

    h(v) = f′(x; v).
  2. By Theorem 9.206, h is a real valued convex function and nonnegative homogeneous.

  3. By Corollary 9.19, h is subdifferentiable everywhere in V.

  4. In particular, h is subdifferentiable at d.

  5. Let g ∈ ∂h(d).

  6. For any v ∈ V and t ≥ 0,

    t f′(x; v) = t h(v) = h(tv)

    since h is nonnegative homogeneous.

  7. By the subgradient inequality,

    t f′(x; v) = h(tv) ≥ h(d) + ⟨tv − d, g⟩ = f′(x; d) + t⟨v, g⟩ − ⟨d, g⟩.
  8. Rearranging the terms,

    t(f′(x; v) − ⟨v, g⟩) ≥ f′(x; d) − ⟨d, g⟩.
  9. Since this inequality is valid for every t ≥ 0, the term f′(x; v) − ⟨v, g⟩ must be nonnegative. Otherwise, the inequality would be invalidated for large enough t. Thus,

    f′(x; v) ≥ ⟨v, g⟩.
  10. By Theorem 9.207, for any y ∈ S,

    f(y) ≥ f(x) + f′(x; y − x).
  11. From the previous inequality,

    f′(x; y − x) ≥ ⟨y − x, g⟩.
  12. Thus, for any y ∈ S,

    f(y) ≥ f(x) + ⟨y − x, g⟩.
  13. But this is the subgradient inequality. Hence, g ∈ ∂f(x).

  14. Taking t = 0 in the subgradient inequality for h,

    0 ≥ f′(x; d) − ⟨d, g⟩.
  15. Thus, there exists g ∈ ∂f(x) such that

    f′(x; d) ≤ ⟨d, g⟩.
  16. Consequently,

    f′(x; d) ≤ sup{⟨d, g⟩ | g ∈ ∂f(x)}.

Combining the two inequalities, we obtain the max formula:

f′(x; d) = sup{⟨d, g⟩ | g ∈ ∂f(x)}.

Recall from Definition 9.51 that the support function for a set C is given by

σ_C(x) = sup{⟨x, y⟩ | y ∈ C}.

Corollary 9.21 (Max formula as a support function)

The max formula can be written as

f′(x; d) = σ_{∂f(x)}(d).
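A numerical check of the max formula is sketched below; the choice f(x) = ‖x‖1 at x = (1, 0), where ∂f(x) = {(1, t) : |t| ≤ 1} (see Example 9.69 later in this section) and hence sup over g of ⟨d, g⟩ equals d1 + |d2|, is made here for illustration.

```python
import numpy as np

# Max formula check for f(x) = ||x||_1 at x = (1, 0):
# ∂f(x) = {(1, t) : |t| <= 1}, so sup_{g in ∂f(x)} <d, g> = d1 + |d2|.
f = lambda x: np.abs(x).sum()
x = np.array([1.0, 0.0])
d = np.array([0.3, -0.7])

t = 1e-8
numeric = (f(x + t * d) - f(x)) / t   # one-sided difference quotient
support = d[0] + abs(d[1])            # support function of ∂f(x) at d
assert abs(numeric - support) < 1e-6
```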

9.16.5. Differentiability#

9.16.5.1. Subdifferential and gradient#

Theorem 9.220 (Subdifferential at points of differentiability)

Let f : V → (−∞, ∞] be a proper convex function with S = dom f. Let x ∈ int S.

Then f is differentiable at x if and only if

∂f(x) = {∇f(x)}.

In other words, if f is differentiable at x then its subdifferential is a singleton set consisting of the gradient and if the subdifferential at x is a singleton, then f is differentiable at x.

Proof. Assume that f is differentiable at x.

  1. Let d ∈ V be some direction.

  2. By Theorem 9.203,

    f′(x; d) = ⟨d, ∇f(x)⟩.
  3. By Theorem 9.214, f is subdifferentiable at x since f is convex and x ∈ int S.

  4. Let g ∈ ∂f(x).

  5. By the max formula (Theorem 9.219):

    f′(x; d) ≥ ⟨d, g⟩,

    as the directional derivative is the supremum of the inner product of the subgradients with the direction.

  6. Thus,

    ⟨d, ∇f(x)⟩ ≥ ⟨d, g⟩.
  7. In turn,

    ⟨d, g − ∇f(x)⟩ ≤ 0.

    This holds for every d ∈ V.

  8. By the definition of the dual norm,

    ‖g − ∇f(x)‖∗ = sup_{‖d‖ ≤ 1} {⟨d, g − ∇f(x)⟩}.
  9. Using the previous inequality,

    ‖g − ∇f(x)‖∗ ≤ 0.
  10. Since the dual norm is a norm, it cannot be negative. Thus,

    ‖g − ∇f(x)‖∗ = 0.
  11. Moreover, due to the positive definiteness of a norm,

    g − ∇f(x) = 0

    must hold true.

  12. Thus, g = ∇f(x).

  13. In other words, if g is a subgradient of f at x, then it must equal ∇f(x).

  14. Thus, the only subgradient for f at x is ∇f(x).

  15. Thus,

    ∂f(x) = {∇f(x)}.

For the converse, assume that f is subdifferentiable at x with ∂f(x) = {g}.

  1. By the subgradient inequality,

    f(x + u) ≥ f(x) + ⟨u, g⟩ ∀ u ∈ V.
  2. Thus,

    f(x + u) − f(x) − ⟨u, g⟩ ≥ 0 ∀ u ∈ V.
  3. Define a function h : V → (−∞, ∞] as

    h(u) ≜ f(x + u) − f(x) − ⟨u, g⟩.
  4. We list some properties of h.

    1. By definition, h(u) ≥ 0 for every u ∈ V.

    2. h is a convex function since f(x + u) is convex in u, ⟨u, g⟩ is linear and f(x) is a constant (w.r.t. the variable u).

    3. dom h = dom f − x = S − x.

    4. Thus, since x ∈ int S, hence 0 = x − x ∈ int dom h.

    5. h(0) = f(x) − f(x) − ⟨0, g⟩ = 0.

  5. If we are able to show that

    lim_{u→0} h(u)/‖u‖ = 0,

    then, by the definition of the gradient (Definition 9.71),

    g = ∇f(x).
  6. We can easily show that ∂h(0) = {0}.

    1. If g̃ is a subgradient of h at 0, then by the subgradient inequality,

      h(u) ≥ h(0) + ⟨u, g̃⟩ = ⟨u, g̃⟩ ∀ u ∈ V.
    2. g̃ = 0 satisfies this inequality since h(u) ≥ 0 by definition.

    3. For contradiction, assume a nonzero g̃ can satisfy this inequality.

    4. Then,

      h(u) ≥ ⟨u, g̃⟩ ⟺ f(x + u) − f(x) − ⟨u, g⟩ ≥ ⟨u, g̃⟩ ⟺ f(x + u) ≥ f(x) + ⟨u, g̃ + g⟩ ⟹ g̃ + g ∈ ∂f(x).
    5. This contradicts the hypothesis that the subdifferential of f at x is {g}.

    6. Thus, ∂h(0) = {0}.

  7. Then, by the max formula (Theorem 9.219):

    h′(0; d) = σ_{∂h(0)}(d) = ⟨d, 0⟩ = 0.
  8. Thus, from the definition of directional derivatives,

    0 = h′(0; d) = lim_{α→0+} (h(αd) − h(0))/α = lim_{α→0+} h(αd)/α.
  9. Let us now introduce an orthonormal basis {e1, …, en} for V.

  10. Assume that V has been equipped with the various ℓp norms as described in Remark 9.1.

  11. Since 0 ∈ int dom h, there exists r ∈ (0, 1) such that

    B1[0, r] ⊆ dom h.
  12. B1[0, r] is a cross polytope of radius r with 2n vertices given by {±r ei, i = 1, …, n}:

    B1[0, r] = conv{±r ei, i = 1, …, n}.
  13. Let us denote these 2n vectors as w1, …, w2n.

  14. By Remark 9.1,

    B[0, s] = B2[0, s] ⊆ B1[0, r]

    where s = r/√n.

  15. Let u ∈ B[0, s²] be a nonzero vector.

  16. Since r < 1, hence s < 1, hence s² < s.

  17. Let v = s u/‖u‖.

  18. Then, v ∈ B[0, s] ⊆ B1[0, r].

  19. Thus, v ∈ conv{wi, i = 1, …, 2n}.

  20. Thus, there exists t ∈ Δ2n such that

    s u/‖u‖ = v = ∑_{i=1}^{2n} ti wi.
  21. Then,

    h(u)/‖u‖ = h((‖u‖/s)(s u/‖u‖))/‖u‖
             = h(∑_{i=1}^{2n} ti (‖u‖/s) wi)/‖u‖   (convex combination)
             ≤ ∑_{i=1}^{2n} ti h((‖u‖/s) wi)/‖u‖   (h is convex and t ∈ Δ2n)
             ≤ max_{i=1,…,2n} {h((‖u‖/s) wi)/‖u‖}   (since ∑ ti = 1).

    Note that (‖u‖/s) wi ∈ B[0, s] ⊆ B1[0, r] ⊆ dom h.

  22. Now,

    lim_{u→0} h((‖u‖/s) wi)/‖u‖ = lim_{α→0+} h(α wi/s)/α = 0.
  23. Thus,

    lim_{u→0} h(u)/‖u‖ = 0.
  24. Thus, g = ∇f(x) as desired.

9.16.6. Subdifferential Calculus#

9.16.6.1. Function Sums#

Theorem 9.221 (Subdifferential subadditivity with sum of functions)

Let f1, f2 : V → (−∞, ∞] be proper functions with S1 = dom f1 and S2 = dom f2. For any x ∈ S1 ∩ S2,

∂f1(x) + ∂f2(x) ⊆ ∂(f1 + f2)(x).

Proof. Let f = f1 + f2. We note that dom f = dom(f1 + f2) = dom f1 ∩ dom f2 = S1 ∩ S2.

  1. Let x ∈ S1 ∩ S2.

  2. Let g ∈ ∂f1(x) + ∂f2(x).

  3. Then, there exist g1 ∈ ∂f1(x) and g2 ∈ ∂f2(x) such that g = g1 + g2.

  4. Then, by the subgradient inequality, for any y ∈ S1 ∩ S2,

    f1(y) ≥ f1(x) + ⟨y − x, g1⟩,
    f2(y) ≥ f2(x) + ⟨y − x, g2⟩.
  5. Summing the two inequalities, we get

    f1(y) + f2(y) ≥ f1(x) + f2(x) + ⟨y − x, g1 + g2⟩.
  6. Rewriting, for every y ∈ dom f,

    (f1 + f2)(y) ≥ (f1 + f2)(x) + ⟨y − x, g⟩.
  7. Thus, g = g1 + g2 ∈ ∂(f1 + f2)(x) = ∂f(x).

  8. Thus, ∂f1(x) + ∂f2(x) ⊆ ∂f(x).

We can generalize this result for a finite sum of functions using simple mathematical induction.

Corollary 9.22 (Weak sum rule of subdifferential calculus)

Let f1, …, fm : V → (−∞, ∞] be proper functions. For any x ∈ ⋂_{i=1}^m dom fi,

∑_{i=1}^m ∂fi(x) ⊆ ∂(∑_{i=1}^m fi)(x).

Theorem 9.222 (Subdifferential additivity with sum of convex functions)

Let f1, f2 : V → (−∞, ∞] be proper convex functions with S1 = dom f1 and S2 = dom f2. For any x ∈ int S1 ∩ int S2,

∂(f1 + f2)(x) = ∂f1(x) + ∂f2(x).

Proof. With f = f1 + f2, by Theorem 3.22,

int dom f = int(S1 ∩ S2) = int S1 ∩ int S2.

  1. Let x ∈ int S1 ∩ int S2.

  2. Thus, x ∈ int dom f.

  3. By the max formula, for any d ∈ V,

    f′(x; d) = sup{⟨d, g⟩ | g ∈ ∂f(x)} = σ_{∂f(x)}(d).
  4. Since the directional derivative is additive, hence

    f′(x; d) = f1′(x; d) + f2′(x; d).
  5. Expanding on this,

    σ_{∂f(x)}(d) = f′(x; d)
                 = f1′(x; d) + f2′(x; d)
                 = sup{⟨d, g1⟩ | g1 ∈ ∂f1(x)} + sup{⟨d, g2⟩ | g2 ∈ ∂f2(x)}
                 = sup{⟨d, g1 + g2⟩ | g1 ∈ ∂f1(x), g2 ∈ ∂f2(x)}
                 = sup{⟨d, g⟩ | g ∈ ∂f1(x) + ∂f2(x)}
                 = σ_{∂f1(x) + ∂f2(x)}(d).
  6. In summary, for every d ∈ V,

    σ_{∂f(x)}(d) = σ_{∂f1(x) + ∂f2(x)}(d).
  7. By Theorem 9.211, ∂f(x), ∂f1(x) and ∂f2(x) are closed and convex.

  8. By Theorem 9.214, ∂f(x), ∂f1(x) and ∂f2(x) are nonempty and bounded.

  9. Since ∂f1(x) and ∂f2(x) are closed and bounded, hence they are compact (V is finite dimensional).

  10. Thus, ∂f1(x) + ∂f2(x) is also closed, bounded, convex and nonempty.

  11. Thus, both ∂f(x) and ∂f1(x) + ∂f2(x) are nonempty, convex and closed.

  12. Then, due to Theorem 9.90,

    ∂f(x) = ∂f1(x) + ∂f2(x).
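A small numerical illustration of the sum rule is sketched below; the choice f = ‖·‖1 + ‖·‖2 at a point with nonzero entries, where sign(x) and x/‖x‖2 are the respective (here unique) subgradients, is made for illustration.

```python
import numpy as np

# Sum rule sketch: for f = ||.||_1 + ||.||_2 at an x with nonzero entries,
# g = sign(x) + x / ||x||_2 is the sum of subgradients, hence in ∂f(x).
f = lambda x: np.abs(x).sum() + np.linalg.norm(x)
x = np.array([1.0, -2.0, 3.0])
g = np.sign(x) + x / np.linalg.norm(x)
for y in np.random.default_rng(5).standard_normal((200, 3)):
    assert f(y) >= f(x) + g @ (y - x) - 1e-9   # subgradient inequality
```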

We can generalize this result for a finite sum of proper convex functions using simple mathematical induction.

Corollary 9.23 (Sum rule of subdifferential calculus for proper convex functions at interior points)

Let f1, …, fm : V → (−∞, ∞] be proper convex functions. For any x ∈ ⋂_{i=1}^m int dom fi,

∑_{i=1}^m ∂fi(x) = ∂(∑_{i=1}^m fi)(x).

For real valued convex functions, the domain is all of V, and the interior of V is V itself.

Corollary 9.24 (Sum rule of subdifferential calculus for real valued convex functions)

Let f1, …, fm : V → R be real valued convex functions. For any x ∈ V,

∑_{i=1}^m ∂fi(x) = ∂(∑_{i=1}^m fi)(x).

A more powerful result with less restrictive assumptions than Corollary 9.23 is possible if the intersection of the relative interiors of the domains of the individual functions is nonempty.

Theorem 9.223 (Sum rule of subdifferential calculus for proper convex functions)

Let f1, …, fm : V → (−∞, ∞] be proper convex functions. Assume that ⋂_{i=1}^m ri dom fi ≠ ∅. Then for any x ∈ V,

∑_{i=1}^m ∂fi(x) = ∂(∑_{i=1}^m fi)(x).

9.16.6.2. Linear Transformations#

Our interest here is in compositions of the form h = f ∘ A where A is a linear transformation; in other words, h(x) = f(A(x)).

If A : V → W is a linear transformation, then Aᵀ : W∗ → V∗ is a mapping from W∗ to V∗ and satisfies the relationship:

⟨A(x), y⟩ = ⟨x, Aᵀ(y)⟩.

From the definition of the directional derivative, we have

h′(x; d) = lim_{t→0+} (h(x + td) − h(x))/t = lim_{t→0+} (f(A(x + td)) − f(A(x)))/t = lim_{t→0+} (f(A(x) + tA(d)) − f(A(x)))/t = f′(A(x); A(d)).

Theorem 9.224 (Weak linear transformation rule of subdifferential calculus)

Let f : W → (−∞, ∞] be a proper function. Let A : V → W be a linear transformation. Define h : V → (−∞, ∞] as

h(x) = f(A(x)).

Assume that h is proper, i.e., dom h is not empty:

dom h = {x ∈ V | A(x) ∈ dom f}.

Then, for any x ∈ dom h,

Aᵀ(∂f(A(x))) ⊆ ∂h(x).

Proof. We proceed as follows.

  1. Let x ∈ dom h.

  2. Let g ∈ ∂f(A(x)).

  3. By Theorem 9.219,

    ⟨z, g⟩ ≤ f′(A(x); z) ∀ z ∈ W.
  4. Choosing z = A(d), we have

    ⟨A(d), g⟩ ≤ f′(A(x); A(d)) ∀ d ∈ V.
  5. Equivalently,

    ⟨d, Aᵀ(g)⟩ ≤ h′(x; d) ∀ d ∈ V.
  6. Hence Aᵀ(g) ∈ ∂h(x) due to (9.8).

  7. Hence Aᵀ(∂f(A(x))) ⊆ ∂h(x).

Theorem 9.225 (Strong linear transformation rule for subdifferential calculus)

Let f : W → (−∞, ∞] be a proper convex function. Let A : V → W be a linear transformation. Define h : V → (−∞, ∞] as

h(x) = f(A(x)).

Assume that h is proper, i.e., dom h is not empty:

dom h = {x ∈ V | A(x) ∈ dom f}.

Then, for any x ∈ int dom h such that A(x) ∈ int dom f, we have:

Aᵀ(∂f(A(x))) = ∂h(x).

Proof. We showed Aᵀ(∂f(A(x))) ⊆ ∂h(x) in Theorem 9.224. We show the reverse inclusion by contradiction.

  1. Let x ∈ int dom h be such that A(x) ∈ int dom f.

  2. Assume that there exists d ∈ ∂h(x) such that d ∉ Aᵀ(∂f(A(x))).

  3. By Theorem 9.215, the set ∂f(A(x)) is nonempty, convex and compact.

  4. Hence Aᵀ(∂f(A(x))) is also nonempty, convex and compact.

  5. By the strict separation theorem (Theorem 9.169), there exist a vector p and a scalar c such that

    ⟨Aᵀ(g), p⟩ < c < ⟨d, p⟩ ∀ g ∈ ∂f(A(x)).
  6. Equivalently,

    ⟨g, A(p)⟩ < c < ⟨d, p⟩ ∀ g ∈ ∂f(A(x)).
  7. Taking the supremum over ∂f(A(x)) on the L.H.S., we obtain

    sup_{g ∈ ∂f(A(x))} ⟨g, A(p)⟩ < ⟨d, p⟩.
  8. By the max formula,

    f′(A(x); A(p)) < ⟨d, p⟩.
  9. But this means that

    h′(x; p) < ⟨d, p⟩.
  10. This contradicts the assumption that d ∈ ∂h(x).

  11. Hence we must have

    Aᵀ(∂f(A(x))) = ∂h(x).

9.16.6.3. Affine Transformations#

Theorem 9.226 (Weak affine transformation rule of subdifferential calculus)

Let f : W → (−∞, ∞] be a proper function. Let A : V → W be a linear transformation. Let b ∈ W. Define h : V → (−∞, ∞] as

h(x) = f(A(x) + b).

Assume that h is proper, i.e., dom h is not empty:

dom h = {x ∈ V | A(x) + b ∈ dom f}.

Then, for any x ∈ dom h,

Aᵀ(∂f(A(x) + b)) ⊆ ∂h(x).

Proof. We proceed as follows.

  1. Let x ∈ dom h.

  2. Then, x̃ = A(x) + b ∈ dom f and h(x) = f(x̃).

  3. Let g ∈ Aᵀ(∂f(x̃)).

  4. Then, there is d ∈ W∗ such that g = Aᵀ(d) with d ∈ ∂f(x̃).

  5. Let y ∈ dom h.

  6. Then, ỹ = A(y) + b ∈ dom f and h(y) = f(ỹ).

  7. Applying the subgradient inequality for f at x̃ with the subgradient d, we get

    f(ỹ) ≥ f(x̃) + ⟨ỹ − x̃, d⟩.
  8. We have h(y) = f(ỹ), h(x) = f(x̃) and ỹ − x̃ = A(y − x).

  9. Thus, the subgradient inequality simplifies to

    h(y) ≥ h(x) + ⟨A(y − x), d⟩.
  10. We note that

    ⟨A(y − x), d⟩ = ⟨y − x, Aᵀ(d)⟩.
  11. Thus, for any y ∈ dom h, we have

    h(y) ≥ h(x) + ⟨y − x, Aᵀ(d)⟩.
  12. Thus, g = Aᵀ(d) ∈ ∂h(x).

Since this is valid for any x ∈ dom h and for every g ∈ Aᵀ(∂f(A(x) + b)), hence

Aᵀ(∂f(A(x) + b)) ⊆ ∂h(x).

Theorem 9.227 (Affine transformation rule of subdifferential calculus)

Let f : W → (−∞, ∞] be a proper convex function. Let A : V → W be a linear transformation. Let b ∈ W. Define h : V → (−∞, ∞] as

h(x) = f(A(x) + b).

Assume that h is proper, i.e., dom h is not empty:

dom h = {x ∈ V | A(x) + b ∈ dom f}.

Then, for any x ∈ int dom h such that A(x) + b ∈ int dom f, we have:

Aᵀ(∂f(A(x) + b)) = ∂h(x).

Proof. We note that h is a proper convex function since it is a composition of an affine transformation with a proper convex function.

  1. Let x ∈ int dom h be such that A(x) + b ∈ int dom f.

  2. Then, for any direction d ∈ V, by the max formula,

    h′(x; d) = σ_{∂h(x)}(d).
  3. By the definition of the directional derivative, we have

    h′(x; d) = lim_{α→0+} (h(x + αd) − h(x))/α = lim_{α→0+} (f(A(x) + b + αA(d)) − f(A(x) + b))/α = f′(A(x) + b; A(d)).
  4. Thus,

    σ_{∂h(x)}(d) = f′(A(x) + b; A(d)).
  5. Using the max formula on the R.H.S., we get

    σ_{∂h(x)}(d) = f′(A(x) + b; A(d))
                 = sup_{g ∈ ∂f(A(x) + b)} ⟨A(d), g⟩
                 = sup_{g ∈ ∂f(A(x) + b)} ⟨d, Aᵀ(g)⟩
                 = sup_{g̃ ∈ Aᵀ(∂f(A(x) + b))} ⟨d, g̃⟩
                 = σ_{Aᵀ(∂f(A(x) + b))}(d).
  6. Since x ∈ int dom h, hence by Theorem 9.211 and Theorem 9.214, ∂h(x) is nonempty, closed and convex.

  7. Since A(x) + b ∈ int dom f, hence by Theorem 9.211 and Theorem 9.214, ∂f(A(x) + b) is nonempty, closed and convex.

  8. It follows that Aᵀ(∂f(A(x) + b)) is nonempty, closed and convex, since Aᵀ is a linear operator and both V and W are finite dimensional.

  9. Then, due to Theorem 9.90,

    Aᵀ(∂f(A(x) + b)) = ∂h(x).

9.16.6.4. Composition#

The chain rule is a key principle in computing derivatives of compositions of functions. A chain rule is available for subgradient calculus as well.

We first recall a result on the derivative of composition of real functions.

Theorem 9.228 (Chain rule for real functions)

Let f : R → R be a real function which is continuous on [a, b] with a < b. Assume that f′₊(a) exists. Let g : R → R be another real function defined on an open interval I such that range f ⊆ I. Assume g is differentiable at f(a). Then the composite real function h : R → R given by

h(t) ≜ g(f(t)) (a ≤ t ≤ b)

is right differentiable at t = a. In particular,

h′₊(a) = g′(f(a)) f′₊(a).

Proof. We show this by working with the definition of the right hand derivative as a limit:

h′₊(a) = lim_{t→a+} (h(t) − h(a))/(t − a)
       = lim_{t→a+} (g(f(t)) − g(f(a)))/(t − a)
       = lim_{t→a+} [(g(f(t)) − g(f(a)))/(f(t) − f(a))] · [(f(t) − f(a))/(t − a)]
       = lim_{z→f(a)} (g(z) − g(f(a)))/(z − f(a)) · lim_{t→a+} (f(t) − f(a))/(t − a)
       = g′(f(a)) f′₊(a).

We can now develop a chain rule for subdifferentials of multidimensional functions with the help of max formula.

Theorem 9.229 (Subdifferential chain rule)

Let f : V → R be convex and let g : R → R be a nondecreasing convex function. Let x ∈ V. Assume that g is differentiable at f(x). Let h = g ∘ f. Then

∂h(x) = g′(f(x)) ∂f(x).

Proof. We are given x ∈ V at which g is differentiable and f is convex.

  1. Since f is convex and g is a nondecreasing convex function, hence h is also convex.

  2. We now introduce two real functions parametrized by x and an arbitrary direction d ∈ V:

    f_{x,d}(t) = f(x + td), t ∈ R,
    h_{x,d}(t) = h(x + td), t ∈ R.
  3. It is now easy to see that

    h_{x,d}(t) = h(x + td) = g(f(x + td)) = g(f_{x,d}(t)).
  4. Thus, h_{x,d} = g ∘ f_{x,d}.

  5. Since f_{x,d} and h_{x,d} are restrictions of f and h along a line, they are also convex.

  6. Due to Theorem 9.204, the directional derivatives of f and h exist in every direction.

  7. By the definition of the directional derivative (Definition 9.70),

    (f_{x,d})′₊(0) = f′(x; d), (h_{x,d})′₊(0) = h′(x; d).
  8. Also note that f_{x,d}(0) = f(x) and h_{x,d}(0) = h(x).

  9. f_{x,d} is right differentiable at t = 0, and g is differentiable at f(x).

  10. Hence, by the chain rule in Theorem 9.228,

    h′(x; d) = g′(f(x)) f′(x; d).
  11. By the max formula in Corollary 9.21,

    h′(x; d) = σ_{∂h(x)}(d), f′(x; d) = σ_{∂f(x)}(d).
  12. Thus,

    σ_{∂h(x)}(d) = h′(x; d) = g′(f(x)) f′(x; d) = g′(f(x)) σ_{∂f(x)}(d) = σ_{g′(f(x)) ∂f(x)}(d).

    The last step is due to Theorem 9.92; since g is nondecreasing, g′(f(x)) ≥ 0.

  13. By Theorem 9.211 and Theorem 9.214, the sets ∂f(x) and ∂h(x) are nonempty, closed and convex.

  14. Then, the set g′(f(x)) ∂f(x) is also nonempty, closed and convex.

  15. Thus, by Theorem 9.90,

    ∂h(x) = g′(f(x)) ∂f(x).

Applications of this rule are presented later in Example 9.70.

9.16.6.5. Max Rule#

Theorem 9.230 (Max rule of subdifferential calculus)

Let f1, f2, …, fm : V → (−∞, ∞] be a set of proper convex functions. Let f : V → (−∞, ∞] be given by:

f(x) = max{f1(x), f2(x), …, fm(x)}.

Let x ∈ ⋂_{i=1}^m int dom fi be a point common to the interiors of the domains of all the functions.

The subdifferential set of f at x can be obtained from the subdifferentials of the fi as follows:

∂f(x) = conv(⋃_{i ∈ I(x)} ∂fi(x))

where I(x) = {i | fi(x) = f(x)}.

Proof. Since the fi are proper convex, hence their pointwise maximum f is proper convex.

  1. Let I(x) = {i ∈ {1, …, m} | fi(x) = f(x)}.

  2. For any (nonzero) direction d ∈ V, by Theorem 9.209:

    f′(x; d) = max_{i ∈ I(x)} fi′(x; d).
  3. Without loss of generality, let us assume that I(x) = {1, …, k} for some k ∈ {1, …, m}. This can be achieved by reordering the fi.

  4. By the max formula (Theorem 9.219),

    fi′(x; d) = sup{⟨d, g⟩ | g ∈ ∂fi(x)}.
  5. Thus,

    f′(x; d) = max_{i ∈ {1, …, k}} sup_{gi ∈ ∂fi(x)} ⟨d, gi⟩.
  6. Recall that for any a1, …, ak ∈ R, the identity below holds:

    max{a1, …, ak} = sup_{t ∈ Δk} ∑_{i=1}^k ti ai.
  7. Thus, we can expand f′(x; d) as:

    f′(x; d) = max_{i ∈ {1, …, k}} sup_{gi ∈ ∂fi(x)} ⟨d, gi⟩
             = sup_{t ∈ Δk} {∑_{i=1}^k ti sup_{gi ∈ ∂fi(x)} ⟨d, gi⟩}
             = sup {⟨d, ∑_{i=1}^k ti gi⟩ | gi ∈ ∂fi(x), t ∈ Δk}
             = sup{⟨d, g⟩ | g ∈ conv(⋃_{i=1}^k ∂fi(x))}
             = σ_A(d)

    where A = conv(⋃_{i=1}^k ∂fi(x)) and σ denotes the support function.

  8. Since x ∈ int dom f, hence, by the max formula (Corollary 9.21),

    f′(x; d) = σ_{∂f(x)}(d).
  9. Thus, we have

    σ_{∂f(x)}(d) = σ_A(d).
  10. By Theorem 9.211, ∂f(x) is closed and convex.

  11. By Theorem 9.214, ∂f(x) is nonempty and bounded.

  12. Thus, ∂f(x) is nonempty, closed and convex.

  13. Similarly, the ∂fi(x) are nonempty, closed, convex and bounded.

  14. Thus, ⋃_{i=1}^k ∂fi(x) is a finite union of nonempty, closed, convex and bounded sets.

  15. Thus, ⋃_{i=1}^k ∂fi(x) is also nonempty and compact.

    1. A finite union of nonempty sets is nonempty.

    2. A finite union of bounded sets is bounded.

    3. A finite union of closed sets is closed.

    4. Thus, ⋃_{i=1}^k ∂fi(x) is closed and bounded.

    5. Since V is finite dimensional, hence closed and bounded sets are compact.

  16. Since A is the convex hull of ⋃_{i=1}^k ∂fi(x), hence A is nonempty, closed and convex.

    1. Recall from Theorem 9.129 that the convex hull of a compact set is compact.

    2. Also, recall that compact sets are closed and bounded.

  17. Since σ_{∂f(x)}(d) = σ_A(d) holds for every d ∈ V, the support functions of the underlying nonempty, closed and convex sets are equal. Hence, by Theorem 9.90,

    ∂f(x) = A = conv(⋃_{i=1}^k ∂fi(x)).

Some applications of this rule are presented later in Example 9.75, Example 9.76, Example 9.78, Example 9.79.

We now present a weaker version of the max rule which is applicable for pointwise supremum over an arbitrary set of functions.

Theorem 9.231 (Weak max rule of subdifferential calculus)

Let I be an arbitrary index set and suppose that for every i ∈ I, there is a proper convex function fi : V → (−∞, ∞]. Let f : V → (−∞, ∞] be given by:

f(x) = sup_{i ∈ I} {fi(x)}.

Then for any x ∈ dom f,

conv(⋃_{i ∈ I(x)} ∂fi(x)) ⊆ ∂f(x)

where I(x) = {i ∈ I | fi(x) = f(x)}.

In words, if fi(x) = f(x), then a subgradient of fi at x is also a subgradient of f at x. Also, for all i ∈ I such that fi(x) = f(x), any convex combination of their subgradients at x is also a subgradient of f at x.

Proof. Pick some x ∈ dom f.

  1. Let z ∈ dom f be arbitrary.

  2. Let I(x) = {i ∈ I | fi(x) = f(x)}.

  3. Let i ∈ I(x) be arbitrary.

  4. Let g ∈ ∂fi(x) be a subgradient of fi at x.

  5. Then, by the definition of f and the subgradient inequality:

    f(z) ≥ fi(z) ≥ fi(x) + ⟨z − x, g⟩ = f(x) + ⟨z − x, g⟩.

    We used the fact that fi(x) = f(x) for i ∈ I(x).

  6. Thus, g ∈ ∂f(x); g is a subgradient of f at x.

  7. Since this is valid for every subgradient of fi at x, hence ∂fi(x) ⊆ ∂f(x).

  8. Since this is valid for every i ∈ I(x), hence

    ⋃_{i ∈ I(x)} ∂fi(x) ⊆ ∂f(x).
  9. Recall from Theorem 9.211 that ∂f(x) is convex.

  10. Thus, it contains the convex hull of any of its subsets. Hence,

    conv(⋃_{i ∈ I(x)} ∂fi(x)) ⊆ ∂f(x).

Next is an example application of the weak max rule.

Example 9.66 (Subgradient of λmax(A0 + ∑_{i=1}^m xi Ai))

Let A0, A1, …, Am ∈ Sn be m + 1 given symmetric matrices. Define an affine transformation A : Rm → Sn as

A(x) ≜ A0 + ∑_{i=1}^m xi Ai.

For every vector x ∈ Rm, this mapping defines a symmetric matrix A(x). We can compute the largest eigenvalue of A(x). We introduce a function f : Rm → R as

f(x) ≜ λmax(A(x)) = λmax(A0 + ∑_{i=1}^m xi Ai).

Our task is to find a subgradient of f at x.

  1. Recall from the definition of the largest eigenvalue,

    f(x) = sup_{y ∈ Rn, ‖y‖2 = 1} yᵀ A(x) y.
  2. For every y ∈ Rn with ‖y‖2 = 1, we can define a function:

    fy(x) ≜ yᵀ A(x) y.
  3. Then,

    f(x) = sup_{y ∈ Rn, ‖y‖2 = 1} fy(x).
  4. The function fy(x) is affine (in x) for every y.

  5. Thus, fy is convex for every y.

  6. Thus, f is a pointwise supremum of the family of functions fy.

  7. Thus, f is also convex (see Theorem 9.114).

  8. Consequently, we can use the weak max rule (Theorem 9.231) to identify a subgradient of f at x.

  9. Let ỹ be a normalized eigenvector of A(x) corresponding to its largest eigenvalue. Then

    f(x) = ỹᵀ A(x) ỹ.
  10. This means that f(x) = fỹ(x).

  11. By the weak max rule, a subgradient of fỹ at x is also a subgradient of f at x.

  12. Expanding fỹ(x):

    fỹ(x) = ỹᵀ A(x) ỹ = ỹᵀ A0 ỹ + ∑_{i=1}^m (ỹᵀ Ai ỹ) xi.
  13. Then, the gradient of fỹ at x (computed by taking partial derivatives w.r.t. xi) is

    ∇fỹ(x) = (ỹᵀ A1 ỹ, …, ỹᵀ Am ỹ).
  14. Since fỹ is affine (thus convex), hence its gradient is also a subgradient.

  15. Thus,

    (ỹᵀ A1 ỹ, …, ỹᵀ Am ỹ) ∈ ∂f(x).
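A numerical sketch of this construction is given below; the random symmetric matrices are illustrative data, and numpy.linalg.eigh is used to obtain the normalized eigenvector for the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

# Illustrative data: A0, ..., Am are random symmetric matrices.
sym = lambda M: (M + M.T) / 2
As = [sym(rng.standard_normal((n, n))) for _ in range(m + 1)]

def A(x):   # affine matrix map A(x) = A0 + sum_i x_i A_i
    return As[0] + sum(xi * Ai for xi, Ai in zip(x, As[1:]))

def f(x):   # largest eigenvalue of A(x)
    return np.linalg.eigvalsh(A(x))[-1]

x = rng.standard_normal(m)
w, V = np.linalg.eigh(A(x))                  # ascending eigenvalues
v = V[:, -1]                                 # unit eigenvector for λ_max
g = np.array([v @ Ai @ v for Ai in As[1:]])  # the subgradient from the example

# Spot check the subgradient inequality f(y) >= f(x) + <y - x, g>.
for y in rng.standard_normal((100, m)):
    assert f(y) >= f(x) + g @ (y - x) - 1e-9
```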

9.16.7. Lipschitz Continuity#

Theorem 9.232 (Lipschitz continuity and boundedness of the subdifferential sets)

Let f : V → (−∞, ∞] be a proper convex function. Suppose that X ⊆ int dom f. Consider the following two claims:

  1. |f(x) − f(y)| ≤ L‖x − y‖ for any x, y ∈ X.

  2. ‖g‖∗ ≤ L for any g ∈ ∂f(x) where x ∈ X.

Then,

  • (2) implies (1). In other words, if the subgradients are bounded, then the function is Lipschitz continuous.

  • If X is open, then (1) holds if and only if (2) holds.

In other words, if the subgradients over a set X are bounded, then f is Lipschitz continuous over X. If X is open, then f is Lipschitz continuous over X if and only if the subgradients over X are bounded.

Proof. (a) We first show that (2) ⟹ (1).

  1. Assume that (2) is satisfied.

  2. Pick any x, y ∈ X.

  3. Since f is proper and convex and x, y ∈ int dom f, hence due to Theorem 9.214, ∂f(x) and ∂f(y) are nonempty.

  4. Let gx ∈ ∂f(x) and gy ∈ ∂f(y).

  5. By the subgradient inequality,

    f(y) ≥ f(x) + ⟨y − x, gx⟩; f(x) ≥ f(y) + ⟨x − y, gy⟩.
  6. We can rewrite this as

    f(x) − f(y) ≤ ⟨x − y, gx⟩; f(y) − f(x) ≤ ⟨y − x, gy⟩.
  7. By the generalized Cauchy–Schwarz inequality (Theorem 4.108),

    ⟨x − y, gx⟩ ≤ ‖x − y‖ ‖gx‖∗ ≤ L‖x − y‖;
    ⟨y − x, gy⟩ ≤ ‖y − x‖ ‖gy‖∗ ≤ L‖x − y‖.
  8. Combining the two inequalities, we get

    |f(x) − f(y)| ≤ L‖x − y‖.
  9. Thus, (2) ⟹ (1).

(b) If X is open, then we need to show that (1) ⟹ (2).

  1. We have already shown that (2) ⟹ (1).

  2. Assume that X is open and (1) holds.

  3. Let x ∈ X.

  4. Since x is an interior point of dom f, hence the subdifferential is nonempty.

  5. Pick any g ∈ ∂f(x).

  6. Let g† ∈ V be a vector with ‖g†‖ = 1 and ⟨g†, g⟩ = ‖g‖∗. Such a vector exists by the definition of the dual norm.

  7. Since X is open, we can choose ϵ > 0 small enough such that x + ϵg† ∈ X.

  8. By the subgradient inequality, we have:

    f(x + ϵg†) ≥ f(x) + ϵ⟨g†, g⟩.
  9. Thus,

    ϵ‖g‖∗ = ϵ⟨g†, g⟩ ≤ f(x + ϵg†) − f(x) ≤ L‖(x + ϵg†) − x‖ (by the hypothesis in (1)) = Lϵ‖g†‖ = Lϵ.
  10. Canceling ϵ, we get:

    ‖g‖∗ ≤ L,

    which holds true for every g ∈ ∂f(x) where x ∈ X, as desired.

Corollary 9.25 (Lipschitz continuity of convex functions over compact domains)

Let f : V → (−∞, ∞] be a proper and convex function. Suppose that X ⊆ int dom f is compact. Then, there exists L > 0 such that

|f(x) − f(y)| ≤ L‖x − y‖ ∀ x, y ∈ X.

Proof. Recall from Theorem 9.216 that the set of subgradients of a proper convex function over a compact subset of the interior of its domain is nonempty and bounded.

  1. In other words, the set

    Y = ⋃_{x ∈ X} ∂f(x)

    is nonempty and bounded.

  2. Thus, there exists L > 0 such that ‖g‖∗ ≤ L for every g ∈ Y.

  3. Then, by Theorem 9.232,

    |f(x) − f(y)| ≤ L‖x − y‖ ∀ x, y ∈ X.
  4. Thus, f is indeed Lipschitz continuous over X.

9.16.8. ϵ-Subgradients#

Definition 9.76 (ϵ-Subgradient)

Let f : V → (−∞, ∞] be a proper function. Let x ∈ dom f. A vector g ∈ V∗ is called an ϵ-subgradient of f at x if

(9.9) f(y) ≥ f(x) + ⟨y − x, g⟩ − ϵ ∀ y ∈ V.

9.16.8.1. Geometric Interpretation#

Observation 9.12 (ϵ-subgradient and supporting hyperplane)

Let f : V → (−∞, ∞] be a proper function. Then g is an ϵ-subgradient of f at x if and only if epi f is contained in the positive halfspace of the hyperplane with normal (g, −1) passing through (x, f(x) − ϵ).

Proof. Let H denote the hyperplane

H = {(y, t) | ⟨y, g⟩ − t = ⟨x, g⟩ − f(x) + ϵ}.

The positive halfspace of H is given by

H₊ = {(y, t) | ⟨y, g⟩ − t ≤ ⟨x, g⟩ − f(x) + ϵ}.

Assume that g is an ϵ-subgradient of f at x.

  1. For any (y, t) ∈ epi f, we have

    t ≥ f(y) ≥ f(x) + ⟨y − x, g⟩ − ϵ.
  2. This is equivalent to

    ⟨y, g⟩ − t ≤ ⟨x, g⟩ − f(x) + ϵ

    for all (y, t) ∈ epi f.

  3. Hence epi f ⊆ H₊.

Now assume that epi f ⊆ H₊.

  1. Let (y, f(y)) ∈ epi f.

  2. Then we have

    ⟨y, g⟩ − f(y) ≤ ⟨x, g⟩ − f(x) + ϵ.
  3. Rearranging the terms, we have

    f(y) ≥ f(x) + ⟨y − x, g⟩ − ϵ ∀ y ∈ V.
  4. But this means that g is an ϵ-subgradient of f at x.

9.16.8.2. ϵ-Subdifferential#

Definition 9.77 (ϵ-subdifferential)

Let f : V → (−∞, ∞] be a proper function. The set of all ϵ-subgradients of f at a point x ∈ dom f is called the ϵ-subdifferential of f at x and is denoted by ∂ϵ f(x):

∂ϵ f(x) ≜ {g ∈ V∗ | f(y) ≥ f(x) + ⟨y − x, g⟩ − ϵ ∀ y ∈ V}.

For all x ∉ dom f, we define ∂ϵ f(x) = ∅.

It is easy to see that

∂f(x) ⊆ ∂ϵ f(x).

Also, if ϵ2 ≥ ϵ1 > 0, then

∂ϵ1 f(x) ⊆ ∂ϵ2 f(x).

9.16.9. Optimality Conditions#

A well known result for differentiable functions is that at a point of optimality ∇f(x) = 0 (see Theorem 8.1). Subdifferentials are useful in characterizing the minima of a function, as the idea of vanishing gradients can be generalized to subgradients.

Theorem 9.233 (Fermat’s optimality condition)

Let f : V → (−∞, ∞] be a proper convex function. Then

a ∈ argmin{f(x) | x ∈ V}

if and only if 0 ∈ ∂f(a).

In other words, a is a minimizer of f if and only if 0 is a subgradient of f at a.

Proof. Assume that 0 ∈ ∂f(a) where a ∈ dom f.

  1. By the subgradient inequality,

    f(x) ≥ f(a) + ⟨x − a, 0⟩ ∀ x ∈ V.
  2. This simplifies to

    f(x) ≥ f(a) ∀ x ∈ V.
  3. Thus, a ∈ argmin{f(x) | x ∈ V}.

For the converse, assume that a ∈ argmin{f(x) | x ∈ V}.

  1. Then,

    f(x) ≥ f(a) ∀ x ∈ V.
  2. But then

    f(x) ≥ f(a) ⟺ f(x) ≥ f(a) + 0 ⟺ f(x) ≥ f(a) + ⟨x − a, 0⟩

    holds true for every x ∈ V.

  3. This implies that 0 ∈ ∂f(a).
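A tiny numerical illustration of Fermat's condition is sketched below; the function f(x) = |x − 1|, whose subdifferential at the minimizer x = 1 is [−1, 1] ∋ 0, is a choice made here for illustration.

```python
import numpy as np

# Fermat's condition for f(x) = |x - 1|: ∂f(1) = [-1, 1] contains 0,
# so x = 1 should be a global minimizer; a grid search confirms it.
f = lambda x: np.abs(x - 1)
xs = np.linspace(-5.0, 5.0, 100001)
assert abs(xs[np.argmin(f(xs))] - 1.0) < 1e-4
# At any x != 1 the subdifferential is {sign(x - 1)}, which excludes 0.
```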

9.16.10. Mean Value Theorem#

The following result is from [48].

Theorem 9.234 (A subgradients based mean value theorem for 1D functions)

Let f : R → (−∞, ∞] be a proper closed convex function. Let [a, b] ⊆ dom f with a < b. Then,

f(b) − f(a) = ∫ₐᵇ h(t) dt

where h : (a, b) → R satisfies h(t) ∈ ∂f(t) for every t ∈ (a, b).
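A numerical check of this theorem is sketched below; f(t) = |t| with the subgradient selection h(t) = sign(t), and the interval [−1, 2], are choices made here for illustration.

```python
import numpy as np

# Mean value check for f(t) = |t| on [a, b] = [-1, 2]:
# h(t) = sign(t) satisfies h(t) in ∂f(t) for t != 0.
a, b = -1.0, 2.0
N = 1_000_000
ts = a + (np.arange(N) + 0.5) * (b - a) / N     # midpoint grid, avoids t = 0
integral = np.sign(ts).sum() * (b - a) / N       # midpoint-rule integral of h
assert abs(integral - (abs(b) - abs(a))) < 1e-4  # f(b) - f(a) = 2 - 1 = 1
```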

In the remainder of this section, we compute the subgradients and subdifferential sets for a variety of standard functions.

9.16.11. Norm Functions#

We recall from Theorem 9.210 that the subdifferential of a norm ‖·‖ : V → R at x = 0 is given by:

∂f(0) = B∗[0, 1] = {g ∈ V∗ | ‖g‖∗ ≤ 1}.

9.16.11.1. ℓ1-Norm#

Example 9.67 (Subdifferential of the ℓ1 norm at origin)

Let f : Rn → R be given by f(x) = ‖x‖1. We recall that the dual norm of ℓ1 is ℓ∞. The unit ball of the ℓ∞ norm at origin is given by

B∞[0, 1] = [−1, 1]ⁿ.

Following Theorem 9.210, the subdifferential of f at x = 0 is given by:

∂f(0) = B∞[0, 1] = [−1, 1]ⁿ.

Example 9.68 (Subdifferential of the absolute value function at origin)

Let g : R → R be the absolute value function given by

g(x) = |x|.

This is a special case of the ℓ1 norm for R¹. Thus, following Example 9.67, the subdifferential of g at x = 0 is given by:

∂g(0) = [−1, 1].

For a complete specification of the subdifferential of g, see Example 9.73 below.

Example 9.69 (Subdifferential of the ℓ1 norm)

Let f : Rn → R be given by f(x) = ‖x‖1. We can write f as a sum of n functions

f(x) = ‖x‖1 = ∑_{i=1}^n |xi| = ∑_{i=1}^n fi(x)

where

fi(x) = |xi|.

Let g(x) = |x|. Then

fi(x) = |xi| = |eiᵀx| = g(eiᵀx).

Due to the affine transformation rule (Theorem 9.227),

∂fi(x) = ∂g(eiᵀx) ei = ∂g(xi) ei.

The subdifferential of the absolute value function g is described in Example 9.73 below.

Thus, the subdifferential set of fi is given by:

∂fi(x) = {sgn(xi) ei} if xi ≠ 0, and [−ei, ei] if xi = 0.

Using the sum rule (Corollary 9.24), we have:

∑_{i=1}^n ∂fi(x) = ∂(∑_{i=1}^n fi)(x).

We define the index set:

I0(x) = {i | xi = 0}.

Expanding the sum of subdifferentials,

∂f(x) = ∑_{i ∈ I0(x)} [−ei, ei] + ∑_{i ∉ I0(x)} sgn(xi) ei.

We can rewrite this as:

∂f(x) = {z ∈ Rn | zi = sgn(xi) whenever xi ≠ 0, |zi| ≤ 1 otherwise}.

We also have a weak result from this:

sgn(x) ∈ ∂f(x) = ∂‖x‖1.
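The weak result above translates directly into a small Python sketch, shown here for illustration.

```python
import numpy as np

def l1_subgradient(x):
    """One subgradient of ||.||_1 at x: sign(x) (0 on zero entries)."""
    return np.sign(x)

x = np.array([1.5, 0.0, -2.0])
g = l1_subgradient(x)
ys = np.random.default_rng(0).standard_normal((200, 3))
# Subgradient inequality: ||y||_1 >= ||x||_1 + <y - x, g>.
assert np.all(np.abs(ys).sum(axis=1)
              >= np.abs(x).sum() + (ys - x) @ g - 1e-12)
```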

Example 9.70 (Subdifferential of the ℓ1 norm squared)

Let f : Rn → R be given by f(x) = ‖x‖1.

Now let g(t) = [t]₊² (the squared positive part), and consider the function h = g ∘ f given by

h(x) = ‖x‖1².

By the subdifferential chain rule (Theorem 9.229):

∂h(x) = 2‖x‖1 ∂f(x) = 2‖x‖1 {z ∈ Rn | zi = sgn(xi) whenever xi ≠ 0, |zi| ≤ 1 otherwise}.

We have used the subdifferential of f from Example 9.69.
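A numerical sketch of this chain rule result follows; the particular subgradient 2‖x‖1 sign(x) is one element of the set above, chosen here for illustration.

```python
import numpy as np

# Chain rule sketch: h(x) = ||x||_1 ** 2 has subgradient 2 ||x||_1 * sign(x).
h = lambda x: np.abs(x).sum() ** 2
h_subgradient = lambda x: 2 * np.abs(x).sum() * np.sign(x)

x = np.array([1.0, -2.0, 0.0])
g = h_subgradient(x)
for y in np.random.default_rng(1).standard_normal((200, 3)):
    assert h(y) >= h(x) + g @ (y - x) - 1e-9   # subgradient inequality
```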

9.16.11.2. ℓ2-Norm#

Example 9.71 (Subdifferential of the ℓ2 norm at origin)

Let f : Rn → R be given by f(x) = ‖x‖2. We recall that the dual norm of ℓ2 is ℓ2 itself, as this norm is self dual.

Following Theorem 9.210, the subdifferential of f at x = 0 is given by:

∂f(0) = B2[0, 1] = {g ∈ Rn | ‖g‖2 ≤ 1}.

Example 9.72 (Subdifferential of the ℓ2 norm)

Let f : Rn → R be given by f(x) = ‖x‖2. At x ≠ 0, f is differentiable with the gradient (see Example 5.10):

∇f(x) = x/‖x‖2.

Since f is convex and differentiable at x ≠ 0, hence due to Theorem 9.220,

∂f(x) = {∇f(x)} = {x/‖x‖2}.

Combining this with the subdifferential of f at origin from Example 9.71, we obtain:

∂f(x) = {x/‖x‖2} if x ≠ 0, and ∂f(x) = B2[0, 1] if x = 0.
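A short Python sketch of this case split follows; returning the zero vector at x = 0 is one valid choice since 0 ∈ B2[0, 1].

```python
import numpy as np

def l2_subgradient(x):
    """One subgradient of ||.||_2 at x (any point of B2[0, 1] works at x = 0)."""
    nx = np.linalg.norm(x)
    return x / nx if nx > 0 else np.zeros_like(x)   # 0 lies in B2[0, 1]

for x in (np.zeros(3), np.array([3.0, 4.0, 0.0])):
    g = l2_subgradient(x)
    for y in np.random.default_rng(2).standard_normal((200, 3)):
        assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12
```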

Example 9.73 (Subdifferential of the absolute value function)

Let g : R → R be the absolute value function given by

g(x) = |x|.

This is a special case of the ℓ2 norm for R¹. Following Example 9.72,

∂g(x) = {sgn(x)} if x ≠ 0, and ∂g(x) = [−1, 1] if x = 0.

c.f. Example 9.68.

9.16.11.3. ℓ∞-Norm#

Example 9.74 (Subdifferential of the ℓ∞ norm at origin)

Let f : Rn → R be given by f(x) = ‖x‖∞. We recall that the dual norm of ℓ∞ is ℓ1. The unit ball of the ℓ1 norm at origin is given by

B1[0, 1] = {x ∈ Rn | ‖x‖1 ≤ 1}.

Following Theorem 9.210, the subdifferential of f at x = 0 is given by:

∂f(0) = B1[0, 1] = {x ∈ Rn | ‖x‖1 ≤ 1}.

Example 9.75 (Subdifferential of the ℓ∞ norm)

Let f : Rn → R be given by f(x) = ‖x‖∞. Let us compute the subdifferential of f at x ≠ 0.

We have:

f(x) = max{f1(x), f2(x), …, fn(x)}

where fi(x) = |xi|. We define:

I(x) = {i ∈ {1, …, n} | |xi| = f(x) = ‖x‖∞}.

Then, following Example 9.69,

∂fi(x) = {sgn(xi) ei} ∀ i ∈ I(x).

This is valid since x ≠ 0 implies that f(x) ≠ 0, which in turn implies that xi ≠ 0 for every i ∈ I(x).

Then, using the max rule for proper convex functions (Theorem 9.230):

∂f(x) = conv(⋃_{i ∈ I(x)} {sgn(xi) ei}).

We can rewrite this as:

∂f(x) = {∑_{i ∈ I(x)} λi sgn(xi) ei | ∑_{i ∈ I(x)} λi = 1, λj ≥ 0 ∀ j ∈ I(x)}.

Combining this with the subdifferential of f at origin from Example 9.74, we obtain:

∂f(x) = {∑_{i ∈ I(x)} λi sgn(xi) ei | ∑_{i ∈ I(x)} λi = 1, λj ≥ 0 ∀ j ∈ I(x)} if x ≠ 0, and ∂f(x) = B1[0, 1] if x = 0.
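The vertex choice λ concentrated on a single maximizing index gives a particularly simple subgradient; a Python sketch of that choice follows, shown here for illustration.

```python
import numpy as np

def linf_subgradient(x):
    """One subgradient of ||.||_inf: sign(x_i) e_i for a maximizing index i."""
    g = np.zeros_like(x)
    if np.any(x != 0):
        i = int(np.argmax(np.abs(x)))
        g[i] = np.sign(x[i])
    return g                           # g = 0 lies in B1[0, 1] when x = 0

x = np.array([0.5, -3.0, 2.0])
g = linf_subgradient(x)
for y in np.random.default_rng(3).standard_normal((200, 3)):
    assert np.abs(y).max() >= np.abs(x).max() + g @ (y - x) - 1e-12
```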

9.16.11.4. ℓ1 Norm over Affine Transformation#

Let A ∈ Rm×n. Let b ∈ Rm. Let f : Rm → R be given by f(y) = ‖y‖1.

Let h : Rn → R be the function h(x) = ‖Ax + b‖1 = f(Ax + b).

By the affine transformation rule, we have:

∂h(x) = Aᵀ ∂f(Ax + b) ∀ x ∈ Rn.

Denoting the i-th row of A by aiᵀ, we define the index set:

I0(x) = {i : aiᵀx + bi = 0}.

We have:

∂h(x) = ∑_{i ∈ I0(x)} [−ai, ai] + ∑_{i ∉ I0(x)} sgn(aiᵀx + bi) ai.

In particular, we have the weak result:

Aᵀ sgn(Ax + b) ∈ ∂h(x).
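A short numerical check of this weak result follows; the random A, b and x are illustrative data only.

```python
import numpy as np

# Weak result sketch: A^T sign(Ax + b) is one subgradient of h(x) = ||Ax + b||_1.
rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
h = lambda x: np.abs(A @ x + b).sum()

x = rng.standard_normal(3)
g = A.T @ np.sign(A @ x + b)
for y in rng.standard_normal((200, 3)):
    assert h(y) >= h(x) + g @ (y - x) - 1e-9   # subgradient inequality
```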

9.16.11.5. ℓ2 Norm over Affine Transformation#

Let A ∈ Rm×n. Let b ∈ Rm. Let f : Rm → R be given by f(y) = ‖y‖2.

Let h : Rn → R be the function h(x) = ‖Ax + b‖2 = f(Ax + b).

We have:

∂f(Ax + b) = {(Ax + b)/‖Ax + b‖2} if Ax + b ≠ 0, and B2[0, 1] if Ax + b = 0.

Applying the affine transformation rule, we get:

∂h(x) = Aᵀ ∂f(Ax + b) = {Aᵀ(Ax + b)/‖Ax + b‖2} if Ax + b ≠ 0, and AᵀB2[0, 1] if Ax + b = 0.

For x such that Ax + b = 0, we can write this as

∂h(x) = AᵀB2[0, 1] = {Aᵀy | ‖y‖2 ≤ 1}.

9.16.11.6. Norm over Affine Transformation#

Example 9.76 (Subdifferential of ‖Ax + b‖∞)

Let A ∈ Rm×n. Let b ∈ Rm. Let f : Rm → R be given by

f(y) = ‖y‖∞.

Let h : Rn → R be the function

h(x) = ‖Ax + b‖∞ = f(Ax + b).

With y = Ax + b, we have yi = aiᵀx + bi where aiᵀ is the i-th row vector of A.

Following Example 9.75,

∂f(y) = {∑_{i ∈ I(y)} λi sgn(yi) ei | ∑_{i ∈ I(y)} λi = 1, λj ≥ 0 ∀ j ∈ I(y)} if y ≠ 0, and ∂f(y) = B1[0, 1] if y = 0,

where I(y) = {i ∈ {1, …, m} | |yi| = f(y) = ‖y‖∞}.

Due to the affine transformation rule (Theorem 9.227),

∂h(x) = Aᵀ ∂f(Ax + b).

We have the following cases.

(a) y = 0.

  1. In terms of x, the condition y = 0 is equivalent to Ax + b = 0.

  2. Then,

    ∂f(Ax + b) = ∂f(0) = B1[0, 1].
  3. Thus,

    ∂h(x) = AᵀB1[0, 1].

(b) y ≠ 0.

  1. In terms of x, the condition y ≠ 0 is equivalent to Ax + b ≠ 0.

  2. Then,

    ∂f(Ax + b) = {∑_{i ∈ I(y)} λi sgn(yi) ei | ∑_{i ∈ I(y)} λi = 1, λj ≥ 0 ∀ j ∈ I(y)}
               = {∑_{i ∈ Ix} λi sgn(aiᵀx + bi) ei | ∑_{i ∈ Ix} λi = 1, λj ≥ 0 ∀ j ∈ Ix}

    where

    Ix = I(y) = I(Ax + b).
  3. Note that Aᵀei = ai.

  4. Then,

    ∂h(x) = Aᵀ ∂f(Ax + b) = {∑_{i ∈ Ix} λi sgn(aiᵀx + bi) ai | ∑_{i ∈ Ix} λi = 1, λj ≥ 0 ∀ j ∈ Ix}.

Combining the two cases, we get:

∂h(x) = {∑_{i ∈ Ix} λi sgn(aiᵀx + bi) ai | ∑_{i ∈ Ix} λi = 1, λj ≥ 0 ∀ j ∈ Ix} if Ax + b ≠ 0, and ∂h(x) = AᵀB1[0, 1] if Ax + b = 0.

9.16.12. Indicator Functions#

Theorem 9.235 (Subdifferential of indicator function)

The subdifferential of the indicator function of a nonempty set S ⊆ V at any point x ∈ S is given by

∂I_S(x) = N_S(x),

where N_S(x) is the normal cone of S at x.

Proof. Let x ∈ S and g ∈ ∂I_S(x). The subgradient inequality (9.7) gives us:

I_S(z) ≥ I_S(x) + ⟨z − x, g⟩ ∀ z ∈ S ⟺ 0 ≥ 0 + ⟨z − x, g⟩ ∀ z ∈ S ⟺ ⟨z − x, g⟩ ≤ 0 ∀ z ∈ S ⟺ g ∈ N_S(x).

Example 9.77 (Subdifferential of the indicator function of the unit ball)

The unit ball at origin is given by:

S = B[0, 1] = {x ∈ V | ‖x‖ ≤ 1}.

From Theorem 9.64, the normal cone of S at x ∈ S is given by:

N_S(x) = {y ∈ V∗ | ‖y‖∗ ≤ ⟨x, y⟩}.

For any x ∉ S, N_S(x) = ∅. Combining:

∂δ_{B[0,1]}(x) = {y ∈ V∗ | ‖y‖∗ ≤ ⟨x, y⟩} if ‖x‖ ≤ 1, and ∅ if ‖x‖ > 1.

9.16.13. Maximum Eigen Value Function#

The maximum eigenvalue function for symmetric matrices, denoted f : Sn → R, is given by:

f(X) ≜ λmax(X).

Theorem 9.236 (Subgradient for the maximum eigenvalue function)

Let f : Sn → R be the maximum eigenvalue function. Then

vvᵀ ∈ ∂f(X)

where v is a normalized eigenvector of X ∈ Sn associated with its maximum eigenvalue.

Proof. Let X ∈ Sn be given. Let v be a normalized eigenvector associated with the largest eigenvalue of X. Then, ‖v‖2 = 1.

For any Y ∈ Sn, we have:

f(Y) = λmax(Y) = max_{‖u‖2 = 1} {uᵀYu}
     ≥ vᵀYv = vᵀXv + vᵀ(Y − X)v
     = λmax(X)‖v‖2² + tr(vᵀ(Y − X)v)
     = λmax(X) + tr((Y − X)vvᵀ)
     = f(X) + ⟨Y − X, vvᵀ⟩.

In this derivation, we have used the following results:

  • The maximum eigenvalue can be obtained by maximizing uᵀYu over the unit sphere.

  • For a scalar x ∈ R, x = tr(x).

  • tr(AB) = tr(BA) if both AB and BA are well defined.

  • ⟨A, B⟩ = tr(AB) for the space of symmetric matrices.

Thus, vvᵀ ∈ ∂f(X).

We note here that this result only identifies one of the subgradients of f at X. It doesn’t characterize the entire subdifferential of f at X. In this sense, this result is a weak result. In contrast, a strong result would characterize the entire subdifferential.

9.16.14. The Max Function#

Example 9.78 (Subdifferential of the max function)

Let f : Rn → R be given by:

f(x) = max{x1, x2, …, xn}.

Let fi(x) = xi = eiᵀx. Then

f(x) = max{f1(x), f2(x), …, fn(x)}.

We note that the fi are differentiable and their gradients are given by (see Example 5.4):

∇fi(x) = ei.

Also, the fi are linear, hence convex. Thus, due to Theorem 9.220:

∂fi(x) = {ei}.

We denote the index set of functions which attain the value f(x) at x by:

I(x) = {i | f(x) = xi}.

Then, using the max rule for proper convex functions (Theorem 9.230):

∂f(x) = conv(⋃_{i ∈ I(x)} ∂fi(x)) = conv(⋃_{i ∈ I(x)} {ei}).

As an example, consider the case where x = α1 for some α ∈ R.

  1. In other words, x = (α, …, α).

  2. Then, f(x) = α.

  3. fi(x) = α = f(x) for every i ∈ {1, …, n}.

  4. I(x) = {1, …, n}.

  5. ∂fi(x) = {ei}.

  6. conv(⋃_{i ∈ I(x)} {ei}) = conv{e1, …, en}.

  7. But conv{e1, …, en} = Δn.

  8. Thus,

    ∂f(α1) = Δn ∀ α ∈ R.
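A numerical sketch of this special case follows; the random point of the unit simplex Δn, drawn via a Dirichlet distribution, is chosen here for illustration.

```python
import numpy as np

# Subdifferential of f(x) = max_i x_i at x = α·1 is the unit simplex Δ_n:
# every convex combination of the active gradients e_i is a subgradient.
n, alpha = 4, 0.7
x = alpha * np.ones(n)
rng = np.random.default_rng(4)
lam = rng.dirichlet(np.ones(n))          # a random point of Δ_n
for y in rng.standard_normal((200, n)):
    assert y.max() >= x.max() + lam @ (y - x) - 1e-12
```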

9.16.15. Space of Matrices#

Let V = Rm×n. Let the standard inner product for x, y ∈ V be ⟨x, y⟩ = tr(xᵀy).

Let f : V → (−∞, ∞] be a proper function. Let x ∈ int dom f.

The gradient at x, if it exists, is given by:

∇f(x) = Df(x) ≜ (∂f/∂x_{ij}(x))_{i,j}.

Let H be a positive definite matrix and define an inner product for V as:

⟨x, y⟩_H ≜ tr(xᵀHy).

Then

∇f(x) = H⁻¹Df(x).

9.16.16. Convex Piecewise Linear Functions#

Example 9.79 (Subdifferential of convex piecewise linear functions)

Let a convex piecewise linear function f : Rn → R be given by:

f(x) = max_{1 ≤ i ≤ m} {aiᵀx + bi}

where ai ∈ Rn, bi ∈ R for i = 1, …, m.

We define a set of functions fi : Rn → R for i = 1, …, m as

fi(x) = aiᵀx + bi.

We can see that f is the pointwise maximum of these functions:

f(x) = max_{1 ≤ i ≤ m} {fi(x)}.

Clearly,

∂fi(x) = {∇fi(x)} = {ai}.

We define:

I(x) = {i ∈ {1, …, m} | f(x) = fi(x) = aiᵀx + bi}.

Then, using the max rule for proper convex functions (Theorem 9.230):

∂f(x) = conv(⋃_{i ∈ I(x)} ∂fi(x)) = {∑_{i ∈ I(x)} λi ai | ∑_{i ∈ I(x)} λi = 1, λj ≥ 0 ∀ j ∈ I(x)}.

By Fermat's optimality condition (Theorem 9.233), x is a minimizer of f if and only if 0 ∈ ∂f(x).

Thus, x is a minimizer if and only if there exists λ ∈ Δm such that

0 = ∑_{i=1}^m λi ai, λj = 0 ∀ j ∉ I(x).

Note that at any x, for every j ∉ I(x), we have

ajᵀx + bj − f(x) < 0.

Thus, the complementarity condition

λj(ajᵀx + bj − f(x)) = 0, j = 1, …, m

expresses the fact that whenever ajᵀx + bj − f(x) < 0, then λj must be zero, and whenever ajᵀx + bj − f(x) = 0, then λj ≥ 0 is allowed (since λ ∈ Δm).

If we put together a matrix A ∈ Rm×n whose rows are a1ᵀ, …, amᵀ, then the optimality condition can be succinctly stated as: there exists

λ ∈ Δm s.t. Aᵀλ = 0 and λj(ajᵀx + bj − f(x)) = 0, j = 1, …, m.

A numerical feasibility check of this condition is sketched below.
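The following is a minimal sketch of such a check, assuming SciPy's linprog is available; the helper name is_minimizer and the 1-D test problem f(x) = max(x, −x) = |x| are illustrative choices only.

```python
import numpy as np
from scipy.optimize import linprog

def is_minimizer(A, b, x, tol=1e-9):
    """Feasibility test for Fermat's condition at a candidate x:
    is there λ in Δ_m with A^T λ = 0 and λ_j = 0 off the active set?"""
    fx = (A @ x + b).max()
    active = (A @ x + b) >= fx - tol          # index set I(x)
    G = A[active]                             # active gradients a_i
    k, n = G.shape
    A_eq = np.vstack([G.T, np.ones((1, k))])  # sum_i λ_i a_i = 0, sum λ_i = 1
    b_eq = np.append(np.zeros(n), 1.0)
    res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.status == 0                    # feasible <=> 0 in ∂f(x)

# f(x) = max(x, -x) = |x| on R is minimized at x = 0 but not at x = 1.
A = np.array([[1.0], [-1.0]]); b = np.zeros(2)
assert is_minimizer(A, b, np.array([0.0]))
assert not is_minimizer(A, b, np.array([1.0]))
```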

9.16.17. Minimization Problems#

Consider the minimization problem:

min{f(x) | g(x) ≤ 0, x ∈ X}.

Its dual function is:

q(λ) ≜ min_{x ∈ X} {L(x, λ) ≜ f(x) + λᵀg(x)}.

Assume that for λ = λ0 the minimization on the R.H.S. is attained at x = x0.

Then a subgradient of the (negative of the) dual function q is given by:

−g(x0) ∈ ∂(−q)(λ0).
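A toy numerical check of this relation is sketched below; the problem data f(x) = x², g(x) = 1 − x, X = R (for which q(λ) = λ − λ²/4 with minimizer x0 = λ/2) are choices made here for illustration.

```python
import numpy as np

# Toy check: f(x) = x^2, g(x) = 1 - x, X = R gives
# q(λ) = min_x x^2 + λ(1 - x) = λ - λ^2/4, attained at x0 = λ/2.
q = lambda lam: lam - lam ** 2 / 4
lam0 = 1.0
x0 = lam0 / 2
g_x0 = 1 - x0

lams = np.linspace(0.0, 4.0, 401)
# Subgradient inequality for the convex function -q at λ0 with -g(x0):
# -q(λ) >= -q(λ0) + (λ - λ0) * (-g(x0)).
assert np.all(-q(lams) >= -q(lam0) + (lams - lam0) * (-g_x0) - 1e-12)
```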