(sec:cvxf:subgradients)=
# Subgradients
Primary references for this section are
{cite}`bertsekas2003convex,beck2017first`.
Throughout this section, we assume that $\VV, \WW$ are
finite dimensional real vector spaces. Wherever necessary,
they are equipped with a
{prf:ref}`norm ` $\| \cdot \|$
or an {prf:ref}`real inner product `
$\langle \cdot, \cdot \rangle$.
They are also equipped with a metric $d(\bx, \by) = \| \bx - \by \|$
as needed.
## Subgradients
```{index} Subgradient
```
````{prf:definition} Subgradient
:label: def-cvxf-subgradient
Let $f : \VV \to \RERL$ be a
{prf:ref}`proper function `.
Let $\bx \in \dom f$.
A vector $\bg \in \VV^*$ is called a *subgradient* of $f$ at $\bx$
if
```{math}
:label: eq-cvxf-subgradient-inequality
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle \Forall \by \in \VV.
```
This inequality is known as the *subgradient inequality*.
````
```{div}
Here, we assume that $f$ is an extended value extension whenever
required; i.e., $f(\bx) = \infty$ for all $\bx \notin \dom f$.
As discussed in {prf:ref}`res-la-ip-dual-space-isomorphism`,
the vector spaces $\VV$ and $\VV^*$ are isomorphic.
Therefore, we follow the convention that both
$\VV$ and $\VV^*$ have exactly the same elements.
The primary difference between $\VV$ and $\VV^*$ comes from the
computation of norm. If $\VV$ is endowed with a norm $\| \cdot \|$
then $\VV^*$ is endowed with a {prf:ref}`dual norm `
$\| \cdot \|_*$.
In the arguments below $B[\ba, r]$ or $B_{\| \cdot \|}[\ba, r]$ denotes the
closed ball of radius $r$ in the normed space $(\VV, \| \cdot \|)$.
The closed ball of radius $r$ in the dual space $(\VV, \| \cdot \|_*)$
shall be denoted by $B_*[\ba, r]$ or $B_{\| \cdot \|_*}[\ba, r]$.
Open balls shall be denoted similarly.
```
```{prf:observation} Global affine underestimator
:label: res-cvxf-subgradient-affine-under-estimator
If $\bg$ is a subgradient of $f$ at some $\bx \in \dom f$, then
the affine function $a : \VV \to \RR$ given by:
$$
a(\by) = f(\bx) + \langle \by - \bx, \bg \rangle
= \langle \by, \bg \rangle + f(\bx) - \langle \bx, \bg \rangle \Forall \by \in \VV
$$
is a global affine underestimator for $f$.
Note that the term $f(\bx) - \langle \bx, \bg \rangle$ is a constant
since $\bx \in \VV$ and $\bg \in \VV^*$ are fixed.
This comes from the subgradient inequality:
$$
f(\by) \geq a(\by) \Forall \by \in \VV.
$$
```
````{prf:observation} Subgradient inequality alternative form
:label: res-cvxf-subgradient-inequality-alternative-form
For $\by \notin \dom f$, $f(\by) = \infty$. Thus, the subgradient
inequality is trivially satisfied for all $\by \notin \dom f$.
An alternate form of the inequality is:
```{math}
:label: eq-cvxf-subgradient-inequality-2
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle \Forall \by \in \dom f.
```
````
### Geometric Interpretation
```{prf:observation} Subgradient and supporting hyperplane
:label: res-cvxf-subgradient-supporting-hyperplane
Let $f: \VV \to \RERL$ be a proper function.
Then $\bg$ be a subgradient of $f$ at $\bx$
if and only if $\epi f$ has a supporting
hyperplane at $(\bx, f(\bx))$ with
a normal $(-\bg, 1)$.
Let $H$ be a supporting hyperplane of $\epi f$ at $(\bx, f(\bx))$
with the normal $(-\bg, 1)$.
1. Then
$$
H = \{ (\by, t) \ST \langle \by, -\bg \rangle + t
= \langle \bx, -\bg \rangle + f(\bx) \}.
$$
1. For any $(\by, f(\by)) \in \epi f$, we must have
$$
& \langle \by, -\bg \rangle + f(\by)
\geq \langle \bx, -\bg \rangle + f(\bx) \\
\iff & f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle.
$$
1. Then $\bg$ is a subgradient of $f$ at $\bx$.
Now let $\bg$ be a subgradient of $f$ at $\bx$.
1. Let $(\by, t) \in \epi f$.
1. Then we have
$$
t \geq f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle.
$$
1. Rearranging the terms, we have
$$
\langle \by, -\bg \rangle + t
\geq \langle \bx, -\bg \rangle + f(\bx)
$$
for every $(\by, t) \in \epi f$.
1. Then the hyperplane
$$
H = \{ (\by, t) \ST \langle \by, -\bg \rangle + t
= \langle \bx, -\bg \rangle + f(\bx) \}
$$
is indeed a supporting hyperplane for $\epi f$.
1. The normal vector for this hyperplane is $(-\bg, 1)$
and it passes through the point $(\bx, f(\bx))$.
```
## Subdifferential
At a point $\bx \in \dom f$, it is possible that there are
more than one subgradients. It is thus natural to introduce
the notion of the set of all subgradients of $f$ at a specific
point $\bx \in \dom f$.
```{index} Subdifferential
```
```{prf:definition} Subdifferential set
:label: def-cvxf-subdifferential
Let $f : \VV \to \RERL$ be a proper function.
The set of all subgradients of $f$ at a point $\bx \in \dom f$
is called the *subdifferential* of $f$ at $\bx$ and is denoted by
$\partial f(\bx)$.
$$
\partial f(\bx) \triangleq
\{ \bg \in \VV^* \ST f (\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle
\Forall \by \in \VV \}.
$$
For all $\bx \notin \dom f$, we define $\partial f(\bx) = \EmptySet$.
```
```{prf:theorem} Subdifferential of norm at origin
:label: res-cvxf-subdiff-norm-origin
Let $f : \VV \to \RR$ by given by:
$$
f(\bx) = \| \bx \|
$$
where $\| \cdot \|$ is the norm endowed on $\VV$. Then,
the subdifferential of $f$ at $\bx = \bzero$ is given by the
dual norm unit ball:
$$
\partial f (\bzero) = B_{\| \cdot \|_*}[\bzero, 1]
= \{ \bg \in \VV^* \ST \| \bg \|_* \leq 1 \}.
$$
```
```{prf:proof}
$\bg \in \partial f (\bzero)$ if and only if
$$
f(\by) \geq f(\bzero) + \langle \by - \bzero, g \rangle \Forall \by \in \VV.
$$
This reduces to:
$$
\| \by \| \geq \langle \by, \bg \rangle \Forall \by \in \VV.
$$
Maximizing both sides of this inequality over the set $\{ \by \ST \| \by \| \leq 1 \}$,
we obtain:
$$
\| \bg \|_* = \underset{\| \by \| \leq 1}{\sup} \{\langle \by, \bg \rangle \}
\leq \underset{\| \by \| \leq 1}{\sup} \| \by \| = 1.
$$
Thus, $\| \bg \|_* \leq 1$ is a necessary condition.
We now show that $\| \bg \|_* \leq 1$ is sufficient too.
By {prf:ref}`Generalized Cauchy Schwartz inequality `:
$$
\langle \by , \bg \rangle \leq \| \by \| \| \bg \|_* \leq \| \by \| \Forall \by \in \VV.
$$
Thus, if $\| \bg \|_* \leq 1$, then $\bg$ is a subgradient.
Thus, the vectors that satisfy the subgradient inequality are exactly the
same as those in $ B_{\| \cdot \|_*}[\bzero, 1]$.
```
The subdifferential of a function $f$ may be empty at specific points
$\bx \in \VV$.
```{index} Subdifferentiability
```
```{prf:definition} Subdifferentiability
:label: def-cvxf-subdifferentiable
A proper function $f : \VV \to \RERL$ is called *subdifferentiable*
at some $\bx \in \dom f$ if $\partial f(\bx) \neq \EmptySet$.
```
```{index} Subdifferentiability; domain
```
```{prf:definition} Domain of subdifferentiability
:label: def-cvxf-domain-subdifferentiable
The set of points at which a proper function $f : \VV \to \RERL$ is
subdifferentiable, denoted by $\dom (\partial f)$, is defined as:
$$
\dom (\partial f) \triangleq \{\bx \in \VV \ST \partial f(\bx) \neq \EmptySet \}.
$$
```
### Closedness and Convexity
```{prf:theorem} Closedness and convexity of the subdifferential set
:label: res-cvxf-subdifferential-closed-convex
Let $f: \VV \to \RERL$ be a proper function. Then the set $\partial f(\bx)$
is closed and convex for any $\bx \in \VV$.
```
```{prf:proof}
Let $\bx \in \VV$ be fixed. For any $\by \in \VV$, define the set
$$
H_{\by} = \{ \bg \in \VV^* \ST \langle \by - \bx, \bg \rangle \leq f(\by) - f(\bx) \}.
$$
Note that $H_{\by}$ is a closed {prf:ref}`half space ` in $\VV^*$.
It is easy to see that
$$
\partial f(\bx) = \bigcap_{\by \in \VV} H_{\by}.
$$
$$
& \bg \in \partial f(\bx) \\
&\iff f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle \Forall \by \in \VV\\
&\iff f(\by) - f(\bx) \geq \langle \by - \bx, \bg \rangle \Forall \by \in \VV\\
&\iff \langle \by - \bx, \bg \rangle \leq f(\by) - f(\bx) \Forall \by \in \VV\\
&\iff \bg \in H_{\by} \Forall \by \in \VV\\
&\iff \bg \in \bigcap_{\by \in \VV} H_{\by}.
$$
Thus, $\partial f(\bx)$ is an infinite intersection of closed and convex sets.
Hence $\partial f(\bx)$ is closed and convex.
```
### Subdifferentiability and Convex Domain
```{prf:theorem} Subdifferentiability + Convex domain $\implies$ Convexity
:label: res-cvxf-subdiff-cvx-dom-cvx
Let $f: \VV \to \RERL$ be a proper function.
Assume that $\dom f$ is convex.
If $f$ is subdifferentiable at every $\bx \in \dom f$, then $f$ is convex.
In other words:
$$
\Forall \bx \in \dom f, \partial f(\bx) \neq \EmptySet \implies f \text{ is convex}.
$$
```
```{prf:proof}
Let $\bx, \by \in \dom f$. Let $t \in [0,1]$.
Let $\bz = (1- t) \bx + t \by$.
1. Since $\dom f$ is convex, hence $\bz \in \dom f$.
1. By hypothesis, $f$ is subdifferentiable at $\bz$.
1. Thus, there exists $\bg \in \partial f(\bz)$.
1. By subgradient inequality {eq}`eq-cvxf-subgradient-inequality-2`
$$
f(\by) \geq f(\bz) + \langle \by - \bz, \bg \rangle = f(\bz) + (1-t) \langle \by - \bx, \bg \rangle\\
f(\bx) \geq f(\bz) + \langle \bx - \bz, \bg \rangle = f(\bz) - t \langle \by - \bx, \bg \rangle.
$$
1. Multiplying the first inequality by $t$, second by $(1-t)$ and adding, we get:
$$
t f(\by) + (1-t) f(\bx) \geq f(\bz).
$$
1. Thus,
$$
f((1-t)\bx + t \by) = f(\bz) \leq t f(\by) + (1-t) f(\bx)
$$
holds true for any $\bx, \by \in \dom f$ and any $t \in [0,1]$.
1. Thus, $f$ is convex.
```
A convex function need not be subdifferentiable at every point in its domain.
The problem usually occurs at the boundary points of the domain if the domain
is not open.
### Positive Scaling
```{prf:theorem} Multiplication by a positive scalar
:label: res-cvxf-subdiff-scaling
Let $f: \VV \to \RERL$ be a proper function.
Let $\bx \in \dom f$.
For any $\alpha > 0$,
$$
\partial (\alpha f)(\bx) = \alpha \partial f(\bx).
$$
```
```{prf:proof}
Let $\bg \in \partial f(\bx)$.
1. By subgradient inequality {eq}`eq-cvxf-subgradient-inequality-2`
$$
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle \Forall \by \in \dom f.
$$
1. Multiplying by $\alpha$, we get:
$$
(\alpha f)(\by) \geq (\alpha f)(\bx) +
\langle \by - \bx, \alpha \bg \rangle \Forall \by \in \dom (\alpha f).
$$
1. Thus, $\alpha \bg \in \partial (\alpha f)(\bx)$.
1. Thus, $\alpha \partial f(\bx) \subseteq \partial (\alpha f)(\bx)$.
1. It is easy to see the same argument backwards to show that
$$
\partial (\alpha f)(\bx) = \alpha \partial f(\bx).
$$
```
## Proper Convex Functions
In this subsection, we discuss the properties of the
subdifferential sets for proper convex functions.
A proper convex function may not be subdifferentiable
at every point in its domain.
However, it is indeed subdifferentiable at
the interior points and relative interior points
of its domain.
### Nonemptiness and Boundedness at Interior Points
```{prf:theorem} Nonemptiness and boundedness of the subdifferential at interior points
:label: res-cvxf-proper-interior-subdiff-nonempty-bounded
Let $f: \VV \to \RERL$ be a proper convex function
with $S = \dom f$.
Let $\ba \in \interior S$.
Then, $\partial f (\ba)$ is nonempty and bounded.
In other words, for a proper convex function, the subdifferential
at the interior points of its domain is nonempty and bounded.
We have
$$
\interior \dom f \subseteq \dom (\partial f).
$$
```
```{prf:proof}
Outline of the proof
1. Identify a supporting hyperplane for the epigraph of $f$ at $(\ba, f(\ba)$.
1. Make use of the local Lipschitz continuity of the convex function at its interior points.
1. Show that the normal to the supporting hyperplane leads to a subgradient at $\ba$.
1. Show that the subgradients are bounded by using the
local Lipschitz continuity inequality and the subgradient inequality.
Consider the {prf:ref}`direct sum `
vector space $\VV \oplus \RR$.
1. $\epi f \subseteq \VV \oplus \RR$.
1. Since $f$ is convex, hence $\epi f$ is convex.
1. For some $\ba \in \interior S$, consider the point
$(\ba, f(\ba)) \in \VV \oplus \RR$.
1. Since $f$ is convex, hence $(\ba, f(\ba)) \in \boundary \epi f$.
1. By {prf:ref}`supporting hyperplane theorem `,
there exists a vector $(\bp, -\alpha) \in \VV^* \oplus \RR$ such that
$$
\langle \bx, \bp \rangle - t \alpha
\leq \langle \ba, \bp \rangle - f(\ba) \alpha
\Forall (\bx, t) \in \epi f.
$$
1. We shall show that $\alpha > 0$ must hold true and $\bg = \frac{\bp}{\alpha}$
is indeed a subgradient at $\ba$.
1. We note that, $(\ba, f(\ba) + 1) \in \epi f$. Putting it in,
$$
& \langle \ba, \bp \rangle - (f(\ba) + 1) \alpha
\leq \langle \ba, \bp \rangle - \alpha f(\ba)\\
&\iff -\alpha \leq 0 \\
&\iff \alpha \geq 0.
$$
Thus, $\alpha \geq 0$.
1. Recall from {prf:ref}`res-cvxf-convex-local-lipschitz-continuous` that
$f$ is locally Lipschitz continuous at $\ba \in \interior \dom f$.
1. Thus, there exists $r > 0$ and $L > 0$ such that
$B[\ba, r] \subseteq S$ and
$$
|f(\bx) - f(\ba)| \leq L \| \bx - \ba \| \Forall \bx \in B[\ba, r].
$$
1. Since $B[\ba, r] \subseteq S$, hence $(\bx, f(\bx)) \in \epi f$
for every $\bx \in B[\ba, r]$.
1. Plugging $t = f(\bx)$ in the supporting hyperplane inequality,
we get
$$
\langle \bx, \bp \rangle - f(\bx) \alpha
\leq \langle \ba, \bp \rangle - f(\ba) \alpha
\Forall \bx \in B[\ba, r].
$$
1. Rearranging the terms,
$$
\langle \bx - \ba, \bp \rangle \leq \alpha (f(\bx) - f(\ba))
\Forall \bx \in B[\ba, r].
$$
1. Using the local Lipschitz property,
$$
\langle \bx - \ba, \bp \rangle \leq \alpha L \| \bx - \ba \|
\Forall \bx \in B[\ba, r].
$$
1. Recall that the {prf:ref}`dual norm `
for $\bp \in \VV^*$ is given by
$$
\| \bp \|_* = \sup \{ |\langle \bx, \bp \rangle | \ST \bx \in \VV, \bx \| \leq 1 \}.
$$
1. Let $\bp^{\dag} \in \VV$ with $\| \bp^{\dag} \| = 1$ be the vector at which
the supremum is attained.
1. Then, $\| \bp \|_* = \langle \bp^{\dag}, \bp \rangle$ (since $\VV$ is real).
1. Since $\bp^{\dag}$ is a unit vector, hence $\ba + r \bp^{\dag} \in B[\ba, r]$.
1. Plugging $\bx = \ba + r \bp^{\dag}$ in the inequality above, we get
$$
r \langle \bp^{\dag}, \bp \rangle \leq \alpha L \| r \bp^{\dag} \|
\Forall \bx \in B[\ba, r].
$$
1. Simplifying
$$
r \| \bp \|_* \leq \alpha L r \Forall \bx \in B[\ba, r].
$$
1. This means that $\alpha > 0$ must be true.
1. If $\alpha = 0$, then this inequality would require $\bp = \bzero$.
1. But $(\bp, -\alpha)$ is a nonzero vector describing the supporting hyperplane.
1. Going back to the supporting hyperplane inequality and putting $t=f(\bx)$,
we have
$$
\langle \bx, \bp \rangle - f(\bx) \alpha
\leq \langle \ba, \bp \rangle - f(\ba) \alpha
\Forall \bx \in S.
$$
1. Rearranging the terms, we get
$$
\alpha (f(\bx) - f(\ba)) \geq \langle \bx - \ba, \bp \rangle \Forall \bx \in S.
$$
1. Letting $\bg = \frac{1}{\alpha} \bp$ and dividing on both sides by
$\alpha$ (which is positive), we obtain
$$
f(\bx) - f(\ba) \geq \langle \bx - \ba, \bg \rangle \Forall \bx \in S.
$$
1. Rearranging again
$$
f(\bx) \geq f(\ba) + \langle \bx - \ba, \bg \rangle \Forall \bx \in S
$$
which is the subgradient inequality.
1. Thus, $\bg \in \partial f(\ba)$.
1. Thus, $\partial f(\ba)$ is nonempty.
We next show the boundedness of $\partial f(\ba)$.
1. Let $\bg \in \partial f(\ba)$.
1. Let $\bg^{\dag} \in \VV$ such that $\| \bg^{\dag} \| = 1$ and
$$
\| \bg \|_* = \langle \bg^{\dag}, \bg \rangle.
$$
1. Let $\bx = \ba + r \bg^{\dag}$.
1. Applying the subgradient inequality on $\bx$, we get:
$$
f(\bx) \geq f(\ba) + \langle r \bg^{\dag}, \bg \rangle
= f(\ba) + r \| \bg \|_*.
$$
1. Thus,
$$
r \| \bg \|_* \leq f(\bx) - f(\ba) \leq L \| \bx - \ba \| = L \| r \bg^{\dag} \| = L r.
$$
1. Thus, $\| \bg \|_* \leq L$ for every $\bg \in \partial f(\ba)$.
1. Thus, $\partial f(\ba)$ is bounded.
```
If $f$ is a proper convex function, then the only points
at which $f$ may not be subdifferentiable (i.e. the
subdifferential set is empty) are the points at the
frontier of $\dom f$ (i.e., $\dom f \setminus \interior \dom f$).
$f$ may be subdifferentiable on the frontier points too.
```{prf:corollary} Subdifferentiability of real valued convex functions
:label: res-cvxf-convex-subdiff-everywhere
Let $f: \VV \to \RR$ be a convex function. Then, $f$ is subdifferentiable
over $\VV$.
$$
\dom (\partial f) = \VV.
$$
```
```{prf:proof}
We have $\dom f = \VV$.
1. Let $\bx \in \VV$.
1. $\VV$ is open in $(\VV, \| \cdot \|)$.
1. Thus, $\bx \in \interior \VV = \dom f$.
1. By {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$\partial f(\bx)$ is nonempty and bounded as $\bx \in \interior \dom f$.
1. Hence, $f$ is subdifferentiable at $\bx$.
1. Since this is valid for every $\bx \in \VV$, hence $\dom (\partial f) = \VV$.
```
### Nonempty, Convex and Compact Subdifferentials
```{prf:theorem} Nonempty, convex and compact subdifferentials for proper convex functions
:label: res-cvxf-subdiff-proper-convex-interior
Let $f : \VV \to \RERL$ be a proper and convex function.
Let $\bx \in \interior S$. Then
$\partial f(\bx)$ is nonempty, convex and compact.
```
```{prf:proof}
Let $\bx \in \interior S$.
1. By {prf:ref}`res-cvxf-subdifferential-closed-convex`,
$\partial f(\bx)$ is closed and convex.
1. By {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$\partial f(\bx)$ is nonempty and bounded.
1. Since $\partial f(\bx)$ is closed and bounded, hence
it must be compact since $\VV$ is a finite dimensional normed linear space.
```
We present an alternative proof based on the min common/max crossing
framework developed in {ref}`sec:opt:duality:common:crossing`.
This proof can be skipped in first reading.
```{prf:proof}
This proof is based on
the second min common/max crossing theorem
{prf:ref}`res-opt-min-max-strong-duality-2`.
We fix some $\bx \in \interior S$.
We construct the set $M$ as
$$
M = \{ (\bu, t) \ST \bu \in \VV, f(\bx + \bu) \leq t \}.
$$
We first consider the min common problem.
1. Note that $(\bzero, f(\bx)) \in M$
since $f(\bx + \bzero) \leq f(\bx)$.
1. Further see that the min common value
$$
p^* = \inf_{(\bzero, p) \in M} p = f(\bx).
$$
1. Hence $p^*$ is finite.
1. Note that $\overline{M} = M$ where
$$
\overline{M} = \{ (\bx, t) \in \VV \oplus \RR \ST \text{ there exists }
\bar{t} \in \RR \text{ with }
\bar{t} \leq t \text{ and } (\bx, \bar{t}) \in M \}.
$$
1. $M$ is convex.
1. Let $(\bu_1, t_1), (\bu_2, t_2) \in M$ and let $r \in (0, 1)$.
1. Let $(\bu, t) = r (\bu_1, t_1) + (1-r) (\bu_2, t_2)$.
1. We have $f(\bx + \bu_1) \leq t_1$
and $f(\bx + \bu_2) \leq t_2$.
1. Now
$$
f(\bx + \bu)
&= f(\bx + r \bu_1 + (1-r) \bu_2) \\
&= f(r (\bx + \bu_1) + (1-r)(\bx + \bu_2)) \\
&\leq r f(\bx + \bu_1) + (1-r) f(\bx + \bu_2) \\
&\leq r t_1 + (1-r) t_2 = t.
$$
1. Hence $(\bu, t) \in M$.
1. Hence $M$ is convex.
Next consider the max crossing problem and strong duality.
1. We have
$$
q^* = \sup_{\ba \in \VV} q(\ba)
$$
where
$$
q(\ba) = \inf_{(\bu, t) \in M} \{ \langle \bu, \ba \rangle + t \}.
$$
1. The set of optimal solutions of the max crossing problem
is given by
$$
Q^* = \{ \ba \in \VV \ST q(\ba) = q^* \}
$$
1. For some $\ba \in Q^*$, we can attain strong duality with
$$
p^* = q^* = q(\ba)
$$
if and only if
$$
f(\bx) = \inf_{(\bu, t) \in M} \{ \langle \bu, \ba \rangle + t \}.
$$
1. Equivalently
$$
f(\bx) \leq f(\bx + \bu) + \langle \bu, \ba \rangle \Forall \bu \in \VV.
$$
1. Equivalently
$$
f(\bx + \bu) \geq f(\bx) + \langle \bu, -\ba \rangle \Forall \bu \in \VV.
$$
1. But this is nothing but the subgradient inequality with $-\ba$
as the subgradient at $\bx$.
1. In other words, strong duality is attained at $\ba$ as a solution
of the max crossing problem if and only if
$-\ba$ is a subgradient of $f$.
1. Hence $Q^*$ with strong duality
is given by $-\partial f(\bx)$.
We next establish the conditions for the
second min common/max crossing theorem
{prf:ref}`res-opt-min-max-strong-duality-2`
1. Consider the set
$$
D = \{ \bu \in \VV \ST \text{ there exists }
t \in \RR \text{ with } (\bu, t) \in M \}.
$$
1. It is easy to see that $D = S - \bx$.
1. Consider the set $T = S - \bx$.
1. Let $\bu \in T$.
1. Then $\bx + \bu \in S$.
1. Hence $f(\bx + \bu) \leq f(\bx + \bu)$.
1. Hence $(\bu, f(\bx + \bu)) \in M$.
1. Hence $\bu \in D$.
1. Hence $T = S - \bx \subseteq D$.
1. Let $\bu \notin T$.
1. Then $\bu + \bx \notin S$.
1. Hence $f(\bu + \bx) = \infty$.
1. Hence for every $t \in \RR$, $f(\bu + \bx) > t$.
1. Hence $(\bu, t) \notin M$ for every $t \in \RR$.
1. Hence $\bu \notin D$.
1. Hence $D = T = S - \bx$.
1. Since $\bx \in \interior S$, hence $\bzero \in \interior D$.
1. We see that all the conditions of the second min common/max crossing
theorem are satisfied.
1. $p^*$ is finite.
1. $\overline{M} = M$ is convex.
1. The set $D$ contains $\bzero$ in its interior.
1. Hence $-\partial f(\bx)$ is nonempty, convex and compact.
1. Hence $\partial f(\bx)$ is also nonempty, convex and compact.
```
### Subgradients over a Compact Set
```{prf:theorem} Subgradients over a compact set are nonempty and bounded
:label: res-cvxf-subg-compact-nonempty-bounded
Let $f: \VV \to \RERL$ be a proper convex function
with $S = \dom f$.
Let $A \subseteq \interior S$ be a nonempty and compact
subset of the interior of the domain of $f$. Then, the
set of subgradients over $A$ given by
$$
Y = \bigcup_{\bx \in A} \partial f(\bx)
$$
is nonempty and bounded.
```
```{prf:proof}
We are given that $A$ is nonempty and compact subset of interior of domain of $f$.
1. For any $\bx \in A$,
by {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$\partial f(\bx)$ is nonempty and bounded.
1. Thus, $Y$ is nonempty.
We next prove that $Y$ must be bounded also.
1. Let $T = \VV \setminus (\interior S)$.
1. $T$ is closed and $A$ is closed. $A \subseteq \interior S$.
Hence, $A \cap T = \EmptySet$.
1. Since $A$ is compact and $T$ is closed and $A \cap T = \EmptySet$,
hence distance between $A$ and $T$ is nonzero
due to {prf:ref}`res-ms-dist-disjoint-compact-closed`.
$$
r = d(A, T) > 0.
$$
1. Thus,
$$
\| \bx - \by \| \geq r \Forall \bx \in A, \by \notin \interior S.
$$
1. Let $s = \frac{r}{2}$.
1. Let $D = B[\bzero, s]$. $D$ is a closed and bounded
set. Hence, it is compact
due to {prf:ref}`res-la-ndim-compact-closed-bounded`.
1. Let $E = A + D$. Then $E \subseteq \interior S$.
1. Let $\by \in E$.
1. Then, there is $\bx \in A$ and $\bv \in D$ such that
$\by = \bx + \bv$.
1. Thus, $\by - \bx = \bv$.
1. Hence $\| \by - \bx \| \leq s < r$.
1. Since both $A$ and $D$ are compact, hence $E$ is compact
due to {prf:ref}`res-la-ndim-sum-compact`.
1. By {prf:ref}`res-cvxf-convex-local-lipschitz-continuous`, $f$
is local Lipschitz continuous at every $\bx \in E$
since $\bx \in E \subseteq \interior S$.
1. Then, by {prf:ref}`res-ms-compact-llc-lipschitz`,
$f$ is Lipschitz continuous on $E$.
1. Thus, there exists $L > 0$ such that
$$
|f(\bx) - f(\by)| \leq L \| \bx - \by \| \Forall \bx, \by \in E.
$$
1. Let $\bg \in Y$. Then, there is $\bx \in A$ such that
$\bg \in \partial f(\bx)$.
1. we can choose $\bg^{\perp} \in \VV$ such that
$$
\| \bg \|_* = \langle \bg^{\perp}, \bg \rangle
\text{ and }
\| \bg^{\perp} \| = 1.
$$
1. Now, let $\by = \bx + s \bg^{\perp}$. Then, $\by \in E$.
1. $ \| \by - \bx \| = \| s \bg^{\perp}\| = s$.
1. Thus, $s \bg^{\perp} \in D$.
1. Thus, $\by \in E$ since $\bx \in A$.
1. Also, $\bx \in E$ since $\bx = \bx + \bzero$ and $\bzero \in D$.
1. Consequently, by Lipschitz continuity
$$
|f(\by) - f(\bx)| \leq L \| \by - \bx \| = L s.
$$
1. By subgradient inequality at $\bx$
$$
f(\by) - f(\bx) \geq \langle \by - \bx, \bg \rangle
= s \langle \bg^{\perp}, \bg \rangle
= s \| \bg \|_*.
$$
1. Using the Lipschitz bound above, we get
$$
s \| \bg \|_* \leq L s.
$$
1. Thus, $\| \bg \|_* \leq L$.
1. Since $\bg$ was chosen arbitrarily, hence $Y$ is bounded.
```
We recall from {prf:ref}`def-cvx-relative-interior`
that the relative interior of a convex set is given by
$$
\relint C = \{\bx \in C \ST \exists r > 0,
B(\bx, r) \cap \affine C \subseteq C \}.
$$
It is interior of the convex set w.r.t. the
subspace topology of its affine hull.
### Nonempty Subdifferential at Relative Interior Points
```{prf:theorem} Nonemptiness of the subdifferential at relative interior points
:label: res-cvxf-relint-subdiff-nonempty
Let $f: \VV \to \RERL$ be a proper convex function
with $S = \dom f$.
Let $\bx \in \relint S$.
Then, $\partial f (\bx)$ is nonempty and has the form
$$
\partial f(\bx) = L^{\perp} + G
$$
where $L$ is the subspace that is parallel to $\affine S$
and $G$ is a nonempty and compact set.
In other words, for a proper convex function, the subdifferential
at the relative interior points of its domain is nonempty.
We have
$$
\relint \dom f \subseteq \dom (\partial f).
$$
```
The proof is based on the min common/max crossing
framework developed in {ref}`sec:opt:duality:common:crossing`.
It can be skipped in first reading.
It follows the proof of {prf:ref}`res-cvxf-subdiff-proper-convex-interior`.
```{prf:proof}
This proof is based on
the second min common/max crossing theorem
{prf:ref}`res-opt-min-max-strong-duality-2`.
We fix some $\bx \in \relint S$.
We construct the set $M$ as
$$
M = \{ (\bu, t) \ST \bu \in \VV, f(\bx + \bu) \leq t \}.
$$
We have already established the following in the proof of
{prf:ref}`res-cvxf-subdiff-proper-convex-interior`:
1. $M$ is convex.
1. $\overline{M} = M$.
1. $p^* = f(\bx)$.
1. $p^*$ is finite.
1. $q^* = \sup_{\ba \in \VV} q(\ba)$
where
$$
q(\ba) = \inf_{(\bu, t) \in M} \{ \langle \bu, \ba \rangle + t \}.
$$
1. $Q^* = \{ \ba \in \VV \ST q(\ba) = q^* \}$.
1. When the strong duality holds, then
$$
Q^* = -\partial f(\bx).
$$
1. The set
$$
D = \{ \bu \in \VV \ST \text{ there exists }
t \in \RR \text{ with } (\bu, t) \in M \}
= S - \bx.
$$
Continuing further
1. Since $\bx \in \relint S$, hence $\bzero \in \relint D$.
1. Hence $\affine D = \affine S - \bx = L$.
1. Hence by the second min common/max crossing theorem
$$
-\partial f(\bx) = Q^* = L^{\perp} + \tilde{Q}
$$
where $\tilde{Q}$ is a nonempty, convex and compact set.
1. Negating on both sides, we obtain
$$
\partial f(\bx) = L^{\perp} + G
$$
where $G$ is a nonempty, convex and compact set.
```
```{prf:corollary} Existence of points with nonempty subdifferential
:label: res-cvxf-subdiff-exist-relint
Let $f: \VV \to \RERL$ be a proper convex function
with $S = \dom f$.
Then, there exists $\bx \in S$ such that $\partial f(\bx)$
is nonempty.
```
```{prf:proof}
The effective domain of a proper convex function
is convex and nonempty.
1. By {prf:ref}`res-cvx-nonempty-relint`, the relative
interior of $S = \dom f$ is nonempty.
1. Thus, there exists $\bx \in S$.
1. By {prf:ref}`res-cvxf-relint-subdiff-nonempty`,
$\partial f(\bx)$ is nonempty.
```
### Unbounded Subdifferential
```{prf:theorem} Unboundedness condition for subdifferential
:label: res-cvxf-subdiff-relint-unbounded
Let $f: \VV \to \RERL$ be a proper convex function
with $S = \dom f$.
Assume that $\dim S < \dim \VV$.
Let $\bx \in S$.
If $\partial f(\bx) \neq \EmptySet$, then
$\partial f(\bx)$ is unbounded.
```
```{prf:proof}
We proceed as follows.
1. Let $n = \dim \VV$.
1. Let $A = \affine S$. $A$ is an affine set.
1. We have $\bx \in S \subseteq A$.
1. Then, $\WW = A - \bx$ is the subspace parallel to $A$.
1. Accordingly, $m = \dim \WW < \dim \VV = n$.
1. Then, the orthogonal complement of $\WW$ is a nontrivial subspace with dimension $n -m$.
1. Let $\bv \in \WW^{\perp}$ be a nonzero vector.
1. Then, $\langle \bw, \bv \rangle = 0$ for every $\bw \in \WW$.
1. Now let $\bg \in \partial f(\bx)$ be an arbitrary subgradient at $\bx$.
1. By subgradient inequality
$$
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle \Forall \by \in S.
$$
1. Note that both $\bx \in S$ and $\by \in S$.
1. Hence, $\by - \bx \in \WW$.
1. Thus, $\langle \by - \bx, \bv \rangle = 0$.
1. But then, for any $\alpha \in \RR$,
$$
\langle \by - \bx, (\bg + \alpha \bv) \rangle
&= \langle \by - \bx, \bg \rangle + \alpha \langle \by - \bx, \bv \rangle\\
&= \langle \by - \bx, \bg \rangle.
$$
1. Thus, if $\bg \in \partial f(\bx)$, then $\bg + \alpha \bv \in \partial f(\bx)$
for every $\alpha \in \RR$.
1. Thus, $\partial f(\bx)$ is unbounded.
```
## Directional Derivatives
The directional derivative of a proper convex function
is closely linked with its subdifferential.
To see this, let $\bx \in \interior \dom f$,
let $\bd \in \VV$ be a nonzero direction
and $t > 0$.
Let $\bg \in \partial f(\bx)$
and consider the subgradient inequality
$$
f(\bx + t \bd) \geq f(\bx) + \langle t \bd, \bg \rangle.
$$
Hence
$$
\frac{f(\bx + t \bd) - f(\bx)}{t} \geq \langle \bd, \bg \rangle
$$
We saw in {prf:ref}`res-cvxf-dir-der-infimum` that
$\frac{f(\bx + t \bd) - f(\bx)}{t}$ is a nondecreasing
quantity and
$$
f'(\bx; \bd) = \inf_{t > 0} \frac{f(\bx + t \bd) - f(\bx)}{t}.
$$
This establishes the basic relation
```{math}
:label: eq-cvxf-subg-dir-der-inequality
f'(\bx; \bd) \geq \langle \bd, \bg \rangle
```
for every $\bg \in \partial f(\bx)$.
In fact a stronger result is available in the form of
max formula.
### Max Formula
The max formula is one of the key results in this section.
It connects subgradients with directional derivatives.
```{prf:theorem} Max formula
:label: res-cvxf-subg-dir-der-max-formula
Let $f: \VV \to \RERL$ be a proper convex function
with $S = \dom f$.
Then for any $\bx \in \interior S$ and $\bd \in \VV$,
$$
f'(\bx;\bd) = \sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f(\bx) \}.
$$
In words, the directional derivative is the supremum of the
inner product of the subgradients with the direction.
```
```{prf:proof}
Let $\bx \in \interior S$ and $\bd \in \VV$.
1. Let $t > 0$. Then, by subgradient inequality
$$
f(\bx + t \bd) - f(\bx) \geq \langle t \bd , \bg \rangle
\Forall \bg \in \partial f(\bx).
$$
1. Thus,
$$
\frac{f(\bx + t \bd) - f(\bx)}{t} \geq \langle \bd , \bg \rangle
\Forall \bg \in \partial f(\bx).
$$
1. Taking the limit
$$
f'(\bx; \bd) = \lim_{t \to 0^+} \frac{f(\bx + t \bd) - f(\bx)}{t}
\geq \lim_{t \to 0^+} \langle \bd , \bg \rangle
= \langle \bd , \bg \rangle
$$
for every $\bg \in \partial f(\bx)$.
1. Taking the supremum over $\partial f(\bx)$ on the R.H.S., we obtain
$$
f'(\bx;\bd) \geq\sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f(\bx) \}.
$$
We now show that the inequality is indeed an equality.
1. Let $h : \VV \to \RR$ be given by
$$
h(\bv) = f'(\bx; \bv).
$$
1. By {prf:ref}`res-cvxf-dir-der-convex-homo`, $h$ is a real valued convex
function and nonnegative homogeneous.
1. By {prf:ref}`res-cvxf-convex-subdiff-everywhere`, $h$ is
subdifferentiable everywhere in $\VV$.
1. In particular, $h$ is subdifferentiable at $\bd$.
1. Let $\bg \in \partial h(\bd)$.
1. For any $\bv \in \VV$ and $t \geq 0$,
$$
t f'(\bx; \bv) = t h(\bv) = h (t \bv)
$$
since $h$ is nonnegative homogeneous.
1. By subdifferential inequality
$$
t f'(\bx; \bv) &= h(t \bv)\\
&\geq h(\bd) + \langle t \bv - \bd , \bg \rangle\\
&=f'(\bx; \bd) + \langle t \bv - \bd , \bg \rangle.
$$
1. Rearranging the terms,
$$
t (f'(\bx; \bv) - \langle \bv, \bg \rangle) \geq
f'(\bx; \bd) - \langle \bd, \bg \rangle.
$$
1. Since this inequality is valid for every $t \geq 0$,
hence the term $f'(\bx; \bv) - \langle \bv, \bg \rangle$
must be nonnegative.
Otherwise, the inequality will be invalided for large enough $t$.
Thus,
$$
f'(\bx; \bv) \geq \langle \bv, \bg \rangle.
$$
1. By {prf:ref}`res-cvxf-dir-der-underestimator`, for any $\by \in S$,
$$
f(\by) \geq f(\bx) + f'(\bx; \by - \bx).
$$
1. From previous inequality,
$$
f'(\bx; \by - \bx) \geq \langle \by - \bx, \bg \rangle.
$$
1. Thus, for any $\by \in S$,
$$
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle.
$$
1. But this is a subgradient inequality.
Hence, $\bg \in \partial f(\bx)$.
1. Taking $t=0$, in the subgradient inequality for $h$,
$$
0 \geq f'(\bx; \bd) - \langle \bd, \bg \rangle.
$$
1. Thus, there exists $\bg \in \partial f(\bx)$ such that
$$
f'(\bx; \bd) \leq \langle \bd, \bg \rangle.
$$
1. Consequently,
$$
f'(\bx;\bd) \leq\sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f(\bx) \}.
$$
Combining the two inequalities, we obtain the max formula:
$$
f'(\bx;\bd) = \sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f(\bx) \}.
$$
```
Recall from {prf:ref}`def-cvxf-support-function` that
support function for a set $C$ is given by
$$
\sigma_C (\bx) = \sup \{\langle \bx, \by \rangle \ST \by \in C \}.
$$
```{prf:corollary} Max formula as a support function
:label: res-cvxf-dir-der-subg-support
The max formula can be written as
$$
f'(\bx;\bd) = \sigma_{\partial f(\bx)} (\bd).
$$
```
## Differentiability
### Subdifferential and gradient
```{prf:theorem} Subdifferential at points of differentiability
:label: res-cvxf-subdiff-grad
Let $f : \VV \to \RERL$ be a proper convex function with $S = \dom f$.
Let $\bx \in \interior S$.
Then $f$ is differentiable at $\bx$ if and only if
$$
\partial f(\bx) = \{ \nabla f (\bx) \}.
$$
In other words, if $f$ is differentiable at $\bx$ then
its subdifferential is a singleton set consisting of the
gradient and if the subdifferential at $\bx$ is a singleton, then
$f$ is differentiable at $\bx$.
```
```{prf:proof}
Assume that $f$ is differentiable at $\bx$.
1. Let $\bd \in \VV$ be some direction.
1. By {prf:ref}`res-cvxf-grad-dir-der`,
$$
f'(\bx; \bd) = \langle \bd, \nabla f(\bx) \rangle.
$$
1. By {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$f$ is subdifferentiable at $\bx$ since $f$ is convex
and $\bx \in \interior S$.
1. Let $\bg \in \partial f(\bx)$.
1. By the max formula ({prf:ref}`res-cvxf-subg-dir-der-max-formula`):
$$
f'(\bx;\bd) \geq \langle \bd, \bg \rangle
$$
as the directional derivative is the supremum of the
inner product of the subgradients with the direction.
1. Thus,
$$
\langle \bd, \nabla f(\bx) \rangle \geq \langle \bd, \bg \rangle.
$$
1. In turn,
$$
\langle \bd, \bg - \nabla f(\bx) \rangle \leq 0.
$$
This holds for every $\bd \in \VV$.
1. By the definition of {prf:ref}`dual norm `
$$
\| \bg - \nabla f(\bx) \|_* = \underset{\| \bd \| \leq 1}{\sup}
\{ \langle \bd, \bg - \nabla f(\bx) \rangle \}.
$$
1. Using the previous inequality
$$
\| \bg - \nabla f(\bx) \|_* \leq 0.
$$
1. Since, dual norm is a norm, hence it cannot be negative. Thus,
$$
\| \bg - \nabla f(\bx) \|_* = 0
$$
1. Moreover, due to positive definiteness of a norm
$$
\bg - \nabla f(\bx) = \bzero
$$
must hold true.
1. Thus, $\bg = \nabla f(\bx)$.
1. In other words, if $\bg$ is a subgradient to $f$ at $\bx$,
then it must equal $\nabla f(\bx)$.
1. Thus, the only subgradient for $f$ at $\bx$ is $\nabla f(\bx)$.
1. Thus,
$$
\partial f(\bx) = \{ \nabla f(\bx) \}.
$$
For the converse, assume that $f$ is subdifferentiable at $\bx$
with $\partial f(\bx) = \{ \bg \}$.
1. By the subgradient inequality
$$
f(\bx + \bu) \geq f(\bx) + \langle \bu, \bg \rangle \Forall \bu \in \VV.
$$
1. Thus,
$$
f(\bx + \bu) - f(\bx) - \langle \bu, \bg \rangle \geq 0 \Forall \bu \in \VV.
$$
1. Define a function $h : \VV \to \RERL$ as
$$
h(\bu) \triangleq f(\bx + \bu) - f(\bx) - \langle \bu, \bg \rangle.
$$
1. We list some properties of $h$.
1. By definition $h(\bu) \geq 0$ for every $\bu \in \VV$.
1. $h$ is a convex function since $f(\bx + \bu)$ is convex,
$\langle \bu, \bg \rangle$ is linear and
$f(\bx)$ is a constant (w.r.t. the variable $\bu$).
1. $\dom h = \dom f - \bx = S - \bx$.
1. Thus, since $\bx \in \interior S$,
hence $\bzero = \bx - \bx \in \interior \dom h$.
1. $h(\bzero) = f(\bx) - f(\bx) - \langle \bzero, \bg \rangle = 0$.
1. If we are able to show that
$$
\lim_{\bu \to \bzero}\frac{h(\bu)}{\| \bu \|} = 0
$$
then, by the definition of gradient
({prf:ref}`def-cvxf-differentiability-proper`),
$$
\bg = \nabla f(\bx).
$$
1. We can easily show that $\partial h(\bzero) = \{ \bzero \}$.
1. If $\tilde{\bg}$ is a subgradient of $h$ at $\bzero$,
then by subgradient inequality
$$
h(\bu) \geq h(\bzero) + \langle \bu, \tilde{\bg} \rangle
= \langle \bu, \tilde{\bg} \rangle
\Forall \bu \in \VV.
$$
1. Then, $\tilde{\bg} = \bzero$ satisfies this inequality
since $h(\bu) \geq 0$ by definition.
1. For contradiction, assume a nonzero $\tilde{\bg}$ can satisfy this inequality.
1. Then,
$$
& h(\bu) \geq \langle \bu, \tilde{\bg} \rangle \\
& \iff f(\bx + \bu) - f(\bx) - \langle \bu, \bg \rangle \geq \langle \bu, \tilde{\bg} \rangle \\
& \iff f(\bx + \bu) \geq f(\bx) + \langle \bu, \tilde{\bg} + \bg \rangle \\
& \iff \tilde{\bg} + \bg \in \partial f(\bx).
$$
1. This contradicts the hypothesis that the subgradient of $f$ at $\bx$ is $\{ \bg \}$.
1. Thus, $\partial h(\bzero) = \{ \bzero \}$.
1. Then, max formula ({prf:ref}`res-cvxf-subg-dir-der-max-formula`):
$$
h'(\bzero; \bd) = \sigma_{\partial h (\bzero)} (\bd)
= \langle \bd, \bzero \rangle = 0.
$$
1. Thus, from the definition of
{prf:ref}`directional derivatives `
$$
0 = h'(\bzero; \bd)
= \lim_{\alpha \to 0^+} \frac{h(\alpha \bd) - h(\bzero)}{\alpha}
= \lim_{\alpha \to 0^+} \frac{h(\alpha \bd)}{\alpha}.
$$
1. Let us now introduce an orthonormal basis for $\VV$ as
$\{\be_1, \dots, \be_n \}$.
1. Assume that $\VV$ has been equipped with various
$\ell_p$ norms as described in {prf:ref}`rem-cvx-nd-space-norms`.
1. Since $\bzero \in \interior \dom h$, there exists
$r \in (0, 1)$ such that
$$
B_1[\bzero, r] \subseteq \dom h.
$$
1. It is a cross polytope of radius $r$ with $2n$ vertices given by
$\{\pm r \be_i \}_{i=1}^n$.
$$
B_1[\bzero, r] = \convex \{\pm r \be_i \}_{i=1}^n.
$$
1. Let us denote these $2n$ vectors as
$\bw_1, \dots, \bw_{2n}$.
1. By {prf:ref}`rem-cvx-nd-space-norms`
$$
B[\bzero, s] = B_2[\bzero, s] \subseteq B_1[\bzero, r]
$$
where $s = \frac{r}{\sqrt{n}}$.
1. Let $\bu \in B[\bzero, s^2]$ be a nonzero vector.
1. Since $r < 1$, hence $s < 1$, hence $s^2 < s$.
1. Let $\bv = s \frac{\bu}{\| \bu \|}$.
1. Then, $\bv \in B[\bzero, s] \subseteq B_1[\bzero, r]$.
1. Thus, $\bv \in \convex \{\bw_i \}_{i=1}^n$.
1. Thus, there exists $\bt \in \Delta_{2 n}$ such that
$$
s \frac{\bu}{\| \bu \|} = \bv = \sum_{i=1}^{2n} t_i \bw_i.
$$
1. Then,
$$
\frac{h(\bu)}{\| \bu \|}
&= \frac{h\left ( \frac{\| \bu \|}{s} s \frac{\bu}{\| \bu \|} \right ) }{\| \bu \|} & \\
&= \frac{h \left ( \sum_{i=1}^{2n} t_i \frac{\| \bu \|}{s} \bw_i \right ) }{\| \bu \|} & \text{convex combination} \\
&\leq \sum_{i=1}^{2n} t_i \frac{h \left ( \| \bu \| \frac{\bw_i}{s} \right ) }{\| \bu \|} & h \text{ is convex and } \bt \in \Delta_{2 n}\\
&\leq \underset{i=1,\dots, 2n}{\max} \left \{
\frac{h \left ( \| \bu \| \frac{\bw_i}{s} \right ) }{\| \bu \|}
\right \} & \text{since } \sum t_i = 1.
$$
Note that $\| \bu \| \frac{\bw_i}{s} \in B[\bzero, s] \subseteq B_1[\bzero, r] \subseteq \dom h$.
1. Now,
$$
\lim_{\bu \to \bzero} \frac{h \left ( \| \bu \| \frac{\bw_i}{s} \right ) }{\| \bu \|}
= \lim_{ \| \bu \| \to 0} \frac{h \left ( \| \bu \| \frac{\bw_i}{s} \right ) }{\| \bu \|}
= \lim_{ \alpha \to 0^+} \frac{h \left ( \alpha \frac{\bw_i}{s} \right ) }{\alpha}
= 0.
$$
1. Thus,
$$
\lim_{\bu \to \bzero} \frac{h(\bu)}{\| \bu \|} = 0.
$$
1. Thus, $\bg = \nabla f(\bx)$ as desired.
```
## Subdifferential Calculus
### Function Sums
```{prf:theorem} Subdifferential subadditivity with sum of functions
:label: res-cvxf-subdiff-function-sum
Let $f_1, f_2 : \VV \to \RERL$ be proper functions
with $S_1 = \dom f_1$ and $S_2 = \dom f_2$.
For any $\bx \in S_1 \cap S_2$
$$
\partial f_1(\bx) + \partial f_2(\bx) \subseteq \partial (f_1 + f_2)(\bx).
$$
```
```{prf:proof}
Let $f = f_1 + f_2$.
We note that $\dom f = \dom (f_1 + f_2) = \dom f_1 \cap \dom f_2 = S_1 \cap S_2$.
1. Let $\bx \in S_1 \cap S_2$.
1. Let $\bg \in \partial f_1(\bx) + \partial f_2(\bx)$.
1. Then, there exist $\bg_1 \in \partial f_1(\bx)$ and $\bg_2 \in \partial f_2(\bx)$
such that $\bg = \bg_1 + \bg_2$.
1. Then, by subgradient inequality, for any $\by \in S_1 \cap S_2$
$$
&f_1(\by) \geq f_1(\bx) + \langle \by - \bx , \bg_1 \rangle, \\
&f_2(\by) \geq f_2(\bx) + \langle \by - \bx , \bg_2 \rangle.
$$
1. Summing the two inequalities, we get
$$
f_1(\by) + f_2(\by) \geq f_1(\bx) + f_2(\bx) + \langle \by - \bx , \bg_1 + \bg_2 \rangle.
$$
1. Rewriting, for every $\by \in \dom f$
$$
(f_1 + f_2)(\by) \geq (f_1 + f_2)(\bx) + \langle \by - \bx , \bg \rangle.
$$
1. Thus, $\bg = \bg_1 + \bg_2 \in \partial (f_1 + f_2)(\bx)$ = \partial f (\bx).
1. Thus, $\partial f_1(\bx) + \partial f_2(\bx) \subseteq \partial f(\bx)$.
```
We can generalize this result for a finite sum of functions
using simple mathematical induction.
```{prf:corollary} Weak sum rule of subdifferential calculus
:label: res-cvxf-subdiff-weak-sum-rule
Let $f_1, \dots, f_m : \VV \to \RERL$ be proper functions.
For any $\bx \in \cap_{i=1}^m \dom f_i$
$$
\sum_{i=1}^m \partial f_i(\bx) \subseteq
\partial \left ( \sum_{i=1}^m f_i \right )(\bx).
$$
```
```{prf:theorem} Subdifferential additivity with sum of convex functions
:label: res-cvxf-subdiff-function-sum-convex
Let $f_1, f_2 : \VV \to \RERL$ be proper convex functions
with $S_1 = \dom f_1$ and $S_2 = \dom f_2$.
For any $\bx \in \interior S_1 \cap \interior S_2$
$$
\partial (f_1 + f_2) (\bx) = \partial f_1(\bx) + \partial f_2(\bx).
$$
```
```{prf:proof}
With $f = f_1 + f_2$,
by {prf:ref}`res-ms-int-intersect-int`,
$$
\interior \dom f = \interior (S_1 \cap S_2) = \interior S_1 \cap \interior S_2.
$$
1. Let $\bx \in \interior S_1 \cap \interior S_2$.
1. Thus, $\bx \in \interior \dom f$.
1. By {prf:ref}`max formula `,
for any $\bd \in \VV$,
$$
f'(\bx;\bd) = \sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f(\bx) \}
= \sigma_{\partial f(\bx)}(\bd).
$$
1. Since directional derivative is additive, hence
$$
f'(\bx;\bd) = f'_1(\bx;\bd) + f'_2(\bx;\bd).
$$
1. Expanding on this
$$
\sigma_{\partial f(\bx)} (\bd) &= f'(\bx;\bd)\\
&= f'_1(\bx;\bd) + f'_2(\bx;\bd)\\
&= \sup \{ \langle \bd, \bg_1 \rangle \ST \bg_1 \in \partial f_1(\bx) \}
+ \sup \{ \langle \bd, \bg_1 \rangle \ST \bg_2 \in \partial f_2(\bx) \}\\
&= \sup \{ \langle \bd, \bg_1 + \bg_2 \rangle \ST \bg_1 \in \partial f_1(\bx)
, \bg_2 \in \partial f_2(\bx) \}\\
&= \sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f_1(\bx)
+ \partial f_2(\bx) \}\\
&= \sigma_{\partial f_1(\bx) + \partial f_2(\bx)} (\bd).
$$
1. In summary, for every $\bd \in \VV$,
$$
\sigma_{\partial f(\bx)} (\bd) =
\sigma_{\partial f_1(\bx) + \partial f_2(\bx)} (\bd).
$$
1. By {prf:ref}`res-cvxf-subdifferential-closed-convex`,
$\partial f(\bx)$, $\partial f_1(\bx)$ and $\partial f_2(\bx)$
are closed and convex.
1. By {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$\partial f(\bx)$, $\partial f_1(\bx)$ and $\partial f_2(\bx)$
are nonempty and bounded.
1. Since $\partial f_1(\bx)$ and $\partial f_2(\bx)$ are
closed and bounded, hence they are compact ($\VV$ is finite dimensional).
1. Thus, $\partial f_1(\bx) + \partial f_2(\bx)$ is also
closed, bounded, convex and nonempty.
1. Thus, both $\partial f(\bx)$ and $\partial f_1(\bx) + \partial f_2(\bx)$
are nonempty, convex and closed.
1. Then, due to {prf:ref}`res-cvxf-support-func-equality-convex`,
$$
\partial f(\bx) = \partial f_1(\bx) + \partial f_2(\bx).
$$
```
We can generalize this result for a finite sum of proper convex functions
using simple mathematical induction.
```{prf:corollary} Sum rule of subdifferential calculus for proper convex functions at interior points
:label: res-cvxf-subdiff-sum-rule-proper-convex
Let $f_1, \dots, f_m : \VV \to \RERL$ be proper convex functions.
For any $\bx \in \cap_{i=1}^m \interior \dom f_i$
$$
\sum_{i=1}^m \partial f_i(\bx) =
\partial \left ( \sum_{i=1}^m f_i \right )(\bx).
$$
```
For real valued convex functions, the domain is the entire $\VV$
and interior of $\VV$ is $\VV$ itself.
```{prf:corollary} Sum rule of subdifferential calculus for real valued convex functions
:label: res-cvxf-subdiff-sum-rule-rv-convex
Let $f_1, \dots, f_m : \VV \to \RR$ be real valued convex functions.
For any $\bx \in \VV$
$$
\sum_{i=1}^m \partial f_i(\bx) =
\partial \left ( \sum_{i=1}^m f_i \right )(\bx).
$$
```
A more powerful result with less restrictive assumptions than
{prf:ref}`res-cvxf-subdiff-sum-rule-proper-convex` is possible
if the intersection of the relative interiors of the domains
of the individual functions is nonempty.
```{prf:theorem} Sum rule of subdifferential calculus for proper convex functions
:label: res-cvxf-subdiff-sum-rule-relint
Let $f_1, \dots, f_m : \VV \to \RERL$ be proper convex functions.
Assume that $\bigcap_{i=1}^m \relint \dom f_i \neq \EmptySet$.
Then for any $\bx \in \VV$
$$
\sum_{i=1}^m \partial f_i(\bx) =
\partial \left ( \sum_{i=1}^m f_i \right )(\bx).
$$
```
### Linear Transformations
```{div}
Our interest here is in compositions of the
form $h = f \circ \bAAA$ where $\bAAA$ is a linear transformation.
In other words $h (\bx) = f (\bAAA(\bx))$.
If $\bAAA : \VV \to \WW$ is a linear transformation
then $\bAAA^T : \WW^* \to \VV^*$ is a mapping from
$\WW^*$ to $\VV^*$ and satisfies the relationship:
$$
\langle \bAAA (\bx), \by \rangle = \langle \bx, \bAAA^T (\by) \rangle.
$$
From the definition of directional derivative, we have
$$
h'(\bx; \bd)
&= \lim_{t \downarrow 0} \frac{h(\bx + t \bd) - h(\bx)}{t} \\
&= \lim_{t \downarrow 0} \frac{f(\bAAA(\bx + t \bd)) - f(\bAAA(\bx))}{t}\\
&= \lim_{t \downarrow 0} \frac{f(\bAAA(\bx) + t \bAAA(\bd)) - f(\bAAA(\bx))}{t}\\
&= f'(\bAAA(\bx); \bAAA(\bd)).
$$
```
```{prf:theorem} Weak linear transformation rule of subdifferential calculus
:label: res-cvxf-subdiff-weak-rule-linear
Let $f: \WW \to \RERL$ be a proper function.
Let $\bAAA : \VV \to \WW$ be a linear transformation.
Define $h : \VV \to \RERL$ as
$$
h (\bx) = f (\bAAA(\bx)).
$$
Assume that $h$ is proper, i.e. $\dom h$ is not empty:
$$
\dom h = \{ \bx \in \VV \ST \bAAA(\bx) \in \dom f\} \neq \EmptySet.
$$
Then, for any $\bx \in \dom h$
$$
\bAAA^T (\partial f (\bAAA(\bx))) \subseteq \partial h(\bx).
$$
```
```{prf:proof}
We proceed as follows.
1. Let $\bx \in \dom h$.
1. Let $\bg \in \partial f(\bAAA(\bx))$.
1. By {prf:ref}`res-cvxf-subg-dir-der-max-formula`,
$$
\langle \bz, \bg \rangle \leq f'(\bAAA(\bx); \bz)
\Forall \bz \in \WW.
$$
1. Choosing $\bz = \bAAA(\bd)$, we have
$$
\langle \bAAA(\bd), \bg \rangle \leq f'(\bAAA(\bx); \bAAA(\bd))
\Forall \bd \in \VV.
$$
1. Equivalently
$$
\langle \bd, \bAAA^T(\bg) \rangle \leq h'(\bx; \bd)
\Forall \bd \in \VV.
$$
1. Hence $\bAAA^T(\bg) \in \partial h(\bx)$
due to {eq}`eq-cvxf-subg-dir-der-inequality`.
1. Hence $\bAAA^T (\partial f(\bAAA(\bx)) ) \subseteq \partial h(\bx)$.
```
```{prf:theorem} Strong linear transformation rule for subdifferential calculus
:label: res-cvxf-subdiff-rule-linear
Let $f: \WW \to \RERL$ be a proper convex function.
Let $\bAAA : \VV \to \WW$ be a linear transformation.
Define $h : \VV \to \RERL$ as
$$
h (\bx) = f (\bAAA(\bx)).
$$
Assume that $h$ is proper, i.e. $\dom h$ is not empty:
$$
\dom h = \{ \bx \in \VV \ST \bAAA(\bx) \in \dom f\} \neq \EmptySet.
$$
Then, for any $\bx \in \interior \dom h$ such that
$\bAAA(\bx) \in \interior \dom f$, we have:
$$
\bAAA^T (\partial f (\bAAA(\bx))) = \partial h(\bx).
$$
```
```{prf:proof}
We showed $\bAAA^T (\partial f (\bAAA(\bx))) \subseteq \partial h(\bx)$
in {prf:ref}`res-cvxf-subdiff-weak-rule-linear`.
We show the reverse inclusion by contradiction.
1. Let $\bx \in \interior \dom h$ such that
$\bAAA(\bx) \in \interior \dom f$.
1. Assume that there exists $\bd \in \partial h(\bx)$
such that $\bd \notin \bAAA^T (\partial f(\bAAA(\bx)) )$.
1. By {prf:ref}`res-cvxf-subdiff-proper-convex-interior`,
the set $\partial f(\bAAA(\bx))$ is nonempty, convex and compact.
1. Hence $\bAAA^T (\partial f(\bAAA(\bx)))$ is also
nonempty, convex and compact.
1. By strict separation theorem
({prf:ref}`res-cvxf-cl-convex-set-strict-separation`),
there exists a vector $\bp$ and a scalar $c$ such that
$$
\langle \bAAA^T(\bg), \bp \rangle
< c < \langle \bd, \bp \rangle \Forall \bg \in \partial f(\bAAA(\bx)).
$$
1. Equivalently
$$
\langle \bg, \bAAA(\bp) \rangle
< c < \langle \bd, \bp \rangle \Forall \bg \in \partial f(\bAAA(\bx)).
$$
1. Taking the supremum over $ \partial f(\bAAA(\bx))$ on the
L.H.S., we obtain
$$
\sup_{\bg \in \partial f(\bAAA(\bx))}
\langle \bg, \bAAA(\bp) \rangle
< \langle \bd, \bp \rangle.
$$
1. By the max formula
$$
f'(\bAAA(\bx); \bAAA(\bp)) < \langle \bd, \bp \rangle.
$$
1. But this means that
$$
h'(\bx; \bp) < \langle \bd, \bp \rangle.
$$
1. This contradicts the assumption that $\bd \in \partial h(\bx)$.
1. Hence we must have
$$
\bAAA^T (\partial f(\bAAA(\bx)) ) = \partial h(\bx).
$$
```
### Affine Transformations
```{prf:theorem} Weak affine transformation rule of subdifferential calculus
:label: res-cvxf-subdiff-weak-rule-affine
Let $f: \WW \to \RERL$ be a proper function.
Let $\bAAA : \VV \to \WW$ be a linear transformation.
Let $\bb \in \WW$.
Define $h : \VV \to \RERL$ as
$$
h (\bx) = f (\bAAA(\bx) + \bb).
$$
Assume that $h$ is proper, i.e. $\dom h$ is not empty:
$$
\dom h = \{ \bx \in \VV \ST \bAAA(\bx) + \bb \in \dom f\} \neq \EmptySet.
$$
Then, for any $\bx \in \dom h$
$$
\bAAA^T (\partial f (\bAAA(\bx) + \bb)) \subseteq \partial h(\bx).
$$
```
```{prf:proof}
We proceed as follows.
1. Let $\bx \in \dom h$.
1. Then, $\bx' = \bAAA(\bx) + \bb \in \dom f$ such that $h(\bx) = f(\bx')$.
1. Let $\bg \in \bAAA^T (\partial f (\bx'))$.
1. Then, there is $\bd \in \WW^*$ such that $\bg = \bAAA^T (\bd)$ with
$\bd \in \partial f(\bx')$.
1. Let $\by \in \dom h$.
1. Then, $\by' = \bAAA(\by) + \bb \in \dom f$ such that $h(\by)= f(\by')$.
1. Applying subgradient inequality for $f$ at $\bx'$
with the subgradient being $\bd$, we get
$$
f(\by') \geq
f(\bx') + \langle \by' - \bx', \bd \rangle.
$$
1. We have $h(\by) = f(\by')$, $h(\bx) = f(\bx')$ and
$\by' - \bx' = \bAAA(\by - \bx)$.
1. Thus, the subgradient inequality simplifies to
$$
h(\by) \geq h(\bx) + \langle \bAAA(\by - \bx), \bd \rangle.
$$
1. We note that
$$
\langle \bAAA(\by - \bx), \bd \rangle
= \langle \by - \bx, \bAAA^T(\bd) \rangle.
$$
1. Thus, for any $\by \in \dom h$, we have
$$
h(\by) \geq h(\bx) + \langle \by - \bx, \bAAA^T(\bd) \rangle.
$$
1. Thus, $\bg = \bAAA^T(\bd) \in \partial h (\bx)$.
Since this is valid for any $\bx \in \dom h$ and for every
$\bg \in \bAAA^T (\partial f (\bAAA(\bx) + \bb))$, hence
$$
\bAAA^T (\partial f (\bAAA(\bx) + \bb)) \subseteq h(\bx).
$$
```
```{prf:theorem} Affine transformation rule of subdifferential calculus
:label: res-cvxf-subdiff-strong-rule-affine
Let $f: \WW \to \RERL$ be a proper convex function.
Let $\bAAA : \VV \to \WW$ be a linear transformation.
Let $\bb \in \WW$.
Define $h : \VV \to \RERL$ as
$$
h (\bx) = f (\bAAA(\bx) + \bb).
$$
Assume that $h$ is proper, i.e. $\dom h$ is not empty:
$$
\dom h = \{ \bx \in \VV \ST \bAAA(\bx) + \bb \in \dom f\} \neq \EmptySet.
$$
Then, for any $\bx \in \interior \dom h$ such that
$\bAAA(\bx) + \bb \in \interior \dom f$, we have:
$$
\bAAA^T (\partial f (\bAAA(\bx) + \bb)) = \partial h(\bx).
$$
```
```{prf:proof}
We note that $h$ is a proper convex function since it is a composition
of an affine transformation with a proper convex function.
1. Let $\bx \in \interior \dom h$ such that $\bx' = \bAAA(\bx) + \bb \in \interior \dom f$.
1. Then, for any direction $\bd \in \VV$, by the
{prf:ref}`max formula `,
$$
h'(\bx; \bd) = \sigma_{\partial h(\bx)} (\bd).
$$
1. By the definition of the directional derivative, we have
$$
h'(\bx; \bd) &= \lim_{\alpha \to 0^+} \frac{h(\bx + \alpha \bd ) - h(\bx)}{\alpha}\\
&= \lim_{\alpha \to 0^+} \frac{f(\bAAA(\bx) + \bb + \alpha \bAAA(\bd) )
- f(\bAAA(\bx) + \bb}{\alpha}\\
&= f'(\bAAA(\bx) + \bb; \bAAA(\bd)).
$$
1. Thus,
$$
\sigma_{\partial h(\bx)} (\bd) = f'(\bAAA(\bx) + \bb; \bAAA(\bd)).
$$
1. Using the max formula on the R.H.S., we get
$$
\sigma_{\partial h(\bx)} (\bd) &= f'(\bAAA(\bx) + \bb; \bAAA(\bd))\\
&= \sup_{\bg \in \partial f(\bAAA(\bx) + \bb)} \langle \bAAA(\bd), \bg \rangle \\
&= \sup_{\bg \in \partial f(\bAAA(\bx) + \bb)} \langle \bd, \bAAA^T (\bg) \rangle \\
&= \sup_{\bg' \in \bAAA^T(\partial f(\bAAA(\bx) + \bb))} \langle \bd, \bg' \rangle \\
&= \sigma_{\bAAA^T(\partial f(\bAAA(\bx) + \bb))}(\bd).
$$
1. Since $\bx \in \interior \dom h$, hence
by {prf:ref}`res-cvxf-subdifferential-closed-convex` and
{prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`
$\partial h(\bx)$ is nonempty, closed and convex.
1. Since $\bAAA(\bx) + \bb \in \interior \dom f$, hence
by {prf:ref}`res-cvxf-subdifferential-closed-convex` and
{prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`
$\partial f(\bAAA(\bx) + \bb)$ is nonempty, closed and convex.
1. It follows that
$\bAAA^T(\partial f(\bAAA(\bx) + \bb))$ is nonempty, closed and convex
since $\bAAA^T$ is a linear operator and both $\VV$ and $\WW$ are
finite dimensional.
1. Then, due to {prf:ref}`res-cvxf-support-func-equality-convex`,
$$
\bAAA^T (\partial f (\bAAA(\bx) + \bb)) = \partial h(\bx).
$$
```
### Composition
Chain rule is a key principle in computing derivatives of composition of functions.
A chain rule is available for subgradient calculus also.
We first recall a result on the derivative of composition of real functions.
```{prf:theorem} Chain rule for real functions
:label: res-cvxf-subdiff-chain-rule-rf
Let $f : \RR \to \RR$ be a real function which is
continuous on $[a,b]$ with $a < b$.
Assume that $f'_+(a)$ exists.
Let $g : \RR \to \RR$ be another real function defined on an open interval
$I$ such that $\range f \subseteq I$.
Assume $g$ is differentiable at $f(a)$.
Then the composite real function $h : \RR \to \RR$
given by
$$
h(t) \triangleq g (f (t)) \quad (a \leq t \leq b)
$$
is right differentiable at $t=a$. In particular,
$$
h'_+(a) = g'(f(a)) f'_+(a).
$$
```
```{prf:proof}
We show this by working with the definition of right hand derivative as a limit
$$
h'_+(a) &= \lim_{t \to a^+} \frac{h(t) - h(a)}{t - a} \\
&= \lim_{t \to a^+} \frac{g(f(t)) - g(f(a))}{t - a} \\
&= \lim_{t \to a^+} \frac{g(f(t)) - g(f(a))}{f(t) - f(a)} \frac{f(t) - f(a)}{t - a} \\
&= \lim_{z \to f(a)} \frac{g(z) - g(f(a))}{z - f(a)} \lim_{t \to a^+} \frac{f(t) - f(a)}{t - a} \\
&= g'(f(a)) f'_+(a).
$$
```
We can now develop a chain rule for subdifferentials
of multidimensional functions with the help of max formula.
```{prf:theorem} Subdifferential chain rule
:label: res-cvxf-subdiff-chain-rule
Let $f : \VV \to \RR$ be convex and let $g : \RR \to \RR$ be a
nondecreasing convex function.
Let $\bx \in \VV$.
Assume that $g$ is differentiable at $f(\bx)$.
Let $h = g \circ f$. Then
$$
\partial h (\bx) = g'(f(\bx)) \partial f(\bx).
$$
```
```{prf:proof}
We are given $\bx \in \VV$ at which $g$ is differentiable and $f$ is convex.
1. Since $f$ is convex and $g$ is nondecreasing convex function, hence $h$ is also convex.
1. We now introduce two real functions parametrized on $\bx$ and
an arbitrary direction $\bd \in \VV$
$$
& f_{\bx, \bd} (t) = f(\bx + t \bd), t \in \RR \\
& h_{\bx, \bd} (t) = h(\bx + t \bd), t \in \RR
$$
1. It is now easy to see that
$$
h_{\bx, \bd} (t) = h(\bx + t \bd) = g ( f (\bx + t\bd)) = g (f_{\bx, \bd} (t)).
$$
1. Thus, $h_{\bx, \bd} = g \circ f_{\bx, \bd}$.
1. Since $f_{\bx, \bd}$ and $ h_{\bx, \bd}$ are restrictions of $f$ and $h$
along a line, they are also convex.
1. Due to {prf:ref}`res-cvxf-dir-der-exist-convex`, the directional derivatives of $f$
and $h$ exist in every direction.
1. By the definition of directional derivative {prf:ref}`def-cvxf-directional-derivative`,
$$
& (f_{\bx, \bd})'_+ (0) = f'(\bx; \bd), \\
& (h_{\bx, \bd})'_+ (0) = h'(\bx; \bd).
$$
1. Also note that $f_{\bx, \bd} (0) = f(\bx)$ and $h_{\bx, \bd} (0) = h(\bx)$.
1. $f_{\bx, \bd}$ is right differentiable at $t=0$,
and $g$ is differentiable at $f(\bx)$.
1. Hence, by the chain rule in {prf:ref}`res-cvxf-subdiff-chain-rule-rf`,
$$
h'(\bx; \bd) = g'(f(\bx)) f'(\bx; \bd).
$$
1. By the max formula in {prf:ref}`res-cvxf-dir-der-subg-support`,
$$
& h'(\bx; \bd) = \sigma_{\partial h (\bx)} (\bd) \\
& f'(\bx; \bd) = \sigma_{\partial f (\bx)} (\bd).
$$
1. Thus,
$$
\sigma_{\partial h (\bx)} (\bd) &= h'(\bx; \bd) \\
&= g'(f(\bx)) f'(\bx; \bd) \\
&= g'(f(\bx)) \sigma_{\partial f (\bx)} (\bd) \\
&= \sigma_{g'(f(\bx)) \partial f (\bx)} (\bd).
$$
The last step is due to {prf:ref}`res-cvxf-support-func-Homogeneity`.
Since $g$ is nondecreasing, hence $g'(f(\bx)) \geq 0$.
1. By {prf:ref}`res-cvxf-subdifferential-closed-convex` and
{prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
the sets $\partial f(\bx)$ and $\partial h(\bx)$ are nonempty,
closed and convex.
1. Then, the set $g'(f(\bx)) \partial f (\bx)$ is also nonempty,
closed and convex.
1. Thus, by {prf:ref}`res-cvxf-support-func-equality-convex`,
$$
\partial h (\bx) = g'(f(\bx)) \partial f (\bx).
$$
````
Applications of this rule are presented later
in {prf:ref}`ex-cvxf-subdiff-l1-norm-squared`.
### Max Rule
```{prf:theorem} Max rule of subdifferential calculus
:label: res-cvxf-subdiff-calculus-max-rule
Let $f_1, f_2, \dots, f_m : \VV \to \RERL$ be a set of
proper convex functions. Let $f : \VV \to \RERL$ be given by:
$$
f(\bx) = \max \{ f_1(\bx), f_2(\bx), \dots, f_m(\bx)\}.
$$
Let $\bx \in \bigcap_{i=1}^m \interior \dom f_i$ be a point common to the
interiors of domains of all the functions.
The subdifferential set of $f$ at $\bx$ can be obtained from
the subdifferentials of $f_i$ as follows:
$$
\partial f(\bx) = \convex \left ( \bigcup_{i\in I(\bx)} \partial f_i (\bx) \right )
$$
where $I(\bx) = \{ i \ST f_i(\bx) = f(\bx)\}$.
```
```{prf:proof}
Since $f_i$ are proper convex, hence their pointwise maximum $f$ is proper convex.
1. Let $I(\bx) = \{ i \in 1,\dots,m \ST f_i(\bx) = f(\bx)\}$.
1. For any (nonzero) direction, $\bd \in \VV$,
by {prf:ref}`res-cvxf-dir-der-max-convex-funcs`:
$$
f'(\bx; \bd) = \underset{i \in I(\bx)}{\max} f'_i(\bx;\bd).
$$
1. Without loss of generality, let us assume that $I(\bx) = 1,\dots,k$
for some $k \in 1,\dots,m$. This can be achieved by reordering $f_i$.
1. By max formula ({prf:ref}`res-cvxf-subg-dir-der-max-formula`),
$$
f_i'(\bx;\bd) = \sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f_i(\bx) \}.
$$
1. Thus,
$$
f'(\bx; \bd) = \underset{i \in 1,\dots,k}{\max}
\underset{\bg_i \in \partial f_i(\bx)}{\sup} \langle \bd, \bg_i \rangle.
$$
1. Recall that for any $a_1, \dots, a_k \in \RR$, the identity below holds:
$$
\max \{ a_1, \dots, a_k \} = \underset{\bt \in \Delta_k }{\sup} \sum_{i=1}^k t_i a_i.
$$
1. Thus, we can expand $f'(\bx; \bd)$ as:
$$
f'(\bx; \bd)
&= \underset{i \in 1,\dots,k}{\max}
\underset{\bg_i \in \partial f_i(\bx)}{\sup} \langle \bd, \bg_i \rangle\\
&= \underset{\bt \in \Delta_k }{\sup} \left \{ \sum_{i=1}^k t_i
\underset{\bg_i \in \partial f_i(\bx)}{\sup} \langle \bd, \bg_i \rangle
\right \} \\
&= \underset{\bt \in \Delta_k }{\sup} \left \{
\sum_{i=1}^k
\sup
\left \langle \bd, t_i \bg_i \right \rangle
\ST \bg_i \in \partial f_i(\bx) \right \} \\
&= \sup \left \{
\left \langle \bd, \sum_{i=1}^k t_i \bg_i \right \rangle
\ST \bg_i \in \partial f_i(\bx), \bt \in \Delta_k \right \} \\
&= \sup \left \{
\left \langle \bd, \bg \right \rangle
\ST \bg \in \convex \left (\bigcup_{i=1}^k \partial f_i(\bx) \right) \right \} \\
&= \sigma_A (\bd)
$$
where $A = \convex \left (\bigcup_{i=1}^k \partial f_i(\bx) \right )$
and $\sigma$ denotes the support function.
1. Since $\bx \in \interior \dom f$, hence, by the
max formula ({prf:ref}`res-cvxf-dir-der-subg-support`)
$$
f'(\bx;\bd) = \sigma_{\partial f(\bx)} (\bd).
$$
1. Thus, we have
$$
\sigma_{\partial f(\bx)} (\bd) = \sigma_A {\bd}.
$$
1. By {prf:ref}`res-cvxf-subdifferential-closed-convex`, $\partial f(\bx)$
is closed and convex.
1. By {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$\partial f(\bx)$ is nonempty and bounded.
1. Thus, $\partial f(\bx)$ is nonempty, closed and convex.
1. Similarly, $\partial f_i(\bx)$ are nonempty, closed, convex and
bounded.
1. Thus, $\bigcup_{i=1}^k \partial f_i(\bx)$
is a finite union of nonempty, closed, convex and bounded sets.
1. Thus, $\bigcup_{i=1}^k \partial f_i(\bx)$
is also nonempty and compact.
1. A finite union of nonempty sets is nonempty.
1. A finite union of bounded sets is bounded.
1. A finite union of closed sets is closed.
1. Thus, $\bigcup_{i=1}^k \partial f_i(\bx)$ is closed and bounded.
1. Since $\VV$ is finite dimensional, hence closed and bounded
sets are compact.
1. Since $A$ is a convex hull of $\bigcup_{i=1}^k \partial f_i(\bx)$,
hence $A$ is nonempty, closed and convex.
1. Recall from {prf:ref}`res-cvxf-convex-hull-compact` that
convex hull of a compact set is compact.
1. Also, recall that compact sets are closed and bounded.
1. Since $\sigma_{\partial f(\bx)} (\bd) = \sigma_A (\bd)$
is true for any $\bd \in \VV$,
the support functions for the underlying nonempty, closed and convex
set are equal.
Hence by {prf:ref}`res-cvxf-support-func-equality-convex`,
$$
\partial f(\bx) = A = \convex \left ( \bigcup_{i=1}^k \partial f_i(\bx) \right ).
$$
```
Some applications of this rule are presented later in
{prf:ref}`ex-cvxf-subdiff-linf-norm`,
{prf:ref}`ex-cvxf-subdiff-ax-b-inf-norm`,
{prf:ref}`ex-cvxf-subdiff-max-func`,
{prf:ref}`ex-cvxf-subdiff-piecewise-linear-function`.
We now present a weaker version of the max rule which is applicable
for pointwise supremum over an arbitrary set of functions.
```{prf:theorem} Weak max rule of subdifferential calculus
:label: res-cvxf-subdiff-calculus-weak-max-rule
Let $I$ be an arbitrary index set and suppose that
for every $i \in I$, there exists a proper convex
function $f_i : \VV \to \RERL$.
Let $f : \VV \to \RERL$ be given by:
$$
f(\bx) = \sup_{i \in I} \{ f_i(\bx)\}.
$$
Then for any $\bx \in \dom f$,
$$
\convex \left ( \bigcup_{i\in I(\bx)} \partial f_i (\bx) \right )
\subseteq \partial f(\bx)
$$
where $I(\bx) = \{ i \in I \ST f_i(\bx) = f(\bx)\}$.
In words, if $f_i(\bx) = f(\bx)$, then a subgradient of $f_i$
at $\bx$ is also a subgradient of $f$ at $\bx$.
Also, for all $i \in I$ such that $f_i(\bx) = f(\bx)$,
any convex combination of their subgradients at $\bx$ is also a subgradient
of $f$ at $\bx$.
```
```{prf:proof}
Pick some $\bx \in \dom f$.
1. Let $\bz \in \dom f$ be arbitrary.
1. Let $I(\bx) = \{ i \in I \ST f_i(\bx) = f(\bx)\}$.
1. Let $i \in I(\bx)$ be arbitrary.
1. Let $\bg \in \partial f_i(\bx)$ be a subgradient of $f_i$ at $\bx$.
1. Then, by definition of $f$ and subgradient inequality:
$$
f(\bz) \geq f_i(\bz) \geq f_i(\bx) + \langle \bz - \bx, \bg \rangle
= f(\bx) + \langle \bz - \bx, \bg \rangle.
$$
We used the fact that $f_i(\bx) = f(\bx)$ for $i \in I(\bx)$.
1. Thus, $\bg \in \partial f(\bx)$. $\bg$ is a subgradient of $f$
at $\bx$.
1. Since this is valid for every subgradient of $f_i$ at $\bx$,
hence $\partial f_i(\bx) \subseteq \partial f(\bx)$.
1. Since this is valid for every $i \in I(\bx)$, hence
$$
\bigcup_{i\in I(\bx)} \partial f_i (\bx) \subseteq \partial f(\bx).
$$
1. Recall from {prf:ref}`res-cvxf-subdifferential-closed-convex`
that $\partial f(\bx)$ is convex.
1. Thus, it contains the convex hull of any of its subsets.
Hence,
$$
\convex \left ( \bigcup_{i\in I(\bx)} \partial f_i (\bx) \right )
\subseteq \partial f(\bx).
$$
```
Next is an example application of the weak max rule.
```{prf:example} Subgradient of $\lambda_{\max}(\bA_0 + \sum_{i=1}^m x_i \bA_i$
:label: ex-cvxf-subdiff-max-eigen-val-m-sym-mat-sum
Let $\bA_0, \bA_1, \dots, \bA_m \in \SS^n$ be $m+1$ given symmetric
matrices. Define an affine transformation $\bAAA : \RR^m \to \SS^n$
as
$$
\bAAA(\bx) \triangleq \bA_0 + \sum_{i=1}^m x_i \bA_i.
$$
For every vector $\bx \in \RR^m$, this mapping
defines a symmetric matrix $\bAAA(\bx)$.
We can compute the largest eigen value of $\bAAA(\bx)$.
We introduce a function $f: \RR^m \to \RR$ as
$$
f(\bx) \triangleq \lambda_{\max} (\bAAA(\bx))
= \lambda_{\max} (\bA_0 + \sum_{i=1}^m x_i \bA_i).
$$
Our task is to find a subgradient of $f$ at $\bx$.
1. Recall from the definition of largest eigen values,
$$
f(\bx) = \underset{\by \in \RR^n; \| \by \|_2 = 1 }{\sup} \by^T \bAAA(\bx) \by.
$$
1. For every $\by \in \RR^n$ such that $\| \by \|_2 = 1$,
we can define a function:
$$
f_{\by}(\bx) \triangleq \by^T \bAAA(\bx) \by.
$$
1. Then,
$$
f(\bx) = \underset{\by \in \RR^n; \| \by \|_2 = 1 }{\sup} f_{\by}(\bx).
$$
1. The function $f_{\by}(\bx)$ is affine (in $\bx$) for every $\by$.
1. Thus, $f_{\by}$ is convex for every $\by$.
1. Thus, $f$ is a pointwise supremum of a family of
functions $f_{\by}$.
1. Thus, $f$ is also convex (see {prf:ref}`res-cvx-ptws-supremum`).
1. Consequently, we can use the weak max rule
{prf:ref}`res-cvxf-subdiff-calculus-weak-max-rule`
to identify a subgradient of $f$ at $\bx$.
1. Let $\tilde{\by}$ be a normalized eigenvector of $\bAAA(\bx)$
corresponding to its largest eigenvalue. Then
$$
f(\bx) = \tilde{\by}^T \bAAA(\bx) \tilde{\by}.
$$
1. This means that $f(\bx) = f_{\tilde{\by}}(\bx)$.
1. By the weak max rule, a subgradient of $f_{\tilde{\by}}$ at $\bx$
is also a subgradient of $f$ at $\bx$.
1. Expanding $f_{\tilde{\by}}(\bx)$:
$$
f_{\tilde{\by}}(\bx) = \tilde{\by}^T \bAAA(\bx) \tilde{\by}
= \tilde{\by}^T \bA_0\tilde{\by}
+ \sum_{i=1}^m \tilde{\by}^T \bA_i \tilde{\by} x_i.
$$
1. Then, the gradient of $f_{\tilde{\by}}$ at $\bx$
(computed by taking partial derivatives w.r.t. $x_i$) is
$$
\nabla f_{\tilde{\by}}(\bx) =
(\tilde{\by}^T \bA_1 \tilde{\by}, \dots, \tilde{\by}^T \bA_m \tilde{\by}).
$$
1. Since $f_{\by}$ is affine (thus convex), hence its gradient
is also a subgradient.
1. Thus,
$$
(\tilde{\by}^T \bA_1 \tilde{\by}, \dots, \tilde{\by}^T \bA_m \tilde{\by})
\in \partial f(\bx).
$$
```
## Lipschitz Continuity
```{prf:theorem} Lipschitz continuity and boundedness of the subdifferential sets
:label: res-cvxf-subdiff-bounded-lipschitz-continuous
Let $f : \VV \to \RERL$ be a proper convex function.
Suppose that $X \subseteq \interior \dom f$.
Consider the following two claims:
1. $| f(\bx) - f(\by) | \leq L \| \bx - \by \|$ for any $\bx, \by \in X$.
1. $ \| \bg \|_* \leq L$ for any $\bg \in \partial f(\bx)$ where $\bx \in X$.
Then,
* (2) implies (1). In other words, if subgradients are bounded
then, the function is Lipschitz continuous.
* If $X$ is open, then (1) holds if and only if (2) holds.
In other words, if the subgradients over a set $X$
are bounded then $f$ is Lipschitz continuous over $X$.
If $X$ is open then $f$ is Lipschitz continuous over $X$
if and only if the subgradients over $X$ are bounded.
```
```{prf:proof}
(a) We first show that $(2) \implies (1)$.
1. Assume that (2) is satisfied.
1. Pick any $\bx, \by \in X$.
1. Since $f$ is proper and convex and $\bx, \by \in \interior \dom f$,
hence due to {prf:ref}`res-cvxf-proper-interior-subdiff-nonempty-bounded`,
$\partial f(\bx)$ and $\partial f(\by)$ are nonempty.
1. Let $\bg_x \in \partial f(\bx)$
and $\bg_y \in \partial f(\by)$.
1. By subgradient inequality
$$
& f(\by) \geq f(\bx) + \langle \by - \bx, \bg_x \rangle; \\
& f(\bx) \geq f(\by) + \langle \bx - \by, \bg_y \rangle.
$$
1. We can rewrite this as
$$
& f(\bx) - f(\by) \leq \langle \bx - \by , \bg_x \rangle; \\
& f(\by) - f(\bx) \leq \langle \by - \bx , \bg_y \rangle.
$$
1. By generalized Cauchy Schwartz inequality ({prf:ref}`res-la-ip-gen-cs-ineq`),
$$
\langle \bx - \by , \bg_x \rangle \leq \| \bx - \by \| \| \bg_x \|_*
\leq L \| \bx - \by \|; \\
\langle \by - \bx , \bg_y \rangle \leq \| \by - \bx \| \| \bg_y \|_*
\leq L \| \bx - \by \|.
$$
1. Combining the two inequalities, we get
$$
| f(\bx) - f(\by) | \leq L \| \bx - \by \|.
$$
1. Thus, $(2) \implies (1)$.
(b) If $X$ is open, then we need to show that $(1) \iff (2)$.
1. We have already shown that $(2) \implies (1)$.
1. Assume that $X$ is open and $(1)$ holds.
1. Let $\bx \in X$.
1. Since $\bx$ is an interior point of $\dom f$, hence
the subdifferential is nonempty.
1. Pick any $\bg \in \partial f(\bx)$.
1. Let $\bg^{\dag} \in \VV$ be a vector with $\| \bg^{\dag} \|=1$
and $\langle \bg^{\dag}, \bg \rangle = \| \bg \|_*$.
Such a vector exists by definition of the dual norm.
1. Since $X$ is open, we can choose $\epsilon > 0$ small enough
such that $\bx + \epsilon \bg^{\dag} \in X$.
1. By the subgradient inequality, we have:
$$
f(\bx + \epsilon \bg^{\dag}) \geq f(\bx) + \langle \epsilon \bg^{\dag}, \bg \rangle.
$$
1. Thus,
$$
\epsilon \| \bg \|_*
&= \langle \epsilon \bg^{\dag}, \bg \rangle \\
&\leq f(\bx + \epsilon \bg^{\dag}) - f(\bx) \\
&\leq L \| (\bx + \epsilon \bg^{\dag} - \bx \| & \text{ by hypothesis in (1)} \\
&= L \epsilon \| \bg^{\dag} \| = L \epsilon.
$$
1. Canceling $\epsilon$, we get:
$$
\| \bg \|_* \leq L
$$
holds true for every $\bg \in \partial f(\bx)$ where $\bx \in X$ as desired.
```
```{prf:corollary} Lipschitz continuity of convex functions over compact domains
:label: res-cvxf-convex-func-compact-dom-lipschitz-cont
Let $f: \VV \to \RERL$ be a proper and convex function. Suppose that
$X \subseteq \interior \dom f$ is compact.
Then, there exists $L > 0$ such that
$$
| f(\bx) -f(\by) | \leq L \| \bx - \by \| \quad \Forall \bx, \by \in X.
$$
```
```{prf:proof}
Recall from {prf:ref}`res-cvxf-subg-compact-nonempty-bounded`
that the subgradients of a proper convex function over
a compact set are nonempty and bounded.
1. In other words, the set
$$
Y = \bigcup_{\bx \in X} \partial f(\bx)
$$
is nonempty and bounded.
1. Thus, for every $\bg \in Y$, there exists $L > 0$
such that $\| \bg \|_* \leq L$.
1. Then by {prf:ref}`res-cvxf-subdiff-bounded-lipschitz-continuous`,
$$
| f(\bx) - f(\bx)| \leq L \| \bx - \by \| \Forall \bx, \by \in X.
$$
1. Thus, $f$ is indeed Lipschitz continuous over $X$.
```
## $\epsilon$-Subgradients
```{index} Approximate subgradient
```
````{prf:definition} $\epsilon$-Subgradient
:label: def-cvxf-e-subgradient
Let $f : \VV \to \RERL$ be a proper function.
Let $\bx \in \dom f$.
A vector $\bg \in \VV^*$ is called an $\epsilon$-*subgradient*
of $f$ at $\bx$
if
```{math}
:label: eq-cvxf-e-subgradient-inequality
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle - \epsilon \Forall \by \in \VV.
```
````
### Geometric Interpretation
```{prf:observation} $\epsilon$-subgradient and supporting hyperplane
:label: res-cvxf-e-subgradient-supporting-hyperplane
Let $f: \VV \to \RERL$ be a proper function.
Then $\bg$ be an $\epsilon$-subgradient of $f$ at $\bx$
if and only if $\epi f$ is contained in the positive
halfspace of the
hyperplane with a normal $(-\bg, 1)$
passing through $(\bx, f(\bx) - \epsilon)$.
```
```{prf:proof}
Let $H$ denote the hyperplane
$$
H = \{
(\by, t) \ST \langle \by, -\bg \rangle + t
= \langle \bx, -\bg \rangle + f(\bx) - \epsilon \}.
$$
The positive halfspace of $H$ is given by
$$
H_+ = \{
(\by, t) \ST \langle \by, -\bg \rangle + t
\geq \langle \bx, -\bg \rangle + f(\bx) - \epsilon \}.
$$
Assume that $\bg$ is an $\epsilon$-subgradient of $f$ at $\bx$.
1. For any $(\by, t) \in \epi f$, we have
$$
t \geq f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle - \epsilon.
$$
1. This is equivalent to
$$
\langle \by, -\bg \rangle + t \geq
\langle \bx, -\bg \rangle + f(\bx) - \epsilon
$$
for all $(\by, t) \in \epi f$.
1. Hence $\epi f \subseteq H_+$.
Now assume that $\epi f \subseteq H_+$.
1. Let $(\by, f(\by)) \in \epi f$.
1. Then we have
$$
\langle \by, -\bg \rangle + f(\by) \geq \langle \bx, -\bg \rangle + f(\bx) - \epsilon.
$$
1. Rearranging the terms, we have
$$
f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle - \epsilon \Forall \by \in \VV.
$$
1. But this means that $\bg$ is an $\epsilon$-subgradient of $f$ at $\bx$.
```
### $\epsilon$-Subdifferential
```{index} Approximate Subdifferential
```
```{prf:definition} $\epsilon$-subdifferential
:label: def-cvxf-e-subdifferential
Let $f : \VV \to \RERL$ be a proper function.
The set of all $\epsilon$-subgradients of $f$ at a point $\bx \in \dom f$
is called the $\epsilon$-*subdifferential* of $f$ at $\bx$ and is denoted by
$\partial_{\epsilon} f(\bx)$.
$$
\partial_{\epsilon} f(\bx) \triangleq
\{ \bg \in \VV^* \ST f (\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle
- \epsilon
\Forall \by \in \VV \}.
$$
For all $\bx \notin \dom f$, we define $\partial_{\epsilon} f(\bx) = \EmptySet$.
```
It is easy to see that
$$
\partial f(\bx) \subseteq \partial_{\epsilon} f(\bx).
$$
Also, if $\epsilon_2 \geq \epsilon_1 > 0$, then
$$
\partial_{\epsilon_1} f(\bx) \subseteq \partial_{\epsilon_2} f(\bx).
$$
## Optimality Conditions
A well known result for differentiable functions is that
at the point of optimality $\nabla f(\bx) = \bzero$
(see {prf:ref}`res-opt-first-order-optimality-local`).
Subdifferentials are useful in characterizing the
minima of a function. The idea of vanishing gradients
can be generalized for subgradients also.
```{prf:theorem} Fermat's optimality condition
:label: res-cvxf-subdiff-fermat-optimality
Let $f : \VV \to \RERL$ be a proper convex function.
Then
$$
\ba \in \argmin \{ f(\bx) \ST \bx \in \VV \}
$$
if and only if $\bzero \in \partial f(\ba)$.
In other words, $\ba$ is a minimizer of $f$ if and only if
$\bzero$ is a subgradient of $f$ at $\ba$.
```
```{prf:proof}
Assume that $\bzero \in \partial f(\ba)$ where $\ba \in \dom f$.
1. By subgradient inequality
$$
f(\bx) \geq f(\ba) + \langle \bx - \ba, \bzero \rangle \Forall \bx \in \VV.
$$
1. This simplifies to
$$
f(\bx) \geq f(\ba) \Forall \bx \in \VV.
$$
1. Thus, $\ba \in \argmin \{ f(\bx) \ST \bx \in \VV \}$.
For the converse, assume that $\ba \in \argmin \{ f(\bx) \ST \bx \in \VV \}$.
1. Then,
$$
f(\bx) \geq f(\ba) \Forall \bx \in \VV.
$$
1. But then
$$
& f(\bx) \geq f(\ba) \\
& \iff f(\bx) \geq f(\ba) + 0 \\
& \iff f(\bx) \geq f(\ba) + \langle \bx - \ba, \bzero \rangle
$$
holds true for every $\bx \in \VV$.
1. This implies that $\bzero \in \partial f(\ba)$.
```
## Mean Value Theorem
The following result is from {cite}`hiriart2013convex`.
```{prf:theorem} A subgradients based mean value theorem for 1D functions
:label: res-cvxf-convex-subdiff-mvt
Let $f : \RR \to \RERL$ be a proper closed convex function.
Let $[a,b] \subseteq \dom f$ with $a < b$. Then,
$$
f(b) - f(a) = \int_a^b h(t) d t
$$
where $h : (a, b) \to \RR$ satisfies $h(t) \in \partial f(t)$
for every $t \in (a, b)$.
```
In the reminder of this section, we compute the subgradients
and subdifferential sets for a variety of standard functions.
## Norm Functions
```{div}
We recall from {prf:ref}`res-cvxf-subdiff-norm-origin` that
the subdifferential of a norm $\| \cdot \|: \VV \to \RR$ at $x = \bzero$
is given by:
$$
\partial f(\bzero)
= B_{\| \cdot \|_*} [\bzero, 1]
= \{ g \in \VV^* \ST \|g\|_* \leq 1 \}.
$$
```
### $\ell_1$-Norm
```{prf:example} Subdifferential of $\ell_1$ norm at origin
:label: ex-cvxf-subdiff-l1-norm-origin
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_1$.
We recall that the dual norm of $\ell_1$ is $\ell_{\infty}$.
The unit ball of $\ell_{\infty}$-norm at origin is given by
$$
B_{\| \cdot \|_{\infty}} [\bzero, 1] = [-1, 1]^n.
$$
Following {prf:ref}`res-cvxf-subdiff-norm-origin`,
the subdifferential of $f$ at $\bx = \bzero$ is given by:
$$
\partial f(\bzero) = B_{\| \cdot \|_{\infty}} [\bzero, 1] = [-1, 1]^n.
$$
```
```{prf:example} Subdifferential of absolute value function at origin
:label: ex-cvxf-subdiff-abs-func-origin
Let $g : \RR \to \RR$ be the absolute value function given by
$$
g(x) = | x |.
$$
This is a special case of $\ell_1$ norm for $\RR^1$.
Thus, following {prf:ref}`ex-cvxf-subdiff-l1-norm-origin`,
the subdifferential of $g$ at $x = 0$ is given by:
$$
\partial g (0) = [-1, 1].
$$
For a complete specification of the subdifferential of $g$,
see {prf:ref}`ex-cvxf-subdiff-abs-func-l2` below.
```
```{prf:example} Subdifferential of $\ell_1$ norm
:label: ex-cvxf-subdiff-l1-norm
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_1$.
We can write $f$ as a sum of $n$ functions
$$
f(\bx) = \| \bx \|_1 = \sum_{i=1}^n |x_i| = \sum_{i=1}^n f_i(\bx)
$$
where
$$
f_i (\bx) = | x_i |.
$$
Let $g (x) = | x |$. Then
$$
f_i (\bx) = | x_i | = | \be_i^T \bx | = g(\be_i^T \bx).
$$
Due to affine transformation rule ({prf:ref}`res-cvxf-subdiff-strong-rule-affine`),
$$
\partial f_i (\bx) = (\partial g(\be_i^T \bx)) \be_i
= (\partial g(x_i)) \be_i.
$$
The subdifferential of the absolute value function $g$
is described in {prf:ref}`ex-cvxf-subdiff-abs-func-l2` below.
Thus, the subdifferential set of $f_i$ is given by:
$$
\partial f_i (\bx) = \begin{cases}
\{ \sgn (x_i) \be_i \} & \text{for} & x_i \neq 0 \\
[-\be_i, \be_i] & \text{for} & x_i = 0
\end{cases}.
$$
Using the sum rule {prf:ref}`res-cvxf-subdiff-sum-rule-rv-convex`, we have:
$$
\sum_{i=1}^n \partial f_i(\bx) =
\partial \left ( \sum_{i=1}^n f_i \right )(\bx).
$$
We define the index set:
$$
I_0(\bx) = \{ i \ST x_i = 0\}.
$$
Expanding the sum of subdifferentials,
$$
\partial f (\bx) = \sum_{i \in I_0(\bx)} [-\be_i, \be_i]
+ \sum_{i \notin I_0(\bx)} \sgn (x_i) \be_i.
$$
We can rewrite this as:
$$
\partial f (\bx) = \{ \bz \in \RR^n \ST z_i = \sgn (x_i)
\text{ whenever } x_i \neq 0, |z_i | \leq 1, \text{ otherwise } \}.
$$
We also have a weak result from this:
$$
\sgn (\bx) \in \partial f (\bx) = \partial \| \bx \|_1.
$$
```
```{prf:example} Subdifferential of $\ell_1$ norm squared
:label: ex-cvxf-subdiff-l1-norm-squared
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_1$.
Now let $g(t) = [t]_+^2$. And consider the function $h = g \circ f$ given by
$$
h (\bx) = \| \bx \|_1^2.
$$
By subdifferential chain rule ({prf:ref}`res-cvxf-subdiff-chain-rule`):
$$
\partial h (\bx) &= 2 \| \bx \|_1 \partial f (\bx) \\
&= 2 \| \bx \|_1 \{ \bz \in \RR^n \ST z_i = \sgn (x_i)
\text{ whenever } x_i \neq 0, |z_i | \leq 1, \text{ otherwise } \}.
$$
We have used the subdifferential of $f$ from {prf:ref}`ex-cvxf-subdiff-l1-norm`.
```
### $\ell_2$-Norm
```{prf:example} Subdifferential of $\ell_2$ norm at origin
:label: ex-cvxf-subdiff-l2-norm-origin
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_2$.
We recall that the dual norm of $\ell_1$ is also $\ell_2$
as this norm is self dual.
Following {prf:ref}`res-cvxf-subdiff-norm-origin`,
the subdifferential of $f$ at $\bx = \bzero$ is given by:
$$
\partial f(\bzero) = B_{\| \cdot \|_2} [\bzero, 1]
= \{ \bg \in \RR^n \ST \| \bg \|_2 \leq 1 \}.
$$
```
```{prf:example} Subdifferential of $\ell_2$ norm
:label: ex-cvxf-subdiff-l2-norm
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_2$.
At $\bx \neq \bzero$, $f$ is differentiable with the gradient
(see {prf:ref}`ex-mvc-gradient-l2-norm`):
$$
\nabla f (\bx) = \frac{\bx}{ \| \bx \|_2}.
$$
Since $f$ is convex and differentiable at $\bx \neq \bzero$,
hence due to {prf:ref}`res-cvxf-subdiff-grad`,
$$
\partial f(\bx) = \{ \nabla f (\bx) \} =
\left \{ \frac{\bx}{ \| \bx \|_2} \right \}.
$$
Combining this with the subdifferential of $f$
at origin from {prf:ref}`ex-cvxf-subdiff-l2-norm-origin`,
we obtain:
$$
\partial f (\bx) = \begin{cases}
\left \{ \frac{\bx}{ \| \bx \|_2} \right \} & \text{for} & \bx \neq \bzero \\
B_{\| \cdot \|_2} [\bzero, 1] & \text{for} & \bx = \bzero .
\end{cases}
$$
```
```{prf:example} Subdifferential of absolute value function
:label: ex-cvxf-subdiff-abs-func-l2
Let $g : \RR \to \RR$ be the absolute value function given by
$$
g(x) = | x |.
$$
This is a special case of $\ell_2$ norm for $\RR^1$.
Following {prf:ref}`ex-cvxf-subdiff-l2-norm`,
$$
\partial g(x) = \begin{cases}
\left \{ \sgn (x) \right \} & \text{for} & x \neq 0 \\
[-1,1] & \text{for} & x = 0 .
\end{cases}
$$
c.f. {prf:ref}`ex-cvxf-subdiff-abs-func-origin`.
```
### $\ell_{\infty}$-Norm
```{prf:example} Subdifferential of $\ell_{\infty}$ norm at origin
:label: ex-cvxf-subdiff-linf-norm-origin
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_{\infty}$.
We recall that the dual norm of $\ell_{\infty}$ is $\ell_{1}$.
The unit ball of $\ell_{1}$-norm at origin is given by
$$
B_{\| \cdot \|_1} [\bzero, 1] = \{ \bx \in \RR^n \ST \| \bx \|_1 \leq 1\}.
$$
Following {prf:ref}`res-cvxf-subdiff-norm-origin`,
the subdifferential of $f$ at $\bx = \bzero$ is given by:
$$
\partial f(\bzero) = B_{\| \cdot \|_1 [\bzero, 1]}
= \{ \bx \in \RR^n \ST \| \bx \|_1 \leq 1\}.
$$
```
```{prf:example} Subdifferential of $\ell_{\infty}$ norm
:label: ex-cvxf-subdiff-linf-norm
Let $f : \RR^n \to \RR$ be given by $f(\bx) = \| \bx \|_{\infty}$.
Let us compute the subdifferential of $f$ at $\bx \neq \bzero$.
We have:
$$
f(\bx) = \max \{f_1(\bx), f_2(\bx), \dots, f_n(\bx)\}
$$
where $f_i(\bx) = |x_i|$.
We define:
$$
I(\bx) = \{i \in [1,\dots,n] \ST |x_i | = f(\bx) = \| \bx \|_{\infty} \}.
$$
Then, following {prf:ref}`ex-cvxf-subdiff-l1-norm`
$$
\partial f_i(\bx) = \{\sgn (x_i) \be_i \} \Forall i \in I(\bx).
$$
This is valid since $\bx \neq \bzero$ implies that $f(\bx) \neq 0$
which in turn implies that $x_i \neq 0$ for every $i \in I(\bx)$.
Then, using the max rule for proper convex functions
({prf:ref}`res-cvxf-subdiff-calculus-max-rule`):
$$
\partial f(\bx) =
\convex \left (\bigcup_{i \in I(\bx)} \{\sgn (x_i) \be_i \} \right ).
$$
We can rewrite this as:
$$
\partial f(\bx) = \left \{\sum_{i \in I(\bx)} \lambda_i \sgn(x_i) \be_i \ST
\sum_{i \in I(\bx)} \lambda_i = 1, \lambda_j \geq 0, j \in I(\bx) \right \}.
$$
Combining this with the subdifferential of $f$
at origin from {prf:ref}`ex-cvxf-subdiff-linf-norm-origin`,
we obtain:
$$
\partial f (\bx) = \begin{cases}
\left \{\sum_{i \in I(\bx)} \lambda_i \sgn(x_i) \be_i \ST
\sum_{i \in I(\bx)} \lambda_i = 1, \lambda_j \geq 0, j \in I(\bx) \right \},
& \bx \neq \bzero \\
B_{\| \cdot \|_1} [\bzero, 1], & \bx = \bzero .
\end{cases}
$$
```
### $\ell_1$ Norm over Affine Transformation
```{div}
Let $A \in \RR^{m \times n}$. Let $b \in \RR^m$.
Let $f : \RR^m \to \RR$ be given by $f(y) = \| y \|_1$.
Let $h : \RR^n \to \RR$ be the function
$h(x) = \| A x + b \|_1 = f(A x + b)$.
By affine transformation rule, we have:
$$
\partial h (x) = A^T \partial f (A x + b) \Forall x \in \RR^n.
$$
Denoting $i$-th row of $A$ as $a_i^T$, we define the index set:
$$
I_0(x) = \{i : a_i^T x_i + b_i = 0 \}.
$$
we have:
$$
\partial h (x) = \sum_{i \in I_0(x)} [-a_i, a_i] +
\sum_{i \notin I_0(x)} \sgn(a_i^T x + b_i) a_i.
$$
In particular, we have the weak result:
$$
A^T \sgn (A x + b) \in \partial h (x).
$$
```
### $\ell_2$ Norm over Affine Transformation
```{div}
Let $A \in \RR^{m \times n}$. Let $b \in \RR^m$.
Let $f : \RR^m \to \RR$ be given by $f(y) = \| y \|_2$.
Let $h : \RR^n \to \RR$ be the function
$h(x) = \| A x + b \|_2 = f(A x + b)$.
We have:
$$
\partial f (Ax + b) = \begin{cases}
\left \{ \frac{A x + b}{ \| A x + b \|_2} \right \} & \text{for} & Ax + b \neq \bzero \\
B_{\| \cdot \|_2} [\bzero, 1] & \text{for} & A x + b = 0
\end{cases}.
$$
Applying the affine transformation rule, we get:
$$
\partial h (x) = A^T \partial f (Ax + b)
= \begin{cases}
\left \{ \frac{A^T (A x + b)}{ \| A x + b \|_2} \right \} & \text{for} & Ax + b \neq \bzero \\
A^T B_{\| \cdot \|_2} [\bzero, 1] & \text{for} & A x + b = 0
\end{cases}.
$$
For $x \ST A x + b = 0$, we can write this as
$$
\partial h (x) = A^T B_{\| \cdot \|_2} [\bzero, 1] = \{A^T y \ST \| y \|_2 \leq 1 \}.
$$
```
### $\ell_{\infty}$ Norm over Affine Transformation
```{prf:example} Subdifferential of $\|\bA \bx + \bb \|_{\infty}$
:label: ex-cvxf-subdiff-ax-b-inf-norm
Let $\bA \in \RR^{m \times n}$. Let $\bb \in \RR^m$.
Let $f : \RR^m \to \RR$ be given by
$$
f(\by) = \| \by \|_{\infty}.
$$
Let $h : \RR^n \to \RR$ be the function
$$
h(\bx) = \| \bA \bx + \bb \|_{\infty} = f(\bA \bx + \bb).
$$
With $\by = \bA \bx + \bb$, we have $y_i = \ba_i^T \bx + b_i$
where $\ba_i^T$ is the $i$-th row vector of $\bA$.
Following {prf:ref}`ex-cvxf-subdiff-linf-norm`
$$
\partial f (\by) = \begin{cases}
\left \{\sum_{i \in I(\by)} \lambda_i \sgn(y_i) \be_i \ST
\sum_{i \in I(\by)} \lambda_i = 1, \lambda_j \geq 0, j \in I(\by) \right \},
& \by \neq \bzero \\
B_{\| \cdot \|_1} [\bzero, 1], & \by = \bzero
\end{cases}
$$
where $I(\by) = \{i \in [1,\dots,n] \ST |y_i | = f(\by) = \| \by \|_{\infty} \}$.
Due to affine transformation rule ({prf:ref}`res-cvxf-subdiff-strong-rule-affine`),
$$
\partial h(\bx) = \bA^T \partial f (\bA \bx + \bb).
$$
We have the following cases.
(a) $\by = \bzero$.
1. In terms of $\bx$, the condition $\by = \bzero$ is equivalent to
$\bA \bx + \bb = \bzero$.
1. Then,
$$
\partial f (\bA \bx + \bb) = \partial f(\bzero)
= B_{\| \cdot \|_1} [\bzero, 1].
$$
1. Thus,
$$
\partial h(\bx) = \bA^T B_{\| \cdot \|_1} [\bzero, 1].
$$
(b) $\by \neq \bzero$.
1. In terms of $\bx$, the condition $\by \neq \bzero$ is equivalent to
$\bA \bx + \bb \neq \bzero$.
1. Then,
$$
\partial f(\bA \bx + \bb)
&= \left \{\sum_{i \in I(\by)} \lambda_i \sgn(y_i) \be_i
\ST \sum_{i \in I(\by)} \lambda_i = 1, \lambda_j \geq 0, j \in I(\by) \right \} \\
&= \left \{\sum_{i \in I_x} \lambda_i \sgn(\ba_i^T \bx + b_i) \be_i
\ST \sum_{i \in I_x} \lambda_i = 1, \lambda_j \geq 0, j \in I_x \right \}
$$
where
$$
I_x = I(\by) = I(\bA \bx + \bb).
$$
1. Note that $\bA^T \be_i = \ba_i$.
1. Then,
$$
\partial h(\bx) &= \bA^T \partial f (\bA \bx + \bb) \\
&= \left \{\sum_{i \in I_x} \lambda_i \sgn(\ba_i^T \bx + b_i) \ba_i \ST
\sum_{i \in I_x} \lambda_i = 1, \lambda_j \geq 0, j \in I_x \right \}.
$$
Combining the two cases, we get:
$$
\partial h (\bx) = \begin{cases}
\left \{\sum_{i \in I_x} \lambda_i \sgn(\ba_i^T \bx + b_i) \ba_i \ST
\sum_{i \in I_x} \lambda_i = 1, \lambda_j \geq 0, j \in I_x \right \},
& \bA \bx + \bb \neq \bzero \\
\bA^T B_{\| \cdot \|_1} [\bzero, 1], & \bA \bx + \bb = \bzero
\end{cases}
$$
```
## Indicator Functions
```{prf:theorem} Subdifferential of indicator function
:label: res-cvxf-subdifferential-indicator
The subdifferential of indicator function for a nonempty set $S \subset \VV$
at any point $\bx \in S$ is given by
$$
\partial I_S (\bx) = N_S (\bx).
$$
where $N_S (\bx)$ is the
{prf:ref}`normal cone ` of $S$ at $\bx$.
```
```{prf:proof}
Let $\bx \in S$ and $\bg \in \partial I_S(\bx)$.
The subgradient inequality {eq}`eq-cvxf-subgradient-inequality-2` gives us:
$$
& I_S(\bz) \geq I_S(\bx) + \langle \bz - \bx , \bg \rangle \Forall \bz \in S\\
& \iff 0 \geq 0 + \langle \bz - \bx , \bg \rangle \Forall \bz \in S\\
& \iff \langle \bz - \bx , \bg \rangle \leq 0 \Forall \bz \in S\\
& \iff \bg \in N_S (\bx).
$$
```
```{prf:example} Subdifferential of the indicator function of the unit ball
:label: ex-cvxf-subg-ind-unit-ball
The unit ball at origin is given by:
$$
S = B[\bzero, 1] = \{\bx \in \VV \ST \| \bx \| \leq 1 \}.
$$
From {prf:ref}`res-cvxf-normal-cone-unit-ball`, the normal cone
of $S$ at $\bx \in S$ is given by:
$$
N_S(\bx) = \{ \by \in \VV^* \ST \| \by \|_* \leq \langle \bx, \by \rangle \}.
$$
For any $\bx \notin S$, $N_S(\bx) = \EmptySet$. Combining:
$$
\partial \delta_{B[\bzero, 1]} (x) = \begin{cases}
\{ \by \in \VV^* \ST \| \by \|_* \leq \langle \bx, \by \rangle \}
& \text{for} & \| \bx \| \leq 1 \\
\EmptySet & \text{for} & \| \bx \| > 1.
\end{cases}
$$
```
## Maximum Eigen Value Function
```{div}
The maximum eigen value function for symmetric matrices,
denoted as $f : \SS^n \to \RR$, is given by:
$$
f(\bX) \triangleq \lambda_{\max} (\bX).
$$
```
```{prf:theorem} Subgradient for maximum eigen value function
:label: res-cvxf-sg-max-eig-val-func
Let $f : \SS^n \to \RR$ be the maximum eigen value function.
Then
$$
\bv \bv^T \in \partial f (\bX)
$$
where $\bv$ is a normalized eigen-vector of $\bX \in \SS^n$
associated with its maximum eigen value.
```
```{prf:proof}
Let $\bX \in \SS^n$ be given. Let $\bv$ be a normalized eigen vector
associated with the largest eigen value of $\bX$. Then, $\| \bv \|_2 = 1$.
For any $\bY \in \SS^n$, we have:
$$
f(\bY) &= \lambda_{\max} (\bY) \\
&= \underset{\| \bu \|_2 = 1}{\max} \{ \bu^T \bY \bu \}\\
&\geq \bv^T \bY \bv \\
&= \bv^T \bX \bv + \bv^T (\bY - \bX) \bv\\
&= \lambda_{\max} (\bX) \| \bv \|_2^2 + \Trace (\bv^T (\bY - \bX) \bv) \\
&= \lambda_{\max} (\bX) + \Trace ((\bY - \bX) \bv \bv^T ) \\
&= f (\bX) + \langle \bY - \bX, \bv \bv^T \rangle.
$$
In this derivation, we have used the following results:
* The maximum eigen value can be obtained by maximizing $\bu^T \bY \bu$ over the unit sphere.
* For a scalar $x \in \RR$, $\bx = \Trace(x)$.
* $\Trace (AB) = \Trace (BA)$ if both $AB$ and $BA$ are well defined.
* $\langle A, B \rangle = \Trace(AB)$ for the space of symmetric matrices.
Thus, $\bv \bv^T \in \partial f(\bX)$.
```
We note here that this result only identifies one of the
subgradients of $f$ at $\bX$. It doesn't characterize
the entire subdifferential of $f$ at $\bX$. In this sense,
this result is a *weak result*. In contrast, a *strong result*
would characterize the entire subdifferential.
## The Max Function
```{prf:example} Subdifferential of the max function
:label: ex-cvxf-subdiff-max-func
Let $f : \RR^n \to \RR$ be given by:
$$
f(\bx) = \max \{ x_1, x_2, \dots, x_n\}.
$$
Let $f_i(\bx) = x_i = \be_i^T \bx$. Then
$$
f_(\bx) = \max \{ f_1(\bx), f_2(\bx), \dots, f_n(\bx)\}.
$$
We note that $f_i$ are differentiable and their
gradient is given by
(see {prf:ref}`ex-mvc-gradient-linear-functional`):
$$
\nabla f_i(\bx) = \be_i.
$$
Also, $f_i$ are linear, hence convex.
Thus, due to {prf:ref}`res-cvxf-subdiff-grad`:
$$
\partial f_i(\bx) = \{ \be_i\}.
$$
We denote the index set of functions which
equal the value of $f(\bx$ at $\bx$ by:
$$
I (\bx) = \{ i \ST f(\bx) = x_i\}.
$$
Then, using the max rule for proper convex functions
({prf:ref}`res-cvxf-subdiff-calculus-max-rule`):
$$
\partial f (\bx) = \convex \left (
\bigcup_{i \in I(\bx)} \partial f_i(\bx) \right )
= \convex \left ( \bigcup_{i \in I(\bx)} \{ \be_i\} \right ).
$$
As and example, consider the case where $\bx = \alpha \bone$
for some $\alpha \in \RR$.
1. In other words, $\bx = (\alpha, \dots, \alpha)$.
1. Then, $f(\bx) = \alpha$.
1. $f_i(\bx) = \alpha = f(\bx)$ for ever $i \in [1,\dots, n]$.
1. $I(\bx) = \{1, \dots, n \}$.
1. $\nabla f_i(\bx) = \be_i$.
1. $\convex ( \bigcup_{i \in I(\bx)} \{ \be_i\} ) = \convex \{ \be_1, \dots, \be_n \}$.
1. But $\convex \{ \be_1, \dots, \be_n \} = \Delta_n$.
1. Thus,
$$
\partial f (\alpha \bone) = \Delta_n \Forall \alpha \in \RR.
$$
```
## Space of Matrices
```{div}
Let $\VV = \RR^{m \times n}$. Let the standard
inner product for $x, y, \in \VV$ be
$\langle x, y \rangle = \Trace(x^T y)$.
Let $f : \VV \to \RERL$ be a proper function.
Let $x \in \interior \dom f$.
The gradient at $x$, if it exists, is given by:
$$
\nabla f(x) = D_f(x) \triangleq \left ( \frac{\partial f}{\partial x_{ij}} (x) \right)_{i,j}.
$$
```
Let $H$ be a positive definite matrix and define an inner
product for $\VV$ as:
$$
\langle x, y \rangle_H \triangleq \Trace (x^T H y).
$$
Then
$$
\nabla f(x) = H^{-1} D_f(x).
$$
## Convex Piecewise Linear Functions
```{prf:example} Subdifferential of convex piecewise linear functions
:label: ex-cvxf-subdiff-piecewise-linear-function
Let a convex piecewise linear function $f : \RR^n \to \RR$ be given by:
$$
f(\bx) = \underset{1 \leq i \leq m}{\max} \{\ba_i^T \bx + b_i \}
$$
where $\ba_i \in \RR^n, b_i \in \RR$ for $i=1,\dots,m$.
We define a set of functions $f_i : \RR^n \to \RR$
for $i=1,\dots,m$ as
$$
f_i(\bx) = \ba_i^T \bx + b_i
$$
We can see that $f$ is a pointwise maximum of these functions.
$$
f(\bx) = \underset{1 \leq i \leq m}{\max} \{ f_i(\bx)\}.
$$
Clearly,
$$
\partial f_i(\bx) = \{ \nabla f_i(\bx) \}
= \{ \ba_i \}.
$$
We define:
$$
I(\bx) = \{i \in [1,\dots,m] \ST f(\bx) = f_i(\bx) = \ba_i^T \bx + b_i \}.
$$
Then, using the max rule for proper convex functions
({prf:ref}`res-cvxf-subdiff-calculus-max-rule`):
$$
\partial f(\bx)
&= \convex \left ( \bigcup_{i\in I(\bx)} \partial f_i (\bx) \right ) \\
&= \left \{ \sum_{i \in I(\bx)} \lambda_i \ba_i \ST
\sum_{i \in I(\bx)} \lambda_i = 1, \lambda_j \geq 0 \Forall j \in I(\bx)
\right \}.
$$
By Fermat's optimality condition
({prf:ref}`res-cvxf-subdiff-fermat-optimality`),
$\bx^*$ is a minimizer of $f$ if and only if $\bzero \in f(\bx^*)$.
Thus, $\bx^*$ is a minimizer if and only if
there exists $\blambda \in \Delta_m$ such that
$$
\bzero = \sum_{i=1}^m \lambda_i \ba_i,\quad \lambda_j = 0 \Forall j \notin I(\bx^*).
$$
Note that at any $\bx$, for every $j \notin I(\bx)$, we have
$$
\ba_j^T \bx + b_j - f(\bx) < 0.
$$
Thus, the complimentary condition
$$
\lambda_j (\ba_j^T \bx + b_j - f(\bx)) = 0, j=1,\dots,m
$$
denotes the fact that whenever
$\ba_j^T \bx + b_j - f(\bx) < 0$, then $\lambda_j$ must be zero
and whenever $\ba_j^T \bx + b_j - f(\bx) = 0$ then $\lambda_j \geq 0$
is allowed (since $\blambda \in \Delta_m$).
If we put together a matrix $\bA \in \RR^{m \times n}$
whose rows are $\ba_1^T, \dots, \ba_m^T$, then
the optimality condition can be succinctly stated as
$$
\exists \blambda \in \Delta_m \text{ s.t. }
\bA^T \blambda = \bzero \text{ and }
\lambda_j (\ba_j^T \bx + b_j - f(\bx^*)) = 0, j=1,\dots,m.
$$
```
## Minimization Problems
```{div}
Minimization problem:
$$
\min \{f(x) \ST g(x) \leq 0, x \in X \}.
$$
Dual function:
$$
q(\lambda) \min_{x \in X} \{L(x, \lambda) \triangleq f(x) + \lambda^T g(x)\}
$$
Assume that for $\lambda = \lambda_0$ the minimization in R.H.S. is obtained at $x = x_0$.
Subgradient of the (negative of the) dual function $-q$:
$$
- g (x_0) \in \partial (-q) (\lambda_0).
$$
```