10.5. Constrained Optimization I#
In this section, we focus on objective functions of type $f : \mathbb{R}^n \to \mathbb{R}$ which are differentiable over a convex feasible set.
We also concern ourselves with functions of type $f : \mathbb{R}^n \to (-\infty, \infty]$ which are convex but not necessarily differentiable everywhere.
Main references for this section are [6, 17].
10.5.1. Stationary Points#
We first look at functions which are differentiable over a convex set. Here, we don't require that the function itself be convex. Thus, we may not be able to characterize the global optimality of a point. However, we can still characterize the local optimality of a point.
In Definition 8.6, we defined
stationary points for a real valued function as
points where the gradient vanishes; i.e., $\nabla f(\mathbf{x}) = \mathbf{0}$.
In this section, we wish to restrict the domain of feasible
points to a convex set $C$ and consider the problem

$$\min_{\mathbf{x} \in C} f(\mathbf{x}),$$

where $f$ is differentiable over $C$.
We now introduce the notion of stationary points for
the given optimization problem of minimizing $f$ over $C$.
Definition 10.21 (Stationary point for an optimization problem)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function which is differentiable over a convex set $C \subseteq \mathbb{R}^n$. Consider the problem of minimizing $f$ over $C$.
Then, $\mathbf{x}^* \in C$ is called a stationary point of this problem if

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0 \quad \text{for every } \mathbf{x} \in C.$$

In other words, at a stationary point the gradient makes a nonnegative inner product with every feasible direction $\mathbf{x} - \mathbf{x}^*$.
Theorem 10.45 (Local minimum points are stationary points)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function which is differentiable over a convex set $C \subseteq \mathbb{R}^n$. Consider the problem of minimizing $f$ over $C$.
If $\mathbf{x}^* \in C$ is a local minimum point of $f$ over $C$, then $\mathbf{x}^*$ is a stationary point of this problem.

Proof. Let $\mathbf{x}^*$ be a local minimum point which is not a stationary point.
Then, there exists $\mathbf{x} \in C$ such that $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) < 0$.
Let $t \in [0, 1]$ be a parameter.
Let $g(t) = f(\mathbf{x}^* + t (\mathbf{x} - \mathbf{x}^*))$.
Since $C$ is convex, hence $\mathbf{x}^* + t (\mathbf{x} - \mathbf{x}^*) \in C$.
Differentiating $g$ w.r.t. $t$ at $t = 0$, we obtain

$$g'(0) = \nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) < 0.$$

Thus, for small enough $t > 0$, $f(\mathbf{x}^* + t (\mathbf{x} - \mathbf{x}^*)) < f(\mathbf{x}^*)$.
This contradicts the hypothesis that $\mathbf{x}^*$ is a local minimum.
Thus, all local minimum points must be stationary points.
Example 10.6 (Unconstrained minimization)
Let $C = \mathbb{R}^n$; i.e., consider the unconstrained minimization of $f$.
In this case, the feasible set is the whole of $\mathbb{R}^n$.
If $\mathbf{x}^*$ is a stationary point for this problem, then

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0 \quad \text{for every } \mathbf{x} \in \mathbb{R}^n.$$

In particular, if we choose $\mathbf{x} = \mathbf{x}^* - \nabla f(\mathbf{x}^*)$, we get

$$-\|\nabla f(\mathbf{x}^*)\|^2 \geq 0.$$

This is true only if $\nabla f(\mathbf{x}^*) = \mathbf{0}$.
Thus, for unconstrained minimization, the gradient vanishes at stationary points.
Theorem 10.46 (Stationary point as an orthogonal projection)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function which is differentiable over a nonempty closed convex set $C \subseteq \mathbb{R}^n$.
Let $s > 0$. Then, $\mathbf{x}^* \in C$ is a stationary point of the problem of minimizing $f$ over $C$ if and only if

$$\mathbf{x}^* = P_C\bigl(\mathbf{x}^* - s \nabla f(\mathbf{x}^*)\bigr).$$

Proof. Recall from Theorem 10.7 that $\mathbf{z} = P_C(\mathbf{y})$ if and only if

$$(\mathbf{y} - \mathbf{z})^T (\mathbf{x} - \mathbf{z}) \leq 0 \quad \text{for every } \mathbf{x} \in C.$$

Replace $\mathbf{y} = \mathbf{x}^* - s \nabla f(\mathbf{x}^*)$ and $\mathbf{z} = \mathbf{x}^*$. We get

$$-s \nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \leq 0 \quad \text{for every } \mathbf{x} \in C,$$

which, since $s > 0$, is the same as $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0$ for every $\mathbf{x} \in C$. But this is the same condition as the definition for a stationary point.
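The fixed-point characterization is easy to test numerically. The following sketch (in Python; the quadratic objective, the box constraint and all names are illustrative assumptions, not from the text) minimizes $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x} - \mathbf{c}\|^2$ over a box and checks that its minimizer satisfies $\mathbf{x}^* = P_C(\mathbf{x}^* - s \nabla f(\mathbf{x}^*))$ for several values of $s$.

```python
import numpy as np

# Illustrative problem: f(x) = 0.5 * ||x - c||^2 over the box C = [0, 1]^2,
# with c chosen outside the box so that the constraint is active.
c = np.array([2.0, -0.5])

def grad_f(x):
    return x - c

def proj_box(x, lo=0.0, hi=1.0):
    # Orthogonal projection onto the box [lo, hi]^n.
    return np.clip(x, lo, hi)

# The minimizer of f over the box is the projection of c onto the box.
x_star = proj_box(c)                       # equals [1.0, 0.0]

# Check the fixed-point characterization x* = P_C(x* - s grad f(x*)) for several s > 0.
for s in [0.1, 1.0, 10.0]:
    y = proj_box(x_star - s * grad_f(x_star))
    print(s, np.allclose(y, x_star))       # expected: True for every s
```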
10.5.2. First Order Optimality Criteria#
We now turn our attention to the case where the function $f$ is also convex. In this case, stationarity turns out to be both necessary and sufficient for optimality.
Theorem 10.47 (Optimality criterion for differentiable objective function)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function which is differentiable over a nonempty convex set $C \subseteq \mathbb{R}^n$. Consider the problem of minimizing $f$ over $C$.
Then, $\mathbf{x}^* \in C$ is an optimal point if and only if

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0 \quad \text{for every } \mathbf{x} \in C. \tag{10.8}$$

In other words, $\mathbf{x}^*$ is an optimal point if and only if it is a stationary point of the problem.
Proof. By Theorem 9.98, since $f$ is convex and differentiable over $C$,

$$f(\mathbf{x}) \geq f(\mathbf{y}) + \nabla f(\mathbf{y})^T (\mathbf{x} - \mathbf{y})$$

for every $\mathbf{x}, \mathbf{y} \in C$.

Assume that some $\mathbf{x}^* \in C$ satisfies (10.8).
Let $\mathbf{x} \in C$. Then, by differentiability and convexity,

$$f(\mathbf{x}) \geq f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*).$$

By hypothesis, $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0$.
Thus, $f(\mathbf{x}) \geq f(\mathbf{x}^*)$.
Since this is true for every $\mathbf{x} \in C$, hence $\mathbf{x}^*$ is an optimal point for the minimization problem.
Now for the converse, assume that $\mathbf{x}^*$ is an optimal point.
For contradiction, assume that (10.8) doesn't hold.
Then, there exists $\mathbf{x} \in C$ such that

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) < 0.$$

Let $t \in [0, 1]$ be a parameter.
Let $g(t) = f(\mathbf{x}^* + t (\mathbf{x} - \mathbf{x}^*))$.
Since $C$ is convex, hence $\mathbf{x}^* + t (\mathbf{x} - \mathbf{x}^*) \in C$.
Differentiating $g$ w.r.t. $t$ at $t = 0$, we obtain

$$g'(0) = \nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) < 0.$$

Thus, for small enough $t > 0$, $f(\mathbf{x}^* + t (\mathbf{x} - \mathbf{x}^*)) < f(\mathbf{x}^*)$.
Thus, $\mathbf{x}^*$ cannot be an optimal point for the minimization problem.
This contradicts our hypothesis that $\mathbf{x}^*$ is an optimal point.
Hence, (10.8) must hold for every $\mathbf{x} \in C$.
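As a quick numerical illustration of the criterion (a sketch under assumptions of our own: the objective $e^x - 2x$, the intervals, and the use of `scipy.optimize.minimize_scalar` are illustrative), note that in one dimension condition (10.8) over $C = [a, b]$ amounts to $f'(x^*) = 0$ at an interior minimizer, $f'(a) \geq 0$ when the minimizer is the left endpoint, and $f'(b) \leq 0$ when it is the right endpoint.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative convex objective f(x) = e^x - 2x, minimized over an interval C = [a, b].
f = lambda x: np.exp(x) - 2.0 * x
df = lambda x: np.exp(x) - 2.0

for a, b in [(-1.0, 2.0), (1.0, 2.0), (-2.0, -1.0)]:
    res = minimize_scalar(f, bounds=(a, b), method="bounded",
                          options={"xatol": 1e-10})
    x_star = res.x
    # Criterion (10.8) on an interval: df(x*) * (x - x*) >= 0 for every x in [a, b].
    xs = np.linspace(a, b, 201)
    holds = bool(np.all(df(x_star) * (xs - x_star) >= -1e-6))
    print(f"C=[{a}, {b}]  x*={x_star:.4f}  (10.8) holds on samples: {holds}")
```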
Theorem 10.48 (Optimality criterion for unconstrained problem with differentiable objective function)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper convex function which is differentiable over its domain, and consider the unconstrained problem of minimizing $f$.
Then, $\mathbf{x}^* \in \operatorname{dom} f$ is an optimal point if and only if $\nabla f(\mathbf{x}^*) = \mathbf{0}$.
Proof. In this case, the set of feasible points is $C = \operatorname{dom} f$.
Since $f$ is differentiable over its domain, hence $\operatorname{dom} f$ is an open set.
By Theorem 10.47, $\mathbf{x}^*$ is an optimal point if and only if

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0 \quad \text{for every } \mathbf{x} \in \operatorname{dom} f.$$

If $\nabla f(\mathbf{x}^*) = \mathbf{0}$, then this inequality is satisfied. Hence, $\mathbf{x}^*$ must be an optimal point.

Now, assume that $\mathbf{x}^*$ is an optimal point.
Since $\mathbf{x}^* \in \operatorname{dom} f$ and $\operatorname{dom} f$ is open, hence there exists a closed ball $B[\mathbf{x}^*, r] \subseteq \operatorname{dom} f$.
Let $\mathbf{x} = \mathbf{x}^* - t \nabla f(\mathbf{x}^*)$.
For sufficiently small $t > 0$, $\mathbf{x} \in B[\mathbf{x}^*, r]$.
Then, the inequality $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0$
must hold true for $\mathbf{x} = \mathbf{x}^* - t \nabla f(\mathbf{x}^*)$.
This means that $-t \|\nabla f(\mathbf{x}^*)\|^2 \geq 0$
must be true.
Thus $\nabla f(\mathbf{x}^*) = \mathbf{0}$ must be true.
10.5.2.1. Nondifferentiability at the Boundary#
There are some specific results available for the unconstrained minimization
of a convex function $f : \mathbb{R}^n \to (-\infty, \infty]$ which is differentiable at some points of its domain and nondifferentiable at others. They address two questions:

Under what conditions the minimizer of $f$ is a point of differentiability.
Under what conditions the minimizer of $f$ may be at a point of nondifferentiability.
Theorem 10.49 (Zero gradient implies minimizer)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper convex function. Assume that $f$ is differentiable at a point $\mathbf{a} \in \operatorname{dom} f$ and that $\nabla f(\mathbf{a}) = \mathbf{0}$. Then, $\mathbf{a}$ is a minimizer of $f$.

Proof. Recall from Theorem 9.220 that if a convex function is differentiable at a point, then its gradient is its unique subgradient at that point.

Assume that $f$ is differentiable at $\mathbf{a}$ and $\nabla f(\mathbf{a}) = \mathbf{0}$.
Then $\nabla f(\mathbf{a}) = \mathbf{0}$ is the one and only subgradient of $f$ at $\mathbf{a}$.
Due to the subgradient inequality,

$$f(\mathbf{x}) \geq f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x} - \mathbf{a}) \quad \text{for every } \mathbf{x} \in \mathbb{R}^n.$$

Since $\nabla f(\mathbf{a}) = \mathbf{0}$, hence $f(\mathbf{x}) \geq f(\mathbf{a})$ for every $\mathbf{x}$.
Thus $f$ attains a minimum at $\mathbf{a}$ and $\mathbf{a}$ is one of its minimizers.
Next we consider the special case of the
minimization of a convex function which is differentiable at some points of its domain and nondifferentiable at others (for example, at the boundary of its domain).
Theorem 10.50 (Minimizers at points of nondifferentiability)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper convex function which attains a minimum. If $\nabla f(\mathbf{x}) \neq \mathbf{0}$ at every point $\mathbf{x}$ at which $f$ is differentiable, then every minimizer of $f$ is a point of nondifferentiability.

Proof. We are given that there exists a point which is a minimizer of $f$.
Pick any $\mathbf{x}$ at which $f$ is differentiable.
We need to show that $\mathbf{x}$ cannot be a minimizer.
For contradiction, assume that $\mathbf{x}$ is a minimizer.
By the subgradient inequality (with $\nabla f(\mathbf{x})$ as the unique subgradient at a point of differentiability),

$$f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{y} - \mathbf{x}) \quad \text{for every } \mathbf{y}.$$

Since $\mathbf{x}$ is a minimizer, hence we must have $f(\mathbf{y}) \geq f(\mathbf{x})$ for every $\mathbf{y}$.
Let $\mathbf{y} = \mathbf{x} - t \nabla f(\mathbf{x})$ with $t > 0$.
Then, by the differentiability of $f$ at $\mathbf{x}$,

$$f(\mathbf{y}) = f(\mathbf{x}) - t \|\nabla f(\mathbf{x})\|^2 + o(t) < f(\mathbf{x})$$

for sufficiently small $t$, since $\nabla f(\mathbf{x}) \neq \mathbf{0}$ by hypothesis.
This contradicts our assumption that $\mathbf{x}$ is a minimizer.
Hence if $\mathbf{a}$ is a minimizer of $f$, then $f$ must be nondifferentiable at $\mathbf{a}$.
We next show how the condition in (10.8) simplifies for specific optimization problem structures.
10.5.2.2. Equality Constraints#
Theorem 10.51 (Differentiable objective minimization with equality constraints)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function which is differentiable over $\mathbb{R}^n$. Consider the problem

$$\text{minimize } f(\mathbf{x}) \quad \text{subject to } \mathbf{A} \mathbf{x} = \mathbf{b},$$

where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$.
Then, a feasible point $\mathbf{x}^*$ is an optimal point if and only if there exists a vector $\boldsymbol{\lambda} \in \mathbb{R}^m$ such that

$$\nabla f(\mathbf{x}^*) + \mathbf{A}^T \boldsymbol{\lambda} = \mathbf{0}.$$
Proof. The feasible set is given by $C = \{ \mathbf{x} \in \mathbb{R}^n \mid \mathbf{A} \mathbf{x} = \mathbf{b} \}$.
Thus, if $\mathbf{x}^* \in C$, then $\mathbf{x} = \mathbf{x}^* + \mathbf{v}$ is feasible for every $\mathbf{v}$ satisfying $\mathbf{A} \mathbf{v} = \mathbf{0}$.
Since both $\mathbf{x}$ and $\mathbf{x}^*$ are feasible, hence $\mathbf{A} (\mathbf{x} - \mathbf{x}^*) = \mathbf{0}$.
Thus, $\mathbf{x} - \mathbf{x}^* \in \mathcal{N}(\mathbf{A})$, the null space of $\mathbf{A}$.
In fact, $\mathbf{x}$ is feasible if and only if $\mathbf{x} = \mathbf{x}^* + \mathbf{v}$ for some $\mathbf{v} \in \mathcal{N}(\mathbf{A})$.
Thus, the optimality criterion (10.8) reduces to

$$\nabla f(\mathbf{x}^*)^T \mathbf{v} \geq 0 \quad \text{for every } \mathbf{v} \in \mathcal{N}(\mathbf{A}).$$

Note that $\nabla f(\mathbf{x}^*)^T \mathbf{v}$ is a linear function of $\mathbf{v}$ as $\nabla f(\mathbf{x}^*)$ is a fixed vector.
If a linear function is nonnegative on a subspace, then it must be identically zero on the subspace.
Thus, $\nabla f(\mathbf{x}^*)^T \mathbf{v} = 0$ for every $\mathbf{v} \in \mathcal{N}(\mathbf{A})$.
In other words, $\mathbf{x}^*$ is optimal if and only if $\nabla f(\mathbf{x}^*) \perp \mathcal{N}(\mathbf{A})$.
Recall that $\mathcal{N}(\mathbf{A}) = \mathcal{C}(\mathbf{A}^T)^{\perp}$; i.e., the null space of $\mathbf{A}$ is the orthogonal complement of the column space (range) of $\mathbf{A}^T$.
Thus, $\mathbf{x}^*$ is optimal if and only if $\nabla f(\mathbf{x}^*) \in \mathcal{C}(\mathbf{A}^T)$.
In other words, $\mathbf{x}^*$ is optimal if and only if there exists a vector $\boldsymbol{\lambda} \in \mathbb{R}^m$ such that

$$\nabla f(\mathbf{x}^*) + \mathbf{A}^T \boldsymbol{\lambda} = \mathbf{0}.$$
This result is a Lagrange multiplier optimality condition to be discussed in more detail in later sections.
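As a small numerical check of this condition, consider the illustrative problem of minimizing $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|^2$ subject to $\mathbf{A}\mathbf{x} = \mathbf{b}$ (the data and names in the sketch below are assumptions, not from the text). Stacking the stationarity condition $\nabla f(\mathbf{x}^*) + \mathbf{A}^T\boldsymbol{\lambda} = \mathbf{0}$ with the feasibility condition $\mathbf{A}\mathbf{x}^* = \mathbf{b}$ gives a linear system that can be solved directly, after which both conditions can be verified.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative problem: minimize f(x) = 0.5 ||x||^2 subject to A x = b.
# Optimality (Theorem 10.51): A x* = b and grad f(x*) + A^T lam = 0 for some lam.
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Stacked system:  [ I   A^T ] [x  ]   [0]
#                  [ A   0   ] [lam] = [b]
K = np.block([[np.eye(n), A.T],
              [A, np.zeros((m, m))]])
rhs = np.concatenate([np.zeros(n), b])
sol = np.linalg.solve(K, rhs)
x_star, lam = sol[:n], sol[n:]

grad = x_star                      # grad f(x) = x for this objective
print("feasibility   ||A x* - b||       =", np.linalg.norm(A @ x_star - b))
print("stationarity  ||grad + A^T lam|| =", np.linalg.norm(grad + A.T @ lam))
```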
10.5.2.3. Nonnegative Orthant Constraints#
Example 10.7 (Differentiable objective minimization over nonnegative orthant)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function which is differentiable over $\mathbb{R}^n$. Consider the problem of minimizing $f$ over the nonnegative orthant $\mathbb{R}^n_+$.
The feasible set is the nonnegative orthant $C = \mathbb{R}^n_+$.
By (10.8), $\mathbf{x}^*$ is optimal if and only if $\mathbf{x}^* \succeq \mathbf{0}$ and

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0 \quad \text{for every } \mathbf{x} \succeq \mathbf{0}.$$

The term $\nabla f(\mathbf{x}^*)^T \mathbf{x}$ is unbounded below over $\mathbf{x} \succeq \mathbf{0}$ unless $\nabla f(\mathbf{x}^*) \succeq \mathbf{0}$.
Thus, $\nabla f(\mathbf{x}^*)$ must be nonnegative.
Then, the minimum value for $\nabla f(\mathbf{x}^*)^T \mathbf{x}$ over $\mathbf{x} \succeq \mathbf{0}$ is 0.
Consequently, the optimality condition reduces to $0 \geq \nabla f(\mathbf{x}^*)^T \mathbf{x}^*$, or $\nabla f(\mathbf{x}^*)^T \mathbf{x}^* \leq 0$.
But $\mathbf{x}^* \succeq \mathbf{0}$ and $\nabla f(\mathbf{x}^*) \succeq \mathbf{0}$.
Thus, we must have $\nabla f(\mathbf{x}^*)^T \mathbf{x}^* = 0$.
We note that

$$\nabla f(\mathbf{x}^*)^T \mathbf{x}^* = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{x}^*) \, x_i^*.$$

Thus, it is a sum of products of nonnegative numbers.
So each term in the sum must be 0.
Thus, $x_i^* \frac{\partial f}{\partial x_i}(\mathbf{x}^*) = 0$ must hold true for every $i$.
Thus, the optimality condition can be rephrased as

$$\mathbf{x}^* \succeq \mathbf{0}, \quad \nabla f(\mathbf{x}^*) \succeq \mathbf{0}, \quad x_i^* \frac{\partial f}{\partial x_i}(\mathbf{x}^*) = 0 \text{ for every } i.$$

The condition $x_i^* \frac{\partial f}{\partial x_i}(\mathbf{x}^*) = 0$ means that for every $i$, at least one of $x_i^*$ and $\frac{\partial f}{\partial x_i}(\mathbf{x}^*)$ must be zero.
Thus, the sparsity patterns of $\mathbf{x}^*$ and $\nabla f(\mathbf{x}^*)$ are complementary:

$$\operatorname{supp}(\mathbf{x}^*) \cap \operatorname{supp}(\nabla f(\mathbf{x}^*)) = \varnothing,$$

where $\operatorname{supp}(\mathbf{x})$ denotes the support (the index set of nonzero entries) of $\mathbf{x}$.
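These conditions can be verified on a concrete instance. The sketch below (the nonnegative least-squares objective and the choice of `scipy.optimize.nnls` as the solver are illustrative assumptions) solves $\min_{\mathbf{x} \succeq \mathbf{0}} \tfrac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{b}\|^2$ and checks nonnegativity of $\mathbf{x}^*$ and $\nabla f(\mathbf{x}^*)$ together with the complementarity condition.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)

# Illustrative problem: minimize f(x) = 0.5 ||A x - b||^2 over x >= 0.
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)

x_star, _ = nnls(A, b)                 # nonnegative least squares solver
grad = A.T @ (A @ x_star - b)          # grad f(x*) = A^T (A x* - b)

tol = 1e-8
print("x* >= 0              :", bool(np.all(x_star >= -tol)))
print("grad f(x*) >= 0      :", bool(np.all(grad >= -tol)))
print("x*_i * grad_i ~= 0   :", bool(np.allclose(x_star * grad, 0.0, atol=1e-7)))
```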
10.5.2.4. Unit Sum Set Constraint#
Example 10.8 (Minimization over unit sum set)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function which is differentiable over $\mathbb{R}^n$. Consider the problem of minimizing $f$ over the set of vectors whose entries sum to one.
The feasible set is given by

$$C = \{ \mathbf{x} \in \mathbb{R}^n \mid \mathbf{1}^T \mathbf{x} = x_1 + x_2 + \dots + x_n = 1 \}.$$
This set is also known as the unit sum set.
A point $\mathbf{x}^* \in C$ is optimal if and only if

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0 \quad \text{for every } \mathbf{x} \in C.$$

Let us call this condition (I).
This is equivalent to the condition that all partial derivatives of $f$ at $\mathbf{x}^*$ are equal:

$$\frac{\partial f}{\partial x_1}(\mathbf{x}^*) = \frac{\partial f}{\partial x_2}(\mathbf{x}^*) = \dots = \frac{\partial f}{\partial x_n}(\mathbf{x}^*).$$
Let us call this condition (II).
We shall show that (I) and (II) are equivalent.
We first show that (II) implies (I).
Assume that some $\mathbf{x}^* \in C$ satisfies (II) with

$$\frac{\partial f}{\partial x_i}(\mathbf{x}^*) = \mu \text{ for every } i; \quad \text{i.e., } \nabla f(\mathbf{x}^*) = \mu \mathbf{1}.$$

Then, for any $\mathbf{x} \in C$,

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) = \mu \, \mathbf{1}^T (\mathbf{x} - \mathbf{x}^*) = \mu (1 - 1) = 0.$$

Thus, we see that $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0$ holds indeed and (I) is satisfied.
Now, assume that (I) is satisfied for some $\mathbf{x}^* \in C$.
For contradiction, assume that $\mathbf{x}^*$ doesn't satisfy (II).
Then, there exist indices $i$ and $j$ such that

$$\frac{\partial f}{\partial x_i}(\mathbf{x}^*) < \frac{\partial f}{\partial x_j}(\mathbf{x}^*).$$

Pick a vector $\mathbf{x}$ with the following definition:

$$x_k = \begin{cases} x_k^* + 1 & k = i, \\ x_k^* - 1 & k = j, \\ x_k^* & \text{otherwise}. \end{cases}$$

Note that $\mathbf{1}^T \mathbf{x} = \mathbf{1}^T \mathbf{x}^* = 1$.
Thus, $\mathbf{x} \in C$ holds.
Now,

$$\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) = \frac{\partial f}{\partial x_i}(\mathbf{x}^*) - \frac{\partial f}{\partial x_j}(\mathbf{x}^*) < 0.$$
This violates the hypothesis that (I) holds true.
Thus, (II) must be true.
Thus, (I) implies (II).
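The conclusion of this example, that all partial derivatives are equal at the optimum, can be checked on a quadratic instance by solving the associated linear system of stationarity and feasibility conditions (the objective and all names in the sketch below are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative problem: f(x) = 0.5 x^T Q x - c^T x over the unit sum set {x : 1^T x = 1}.
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # positive definite Hessian
c = rng.standard_normal(n)
ones = np.ones(n)

# Stationarity + feasibility:  [ Q    1 ] [x ]   [c]
#                              [ 1^T  0 ] [mu] = [1]
K = np.block([[Q, ones[:, None]],
              [ones[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([c, [1.0]]))
x_star = sol[:n]

grad = Q @ x_star - c                  # grad f(x*)
print("1^T x* =", ones @ x_star)
print("partial derivatives at x*:", grad)   # entries should be (nearly) equal
print("max spread:", np.ptp(grad))
```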
10.5.2.5. Unit Ball Constraint#
Example 10.9 (Minimization over unit ball)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function which is differentiable over $\mathbb{R}^n$. Consider the problem of minimizing $f$ over the unit ball.
The feasible set is given by

$$C = B[\mathbf{0}, 1] = \{ \mathbf{x} \in \mathbb{R}^n \mid \|\mathbf{x}\| \leq 1 \}.$$

A point $\mathbf{x}^* \in C$ is an optimal point if and only if $\nabla f(\mathbf{x}^*)^T (\mathbf{x} - \mathbf{x}^*) \geq 0$ for every $\mathbf{x}$ with $\|\mathbf{x}\| \leq 1$. This is equivalent to saying that

$$\min_{\|\mathbf{x}\| \leq 1} \nabla f(\mathbf{x}^*)^T \mathbf{x} \geq \nabla f(\mathbf{x}^*)^T \mathbf{x}^*.$$

Recall from Theorem 8.11 that for any $\mathbf{a} \in \mathbb{R}^n$, the optimal value of the problem

$$\min_{\|\mathbf{x}\| \leq 1} \mathbf{a}^T \mathbf{x}$$

is $-\|\mathbf{a}\|$. Thus,

$$\min_{\|\mathbf{x}\| \leq 1} \nabla f(\mathbf{x}^*)^T \mathbf{x} = -\|\nabla f(\mathbf{x}^*)\|.$$

Thus, the inequality simplifies to

$$\nabla f(\mathbf{x}^*)^T \mathbf{x}^* \leq -\|\nabla f(\mathbf{x}^*)\|.$$

At the same time, by the Cauchy-Schwarz inequality,

$$\nabla f(\mathbf{x}^*)^T \mathbf{x}^* \geq -\|\nabla f(\mathbf{x}^*)\| \, \|\mathbf{x}^*\| \geq -\|\nabla f(\mathbf{x}^*)\|$$

since $\|\mathbf{x}^*\| \leq 1$. Thus, the inequality must be an equality, giving us:
$\mathbf{x}^*$ is an optimal point if and only if

$$\nabla f(\mathbf{x}^*)^T \mathbf{x}^* = -\|\nabla f(\mathbf{x}^*)\|.$$
We now have the following possibilities for this condition.
If $\nabla f(\mathbf{x}^*) = \mathbf{0}$, then the condition holds and $\mathbf{x}^*$ is indeed an optimal point.
Otherwise, if $\nabla f(\mathbf{x}^*) \neq \mathbf{0}$, then $\|\mathbf{x}^*\| = 1$ must be true.
For contradiction, if we assume that $\|\mathbf{x}^*\| < 1$, then, by the Cauchy-Schwarz inequality,

$$\nabla f(\mathbf{x}^*)^T \mathbf{x}^* \geq -\|\nabla f(\mathbf{x}^*)\| \, \|\mathbf{x}^*\| > -\|\nabla f(\mathbf{x}^*)\|,$$

a contradiction.
Thus, if $\nabla f(\mathbf{x}^*) \neq \mathbf{0}$, then $\mathbf{x}^*$ is an optimal point if and only if $\|\mathbf{x}^*\| = 1$ and

$$\nabla f(\mathbf{x}^*)^T \mathbf{x}^* = -\|\nabla f(\mathbf{x}^*)\| \, \|\mathbf{x}^*\|.$$

But equality in the Cauchy-Schwarz inequality is possible only when there exists $\lambda \geq 0$ such that $\nabla f(\mathbf{x}^*) = -\lambda \mathbf{x}^*$. Thus, if $\nabla f(\mathbf{x}^*) \neq \mathbf{0}$, then $\mathbf{x}^*$ is an optimal point if and only if $\|\mathbf{x}^*\| = 1$ and there exists $\lambda > 0$ such that $\nabla f(\mathbf{x}^*) = -\lambda \mathbf{x}^*$.
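For a concrete check (the quadratic objective with $\|\mathbf{c}\| > 1$ in the sketch below is an illustrative assumption), the minimizer of $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x} - \mathbf{c}\|^2$ over the unit ball is $\mathbf{x}^* = \mathbf{c}/\|\mathbf{c}\|$, and the condition $\nabla f(\mathbf{x}^*) = -\lambda \mathbf{x}^*$ with $\lambda > 0$ can be verified directly.

```python
import numpy as np

# Illustrative problem: f(x) = 0.5 ||x - c||^2 over the unit ball, with ||c|| > 1
# so that the unconstrained minimizer lies outside the ball.
c = np.array([3.0, 4.0])               # ||c|| = 5

x_star = c / np.linalg.norm(c)         # minimizer: projection of c onto the ball
grad = x_star - c                      # grad f(x*) = x* - c

# Optimality on the unit ball with a nonzero gradient:
# ||x*|| = 1 and grad f(x*) = -lambda * x* for some lambda > 0.
lam = -grad @ x_star                   # candidate multiplier
print("||x*|| =", np.linalg.norm(x_star))
print("lambda =", lam, "(should be positive)")
print("||grad f(x*) + lambda x*|| =", np.linalg.norm(grad + lam * x_star))
```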
10.5.3. Descent Directions Method#
We first consider the problem of unconstrained minimization of
a continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$.
Typical iterative algorithms which aim to find
the solution start with an initial guess $\mathbf{x}_0$ and generate a sequence of iterates $\{ \mathbf{x}_k \}$ via updates of the form

$$\mathbf{x}_{k+1} = \mathbf{x}_k + t_k \mathbf{d}_k,$$

where $t_k > 0$ is a step size and $\mathbf{d}_k$ is a direction of movement in the $k$-th iteration.
This brings us to the notion of a descent direction.
Definition 10.22 (Descent direction)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable over $\mathbb{R}^n$ and let $\mathbf{x} \in \mathbb{R}^n$. A nonzero vector $\mathbf{d} \in \mathbb{R}^n$ is called a descent direction of $f$ at $\mathbf{x}$ if the directional derivative of $f$ at $\mathbf{x}$ in the direction $\mathbf{d}$ is negative:

$$f'(\mathbf{x}; \mathbf{d}) = \nabla f(\mathbf{x})^T \mathbf{d} < 0.$$

In other words, a descent direction makes an obtuse angle with the gradient at $\mathbf{x}$.
If the directional derivative is negative, then it
is clear that for a small enough step in this direction,
the value of $f$ decreases.
Lemma 10.4 (Descent property of descent direction)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable over $\mathbb{R}^n$, let $\mathbf{x} \in \mathbb{R}^n$, and let $\mathbf{d}$ be a descent direction of $f$ at $\mathbf{x}$. Then, there exists $\epsilon > 0$ such that

$$f(\mathbf{x} + t \mathbf{d}) < f(\mathbf{x})$$

for any $t \in (0, \epsilon]$.
Proof. This follows from the negativity of the directional derivative.
Recall from Definition 9.70 that the directional derivative is given by

$$f'(\mathbf{x}; \mathbf{d}) = \lim_{t \to 0^+} \frac{f(\mathbf{x} + t \mathbf{d}) - f(\mathbf{x})}{t}.$$

Since $\mathbf{d}$ is a descent direction, hence $f'(\mathbf{x}; \mathbf{d}) < 0$. Thus,

$$\lim_{t \to 0^+} \frac{f(\mathbf{x} + t \mathbf{d}) - f(\mathbf{x})}{t} < 0.$$

Thus, there exists $\epsilon > 0$ such that

$$\frac{f(\mathbf{x} + t \mathbf{d}) - f(\mathbf{x})}{t} < 0$$

for every $t \in (0, \epsilon]$. Hence $f(\mathbf{x} + t \mathbf{d}) < f(\mathbf{x})$ for every such $t$.
10.5.3.1. Descent Directions Method#
Algorithm 10.1 (The descent directions method)
Initialization
Pick $\mathbf{x}_0 \in \mathbb{R}^n$
arbitrarily as initial solution.

General iteration: for $k = 0, 1, 2, \dots$, execute the following steps.

Pick a descent direction $\mathbf{d}_k$.
Pick a step size $t_k$ satisfying $f(\mathbf{x}_k + t_k \mathbf{d}_k) < f(\mathbf{x}_k)$.
Update: $\mathbf{x}_{k+1} = \mathbf{x}_k + t_k \mathbf{d}_k$.
Check for convergence: If converged, then STOP.

Return $\mathbf{x}_{k+1}$ as the output.
This method is not really an actual algorithm. It can be considered as a template for an actual algorithm; a minimal code sketch of this template appears after the questions below. Several aspects of the method need to be carefully chosen to come up with a viable algorithm.
How to select the initial point $\mathbf{x}_0$?
How to choose the descent direction?
How to select the step size?
How to decide when to stop the iterations (stopping criterion)?
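The following Python sketch renders the template as a higher-order function; the direction rule, the step-size rule, the stopping criterion and all names are illustrative assumptions left open exactly as in the questions above.

```python
import numpy as np

def descent_method(pick_direction, pick_step, x0, tol=1e-6, max_iter=1000):
    """Sketch of the descent directions template (Algorithm 10.1).

    pick_direction(x) should return a descent direction of the objective at x;
    pick_step(x, d) should return a step size t that decreases the objective.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = pick_direction(x)                      # choose a descent direction
        t = pick_step(x, d)                        # choose a step size
        x_next = x + t * d                         # update
        if np.linalg.norm(x_next - x) <= tol:      # crude stopping criterion
            return x_next
        x = x_next
    return x

# Usage sketch: minimize f(x) = ||x - c||^2 with the negative gradient as the
# direction and a small constant step size (all choices here are illustrative).
c = np.array([1.0, -2.0])
grad_f = lambda x: 2.0 * (x - c)
x_hat = descent_method(lambda x: -grad_f(x), lambda x, d: 0.1, np.zeros(2))
print(x_hat)    # approaches c = [1, -2]
```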
The key result that we establish here is to show that
if the step size is chosen to be small enough, then a step along a descent direction leads to a sufficient decrease in the value of the objective function.
Theorem 10.52 (Sufficient decrease condition for descent direction method)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable, let $\mathbf{x} \in \mathbb{R}^n$, let $\mathbf{d}$ be a descent direction of $f$ at $\mathbf{x}$, and let $\alpha \in (0, 1)$. Then, there exists $\epsilon > 0$ such that the sufficient decrease condition

$$f(\mathbf{x}) - f(\mathbf{x} + t \mathbf{d}) \geq -\alpha t \nabla f(\mathbf{x})^T \mathbf{d}$$

holds for every $t \in [0, \epsilon]$.
Proof. We proceed as follows.
Since $f$ is continuously differentiable, hence due to Theorem 5.1,

$$f(\mathbf{x} + t \mathbf{d}) = f(\mathbf{x}) + t \nabla f(\mathbf{x})^T \mathbf{d} + o(t \|\mathbf{d}\|).$$

Rearranging the terms and introducing an $\alpha \in (0, 1)$, we obtain

$$f(\mathbf{x}) - f(\mathbf{x} + t \mathbf{d}) = -\alpha t \nabla f(\mathbf{x})^T \mathbf{d} - (1 - \alpha) t \nabla f(\mathbf{x})^T \mathbf{d} - o(t \|\mathbf{d}\|).$$

Since $\mathbf{d}$ is a descent direction of $f$, hence $\nabla f(\mathbf{x})^T \mathbf{d} < 0$. Thus,

$$\lim_{t \to 0^+} \frac{(1 - \alpha) t \nabla f(\mathbf{x})^T \mathbf{d} + o(t \|\mathbf{d}\|)}{t} = (1 - \alpha) \nabla f(\mathbf{x})^T \mathbf{d} < 0.$$

Hence, there exists $\epsilon > 0$ such that for every $t \in [0, \epsilon]$,

$$(1 - \alpha) t \nabla f(\mathbf{x})^T \mathbf{d} + o(t \|\mathbf{d}\|) \leq 0.$$

Thus, from the previous equation:

$$f(\mathbf{x}) - f(\mathbf{x} + t \mathbf{d}) \geq -\alpha t \nabla f(\mathbf{x})^T \mathbf{d}$$

for every $t \in [0, \epsilon]$.
10.5.3.2. Step Size Selection#
Following are common methods for step size selection. Each method has advantages as well as drawbacks.
Constant step size
Exact line search
Backtracking
Constant step size uses $t_k = \bar{t}$ for every $k$, where $\bar{t} > 0$ is a fixed value.
A large step size might cause the algorithm to take steps which do not decrease the objective function value.
The algorithm may never converge with a large step size.
A small constant step size may lead to very slow convergence.
Exact line search solves the minimization problem

$$t_k \in \arg\min_{t \geq 0} f(\mathbf{x}_k + t \mathbf{d}_k).$$

The optimization variable is the step size parameter $t$.
Minimizing $f$ along the ray $\{ \mathbf{x}_k + t \mathbf{d}_k \mid t \geq 0 \}$ may not be straightforward.
A closed form or algorithmic solution for the exact line search may not be available.
Backtracking is a compromise between these two approaches.
Input parameters: initial step size $s > 0$, sufficient decrease parameter $\alpha \in (0, 1)$, and reduction factor $\beta \in (0, 1)$.
Initialize $t = s$.
While $f(\mathbf{x}_k) - f(\mathbf{x}_k + t \mathbf{d}_k) < -\alpha t \nabla f(\mathbf{x}_k)^T \mathbf{d}_k$:
Set $t \leftarrow \beta t$.
Continue.
Return step size $t_k = t$.
Essentially, we are reducing the step size by a factor $\beta$ until the sufficient decrease condition is satisfied.
There is no exact line search.
Does find a good enough step size satisfying the sufficient decrease condition.
Involves multiple evaluations of $f$.
The parameters $s$, $\alpha$ and $\beta$ should be chosen carefully.
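The backtracking loop translates almost directly into code. The sketch below (function names and the quadratic test objective are illustrative assumptions) implements the loop using the sufficient decrease condition, with a small lower bound on the step size as a safeguard against a non-descent direction.

```python
import numpy as np

def backtracking(f, grad_f, x, d, s=1.0, alpha=0.5, beta=0.5):
    """Shrink t by beta until f(x) - f(x + t d) >= -alpha * t * grad_f(x)^T d."""
    t = s
    g_dot_d = grad_f(x) @ d
    # The t > 1e-12 guard avoids an infinite loop if d is not a descent direction.
    while t > 1e-12 and f(x) - f(x + t * d) < -alpha * t * g_dot_d:
        t *= beta
    return t

# Usage sketch on f(x) = ||x - c||^2 with the negative gradient as the direction.
c = np.array([1.0, -2.0])
f = lambda x: np.sum((x - c) ** 2)
grad_f = lambda x: 2.0 * (x - c)
x = np.zeros(2)
d = -grad_f(x)
t = backtracking(f, grad_f, x, d)
print("step size:", t, " decrease:", f(x) - f(x + t * d))
```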
10.5.4. Gradient Method#
The gradient method is a descent direction method in which the descent direction is always chosen to be the negative of the gradient at the current point (solution).
Lemma 10.5 (Negative of gradient is a descent direction)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable over $\mathbb{R}^n$ and let $\mathbf{x} \in \mathbb{R}^n$ be a point where $\nabla f(\mathbf{x}) \neq \mathbf{0}$. Then, the direction $\mathbf{d} = -\nabla f(\mathbf{x})$ is a descent direction of $f$ at $\mathbf{x}$.
Proof. We just need to compute the directional derivative.

$$f'(\mathbf{x}; \mathbf{d}) = \nabla f(\mathbf{x})^T \mathbf{d} = -\nabla f(\mathbf{x})^T \nabla f(\mathbf{x}) = -\|\nabla f(\mathbf{x})\|^2 < 0.$$

The last strict inequality is valid since $\nabla f(\mathbf{x}) \neq \mathbf{0}$.
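Combining this lemma with the backtracking procedure of the previous subsection gives a minimal gradient-method sketch; the quadratic test problem, the parameter values and all names below are illustrative assumptions.

```python
import numpy as np

def gradient_method(f, grad_f, x0, s=1.0, alpha=0.5, beta=0.5, tol=1e-6, max_iter=500):
    """Gradient method sketch: descent direction d = -grad f(x), backtracking step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:              # gradient norm stopping criterion
            break
        d = -g                                    # negative gradient is a descent direction
        t = s
        while f(x) - f(x + t * d) < -alpha * t * (g @ d):
            t *= beta                             # backtracking
        x = x + t * d
    return x

# Usage sketch: a convex quadratic f(x) = 0.5 x^T Q x - b^T x (illustrative data).
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad_f = lambda x: Q @ x - b
x_hat = gradient_method(f, grad_f, np.zeros(2))
print(x_hat, np.linalg.solve(Q, b))               # the two should be close
```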
10.5.5. Gradient Projection Method#
In this subsection, we present a method to solve the optimization problem

$$\min_{\mathbf{x} \in C} f(\mathbf{x}),$$

where $f$ is continuously differentiable and $C$ is a nonempty closed convex set.
Recall from Theorem 10.46 that $\mathbf{x}^*$ is a stationary point of this problem if and only if, for any $s > 0$,

$$\mathbf{x}^* = P_C\bigl(\mathbf{x}^* - s \nabla f(\mathbf{x}^*)\bigr).$$
This stationarity condition is the basis for the gradient projection method presented below.
Algorithm 10.2 (The gradient projection method)
Inputs
- tolerance parameter $\epsilon > 0$
Initialization
Pick $\mathbf{x}_0 \in C$
arbitrarily.
General iteration: for $k = 0, 1, 2, \dots$

Pick a step size $t_k$ by a line search procedure.
Update: $\mathbf{x}_{k+1} = P_C\bigl(\mathbf{x}_k - t_k \nabla f(\mathbf{x}_k)\bigr)$.
Check for convergence: If $\|\mathbf{x}_{k+1} - \mathbf{x}_k\| \leq \epsilon$, then STOP.

Return $\mathbf{x}_{k+1}$ as the output.
In the case of unconstrained minimization: $C = \mathbb{R}^n$ and $P_C(\mathbf{x}) = \mathbf{x}$. The update step reduces to $\mathbf{x}_{k+1} = \mathbf{x}_k - t_k \nabla f(\mathbf{x}_k)$. We see that gradient projection reduces to gradient descent.
Another way to look at the algorithm is:
The step $\mathbf{y}_{k+1} = \mathbf{x}_k - t_k \nabla f(\mathbf{x}_k)$ computes the next candidate solution assuming no constraints. The step $\mathbf{x}_{k+1} = P_C(\mathbf{y}_{k+1})$ projects the next candidate solution back to the feasible set $C$. Thus, we have a gradient step followed by a projection step.
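A minimal sketch of the method for the unit-ball constraint of Example 10.9 follows; the constant step size, the test objective and all names are illustrative assumptions.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Orthogonal projection onto the Euclidean ball of the given radius.
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

def gradient_projection(grad_f, proj, x0, t=0.1, tol=1e-8, max_iter=5000):
    """Gradient projection sketch: x_{k+1} = P_C(x_k - t * grad f(x_k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = proj(x - t * grad_f(x))
        if np.linalg.norm(x_next - x) <= tol:     # stationarity-based stopping criterion
            return x_next
        x = x_next
    return x

# Usage sketch: f(x) = 0.5 ||x - c||^2 over the unit ball with ||c|| > 1.
c = np.array([3.0, 4.0])
grad_f = lambda x: x - c
x_hat = gradient_projection(grad_f, project_ball, np.zeros(2))
print(x_hat, c / np.linalg.norm(c))               # should match c / ||c||
```

The stopping test $\|\mathbf{x}_{k+1} - \mathbf{x}_k\| \leq \epsilon$ mirrors the stationarity characterization of Theorem 10.46: the iterate stops changing exactly when it is (approximately) a fixed point of the projected gradient map.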
10.5.5.1. Convergence#
If we can establish conditions under which each iteration of the gradient projection algorithm leads to sufficient decrease in the value of the objective function, then we can guarantee that the algorithm will converge in a finite number of steps.
Lemma 10.6 (Sufficient decrease lemma for constrained problems)
Consider the optimization problem:

$$\min_{\mathbf{x} \in C} f(\mathbf{x}),$$

where $C$ is a nonempty closed convex set and $f$ is continuously differentiable over $C$.
Assume that $\nabla f$ is Lipschitz continuous over $C$ with constant $L$; i.e.,

$$\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\| \leq L \|\mathbf{x} - \mathbf{y}\| \quad \text{for every } \mathbf{x}, \mathbf{y} \in C.$$

Let $\mathbf{x} \in C$, let $t \in \left(0, \tfrac{2}{L}\right)$, and let $\mathbf{x}^+ = P_C\bigl(\mathbf{x} - t \nabla f(\mathbf{x})\bigr)$. Then,

$$f(\mathbf{x}) - f(\mathbf{x}^+) \geq \left( \frac{1}{t} - \frac{L}{2} \right) \|\mathbf{x} - \mathbf{x}^+\|^2,$$

where $P_C$ denotes the orthogonal projection onto $C$.
Proof.