8.1. Mathematical Optimization
Main references for this section are [9, 17].
This section provides a general overview of optimization problems. We introduce the notions of objective or cost functions and constraint functions, and present the standard form for optimization problems, which focuses on minimizing the objective function subject to a set of equality and inequality constraints. Maximization problems can be converted into minimization problems by changing the sign of the objective function. Several examples of equivalent forms of optimization problems are discussed.
We introduce the notions of the domain of an optimization problem, feasible points or solutions, the feasible set of solutions, the optimal value of the minimization problem, optimal points or solutions, the optimal set, etc. We discuss when the problem may be infeasible or feasible but unbounded. We also discuss situations where the problem is feasible and has a finite optimal value, and yet the optimal value is not attained at any feasible point.
We then discuss general requirements for the feasibility of the optimization problem and the existence of an optimal solution. We introduce the notion of coercive functions, which lead to a Weierstrass-type result for the existence of optimal solutions.
8.1.1. Optimization Problems
Let $\mathbb{V}$ be a real $n$-dimensional inner product space.

In the most general form, an optimization problem can be considered as minimizing or maximizing the value of a real valued function $f : \mathbb{V} \to \mathbb{R}$ with domain $X = \operatorname{dom} f$.

- We call $f$ the objective function which is being minimized or maximized.
- The variable $x \in X$ is called the optimization variable.
- $X$ is called the feasible set of solutions for the optimization problem.
- A point $x \in X$ is called a feasible point or feasible solution.
- Our goal is to find an $x^* \in X$ which maximizes or minimizes the objective function $f$.
Definition 8.1 (Globally optimal value and points)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$.

- The maximum value of $f$ is given by $\sup_{x \in X} f(x)$.
- The minimum value of $f$ is given by $\inf_{x \in X} f(x)$.
- We allow the maximum and minimum values to take $\infty$ and $-\infty$ values.
- The function $f$ may or may not attain its maximum / minimum value at some point in its domain $X$.
- $a \in X$ is called a global minimum point if $f(x) \geq f(a)$ for every $x \in X$.
- $a \in X$ is called a global maximum point if $f(x) \leq f(a)$ for every $x \in X$.
- $a \in X$ is called a strict global minimum point if $f(x) > f(a)$ for every $x \in X$ with $x \neq a$.
- $a \in X$ is called a strict global maximum point if $f(x) < f(a)$ for every $x \in X$ with $x \neq a$.
- A point $a \in X$ is called a global optimal point if it is either a global maximum or a global minimum point.
- The maximum and minimum values of $f$ are always unique, as opposed to global optimal points, which may not be unique.
- The set of global minimizers is given by

$$
\operatorname{argmin}_{x \in X} f(x) = \{ a \in X \mid f(a) \leq f(x) \text{ for every } x \in X \}.
$$

- The set of global maximizers is given by

$$
\operatorname{argmax}_{x \in X} f(x) = \{ a \in X \mid f(a) \geq f(x) \text{ for every } x \in X \}.
$$
In a more restricted setting, an optimization problem can be considered as minimizing or maximizing the value of a real valued function $f : \mathbb{V} \to \mathbb{R}$ over a subset $C \subseteq \operatorname{dom} f$.

- In this case, the feasible set is restricted to $C$.
- Minimizers and maximizers are searched within this subset $C$.
- This problem can be converted into the earlier general form by considering the restriction $\tilde{f}$ of $f$ such that $\operatorname{dom} \tilde{f} = C$ and

$$
\tilde{f}(x) = f(x) \quad \text{for every } x \in C.
$$
In several problems, it may not be possible to establish whether a point is globally optimal. However, it is easier to establish if a point is optimal in a neighborhood around it. Such points are called locally optimal points or extreme values.
Definition 8.2 (Local optimal points)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. We say that $f$ attains a local maximum (resp. local minimum) value at a point $a \in X$ if $f(a)$ is optimal within some ball around $a$.

More specifically,

- $f(a)$ is a local maximum value of $f$ if for some $r > 0$: $f(x) \leq f(a)$ for every $x \in B(a, r) \cap X$.
- $f(a)$ is a local minimum value of $f$ if for some $r > 0$: $f(x) \geq f(a)$ for every $x \in B(a, r) \cap X$.

The point $a$ is called a local maximum point or a local minimum point of $f$ accordingly. Furthermore,

- $a$ is a strict local maximum point if for some $r > 0$: $f(x) < f(a)$ for every $x \in B_d(a, r) \cap X$.
- $a$ is a strict local minimum point of $f$ if for some $r > 0$: $f(x) > f(a)$ for every $x \in B_d(a, r) \cap X$.

Here $B(a, r)$ denotes the open ball of radius $r$ around $a$, and $B_d(a, r) = B(a, r) \setminus \{ a \}$ denotes the corresponding deleted neighborhood.
Remark 8.1 (Local optimal points for open domain)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Assume that $X$ is open.

- If $a \in X$ is a local maximum point, then there exists $r_1 > 0$ such that $f(x) \leq f(a)$ for every $x \in B(a, r_1) \cap X$.
- But since $X$ is open, hence $a$ is an interior point of $X$. Thus, there exists an open ball $B(a, r_2) \subseteq X$.
- Let $r = \min(r_1, r_2)$.
- Then, $B(a, r) \subseteq X$ and $B(a, r) \subseteq B(a, r_1)$.
- Thus, $B(a, r) \cap X = B(a, r)$.
- Thus, $f(x) \leq f(a)$ for every $x \in B(a, r)$.
- Thus, the local optimality condition simplifies to looking for an open ball of radius $r$ totally contained inside the open domain $X$.

Based on this discussion, we can adjust the conditions for local optimality.

- $a$ is a local maximum point of $f$ if for some $r > 0$: $f(x) \leq f(a)$ for every $x \in B(a, r)$.
- $a$ is a local minimum point of $f$ if for some $r > 0$: $f(x) \geq f(a)$ for every $x \in B(a, r)$.
- $a$ is a strict local maximum point if for some $r > 0$: $f(x) < f(a)$ for every $x \in B_d(a, r)$.
- $a$ is a strict local minimum point of $f$ if for some $r > 0$: $f(x) > f(a)$ for every $x \in B_d(a, r)$.
8.1.1.1. Standard Form for Mathematical Optimization Problems
Definition 8.3 (Optimization problem standard form)

Let $\mathbb{V}$ be a real $n$-dimensional inner product space. Let $f_0 : \mathbb{V} \to \mathbb{R}$ be a real valued function denoting the objective. Let $f_i : \mathbb{V} \to \mathbb{R}$ for $i=1,\dots,m$ and $h_j : \mathbb{V} \to \mathbb{R}$ for $j=1,\dots,p$ be real valued functions denoting the constraints. An optimization problem of the form

$$
\begin{aligned}
& \text{minimize}   && f_0(x) \\
& \text{subject to} && f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p
\end{aligned}
\tag{8.1}
$$

is known as an optimization problem in its standard form.

- $x \in \mathbb{V}$ is called the optimization variable.
- $f_0$ is called the objective function or cost function.
- The functions $f_1, \dots, f_m$ are called the inequality constraint functions.
- The corresponding inequalities $f_i(x) \leq 0$ are called the inequality constraints.
- The functions $h_1, \dots, h_p$ are called the equality constraint functions.
- The corresponding equations $h_j(x) = 0$ are called the equality constraints.
- If there are no constraints ($m = 0$ and $p = 0$), the problem is called unconstrained.
- The set

$$
\mathcal{D} = \operatorname{dom} f_0 \cap \bigcap_{i=1}^m \operatorname{dom} f_i \cap \bigcap_{j=1}^p \operatorname{dom} h_j
$$

is called the domain of the optimization problem.
- A point $x \in \mathcal{D}$ is called feasible if it satisfies all the inequality constraints $f_i(x) \leq 0$ and all the equality constraints $h_j(x) = 0$.
- The optimization problem is called feasible if there exists at least one feasible point.
- If there are no feasible points in the domain $\mathcal{D}$, then the optimization problem is called infeasible. Naturally, if $\mathcal{D} = \varnothing$, then the problem is infeasible.
- The set of feasible points for an optimization problem is called the feasible set or the constraint set. We shall denote the feasible set by $C$:

$$
C = \operatorname{dom} f_0 \cap \bigcap_{i=1}^m \{ x \mid f_i(x) \leq 0 \} \cap \bigcap_{j=1}^p \{ x \mid h_j(x) = 0 \}.
$$

It is the intersection of the domain of $f_0$, the $0$-sublevel sets of $f_i$ for $i=1,\dots,m$, and the $0$-level sets of $h_j$ for $j=1,\dots,p$.
- Thus, if the problem is infeasible, then $C = \varnothing$.
- The optimum value $p^*$ of the optimization problem is defined as

$$
p^* = \inf \{ f_0(x) \mid f_i(x) \leq 0, \ i=1,\dots,m, \ h_j(x) = 0, \ j=1,\dots,p \}.
$$

In other words, $p^* = \inf_{x \in C} f_0(x)$.
- We allow $p^*$ to take the extended values $\infty$ and $-\infty$.
- If the problem is infeasible, then $p^* = \infty$. This is consistent with the convention that the infimum of an empty set is $\infty$.
- If $p^* = -\infty$, then the problem is called unbounded below. In this case, there exists a sequence $\{ x_k \}$ of feasible points such that $\lim_{k \to \infty} f_0(x_k) = -\infty$.
- We say that $x^*$ is an optimal point if it solves (8.1). In other words, $x^* \in C$ and $f_0(x^*) = p^*$.
- The set of all optimal points is known as the optimal set, denoted by $X_{\text{opt}}$. In other words,

$$
X_{\text{opt}} = \{ x \in \mathcal{D} \mid f_i(x) \leq 0, \ i=1,\dots,m, \ h_j(x) = 0, \ j=1,\dots,p, \ f_0(x) = p^* \}.
$$

- If an optimal point exists in $C$, then we say that the optimal value is attained or achieved.
- If $X_{\text{opt}}$ is empty, then we say that the optimal value is not attained or not achieved.
- In particular, if the problem is unbounded below, then $X_{\text{opt}}$ is indeed empty.
- If the feasible set $C$ is not closed, then it is quite possible that the optimum value $p^*$ is finite and yet it is not attained at any feasible point. Then, there exists a sequence $\{ x_k \}$ of feasible points such that $\lim_{k \to \infty} f_0(x_k) = p^*$. However, there is no $x \in C$ such that $f_0(x) = p^*$.
- A feasible point $x$ with $f_0(x) \leq p^* + \epsilon$ is called an $\epsilon$-suboptimal point.
- The set of all $\epsilon$-suboptimal points is called the $\epsilon$-suboptimal set for the problem (8.1).
- We say that a feasible point $x$ is locally optimal if there exists $r > 0$ such that

$$
f_0(x) = \inf \{ f_0(z) \mid z \in C, \ \| z - x \| \leq r \}.
$$

In other words, $x$ minimizes $f_0$ over a local neighborhood of feasible points.
- If $x$ is feasible and $f_i(x) = 0$, we say that the $i$-th inequality constraint is active at $x$. Otherwise, $f_i(x) < 0$ and we say that the $i$-th inequality constraint is inactive.
- We say that a constraint is redundant if removing it does not change the feasible set.
- If we choose an orthonormal basis $\mathcal{B} = \{ e_1, \dots, e_n \}$ for $\mathbb{V}$, then $\mathbb{V}$ is isomorphic to $\mathbb{R}^n$ under the bracket operator $[\cdot]_{\mathcal{B}}$. The optimization variable $x$ has a representation $(x_1, \dots, x_n)$ in $\mathbb{R}^n$ given by

$$
x = \sum_{i=1}^n x_i e_i.
$$

- Thus, determining $x$ is the same as determining its components $x_1, \dots, x_n$.
- Since there are $n$ scalar components in the representation of $x$, hence we can say that the optimization problem has $n$ (scalar) variables.
In the sequel, we will be presenting a variety of optimization problems. We shall show how those problems can be converted to the standard form described above.
Example 8.1 (Box constraints standard form)

Let $f_0 : \mathbb{R}^n \to \mathbb{R}$ be a real valued objective function. Consider the optimization problem

$$
\begin{aligned}
& \text{minimize}   && f_0(x) \\
& \text{subject to} && l_i \leq x_i \leq u_i, \quad i=1,\dots,n
\end{aligned}
$$

where $x \in \mathbb{R}^n$ is the optimization variable. The constraints $l_i \leq x_i \leq u_i$ are known as variable bounds or box constraints.

This problem can be transformed into an equivalent problem:

$$
\begin{aligned}
& \text{minimize}   && f_0(x) \\
& \text{subject to} && l_i - x_i \leq 0, \quad i=1,\dots,n \\
&                   && x_i - u_i \leq 0, \quad i=1,\dots,n.
\end{aligned}
$$

This form has $2n$ inequality constraints. We introduce the functions

$$
f_i(x) = l_i - x_i \quad \text{and} \quad f_{n+i}(x) = x_i - u_i, \quad i=1,\dots,n.
$$

Then, the problem becomes

$$
\begin{aligned}
& \text{minimize}   && f_0(x) \\
& \text{subject to} && f_i(x) \leq 0, \quad i=1,\dots,2n.
\end{aligned}
$$

This is the optimization problem in standard form with $2n$ inequality constraints and no equality constraints.
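A quick numerical check of this equivalence (a sketch assuming NumPy/SciPy; the particular $f_0$, $l$ and $u$ are made up): the native bound interface and the $2n$ explicit inequality constraints yield the same minimizer.

```python
# Box constraints given directly as bounds vs. as 2n standard-form
# inequality constraints f_i(x) = l_i - x_i <= 0, f_{n+i}(x) = x_i - u_i <= 0.
import numpy as np
from scipy.optimize import minimize

f0 = lambda x: (x[0] - 3.0)**2 + (x[1] + 1.0)**2    # hypothetical objective
l, u = np.array([0.0, 0.0]), np.array([2.0, 2.0])   # the box [0,2] x [0,2]

# Form 1: native bounds.
res_bounds = minimize(f0, x0=[1.0, 1.0], bounds=list(zip(l, u)))

# Form 2: the same box as 2n inequality constraints (g(x) >= 0 in SciPy).
cons = [{"type": "ineq", "fun": lambda x, i=i: x[i] - l[i]} for i in range(2)]
cons += [{"type": "ineq", "fun": lambda x, i=i: u[i] - x[i]} for i in range(2)]
res_cons = minimize(f0, x0=[1.0, 1.0], constraints=cons)

print(res_bounds.x, res_cons.x)  # both should be close to (2, 0)
```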
Example 8.2 (Maximization problem in standard form)

Consider the problem

$$
\begin{aligned}
& \text{maximize}   && f_0(x) \\
& \text{subject to} && f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p.
\end{aligned}
$$

If we replace the objective $f_0$ by $-f_0$, then maximizing $f_0$ is the same as minimizing $-f_0$. Thus, this problem has an equivalent standard form

$$
\begin{aligned}
& \text{minimize}   && -f_0(x) \\
& \text{subject to} && f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p.
\end{aligned}
$$
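A one-line numerical illustration of this sign flip, assuming SciPy; the concave objective below is hypothetical.

```python
# Maximization via minimization of the negated objective.
import numpy as np
from scipy.optimize import minimize

f0 = lambda x: -(x[0] - 1.0)**2 - (x[1] - 2.0)**2   # function to maximize
res = minimize(lambda x: -f0(x), x0=np.zeros(2))    # minimize -f0 instead
print(res.x, -res.fun)  # maximizer (1, 2) and maximum value 0 (sign restored)
```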
Definition 8.4 (Feasibility problem)

Let $f_i : \mathbb{V} \to \mathbb{R}$ for $i=1,\dots,m$ and $h_j : \mathbb{V} \to \mathbb{R}$ for $j=1,\dots,p$ be real valued functions. A problem of the form

$$
\begin{aligned}
& \text{find}       && x \\
& \text{subject to} && f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p
\end{aligned}
$$

is called a feasibility problem.

We can convert this problem into the standard problem by introducing a cost function which is identically 0:

$$
\begin{aligned}
& \text{minimize}   && 0 \\
& \text{subject to} && f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p.
\end{aligned}
$$

- The domain reduces to $\mathcal{D} = \bigcap_{i=1}^m \operatorname{dom} f_i \cap \bigcap_{j=1}^p \operatorname{dom} h_j$.
- The optimal value is $p^* = \infty$ if the problem is not feasible.
- Otherwise, the optimal value is $p^* = 0$.
- Every feasible point is also an optimal point.
Every optimization problem in standard form can be converted into a feasibility problem by replacing the objective function with the function which is identically 0 everywhere.
In that case, the problem reduces to checking whether the inequality and equality constraints are consistent or not. In other words, whether there are some points which satisfy all the inequality and equality constraints.
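As a sketch (assuming SciPy's SLSQP solver; the constraint set is made up), a feasibility problem can be posed to a generic solver in exactly this way, with the identically zero objective:

```python
# Feasibility problem in standard form: minimize 0 subject to constraints.
import numpy as np
from scipy.optimize import minimize

cons = [
    {"type": "ineq", "fun": lambda x: 1.0 - x[0]**2 - x[1]**2},  # unit disk
    {"type": "eq",   "fun": lambda x: x[0] - x[1]},              # line x1 = x2
]
res = minimize(lambda x: 0.0, x0=np.array([1.0, 0.0]),
               constraints=cons, method="SLSQP")
print(res.success, res.x)  # success signals that a feasible point was found
```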
8.1.2. Equivalent Problems
Definition 8.5 (Equivalent optimization problems)
We provide an informal definition of equivalent optimization problems.
Two optimization problems are called equivalent if it is possible to find the solution of one problem from the other problem and vice versa.
The primary reason for transforming an optimization problem to an equivalent one is that it may be easier to solve an equivalent problem. If we can transform a problem into a form which has a well known closed form solution or has a well known algorithm to solve the problem, then we can solve the original problem by first solving the equivalent problem and then transforming the solution of the equivalent problem to the solution of the original problem.
Transform the problem to an equivalent problem with a known algorithm for solution.
Solve the equivalent problem.
Transform the solution of the equivalent problem to the solution of the original problem.
In this sense, the computational complexity of solving an optimization problem cannot be greater than the complexity of solving the equivalent problem plus the complexity of transforming the solution of the equivalent problem to the solution of the original problem.
There are several ways to obtain an equivalent optimization problem from an existing optimization problem.
8.1.2.1. Scaling
Proposition 8.1 (Equivalent problems by scaling of objective or constraint functions)

Consider the problem given by

$$
\begin{aligned}
& \text{minimize}   && \alpha_0 f_0(x) \\
& \text{subject to} && \alpha_i f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && \beta_j h_j(x) = 0,  \quad j=1,\dots,p
\end{aligned}
$$

where $\alpha_i > 0$ for $i=0,\dots,m$ and $\beta_j \neq 0$ for $j=1,\dots,p$. This problem is equivalent to the standard-form problem (8.1).

This problem is obtained from the optimization problem in standard form in (8.1) by scaling the objective function and the inequality constraint functions by positive constants and the equality constraint functions by nonzero constants.

- $\alpha_0$ cannot be negative, otherwise it would convert a minimization problem into a maximization problem.
- $\alpha_0$ cannot be zero, as that would convert the optimization problem into a feasibility problem.
- $\alpha_i$ for $i=1,\dots,m$ cannot be negative, as it would turn the inequality sign from $\leq$ to $\geq$.
- $\alpha_i$ for $i=1,\dots,m$ cannot be zero, as it would remove the corresponding constraint altogether.
- $\beta_j$ cannot be zero. A zero value would discard the corresponding equality constraint.
- The feasible sets for both problems are identical.
- The optimal sets for both problems are identical.
- The optimal values for both problems are, however, not identical (unless $\alpha_0 = 1$). Since $\alpha_0 f_0$ is a scaled version of $f_0$, the optimal value is also scaled accordingly.
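A tiny numerical confirmation (a sketch assuming SciPy; the quadratic is made up): scaling the objective by $\alpha_0 > 0$ leaves the minimizer unchanged and scales the optimal value.

```python
# Scaling equivalence: same optimal set, scaled optimal value.
from scipy.optimize import minimize_scalar

f = lambda x: (x - 2.0)**2 + 1.0
alpha = 5.0

r1 = minimize_scalar(f)
r2 = minimize_scalar(lambda x: alpha * f(x))
print(r1.x, r2.x)      # same minimizer x* = 2
print(r1.fun, r2.fun)  # optimal values 1 and 5 = alpha * 1
```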
8.1.2.2. Change of Variables
8.1.2.3. Transformation with Monotone Functions
Proposition 8.3 (Transformation of objective and constraint functions)

For the problem (8.1), we introduce the following:

- Let $\phi_0 : \mathbb{R} \to \mathbb{R}$ be a strictly increasing function over the range of values of $f_0$.
- Let $\phi_i : \mathbb{R} \to \mathbb{R}$ for $i=1,\dots,m$ be strictly increasing functions over the range of values of $f_i$ with $\phi_i(u) \leq 0 \iff u \leq 0$.
- Let $\psi_j : \mathbb{R} \to \mathbb{R}$ for $j=1,\dots,p$ be real functions that satisfy $\psi_j(u) = 0 \iff u = 0$.
- Let $\tilde{f}_i(x) = \phi_i(f_i(x))$ for $i=0,\dots,m$.
- Let $\tilde{h}_j(x) = \psi_j(h_j(x))$ for $j=1,\dots,p$.

Now, consider the problem

$$
\begin{aligned}
& \text{minimize}   && \tilde{f}_0(x) \\
& \text{subject to} && \tilde{f}_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && \tilde{h}_j(x) = 0,  \quad j=1,\dots,p
\end{aligned}
$$

with the variable $x$. This problem is equivalent to (8.1): a point is feasible (resp. optimal) for one problem if and only if it is feasible (resp. optimal) for the other.
Example 8.3 (Least norms and least norms squared)

Consider the unconstrained least norm problem

$$
\text{minimize } \| A x - b \|
$$

with $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.

Then, the objective $\| A x - b \|$ is nonnegative, and $\phi_0(u) = u^2$ is strictly increasing over $[0, \infty)$, the range of values of the objective.

Then, the least norm minimization problem is equivalent to the problem

$$
\text{minimize } \| A x - b \|^2 = x^T A^T A x - 2 b^T A x + b^T b
$$

which is the least norm squared problem.

We note that while $\| A x - b \|$ is not differentiable everywhere (it fails to be differentiable at any $x$ with $A x = b$), the squared objective $\| A x - b \|^2$ is differentiable everywhere. Hence the latter is often easier to work with.
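A numerical sketch of this equivalence (assuming NumPy/SciPy; $A$ and $b$ are random stand-ins): the least squares solution, which minimizes the squared norm, agrees with a direct minimization of the unsquared norm.

```python
# Least norm vs. least norm squared: both problems share their minimizers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)

x_sq = np.linalg.lstsq(A, b, rcond=None)[0]          # minimizes ||Ax - b||^2
res = minimize(lambda x: np.linalg.norm(A @ x - b),  # minimizes ||Ax - b||
               x0=np.zeros(5))
print(np.allclose(x_sq, res.x, atol=1e-4))           # True: same minimizer
```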
8.1.2.4. Slack Variables
Proposition 8.4 (Slack variables)

For the inequality constraints in the problem (8.1), we can introduce the variables $s_i \geq 0$ such that $f_i(x) + s_i = 0$. With these, the problem (8.1) is equivalent to

$$
\begin{aligned}
& \text{minimize}   && f_0(x) \\
& \text{subject to} && s_i \geq 0, \quad i=1,\dots,m \\
&                   && f_i(x) + s_i = 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p
\end{aligned}
\tag{8.5}
$$

where the optimization variables are $x \in \mathbb{V}$ and $s \in \mathbb{R}^m$.

The variables $s_i$ are known as slack variables.

- This form has $m$ inequality constraints and $m + p$ equality constraints.
- It has $n + m$ optimization variables: $x_1, \dots, x_n$ and $s_1, \dots, s_m$.
- If $(x, s)$ is a feasible point for (8.5), then $x$ is a feasible point for (8.1).
- If $x$ is a feasible point for (8.1), then we can pick $s_i = -f_i(x)$ to form $s$, making $(x, s)$ a feasible point for (8.5).
- $x$ is an optimal point for (8.1) if and only if $(x, s)$ is an optimal point for (8.5) where $s_i = -f_i(x)$.
- The slack variables measure how much slack an inequality constraint function has at a feasible point.
- If $s_i = 0$, then the $i$-th constraint is active and has no slack.
- If $s_i > 0$, then the $i$-th constraint is inactive and has some slack.
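Linear programming solvers expose this construction directly. Here is a sketch assuming SciPy's linprog, whose result reports the slack $s = b_{ub} - A_{ub} x$ at the solution; the LP data is hypothetical.

```python
# Slack variables in a linear program: zero slack marks an active constraint.
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                  # minimize -x1 - 2 x2
A_ub = np.array([[1.0, 1.0], [1.0, 0.0]])   # x1 + x2 <= 4 and x1 <= 3
b_ub = np.array([4.0, 3.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)      # optimal point (0, 4)
print(res.slack)  # s = (0, 3): constraint 1 is active, constraint 2 has slack
```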
8.1.2.5. Epigraph Polar Form
Proposition 8.5 (Epigraph polar form)

The problem (8.1) is equivalent to the optimization problem below:

$$
\begin{aligned}
& \text{minimize}   && t \\
& \text{subject to} && f_0(x) - t \leq 0 \\
&                   && f_i(x) \leq 0, \quad i=1,\dots,m \\
&                   && h_j(x) = 0,  \quad j=1,\dots,p
\end{aligned}
\tag{8.6}
$$

with the optimization variable $(x, t) \in \mathbb{V} \times \mathbb{R}$.

- Note that $t$ appears only in a single inequality in (8.6).
- At the optimal point, $f_0(x) = t$ must hold true. If $f_0(x) < t$, then we can always choose a lower value of $t$ as the solution.
- If $x$ is an optimal point for (8.1), then $(x, f_0(x))$ must be an optimal point for (8.6).
- Similarly, if $(x, t)$ is an optimal point for (8.6), then $x$ must be an optimal point for (8.1) and $t = f_0(x)$ must hold true.
- The two problems are indeed equivalent.
- This problem has $m + 1$ inequality constraints and $p$ equality constraints.
- The objective function for the epigraph form is a linear function of the variable $(x, t)$.
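A numerical sketch of the epigraph form (assuming SciPy; $f_0$ is made up). The solver works with the augmented variable $(x, t)$, a linear objective $t$, and the extra inequality $f_0(x) - t \leq 0$; at the optimum $t = f_0(x)$.

```python
# Epigraph form: minimize t over (x, t) subject to f0(x) - t <= 0.
import numpy as np
from scipy.optimize import minimize

f0 = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2

res = minimize(lambda z: z[2],                    # linear objective: t
               x0=np.array([0.0, 0.0, 10.0]),     # z = (x1, x2, t)
               constraints=[{"type": "ineq",
                             "fun": lambda z: z[2] - f0(z[:2])}])
print(res.x[:2], res.x[2])  # x* close to (1, -2) and t* = f0(x*) = 0
```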
8.1.3. First Order Conditions
In this subsection, we focus on objective functions of type $f : \mathbb{R}^n \to \mathbb{R}$.

Theorem 8.1 (First order optimality criterion for local optimal points)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Let $a \in \operatorname{int} X$ be a local optimal point of $f$ and assume that all the partial derivatives of $f$ exist at $a$.

Then, $\nabla f(a) = 0$.

Proof.

- Let $i \in \{ 1, \dots, n \}$.
- Consider the one-dimensional function $g_i : \mathbb{R} \to \mathbb{R}$ given by

$$
g_i(t) = f(a + t e_i)
$$

where $e_i$ is the $i$-th unit vector of $\mathbb{R}^n$.
- We note that $g_i$ is differentiable at $t = 0$ and

$$
g_i'(0) = \frac{\partial f}{\partial x_i}(a).
$$

- Since $a$ is a local optimal point of $f$, hence $t = 0$ is a local optimal point of $g_i$.
- Thus, $g_i'(0) = 0$.
- Thus, $\frac{\partial f}{\partial x_i}(a) = 0$.
- Since this is true for every $i \in \{ 1, \dots, n \}$, hence $\nabla f(a) = 0$.
We mention that the gradient being zero is a necessary condition; i.e., if a point is a local optimal point, then the gradient of $f$ must vanish at it. It is not a sufficient condition: the gradient may also vanish at points which are neither local maxima nor local minima.

Definition 8.6 (Stationary point)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Let $a \in \operatorname{int} X$ and assume that all the partial derivatives of $f$ exist at $a$. We say that $a$ is a stationary point of $f$ if $\nabla f(a) = 0$.

Thus, the locally optimal points are necessarily stationary points.
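A quick numerical check of the first order condition (a sketch assuming SciPy; the quadratic objective is made up): the numerical gradient vanishes at the computed minimizer.

```python
# The gradient vanishes at a local minimizer (first order condition).
import numpy as np
from scipy.optimize import approx_fprime, minimize

f = lambda x: (x[0] - 1.0)**2 + 4.0 * (x[1] + 0.5)**2
a = minimize(f, x0=np.zeros(2)).x      # minimizer, here (1, -0.5)
grad = approx_fprime(a, f, 1e-8)       # numerical gradient at a
print(a, grad)                         # grad is (numerically) zero
```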
8.1.4. Second Order Conditions
Recall the discussion on semidefinite and definite matrices in Eigen Values.
For a twice continuously differentiable function $f$ and a stationary point $a$ of $f$, the nature of $a$ can often be determined from the definiteness of the Hessian $\nabla^2 f(a)$.

We have the following necessary conditions for local optimality.

- If $a$ is a local minimum point, then $\nabla^2 f(a)$ must be positive semidefinite.
- If $a$ is a local maximum point, then $\nabla^2 f(a)$ must be negative semidefinite.

We have the following sufficient conditions for local optimality.

- If $\nabla^2 f(a)$ is positive definite, then $a$ is a strict local minimum.
- If $\nabla^2 f(a)$ is negative definite, then $a$ is a strict local maximum.
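These conditions translate directly into an eigenvalue test on the Hessian, as in the sketch below (assuming NumPy; the helper and the example Hessians are illustrative). The saddle case anticipates Theorem 8.5.

```python
# Classify a stationary point from the eigenvalues of its Hessian.
import numpy as np

def classify(hessian):
    w = np.linalg.eigvalsh(hessian)         # eigenvalues of a symmetric matrix
    if np.all(w > 0):
        return "strict local minimum"       # positive definite
    if np.all(w < 0):
        return "strict local maximum"       # negative definite
    if w.min() < 0 < w.max():
        return "saddle point"               # indefinite (see Theorem 8.5)
    return "inconclusive (semidefinite)"    # necessary conditions only

# Each function below has a stationary point at the origin.
print(classify(np.diag([2.0, 2.0])))    # f = x^2 + y^2  -> strict local minimum
print(classify(np.diag([2.0, -2.0])))   # f = x^2 - y^2  -> saddle point
print(classify(np.diag([2.0, 0.0])))    # f = x^2        -> inconclusive
```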
Theorem 8.2 (Necessary second order optimality conditions)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Assume that $f$ is twice continuously differentiable over an open set $U \subseteq X$ and let $a \in U$ be a stationary point of $f$.

Then, the following hold.

- If $a$ is a local minimum point of $f$ over $U$, then $\nabla^2 f(a) \succeq 0$.
- If $a$ is a local maximum point of $f$ over $U$, then $\nabla^2 f(a) \preceq 0$.

Proof. Assume that $a$ is a local minimum point of $f$ over $U$.

- Then, there exists an open ball $B(a, r) \subseteq U$ such that $f(x) \geq f(a)$ for all $x \in B(a, r)$.
- Let $d \in \mathbb{R}^n$ be a nonzero vector.
- For $t \in \left( 0, \frac{r}{\| d \|} \right)$, we define $a_t = a + t d$.
- By definition, $a_t \in B(a, r)$.
- Hence, for any such $t$, $f(a_t) \geq f(a)$.
- By the linear approximation theorem (Theorem 5.5), there exists a vector $z_t \in [a, a_t]$ such that

$$
f(a_t) - f(a) = \nabla f(a)^T (a_t - a) + \frac{1}{2} (a_t - a)^T \nabla^2 f(z_t) (a_t - a).
$$

- Since $a$ is a stationary point, hence $\nabla f(a) = 0$.
- Also, by definition of $a_t$, $a_t - a = t d$.
- Thus,

$$
f(a_t) - f(a) = \frac{t^2}{2} d^T \nabla^2 f(z_t) d.
$$

- Since $a$ is a local minimum, hence $f(a_t) - f(a) \geq 0$.
- Thus, for any nonzero $d$ and any $t \in \left( 0, \frac{r}{\| d \|} \right)$, we have $d^T \nabla^2 f(z_t) d \geq 0$.
- By the continuity of the Hessian, and the fact that $z_t \to a$ as $t \to 0^+$, we obtain

$$
d^T \nabla^2 f(a) d \geq 0.
$$

- Thus, $\nabla^2 f(a) \succeq 0$.

The argument for the second statement is similar. We can apply the same argument on $-f$.
Theorem 8.3 (Sufficient second order optimality conditions)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Assume that $f$ is twice continuously differentiable over an open set $U \subseteq X$ and let $a \in U$ be a stationary point of $f$.

Then, the following hold.

- If $\nabla^2 f(a) \succ 0$, then $a$ is a strict local minimum point of $f$ over $U$.
- If $\nabla^2 f(a) \prec 0$, then $a$ is a strict local maximum point of $f$ over $U$.

Proof. Let $\nabla^2 f(a) \succ 0$.

- Since the Hessian is continuous ($f$ is twice continuously differentiable), hence there exists an open ball $B(a, r) \subseteq U$ such that $\nabla^2 f(x) \succ 0$ for every $x \in B(a, r)$.
- By the linear approximation theorem (Theorem 5.5), for any $x \in B(a, r)$, there exists a vector $z \in [a, x]$ such that

$$
f(x) - f(a) = \nabla f(a)^T (x - a) + \frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a) = \frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a)
$$

since $\nabla f(a) = 0$ at the stationary point $a$.
- We note that $z \in B(a, r)$. Hence, $\nabla^2 f(z) \succ 0$.
- Thus, $(x - a)^T \nabla^2 f(z) (x - a) > 0$ for every $x \neq a$.
- Thus, $f(x) > f(a)$ holds true for every $x \in B_d(a, r)$.
- Thus, $a$ is a strict local minimum point for $f$ over $U$.

The proof for the second statement is similar.
Theorem 8.4 (Hessian based sufficient global optimality condition)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function. Assume that $\nabla^2 f(x) \succeq 0$ for every $x \in \mathbb{R}^n$. Let $a \in \mathbb{R}^n$ be a stationary point of $f$. Then, $a$ is a global minimum point of $f$.

In other words, if the Hessian of a function is continuous and always positive semidefinite, then all its stationary points are also global minimum points.

Proof. We are given that $\nabla f(a) = 0$ and $\nabla^2 f(x) \succeq 0$ for every $x \in \mathbb{R}^n$.

- Let $x \in \mathbb{R}^n$ be arbitrary.
- By the linear approximation theorem (Theorem 5.5), there exists a vector $z \in [a, x]$ such that

$$
f(x) - f(a) = \nabla f(a)^T (x - a) + \frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a) = \frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a).
$$

- By hypothesis, $\nabla^2 f(z) \succeq 0$. Hence $(x - a)^T \nabla^2 f(z) (x - a) \geq 0$.
- Thus, $f(x) \geq f(a)$ for every $x \in \mathbb{R}^n$.
- Thus, $a$ is a global minimum point of $f$.

This result is a special case of the global optimality of convex optimization problems. If a convex function is twice differentiable, then its Hessian is positive semidefinite (Theorem 9.100).
8.1.4.1. Saddle Points
Definition 8.7 (Saddle point)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Assume that $f$ is continuously differentiable over an open set $U \subseteq X$.

A stationary point $a \in U$ is called a saddle point of $f$ over $U$ if it is neither a local minimum point nor a local maximum point of $f$ over $U$.

Theorem 8.5 (Sufficient condition for saddle points)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $X = \operatorname{dom} f$. Assume that $f$ is twice continuously differentiable over an open set $U \subseteq X$ and let $a \in U$ be a stationary point of $f$.

If $\nabla^2 f(a)$ is an indefinite matrix, then $a$ is a saddle point of $f$ over $U$.

Proof. We are given that $a$ is a stationary point with an indefinite Hessian.

- Then, $\nabla f(a) = 0$.
- Recall from Theorem 4.133 that a matrix is indefinite if and only if at least one eigenvalue is positive and at least one eigenvalue is negative.
- Let $\lambda > 0$ be a positive eigenvalue of $\nabla^2 f(a)$ and $v$ be the corresponding normalized eigenvector.
- Since $U$ is open, hence there exists $r > 0$ such that $B(a, r) \subseteq U$.
- Accordingly, $a + t v \in B(a, r)$ for every $t \in (0, r)$.
- By the quadratic approximation theorem (Theorem 5.6),

$$
f(a + t v) = f(a) + t \nabla f(a)^T v + \frac{t^2}{2} v^T \nabla^2 f(a) v + o(t^2 \| v \|^2).
$$

- Putting $\nabla f(a) = 0$ and using $\nabla^2 f(a) v = \lambda v$ with $\| v \| = 1$,

$$
f(a + t v) = f(a) + \frac{\lambda t^2}{2} + o(t^2).
$$

- Here $o : \mathbb{R}_{++} \to \mathbb{R}$ is a function satisfying $\frac{o(s)}{s} \to 0$ as $s \to 0^+$.
- Thus, $\frac{o(t^2)}{t^2} \to 0$ as $t \to 0^+$.
- For every $\epsilon > 0$, there exists $\delta > 0$ such that $|o(t^2)| < \epsilon t^2$ for every $t \in (0, \delta)$.
- Choose $\epsilon = \frac{\lambda}{2}$.
- Then, there exists $\delta > 0$ such that $|o(t^2)| < \frac{\lambda t^2}{2}$ for every $t \in (0, \delta)$.
- Thus, $f(a + t v) - f(a) = \frac{\lambda t^2}{2} + o(t^2) > 0$ for every $t \in (0, \delta)$.
- Thus, for every $t \in (0, \min(r, \delta))$, we have $a + t v \in B(a, r)$ and $f(a + t v) > f(a)$.
- Thus, $a$ is not a local maximum point.
- A similar argument with a negative eigenvalue and corresponding normalized eigenvector shows that $a$ cannot be a local minimum point.
- Thus, $a$ must be a saddle point.
8.1.5. Minimization of Proper Functions
Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper function. Consider the problem of minimizing $f$ over a set $C \subseteq \mathbb{V}$.

- $f$ is the objective function or cost function (being minimized).
- The set $C$ is the feasible set or constraint set.
- Any $x \in C \cap \operatorname{dom} f$ is a feasible solution or feasible point.
- If there is at least one feasible point (i.e., $C \cap \operatorname{dom} f$ is nonempty), the problem is feasible.
- Otherwise, the problem is infeasible.
- Let

$$
p^* = \inf_{x \in C} f(x)
$$

be the optimal value of the minimization problem.
- We allow $p^*$ to take values over the extended real line $\overline{\mathbb{R}} = [-\infty, \infty]$.
- If the minimization problem is infeasible, then $p^* = \infty$.
- If $p^* = -\infty$, then the problem is unbounded below.
- If there is some $x^* \in C$ such that $f(x^*) = p^*$, then $x^*$ is an optimal solution or optimal point or minimizing point or minimizer or global minimum over $C$.
- Alternatively, $f$ attains a minimum over $C$ at $x^*$. We write this as

$$
x^* \in \operatorname{argmin}_{x \in C} f(x).
$$

- If $x^*$ is a unique minimizer of $f$ over $C$, then we abuse the notation and write

$$
x^* = \operatorname{argmin}_{x \in C} f(x).
$$

- It is possible that $p^*$ is finite and yet there is no optimal point in $C$.
- In other words, the set $\operatorname{argmin}_{x \in C} f(x)$ may be empty.
A basic question of optimization is whether an optimal solution exists.
Remark 8.2 (The set of optimal points)

The set of optimal points for the problem of minimization of a proper function $f$ over a set $C \subseteq \mathbb{V}$ is given by

$$
\operatorname{argmin}_{x \in C} f(x) = C \cap f^{-1}(p^*)
$$

where $p^* = \inf_{x \in C} f(x)$ and $f^{-1}(p^*) = \{ x \in \mathbb{V} \mid f(x) = p^* \}$ is the level set of $f$ for the value $p^*$.

This comes directly from the fact that

- $x^*$ must be feasible. Hence $x^* \in C$.
- $f$ must attain the optimal value at $x^*$. Thus, $f(x^*) = p^*$. Hence, $x^* \in f^{-1}(p^*)$.

In other words, the optimal set is the intersection of the feasible set $C$ with the level set $f^{-1}(p^*)$.

Thus, for an optimal solution to exist:

- $p^*$ must be finite.
- The level set $f^{-1}(p^*)$ must be nonempty.
- The feasible set $C$ must be nonempty.
- The intersection of $C$ with $f^{-1}(p^*)$ must be nonempty.
8.1.5.1. Coercive Functions
Definition 8.8 (Coercive function)

A proper function $f : \mathbb{V} \to (-\infty, \infty]$ is called coercive over a set $S \subseteq \mathbb{V}$ if for every sequence $\{ x_k \}$ of points of $S$ such that $\lim_{k \to \infty} \| x_k \| = \infty$, we have $\lim_{k \to \infty} f(x_k) = \infty$.

If $f$ is coercive over the entire space $\mathbb{V}$, then we say that $f$ is coercive.
One way to think about coercive functions is that they grow rapidly at the extremes of the domain on which they are defined.
Theorem 8.6 (Level sets of coercive functions)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a coercive proper function. Let $a \in \mathbb{R}$ and consider the level set $f^{-1}(a) = \{ x \in \mathbb{V} \mid f(x) = a \}$. If $f^{-1}(a)$ is nonempty, then it is bounded.

In other words, all nonempty level sets of a coercive function are bounded.

Proof. For contradiction, assume that there exists $a \in \mathbb{R}$ such that $A = f^{-1}(a)$ is nonempty and unbounded.

- Then, there exists a sequence $\{ x_k \}$ of points of $A$ such that $\lim_{k \to \infty} \| x_k \| = \infty$.
- Since $f$ is coercive, hence $\lim_{k \to \infty} f(x_k) = \infty$.
- But, by definition, $f(x_k) = a$ for every $k$.
- Hence $\lim_{k \to \infty} f(x_k) = a$.
- We have a contradiction.
- Hence, $A$ must be bounded.
Theorem 8.7 (Sublevel sets of coercive functions)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a coercive proper function. Let $a \in \mathbb{R}$ and consider the sublevel set $S_a = \{ x \in \mathbb{V} \mid f(x) \leq a \}$. If $S_a$ is nonempty, then it is bounded.

In other words, all nonempty sublevel sets of a coercive function are bounded.

Proof. For contradiction, assume that there exists $a \in \mathbb{R}$ such that $S_a$ is nonempty and unbounded.

- Then, there exists a sequence $\{ x_k \}$ of points of $S_a$ such that $\lim_{k \to \infty} \| x_k \| = \infty$.
- Since $f$ is coercive, hence $\lim_{k \to \infty} f(x_k) = \infty$.
- But, by definition, $f(x_k) \leq a$ for every $k$, which contradicts $\lim_{k \to \infty} f(x_k) = \infty$.
- Hence, $S_a$ must be bounded.
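A finite-grid illustration (a sketch assuming NumPy; the functions are toy examples): on a sampled window of $\mathbb{R}$, the sublevel set of the coercive $f(x) = x^2$ stays bounded, while the sublevel set of the non-coercive $g(x) = e^x$ keeps growing with the window.

```python
# Sublevel sets: coercive f(x) = x^2 vs. non-coercive g(x) = exp(x).
import numpy as np

xs = np.linspace(-100.0, 100.0, 20001)  # finite sampling window of R
a = 1.0

f = xs**2               # coercive: f -> inf as |x| -> inf
g = np.exp(xs)          # not coercive: g -> 0 as x -> -inf

print(np.abs(xs[f <= a]).max())  # ~1.0: bounded regardless of the window
print(np.abs(xs[g <= a]).max())  # 100.0, the window edge: unbounded set
```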
8.1.5.2. Weierstrass’ Theorem
In this subsection, we examine the problem of unconstrained minimization of a proper closed function. Recall from Definition 3.64 that a function is called closed if all its sublevel sets are closed.
Theorem 8.8 (Unconstrained minimization of a proper closed function)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper closed function. Consider the problem of minimizing $f$ over all of $\mathbb{V}$.

Let $p^* = \inf_{x \in \mathbb{V}} f(x)$ denote the optimal value and let $X_{\text{opt}} = \{ x \in \mathbb{V} \mid f(x) = p^* \}$ denote the set of minimizers. Then,

- $p^* < \infty$.
- $X_{\text{opt}} = \bigcap_{a > p^*} \{ x \in \mathbb{V} \mid f(x) \leq a \}$.
- $X_{\text{opt}}$ is closed.

In words, the minimal value $p^*$ of a proper closed function is less than $\infty$, its set of minimizers is the intersection of all of its nonempty sublevel sets, and this set is closed.
Proof. We show that $p^* < \infty$.

- Since $f$ is proper, hence $\operatorname{dom} f$ is nonempty.
- Thus, there exists $x \in \mathbb{V}$ such that $f(x) < \infty$.
- Hence, $p^* \leq f(x) < \infty$.

Let $T = \bigcap_{a > p^*} S_a$ where $S_a = \{ x \in \mathbb{V} \mid f(x) \leq a \}$ denotes the sublevel set for $a$.

We first show that $X_{\text{opt}} \subseteq T$.

- Let $x \in X_{\text{opt}}$.
- Then, $f(x) = p^*$.
- Then, $f(x) \leq a$ for every $a > p^*$.
- Hence, $x \in S_a$ for every $a > p^*$.
- Thus, $x \in T$.
- Thus, $X_{\text{opt}} \subseteq T$.

We now show that $T \subseteq X_{\text{opt}}$.

- Let $x \in T$.
- We claim that $f(x) = p^*$ must hold true.
- $f(x)$ cannot be smaller than $p^*$ since by definition $p^* = \inf_{x \in \mathbb{V}} f(x)$.
- Thus, $f(x) \geq p^*$ must hold true.
- Now, for contradiction, assume that $f(x) > p^*$.
- Let $a = \frac{p^* + f(x)}{2}$.
- Then, $p^* < a < f(x)$.
- Hence, $x \notin S_a$.
- But then, $x \notin T$ since $T \subseteq S_a$ as $a > p^*$.
- A contradiction.
- Thus, the only allowed value for $f(x)$ is $p^*$.
- Thus, $x \in X_{\text{opt}}$.
- Thus, $T \subseteq X_{\text{opt}}$.
- Thus,

$$
X_{\text{opt}} = T = \bigcap_{a > p^*} S_a.
$$

We show that $X_{\text{opt}}$ is closed.

- If $X_{\text{opt}}$ is empty, then it is closed by definition.
- Otherwise, $X_{\text{opt}}$ is an intersection of sublevel sets.
- Since $f$ is closed, its sublevel sets are closed.
- Hence $S_a$ is closed for every $a > p^*$.
- Thus, $X_{\text{opt}}$ is an intersection of closed sets.
- Thus, $X_{\text{opt}}$ is closed.
Theorem 8.9 (Weierstrass’ theorem)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper closed function. Consider the problem of minimizing $f$ over all of $\mathbb{V}$.

Let $X_{\text{opt}} = \operatorname{argmin}_{x \in \mathbb{V}} f(x)$ denote the set of minimizers of $f$.

Assume that one of the following conditions is true.

1. $\operatorname{dom} f$ is closed and bounded.
2. There exists a scalar $a \in \mathbb{R}$ such that the sublevel set $S_a = \{ x \in \mathbb{V} \mid f(x) \leq a \}$ is nonempty and bounded.
3. $f$ is coercive.

Then, $X_{\text{opt}}$ is nonempty and compact.

Here is this result in a simplified language. The theorem addresses the problem of unconstrained minimization of a function.

- If $f$ is proper and closed and its domain is closed and bounded, then its set of minimizers is nonempty and compact.
- If $f$ is proper and closed and any one of its sublevel sets is nonempty and bounded, then its set of minimizers is nonempty and compact.
- If $f$ is proper, closed and coercive, then its set of minimizers is nonempty and compact.

Since the set of minimizers is nonempty and compact, it is nonempty, closed and bounded.
Proof. We prove the claim under each of the three conditions in turn.

Assume that condition (1) holds.

- Since $f$ is proper, hence $\operatorname{dom} f$ is nonempty.
- Consider a sequence $\{ x_k \}$ of points of $\operatorname{dom} f$ such that $\lim_{k \to \infty} f(x_k) = p^*$ where $p^* = \inf_{x \in \mathbb{V}} f(x)$.
- Such a sequence exists since $p^*$ is the infimum of $f$. Thus, for every $k \in \mathbb{N}$, there exists $x_k \in \operatorname{dom} f$ such that $f(x_k) \leq p^* + \frac{1}{k}$.
- Since $\operatorname{dom} f$ is bounded, hence $\{ x_k \}$ has a convergent subsequence by the Bolzano-Weierstrass theorem (Theorem 4.69).
- Let $\{ y_k \}$ be a convergent subsequence of $\{ x_k \}$ with $\lim_{k \to \infty} y_k = \bar{x}$.
- Since $\operatorname{dom} f$ is closed, hence $\bar{x} \in \operatorname{dom} f$.
- Since $\{ f(x_k) \}$ is convergent and $\{ f(y_k) \}$ is a subsequence of $\{ f(x_k) \}$, hence

$$
\lim_{k \to \infty} f(y_k) = \lim_{k \to \infty} f(x_k) = p^*.
$$

- Since $f$ is closed, hence it is lower semicontinuous (l.s.c.) at $\bar{x}$ (see Theorem 3.119).
- Also, note that since $\{ f(y_k) \}$ is convergent, hence

$$
\liminf_{k \to \infty} f(y_k) = \lim_{k \to \infty} f(y_k) = p^*.
$$

- Since $f$ is l.s.c. at $\bar{x}$, hence by Theorem 3.108,

$$
f(\bar{x}) \leq \liminf_{k \to \infty} f(y_k) = p^*.
$$

- Since $p^*$ is the infimum value of $f$, hence $f(\bar{x})$ cannot be smaller than $p^*$.
- Thus, $f(\bar{x}) = p^*$ must be true.
- Since $f$ is proper and $f(\bar{x}) = p^*$, hence $p^* > -\infty$. Thus, the optimal value is finite.
- Thus, $f(\bar{x}) = p^*$ means that $\bar{x}$ is an optimal solution for the minimization of $f$.
- Thus, the set $X_{\text{opt}}$ is nonempty.
- $X_{\text{opt}}$ is bounded since $X_{\text{opt}} \subseteq \operatorname{dom} f$ and $\operatorname{dom} f$ is bounded by hypothesis.
- $X_{\text{opt}}$ is closed by Theorem 8.8. Since $X_{\text{opt}}$ is closed and bounded, hence it is compact.

Assume that condition (2) holds.

- The sublevel set $S_a = \{ x \in \mathbb{V} \mid f(x) \leq a \}$ is nonempty and bounded.
- Since $f$ is closed, hence $S_a$ is also closed.
- Consider the restriction of $f$ on $S_a$ given by

$$
g(x) = \begin{cases}
f(x) & \text{if } f(x) \leq a; \\
\infty & \text{otherwise}.
\end{cases}
$$

- Then, $\operatorname{dom} g = S_a$. Thus, $\operatorname{dom} g$ is nonempty, closed and bounded.
- $g$ never takes the value $-\infty$ and is not identically $\infty$. Hence, $g$ is a proper function.
- Since $f$ is closed, hence the sublevel sets of $f$ are also closed.
- For any $b \geq a$, the sublevel set of $g$ is identical to $S_a$.
- For any $b < a$, the sublevel set of $g$ is identical to $S_b$, the corresponding sublevel set of $f$.
- Thus, all sublevel sets of $g$ are closed. Thus, $g$ is closed too.
- Applying condition (1) on $g$, the set of minimizers of $g$ is nonempty and compact.
- We also note that the minimizers of $g$ are identical to the minimizers of $f$.
- Thus, the set of minimizers of $f$ is nonempty and compact.

Assume that condition (3) holds.

- Since $f$ is proper, hence it has some nonempty sublevel sets.
- Let $a \in \mathbb{R}$ be a scalar such that the sublevel set $S_a$ is nonempty.
- By Theorem 8.7, the nonempty sublevel sets of $f$ are bounded. Hence, $S_a$ is bounded.
- Then, by applying condition (2), the set of minimizers of $f$ is nonempty and compact.
Corollary 8.1 (Minimizing a proper closed function over a closed set)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper closed function. Let $C \subseteq \mathbb{V}$ be a nonempty closed set such that $C \cap \operatorname{dom} f$ is nonempty.

Further, assume that one of the following conditions is true.

1. $C$ is bounded.
2. There exists a scalar $a \in \mathbb{R}$ such that the set $\{ x \in C \mid f(x) \leq a \}$ is nonempty and bounded.
3. $f$ is coercive over $C$.

Then, the set of minimizers of $f$ over $C$ is nonempty and compact.

Proof. We define a restriction $g : \mathbb{V} \to (-\infty, \infty]$ of $f$ to $C$ given by

$$
g(x) = \begin{cases}
f(x) & \text{if } x \in C; \\
\infty & \text{otherwise}.
\end{cases}
$$

- Then, $\operatorname{dom} g = C \cap \operatorname{dom} f$.
- Since $f$ never takes the value $-\infty$, hence $g$ also never takes the value $-\infty$.
- Since $C \cap \operatorname{dom} f$ is nonempty, hence there exists $x \in C$ such that $g(x) = f(x) < \infty$.
- Hence, $g$ is a proper function.
- Also, note that the set $\{ x \in C \mid f(x) \leq a \}$ is nothing but the sublevel set of $g$ for the scalar $a$.
- Since $f$ is closed, all its sublevel sets are closed. $C$ is closed, hence the intersection of $C$ with any sublevel set of $f$ is also closed.
- Hence, all the sublevel sets of $g$ are also closed. Hence, $g$ is also closed.
- The set of minimizers of $f$ over $C$ is nothing but the set of minimizers of $g$.
- If $f$ is coercive over $C$, then $g$ is coercive (over its entire domain).

Thus, applying Theorem 8.9 to $g$, the set of minimizers of $f$ over $C$ is nonempty and compact.
Corollary 8.2 (Minimizing a real valued l.s.c. function over a closed set)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function. Let $C \subseteq \mathbb{V}$ be a nonempty closed set. Assume that $f$ is lower semicontinuous over $C$.

Further, assume that one of the following conditions is true.

1. $C$ is bounded.
2. There exists a scalar $a \in \mathbb{R}$ such that the set $\{ x \in C \mid f(x) \leq a \}$ is nonempty and bounded.
3. $f$ is coercive over $C$.

Then, the set of minimizers of $f$ over $C$ is nonempty and compact.

Proof. We define a restriction $g : \mathbb{V} \to (-\infty, \infty]$ of $f$ to $C$ given by

$$
g(x) = \begin{cases}
f(x) & \text{if } x \in C; \\
\infty & \text{otherwise}.
\end{cases}
$$

- Then, $\operatorname{dom} g = C$. Thus, $\operatorname{dom} g$ is closed.
- Since $f$ is real valued, hence $g$ never takes the value $-\infty$.
- Since $C$ is nonempty, hence there exists $x \in C$ such that $g(x) = f(x) < \infty$.
- Hence, $g$ is a proper function.
- Since $f$ is l.s.c. on $C$ and $C$ is closed, hence $g$ is l.s.c. on $\mathbb{V}$. Hence, $g$ is closed.
- Also, note that the set $\{ x \in C \mid f(x) \leq a \}$ is nothing but the sublevel set of $g$ for the scalar $a$.
- The set of minimizers of $f$ over $C$ is nothing but the set of minimizers of $g$.
- If $f$ is coercive over $C$, then $g$ is coercive (over its entire domain).

Thus, applying Theorem 8.9 to $g$, the set of minimizers of $f$ over $C$ is nonempty and compact.
8.1.6. Some Simple Problems and Technical Results
In this subsection, we provide some technical results on some simple optimization problems. These results are useful building blocks for bigger problems.
Theorem 8.10 (Nonnegativity of affine functional on nonnegative orthant)

Consider the affine function $f : \mathbb{R}^n \to \mathbb{R}$ given by $f(x) = a^T x + b$.

Then, $f(x) \geq 0$ for every $x \succeq 0$ if and only if $a \succeq 0$ and $b \geq 0$.

Proof. Assume that $a \succeq 0$ and $b \geq 0$. Then, for every $x \succeq 0$, we have $a^T x \geq 0$; hence $f(x) = a^T x + b \geq 0$.

For the converse, assume that $f(x) \geq 0$ for every $x \succeq 0$.

- If $b < 0$, then for $x = 0$, $f(x) = b < 0$. Hence, $b$ cannot be smaller than 0.
- Now consider the components of $a$.
- Assume that $a_i < 0$ for some $i$.
- Let $x$ be such that $x_j = 0$ for all $j \neq i$ and $x_i = t$ for some $t > 0$. Clearly, $x \succeq 0$.
- Then, $f(x) = t a_i + b$.
- By increasing $t$ suitably, we can get $f(x) < 0$.
- Thus, $a \succeq 0$ is also necessary to ensure that $f(x) \geq 0$ for all $x \succeq 0$.
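A quick numerical run of the argument above (assuming NumPy; the data is made up): with one negative coefficient $a_i$, increasing $t$ eventually makes $f(t e_i) = t a_i + b$ negative on the nonnegative orthant.

```python
# With a_2 = -0.5 < 0, f(x) = a^T x + b turns negative along x = t e_2.
import numpy as np

a, b = np.array([1.0, -0.5]), 2.0
e2 = np.array([0.0, 1.0])
for t in [1.0, 4.0, 10.0]:
    x = t * e2                 # x >= 0 lies in the nonnegative orthant
    print(t, a @ x + b)        # 1.5, 0.0, -3.0: negative once t > b / |a_2|
```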
Theorem 8.11 (Minimization of linear functional over unit ball)

For any $a \in \mathbb{R}^n$, the optimal value of the problem

$$
\begin{aligned}
& \text{minimize}   && a^T x \\
& \text{subject to} && \| x \| \leq 1
\end{aligned}
$$

is $-\| a \|$.

Proof. If $a = 0$, then $a^T x = 0$ for every $x$. Hence, the optimal value is $0 = -\| a \|$ and every point of the unit ball is optimal.

Now consider the case where $a \neq 0$.

- By the Cauchy-Schwarz inequality,

$$
a^T x \geq -\| a \| \| x \| \geq -\| a \|
$$

for any $x$ in the unit ball.
- Thus, we have established a lower bound: $a^T x \geq -\| a \|$ over the unit ball.
- We now show that the lower bound is indeed attained.
- Let $x = -\frac{a}{\| a \|}$.
- Then $\| x \| = 1$. Thus, $x$ belongs to the unit ball.
- Now,

$$
a^T x = -\frac{a^T a}{\| a \|} = -\| a \|.
$$

Thus, the lower bound is attained.
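A numerical confirmation of the theorem (a sketch assuming SciPy; the vector $a$ is arbitrary): a generic solver recovers the optimal value $-\| a \|$ and the optimal point $-a / \| a \|$.

```python
# Minimizing a^T x over the unit ball: optimal value -||a|| at x = -a/||a||.
import numpy as np
from scipy.optimize import minimize

a = np.array([3.0, -4.0])                        # ||a|| = 5
res = minimize(lambda x: a @ x, x0=np.zeros(2),
               constraints=[{"type": "ineq",
                             "fun": lambda x: 1.0 - x @ x}])  # ||x||^2 <= 1
print(res.fun, -np.linalg.norm(a))               # both -5
print(res.x, -a / np.linalg.norm(a))             # both (-0.6, 0.8)
```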