8.1. Mathematical Optimization#

Main references for this section are [9, 17].

This section provides a general overview of optimization problems. We introduce the notions of objective (or cost) functions and constraint functions, present the standard form for optimization problems, and discuss several examples of equivalent forms of optimization problems. The standard form focuses on minimizing the objective function over a set of equality and inequality constraints. Maximization problems can be converted into minimization problems by changing the sign of the objective function.

We introduce the notions of the domain of an optimization problem, feasible points or solutions, the feasible set, the optimal value of a minimization problem, optimal points or solutions, the optimal set, etc. We discuss when the problem may be infeasible, or feasible but unbounded. We also discuss situations where the problem is feasible and has a finite optimal value, and yet an optimal solution may not exist.

We then discuss general requirements for the feasibility of an optimization problem and the existence of an optimal solution. We introduce the notion of coercive functions, which provide a Weierstrass-type result for the existence of optimal solutions.

8.1.1. Optimization Problems#

Let $\mathbb{V}$ be a real $n$-dimensional inner product space.

In the most general form, an optimization problem can be considered as minimizing or maximizing the value of a real valued function $f : \mathbb{V} \to \mathbb{R}$ over its domain $S = \operatorname{dom} f$.

  1. We call $f$ the objective function, which is being minimized or maximized.

  2. The variable $x \in \mathbb{V}$ is called the optimization variable.

  3. $S = \operatorname{dom} f$ is called the feasible set of solutions for the optimization problem.

  4. A point $x \in S$ is called a feasible point or feasible solution.

  5. Our goal is to find an $x^* \in S$ which maximizes or minimizes the objective function $f$.

Definition 8.1 (Globally optimal value and points)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$.

  1. The maximum value of $f$ is given by

      $$\sup \{ f(x) \,|\, x \in S \}.$$

  2. The minimum value of $f$ is given by

      $$\inf \{ f(x) \,|\, x \in S \}.$$

  3. We allow the maximum and minimum values to take the values $\infty$ and $-\infty$.

  4. The function $f$ may or may not attain its maximum / minimum value at some point in its domain $S$.

  5. $x^* \in S$ is called a global minimum point if $f(x^*) \leq f(x)$ for every $x \in S$.

  6. $x^* \in S$ is called a global maximum point if $f(x^*) \geq f(x)$ for every $x \in S$.

  7. $x^* \in S$ is called a strict global minimum point if $f(x) > f(x^*)$ for every $x \in S$ with $x \neq x^*$.

  8. $x^* \in S$ is called a strict global maximum point if $f(x) < f(x^*)$ for every $x \in S$ with $x \neq x^*$.

  9. A point $x^* \in S$ is called a global optimal point if it is either a global maximum or a global minimum point.

  10. The maximum and minimum values of $f$ are always unique, as opposed to global optimal points, which may not be unique.

  11. The set of global minimizers is given by

      $$\operatorname{argmin} \{ f(x) \,|\, x \in S \}.$$

  12. The set of global maximizers is given by

      $$\operatorname{argmax} \{ f(x) \,|\, x \in S \}.$$

In a more restricted setting, an optimization problem can be considered as minimizing or maximizing the value of a real valued function $f : \mathbb{V} \to \mathbb{R}$ with $S = \operatorname{dom} f$ over a set $A$ such that $A \subseteq S$.

  1. In this case, the feasible set is restricted to $A$.

  2. Minimizers and maximizers are searched within this subset $A$.

  3. This problem can be converted into the earlier general form by considering the restriction $\tilde{f} : \mathbb{V} \to \mathbb{R}$ of $f$ such that $A = \operatorname{dom} \tilde{f}$ and

      $$\tilde{f}(x) = f(x) \quad \forall x \in A.$$

In several problems, it may not be possible to establish whether a point is globally optimal. However, it is easier to establish if a point is optimal in a neighborhood around it. Such points are called locally optimal points or extreme values.

Definition 8.2 (Local optimal points)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. We say that $f(a)$ is a local extreme value or local optimal value of $f$ at $a \in \operatorname{dom} f$ if there exists $\delta > 0$ such that $f(x) - f(a)$ does not change sign on $B(a, \delta) \cap S$.

More specifically,

  1. $f(a)$ is a local maximum value of $f$ if for some $\delta > 0$:

      $$f(x) \leq f(a) \quad \forall x \in B(a, \delta) \cap S.$$

  2. $f(a)$ is a local minimum value of $f$ if for some $\delta > 0$:

      $$f(x) \geq f(a) \quad \forall x \in B(a, \delta) \cap S.$$

The point $x = a$ is called a local extreme point or local optimal point of $f$, or more specifically, a local maximum or a local minimum point of $f$.

  1. $a$ is a strict local maximum point if for some $\delta > 0$:

      $$f(x) < f(a) \quad \forall x \in B_d(a, \delta) \cap S.$$

  2. $a$ is a strict local minimum point of $f$ if for some $\delta > 0$:

      $$f(x) > f(a) \quad \forall x \in B_d(a, \delta) \cap S.$$

Here $B_d(a, \delta)$ denotes the deleted neighborhood (an open ball of radius $\delta$ excluding $a$ itself).

Remark 8.1 (Local optimal points for open domain)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Assume that $S$ is open.

  1. If $a \in S$ is a local maximum point, then there exists $r_1 > 0$ such that

      $$f(x) \leq f(a) \quad \forall x \in B(a, r_1) \cap S.$$

  2. But since $S$ is open, $a \in \operatorname{int} S$. Thus, there exists an open ball $B(a, r_2) \subseteq S$.

  3. Let $r = \min(r_1, r_2)$.

  4. Then, $B(a, r) \subseteq S$ and $B(a, r) \subseteq B(a, r_1)$.

  5. Thus, $B(a, r) \subseteq B(a, r_1) \cap S$.

  6. Thus, $f(x) \leq f(a)$ for all $x \in B(a, r)$.

  7. Thus, the local optimality condition simplifies to looking for an open ball of radius $r$ totally contained inside the open domain $S$.

Based on this discussion, we can adjust the conditions for local optimality.

  1. $a$ is a local maximum point of $f$ if for some $r > 0$:

      $$f(x) \leq f(a) \quad \forall x \in B(a, r).$$

  2. $a$ is a local minimum point of $f$ if for some $r > 0$:

      $$f(x) \geq f(a) \quad \forall x \in B(a, r).$$

  3. $a$ is a strict local maximum point if for some $r > 0$:

      $$f(x) < f(a) \quad \forall x \in B_d(a, r).$$

  4. $a$ is a strict local minimum point of $f$ if for some $r > 0$:

      $$f(x) > f(a) \quad \forall x \in B_d(a, r).$$

8.1.1.1. Standard Form for Mathematical Optimization Problems#

Definition 8.3 (Optimization problem standard form)

Let $\mathbb{V}$ be an $n$-dimensional real vector space. A mathematical optimization problem of the form

$$
\begin{aligned}
& \text{minimize }   && f_0(x) \\
& \text{subject to } && f_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && h_j(x) = 0, \quad j=1,\dots,p
\end{aligned}
\tag{8.1}
$$

is known as an optimization problem in its standard form.

  1. $x \in \mathbb{V}$ is called the optimization variable.

  2. $f_0 : \mathbb{V} \to \mathbb{R}$ is called the objective function or cost function.

  3. The functions $f_i : \mathbb{V} \to \mathbb{R}$ are called the inequality constraint functions.

  4. The corresponding inequalities $f_i(x) \leq 0$ are called the inequality constraints.

  5. The functions $h_j : \mathbb{V} \to \mathbb{R}$ are called the equality constraint functions.

  6. The corresponding equations $h_j(x) = 0$ are called the equality constraints.

  7. If there are no constraints ($m = 0$, $p = 0$), the problem is called unconstrained.

  8. The set

      $$\mathbb{D} \triangleq \operatorname{dom} f_0 \cap \bigcap_{i=1}^m \operatorname{dom} f_i \cap \bigcap_{j=1}^p \operatorname{dom} h_j$$

      is called the domain of the optimization problem.

  9. A point $x \in \mathbb{D}$ is called feasible if it satisfies all the inequality constraints $f_i(x) \leq 0$ and all the equality constraints $h_j(x) = 0$.

  10. The optimization problem is called feasible if there exists at least one feasible point.

  11. If there are no feasible points in the domain $\mathbb{D}$, then the optimization problem is called infeasible. Naturally, if $\mathbb{D} = \emptyset$, then the problem is infeasible.

  12. The set of feasible points for an optimization problem is called the feasible set or the constraint set. We shall denote the feasible set by $C$.

      $$C = \operatorname{dom} f_0 \cap \bigcap_{i=1}^m f_i^{-1}(-\infty, 0] \cap \bigcap_{j=1}^p h_j^{-1}(0).$$

      It is the intersection of the domain of $f_0$, the $0$-sublevel sets of $f_i$ for $i=1,\dots,m$, and the $0$-level sets of $h_j$ for $j=1,\dots,p$.

  13. Thus, if the problem is infeasible, then $C = \emptyset$.

  14. The optimal value $p^* \in \overline{\mathbb{R}}$ of the optimization problem is defined as

      $$p^* = \inf \{ f_0(x) \,|\, f_i(x) \leq 0, \; i=1,\dots,m, \; h_j(x) = 0, \; j=1,\dots,p \}.$$

      In other words,

      $$p^* = \inf \{ f_0(x) \,|\, x \in C \}.$$

      We allow $p^*$ to take the extended values $\infty$ and $-\infty$.

  15. If the problem is infeasible, then $p^* = \infty$. This is consistent with the convention that the infimum of an empty set is $\infty$.

  16. If $p^* = -\infty$, then the problem is called unbounded below. In this case, there exists a sequence $\{ x_k \}$ of $C$ such that $\lim f_0(x_k) = -\infty$.

  17. We say that $x^*$ is an optimal point if it solves (8.1). In other words, $x^* \in C$ and $f_0(x^*) = p^*$.

  18. The set of all optimal points is known as the optimal set, denoted by $X_{\text{opt}}$.

      $$X_{\text{opt}} \triangleq \{ x \,|\, f_i(x) \leq 0, \; i=1,\dots,m, \; h_j(x) = 0, \; j=1,\dots,p, \; f_0(x) = p^* \}.$$

      In other words,

      $$X_{\text{opt}} = \{ x \in C \,|\, f_0(x) = p^* \}.$$
  19. If an optimal point exists in $C$, then we say that the optimal value is attained or achieved.

  20. If $X_{\text{opt}}$ is empty, then we say that the optimal value is not attained or not achieved.

  21. In particular, if the problem is unbounded below, then $X_{\text{opt}}$ is indeed empty.

  22. If the feasible set $C$ is not closed, then it is quite possible that the optimal value $p^*$ is finite and yet it is not attained at any feasible point. Then, there exists a sequence $\{ x_k \}$ of feasible points such that $\lim f_0(x_k) = p^*$. However, there is no $x \in C$ such that $f_0(x) = p^*$.

  23. A feasible point $x \in C$ with $f_0(x) \leq p^* + \epsilon$ is called an $\epsilon$-suboptimal point.

  24. The set of all $\epsilon$-suboptimal points is called the $\epsilon$-suboptimal set for the problem (8.1).

  25. We say that a feasible point $x$ is locally optimal if there exists $r > 0$ such that

      $$f_0(x) = \inf \{ f_0(z) \,|\, z \in B[x, r] \cap C \}.$$

      In other words, $x$ minimizes $f_0$ over a local neighborhood of feasible points.

  26. If $x$ is feasible and $f_i(x) = 0$, we say that the $i$-th inequality constraint is active at $x$. Otherwise, $f_i(x) < 0$ and we say that the $i$-th inequality constraint is inactive.

  27. We say that a constraint is redundant if removing it does not change the feasible set.

  28. If we choose an orthonormal basis $\mathcal{B} = \{ e_1, \dots, e_n \}$ for $\mathbb{V}$, then $\mathbb{V}$ is isomorphic to $\mathbb{R}^n$ under the bracket operator $[\cdot]_{\mathcal{B}}$. The optimization variable $x$ has a representation $(x_1, \dots, x_n)$ in $\mathbb{R}^n$ given by

      $$x = \sum_{i=1}^n x_i e_i.$$

  29. Thus, determining $x$ is the same as determining its components $(x_1, \dots, x_n)$.

  30. Since there are $n$ scalar components in the representation of $x$, we can say that the optimization problem has $n$ (scalar) variables.

In the sequel, we will be presenting a variety of optimization problems. We shall show how those problems can be converted to the standard form described above.

Example 8.1 (Box constraints standard form)

Let $f_0 : \mathbb{R}^n \to \mathbb{R}$ be a real valued function. Consider the optimization problem

$$
\begin{aligned}
& \text{minimize }   && f_0(x) \\
& \text{subject to } && l_i \leq x_i \leq u_i, \quad i=1,\dots,n
\end{aligned}
$$

where $x \in \mathbb{R}^n$ is the optimization variable. There are $n$ constraints, one on each component of $x$. Each constraint gives a lower and upper bound on one component. Such constraints are known as variable bounds or box constraints.

This problem can be transformed into an equivalent problem:

$$
\begin{aligned}
& \text{minimize }   && f_0(x) \\
& \text{subject to } && l_i - x_i \leq 0, \quad i=1,\dots,n\\
&                    && x_i - u_i \leq 0, \quad i=1,\dots,n.
\end{aligned}
$$

This form has $2n$ inequality constraints. We introduce the functions

$$f_i(x) = l_i - x_i, \; i=1,\dots,n \text{ and } f_i(x) = x_{i-n} - u_{i-n}, \; i=n+1,\dots,2n.$$

Then, the problem becomes

$$
\begin{aligned}
& \text{minimize }   && f_0(x) \\
& \text{subject to } && f_i(x) \leq 0, \quad i=1,\dots,2n.
\end{aligned}
$$

This is the optimization problem in standard form with $2n$ inequality constraints and $0$ equality constraints. A small numerical sketch follows.
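As a quick illustration, here is a minimal NumPy sketch (the helper `box_constraints` and the data are made up for this example) that builds the $2n$ standard-form inequality constraint functions for a box and checks the feasibility of a point:

```python
import numpy as np

def box_constraints(l, u):
    # Build the 2n standard-form functions f_i with f_i(x) <= 0
    # encoding the box l <= x <= u.
    n = len(l)
    lower = [lambda x, i=i: l[i] - x[i] for i in range(n)]  # l_i - x_i <= 0
    upper = [lambda x, i=i: x[i] - u[i] for i in range(n)]  # x_i - u_i <= 0
    return lower + upper

l = np.array([0.0, -1.0])
u = np.array([1.0, 1.0])
fs = box_constraints(l, u)

x = np.array([0.5, 0.25])
print(all(f(x) <= 0 for f in fs))  # True: x satisfies all 2n constraints
```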

Example 8.2 (Maximization problem in standard form)

Consider the problem

$$
\begin{aligned}
& \text{maximize }   && f_0(x) \\
& \text{subject to } && f_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && h_j(x) = 0, \quad j=1,\dots,p.
\end{aligned}
$$

If we replace $f_0$ by $-f_0$, then maximizing $f_0$ is the same as minimizing $-f_0$. Thus, this problem has an equivalent form

$$
\begin{aligned}
& \text{minimize }   && -f_0(x) \\
& \text{subject to } && f_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && h_j(x) = 0, \quad j=1,\dots,p.
\end{aligned}
$$

Definition 8.4 (Feasibility problem)

Let $\mathbb{V}$ be an $n$-dimensional real vector space. A mathematical optimization problem of the form

$$
\begin{aligned}
& \text{find }       && x \in \mathbb{V} \\
& \text{subject to } && f_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && h_j(x) = 0, \quad j=1,\dots,p
\end{aligned}
\tag{8.2}
$$

is called a feasibility problem.

We can convert this problem into the standard problem by introducing a cost function which is identically 0:

$$f_0(x) = 0 \quad \forall x \in \mathbb{V}.$$

  1. The domain reduces to

      $$\mathbb{D} = \bigcap_{i=1}^m \operatorname{dom} f_i \cap \bigcap_{j=1}^p \operatorname{dom} h_j.$$

  2. The optimal value is $p^* = \infty$ if the problem is not feasible.

  3. Otherwise, the optimal value is $p^* = 0$.

  4. Every feasible point is also an optimal point.

  5. Every optimization problem in standard form can be converted into a feasibility problem by replacing the objective function with the function which is identically 0 everywhere.

  6. In that case, the problem reduces to checking whether the inequality and equality constraints are consistent or not. In other words, whether there are some points which satisfy all the inequality and equality constraints.
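For the special case of affine constraints, a feasibility problem can be handed to a linear programming solver with an identically zero objective. A minimal sketch using `scipy.optimize.linprog` (the constraint data here is made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Feasibility problem: find x with A_ub x <= b_ub and A_eq x = b_eq.
# Standard form: minimize the identically zero objective over the same set.
A_ub = np.array([[1.0, 1.0]])   # x1 + x2 <= 1
b_ub = np.array([1.0])
A_eq = np.array([[1.0, -1.0]])  # x1 - x2 = 0
b_eq = np.array([0.0])

res = linprog(c=np.zeros(2), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(None, None)] * 2)
print(res.status == 0, res.x)   # status 0 => the constraints are consistent
```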

8.1.2. Equivalent Problems#

Definition 8.5 (Equivalent optimization problems)

We provide an informal definition of equivalent optimization problems.

Two optimization problems are called equivalent if it is possible to find the solution of one problem from the other problem and vice versa.

The primary reason for transforming an optimization problem into an equivalent one is that the equivalent problem may be easier to solve. If we can transform a problem into a form which has a well known closed form solution or a well known solution algorithm, then we can solve the original problem in three steps:

  1. Transform the problem to an equivalent problem with a known algorithm for solution.

  2. Solve the equivalent problem.

  3. Transform the solution of the equivalent problem to the solution of the original problem.

In this sense, the computational complexity of solving an optimization problem cannot be greater than the complexity of solving the equivalent problem plus the complexity of transforming the solution of the equivalent problem to the solution of the original problem.

There are several ways to obtain an equivalent optimization problem from an existing optimization problem.

8.1.2.1. Scaling#

Proposition 8.1 (Equivalent problems by scaling of objective or constraint functions)

Consider the problem given by

$$
\begin{aligned}
& \text{minimize }   && \tilde{f}_0(x) = \alpha_0 f_0(x) \\
& \text{subject to } && \tilde{f}_i(x) = \alpha_i f_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && \tilde{h}_j(x) = \beta_j h_j(x) = 0, \quad j=1,\dots,p
\end{aligned}
$$

where $\alpha_i > 0$ for $i=0,\dots,m$ and $\beta_j \neq 0$ for $j=1,\dots,p$.

This problem is obtained from the optimization problem in standard form in (8.1) by scaling the objective function and the inequality constraint functions by positive constants and the equality constraint functions by nonzero constants.

  1. $\alpha_0$ cannot be negative, otherwise it would convert a minimization problem into a maximization problem.

  2. $\alpha_0$ cannot be zero, as that would convert the optimization problem into a feasibility problem.

  3. $\alpha_i$ for $i=1,\dots,m$ cannot be negative, as that would turn the inequality sign from $f_i(x) \leq 0$ to $\tilde{f}_i(x) \geq 0$.

  4. $\alpha_i$ for $i=1,\dots,m$ cannot be zero, as that would remove the corresponding constraint altogether.

  5. $\beta_j$ cannot be zero. A zero value would discard the corresponding equality constraint.

  6. The feasible sets for both problems are identical.

  7. The optimal sets for both problems are identical.

  8. The optimal values for both problems are however not identical (unless $\alpha_0 = 1$). Since $\tilde{f}_0$ is a scaled version of $f_0$, the optimal value is also scaled accordingly.

8.1.2.2. Change of Variables#

Proposition 8.2 (Change of variables)

Let $\phi : \mathbb{V} \to \mathbb{V}$ be an injective function with $\mathbb{D} \subseteq \operatorname{range} \phi$.

For the problem (8.1), introduce the following functions

$$\tilde{f}_i(z) = f_i(\phi(z)), \; i=0,\dots,m \text{ and } \tilde{h}_j(z) = h_j(\phi(z)), \; j=1,\dots,p.$$

Now, consider the problem

$$
\begin{aligned}
& \text{minimize }   && \tilde{f}_0(z) \\
& \text{subject to } && \tilde{f}_i(z) \leq 0, \quad i=1,\dots,m\\
&                    && \tilde{h}_j(z) = 0, \quad j=1,\dots,p
\end{aligned}
\tag{8.3}
$$

with the variable $z \in \mathbb{V}$.

We say that the two problems are related by the change of variable or substitution of the variable $x = \phi(z)$.

Suppose $z^*$ is a solution to (8.3); then $x^* = \phi(z^*)$ is a solution to (8.1).

Similarly, if $x^*$ is a solution to (8.1), then $z^* = \phi^{-1}(x^*)$ is a solution to (8.3).
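As a toy illustration of this technique (the specific problem and substitution are made up for this example), the change of variable $x = \phi(z) = e^z$ turns the problem of minimizing $(\ln x - 1)^2$ over $x > 0$ into an unconstrained quadratic in $z$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Original problem: minimize f(x) = (log(x) - 1)^2 over x > 0.
# With x = phi(z) = exp(z) (injective, range = (0, inf)), the problem
# becomes the unconstrained quadratic: minimize (z - 1)^2 over z in R.
f_tilde = lambda z: (z - 1.0) ** 2

z_star = minimize_scalar(f_tilde).x
x_star = np.exp(z_star)        # map the solution back: x* = phi(z*)
print(z_star, x_star)          # z* = 1, x* = e
```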

8.1.2.3. Transformation with Monotone Functions#

Proposition 8.3 (Transformation of objective and constraint functions)

For the problem (8.1), we introduce the following:

  1. Let $\psi_0 : \mathbb{R} \to \mathbb{R}$ be a strictly increasing function over $\operatorname{range} f_0$.

  2. Let $\psi_i : \mathbb{R} \to \mathbb{R}$ for $i=1,\dots,m$ be strictly increasing functions over $\operatorname{range} f_i$ with $\psi_i(u) \leq 0 \iff u \leq 0$.

  3. Let $\eta_j : \mathbb{R} \to \mathbb{R}$ for $j=1,\dots,p$ be real functions that satisfy $\eta_j(u) = 0 \iff u = 0$.

  4. Let $\tilde{f}_i(x) = \psi_i(f_i(x))$ for $i=0,\dots,m$.

  5. Let $\tilde{h}_j(x) = \eta_j(h_j(x))$ for $j=1,\dots,p$.

Now, consider the problem

$$
\begin{aligned}
& \text{minimize }   && \tilde{f}_0(x) \\
& \text{subject to } && \tilde{f}_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && \tilde{h}_j(x) = 0, \quad j=1,\dots,p
\end{aligned}
\tag{8.4}
$$

with the variable $x \in \mathbb{V}$.

The problem (8.4) is equivalent to (8.1).

Example 8.3 (Least norms and least norms squared)

Consider the unconstrained least norm problem

$$\text{minimize } \| A x - b \|_2$$

with $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. We have $f_0(x) = \| A x - b \|_2$. Then, $\operatorname{range} f_0 = \mathbb{R}_+$. Consider $\psi_0 : \mathbb{R} \to \mathbb{R}$ given by

$$\psi_0(x) = x^2.$$

Then, $\psi_0$ is strictly increasing over $\mathbb{R}_+$. Let

$$\tilde{f}_0(x) = \psi_0(f_0(x)) = \| A x - b \|_2^2 = (A x - b)^T (A x - b).$$

Then, the least norm minimization problem is equivalent to the problem

$$\text{minimize } (A x - b)^T (A x - b)$$

which is the least norm squared problem.

We note that while $\tilde{f}_0$ is differentiable, $f_0$ is not. This is a key difference between the two problems. Solving the least norm squared problem can be done using gradient methods since the objective function is differentiable.
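A minimal NumPy sketch of this equivalence (random data, for illustration only): setting the gradient $2 A^T (A x - b)$ of the squared objective to zero gives the normal equations $A^T A x = A^T b$, whose solution matches the library least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

# Gradient of ||Ax - b||^2 is 2 A^T (A x - b); setting it to zero
# gives the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same minimizer via the library least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True
```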

8.1.2.4. Slack Variables#

Proposition 8.4 (Slack variables)

For the inequality constraints in the problem (8.1), we can introduce the variables $s_i \geq 0$ so that $f_i(x) + s_i = 0$. This way, we can convert an inequality constraint to an equality constraint and introduce simpler inequality constraints on the variables $s_i$. The resultant problem:

$$
\begin{aligned}
& \text{minimize }   && f_0(x) \\
& \text{subject to } && s_i \geq 0, \quad i=1,\dots,m\\
&                    && f_i(x) + s_i = 0, \quad i=1,\dots,m\\
&                    && h_j(x) = 0, \quad j=1,\dots,p
\end{aligned}
\tag{8.5}
$$

where the optimization variables are $x \in \mathbb{V}$ and $s \in \mathbb{R}^m$, is equivalent to (8.1).

The variables $s_i$ are known as slack variables. $s = (s_1, \dots, s_m)$ is a vector that collects all the slack variables.

  1. This form has $m$ inequality constraints and $m + p$ equality constraints.

  2. It has $n + m$ optimization variables $x_1, \dots, x_n$ and $s_1, \dots, s_m$.

  3. If $(x, s)$ is a feasible point for (8.5), then $x$ is a feasible point for (8.1).

  4. If $x$ is a feasible point for (8.1), then we can pick $s_i = -f_i(x)$ to form $s$, making $(x, s)$ a feasible point for (8.5).

  5. $x$ is an optimal point for (8.1) if and only if $(x, s)$ is an optimal point for (8.5) where $s_i = -f_i(x)$.

  6. The slack variables measure how much slack an inequality constraint has at a feasible point.

  7. If $s_i = 0$, then $f_i$ is active and it has no slack.

  8. If $s_i > 0$, then $f_i$ is inactive and it has some slack. The sketch below illustrates this construction for a small linear program.
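Here is a minimal sketch of this construction (the data is made up; `scipy.optimize.linprog` is used only as a convenient solver). The inequality form and the slack-variable form produce the same optimal value, and the reported slacks show which constraints are active:

```python
import numpy as np
from scipy.optimize import linprog

# Inequality form: minimize c^T x subject to A x <= b, x >= 0.
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])

res_ineq = linprog(c, A_ub=A, b_ub=b)  # default bounds give x >= 0

# Slack form: minimize c^T x subject to A x + s = b, x >= 0, s >= 0.
# The optimization variable becomes (x, s) in R^{2+2}.
c_aug = np.concatenate([c, np.zeros(2)])
A_eq = np.hstack([A, np.eye(2)])
res_slack = linprog(c_aug, A_eq=A_eq, b_eq=b)

print(res_ineq.fun, res_slack.fun)  # same optimal value: -5.0
print(res_slack.x[2:])              # slacks ~ 0: both constraints active
```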

8.1.2.5. Epigraph Problem Form#

Proposition 8.5 (Epigraph problem form)

The problem (8.1) is equivalent to the optimization problem below:

$$
\begin{aligned}
& \text{minimize }   && t \\
& \text{subject to } && f_0(x) - t \leq 0,\\
&                    && f_i(x) \leq 0, \quad i=1,\dots,m\\
&                    && h_j(x) = 0, \quad j=1,\dots,p
\end{aligned}
\tag{8.6}
$$

with the optimization variable $(x, t) \in \mathbb{V} \times \mathbb{R}$.

  1. Note that $t$ appears only in a single inequality in (8.6).

  2. At an optimal point, $f_0(x) = t$ must hold true. If $f_0(x) < t$, then we can always choose a lower value of $t$ as the solution.

  3. If $x$ is an optimal point for (8.1), then $(x, f_0(x))$ must be an optimal point for (8.6).

  4. Similarly, if $(x, t)$ is an optimal point for (8.6), then $x$ must be an optimal point for (8.1) and $t = f_0(x)$ must hold true.

  5. The two problems are indeed equivalent.

  6. This problem has $m + 1$ inequality constraints and $p$ equality constraints.

  7. The objective function for the epigraph form is a linear function of $(x, t)$.
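A minimal numerical sketch of the epigraph form (the objective is made up; `scipy.optimize.minimize` with its SLSQP constraint interface is used as a generic solver): we minimize the linear objective $t$ over $(x, t)$ subject to $f_0(x) - t \leq 0$:

```python
import numpy as np
from scipy.optimize import minimize

# Original problem: minimize f0(x) = (x1 - 1)^2 + x2^2.
f0 = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2

# Epigraph form: minimize t over v = (x1, x2, t) subject to f0(x) - t <= 0.
obj = lambda v: v[2]                                          # linear in (x, t)
cons = [{"type": "ineq", "fun": lambda v: v[2] - f0(v[:2])}]  # t - f0(x) >= 0

res = minimize(obj, x0=np.array([0.0, 0.0, 10.0]), constraints=cons)
print(res.x[:2], res.x[2])  # x* ~ (1, 0), t* ~ f0(x*) = 0
```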

8.1.3. First Order Conditions#

In this subsection, we focus on objective functions of type $f : \mathbb{R}^n \to \mathbb{R}$ which are differentiable.

Theorem 8.1 (First order optimality criterion for local optimal points)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Suppose that $x^* \in \operatorname{int} S$ is a local optimal point. Assume that all the partial derivatives of $f$ exist at $x^*$.
Then, $\nabla f(x^*) = 0$.

  1. Let $i \in \{1, \dots, n\}$.

  2. Consider the one-dimensional function $g_i : \mathbb{R} \to \mathbb{R}$ given by

      $$g_i(t) = f(x^* + t e_i)$$

      where $e_i$ is the $i$-th unit vector of $\mathbb{R}^n$.

  3. We note that $g_i$ is differentiable at $t = 0$ and

      $$g_i'(0) = \frac{\partial f}{\partial x_i}(x^*).$$

  4. Since $x^*$ is a local optimal point of $f$, $t = 0$ is a local optimal point of $g_i$.

  5. Thus, $g_i'(0) = 0$.

  6. Thus, $\frac{\partial f}{\partial x_i}(x^*) = 0$.

  7. Since this is true for every $i = 1, \dots, n$, we have $\nabla f(x^*) = 0$.

We mention that a vanishing gradient is a necessary condition; i.e., if a point is a local optimal point, then the gradient of $f$ at that point must be zero. It is not a sufficient condition: the gradient may be zero and yet the point may not be an optimal point.

Definition 8.6 (Stationary point)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Let $x^* \in \operatorname{int} S$ and assume that $f$ is differentiable in some neighborhood of $x^*$. Then, $x^*$ is called a stationary point if $\nabla f(x^*) = 0$.

Thus, locally optimal points (in the interior of the domain, where $f$ is differentiable) are necessarily stationary points. The sketch below checks this condition numerically.
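A minimal sketch (the function and the candidate point are made up for illustration): we verify numerically, via central differences, that the gradient vanishes at a candidate stationary point:

```python
import numpy as np

f = lambda x: (x[0] - 1.0) ** 2 + 3.0 * (x[1] + 2.0) ** 2

def numerical_gradient(f, x, h=1e-6):
    # Central-difference approximation of the gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

a = np.array([1.0, -2.0])        # candidate stationary point
print(numerical_gradient(f, a))  # ~ [0, 0]: a is a stationary point
```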

8.1.4. Second Order Conditions#

Recall the discussion on semidefinite and definite matrices in Eigen Values.

For a twice continuously differentiable function $f$, the Hessian is symmetric. The semidefiniteness or definiteness of the Hessian provides necessary and sufficient conditions for local optimality at stationary points of $f$; i.e., the points $a \in \operatorname{dom} f$ where $\nabla f(a) = 0$.

We have the following necessary conditions for local optimality.

  1. If $a$ is a local minimum point, then $\nabla^2 f(a)$ must be positive semidefinite.

  2. If $a$ is a local maximum point, then $\nabla^2 f(a)$ must be negative semidefinite.

We have the following sufficient conditions for local optimality.

  1. If $\nabla^2 f(a)$ is positive definite, then $a$ is a strict local minimum.

  2. If $\nabla^2 f(a)$ is negative definite, then $a$ is a strict local maximum.

Theorem 8.2 (Necessary second order optimality conditions)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Assume that $S$ is open. Further, assume that $f$ is twice continuously differentiable over $S$ and that $a \in S$ is a stationary point.

Then, the following hold.

  1. If $a$ is a local minimum point of $f$ over $S$, then $\nabla^2 f(a) \succeq O$.

  2. If $a$ is a local maximum point of $f$ over $S$, then $\nabla^2 f(a) \preceq O$.

Proof. Assume $a$ to be a local minimum point.

  1. Then, there exists an open ball $B(a, r) \subseteq S$ such that $f(x) \geq f(a)$ for all $x \in B(a, r)$.

  2. Let $d \in \mathbb{R}^n$ be a nonzero vector.

  3. For $0 < t < \frac{r}{\| d \|}$, we define

      $$a_t = a + t d.$$

      By definition, $a_t \in B(a, r)$.

  4. Hence, for any $t \in \left( 0, \frac{r}{\| d \|} \right)$, $f(a_t) \geq f(a)$.

  5. By the linear approximation theorem (Theorem 5.5), there exists a vector $z_t \in [a, a_t]$ such that

      $$f(a_t) - f(a) = \nabla f(a)^T (a_t - a) + \frac{1}{2} (a_t - a)^T \nabla^2 f(z_t) (a_t - a).$$

  6. Since $a$ is a stationary point, $\nabla f(a) = 0$.

  7. Also, by definition of $a_t$, $a_t - a = t d$.

  8. Thus,

      $$f(a_t) - f(a) = \frac{t^2}{2} d^T \nabla^2 f(z_t) d.$$

  9. Since $a$ is a local minimum point, $f(a_t) - f(a) \geq 0$.

  10. Thus, for any $d \in \mathbb{R}^n$ and any $0 < t < \frac{r}{\| d \|}$, we have

      $$\frac{t^2}{2} d^T \nabla^2 f(z_t) d \geq 0.$$

  11. Dividing by $\frac{t^2}{2}$ and using the continuity of the Hessian together with the fact that $z_t \to a$ as $t \to 0^+$, we obtain

      $$d^T \nabla^2 f(a) d \geq 0 \quad \forall d \in \mathbb{R}^n.$$

  12. Thus, $\nabla^2 f(a) \succeq O$.

The argument for the second statement is similar. We can apply the same argument to $-f$ and recognize that if $a$ is a local maximum point for $f$, then it is a local minimum point for $-f$.

Theorem 8.3 (Sufficient second order optimality conditions)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Assume that $S$ is open. Further, assume that $f$ is twice continuously differentiable over $S$ and that $a \in S$ is a stationary point.

Then, the following hold.

  1. If $\nabla^2 f(a) \succ O$, then $a$ is a strict local minimum point of $f$ over $S$.

  2. If $\nabla^2 f(a) \prec O$, then $a$ is a strict local maximum point of $f$ over $S$.

Proof. Let $a \in S$ be a stationary point satisfying $\nabla^2 f(a) \succ O$.

  1. Since the Hessian is continuous ($f$ is twice continuously differentiable), there exists an open ball $B(a, r)$ such that $\nabla^2 f(x) \succ O$ for every $x \in B(a, r)$.

  2. By the linear approximation theorem (Theorem 5.5), for any $x \in B(a, r)$, there exists a vector $z \in [a, x]$ such that

      $$f(x) - f(a) = \frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a),$$

      where the first order term vanishes since $\nabla f(a) = 0$.

  3. We note that $z \in B(a, r)$. Hence, $\nabla^2 f(z) \succ O$.

  4. Thus, $\frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a) > 0$ for every $x \neq a$.

  5. Thus, $f(x) - f(a) > 0$ holds true for every $x \in B(a, r)$ with $x \neq a$.

  6. Thus, $a$ is a strict local minimum point for $f$ over $S$.

The proof for the second statement is similar.

Theorem 8.4 (Hessian based sufficient global optimality condition)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $\operatorname{dom} f = \mathbb{R}^n$. Further, assume that $f$ is twice continuously differentiable over $\mathbb{R}^n$. Suppose that $\nabla^2 f(x) \succeq O$ for every $x \in \mathbb{R}^n$. Let $a \in \mathbb{R}^n$ be a stationary point of $f$. Then, $a$ is a global minimum point of $f$.

In other words, if the Hessian of a function is continuous and always positive semidefinite, then all its stationary points are also global minimum points.

Proof. We are given that $a$ is a stationary point.

  1. $\nabla f(a) = 0$.

  2. Let $x \in \mathbb{R}^n$.

  3. By the linear approximation theorem (Theorem 5.5), there exists a vector $z \in [a, x]$ such that

      $$f(x) - f(a) = \frac{1}{2} (x - a)^T \nabla^2 f(z) (x - a).$$

  4. By hypothesis, $\nabla^2 f(z) \succeq O$.

  5. Thus, $f(x) - f(a) \geq 0$ for every $x \in \mathbb{R}^n$.

  6. Thus, $a$ is a global minimum point of $f$.

This result is a special case of the global optimality of convex optimization problems. If a convex function is twice differentiable, then its Hessian is positive semidefinite (Theorem 9.100).

8.1.4.1. Saddle Points#

Definition 8.7 (Saddle point)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Assume that $S$ is open.

A stationary point $a \in S$ is called a saddle point of $f$ over $S$ if it is neither a local minimum point nor a local maximum point of $f$ over $S$.

Theorem 8.5 (Sufficient condition for saddle points)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Assume that $S$ is open. Further, assume that $f$ is twice continuously differentiable over $S$ and that $a \in S$ is a stationary point.

If $\nabla^2 f(a)$ is an indefinite matrix, then $a$ is a saddle point of $f$ over $S$.

Proof. We are given that $a$ is a stationary point and $\nabla^2 f(a)$ is indefinite.

  1. Then, $\nabla f(a) = 0$.

  2. Recall from Theorem 4.133 that a matrix is indefinite if and only if at least one eigenvalue is positive and at least one eigenvalue is negative.

  3. Let $\lambda > 0$ be a positive eigenvalue of $\nabla^2 f(a)$ and $v$ be the corresponding normalized eigenvector.

  4. Since $S$ is open, there exists $r > 0$ such that $B(a, r) \subseteq S$.

  5. Accordingly, $a + t v \in B(a, r) \subseteq S$ for every $t \in [0, r)$.

  6. By the quadratic approximation theorem (Theorem 5.6), and using $\nabla f(a) = 0$,

      $$f(a + t v) = f(a) + \frac{t^2}{2} v^T \nabla^2 f(a) v + o(t^2 \| v \|^2).$$

  7. Putting $\nabla^2 f(a) v = \lambda v$ and using $\| v \| = 1$,

      $$f(a + t v) = f(a) + \frac{\lambda t^2}{2} + o(t^2).$$

  8. Here $o : \mathbb{R}_{++} \to \mathbb{R}$ is a function satisfying $\frac{o(x)}{x} \to 0$ as $x \to 0^+$.

  9. Thus, $\frac{o(t^2)}{t^2} \to 0$ as $t \to 0^+$.

  10. For every $\epsilon > 0$, there exists $\delta > 0$ such that

      $$\left| \frac{o(t^2)}{t^2} \right| < \epsilon, \text{ whenever } 0 < t < \delta.$$

  11. Choose $\epsilon = \frac{\lambda}{2}$.

  12. Then, $-\frac{\lambda t^2}{2} < o(t^2) < \frac{\lambda t^2}{2}$ for every $0 < t < \delta$.

  13. Thus, $\frac{\lambda t^2}{2} + o(t^2) > 0$ for every $0 < t < \delta$.

  14. Thus, with $r_1 = \min(\delta, r)$, we have $f(a + t v) > f(a)$ for every $0 < t < r_1$.

  15. Thus, $a$ is not a local maximum point.

  16. A similar argument with a negative eigenvalue and corresponding normalized eigenvector shows that $a$ cannot be a local minimum point.

  17. Thus, $a$ must be a saddle point.
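The Hessian tests of this section are easy to mechanize. A minimal sketch (the classifier function is hypothetical, written for this example): given the Hessian at a stationary point, its eigenvalues decide between strict local minimum, strict local maximum, saddle point, or the inconclusive semidefinite case:

```python
import numpy as np

def classify_stationary_point(hessian):
    # Classify a stationary point from the eigenvalues of its Hessian.
    eigs = np.linalg.eigvalsh(hessian)   # Hessian is symmetric
    if np.all(eigs > 0):
        return "strict local minimum"
    if np.all(eigs < 0):
        return "strict local maximum"
    if eigs.min() < 0 < eigs.max():
        return "saddle point"
    return "inconclusive (semidefinite Hessian)"

# f(x, y) = x^2 - y^2 has a stationary point at the origin with
# Hessian diag(2, -2), which is indefinite.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(classify_stationary_point(H))  # saddle point
```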

8.1.5. Minimization of Proper Functions#

Let $\mathbb{V}$ be a real $n$-dimensional normed linear space. Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper function with $S = \operatorname{dom} f$. We consider the problem of minimizing $f$ over a set $A \subseteq S$.

  1. $f$ is the objective function or cost function (being minimized).

  2. The set $A$ is the feasible set or constraint set.

  3. Any $x \in A$ is a feasible solution or feasible point.

  4. If there is at least one feasible point (i.e., $A$ is nonempty), the problem is feasible.

  5. Otherwise, the problem is infeasible.

  6. Let $p^* = \inf_{x \in A} f(x)$ be the optimal value of the minimization problem.

  7. We allow $p^*$ to take values over the extended real line $\overline{\mathbb{R}}$.

  8. If the minimization problem is infeasible, then $p^* = \infty$.

  9. If $p^* = -\infty$, then the problem is unbounded below.

  10. If there is some $x^* \in A$ such that $f(x^*) = p^*$, then $x^*$ is an optimal solution or optimal point or minimizing point or minimizer or global minimum over $A$.

  11. Alternatively, $f$ attains a minimum over $A$ at $x^*$. We write this as

      $$x^* \in \operatorname{argmin}_{x \in A} f(x).$$

  12. If $x^*$ is a unique minimizer of $f$ over $A$, then we abuse the notation and write

      $$x^* = \operatorname{argmin}_{x \in A} f(x).$$

  13. It is possible that $p^*$ is finite and yet there is no optimal point in $A$.

  14. In other words, the set $\operatorname{argmin}_{x \in A} f(x)$ may be empty.
A basic question of optimization is whether an optimal solution exists.

Remark 8.2 (The set of optimal points)

The set of optimal points for the problem of minimization of a proper function $f$ over a feasible set $A$ is given by

$$\operatorname{argmin}_{x \in A} f(x) = A \cap f^{-1}(p^*)$$

where $p^* = \inf_{x \in A} f(x)$ and $f^{-1}(y)$ denotes the level set of $f$ given by $\{ x \in S \,|\, f(x) = y \}$.

This comes directly from the fact that

  1. $x^*$ must be feasible. Hence $x^* \in A$.

  2. $f$ must attain the optimal value at $x^*$. Thus, $p^* = f(x^*)$. Hence, $x^* \in f^{-1}(p^*)$.

In other words, the optimal set is the intersection of the feasible set $A$ with the level set $f^{-1}(p^*)$.

Thus, for an optimal solution to exist,

  1. $p^*$ must be finite.

  2. The level set $f^{-1}(p^*)$ must be nonempty.

  3. The feasible set $A$ must be nonempty.

  4. The intersection of $A$ with $f^{-1}(p^*)$ must be nonempty.

8.1.5.1. Coercive Functions#

Definition 8.8 (Coercive function)

A proper function $f : \mathbb{V} \to (-\infty, \infty]$ is called coercive over a set $A$ if for every sequence $\{ x_k \}$ of $A$ such that $\lim_{k \to \infty} \| x_k \| = \infty$, we have $\lim_{k \to \infty} f(x_k) = \infty$.

If $f$ is coercive over the entire vector space $\mathbb{V}$, we say that $f$ is coercive.

One way to think about coercive functions is that they grow rapidly at the extremes of the domain on which they are defined.
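A small numerical illustration of this behavior (functions chosen for this example): $f(x) = \| x \|^2$ is coercive, while $g(x) = e^{x_1}$ is not, since $g$ stays bounded along the sequence $x_k = (-k, 0)$ even though $\| x_k \| \to \infty$:

```python
import numpy as np

f = lambda x: float(x @ x)          # ||x||^2, coercive
g = lambda x: float(np.exp(x[0]))   # e^{x_1}, not coercive

for k in [1, 10, 100, 1000]:
    xk = np.array([-float(k), 0.0])  # ||xk|| -> infinity
    print(k, f(xk), g(xk))           # f(xk) -> infinity, g(xk) -> 0
```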

Theorem 8.6 (Level sets of coercive functions)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a coercive proper function. Let $a \in \mathbb{R}$. If the level set $f^{-1}(a)$ is nonempty, then it is bounded.

In other words, all nonempty level sets of a coercive function are bounded.

Proof. For contradiction, assume that there exists $a \in \mathbb{R}$ such that $A = f^{-1}(a)$ is unbounded.

  1. Then, there exists a sequence $\{ x_k \}$ of $A$ such that $\lim_{k \to \infty} \| x_k \| = \infty$.

  2. Since $f$ is coercive, $\lim_{k \to \infty} f(x_k) = \infty$.

  3. But, by definition, $f(x_k) = a$ for every $k$.

  4. Hence $\lim_{k \to \infty} f(x_k) = a$.

  5. We have a contradiction.

  6. Hence, $f^{-1}(a)$ must be bounded.

Theorem 8.7 (Sublevel sets of coercive functions)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a coercive proper function with $S = \operatorname{dom} f$. For every $r \in \mathbb{R}$, define the sublevel set $S_r$ as $\{ x \in S \,|\, f(x) \leq r \}$. If $S_r$ is nonempty, then it is bounded.

In other words, all nonempty sublevel sets of a coercive function are bounded.

Proof. For contradiction, assume that there exists $r \in \mathbb{R}$ such that $S_r$ is unbounded.

  1. Then, there exists a sequence $\{ x_k \}$ of $S_r$ such that $\lim_{k \to \infty} \| x_k \| = \infty$.

  2. Since $f$ is coercive, $\lim_{k \to \infty} f(x_k) = \infty$.

  3. But, by definition, $f(x_k) \leq r$ for every $k$.

  4. Hence $\{ f(x_k) \}$ cannot diverge to $\infty$.

  5. We have a contradiction.

  6. Hence, $S_r$ must be bounded.

8.1.5.2. Weierstrass’ Theorem#

In this subsection, we examine the problem of unconstrained minimization of a proper closed function. Recall from Definition 3.64 that a function is called closed if all its sublevel sets are closed.

Theorem 8.8 (Unconstrained minimization of a proper closed function)

Let $\mathbb{V}$ be a real $n$-dimensional normed linear space. Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper closed function with $S = \operatorname{dom} f$.

Let $p^* = \inf_{x \in \mathbb{V}} f(x)$. Let $X = f^{-1}(p^*)$ be the set of minimizers of $f$ (over all of $\mathbb{V}$). For every $p \in \mathbb{R}$, let $S_p$ denote the sublevel set $\{ x \,|\, f(x) \leq p \}$. Then,

  1. $p^* < \infty$.

  2. $X = \bigcap_{p > p^*} S_p$.

  3. $X$ is closed.

In words, the minimal value $p^*$ is either finite or $-\infty$. The set of minimizers is the intersection of all the (closed) sublevel sets $S_p$ for $p > p^*$. The set of minimizers is a closed set.

Proof. We show that $p^* < \infty$.

  1. Since $f$ is proper, $S$ is nonempty.

  2. Thus, there exists $x \in S$ such that $f(x) < \infty$.

  3. Hence, $p^* = \inf_{x \in \mathbb{V}} f(x) \leq f(x) < \infty$.

Let $Y = \bigcap_{p > p^*} S_p$. We have to show that $X = Y$.

We first show that $X \subseteq Y$.

  1. Let $x \in X$.

  2. Then, $f(x) = p^*$.

  3. Then, $f(x) < p$ for every $p > p^*$.

  4. Hence, $x \in S_p$ for every $p > p^*$.

  5. Thus, $x \in Y$.

  6. Thus, $X \subseteq Y$.

We now show that $Y \subseteq X$.

  1. Let $x \in Y$.

  2. We claim that $f(x) = p^*$ must hold true.

  3. $f(x)$ cannot be smaller than $p^*$ since by definition $p^* = \inf_{x \in \mathbb{V}} f(x)$.

  4. Thus, $f(x) \geq p^*$ must hold true.

  5. Now, for contradiction, assume that $f(x) > p^*$.

     1. Let $q = f(x)$.

     2. Let $p = \frac{p^* + q}{2}$.

     3. Then, $p^* < p < q$.

     4. Hence, $x \notin S_p$.

     5. But then, $x \notin Y$ since $Y \subseteq S_p$ as $p > p^*$.

     6. A contradiction.

  6. Thus, the only allowed value for $f(x)$ is $p^*$.

  7. Thus, $x \in X$.

  8. Thus, $Y \subseteq X$.

  9. Thus,

      $$X = f^{-1}(p^*) = Y = \bigcap_{p > p^*} S_p.$$

We show that $X$ is closed.

  1. If $X$ is empty, then it is closed by definition.

  2. Otherwise, $X$ is an intersection of sublevel sets.

  3. Since $f$ is closed, its sublevel sets are closed.

  4. Hence $S_p$ is closed for every $p$.

  5. Thus, $X$ is an intersection of closed sets.

  6. Thus, $X$ is closed.

Theorem 8.9 (Weierstrass’ theorem)

Let $\mathbb{V}$ be a real $n$-dimensional normed linear space. Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper closed function with $S = \operatorname{dom} f$.

Let $p^* = \inf_{x \in \mathbb{V}} f(x)$. Let $X = f^{-1}(p^*)$ be the set of minimizers of $f$ (over all of $\mathbb{V}$). For every $p \in \mathbb{R}$, let $S_p$ denote the sublevel set $\{ x \,|\, f(x) \leq p \}$.

Assume that one of the following conditions is true.

  1. $S = \operatorname{dom} f$ is closed and bounded.

  2. There exists a scalar $r \in \mathbb{R}$ such that the sublevel set $S_r = \{ x \,|\, f(x) \leq r \}$ is nonempty and bounded.

  3. $f$ is coercive.

Then, $X$ (the set of minimizers of $f$) is nonempty and compact.

Here is this result in a simplified language. The theorem addresses the problem of unconstrained minimization of a function.

  1. If $f$ is proper and closed and its domain is closed and bounded, then its set of minimizers is nonempty and compact.

  2. If $f$ is proper and closed and some nonempty sublevel set of $f$ is bounded, then its set of minimizers is nonempty and compact.

  3. If $f$ is proper, closed and coercive, then its set of minimizers is nonempty and compact.

Since the set of minimizers is nonempty and compact, it is nonempty, closed and bounded.

Proof. If there exists $x \in S$ such that $f(x) = p^*$, then $X$ is nonempty. To show that $X$ is compact, we just need to show that it is closed and bounded. By Theorem 4.66, every closed and bounded subset of $\mathbb{V}$ (a real $n$-dimensional normed linear space) is compact. By Theorem 8.8, the optimal value $p^* < \infty$ and $X$, the set of minimizers of a proper and closed function, is closed. Thus, we just need to show that $X$ is nonempty and bounded.

Assume that condition (1) holds.

  1. Since $f$ is proper, $S$ is nonempty.

  2. Consider a sequence $\{ x_k \}$ of $S$ such that $\lim_{k \to \infty} f(x_k) = p^*$.

     1. Such a sequence exists since $p^* = \inf_{x \in \mathbb{V}} f(x)$.

     2. Specifically, for every $k \in \mathbb{N}$, there exists $x_k \in S$ such that $f(x_k) - p^* < \frac{1}{k}$.

  3. Since $S$ is bounded, $\{ x_k \}$ has a convergent subsequence by the Bolzano-Weierstrass theorem (Theorem 4.69).

  4. Let $\{ x_l \}$ be the convergent subsequence of $\{ x_k \}$ with $\lim x_l = x^*$.

  5. Since $S$ is closed, $x^* \in S$.

  6. Since $\{ f(x_k) \}$ is convergent and $\{ f(x_l) \}$ is a subsequence of $\{ f(x_k) \}$,

      $$\lim_{l \to \infty} f(x_l) = \lim_{k \to \infty} f(x_k) = p^*.$$

  7. Since $f$ is closed, it is lower semicontinuous (l.s.c.) at $x^* \in S$ (see Theorem 3.119).

  8. Also, note that since $\{ f(x_k) \}$ is convergent,

      $$\liminf_{l \to \infty} f(x_l) = \lim_{l \to \infty} f(x_l).$$

  9. Since $f$ is l.s.c. at $x^* \in S$, by Theorem 3.108,

      $$f(x^*) \leq \liminf_{l \to \infty} f(x_l) = \lim_{l \to \infty} f(x_l) = p^*.$$

  10. Since $p^*$ is the infimum value of $f$, $f(x^*)$ cannot be smaller than $p^*$.

  11. Thus, $f(x^*) = p^*$ must be true.

  12. Since $f$ is proper and $x^* \in S$, $p^* > -\infty$. Thus, the optimal value is finite.

  13. Thus, $f(x^*) = p^*$ means that $x^*$ is an optimal solution for the minimization of $f$.

  14. Thus, the set $X = f^{-1}(p^*)$ is nonempty.

  15. $X$ is bounded since $X \subseteq S$ and $S$ is bounded by hypothesis.

  16. Since $X$ is closed and bounded, $X$ is compact.

Assume that condition (2) holds.

  1. The sublevel set $S_r$ is nonempty and bounded.

  2. Since $f$ is closed, $S_r$ is also closed.

  3. Consider the restriction of $f$ to $S_r$ given by

      $$\tilde{f}(x) = \begin{cases} f(x), & f(x) \leq r\\ \infty, & \text{ otherwise}. \end{cases}$$

  4. Then, $\operatorname{dom} \tilde{f} = S_r$. Thus, $\operatorname{dom} \tilde{f}$ is nonempty, closed and bounded.

  5. Further, $\tilde{f}$ never takes the value $-\infty$ and is not identically $\infty$. Hence, $\tilde{f}$ is a proper function.

  6. Since $f$ is closed, the sublevel sets of $\tilde{f}$ are also closed.

     1. For any $p > r$, the sublevel set of $\tilde{f}$ is identical to $S_r$.

     2. For any $p \leq r$, the sublevel set of $\tilde{f}$ is identical to $S_p$, the corresponding sublevel set of $f$.

  7. Thus, $\tilde{f}$ is closed too.

  8. Applying condition (1) to $\tilde{f}$, the set of minimizers of $\tilde{f}$ is nonempty and compact.

  9. We also note that the minimizers of $f$ are identical to the minimizers of $\tilde{f}$.

  10. Thus, the set of minimizers of $f$ is nonempty and compact.

Assume that condition (3) holds.

  1. Since $f$ is proper, it has some nonempty sublevel sets.

  2. Let $r \in \mathbb{R}$ be a scalar such that the sublevel set $S_r = \{ x \in S \,|\, f(x) \leq r \}$ is nonempty.

  3. By Theorem 8.7, the nonempty sublevel sets of $f$ are bounded. Hence, $S_r$ is bounded.

  4. Then, by applying condition (2), the set of minimizers of $f$ is nonempty and compact.
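A small illustration of the theorem (the function is made up; any bounded-minimization routine would do): $f(x) = x^4 - 3 x^2$ is continuous, hence closed, and coercive, so its set of minimizers is guaranteed to be nonempty, and a numerical solver indeed finds a minimizer:

```python
from scipy.optimize import minimize_scalar

# f(x) = x^4 - 3 x^2 is continuous (hence closed) and coercive on R.
f = lambda x: x ** 4 - 3.0 * x ** 2

res = minimize_scalar(f, bounds=(-10.0, 10.0), method="bounded")
print(res.x, res.fun)  # finds one of the two minimizers x = ±sqrt(3/2)
```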

Corollary 8.1 (Minimizing a proper closed function over a closed set)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper closed function with $S = \operatorname{dom} f$. Let $A \subseteq S$ be a nonempty closed set. Consider the problem of minimizing $f$ over $A$.

Further, assume that one of the following conditions is true.

  1. $A$ is bounded.

  2. There exists a scalar $r \in \mathbb{R}$ such that the set $\{ x \in A \,|\, f(x) \leq r \}$ is nonempty and bounded.

  3. $f$ is coercive over $A$.

Then, the set of minimizers of $f$ over $A$ is nonempty and compact.

Proof. We define a restriction $g : \mathbb{V} \to (-\infty, \infty]$ of $f$ over the set $A$ as follows:

$$g(x) = \begin{cases} f(x), & x \in A\\ \infty, & \text{ otherwise}. \end{cases}$$

  1. $\operatorname{dom} g = A$. Thus, $\operatorname{dom} g$ is closed.

  2. Since $f$ never takes the value $-\infty$, $g$ also never takes the value $-\infty$.

  3. Since $A$ is nonempty, there exists $x \in \mathbb{V}$ such that $g(x) < \infty$.

  4. Hence, $g$ is a proper function.

  5. Also, note that the set $\{ x \in A \,|\, f(x) \leq r \}$ is nothing but the sublevel set of $g$ for the scalar $r$:

      $$\{ x \,|\, g(x) \leq r \} = A \cap \{ x \,|\, f(x) \leq r \}.$$

  6. Since $f$ is closed, all its sublevel sets are closed.

  7. $A$ is closed, hence $A \cap \{ x \,|\, f(x) \leq r \}$ is also closed.

  8. Hence, all the sublevel sets of $g$ are also closed. Hence, $g$ is also closed.

  9. The set of minimizers of $f$ over $A$ is nothing but the set of minimizers of $g$.

  10. If $f$ is coercive over $A$, then $g$ is coercive (over its entire domain).

Thus, applying Theorem 8.9, the set of minimizers of $g$ is nonempty and compact. So is the set of minimizers of $f$ over $A$.

Corollary 8.2 (Minimizing a real valued l.s.c. function over a closed set)

Let $f : \mathbb{V} \to \mathbb{R}$ be a real valued function with $\operatorname{dom} f = \mathbb{V}$. Let $A \subseteq \mathbb{V}$ be a nonempty closed set. Assume that $f$ is lower semicontinuous over $A$. Consider the problem of minimizing $f$ over $A$.

Further, assume that one of the following conditions is true.

  1. $A$ is bounded.

  2. There exists a scalar $r \in \mathbb{R}$ such that the set $\{ x \in A \,|\, f(x) \leq r \}$ is nonempty and bounded.

  3. $f$ is coercive over $A$.

Then, the set of minimizers of $f$ over $A$ is nonempty and compact.

Proof. We define a restriction $g : \mathbb{V} \to (-\infty, \infty]$ of $f$ over the set $A$ as follows:

$$g(x) = \begin{cases} f(x), & x \in A\\ \infty, & \text{ otherwise}. \end{cases}$$

  1. $\operatorname{dom} g = A$. Thus, $\operatorname{dom} g$ is closed.

  2. Since $f$ is real valued, $g$ never takes the value $-\infty$.

  3. Since $A$ is nonempty, there exists $x \in \mathbb{V}$ such that $g(x) < \infty$.

  4. Hence, $g$ is a proper function.

  5. Since $f$ is l.s.c. on $A$ and $A$ is closed, $g$ is closed.

  6. Also, note that the set $\{ x \in A \,|\, f(x) \leq r \}$ is nothing but the sublevel set of $g$ for the scalar $r$.

  7. The set of minimizers of $f$ over $A$ is nothing but the set of minimizers of $g$.

  8. If $f$ is coercive over $A$, then $g$ is coercive (over its entire domain).

Thus, applying Theorem 8.9, the set of minimizers of $g$ is nonempty and compact. So is the set of minimizers of $f$ over $A$.

8.1.6. Some Simple Problems and Technical Results#

In this subsection, we provide some technical results for simple optimization problems. These results are useful building blocks for larger problems.

Theorem 8.10 (Nonnegativity of affine functional on nonnegative orthant)

Consider the affine function $f : \mathbb{R}^n \to \mathbb{R}$ given by

$$f(x) = a^T x + b.$$

Then, $f(x) \geq 0$ for every $x \geq 0$ if and only if $a \geq 0$ and $b \geq 0$ (the vector inequalities are componentwise).

Proof. Assume that $a \geq 0$ and $b \geq 0$. Then for every $x \geq 0$, $f(x)$ is a sum of products of nonnegative terms plus a nonnegative constant. Hence, $f(x) \geq 0$.

For the converse, assume that $f(x) \geq 0$ for every $x \geq 0$.

  1. If $b < 0$, then for $x = 0$, $f(x) = b < 0$. Hence, $b$ cannot be smaller than $0$.

  2. Now assume that $a_i < 0$ for some $i \in \{1, \dots, n\}$.

  3. Let $x$ be such that $x_j = 0$ for all $j \neq i$ and $x_i = t$ for some $t > 0$.

  4. Then, $f(x) = t a_i + b$.

  5. By increasing $t$ suitably, we can get $f(x) < 0$.

  6. Thus, $a \geq 0$ is also necessary to ensure that $f(x) \geq 0$ for all $x \geq 0$.

Theorem 8.11 (Minimization of linear functional over unit ball)

For any $a \in \mathbb{R}^n$, the optimal value of the problem

$$\inf_{\| x \| \leq 1} a^T x$$

is $-\| a \|$.

Proof. If $a = 0$, then $a^T x = 0$ for every $x$. The minimum value is $0$, which is equal to $-\| a \|$.

Now consider the case where $a \neq 0$.

  1. By the Cauchy-Schwarz inequality,

      $$a^T x \geq -\| a \| \| x \| \geq -\| a \|$$

      for any $\| x \| \leq 1$.

  2. Thus, we have established a lower bound:

      $$\inf_{\| x \| \leq 1} a^T x \geq -\| a \|.$$

  3. We now show that the lower bound is indeed attained.

  4. Let $x = -\frac{a}{\| a \|}$.

  5. Then $\| x \| = 1$. Thus, $x$ belongs to the unit ball.

  6. Now, $a^T x = -\| a \|$. Thus, the lower bound is attained.
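A quick numerical check of this result (random data for illustration): the minimizer $x = -a / \| a \|$ achieves $-\| a \|$, and no sampled point of the unit ball does better:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(4)

# Claimed minimizer over the unit ball: x* = -a / ||a||, with value -||a||.
x_star = -a / np.linalg.norm(a)
print(a @ x_star, -np.linalg.norm(a))  # equal

# No sampled point of the unit ball beats the bound -||a||.
pts = rng.standard_normal((10000, 4))
pts /= np.maximum(np.linalg.norm(pts, axis=1, keepdims=True), 1.0)
print((pts @ a).min() >= a @ x_star)   # True
```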