10.6. Linear Constraints#

This section focuses on optimization problems where the constraints are linear (either equality or inequality constraints). We present the optimality conditions in the form of special cases of KKT conditions.

In the sequel, we present optimality conditions for the following optimization problems:

  1. Necessary optimality conditions for the minimization of a smooth cost function with linear inequality constraints.

  2. Necessary and sufficient optimality conditions for the minimization of a convex and smooth cost function with linear inequality constraints.

Before proceeding with the optimality conditions, we present a few results which belong to a class of results called “theorems of the alternative”.

10.6.1. Farkas’ Lemma#

Farkas' lemma is a solvability theorem for a finite system of linear inequalities. Standard versions of Farkas' lemma pair two systems of linear equations and inequalities such that exactly one of the two systems is feasible.

Theorem 10.53 (Farkas’ lemma)

Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then, exactly one of the following two statements is true.

  1. There exists $x \in \mathbb{R}^n$ such that $Ax = b$, $x \geq 0$.

  2. There exists $y \in \mathbb{R}^m$ such that $A^T y \geq 0$, $y^T b < 0$.

(1) and (2) are called strong alternatives since exactly one of them must be true. In contrast, weak alternatives are pairs of statements of which at most one can be true.

Proof. We first show that both (1) and (2) cannot be true simultaneously.

  1. Assume that both (1) and (2) are true.

  2. Note that

    $$y^T A x = y^T (A x) = y^T b < 0$$

    since $Ax = b$ by (1) and $y^T b < 0$ by (2).

  3. At the same time,

    $$y^T A x = (y^T A) x = (A^T y)^T x \geq 0$$

    since by (2) $A^T y \geq 0$ and by (1) $x \geq 0$.

  4. Thus, we have a contradiction.

  5. Hence (1) and (2) cannot be true simultaneously.

We now show that if (1) doesn't hold then (2) must be true. In other words, not (1) $\implies$ (2).

  1. Let $a_1, \dots, a_n$ be the columns of $A$.

  2. Note that the columns of $A$ belong to $\mathbb{R}^m$.

  3. Then, (1) can be interpreted as saying that $b$ is a conic combination of the columns of $A$.

  4. Let $Q$ denote the cone generated by the columns of $A$:

    $$Q = \text{cone} \{ a_1, \dots, a_n \}.$$
  5. $Q$ is a nonempty, closed, and convex cone. The closedness of $Q$ is due to Theorem 9.131 since $Q$ is the conic hull of a finite set of points.

  6. Since (1) doesn't hold true, hence $b \notin Q$.

  7. By the strict separation theorem (Theorem 9.169), the set $Q$ and the point $b$ can be strictly separated by a hyperplane.

  8. Specifically, there exist $p \in \mathbb{R}^m$ and $\beta \in \mathbb{R}$ such that

    $$p^T b > \beta \text{ and } p^T x \leq \beta \quad \forall x \in Q.$$

  9. Since $0 \in Q$, hence $\beta \geq 0$.

  10. Pick any $i \in \{1, \dots, n\}$.

  11. Then, $t a_i \in Q$ for all $t > 0$ since $Q$ is a cone containing $a_i$.

  12. Thus, $p^T (t a_i) \leq \beta$ for all $t > 0$.

  13. Thus, $p^T a_i \leq \frac{\beta}{t}$ for all $t > 0$.

  14. Since $\beta \geq 0$, taking the limit $t \to \infty$, we get $p^T a_i \leq 0$.

  15. This holds true for every $i = 1, \dots, n$.

  16. Choose $y = -p$.

  17. Then, $y^T b = -(p^T b) < -\beta \leq 0$.

  18. Thus, $y^T b < 0$.

  19. Also, $y^T a_i \geq 0$ for all $i = 1, \dots, n$.

  20. Thus, $A^T y \geq 0$ and $y^T b < 0$ as desired in (2).

Showing that if (2) doesn’t hold true, then (1) must be true is straightforward.

  1. Assume that (2) is false.

  2. For contradiction, assume that (1) is false.

  3. Then (2) must be true by the previous argument.

  4. We have a contradiction.

  5. Hence, (1) must be true.
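For small instances, the lemma can also be verified computationally. The following sketch, assuming NumPy and SciPy are available, tests the feasibility of each system with `scipy.optimize.linprog`; the instance $(A, b)$ is illustrative, and exactly one of the two reported flags should be true.

```python
# Numerical sanity check of Farkas' lemma on a small instance.
# Assumes NumPy and SciPy; the matrix A and vector b are illustrative.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])         # A in R^{3 x 2}
b = np.array([1.0, 1.0, 2.0])      # b in R^3; here b = A @ (1, 1)

m, n = A.shape

# System (1): Ax = b, x >= 0.  Feasibility LP with a zero objective.
res1 = linprog(c=np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)

# System (2): A^T y >= 0, b^T y < 0.  Minimize b^T y subject to
# A^T y >= 0 and the normalization b^T y >= -1; system (2) is
# feasible iff the optimal value is negative.
A_ub = np.vstack([-A.T, -b.reshape(1, -1)])
b_ub = np.concatenate([np.zeros(n), [1.0]])
res2 = linprog(c=b, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * m)

print("system (1) feasible:", res1.status == 0)
print("system (2) feasible:", res2.status == 0 and res2.fun < -1e-9)
# Exactly one of the two lines should print True.
```

The normalization $b^T y \geq -1$ is needed because the solutions of system (2) form a cone: without it, the second LP would be unbounded whenever (2) is feasible.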

Theorem 10.54 (Farkas’ lemma version 2)

Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then, exactly one of the following two statements is true.

  1. There exists $x \in \mathbb{R}^n$ such that $Ax \leq b$.

  2. There exists $y \in \mathbb{R}^m$ such that $A^T y = 0$, $y^T b < 0$, $y \geq 0$.

(2) is also equivalent to the following statement:

  3. There exists $y \in \mathbb{R}^m$ such that $A^T y = 0$, $y^T b = -1$, $y \geq 0$.

Proof. We first show that (2) and (3) are equivalent.

  1. Clearly, (3) $\implies$ (2).

  2. Assume (2) is true.

  3. Let $y^T b = s$. Let $r = -\frac{1}{s}$. Then, $r > 0$ since $s < 0$.

  4. Let $\tilde{y} = r y$.

  5. Then, $\tilde{y} \geq 0$ since $y \geq 0$ and $r > 0$.

  6. Also, $\tilde{y}^T b = r y^T b = -\frac{1}{s} s = -1$.

  7. Finally, $A^T \tilde{y} = r A^T y = 0$.

  8. Thus, $\tilde{y}$ satisfies (3).

Next, we show that (1) and (2) cannot both be true simultaneously.

  1. For contradiction, assume that both (1) and (2) are true.

  2. Then

    $$y^T A x = y^T (A x) \leq y^T b < 0$$

    since $Ax \leq b$ from (1), $y \geq 0$ from (2), and $y^T b < 0$ from (2).

  3. Also,

    $$y^T A x = (y^T A) x = (A^T y)^T x = 0^T x = 0$$

    since $A^T y = 0$ from (2).

  4. We have a contradiction.

  5. Thus, both (1) and (2) cannot be true simultaneously.

We next prove that if (2) is false then (1) must be true.

  1. Since (2) and (3) are equivalent, hence (3) is false too.

  2. Combine the system of equations $A^T y = 0$ and $y^T b = -1$ as $\tilde{A} y = \tilde{b}$ where

    $$\tilde{A} = \begin{bmatrix} A^T \\ b^T \end{bmatrix} \text{ and } \tilde{b} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ -1 \end{bmatrix}.$$
  3. Since (3) is false, hence there doesn't exist $y \in \mathbb{R}^m$ such that $\tilde{A} y = \tilde{b}$, $y \geq 0$.

  4. This says that statement (1) of the original Farkas' lemma in Theorem 10.53 (applied to $\tilde{A}$ and $\tilde{b}$) is false.

  5. Hence statement (2) of Theorem 10.53 must hold true.

  6. Thus, there exists $v \in \mathbb{R}^{n+1}$ such that $\tilde{A}^T v \geq 0$ and $\tilde{b}^T v < 0$.

  7. Set

    $$v = \begin{bmatrix} x \\ t \end{bmatrix}$$

    where $x \in \mathbb{R}^n$ and $t \in \mathbb{R}$.

  8. Then, $\tilde{b}^T v = -t < 0$ implies that $t > 0$.

  9. $\tilde{A}^T v \geq 0$ simplifies to

    $$A x + t b \geq 0.$$

  10. This further simplifies to

    $$A x + t b \geq 0 \iff A x \geq -t b \iff A \left( -\frac{1}{t} x \right) \leq b.$$

    The inequality sign changes since $t > 0$.

  11. Clearly, $-\frac{1}{t} x$ satisfies (1) as desired. Thus, (1) is indeed true.
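As a quick illustration, the sketch below (assuming NumPy and SciPy; the instance is illustrative) encodes an infeasible system $Ax \leq b$, namely $x \leq 0$ together with $x \geq 1$, and recovers a certificate $y$ in the normalized form (3).

```python
# Certificate of infeasibility for Ax <= b (Farkas' lemma version 2).
# Assumes NumPy and SciPy; the instance encodes x <= 0 and x >= 1.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0], [-1.0]])   # constraints: x <= 0 and -x <= -1
b = np.array([0.0, -1.0])
m, n = A.shape

# System (1): does Ax <= b have a solution?
res1 = linprog(c=np.zeros(n), A_ub=A, b_ub=b, bounds=[(None, None)] * n)

# System (3): y >= 0, A^T y = 0, b^T y = -1 (normalized form of (2)).
A_eq = np.vstack([A.T, b.reshape(1, -1)])
b_eq = np.concatenate([np.zeros(n), [-1.0]])
res3 = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)

print("Ax <= b feasible:", res1.status == 0)                   # False
print("certificate y:", res3.x if res3.status == 0 else None)  # ~ [1. 1.]
```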

Theorem 10.55 (Farkas’ lemma version 3)

Let $A \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^n$.

Then the following statements are equivalent.

  1. The implication $Ax \leq 0 \implies c^T x \leq 0$ holds true.

  2. There exists $y \geq 0$ such that $A^T y = c$.

Proof. (2) $\implies$ (1)

  1. (2) is the same as statement (1) of Theorem 10.53, applied with $A^T$ in place of $A$ and $c$ in place of $b$.

  2. Thus, by Theorem 10.53, there is no $x \in \mathbb{R}^n$ such that $Ax \geq 0$ and $x^T c < 0$.

  3. Hence for every $x \in \mathbb{R}^n$,
    $$Ax \geq 0 \implies c^T x \geq 0.$$

  4. Replacing $x$ by $-x$, we see that for every $x \in \mathbb{R}^n$,
    $$Ax \leq 0 \implies c^T x \leq 0.$$

(1) $\implies$ (2)

  1. For every $x$, $Ax \leq 0 \implies c^T x \leq 0$.

  2. Thus, for every $x$, $Ax \geq 0 \implies c^T x \geq 0$.

  3. Thus, there doesn't exist any $x$ with $Ax \geq 0$ and $c^T x < 0$.

  4. Thus, by Theorem 10.53, there exists $y \geq 0$ such that $A^T y = c$.
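Statement (2) can be tested numerically with nonnegative least squares: the implication in (1) holds precisely when the best $y \geq 0$ in $A^T y = c$ achieves zero residual. A minimal sketch, assuming SciPy and an illustrative instance:

```python
# Farkas' lemma version 3: the implication (Ax <= 0 => c^T x <= 0)
# holds iff c = A^T y for some y >= 0.  A sketch using nonnegative
# least squares on an illustrative instance.
import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
c = np.array([1.0, 1.0])

# Search for y >= 0 with A^T y = c by minimizing ||A^T y - c||_2.
y, residual = nnls(A.T, c)

print("y:", y, "residual:", residual)
# residual ~ 0 means y is a valid certificate, so the implication holds.
```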

10.6.2. Gordan’s Theorem#

Theorem 10.56 (Gordan’s theorem)

Let $A \in \mathbb{R}^{m \times n}$. Then exactly one of the following two systems has a solution.

  1. $Ax < 0$.

  2. $p \neq 0$, $A^T p = 0$, $p \geq 0$.

Proof. (1) $\implies$ not (2)

  1. Assume that (1) has a solution given by $x$.

  2. For contradiction, assume that (2) also has a solution.

  3. Hence, there exists a nonzero $p$ such that $A^T p = 0$ and $p \geq 0$.

  4. Multiplying the equality $A^T p = 0$ from the left by $x^T$, we get

    $$x^T A^T p = (A x)^T p = 0.$$

  5. Since $Ax < 0$ and $p \geq 0$, hence every term of $Ax$ is negative and every term of $p$ is nonnegative.

  6. Hence, $(A x)^T p = 0$ is possible only if $p = 0$. A contradiction.

  7. Hence, (2) doesn't have a solution.

not (1) $\implies$ (2)

  1. Assume that (1) doesn’t have a solution.

  2. The system (1) is equivalent to the system

    $$A x + s \mathbf{1} \leq 0; \quad s > 0.$$

  3. Define $\tilde{A} = \begin{bmatrix} A & \mathbf{1} \end{bmatrix}$.

  4. Let $c = e_{n+1}$ (i.e., $(0, 0, \dots, 0, 1)$).

  5. Then the above system is equivalent to

    $$\tilde{A} \begin{bmatrix} x \\ s \end{bmatrix} \leq 0, \quad c^T \begin{bmatrix} x \\ s \end{bmatrix} > 0.$$
  6. The infeasibility of (1) is thus equivalent to the infeasibility of the system

    $$\tilde{A} w \leq 0, \quad c^T w > 0$$

    where $w \in \mathbb{R}^{n+1}$.

  7. Since this system is infeasible, hence by Farkas' lemma (version 3, Theorem 10.55), the system

    $$\tilde{A}^T z = c, \quad z \geq 0$$

    must have a solution.

  8. In other words, there exists $z \geq 0$ such that

    $$A^T z = 0, \quad \mathbf{1}^T z = 1.$$

  9. Since $\mathbf{1}^T z = 1$, hence $z \neq 0$.

  10. Thus, there exists $z \geq 0$ with $z \neq 0$ such that $A^T z = 0$.

  11. Hence, the system (2) has a solution $z$.
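The following sketch (assuming NumPy and SciPy; the matrix is illustrative) checks both systems by linear programming, using the normalizations from the proof: $Ax \leq -\mathbf{1}$ for system (1) and $\mathbf{1}^T p = 1$ for system (2).

```python
# Gordan's theorem: either Ax < 0 is solvable or there is a nonzero
# p >= 0 with A^T p = 0.  Assumes NumPy and SciPy; A is illustrative.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
m, n = A.shape

# System (1): Ax < 0.  By scaling, solvable iff Ax <= -1 is solvable.
res1 = linprog(c=np.zeros(n), A_ub=A, b_ub=-np.ones(m),
               bounds=[(None, None)] * n)

# System (2): p >= 0, A^T p = 0, p != 0, normalized with 1^T p = 1.
A_eq = np.vstack([A.T, np.ones((1, m))])
b_eq = np.concatenate([np.zeros(n), [1.0]])
res2 = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)

print("Ax < 0 solvable:", res1.status == 0)            # expected: False
print("p:", res2.x if res2.status == 0 else None)      # ~ [1/3 1/3 1/3]
```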

10.6.3. Smooth Cost Function, Linear Inequality Constraints#

Theorem 10.57 (KKT necessary optimality conditions for linearly constrained problems)

Consider an optimization problem of the form

(10.9)

$$\begin{aligned}
& \text{minimize } && f(x) \\
& \text{subject to } && \langle x, a_i \rangle \leq b_i, \quad i = 1, \dots, m
\end{aligned}$$

where $f$ is continuously differentiable over $\mathbb{R}^n$, $a_1, \dots, a_m \in \mathbb{R}^n$ and $b_1, \dots, b_m \in \mathbb{R}$. Assume that $x^*$ is a local minimum of this optimization problem. Then there exist nonnegative scalars $t_1, \dots, t_m \geq 0$ such that

(10.10)

$$\nabla f(x^*) + \sum_{i=1}^m t_i a_i = 0$$

and

(10.11)

$$t_i (\langle x^*, a_i \rangle - b_i) = 0, \quad i = 1, \dots, m.$$

Proof. We are given that $x^*$ is a local minimum point of (10.9).

  1. Let $C$ denote the constraint set

    $$C = \{ x \in \mathbb{R}^n \mid \langle x, a_i \rangle \leq b_i, \; i = 1, \dots, m \}.$$

  2. By Theorem 10.45, $x^*$ must be a stationary point.

  3. Hence for every $x \in C$, we have

    $$\langle x - x^*, \nabla f(x^*) \rangle \geq 0.$$

  4. Introduce a change of variable $y = x - x^*$ in the stationary point criterion.

  5. We must have $\langle y, \nabla f(x^*) \rangle \geq 0$ for every $y$ satisfying $\langle y + x^*, a_i \rangle \leq b_i$ for every $i = 1, \dots, m$.

  6. Equivalently, we must have $\langle y, \nabla f(x^*) \rangle \geq 0$ for every $y$ satisfying

    $$\langle y, a_i \rangle \leq b_i - \langle x^*, a_i \rangle, \quad i = 1, \dots, m.$$
  7. The $i$-th linear inequality constraint is called active at $x^*$ if $\langle x^*, a_i \rangle = b_i$.

  8. The $i$-th constraint is called inactive if $\langle x^*, a_i \rangle < b_i$.

  9. Let the set of active constraints be denoted by

    $$I(x^*) = \{ i \mid \langle x^*, a_i \rangle = b_i \}.$$

  10. If the $i$-th constraint is active, then $\langle y + x^*, a_i \rangle \leq b_i$ is equivalent to $\langle y, a_i \rangle \leq 0$.

  11. If the $i$-th constraint is inactive, then $\langle y + x^*, a_i \rangle \leq b_i$ is equivalent to $\langle y, a_i \rangle \leq b_i - \langle x^*, a_i \rangle$, where the right-hand side is a positive quantity.

  12. Thus, we must have $\langle y, \nabla f(x^*) \rangle \geq 0$ whenever $y$ satisfies

    $$\langle y, a_i \rangle \leq 0, \; i \in I(x^*), \quad \langle y, a_i \rangle \leq b_i - \langle x^*, a_i \rangle, \; i \notin I(x^*).$$
  13. We claim that the second set of inequalities is redundant.

    1. Suppose that we pick a $y$ such that $\langle y, a_i \rangle \leq 0$ for every $i \in I(x^*)$.

    2. At $x^*$, we have $b_i - \langle x^*, a_i \rangle > 0$ for every $i \notin I(x^*)$.

    3. Then it is possible to choose a small enough $\alpha > 0$ such that $\langle \alpha y, a_i \rangle \leq b_i - \langle x^*, a_i \rangle$ for every $i$.

      1. We can choose an $\alpha_i$ for every $i$ as follows.

      2. If $\langle y, a_i \rangle \leq 0$, then let $\alpha_i = 1$.

      3. If $\langle y, a_i \rangle > 0$ (possible only for $i \notin I(x^*)$), then let

        $$\alpha_i = \frac{b_i - \langle x^*, a_i \rangle}{\langle y, a_i \rangle}.$$

      4. Let $\alpha = \min \{ \alpha_1, \dots, \alpha_m \}$.

    4. We can now see that

      $$\langle y, a_i \rangle \leq 0 \; \forall \, i \in I(x^*) \implies \langle \alpha y, a_i \rangle \leq b_i - \langle x^*, a_i \rangle \; \forall \, i = 1, \dots, m.$$

    5. But then by the stationarity of $x^*$, we must have $\langle \alpha y, \nabla f(x^*) \rangle \geq 0$.

    6. Since $\alpha > 0$, hence we must have $\langle y, \nabla f(x^*) \rangle \geq 0$.

  14. Hence, we must have $\langle y, \nabla f(x^*) \rangle \geq 0$ whenever $y$ satisfies

    $$\langle y, a_i \rangle \leq 0, \quad i \in I(x^*).$$
  15. We are ready to apply Farkas' lemma (Theorem 10.53).

    1. Form $A$ by combining the vectors $a_i$, $i \in I(x^*)$, as columns.

    2. Let $v = -\nabla f(x^*)$.

    3. Replacing $y$ by $-y$ in the previous step, we have $A^T y \geq 0 \implies y^T v \geq 0$.

    4. Hence the system $A^T y \geq 0, y^T v < 0$ is infeasible.

    5. Hence, by Theorem 10.53, there exists $t \geq 0$ such that $A t = v$.

  16. Hence there exist $t_i \geq 0$ for every $i \in I(x^*)$ such that

    $$\nabla f(x^*) = - \sum_{i \in I(x^*)} t_i a_i.$$
  17. By defining $t_i = 0$ for every $i \notin I(x^*)$, we have

    $$t_i (\langle x^*, a_i \rangle - b_i) = 0, \quad i = 1, \dots, m$$

    and

    $$\nabla f(x^*) + \sum_{i=1}^m t_i a_i = 0$$

    as desired.
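The conditions (10.10) and (10.11) can be checked on a concrete instance. The sketch below, assuming NumPy and SciPy, minimizes $(x_1 - 2)^2 + (x_2 - 2)^2$ subject to $x_1 + x_2 \leq 2$; the minimizer is $x^* = (1, 1)$ with the single constraint active and multiplier $t_1 = 2$, and the code recovers the multiplier numerically from the active constraints.

```python
# Verifying the KKT conditions of Theorem 10.57 on a small problem:
# minimize (x1-2)^2 + (x2-2)^2 subject to x1 + x2 <= 2.
# A sketch assuming NumPy and SciPy; the minimizer is x* = (1, 1).
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0]])          # one constraint <x, a_1> <= b_1
b = np.array([2.0])

f = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)])

cons = [{"type": "ineq", "fun": lambda x: b - A @ x}]   # b - Ax >= 0
res = minimize(f, x0=np.zeros(2), jac=grad, constraints=cons,
               method="SLSQP")
x_star = res.x                                          # approx (1, 1)

# Active set I(x*) and multipliers from grad f(x*) + sum t_i a_i = 0.
active = np.abs(A @ x_star - b) < 1e-6
t, *_ = np.linalg.lstsq(A[active].T, -grad(x_star), rcond=None)

print("x*:", x_star)         # ~ [1. 1.]
print("multipliers:", t)     # ~ [2.], nonnegative as the theorem predicts
```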

10.6.4. Convex and Smooth Cost Function, Linear Inequality Constraints#

Theorem 10.58 (KKT necessary and sufficient optimality conditions for linearly constrained problems)

Consider an optimization problem of the form

$$\begin{aligned}
& \text{minimize } && f(x) \\
& \text{subject to } && \langle x, a_i \rangle \leq b_i, \quad i = 1, \dots, m
\end{aligned}$$

where $f$ is a convex and continuously differentiable function over $\mathbb{R}^n$, $a_1, \dots, a_m \in \mathbb{R}^n$ and $b_1, \dots, b_m \in \mathbb{R}$. Let $x^*$ be a feasible solution of this optimization problem. Then $x^*$ is an optimal solution if and only if there exist nonnegative scalars $t_1, \dots, t_m \geq 0$ such that

$$\nabla f(x^*) + \sum_{i=1}^m t_i a_i = 0$$

and

$$t_i (\langle x^*, a_i \rangle - b_i) = 0, \quad i = 1, \dots, m.$$

Proof. Assume that $x^*$ is an optimal solution. Then it is also a local minimizer. Hence the necessity of the conditions follows from Theorem 10.57.

For the converse, assume that $x^*$ is a feasible solution and the conditions on the scalars $t_1, \dots, t_m$ are satisfied.

  1. Consider any feasible solution $x$.

  2. Define a function

    $$h(x) = f(x) + \sum_{i=1}^m t_i (\langle x, a_i \rangle - b_i).$$

  3. By differentiating w.r.t. $x$, we have

    $$\nabla h(x) = \nabla f(x) + \sum_{i=1}^m t_i a_i.$$

  4. By hypothesis, we have $\nabla h(x^*) = 0$.

  5. Since $h$ is convex, hence by Theorem 10.48, $x^*$ is a minimizer of $h$ over $\mathbb{R}^n$.

  6. By hypothesis, we have $t_i (\langle x^*, a_i \rangle - b_i) = 0$ for $i = 1, \dots, m$.

  7. Hence

    $$\sum_{i=1}^m t_i (\langle x^*, a_i \rangle - b_i) = 0.$$

  8. For any feasible point $x$, we have $\langle x, a_i \rangle \leq b_i$ for every $i$.

  9. Since by hypothesis $t_i \geq 0$, hence for any feasible point,

    $$\sum_{i=1}^m t_i (\langle x, a_i \rangle - b_i) \leq 0.$$

  10. Hence

    $$\begin{aligned}
    f(x^*) &= f(x^*) + \sum_{i=1}^m t_i (\langle x^*, a_i \rangle - b_i) = h(x^*) \\
    &\leq h(x) = f(x) + \sum_{i=1}^m t_i (\langle x, a_i \rangle - b_i) \leq f(x).
    \end{aligned}$$

  11. Hence for every feasible solution $x$, we have $f(x^*) \leq f(x)$.

  12. This proves that $x^*$ is a global optimal solution.

The scalars $t_1, \dots, t_m$ that appear in the previous two results are known as the Lagrange multipliers.

  1. Each multiplier $t_i$ is associated with the corresponding inequality constraint $\langle x, a_i \rangle \leq b_i$.

  2. Each multiplier is nonnegative.

  3. The conditions in (10.11) are known as complementary slackness conditions.

  4. At the local minimizer, either the constraint is active with $\langle x^*, a_i \rangle = b_i$ or $t_i = 0$.

  5. The results can be extended to support linear equality constraints too.
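Since the conditions of Theorem 10.58 are both necessary and sufficient in the convex case, they serve as a checkable certificate of global optimality. The following sketch, assuming NumPy and using an illustrative helper name, packages the check and reuses the instance from the previous sketch.

```python
# A small checker for the optimality certificate of Theorem 10.58:
# given a feasible x* and multipliers t, verify the gradient condition
# and complementary slackness.  Assumes NumPy; names are illustrative.
import numpy as np

def is_kkt_certificate(grad_f, A, b, x_star, t, tol=1e-8):
    """Return True if (x_star, t) satisfies the KKT conditions for
    minimizing a convex f subject to A x <= b."""
    slack = b - A @ x_star
    feasible = np.all(slack >= -tol)                 # <x, a_i> <= b_i
    nonneg = np.all(t >= -tol)                       # t_i >= 0
    stationary = np.allclose(grad_f(x_star) + A.T @ t, 0.0, atol=tol)
    comp_slack = np.all(np.abs(t * slack) <= tol)    # t_i (...) = 0
    return feasible and nonneg and stationary and comp_slack

# Example from the previous sketch: x* = (1, 1), t = (2,).
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
grad_f = lambda x: 2.0 * (x - 2.0)
print(is_kkt_certificate(grad_f, A, b, np.array([1.0, 1.0]),
                         np.array([2.0])))
# True: by Theorem 10.58, x* is a global minimizer.
```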

10.6.5. Linear Constraints (Equality and Inequality)#

Theorem 10.59 (KKT optimality conditions for linearly constrained problems)

Consider an optimization problem of the form

(10.12)

$$\begin{aligned}
& \text{minimize } && f(x) \\
& \text{subject to } && \langle x, a_i \rangle \leq b_i, \quad i = 1, \dots, m \\
& && \langle x, c_j \rangle = d_j, \quad j = 1, \dots, p
\end{aligned}$$

where $f$ is continuously differentiable over $\mathbb{R}^n$, $a_1, \dots, a_m, c_1, \dots, c_p \in \mathbb{R}^n$ and $b_1, \dots, b_m, d_1, \dots, d_p \in \mathbb{R}$. Then we have the following.

  1. (Necessity of KKT conditions) If $x^*$ is a local minimum of this optimization problem, then there exist nonnegative scalars $t_1, \dots, t_m \geq 0$ and real scalars $r_1, \dots, r_p \in \mathbb{R}$ such that

    (10.13)

    $$\nabla f(x^*) + \sum_{i=1}^m t_i a_i + \sum_{j=1}^p r_j c_j = 0$$

    and

    (10.14)

    $$t_i (\langle x^*, a_i \rangle - b_i) = 0, \quad i = 1, \dots, m.$$

  2. (Sufficiency for convex cost functions) If $f$ is also convex over $\mathbb{R}^n$ and $x^*$ is a feasible point of the problem (10.12) for which there exist nonnegative scalars $t_1, \dots, t_m \geq 0$ and real scalars $r_1, \dots, r_p \in \mathbb{R}$ such that (10.13) and (10.14) are satisfied, then $x^*$ is an optimal solution.

The key idea for the proof is to convert each linear equality constraint into two linear inequality constraints. Then we can leverage the previous results, as the sketch below illustrates.
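A minimal sketch of this reduction, assuming NumPy and an illustrative helper name: each equality $\langle x, c_j \rangle = d_j$ becomes the pair $\langle x, c_j \rangle \leq d_j$ and $\langle x, -c_j \rangle \leq -d_j$. Applying Theorem 10.58 to the enlarged system attaches nonnegative multipliers $u_j, v_j$ to each pair, and $r_j = u_j - v_j$ recovers the sign-free multiplier of the equality constraint, which is why the $r_j$ in (10.13) may take any sign.

```python
# The reduction used in the proof of Theorem 10.59: each equality
# <x, c_j> = d_j becomes the pair <x, c_j> <= d_j and
# -<x, c_j> <= -d_j.  Assumes NumPy; a purely mechanical rewrite.
import numpy as np

def to_inequality_form(A, b, C, d):
    """Rewrite {A x <= b, C x = d} as a single system {A' x <= b'}."""
    A_new = np.vstack([A, C, -C])
    b_new = np.concatenate([b, d, -d])
    return A_new, b_new

# Example: x1 + x2 <= 1 together with x1 - x2 = 0.
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
C = np.array([[1.0, -1.0]]); d = np.array([0.0])
A2, b2 = to_inequality_form(A, b, C, d)
print(A2)   # rows: (1, 1), (1, -1), (-1, 1)
print(b2)   # (1, 0, 0)
```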