Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

9.4 Constrained Optimization

In Section 9.3 we saw that the gradients of a surface can be used to find its maxima or minima. Since we allowed any inputs to the surface, the corresponding problems were unconstrained. An unconstrained optimization problem is a problem of the form:

Find: x=arg maxall x{f(x)}.\textbf{Find: } x_* = \argmax_{\text{all } x}\{f(x)\}.

A constrained optimization problem restricts the range of possible inputs, xx.

Constraint Functions

The set X\mathcal{X} could be defined explicitly, by listing all of its elements. In practice, we usually defined the set implicitly via list of criteria that all its elements must satisfy. These are the constraints. We say that X\mathcal{X} is defined by a set of equality constraints if:

For the remainder of this class we will assume that all constrained optimization problems are defined exclusively by a set of equality constraints, so will use constrained optimization to mean, an optimization problem restricted by a collection of equality constraints. To learn about constrained problems that incorporate inequality constraints, take an optimization class, or read on the Karush-Kuhn-Tucker (KKT) conditions.

When X\mathcal{X} is defined implicitly, we can use the constraints to check whether an input belongs to X\mathcal{X}. Just evaluate gj(x)g_j(x) for each j{1,2,...,m}j \in \{1,2,...,m\}. If each gj(x)=0g_j(x) = 0 then xXx \in \mathcal{X}. If any gj(x)0g_j(x) \neq 0, then xXx \notin \mathcal{X}. In contrast, it is not always easy to recover the explicit representation of a set from a list of constraints.

The same is true in reverse for explicit representations. Given an explicit representation it is easy to list the elements of a set, but it takes some work to identify a rule that all of the elements satisfy, and all of the elements of the complement fail to satisfy.

To go back and forth between a list of constraints and explicit representation of a set it is helpful to remember the relationship between simple algebraic expressions and the geometry they represent. We practiced this skill in Section 8.2 when we studied level sets.

For example, every linear set may be described by setting ax1+bx2+c=0a x_1 + b x_2 + c = 0 for some aa, bb, and cc. Every circular set may be described by setting x12+x22=r2x_1^2 + x_2^2 = r^2 for some radius r>0r > 0. Every elliptical set may be described by setting ax12+bx22=r2a x_1^2 + b x_2^2 = r^2 for some a>0 a > 0, b>0b > 0, and r>0r > 0.

The same formulas extend naturally to higher dimensions. For example, every planar set may be expressed by setting a1x1+a2x2+a3x3+c=ax+c=0a_1 x_1 + a_2 x_2 + a_3 x_3 + c = a \cdot x + c = 0 for some a=[a1,a2,a3]a = [a_1,a_2,a_3] and some cc. In this case the vector aa is the “normal” vector to the plane (it is perpendicular to the plane) so the choice of aa controls the orientation of the plane. The choice of cc controls the specific position of the plane.

When we use multiple constraints, the set X\mathcal{X} may be visualized as the intersection of a series of level sets, one per constraint. For the rest of this chapter we will assume that there is only one equality constraint.

Lagrange Multipliers

Introducing constraints on the inputs both restricts the range of possible input values we need to consider, and relaxes the requirements we need to enforce on candidate optima. If ff is smooth, and xx is unconstrained, then xx_* cannot be an extrema of ff unless f(x)=0\nabla f(x_*) = 0. In other words, when xx is unconstrained, all finite extrema of smooth ff correspond to locations where the surface is flat. If we add constraints, then, restricted by the constraints, ff may be optimized at a location xXx_* \in \mathcal{X} where ff is not flat, provided that there is no way to move uphill or downhill on ff, from xx_*, while staying inside of X\mathcal{X}.

Recall that a directional derivative, vf(x)\partial_v f(x_*) evaluates the instanteous rate of change (slope) of the surface ff along a path oriented in the direction vv leaving the point xx_*. Let X\mathcal{X} denote a set defined by a collection of equality constraints. Let xXx_* \in \mathcal{X} denote a point in X\mathcal{X}. Let T(x)\mathcal{T}(x_*) denote the collection of tangent directions to the set X\mathcal{X} leaving xx_*. These are all of the vectors vv such that the line x(t)=x+tv^x(t) = x_* + t \hat{v} is tangent to X\mathcal{X} at xx_*. The collection of tangents represents all of the directions in which we could move away from xx_* while remaining inside of X\mathcal{X}.

A point xx_* may be a solution to a constrained optimization problem if it:

  1. Satisfies the constraint g(x)=0g(x_*) = 0

  2. There is no direction vT(x)v \in \mathcal{T}(x_*) along which f(x+tv^)f(x + t \hat{v}) is increasing or decreasing.

In the optimization literature the first constraint is called primal feasibility and the second is called dual feasibility. A point xx_* that satisfies 1 and 2 is “feasible” in the sense that it could be an optima. If xx_* does not satisfy 1 and 2, then it cannot solve the constrained optimization problem. This is entirely analogous to the observation that, in an unconstrained problem, xx_* cannot be an extrema unless f(x)=0\nabla f(x_*) = 0, so a point xx_* is only feasible if f(x)=0\nabla f(x_*) = 0.

The second constraint is the most interesting. It requires that the surface ff is flat at xx_* along every path x(t)Xx(t) \in \mathcal{X} passing through xx_*. This is only possible if:

Dual feasibility is the generalization of the feasibility statement we derived in Section 9.3: xx_* cannot be an extrema of ff unless f(x)=0\nabla f(x_*) = 0, to constrained problems. Both generalize the familiar uni-dimensional rule, “set the derivative of ff to zero.” from Section 3.3.

To simplify the dual feasibility statement, recall that every directional derivative may be expressed in terms of the gradient of ff (see Section 9.2):

vf(x)=f(x)v^\partial_v f(x_*) = \nabla f(x_*) \cdot \hat{v}

where v^\hat{v} is the unit vector pointing in the same direction as vv. So, dual feasibility may be expressed:

  1. f(x)v^=0\nabla f(x_*) \cdot \hat{v} = 0 for all vT(x)v \in \mathcal{T}(x_*).

This is a bit easier to think about since f(x)\nabla f(x_*) is fixed by ff and xx_*. It separates the term associated with the slope of ff from the term associated with admissible directions along which we could move the input xx.

If f(x)=0\nabla f(x_*) = 0, then f(x)v^=0v^=0\nabla f(x_*) \cdot \hat{v} = 0 \cdot \hat{v} = 0. So, if xx_* is a location where ff is flat, then xx_* is feasible for both the unconstrained and the constrained problem. This makes sense. If xx_* is a local maximizer or minimizer of ff, it will remain a local maximizer or minimizer of ff if we restrict the allowed ways in which we could perturb the input xx away from xx_*. So, any point that is in XC\mathcal{XC}, where f\nabla f is zero, is feasible.

The dual feasibility constrain allows other solutions. Suppose that f(x)0\nabla f(x_*) \neq 0. By definition, v^=1\|\hat{v}\| = 1, so v^0\hat{v} \neq 0. Recall that (see Section 8.1), two nonzero vectors are perpendicular if and only if their inner product equals zero. Therefore, if f(x)0\nabla f(x_*) \neq 0, then f(x)v^=0\nabla f(x_*) \cdot \hat{v} = 0 if and only if vv is perpendicular to f(x)\nabla f(x_*).

Now we can write the dual feasibility constraint geometrically:

In other words, all tangent lines to the set X\mathcal{X}, passing through xx_*, must be perpendicular to the gradient of ff at xx_*.

To use this constraint, we need to convert it into a system of algebraic equations that we can manipulate. Once again, we can rely on ideas we introduced earlier in the course.

Recall that (see Section 9.2), the gradient of some surface, h(x)\nabla h(x), points perpendicularly to the level set of hh at xx. We argued this fact as follows.

If x(t)x(t) is a path constrained to a level set of some smooth surface hh, then h(x(t))h(x(t)) is a constant function of tt. So, the directional derivative of hh, at xx, along any path constrained to a level set of hh, must equal zero. The converse is also true. If vh(x)=0\partial_v h(x) = 0, then the line x(t)=x+vtx(t) = x + v t is tangent to the level set of hh passing through xx, since hh is flat along the direction vv leaving xx.

This observation provides an alternate description for the set of directions tangent to a level set. Given a smooth function hh, and an input xx, every direction vv such that vh(x)=0\partial_v h(x) = 0 is tangent to the level set of hh passing through xx.

We can use this conclusion to express the tangent set, T(x)\mathcal{T}(x_*) algebraicly.

By assumption X\mathcal{X} was defined by a single equality constraint, g(x)=0g(x) = 0. So, X\mathcal{X} is a level set of the constraint function gg. It follows that:

T(x)={v such that vg(x)=0}={v such that g(x)v^=0}.\mathcal{T}(x_*) = \{v \text{ such that } \partial_v g(x_*) = 0 \} = \{v \text{ such that } \nabla g(x_*) \cdot \hat{v} = 0 \}.

If g(x)0\nabla g(x_*) \neq 0, then g(x)v^=0\nabla g(x_*) \cdot \hat{v} = 0 if and only if vv is perpendicular to g(x)\nabla g(x_*). Therefore:

T(x)={v perpendicular to g(x)}.\mathcal{T}(x_*) = \{v \text{ perpendicular to } \nabla g(x_*) \}.

Now, dual feasibility requires that:

  1. f(x)\nabla f(x_*) is perpendicular to every vv that is perpendicular to g(x)\nabla g(x_*).

In other words, f(x)\nabla f(x_*) and g(x)\nabla g(x_*) must both be perpendicular to the same collection of tangent directions T(x)\mathcal{T}(x_*).

The tangent set, T(x)\mathcal{T}(x_*) is the collection of all vectors that satisfy the equation g(x)v=0\nabla g(x_*) \cdot v = 0. This is one linear equation. If xx is dd dimensional, it defines a d1d - 1 dimensional collection of vectors. For example, if d=2d = 2, then the tangent set is one-dimensional, and corresponds to a line. If d=3d = 3, then the tangent set is two-dimensional, and corresponds to a plane. In each case, there is only one direction perpendicular to the tangent set.

So, if f(x)\nabla f(x_*) and g(x)\nabla g(x_*) are both perpendicular to the same collection of tangent directions T(x)\mathcal{T}(x_*), then they must be parallel vectors!

We can now write down the feasibility constraints as a system of equations:

So, to identify candidate optima for a constrained optimization problem:

  1. Write down the constraint g(x)=0g(x_*) = 0

  2. Solve for f(x)\nabla f(x) and g(x)\nabla g(x).

  3. Write down the system of equations

    f(x)=λg(x)\nabla f(x_*) = \lambda \nabla g(x_*)

    where xx_* and λ\lambda are free variables.

  4. Solve for xx_* and λ\lambda by enforcing primal and dual feasibility.

  5. If there are multiple feasible points, evaluate ff at each and pick the feasible point that maximizes (or minimizes) ff.

Example

Let’s solve the constrained optimization problem we established at the start of the chapter using Lagrange multipliers.

Maximize: 14x2+125y2Given: x2+y29=0.\begin{aligned} & \textbf{Maximize: } \frac{1}{4} x^2 + \frac{1}{25} y^2 \\ & \textbf{Given: } x^2 + y^2 - 9 = 0. \end{aligned}

Working in order:

  1. Primal feasibility demands:

    x2+y29=0x_*^2 + y_*^2 - 9 = 0
  2. The gradients are:

    f(x,y)=[24x,225y]\nabla f(x_*,y_*) = [\frac{2}{4} x_*, \frac{2}{25} y_*]

    and:

    g(x,y)=[2x,2y]\nabla g(x_*,y_*) = [2 x_*, 2 y_*]
  3. So, dual feasibility demands:

    [24x225y]=λ[2x2y]\left[\begin{array}{c} \frac{2}{4} x_* \\ \frac{2}{25} y_* \end{array} \right] = \lambda \left[ \begin{array}{c} 2 x_* \\ 2 y_* \end{array} \right]

    Or, as a systems of equations:

    14x=λx(λ14)x=0125y=λy(λ125)y=0.\begin{aligned} & \frac{1}{4} x_* = \lambda x_* \Rightarrow \left(\lambda - \frac{1}{4} \right) x_* = 0 \\ & \frac{1}{25} y_* = \lambda y_* \Rightarrow \left(\lambda - \frac{1}{25} \right) y_* = 0. \end{aligned}

    The first equation requires that x=0x_* = 0 or that λ=1/4\lambda = 1/4. The second requires that y=0y_* = 0 or λ=1/25\lambda = 1/25. Since λ\lambda cannot equal both 1/41/4 and 1/251/25 at the same time, the system can only admit solutions where:

    (x=0,y=?,λ=1/25) or (x=?,y=0,λ=1/4)(x_* = 0, y_* = ?, \lambda = 1/25) \text{ or } (x_* = ?, y_* = 0, \lambda = 1/4)

    In either case, there is only one remaining unknown. It is dtermined by enforcing primal feasibility. Recall that

    x2+y2=9.x_*^2 + y_*^2 = 9.

    Therefore, if x=0x_* = 0, then y=±3y_* = \pm 3, and if y=0y_* = 0, then x=±3x_* = \pm 3.

    So, we are left with four feasible solutions:

    (x=0,y=±3) or (x=±3,y=0).(x_* = 0, y_* = \pm 3) \text{ or } (x_* = \pm 3, y_* = 0).
  4. Now that we’ve identified all the feasible points, we need only evaluate ff at each to find the maximizers:

    (x=0,y=3):f(0,3)=925(x=0,y=3):f(0,3)=925(x=3,y=0):f(3,0)=94(x=3,y=0):f(3,0)=94\begin{aligned} & (x_* = 0, y_* = 3): & f(0,3) = \frac{9}{25} \\ & (x_* = 0, y_* = -3): & f(0,-3) = \frac{9}{25} \\ & (x_* = 3, y_* = 0): & f(3,0) = \frac{9}{4} \\ & (x_* = -3, y_* = 0): & f(-3,0) = \frac{9}{4} \\ \end{aligned}

    Since 9/4>9/259/4 > 9/25, the points (x=±3,y=0)(x_* = \pm 3, y_* = 0) are the maximizers. The remining feasible points minimize ff.