In Section 9.3 we saw that the gradients of a surface can be used to find its maxima or minima. Since we allowed any inputs to the surface, the corresponding problems were unconstrained. An unconstrained optimization problem is a problem of the form:
A constrained optimization problem restricts the range of possible inputs, .
Constraint Functions¶
The set could be defined explicitly, by listing all of its elements. In practice, we usually defined the set implicitly via list of criteria that all its elements must satisfy. These are the constraints. We say that is defined by a set of equality constraints if:
Inequality Constraints
We can also define sets using a collection of inequality constraints. An inequality constraint is a constraint of the form:
for some constraint function . Note, we could use for any , but, this is the same as requiring or where . Similarly, we could use , but this is the same as requiring , or where .
For the remainder of this class we will assume that all constrained optimization problems are defined exclusively by a set of equality constraints, so will use constrained optimization to mean, an optimization problem restricted by a collection of equality constraints. To learn about constrained problems that incorporate inequality constraints, take an optimization class, or read on the Karush-Kuhn-Tucker (KKT) conditions.
Multiple Constraints
To add additional constraints, simply append more equality constraints to your list of “given” statements. For example, if and , and we wanted to maximize over all that lie on the unit sphere, and on the plane , then we would aim to:
Notice that the constraints are joined by an “and” statement. It follows that the set may be expressed as an intersection of the sets defined by each individual constraint:
When is defined implicitly, we can use the constraints to check whether an input belongs to . Just evaluate for each . If each then . If any , then . In contrast, it is not always easy to recover the explicit representation of a set from a list of constraints.
The same is true in reverse for explicit representations. Given an explicit representation it is easy to list the elements of a set, but it takes some work to identify a rule that all of the elements satisfy, and all of the elements of the complement fail to satisfy.
To go back and forth between a list of constraints and explicit representation of a set it is helpful to remember the relationship between simple algebraic expressions and the geometry they represent. We practiced this skill in Section 8.2 when we studied level sets.
For example, every linear set may be described by setting for some , , and . Every circular set may be described by setting for some radius . Every elliptical set may be described by setting for some , , and .
The same formulas extend naturally to higher dimensions. For example, every planar set may be expressed by setting for some and some . In this case the vector is the “normal” vector to the plane (it is perpendicular to the plane) so the choice of controls the orientation of the plane. The choice of controls the specific position of the plane.
When we use multiple constraints, the set may be visualized as the intersection of a series of level sets, one per constraint. For the rest of this chapter we will assume that there is only one equality constraint.
Lagrange Multipliers¶
Introducing constraints on the inputs both restricts the range of possible input values we need to consider, and relaxes the requirements we need to enforce on candidate optima. If is smooth, and is unconstrained, then cannot be an extrema of unless . In other words, when is unconstrained, all finite extrema of smooth correspond to locations where the surface is flat. If we add constraints, then, restricted by the constraints, may be optimized at a location where is not flat, provided that there is no way to move uphill or downhill on , from , while staying inside of .
Recall that a directional derivative, evaluates the instanteous rate of change (slope) of the surface along a path oriented in the direction leaving the point . Let denote a set defined by a collection of equality constraints. Let denote a point in . Let denote the collection of tangent directions to the set leaving . These are all of the vectors such that the line is tangent to at . The collection of tangents represents all of the directions in which we could move away from while remaining inside of .
A point may be a solution to a constrained optimization problem if it:
Satisfies the constraint
There is no direction along which is increasing or decreasing.
In the optimization literature the first constraint is called primal feasibility and the second is called dual feasibility. A point that satisfies 1 and 2 is “feasible” in the sense that it could be an optima. If does not satisfy 1 and 2, then it cannot solve the constrained optimization problem. This is entirely analogous to the observation that, in an unconstrained problem, cannot be an extrema unless , so a point is only feasible if .
The second constraint is the most interesting. It requires that the surface is flat at along every path passing through . This is only possible if:
Dual feasibility is the generalization of the feasibility statement we derived in Section 9.3: cannot be an extrema of unless , to constrained problems. Both generalize the familiar uni-dimensional rule, “set the derivative of to zero.” from Section 3.3.
To simplify the dual feasibility statement, recall that every directional derivative may be expressed in terms of the gradient of (see Section 9.2):
where is the unit vector pointing in the same direction as . So, dual feasibility may be expressed:
for all .
This is a bit easier to think about since is fixed by and . It separates the term associated with the slope of from the term associated with admissible directions along which we could move the input .
If , then . So, if is a location where is flat, then is feasible for both the unconstrained and the constrained problem. This makes sense. If is a local maximizer or minimizer of , it will remain a local maximizer or minimizer of if we restrict the allowed ways in which we could perturb the input away from . So, any point that is in , where is zero, is feasible.
The dual feasibility constrain allows other solutions. Suppose that . By definition, , so . Recall that (see Section 8.1), two nonzero vectors are perpendicular if and only if their inner product equals zero. Therefore, if , then if and only if is perpendicular to .
Now we can write the dual feasibility constraint geometrically:
In other words, all tangent lines to the set , passing through , must be perpendicular to the gradient of at .
To use this constraint, we need to convert it into a system of algebraic equations that we can manipulate. Once again, we can rely on ideas we introduced earlier in the course.
Recall that (see Section 9.2), the gradient of some surface, , points perpendicularly to the level set of at . We argued this fact as follows.
If is a path constrained to a level set of some smooth surface , then is a constant function of . So, the directional derivative of , at , along any path constrained to a level set of , must equal zero. The converse is also true. If , then the line is tangent to the level set of passing through , since is flat along the direction leaving .
This observation provides an alternate description for the set of directions tangent to a level set. Given a smooth function , and an input , every direction such that is tangent to the level set of passing through .
We can use this conclusion to express the tangent set, algebraicly.
By assumption was defined by a single equality constraint, . So, is a level set of the constraint function . It follows that:
If , then if and only if is perpendicular to . Therefore:
Now, dual feasibility requires that:
is perpendicular to every that is perpendicular to .
In other words, and must both be perpendicular to the same collection of tangent directions .
The tangent set, is the collection of all vectors that satisfy the equation . This is one linear equation. If is dimensional, it defines a dimensional collection of vectors. For example, if , then the tangent set is one-dimensional, and corresponds to a line. If , then the tangent set is two-dimensional, and corresponds to a plane. In each case, there is only one direction perpendicular to the tangent set.
So, if and are both perpendicular to the same collection of tangent directions , then they must be parallel vectors!
We can now write down the feasibility constraints as a system of equations:
So, to identify candidate optima for a constrained optimization problem:
Write down the constraint
Solve for and .
Write down the system of equations
where and are free variables.
Solve for and by enforcing primal and dual feasibility.
If there are multiple feasible points, evaluate at each and pick the feasible point that maximizes (or minimizes) .
Example¶
Let’s solve the constrained optimization problem we established at the start of the chapter using Lagrange multipliers.
Working in order:
Primal feasibility demands:
The gradients are:
and:
So, dual feasibility demands:
Or, as a systems of equations:
The first equation requires that or that . The second requires that or . Since cannot equal both and at the same time, the system can only admit solutions where:
In either case, there is only one remaining unknown. It is dtermined by enforcing primal feasibility. Recall that
Therefore, if , then , and if , then .
So, we are left with four feasible solutions:
Now that we’ve identified all the feasible points, we need only evaluate at each to find the maximizers:
Since , the points are the maximizers. The remining feasible points minimize .