Constrained Optimization - Data 89 Course Notes

In Section 9.3 we saw that the gradients of a surface can be used to find its maxima or minima. Since we allowed any inputs to the surface, the corresponding problems were unconstrained. An unconstrained optimization problem is a problem of the form:

\textbf{Find: } x_* = \argmax_{\text{all } x}\{f(x)\}.

(1)

A constrained optimization problem restricts the range of possible inputs, $x$ .

Constrained Optimization

A constrained optimization problem asks for the inputs $x$ that maximize (or minimize) a surface, $f$ , restricted to a set of inputs $\mathcal{X}$ :

\textbf{Find: } x_* = \argmax_{x \in \mathcal{X}}\{f(x)\}.

(2)

You can read this expression as: find the inputs $x$, restricted to the set $\mathcal{X}$, that maximize $f$ among all other $x \in \mathcal{X}$. Alternately:

\begin{aligned} & \textbf{Maximize: } f(x) \\ & \textbf{Given: } x \in \mathcal{X} \end{aligned}

(3)

where the command “maximize” may mean to find the largest function value, the input that returns the largest function value, or both. In many problems we are interested in the input that maximizes the function, i.e. the argument maximizer.

Constraint Functions

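The set $\mathcal{X}$ could be defined explicitly, by listing all of its elements. In practice, we usually define the set implicitly via a list of criteria that all its elements must satisfy. These are the constraints. We say that $\mathcal{X}$ is defined by a set of equality constraints if it is the set of all $x$ such that $g_j(x) = 0$ for every $j \in \{1,2,...,m\}$, for some collection of constraint functions $g_1, g_2, ..., g_m$.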

Inequality Constraints

We can also define sets using a collection of inequality constraints. An inequality constraint is a constraint of the form:

h(x) \leq 0

(6)

for some constraint function $h$. Note, we could use $h(x) \leq c$ for any $c$, but this is the same as requiring $h(x) - c \leq 0$, or $\tilde{h}(x) \leq 0$ where $\tilde{h}(x) = h(x) - c$. Similarly, we could use $h(x) \geq 0$, but this is the same as requiring $-h(x) \leq 0$, or $\tilde{h}(x) \leq 0$ where $\tilde{h}(x) = - h(x)$.

For the remainder of this class we will assume that all constrained optimization problems are defined exclusively by a set of equality constraints, so we will use "constrained optimization" to mean an optimization problem restricted by a collection of equality constraints. To learn about constrained problems that incorporate inequality constraints, take an optimization class, or read about the Karush-Kuhn-Tucker (KKT) conditions.

Example

Suppose that $x = [x_1,x_2]$ and $f(x) = \frac{1}{4} x_1^2 + \frac{1}{25} x_2^2$. Suppose that we wanted to maximize $f(x)$ over all input vectors $x$ that lie on the circle with radius 3 centered at the origin. Then, to express $\mathcal{X}$ implicitly, we need to write $\mathcal{X}$ as the set of all vectors $[x_1,x_2]$ that satisfy some constraint.

Recall that a circle centered at the origin is the set of all points at a fixed distance from the origin. This is the collection of all vectors with a fixed magnitude. So, we can write $\mathcal{X}$ as the collection of all $x$ such that $\|x\| = 3$. That is, the set of all $x$ such that:

\sqrt{x_1^2 + x_2^2} = 3

(7)

or:

x_1^2 + x_2^2 = 9.

(8)

So, if we set:

g(x) = x_1^2 + x_2^2 - 9

(9)

then we can write our optimization problem in the standard form for a constrained optimization problem (subject to an equality constraint):

\begin{aligned} & \textbf{Maximize: } f(x) = \frac{1}{4} x_1^2 + \frac{1}{25} x_2^2 \\ & \textbf{Given: } g(x) = x_1^2 + x_2^2 - 9 = 0. \end{aligned}

(10)

Multiple Constraints

To add additional constraints, simply append more equality constraints to your list of “given” statements. For example, if $x = [x_1,x_2,x_3]$ and $f(x) = \frac{1}{4} x_1^2 + \frac{1}{25} x_2^2 - \frac{1}{9} x_3^2$ , and we wanted to maximize $f$ over all $x$ that lie on the unit sphere, and on the plane $2 x_1 - x_2 + 4 x_3 = 0.2$ , then we would aim to:

\begin{aligned} & \textbf{Maximize: } f(x) = \frac{1}{4} x_1^2 + \frac{1}{25} x_2^2 - \frac{1}{9} x_3^2 \\ & \textbf{Given: } g_1(x) = x_1^2 + x_2^2 + x_3^2 - 1 = 0 \text{ and } g_2(x) = 2 x_1 - x_2 + 4 x_3 - 0.2 = 0 . \end{aligned}

(11)

Notice that the constraints are joined by an “and” statement. It follows that the set $\mathcal{X}$ may be expressed as an intersection of the sets defined by each individual constraint:

\mathcal{X} = \mathcal{X}_1 \cap \mathcal{X}_2, \text{ where } \mathcal{X}_j = \{\text{all } x \text{ such that } g_j(x) = 0\}.

(12)

When $\mathcal{X}$ is defined implicitly, we can use the constraints to check whether an input belongs to $\mathcal{X}$ . Just evaluate $g_j(x)$ for each $j \in \{1,2,...,m\}$ . If each $g_j(x) = 0$ then $x \in \mathcal{X}$ . If any $g_j(x) \neq 0$ , then $x \notin \mathcal{X}$ . In contrast, it is not always easy to recover the explicit representation of a set from a list of constraints.

The same is true in reverse for explicit representations. Given an explicit representation it is easy to list the elements of a set, but it takes some work to identify a rule that all of the elements satisfy, and all of the elements of the complement fail to satisfy.
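As a minimal sketch of the membership test described above, the following Python snippet checks a point against the two constraints from the sphere-and-plane example. The helper name `in_constraint_set` and the tolerance `tol` are assumptions made for illustration; some tolerance is needed because floating-point arithmetic rarely returns exactly zero.

```python
import math

def g1(x):
    # Unit sphere: x1^2 + x2^2 + x3^2 - 1 = 0
    return x[0]**2 + x[1]**2 + x[2]**2 - 1.0

def g2(x):
    # Plane: 2 x1 - x2 + 4 x3 - 0.2 = 0
    return 2.0 * x[0] - x[1] + 4.0 * x[2] - 0.2

def in_constraint_set(x, constraints, tol=1e-9):
    # x belongs to X only if every constraint function vanishes at x.
    # tol is an assumed tolerance that absorbs floating-point rounding.
    return all(abs(g(x)) <= tol for g in constraints)

# A point on both surfaces, found by setting x1 = 0 and x2 = 4 x3 - 0.2,
# then solving the resulting quadratic 17 x3^2 - 1.6 x3 - 0.96 = 0 for x3.
x3 = (1.6 + math.sqrt(1.6**2 + 4 * 17 * 0.96)) / (2 * 17)
inside = [0.0, 4.0 * x3 - 0.2, x3]
outside = [0.0, 0.0, 1.0]  # on the sphere, but not on the plane

print(in_constraint_set(inside, [g1, g2]))
print(in_constraint_set(outside, [g1, g2]))
```

Checking membership takes one function evaluation per constraint. Producing the point `inside` required solving an equation by hand, which illustrates why recovering explicit elements from constraints is the harder direction.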

To go back and forth between a list of constraints and an explicit representation of a set, it is helpful to remember the relationship between simple algebraic expressions and the geometry they represent. We practiced this skill in Section 8.2 when we studied level sets.

For example, every linear set (a line in the plane) may be described by setting $a x_1 + b x_2 + c = 0$ for some $a$, $b$, and $c$. Every circular set centered at the origin may be described by setting $x_1^2 + x_2^2 = r^2$ for some radius $r > 0$. Every elliptical set centered at the origin, with axes aligned to the coordinate axes, may be described by setting $a x_1^2 + b x_2^2 = r^2$ for some $a > 0$, $b > 0$, and $r > 0$.

The same formulas extend naturally to higher dimensions. For example, every planar set may be expressed by setting $a_1 x_1 + a_2 x_2 + a_3 x_3 + c = a \cdot x + c = 0$ for some $a = [a_1,a_2,a_3]$ and some $c$ . In this case the vector $a$ is the “normal” vector to the plane (it is perpendicular to the plane) so the choice of $a$ controls the orientation of the plane. The choice of $c$ controls the specific position of the plane.
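The role of the normal vector can be checked numerically. The sketch below reuses the plane $2 x_1 - x_2 + 4 x_3 = 0.2$ from the earlier example, so $a = [2,-1,4]$ and $c = -0.2$. The two points `p` and `q` are chosen by hand for illustration. Any displacement between two points on the plane should have zero inner product with $a$.

```python
def dot(u, v):
    # Inner product of two vectors stored as lists
    return sum(ui * vi for ui, vi in zip(u, v))

a = [2.0, -1.0, 4.0]  # normal vector of the plane
c = -0.2              # offset, so the plane is a . x + c = 0

# Two hand-picked points on the plane
p = [0.1, 0.0, 0.0]   # 2 * 0.1 - 0.2 = 0
q = [0.0, -0.2, 0.0]  # -(-0.2) - 0.2 = 0

# Displacement from p to q, which lies inside the plane
d = [qi - pi for qi, pi in zip(q, p)]

print(dot(a, p) + c)  # constraint value at p
print(dot(a, q) + c)  # constraint value at q
print(dot(a, d))      # inner product of the normal with an in-plane direction
```

All three printed values should be zero up to rounding: the first two confirm that both points satisfy the constraint, and the third confirms that $a$ is perpendicular to a direction lying in the plane.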

When we use multiple constraints, the set $\mathcal{X}$ may be visualized as the intersection of a series of level sets, one per constraint. For the rest of this chapter we will assume that there is only one equality constraint.

Lagrange Multipliers

Introducing constraints on the inputs both restricts the range of possible input values we need to consider, and relaxes the requirements we need to enforce on candidate optima. If $f$ is smooth, and $x$ is unconstrained, then $x_*$ cannot be an extremum of $f$ unless $\nabla f(x_*) = 0$. In other words, when $x$ is unconstrained, all finite extrema of smooth $f$ correspond to locations where the surface is flat. If we add constraints, then, restricted by the constraints, $f$ may be optimized at a location $x_* \in \mathcal{X}$ where $f$ is not flat, provided that there is no way to move uphill or downhill on $f$, from $x_*$, while staying inside of $\mathcal{X}$.

Recall that a directional derivative, $\partial_v f(x_*)$, evaluates the instantaneous rate of change (slope) of the surface $f$ along a path oriented in the direction $v$ leaving the point $x_*$. Let $\mathcal{X}$ denote a set defined by a collection of equality constraints. Let $x_* \in \mathcal{X}$ denote a point in $\mathcal{X}$. Let $\mathcal{T}(x_*)$ denote the collection of tangent directions to the set $\mathcal{X}$ leaving $x_*$. These are all of the vectors $v$ such that the line $x(t) = x_* + t \hat{v}$ is tangent to $\mathcal{X}$ at $x_*$. The collection of tangents represents all of the directions in which we could move away from $x_*$ while remaining inside of $\mathcal{X}$.

A point $x_*$ may be a solution to a constrained optimization problem only if:

1. It satisfies the constraint $g(x_*) = 0$.
2. There is no direction $v \in \mathcal{T}(x_*)$ along which $f(x_* + t \hat{v})$ is increasing or decreasing.

In the optimization literature the first condition is called primal feasibility and the second is called dual feasibility. A point $x_*$ that satisfies 1 and 2 is “feasible” in the sense that it could be an optimum. If $x_*$ does not satisfy both 1 and 2, then it cannot solve the constrained optimization problem. This is entirely analogous to the observation that, in an unconstrained problem, $x_*$ cannot be an extremum unless $\nabla f(x_*) = 0$, so a point $x_*$ is only feasible if $\nabla f(x_*) = 0$.

The second condition is the most interesting. It requires that the surface $f$ is flat at $x_*$ along every path $x(t) \in \mathcal{X}$ passing through $x_*$. This is only possible if $\partial_v f(x_*) = 0$ for all $v \in \mathcal{T}(x_*)$.

Dual feasibility generalizes, to constrained problems, the feasibility statement we derived in Section 9.3: $x_*$ cannot be an extremum of $f$ unless $\nabla f(x_*) = 0$. Both generalize the familiar uni-dimensional rule, “set the derivative of $f$ to zero,” from Section 3.3.

To simplify the dual feasibility statement, recall that every directional derivative may be expressed in terms of the gradient of $f$ (see Section 9.2):

\partial_v f(x_*) = \nabla f(x_*) \cdot \hat{v}

(13)

where $\hat{v}$ is the unit vector pointing in the same direction as $v$ . So, dual feasibility may be expressed:

$\nabla f(x_*) \cdot \hat{v} = 0$ for all $v \in \mathcal{T}(x_*)$ .

This is a bit easier to think about since $\nabla f(x_*)$ is fixed by $f$ and $x_*$ . It separates the term associated with the slope of $f$ from the term associated with admissible directions along which we could move the input $x$ .
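The identity $\partial_v f(x_*) = \nabla f(x_*) \cdot \hat{v}$ can be sanity-checked numerically. The sketch below uses the surface $f(x) = \frac{1}{4} x_1^2 + \frac{1}{25} x_2^2$ from the earlier example and compares a central finite-difference slope against the gradient formula. The test point, the direction, and the step size `h` are arbitrary choices made for illustration.

```python
import math

def f(x):
    return 0.25 * x[0]**2 + 0.04 * x[1]**2

def grad_f(x):
    # Gradient of f, computed by hand: [x1 / 2, 2 x2 / 25]
    return [0.5 * x[0], 0.08 * x[1]]

def unit(v):
    # Unit vector pointing in the same direction as v
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]

def slope_finite_difference(x, v, h=1e-6):
    # Central difference estimate of the slope of f along v_hat at x
    v_hat = unit(v)
    x_plus = [x[0] + h * v_hat[0], x[1] + h * v_hat[1]]
    x_minus = [x[0] - h * v_hat[0], x[1] - h * v_hat[1]]
    return (f(x_plus) - f(x_minus)) / (2.0 * h)

def slope_from_gradient(x, v):
    # Directional derivative expressed as grad f(x) . v_hat
    v_hat = unit(v)
    g = grad_f(x)
    return g[0] * v_hat[0] + g[1] * v_hat[1]

x = [1.0, 2.0]
v = [3.0, 4.0]
print(slope_finite_difference(x, v))
print(slope_from_gradient(x, v))
```

The two printed values should agree closely. The gradient is computed once per point, while the direction enters only through the inner product.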

If $\nabla f(x_*) = 0$, then $\nabla f(x_*) \cdot \hat{v} = 0 \cdot \hat{v} = 0$. So, if $x_*$ is a location where $f$ is flat, then $x_*$ is feasible for both the unconstrained and the constrained problem. This makes sense. If $x_*$ is a local maximizer or minimizer of $f$, it will remain a local maximizer or minimizer of $f$ if we restrict the allowed ways in which we could perturb the input $x$ away from $x_*$. So, any point in $\mathcal{X}$ where $\nabla f$ is zero is feasible.

The dual feasibility constraint allows other solutions. Suppose that $\nabla f(x_*) \neq 0$. By definition, $\|\hat{v}\| = 1$, so $\hat{v} \neq 0$. Recall (see Section 8.1) that two nonzero vectors are perpendicular if and only if their inner product equals zero. Therefore, if $\nabla f(x_*) \neq 0$, then $\nabla f(x_*) \cdot \hat{v} = 0$ if and only if $v$ is perpendicular to $\nabla f(x_*)$.

Now we can write the dual feasibility constraint geometrically: either $\nabla f(x_*) = 0$, or $\nabla f(x_*)$ is perpendicular to every $v \in \mathcal{T}(x_*)$.

In other words, all tangent lines to the set $\mathcal{X}$ , passing through $x_*$ , must be perpendicular to the gradient of $f$ at $x_*$ .

To use this constraint, we need to convert it into a system of algebraic equations that we can manipulate. Once again, we can rely on ideas we introduced earlier in the course.

Recall that (see Section 9.2), the gradient of some surface, $\nabla h(x)$ , points perpendicularly to the level set of $h$ at $x$ . We argued this fact as follows.

If $x(t)$ is a path constrained to a level set of some smooth surface $h$ , then $h(x(t))$ is a constant function of $t$ . So, the directional derivative of $h$ , at $x$ , along any path constrained to a level set of $h$ , must equal zero. The converse is also true. If $\partial_v h(x) = 0$ , then the line $x(t) = x + v t$ is tangent to the level set of $h$ passing through $x$ , since $h$ is flat along the direction $v$ leaving $x$ .

This observation provides an alternate description for the set of directions tangent to a level set. Given a smooth function $h$ , and an input $x$ , every direction $v$ such that $\partial_v h(x) = 0$ is tangent to the level set of $h$ passing through $x$ .

We can use this conclusion to express the tangent set, $\mathcal{T}(x_*)$, algebraically.

By assumption $\mathcal{X}$ was defined by a single equality constraint, $g(x) = 0$ . So, $\mathcal{X}$ is a level set of the constraint function $g$ . It follows that:

\mathcal{T}(x_*) = \{v \text{ such that } \partial_v g(x_*) = 0 \} = \{v \text{ such that } \nabla g(x_*) \cdot \hat{v} = 0 \}.

(14)

If $\nabla g(x_*) \neq 0$ , then $\nabla g(x_*) \cdot \hat{v} = 0$ if and only if $v$ is perpendicular to $\nabla g(x_*)$ . Therefore:

\mathcal{T}(x_*) = \{v \text{ perpendicular to } \nabla g(x_*) \}.

(15)

Now, dual feasibility requires that:

$\nabla f(x_*)$ is perpendicular to every $v$ that is perpendicular to $\nabla g(x_*)$ .

In other words, $\nabla f(x_*)$ and $\nabla g(x_*)$ must both be perpendicular to the same collection of tangent directions $\mathcal{T}(x_*)$ .

The tangent set, $\mathcal{T}(x_*)$, is the collection of all vectors that satisfy the equation $\nabla g(x_*) \cdot v = 0$. This is one linear equation. If $x$ is $d$ dimensional, it defines a $d - 1$ dimensional collection of vectors. For example, if $d = 2$, then the tangent set is one-dimensional, and corresponds to a line. If $d = 3$, then the tangent set is two-dimensional, and corresponds to a plane. In each case, there is only one direction perpendicular to the tangent set, up to sign and scaling: every vector perpendicular to the tangent set is a scalar multiple of $\nabla g(x_*)$.

So, if $\nabla f(x_*)$ and $\nabla g(x_*)$ are both perpendicular to the same collection of tangent directions $\mathcal{T}(x_*)$ , then they must be parallel vectors!
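To see this conclusion in numbers, the sketch below uses the circle example from earlier in this section, with $f(x) = \frac{1}{4} x_1^2 + \frac{1}{25} x_2^2$ and $g(x) = x_1^2 + x_2^2 - 9$. At a point on the circle it builds a tangent direction by rotating $\nabla g$ by 90 degrees, then reports the slope of $f$ along that tangent and the two-dimensional cross product of $\nabla f$ and $\nabla g$, which is zero exactly when the two vectors are parallel. The helper names and the two test points are assumptions chosen for illustration.

```python
import math

def grad_f(x):
    return [0.5 * x[0], 0.08 * x[1]]

def grad_g(x):
    return [2.0 * x[0], 2.0 * x[1]]

def tangent(x):
    # Rotating grad g by 90 degrees gives a vector perpendicular to grad g,
    # which is a tangent direction to the level set of g through x.
    gx, gy = grad_g(x)
    return [-gy, gx]

def slope_along_tangent(x):
    # Directional derivative of f along the unit tangent: grad f . v_hat
    v = tangent(x)
    norm = math.hypot(v[0], v[1])
    gf = grad_f(x)
    return (gf[0] * v[0] + gf[1] * v[1]) / norm

def cross(u, v):
    # 2D cross product; it vanishes exactly when u and v are parallel
    return u[0] * v[1] - u[1] * v[0]

on_axis = [3.0, 0.0]                               # a point on the circle
diagonal = [3.0 / math.sqrt(2), 3.0 / math.sqrt(2)]  # another point on the circle

for x in (on_axis, diagonal):
    print(x, slope_along_tangent(x), cross(grad_f(x), grad_g(x)))
```

At the first point both quantities vanish, so the gradients are parallel and $f$ is flat along the circle. At the second point both are nonzero, so $f$ can still be increased or decreased by moving along the circle.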

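We can now write down the feasibility constraints as a system of equations. Primal feasibility requires $g(x_*) = 0$. Dual feasibility requires that $\nabla f(x_*)$ is parallel to $\nabla g(x_*)$, that is, $\nabla f(x_*) = \lambda \nabla g(x_*)$ for some scalar $\lambda$. The scalar $\lambda$ is called a Lagrange multiplier.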

So, to identify candidate optima for a constrained optimization problem:

1. Write down the constraint $g(x_*) = 0$.
2. Compute $\nabla f(x)$ and $\nabla g(x)$.
3. Write down the system of equations
   $\nabla f(x_*) = \lambda \nabla g(x_*)$
   (20)
   where $x_*$ and $\lambda$ are free variables.
4. Solve for $x_*$ and $\lambda$ by enforcing primal and dual feasibility.
5. If there are multiple feasible points, evaluate $f$ at each and pick the feasible point that maximizes (or minimizes) $f$.
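The steps above can also be carried out numerically. The following is a rough sketch for the circle problem from earlier in this section, not a general-purpose solver. It assumes the constraint set can be parametrized by an angle `theta`, which enforces primal feasibility by construction. It then looks for angles where $\nabla f$ and $\nabla g$ are parallel by bisecting on their two-dimensional cross product, which eliminates $\lambda$. The grid size, the grid offset, and all helper names are illustrative assumptions.

```python
import math

def f(x):
    return 0.25 * x[0]**2 + 0.04 * x[1]**2

def grad_f(x):
    return [0.5 * x[0], 0.08 * x[1]]

def grad_g(x):
    return [2.0 * x[0], 2.0 * x[1]]

def point(theta):
    # Every theta gives a point on the circle of radius 3 (primal feasibility)
    return [3.0 * math.cos(theta), 3.0 * math.sin(theta)]

def misalignment(theta):
    # 2D cross product of grad f and grad g; zero exactly when they are parallel
    x = point(theta)
    gf, gg = grad_f(x), grad_g(x)
    return gf[0] * gg[1] - gf[1] * gg[0]

def bisect(func, lo, hi, iters=60):
    # Assumes func(lo) and func(hi) have opposite signs
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Scan one full turn; the offset keeps grid points away from the roots
n = 16
start = 0.1
grid = [start + 2.0 * math.pi * k / n for k in range(n + 1)]

candidates = []
for lo, hi in zip(grid, grid[1:]):
    if misalignment(lo) * misalignment(hi) < 0:
        candidates.append(point(bisect(misalignment, lo, hi)))

# Final step: compare f across the feasible candidates
best = max(candidates, key=f)
for x in candidates:
    print(x, f(x))
print("maximizer:", best)
```

Eliminating $\lambda$ through the cross product only works in two dimensions. In higher dimensions one would solve the full system in $x_*$ and $\lambda$ instead.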

Example

Let’s solve the constrained optimization problem we established at the start of the chapter using Lagrange multipliers. To lighten the notation, we write $x$ for $x_1$ and $y$ for $x_2$.

\begin{aligned} & \textbf{Maximize: } \frac{1}{4} x^2 + \frac{1}{25} y^2 \\ & \textbf{Given: } x^2 + y^2 - 9 = 0. \end{aligned}

(21)

Working in order:

1. Primal feasibility demands:
   $x_*^2 + y_*^2 - 9 = 0$
   (22)
2. The gradients are:
   $\nabla f(x_*,y_*) = [\frac{2}{4} x_*, \frac{2}{25} y_*]$
   (23)
   and:
   $\nabla g(x_*,y_*) = [2 x_*, 2 y_*]$
   (24)
3. So, dual feasibility demands:
   $\left[\begin{array}{c} \frac{2}{4} x_* \\ \frac{2}{25} y_* \end{array} \right] = \lambda \left[ \begin{array}{c} 2 x_* \\ 2 y_* \end{array} \right]$
   (25)
   Or, as a system of equations:
   $\begin{aligned} & \frac{1}{4} x_* = \lambda x_* \Rightarrow \left(\lambda - \frac{1}{4} \right) x_* = 0 \\ & \frac{1}{25} y_* = \lambda y_* \Rightarrow \left(\lambda - \frac{1}{25} \right) y_* = 0. \end{aligned}$
   (26)
4. The first equation requires that $x_* = 0$ or that $\lambda = 1/4$. The second requires that $y_* = 0$ or $\lambda = 1/25$. Since $\lambda$ cannot equal both $1/4$ and $1/25$ at the same time, the system can only admit solutions where:
   $(x_* = 0, y_* = ?, \lambda = 1/25) \text{ or } (x_* = ?, y_* = 0, \lambda = 1/4)$
   (27)
   In either case, there is only one remaining unknown. It is determined by enforcing primal feasibility. Recall that
   $x_*^2 + y_*^2 = 9.$
   (28)
   Therefore, if $x_* = 0$, then $y_* = \pm 3$, and if $y_* = 0$, then $x_* = \pm 3$.
   So, we are left with four feasible solutions:
   $(x_* = 0, y_* = \pm 3) \text{ or } (x_* = \pm 3, y_* = 0).$
   (29)
5. Now that we’ve identified all the feasible points, we need only evaluate $f$ at each to find the maximizers:
   $\begin{aligned} & (x_* = 0, y_* = 3): & f(0,3) = \frac{9}{25} \\ & (x_* = 0, y_* = -3): & f(0,-3) = \frac{9}{25} \\ & (x_* = 3, y_* = 0): & f(3,0) = \frac{9}{4} \\ & (x_* = -3, y_* = 0): & f(-3,0) = \frac{9}{4} \\ \end{aligned}$
   (30)
   Since $9/4 > 9/25$, the points $(x_* = \pm 3, y_* = 0)$ are the maximizers. The remaining feasible points minimize $f$.
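The worked solution can be double-checked numerically. The sketch below substitutes the four feasible points and their multipliers into the primal and dual feasibility equations and reports the residuals. It then compares the resulting values of $f$ against a brute-force scan around the circle. The helper name `residuals` and the grid size `n` are assumptions made for illustration.

```python
import math

def f(x):
    return 0.25 * x[0]**2 + 0.04 * x[1]**2

def g(x):
    return x[0]**2 + x[1]**2 - 9.0

def grad_f(x):
    return [0.5 * x[0], 0.08 * x[1]]

def grad_g(x):
    return [2.0 * x[0], 2.0 * x[1]]

def residuals(x, lam):
    # Primal residual |g(x)| and dual residual max |grad f - lam * grad g|
    gf, gg = grad_f(x), grad_g(x)
    dual = max(abs(a - lam * b) for a, b in zip(gf, gg))
    return abs(g(x)), dual

# The four feasible points from the example, with their multipliers
solutions = [
    ([0.0, 3.0], 1 / 25),
    ([0.0, -3.0], 1 / 25),
    ([3.0, 0.0], 1 / 4),
    ([-3.0, 0.0], 1 / 4),
]
for x, lam in solutions:
    print(x, lam, residuals(x, lam), f(x))

# Brute-force scan: evaluate f at n evenly spaced points on the circle
n = 3600
values = [
    f([3.0 * math.cos(2.0 * math.pi * k / n), 3.0 * math.sin(2.0 * math.pi * k / n)])
    for k in range(n)
]
print(max(values), min(values))
```

Both residuals should be zero up to rounding at all four points, and the largest and smallest scanned values should match $9/4$ and $9/25$.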
