Partial Derivatives and Linearization

Partial Derivatives

Suppose that $f(x) = f(x_1,x_2,...,x_d)$ is a scalar-valued function of $d$ variables. How can we find the “slope” of $f$ at some input $x = [x_1,x_2,...,x_d]$ ? What does it mean to differentiate a function that accepts more than one input?

The simplest approach is to take a derivative with respect to one input variable at a time, holding the others fixed. This is equivalent to selecting a cross-section of the surface and, since a cross-section is a function of a single input, differentiating it with the ordinary derivative.

First, isolate a cross-section. The animation below isolates a $y$ cross-section of the same surface visualized in Section 8.2. The selected cross-section holds $x = -0.75$ while allowing $y$ to vary.

A $y$ cross-section of the surface, corresponding to $x = -0.75$.

Second, take a derivative of the cross-section. The animation below isolates the derivative at $y = 0.25$ using the limiting definition of a derivative:

\frac{d}{ds} g(s) = \lim_{\Delta s \rightarrow 0} \frac{g(s + \frac{1}{2} \Delta s) - g(s - \frac{1}{2} \Delta s)}{\Delta s}

(1)

In this case, the function $g(s)$ is a $y$ cross-section of $f$ with $x = -0.75$ so $g(s) = f(-0.75,s)$ .

A derivative is the limit of the slope of secants.

The slope recovered above is the slope of the surface holding $x$ fixed at -0.75 and varying $y$ about 0.25. We call the slope of a function with respect to only one input variable a partial derivative.

Partial Derivatives

If $f(x) = f(x_1,x_2,...,x_d)$ is a function of $d$ inputs, then its partial derivative with respect to the $j^{th}$ input variable, at $x$, is:

\partial_{x_j}f(x) = \frac{d}{dt} f(x_1,x_2,...,x_j + t, ... x_d)\Big|_{t = 0}

(2)

Applying the limiting definition of the derivative:

\partial_{x_j}f(x) = \lim_{\Delta x \rightarrow 0} \frac{f(x_1,x_2,...,x_j + \frac{1}{2}\Delta x, ... x_d) - f(x_1,x_2,...,x_j - \frac{1}{2} \Delta x, ... x_d)}{\Delta x}.

(3)

For example, if $d = 2$ , then:

\begin{aligned} & \partial_{x_1} f(x) = \frac{d}{dt} f(x_1 + t, x_2) \Big|_{t = 0} = \lim_{\Delta x \rightarrow 0} \frac{f(x_1 + \frac{1}{2}\Delta x, x_2) - f(x_1 - \frac{1}{2}\Delta x, x_2)}{\Delta x} \\ & \partial_{x_2} f(x) = \frac{d}{dt} f(x_1, x_2 + t)\Big|_{t = 0} = \lim_{\Delta x \rightarrow 0} \frac{f(x_1, x_2 + \frac{1}{2}\Delta x) - f(x_1, x_2 - \frac{1}{2}\Delta x)}{\Delta x} \\ \end{aligned}

(4)

In any case, the partial derivative with respect to the $j^{th}$ input variable is the derivative of the cross-section produced by holding all variables except $x_j$ fixed.

Run the code cell below to visualize the cross-sections of surfaces and their tangent lines.

from utils_lsg import show_surface_cross_section

show_surface_cross_section()
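
The limiting definition in equation (3) can also be approximated numerically by choosing a small, fixed $\Delta x$ instead of taking the limit. The sketch below is only an illustration: the helper name `partial_derivative`, the step size `dx`, and the example surface are assumptions, not part of `utils_lsg`, and the input index `j` counts from 0 as is usual in Python.

```python
def partial_derivative(f, x, j, dx=1e-6):
    """Approximate the partial derivative of f with respect to input j at x.

    Uses the centered difference from the limiting definition, with a small
    fixed step dx standing in for the limit dx -> 0.
    """
    forward = list(x)
    backward = list(x)
    forward[j] += 0.5 * dx   # x_j + dx/2, all other inputs held fixed
    backward[j] -= 0.5 * dx  # x_j - dx/2, all other inputs held fixed
    return (f(forward) - f(backward)) / dx


# Hypothetical example surface (not the one shown in the animations).
def f(x):
    return x[0] ** 2 * (1 + x[1] ** 3)


point = [-0.75, 0.25]
print(partial_derivative(f, point, 0))  # slope along the first input
print(partial_derivative(f, point, 1))  # slope along the second input
```

Each call perturbs only one coordinate, which is exactly the "hold the other inputs fixed" step in the definition. A smaller `dx` is not always better, since floating-point rounding eventually dominates the difference in the numerator.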

Examples

Let’s practice. Here are three examples:

1. Given $f(x,y) = x + 7 y - 3$, find $\partial_{x} f(x,y)$ and $\partial_y f(x,y)$.

Solution

\begin{aligned} & \partial_x f(x,y) = \frac{d}{dt} f(x+t,y)\Big|_{t = 0} = \frac{d}{dt} \left[ (x + t) + 7 y - 3 \right] \Big|_{t = 0}= 1. \\ & \partial_y f(x,y) = \frac{d}{dt} f(x,y+t)\Big|_{t = 0} = \frac{d}{dt} \left[ x + 7 (y + t) - 3 \right] \Big|_{t = 0} = 7. \\ \end{aligned}

(5)

2. Given $f(x,y) = x^2 \times (1 + y^3)$, find $\partial_{x} f(x,y)$ and $\partial_y f(x,y)$.

Solution

\begin{aligned} & \partial_x f(x,y) = \frac{d}{dt} f(x+t,y)\Big|_{t = 0} = \frac{d}{dt} \left[ (x + t)^2 \times (1 + y^3) \right] \Big|_{t = 0} = 2x(1 + y^3). \\ & \partial_y f(x,y) = \frac{d}{dt} f(x,y+t)\Big|_{t = 0} = \frac{d}{dt} \left[ x^2 \times (1 + (y+t)^3) \right] \Big|_{t = 0} = x^2 ( 3 y^2). \\ \end{aligned}

(6)

3. Given $f(x,y) = \log(x - y)$, find $\partial_{x} f(x,y)$ and $\partial_y f(x,y)$.

Solution

\begin{aligned} & \partial_x f(x,y) = \frac{d}{dt} f(x+t,y)\Big|_{t = 0} = \frac{d}{dt} \log((x + t) - y) \Big|_{t = 0} = \frac{1}{x - y}. \\ & \partial_y f(x,y) = \frac{d}{dt} f(x,y+t)\Big|_{t = 0} = \frac{d}{dt} \log(x - (y + t)) \Big|_{t = 0} = \frac{-1}{x - y}. \\ \end{aligned}

(7)

Notice that, in each case, the corresponding partial can be found by treating all the other inputs as if they were constants, and taking a regular derivative with respect to the variable of interest. For instance:

\partial_z \left( 3 x^2 y z + 5 x z^{-1} \right) = 3 x^2 y - 5 x z^{-2}

(8)

since, for constants $a = 3 x^2 y$ and $b = 5 x$:

\frac{d}{dz} \left( a z + b z^{-1} \right) = a - b z^{-2}

(9)

In this case, since we asked for a partial with respect to $z$ , we pretended that $x$ and $y$ were constants.
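
The closed forms found in the three examples can be checked numerically by comparing them against centered differences at a test point. This is a rough sketch under stated assumptions: the helpers `d_dx` and `d_dy`, the step `h`, and the test point are illustrative choices, and $\log$ is taken to be the natural logarithm.

```python
import math


def d_dx(f, x, y, h=1e-6):
    # Centered difference in x, holding y fixed.
    return (f(x + 0.5 * h, y) - f(x - 0.5 * h, y)) / h


def d_dy(f, x, y, h=1e-6):
    # Centered difference in y, holding x fixed.
    return (f(x, y + 0.5 * h) - f(x, y - 0.5 * h)) / h


# Each entry: (function, claimed partial in x, claimed partial in y).
examples = [
    (lambda x, y: x + 7 * y - 3,
     lambda x, y: 1.0,
     lambda x, y: 7.0),
    (lambda x, y: x ** 2 * (1 + y ** 3),
     lambda x, y: 2 * x * (1 + y ** 3),
     lambda x, y: x ** 2 * 3 * y ** 2),
    (lambda x, y: math.log(x - y),  # assumes the natural logarithm
     lambda x, y: 1 / (x - y),
     lambda x, y: -1 / (x - y)),
]

x0, y0 = 2.0, 0.5  # arbitrary test point with x0 > y0 so the logarithm is defined
max_error = 0.0
for f, fx, fy in examples:
    err_x = abs(d_dx(f, x0, y0) - fx(x0, y0))
    err_y = abs(d_dy(f, x0, y0) - fy(x0, y0))
    max_error = max(max_error, err_x, err_y)
    print(err_x, err_y)  # both should be very close to zero
```

Agreement at one point does not prove a formula, but a large discrepancy is a quick way to catch an algebra slip.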

Partials Depend on All Inputs

Examples 2 and 3 above show that, for a generic surface $f$ , the partial derivative $\partial_{x_j} f(x)$ will be a function of all of the input variables, not just the input $x_j$ .

For instance, if $f(x,y) = y \sin(x)$, then:

\partial_{x} f(x,y) = \partial_{x} \left( y \sin(x) \right) = y \cos(x).

(10)

This makes sense since the $x$ cross-sections $f(x, 1)$ and $f(x,0)$ do not produce the same curve:

\begin{aligned} & f(x,1) = \sin(x), \quad f(x,0) = 0. \end{aligned}

(11)

The animation below continues the examples illustrated in the animations above. This time, we add a second $y$ cross-section by fixing a new $x$ value. The two cross-sections are different curves, so they have different slopes for different input values of $y$ . As a result, the partial derivative $\partial_{y} f(x,y)$ depends on both $x$ and $y$ .

Partial derivatives of $f$ with respect to $y$ along two different cross-sections.
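
To see this dependence numerically, the sketch below evaluates the slope in $x$ of $f(x,y) = y \sin(x)$ at the same $x$ but along three different cross-sections. The helper name `slope_in_x` and the step `dx` are assumptions made for illustration.

```python
import math


def f(x, y):
    return y * math.sin(x)


def slope_in_x(x, y, dx=1e-6):
    # Centered-difference estimate of the partial derivative in x, holding y fixed.
    return (f(x + 0.5 * dx, y) - f(x - 0.5 * dx, y)) / dx


x = 0.0
for y in [0.0, 1.0, 2.0]:
    closed_form = y * math.cos(x)  # the partial derived in the text
    print(y, slope_in_x(x, y), closed_form)
```

At $x = 0$ the closed form gives slopes of $0$, $1$, and $2$ for $y = 0$, $1$, and $2$, so the slope in $x$ changes even though $x$ itself does not.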

Linearization

In Section 6.1 we saw that it is possible to approximate smooth functions with polynomial functions (e.g. linear, quadratic, or cubic functions) whose coefficients are derived by differentiating a function about a point where it is easy to evaluate. The same idea extends to functions of multiple variables.

Suppose that we know the value of $f(x,y)$ at some point $(x_*, y_*)$, and that we also know its partials, $\partial_{x} f(x,y)$ and $\partial_{y} f(x,y)$, at that point. Then, for $(x,y)$ close to $(x_*, y_*)$:

f(x,y) \simeq f(x_*,y_*) + \partial_{x} f(x_*,y_*) \times (x - x_*) + \partial_{y} f(x_*,y_*) \times (y - y_*).

(12)

Writing $x = x_* + \Delta x$ and $y = y_* + \Delta y$ :

f(x_* + \Delta x,y_* + \Delta y) \simeq f(x_*,y_*) + \partial_{x} f(x_*,y_*) \times \Delta x + \partial_{y} f(x_*,y_*) \times \Delta y.

(13)

Compare this result to the linear approximation introduced in Section 6.1. The formula above amounts to applying the linear correction to $f(x,y)$ separately in $x$ , then in $y$ .

For example, given $f(x,y) = x^2 + x e^{-y}$ ,

\partial_x f(x,y) = 2 x + e^{-y}

(14)

and

\partial_y f(x,y) = -x e^{-y}.

(15)

So, linearizing about $(x_*, y_*) = (0, 0)$:

\begin{aligned} f(0.1,0.2) =0.092 & \approx f(0,0) + \partial_x f(0,0) \times 0.1 + \partial_y f(0,0) \times 0.2 \\ & = 0 + (2 \times 0 + e^0) \times 0.1 + (-0 \times e^{0}) \times 0.2 \\ & = 1 \times 0.1 = 0.1. \end{aligned}

(16)
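
The arithmetic in this example can be reproduced in a few lines. The sketch below uses the partials from equations (14) and (15); the function name `linearization` and the choice to shrink the displacement by factors of ten are illustrative assumptions.

```python
import math


def f(x, y):
    return x ** 2 + x * math.exp(-y)


def linearization(x, y, x_star=0.0, y_star=0.0):
    # Partials of f evaluated at the base point (x_star, y_star).
    fx = 2 * x_star + math.exp(-y_star)   # equation (14)
    fy = -x_star * math.exp(-y_star)      # equation (15)
    return f(x_star, y_star) + fx * (x - x_star) + fy * (y - y_star)


# Shrink the displacement from (0.1, 0.2) toward the base point (0, 0).
for scale in [1.0, 0.1, 0.01]:
    dx, dy = 0.1 * scale, 0.2 * scale
    exact = f(dx, dy)
    approx = linearization(dx, dy)
    print(dx, dy, exact, approx, abs(exact - approx))
```

The first row corresponds to the worked example. The later rows suggest that the error shrinks much faster than the displacement itself, which is what we expect from a linear approximation of a smooth function.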

The surface:

\tilde{f}_1(x,y) = f(x_*,y_*) + \partial_{x} f(x_*,y_*) \times (x - x_*) + \partial_{y} f(x_*,y_*) \times (y - y_*)

(17)

defines a plane since it takes the form:

\tilde{f}_1(x,y) = a + b x + c y

(18)

for $a = f(x_*,y_*) - \partial_{x} f(x_*,y_*) x_* - \partial_{y} f(x_*,y_*) y_*$, $b = \partial_{x} f(x_*,y_*)$, and $c = \partial_{y} f(x_*,y_*)$. This should not be surprising. A plane is the two-dimensional equivalent of a line, since it is a surface that is a linear function of both inputs.

The plane defined by a linear approximation to a surface about $(x_*,y_*)$ is tangent to the surface at $(x_*,y_*)$, just as the linear approximation to a function of a single variable is tangent to the function. At the point of tangency, the plane has the same height and the same partial derivatives as the surface. Accordingly, we call the plane formed by a linear approximation a tangent plane.

We can write the formula for a tangent plane more concisely using an inner product:

f(x_* + \Delta x,y_* + \Delta y) \simeq f(x_*,y_*) + [\partial_{x} f(x_*,y_*), \partial_{y} f(x_*,y_*)] \cdot [\Delta x, \Delta y].

(20)
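
The inner-product form translates directly into code: collect the partials into one list, then take a dot product with the displacement. The sketch below is a minimal illustration, assuming centered-difference estimates of the partials; the names `partials`, `dot`, and `tangent_plane`, the base point, and the displacement are all hypothetical choices.

```python
import math


def partials(f, point, h=1e-6):
    """Vector of centered-difference partial derivatives of f at point."""
    slopes = []
    for j in range(len(point)):
        up = list(point)
        down = list(point)
        up[j] += 0.5 * h
        down[j] -= 0.5 * h
        slopes.append((f(up) - f(down)) / h)
    return slopes


def dot(u, v):
    # Inner product of two equal-length lists.
    return sum(a * b for a, b in zip(u, v))


def tangent_plane(f, point):
    """Return a function mapping a displacement [dx, dy] to the tangent-plane height."""
    base = f(point)
    slopes = partials(f, point)
    return lambda delta: base + dot(slopes, delta)


def f(p):
    x, y = p
    return x ** 2 + x * math.exp(-y)


plane = tangent_plane(f, [1.0, 0.0])
delta = [0.05, -0.02]
print(plane(delta), f([1.0 + delta[0], 0.0 + delta[1]]))
```

Because the partials are stored as a single vector, the same `tangent_plane` function works unchanged for surfaces with more than two inputs.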

This form is nice since it collects like terms into vectors. In particular, it groups the collection of partial derivatives into a single vector. The next section, Section 9.1, is all about the vector of partial derivatives. This vector is important since it encodes all of the information about the slope of the surface at $x_*,y_*$ needed to build a tangent plane to the surface.
