Suppose that f(x)=f(x1,x2,...,xd) is a scalar-valued function of d variables. How can we find the “slope” of f at some input x=[x1,x2,...,xd]? What does it mean to differentiate a function that accepts more than one input?
The simplest approach is to take a derivative with respect to one input variable at a time, holding the others fixed. This is equivalent to selecting a cross-section of the surface and then, since a cross-section is a function of a single input, differentiating it using the regular derivative.
First, isolate a cross-section. The animation below isolates a y cross-section of the same surface visualized in Section 8.2. The selected cross-section holds x=−0.75 while allowing y to vary.
A y cross-section of the surface corresponding to x=−0.75.
Second, take a derivative of the cross-section. The animation below isolates the derivative at y=0.25 using the limiting definition of a derivative:
d/ds g(s) = lim h→0 [g(s+h) − g(s)] / h
In this case, the function g(s) is a y cross-section of f with x=−0.75, so g(s)=f(−0.75,s), and the derivative is evaluated at s=0.25.
A derivative is the limit of the slope of secants.
The slope recovered above is the slope of the surface holding x fixed at −0.75 and varying y about 0.25. We call the slope of a function with respect to only one input variable a partial derivative.
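The limiting process can also be checked numerically. The sketch below is only an illustration: the surface f is a hypothetical stand-in (it is not the surface from Section 8.2), and the helper names g and secant_slope are our own. It computes secant slopes of the y cross-section at x=−0.75 for shrinking step sizes and compares them to the partial derivative worked out by hand for this particular f.

```python
import math

def f(x, y):
    # Hypothetical smooth surface, chosen only for illustration;
    # it is NOT the surface from Section 8.2.
    return math.sin(x) * y**2 + x * y

def g(s, x_fixed=-0.75):
    # y cross-section: hold x fixed and let the second input vary.
    return f(x_fixed, s)

def secant_slope(func, s, h):
    # Slope of the secant line through (s, func(s)) and (s + h, func(s + h)).
    return (func(s + h) - func(s)) / h

s0 = 0.25
# Partial derivative in y for this particular f, found by hand: 2*sin(x)*y + x.
exact = 2 * math.sin(-0.75) * s0 + (-0.75)

for h in (0.1, 0.01, 0.001, 0.0001):
    slope = secant_slope(g, s0, h)
    print(f"h={h:<7} secant slope={slope:.6f} error={abs(slope - exact):.2e}")
```

As h shrinks, the secant slopes should settle toward the hand-computed value, which is what the limit in the definition expresses.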
Run the code cell below to visualize the cross-sections of surfaces and their tangent lines.
from utils_lsg import show_surface_cross_section
show_surface_cross_section()
Notice that, in each case, the corresponding partial can be found by treating all the other inputs as if they were constants, and taking a regular derivative with respect to the variable of interest. For instance:
Examples 2 and 3 above show that, for a generic surface f, the partial derivative ∂xjf(x) will be a function of all of the input variables, not just the input xj.
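To make the rule concrete, here is a minimal sketch that compares hand-computed partials with numerical estimates. The function f(x,y)=x²y+sin(y) is a hypothetical example of our own (it is not one of the examples referenced above), and partial is an assumed helper name that uses a central difference.

```python
import math

def partial(func, point, j, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to input j at the given point (a sequence of floats)."""
    up = list(point)
    down = list(point)
    up[j] += h      # nudge only input j; all other inputs stay fixed
    down[j] -= h
    return (func(*up) - func(*down)) / (2 * h)

def f(x, y):
    # Hypothetical example function (an assumption, not from the text).
    return x**2 * y + math.sin(y)

def df_dx(x, y):
    # Treat y as a constant: d/dx [x^2 * y] = 2*x*y and d/dx [sin(y)] = 0.
    return 2 * x * y

def df_dy(x, y):
    # Treat x as a constant: d/dy [x^2 * y] = x^2 and d/dy [sin(y)] = cos(y).
    return x**2 + math.cos(y)

for point in [(1.0, 2.0), (1.0, -1.0), (-0.75, 0.25)]:
    print(point,
          "df/dx:", round(partial(f, point, 0), 5), "vs", round(df_dx(*point), 5),
          "df/dy:", round(partial(f, point, 1), 5), "vs", round(df_dy(*point), 5))
```

The first two points share the same x but have different y, and the partial with respect to x changes between them. This is the sense in which a partial derivative is a function of all of the inputs.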
The animation below continues the examples illustrated in the animations above. This time, we add a second y cross-section by fixing a new x value. The two cross-sections are different curves, so they have different slopes for different input values of y. As a result, the partial derivative ∂yf(x,y) depends on both x and y.
Partial derivatives of f with respect to y along two different cross-sections.
In Section 6.1 we saw that it is possible to approximate smooth functions with polynomial functions (e.g. linear, quadratic, or cubic functions) whose coefficients are derived by differentiating a function about a point where it is easy to evaluate. The same idea extends to functions of multiple variables.
Suppose that we know the value of f(x,y) at some point x∗,y∗, and that we also know its partials, ∂xf(x,y) and ∂yf(x,y), at x∗,y∗. Then, for x,y close to x∗,y∗:
f(x,y) ≈ f(x∗,y∗) + ∂xf(x∗,y∗)(x−x∗) + ∂yf(x∗,y∗)(y−y∗)
Compare this result to the linear approximation introduced in Section 6.1. The formula above amounts to applying the linear correction to f(x,y) separately in x, then in y.
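Collecting the constant terms shows that the linear approximation is the equation of a plane:
f(x,y) ≈ a + ∂xf(x∗,y∗)x + ∂yf(x∗,y∗)y
for a=f(x∗,y∗)−∂xf(x∗,y∗)x∗−∂yf(x∗,y∗)y∗. This should not be surprising. A plane is the two-dimensional equivalent of a line, since it is a surface that is a linear function of both inputs.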
The plane defined by a linear approximation to a surface about x∗,y∗ is tangent to the surface at x∗,y∗ just like the linear approximation to a function of a single variable is tangent to the function. It has the same slope as the surface where it intersects the surface. Accordingly, we call the plane formed by a linear approximation a tangent plane.
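A short numerical check can make the tangency visible. The sketch below assumes a hypothetical surface f(x,y)=eˣcos(y) with partials computed by hand, and an arbitrarily chosen point x∗=0.5, y∗=0.3. None of these come from the text. It evaluates the linear approximation in two equivalent forms and prints how the approximation error behaves as the input moves closer to x∗,y∗.

```python
import math

def f(x, y):
    # Hypothetical surface used only for illustration.
    return math.exp(x) * math.cos(y)

def fx(x, y):
    # Partial in x (y held constant).
    return math.exp(x) * math.cos(y)

def fy(x, y):
    # Partial in y (x held constant).
    return -math.exp(x) * math.sin(y)

x_star, y_star = 0.5, 0.3   # arbitrary expansion point (an assumption)

def tangent_plane(x, y):
    # Linear approximation about (x_star, y_star).
    return (f(x_star, y_star)
            + fx(x_star, y_star) * (x - x_star)
            + fy(x_star, y_star) * (y - y_star))

# The same plane written as a + b*x + c*y.
b = fx(x_star, y_star)
c = fy(x_star, y_star)
a = f(x_star, y_star) - b * x_star - c * y_star

for step in (0.1, 0.01, 0.001):
    x, y = x_star + step, y_star + step
    error = abs(f(x, y) - tangent_plane(x, y))
    same = math.isclose(tangent_plane(x, y), a + b * x + c * y)
    print(f"step={step:<6} error={error:.2e} forms agree={same}")
```

For a smooth surface, the error should shrink much faster than the step itself. That is what distinguishes a tangent plane from a plane that merely passes through the same point.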
We can write the formula for a tangent plane more concisely using an inner product:
f(x,y) ≈ f(x∗,y∗) + [∂xf(x∗,y∗), ∂yf(x∗,y∗)] ⋅ [x−x∗, y−y∗]
This form is nice since it collects like terms into vectors. In particular, it groups the collection of partial derivatives into a single vector. The next section, Section 9.1, is all about the vector of partial derivatives. This vector is important since it encodes all of the information about the slope of the surface at x∗,y∗ needed to build a tangent plane to the surface.
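The inner-product form carries over directly to functions of d inputs. The sketch below is one possible way to write it, under assumptions of our own: the helper names gradient, dot, and linear_approx, the central-difference estimate of each partial, and the three-input test function are all hypothetical.

```python
def gradient(func, point, h=1e-6):
    # Vector of central-difference partial derivatives of func at `point`.
    grads = []
    for j in range(len(point)):
        up, down = list(point), list(point)
        up[j] += h      # perturb only input j
        down[j] -= h
        grads.append((func(up) - func(down)) / (2 * h))
    return grads

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_approx(func, x_star, x):
    # f(x*) plus the inner product of the vector of partials with (x - x*).
    displacement = [xi - si for xi, si in zip(x, x_star)]
    return func(x_star) + dot(gradient(func, x_star), displacement)

def f(x):
    # Hypothetical function of three inputs, for illustration only.
    return x[0] * x[1] + x[2] ** 2

x_star = [1.0, 2.0, 3.0]
x = [1.01, 1.98, 3.02]
print("exact value:   ", f(x))
print("linear approx: ", linear_approx(f, x_star, x))
```

Because the partials are collected into a single vector, the same three helper functions work unchanged for any number of inputs.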