Section 5.3 introduced the idea of a series. A series is a sum of infinitely many ordered terms.
This section develops a general method for approximating functions using series. This method is widely used in mathematics and in science. It allows accurate approximation of arbitrary smooth functions with polynomials.
Suppose that $f(x)$ is a smooth function of $x$. We are asked to approximate $f(x)$ for some $x$. For example, we could be asked for $e^{0.1}$.
Often, we can’t find $f(x)$ directly, but we can evaluate $f$ at some nearby point $x_*$. For instance, $e^0 = 1$. If $f$ is smooth, then we can use information about $f$ at $x_*$ to approximate $f(x)$ when $x$ is close to $x_*$.
The simplest approximation is $f(x) \approx f(x_*)$. We can do better if we use more information.
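Suppose that we know $f(x_*)$ for some $x_*$, and we know the slope of $f(x)$ at $x_*$, $\frac{d}{dx} f(x) \big|_{x = x_*} = f'(x_*)$. Then we can approximate $f$ near $x_*$ with its tangent line:
$$\tilde{f}_1(x) = f(x_*) + f'(x_*)(x - x_*)$$
The function $\tilde{f}_1(x)$ is linear in $x$. It passes through $(x_*, f(x_*))$ with slope $f'(x_*)$. We’ve denoted it $\tilde{f}$ to emphasize that it approximates $f$, and added the subscript 1 to emphasize that the approximation is a first order polynomial in $x$ (that is, a line).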
The figure below shows a smooth function $f$, and the linear approximations to $f$ at $x_* = -3$ (red), $x_* = -0.5$ (gold), and $x_* = 2$ (green). Notice that each linear approximation accounts for the slope of $f$.
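As a quick numerical illustration of the tangent-line idea, here is a minimal Python sketch. The helper name `linear_approx` and the choice of $\sin(x)$ about $x_* = 0$ as a test function are assumptions made for illustration; they do not come from the text.

```python
import math

def linear_approx(f_star, slope_star, x_star):
    """Return the tangent-line approximation x -> f(x*) + f'(x*) (x - x*)."""
    def approx(x):
        return f_star + slope_star * (x - x_star)
    return approx

# Illustrative choice (an assumption, not from the text): f(x) = sin(x) about x* = 0,
# where f(0) = sin(0) = 0 and f'(0) = cos(0) = 1.
tangent = linear_approx(math.sin(0.0), math.cos(0.0), 0.0)

for x in (0.1, 0.5, 1.0):
    error = abs(math.sin(x) - tangent(x))
    print(f"x = {x:.1f}: f(x) = {math.sin(x):.5f}, tangent = {tangent(x):.5f}, error = {error:.5f}")
```

The printed errors should grow as $x$ moves away from $x_*$, which is the behavior the linear approximations in the figure display.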
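We can develop higher order polynomial approximations by approximating $f'(x)$, then integrating. By the Fundamental Theorem of Calculus:
$$f(x) = f(x_*) + \int_{x_*}^{x} f'(s) \, ds$$
Replacing $f'(s)$ with its own linear approximation, $f'(s) \approx f'(x_*) + f''(x_*)(s - x_*)$, and integrating gives the quadratic approximation:
$$\tilde{f}_2(x) = f(x_*) + f'(x_*)(x - x_*) + \frac{1}{2} f''(x_*)(x - x_*)^2$$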
The figure below shows a smooth function $f$, and the linear (dashed) and quadratic (dotted) approximations to $f$ at $x_* = -3$ (red), $x_* = -0.5$ (gold), and $x_* = 2$ (green). Notice that each linear approximation accounts for the slope of $f$, while each quadratic approximation accounts for its slope and curvature at $x_*$. Also notice that the quadratic approximations are accurate farther from the points of expansion.
We can use quadratic approximations to help draw functions. If you are asked to draw $f$, and can evaluate both $f'$ and $f''$, then at each reference point $x_*$ you select for plotting, check the slope and second derivative, then sketch the tangent line and the tangent parabola $\tilde{f}_2$.
Exponential Example
Suppose that you are asked to approximate $f(0.1) = e^{0.1}$. We know that $e^0 = 1$, so, since the function $e^x$ is continuous, we could make the rough guess $e^{0.1} \approx 1$.
To improve our guess, we should account for the slope of $e^x$ at $x = 0$. Since $\frac{d}{dx} e^x = e^x$, the slope at 0 is $f'(0) = e^0 = 1$. So, the linear approximation is:
$$e^{0.1} \approx \tilde{f}_1(0.1) = 1 + 1 \cdot (0.1 - 0) = 1.1$$
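To improve further, we should account for the curvature in $f(x) = e^x$ at $x = 0$. As before, $\frac{d^2}{dx^2} f(x) = \frac{d^2}{dx^2} e^x = e^x$, so $f''(0) = e^0 = 1$. Therefore, the quadratic approximation is:
$$e^{0.1} \approx \tilde{f}_2(0.1) = 1 + 0.1 + \frac{1}{2}(0.1)^2 = 1.105$$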
This is a remarkably accurate approximation. In fact, $f(0.1) = 1.10517\ldots$, so the quadratic approximation, $1.105$, is correct to three decimal places; the error first appears in the ten-thousandths place!
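The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the variable names are illustrative.

```python
import math

x_star, x = 0.0, 0.1
dx = x - x_star

# For f(x) = e^x, the value, slope, and second derivative at x* = 0 all equal e^0 = 1.
f0 = f1 = f2 = math.exp(x_star)

constant = f0                          # rough guess: f(x) ~ f(x*)
linear = constant + f1 * dx            # adds the slope term
quadratic = linear + 0.5 * f2 * dx**2  # adds the curvature term

exact = math.exp(x)
for name, value in [("constant", constant), ("linear", linear), ("quadratic", quadratic)]:
    print(f"{name:>9}: {value:.5f}  (error {abs(exact - value):.1e})")
```

Each added term should shrink the error by roughly a factor of $x - x_* = 0.1$ or better, which is why so few terms are needed this close to the expansion point.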
Logarithmic Example
Suppose that you are asked to approximate $f(0.9) = \log(0.9)$. We know that $\log(1) = 0$, so, since the function $\log(x)$ is continuous, we could make the rough guess $\log(0.9) \approx 0$.
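To improve our guess, we should account for the slope of $\log(x)$ at $x = 1$. Since $\frac{d}{dx} \log(x) = \frac{1}{x}$, the slope at 1 is $f'(1) = \frac{1}{1} = 1$. So, the linear approximation is:
$$\log(0.9) \approx \tilde{f}_1(0.9) = 0 + 1 \cdot (0.9 - 1) = -0.1$$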
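To improve further, we should account for the curvature in $f(x) = \log(x)$ at $x = 1$. The second derivative is $\frac{d^2}{dx^2} f(x) = \frac{d}{dx} \frac{1}{x} = \frac{-1}{x^2}$, so $f''(1) = \frac{-1}{1^2} = -1$. Therefore, the quadratic approximation is:
$$\log(0.9) \approx \tilde{f}_2(0.9) = -0.1 + \frac{1}{2}(-1)(0.9 - 1)^2 = -0.105$$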
This is, again, a remarkably accurate approximation. In fact, $f(0.9) = -0.10536\ldots$, so the quadratic approximation, $-0.105$, is correct to three decimal places; the error first appears in the ten-thousandths place!
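To see how the accuracy of this approximation depends on the distance from $x_* = 1$, the following sketch evaluates the quadratic approximation to the logarithm at a few points. The function name `log_quadratic` and the sample points other than $0.9$ are hypothetical choices for illustration.

```python
import math

def log_quadratic(x, x_star=1.0):
    """Quadratic approximation to log(x) about x_star (requires x_star > 0)."""
    dx = x - x_star
    f0 = math.log(x_star)    # value at the expansion point
    f1 = 1.0 / x_star        # first derivative of log is 1/x
    f2 = -1.0 / x_star**2    # second derivative of log is -1/x^2
    return f0 + f1 * dx + 0.5 * f2 * dx**2

for x in (0.9, 0.7, 0.5):
    approx, exact = log_quadratic(x), math.log(x)
    print(f"x = {x}: quadratic = {approx:.5f}, log(x) = {exact:.5f}, error = {abs(exact - approx):.1e}")
```

The approximation is very good at $x = 0.9$ and noticeably worse at $x = 0.5$, since the neglected higher order terms grow with $|x - x_*|$.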
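We can develop even better approximations by including more details about $f$ at $x_*$. If we account for $f'''$, then we can introduce a cubic approximation:
$$\tilde{f}_3(x) = f(x_*) + f'(x_*)(x - x_*) + \frac{1}{2} f''(x_*)(x - x_*)^2 + \frac{1}{6} f'''(x_*)(x - x_*)^3$$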
Notice that the factor $1/6$ in front of the cubic term comes from integrating the constant $f'''(x_*)$ three times: first in the linear approximation to $f''(x)$, to produce $f'''(x_*)(x - x_*)$; then to produce $\frac{1}{2} f'''(x_*)(x - x_*)^2$ in the quadratic approximation to $f'(x)$; then to produce $\frac{1}{3 \times 2} f'''(x_*)(x - x_*)^3$ in the cubic approximation to $f(x)$.
The figure below shows a smooth function $f$, and its quadratic (dotted) and cubic (dash-dot) approximations at $x_* = -3$ (red), $x_* = -0.5$ (gold), and $x_* = 2$ (green). Notice that the cubic approximations are extremely accurate, even for $x$ far from $x_*$.
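The cubic approximation can be sketched in code as follows. The helper `cubic_approx` is a hypothetical name, and the example reuses the logarithm about $x_* = 1$; the third derivative $\frac{d^3}{dx^3} \log(x) = 2/x^3$ is not worked out in the text and is supplied here as an assumption of the example.

```python
import math

def cubic_approx(f0, f1, f2, f3, x_star):
    """Cubic approximation built from f and its first three derivatives at x_star."""
    def approx(x):
        dx = x - x_star
        # The coefficients 1, 1/2, and 1/(3*2) come from integrating one, two, and three times.
        return f0 + f1 * dx + f2 * dx**2 / 2 + f3 * dx**3 / (3 * 2)
    return approx

# log(x) about x* = 1: f = 0, f' = 1/x = 1, f'' = -1/x^2 = -1, f''' = 2/x^3 = 2.
log_cubic = cubic_approx(0.0, 1.0, -1.0, 2.0, 1.0)

x = 0.9
print(f"cubic = {log_cubic(x):.6f}, log(x) = {math.log(x):.6f}, error = {abs(math.log(x) - log_cubic(x)):.1e}")
```

Compared with the quadratic approximation at the same point, the cubic term should reduce the error by about another order of magnitude.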
Taylor series generalize this idea by continuing the process out to infinitely many terms.
Derivation
To derive the general form, simply recurse the argument we used to build the quadratic and cubic approximations.
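Suppose that we’d already derived the approximation to order $m$. We have derived it up to $m = 3$ already.
Imagine you could show that the $m$th order polynomial approximation to any smooth function $f$ about $x_*$ is given by:
$$\tilde{f}_m(x) = \sum_{n=0}^{m} \frac{1}{n!} f^{(n)}(x_*)(x - x_*)^n$$
Since $f'$ is also smooth, the same formula approximates $f'$ to order $m$. Substituting that approximation into the Fundamental Theorem of Calculus, $f(x) = f(x_*) + \int_{x_*}^{x} f'(s) \, ds$, and integrating term by term gives:
$$\tilde{f}_{m+1}(x) = f(x_*) + \sum_{n=0}^{m} \frac{1}{(n+1)!} f^{(n+1)}(x_*)(x - x_*)^{n+1} = \sum_{n=0}^{m+1} \frac{1}{n!} f^{(n)}(x_*)(x - x_*)^n$$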
So, if the $m$th order approximation takes the form of a Taylor series truncated at its $m$th order term, then the $(m+1)$st order approximation takes the form of a Taylor series truncated at its $(m+1)$st order term. Since the linear approximation is a Taylor series truncated at its first order term, the linear case establishes the quadratic case, which establishes the cubic case, which establishes the quartic case, and so on: the base case $m = 1$ implies $m = 2$, which implies $m = 3$, which implies $m = 4$, etc. Repeating this argument to infinity establishes the full Taylor series:
$$f(x) = \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_*)(x - x_*)^n$$
The logistic function is a popular function in statistics and machine learning, and is at the heart of logistic regression. Let’s work out a fifth order approximation to the logistic function about $x_* = 0$.
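The derivatives of the logistic function, evaluated at $x_* = 0$, are:

| $f(0) = \text{logistic}(0)$ | $f'(0)$ | $f''(0)$ | $f'''(0)$ | $f^{(4)}(0)$ | $f^{(5)}(0)$ |
| --- | --- | --- | --- | --- | --- |
| $\frac{1}{2}$ | $\frac{1}{4}$ | $0$ | $-\frac{1}{8}$ | $0$ | $\frac{1}{4}$ |
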
All the even order derivatives except $f(0)$ equal zero, since $\text{logistic}(x) - 1/2$ is an odd function of $x$ (that is, odd about $x = 0$).
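Therefore, the quintic approximation to the logistic is:
$$\text{logistic}(x) \approx \tilde{f}_5(x) = \frac{1}{2} + \frac{1}{4} x - \frac{1}{48} x^3 + \frac{1}{480} x^5$$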
The approximation has errors that converge to zero at rate $O(x^7)$ as $x \to 0$, since $f^{(6)}(0) = 0$.
The quintic approximation about $x = 0$ is illustrated in red in the figure below. The seventh order approximation is illustrated as a dotted gold curve.
Notice that adding the seventh order term only improves the approximation slightly. We often see diminishing returns as we add higher order terms to a polynomial approximation. This is especially true when we approximate bounded functions. It is also typical for a high order polynomial approximation to swerve rapidly away from the function it is approximating after following it closely over some interval. Adding higher order terms generally widens the interval (though often with diminishing returns) while producing bigger errors outside the interval where the series is converging.
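The behavior described here can be explored numerically. The sketch below evaluates a truncated Taylor series from a list of derivative values and applies it to the logistic derivatives listed above. The helper names `taylor_polynomial` and `logistic` are illustrative, and the logistic function is assumed to be $1 / (1 + e^{-x})$.

```python
import math

def taylor_polynomial(derivs, x_star):
    """Truncated Taylor series: sum over n of derivs[n] / n! * (x - x_star)**n."""
    def approx(x):
        dx = x - x_star
        return sum(d / math.factorial(n) * dx**n for n, d in enumerate(derivs))
    return approx

def logistic(x):
    # Assumed definition of the logistic function.
    return 1.0 / (1.0 + math.exp(-x))

# Derivatives of the logistic function at 0, orders 0 through 5.
logistic_derivs = [1 / 2, 1 / 4, 0.0, -1 / 8, 0.0, 1 / 4]
quintic = taylor_polynomial(logistic_derivs, 0.0)

for x in (0.5, 1.0, 2.0, 4.0):
    error = abs(logistic(x) - quintic(x))
    print(f"x = {x}: logistic = {logistic(x):.5f}, quintic = {quintic(x):.5f}, error = {error:.1e}")
```

Near zero the quintic should track the logistic closely, while at $x = 4$ the polynomial has already left the interval $[0, 1]$ that bounds the logistic function.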
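Suppose that $f(x) = e^x$. Let’s find its Taylor series about $x = 0$.
To find the Taylor series we need to work out all of the derivatives of $e^x$ at $x = 0$. This is easy, since the derivative of the exponential is the exponential:
$$\frac{d^n}{dx^n} e^x \Big|_{x = 0} = e^0 = 1 \text{ for all } n, \quad \text{so} \quad e^x = \sum_{n=0}^{\infty} \frac{1}{n!} x^n = 1 + x + \frac{1}{2} x^2 + \frac{1}{6} x^3 + \ldots$$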
The figure below illustrates the first four approximations (linear, quadratic, cubic, quartic) to the exponential produced by its Taylor series about $x_* = 0$. In particular, $e^x \simeq 1 + x$. Higher order approximations are illustrated with lighter dotted curves.
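To show that this series converges for all $x$, use the ratio test (see Section 5.3):
$$r = \lim_{n \to \infty} \left| \frac{x^{n+1} / (n+1)!}{x^n / n!} \right| = \lim_{n \to \infty} \frac{|x|}{n + 1} = 0$$
Since $r = 0 < 1$ for all $x$, the series converges for all $x$.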
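A short numerical experiment makes this convergence concrete. The sketch below sums the series $\sum_{n \ge 0} x^n / n!$ term by term; the function name `exp_partial_sum` and the test values of $x$ are illustrative assumptions.

```python
import math

def exp_partial_sum(x, n_terms):
    """Sum the first n_terms terms of the series sum_{n>=0} x**n / n!."""
    total, term = 0.0, 1.0       # term starts at x**0 / 0! = 1
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)      # consecutive terms differ by the factor x / (n + 1)
    return total

for x in (1.0, -5.0, 10.0):
    for n_terms in (5, 20, 60):
        value = exp_partial_sum(x, n_terms)
        print(f"x = {x:5.1f}, {n_terms:2d} terms: {value:.6f}  (exp(x) = {math.exp(x):.6f})")
```

For large $|x|$ the early terms grow before the factor $x / (n + 1)$ drops below one in magnitude, so more terms are needed, but the partial sums still settle down eventually.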
The Taylor series for the exponential gives us a new series that we can close. We can add it to the list we started with the geometric series:
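
| Sequence Name | Formula | Series | Series Value | Convergence |
| --- | --- | --- | --- | --- |
| Geometric | $r^n$ | $\sum_{n=0}^{\infty} r^n$ | $\frac{1}{1 - r}$ | if $-1 < r < 1$ |
| Harmonic | $n^{-1}$ | $\sum_{n=1}^{\infty} n^{-1}$ | $\infty$ | diverges |
| Exponential Taylor Series | $x^n / n!$ | $\sum_{n=0}^{\infty} \frac{1}{n!} x^n$ | $e^x$ | converges for all $x$ |
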
Approximating e
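Taylor series provide a nice tool when we want to approximate important transcendental (and therefore irrational) numbers. For example, we can use the Taylor expansion of $e^x$ to develop very accurate approximations to $e$ without using many terms. Applying our Taylor series at $x = 1$:
$$e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \ldots$$
The first five terms already give $e \approx 2.7083$, compared to $e = 2.71828\ldots$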
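Replacing $x$ with $-\frac{1}{2} x^2$ in the Taylor series for $e^x$ and keeping the first two terms gives $e^{-x^2/2} \approx 1 - \frac{1}{2} x^2$ when $x$ is small. This explains why, when we’ve drawn normal density functions (bell curves), we see a smooth parabola shape for $x$ near zero. That parabola is $1 - 0.5 x^2$.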
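Both claims in this subsection can be checked numerically. The sketch below accumulates partial sums of $\sum_{n \ge 0} 1/n!$ and then compares a bell curve with the parabola $1 - 0.5 x^2$. The bell curve is assumed here to be the unnormalized form $e^{-x^2/2}$, and the sample points are illustrative.

```python
import math

# Partial sums of sum_{n>=0} 1/n!, the exponential series evaluated at x = 1.
partial_sum, term = 0.0, 1.0     # term starts at 1/0! = 1
approximations = []
for n in range(11):
    partial_sum += term
    approximations.append(partial_sum)   # approximations[n] includes terms 0 through n
    term /= n + 1                        # turns 1/n! into 1/(n+1)!

for n in (2, 4, 6, 10):
    error = abs(math.e - approximations[n])
    print(f"terms up to n = {n:2d}: {approximations[n]:.8f}  (error {error:.1e})")

# Parabola check, assuming the unnormalized bell curve exp(-x^2 / 2).
for x in (0.1, 0.3, 0.5):
    bell = math.exp(-0.5 * x**2)
    parabola = 1 - 0.5 * x**2
    print(f"x = {x}: bell = {bell:.6f}, parabola = {parabola:.6f}")
```

The partial sums gain digits quickly because the terms shrink factorially, and the parabola stays close to the bell curve only while $x$ remains small.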
The figure below illustrates the first four approximations (linear, quadratic, cubic, quartic) to the logarithm produced by its Taylor series about $x_* = 1$. In particular, $\log(x) \simeq x - 1$. Higher order approximations are illustrated with lighter dotted curves.
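To identify the region where the series converges, use the ratio test from Section 5.3. Writing $\Delta x = x - 1$, the Taylor series for the logarithm about $x_* = 1$ is $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \Delta x^n$, so:
$$r = \lim_{n \to \infty} \left| \frac{\Delta x^{n+1} / (n+1)}{\Delta x^n / n} \right| = \lim_{n \to \infty} \frac{n}{n+1} |\Delta x| = |\Delta x|$$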
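The series converges if $r < 1$ and diverges if $r > 1$, so the Taylor series for the logarithm converges if $|\Delta x| < 1$ and diverges if $|\Delta x| > 1$, where $\Delta x = x - 1$. The ratio test is inconclusive at $|\Delta x| = 1$: at $x = 0$ the series is the negative of the harmonic series, which diverges, while at $x = 2$ it is the alternating harmonic series, which converges. This means that $\log(x)$ only equals its Taylor series about $x_* = 1$ for $x \in (0, 2]$.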
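The difference between points inside and outside the region of convergence shows up clearly in the partial sums. The sketch below assumes the standard form of the series, $\sum_{n \ge 1} (-1)^{n+1} \Delta x^n / n$ with $\Delta x = x - 1$; the function name `log_partial_sum` and the test points are illustrative.

```python
import math

def log_partial_sum(x, n_terms):
    """Partial sum of sum_{n>=1} (-1)**(n+1) * (x - 1)**n / n, the series for log(x) about 1."""
    dx = x - 1.0
    return sum((-1) ** (n + 1) * dx**n / n for n in range(1, n_terms + 1))

# x = 0.5 and x = 1.5 have |dx| < 1; x = 3.0 has |dx| = 2 > 1.
for x in (0.5, 1.5, 3.0):
    for n_terms in (10, 50):
        value = log_partial_sum(x, n_terms)
        print(f"x = {x}, {n_terms:2d} terms: {value:.6e}  (log(x) = {math.log(x):.6f})")
```

At $x = 3$ the partial sums grow without bound as terms are added, even though $\log(3)$ itself is perfectly well defined.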
We can now add the log Taylor series to the list of infinite series we can close: