
7.2 Integrals of Compositions

In Section 3.2 we introduced the idea of a function composition.

We’ve seen a variety of distributions that involve a composition, most importantly, the normal distribution, whose PDF has the form:

$$\text{PDF}(x) \propto e^{-\frac{1}{2} x^2}.$$

The simplest function compositions use a linear function for $h$:

$$f(x) = g(h(x)) = g((x - s)/\sigma), \quad h(x) = (x - s)/\sigma.$$

These compositions translate the function by $s$ and dilate it by $\sigma$ (see Section 3.2). Many distribution families are defined by first picking some family of functions $g$, then allowing an arbitrary linear $h$. In this case, the parameters $s$ and $\sigma$ of $h$ act as location and scale parameters that can be adjusted to translate and dilate the base model, $g$.
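As a minimal sketch, assuming the unnormalized standard normal kernel $g(z) = e^{-z^2/2}$ as the base model (the grid and the values $s = 3$, $\sigma = 2$ are arbitrary illustrative choices), we can check numerically that composing $g$ with $h(x) = (x - s)/\sigma$ moves the peak to $s$ and stretches the width by a factor of $\sigma$.

```python
import math

def g(z):
    # Base model: the unnormalized standard normal kernel
    return math.exp(-0.5 * z * z)

def composed(x, s, sigma):
    # f(x) = g(h(x)) with the linear inner function h(x) = (x - s) / sigma
    return g((x - s) / sigma)

def peak_and_width(s, sigma, lo=-20.0, hi=20.0, n=40001):
    # Evaluate f on a uniform grid. Return the location of the maximum and
    # the width of the region where f is at least half of its maximum.
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    fs = [composed(x, s, sigma) for x in xs]
    f_max = max(fs)
    peak = xs[fs.index(f_max)]
    above = [x for x, f in zip(xs, fs) if f >= 0.5 * f_max]
    return peak, above[-1] - above[0]

base_peak, base_width = peak_and_width(s=0.0, sigma=1.0)
new_peak, new_width = peak_and_width(s=3.0, sigma=2.0)
print(f"peak: {base_peak:.3f} -> {new_peak:.3f}")     # the peak moves to s
print(f"width ratio: {new_width / base_width:.3f}")  # close to sigma
```

The half-maximum width of the standard kernel is $2\sqrt{2 \ln 2} \approx 2.355$, so the dilated version should be about $\sigma$ times wider, up to the grid resolution.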

Integration by Substitution (Change of Variables)

Consider an integral of a function composition:

$$\int_{x = a}^b g(h(x)) \, dx.$$

In Section 7.1 we worked out rules for integrating sums and products of functions. What about nested functions?

The procedure used to integrate a function composition is often called “$u$” substitution, after the convention $u = h(x)$. There is no reason to call the output of the inner function $u$, so we won’t call this rule “$u$” substitution. Instead, we’ll name it after the action that justifies the rule: change of variables.

If the integrand already had the form $g(h(x)) h'(x) \, dx$, we could substitute $u = h(x)$ and $du = h'(x) \, dx$ directly to get $g(u) \, du$. Often, though, we just want to integrate a composition, e.g. $\int_{x = a}^b g(h(x)) \, dx$. In this case we are missing the $h'(x) \, dx$ term needed to replace $g(h(x)) \, dx$ with $g(u) \, du$.

So, we will add it in ourselves by multiplying the integrand by $1 = h'(x)/h'(x)$ (assuming $h'(x) \neq 0$):

$$g(h(x)) \, dx = g(h(x)) \frac{h'(x)}{h'(x)} \, dx = \frac{g(h(x))}{h'(x)} h'(x) \, dx.$$

Now, to write everything in terms of $u$, we can substitute $h(x) = u$ and $h'(x) \, dx = du$. However, we are still left with $h'(x)$ in the denominator. This is a function of $x$, not $u$, and we want to write everything in terms of $u$.

If $h$ is strictly monotonic, then $h$ is invertible over its range, so we can define an inverse function $h^{-1}$ such that $h^{-1}(h(x)) = x$ and $h(h^{-1}(u)) = u$. Then, given $u = h(x)$, we have $x = h^{-1}(u)$. For example, if $h(x) = e^x$, then $h^{-1}(u) = \log(u)$. So, we can replace $h'(x)$ with $h'(h^{-1}(u))$.

Now:

$$g(h(x)) \, dx = \frac{g(h(x))}{h'(x)} h'(x) \, dx = \frac{g(u)}{h'(h^{-1}(u))} \, du.$$

Integrating both sides gives the standard form for integration by change of variables:

$$\int g(h(x)) \, dx = \int \frac{g(u)}{h'(h^{-1}(u))} \, du.$$

Then:

$$\int_{x = a}^b g(h(x)) \, dx = \int_{u = h(a)}^{h(b)} \frac{g(u)}{h'(h^{-1}(u))} \, du.$$
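A quick numerical check of this identity, as a sketch: we reuse the example $h(x) = e^x$ (so $h^{-1}(u) = \log(u)$ and $h'(h^{-1}(u)) = u$) and pick $g(u) = \cos(u)$ purely for illustration. The helper `simpson` is our own composite Simpson's rule, not a library function.

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule on [a, b] with n (even) subintervals
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def g(u):
    return math.cos(u)      # outer function (arbitrary choice)

def h(x):
    return math.exp(x)      # inner function, strictly increasing

def h_prime(x):
    return math.exp(x)

def h_inv(u):
    return math.log(u)

a, b = 0.0, 1.0
# Left side: integrate the composition directly over x in [a, b]
lhs = simpson(lambda x: g(h(x)), a, b)
# Right side: integrate g(u) / h'(h^{-1}(u)) over u in [h(a), h(b)]
rhs = simpson(lambda u: g(u) / h_prime(h_inv(u)), h(a), h(b))
print(lhs, rhs)
```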

Let $\tilde{g}(u) = g(u)/h'(h^{-1}(u))$, and let $\tilde{G}$ represent the antiderivative (indefinite integral) of $\tilde{g}$. Then:

$$\int_{x = a}^b g(h(x)) \, dx = \int_{u = h(a)}^{h(b)} \tilde{g}(u) \, du = \tilde{G}(h(b)) - \tilde{G}(h(a)).$$

This form is a bit messy as a single formula. It’s easier to remember as a procedure:

  1. Identify a term in the integrand we would like to replace (e.g. a function of $x$, $h(x)$, that we will use to define a new variable, $u$).

  2. Replace $h(x)$ with $u$ everywhere inside the integral.

  3. Find $h'(x) = \frac{d}{dx} h(x)$.

  4. Replace $dx$ with $du/h'(x)$ inside the integral.

  5. Solve for $x$ in terms of $u$, $x = h^{-1}(u)$.

  6. Replace all remaining $x$’s inside the integral with $h^{-1}(u)$.

  7. Update the bounds of integration from $x \in [a,b]$ to $u \in [h(a),h(b)]$.

  8. Integrate.
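As a worked example of the procedure (the integrand is our own illustrative choice), take $\int_{x=0}^{1} x e^{x^2} \, dx$ with $h(x) = x^2$, which is increasing on $[0, 1]$. Steps 1 to 4 give $u = x^2$, $h'(x) = 2x$, and $dx = du/(2x)$; the leftover $x$ cancels, so steps 5 and 6 are not needed; step 7 maps the bounds $[0, 1]$ to $[0, 1]$:

$$
\begin{aligned}
\int_{x=0}^{1} x e^{x^2} \, dx
&= \int_{u=0}^{1} x e^{u} \, \frac{du}{2x} \\
&= \frac{1}{2} \int_{u=0}^{1} e^{u} \, du \\
&= \frac{e - 1}{2}.
\end{aligned}
$$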

Change of Density

The rule we just worked out for changing variables inside of an integral provides a general rule for updating the density of a continuous random variable after transforming the random variable.

Suppose that $X$ is a continuous random variable with density function $f_X(x)$. Then, to find the probability that $X$ lands in an interval, we integrate its density over that interval:

$$\text{Pr}(X \in [a,b]) = \int_{x = a}^b f_X(x) \, dx.$$

Similarly, to find the CDF, we would integrate:

$$F_X(x) = \text{Pr}(X \in (-\infty,x]) = \int_{s = -\infty}^x f_X(s) \, ds.$$
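As a small sketch of how these two integrals fit together, assume for illustration that $X$ is standard normal, so that $F_X$ can be written with the error function. Integrating the density over $[a, b]$ should match $F_X(b) - F_X(a)$. The helper `simpson` is our own composite Simpson's rule.

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule on [a, b]; n must be even
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def f_X(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def F_X(x):
    # Standard normal CDF, written with the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, b = -1.0, 2.0
by_density = simpson(f_X, a, b)   # Pr(X in [a, b]) as an integral of the density
by_cdf = F_X(b) - F_X(a)          # the same chance from the CDF
print(by_density, by_cdf)         # both about 0.8186
```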

Suppose now that $Y = h(X)$ for some monotonically increasing, differentiable function $h$. What is the density of $Y$, $f_Y(y)$?

Well, just like for $X$, the chance that $Y$ lands in an interval is related to its density by an integral:

$$\text{Pr}(Y \in [c,d]) = \int_{y = c}^d f_Y(y) \, dy.$$

The CDF of $Y$ is also related to its density by an integral:

$$F_Y(y) = \text{Pr}(Y \in (-\infty,y]) = \int_{s = -\infty}^y f_Y(s) \, ds.$$

There are now two ways to find the density of $Y$:

  1. Start from integrals involving $x$. Use integration by change of variables to replace $x$ with $y = h(x)$.

  2. Start from integrals involving $y$. Try to re-express them in terms of $x$. Then, match sides to solve for the density of $Y$. Check that integrating over $y$ gives the same answer as integrating over $x$.

We will adopt the second approach.

Consider the CDF of $Y$. As always, if we know the CDF, then we can recover the density and the chances on intervals. So, if we can work out the CDF of $Y$, then we have recovered its distribution.

In particular, if we know the CDF of $Y$, then we can find its density since:

$$f_Y(y) = \frac{d}{dy} F_Y(y).$$

The CDF of $Y$ is:

$$F_Y(y) = \text{Pr}(Y \leq y) = \text{Pr}(h(X) \leq y).$$

Recall that, if $h$ is monotonically increasing, then $h(x) \leq y$ if and only if $x \leq h^{-1}(y)$. Therefore:

$$F_Y(y) = \text{Pr}(X \leq h^{-1}(y)) = F_X(h^{-1}(y)).$$
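A Monte Carlo sketch of this identity, under assumptions chosen only for illustration: $X \sim \text{Exponential}(1)$, so $F_X(x) = 1 - e^{-x}$, and $h(x) = \sqrt{x}$, which is increasing on $[0, \infty)$ with $h^{-1}(y) = y^2$. The fraction of simulated $Y$ values at or below $y$ should be close to $F_X(h^{-1}(y))$.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def F_X(x):
    # CDF of an Exponential(1) random variable
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def h(x):
    return math.sqrt(x)     # increasing on [0, infinity)

def h_inv(y):
    return y * y            # inverse of h on [0, infinity)

n = 200_000
ys = [h(random.expovariate(1.0)) for _ in range(n)]

results = []
for y in [0.5, 1.0, 1.5]:
    empirical = sum(v <= y for v in ys) / n   # fraction of samples with Y <= y
    predicted = F_X(h_inv(y))                 # F_Y(y) = F_X(h^{-1}(y))
    results.append((y, empirical, predicted))
    print(f"y = {y}: empirical {empirical:.3f}, predicted {predicted:.3f}")
```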

Differentiating, and recalling the chain rule and that $\frac{d}{dx} F_X(x) = f_X(x)$:

$$f_Y(y) = \frac{d}{dy} F_X(h^{-1}(y)) = f_X(h^{-1}(y)) \frac{d}{dy} h^{-1}(y).$$
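To see the chain rule step numerically, here is a sketch that again assumes $X \sim \text{Exponential}(1)$ and $h(x) = \sqrt{x}$ (illustrative choices). A central finite difference of $F_Y(y) = F_X(h^{-1}(y))$ should agree with $f_X(h^{-1}(y)) \frac{d}{dy} h^{-1}(y) = e^{-y^2} \cdot 2y$.

```python
import math

def F_X(x):
    # CDF of an Exponential(1) random variable
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f_X(x):
    # Density of an Exponential(1) random variable
    return math.exp(-x) if x >= 0 else 0.0

def h_inv(y):
    return y * y            # inverse of the increasing map h(x) = sqrt(x)

def dh_inv(y):
    return 2.0 * y          # d/dy h^{-1}(y)

def F_Y(y):
    return F_X(h_inv(y))    # CDF of Y = h(X)

eps = 1e-5
rows = []
for y in [0.5, 1.0, 1.5]:
    numeric = (F_Y(y + eps) - F_Y(y - eps)) / (2 * eps)  # central difference of F_Y
    formula = f_X(h_inv(y)) * dh_inv(y)                  # chain-rule expression
    rows.append((y, numeric, formula))
    print(f"y = {y}: numeric {numeric:.6f}, formula {formula:.6f}")
```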

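Since $h(h^{-1}(y)) = y$, the chain rule also gives $\frac{d}{dy} h^{-1}(y) = 1/h'(h^{-1}(y))$. If $h$ is monotonically decreasing instead, then $h(x) \leq y$ if and only if $x \geq h^{-1}(y)$, so $F_Y(y) = 1 - F_X(h^{-1}(y))$, and differentiating introduces a minus sign that cancels the negative slope of $h^{-1}$. So, for any monotonic, differentiable $h$ with $h' \neq 0$, the change of density formula is:

$$f_Y(y) = f_X(h^{-1}(y)) \left| \frac{d}{dy} h^{-1}(y) \right| = \frac{f_X(h^{-1}(y))}{|h'(h^{-1}(y))|}.$$

We’ve already studied a special case.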

Suppose that $h$ is linear, $h(x) = \sigma x + s$ for some $\sigma > 0$. Then $h'(x) = \sigma$ and $h^{-1}(y) = (y - s)/\sigma$, so, by the change of density formula:

$$f_Y(y) \propto f_X(h^{-1}(y)) = f_X\left( \frac{y - s}{\sigma} \right).$$

It follows that $Y = h(X)$ has a density proportional to the density of $X$ translated by $s$ and dilated by $\sigma$. If we dilate a distribution by a factor of $\sigma$, then we must divide its height by $\sigma$ so that it still integrates to one. As before, if we double the width of a rectangle, we have to halve its height to keep its area fixed.

By that logic, we should have:

$$f_Y(y) = \frac{1}{\sigma} f_X\left( \frac{y - s}{\sigma} \right).$$

Using the exact change of density formula gives the same result, since $h'(x) = \sigma$ for all $x$:

$$f_Y(y) = f_X(h^{-1}(y)) \frac{1}{|h'(h^{-1}(y))|} = f_X\left( \frac{y - s}{\sigma} \right) \frac{1}{|\sigma|}.$$
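A sketch of the linear case, assuming for illustration that $X$ is standard normal, $s = 1$, and $\sigma = 2$. The scaled density $\frac{1}{\sigma} f_X((y - s)/\sigma)$ should integrate to one, and its integral over an interval should match the fraction of simulated values $Y = \sigma X + s$ that land there. The helper `simpson` is our own composite Simpson's rule.

```python
import math
import random

random.seed(1)  # fixed seed so the run is reproducible

def f_X(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def f_Y(y, s, sigma):
    # Change of density for the linear map h(x) = sigma * x + s, sigma > 0
    return f_X((y - s) / sigma) / sigma

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule on [a, b]; n must be even
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

s, sigma = 1.0, 2.0
# Total area under f_Y (the tails beyond 20 sigma are negligible)
total_mass = simpson(lambda y: f_Y(y, s, sigma), s - 20 * sigma, s + 20 * sigma)

n = 200_000
ys = [sigma * random.gauss(0.0, 1.0) + s for _ in range(n)]
c, d = 0.0, 2.5
empirical = sum(c <= y <= d for y in ys) / n
predicted = simpson(lambda y: f_Y(y, s, sigma), c, d)
print(total_mass, empirical, predicted)
```

Dropping the division by `sigma` in `f_Y` would make `total_mass` come out near $\sigma = 2$ instead of one, which is the rectangle argument in numerical form.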

This is a sensible rule. Replacing $X$ with $2X$ doubles the distance between any pair of sampled values, so it should halve the density.

Let’s check that the change of density formula actually returns the correct density for $Y$. To confirm, we’ll use integration by substitution. We will check the case when $h$ is monotonically increasing. To check the general case, break the range of $x$ into segments where $h$ is monotonic, then work one segment at a time.
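To make the segment-by-segment idea concrete, here is a sketch under illustrative assumptions: $X \sim \text{Uniform}(-1, 1)$ and $h(x) = x^2$, which is decreasing on $[-1, 0]$ and increasing on $[0, 1]$. On each segment the inverse is $\mp\sqrt{y}$ with $\left|\frac{d}{dy} h^{-1}(y)\right| = 1/(2\sqrt{y})$, and adding the two contributions gives $f_Y(y) = 1/(2\sqrt{y})$ on $(0, 1]$. The helper `simpson` is our own composite Simpson's rule.

```python
import math
import random

random.seed(2)  # fixed seed so the run is reproducible

def f_X(x):
    # Density of Uniform(-1, 1)
    return 0.5 if -1.0 <= x <= 1.0 else 0.0

def f_Y(y):
    # h(x) = x**2 is monotonic on [-1, 0] and on [0, 1]. Each segment contributes
    # f_X(h^{-1}(y)) * |d/dy h^{-1}(y)|, with h^{-1}(y) = -sqrt(y) or +sqrt(y).
    if not 0.0 < y <= 1.0:
        return 0.0
    root = math.sqrt(y)
    jacobian = 1.0 / (2.0 * root)          # |d/dy (+/- sqrt(y))|
    return (f_X(-root) + f_X(root)) * jacobian

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule on [a, b]; n must be even
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

n = 200_000
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]
c, d = 0.25, 0.81                          # stay away from the spike at y = 0
empirical = sum(c <= y <= d for y in ys) / n
predicted = simpson(f_Y, c, d)
print(empirical, predicted)                # both close to sqrt(0.81) - sqrt(0.25) = 0.4
```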

As long as $h$ is increasing:

$$\text{Pr}(X \in [a,b]) = \text{Pr}(Y \in [h(a),h(b)]).$$

So, integrating over the density of $Y$:

$$\text{Pr}(Y \in [h(a),h(b)]) = \int_{y = h(a)}^{h(b)} f_Y(y) \, dy.$$

Using the change of density formula:

$$\int_{y = h(a)}^{h(b)} f_Y(y) \, dy = \int_{y = h(a)}^{h(b)} f_X(h^{-1}(y)) \left[\frac{d}{dy} h^{-1}(y) \right] dy.$$

That looks like a mess, but we have all the parts we need to change variables.

Let $x = h^{-1}(y)$. Then:

  1. $f_X(h^{-1}(y)) = f_X(x)$,

  2. $dx = \frac{d}{dy} h^{-1}(y) \, dy$,

  3. $h^{-1}(h(a)) = a$ and $h^{-1}(h(b)) = b$.

So:

$$\begin{aligned} \text{Pr}(Y \in [h(a),h(b)]) & = \int_{y = h(a)}^{h(b)} f_X(h^{-1}(y)) \left[\frac{d}{dy} h^{-1}(y) \right] dy \\ & = \int_{x = a}^b f_X(x) \, dx = \text{Pr}(X \in [a,b]). \end{aligned}$$
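The same check can be run numerically. As a sketch, assume $X \sim \text{Exponential}(1)$ and the increasing map $h(x) = \log(x)$, so $h^{-1}(y) = e^y$ and $\frac{d}{dy} h^{-1}(y) = e^y$ (illustrative choices). Integrating the transformed density over $[h(a), h(b)]$ should give the same chance as integrating $f_X$ over $[a, b]$, which for the exponential is $e^{-a} - e^{-b}$. The helper `simpson` is our own composite Simpson's rule.

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule on [a, b]; n must be even
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def f_X(x):
    # Density of an Exponential(1) random variable
    return math.exp(-x) if x >= 0 else 0.0

def h(x):
    return math.log(x)          # increasing on (0, infinity)

def h_inv(y):
    return math.exp(y)

def dh_inv(y):
    return math.exp(y)          # d/dy h^{-1}(y)

def f_Y(y):
    # Change of density formula for an increasing h
    return f_X(h_inv(y)) * dh_inv(y)

a, b = 0.5, 3.0
prob_x = simpson(f_X, a, b)            # Pr(X in [a, b]) from the density of X
prob_y = simpson(f_Y, h(a), h(b))      # Pr(Y in [h(a), h(b)]) from the density of Y
exact = math.exp(-a) - math.exp(-b)    # closed form for the exponential
print(prob_x, prob_y, exact)
```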

To visualize this change of density, open the code cell below. Set the $X$ distribution to “Beta” and use parameters $\alpha = \beta = 3$. Set $g(x)$ to “Quadratic” and use coefficients $a = 1$ and $b = 0$.

Set the number of samples to 1,000 and click “Draw Samples.” Then click “Transform” to push the same set of samples through the function $g(x)$. Finally, click “Show Density.”

```python
from utils import show_change_of_density

show_change_of_density()
```

Notice that the histogram for $X$ is symmetric about $X = 0.5$, while the histogram for $Y$ is skewed so that its mode is closer to $Y = 0$. Why?

Think about how the slope of $g(x)$ affects the spacing between samples. Where $g(x)$ has a shallow slope, samples that are far apart in $X$ are mapped close together, so the density of $Y$ increases. In contrast, where the slope is steep, nearby samples get spread apart, so $Y$ is less dense. This explains the $1/|\text{slope of transform}|$ term in the change of density formula.
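The `show_change_of_density` widget comes from the book’s own `utils` module, which is not reproduced here. As a widget-free sketch of the same experiment, we assume the “Quadratic” transform with $a = 1$, $b = 0$ is $g(x) = x^2$ (an assumption about the widget), draw $X \sim \text{Beta}(3, 3)$, and compare binned frequencies of $Y = g(X)$ with the exact chances $F_X(\sqrt{d}) - F_X(\sqrt{c})$, using the Beta(3, 3) CDF $F_X(x) = 10x^3 - 15x^4 + 6x^5$. We use 200,000 samples rather than 1,000 so that the histograms are smooth.

```python
import math
import random

random.seed(3)  # fixed seed so the run is reproducible

def F_X(x):
    # CDF of Beta(3, 3): the integral of its density 30 t^2 (1 - t)^2 from 0 to x
    x = min(max(x, 0.0), 1.0)
    return 10 * x**3 - 15 * x**4 + 6 * x**5

def g(x):
    return x * x          # assumed form of the "Quadratic" transform with a = 1, b = 0

def g_inv(y):
    return math.sqrt(y)   # inverse of g on [0, 1]

n, bins = 200_000, 10
xs = [random.betavariate(3, 3) for _ in range(n)]
ys = [g(x) for x in xs]   # push the same samples through g

def histogram(values):
    # Fraction of the samples in each of `bins` equal-width bins on [0, 1]
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c / n for c in counts]

hist_x = histogram(xs)
hist_y = histogram(ys)
# Exact chance of each Y bin [c, d): F_X(g^{-1}(d)) - F_X(g^{-1}(c))
pred_y = [F_X(g_inv((k + 1) / bins)) - F_X(g_inv(k / bins)) for k in range(bins)]

print("bin         X obs   Y obs   Y predicted")
for k in range(bins):
    print(f"[{k / bins:.1f}, {(k + 1) / bins:.1f})  {hist_x[k]:.3f}   {hist_y[k]:.3f}   {pred_y[k]:.3f}")
```

The $X$ column is roughly symmetric about $0.5$, while the $Y$ columns are largest on $[0.1, 0.2)$: the density $f_Y(y) = f_X(\sqrt{y})/(2\sqrt{y}) = 15\sqrt{y}(1 - \sqrt{y})^2$ peaks at $y = 1/9$, because near $x = 0$ the slope $g'(x) = 2x$ is small and packs samples together.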

Try varying the transform and the distribution of $X$. You’ll see that, where the transform increases slowly, $Y$ is denser, and where it increases quickly, $Y$ is less dense.

Here are some more examples. You can use the same demonstration to test each case.