7.2 Integrals of Compositions
In Section 3.2 we introduced the idea of a function composition.
We’ve seen a variety of distributions that involve a composition, most importantly the normal distribution, whose PDF has the form:

$$\text{PDF}(x) \propto e^{-\frac{1}{2} x^2}.$$

The simplest function compositions use a linear function for $h$:

$$f(x) = g(h(x)) = g((x - s)/\sigma), \quad h(x) = (x - s)/\sigma.$$

These compositions translate the function by $s$ and dilate it by $\sigma$ (see Section 3.2). Many distribution families are defined by first picking some family of functions $g$, then allowing arbitrary linear $h$. In this case the parameters $s$ and $\sigma$ of $h$ act as location and scale parameters that can be adjusted to translate and dilate the base model, $g$.
Integration by Substitution (Change of Variables)

Consider an integral of a function composition:

$$\int_{x = a}^b g(h(x)) dx.$$

In Section 7.1 we worked out rules for integrating sums and products of functions. What about nested functions?

The procedure used to integrate a function composition is often called “$u$-substitution” after the convention $u = h(x)$. There is no special reason to call the output of the inner function $u$, so we won’t call this rule “$u$-substitution”. Instead, we’ll name it after the action that justifies the rule: change of variables.
Suppose that $f(x) = g \circ h(x) = g(h(x))$ and that $h(x)$ is a monotonic function.

Let $u = h(x)$ and $du = h'(x) dx$. Then:

$$\int g(h(x)) h'(x) dx = \int g(u) du.$$

Let $G$ denote the indefinite integral of $g$. Then:

$$\int_{a}^b g(h(x)) h'(x) dx = G(u) \Big|_{h(a)}^{h(b)} = G(h(b)) - G(h(a)).$$

This procedure is equivalent to changing variables by replacing $x$ with $u = h(x)$.
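As a quick sanity check of the rule, a computer algebra system can carry out the substitution for us. The sketch below uses SymPy (an assumption: it is not part of this chapter's toolkit) with $g = \cos$ and $h(x) = x^2$:

```python
import sympy as sp

x = sp.symbols('x')

# ∫ g(h(x)) h'(x) dx with g(x) = cos(x), h(x) = x², h'(x) = 2x.
# The rule predicts the answer G(h(x)) = sin(x²).
antideriv = sp.integrate(sp.cos(x**2) * 2 * x, x)
print(antideriv)  # sin(x**2)
```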
The proof follows from the fundamental theorem of calculus and the chain rule for derivatives.

First, let $F(x) = G \circ h(x) = G(h(x))$ where $G$ is the antiderivative of $g$, $\frac{d}{dx} G(x) = G'(x) = g(x)$. Then, by the chain rule:

$$F'(x) = \frac{d}{dx} F(x) = G'(h(x)) h'(x) = g(h(x)) h'(x).$$

So, by the Fundamental Theorem of Calculus:

$$F(x) = \int F'(x) dx = \int g(h(x)) h'(x) dx.$$

Then:

$$G(h(b)) - G(h(a)) = F(b) - F(a) = \int_{x = a}^b F'(x) dx = \int_{x = a}^b g(h(x)) h'(x) dx.$$

Suppose that $X \sim \text{Normal}(0,1)$. Then, since $X$ is a standard normal random variable, $\mathbb{E}[X] = 0$ and $\text{SD}[X] = 1$. Let’s find the mean absolute deviation of $X$ (see Section 4.3).
First, write down the expectation as a weighted average:

$$\begin{aligned} \text{MAD}[X] & = \mathbb{E}[|X - \mathbb{E}[X]|] = \mathbb{E}[|X - 0|] \\
& = \int_{x = -\infty}^{\infty} |x| \text{PDF}(x) dx = \frac{1}{\sqrt{2 \pi}} \int_{x = -\infty}^{\infty} |x| e^{-\frac{1}{2} x^2} dx. \end{aligned}$$

Next, let’s think a bit about symmetries. The normal density $\text{PDF}(x) \propto e^{-\frac{1}{2} x^2}$ is an even function. The function $|x|$ is also even since $|-x| = |x|$. The product of two even functions is an even function. Therefore:

$$\text{MAD}[X] = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x e^{-\frac{1}{2} x^2} dx.$$

Notice that $e^{-\frac{1}{2} x^2}$ is a composition of the function $g(x) = e^{-x}$ and the quadratic function $h(x) = \frac{1}{2} x^2$.
So, let’s try integrating by changing variables.

Let $u = h(x) = \frac{1}{2} x^2$. Then $du = h'(x) dx = \frac{2}{2} x dx = x dx$. So:

$$\begin{aligned} \text{MAD}[X] & = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x e^{-\frac{1}{2} x^2} dx \\
& = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} e^{-\frac{1}{2} x^2} x dx \\
& = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} g(h(x)) h'(x) dx \\
& = \frac{2}{\sqrt{2 \pi}} \int_{u = h(0)}^{h(\infty)} g(u) du \\
& = \frac{2}{\sqrt{2 \pi}} \int_{u = 0}^{\infty} e^{-u} du. \end{aligned}$$

Pay careful attention to the bounds of integration. In the end, we still integrate from 0 to infinity since $h(0) = \frac{1}{2} 0^2 = 0$ and $\lim_{x \rightarrow \infty} h(x) = \frac{1}{2} \lim_{x \rightarrow \infty} x^2 = \infty$.
Integrating:

$$\int_{u = 0}^{\infty} e^{-u} du = -e^{-u} \Big|_{0}^{\infty} = -(0 - 1) = 1.$$

Therefore:

$$\text{MAD}[X] = \frac{2}{\sqrt{2 \pi}} = \sqrt{\frac{2}{\pi}} \approx \sqrt{\frac{2}{3}} \approx 0.80,$$

where we approximated $\pi \approx 3$. Notice that, as Jensen’s inequality guarantees, $\text{MAD}[X] \approx \sqrt{2/3} < \sqrt{1} = 1 = \text{SD}[X]$.
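A quick Monte Carlo check of this result, assuming NumPy is available (the sample size and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)

# MAD[X] = E[|X - E[X]|] = E[|X|] for a standard normal.
mad_estimate = np.mean(np.abs(samples))
mad_exact = np.sqrt(2 / np.pi)  # the exact value, ≈ 0.7979

print(mad_estimate, mad_exact)
```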
Often, we want to take an integral of a composition directly, e.g. $\int_{x = a}^b g(h(x)) dx$. In this case we are missing the $h'(x) dx$ term needed to replace $g(h(x)) dx$ with $g(u) du$.

So, we will add it in ourselves by multiplying the integrand by $1 = h'(x)/h'(x)$:

$$g(h(x)) dx = g(h(x)) \frac{h'(x)}{h'(x)} dx = \frac{g(h(x))}{h'(x)} h'(x) dx.$$

Now, to write everything in terms of $u$, we can substitute $h(x) = u$ and $h'(x) dx = du$. However, we get stuck with $h'(x)$: this is still a function of $x$, not $u$, and we want to write everything in terms of $u$.
If $h$ is monotonic, then $h$ is invertible over its range, so we can define an inverse function $h^{-1}$ such that $h^{-1}(h(x)) = x$ and $h(h^{-1}(u)) = u$. Then, given $u = h(x)$, $x = h^{-1}(u)$. For example, if $h(x) = e^x$, then $h^{-1}(x) = \log(x)$. Then we can replace $h'(x)$ with $h'(h^{-1}(u))$.
Now:

$$g(h(x)) dx = \frac{g(h(x))}{h'(x)} h'(x) dx = \frac{g(u)}{h'(h^{-1}(u))} du.$$

Integrating returns the standard form for integration by change of variables:

$$\int g(h(x)) dx = \int \frac{g(u)}{h'(h^{-1}(u))} du.$$

Then:

$$\int_{x = a}^b g(h(x)) dx = \int_{u = h(a)}^{h(b)} \frac{g(u)}{h'(h^{-1}(u))} du.$$

Let $\tilde{g}(u) = g(u)/h'(h^{-1}(u))$, and let $\tilde{G}$ represent the antiderivative (indefinite integral) of $\tilde{g}$. Then:

$$\int_{x = a}^b g(h(x)) dx = \int_{u = h(a)}^{h(b)} \tilde{g}(u) du = \tilde{G}(h(b)) - \tilde{G}(h(a)).$$

Suppose that $f(x) = g \circ h(x) = g(h(x))$ and that $h(x)$ is a monotonic function.
Let $u = h(x)$ and $du = h'(x) dx$. Let $h^{-1}$ denote the inverse function that recovers $x$ from $u$, $x = h^{-1}(u)$.

Then:

$$\int g(h(x)) dx = \int \frac{g(u)}{h'(h^{-1}(u))} du = \int \tilde{g}(u) du.$$

Let $\tilde{G}$ denote the indefinite integral of $\tilde{g}(u) = g(u)/h'(h^{-1}(u))$. Then:

$$\int_{a}^b g(h(x)) dx = \tilde{G}(u) \Big|_{h(a)}^{h(b)} = \tilde{G}(h(b)) - \tilde{G}(h(a)).$$

This form is a bit messy as a single formula. It’s easier to remember as a procedure:
1. Identify a term in the integrand we would like to replace (e.g. a function of $x$, $h(x)$, that we will use to define a new variable, $u$).
2. Replace $h(x)$ with $u$ everywhere inside the integral.
3. Find $h'(x) = \frac{d}{dx} h(x)$.
4. Replace $dx$ with $du/h'(x)$ inside the integral.
5. Solve for $x$ in terms of $u$, $x = h^{-1}(u)$.
6. Replace all remaining $x$’s inside the integral with $h^{-1}(u)$.
7. Update the bounds of integration from $x \in [a,b]$ to $u \in [h(a),h(b)]$.
8. Integrate.
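The procedure can be checked symbolically. The sketch below (assuming SymPy is available) evaluates both sides of the MAD computation from the earlier example: the original integral in $x$ and the transformed integral in $u$:

```python
import sympy as sp

x, u = sp.symbols('x u', positive=True)

# Original integral: ∫₀^∞ x e^{-x²/2} dx.
lhs = sp.integrate(x * sp.exp(-x**2 / 2), (x, 0, sp.oo))

# After the change of variables u = x²/2: ∫₀^∞ e^{-u} du.
rhs = sp.integrate(sp.exp(-u), (u, 0, sp.oo))

print(lhs, rhs)  # both equal 1
```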
Suppose that $X \sim \text{Normal}(0,1)$. Find $\mathbb{E}[|X|^3]$.

As before, start by writing the expectation as a weighted average:

$$\mathbb{E}[|X|^3] = \frac{1}{\sqrt{2 \pi}} \int_{x = -\infty}^{\infty} |x|^3 e^{-\frac{1}{2} x^2} dx.$$

Then, since $|x|^3$ is the composition of an odd function, $x^3$, with an even function, $|x|$, the function $|x|^3$ is even. The normal density is also even, so $|x|^3 e^{-\frac{1}{2} x^2}$ is an even function. So, using the same symmetry argument we used before:

$$\mathbb{E}[|X|^3] = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x^3 e^{-\frac{1}{2} x^2} dx.$$

We’d like to set $u = h(x) = \frac{1}{2} x^2$ and $g(x) = e^{-x}$. However, this time the term outside $g(h(x))$ is $x^3 \neq h'(x) = \frac{d}{dx} \frac{1}{2} x^2 = x$. So we need to use the second version of integration by change of variables.
Let $u = h(x) = \frac{1}{2} x^2$.

Update the integral:

$$\frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x^3 e^{-u} dx.$$

Next, $h'(x) = \frac{d}{dx} \frac{1}{2} x^2 = x$, so $du = x dx$.

Replace $dx$ with $du/h'(x)$ inside the integral:

$$\frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x^3 e^{-u} \frac{x dx}{x} = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x^3 e^{-u} \frac{du}{x}.$$

Simplifying:

$$\frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} x^2 e^{-u} du.$$

Solve for $x$ in terms of $u$:

$$u = h(x) = \frac{1}{2} x^2 \Rightarrow x = \pm \sqrt{2 u}.$$

Since we are only integrating over positive $x$, we can take $x = \sqrt{2 u}$, so $h^{-1}(u) = \sqrt{2 u}$.

Replace all remaining $x$’s inside the integral with $h^{-1}(u)$:

$$\frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} (\sqrt{2 u})^2 e^{-u} du = \frac{2}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} 2 u e^{-u} du.$$

Simplifying again:

$$\frac{4}{\sqrt{2 \pi}} \int_{x = 0}^{\infty} u e^{-u} du.$$

Update the bounds of integration from $x \in [a,b]$ to $u \in [h(a),h(b)] = [\frac{1}{2} 0^2, \frac{1}{2} \infty^2] = [0,\infty]$:

$$\frac{4}{\sqrt{2 \pi}} \int_{u = 0}^{\infty} u e^{-u} du.$$

Integrate. Notice that the integrand is a product, $u \times e^{-u}$. So, we’ll use integration by parts (see Section 7.1):

$$\begin{aligned} \int_{u = 0}^{\infty} u e^{-u} du & = -ue^{-u} \Big|_{0}^{\infty} - \int_{u = 0}^{\infty}(-e^{-u}) du \\
& = (0 - 0) + \int_{u=0}^{\infty} e^{-u} du \\
& = -e^{-u} \Big|_{0}^{\infty} = -(0-1) = 1. \end{aligned}$$

Therefore:
$$\mathbb{E}[|X|^3] = \frac{4}{\sqrt{2 \pi}} = 2 \sqrt{\frac{2}{\pi}}.$$

Change of Density

The rule we just worked out for changing variables inside of an integral provides a general rule for updating the density of a continuous random variable after transforming the random variable.
Suppose that $X$ is a continuous random variable with density function $f_X(x)$. Then to find the probability that $X$ lands in an interval, we evaluate an integral over the density:

$$\text{Pr}(X \in [a,b]) = \int_{x = a}^b f_X(x) dx.$$

Similarly, to find the CDF we integrate:

$$F_X(x) = \text{Pr}(X \in (-\infty,x]) = \int_{s = -\infty}^x f_X(s) ds.$$

Suppose now that $Y = h(X)$ for some monotonically increasing, differentiable function $h$. What is the density of $Y$, $f_Y(y)$?

Well, just like $X$, the chance that $Y$ lands in an interval is related to its density by an integral:

$$\text{Pr}(Y \in [c,d]) = \int_{y = c}^d f_Y(y) dy.$$

The CDF of $Y$ is also related to its density by an integral:

$$F_Y(y) = \text{Pr}(Y \in (-\infty,y]) = \int_{s = -\infty}^y f_Y(s) ds.$$

There are now two ways to find the density of $Y$:
1. Start from integrals involving $x$. Use integration by change of variables to replace $x$ with $y = h(x)$.
2. Start from integrals involving $y$. Try to re-express them in terms of $x$. Then, match sides to solve for the density of $y$. Check that integrating over $y$ gives the same answer as integrating over $x$.

We will adopt the second approach.
Consider the CDF of $Y$. As always, if we know the CDF, then we can recover the density and chances on intervals. So, if we can work out the CDF of $Y$, then we have recovered its distribution.

In particular, if we know the CDF of $Y$, then we can find its density since:

$$f_Y(y) = \frac{d}{dy} F_Y(y).$$

The CDF of $Y$ is:

$$F_Y(y) = \text{Pr}(Y \leq y) = \text{Pr}(h(X) \leq y).$$

Recall that, if $h(x)$ is monotonically increasing, then $h(x) \leq y$ if and only if $x \leq h^{-1}(y)$. Therefore:

$$F_Y(y) = \text{Pr}(X \leq h^{-1}(y)) = F_X(h^{-1}(y)).$$

So, recalling the chain rule, and that $\frac{d}{dx} F_X(x) = f_X(x)$,

$$f_Y(y) = \frac{d}{dy} F_X(h^{-1}(y)) = f_X(h^{-1}(y)) \frac{d}{dy} h^{-1}(y).$$

So:
If $X$ is a continuous random variable with density $f_X$, and $Y = h(X)$ for some differentiable, monotonically increasing function $h$, then:

$$f_Y(y) = f_X(h^{-1}(y)) \frac{d}{dy} h^{-1}(y).$$

It is often helpful to write this result in terms of $x$. Let $h'(x) = \frac{d}{dx} h(x)$. Then,

$$\frac{d}{dy} h^{-1}(y) = \frac{1}{h'(x)} \text{ at } x = h^{-1}(y).$$

So:

$$f_Y(y) = f_X(x) \frac{1}{h'(x)} \text{ at } x = h^{-1}(y).$$

If $h$ is differentiable and invertible, but decreasing, then:

$$f_Y(y) = f_X(x) \frac{1}{|h'(x)|} \text{ at } x = h^{-1}(y).$$

This formula is the change of density formula.

The absolute value enters since densities are nonnegative and the $1/h'(x) dx$ term represents the length of an interval. Lengths can never be negative. We didn’t need the absolute value in the increasing case since, if $h$ is increasing, then $h'(x) > 0$, so $|h'(x)| = h'(x)$.
Why is $\frac{d}{dy} h^{-1}(y)$ equal to $1/h'(x)$ at $x = h^{-1}(y)$?

Well:

$$h(h^{-1}(y)) = y,$$

so:

$$\frac{d}{dy} h(h^{-1}(y)) = \frac{d}{dy} y = 1.$$

Now, applying the chain rule:

$$\frac{d}{dy} h(h^{-1}(y)) = h'(h^{-1}(y)) \left[ \frac{d}{dy} h^{-1}(y) \right].$$

So, matching terms:

$$h'(h^{-1}(y)) \left[ \frac{d}{dy} h^{-1}(y) \right] = 1,$$

or:

$$\frac{d}{dy} h^{-1}(y) = \frac{1}{h'(h^{-1}(y))}.$$

Geometrically, $h^{-1}$ is the reflection of $h$ across the line $y = x$. So, since the slope of $h$ is the change in $y = h(x)$ over the change in $x$, the slope of $h^{-1}$ is the change in $x$ over the change in $y$, which is one over the slope of $h$.
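We can confirm the inverse function rule symbolically for the example $h(x) = e^x$, $h^{-1}(y) = \log(y)$ (a sketch assuming SymPy is available):

```python
import sympy as sp

y = sp.symbols('y', positive=True)

# h(x) = eˣ, so h⁻¹(y) = log(y) and h'(x) = eˣ.
lhs = sp.diff(sp.log(y), y)    # d/dy h⁻¹(y)
rhs = 1 / sp.exp(sp.log(y))    # 1 / h'(h⁻¹(y))

print(sp.simplify(lhs - rhs))  # 0
```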
The change of density formula can be intimidating and hard to remember. To use it, treat it as a procedure:

1. Write down the transformation $Y = h(X)$.
2. Solve for $x$ in terms of $y$ to get $x = h^{-1}(y)$.
3. Find $h'(x) = \frac{d}{dx} h(x)$.
4. Plug in.
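The steps above translate directly into code. The helper below is a sketch, not part of the chapter's utilities; the example transform $Y = X^2$ with $X \sim \text{Uniform}(0,1)$ is chosen for illustration:

```python
import numpy as np

def transformed_density(f_X, h_inv, h_prime, y):
    """Change of density: f_Y(y) = f_X(h⁻¹(y)) / |h'(h⁻¹(y))|."""
    x = h_inv(y)                        # step 2: x = h⁻¹(y)
    return f_X(x) / np.abs(h_prime(x))  # step 4: plug in

# X ~ Uniform(0,1), Y = X²: f_Y(y) = 1 / (2√y) on (0, 1].
f_Y = transformed_density(lambda x: 1.0, np.sqrt, lambda x: 2 * x, 0.25)
print(f_Y)  # 1.0
```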
If you forget how to change densities, set up the integral for the CDF instead, then apply integration by substitution to integrate over $y$ instead of $x$. Proceed using the “u-substitution” rule until all $x$ terms in the integral have been replaced with $y$’s. The function inside the integral is now the density of $Y$.
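That fallback is easy to test numerically. Below, a sketch (assuming NumPy and SciPy are available) differentiates the CDF $F_Y(y) = F_X(h^{-1}(y))$ for $Y = X^2$ with $X \sim \text{Uniform}(0,1)$ and compares against the change of density formula:

```python
import numpy as np
from scipy import stats

# Y = X² with X ~ Uniform(0,1), so F_Y(y) = F_X(√y).
F_Y = lambda y: stats.uniform.cdf(np.sqrt(y))

y, eps = 0.25, 1e-6
f_Y_numeric = (F_Y(y + eps) - F_Y(y - eps)) / (2 * eps)  # central difference
f_Y_formula = 1 / (2 * np.sqrt(y))  # f_X(√y) · 1/|h'(√y)| = 1/(2√y)

print(f_Y_numeric, f_Y_formula)
```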
We’ve already studied a special case.
Suppose that $h(x)$ is linear, $h(x) = \sigma x + s$ for some $\sigma > 0$. Then $h'(x) = \sigma$, so, by the change of density formula:

$$f_Y(y) \propto f_X(h^{-1}(y)) = f_X\left( \frac{y - s}{\sigma} \right).$$

It follows that $Y = h(X)$ has density proportional to the density of $X$ translated by $s$ and dilated by $\sigma$. If we dilate a distribution by a factor of $\sigma$, then we must divide its height by $\sigma$ so that it integrates to one. As before, if we double the width of a rectangle, we have to halve its height to keep its area fixed.

By that logic, we should have:

$$f_Y(y) = \frac{1}{\sigma} f_X\left( \frac{y - s}{\sigma} \right).$$

Using the exact change of density formula gives the same result since $h'(x) = \sigma$ for all $x$:

$$f_Y(y) = f_X(h^{-1}(y)) \frac{1}{|h'(h^{-1}(y))|} = f_X\left( \frac{y - s}{\sigma} \right) \frac{1}{|\sigma|}.$$

This is a sensible rule. Replacing $X$ with $2X$ doubles the distance between any sampled values of $X$, so should halve its density.
Suppose that $X \sim \text{Normal}(0,1)$ and $Y = \sigma X + \mu$. Then:

$$f_Y(y) = \frac{1}{\sigma} f_X((y - \mu)/\sigma) = \frac{1}{\sigma} \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} \left(\frac{y - \mu}{\sigma} \right)^2}.$$

The function on the right is the density function for a generic normal random variable, or Gaussian random variable, $Y \sim \text{Normal}(\mu,\sigma)$.
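We can check this density against a library implementation (a sketch assuming SciPy is available; the values of $\mu$ and $\sigma$ are arbitrary):

```python
import numpy as np
from scipy import stats

mu, sigma = 2.0, 3.0
y = np.linspace(-10.0, 14.0, 7)

# Change of density applied to the standard normal density...
by_hand = stats.norm.pdf((y - mu) / sigma) / sigma
# ...versus the library's Normal(mu, sigma) density.
library = stats.norm.pdf(y, loc=mu, scale=sigma)

print(np.allclose(by_hand, library))  # True
```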
Let’s check that the change of density formula actually returns the correct density for $Y$. To confirm, we’ll use integration by substitution. We will check the case when $h$ is monotonically increasing. To check the general case, break the range of $x$ into segments where $h$ is monotonic, then work one segment at a time.

As long as $h$ is increasing:

$$\text{Pr}(X \in [a,b]) = \text{Pr}(Y \in [h(a),h(b)]).$$

So, integrating over the density of $Y$:

$$\text{Pr}(Y \in [h(a),h(b)]) = \int_{y = h(a)}^{h(b)} f_Y(y) dy.$$

Using the change of density formula:

$$\int_{y = h(a)}^{h(b)} f_Y(y) dy = \int_{y = h(a)}^{h(b)} f_X(h^{-1}(y)) \left[\frac{d}{dy} h^{-1}(y) \right] dy.$$

That looks like a mess, but we have all the parts we need to change variables.
Let $x = h^{-1}(y)$. Then:

- $f_X(h^{-1}(y)) = f_X(x)$,
- $dx = \frac{d}{dy} h^{-1}(y) dy$,
- $h^{-1}(h(a)) = a$ and $h^{-1}(h(b)) = b$.

So:

$$\begin{aligned} \text{Pr}(Y \in [h(a),h(b)]) & = \int_{y = h(a)}^{h(b)} f_X(h^{-1}(y)) \left[\frac{d}{dy} h^{-1}(y) \right] dy \\
& = \int_{x = a}^b f_X(x) dx = \text{Pr}(X \in [a,b]). \end{aligned}$$

Suppose that $X \in [0,1]$ and:
$$\text{PDF}(x) = 30 x^2 (1 - x)^2.$$

Let $Y = h(X) = X^2$ where $h(x) = x^2$. It follows that:

- $h'(x) = 2x$
- $h^{-1}(y) = \sqrt{y}$

So:

$$f_Y(y) = \frac{1}{|h'(x)|} f_X(x) = \frac{30}{2 x} x^2 (1 - x)^2 \text{ at } x = h^{-1}(y) = \sqrt{y}.$$

Plugging in:

$$f_Y(y) = 15 x (1 - x)^2 = 15 \sqrt{y} (1 - \sqrt{y})^2.$$

The support of $Y$ is $[0,1]$ since the smallest possible value of $X^2$ is zero, and the largest is one.
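The density of $X$ here is the Beta(3, 3) density, so we can sanity-check the derived $f_Y$ numerically: it should integrate to one, and its mean should match samples of $Y = X^2$. A sketch assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.integrate import quad

f_Y = lambda y: 15 * np.sqrt(y) * (1 - np.sqrt(y)) ** 2

# A valid density integrates to one over its support.
total, _ = quad(f_Y, 0, 1)

# Cross-check the mean against samples of Y = X², X ~ Beta(3, 3).
rng = np.random.default_rng(2)
y_samples = rng.beta(3, 3, 500_000) ** 2

print(total, y_samples.mean())  # ≈ 1, ≈ 2/7
```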
To visualize this change of density, open the code cell below. Set the X distribution to “Beta” and use parameters $\alpha = \beta = 3$. Set $g(x)$ to “Quadratic” and use coefficients $a = 1$ and $b = 0$.

Set the number of samples to 1,000 and click “Draw Samples.” Then click “Transform” to push the same set of samples through the function $g(x)$. Finally, click “Show Density.”
```python
from utils import show_change_of_density

show_change_of_density()
```
Notice that the histogram for $X$ is symmetric about $X = 0.5$, while the histogram for $Y$ is skewed so that its mode is closer to $Y = 0$. Why?

Think about how the slope of $g(x)$ affects the spacing between samples. Where $g(x)$ has a shallow slope, distant samples are mapped near each other, so the density of $Y$ increases. In contrast, where the slope is steep, nearby samples get spread apart, so $Y$ is less dense. This explains the $1/|\text{slope of transform}|$ term in the change of density formula.

Try varying the transform and the distribution for $X$. You’ll see that, where the transform increases slowly, $Y$ is denser, and where it increases quickly, $Y$ is less dense.
Here are some more examples. You can use the same demonstration to test each case.
Suppose that $X \sim \text{Exp}(\lambda)$ and $Y = \sqrt{X}$.

Let $h(x) = x^{1/2}$. It follows that:

- $h'(x) = \frac{1}{2} x^{-1/2}$
- $h^{-1}(y) = y^2$

So:

$$f_Y(y) = \frac{1}{|h'(x)|} f_X(x) = 2 x^{1/2} \lambda e^{-\lambda x} \text{ at } x = h^{-1}(y) = y^2.$$

Plugging in:

$$f_Y(y) = 2 \lambda y e^{-\lambda y^2}.$$

The support of $Y$ is $[0,\infty)$ since the smallest possible value of $\sqrt{X}$ is zero, and the largest is infinity.
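A numerical sanity check (a sketch assuming NumPy and SciPy are available; the rate $\lambda = 2$ is an arbitrary choice): the derived density should integrate to one, and its mean should match Monte Carlo samples of $\sqrt{X}$:

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
f_Y = lambda y: 2 * lam * y * np.exp(-lam * y**2)

total, _ = quad(f_Y, 0, np.inf)  # should be 1

# Samples of Y = √X with X ~ Exp(λ); NumPy parameterizes by scale = 1/λ.
rng = np.random.default_rng(3)
sqrt_x = np.sqrt(rng.exponential(scale=1 / lam, size=500_000))

print(total, sqrt_x.mean())
```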
Suppose that $X \sim \text{Exponential}(\lambda)$ and $Y = 1/X$.

Let $h(x) = x^{-1}$. It follows that:

- $|h'(x)| = |-x^{-2}| = x^{-2}$
- $h^{-1}(y) = 1/y = y^{-1}$

So:

$$f_Y(y) = \frac{1}{|h'(x)|} f_X(x) = x^2 \lambda e^{-\lambda x} \text{ at } x = h^{-1}(y) = \frac{1}{y}.$$

Plugging in:

$$f_Y(y) = \frac{\lambda}{y^2} e^{-\frac{\lambda}{y}}.$$

The support of $Y$ is $[0,\infty)$ since the smallest possible value of $1/X$ is zero ($X \rightarrow \infty$), and the largest is infinity ($X \rightarrow 0$).
Suppose that $X \sim \text{Uniform}(0,1)$ and $Y = -\log(X)$.

Let $h(x) = -\log(x)$. It follows that:

- $|h'(x)| = |-x^{-1}| = x^{-1}$
- $h^{-1}(y) = e^{-y}$

So:

$$f_Y(y) = \frac{1}{|h'(x)|} f_X(x) = x \times 1 = x \text{ at } x = h^{-1}(y) = e^{-y}.$$

Plugging in:

$$f_Y(y) = e^{-y}.$$

The support of $Y$ is $[0,\infty)$ since the smallest possible value of $-\log(X)$ is zero ($X = 1$), and the largest is infinity ($X \rightarrow 0$).

So, $Y \sim \text{Exponential}(1)$.
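This fact is the basis of inverse transform sampling: uniform random numbers can be turned into exponential draws. A quick empirical check, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(size=1_000_000)

# Y = -log(U) should be Exponential(1): mean 1 and standard deviation 1.
y = -np.log(u)

print(y.mean(), y.std())
```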
Suppose that $X \sim \text{Normal}(0,1)$ and $Y = X^2$.

Let $h(x) = x^2$. Now we have a problem since $h$ is not monotonic. However, since the density of $X$ is an even function, and $h$ is an even function, the density of $Y$ would not change had we conditioned on $X > 0$. Restricted to positive $x$, $x^2$ is monotonic. So, we can use our familiar rule, replacing $f_X(x)$ with $f_{X|X > 0}(x)$.

Then:

- $|h'(x)| = |2x| = 2x$ (for $x > 0$)
- $h^{-1}(y) = \sqrt{y}$

So:

$$f_Y(y) = \frac{1}{|h'(x)|} f_{X|X > 0}(x) = \frac{1}{2 x} \times \frac{2}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^2} \text{ at } x = h^{-1}(y) = \sqrt{y}.$$

Plugging in:

$$f_Y(y) = \frac{1}{\sqrt{2 \pi}} y^{-1/2} e^{- \frac{1}{2} y}.$$

The support of $Y$ is $[0,\infty)$ since the smallest possible value of $X^2$ is zero (at $X = 0$), and the largest is infinity (as $X \rightarrow \infty$).
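This is the chi-squared density with one degree of freedom. We can confirm the derived formula against SciPy's implementation (a sketch assuming SciPy is available):

```python
import numpy as np
from scipy import stats

y = np.linspace(0.1, 5.0, 50)

# Derived density: f_Y(y) = y^{-1/2} e^{-y/2} / √(2π).
derived = y ** (-0.5) * np.exp(-y / 2) / np.sqrt(2 * np.pi)

print(np.allclose(derived, stats.chi2.pdf(y, df=1)))  # True
```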