
10.3 Integration on Manifolds

A variety of problems in probability and statistics involve integrating over a connected subset of points in a multi-dimensional space. This chapter will focus on two examples.

  1. Suppose that $[X,Y]$ are random variables. Let $S = X + Y$. How is $S$ distributed?

  2. Suppose that $[X,Y]$ are random variables. Let $R^2 = X^2 + Y^2$. How is $R$ distributed?

Each of these problems involves summing or integrating over a subset of points in a plane. In the first case we need to run sums/integrals over lines; in the second, over circles. Lines and circles are examples of manifolds.

For example:

  1. The set of all $[x,y]$ such that $x + y = s$ is a manifold. It is equivalent to the line $y = s - x$.

  2. The set of all $[x,y]$ such that $x^2 + y^2 = r^2$ is a manifold. It is equivalent to the circle with radius $r$.

  3. The set of all $[x,y]$ such that $a x^2 + b y^2 = r^2$ is a manifold. It is equivalent to an ellipse.

  4. The set of all $[x,y]$ such that $x \times y = c$ is a manifold. It is equivalent to the curve $y = c/x$.

In general, if we define a random variable, $G = g(X,Y)$, then the set of $[X,Y]$ where $G = c$ will correspond to a level set of $g$. If $g$ is some smooth, scalar-valued function, then its level sets are usually manifolds, or unions of finitely many manifolds. To find the distribution of $G$ we will need to find its mass function, density function, or cumulative distribution function. In each case, we will want to understand the chance that $G = c$, or $G \leq c$. These chances will require integrating or summing over the level sets of $g$. Thus, they will usually require integrating or summing over some collection of manifolds.

The figure below shows an example manifold in red. It is produced by taking a level set of a scalar-valued function of $x$ and $y$ (shown as contours in black with a heatmap to illustrate the function value).

Example Manifold.

Sums of Random Variables

Suppose that $X$ and $Y$ are jointly distributed, discrete random variables. Let $S = X + Y$. What is $\text{PMF}_S(s) = \text{Pr}(S = s)$?

Consider all the pairs $[X,Y]$ such that $S = s$. Since $S = X + Y$, the collection of all pairs $[X,Y]$ such that $S = s$ is the set where $Y = s - X$. The figure below shows, in red, the set of all $x$ and $y$ such that $s = x + y = 5$ for $x \in [0,4]$ and $y \in [0,4]$. Notice that the manifold corresponds to the line $y = 5 - x$, and is a level set of the scalar-valued function $x + y$.

Linear Manifold.

So, to find the chance that $S = s$, we should take a union over all pairs of the form $[X = x, Y = s - x]$. Since this union runs over distinct $x$ we can use additivity:

$$\text{PMF}_S(s) = \text{Pr}(S = s) = \sum_{x} \text{Pr}(X = x, Y = s - x).$$

We’ve run calculations like this before (see Section 2.1). For example, suppose that you roll two fair, six-sided dice. Let $X$ and $Y$ denote the values rolled on each die. Let $S = X + Y$. How is $S$ distributed? Try to find the distribution of $S$ yourself using a joint distribution table. Then, open the dropdown below.
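As a check on the joint-table approach, the additivity sum above can be enumerated directly. A minimal sketch, assuming the dice are independent so the joint factors into marginals (the helper name `pmf_of_sum` is ours):

```python
from fractions import Fraction

def pmf_of_sum(pmf_x, pmf_y):
    """PMF of S = X + Y for independent X, Y: sum Pr(X = x) Pr(Y = s - x)."""
    pmf_s = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            s = x + y  # the pair [x, y] lies on the line y = s - x
            pmf_s[s] = pmf_s.get(s, Fraction(0)) + px * py
    return pmf_s

# Two fair six-sided dice.
die = {face: Fraction(1, 6) for face in range(1, 7)}
pmf_s = pmf_of_sum(die, die)

print(pmf_s[7])  # Pr(S = 7) = 6/36 = 1/6
```

Exact fractions make it easy to compare the output against a hand-built joint distribution table.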

If $X$ and $Y$ are continuously distributed then $S$ is also continuously distributed. Its density function is also expressed as a “sum” over the line $y = s - x$. As usual, to move from the discrete case to the continuous case, exchange the sum for an integral:

$$f_S(s) = \int_{-\infty}^{\infty} f_{X,Y}(x, s - x)\, dx.$$

Convolution

If $X$ and $Y$ are independent, then their joint mass/density function factors into a product of marginals:

$$\begin{aligned} & \text{Pr}(X = x, Y = y) = \text{Pr}(X = x) \times \text{Pr}(Y = y) \\ & f_{X,Y}(x,y) = f_X(x) \times f_Y(y). \end{aligned}$$

Factoring the joint inside of a sum over a line produces a convolution:

$$\text{PMF}_S(s) = \sum_{x} \text{PMF}_X(x)\, \text{PMF}_Y(s - x), \qquad f_S(s) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(s - x)\, dx.$$

Convolution is an essential integral operation. It is important in many image and signal processing problems (e.g. deblurring in microscopy, astronomy, or medical imaging, and low/high-pass filtering of audio). It is the key operation behind the architecture of the convolutional neural networks widely used for computer vision, automated driving, and image classification. Convolution is also important in the theories of diffusion, a wide variety of dynamical systems, and the procedures for fast multiplication that allow computers to perform algebra quickly.
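For finitely supported PMFs, the discrete convolution sum is exactly what `np.convolve` computes. A small sketch (the four-sided-die example is ours):

```python
import numpy as np

# PMF of a fair four-sided die on faces 1..4 (an illustrative example).
pmf = np.full(4, 0.25)

# np.convolve computes (f * g)[k] = sum_x f[x] g[k - x]; since the faces
# start at 1, index k of the result corresponds to the sum s = k + 2.
pmf_sum = np.convolve(pmf, pmf)

print(pmf_sum)  # PMF of the sum, supported on s = 2, ..., 8
```

The same one-line call is what signal-processing and neural-network libraries apply, at scale, to filters and feature maps.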

Example: Sums of Independent Exponential Random Variables

Suppose that $X \sim \text{Exponential}(\lambda)$ and $Y \sim \text{Exponential}(\lambda)$ are independent, identically distributed, exponential random variables. Then $X \geq 0$ and $Y \geq 0$.

Let $S = X + Y$. Then $S \geq 0$ since $X$ and $Y$ are nonnegative.

If $S = s$, then $X \in [0,s]$ since, if $X > s$, then $Y = s - X$ would be negative. Therefore:

$$\begin{aligned} f_S(s) & = \int_{x = 0}^{s} f_X(x)\, f_Y(s - x)\, dx \\ & = \lambda^2 \int_{x = 0}^{s} e^{-\lambda x} e^{-\lambda(s - x)}\, dx \\ & = \lambda^2 e^{-\lambda s} \int_{x = 0}^{s} 1\, dx \\ & = \lambda^2 s e^{-\lambda s}. \end{aligned}$$
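As a sanity check, the convolution integral can be approximated numerically and compared against the closed form $\lambda^2 s e^{-\lambda s}$. A sketch with $\lambda = 1$ and $s = 2$ (the midpoint rule and grid size are arbitrary choices):

```python
import numpy as np

lam, s = 1.0, 2.0

# Midpoint-rule approximation of the convolution integral on [0, s].
n = 100_000
dx = s / n
x = (np.arange(n) + 0.5) * dx
integrand = lam * np.exp(-lam * x) * lam * np.exp(-lam * (s - x))

numeric = np.sum(integrand) * dx
exact = lam**2 * s * np.exp(-lam * s)  # the closed form derived above
print(numeric, exact)  # both approximately 0.2707
```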

So, $S$ is a nonnegative random variable with density function:

$$f_S(s) \propto s e^{-\lambda s}.$$

In this case $S$ is an example of a gamma random variable.

You will iterate this analysis on your homework to find the distribution of the sum of $n$ independent, identically distributed, exponential random variables.

Interactive

You can visualize convolution as follows. First, fix $s$. Then $f_Y(s - x)$ is the function $f_Y$ reflected ($-x$ in the argument), then translated by $s$. So, convolving $f_X$ with $f_Y$ is the same as:

  1. Reflecting $f_Y$.

  2. Translating its reflection by $s$.

  3. Taking the product of $f_X$ with the translated, reflected density $f_Y$.

  4. Finding the area underneath the product by integrating over all $x$.

Run the code cell below to experiment with the convolution of different densities. You can choose the densities $f_X$ and $f_Y$, visualize $f_X(x)$ and $f_Y(s - x)$ for different $s$, reveal their product, then compute the convolution by finding the area under the product. Repeating for all $s$ recovers the density function of $S$.

```python
%matplotlib inline
from utils_convolution import show_convolution

show_convolution()
```

Sums of Squares of Random Variables

To estimate variances we often need to understand the distribution of sums of squares. In this chapter we’ll focus on the simple case when we add the squares of two random variables. We’ll also restrict our attention to continuous random variables.

Suppose that $X$ and $Y$ are jointly distributed continuous random variables. Let $R^2 = X^2 + Y^2$ denote the sum of squares. Then, the set of all $[x,y]$ such that $x^2 + y^2 = r^2$ is a circle of radius $r$:

Circular Manifold.

How is $R^2$ distributed?

In this case it will be easier to find the distribution of $R$ by first working out its cumulative distribution function. As always, we can find its density function by differentiating its cumulative distribution function: $f_R(r) = \frac{d}{dr} F_R(r)$ (see Sections 2.3 and 2.4).

By definition,

$$F_R(r) = \text{Pr}(R \leq r) = \text{Pr}(R^2 \leq r^2) = \text{Pr}(X^2 + Y^2 \leq r^2).$$

The region where $x^2 + y^2 \leq r^2$ is the collection of all points within a circle of radius $r$, centered at the origin. So, to find the CDF of $R$, we will need to find the volume under the joint density over the circle of radius $r$ (see Section 8.3):

$$F_R(r) = \iint_{[x,y] \text{ such that } x^2 + y^2 \leq r^2} f_{X,Y}(x,y)\, dx\, dy.$$

To integrate over a circular region, we will use an iterated integral in polar coordinates.

Integration in Polar Coordinates

Polar Differential.

In particular, the chance that $X$ and $Y$ are contained in a circle with radius $r_*$ is:

$$\iint_{\text{all } x^2 + y^2 \leq r_*^2} f_{X,Y}(x,y)\, dx\, dy = \int_{r = 0}^{r_*} \left[\int_{\theta = 0}^{2\pi} f_{X,Y}(r \cos(\theta), r \sin(\theta))\, d\theta \right] r\, dr.$$

The integrand is usually simpler in polar coordinates when it depends only on $r$. If $f_{X,Y}(x,y) = g(r)$ for some nonnegative function $g$, where $r = \sqrt{x^2 + y^2}$, then the density is rotationally symmetric.

The integral of a rotationally symmetric density, $f_{X,Y}(x,y) = g(r)$, over a circular region is:

$$\begin{aligned} \iint_{\text{all } x^2 + y^2 \leq r_*^2} f_{X,Y}(x,y)\, dx\, dy & = \int_{r = 0}^{r_*} \left[\int_{\theta = 0}^{2\pi} g(r)\, d\theta \right] r\, dr \\ & = \int_{r = 0}^{r_*} g(r) \left[\int_{\theta = 0}^{2\pi} 1\, d\theta \right] r\, dr \\ & = 2\pi \int_{r = 0}^{r_*} g(r)\, r\, dr. \end{aligned}$$

This is a univariate integral in $r$ alone.
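To see the reduction concretely, the two sides can be compared numerically for a rotationally symmetric integrand such as $g(r) = e^{-\frac{1}{2} r^2}$. A sketch (the grid sizes and $r_* = 1$ are arbitrary choices):

```python
import numpy as np

def g(r):
    return np.exp(-0.5 * r**2)  # rotationally symmetric integrand

r_star = 1.0

# 1D polar integral: 2*pi * int_0^{r*} g(r) r dr, by the midpoint rule.
n = 200_000
dr = r_star / n
r = (np.arange(n) + 0.5) * dr
polar = 2 * np.pi * np.sum(g(r) * r) * dr

# 2D Cartesian integral over the disk x^2 + y^2 <= r*^2, on a fine grid.
m = 1201
xs = np.linspace(-r_star, r_star, m)
X, Y = np.meshgrid(xs, xs)
dA = (xs[1] - xs[0]) ** 2
inside = X**2 + Y**2 <= r_star**2
cartesian = np.sum(g(np.sqrt(X**2 + Y**2))[inside]) * dA

print(polar, cartesian)  # both approximately 2*pi*(1 - e^{-1/2}) = 2.472
```

The Cartesian version needs roughly $m^2$ function evaluations and a ragged boundary mask; the polar version is a clean one-dimensional integral.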

Example: Independent Normal Random Variables

Suppose that $X$ and $Y$ are independent, identically distributed, standard normal random variables. Then both $X$ and $Y$ are supported on $(-\infty, \infty)$ with density function:

$$f_X(z) = f_Y(z) \propto e^{-\frac{1}{2} z^2}.$$

Since $X$ and $Y$ are independent, the random vector $[X,Y]$ has joint density:

$$f_{X,Y}(x,y) = f_X(x) \times f_Y(y) \propto e^{-\frac{1}{2}(x^2 + y^2)} = g(r)$$

where:

$$g(r) = e^{-\frac{1}{2} r^2}.$$

So, the joint density of $X$ and $Y$ is rotationally symmetric!

To check our work, run the code cell below and select “Independent Normal.” You should see a bell-shaped peak with circular level sets. The surface is unchanged when you rotate it, so it is rotationally symmetric.

```python
from utils_lsg import show_level_sets

show_level_sets()
```

Let’s find the distribution of $R^2 = X^2 + Y^2$ and $R = \sqrt{X^2 + Y^2}$. Along the way we’ll work out the normalizing constant for a single standard normal random variable.

Both distributions can be recovered from the CDF:

$$F_{R^2}(r_*^2) = \text{Pr}(R^2 \leq r_*^2) = \text{Pr}(R \leq r_*) = F_R(r_*).$$

The CDF evaluated at $r_*$ is the volume under the joint density of $X$ and $Y$ over the circle with radius $r_*$. Integrating in polar coordinates:

$$F_R(r_*) \propto 2\pi \int_{r = 0}^{r_*} g(r)\, r\, dr = 2\pi \int_{r = 0}^{r_*} r e^{-\frac{1}{2} r^2}\, dr.$$

To evaluate the integral, let’s integrate by change of variables (see Section 7.2). Let $u = \frac{1}{2} r^2$ so that $du = \left(\frac{d}{dr} \frac{1}{2} r^2\right) dr = r\, dr$. Then:

$$\begin{aligned} F_R(r_*) & \propto 2\pi \int_{u = 0}^{\frac{1}{2} r_*^2} e^{-u}\, du \\ & = -2\pi e^{-u} \Big|_{u = 0}^{\frac{1}{2} r_*^2} = 2\pi \left(1 - e^{-\frac{1}{2} r_*^2}\right). \end{aligned}$$

It follows that:

$$F_R(r) = 2\pi c \left(1 - e^{-\frac{1}{2} r^2}\right)$$

for some normalization constant $c$.

To find the normalization constant, recall that $\lim_{r \rightarrow \infty} F_R(r) = 1$ for any CDF. When $r$ diverges, $r^2$ diverges, so $e^{-\frac{1}{2} r^2}$ converges to zero. Therefore:

$$1 = \lim_{r \rightarrow \infty} F_R(r) = 2\pi c (1 - 0) = 2\pi c.$$

Therefore, $c = 1/(2\pi)$.

So:

$$F_R(r) = 1 - e^{-\frac{1}{2} r^2}.$$

It follows that:

$$f_R(r) = \frac{d}{dr} F_R(r) = r e^{-\frac{1}{2} r^2}.$$

So, $R$ is a nonnegative random variable with density $f_R(r) = r e^{-\frac{1}{2} r^2}$. In this case, $R$ is an example of a chi random variable with two degrees of freedom (also called a Rayleigh random variable); its square, $R^2$, is a chi-squared random variable with two degrees of freedom.
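The CDF $F_R(r) = 1 - e^{-\frac{1}{2} r^2}$ can be checked by simulation: draw standard normal pairs, form $R$, and compare the empirical CDF to the formula. A sketch (the sample size, seed, and evaluation point $r = 1$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)
r = np.sqrt(x**2 + y**2)

# Empirical CDF of R at r = 1 versus F_R(1) = 1 - e^{-1/2}.
empirical = np.mean(r <= 1.0)
exact = 1.0 - np.exp(-0.5)
print(empirical, exact)  # both approximately 0.393
```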

To find the distribution of the sum of squares, apply the change of density formula using $y = h(r) = r^2$ (see Section 7.2). Then $h'(r) = 2r$ and $h^{-1}(y) = \sqrt{y} = r$. Therefore:

$$\begin{aligned} f_{R^2}(y) & = f_R(r) \frac{1}{h'(r)} = r e^{-\frac{1}{2} r^2} \frac{1}{2r} = \frac{1}{2} e^{-\frac{1}{2} r^2} \\ & = \frac{1}{2} e^{-\frac{1}{2} (\sqrt{y})^2} = \frac{1}{2} e^{-\frac{1}{2} y}. \end{aligned}$$

So, the sum of squares, $R^2$, is a nonnegative random variable with density proportional to an exponential function. It follows that the sum of squares is an exponential random variable with rate parameter $1/2$.
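This conclusion can also be checked by simulation: an Exponential random variable with rate $1/2$ has mean 2 and tail probability $\text{Pr}(R^2 > t) = e^{-t/2}$. A sketch (the seed, sample size, and $t = 2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
r2 = rng.standard_normal(n) ** 2 + rng.standard_normal(n) ** 2

# Mean and one tail probability of R^2, versus the Exponential(1/2) values.
print(r2.mean())                         # approximately 2
print(np.mean(r2 > 2.0), np.exp(-1.0))   # both approximately 0.368
```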

Look back to the stage in this analysis where we worked out the normalization factor $c$. This is the normalization factor for the joint density:

$$f_{X,Y}(x,y) = c\, g(r) = c\, e^{-\frac{1}{2}(x^2 + y^2)}.$$

Since $X$ and $Y$ are independent, with marginals $f_X(z) = f_Y(z) \propto e^{-\frac{1}{2} z^2}$:

$$\begin{aligned} f_{X,Y}(x,y) = c\, g(r) & = c\, e^{-\frac{1}{2} x^2} e^{-\frac{1}{2} y^2} \\ & = \left(\sqrt{c}\, e^{-\frac{1}{2} x^2}\right) \left(\sqrt{c}\, e^{-\frac{1}{2} y^2}\right) \\ & = f_X(x)\, f_Y(y). \end{aligned}$$

So, the normalizing constant for each marginal density is $\sqrt{c} = 1/\sqrt{2\pi}$. Therefore, the standard normal density is normalized by:

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}.$$

This argument explains the factor of $\sqrt{2\pi}$ in the definition of the normal density.
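The normalization claim $\int_{-\infty}^{\infty} e^{-\frac{1}{2} z^2}\, dz = \sqrt{2\pi}$ can itself be verified numerically. A sketch (the truncation at $|z| = 10$ is arbitrary, but the neglected tails are smaller than $10^{-20}$):

```python
import numpy as np

# Riemann-sum approximation of the Gaussian integral on [-10, 10].
n = 400_000
z = np.linspace(-10.0, 10.0, n)
dz = z[1] - z[0]
total = np.sum(np.exp(-0.5 * z**2)) * dz

print(total, np.sqrt(2 * np.pi))  # both approximately 2.5066
```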