
6.4 Limiting Distributions

Sections 6.1 through 6.3 developed methods for approximating smooth functions (exponentials, logarithms) and factorials. In this section we will use those tools to derive the distributions for two essential probability models. Both can be derived as limits of the familiar Binomial distribution from Section 2.2.

Recall that $X \sim \text{Binomial}(n,p)$ if $X$ is the total number of successes in $n$ independent, identical binary trials with success probability $p$. Then:

$$X \in \{0, 1, 2, ..., n\} \text{ and } \text{PMF}(x) = \binom{n}{x} p^x (1 - p)^{n - x}.$$

Open the distribution plotter linked below to experiment with the Binomial PMF.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Binomial");
```

Poisson Distributions

We express the statement, $X$ is drawn from a Poisson distribution with parameter $\lambda$, as:

$$X \sim \text{Poisson}(\lambda).$$

Here, $X \in \{0, 1, 2, ...\}$ and $\text{PMF}(x) = e^{-\lambda} \frac{\lambda^x}{x!}$.

The Poisson distribution is normalized since, by the Taylor series expansion of the exponential:

$$\sum_{x=0}^{\infty} e^{-\lambda} \frac{\lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1$$
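This normalization can also be checked numerically by truncating the infinite sum. A minimal sketch using only the standard library (the helper name `poisson_pmf` is ours, not part of `utils_dist`):

```python
import math

def poisson_pmf(x, lam):
    """PMF of Poisson(lam): e^{-lam} * lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Truncating at 100 terms already captures essentially all the mass for lam = 5.
total = sum(poisson_pmf(x, 5.0) for x in range(100))
print(total)  # very close to 1
```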

Experiment with the Poisson distribution below. Try setting $\lambda = 5$.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Poisson");
```

You should notice that the result looks roughly Binomial with a mode near $\lambda = 5$. We will recover the Poisson as a limit of the Binomial distribution.

To find the appropriate limit, open the Binomial explorer again.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Binomial");
```

Now, attempt the following.

  1. Gradually increase $n$. Start with $n = 10$, then work upwards.

  2. As you increase $n$, decrease $p$. Keep $p \approx 5/n$. This will keep the peak of the Binomial PMF near $x = 5$ even as $n$ increases, matching the peak of the Poisson PMF with $\lambda = 5$.

You should see that, as $n$ increases, your PMF looks closer and closer to the PMF for $\text{Poisson}(5)$.

Repeat this experiment for $\lambda = 2$ and $\lambda = 10$. Each time, track the Binomial PMF as you increase $n$ while decreasing $p$. You should see that, if you keep $p = \lambda/n$, then, in each case, the sequence of Binomial PMFs will approach the Poisson PMF.
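The same convergence can be checked numerically rather than visually. A sketch, using only the standard library, that measures the largest pointwise gap between the $\text{Binomial}(n, \lambda/n)$ and $\text{Poisson}(\lambda)$ PMFs as $n$ grows (the helper names are illustrative):

```python
import math

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 5.0
gaps = {}
for n in (10, 100, 1000):
    # largest pointwise gap between the two PMFs over x = 0, ..., 20
    gaps[n] = max(abs(binom_pmf(x, n, lam / n) - poisson_pmf(x, lam))
                  for x in range(21))
print(gaps)  # the gap shrinks as n grows
```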

... as a Binomial Limit

This experiment illustrates a limiting relationship between the Poisson and Binomial distributions. This relationship is sometimes called the Law of Small Numbers: if $X_n \sim \text{Binomial}(n, p(n))$ with $n \times p(n) = \lambda$ held fixed, then the PMF of $X_n$ converges to the PMF of $\text{Poisson}(\lambda)$ as $n$ diverges.

Before proving this law, it’s worth examining the limit statement.

  1. Holding $n \times p(n) = \lambda$ is equivalent to keeping $\mathbb{E}[X] = n \times p(n)$ fixed at $\lambda$ as $n$ increases. So, this is a limit where, as the number of trials increases, the chance of success per trial decreases, so that the expected total number of successes remains constant.

  2. Like any limiting statement, the Law of Small Numbers is most useful as an approximation. It guarantees that, when $n$ is large and $p$ is small, $X \sim \text{Binomial}(n,p)$ is approximately Poisson distributed. This result is useful since the Poisson distribution is easier to work with than the Binomial.

The Law of Small Numbers is often invoked when we ask about the total number of successes in a large number of trials that each rarely succeed. The “small” in small numbers references the idea that, if $p$ is small, then $\mathbb{E}[X] = np$ is much smaller than $n$. We could just as well have named this limiting relationship the “Law of Rare Counts.”

When do limits of this kind occur in practice?

... from Exponential Waiting Times

We’ve actually already seen a situation where the limit involved in the Law of Small Numbers is sensible.

Recall the random incidents model from Section 6.2: incidents occur at random times, and the waiting times between successive incidents are independent $\text{Exponential}(\lambda)$ random variables.

Suppose that, instead of asking for the time between successive incidents, we ask for the total number of incidents that occur between time 0 and time $t$. Let $X$ denote the total number of incidents that occur. Then:

$$X \sim \text{Poisson}(\lambda t).$$

In other words: if we partition $[0, t]$ into $n$ equal intervals, then, for large $n$, each interval contains an incident with small probability, approximately $\lambda t / n$, so the count of occupied intervals, $X(n)$, is approximately $\text{Binomial}(n, \lambda t/n)$.

In the setting described above, the limit that appears in the Law of Small Numbers is sensible. The expected value of $X(n)$ should converge to something sensible as $n$ diverges since the number of intervals in the partition, $n$, was an arbitrary number introduced to help analyze $X = \lim_{n \rightarrow \infty} X(n)$.
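This account can be tested by simulation. The sketch below assumes the Section 6.2 model of independent $\text{Exponential}(\lambda)$ waiting times, counts incidents in $[0, t]$, and checks the counts against $\text{Poisson}(\lambda t)$:

```python
import random

random.seed(0)
lam, t, trials = 2.0, 3.0, 100_000

def count_incidents(lam, t):
    """Count incidents in [0, t] when waiting times are Exponential(lam)."""
    clock, count = random.expovariate(lam), 0
    while clock <= t:
        count += 1
        clock += random.expovariate(lam)
    return count

counts = [count_incidents(lam, t) for _ in range(trials)]
mean = sum(counts) / trials          # should be close to lam * t = 6
freq6 = counts.count(6) / trials     # compare to Poisson(6) PMF at 6, about 0.161
print(mean, freq6)
```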

Proof of the Law of Small Numbers

To establish the Law of Small Numbers, we need to show that the Binomial PMF converges to the Poisson PMF in the limit as $n$ diverges, provided $p(n)$ behaves like $\lambda/n$. The support of the Binomial converges to the support of the Poisson since $\{0, 1, 2, ..., n\}$ approaches $\{0, 1, 2, ...\}$ as $n$ diverges.

Substituting $p(n) = \lambda/n$ into the Binomial PMF gives:

$$\binom{n}{x}\left(\frac{\lambda}{n}\right)^x \left(\frac{n - \lambda}{n}\right)^{n-x} = \binom{n}{x}\left(\frac{\lambda}{n - \lambda}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n}.$$

One term is ready for a limit. By the limiting expression for the exponential:

$$\lim_{n \rightarrow \infty}\left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}.$$

This is the normalizing constant of the Poisson distribution.

Next, we need to work out the limit of:

$$\binom{n}{x}\left(\frac{\lambda}{n - \lambda}\right)^x = \frac{n!}{x!(n - x)!}\left(\frac{\lambda}{n - \lambda}\right)^x.$$

To find this limit, we will expand it as a product. Let:

$$r(x+1) = \frac{\text{Pr}(X = x+1)}{\text{Pr}(X = x)}$$

denote the ratio of successive values of the PMF. Then, the PMF at $x$ can be expanded:

$$\begin{aligned} \text{Pr}(X = x) &= \text{Pr}(X = 0) \times \frac{\text{Pr}(X = 1)}{\text{Pr}(X = 0)} \times \frac{\text{Pr}(X = 2)}{\text{Pr}(X = 1)} \times ... \times \frac{\text{Pr}(X = x)}{\text{Pr}(X = x - 1)} \\ &= \text{Pr}(X = 0) \prod_{y=1}^x r(y) \end{aligned}$$

This form is convenient since:

$$\text{Pr}(X = 0) = \binom{n}{0} p^0 (1 - p)^n = 1 \times 1 \times \left(1 - \frac{\lambda}{n}\right)^{n}$$

which converges to the normalizing constant $e^{-\lambda}$ we derived before.

Moreover, each ratio $r(y)$ is simple:

$$\begin{aligned} r(y) &= \frac{n!}{n!} \frac{(y-1)!}{y!} \frac{(n - (y-1))!}{(n - y)!} \frac{p^y}{p^{y-1}} \frac{(1-p)^{n-y}}{(1-p)^{n-(y-1)}} \\ &= \frac{n - y + 1}{y} \frac{p}{1 - p} \end{aligned}$$

So, if $p = \lambda/n$, then:

$$r(y) = \frac{n - y + 1}{y} \frac{\lambda/n}{1 - \lambda/n} = \frac{n - y + 1}{y} \frac{\lambda}{n - \lambda}$$
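You can watch $r(y)$ stabilize by evaluating the formula above for growing $n$; here $\lambda = 5$ and $y = 3$, so the values should approach $\lambda/y = 5/3$:

```python
lam, y = 5.0, 3

# r(y) for a Binomial(n, lam/n), evaluated at several n
rs = {n: (n - y + 1) / y * lam / (n - lam) for n in (10, 100, 10_000)}
print(rs)  # values approach lam / y = 1.666...
```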

Now, if $n$ diverges, $n$ will dominate $-y + 1$, so the first factor simplifies to $n/y$, and $n$ will dominate $-\lambda$, so the second factor simplifies to $\lambda/n$. Therefore:

$$\lim_{n \rightarrow \infty} \frac{n - y + 1}{y} \frac{\lambda}{n - \lambda} = \lim_{n \rightarrow \infty} \frac{n}{y} \frac{\lambda}{n} = \frac{\lambda}{y}.$$

So, the PMF at $x$, in the limit of infinite $n$, is:

$$\text{PMF}(x) = \text{Pr}(X = 0) \prod_{y=1}^x r(y) = e^{-\lambda} \prod_{y=1}^x \frac{\lambda}{y} = e^{-\lambda} \frac{\lambda^x}{x \times (x-1) \times ... \times 2 \times 1} = e^{-\lambda} \frac{\lambda^x}{x!}.$$

The right hand side is the Poisson PMF! $\square$

Normal Distributions

We express the statement, $X$ is drawn from a standard normal distribution, as:

$$X \sim \text{Normal}(0,1).$$

You can experiment with the normal distribution using the code cell below. Notice that changing the mean parameter translates the density, while changing the standard deviation dilates it.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Normal");
```

The normal distribution is an important model since it arises as the limit of many other distributions. In particular, if we draw a set of $n$ independent, identical samples from a distribution with finite variance, and compute their sample average, then the sample average will be approximately normally distributed for large $n$.

In this section, we’ll recover the formula for the standard normal density:

$$\text{PDF}(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2}$$

from a limit of Binomial random variables.

... as a Binomial Limit

First, run the code cell below to visualize a Binomial PMF.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Binomial");
```

This time, keep the success probability, $p$, fixed, and increase $n$. Start with $p = 0.5$ and $n = 4$, then gradually increase $n$. You should see that, even for relatively small $n$, the Binomial PMF approaches a bell curve shape. If you repeat this experiment for $p \neq 0.5$ you’ll see the same result, though the bell curve will start out skewed.

So, consider the Binomial PMF with $p = 0.5$:

$$\begin{aligned} \text{PMF}(x) &= \binom{n}{x}\left(\frac{1}{2}\right)^x \left(1 - \frac{1}{2}\right)^{n-x} \\ &= \binom{n}{x}\left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{n-x} \\ &= \binom{n}{x}\left(\frac{1}{2}\right)^n. \end{aligned}$$

So, when $p = 1/2$, the Binomial PMF is proportional to the choose coefficient as a function of $x$:

$$\text{PMF}(x) \propto \binom{n}{x}.$$

This means that our analysis of the Binomial PMF will serve two ends at once. First, we will see that, as $n$ diverges, the Binomial PMF produces a normal curve. Second, by showing that the Binomial PMF approaches a normal curve, we will also develop a normal approximation for binomial coefficients.

Before starting, we will have to fix a basic discrepancy between our two models.

If $Z \sim \text{Normal}(0,1)$ then $\mathbb{E}[Z] = 0$ and $\text{Var}[Z] = 1$.

In contrast, if $X_n \sim \text{Binomial}(n,p)$ then $\mathbb{E}[X_n] = np$ and $\text{Var}[X_n] = np(1-p)$. So, $X_n$ cannot converge to $Z$ as $n$ diverges, since the expected value of a Binomial distribution is proportional to $n$ and is nonzero when $p \neq 0$. Worse, its standard deviation grows at rate $\mathcal{O}(\sqrt{n})$.

To fix this issue, we will show that a standardized version of $X_n$ approaches $Z$. To standardize a random variable, subtract off its mean and divide by its standard deviation (see Section 4.3).

So, let:

$$Z_n = \frac{X_n - np}{\sqrt{np(1-p)}}$$

We’ve added the subscript “$n$” to $Z$ to indicate that $Z_n$ is a standardized Binomial random variable on $n$ trials.

When $p = 0.5$:

$$Z_n = \frac{X_n - n \times 0.5}{\sqrt{n \times 0.5^2}} = \frac{1}{\sqrt{n}}(2X_n - n).$$
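As a sanity check, the standardization really does produce mean 0 and variance 1 at every finite $n$. A sketch that computes both exactly from the $\text{Binomial}(n, 1/2)$ PMF:

```python
import math

n = 100
mu, sigma = n * 0.5, math.sqrt(n * 0.25)        # mean and sd of Binomial(n, 1/2)

pmf = [math.comb(n, x) * 0.5**n for x in range(n + 1)]
zs = [(x - mu) / sigma for x in range(n + 1)]   # possible values of Z_n

mean = sum(w * z for w, z in zip(pmf, zs))
var = sum(w * z * z for w, z in zip(pmf, zs)) - mean**2
print(mean, var)  # 0 and 1, up to floating point error
```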

In terms of $Z_n$:

$$X_n = \frac{1}{2}(n + \sqrt{n} Z_n) = \frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} Z_n\right).$$

Now, the PMF of $Z_n$ is:

$$\text{Pr}(Z_n = z) = \text{Pr}\left(\frac{1}{\sqrt{n}}(2X_n - n) = z\right) = \text{Pr}\left(X_n = \frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)\right).$$

Then, using the formula for the Binomial PMF:

$$\text{Pr}(Z_n = z) = \left(\frac{1}{2}\right)^n \binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}.$$

To simplify, first expand the Binomial coefficient as a ratio of factorials:

$$\binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} = \frac{n!}{\left(\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)\right)! \times \left(\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)\right)!}$$

Next, apply Stirling’s approximation to approximate each term:

$$\begin{aligned} &n! \simeq \sqrt{2\pi e}\left(\frac{n}{e}\right)^{n + \frac{1}{2}}, \\ &\left(\frac{n}{2}\left(1 \pm \frac{1}{\sqrt{n}} z\right)\right)! \simeq \sqrt{2\pi e}\left(\frac{n}{2e}\left(1 \pm \frac{1}{\sqrt{n}} z\right)\right)^{\frac{n}{2}\left(1 \pm \frac{1}{\sqrt{n}} z\right) + \frac{1}{2}} \end{aligned}$$

Substituting each term for its approximation, then cancelling like terms, gives:

$$\binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} \sim \frac{2^{n+1}}{\sqrt{2\pi(n - z^2)}}\left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}\left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)}$$

Therefore:

$$\begin{aligned} \text{Pr}(Z_n = z) &= \left(\frac{1}{2}\right)^n \binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} \\ &\simeq \frac{2}{\sqrt{2\pi(n - z^2)}}\left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}\left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)}. \end{aligned}$$
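The Stirling-based approximation above is already quite accurate at moderate $n$. A sketch comparing it to the exact coefficient, with $n = 400$ and $z = 1$ chosen so that $\frac{n}{2}(1 + z/\sqrt{n}) = 210$ is an integer:

```python
import math

def approx_coeff(n, z):
    """Stirling-based approximation to C(n, n/2 * (1 + z/sqrt(n)))."""
    a = 1 + z / math.sqrt(n)
    b = 1 - z / math.sqrt(n)
    return (2**(n + 1) / math.sqrt(2 * math.pi * (n - z * z))
            * a**(-n / 2 * a) * b**(-n / 2 * b))

n, z = 400, 1.0
k = round(n / 2 * (1 + z / math.sqrt(n)))   # = 210
rel_err = abs(approx_coeff(n, z) - math.comb(n, k)) / math.comb(n, k)
print(rel_err)  # small relative error
```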

When $n$ is large, $n - z^2$ will be dominated by $n$. Therefore, we can make the approximation:

$$\text{Pr}(Z_n = z) \simeq \frac{2}{\sqrt{n}} \frac{1}{\sqrt{2\pi}}\left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}\left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)}.$$

The standard normal random variable, $Z$, is continuously distributed, so it is parameterized by a density. Each $Z_n$ is a discrete random variable. To recover a density from a probability, we need to divide by the length of an interval.

In this case we can construct a density from $Z_n$ by replacing $Z_n$ with a random variable $W_n$, where $W_n | Z_n = z \sim \text{Uniform}(z - \Delta z_n/2, z + \Delta z_n/2)$ and $\Delta z_n$ is the gap between successive possible values of $Z_n$. Since $Z_n = \frac{1}{\sqrt{n}}(2X_n - n)$, and $X_n$ is integer valued, $\Delta z_n = \frac{2}{\sqrt{n}}$.

Then, work with the density function of $W_n$:

$$\frac{1}{\Delta z_n}\text{Pr}(Z_n = z) = \frac{1}{\sqrt{2\pi}}\left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}\left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)}.$$

This procedure is the same as:

  1. Representing the PMF of $Z_n$ with a bar plot. The width of each bar is $\Delta z_n$.

  2. Scaling the height of the bars by their widths so that their area returns the PMF value. This returns the density function for $W_n$.

Integrating over the density function of $W_n$, with bounds equal to the endpoints of the bars, will sum over the PMF of $Z_n$. So, all probability questions we could ask about $Z_n$ can be answered by integrating over the density function of $W_n$.
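Putting the pieces together, the rescaled PMF should already track the standard normal density at moderate $n$. A sketch using exact Binomial probabilities rather than the Stirling approximation:

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

n = 400
dz = 2 / math.sqrt(n)                  # gap between successive Z_n values
errs = {}
for x in (180, 200, 220):              # points near the center of the PMF
    z = (2 * x - n) / math.sqrt(n)     # z = -2, 0, 2
    density = math.comb(n, x) * 0.5**n / dz
    errs[z] = abs(density - normal_pdf(z))
print(errs)  # small gaps at each point
```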

Now that we’ve handled the normalizing constants, focus on the functional form:

$$\begin{aligned} \left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}\left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)} &= \left[\left(1 + \frac{1}{\sqrt{n}} z\right) \times \left(1 - \frac{1}{\sqrt{n}} z\right)\right]^{-\frac{n}{2}}\left(\frac{1 - \frac{1}{\sqrt{n}} z}{1 + \frac{1}{\sqrt{n}} z}\right)^{\frac{z}{2}\sqrt{n}} \\ &= \left(1 - \frac{1}{n} z^2\right)^{-\frac{n}{2}}\left(\frac{1 - \frac{1}{\sqrt{n}} z}{1 + \frac{1}{\sqrt{n}} z}\right)^{\frac{z}{2}\sqrt{n}} \end{aligned}$$

To take the limit as $n$ goes to infinity, express each term in the form used for the limiting definition of the exponential:

$$\begin{aligned} &\left(1 - \frac{1}{n} z^2\right)^{-\frac{n}{2}} = \left[\left(1 - \frac{1}{n} z^2\right)^{n}\right]^{-\frac{1}{2}} \simeq \left[e^{-z^2}\right]^{-\frac{1}{2}} = e^{\frac{1}{2} z^2} \\ &\left(1 - \frac{1}{\sqrt{n}} z\right)^{\frac{z}{2}\sqrt{n}} = \left[\left(1 - \frac{1}{\sqrt{n}} z\right)^{\sqrt{n}}\right]^{\frac{z}{2}} \simeq \left[e^{-z}\right]^{\frac{z}{2}} = e^{-\frac{1}{2} z^2} \\ &\left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{z}{2}\sqrt{n}} = \left[\left(1 + \frac{1}{\sqrt{n}} z\right)^{\sqrt{n}}\right]^{-\frac{z}{2}} \simeq \left[e^{z}\right]^{-\frac{z}{2}} = e^{-\frac{1}{2} z^2} \end{aligned}$$

Therefore:

$$\lim_{n \rightarrow \infty}\left(1 - \frac{1}{n} z^2\right)^{-\frac{n}{2}}\left(\frac{1 - \frac{1}{\sqrt{n}} z}{1 + \frac{1}{\sqrt{n}} z}\right)^{\frac{z}{2}\sqrt{n}} = e^{\frac{1}{2} z^2} \times e^{-\frac{1}{2} z^2} \times e^{-\frac{1}{2} z^2} = e^{-\frac{1}{2} z^2}$$
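These limits can also be checked numerically: for large but finite $n$, the three factors already multiply out to nearly $e^{-z^2/2}$. A quick sketch:

```python
import math

z, n = 1.5, 1_000_000
s = math.sqrt(n)
t1 = (1 - z * z / n)**(-n / 2)     # approaches e^{ z^2/2}
t2 = (1 - z / s)**(z / 2 * s)      # approaches e^{-z^2/2}
t3 = (1 + z / s)**(-z / 2 * s)     # approaches e^{-z^2/2}
product = t1 * t2 * t3
print(product, math.exp(-0.5 * z * z))
```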

So:

$$\lim_{n \rightarrow \infty} \frac{1}{\Delta z_n}\text{Pr}(Z_n = z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}.$$

The expression on the left converges to a PDF since $\Delta z_n$ converges to zero as $n$ diverges. The expression on the right is the standard normal density function. Therefore, the standardized Binomial random variables $Z_n$ converge in distribution to $Z \sim \text{Normal}(0,1)$. $\square$

It follows that, for large $n$, a $\text{Binomial}(n, 1/2)$ random variable is approximately normal, with mean $n/2$ and standard deviation $\sqrt{n}/2$.