Sections 6.1 through 6.3 developed methods for approximating smooth functions (exponentials, logarithms) and factorials. In this chapter we will use those tools to derive the distributions for two essential probability models. Both can be derived as limits of the familiar Binomial distribution from Section 2.2.
Recall that X∼Binomial(n,p) if X is the total number of successes in n independent, identical binary trials with success probability p. Then:

P(X = y) = (n choose y) p^y (1−p)^(n−y) for y ∈ {0, 1, ..., n}.
Experiment with the Poisson distribution below. Try setting λ=5.
from utils_dist import run_distribution_explorer
run_distribution_explorer("Poisson");
You should notice that the result looks roughly Binomial with a mode near λ=5. We will recover the Poisson as a limit of the Binomial distribution.
To find the appropriate limit, open the Binomial explorer again.
from utils_dist import run_distribution_explorer
run_distribution_explorer("Binomial");
Now, attempt the following.
Gradually increase n. Start with n=10, then work upwards.
As you increase n, decrease p, keeping p≈5/n. This keeps the peak of the Binomial PMF near x=5 even as n increases, matching the peak of the Poisson PMF with λ=5.
You should see that, as n increases, your PMF looks closer and closer to the PMF for Poisson(5).
Repeat this experiment for λ=2 and λ=10. Each time, track the Binomial PMF as you increase n while decreasing p. You should see that, if you keep p=λ/n, then in each case the sequence of Binomial PMFs approaches the Poisson PMF.
This experiment illustrates a limiting relationship between the Poisson and Binomial distributions. This relationship is sometimes called the Law of Small Numbers: if X(n)∼Binomial(n, p(n)) with n×p(n)=λ held fixed, then for every y, P(X(n)=y) converges to the Poisson(λ) PMF evaluated at y as n diverges.
Before proving this law, it’s worth examining the limit statement.
Holding n×p(n)=λ is equivalent to keeping E[X]=n×p(n) fixed at λ as n increases. So, this is a limit where, as the number of trials increases, the chance of success per trial decreases, so that the expected total number of successes remains constant.
Like any limiting statement, the Law of Small Numbers is most useful as an approximation. It guarantees that, when n is large and p is small, X∼Binomial(n,p) is approximately Poisson distributed with parameter λ=np. This result is useful because the Poisson distribution is easier to work with than the Binomial.
The Law of Small Numbers is often invoked when we ask about the total number of successes in a large number of trials that each rarely succeed. The “small” in small numbers refers to the fact that, if p is small, then E[X]=np is much smaller than n. We could just as well have named this limiting relationship the “Law of Rare Counts.”
We’ve actually already seen a situation where the limit involved in the Law of Small Numbers is sensible.
Recall the random incidents model from Section 6.2:
Suppose that, instead of asking for the time between successive incidents, we ask for the total number of incidents that occur between times 0 and time t. Let X denote the total number of incidents that occur. Then:
In the setting described above, the limit that appears in the Law of Small Numbers is natural: the expected value of X(n) should converge as n diverges, since the number of intervals in the partition, n, was an arbitrary quantity introduced to help analyze X = lim_{n→∞} X(n).
To establish the Law of Small Numbers, we need to show that the Binomial PMF converges to the Poisson PMF in the limit as n diverges, provided p(n) behaves like λ/n. The support of the Binomial converges to the support of the Poisson, since {0,1,2,...,n} approaches {0,1,2,...} as n diverges.
Substituting p(n)=λ/n into the Binomial PMF gives:

P(X = y) = (n choose y) (λ/n)^y (1−λ/n)^(n−y).
Now, expand the binomial coefficient as (n choose y) = n(n−1)⋯(n−y+1)/y!. As n diverges, n dominates −y+1, so the product n(n−1)⋯(n−y+1) behaves like n^y; likewise, n dominates y in the exponent, so (1−λ/n)^(n−y) behaves like (1−λ/n)^n, which converges to e^(−λ). Therefore:

P(X = y) → (λ^y/y!) e^(−λ), which is the Poisson(λ) PMF.
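We can check this limit numerically for a fixed y. The sketch below (our own, standard library only) evaluates the Binomial PMF at y = 3 with p = λ/n and compares it to the Poisson limit λ^y e^(−λ)/y!:

```python
import math

lam, y = 5, 3
target = lam**y * math.exp(-lam) / math.factorial(y)  # Poisson(lam) PMF at y
for n in [10, 100, 10_000]:
    # Binomial PMF at y with p = lam / n
    val = math.comb(n, y) * (lam / n)**y * (1 - lam / n)**(n - y)
    print(f"n = {n:6d}: Binomial PMF = {val:.6f}  (Poisson limit = {target:.6f})")
```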
You can experiment with the normal distribution using the code cell below. Notice that changing the mean parameter translates the density, while changing the standard deviation dilates it.
from utils_dist import run_distribution_explorer
run_distribution_explorer("Normal");
The normal distribution is an important model since it arises as the limit of many other distributions. In particular, if we draw n independent, identically distributed samples from a distribution with finite variance and compute their sample average, then the sample average will be approximately normally distributed for large n.
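A small simulation illustrates this claim (a sketch of ours, not part of the chapter's utilities): we average n Uniform(0,1) draws, standardize the averages, and check how often the result lands within one standard deviation of the mean, which for a standard normal is about 68.3%.

```python
import math
import random

random.seed(0)
n, reps = 50, 20_000
# Each sample average of n Uniform(0, 1) draws has mean 1/2 and variance 1/(12n).
sd = math.sqrt(1 / (12 * n))
avgs = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
# Fraction of standardized averages within one standard deviation of the mean.
share = sum(abs((a - 0.5) / sd) <= 1 for a in avgs) / reps
print(f"fraction within one sd: {share:.3f} (standard normal: 0.683)")
```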
In this chapter, we’ll recover the formula for the standard normal density:

f(z) = (1/√(2π)) e^(−z²/2).
First, run the code cell below to visualize a Binomial PMF.
from utils_dist import run_distribution_explorer
run_distribution_explorer("Binomial");
This time, keep the success probability, p, fixed, and increase n. Start with p=0.5 and n=4, then gradually increase n. You should see that, even for relatively small n, the Binomial PMF approaches a bell-curve shape. If you repeat this experiment for p≠0.5 (say p=0.8), you’ll see the same result, though the bell curve will start out skewed.
This means that our analysis of the Binomial PMF will serve two ends at once. First, we will see that, as n diverges, the Binomial PMF produces a normal curve. Second, by showing that the Binomial PMF approaches a normal curve, we will also develop a normal approximation for binomial coefficients.
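A quick numeric version of this observation (our own sketch, with p = 1/2): after rescaling by the standard deviation √(np(1−p)), the Binomial PMF heights closely track the standard normal density e^(−z²/2)/√(2π).

```python
import math

def normal_density(z):
    # standard normal density at z
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

n, p = 100, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))
for x in [40, 50, 60]:
    z = (x - mu) / sd                                # standardized value
    pmf = math.comb(n, x) * p**x * (1 - p)**(n - x)  # Binomial PMF at x
    print(f"z = {z:+.1f}: sd * PMF = {sd * pmf:.4f}, density = {normal_density(z):.4f}")
```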
Before starting, we will have to fix a basic discrepancy between our two models.
If Z∼Normal(0,1) then E[Z]=0 and Var[Z]=1.
In contrast, if Xn∼Binomial(n,p) then E[Xn]=np and Var[Xn]=np(1−p). So, Xn cannot converge to Z as n diverges, since the expected value of a Binomial distribution is proportional to n and is nonzero whenever p>0. Worse, its standard deviation grows at rate O(√n).
To fix this issue, we will show that a standardized version of Xn approaches Z. To standardize a random variable, subtract off its mean and divide by its standard deviation (see Section 4.3).
The standard normal random variable, Z, is continuously distributed, so is described by a density. Each Zn is a discrete random variable. To recover a density from a probability, we need to divide by the length of an interval.
In this case we can construct a density from Zn by replacing Zn with a random variable Wn, where Wn∣Zn=z ∼ Uniform(z−Δzn/2, z+Δzn/2) and Δzn is the gap between successive possible values of Zn. Since Zn = (2Xn−n)/√n, and Xn is integer valued, Δzn = 2/√n.
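The gap Δzn can be checked directly. The sketch below (ours, with p = 1/2) lists the possible values of the standardized variable Zn = (2Xn − n)/√n for n = 100 and confirms that successive values differ by exactly 2/√n:

```python
import math

n = 100
# Possible values of Z_n = (2 * X_n - n) / sqrt(n), for X_n in {0, ..., n}
z_values = [(2 * x - n) / math.sqrt(n) for x in range(n + 1)]
# Collect the distinct gaps between successive values (rounded to absorb float noise)
gaps = {round(b - a, 12) for a, b in zip(z_values, z_values[1:])}
print(f"distinct gaps: {gaps}, 2/sqrt(n) = {2 / math.sqrt(n)}")
```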
Representing the PMF of Zn with a bar plot, where the width of each bar is Δzn.
Dividing the height of each bar by its width, so that the area of each bar equals the corresponding PMF value. This yields the density function for Wn.
Integrating the density function of Wn, with bounds equal to the endpoints of the bars, sums over the PMF of Zn. So, any probability question we could ask about Zn can be answered by integrating the density function of Wn.
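The steps above can be verified numerically (a sketch of ours, with n = 100 and p = 1/2): each bar height is a PMF value divided by Δzn, so integrating a flat bar over its width returns the PMF value, and the total area under the density of Wn is 1.

```python
import math

n = 100
delta = 2 / math.sqrt(n)  # gap between successive values of Z_n
total_area = 0.0
for x in range(n + 1):
    pmf = math.comb(n, x) * 0.5**n  # P(X_n = x) with p = 1/2
    height = pmf / delta            # bar height for the density of W_n
    total_area += height * delta    # area of the bar equals the PMF value
print(f"total area under the density of W_n: {total_area:.6f}")
```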
Now that we’ve handled the normalizing constants, focus on the functional form:
The expression on the left is a PDF since Δzn converges to zero as n diverges. The expression on the right is the standard normal density function. Therefore: