13.4 The Central Limit Theorem

Motivation

We’ve shown that sample averages are guaranteed to converge to an underlying expectation provided the samples are sufficiently uncorrelated and are drawn with the same (or converging) expectations. We proved this result with an upper bound on tail probabilities for the distribution of the sample average. Unfortunately, the tail bounds used often overestimate the true tail probabilities. So, while strong enough to prove the weak law of large numbers, they are too weak to give useful finite-sample guarantees when we want to certify accuracy with high probability.

Our results have been imprecise since we have avoided the exact distribution of the sum or sample average. This was useful since finding exact distributions is hard. We relied on summary quantities (e.g., expectations and variances) since these summaries were easier to compute and powerful enough to develop a general theory that applied no matter the original distribution. To improve on this theory we will need to study the actual distribution of long-run sums and sample averages.

All examples and results in this section will pursue the exact distribution of a sum in a limit where the sum includes many terms.

Remarkably, the exact distribution of a sum, or sample average, converges, in the limit of infinitely many samples, to a normal distribution, regardless of the original distribution used to generate the samples.

This theorem is the last major result in most introductory probability classes. It is especially useful when we want to produce confidence intervals. It suggests much tighter intervals than the tail bounds developed in Section 13.2. It explains the ubiquity of the normal distribution in applied probability and probability modeling.

Motivating Examples

To anticipate the result, we’ll consider two examples.

Bernoulli/Binomial Random Variables

Suppose that $\{X_j\}_{j=1}^n$ are drawn independently and identically from a Bernoulli distribution with success probability $p$.

In this case we can write down the distribution of $S_n$ exactly. The sum, $S_n = \sum_{j=1}^n X_j$, is a sum of independent, identical indicators, so is drawn from a binomial distribution with parameters $n$ and $p$:

$$S_n \sim \text{Binomial}(n, p).$$

Since we have a closed form for the PMF of the sum, we can study the limiting distribution of the sum directly. Run the code cell below to visualize the binomial distribution. Choose some $p$ and hold it fixed. Then gradually increase $n$.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Binomial");
```

You should notice three main effects:

  1. As you increase $n$, the peak in the distribution slides to the right. We’ve known this for a while. The expected value of a binomial distribution is, by additivity of expectation, $np$, and its mode is close to $np$. So, the peak of the distribution will move rightward along the line $np$ as a function of $n$.

  2. The distribution gets wider as $n$ increases. We proved this in Section 13.1. The variance in a sum of independent, identical random variables grows proportionally to the number of terms in the sum. So, the standard deviation of $S_n$ will grow proportionally to $\sqrt{n}$. Using our rules for the variance of a sum, $\text{SD}[S_n] = \sqrt{np(1-p)}$.

    Notice, the standard deviation grows slower than the mean ($\mathcal{O}(n^{1/2})$ vs. $\mathcal{O}(n)$). So, if your axis adjusts to fit the bulk of the distribution, the distribution may appear to grow narrower as $n$ increases. Pay attention to the marks on the $x$-axis. The distribution is getting wider as $n$ increases.

  3. The distribution becomes more and more bell-shaped.

The last observation is the remarkable one. It is true for sums of Bernoulli random variables with any $p \neq 0$ or $1$. It would also be true had we used essentially any distribution to sample the $X$’s!

The function for the bell curve you are seeing is given by the normal curve with mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1-p)}$:

$$\frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\frac{(s - \mu)^2}{\sigma^2}}.$$

A random variable with a density of the form provided above is a normal random variable. You can experiment with normal random variables by running the code cell below. Note that the normal curve has the same shape as the bell-shaped histogram we observed for the binomial with large $n$.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Normal");
```
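To see the agreement concretely, here is a minimal sketch that overlays the exact binomial PMF on the matching normal curve. It assumes numpy, scipy, and matplotlib are available; it does not rely on the course’s `utils_dist` helpers, and the parameters are arbitrary.

```python
# Overlay the exact Binomial(n, p) PMF with the normal curve that has the
# same mean and standard deviation.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n, p = 100, 0.3
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

k = np.arange(n + 1)
plt.bar(k, stats.binom.pmf(k, n, p), alpha=0.5, label="Binomial PMF")

s = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
plt.plot(s, stats.norm.pdf(s, mu, sigma), "k-", label="normal curve")
plt.legend()
plt.show()
```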

We proved that the binomial does, in fact, converge to the normal curve at the end of Section 6.4. There, we assumed that $p = 1/2$, and proceeded with a detailed limiting analysis based on Stirling’s formula (see Section 6.3).

For now, we will accept the claim based on the observation that the plotted histogram and plotted curve look suspiciously similar. We will recapitulate our old proof at the end of this chapter. We postpone the proof since the specific limiting analysis depends on details of the binomial PMF.

Our goal now is to see that we would arrive at the same normal curve from any initial distribution. Since we can’t test all initial distributions, we’ll try a different test case.

Uniform Random Variables

The simplest discrete case is $X_j \sim \text{Bernoulli}(0.5)$. This is a uniform distribution on the set $\{0, 1\}$. Let’s try the simplest continuous analog, $X_j \sim \text{Uniform}([0,1])$. Then $X_j \in [0,1]$ for all $j$ and:

$$f_{X_j}(x) = \begin{cases} 1 & \text{if } x \in [0,1] \\ 0 & \text{else} \end{cases}$$

Since we’ll need this density repeatedly, let’s give it a simpler name, $f_U(x)$, where $U$ stands for uniform.

To find the density of $S_n$, work recursively:

$$S_2 = X_1 + X_2 \quad \Rightarrow \quad S_3 = S_2 + X_3 \quad \Rightarrow \quad S_4 = S_3 + X_4 \quad \Rightarrow \quad \cdots \quad \Rightarrow \quad S_{n+1} = S_n + X_{n+1}.$$

Then, since all of the $X$’s are independent, we can derive the distribution of the sum directly by convolution (see Section 10.3):

$$f_{S_2}(s) = [f_{X_1} * f_{X_2}](s) \quad \Rightarrow \quad f_{S_3}(s) = [f_{S_2} * f_{X_3}](s) \quad \Rightarrow \quad f_{S_4}(s) = [f_{S_3} * f_{X_4}](s) \quad \Rightarrow \quad \cdots \quad \Rightarrow \quad f_{S_{n+1}}(s) = [f_{S_n} * f_{X_{n+1}}](s).$$

Since the variables are identical and uniformly distributed, we are left with the recursion:

$$f_{S_{n+1}}(s) = [f_{S_n} * f_U](s)$$

with base case:

$$f_{S_1}(s) = f_U(s) = \begin{cases} 1 & \text{if } s \in [0,1] \\ 0 & \text{else} \end{cases}.$$

So, to find the density of $S_n$ we need to convolve the uniform density with itself $n - 1$ times.

You ran the first step in discussion two weeks ago:

$$f_{S_2}(s) = [f_U * f_U](s) = \begin{cases} s & \text{if } s \in [0,1] \\ 2 - s & \text{if } s \in [1,2] \\ 0 & \text{else}. \end{cases}$$

This density is shaped like a tent.

Sum of Two Uniform.

So, to find $S_3$, we need to compute $[f_{S_2} * f_U](s)$. Since both densities are defined piecewise, the resulting density for $S_3$ will also be defined piecewise. We can work out the boundaries between pieces as follows:

  1. $S_3 < 0$ is impossible since $S_3$ is a sum of nonnegative variables.

  2. $S_3 \in [0,1)$ requires $S_2 < 1$ since $X_3$ is nonnegative.

  3. $S_3 \in [1,2]$ allows any $S_2 \in [0,2]$.

  4. $S_3 \in (2,3]$ requires $S_2 > 1$ since $X_3$ is at most one.

  5. $S_3 > 3$ is impossible since $S_3$ is a sum of three variables, all less than or equal to 1.

The first and fifth observations fix the support, $S_3 \in [0,3]$. The middle three observations divide the support into three intervals: $[0,1)$, $[1,2]$, and $(2,3]$.

So, to find the density function, we should run the convolution separately on each interval. To see the explicit convolution, open the dropdown below.

After convolving, we find that $f_{S_3}(s_3)$ is a piecewise function that is:

  1. zero outside $[0,3]$,

  2. an upward facing parabola centered at zero connecting $(0, 0)$ to $(1, 1/2)$,

  3. a downward facing parabola centered at $1.5$ connecting $(1, 1/2)$ to $(2, 1/2)$,

  4. then an upward facing parabola centered at $3$ connecting $(2, 1/2)$ to $(3, 0)$.

Reading in order, the density function is constant and zero, curves up, curves down, curves up, then is constant at zero again. The result is a symmetric bell centered at 1.5. The central location is sensible since $\mathbb{E}[S_3] = 3\,\mathbb{E}[X_j] = 3 \times \frac{1}{2} = 1.5$.

Sum of Three Uniform.
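As a sanity check, the sketch below simulates $S_3$ and overlays the piecewise density described above (assuming numpy and matplotlib; the explicit coefficients of the middle parabola come from carrying out the convolution in the dropdown, so treat them as stated rather than derived here).

```python
# Monte Carlo check of the piecewise-parabolic density of S_3.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
S3 = rng.uniform(0, 1, size=(100_000, 3)).sum(axis=1)
print(S3.mean())  # should be close to E[S_3] = 1.5

s = np.linspace(0, 3, 300)
f = np.where(s < 1, s**2 / 2,                       # upward parabola
    np.where(s <= 2, (-2 * s**2 + 6 * s - 3) / 2,   # downward parabola
             (3 - s)**2 / 2))                       # upward parabola
plt.hist(S3, bins=100, density=True, alpha=0.5, label="simulated $S_3$")
plt.plot(s, f, "k-", label="piecewise density")
plt.legend()
plt.show()
```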

Clearly, this procedure is too involved to continue by hand. Nevertheless, the graphical trend is clear:

  1. $S_1$ is drawn from a box-shaped distribution

  2. $S_2$ is drawn from a tent-shaped distribution

  3. $S_3$ is drawn from a bell-shaped distribution (with parabolic pieces)

It should not be too surprising that, if we keep going, the resulting distribution becomes a smoother and smoother bell. Here are the first four densities:

Sum of Four Uniform.

The last density is very bell-shaped! It is a symmetric, piecewise density, centered at 2, with four pieces, all cubic functions of $s$.

The limiting bell is, once again, a normal curve.
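You can also watch the convolution recursion converge without any hand computation. Here is an illustrative sketch (assuming numpy and matplotlib, not the course’s plotting code) that discretizes $f_U$ and convolves it repeatedly; the resulting curves reproduce the box, tent, and bell shapes above.

```python
# Numerically iterate f_{S_{n+1}} = f_{S_n} * f_U on a fine grid.
import numpy as np
import matplotlib.pyplot as plt

dx = 0.001
f_U = np.ones(int(1 / dx))   # discretized uniform density on [0, 1]

f_S = f_U.copy()
plt.plot(np.arange(len(f_S)) * dx, f_S, label="n = 1")
for n in range(2, 6):
    f_S = np.convolve(f_S, f_U) * dx   # dx approximates the integral
    plt.plot(np.arange(len(f_S)) * dx, f_S, label=f"n = {n}")

plt.legend()
plt.show()
```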

The Theorem

First, let’s recall the definition of a normal random variable (see Sections 5.4 and 6.4).

To make this statement formal, we need to adjust it in two ways. First, we need to define what we mean by convergence.

Second, we need to change to standard variables. Neither the sum nor the sample average admits a sensible limiting distribution: the expectation of the sum diverges, while the variance of the sample average converges to zero.

Go back to the binomial example. The distribution of $S_n$ slides rightward while widening as $n$ increases. So, $S_n$ doesn’t approach a random variable drawn from any fixed limiting distribution.

The sample average, $\bar{X}_n$, behaves more nicely. Its mean stays planted at $\mu$ for all $n$. However, as we increase $n$, the distribution of the sample average concentrates, so $\text{SD}[\bar{X}_n]$ converges to zero. Therefore, if $\bar{X}_n$ approaches a limiting variable, the limiting variable is deterministic: $\lim_{n \rightarrow \infty} \bar{X}_n = \mu$. So, $\bar{X}_n$ does not have an informative limiting distribution either. Its limiting distribution has infinite density at $\mu$ and is zero everywhere else.

To get a sensible limiting distribution we need to find a transformation of $S_n$ and $\bar{X}_n$ whose mean and standard deviation converge to sensible values as $n$ diverges.

Recall the standardized variable (here $\mu = \mathbb{E}[X_j]$ and $\sigma = \text{SD}[X_j]$):

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}.$$

By construction, $Z_n$ is a standard variable, so:

$$\mathbb{E}[Z_n] = 0, \quad \text{SD}[Z_n] = 1.$$

The central limit theorem states that, as $n$ diverges, the distribution of $Z_n$ converges to the distribution of a standard normal random variable.
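To see the theorem in action, the sketch below (assuming numpy, scipy, and matplotlib) simulates many Bernoulli sums, standardizes them, and compares the histogram of $Z_n$ to the standard normal density.

```python
# Histogram of the standardized sum Z_n for Bernoulli(p) samples.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps = 400, 0.3, 50_000

S = (rng.random((reps, n)) < p).sum(axis=1)   # Binomial sums
Z = (S - n * p) / np.sqrt(n * p * (1 - p))    # standardized sums

plt.hist(Z, bins=80, density=True, alpha=0.5, label="$Z_n$ (n = 400)")
z = np.linspace(-4, 4, 200)
plt.plot(z, stats.norm.pdf(z), "k-", label="standard normal")
plt.legend()
plt.show()
```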

Application: Frequencies Estimate Chances

Suppose that $\{X_j\}_{j=1}^n$ are independent, identically sampled Bernoulli random variables with unknown success probability $p$. Then, the maximum likelihood estimator for $p$ is the sample average:

$$\hat{p}(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{j=1}^n X_j$$

The sample average is the observed frequency of success in the $n$ trials.

If $n$ is large, then $\hat{p}$ is a sample average of a large collection of independent, identical samples. So, it is approximately normally distributed with mean $\mathbb{E}[X_j] = p$ and variance $\frac{1}{n}\text{Var}[X_j] = \frac{1}{n} p(1-p)$.

So, as long as $n$ is large, the probability that the observed frequency differs from the true success probability by more than $k$ standard deviations is, approximately:

$$\text{Pr}(|\hat{p} - p| > k\,\text{SD}[\hat{p}]) = \text{Pr}\left(\frac{|\hat{p} - p|}{\text{SD}[\hat{p}]} > k\right) = \text{Pr}(|Z_n| > k) \approx \text{Pr}(|Z| > k)$$

where $Z$ is a standard normal random variable.

The probabilities that a standard normal random variable is larger, in magnitude, than an integer $k$ are well known. We’ll compare them to the upper bounds produced by Chebyshev’s inequality (see Section 13.3).

| $k$ | 1 | 2 | 3 |
| --- | --- | --- | --- |
| Chebyshev | 1 | 1/4 | 1/9 |
| CLT | 0.32 | 0.05 | < 0.01 |
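The table’s entries can be reproduced in a few lines (a sketch assuming scipy is available; the Chebyshev column is the bound $1/k^2$, the CLT column is the exact normal tail).

```python
# Chebyshev's bound 1/k^2 versus the normal tail Pr(|Z| > k).
from scipy import stats

for k in (1, 2, 3):
    chebyshev = 1 / k**2
    normal_tail = 2 * stats.norm.sf(k)   # Pr(|Z| > k) = 2 Pr(Z > k)
    print(f"k = {k}: Chebyshev <= {chebyshev:.3f}, CLT ~ {normal_tail:.4f}")
```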

Notice how much smaller the tail probabilities suggested by the CLT are than the upper bounds provided by Chebyshev. It follows that, when the assumptions of the CLT apply, the tail probabilities for a sample sum or sample average will often be much smaller than the Chebyshev bounds.

So, for sufficiently large $n$, the chance that the observed frequency is within 2 standard deviations of the true chance converges to about 95%, and the chance it is within 3 standard deviations converges to about 99.7%:

$$\begin{aligned} & \lim_{n \rightarrow \infty}\text{Pr}\left(|\hat{p} - p| > \sqrt{p(1-p)/n}\right) = 0.32\ldots \\ & \lim_{n \rightarrow \infty}\text{Pr}\left(|\hat{p} - p| \leq 2\sqrt{p(1-p)/n}\right) = 0.95\ldots \\ & \lim_{n \rightarrow \infty}\text{Pr}\left(|\hat{p} - p| \leq 3\sqrt{p(1-p)/n}\right) = 0.997\ldots \end{aligned}$$

This example illustrates the power of the CLT. The CLT allows us to approximate exact tail probabilities in the limit of a large sample size, regardless of the initial distribution that produces the samples! It also explains the folk wisdom, “When estimating an unknown quantity, your result is probably accurate to within ±2 standard deviations.”
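Here is a minimal simulation of that folk wisdom (assuming numpy): generate many observed frequencies and check how often they land within two standard deviations of the true chance.

```python
# Empirical coverage of the +/- 2 standard deviation interval.
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 1_000, 0.3, 20_000

p_hat = (rng.random((reps, n)) < p).mean(axis=1)  # observed frequencies
sd = np.sqrt(p * (1 - p) / n)                     # SD of the frequency
print(np.mean(np.abs(p_hat - p) <= 2 * sd))       # close to 0.95
```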

Examples

We are not equipped to prove the CLT for arbitrary distributions. If you’d like to see the general proof, take a second course in probability (e.g. Data 140 or Stat 134).

We will satisfy ourselves with some specific cases where we can perform the limiting analysis directly. We completed the first case in Section 6.4.

Bernoulli/Binomial Random Variables

Let’s start with the simplest discrete example.

Suppose that $\{X_j\}_{j=1}^n \sim \text{Bernoulli}(1/2)$. Then $S_n \sim \text{Binomial}(n, 1/2)$, so:

$$\begin{aligned} \text{PMF}(x) &= \binom{n}{x} \left(\frac{1}{2}\right)^x \left(1 - \frac{1}{2}\right)^{n-x} \\ &= \binom{n}{x} \left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{n-x} \\ &= \binom{n}{x} \left(\frac{1}{2}\right)^n. \end{aligned}$$

The matching standardized variable is:

$$Z_n = \frac{S_n - n \times 0.5}{\sqrt{n \times 0.5^2}} = \frac{1}{\sqrt{n}}(2 S_n - n).$$

The PMF of $Z_n$ is:

$$\text{Pr}(Z_n = z) = \text{Pr}\left(\frac{1}{\sqrt{n}}(2 S_n - n) = z\right) = \text{Pr}\left(S_n = \frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)\right).$$

So, using the formula for the binomial PMF:

$$\text{Pr}(Z_n = z) = \left(\frac{1}{2}\right)^n \binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)}.$$

To simplify, first expand the binomial coefficient as a ratio of factorials:

$$\binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} = \frac{n!}{\left(\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)\right)! \times \left(\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)\right)!}$$

Next, apply Stirling’s approximation (see Section 6.3) to approximate each term:

$$\begin{aligned} & n! \simeq \sqrt{2\pi e} \left(\frac{n}{e}\right)^{n + \frac{1}{2}}, \\ & \left(\frac{n}{2}\left(1 \pm \frac{1}{\sqrt{n}} z\right)\right)! \simeq \sqrt{2\pi e} \left(\frac{n}{2e}\left(1 \pm \frac{1}{\sqrt{n}} z\right)\right)^{\frac{n}{2}\left(1 \pm \frac{1}{\sqrt{n}} z\right) + \frac{1}{2}} \end{aligned}$$

Substituting each term for its approximation, then cancelling like terms, gives:

$$\begin{aligned} \text{Pr}(Z_n = z) &= \left(\frac{1}{2}\right)^n \binom{n}{\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} \\ &\simeq \frac{2}{\sqrt{2\pi(n - z^2)}} \left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} \left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)}. \end{aligned}$$

When $n$ is large, $n - z^2$ will be dominated by $n$. Therefore, we can make the approximation:

$$\text{Pr}(Z_n = z) \simeq \frac{2}{\sqrt{n}} \frac{1}{\sqrt{2\pi}} \left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} \left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)}.$$

The constant $2/\sqrt{n}$ out front is the spacing between successive possible values of $Z_n$. We’ll call this gap $\Delta z_n$. To convert to a density function, we divide by $\Delta z_n$, which cancels the $2/\sqrt{n}$ term. For details, check the dropdown below.

Now that we’ve handled the normalizing constants, focus on the functional form:

$$\begin{aligned} &\left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 + \frac{1}{\sqrt{n}} z\right)} \left(1 - \frac{1}{\sqrt{n}} z\right)^{-\frac{n}{2}\left(1 - \frac{1}{\sqrt{n}} z\right)} \\ &\qquad = \left[\left(1 + \frac{1}{\sqrt{n}} z\right)\left(1 - \frac{1}{\sqrt{n}} z\right)\right]^{-\frac{n}{2}} \left(\frac{1 - \frac{1}{\sqrt{n}} z}{1 + \frac{1}{\sqrt{n}} z}\right)^{\frac{z}{2}\sqrt{n}} \\ &\qquad = \left(1 - \frac{1}{n} z^2\right)^{-\frac{n}{2}} \left(\frac{1 - \frac{1}{\sqrt{n}} z}{1 + \frac{1}{\sqrt{n}} z}\right)^{\frac{z}{2}\sqrt{n}} \end{aligned}$$

To take the limit as $n$ goes to infinity, express each term in the form used for the limiting definition of the exponential from Section 6.2:

$$\begin{aligned} & \left(1 - \frac{1}{n} z^2\right)^{-\frac{n}{2}} = \left[\left(1 - \frac{1}{n} z^2\right)^n\right]^{-\frac{1}{2}} \simeq \left[e^{-z^2}\right]^{-\frac{1}{2}} = e^{\frac{1}{2} z^2} \\ & \left(1 - \frac{1}{\sqrt{n}} z\right)^{\frac{z}{2}\sqrt{n}} = \left[\left(1 - \frac{1}{\sqrt{n}} z\right)^{\sqrt{n}}\right]^{\frac{z}{2}} \simeq \left[e^{-z}\right]^{\frac{z}{2}} = e^{-\frac{1}{2} z^2} \\ & \left(1 + \frac{1}{\sqrt{n}} z\right)^{-\frac{z}{2}\sqrt{n}} = \left[\left(1 + \frac{1}{\sqrt{n}} z\right)^{\sqrt{n}}\right]^{-\frac{z}{2}} \simeq \left[e^{z}\right]^{-\frac{z}{2}} = e^{-\frac{1}{2} z^2} \end{aligned}$$

Therefore:

$$\lim_{n \rightarrow \infty} \left(1 - \frac{1}{n} z^2\right)^{-\frac{n}{2}} \left(\frac{1 - \frac{1}{\sqrt{n}} z}{1 + \frac{1}{\sqrt{n}} z}\right)^{\frac{z}{2}\sqrt{n}} = e^{\frac{1}{2} z^2} \times e^{-\frac{1}{2} z^2} \times e^{-\frac{1}{2} z^2} = e^{-\frac{1}{2} z^2}$$

So:

$$\lim_{n \rightarrow \infty} \frac{1}{\Delta z_n} \text{Pr}(Z_n = z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}.$$

The expression on the left converges to a PDF since $\Delta z_n$ converges to zero as $n$ diverges. The expression on the right is the standard normal density function. Therefore, $Z_n$ converges in distribution to a standard normal random variable, exactly as the CLT predicts.
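You can check this limit numerically. The sketch below (assuming numpy and scipy) evaluates the exact standardized Binomial$(n, 1/2)$ PMF, divided by the spacing $\Delta z_n = 2/\sqrt{n}$, against the standard normal density.

```python
# Exact standardized binomial PMF / spacing versus the normal density.
import numpy as np
from scipy import stats

n = 10_000
x = np.arange(n + 1)
z = (2 * x - n) / np.sqrt(n)     # values taken by Z_n
dz = 2 / np.sqrt(n)              # spacing Delta z_n between those values

pmf_over_dz = stats.binom.pmf(x, n, 0.5) / dz
for target in (0.0, 1.0, 2.0):
    i = np.argmin(np.abs(z - target))
    print(f"z = {z[i]:+.2f}: {pmf_over_dz[i]:.5f} vs {stats.norm.pdf(z[i]):.5f}")
```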

If you like challenging algebra exercises, try the same limiting analysis for generic $p$. The steps are all the same, but you will need to pay careful attention when you standardize and when you apply Stirling’s formula.

Exponential Random Variables

Let’s try a continuous example. The uniform case is hard, so we’ll pick the next simplest example, exponential random variables.

Suppose that $\{X_j\}_{j=1}^n$ are drawn independently and identically from an exponential distribution with parameter $\lambda$.

In this case, we can work out the distribution of the sum $S_n$ exactly using convolution.

Run the code cell below to visualize the convolution of two exponential random variables. Select exponential for both distributions, then match their parameters.

```python
%matplotlib inline
from utils_convolution import show_convolution

show_convolution()
```

The distribution you’ve produced has density function:

$$f_{S_2}(s) = \lambda^2 s e^{-\lambda s}.$$

We proved this result in Section 10.3.

You worked out the result for general $n$ on homework 12. For the necessary work, check the matching solutions. The sum is gamma distributed with shape parameter $n$ and rate parameter $\lambda$:

$$S_n \geq 0, \quad f_{S_n}(s) = \frac{\lambda^n}{(n-1)!} s^{n-1} e^{-\lambda s}.$$
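As a quick check of the gamma form (a sketch assuming numpy, scipy, and matplotlib), simulate sums of exponentials and overlay the density above.

```python
# Simulated sums of exponentials versus the Gamma(n, lambda) density.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
lam, n = 2.0, 5
S = rng.exponential(scale=1 / lam, size=(100_000, n)).sum(axis=1)

plt.hist(S, bins=100, density=True, alpha=0.5, label="simulated $S_5$")
s = np.linspace(0, S.max(), 300)
plt.plot(s, stats.gamma.pdf(s, a=n, scale=1 / lam), "k-", label="gamma density")
plt.legend()
plt.show()
```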

Experiment with gamma distributions by running the cell below. Start with shape $= 1$. This returns an exponential random variable. Then increase the shape to 2. This produces the gamma distribution of $S_2$. Then gradually increase the shape parameter. As you do, you will see the distribution shift to the right and become gradually more bell shaped. It remains skewed, though the skew decreases as the shape parameter increases. The mean of the distribution, which lies to the right of its mode, increases linearly in $n$.

```python
from utils_dist import run_distribution_explorer

run_distribution_explorer("Gamma");
```

Next, convert to a standard variable. To standardize, we need the expectation and variance of the sum. Both follow from our usual rules for expectations and variances of sums (see Section 13.1). The expectation and variance of an exponential random variable are $1/\lambda$ and $1/\lambda^2$ (see Section 7.1).

So:

$$Z_n = \frac{S_n - n/\lambda}{\sqrt{n}/\lambda} = \frac{\lambda}{\sqrt{n}} S_n - \sqrt{n}.$$

To find the density function of $Z_n$, use the linear change of density formula from Section 7.2. Set $h(s) = \frac{\lambda}{\sqrt{n}} s - \sqrt{n}$. Then $h'(s) = \frac{\lambda}{\sqrt{n}}$ and $h^{-1}(z) = \frac{\sqrt{n}}{\lambda}(z + \sqrt{n})$. Then:

$$f_{Z_n}(z) = f_{S_n}(h^{-1}(z)) \frac{1}{\lambda/\sqrt{n}} = \frac{\sqrt{n}}{\lambda} \frac{\lambda^n}{(n-1)!} \left(\frac{\sqrt{n}}{\lambda}(z + \sqrt{n})\right)^{n-1} e^{-\lambda \frac{\sqrt{n}}{\lambda}(z + \sqrt{n})}$$

Simplifying:

$$\begin{aligned} f_{Z_n}(z) &= \frac{\sqrt{n}^n e^{-n}}{(n-1)!} (\sqrt{n} + z)^{n-1} e^{-\sqrt{n} z} \\ &= \frac{\sqrt{n}^n e^{-n}}{(n-1)!} \left(\sqrt{n}\left(1 + \frac{z}{\sqrt{n}}\right)\right)^{n-1} e^{-\sqrt{n} z} \\ &= \frac{\sqrt{n}^{2n-1} e^{-n}}{(n-1)!} \left(1 + \frac{z}{\sqrt{n}}\right)^{n-1} e^{-\sqrt{n} z} \\ &= \frac{n^{n-1/2} e^{-n}}{(n-1)!} \left(1 + \frac{z}{\sqrt{n}}\right)^{n-1} e^{-\sqrt{n} z}. \end{aligned}$$

In the limit as $n$ diverges:

$$\lim_{n \rightarrow \infty} \frac{n^{n-1/2} e^{-n}}{(n-1)!} = \frac{1}{\sqrt{2\pi}}.$$

Thus, the normalization constant converges to the normalization constant of the standard normal, $1/\sqrt{2\pi}$. Open the dropdown below for the step-by-step analysis using tools from Section 6.
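In addition to the step-by-step analysis, a quick numerical check (standard library only) confirms the constant approaches $1/\sqrt{2\pi} \approx 0.39894$; working on the log scale with `math.lgamma` avoids overflow.

```python
# Check that n^(n - 1/2) e^(-n) / (n-1)! approaches 1/sqrt(2*pi).
import math

for n in (10, 100, 10_000):
    # lgamma(n) = log((n-1)!), so the ratio can be exponentiated safely
    log_ratio = (n - 0.5) * math.log(n) - n - math.lgamma(n)
    print(f"n = {n}: {math.exp(log_ratio):.6f}")
print(f"target:  {1 / math.sqrt(2 * math.pi):.6f}")
```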

Now that we’ve worked out the limit of the normalization constant, we need only work out the limit of the functional form:

$$\lim_{n \rightarrow \infty} f_{Z_n}(z) = \frac{1}{\sqrt{2\pi}} \left(\lim_{n \rightarrow \infty} \left(1 + \frac{z}{\sqrt{n}}\right)^{n-1} e^{-\sqrt{n} z}\right).$$

In this case it will be easier to work out the limit of the logarithm of the functional form. Since the log is continuous, the limit of the logarithm is the logarithm of the limit. So, we can find the original limit by taking the limit of the logarithm, then exponentiating our answer.

$$\lim_{n \rightarrow \infty} \log\left(\left(1 + \frac{z}{\sqrt{n}}\right)^{n-1} e^{-\sqrt{n} z}\right) = \lim_{n \rightarrow \infty}\left[(n-1)\log\left(1 + \frac{z}{\sqrt{n}}\right) - \sqrt{n} z\right].$$

We can simplify the first term. Since $\lim_{n \rightarrow \infty} z/\sqrt{n} = 0$,

$$\lim_{n \rightarrow \infty} \log\left(1 + \frac{z}{\sqrt{n}}\right) = \log(1) = 0,$$

so replacing the factor $n - 1$ with $n$ only drops a term, $-\log(1 + z/\sqrt{n})$, that vanishes in the limit.

Therefore:

$$\lim_{n \rightarrow \infty}\left[(n-1)\log\left(1 + \frac{z}{\sqrt{n}}\right) - \sqrt{n} z\right] = \lim_{n \rightarrow \infty}\left[n\log\left(1 + \frac{z}{\sqrt{n}}\right) - \sqrt{n} z\right].$$

Since $z/\sqrt{n}$ is small when $n$ is large, we can replace the logarithm with its Taylor expansion near zero (see Section 6.1). In this case we will need to go past our usual linear approximation to the log and include the quadratic term:

$$\log(1 + x) \simeq x - \frac{1}{2} x^2 + \mathcal{O}(x^3).$$

Therefore:

$$\log\left(1 + \frac{z}{\sqrt{n}}\right) \simeq \frac{z}{\sqrt{n}} - \frac{1}{2}\frac{z^2}{n} + \mathcal{O}(n^{-3/2}).$$

Multiplying by $n$:

$$n\log\left(1 + \frac{z}{\sqrt{n}}\right) \simeq \sqrt{n} z - \frac{1}{2} z^2 + \mathcal{O}(n^{-1/2}).$$

Therefore:

$$n\log\left(1 + \frac{z}{\sqrt{n}}\right) - \sqrt{n} z = -\frac{1}{2} z^2 + \mathcal{O}(n^{-1/2}).$$

Anything proportional to $n^{-1/2}$ converges to zero as $n$ diverges, so:

$$\lim_{n \rightarrow \infty}\left[n\log\left(1 + \frac{z}{\sqrt{n}}\right) - \sqrt{n} z\right] = -\frac{1}{2} z^2.$$

So, working back:

$$\lim_{n \rightarrow \infty} \left(1 + \frac{z}{\sqrt{n}}\right)^{n-1} e^{-\sqrt{n} z} = e^{-\frac{1}{2} z^2}.$$

This is the functional form for a standard normal density!

Putting the functional form and the normalization constant together:

$$\lim_{n \rightarrow \infty} f_{Z_n}(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2} = f_Z(z)$$

where $Z$ is a standard normal variable. Therefore, $\lim_{n \rightarrow \infty} Z_n = Z$, a standard normal random variable!
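To visualize the convergence just proved, the sketch below (assuming numpy, scipy, and matplotlib) plots the exact density of $Z_n$ for the exponential case, via the change of variables above with $\lambda = 1$, against the standard normal density.

```python
# Exact density of Z_n for sums of exponentials versus the standard normal.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

z = np.linspace(-4, 4, 400)
for n in (2, 5, 50):
    s = np.sqrt(n) * (z + np.sqrt(n))           # s = h^{-1}(z) with lambda = 1
    f_Z = stats.gamma.pdf(s, a=n) * np.sqrt(n)  # change of density formula
    plt.plot(z, f_Z, label=f"n = {n}")

plt.plot(z, stats.norm.pdf(z), "k--", label="standard normal")
plt.legend()
plt.show()
```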