
13.3 The Law of Large Numbers

We can now prove one of the essential results in probability: the chance that a sample average differs from its expectation by more than any fixed threshold converges to zero as the number of samples grows, no matter how small the threshold, as long as the samples are sufficiently uncorrelated. This result is the weak law of large numbers.

In particular, if the samples are drawn independently and identically then $\mu_n = \mathbb{E}[X_j] = \mu$ and:

$$\text{Var}[\bar{X}_n] = \frac{1}{n} \text{Var}[X_1]$$

Then, the chance that $\bar{X}_n$ differs from its expectation by more than any fixed tolerance $\epsilon$ converges to zero at rate $\mathcal{O}(n^{-1})$ or faster:

$$\text{Pr}(|\bar{X}_n - \mu| > \epsilon) \leq \frac{1}{n} \frac{\text{Var}[X_1]}{\epsilon^2}.$$
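As a quick sketch of this bound in action (a minimal simulation assuming NumPy, with the arbitrary choice of Exponential(1) samples so that $\mu = \text{Var}[X_1] = 1$), the following compares the empirical tail probability with the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the text): Exponential(1) samples,
# so mu = 1 and Var[X_1] = 1.
mu, var = 1.0, 1.0
eps = 0.1
trials = 2_000  # Monte Carlo replications per n

for n in [100, 1_000, 10_000]:
    # One sample average per replication.
    x_bar = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    tail = np.mean(np.abs(x_bar - mu) > eps)  # empirical Pr(|X_bar - mu| > eps)
    bound = var / (n * eps**2)                # the Chebyshev bound above
    print(f"n={n:>6}  empirical={tail:.4f}  bound={min(bound, 1.0):.4f}")
```

In each run the empirical tail probability sits below the bound and shrinks as $n$ grows, as the inequality guarantees.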

You can experiment with the weak law of large numbers using this Law of Large Numbers Interactive. You can choose the distribution that produces $X$, then track $\bar{X}_n$ as $n$ increases. You can also choose the tolerance (window half-width) about the true expected value. You will see that, as $n$ increases, the sample average eventually settles within the tolerance of its expectation. The panel on the right shows the empirical distribution of the sampled values $\{X_j\}_{j=1}^n$. As $n$ increases, the empirical distribution converges to the underlying distribution, so its expectation (the sample average) converges to the underlying expected value.
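If you prefer to experiment in code rather than in the interactive, here is a minimal stand-in (assuming NumPy; the fair-die distribution, tolerance, and sample size are arbitrary choices) that tracks the running sample average and reports when it settles within the tolerance window:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the interactive: fair-die rolls, so mu = 3.5.
mu, eps, n = 3.5, 0.1, 100_000
samples = rng.integers(1, 7, size=n)

# Running sample average X_bar_m for m = 1, 2, ..., n.
running_mean = np.cumsum(samples) / np.arange(1, n + 1)

# Last m at which the running average left the tolerance window (this run).
outside = np.abs(running_mean - mu) > eps
last_exit = np.max(np.nonzero(outside)[0]) + 1 if outside.any() else 0
print(f"X_bar_m stays within {eps} of {mu} for all m > {last_exit}")
```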

Chances are Long Run Frequencies

In Section 1.2 we defined chances as long run frequencies. The law of large numbers makes this definition more concrete.

Let $E$ be some event. Let $I_j$ be an indicator for the event on the $j^{th}$ trial of a sequence of $n$ independent and identical repetitions of the random process. Then, the frequency of the event over $n$ trials is the sample average of the indicators:

$$\text{Fr}(E) = \frac{1}{n} \sum_{j=1}^n I_j = \bar{I}_n.$$
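As a concrete illustration (assuming NumPy; the event, a fair die showing a six, is a hypothetical choice), the empirical frequency is literally the sample average of indicator variables:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical event E: a fair-die roll shows a six, so Pr(E) = 1/6.
n = 10_000
rolls = rng.integers(1, 7, size=n)
indicators = (rolls == 6).astype(float)  # I_j = 1 if E occurs on trial j

freq = indicators.mean()                 # Fr(E) = sample average of I_j
print(f"Fr(E) = {freq:.4f}   Pr(E) = {1/6:.4f}")
```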

The variance of an indicator random variable with success probability $p$ is $p(1 - p) \leq 1/4$ (the maximum occurs at $p = 1/2$). So, the probability that an empirical frequency differs from its expectation can be bounded using Chebyshev's inequality. The expected value of an indicator is the success probability of the corresponding event, so:

$$\text{Pr}(|\text{Fr}(E) - \text{Pr}(E)| > \epsilon) \leq \frac{1}{n} \frac{1/4}{\epsilon^2} = \frac{1}{n} \frac{1}{(2 \epsilon)^2}.$$
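A short check of this bound (assuming NumPy; the fair-coin event is chosen to hit the worst case $p(1 - p) = 1/4$):

```python
import numpy as np

rng = np.random.default_rng(3)

# Event E: a fair coin lands heads, so Pr(E) = 0.5 and p(1 - p) = 1/4.
p, eps, trials = 0.5, 0.05, 5_000

for n in [100, 1_000, 10_000]:
    freqs = rng.binomial(n, p, size=trials) / n  # Fr(E) over `trials` runs
    tail = np.mean(np.abs(freqs - p) > eps)      # empirical Pr(|Fr - Pr| > eps)
    bound = 1 / (n * (2 * eps) ** 2)             # the bound above
    print(f"n={n:>6}  empirical={tail:.4f}  bound={min(bound, 1.0):.4f}")
```

The empirical tail probabilities shrink much faster than the bound; Chebyshev is loose, but it supplies the $\mathcal{O}(1/n)$ decay the argument needs.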

So:

$$\lim_{n \rightarrow \infty} \text{Pr}(|\text{Fr}(E) - \text{Pr}(E)| > \epsilon) = 0$$

for any $\epsilon > 0$ and any event $E$.

In other words, the probability that the observed frequency of an event in a sequence of independent, identical repetitions differs from the chance of the event by more than some tolerance converges to zero at rate $\mathcal{O}(1/n)$ or faster, no matter the tolerance. Thus, observed frequencies converge to underlying chances!