
13.3 The Law of Large Numbers

We can now prove one of the essential results in probability: the chance that a sample average differs from its expectation by more than any fixed threshold converges to zero as the number of samples grows, no matter how small the threshold, as long as the samples are sufficiently uncorrelated. This result is the weak law of large numbers.

In particular, if the samples are drawn independently and identically then $\mu_n = \mathbb{E}[X_j] = \mu$ and:

$$\text{Var}[\bar{X}_n] = \frac{1}{n} \text{Var}[X_1]$$

Then, the chance that $\bar{X}_n$ differs from its expectation by more than any fixed tolerance $\epsilon$ converges to zero at rate $\mathcal{O}(n^{-1})$ or faster:

$$\text{Pr}(|\bar{X}_n - \mu| > \epsilon) \leq \frac{1}{n} \frac{\text{Var}[X_1]}{\epsilon^2}.$$
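As a quick sketch of this bound in action (a minimal simulation assuming NumPy, with the arbitrary choice of Exponential(1) samples so that $\mu = \text{Var}[X_1] = 1$), the following compares the empirical tail probability with the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the text): Exponential(1) samples,
# so mu = 1 and Var[X_1] = 1.
mu, var = 1.0, 1.0
eps = 0.1
trials = 2_000  # Monte Carlo replications per n

for n in [100, 1_000, 10_000]:
    # One sample average per replication.
    x_bar = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    tail = np.mean(np.abs(x_bar - mu) > eps)  # empirical Pr(|X_bar - mu| > eps)
    bound = var / (n * eps**2)                # the Chebyshev bound above
    print(f"n={n:>6}  empirical={tail:.4f}  bound={min(bound, 1.0):.4f}")
```

In each run the empirical tail probability sits below the bound and shrinks as $n$ grows, as the inequality guarantees.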

You can experiment with the weak law of large numbers using this Law of Large Numbers Interactive. You can choose the distribution that produces $X$, then track $\bar{X}_n$ as $n$ increases. You can also choose the tolerance (window half-width) about the true expected value. You will see that, as $n$ increases, the sample average eventually settles within the tolerance of its expectation. The panel on the right shows the empirical distribution of the sampled values $\{X_j\}_{j=1}^n$. As $n$ increases, the empirical distribution converges to the underlying distribution, so its expectation (the sample average) converges to the underlying expected value.
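If you prefer to experiment in code rather than in the interactive, here is a minimal stand-in (assuming NumPy; the fair-die distribution, tolerance, and sample size are arbitrary choices) that tracks the running sample average and reports when it settles within the tolerance window:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the interactive: fair-die rolls, so mu = 3.5.
mu, eps, n = 3.5, 0.1, 100_000
samples = rng.integers(1, 7, size=n)

# Running sample average X_bar_m for m = 1, 2, ..., n.
running_mean = np.cumsum(samples) / np.arange(1, n + 1)

# Last m at which the running average left the tolerance window (this run).
outside = np.abs(running_mean - mu) > eps
last_exit = np.max(np.nonzero(outside)[0]) + 1 if outside.any() else 0
print(f"X_bar_m stays within {eps} of {mu} for all m > {last_exit}")
```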

Chances are Long Run Frequencies

In Section 1.2 we defined chances as long run frequencies. The law of large numbers makes this definition more concrete.

Let $E$ be some event. Let $I_j$ be an indicator for the event on the $j^{th}$ trial of a sequence of $n$ independent and identical repetitions of the random process. Then, the frequency of the event over $n$ trials is the sample average of the indicators:

$$\text{Fr}(E) = \frac{1}{n} \sum_{j=1}^n I_j = \bar{I}_n.$$
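As a concrete illustration (assuming NumPy; the event, a fair die showing a six, is a hypothetical choice), the empirical frequency is literally the sample average of indicator variables:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical event E: a fair-die roll shows a six, so Pr(E) = 1/6.
n = 10_000
rolls = rng.integers(1, 7, size=n)
indicators = (rolls == 6).astype(float)  # I_j = 1 if E occurs on trial j

freq = indicators.mean()                 # Fr(E) = sample average of I_j
print(f"Fr(E) = {freq:.4f}   Pr(E) = {1/6:.4f}")
```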

The variance of an indicator random variable with success probability $p$ is $p(1 - p) \leq 1/4$ (the maximum occurs at $p = 1/2$). So, the probability that an empirical frequency differs from its expectation can be bounded using Chebyshev's inequality. The expected value of an indicator is the success probability of the corresponding event, so:

$$\text{Pr}(|\text{Fr}(E) - \text{Pr}(E)| > \epsilon) \leq \frac{1}{n} \frac{1/4}{\epsilon^2} = \frac{1}{n} \frac{1}{(2 \epsilon)^2}.$$
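A short check of this bound (assuming NumPy; the fair-coin event is chosen to hit the worst case $p(1 - p) = 1/4$):

```python
import numpy as np

rng = np.random.default_rng(3)

# Event E: a fair coin lands heads, so Pr(E) = 0.5 and p(1 - p) = 1/4.
p, eps, trials = 0.5, 0.05, 5_000

for n in [100, 1_000, 10_000]:
    freqs = rng.binomial(n, p, size=trials) / n  # Fr(E) over `trials` runs
    tail = np.mean(np.abs(freqs - p) > eps)      # empirical Pr(|Fr - Pr| > eps)
    bound = 1 / (n * (2 * eps) ** 2)             # the bound above
    print(f"n={n:>6}  empirical={tail:.4f}  bound={min(bound, 1.0):.4f}")
```

The empirical tail probabilities shrink much faster than the bound; Chebyshev is loose, but it supplies the $\mathcal{O}(1/n)$ decay the argument needs.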

So:

$$\lim_{n \rightarrow \infty} \text{Pr}(|\text{Fr}(E) - \text{Pr}(E)| > \epsilon) = 0$$

for any $\epsilon > 0$ and any event $E$.

In other words, the probability that the observed frequency of an event in a sequence of independent, identical repetitions differs from the chance of the event by more than some tolerance converges to zero at rate $\mathcal{O}(1/n)$ or faster, no matter the tolerance. Thus, observed frequencies converge to underlying chances!