13.5 Chapter Summary

Interactive Tools:

  1. Law of Large Numbers Interactive - Use this interactive to watch sample averages converge to the underlying expectation as the number of samples grows.

  2. Convolution Visualizer - Use this tool to visualize convolutions.

  3. Distribution Plotter - Use this tool to visualize densities and to experiment with limiting distributions.

Variance of Sums and Averages

All definitions and results are available in Section 13.1.

  1. The variance in a sum of $n$ random variables, $S_n = \sum_{j=1}^n X_j$, is the sum of all the pairwise covariances:

    $$\text{Var}[S_n] = \sum_{i=1}^n \sum_{j=1}^n \text{Cov}[X_i, X_j] = \sum_{j=1}^n \text{Var}[X_j] + 2 \sum_{i=1}^n \sum_{j=i+1}^n \text{Cov}[X_i, X_j].$$
    • Positive associations increase the variance of the sum; negative associations decrease it.

    • If the variables are uncorrelated, as when they are independent, then the variance in the sum is the sum of the variances:

      $$\text{Var}[S_n] = \sum_{j=1}^n \text{Var}[X_j].$$
    • If all of the variables have the same variance and are uncorrelated, then:

      $$\text{Var}[S_n] = n \, \text{Var}[X_1].$$
  2. The variance in the sample average of $n$ random variables, $\bar{X}_n = \frac{1}{n} \sum_{j=1}^n X_j = \frac{1}{n} S_n$, is the average of all the pairwise covariances:

    $$\text{Var}[\bar{X}_n] = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \text{Cov}[X_i, X_j].$$
    • If the variables are uncorrelated, as when they are independent, then the variance in the sample average is:

      $$\text{Var}[\bar{X}_n] = \frac{1}{n^2}\sum_{j=1}^n \text{Var}[X_j].$$
    • If all of the variables have the same variance and are uncorrelated, then:

      $$\text{Var}[\bar{X}_n] = \frac{1}{n} \text{Var}[X_1].$$
    • So, the standard deviation of a sample average of $n$ independent, identical random variables is $\text{SD}[\bar{X}_n] = \frac{1}{\sqrt{n}} \text{SD}[X_1]$.

  3. As a result, the sample averages of independent, identical random variables are consistent estimators for their expectation, in the sense that the expected squared error in the estimate converges to zero as the number of samples diverges (see the simulation sketch after this list).

    • The same result holds provided the random variables have a convergent mean and are sufficiently uncorrelated; it does not require independent or identical random variables.
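To make the $1/\sqrt{n}$ scaling concrete, here is a minimal simulation sketch, assuming NumPy is available; the Exponential(1) distribution and the sample sizes are illustrative choices, not taken from the chapter. It estimates the standard deviation of the sample average at several values of $n$ and compares it to the predicted $\text{SD}[X_1]/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(0)
num_trials = 10_000              # Monte Carlo experiments per sample size
sd_x1 = 1.0                      # SD of an Exponential(1) random variable

for n in [10, 100, 1_000]:
    # Each row is one experiment: n i.i.d. Exponential(1) draws.
    samples = rng.exponential(scale=1.0, size=(num_trials, n))
    xbar = samples.mean(axis=1)  # one sample average per experiment
    print(f"n={n:5d}  empirical SD of average={xbar.std():.4f}  "
          f"predicted SD/sqrt(n)={sd_x1 / np.sqrt(n):.4f}")
```

The empirical standard deviations should track the $1/\sqrt{n}$ prediction closely, shrinking toward zero as $n$ grows, which is exactly the consistency statement above.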

Tail Bounds

All definitions and results are available in Section 13.2.

  1. Markov’s Inequality: If $Y$ is a nonnegative random variable:

    $$\text{Pr}(Y > y_*) \leq \frac{\mathbb{E}[Y]}{y_*}$$

    for any $y_* > 0$.

  2. Chebyshev’s Inequality: If $Y$ is a random variable:

    $$\text{Pr}(|Y - \mathbb{E}[Y]| > \epsilon) \leq \frac{\text{Var}[Y]}{\epsilon^2}$$

    for any $\epsilon > 0$.
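As a quick numerical check of both bounds, here is a minimal sketch assuming NumPy; Exponential(1) is an arbitrary nonnegative test distribution with $\mathbb{E}[Y] = \text{Var}[Y] = 1$, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=100_000)  # nonnegative, E[Y] = Var[Y] = 1

# Markov: Pr(Y > y*) <= E[Y] / y*
for y_star in [2.0, 4.0, 8.0]:
    empirical = (y > y_star).mean()
    print(f"Markov    y*={y_star}: empirical {empirical:.4f} <= bound {1.0 / y_star:.4f}")

# Chebyshev: Pr(|Y - E[Y]| > eps) <= Var[Y] / eps^2
for eps in [2.0, 4.0]:
    empirical = (np.abs(y - 1.0) > eps).mean()
    print(f"Chebyshev eps={eps}: empirical {empirical:.4f} <= bound {1.0 / eps**2:.4f}")
```

The empirical tail probabilities sit well below both bounds; tail bounds of this kind trade tightness for generality, since they hold for any distribution satisfying the hypotheses.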

The Law of Large Numbers

All definitions and results are available in Section 13.3.

  1. The (Weak) Law of Large Numbers: If $\{X_j\}_{j=1}^n$ is a sequence of independent, identically distributed random variables with mean $\mu$ and finite variance, then:

    $$\text{Pr}(|\bar{X}_n - \mu| > \epsilon) \xrightarrow{n \rightarrow \infty} 0$$

    at rate $\mathcal{O}(n^{-1})$ or faster for all $\epsilon > 0$.

    • In other words, the distribution of the sample average concentrates about the underlying expectation.

    • We proved the weak law using Chebyshev’s inequality. The same statement holds anytime the variance in the sample average converges to zero as $n$ diverges, so it may also hold for sufficiently uncorrelated random variables.

  2. Observed frequencies are sample averages of indicators. The weak law therefore implies that the observed frequency of an event, in $n$ identical, independent repetitions of the process, is guaranteed to converge to the underlying chance of the event, if such a chance exists. Thus, the law of large numbers shows that, if chances exist, then they must be measurable using long-run frequencies (see the coin-flip sketch below).
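To see this in action (mirroring the Law of Large Numbers Interactive above), here is a minimal sketch, assuming NumPy; the fair coin and sample sizes are illustrative. The running frequency of heads is a running average of indicator variables, and it settles near the underlying chance 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                               # underlying chance of the event "heads"
flips = rng.random(100_000) < p       # indicator variable for each repetition

# Observed frequency of heads after the first n flips.
running_freq = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:6d}  observed frequency={running_freq[n - 1]:.4f}")
```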

The Central Limit Theorem

All definitions and results are available in Section 13.4.

  1. Normal Random Variables: A random variable $X \sim \text{Normal}(\mu, \sigma^2)$ is normally distributed with mean $\mu$ and variance $\sigma^2$ if it can take on any real value and has density function:

    $$f_{X}(x) = \frac{1}{\sqrt{2 \pi}\, \sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma} \right)^2}.$$
    • A random variable $Z$ is drawn from a standard normal distribution if $\mu = 0$ and $\sigma = 1$. Then:

      $$f_{Z}(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2}z^2}.$$
  2. The Central Limit Theorem (CLT): If $\{X_j\}_{j=1}^n$ are independent, identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2$, then the standardized sample average:

    $$Z_n = \frac{\bar{X}_n - \mathbb{E}[\bar{X}_n]}{\text{SD}[\bar{X}_n]} = \frac{S_n - \mathbb{E}[S_n]}{\text{SD}[S_n]}$$

    converges, in distribution, to a standard normal random variable:

    $$\lim_{n \rightarrow \infty} Z_n = Z \sim \text{Normal}(0, 1)$$

    regardless of the distribution used to sample the $X$’s.

    • As a result, sums and sample averages of many independent, identical random variables are approximately normally distributed.

    • In particular, $S_n$ is approximately drawn from a $\text{Normal}(n \mu, n \sigma^2)$ distribution and $\bar{X}_n$ is approximately drawn from a $\text{Normal}(\mu, \sigma^2/n)$ distribution.
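Here is a minimal sketch of the CLT, assuming NumPy; a skewed Exponential(1) source distribution is used deliberately to emphasize that the starting shape does not matter, and the sample sizes are illustrative. It standardizes many independent sample averages and compares a few empirical quantiles of $Z_n$ against standard normal quantiles:

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_trials = 400, 20_000
mu, sigma = 1.0, 1.0                 # mean and SD of Exponential(1)

# Standardize each sample average: Z_n = (X̄_n - mu) / (sigma / sqrt(n)).
xbar = rng.exponential(scale=1.0, size=(num_trials, n)).mean(axis=1)
z_n = (xbar - mu) / (sigma / np.sqrt(n))

# Compare empirical quantiles of Z_n to standard normal quantiles.
for q, z_ref in [(0.025, -1.96), (0.5, 0.0), (0.975, 1.96)]:
    print(f"quantile {q}: empirical {np.quantile(z_n, q):+.3f}, Normal(0,1) {z_ref:+.3f}")
```

Even though each $X_j$ is heavily skewed, the standardized averages already match the standard normal quantiles closely at this $n$, which is the content of the convergence-in-distribution statement above.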