Variance - Data 89 Course Notes

Expected values summarize the position of a distribution on the number line with a central value. Often it is important to summarize both the position and the spread in a distribution. In terms of a random variable, we often want to return some best prediction (e.g. an expected value) plus or minus some anticipated degree of variation.

This chapter will focus on variance and standard deviation. Standard deviation and variance both measure the degree of variability in a random variable. Equivalently, they are summary measures of the “breadth”, “width”, or “spread” in a distribution.

Definition

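To summarize the spread in a distribution, we will start by centering it. The centered random variable is

$$X_0 = X - \mathbb{E}[X],$$

the deviation of $X$ from its expected value. Centering is often the first step in standardizing a random variable.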

Next, we will try to measure the average deviation in $X$ by measuring the average size of $X_0$. If $X_0$ is typically small, then most samples lie near the expected value, so the distribution cannot be very spread out. If, on the other hand, $X_0$ is typically large, then most samples lie far from the expected value, so the distribution must be very broad.

To measure the average deviation, we could try $\mathbb{E}[X_0]$. To keep our notation concise, let $\bar{x} = \mathbb{E}[X]$. Then:

$$\mathbb{E}[X_0] = \mathbb{E}[X - \bar{x}]. \tag{1}$$

Next, by the translation property (linearity) of expectation:

$$\mathbb{E}[X_0] = \mathbb{E}[X - \bar{x}] = \mathbb{E}[X] - \bar{x} = \bar{x} - \bar{x} = 0. \tag{2}$$

So, the expected value of $X_0$ is always zero. This is no surprise: $X_0$ is centered, so its positive and negative deviations cancel exactly in the average.
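To see the cancellation numerically, here is a minimal sketch assuming a discrete random variable whose PMF is stored as a Python dict mapping values to probabilities. The PMF and the helper `expect` are hypothetical, chosen only for illustration; `Fraction` keeps the arithmetic exact, so $\mathbb{E}[X_0]$ comes out as exactly zero rather than a tiny rounding error.

```python
from fractions import Fraction

# Hypothetical PMF, chosen only for illustration: value -> probability.
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 4: Fraction(3, 10)}

def expect(pmf, g=lambda x: x):
    """Return E[g(X)] for a discrete PMF stored as a dict value -> probability."""
    return sum(g(x) * p for x, p in pmf.items())

x_bar = expect(pmf)                            # E[X]
e_centered = expect(pmf, lambda x: x - x_bar)  # E[X_0] = E[X - x_bar]

print(x_bar)       # 17/10
print(e_centered)  # 0
```

Any PMF gives the same result for `e_centered`; only `x_bar` changes.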

To measure the average size of $X_0$, we should find the expected value of some function $s(X_0)$, chosen so that $s$ is nonnegative and increases monotonically as $X_0$ moves farther from 0. We want a nonnegative function since, like distance or length, size is commonly understood to be nonnegative. Moreover, when measuring variability, we don’t want positive and negative deviations to cancel out.

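The most natural choice would be to measure the expected absolute deviation, also called the mean absolute deviation:

$$\text{MAD}[X] = \mathbb{E}[|X_0|] = \mathbb{E}[|X - \bar{x}|].$$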

Most statisticians select a related measure. Instead of averaging the absolute deviation, it is common practice to average the squared deviation, then compensate for the square with a square root outside the expectation. These two steps define the variance and the standard deviation:

$$\text{Var}[X] = \mathbb{E}[X_0^2] = \mathbb{E}[(X - \bar{x})^2], \qquad \text{SD}[X] = \sqrt{\text{Var}[X]}.$$
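As a concrete sketch, the snippet below computes all three summaries for a fair six-sided die, assuming $\text{MAD}[X] = \mathbb{E}[|X - \bar{x}|]$, $\text{Var}[X] = \mathbb{E}[(X - \bar{x})^2]$, and $\text{SD}[X] = \sqrt{\text{Var}[X]}$. The dict-based PMF and the helper `expect` are illustrative choices, not part of the notes.

```python
import math
from fractions import Fraction

# Fair six-sided die (illustrative): value -> probability.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(pmf, g=lambda x: x):
    """Return E[g(X)] for a discrete PMF stored as a dict value -> probability."""
    return sum(g(x) * p for x, p in pmf.items())

x_bar = expect(pmf)                            # E[X] = 7/2
mad = expect(pmf, lambda x: abs(x - x_bar))    # MAD[X] = E[|X_0|]
var = expect(pmf, lambda x: (x - x_bar) ** 2)  # Var[X] = E[X_0^2]
sd = math.sqrt(var)                            # square root taken outside the expectation

print(mad, var, round(sd, 4))  # 3/2 35/12 1.7078
```

The variance, $35/12$, is in squared units, while the MAD and SD are on the scale of the die itself; here the SD is slightly larger than the MAD.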

Notice that if $X$ has units $[x]$, then the variance has units $[x]^2$. For instance, if $X$ is the price of an investment in dollars, then $\text{Var}[X]$ has units of $\text{dollars}^2$, not $\text{dollars}$, while the standard deviation has units of $\text{dollars}$. For this reason, it is really the standard deviation, not the variance, that measures the spread, or variability, in $X$.

The variance is related to the spread, or variability, in $X$ through its relation to the standard deviation: large variances indicate large standard deviations. Since the variance is an expected square, its value alone is often hard to interpret and easy to misread.

The standard deviation and the mean absolute deviation differ. In particular:

$$\text{MAD}[X] = \mathbb{E}[|X_0|] = \mathbb{E}[(|X_0|^2)^{1/2}] \leq \mathbb{E}[|X_0|^2]^{1/2} = \text{SD}[X]. \tag{6}$$

The middle inequality is Jensen’s inequality applied to the square root. Square roots are concave functions, so the expected value of a square root is at most the square root of the expected value.

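The mean absolute deviation and standard deviation differ since the standard deviation averages squared deviations. Squaring weights each deviation by its own size, so a deviation twice as large contributes four times as much. As a result, the standard deviation is much more sensitive to large deviations and discounts small ones.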
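A small sketch of that sensitivity, using two hypothetical distributions centered at 0 (the PMFs and the helper `mad_and_sd` are illustrative assumptions). When every deviation has the same size, MAD and SD agree; when the variable is usually 0 but occasionally jumps far away, the rare large deviations dominate the SD but barely move the MAD.

```python
import math

def mad_and_sd(pmf):
    """Return (MAD, SD) for a discrete PMF stored as a dict value -> probability."""
    x_bar = sum(x * p for x, p in pmf.items())
    mad = sum(abs(x - x_bar) * p for x, p in pmf.items())
    sd = math.sqrt(sum((x - x_bar) ** 2 * p for x, p in pmf.items()))
    return mad, sd

# Two hypothetical distributions, both centered at 0.
coin = {-1: 0.5, 1: 0.5}                      # every deviation has size 1
rare_jump = {-10: 0.005, 0: 0.99, 10: 0.005}  # usually 0, occasionally far away

print([round(v, 3) for v in mad_and_sd(coin)])       # [1.0, 1.0]
print([round(v, 3) for v in mad_and_sd(rare_jump)])  # [0.1, 1.0]
```

In the second case the SD is ten times the MAD, consistent with $\text{MAD}[X] \leq \text{SD}[X]$.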

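Given an expectation and a standard deviation, it is common practice to standardize a random variable by centering it, then dividing by its standard deviation (provided $\text{SD}[X] > 0$):

$$Z = \frac{X - \bar{x}}{\text{SD}[X]} = \frac{X_0}{\text{SD}[X]}.$$

The standardized variable $Z$ has expected value 0 and standard deviation 1.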

Notice that a random variable and its standardization are related by a linear transformation. Often, we will define distribution families by first posing a model for a standard variable $Z$, then allowing $X = a Z + b$ for any choice of $a$ and $b$. The choice of $b$ assigns the distribution a central location. The choice of $a$ selects its variability, or spread, about that central location. This is why we focused on linear transformations of the inputs and outputs of functions in Section 3.2.
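A minimal sketch of standardizing and then undoing it, assuming $Z = (X - \mathbb{E}[X]) / \text{SD}[X]$ and a fair die stored as a dict-based PMF (both the representation and the helper `mean_and_sd` are illustrative). Standardizing moves the values but leaves their probabilities untouched, and the linear map $X = aZ + b$ with $a = \text{SD}[X]$, $b = \mathbb{E}[X]$ recovers the original values.

```python
import math

# Fair six-sided die: value -> probability (illustrative example).
pmf = {x: 1 / 6 for x in range(1, 7)}

def mean_and_sd(pmf):
    """Return (E[X], SD[X]) for a discrete PMF stored as a dict value -> probability."""
    x_bar = sum(x * p for x, p in pmf.items())
    var = sum((x - x_bar) ** 2 * p for x, p in pmf.items())
    return x_bar, math.sqrt(var)

x_bar, sd = mean_and_sd(pmf)

# Standardize: Z = (X - x_bar) / SD[X]. Values move; their probabilities do not.
pmf_z = {(x - x_bar) / sd: p for x, p in pmf.items()}
z_bar, z_sd = mean_and_sd(pmf_z)
print(abs(z_bar) < 1e-12, math.isclose(z_sd, 1.0))  # True True

# Undo it with the linear map X = a Z + b, using a = SD[X] and b = E[X].
pmf_back = {sd * z + x_bar: p for z, p in pmf_z.items()}
print(sorted(round(x, 9) for x in pmf_back))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Floating-point arithmetic leaves tiny rounding errors, which is why the checks use tolerances rather than exact equality.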

While standard deviations provide direct measures of spread, we will focus our study on variances. It is easy to compute standard deviations from variances, and variances have stronger algebraic properties, so they are more convenient to work with.

Rules of Variance

Like expectations, variances are popular summaries since they obey convenient rules. These rules make it possible to break problems down into simpler parts. We won’t cover many rules in this chapter. Instead, we’ll just check the rules associated with linear transformations:

Constants: If $X = c$, then $\text{Var}[X] = 0$.
So, the variance of a constant is zero.
Nonnegativity: $\text{Var}[X] \geq 0$, with equality if and only if $X = c$ for some constant $c$.
Variance is nonnegative by construction. Variance is an expected square, and all squares are nonnegative. The average of nonnegative quantities is itself nonnegative.
The variance of a constant is always zero. Conversely, if the variance is zero, then the random variable is equal to a constant (with probability 1). This is natural: the random variable doesn’t vary if its variance is zero.
Translation: Given any $b$,
$$\text{Var}[X + b] = \text{Var}[X]. \tag{12}$$
So, the variance after a translation is the variance before the translation. This is an entirely sensible rule. Variances are associated with the spread, or width, of a distribution, and translating a distribution leaves its spread unchanged. Formally, $(X + b) - \mathbb{E}[X + b] = X - \bar{x} = X_0$, so the centered variable, and hence its expected square, is the same before and after translation.
It follows that:
$$\text{SD}[X + b] = \text{SD}[X]. \tag{13}$$
Scaling: Given any $a$,
$$\text{Var}[a X] = a^2 \text{Var}[X]. \tag{14}$$
Proof: Just apply the definition, then use rules of expectation from Section 4.2:
$$\begin{aligned} \text{Var}[a X] & = \mathbb{E}[(a X - \mathbb{E}[a X])^2] = \mathbb{E}[(a X - a \mathbb{E}[X])^2] \\ & = \mathbb{E}[(a(X - \bar{x}))^2] = \mathbb{E}[a^2 X_0^2] = a^2 \mathbb{E}[X_0^2] = a^2 \text{Var}[X]. \end{aligned} \tag{15}$$
So, the variance after scaling is the variance before scaling, multiplied by the square of the scale factor. $\square$
It follows, since $\sqrt{a^2} = |a|$, that:
$$\text{SD}[a X] = |a| \, \text{SD}[X]. \tag{16}$$
Caution
Remember, in general $\text{Var}[a X] \neq a \text{Var}[X]$. To avoid mixing this up, check units. The units of the variance are the units of $X$, squared. So, replacing $X$ with $a X$ should change the variance by a factor of $a^2$, not $a$.
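The constant, translation, and scaling rules can be spot-checked numerically. This sketch assumes the same dict-based PMF representation; `var` and `transform` are hypothetical helpers, and `transform` assumes $a \neq 0$ so that distinct values stay distinct (for $a = 0$, $aX + b$ is just the constant $b$).

```python
import math

def var(pmf):
    """Var[X] = E[(X - E[X])^2] for a discrete PMF stored as a dict value -> probability."""
    x_bar = sum(x * p for x, p in pmf.items())
    return sum((x - x_bar) ** 2 * p for x, p in pmf.items())

def transform(pmf, a=1, b=0):
    """PMF of aX + b; assumes a != 0 so no two values collide."""
    return {a * x + b: p for x, p in pmf.items()}

pmf = {x: 1 / 6 for x in range(1, 7)}  # fair die, illustrative
a, b = -3, 10

print(var({5: 1.0}))                                              # constant: 0.0
print(math.isclose(var(transform(pmf, b=b)), var(pmf)))           # translation: True
print(math.isclose(var(transform(pmf, a=a)), a ** 2 * var(pmf)))  # scaling: True
print(math.isclose(var(transform(pmf, a=a)), a * var(pmf)))       # Var[aX] vs a Var[X]: False
```

The last line is the units check from the caution above: with $a = -3$, multiplying by $a$ instead of $a^2$ would even make the variance negative.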

Computing Variances

Formally, the variance is the expected value of the nonnegative random variable $X_0^2 = (X - \bar{x})^2$. So, when $X$ is discrete:

$$\text{Var}[X] = \sum_{\text{all } x} (x - \bar{x})^2 \, \text{PMF}(x). \tag{17}$$

When $X$ is continuous:

$$\text{Var}[X] = \int_{\text{all } x} (x - \bar{x})^2 \, \text{PDF}(x) \, dx. \tag{18}$$
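Both formulas translate directly into code. In this sketch the discrete case sums over a dict-based PMF, while the continuous case swaps the exact integral for a midpoint Riemann sum, a numerical approximation that is not part of the notes. The helper names, the integration bounds `lo` and `hi`, and the number of subintervals `n` are assumptions made for illustration.

```python
def var_discrete(pmf):
    """Sum of (x - x_bar)^2 PMF(x) over the support; pmf is a dict value -> probability."""
    x_bar = sum(x * p for x, p in pmf.items())
    return sum((x - x_bar) ** 2 * p for x, p in pmf.items())

def var_continuous(pdf, lo, hi, n=100_000):
    """Approximate the variance integral with a midpoint Riemann sum on [lo, hi].

    Assumes the PDF is zero outside [lo, hi]; n is the number of subintervals.
    """
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    x_bar = sum(x * pdf(x) * dx for x in xs)
    return sum((x - x_bar) ** 2 * pdf(x) * dx for x in xs)

# Bernoulli(0.3): exact variance p(1 - p) = 0.21
print(round(var_discrete({0: 0.7, 1: 0.3}), 6))           # 0.21
# Uniform(0, 1): exact variance 1/12
print(round(var_continuous(lambda x: 1.0, 0.0, 1.0), 6))  # 0.083333
```

For a PDF with unbounded support, such as the normal, `lo` and `hi` would have to be chosen wide enough that the tails contribute negligibly.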

We’ll often work with a formula that breaks the variance into simpler parts:

$$\text{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2. \tag{19}$$

Proof: As usual, start from the definition, expand, then apply properties of expectation:

$$\text{Var}[X] = \mathbb{E}[(X - \bar{x})^2] = \mathbb{E}[X^2 - 2 \bar{x} X + \bar{x}^2]. \tag{20}$$

Then, by the additivity of expectation:

$$\text{Var}[X] = \mathbb{E}[X^2] + \mathbb{E}[-2 \bar{x} X] + \mathbb{E}[\bar{x}^2]. \tag{21}$$

Then, since $\bar{x}$ is a constant, we can use linearity to pull all constants outside the expectations (the expectation of the constant $\bar{x}^2$ is just $\bar{x}^2$):

$$\text{Var}[X] = \mathbb{E}[X^2] - 2 \bar{x} \, \mathbb{E}[X] + \bar{x}^2. \tag{22}$$

Finally, since $\mathbb{E}[X] = \bar{x}$:

$$\text{Var}[X] = \mathbb{E}[X^2] - 2 \bar{x}^2 + \bar{x}^2 = \mathbb{E}[X^2] - \bar{x}^2. \tag{23}$$

So, the variance in $X$ is the expected square, $\mathbb{E}[X^2]$, minus the squared expectation, $\bar{x}^2$. $\square$

If you get stuck trying to compute a variance, this is often the formula to try next. In many cases it is easier to evaluate $\mathbb{E}[X]$ and $\mathbb{E}[X^2]$ than it is to evaluate $\mathbb{E}[(X - \bar{x})^2]$ directly.
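A quick exact check that the definition and the shortcut $\mathbb{E}[X^2] - \bar{x}^2$ agree, assuming a hypothetical PMF on the values 0, 1, and 4 stored as a dict. `Fraction` keeps every step exact, so the two results match as rational numbers rather than merely as floats.

```python
from fractions import Fraction

# Hypothetical PMF used only for illustration: value -> probability.
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 4: Fraction(3, 10)}

e_x = sum(x * p for x, p in pmf.items())        # E[X]
e_x2 = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

var_definition = sum((x - e_x) ** 2 * p for x, p in pmf.items())  # E[(X - x_bar)^2]
var_shortcut = e_x2 - e_x ** 2                                    # E[X^2] - x_bar^2

print(e_x, e_x2)                     # 17/10 53/10
print(var_definition, var_shortcut)  # 241/100 241/100
```

The shortcut only needs the two sums `e_x` and `e_x2`; it never has to revisit every value once the mean is known.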

Other Moments

So far we’ve seen two summaries based on expectations: the expected value and the variance. Both are examples of the moments of a distribution.

The raw moments are expectations of the form $\mathbb{E}[X^n]$ for positive integers $n$. The first raw moment is the expected value, since $n = 1$ returns $\mathbb{E}[X^1] = \mathbb{E}[X]$.

The central moments are the raw moments of the centered variable. These are expectations of the form $\mathbb{E}[(X - \bar{x})^n]$ for positive integers $n$. The second central moment is the variance. Central moments can always be written in terms of raw moments, as we saw above for the variance.
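One way to see why central moments can be written in terms of raw moments: expanding $(X - \bar{x})^n$ with the binomial theorem and applying linearity gives $\mathbb{E}[(X - \bar{x})^n] = \sum_{k=0}^{n} \binom{n}{k} \mathbb{E}[X^k] (-\bar{x})^{n-k}$. For $n = 2$ this reduces to $\mathbb{E}[X^2] - \bar{x}^2$. The sketch below checks the expansion exactly on a hypothetical dict-based PMF; the helper names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 4: Fraction(3, 10)}  # hypothetical PMF

def raw_moment(pmf, n):
    """E[X^n]; for n = 0 this is the total probability, 1."""
    return sum(x ** n * p for x, p in pmf.items())

def central_moment(pmf, n):
    """E[(X - x_bar)^n], computed directly from the definition."""
    x_bar = raw_moment(pmf, 1)
    return sum((x - x_bar) ** n * p for x, p in pmf.items())

def central_from_raw(pmf, n):
    """E[(X - x_bar)^n] via sum_k C(n, k) E[X^k] (-x_bar)^(n - k)."""
    x_bar = raw_moment(pmf, 1)
    return sum(comb(n, k) * raw_moment(pmf, k) * (-x_bar) ** (n - k) for k in range(n + 1))

for n in range(1, 5):
    print(n, central_moment(pmf, n) == central_from_raw(pmf, n))  # True for each n
```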

Higher order central moments have been used to define other shape summaries. For example, the third central moment is commonly used as a measure of the skew in a distribution, and the fourth central moment is used to check whether the distribution is “bell shaped” in the same fashion as the famous “normal” distribution.
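A small sketch of these shape summaries, assuming dict-based PMFs with illustrative values. The third central moment is zero for a symmetric distribution (a fair die) and positive for a PMF with a long right tail. For the fourth moment, the sketch divides by $\text{Var}[X]^2$ to make it unit-free; on that scale the normal distribution gives 3, and the flat, non-bell-shaped die comes out well below it.

```python
from fractions import Fraction

def central_moment(pmf, n):
    """E[(X - x_bar)^n] for a discrete PMF stored as a dict value -> probability."""
    x_bar = sum(x * p for x, p in pmf.items())
    return sum((x - x_bar) ** n * p for x, p in pmf.items())

symmetric = {x: Fraction(1, 6) for x in range(1, 7)}                       # fair die
right_skewed = {0: Fraction(1, 5), 1: Fraction(1, 2), 4: Fraction(3, 10)}  # hypothetical, long right tail

# Third central moment: zero for a symmetric PMF, positive for a long right tail.
print(central_moment(symmetric, 3))     # 0
print(central_moment(right_skewed, 3))  # 312/125

# Fourth central moment scaled by Var^2 so it is unit-free; the normal distribution gives 3.
kurt = central_moment(symmetric, 4) / central_moment(symmetric, 2) ** 2
print(kurt, kurt < 3)                   # 303/175 True
```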
