14.3 Expectation Reference

Expectations

For details, see Sections 4.1, 8.3, 10.2, and 13.3.

  1. The expected value of a random variable, $\mathbb{E}[X]$, is the weighted average of the possible values $x$, weighted by the PMF/PDF:

    $$\mathbb{E}[X] = \begin{cases} \sum_{\text{all } x} x \, \text{PMF}(x) & \text{ if discrete} \\ \int_{\text{all } x} x \, \text{PDF}(x) \, dx & \text{ if continuous} \end{cases}$$

    • The expected value is equivalent to the center of mass of the distribution.

    • Long-run sample averages converge to the expected value (demonstrated in the sketch after this list).

  2. The expected value of a function of a random variable, $\mathbb{E}[g(X)]$, is the weighted average of $g(x)$ over each $x$, weighted by the PMF/PDF:

    $$\mathbb{E}[g(X)] = \begin{cases} \sum_{\text{all } x} g(x) \, \text{PMF}(x) & \text{ if discrete} \\ \int_{\text{all } x} g(x) \, \text{PDF}(x) \, dx & \text{ if continuous} \end{cases}$$

    The expected value of a function of multiple random variables is defined analogously. For example:

    $$\mathbb{E}[g(X,Y)] = \begin{cases} \sum_{\text{all } x, y} g(x,y) \, \text{Pr}(X = x, Y = y) & \text{ if discrete} \\ \iint_{\text{all } x,y} g(x,y) \, f_{X,Y}(x,y) \, dx \, dy & \text{ if continuous} \end{cases}$$
  3. The conditional expectation of a random variable is its expected value when sampled from a conditional distribution. For example, if $X$ and $Y$ are jointly distributed continuous variables, then:

    $$\mathbb{E}_{Y|X = x}[g(Y)] = \int_{\text{all } y} g(y) \, f_{Y|X = x}(y) \, dy.$$
  4. The expected value is distinct from the:

    • Mode: the most likely outcome, or collection of outcomes that maximize the PMF/PDF.

    • Median: the “midpoint” value $x_*$ such that $\text{Pr}(X < x_*) = \text{Pr}(X > x_*)$.
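As a numerical check of these definitions, here is a minimal NumPy sketch (the four-point distribution is a made-up example) that computes $\mathbb{E}[X]$ as a PMF-weighted average and confirms that a long-run sample average converges to it:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small discrete distribution: possible values and their PMF.
values = np.array([1, 2, 3, 4])
pmf = np.array([0.1, 0.2, 0.3, 0.4])

# Expected value: the PMF-weighted average of the possible values.
expected = np.sum(values * pmf)            # 3.0

# A long-run sample average converges to the expected value.
samples = rng.choice(values, size=100_000, p=pmf)
print(expected, samples.mean())            # both close to 3.0
```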

Rules of Expectations

For details, see Sections 4.1, 4.2, 7.1, and 10.2.

  1. Expectations of key distributions:

    • Constants: $\mathbb{E}[c] = c$.

    • Indicators: if $X \sim \text{Bernoulli}(p)$, then $\mathbb{E}[X] = p$.

    • Symmetric: if $X$ is drawn symmetrically about $x_*$, then $\mathbb{E}[X] = x_*$.

  2. Linearity: $\mathbb{E}[a X + b] = a \mathbb{E}[X] + b$.

    • Remember, this rule only works for linear functions. If $g$ is a nonlinear function of $x$, then $\mathbb{E}[g(X)]$ need not equal $g(\mathbb{E}[X])$.

  3. Additivity: for any pair of random variables $X$ and $Y$, $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$.

  4. Jensen’s Inequality: if $g$ is a strictly convex function and $X$ is a random variable with nonzero variance, then:

    $$\mathbb{E}[g(X)] > g(\mathbb{E}[X]).$$
  5. Tail Sums and Integrals (verified numerically in the sketch after this list):

    • If $X$ is a count-valued (nonnegative, integer-valued) random variable, then $\mathbb{E}[X] = \sum_{x = 1}^{\infty} \text{Pr}(X \geq x)$.

    • If $X$ is a continuously distributed, nonnegative random variable, then $\mathbb{E}[X] = \int_{x = 0}^{\infty} \text{Pr}(X \geq x) \, dx$.

  6. Iterated Expectation: if $X$ and $Y$ are drawn jointly, then:

    $$\mathbb{E}_{X,Y}[g(X,Y)] = \mathbb{E}_X\left[ \mathbb{E}_{Y|X}[g(X,Y)] \right].$$

    In particular:

    $$\mathbb{E}[Y] = \mathbb{E}_X\left[ \mathbb{E}[Y|X] \right].$$
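Several of these rules are easy to verify by simulation. Here is a short NumPy sketch (the Poisson(3) sample is a stand-in distribution chosen for illustration) checking linearity, Jensen’s inequality, and the tail-sum identity:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.0, size=200_000)   # nonnegative, integer-valued

# Linearity: E[aX + b] = a E[X] + b.
a, b = 2.0, 5.0
print((a * x + b).mean(), a * x.mean() + b)

# Jensen's inequality for the strictly convex g(x) = x^2:
# E[X^2] > E[X]^2 whenever Var[X] > 0.
print((x**2).mean(), x.mean()**2)        # first value is strictly larger

# Tail sum: E[X] = sum over x >= 1 of Pr(X >= x).
tail = sum((x >= k).mean() for k in range(1, x.max() + 1))
print(x.mean(), tail)                    # identical on the sample
```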

Variance

For details, see Section 4.3.

  1. Given $\mathbb{E}[X] = \bar{x}$, the variance and standard deviation of a random variable are:

    $$\text{Var}[X] = \mathbb{E}[(X - \bar{x})^2], \quad \text{SD}[X] = \sqrt{\text{Var}[X]}$$

    • The standard deviation measures the breadth, spread, or width of the distribution.

  2. Properties of Variance:

    • $\text{Var}[X] \geq 0$

    • $\text{Var}[c] = 0$

    • $\text{Var}[X + b] = \text{Var}[X]$

    • $\text{Var}[a X] = a^2 \text{Var}[X]$

  3. To compute variances, we often use:

    $$\text{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$

    • The variance of a random variable is its expected square minus its squared expectation (see the sketch after this list).
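A quick NumPy check of the shortcut formula and the scaling rules, using an Exponential sample (a made-up example whose true variance is $4$):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=200_000)   # Var[X] = 4 in theory

# Shortcut formula: Var[X] = E[X^2] - E[X]^2.
print((x**2).mean() - x.mean()**2, x.var())    # identical estimates

# Scaling rules: Var[aX + b] = a^2 Var[X].
a, b = 3.0, 7.0
print(np.var(a * x + b), a**2 * np.var(x))     # agree exactly
```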

Covariance

All definitions and results are available in Sections 11.1 and 13.1.

  1. The covariance between the random variables $X$ and $Y$ is defined:

    $$\text{Cov}[X,Y] = \mathbb{E}[(X - \bar{x})(Y - \bar{y})]$$

    where $\bar{x} = \mathbb{E}[X]$ and $\bar{y} = \mathbb{E}[Y]$. The variables $X_0 = X - \bar{x}$ and $Y_0 = Y - \bar{y}$ are centered.

    It may be expanded as the expected product of the variables minus the product of their expectations:

    $$\text{Cov}[X,Y] = \mathbb{E}[X \times Y] - \mathbb{E}[X] \times \mathbb{E}[Y].$$
  2. Covariance Matrices: if $\{X_j\}_{j=1}^n$ is a collection of $n$ random variables, then the covariance matrix is the $n \times n$ array with $i,j$ entries $\text{Cov}[X_i,X_j]$.

  3. Properties of covariance:

    • The covariance is unchanged by translations (adding constants) to the variables: $\text{Cov}[X + s, Y + t] = \text{Cov}[X,Y]$.

    • The covariance does depend on the scale of each variable: $\text{Cov}[aX, bY] = ab \, \text{Cov}[X,Y]$.

    • The sign of the covariance indicates the sign of the association between two variables.

    • The covariance is zero if $X$ and $Y$ are independent. However, dependent variables may also have covariance equal to zero.

    • The covariance between a random variable and itself is the variance: $\text{Cov}[X,X] = \text{Var}[X]$.

    • The covariance between any random variable and a constant is zero.

  4. Variance of Sums and Sample Averages:

    • The variance of a sum of random variables is the sum of all the pairwise covariances:

      $$\begin{aligned} \text{Var}\left[ \sum_{j=1}^n X_j \right] & = \sum_{i=1}^n \sum_{j=1}^n \text{Cov}[X_i,X_j] \\ & = \sum_{j = 1}^n \text{Var}[X_j] + 2 \sum_{j=1}^{n} \sum_{i = j + 1}^n \text{Cov}[X_i,X_j]. \end{aligned}$$

      In the special case when $n = 2$ (verified numerically in the sketch after this list):

      $$\text{Var}[X + Y] = \text{Var}[X] + \text{Var}[Y] + 2 \, \text{Cov}[X,Y].$$

    • The variance of a sample average is the average of all the pairwise covariances:

      $$\text{Var}\left[ \frac{1}{n} \sum_{j=1}^n X_j \right] = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \text{Cov}[X_i,X_j].$$
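The $n = 2$ case is easy to confirm by simulation. A minimal NumPy sketch, using a made-up pair of correlated samples:

```python
import numpy as np

rng = np.random.default_rng(3)

# Correlated pair: Y shares a component with X (illustrative choice).
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)

# Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)   # identical up to floating-point error

# The 2x2 covariance matrix: variances on the diagonal, Cov off it.
print(np.cov(x, y, ddof=0))
```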

Correlation

All definitions and results are available in Sections 11.1 and 11.2.

  1. The correlation between two random variables, $X$ and $Y$, is defined as the covariance of the standardized variables. It may be computed:

    $$\text{Corr}[X,Y] = \frac{\text{Cov}[X,Y]}{\text{SD}[X] \, \text{SD}[Y]}.$$

    • The correlation measures the strength of the association between $X$ and $Y$.

  2. Properties of correlation:

    • The correlation is unchanged by translations (adding constants) to the variables: $\text{Corr}[X + s, Y + t] = \text{Corr}[X,Y]$.

    • The correlation does not depend on the scale of each variable: $\text{Corr}[aX, bY] = \text{Corr}[X,Y]$ if $a > 0$ and $b > 0$.

    • The sign of the correlation indicates the sign of the association between two variables.

    • The correlation is zero if $X$ and $Y$ are independent. However, dependent variables may also be uncorrelated (have correlation equal to 0).

    • The correlation lies in $[-1,+1]$, and $|\text{Corr}[X,Y]| = 1$ if and only if $Y$ is a linear function of $X$.

  3. Correlation Interpretation:

    • The empirical correlation between a collection of sample pairs is the cosine of the angle between the vectors formed by the centered samples.

    • The correlation between two variables equals the slope of the best-fit line between the standardized variables (both facts are checked in the sketch below).
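Both interpretations can be checked directly on data. A short NumPy sketch with a made-up correlated sample:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000)
y = 0.7 * x + rng.normal(size=1_000)   # illustrative correlated pair

# Empirical correlation.
r = np.corrcoef(x, y)[0, 1]

# Cosine of the angle between the centered sample vectors.
x0, y0 = x - x.mean(), y - y.mean()
cosine = (x0 @ y0) / (np.linalg.norm(x0) * np.linalg.norm(y0))

# Slope of the least-squares line through the standardized variables.
zx, zy = x0 / x.std(), y0 / y.std()
slope = (zx @ zy) / (zx @ zx)

print(r, cosine, slope)   # all three agree
```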