14.3 Expectation Reference

Expectations

For details, see Sections 4.1, 8.3, 10.2, and 13.3.

  1. The expected value of a random variable, $\mathbb{E}[X]$, is the weighted average of the possible values $x$, weighted by the PMF/PDF:

    $$\mathbb{E}[X] = \begin{cases} \sum_{\text{all } x} x \, \text{PMF}(x) & \text{ if discrete} \\ \int_{\text{all } x} x \, \text{PDF}(x) \, dx & \text{ if continuous} \end{cases}$$

    • The expected value is equivalent to the center of mass of the distribution.

    • Long-run sample averages converge to the expected value (demonstrated in the sketch after this list).

  2. The expected value of a function of a random variable, $\mathbb{E}[g(X)]$, is the weighted average of $g(x)$ over each $x$, weighted by the PMF/PDF:

    $$\mathbb{E}[g(X)] = \begin{cases} \sum_{\text{all } x} g(x) \, \text{PMF}(x) & \text{ if discrete} \\ \int_{\text{all } x} g(x) \, \text{PDF}(x) \, dx & \text{ if continuous} \end{cases}$$

    The expected value of a function of multiple random variables is defined analogously. For example:

    $$\mathbb{E}[g(X,Y)] = \begin{cases} \sum_{\text{all } x, y} g(x,y) \, \text{Pr}(X = x, Y = y) & \text{ if discrete} \\ \iint_{\text{all } x,y} g(x,y) \, f_{X,Y}(x,y) \, dx \, dy & \text{ if continuous} \end{cases}$$
  3. The conditional expectation of a random variable is its expected value when sampled from a conditional distribution. For example, if $X$ and $Y$ are jointly distributed continuous variables, then:

    $$\mathbb{E}_{Y|X = x}[g(Y)] = \int_{\text{all } y} g(y) \, f_{Y|X = x}(y) \, dy.$$
  4. The expected value is distinct from the:

    • Mode: the most likely outcome, or collection of outcomes that maximize the PMF/PDF.

    • Median: the “midpoint” value $x_*$ such that $\text{Pr}(X < x_*) = \text{Pr}(X > x_*)$.
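As a numerical check of these definitions, here is a minimal NumPy sketch (the four-point distribution is a made-up example) that computes $\mathbb{E}[X]$ as a PMF-weighted average and confirms that a long-run sample average converges to it:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small discrete distribution: possible values and their PMF.
values = np.array([1, 2, 3, 4])
pmf = np.array([0.1, 0.2, 0.3, 0.4])

# Expected value: the PMF-weighted average of the possible values.
expected = np.sum(values * pmf)            # 3.0

# A long-run sample average converges to the expected value.
samples = rng.choice(values, size=100_000, p=pmf)
print(expected, samples.mean())            # both close to 3.0
```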

Rules of Expectations

For details, see Sections 4.1, 4.2, 7.1, and 10.2.

  1. Expectations of key distributions:

    • Constants: $\mathbb{E}[c] = c$.

    • Indicators: if $X \sim \text{Bernoulli}(p)$, then $\mathbb{E}[X] = p$.

    • Symmetric: if $X$ is drawn symmetrically about $x_*$, then $\mathbb{E}[X] = x_*$.

  2. Linearity: $\mathbb{E}[a X + b] = a \mathbb{E}[X] + b$.

    • Remember, this rule only works for linear functions. If $g$ is a nonlinear function of $x$, then $\mathbb{E}[g(X)]$ need not equal $g(\mathbb{E}[X])$.

  3. Additivity: for any pair of random variables $X$ and $Y$, $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$.

  4. Jensen’s Inequality: if $g$ is a strictly convex function and $X$ is a random variable with nonzero variance, then:

    $$\mathbb{E}[g(X)] > g(\mathbb{E}[X]).$$
  5. Tail Sums and Integrals (verified numerically in the sketch after this list):

    • If $X$ is a count-valued (nonnegative, integer-valued) random variable, then $\mathbb{E}[X] = \sum_{x = 1}^{\infty} \text{Pr}(X \geq x)$.

    • If $X$ is a continuously distributed, nonnegative random variable, then $\mathbb{E}[X] = \int_{x = 0}^{\infty} \text{Pr}(X \geq x) \, dx$.

  6. Iterated Expectation: if $X$ and $Y$ are drawn jointly, then:

    $$\mathbb{E}_{X,Y}[g(X,Y)] = \mathbb{E}_X\left[ \mathbb{E}_{Y|X}[g(X,Y)] \right].$$

    In particular:

    $$\mathbb{E}[Y] = \mathbb{E}_X\left[ \mathbb{E}[Y|X] \right].$$
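Several of these rules are easy to verify by simulation. Here is a short NumPy sketch (the Poisson(3) sample is a stand-in distribution chosen for illustration) checking linearity, Jensen’s inequality, and the tail-sum identity:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.0, size=200_000)   # nonnegative, integer-valued

# Linearity: E[aX + b] = a E[X] + b.
a, b = 2.0, 5.0
print((a * x + b).mean(), a * x.mean() + b)

# Jensen's inequality for the strictly convex g(x) = x^2:
# E[X^2] > E[X]^2 whenever Var[X] > 0.
print((x**2).mean(), x.mean()**2)        # first value is strictly larger

# Tail sum: E[X] = sum over x >= 1 of Pr(X >= x).
tail = sum((x >= k).mean() for k in range(1, x.max() + 1))
print(x.mean(), tail)                    # identical on the sample
```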

Variance

For details, see Section 4.3.

  1. Given $\mathbb{E}[X] = \bar{x}$, the variance and standard deviation of a random variable are:

    $$\text{Var}[X] = \mathbb{E}[(X - \bar{x})^2], \quad \text{SD}[X] = \sqrt{\text{Var}[X]}$$

    • The standard deviation measures the breadth, spread, or width of the distribution.

  2. Properties of Variance:

    • $\text{Var}[X] \geq 0$

    • $\text{Var}[c] = 0$

    • $\text{Var}[X + b] = \text{Var}[X]$

    • $\text{Var}[a X] = a^2 \text{Var}[X]$

  3. To compute variances, we often use:

    $$\text{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$

    • The variance of a random variable is its expected square minus its squared expectation (see the sketch after this list).
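A quick NumPy check of the shortcut formula and the scaling rules, using an Exponential sample (a made-up example whose true variance is $4$):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=200_000)   # Var[X] = 4 in theory

# Shortcut formula: Var[X] = E[X^2] - E[X]^2.
print((x**2).mean() - x.mean()**2, x.var())    # identical estimates

# Scaling rules: Var[aX + b] = a^2 Var[X].
a, b = 3.0, 7.0
print(np.var(a * x + b), a**2 * np.var(x))     # agree exactly
```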

Covariance

All definitions and results are available in Sections 11.1 and 13.1.

  1. The covariance between the random variables $X$ and $Y$ is defined:

    $$\text{Cov}[X,Y] = \mathbb{E}[(X - \bar{x})(Y - \bar{y})]$$

    where $\bar{x} = \mathbb{E}[X]$ and $\bar{y} = \mathbb{E}[Y]$. The variables $X_0 = X - \bar{x}$ and $Y_0 = Y - \bar{y}$ are centered.

    It may be expanded as the expected product of the variables minus the product of their expectations:

    $$\text{Cov}[X,Y] = \mathbb{E}[X \times Y] - \mathbb{E}[X] \times \mathbb{E}[Y].$$
  2. Covariance Matrices: if $\{X_j\}_{j=1}^n$ is a collection of $n$ random variables, then the covariance matrix is the $n \times n$ array with $i,j$ entries $\text{Cov}[X_i,X_j]$.

  3. Properties of covariance:

    • The covariance is unchanged by translations (adding constants) to the variables: $\text{Cov}[X + s, Y + t] = \text{Cov}[X,Y]$.

    • The covariance does depend on the scale of each variable: $\text{Cov}[aX, bY] = ab \, \text{Cov}[X,Y]$.

    • The sign of the covariance indicates the sign of the association between two variables.

    • The covariance is zero if $X$ and $Y$ are independent. However, dependent variables may also have covariance equal to zero.

    • The covariance between a random variable and itself is the variance: $\text{Cov}[X,X] = \text{Var}[X]$.

    • The covariance between any random variable and a constant is zero.

  4. Variance of Sums and Sample Averages:

    • The variance of a sum of random variables is the sum of all the pairwise covariances:

      $$\begin{aligned} \text{Var}\left[ \sum_{j=1}^n X_j \right] & = \sum_{i=1}^n \sum_{j=1}^n \text{Cov}[X_i,X_j] \\ & = \sum_{j = 1}^n \text{Var}[X_j] + 2 \sum_{j=1}^{n} \sum_{i = j + 1}^n \text{Cov}[X_i,X_j]. \end{aligned}$$

      In the special case when $n = 2$ (verified numerically in the sketch after this list):

      $$\text{Var}[X + Y] = \text{Var}[X] + \text{Var}[Y] + 2 \, \text{Cov}[X,Y].$$

    • The variance of a sample average is the average of all the pairwise covariances:

      $$\text{Var}\left[ \frac{1}{n} \sum_{j=1}^n X_j \right] = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \text{Cov}[X_i,X_j].$$
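The $n = 2$ case is easy to confirm by simulation. A minimal NumPy sketch, using a made-up pair of correlated samples:

```python
import numpy as np

rng = np.random.default_rng(3)

# Correlated pair: Y shares a component with X (illustrative choice).
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)

# Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)   # identical up to floating-point error

# The 2x2 covariance matrix: variances on the diagonal, Cov off it.
print(np.cov(x, y, ddof=0))
```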

Correlation

All definitions and results are available in Sections 11.1 and 11.2.

  1. The correlation between two random variables, $X$ and $Y$, is defined as the covariance of the standardized variables. It may be computed:

    $$\text{Corr}[X,Y] = \frac{\text{Cov}[X,Y]}{\text{SD}[X] \, \text{SD}[Y]}.$$

    • The correlation measures the strength of the association between $X$ and $Y$.

  2. Properties of correlation:

    • The correlation is unchanged by translations (adding constants) to the variables: $\text{Corr}[X + s, Y + t] = \text{Corr}[X,Y]$.

    • The correlation does not depend on the scale of each variable: $\text{Corr}[aX, bY] = \text{Corr}[X,Y]$ if $a > 0$ and $b > 0$.

    • The sign of the correlation indicates the sign of the association between two variables.

    • The correlation is zero if $X$ and $Y$ are independent. However, dependent variables may also be uncorrelated (have correlation equal to 0).

    • The correlation lies in $[-1,+1]$, and $|\text{Corr}[X,Y]| = 1$ if and only if $Y$ is a linear function of $X$.

  3. Correlation Interpretation:

    • The empirical correlation between a collection of sample pairs is the cosine of the angle between the vectors formed by the centered samples.

    • The correlation between two variables equals the slope of the best-fit line between the standardized variables (both facts are checked in the sketch below).
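Both interpretations can be checked directly on data. A short NumPy sketch with a made-up correlated sample:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000)
y = 0.7 * x + rng.normal(size=1_000)   # illustrative correlated pair

# Empirical correlation.
r = np.corrcoef(x, y)[0, 1]

# Cosine of the angle between the centered sample vectors.
x0, y0 = x - x.mean(), y - y.mean()
cosine = (x0 @ y0) / (np.linalg.norm(x0) * np.linalg.norm(y0))

# Slope of the least-squares line through the standardized variables.
zx, zy = x0 / x.std(), y0 / y.std()
slope = (zx @ zy) / (zx @ zx)

print(r, cosine, slope)   # all three agree
```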