
10.2 Conditional and Iterated Expectation

In the last chapter we saw that double integrals and double sums can be expanded as iterated integrals and iterated sums. Since expectations are weighted averages, expectations involving two or more random variables may be expanded as multiple integrals or multiple sums. In each case, converting a multiple integral into an iterated integral converts a single expectation into a pair of nested expectations.

This approach is very useful in applied problems. It breaks a problem involving two (or more) random variables into a sequence of problems, each involving only one variable at a time.

In order to understand nested expectations, we first need to understand expectations over a single variable while holding another variable fixed. An expectation given a fixed condition on some of the random inputs is a conditional expectation.

Conditional Expectation

Suppose that $X$ and $Y$ are jointly distributed random variables and $g(X,Y)$ is a scalar-valued function of $X$ and $Y$. Then, the conditional expectation of $g(X,Y)$ is the expected value of $g(X,Y)$ given some constraint on the values of $X$ and/or $Y$.

Applying the division rule (conditional equals joint over marginal; see Section 1.5 and Section 8.3):

$$\mathbb{E}_{Y|X = x}[g(x,Y)] = \begin{cases} \sum_{\text{all } y} g(x,y) \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} & \text{ if discrete} \\ \\ \int_{\text{all } y} g(x,y) \frac{f_{X,Y}(x,y)}{f_{X}(x)} \, dy & \text{ if continuous} \end{cases}$$
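The discrete case can be sketched directly in code. The small joint pmf below is a made-up example (its values are an assumption, not taken from the text); the computation follows the formula above with $g(x,Y) = Y$: restrict the joint pmf to the row $X = x$, divide by the marginal $\Pr(X = x)$, then take the weighted average over $y$.

```python
import numpy as np

# Hypothetical joint pmf of (X, Y); rows index x in {0, 1}, columns index y in {1, 2, 3}.
y_vals = np.array([1, 2, 3])
joint = np.array([[0.10, 0.20, 0.10],   # Pr(X=0, Y=y)
                  [0.15, 0.30, 0.15]])  # Pr(X=1, Y=y)

x = 1
marginal_x = joint[x].sum()          # Pr(X = x), summing the row over all y
cond_pmf = joint[x] / marginal_x     # Pr(Y = y | X = x): joint over marginal
cond_exp = np.sum(y_vals * cond_pmf) # E[Y | X = x]: weighted average over y
print(cond_exp)                      # → 2.0 for this pmf
```

Note that only the row for $X = x$ enters the computation; the rest of the joint pmf is irrelevant once we condition.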

Recall that all expectations are the center of mass of some distribution (see Section 4.1). So, we can visualize the conditional expectation of $Y$ as the center of mass of the conditional distribution of $Y$ given $X = x$. Since conditional distributions are proportional to cross-sections of joint distributions (see Section 8.3), we can imagine conditional expectations as centers of mass of cross-sections of a joint distribution.

The figure below shows an example joint density function as a heat map. The contours are level sets of the density. The solid red line shows $\bar{y}(x)$, the conditional expectation of $Y$ given $X = x$. The vertical dashed and dotted red lines show the range of possible $Y$ for $X = 0$ and $X = 1.5$. The conditional distributions $f_{Y|X = 0}$ and $f_{Y|X = 1.5}$ are shown in the panel to the right. Notice that the center of each conditional matches the $y$-coordinate where the solid red line intersects the corresponding vertical red line.

Conditional Expectation.

Run the code cell below for an interactive example. You can vary the value of $x$ and track how both the conditional distribution of $Y$ and its expectation vary as a function of $x$. Notice that the conditional density of $Y|X = x$ is proportional to the $y$-cross-section of the joint density shown in the left-hand panel.

from utils_cond_exp import show_conditional_expectation

show_conditional_expectation()

In both examples the conditional expectation of $Y$ given $X = x$ depends on the choice of $x$. This should not be surprising. The conditional distribution of $Y$ given $X = x$ is proportional to the $y$-cross-section at $x$. The $y$-cross-section varies depending on the choice of $x$, so the conditional expectation of $Y$ given $X = x$ may also vary with $x$.

Iterated Expectation

The law of iterated expectation expresses a joint expectation over both $X$ and $Y$ as a pair of nested expectations, first over $Y$ given $X$, then over $X$:

$$\mathbb{E}[g(X,Y)] = \mathbb{E}_{X}\left[\mathbb{E}_{Y|X}[g(X,Y)]\right]$$

As usual, we can choose whether to work over $X$ on the outside and $Y$ on the inside, or $Y$ on the outside and $X$ on the inside.

Loosely, you can remember this law as expressing a joint average as an average of conditional averages. Or, more succinctly, as an average of averages.
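We can check the average-of-averages idea by simulation. The model below is hypothetical (chosen for this sketch, not taken from the text): $X$ is uniform on $\{0, 1, 2\}$, and given $X = x$, $Y$ is normal with mean $2x + 1$. Then $\mathbb{E}[Y|X = x] = 2x + 1$, so iterated expectation gives $\mathbb{E}[Y] = 2\,\mathbb{E}[X] + 1 = 3$, and the raw sample mean of $Y$ should agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-stage model: X ~ Uniform{0, 1, 2}; given X = x, Y ~ Normal(2x + 1, 1).
# Iterated expectation: E[Y] = E[E[Y|X]] = E[2X + 1] = 2 * 1 + 1 = 3.
n = 200_000
x = rng.integers(0, 3, size=n)        # draw X first
y = rng.normal(2 * x + 1, 1.0)        # then draw Y conditionally on X

# The overall average of Y matches the average of the conditional averages.
print(y.mean())                       # close to 3 for large n
```

Equivalently, averaging the conditional means $2x + 1$ within each group of $x$ and then averaging across groups gives the same answer; that is the "average of averages" reading of the law.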

Examples

Like the other algebraic properties of expectation, applying iterated expectation can make it much easier to compute some joint expectations. Here are some example cases:

  1. Suppose that $X$ and $Y$ are jointly distributed, where $\mathbb{E}[X] = 2$ and $\bar{y}(x) = \mathbb{E}[Y|X = x] = 4x - 3$. What is $\mathbb{E}[Y]$?

  2. Suppose that $I$ and $S$ are drawn sequentially, where $I \sim \text{Bernoulli}(1/3)$ is an indicator variable. If $I = 0$, draw $S$ from a binomial on 100 trials with success probability $1/5$. If $I = 1$, draw $S$ from a binomial on 100 trials with success probability $3/5$. Then:

    $$S|I = i \sim \begin{cases} \text{Binomial}(100, 1/5) & \text{ if } i = 0 \\ \text{Binomial}(100, 3/5) & \text{ if } i = 1 \end{cases}$$

    What is $\mathbb{E}[S]$?

  3. Suppose that WGeometric(p)W \sim \text{Geometric}(p). What is E[W]\mathbb{E}[W]?

Independent Products

We can use iterated expectation to prove one last property of expectations. In Section 4.2 we worked out rules for simplifying expectations of sums. We can now simplify expectations of some (not all) products: