Joint and Marginal Probabilities - Data 89 Course Notes

Section 1.3 established rules for “not” statements and “or” statements. We haven’t worked out what we should do for “if” statements and “and” statements.

This section is all about “and” statements.

And Statements and Joint Probabilities¶

What is the chance that two events happen simultaneously? For example, what’s the chance that, when I draw a card from a deck, it is both a face card and in a red suit?

We’ve seen that concatenating two events with an “or” statement produces a new event equal to the union of the component events, $A \text{ or } B \text{ happen}$ means $\omega \in A \cup B$ . Combining two events with an “and” has the opposite effect. Instead of producing a larger set that contains both of its component sets, combining two sets with an “and” produces a smaller set that is a subset of both its parents.

For example:

\{a,b,c,d\} \text{ and } \{c,d,e,f,g\} = \{c,d\}

(2)

Notice that the “and” operation selects for only the elements contained in both sets. On a Venn diagram, this corresponds to the region where the sets overlap, or, intersect. Accordingly, we call the operation that selects only the elements in both sets an intersection. The intersection of two sets is denotes $\cap$ .

We can now write joint probabilities three ways. It is important that you can read all three:

\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) = \text{Pr}(A \cap B)

(3)

Bounding Joint Probabilities¶

Often, we would like to compute a chance, but don’t know enough about our problem to calculate the actual value. Alternately, the necessary calculation may be too cumbersome to perform, or, may be overly precise. In these situations it can be helpful to bound the chances instead.

Statistical Testing Example

It is common practice to define statistical tests with controlled error rates. This means the test is designed to guarantee: “I will return a wrong answer at most $p$ percent of the time under some assumptions.” It is common to use a bound instead of an equality since we want tests that apply to a wide variety of systems and are not sensitive to assumptions that we can’t justify. As a result, the collection of assumptions are usually chosen to allow many probability models. Then most events don’t have a uniquely defined chance. Instead, there is an upper bound that holds for all chances that could be assigned consistent with the listed assumptions.

Let’s practice deriving bounds for joint probabilities.

An Upper Bound¶

In Section 1.1 we observed that expanding an event to include more outcomes cannot make the event less probable. You’ll prove this fact on your homework. For now, take it as a natural consequence of equating probabilities to proportions. Increasing the number of ways an event can occur cannot decrease its frequency in a series of trials. We used that idea to argue that applying a union never decreased the chance of an event.

Applying an intersection has the opposite effect. Instead of making an event more generic by allowing alternative outcomes, applying an intersection makes an event more specific. Thinking in terms of conditions, applying an “or” loosens the conditions that define an event while applying an “and” tightens those conditions.

Formally:

A \cap B \subset A \text{ and } A \cap B \subset B

(4)

since every element of $A \cap B$ is contained in both $A$ and $B$ . It follows that:

\text{Pr}(A, B) \leq \text{Pr}(A) \quad \text{ and } \quad \text{Pr}(A, B) \leq \text{Pr}(B)

(5)

Since the left hand sides of both inequalities above are identical, we can put the inequalities together. Suppose that $\text{Pr}(A) = 0.5$ and $\text{Pr}(B) = 0.2$ . If $\text{Pr}(A,B) \leq 0.5$ and is $\leq 0.2$ , then the smaller upper bound implies the larger upper bound. So, we might as well just say that $\text{Pr}(A,B) \leq 0.2$ . Therefore:

Example

Compare the chances that a 5 card hand, drawn from a thoroughly shuffled 52 card deck is a:

Flush: All five cards belong to the same suit. For example, all five cards are a heart.
Royal Flush: All five cards belong to the same suit and the five cards are an Ace, King, Queen, Jack, and Ten.

The Royal Flush is a more detailed event than Flush because it adds conditions. There are many more ways to draw a Flush than a Royal Flush. We can easily count the number of ways to draw a Royal Flush. There are only four! All hearts, all diamonds, all clubs, all spades.

How many ways are there to draw a Flush? Well, there are 4 suits to choose from. Then, given a suit, there are 13 options for the first card, 12 for the next, 11 for the next, 10 for the next, and 9 for the last. Any permutations of the same five cards make the same hand, so we’ve overcounted the number of distinct hands by a factor of 5! (see the Combinatorics Helper). Therefore, the number of flush hands is:

\frac{4 \times 13 \times 12 \times 11 \times 10 \times 9}{5!} = 4 \times \frac{13!}{5! 8!} = 4 \times \left(\begin{array}{c} 13 \\ 5 \end{array} \right)

(7)

The expression on the right means “13 choose 5”. It counts the number of distinct 5 card hands we can draw from the set of 13 that belong to any suit.

The number of five card hands from all 52 is, by the same logic, 52 choose 5. So, using probability by proportion:

\text{Pr}(\text{Flush}) = \frac{4 \times \left(\begin{array}{c} 13 \\ 5 \end{array} \right)}{\left(\begin{array}{c} 52 \\ 5 \end{array} \right)}, \quad \text{Pr}(\text{Royal Flush}) = \frac{4}{\left(\begin{array}{c} 52 \\ 5 \end{array} \right)}

(8)

That’s a huge difference, the chance of a Flush is 13 choose 5 times larger than the chance of a Royal Flush. That’s a factor of about 1,200!

We’ll come back to this observation later in the course.

A Lower Bound?¶

We derived an upper bound (see Section 1.3) and a lower bound on the chance of a union. One of the bounds followed the argument provided above. Expanding an event never makes it less likely, so the chance of a union is never less than the chance of its most likely parts. The other followed from an observation about frequencies. The probability of a union is never more than the sum of the probabilities of each of its parts.

Joint Probability Tables¶

It is often helpful to visualize joint probabilities with a table.

Suppose that we have two events, $A$ , and $B$ . For instance, if we roll a fair six-sided die, we could define:

A = \{\text{even}\} = \{2,4,6\}, \quad B = \{\leq 6\} = \{1,2,3,4,5\}

(9)

Every event, $E$ , and its complement, $E^c = \{\text{not } E\}$ partition $\Omega$ since no outcome can both satisfy, and not satisfy, an event, and, every outcome must either satisfy, or not satisfy, an event. So $A \cup A^c$ and $B \cup B^c$ both return $\Omega$ .

Combining these partitions produces a finer partition made up of four sets:

\begin{aligned} & A \text{ and } B = A \cap B, \quad & (\text{not } A) \text{ and } B = A^c \cap B \\ & A \text{ and } (\text{ not } B) = A \cap B^c, \quad & (\text{not } A) \text{ and } (\text{ not } B) = A^c \cap B^c \end{aligned}

(10)

We can write the same partition in terms of the outcomes in each set:

\begin{aligned} & \{2, 4\}, \quad & \{1,3,5\} \\ & \{6\}, \quad & \emptyset \end{aligned}

(11)

Since we are rolling a fair die, all outcomes are equally likely, so the probability of each event in the table is the number of ways it can happen divided by the number of possible rolls. Therefore:

\begin{aligned} & \text{Pr}(A,B) = 2/6, \quad & \text{Pr}(A^c,B) = 3/6 \\ & \text{Pr}(A,B^c) = 1/6 , \quad & \text{Pr}(A^c,B^c) = 0 \end{aligned}

(12)

If we are only interested in whether a roll is even, and whether it is less than 6, then the only four distinct outcomes are (even and less than 6), (odd and less than 6), (even and equal to 6), and (odd an equal to 6). So, we could replace $\Omega = \{1,2,3,4,5,6\}$ with:

\Omega' = \left\{ \begin{aligned} & \text{even and less than 6}, \quad & \text{odd and less than 6}, \\ & \text{even and equal to 6}, \quad & \text{odd and equal to 6} \end{aligned}\right\}

(13)

We now have an outcome space containing 4 outcomes that are naturally arranged into a table. Their chances were computed above. Their chances don’t match, so are just a categorical distribution over the four categories. It’s common practice to represent the categorical distribution over all intersections produced by a pair of events and their complements with a joint probability table.

In our example, the table is:

Event	$A$	not $A$
$B$	$\text{Pr}(A,B) = 2/6$	$\text{Pr}(A^c,B) = 3/6$
not $B$	$\text{Pr}(A,B^c) = 1/6$	$\text{Pr}(A^c,B^c) = 0$

We can generalize this idea to any pair of partitions. For example:

Event	$\{2,4\}$	$\{1,3,5\}$	$\{6\}$
$\{1,2,3\}$	$1/6$	$2/6$	0
$\{4,5\}$	$1/6$	$1/6$	0
$\{6\}$	0	$0$	$1/6$

Any joint probability table specifies a categorical distribution. Its entries are joint probabilities. Like any categorical probabilities, they must add to one. So, any table containing:

all nonnegative numbers
that add to one

could be a valid joint probability table. Fact (2.) is useful since it helps find joint probabilities from partially complete tables. For instance, if we know:

Event	$C$	not $C$
$D$	$1/6$	$2/6$
not $D$	$2/6$	?

Then, to ensure the entries add to one, the missing entry, $\text{Pr}(C^c, D^c)$ must equal $1 - (1/6 + 2/6 + 2/6) = 1/6$ . You should check that this is just an application of the complements rule.

Marginal Probabilities¶

The addition rule for unions relates the sum of the entries in a row or column of a joint probability table to the probability of the event defining the column. For instance:

\begin{aligned} \text{Pr}(A) & = \text{Pr}(\text{even}) \\ & = \text{Pr}(\text{(even and < 6) or (even and = 6)}) \\ & = \text{Pr}((A \cap B) \cup (A \cap B^c)) \\ & = \text{Pr}(A \cap B) + \text{Pr}(A \cap B^c) \\ & = \text{Pr}(A,B) + \text{Pr}(A,B^c) \end{aligned}

(14)

🛠️ Confirm for yourself that the addition rule applies for the union: $(A \cap B) \cup (A \cap B^c)$ . Check that the two parenthetical events are disjoint before proceeding.

Then, the probability of event $A$ , equals the sum of the joint probabilities in its column of the table:

Event	$A$
$B$	$\text{Pr}(A,B) = 2/6$
not $B$	$\text{Pr}(A,B^c) = 1/6$
$\text{ }$	$\text{Pr}(A) = 3/6$

We’ve dropped the column corresponding to $A^c$ from the table for this calculation since it is not needed. In plain language, the probability of the event $A$ is the chance $A$ and $B$ occur, plus the chance $A$ occurs and $B$ does not.

This example illustrates a general rule. The probability of any event can be represented as a sum of joint probabilities corresponding to a partition of the event. For instance:

Event	$\{2,4\}$
$\{1,2,3\}$	$1/6$
$\{4,5\}$	$1/6$
$\{6\}$	0
$\text{ }$	$\text{Pr}(\{2,4\}) = 2/6$

The probability in the bottom row is an example of a marginal probability.

Given a joint probability table, the marginal probabilities are the sums of the rows and columns of the table. You can remember the name marginal by thinking that the marginal probabilites live at the margin, or edge, of the table. In the original example we have four joint probabilities and four marginals:

\textbf{Joints:} = \begin{cases} & \text{Pr}(A,B) = p_{AB} \\ & \text{Pr}(A^c,B) = p_{A^cB} \\ & \text{Pr}(A,B^c) = p_{AB^c} \\ & \text{Pr}(A^c,B^c) = p_{A^cB^c} \\ \end{cases} \quad \textbf{Marginals:} = \begin{cases} & \text{Pr}(A) = p_{AB} + p_{AB^c} \\ & \text{Pr}(A^c) = p_{A^cB} + p_{A^cB^c} \\ & \text{Pr}(B) = ... \\ & \text{Pr}(B^c) = ... \\ \end{cases}

(15)

🛠️ To check your understanding, try to fill in the ... above.

Solutions

\textbf{Marginals:} = \begin{cases} & \text{Pr}(A) = p_{AB} + p_{AB^c} \\ & \text{Pr}(A^c) = p_{A^cB} + p_{A^cB^c} \\ & \text{Pr}(B) = p_{AB} + p_{A^cB} \\ & \text{Pr}(B^c) = p_{AB^c} + p_{A^cB^c} \\ \end{cases}

(16)

Here’s the completed joint probability table for our original example, including the marginals:

Event	$A$	not $A$	$B$ Marginals
$B$	$p_{AB} = 2/6$	$p_{A^cB} = 3/6$	$2/6 + 3/6 = 5/6$
not $B$	$p_{AB^c} = 1/6$	$p_{A^cB^c} = 0$	$2/6 + 0 = 1/6$
$A$ Marginals	$2/6 + 1/6 = 3/6$	$3/6 + 0 = 3/6$	1

Notice that:

So, if we add the marginals, then we have more rules that can be used to fill in missing entries. You can think about this like Sudoku. All the joint entries must add to one, all of the marginals along a boundary must add to one, and all the joint entries in a given row or column must add to the marginal for that row or column:

Event	$A$	not $A$	$B$ Marginals
$B$	$p_{AB}$	$p_{A^cB}$	$p_B = p_{AB} + p_{A^cB}$
not $B$	$p_{AB^c}$	$p_{A^cB^c} = 0$	$p_{B^c} = p_{AB^c} + p_{A^cB^c}$
$A$ Marginals	$p_{A} = p_{AB} + p_{AB^c}$	$p_{A^c} = p_{A^cB} + p_{A^cB^c}$	1

Here’s the full table for our larger example:

Event	$\{2,4\}$	$\{1,3,5\}$	$\{6\}$	Marginals
$\{1,2,3\}$	$1/6$	$2/6$	0	$3/6$
$\{4,5\}$	$1/6$	$1/6$	0	$2/6$
$\{6\}$	0	$0$	$1/6$	$1/6$
Marginals	$2/6$	$3/6$	$1/6$	1

The procedure we just used to compute marginal probabilities from joint probabilities is called marginalization. To marginalize:

You’ll use this strategy a lot in this class. It is often true that we want to find the probability of some event, the probability is hard to compute, but can be computed if we break down the event according to a list of ways it can happen. As long as we can compute the probability of each way it can occur, we can add those chance together to get the chance of the desired event.

Example

For instance, suppose that tomorrow’s weather obeys the joint probabilities:

Event	Rain	Clouds	Sun	Marginals
Cold	$2/10$	$3/10$	$1/10$	$6/10$
Warm	$1/10$	0	$1/10$	$2/10$
Hot	0	$0$	$2/10$	$2/10$
Marginals	$3/10$	$3/10$	$4/10$	1

Then, the chance it is warm is:

\begin{aligned} \text{Pr}(\text{Warm}) & = \text{Pr}(\text{Warm, Rain}) + \text{Pr}(\text{Warm, Clouds}) + \text{Pr}(\text{Warm, Sun}) \\ & = 1/10 + 0/10 + 1/10 = 2/10. \end{aligned}

(18)