
1.4 Joint and Marginal Probabilities

Section 1.3 established rules for “not” statements and “or” statements. We haven’t worked out what we should do for “if” statements and “and” statements.

This section is all about “and” statements.

And Statements and Joint Probabilities

What is the chance that two events happen simultaneously? For example, what’s the chance that, when I draw a card from a deck, it is both a face card and in a red suit?

We’ve seen that concatenating two events with an “or” statement produces a new event equal to the union of the component events: $A \text{ or } B \text{ happen}$ means $\omega \in A \cup B$. Combining two events with an “and” has the opposite effect. Instead of producing a larger set that contains both of its component sets, combining two sets with an “and” produces a smaller set that is a subset of both its parents.

For example:

$$\{a,b,c,d\} \text{ and } \{c,d,e,f,g\} = \{c,d\}$$

Notice that the “and” operation selects for only the elements contained in both sets. On a Venn diagram, this corresponds to the region where the sets overlap, or intersect. Accordingly, we call the operation that selects only the elements in both sets an intersection. The intersection of two sets is denoted $\cap$.

We can now write joint probabilities three ways. It is important that you can read all three:

$$\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) = \text{Pr}(A \cap B)$$
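For instance, the opening question about cards can be settled by counting. Below is a minimal Python sketch (the deck representation and variable names are ours, not from the text) that enumerates the outcome space and computes the joint probability directly:

```python
from itertools import product

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

face = {"J", "Q", "K"}          # event A: the card is a face card
red = {"hearts", "diamonds"}    # event B: the card is in a red suit

# Pr(A and B) = (# outcomes in the intersection) / (# outcomes in the deck)
joint = sum(1 for rank, suit in deck if rank in face and suit in red) / len(deck)
print(joint)  # 6/52 ≈ 0.115
```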

Bounding Joint Probabilities

Often, we would like to compute a chance, but don’t know enough about our problem to calculate the actual value. Alternatively, the necessary calculation may be too cumbersome to perform, or may deliver more precision than we need. In these situations it can be helpful to bound the chances instead.

Let’s practice deriving bounds for joint probabilities.

An Upper Bound

In Section 1.1 we observed that expanding an event to include more outcomes cannot make the event less probable. You’ll prove this fact on your homework. For now, take it as a natural consequence of equating probabilities to proportions. Increasing the number of ways an event can occur cannot decrease its frequency in a series of trials. We used that idea to argue that applying a union never decreased the chance of an event.

Applying an intersection has the opposite effect. Instead of making an event more generic by allowing alternative outcomes, applying an intersection makes an event more specific. Thinking in terms of conditions, applying an “or” loosens the conditions that define an event while applying an “and” tightens those conditions.

Formally:

$$A \cap B \subset A \quad \text{ and } \quad A \cap B \subset B$$

since every element of $A \cap B$ is contained in both $A$ and $B$. It follows that:

$$\text{Pr}(A, B) \leq \text{Pr}(A) \quad \text{ and } \quad \text{Pr}(A, B) \leq \text{Pr}(B)$$

Since the left hand sides of both inequalities above are identical, we can put the inequalities together. Suppose that $\text{Pr}(A) = 0.5$ and $\text{Pr}(B) = 0.2$. If $\text{Pr}(A,B) \leq 0.5$ and $\text{Pr}(A,B) \leq 0.2$, then the smaller upper bound implies the larger one. So, we might as well just say that $\text{Pr}(A,B) \leq 0.2$. Therefore:

$$\text{Pr}(A,B) \leq \min\{\text{Pr}(A), \text{Pr}(B)\}$$

We’ll come back to this observation later in the course.
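As a quick check, here is a small Python sketch confirming the bound on the die events used later in this section (the example pairing is our own choice):

```python
from fractions import Fraction

omega = set(range(1, 7))   # fair six-sided die
A = {2, 4, 6}              # even
B = {1, 2, 3, 4, 5}        # less than 6

def pr(event):
    # Equally likely outcomes: probability = proportion of outcomes.
    return Fraction(len(event), len(omega))

# The joint probability never exceeds the smaller marginal.
assert pr(A & B) <= min(pr(A), pr(B))
print(pr(A & B), min(pr(A), pr(B)))   # 1/3 1/2
```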

A Lower Bound?

We derived an upper bound (see Section 1.3) and a lower bound on the chance of a union. One of the bounds followed the argument provided above. Expanding an event never makes it less likely, so the chance of a union is never less than the chance of its most likely part. The other followed from an observation about frequencies. The probability of a union is never more than the sum of the probabilities of each of its parts.
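The analogous question for intersections is worth pondering. One candidate answer, sketched here under the assumption that De Morgan’s law, $(A \cap B)^c = A^c \cup B^c$, and the union’s upper bound are available from the earlier sections:

$$\begin{aligned}
\text{Pr}(A,B) &= 1 - \text{Pr}(A^c \cup B^c) \\
&\geq 1 - \left( \text{Pr}(A^c) + \text{Pr}(B^c) \right) \\
&= \text{Pr}(A) + \text{Pr}(B) - 1
\end{aligned}$$

This bound is only informative when $\text{Pr}(A) + \text{Pr}(B) > 1$; otherwise the trivial bound $\text{Pr}(A,B) \geq 0$ is better.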

Joint Probability Tables

It is often helpful to visualize joint probabilities with a table.

Suppose that we have two events, $A$ and $B$. For instance, if we roll a fair six-sided die, we could define:

$$A = \{\text{even}\} = \{2,4,6\}, \quad B = \{< 6\} = \{1,2,3,4,5\}$$

Every event, $E$, and its complement, $E^c = \{\text{not } E\}$, partition $\Omega$, since no outcome can both satisfy and not satisfy an event, and every outcome must either satisfy or not satisfy an event. So $A \cup A^c$ and $B \cup B^c$ both return $\Omega$.

Combining these partitions produces a finer partition made up of four sets:

$$\begin{aligned}
& A \text{ and } B = A \cap B, \quad & (\text{not } A) \text{ and } B = A^c \cap B \\
& A \text{ and } (\text{not } B) = A \cap B^c, \quad & (\text{not } A) \text{ and } (\text{not } B) = A^c \cap B^c
\end{aligned}$$

We can write the same partition in terms of the outcomes in each set:

$$\begin{aligned}
& \{2, 4\}, \quad & \{1,3,5\} \\
& \{6\}, \quad & \emptyset
\end{aligned}$$

Since we are rolling a fair die, all outcomes are equally likely, so the probability of each event in the table is the number of ways it can happen divided by the number of possible rolls. Therefore:

$$\begin{aligned}
& \text{Pr}(A,B) = 2/6, \quad & \text{Pr}(A^c,B) = 3/6 \\
& \text{Pr}(A,B^c) = 1/6, \quad & \text{Pr}(A^c,B^c) = 0
\end{aligned}$$

If we are only interested in whether a roll is even, and whether it is less than 6, then the only four distinct outcomes are (even and less than 6), (odd and less than 6), (even and equal to 6), and (odd and equal to 6). So, we could replace $\Omega = \{1,2,3,4,5,6\}$ with:

$$\Omega' = \left\{ \begin{aligned}
& \text{even and less than 6}, \quad & \text{odd and less than 6}, \\
& \text{even and equal to 6}, \quad & \text{odd and equal to 6}
\end{aligned} \right\}$$

We now have an outcome space containing 4 outcomes that are naturally arranged into a table. Their chances were computed above. Since those chances don’t all match, the four outcomes are not equally likely; their chances instead form a categorical distribution over the four categories. It’s common practice to represent the categorical distribution over all intersections produced by a pair of events and their complements with a joint probability table.

In our example, the table is:

| Event | $A$ | not $A$ |
| --- | --- | --- |
| $B$ | $\text{Pr}(A,B) = 2/6$ | $\text{Pr}(A^c,B) = 3/6$ |
| not $B$ | $\text{Pr}(A,B^c) = 1/6$ | $\text{Pr}(A^c,B^c) = 0$ |
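You can verify these entries by enumeration. A minimal Python sketch (our own construction, following the die example above):

```python
from fractions import Fraction

omega = set(range(1, 7))              # fair die
A = {w for w in omega if w % 2 == 0}  # even: {2, 4, 6}
B = {w for w in omega if w < 6}       # less than 6: {1, 2, 3, 4, 5}
Ac, Bc = omega - A, omega - B         # complements

def pr(event):
    return Fraction(len(event), len(omega))

# The four joint probabilities in the table:
table = {("A", "B"): pr(A & B),       # 2/6
         ("Ac", "B"): pr(Ac & B),     # 3/6
         ("A", "Bc"): pr(A & Bc),     # 1/6
         ("Ac", "Bc"): pr(Ac & Bc)}   # 0
assert sum(table.values()) == 1       # a joint table's entries add to one
```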

We can generalize this idea to any pair of partitions. For example:

| Event | $\{2,4\}$ | $\{1,3,5\}$ | $\{6\}$ |
| --- | --- | --- | --- |
| $\{1,2,3\}$ | $1/6$ | $2/6$ | $0$ |
| $\{4,5\}$ | $1/6$ | $1/6$ | $0$ |
| $\{6\}$ | $0$ | $0$ | $1/6$ |

Any joint probability table specifies a categorical distribution. Its entries are joint probabilities. Like any categorical probabilities, they must add to one. So, any table containing:

  1. all nonnegative numbers

  2. that add to one

could be a valid joint probability table. Fact (2.) is useful since it helps find joint probabilities from partially complete tables. For instance, if we know:

| Event | $C$ | not $C$ |
| --- | --- | --- |
| $D$ | $1/6$ | $2/6$ |
| not $D$ | $2/6$ | ? |

Then, to ensure the entries add to one, the missing entry, $\text{Pr}(C^c, D^c)$, must equal $1 - (1/6 + 2/6 + 2/6) = 1/6$. You should check that this is just an application of the complements rule.
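In code, this bookkeeping is a one-liner. A sketch (the variable names are ours):

```python
from fractions import Fraction

# Known entries: Pr(C,D), Pr(Cc,D), Pr(C,Dc)
known = [Fraction(1, 6), Fraction(2, 6), Fraction(2, 6)]

# All four joint probabilities must add to one.
missing = 1 - sum(known)
print(missing)  # 1/6, i.e. Pr(Cc,Dc)
```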

Marginal Probabilities

The addition rule for unions relates the sum of the entries in a row or column of a joint probability table to the probability of the event defining that row or column. For instance:

$$\begin{aligned}
\text{Pr}(A) & = \text{Pr}(\text{even}) \\
& = \text{Pr}((\text{even and } < 6) \text{ or } (\text{even and } = 6)) \\
& = \text{Pr}((A \cap B) \cup (A \cap B^c)) \\
& = \text{Pr}(A \cap B) + \text{Pr}(A \cap B^c) \\
& = \text{Pr}(A,B) + \text{Pr}(A,B^c)
\end{aligned}$$

🛠️ Confirm for yourself that the addition rule applies for the union $(A \cap B) \cup (A \cap B^c)$. Check that the two parenthetical events are disjoint before proceeding.

Then, the probability of event $A$ equals the sum of the joint probabilities in its column of the table:

| Event | $A$ |
| --- | --- |
| $B$ | $\text{Pr}(A,B) = 2/6$ |
| not $B$ | $\text{Pr}(A,B^c) = 1/6$ |
|   | $\text{Pr}(A) = 3/6$ |

We’ve dropped the column corresponding to $A^c$ from the table for this calculation since it is not needed. In plain language, the probability of the event $A$ is the chance $A$ and $B$ occur, plus the chance $A$ occurs and $B$ does not.

This example illustrates a general rule. The probability of any event can be represented as a sum of joint probabilities corresponding to a partition of the event. For instance:

| Event | $\{2,4\}$ |
| --- | --- |
| $\{1,2,3\}$ | $1/6$ |
| $\{4,5\}$ | $1/6$ |
| $\{6\}$ | $0$ |
|   | $\text{Pr}(\{2,4\}) = 2/6$ |

The probability in the bottom row is an example of a marginal probability.

Given a joint probability table, the marginal probabilities are the sums of the rows and columns of the table. You can remember the name marginal by thinking that the marginal probabilities live at the margin, or edge, of the table. In the original example we have four joint probabilities and four marginals:

$$\textbf{Joints:} \begin{cases}
\text{Pr}(A,B) = p_{AB} \\
\text{Pr}(A^c,B) = p_{A^cB} \\
\text{Pr}(A,B^c) = p_{AB^c} \\
\text{Pr}(A^c,B^c) = p_{A^cB^c}
\end{cases} \quad \textbf{Marginals:} \begin{cases}
\text{Pr}(A) = p_{AB} + p_{AB^c} \\
\text{Pr}(A^c) = p_{A^cB} + p_{A^cB^c} \\
\text{Pr}(B) = ... \\
\text{Pr}(B^c) = ...
\end{cases}$$

🛠️ To check your understanding, try to fill in the ... above.

Here’s the completed joint probability table for our original example, including the marginals:

| Event | $A$ | not $A$ | $B$ Marginals |
| --- | --- | --- | --- |
| $B$ | $p_{AB} = 2/6$ | $p_{A^cB} = 3/6$ | $2/6 + 3/6 = 5/6$ |
| not $B$ | $p_{AB^c} = 1/6$ | $p_{A^cB^c} = 0$ | $1/6 + 0 = 1/6$ |
| $A$ Marginals | $2/6 + 1/6 = 3/6$ | $3/6 + 0 = 3/6$ | $1$ |
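If the joint table is stored as an array, each marginal is just a row or column sum. Here is a sketch using numpy (the layout, rows for $B$ and columns for $A$, is our choice):

```python
import numpy as np

# Joint table: rows are (B, not B), columns are (A, not A).
joint = np.array([[2/6, 3/6],
                  [1/6, 0.0]])

b_marginals = joint.sum(axis=1)  # row sums:    [5/6, 1/6]
a_marginals = joint.sum(axis=0)  # column sums: [3/6, 3/6]
assert np.isclose(joint.sum(), 1.0)
print(a_marginals, b_marginals)
```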

Notice that the marginals themselves add to one along each edge of the table: $5/6 + 1/6 = 1$ for the $B$ marginals and $3/6 + 3/6 = 1$ for the $A$ marginals.

So, if we add the marginals, then we have more rules that can be used to fill in missing entries. You can think about this like Sudoku. All the joint entries must add to one, all of the marginals along a boundary must add to one, and all the joint entries in a given row or column must add to the marginal for that row or column:

| Event | $A$ | not $A$ | $B$ Marginals |
| --- | --- | --- | --- |
| $B$ | $p_{AB}$ | $p_{A^cB}$ | $p_B = p_{AB} + p_{A^cB}$ |
| not $B$ | $p_{AB^c}$ | $p_{A^cB^c}$ | $p_{B^c} = p_{AB^c} + p_{A^cB^c}$ |
| $A$ Marginals | $p_A = p_{AB} + p_{AB^c}$ | $p_{A^c} = p_{A^cB} + p_{A^cB^c}$ | $1$ |

Here’s the full table for our larger example:

| Event | $\{2,4\}$ | $\{1,3,5\}$ | $\{6\}$ | Marginals |
| --- | --- | --- | --- | --- |
| $\{1,2,3\}$ | $1/6$ | $2/6$ | $0$ | $3/6$ |
| $\{4,5\}$ | $1/6$ | $1/6$ | $0$ | $2/6$ |
| $\{6\}$ | $0$ | $0$ | $1/6$ | $1/6$ |
| Marginals | $2/6$ | $3/6$ | $1/6$ | $1$ |

The procedure we just used to compute marginal probabilities from joint probabilities is called marginalization. To marginalize:

  1. break the event whose probability you want into disjoint pieces, one for each part of a partition,

  2. compute the joint probability of each piece,

  3. add the joint probabilities together.

You’ll use this strategy a lot in this class. It is often true that we want to find the probability of some event, the probability is hard to compute directly, but it can be computed if we break the event down according to a list of ways it can happen. As long as we can compute the probability of each way it can occur, we can add those chances together to get the chance of the desired event.
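Here is that strategy in code: a sketch (our own, reusing the die example) that computes a probability by summing over a partition:

```python
from fractions import Fraction

omega = set(range(1, 7))               # fair die
partition = [{1, 2, 3}, {4, 5}, {6}]   # disjoint ways the roll can land
target = {2, 4}                        # event whose chance we want

def pr(event):
    return Fraction(len(event), len(omega))

# Marginalization: split the target event across the partition, then add.
total = sum(pr(target & piece) for piece in partition)
assert total == pr(target) == Fraction(2, 6)
print(total)  # 1/3
```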