1.2 Probability as Proportion

Models of Chance

Equally Likely Outcomes

What’s the chance a coin lands heads?

Given no other information about the coin, or the tosser, you might answer 1/2. If asked why, you might say that the coin has a 1/2 chance of landing heads because it has 2 sides.

You could then point to other examples. There is a 1/6 chance a die lands on any of its six sides. Given a 52 card deck there is a 1/52 chance of pulling any card after shuffling thoroughly. There is a 1/38 chance that a roulette ball lands in any pocket on the roulette wheel since there are 38 pockets on the wheel.

This is the oldest and most widely accepted model of chance. It describes, or approximates, many of the physical processes we use to introduce, produce, and learn about randomness. Many games of chance are built around physical processes where the chance of any outcome is simply one divided by the total number of possible outcomes, $|\Omega|$.

While familiar and successful, this model can’t quite be true. Take the deck of cards. If a player doesn’t shuffle enough times, then cards starting near the top can’t move to the bottom without cutting the deck. Cards starting at the bottom can’t move to the top. So, if we only shuffle once, and draw off the top, there aren’t really 52 possible options.

Moreover, you could reasonably argue that the cards starting close to the top are more likely to be drawn than those starting farther from the top. Shuffling again mixes the cards to balance the chances, but what is true for one shuffle should also be true for two shuffles, or three, or four. In fact, it strains belief that after any number of shuffles the deck is exactly mixed so that every card is exactly equally likely to appear on the top no matter where it started. Instead, we usually use 1/52 since, after many shuffles, 1/52 is a good approximation, and the approximation gets so close that, after a while, it is essentially true.

The card shuffling example raises a key assumption hidden in the claim that the probability of an outcome equals one divided by the number of possible outcomes: this model assumes all outcomes are equally likely. It turns out that the statements:

  1. every outcome has probability $1/|\Omega|$, and

  2. all outcomes are equally likely

are equivalent when there are finitely many possible outcomes. The second statement is easier to reason with, so let’s think about it a bit more.

There are a few common reasons to accept the claim that all outcomes are equally likely:

The second argument applies when we don’t really know how to assign chances to outcomes, so we should work with the simplest plausible model until we collect enough evidence, or information, to reject it. This is a common line of reasoning in hypothesis testing. The second argument may also be adopted as a simplifying ideal.

The last argument is often the best, and explains many of the examples that satisfy the first criterion. Cards are distinguished by the images painted on their surfaces. While these images differ, they change the physical properties of the cards in such minute ways that the cards should all behave the same when shuffled. Similarly, the slight differences in the images engraved on the sides of a coin, or cut into the sides of a die, are so small that they shouldn’t have much impact on how the coin or die rotates in the air, bounces, rolls, spins, and ultimately lands. The last argument is a symmetry argument. When outcomes are evidently asymmetric, as when we shuffle poorly and record the initial position of cards in the deck, then we shouldn’t assume equally likely outcomes.

A Thought Experiment

To test these ideas, consider the following thought experiment.

I offer to play you in a game involving a die, and show you a six-sided die I brought from home. I claim that the die is fair (all sides are equally likely). You are unsure, so ask to test the die first. You toss it ten times and it lands on the side labelled “4” nine out of the ten tosses. Would you still believe that the die is fair?

You would be within your rights to pause and contest my claim. A fair die landing on a specific side in nine of ten tosses seems absurd. It’s possible, but it would be very unlikely. Indeed, the probability a fair die lands on a specific side in nine of ten tosses is $10 \times (1/6)^9 \times 5/6 = 50/60,466,176 \approx 0.0000008$. That’s a little under 1 in a million. So, you have pretty strong statistical evidence that the die is biased toward the side labelled “4”. Most professional statisticians would consider this event unlikely enough to reject the claim that the die is fair.
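If you want to double-check this arithmetic yourself, the binomial calculation is short. Here is a minimal sketch in Python, using only the standard library (the variable name is just for illustration):

```python
from math import comb

# Chance a fair die shows a specific face exactly 9 times in 10 tosses:
# choose which 9 tosses show the face, times (1/6)^9 for those tosses,
# times 5/6 for the one remaining toss.
p = comb(10, 9) * (1 / 6) ** 9 * (5 / 6)
print(p)  # about 8.3e-07, a little under one in a million
```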

Nevertheless, I could persist. “The die is fair”, I claim. “One in a million is small, sure, but it isn’t zero. Think about how many people have played with dice in California this year. There are about 40 million Californians. If one in every four played with dice this year, then around eight of them should have seen this exact event. Moreover, it’s not enough just to know that a specific event is unlikely, since most events, spelled out in detail, are very unlikely. The sequence 1334621114 looks totally normal while the sequence 4444444442 looks suspicious, but, if the die is fair, then both would have chance $(1/6)^{10} = 1/60,466,176 \approx 0.00000002$. They’re equally unlikely even though the first looks typical while the second looks atypical. In sum, coincidences happen (someone wins the lottery) and happen constantly.”

How could we resolve our dispute? Think about how you would resolve it before opening the drop down below.

The last option illustrates the advantage of the symmetry justification. The argument that outcomes are equally likely because they are indistinguishable to the process allows deduction. It establishes equal likelihood as the consequence of an alternate claim. If we accept the premise that the outcomes are indistinguishable to the process, then we must accept the consequence that they are equally likely. By establishing this chain from premise to conclusion we can shift our argument from probabilities to characteristics of outcomes that could influence the behavior of the process. If we can show that all aspects of the outcomes that influence the process are identical, then we can reach consensus that the outcomes are equally likely.

Frequency Measures Chance

The first option, just keep rolling, illustrates the second basic model of chance. It is the model preferred by most statisticians. If the underlying outcomes are not symmetric, and we have enough evidence to question the simplest model (equal likelihood), then we can’t simply set the probability of an outcome to $1/|\Omega|$. To work out the probabilities we could either try to derive them from some other premises that we are confident in, or try to measure them. The first approach is deductive: it derives chances from some alternate claim. The second is inductive. It measures chances by relating them to an experimental procedure:

The ratio of the number of times the event happened to the total number of trials ought to reflect the chance the event occurs in any individual trial. We call the ratio of the number of times an event occurs to the number of trials the frequency of the event.

For instance, if a coin is fair, then about half of all tosses should be heads and about half should be tails. Since each toss is random, the exact frequency of heads may differ from 1/2, and must differ from 1/2 when the total number of tosses is odd. For example, on a single toss, the exact frequency can only be 0 or 1. However, if we toss the coin many, many times, the variability in the frequency should decrease. It’s plausible that we see only heads in 4 tosses. It’s very unlikely that we’ll see only heads in 100 tosses if the coin is fair.

This suggests an empirical definition for chance: the probability of an event is the value that the frequency of the event settles toward as the number of trials grows.

We’ll make this more formal later in the course. The relation between the number of trials, variability in the observed frequency, and underlying chance, is the subject of the first and most fundamental result in probability, the Law of Large Numbers. This result is fundamental since it establishes a hypothetical procedure that could measure chances objectively.
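To see the idea in action, here is a small simulation sketch in Python (the fair-coin assumption and the particular numbers of tosses are illustrative choices, not part of the definition):

```python
import random

def heads_frequency(num_tosses: int) -> float:
    """Toss a fair coin num_tosses times and return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# The frequency bounces around for small numbers of tosses,
# but settles near 1/2 as the number of tosses grows.
for n in [10, 100, 10_000, 1_000_000]:
    print(n, heads_frequency(n))
```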

Probability as Proportion

We’ve now seen two models of chance:

  1. the equally likely outcomes model, in which every outcome has probability $1/|\Omega|$, and

  2. the frequency model, in which the probability of an event is the long run frequency of the event over many repeated trials.

The first model tells us directly how to compute the probability of any specific outcome. It doesn’t tell us how to compute the probability of an arbitrary event from the probability of each outcome. The second model is more helpful since it explains how probabilities of events should change when we vary the definition of the event. In short, if probabilities equal long run frequencies, then probabilities must obey the same algebra rules as long run frequencies. So, the first model asserts specific probabilities for outcomes, while the second asserts specific rules we can use to manipulate chances, no matter the chances assigned to outcomes.

Consider the second definition. Under it, the chance of any event must behave in the same way as the fraction of all trials in which the event occurs in a long sequence of trials. Whatever holds for such fractions must therefore also hold for probabilities.

How should the probabilities of events combine?

Suppose that $E$ and $F$ are disjoint events. By definition, disjoint events are nonoverlapping sets of outcomes. This means that the events cannot occur simultaneously. Any trial will produce an outcome that is either in $E$, in $F$, or in neither. So, the number of trials landing in $E$ or $F$ will equal the number of trials in $E$ plus the number of trials in $F$. It follows that the frequency (fraction of all trials) of the event $G = E \text{ or } F$ must equal the frequency of the event $E$ plus the frequency of the event $F$.
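This counting identity is exact in any sequence of trials, not just approximately true in the long run. The sketch below checks it for one illustrative pair of disjoint events on a die, $E = \{\text{the roll is 1 or 2}\}$ and $F = \{\text{the roll is 6}\}$ (the events and the number of trials are arbitrary choices):

```python
import random

trials = 100_000
rolls = [random.randint(1, 6) for _ in range(trials)]

count_E = sum(r in (1, 2) for r in rolls)          # trials where E occurs
count_F = sum(r == 6 for r in rolls)               # trials where F occurs
count_E_or_F = sum(r in (1, 2, 6) for r in rolls)  # trials where E or F occurs

# E and F share no outcomes, so the counts, and hence the frequencies, add exactly.
print(count_E_or_F == count_E + count_F)                         # True
print(count_E_or_F / trials, count_E / trials + count_F / trials)
```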

So, if probabilities are to behave as frequencies, then whenever $E$ and $F$ are disjoint:

$$\text{Pr}(E \text{ or } F) = \text{Pr}(E) + \text{Pr}(F)$$

This disjoint union rule is our first algebra rule for chances. We can use it to complete our probability model for equally likely outcomes.

This is our first real model of probability. It equates probability to the fraction of all outcomes contained in the event. It also provides a formula for computing the probability of an event. If all outcomes are equally likely and there are finitely many:

$$\text{Pr}(E) = \frac{|E|}{|\Omega|} = \frac{\text{number of outcomes in } E}{\text{number of possible outcomes}}$$

Examples

Let’s practice this approach.

Permutations

Suppose that I have three cards labeled $a$, $b$, and $c$. The cards are otherwise identical. We shuffle the deck, then draw the cards in order from the top, without replacing any cards. We shuffle thoroughly, so each card is equally likely to appear in any location in the deck.

The first thing to do is to write down the outcome space. We’ve seen it already. It consists of all distinct ways in which we can order the three cards. Since the cards all have a unique label, the outcome space is the set of all permutations of the labels $a$, $b$, and $c$:

$$\Omega = \{abc, acb, bac, bca, cab, cba\}$$

Next, count the number of possible outcomes. This will be the denominator each time we compute a chance:

$$|\Omega| = 6$$

Finally, given an event $E$, count the number of ways the event can happen. This will be the numerator. For example, if the event $A$ is all outcomes where $a$ occurs first, then $|A| = 2$. Therefore $\text{Pr}(A) = 2/6 = 1/3$.
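Since the outcome space is tiny, we can also let the computer list and count the outcomes. A minimal sketch in Python using itertools (the event checked is the event $A$ from the example above):

```python
from itertools import permutations

# The outcome space: every ordering of the three labeled cards.
omega = [''.join(p) for p in permutations('abc')]
print(omega)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# Event A: the card labeled 'a' appears first.
A = [outcome for outcome in omega if outcome[0] == 'a']
print(len(A), len(omega), len(A) / len(omega))  # 2 6 0.333...
```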

We can repeat this process for any event. 🛠️ Complete each example below:

| Event | Verbal Description | Subset | Size of Subset | Chance |
| --- | --- | --- | --- | --- |
| $A$ | $a$ appears first | $\{abc, acb\}$ | 2 | 2/6 |
| $B$ | $a$ and $b$ are next to each other | $\{abc, bac, cab, cba\}$ | | |
| $C$ | the letters are in reverse alphabetical order | $\{cba\}$ | | |
| $D$ | $a$ does not appear | $\emptyset$ | | |
| $E$ | $b$ is either first, second, or third | $\Omega$ | | |
| $F$ | the letters form a word that means ‘taxi’ | $\{cab\}$ | | |

Notice that the probability of the empty set is the probability that an impossible event occurs. If an event is impossible, then it never occurs, so its chance must be zero.

Notice also that the probability of every event is at least as large as the probability of any more detailed event. For example, the probability that $a$ appears first is larger than the probability that $a$ appears first and $b$ appears second. Or, the probability that $a$ and $b$ are next to each other is greater than the probability that $a$ and $b$ appear next to each other in the order $ab$. This is an important rule to keep in mind since it gives us a strategy for bounding probabilities. Adding detail to the description of an event never makes the event more likely, so the probability of a detailed event is, at most, the probability of a less detailed description of the event.

Let’s add these rules to our growing list of probability rules:

  • The probability of the impossible event is zero: $\text{Pr}(\emptyset) = 0$.

  • If $E \subseteq F$, then $\text{Pr}(E) \leq \text{Pr}(F)$: adding detail to an event never increases its probability.

Poker Hands

Suppose, now, that we are playing a card game with a standard 52 card deck. If we draw off the top of a thoroughly shuffled deck we can model any sequence of draws by:

  • treating an outcome as a specific order of all 52 cards in the stack,

  • treating an event as a specific statement about the order of the cards in the stack, and

  • assuming that all orderings of the 52 cards are equally likely.

Once again, we are working with an outcome space $\Omega$ that contains all permutations of a list of $n$ distinguishable objects. In the previous example we had 3 objects. Now we have 52. These examples differ only in that the number of possible permutations of 52 cards is enormously large, far too large to write out $\Omega$ explicitly. So, we’ll have to get better at counting. We will need to learn to count the sizes of sets without simply listing all the members of the set.

First, how big is $\Omega$?

It’s hard to think about all 52 cards at once, so, imagine you are drawing cards off the top, one at a time. There are 52 options for the first card. Once we’ve drawn the first there are 51 remaining options for the second. Once we’ve drawn the second there are 50 remaining options for the third. The process continues. Each time the total number of options multiplies. There are $52 \times 51$ options for the first two cards. There are $52 \times 51 \times 50$ options for the first three cards. Repeating the pattern:

$$|\Omega| = 52 \times 51 \times 50 \times 49 \times \cdots \times 1 = \prod_{j=0}^{51} (52 - j) = 52!$$

Here, $\prod$ means “take the product of” all the terms appearing inside the product symbol, sweeping over all values of the index $j$. The product sign works like the summation symbol $\sum$.

Notice, this rule also works in our previous example. There, $|\Omega| = 3! = 3 \times 2 \times 1 = 6$.
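To get a feel for how large $52!$ is, you can compute it directly; this short sketch uses Python’s standard math.factorial:

```python
import math

print(math.factorial(3))   # 6, matching the three-card example
print(math.factorial(52))  # a 68-digit integer, roughly 8.07 x 10**67
```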

Now let’s find the chance of an event. What is the chance that:

  1. $AA = \{\text{I draw two aces in my first two draws}\}$?

  2. $AS = \{\text{I draw an ace then a spade in my first two draws}\}$?

To find the probabilities, we need the size of each set.
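Whatever counts you work out, it is easy to sanity-check them by simulation. Here is a hedged sketch in Python (the encoding of the deck as rank and suit numbers is just one convenient choice):

```python
import random

# Encode the deck as (rank, suit) pairs: ranks 1..13 with 1 = ace, suits 0..3 with 0 = spades.
deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]

trials = 200_000
aa_count = 0  # first two cards are both aces
as_count = 0  # first card is an ace, second card is a spade

for _ in range(trials):
    random.shuffle(deck)
    (rank1, suit1), (rank2, suit2) = deck[0], deck[1]
    if rank1 == 1 and rank2 == 1:
        aa_count += 1
    if rank1 == 1 and suit2 == 0:
        as_count += 1

print("estimated Pr(AA):", aa_count / trials)  # compare with (4/52) * (3/51)
print("estimated Pr(AS):", as_count / trials)
```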

Pairs of Dice

Suppose that we roll two fair dice. Then the outcome space $\Omega$ is all possible pairs of numbers between 1 and 6. Since there are six options for each roll, and the rolls don’t influence each other, we have:

$$|\Omega| = 6 \times 6 = 36.$$

What is the chance that the first roll is even and the second roll is less than 5? There are 3 even values for the first roll and 4 values less than 5 for the second, so $3 \times 4 = 12$ of the 36 equally likely pairs qualify, giving a chance of 12/36 = 1/3.

There is a nice alternate way to think about this chance. Notice that:

$$\begin{aligned}\text{Pr}(\text{the first roll is even and the second roll is less than 5}) & = 12/36 \\ & = (3/6) \times (4/6) \\ & = \text{Pr}(\text{even}) \times \text{Pr}(\text{less than 5}) \end{aligned}$$
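A quick enumeration of the 36 pairs confirms the factorization; this is a minimal sketch, with counts named after the events in the calculation above:

```python
# All 36 equally likely pairs (first roll, second roll).
pairs = [(i, j) for i in range(1, 7) for j in range(1, 7)]

both = sum(1 for i, j in pairs if i % 2 == 0 and j < 5)  # first even AND second < 5
even = sum(1 for i, j in pairs if i % 2 == 0)            # first roll even
less_than_5 = sum(1 for i, j in pairs if j < 5)          # second roll less than 5

print(both / 36)                         # 12/36 = 0.333...
print((even / 36) * (less_than_5 / 36))  # (18/36) * (24/36) = (3/6) * (4/6) = 0.333...
```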

This example suggests another probability rule: $\text{Pr}(A \text{ and } B) = \text{Pr}(A \cap B) = \text{Pr}(A) \times \text{Pr}(B)$. While intuitive, this rule is not strictly true, and will fail unless we are careful. Do not apply it blindly.

Take the two ace example above. The chance of drawing an ace on any individual draw is $4/52 = 1/13$. So, if we used the rule above we would have computed $\text{Pr}(AA) = (1/13) \times (1/13) = 1/13^2$. Instead, we found $\text{Pr}(AA) = (4/52) \times (3/51) = (1/13) \times (3/51) < (1/13) \times (1/13)$. So, the rule cannot always be true.

The key difference in these examples is how the events relate to one another. In the dice rolling example the outcome of the first roll has no influence on the second roll. In the card example, the outcome of the first draw influences the second draw since, if we draw an ace, there are fewer aces remaining in the deck.

Notice that, even though all pairs of rolls are equally likely, all sums of pairs are not. There are 6 ways the rolls can add to 7, but there is only 1 way they can add to 2. So, the probability that we see a sum $S$ equal to $s \in [2, 12]$ depends on $s$.
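Tallying the sums of all 36 pairs makes the non-uniformity plain; here is a short sketch using Python’s collections.Counter:

```python
from collections import Counter

# For each possible sum, count how many of the 36 equally likely pairs produce it.
sums = Counter(i + j for i in range(1, 7) for j in range(1, 7))
for s in range(2, 13):
    print(s, sums[s], sums[s] / 36)  # 7 occurs 6 ways; 2 and 12 occur only 1 way each
```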

This is an important point. Even if all outcomes are equally likely, all events are not.

Consider the biased die example introduced as a thought experiment. Even though every specific sequence of ten rolls is equally likely, there are very few sequences of 10 rolls in which the total number of fours is 9, while there are many sequences in which the side labelled “4” appears $1/6 \times 10 \approx 2$ times.

Rules

Through example, we’ve seen some rules that will help us break probability calculations down into simpler pieces. Instead of always computing the probability of an event by counting the ways it can happen, and dividing by the total number of possible outcomes, we can break probabilities down using the following rules:

  • If all outcomes are equally likely and $\Omega$ is finite, then $\text{Pr}(E) = |E|/|\Omega|$.

  • If $E$ and $F$ are disjoint, then $\text{Pr}(E \text{ or } F) = \text{Pr}(E) + \text{Pr}(F)$.

  • $\text{Pr}(\emptyset) = 0$.

  • If $E \subseteq F$, then $\text{Pr}(E) \leq \text{Pr}(F)$.

  • If the outcome determining $A$ has no influence on the outcome determining $B$, then $\text{Pr}(A \text{ and } B) = \text{Pr}(A) \times \text{Pr}(B)$; this can fail when the events influence one another, so apply it with care.

These rules are helpful since they allow us to manipulate our questions. If you can’t answer a question as asked, try to express the event according to one of these logical operations applied to events that are easier to work with.

In the next section we will start from these rules, and show that any probability model can be defined consistently as long as it satisfies the disjoint union rule.