
1.3 The Rules of Chance

Not all outcomes are equally likely, not all systems are symmetric or close to symmetric, and, while we can build outcome spaces without equally likely outcomes from outcome spaces with equally likely outcomes, it is rarely worthwhile to expand the natural description of outcomes into a detailed description in pursuit of equal likelihood. In many cases, the virtue of a probability model is its ability to match observed frequencies, and to derive consequences of those frequencies, without requiring a detailed explanation for the mechanism that produced them.

What we want is a more general theory that allows any outcome space, and any assignment of chance to events, such that the assignment of chance to events could be equated to frequencies of outcomes in a string of repeated trials, or to proportions in a more detailed symmetric model. Thankfully, many models can satisfy these requirements as long as they satisfy three simple rules. These rules are the heart of probability.

The Probability Axioms

A probability model is a procedure that returns an answer for any question of the kind, “What is the probability of the event $E$?” The procedure should provide an answer for any event $E$ in an outcome space. Since events are subsets of $\Omega$, the probability model must specify a rule that accepts sets as inputs and returns chances. Since chances are meant to model proportions, or frequencies, they are real numbers between zero and one. The function that accepts any subset of $\Omega$ and returns a real number representing its chance is the probability measure. A measure is any function that accepts all subsets of an outcome space $\Omega$ and returns a real number.

We’ve already fixed notation for the probability measure, since we defined $\text{Pr}(E)$ to be the probability of the event $E$. Therefore, $\text{Pr}$ is the probability measure. The probability model is the combination of the outcome space $\Omega$, the collection of all its subsets that we might ask about as events, and the probability measure $\text{Pr}$ that returns the chance of any event we ask about.

A probability model is valid if the measure satisfies three rules. These are Kolmogorov’s probability axioms:

  • Nonnegativity: $\text{Pr}(E) \geq 0$ for every event $E$.

  • Normalization: $\text{Pr}(\Omega) = 1$.

  • Additivity: if $A$ and $B$ are disjoint events, then $\text{Pr}(A \cup B) = \text{Pr}(A) + \text{Pr}(B)$.

The first two are not surprising, since chances are meant to model proportions or frequencies. The last is the most substantive. We observed that it was true for proportions (see Section 1.2).
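As a sketch (the code and the name `is_valid_measure` are illustrative, not part of the text), we can check Kolmogorov's axioms in Python for a finite model. When the measure of an event is defined by summing the chances of its outcomes, additivity holds automatically, so only nonnegativity and normalization need to be verified:

```python
def is_valid_measure(probs):
    """Check Kolmogorov's axioms for a finite outcome space.

    `probs` maps each outcome to its assigned chance. Since we will define
    the measure of an event as the sum over its outcomes, additivity is
    built in; we check nonnegativity and total probability one.
    """
    # Nonnegativity: every chance is at least zero.
    if any(p < 0 for p in probs.values()):
        return False
    # Normalization: the whole outcome space has probability one
    # (up to floating-point rounding).
    return abs(sum(probs.values()) - 1.0) < 1e-9

fair_die = {face: 1/6 for face in range(1, 7)}
print(is_valid_measure(fair_die))           # True
print(is_valid_measure({1: 0.7, 2: 0.5}))   # False: chances sum to 1.2
```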

Categorical Distributions

While the axioms provide some basic rules the measure $\text{Pr}(\cdot)$ must satisfy, they give no guidance on the actual choice of measure. The axioms alone cannot determine the chance of an event. To determine a chance we need a model.

We saw this same conceptual division in the previous chapter.

  • Assuming equally likely outcomes fixed a model since it assigned chances to individual outcomes. It did not give any direction on how those chances combine, or how to compute the probability of an event containing multiple outcomes.

  • Asserting a relationship between probability and long run frequency did not provide a rule for computing chances. It did give explicit direction on how chances combine, namely, chances for disjoint events add.

We put these rules together to build up probability as proportion. By assigning a chance to each outcome, then by adding chances together to find the probability of events, we saw that, if outcomes are equally likely, then the probability of an event is the fraction of all possible outcomes that satisfy the event definition. We can repeat the same process by applying the axioms to any rule that assigns a well-defined chance to every outcome.

A categorical distribution is a rule that assigns a chance to every distinct outcome in $\Omega$ when $\Omega$ is finite. Then, by the additivity axiom:

$$\text{Pr}(E) = \sum_{\omega \in E} \text{Pr}(\omega)$$

You should read the equation above as: the probability of any event ($E$) is the sum, over all the ways the event can happen ($\omega \in E$), of the probability of each way the event can happen ($\text{Pr}(\omega)$). In other words, the chance of an event is the total chance of all the outcomes contained in the event.

Categorical distributions are the most directly defined probability models; simply assign a chance to every outcome. The equally likely model we studied before is a uniform categorical distribution. It is uniform since it assigns the same chance to every outcome.
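A categorical distribution is equally direct to sketch in code. In this illustrative Python snippet (the helper name `pr` is ours), the distribution is a dictionary from outcomes to chances, and the additivity axiom becomes a sum over the outcomes in the event:

```python
def pr(event, dist):
    """Probability of an event under a categorical distribution.

    `dist` maps each outcome to its chance; by the additivity axiom,
    the chance of the event is the total chance of its outcomes.
    """
    return sum(dist[outcome] for outcome in event)

# The uniform categorical distribution for a fair six-sided die.
die = {face: 1/6 for face in range(1, 7)}

print(pr({2, 4, 6}, die))   # the chance of an even roll is 1/2
```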

Categorical distributions get their name from their application. They are often used to categorize and classify data.

Using Axioms

The Complement Rule

An axiom is a mathematical statement that is asserted as a basic premise for a theory. Many rich mathematical areas are built by first posing a short list of self-evident (or, at least, plausible) axioms. Once accepted, the axioms are used to prove the rest of the theory. Let’s practice this way of thinking using the probability axioms.

In Section 1.2 we observed that, when we equated probability to proportion, the complement rule held:

$$\text{Pr}(A^c) = 1 - \text{Pr}(A)$$

That is, the probability that an event does not occur equals one minus the probability that it does occur.

This rule might appear self-evident. For instance, it may feel instinctive that, if the probability that it rains tomorrow is 1/3, then the probability it doesn’t rain should be 2/3. Let’s show that we don’t need this rule as an additional fourth axiom. Instead, it is implied by the first three.
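The derivation takes one line. $A$ and its complement $A^c$ are disjoint, and together they make up all of $\Omega$, so additivity and the normalization $\text{Pr}(\Omega) = 1$ give:

$$\text{Pr}(A) + \text{Pr}(A^c) = \text{Pr}(A \cup A^c) = \text{Pr}(\Omega) = 1$$

Subtracting $\text{Pr}(A)$ from both sides yields the complement rule.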

Let’s add this rule to our list:

Bounding The Chance of a Union

Our union rule, additivity for disjoint events, looks different from the complement rule since it does not hold for all pairs of events. What happens if $A$ and $B$ are not disjoint?

An example suffices. What is the chance that a single roll of a fair die is even or is less than 4? The event that the roll is even contains the outcomes $\{2,4,6\}$. The event that it is less than 4 contains the outcomes $\{1,2,3\}$. So, the event that the roll is even or less than 4 contains the outcomes $\{1,2,3,4,6\}$. There are five of these outcomes so the desired probability is $5/6$.

What would have happened if we tried to apply our rule?

$$\text{Pr}(\text{even}) + \text{Pr}(\text{less than 4}) = \frac{3}{6} + \frac{3}{6} = 1 \neq \frac{5}{6}$$

In this case the sum of the individual probabilities is too large. It equals one, which is clearly wrong, since it is possible to roll a value that is neither even nor less than 4. The only such value is a 5, so $\text{Pr}(\text{even or less than 4}) = 1 - \text{Pr}(\text{odd and greater than or equal to 4}) = 1 - 1/6 = 5/6$.

The sum of the individual probabilities is too large since it counts an outcome twice. The outcome where the die lands on 2 appears in both events since 2 is both even and less than 4. In fact, $\{2\} = \{\text{even and less than 4}\} = \{2,4,6\} \cap \{1,2,3\}$. The sum of the probability that the die roll is even and the probability that the roll is less than four counts the outcomes that are even and less than four twice, once for each set. Therefore, it overcounts the intersection of the events:

$$\begin{aligned} \text{Pr}(A) + \text{Pr}(B) &= \text{Pr}(A \text{ but not } B) + 2 \times \text{Pr}(A \text{ and } B) + \text{Pr}(B \text{ but not } A) \\ &= \big(\text{Pr}(A \text{ but not } B) + \text{Pr}(A \text{ and } B) + \text{Pr}(B \text{ but not } A)\big) + \text{Pr}(A \text{ and } B) \\ &= \text{Pr}(A \text{ or } B) + \text{Pr}(A \text{ and } B) \end{aligned}$$

This is the general rule for unions:

$$\text{Pr}(A \text{ or } B) = \text{Pr}(A) + \text{Pr}(B) - \text{Pr}(A \text{ and } B)$$
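The general rule can be checked by direct enumeration on the die example. A sketch in Python (the names here are illustrative), using the equally likely model where an event's probability is the fraction of outcomes it contains:

```python
# Outcome space for a single roll of a fair six-sided die.
omega = set(range(1, 7))
even = {2, 4, 6}
less_than_4 = {1, 2, 3}

def pr_equally_likely(event):
    """Equally likely outcomes: probability is a fraction of the total."""
    return len(event) / len(omega)

# Left side: probability of the union, computed directly.
lhs = pr_equally_likely(even | less_than_4)
# Right side: inclusion-exclusion, subtracting the double-counted overlap.
rhs = (pr_equally_likely(even) + pr_equally_likely(less_than_4)
       - pr_equally_likely(even & less_than_4))

print(lhs, rhs)   # both equal 5/6
```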

Since all probabilities are nonnegative:

Pr(A or B)Pr(A)+Pr(B)\text{Pr}(A \text{ or } B) \leq \text{Pr}(A) + \text{Pr}(B)

with equality if and only if the intersection has probability zero, $\text{Pr}(A \text{ and } B) = 0$. In particular, equality holds whenever the events are mutually exclusive.

Let’s add this last rule to our list:

You’ll practice with this rule in discussion section and on your HW.

Summary

To summarize our rules, let’s append a column to the logic and sets table from Section 1.1: