
2.1 Random Variables and Distributions

Consider, for instance, the shoe size of a randomly polled student. The shoe size is a number that is determined by the outcome, in this case, the choice of student. The formal definition generalizes the idea that, in many situations, we summarize a random outcome with a measurement, or with some summary number.

It is standard practice to denote:

  • A random variable with a capital letter, e.g. $X$

  • If we want to emphasize that the random variable is determined by some randomly selected outcome $\omega$, then we might write it as a function $X(\omega)$

  • A possible value of the random variable is $x$

Distribution Functions

We can represent the table in the example discussed above with a bar plot:

PMF for the sum of two rolls.

This is an example of a probability histogram. The horizontal axis indicates possible values, $s$, of the random variable $S$. The vertical axis represents probability. The height of the bar at $S = s$ denotes $\text{Pr}(S = s)$.

Notice: this is the first time we’ve been able to actually plot all the probabilities of each possible outcome. That’s because generic outcome spaces have no natural organization or order. In order to plot something, we need to be able to order an input axis. In many of our previous examples, there was no natural way to choose which outcomes to list before which other outcomes. Random variables are just randomly chosen numbers, drawn from some set of possible numbers. Since numbers are ordered, we can actually plot a list of values that determine the chance of any statement about the random variable. In other words, we can define functions which assign chances to values, and that determine the chance of any other statement or event regarding the variable. These are distributions.

The function that returns the height of each bar is an example of a distribution function.

Distribution functions are to random variables as probability measures are to events. A distribution function is a function that accepts a possible value of a random variable, and returns the probability of a standardized question about that value.

The most natural choice is to plot the chance of each possible value:

The histogram shown above is a probability mass function, or PMF. The heights of the histogram correspond to the PMF: $\text{Pr}(S = s)$ as a function of $s$.
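As a quick concrete check, the PMF for the sum of two fair six-sided dice can be tabulated by counting the 36 equally likely outcomes. This is a sketch; the name `pmf` is illustrative, not standard notation in any library:

```python
from fractions import Fraction
from collections import Counter
from itertools import product

# Each of the 36 ordered pairs (a, b) is equally likely; count how
# many pairs produce each possible sum s = a + b.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF: Pr(S = s) = (number of pairs summing to s) / 36
pmf = {s: Fraction(n, 36) for s, n in counts.items()}

print(pmf[7])             # 1/6: six of the 36 outcomes sum to 7
print(sum(pmf.values()))  # 1: the bar heights add to one
```

Using `Fraction` keeps every probability exact, so the heights visibly sum to one with no floating-point error.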

Note: the use of the word “mass” in the PMF might seem odd. It’s a reference to the common analogy that probability acts like a collection of masses assigned to objects, where all the masses add to one. We’ll see the reason to adopt this odd analogy in Section 2.3 when we consider continuous random variables, which don’t have a useful PMF, but are characterized by a notion of density.

The last example listed above is an example of a cumulative probability. It is cumulative since it is a sum of chances for sequential values of the random variable. Probabilities of this kind are also associated with a standard distribution function: the cumulative distribution function, or CDF.

The CDF is assigned a standard notation:

$$F_X(x) = \text{Pr}(X \leq x).$$

The subscript $X$ means, for the random variable $X$, the argument is an upper bound, and the value returned is the chance that $X$ is less than or equal to the upper bound. We’ll use that notation interchangeably with the more transparent notation:

$$\text{CDF}(x) = \text{Pr}(X \leq x)$$

and add a subscript when it is unclear which random variable is of interest.

In other words, the CDF is the running sum of the values of the PMF.

Here’s the CDF for the sum of two rolls:

CDF for the sum of two rolls.
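The running-sum relationship can be sketched in code. Assuming the same two-dice setup as above, accumulating the PMF heights in order of the possible sums reproduces the CDF:

```python
from fractions import Fraction
from collections import Counter
from itertools import accumulate, product

# PMF of the sum of two fair six-sided dice
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
values = sorted(counts)  # possible sums: 2, 3, ..., 12

# CDF(s) = Pr(S <= s): the running sum of the PMF heights
pmf_heights = [Fraction(counts[s], 36) for s in values]
cdf = dict(zip(values, accumulate(pmf_heights)))

print(cdf[7])   # 7/12: Pr(S <= 7) = (1+2+3+4+5+6)/36 = 21/36
print(cdf[12])  # 1: the running sum ends at one
```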

Differences in CDF values return the probability that a random variable lands in any interval. For instance, the chance that $S$ is between 6 and 11 is:

$$\text{Pr}(S \in \{6,7,8,9,10,11\}) = \text{CDF}(11) - \text{CDF}(5)$$

since subtracting off the CDF evaluated at 5 will remove from the sum any chances contributed by $S = 2$, $S = 3$, $S = 4$, and $S = 5$.
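The difference-of-CDFs computation above can be verified directly. This is a sketch for the two-dice sum; the helper `cdf` is illustrative, not a library function:

```python
from fractions import Fraction
from collections import Counter
from itertools import product

# Outcome counts for the sum of two fair six-sided dice
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

def cdf(x):
    # Pr(S <= x): accumulate the chances of every sum at or below x
    return Fraction(sum(n for s, n in counts.items() if s <= x), 36)

# Pr(S in {6,...,11}) as a difference of CDF values
print(cdf(11) - cdf(5))                                    # 25/36
# ...which matches summing the PMF heights over the interval directly
print(Fraction(sum(counts[s] for s in range(6, 12)), 36))  # 25/36
```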

We can also use the CDF to find the chance that a random variable is greater than a lower bound by applying the complement rule:

$$\text{Pr}(X > x) = 1 - \text{Pr}(X \leq x) = 1 - \text{CDF}(x)$$
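The complement rule can be checked the same way. For instance, the chance that the two-dice sum exceeds 9 (again, the `cdf` helper is an illustrative sketch):

```python
from fractions import Fraction
from collections import Counter
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

def cdf(x):
    # Pr(S <= x) for the sum of two fair six-sided dice
    return Fraction(sum(n for s, n in counts.items() if s <= x), 36)

# Complement rule: Pr(S > 9) = 1 - CDF(9)
print(1 - cdf(9))                                          # 1/6
# ...which matches adding the chances of S = 10, 11, and 12
print(Fraction(counts[10] + counts[11] + counts[12], 36))  # 1/6
```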

Since we can use the CDF to find the probability that a random variable falls at or below any upper bound, between any two bounds, or above any lower bound, we can use the CDF to compute the chance of any event regarding a random variable. So, like the PMF, if we know the CDF, then we know every detail needed to compute chances. In other words, the PMF and the CDF each fully specify a probability model.