Random Variables and Distributions - Data 89 Course Notes

Consider, for instance, the shoe size of a randomly polled student. The shoe size is a number determined by the outcome, in this case the choice of student. The formal definition generalizes the idea that, in many situations, we summarize a random outcome with a measurement or some other summary number.

It is standard practice to denote:

A random variable with a capital letter, e.g. $X$
If we want to emphasize that the random variable is determined by some randomly selected outcome $\omega$ then we might write it as a function $X(\omega)$
A possible value of the random variable with the corresponding lowercase letter, e.g. $x$

The support is to the outcome space as random variables are to randomly generated outcomes. It is just the set of all possible values. For instance, if I toss a coin three times, and record the number of heads, $N$ , then the support of $N$ is the set $\{0,1,2,3\}$ .
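As a small sketch of the coin example, the support can be found by listing every outcome and collecting the distinct values the random variable returns. The names `outcomes`, `num_heads`, and `support` are illustrative choices, not notation from these notes.

```python
from itertools import product

# Outcome space: every sequence of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

def num_heads(outcome):
    """The random variable N: maps an outcome to its number of heads."""
    return outcome.count("H")

# The support is the set of distinct values N can return.
support = sorted({num_heads(w) for w in outcomes})
print(len(outcomes), support)  # prints: 8 [0, 1, 2, 3]
```

There are eight outcomes but only four possible values, since several outcomes share the same number of heads.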

Generally we will define random variables by either:

Describing the process that produces them. In this case the support, and distribution of the random variable, are derived as consequences of the process.
By directly fixing the support, and the chances assigned to different values of the random variable.

We will practice going from story/process to explicit definition via support/list of chances. For now, it is enough to know that, as for any random object, it is always a good idea to first ask, what are the possible values the random variable could return?

Example

Suppose that you roll two fair dice, and the dice are distinguishable. Then there are 36 possible joint outcomes, and the joint events table is:

Roll	1	2	3	4	5	6
1	(1,1)	(1,2)	(1,3)	(1,4)	(1,5)	(1,6)
2	(2,1)	(2,2)	(2,3)	(2,4)	(2,5)	(2,6)
3	(3,1)	(3,2)	(3,3)	(3,4)	(3,5)	(3,6)
4	(4,1)	(4,2)	(4,3)	(4,4)	(4,5)	(4,6)
5	(5,1)	(5,2)	(5,3)	(5,4)	(5,5)	(5,6)
6	(6,1)	(6,2)	(6,3)	(6,4)	(6,5)	(6,6)

All 36 possible outcomes $\omega$ are equally likely since the dice are fair.

Now suppose that, as is true for many games, you are interested in the sum of the rolls. Let $S(\cdot)$ denote the function that accepts an outcome $\omega$ and returns the associated sum of rolls.

Anytime you consider a random variable you should first specify its support. The smallest sum comes from rolling two ones, which gives 2. The largest comes from rolling two sixes, which gives 12. So, $S \in \{2,3,...,11,12\}$ .

Let’s fill in the table, replacing the outcomes, $\omega$ with the sum of the rolls:

Roll	1	2	3	4	5	6
1	2	3	4	5	6	7
2	3	4	5	6	7	8
3	4	5	6	7	8	9
4	5	6	7	8	9	10
5	6	7	8	9	10	11
6	7	8	9	10	11	12
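The table of sums can be reproduced with a few lines of code. This is a minimal sketch, assuming we represent each outcome as an ordered pair; `omega`, `S`, and `support` are illustrative names.

```python
from itertools import product

faces = range(1, 7)

# Outcome space: ordered pairs (first roll, second roll).
omega = list(product(faces, repeat=2))

def S(w):
    """The random variable S: maps a pair of rolls to their sum."""
    return w[0] + w[1]

# Distinct values that S can return.
support = sorted({S(w) for w in omega})

# Print the table of sums, one row per value of the first roll.
print("Roll", *faces, sep="\t")
for first in faces:
    print(first, *[S((first, second)) for second in faces], sep="\t")
```

Here `support` comes out as the integers from 2 through 12, since the smallest pair is (1,1) and the largest is (6,6).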

Notice that, even though all pairs of rolls were equally likely, the number of ways the pairs can add up to some value $s \in \{2,3,..., 11,12\}$ depends on $s$ .

What’s the chance that $S = 5$ ?

To find the chance, use probability by proportion. First, isolate all pairs of rolls that add to five. The associated collection is a level set of the function $S(\omega)$ . It is the collection $E_5 = \{\text{all } \omega \text{ such that } S(\omega) = 5\}$ . I’ve highlighted that set below:

Roll	1	2	3	4	5	6
1	.	.	.	5	.	.
2	.	.	5	.	.	.
3	.	5	.	.	.	.
4	5	.	.	.	.	.
5	.	.	.	.	.	.
6	.	.	.	.	.	.

There are four pairs of rolls that add to 5 (four outcomes in the set $E_5$ ) so:

\text{Pr}(S = 5) = \frac{|E_5|}{|\Omega|} = \frac{4}{36} = \frac{1}{9}.

(1)
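The same count can be checked by filtering the outcome space for the level set and dividing by the total number of outcomes. This is a sketch under the equally-likely assumption; `level_set` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs of rolls.
omega = list(product(range(1, 7), repeat=2))

def level_set(s):
    """All outcomes whose rolls add to s."""
    return [w for w in omega if sum(w) == s]

E_5 = level_set(5)

# Probability by proportion: size of the event over size of the outcome space.
pr_5 = Fraction(len(E_5), len(omega))
print(E_5, pr_5)  # prints: [(1, 4), (2, 3), (3, 2), (4, 1)] 1/9
```

Calling `level_set` with any other value of $s$ returns the corresponding set of pairs.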

We could repeat the same process for a different value. For instance, what’s the chance $S = 10$ ?

Again, isolate the corresponding level set, and count its size. That is, count the number of ways a pair of rolls can add to 10:

Roll	1	2	3	4	5	6
1	.	.	.	.	.	.
2	.	.	.	.	.	.
3	.	.	.	.	.	.
4	.	.	.	.	.	10
5	.	.	.	.	10	.
6	.	.	.	10	.	.

There are three pairs of rolls that add to 10 (three outcomes in the set $E_{10}$ ) so:

\text{Pr}(S = 10) = \frac{|E_{10}|}{|\Omega|} = \frac{3}{36} = \frac{1}{12}.

(2)

Repeating this process for each possible value of $s$ gives:

Value $s$	2	3	4	5	6	7	8	9	10	11	12
Chance	1/36	2/36	3/36	4/36	5/36	6/36	5/36	4/36	3/36	2/36	1/36
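Repeating the count for every value is easy to automate. The sketch below tallies how many pairs produce each sum and divides by 36; the names `counts` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

# Number of pairs that produce each sum.
counts = Counter(a + b for a, b in omega)

# Chance of each value, by proportion.
pmf = {s: Fraction(counts[s], len(omega)) for s in sorted(counts)}

for s in pmf:
    # Print unreduced 36ths so the output matches the table above.
    print(f"{s:>2}  {counts[s]}/36")
```

`Fraction` keeps the arithmetic exact, so the chances add to exactly one.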

Even though all pairs are equally likely, not all values of the random variable are equally likely. We are twice as likely to see $S = 7$ as $S = 10$ , and six times as likely to see $S = 7$ as $S = 2$ or $S = 12$ . The middle values are more likely since there are more ways to pick a pair that adds to 6, 7, or 8 than to an extreme value like 2 or 12.

Distribution Functions

We can represent the table in the example discussed above with a bar plot:

This is an example of a probability histogram. The horizontal axis indicates possible values, $s$ , of the random variable $S$ . The vertical axis represents probability. The height of the bar at $S = s$ denotes $\text{Pr}(S = s)$ .
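A rough text version of the probability histogram can be drawn without any plotting library. In this sketch each `#` stands for a chance of 1/36, so the bar length at $s$ is proportional to $\text{Pr}(S = s)$. The layout is an arbitrary choice made for illustration.

```python
from collections import Counter
from itertools import product

# Number of pairs of rolls that produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# One line per possible value: the value, a bar, and the chance in 36ths.
lines = [f"{s:>2} | {'#' * counts[s]} {counts[s]}/36" for s in sorted(counts)]
print("\n".join(lines))
```

The bars grow up to $s = 7$ and then shrink symmetrically.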

Notice: this is the first time we’ve been able to actually plot all the probabilities of each possible outcome. That’s because generic outcome spaces have no natural organization or order. In order to plot something, we need to be able to order an input axis. In many of our previous examples, there was no natural way to choose which outcomes to list before which other outcomes. Random variables are just randomly chosen numbers, drawn from some set of possible numbers. Since numbers are ordered, we can actually plot a list of values that determine the chance of any statement about the random variable. In other words, we can define functions which assign chances to values, and that determine the chance of any other statement or event regarding the variable. These are distributions.

The function that returns the height of each bar is an example of a distribution function.

Distribution functions are to random variables as probability measures are to events. A distribution function is a function that accepts a possible value of a random variable, and returns the probability of a standardized question about that value.

The most natural choice is to plot the chance of each possible value:

The histogram shown above is a probability mass function, or PMF. The heights of the histogram correspond to the PMF: $\text{Pr}(S = s)$ as a function of $s$ .

Note: the use of the word “mass” in the PMF might seem odd. It’s a reference to the common analogy that probability acts like a collection of masses assigned to objects, where all the masses add to one. We’ll see the reason to adopt this odd analogy in Section 2.3 when we consider continuous random variables, which don’t have a useful PMF, but are characterized by a notion of density.

Calculating Chances from a PMF

Given a probability mass function (PMF) we can calculate the chances of events by summing the values of the PMF over all possible values of the random variable that satisfy the event. This is just an application of the additivity axiom from Section 1.3. For example:

What is the chance the sum of two rolls is even and less than 7?

\begin{aligned} \text{Pr}(S \text{ even and < 7}) &= \text{Pr}(S = 2 \text{ or } S = 4 \text{ or } S = 6) \\& = \text{Pr}(S = 2) + \text{Pr}(S = 4) + \text{Pr}(S = 6) \\& = \frac{1}{36}+ \frac{3}{36} + \frac{5}{36} \\ & = \frac{9}{36} = \frac{1}{4}. \end{aligned}

(4)

What is the chance the sum of two rolls is greater than 10?

\begin{aligned} \text{Pr}(S > 10) &= \text{Pr}(S = 11 \text{ or } S = 12) \\& = \text{Pr}(S = 11) + \text{Pr}(S = 12) \\& = \frac{2}{36}+ \frac{1}{36}\\ & = \frac{3}{36} = \frac{1}{12}. \end{aligned}

(5)

What is the chance the sum of two rolls is less than or equal to 5?

\begin{aligned} \text{Pr}(S \leq 5) &= \text{Pr}(S = 2 \text{ or } S = 3 \text{ or } S = 4 \text{ or } S = 5) \\& = \text{Pr}(S = 2) + \text{Pr}(S = 3) + \text{Pr}(S = 4) + \text{Pr}(S = 5) \\& = \frac{1}{36}+ \frac{2}{36} + \frac{3}{36} + \frac{4}{36}\\ & = \frac{10}{36} = \frac{5}{18}. \end{aligned}

(6)
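The three calculations above all follow one recipe: sum the PMF over the values that satisfy the event. A minimal sketch, where `chance` is a hypothetical helper that takes the event as a true/false test on the value:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(n, 36) for s, n in counts.items()}

def chance(event):
    """Sum the PMF over every possible value that satisfies `event`."""
    return sum(p for s, p in pmf.items() if event(s))

even_and_below_7 = chance(lambda s: s % 2 == 0 and s < 7)
above_10 = chance(lambda s: s > 10)
at_most_5 = chance(lambda s: s <= 5)
print(even_and_below_7, above_10, at_most_5)  # prints: 1/4 1/12 5/18
```

Additivity applies here because distinct values of $S$ are disjoint events.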

The last example listed above is an example of a cumulative probability. It is cumulative since it is a sum of chances for sequential values of the random variable. Probabilities of this kind are also associated with a standard distribution function, the cumulative distribution function, or CDF.

The CDF is assigned a standard notation:

F_X(x) = \text{Pr}(X \leq x).

(7)

The subscript $X$ indicates which random variable the function describes. The argument $x$ is an upper bound, and the value returned is the chance that $X$ is less than or equal to that bound. We’ll use that notation interchangeably with the more transparent notation:

\text{CDF}(x) = \text{Pr}(X \leq x)

(8)

and add a subscript when it is unclear which random variable is of interest.

In other words, the CDF is the running sum of the values of the PMF.
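The running-sum description translates directly into code. The sketch below builds the CDF for the sum of two dice by accumulating the PMF over the support in increasing order; `support`, `pmf`, and `cdf` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
support = sorted(counts)

# PMF values listed in increasing order of s.
pmf = [Fraction(counts[s], 36) for s in support]

# CDF(s) is the running total of the PMF up to and including s.
cdf = dict(zip(support, accumulate(pmf)))

for s in support:
    print(f"{s:>2}  {cdf[s]}")
```

The final entry, `cdf[12]`, equals one because every possible value is at most 12.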

Here’s the CDF for the sum of two rolls:

Differences in CDF values return the probability that a random variable lands in any interval. For instance, the chance that $S$ is between 6 and 11 is:

\text{Pr}(S \in \{6,7,8,9,10,11\}) = \text{CDF}(11) - \text{CDF}(5)

(10)

since subtracting off the CDF evaluated at 5 removes from the sum the chances contributed by $S = 2$ , $S = 3$ , $S = 4$ , and $S = 5$ .

We can also use the CDF to find the chance that a random variable is greater than a lower bound by applying the complement rule:

\text{Pr}(X > x) = 1 - \text{Pr}(X \leq x) = 1 - \text{CDF}(x)

(11)
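Both uses of the CDF, differences for intervals and the complement rule for lower bounds, can be checked numerically for the sum of two dice. This is a sketch; `cdf` here is a hypothetical helper that sums chances directly rather than a library function.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

def cdf(x):
    """Pr(S <= x): add up the chances of every possible value at or below x."""
    return sum(Fraction(n, 36) for s, n in counts.items() if s <= x)

# Interval: Pr(6 <= S <= 11) = CDF(11) - CDF(5).
between_6_and_11 = cdf(11) - cdf(5)

# Complement rule: Pr(S > 10) = 1 - CDF(10).
above_10 = 1 - cdf(10)

print(between_6_and_11, above_10)  # prints: 25/36 1/12
```

The second result agrees with the chance of a sum greater than 10 found earlier by adding $\text{Pr}(S = 11)$ and $\text{Pr}(S = 12)$.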

Since we can use the CDF to find the probability that a random variable is contained beneath any upper bound, between any two bounds, or above any lower bound, we can use the CDF to compute the chance of any event statement regarding a random variable. So, like the PMF, if we know the CDF, then we know every detail needed to compute chances. In other words, the PMF and the CDF both fully specify a probability model.

Solution

The value of the PMF at $x$ equals the jump in the CDF at $x$: the CDF at $x$ minus the CDF at the nearest possible value below $x$. If the possible values of $X$ are integers then $\text{Pr}(X = x) = \text{CDF}(x) - \text{CDF}(x - 1)$ .

So, given a plot of the CDF, the value of the PMF is the difference in height of the neighboring bars. Rapid changes in the CDF correspond to large PMF values. Where the CDF changes slowly, the PMF is small.
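To see the differencing at work, the sketch below starts from the CDF of the sum of two dice, entered by hand in 36ths, and recovers the PMF. The helper `pmf_from_cdf` is a hypothetical name, and it treats the CDF as zero below the smallest possible value.

```python
from fractions import Fraction

# CDF of the sum of two fair dice, as counts out of 36 (running totals of the PMF).
cdf_counts = {2: 1, 3: 3, 4: 6, 5: 10, 6: 15, 7: 21,
              8: 26, 9: 30, 10: 33, 11: 35, 12: 36}
cdf = {s: Fraction(n, 36) for s, n in cdf_counts.items()}

def pmf_from_cdf(x):
    """CDF(x) - CDF(x - 1), with the CDF taken to be 0 below the support."""
    return cdf[x] - cdf.get(x - 1, Fraction(0))

recovered = {s: pmf_from_cdf(s) for s in cdf}
print(recovered[7], recovered[2])  # prints: 1/6 1/36
```

The largest jump in the CDF is at 7, which is where the PMF peaks.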

Interactive Example

Let’s explore the relationship between the PMF and CDF with a live code demo. To run the code, follow the instructions below.

Click on the power symbol on the upper right, then click on the play arrow on the code cell below:

from utils_dist import run_pdf_cdf_explorer

run_pdf_cdf_explorer(dist="Poisson", show="PDF");

You should see a PMF above. Play with the parameter $\lambda$ using the available slider. This controls the shape of the PMF. Try $\lambda = 4$ or $\lambda = 5$ .

Then, move the slider that controls the position of the upper bound, $x$ . The visualization will highlight the bars of the PMF for all $y \leq x$ . The sum of the heights of these bars (the highlighted area) returns the corresponding CDF value since the CDF is the running sum of the PMF. To build up the CDF, gradually move the threshold, check the value for the shaded area printed above the visuals, then click “Save Value” to save the computed area at the current threshold. Repeat until you have a guess for the shape of the CDF. Then click “Reveal CDF”.
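If the interactive widget is unavailable, the same running-sum exercise can be done by hand in code. The sketch below does not use or reproduce the course helper. It assumes the standard Poisson PMF, $\text{Pr}(X = k) = e^{-\lambda} \lambda^k / k!$, which is not derived in this section, and it truncates the values at 15 purely for display.

```python
import math

def poisson_pmf(k, lam):
    """Standard Poisson PMF: exp(-lam) * lam**k / k! (assumed, not derived here)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4
values = range(0, 16)  # truncated for display; the Poisson support is unbounded
pmf = [poisson_pmf(k, lam) for k in values]

# Build the CDF as the running sum of the PMF.
cdf = []
running = 0.0
for p in pmf:
    running += p
    cdf.append(running)

for k, p, c in zip(values, pmf, cdf):
    print(f"{k:>2}  PMF={p:.4f}  CDF={c:.4f}")
```

Changing `lam` plays the role of the $\lambda$ slider, and reading down the CDF column corresponds to moving the upper bound $x$ one step at a time.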

Now let’s try this the other way around. Run the code cell to create a new session:

from utils_dist import run_pdf_cdf_explorer

run_pdf_cdf_explorer(dist="Poisson", show="CDF");

Now, to recover the unknown values of the PMF, we should use the successive differences in the heights of the CDF bars. Pick a new distribution (still discrete) from the dropdown. Play with the parameters until you find a CDF you’re interested in. Then try to eyeball the PMF.

Vary the slider value for $x$ , record the difference in heights of the CDF bars, and click “Save Point” to add the computed value to the list of computed PMF values. Continue until you have a good sense of the PMF, then reveal it to check your guess.
