For instance, consider the shoe size of a randomly polled student. The shoe size is a number that is determined by the outcome, in this case, the choice of student. The formal definition generalizes the idea that, in many situations, we summarize a random outcome with a measurement, or with some summary number.
It is standard practice to denote:
A random variable with a capital letter, e.g. $X$.
If we want to emphasize that the random variable is determined by some randomly selected outcome, then we might write it as a function, $X(\omega)$, where $\omega$ denotes the outcome.
A possible value of the random variable with a lowercase letter, e.g. $x$.
Distribution Functions
We can represent the table in the example discussed above with a bar plot:
This is an example of a probability histogram. The horizontal axis indicates possible values, $x$, of the random variable $X$. The vertical axis represents probability. The height of the bar at $x$ denotes $\text{Pr}(X = x)$.
Notice: this is the first time we’ve been able to actually plot all the probabilities of each possible outcome. That’s because generic outcome spaces have no natural organization or order. In order to plot something, we need to be able to order an input axis. In many of our previous examples, there was no natural way to choose which outcomes to list before which other outcomes. Random variables are just randomly chosen numbers, drawn from some set of possible numbers. Since numbers are ordered, we can actually plot a list of values that determine the chance of any statement about the random variable. In other words, we can define functions which assign chances to values, and that determine the chance of any other statement or event regarding the variable. These are distributions.
The function that returns the height of each bar is an example of a distribution function.
Distribution functions are to random variables as probability measures are to events. A distribution function is a function that accepts a possible value of a random variable, and returns the probability of a standardized question about that value.
The most natural choice is to plot the chance of each possible value:
The distribution function that accepts possible values, $x$, of a random variable, $X$, and returns their chance, is called a probability mass function (PMF):
$$\text{PMF}(x) = \text{Pr}(X = x)$$
The histogram shown above is a probability mass function, or PMF. The heights of the histogram correspond to the PMF: $\text{Pr}(X = x)$ as a function of $x$.
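To make the PMF concrete, here is a minimal sketch that tabulates it for the sum of two rolls. It assumes two fair six-sided dice, so that all 36 ordered pairs of faces are equally likely; the names `faces`, `counts`, and `pmf` are chosen for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, so all 36 ordered pairs are equally likely.
faces = range(1, 7)
counts = Counter(a + b for a, b in product(faces, repeat=2))

# pmf[x] = Pr(X = x), stored exactly as a fraction.
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)

# The "masses" add to one.
assert sum(pmf.values()) == 1
```

Storing the chances as exact fractions rather than floats keeps the later sums free of rounding error.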
Note: the use of the word “mass” in the PMF might seem odd. It’s a reference to the common analogy that probability acts like a collection of masses assigned to objects, where all the masses add to one. We’ll see the reason to adopt this odd analogy in Section 2.3 when we consider continuous random variables, which don’t have a useful PMF, but are characterized by a notion of density.
Calculating Chances from a PMF
Given a probability mass function (PMF) we can calculate the chances of events by summing the values of the PMF over all possible values of the random variable that satisfy the event. This is just an application of the additivity axiom from Section 1.3. For example:
What is the chance the sum of two rolls is even and less than 7?
What is the chance the sum of two rolls is greater than 10?
What is the chance the sum of two rolls is less than or equal to 5?
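The three questions above can be worked through with a short sketch. It assumes two fair six-sided dice, in which case the number of pairs summing to $x$ is $6 - |x - 7|$; the helper `chance` is a hypothetical name introduced here, not a library function.

```python
from fractions import Fraction

# Assumption: fair six-sided dice; 6 - |x - 7| counts the pairs that sum to x.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

def chance(event):
    """Sum the PMF over every possible value x for which event(x) is True."""
    return sum(p for x, p in pmf.items() if event(x))

even_and_below_7 = chance(lambda x: x % 2 == 0 and x < 7)  # x in {2, 4, 6}
above_10 = chance(lambda x: x > 10)                        # x in {11, 12}
at_most_5 = chance(lambda x: x <= 5)                       # x in {2, 3, 4, 5}

print(even_and_below_7, above_10, at_most_5)  # prints: 1/4 1/12 5/18
```

Each answer is just a sum of bar heights: for example, the first is $1/36 + 3/36 + 5/36 = 9/36 = 1/4$.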
The last example listed above is an example of a cumulative probability. It is cumulative since it is a sum of chances for sequential values of the random variable. Probabilities of this kind are also associated with a standard distribution function:
The distribution function that accepts possible values, $x$, of a random variable, $X$, and returns the chance $\text{Pr}(X \leq x)$ is called a cumulative distribution function (CDF).
The CDF is assigned a standard notation:
$$F_X(x) = \text{Pr}(X \leq x)$$
The subscript $X$ means, for the random variable $X$, the argument $x$ is an upper bound, and the value returned is the chance $X$ is less than or equal to the upper bound. We’ll use that notation interchangeably with the more transparent notation:
$$\text{CDF}(x) = \text{Pr}(X \leq x)$$
and add a subscript when it is unclear which random variable is of interest.
By applying the addition rule for unions, we can always express the CDF as a running sum of the PMF:
$$\text{CDF}(x) = \sum_{y \leq x} \text{PMF}(y)$$
where the $\sum$ means take the sum over all values of $y$ specified by the set in the subscript, in this case, all possible values $y \leq x$.
In other words, the CDF is the running sum of the values of the PMF.
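As a rough illustration of the running sum, the sketch below builds the CDF for the sum of two rolls from its PMF. It assumes two fair six-sided dice, and the names `values` and `cdf` are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# Assumption: fair six-sided dice; 6 - |x - 7| counts the pairs that sum to x.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

# Running sum of the PMF, taken over the possible values in increasing order.
values = sorted(pmf)
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

print(cdf[5])   # Pr(X <= 5)
print(cdf[12])  # the last running total must be 1
```

The order of the values matters here: the running sum only gives the CDF when the possible values are visited from smallest to largest.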
Here’s the CDF for the sum of two rolls:
Notice that the CDF increases slowly at first, grows fastest in the middle, then slows down again at the end. Think about how the change in height of neighboring bars is related to the PMF. Why does the CDF increase fastest where the PMF is largest?
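Differences in CDF values return the probability that a random variable lands in any interval. For instance, if $X$ takes integer values, the chance that $X$ is between 6 and 11 (inclusive) is:
$$\text{Pr}(6 \leq X \leq 11) = \text{CDF}(11) - \text{CDF}(5)$$
since subtracting off the CDF evaluated at 5 will remove from the sum any chances contributed by values of $X$ less than or equal to 5.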
We can also use the CDF to find the chance that a random variable is greater than a lower bound by applying the complement rule:
$$\text{Pr}(X > x) = 1 - \text{Pr}(X \leq x) = 1 - \text{CDF}(x)$$
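Both calculations, the interval chance and the chance above a lower bound, can be checked numerically. The sketch below again assumes two fair six-sided dice and compares each CDF-based answer with a direct sum over the PMF.

```python
from fractions import Fraction
from itertools import accumulate

# Assumption: fair six-sided dice; 6 - |x - 7| counts the pairs that sum to x.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}
values = sorted(pmf)
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

# Pr(6 <= X <= 11) as a difference of CDF values.
between_6_and_11 = cdf[11] - cdf[5]

# Pr(X > 10) from the complement rule.
above_10 = 1 - cdf[10]

# Both agree with summing the PMF directly.
assert between_6_and_11 == sum(pmf[x] for x in range(6, 12))
assert above_10 == pmf[11] + pmf[12]
```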
Since we can use the CDF to find the probability that a random variable is contained beneath any upper bound, between any two bounds, or above any lower bound, we can use the CDF to compute the chance of any event statement regarding a random variable. So, like the PMF, if we know the CDF, then we know every detail needed to compute chances. In other words, the PMF and the CDF both fully specify a probability model.
How would you find the value of the PMF at $x$ given only the CDF? What is $\text{Pr}(X = x)$?
Solution
The value of the PMF at $x$ equals the difference between the CDF at $x$ and the CDF at the largest possible value below $x$. If the possible values of $X$ are integers then $\text{PMF}(x) = \text{CDF}(x) - \text{CDF}(x - 1)$.
So, given a plot of the CDF, the value of the PMF is the difference in height of the neighboring bars. Rapid changes in the CDF correspond to large PMF values. Where the CDF changes slowly, the PMF is small.
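Here is a small sketch of that recovery step for an integer-valued random variable. It assumes two fair six-sided dice, builds the CDF, and then differences neighboring CDF values to get the PMF back; `recovered` is an illustrative name.

```python
from fractions import Fraction
from itertools import accumulate

# Assumption: fair six-sided dice; 6 - |x - 7| counts the pairs that sum to x.
pmf = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}
values = sorted(pmf)
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

# Difference of neighboring CDF values; below the smallest value the CDF is 0.
recovered = {x: cdf[x] - cdf.get(x - 1, 0) for x in values}

assert recovered == pmf
```

Since the differences reproduce the PMF exactly, nothing is lost by storing the CDF instead of the PMF.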
The PMF is much more natural than the CDF. You’ve seen lots of histograms in your past classes. They mostly look like the PMF. So, why introduce a CDF?
We’ve introduced the CDF because:
There are important distributions whose CDF is easier to work with, and
There are important examples where a well-defined probability model is fully specified by a CDF, but not by a PMF.
If you’re curious, wait for Section 2.3.