This chapter introduced our main modeling tools.
Interactive Tools:
Distribution Plotter. This is the tool used in Sections 2.2 and 2.4 to visualize PMFs and PDFs. You will need it for your first homework. It is a good reference to come back to throughout the course, so I suggest you bookmark it.
Dartboard Sampler. This is the example used in Section 2.3 to derive the idea of a density function.
Random Variables and Distributions
These definitions are all available in Section 2.1.
A random variable is a randomly selected number.
The support of a random variable is the range of possible values it can attain. The support is to random variables as the outcome space is to randomly chosen outcomes.
Random variables are modelled using distribution functions.
A probability mass function (PMF) is the function:
$$\text{PMF}(x) = \text{Pr}(X = x)$$
We often visualize a PMF with a bar chart (probability histogram) with one bar per possible value of the random variable, and bar heights equal to the chance that each value occurs.
A valid PMF must return nonnegative values, and must be normalized (its values must sum to one). Visually, the area of all the bars in a probability histogram must equal 1.
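A cumulative distribution function (CDF) is the function:
$$\text{CDF}(x) = \text{Pr}(X \leq x)$$
The PMF and CDF are related by the additivity property:
$$\text{CDF}(x) = \sum_{y \leq x} \text{PMF}(y)$$
The CDF can be used to compute chances on intervals:
$$\text{Pr}(a < X \leq b) = \text{CDF}(b) - \text{CDF}(a)$$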
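The relationship between a PMF, its CDF, and chances on intervals can be checked on a small example. The sketch below assumes a fair six-sided die as the random variable; the names `pmf`, `cdf`, and `chance_on_interval` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical example: PMF of a fair six-sided die, stored as value -> chance.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid PMF is nonnegative and sums to one.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1


def cdf(x):
    """Chance that the die shows a value less than or equal to x."""
    return sum(p for value, p in pmf.items() if value <= x)


def chance_on_interval(a, b):
    """Chance that the die lands in the half-open interval (a, b]."""
    return cdf(b) - cdf(a)


print(cdf(3))                    # 1/2
print(chance_on_interval(2, 5))  # 1/2
```

Exact fractions are used so that the normalization check holds without floating-point rounding.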
Discrete Models
These definitions are all available in Section 2.2.
A discrete random variable is a random variable that is not continuous. It is usually a random variable that takes finitely many values, or one that is restricted to the integers and so represents a random count.
Random variables may be defined implicitly, by the process that generates outcomes, or explicitly, by fixing a support and a distribution function.
A Bernoulli random variable is:
Implicit: an indicator for a random event that returns 0 if the event doesn’t happen and 1 if the event does happen.
Explicit: a binary random variable with support $\{0, 1\}$ and where $\text{Pr}(X = 1) = p$, so $\text{Pr}(X = 0) = 1 - p$.
The parameter $p$ of the Bernoulli is the success probability of the associated event.
A Geometric random variable is:
Implicit: the number of repetitions of independent, identical Bernoulli (binary) trials up to and including the first success.
Explicit: a random variable with support equal to the positive integers, $\{1, 2, 3, \ldots\}$, and PMF:
$$\text{PMF}(x) = (1 - p)^{x - 1} p$$
The parameter $p$ of the Geometric is the success probability of each trial.
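The implicit and explicit descriptions of the Geometric can be compared by simulation. The sketch below builds a Geometric sample out of repeated Bernoulli trials and compares the empirical frequencies with the formula $(1 - p)^{x - 1} p$. The function names, the seed, and the value $p = 0.3$ are illustrative assumptions.

```python
import random


def bernoulli(p, rng):
    """Indicator: 1 with chance p (success), 0 otherwise."""
    return 1 if rng.random() < p else 0


def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while bernoulli(p, rng) == 0:
        trials += 1
    return trials


def geometric_pmf(x, p):
    """Chance the first success occurs on trial x: x - 1 failures, then a success."""
    return (1 - p) ** (x - 1) * p


rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.3                  # illustrative success probability
n_samples = 100_000
samples = [geometric(p, rng) for _ in range(n_samples)]

# Columns: value, empirical frequency, PMF from the explicit formula.
for x in range(1, 6):
    empirical = samples.count(x) / n_samples
    print(x, round(empirical, 3), round(geometric_pmf(x, p), 3))
```

The two columns should agree to within sampling error, which shrinks as `n_samples` grows.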
A Binomial random variable is:
Implicit: the number of successes in a string of repeated identical, independent Bernoulli (binary) trials.
Explicit: a random variable supported on $\{0, 1, \ldots, n\}$ for some positive integer $n$, with PMF:
$$\text{PMF}(x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
The parameter $n$ is the number of trials, and $p$ is the chance of success in each trial.
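As a quick check on the Binomial, the sketch below evaluates the standard formula $\binom{n}{x} p^x (1 - p)^{n - x}$ and confirms that it is normalized. The parameter values are illustrative assumptions, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Chance of exactly x successes in n independent trials with success chance p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # illustrative parameters
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Normalization: the chances over the support {0, 1, ..., n} sum to one
# (up to floating-point rounding).
print(sum(pmf))

# The case n = 1 recovers the Bernoulli: chance 1 - p of 0 and chance p of 1.
print(binomial_pmf(0, 1, p), binomial_pmf(1, 1, p))
```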
Continuous Models
Section 2.3 is largely philosophical. It proves, and works to justify, the following statement:
If $X$ is a continuous random variable, then $\text{Pr}(X = x) = 0$ for all $x$.
That is, all exact events have chance equal to zero. Section 2.3 shows that this property is needed in any model where chances vary continuously with changes to events. Open the Dartboard Sampler and use it to estimate the chance that a dart lands exactly a distance $r$ from the center of the board. No matter how many samples you draw, and no matter what $r$ you pick, you will never find a dart exactly a distance $r$ from the center.
As a consequence, we never need to distinguish the events $\{X < x\}$ from $\{X \leq x\}$, or $\{X > x\}$ from $\{X \geq x\}$.
We showed, by symmetry, that if $X$ is a uniform random variable, then probability is equal to proportion, where the size of sets is measured using length (1 dimension), area (2 dimensions), or volume (3 dimensions).
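Both claims can be explored without the interactive tool. The sketch below is a stand-in for the dartboard experiment, not the course's own sampler: it assumes a board of radius 1, throws darts uniformly by rejection sampling, counts exact hits at a chosen distance, and compares the chance of landing within that distance against the proportion of area. The seed and the target distance `r = 0.5` are illustrative assumptions.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is reproducible


def throw_dart(rng):
    """Uniform point on the unit disk, by rejection from the enclosing square."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y


r = 0.5  # illustrative target distance
n_samples = 100_000
distances = [math.hypot(*throw_dart(rng)) for _ in range(n_samples)]

exact_hits = sum(d == r for d in distances)
within = sum(d <= r for d in distances) / n_samples

print(exact_hits)      # count of darts exactly a distance r from the center
print(within, r ** 2)  # empirical chance vs. area proportion (pi r^2) / pi
```

The exact-hit count should stay at zero however large `n_samples` is made, while the chance of landing within distance `r` settles near the area proportion.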
Probability Densities
These results are all explained in Section 2.3.
If $X$ is a continuous random variable then its probability density function is defined:
$$\text{PDF}(x) = \lim_{\Delta x \to 0} \frac{\text{Pr}(X \in [x, x + \Delta x])}{\Delta x}$$
Any function that is both nonnegative and normalized (integrates to 1) could be a density. No function that is ever negative, or integrates to a number other than one, is a density.
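We specify a continuous random variable by PDF, CDF, or measure, and move between all three:
PDF to measure: $\text{Pr}(X \in [a, b]) = \int_a^b \text{PDF}(x) \, dx$
PDF to CDF: $\text{CDF}(x) = \int_{-\infty}^{x} \text{PDF}(s) \, ds$
CDF to measure: $\text{Pr}(X \in [a, b]) = \text{CDF}(b) - \text{CDF}(a)$
CDF to PDF: $\text{PDF}(x) = \frac{d}{dx} \text{CDF}(x)$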
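The four conversions can be checked numerically. The sketch below assumes a hypothetical density $2x$ on $[0, 1]$, whose CDF is $x^2$ on that interval. It uses a midpoint Riemann sum in place of the integral and a central difference in place of the derivative. All function names are illustrative.

```python
def pdf(x):
    """Hypothetical density: 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x):
    """CDF matching the density above: x^2 on [0, 1]."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x


def integrate(f, a, b, steps=10_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


a, b = 0.25, 0.75

# PDF to measure: integrate the density over the interval.
pdf_to_measure = integrate(pdf, a, b)
# CDF to measure: difference of the CDF at the endpoints.
cdf_to_measure = cdf(b) - cdf(a)
# PDF to CDF: integrate the density up to b (the density is zero below 0).
pdf_to_cdf = integrate(pdf, 0.0, b)
# CDF to PDF: the slope of the CDF approximates the density.
h = 1e-6
cdf_to_pdf = (cdf(b + h) - cdf(b - h)) / (2 * h)

print(pdf_to_measure, cdf_to_measure)
print(pdf_to_cdf, cdf(b))
print(cdf_to_pdf, pdf(b))
```

Each printed pair should agree up to numerical error, since each pair computes the same quantity by two different routes.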
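$X$ is a Uniform random variable on $[a, b]$ if $X \in [a, b]$ and $\text{PDF}(x)$ is constant for all $x \in [a, b]$.
$X$ is an Exponential random variable with parameter $\lambda$ if $X \geq 0$ and $\text{PDF}(x) = \lambda e^{-\lambda x}$ if $x \geq 0$.
The parameter $\lambda$ must be greater than 0.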
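$X$ is a Pareto random variable with parameters $x_m$ and $\alpha$ if $X \geq x_m$ and $\text{PDF}(x) = \alpha x_m^{\alpha} / x^{\alpha + 1}$ if $x \geq x_m$.
Both parameters $x_m$ and $\alpha$ must be greater than 0.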
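Density functions are often written $\text{PDF}(x) \propto g(x)$ where $g$ is a simpler functional form that determines the shape of the distribution. Then $\text{PDF}(x) = c \, g(x)$ where $c$ is the normalizing constant $c = 1 / \int g(x) \, dx$, with the integral taken over the support.
In general, $g$ is a function of $x$ with some free parameters. For example, $g(x) = e^{-\lambda x}$ depends on the input $x$ and the free parameter $\lambda$. Then, the normalizing constant $c$ is a function of the free parameter $\lambda$ but is not a function of $x$.
For example, the normalizing constant for the exponential is $c = \lambda$.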
We should read densities by recognizing their support and functional forms first, then their normalizing constants.
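A normalizing constant can be recovered numerically from the functional form alone. The sketch below assumes the convention that the density equals a constant $c$ times the functional form, so that $c$ is one over the integral of the functional form across the support. It uses the exponential shape $e^{-\lambda x}$ on $x \geq 0$, with the infinite support truncated where the tail is negligible. The rates tried and the helper names are illustrative assumptions.

```python
import math


def g(x, rate):
    """Unnormalized functional form of the exponential density on x >= 0."""
    return math.exp(-rate * x)


def integrate(f, a, b, steps=100_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


constants = {}
for rate in (0.5, 1.0, 2.0):
    # Truncate the support at 50 / rate, where g is about exp(-50) and the
    # remaining tail is negligible.
    area = integrate(lambda x: g(x, rate), 0.0, 50.0 / rate)
    constants[rate] = 1.0 / area  # convention assumed here: PDF = c * g

for rate, c in constants.items():
    print(rate, c)
```

Each computed constant should sit close to its rate. It changes with the free parameter but does not depend on $x$.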