In Section 2.2 we introduced three key discrete models: the Bernoulli, the Geometric, and the Binomial. These are all models for random counts. Each is associated with a natural random process: indicating whether a random event happened, counting the number of repetitions of a random process until an event happens, or counting the number of times an event occurred in a sequence of repetitions of a process. In each case we derived a distribution function and support from a description of a random process.
In this section, we will study models for continuous random variables. These are random variables that can take on any value in an open interval on the real line. For instance, the time between successive lightning strikes on the Empire State Building could be any nonnegative real number.
Continuous random variables differ from discrete random variables in two key ways:
Continuous random variables can’t be sensibly described using a PMF.
We can’t describe a continuous random variable explicitly in the same way we did for discrete random variables. At least, we can’t use the same distribution functions. We can still use a CDF. But, we can’t build up the model by assigning a chance to every possible value of the random variable. There are uncountably many, and, since each is infinitely precise, the chance that a continuous random variable hits any value exactly is zero.
So, instead of working with probability mass functions, we’ll work with their continuous analog, probability density functions.
It is harder to derive the distribution function for a continuous random variable directly from the process that produces it.
So, many continuous random variables are defined explicitly instead of implicitly. The modeler chooses a range of possible values for the random variable, and a probability density function that has the characteristics they desire, or has a shape that matches observed data.
By introducing continuous random variables early, we’ll see the immediate need to understand how functions encode shapes, and how, if we want to model a shape, we could construct it as a function.
This section focuses on the issues that arise when we try to define a continuous random variable. It is really about definitions. If you’re interested in how questions, rather than what questions, and are willing to take definitions on faith, skip to Section 2.4. We’ll pick up with calculation, and modeling by shape, there.
Continuity in Measure¶
What do we mean when we say a random variable is continuous?
A continuous object, in mathematics, is one that can be deformed smoothly. A continuous function is smooth in the sense that arbitrarily small changes in its inputs cannot produce large jumps in its value. If two inputs are sufficiently similar, then the outputs of the function are also similar. More accurately, if we want to make two outputs close, we can always pick inputs that are similar enough. As a result, if two inputs approach each other, then the outputs of the function approach each other.
Every function is a machine for answering a question: given this input, what is the corresponding output? When you write $f(x)$, you can think that $f(x)$ poses the question “what is the output corresponding to the input $x$?” and the output of the function, $f(x)$, is the answer. So, continuous functions are continuous in the sense that the answers they produce vary smoothly in the way the question is posed. Small variations in the question produce small variations in the answers, and any two questions can be made similar enough that their answers must also be similar.
Probability models are continuous if the function that maps from chance question to chance value varies smoothly in the setup of the question. In other words, if the measure returns approximately the same answer for approximately the same events. A measure is continuous if the chance of an event $A$ approaches the chance of an event $B$ whenever $A$ is close enough to $B$.
For instance, the chance a long jumper jumps over 20 feet should be about the chance they jump over 20 feet and 1 inch. These chances may differ, but they shouldn’t differ by too much. Moreover, if we had asked, what is the chance the long jumper jumps over 20 feet and 0.1 inches, or 20 feet and 0.01 inches, the answers should be essentially the chance the long jumper jumps more than 20 feet.
Here’s the formal version of the same statement:

$$P(\text{jump} > 20 \text{ feet}) \approx P(\text{jump} > 20 \text{ feet} + \epsilon),$$

and the two answers should approach one another as $\epsilon$ approaches zero from above.
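To see this behavior numerically, here is a small sketch in Python. The Normal model for jump distance, with a mean of 19 feet and a standard deviation of 2 feet, is an assumption made purely for illustration; any continuous model would show the same shrinking gap.

```python
import math

# A numeric check of continuity of measure. The Normal(19, 2) model for the
# jump distance (in feet) is an assumption made for illustration only.
MEAN, SD = 19.0, 2.0

def prob_jump_exceeds(feet):
    """P(jump > feet) under the assumed Normal(19, 2) model."""
    z = (feet - MEAN) / (SD * math.sqrt(2))
    return 0.5 * math.erfc(z)

base = prob_jump_exceeds(20.0)
for eps_inches in [1.0, 0.1, 0.01, 0.001]:
    shifted = prob_jump_exceeds(20.0 + eps_inches / 12.0)
    print(f"eps = {eps_inches:5.3f} in   P(jump > 20 ft + eps) = {shifted:.6f}   "
          f"gap = {base - shifted:.6f}")
```

As the extra margin shrinks, the two chances become indistinguishable, which is exactly what the formal statement asks for.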
Notice that, for the definition provided above to make sense, we need the ability to deform event statements continuously. It must be possible to define arbitrarily similar sets. This requires a continuous outcome space. In the case of random variables, a continuous set of possible values.
It is important to remember that, continuous random variables are not continuous simply because they can take on values in a continuum. They are continuous if and only if the chance of events varies smoothly in the choice of event.
Uniform Example¶
Imagine that you are given a spinner with a balanced needle that spins about a central axis. If you hit the needle it spins around its axis many times before slowly coming to a stop. When it stops, it will point to some angle between 0 and 360 degrees.
If you’ve ever played with a spinner, you’ll know that it is essentially impossible, if you hit the needle sufficiently hard, to predict where it will land. Tiny changes in the force applied lead to such large changes in its final position that we might as well model the final position as random. Just like tossing a coin, or rolling a die, the needle’s final position is determined by such subtle changes in the initiation of the process that the outcome behaves as if random.
The spinning needle is like, and unlike, a tossed coin in two ways. First, its position could be any angle in the continuum between 0 and 360 degrees. So it is continuous, not discrete. However, like a tossed coin, or a rolled die, the spinner behaves symmetrically. Unless the axis is stickier on one side, or the spinner is placed on a slant, the spinner will behave in the same way at all orientations. So, by the symmetry of the set up, most reasonable models for the behavior of the spinner should be symmetric.
Putting these two ideas together, the final resting angle should be a random variable $X$, drawn uniformly and continuously from the interval $[0, 360)$. If two sets of angles could be interchanged by rotating the spinner (e.g. changing the direction we marked as zero degrees), then the chance the spinner lands in those sets must be equal. This is an equal likelihood statement.
As we saw in Section 1.2, when outcomes are equally likely, chance must equal proportion.
Since we can’t enumerate, or even list through all the outcomes, we can’t start by assigning equal chances to all outcomes. Instead, we will start from symmetric events, and work our way down by refining the events. By working top to bottom, we’ll show that, in this setting, chance remains a proportion.
What is the chance that the needle ends pointing to one half of the spinner? For example, what is the chance that $X \in [0, 180)$?
The sets $[0, 180)$ and $[180, 360)$ can be interchanged by rotating the spinner by a half turn. So, they must be equally likely. There is no reason that the needle should prefer one half to the other, and no reason it should behave differently between 0 and 180 degrees than it behaves between 180 and 360. Therefore:

$$P(X \in [0, 180)) = P(X \in [180, 360)).$$
Since the needle must land in one of the two halves, and they don’t overlap, the two halves partition $[0, 360)$. Therefore, by the complements rule, $P(X \in [0, 180)) + P(X \in [180, 360)) = 1$. It follows that the needle has a 50-50 chance of ending in either half:

$$P(X \in [0, 180)) = P(X \in [180, 360)) = \frac{1}{2}.$$
So, just by knowing that the chance model is symmetric, we’ve found a way to compute a chance. Let’s generalize the argument.
Suppose that we’d cut the range of angles into equal thirds, $[0, 120)$, $[120, 240)$, and $[240, 360)$. Then, the needle would have no reason to prefer any third over any other, and the thirds partition $[0, 360)$, so we’d have found that:

$$P(X \in [0, 120)) = P(X \in [120, 240)) = P(X \in [240, 360)) = \frac{1}{3}.$$
If we cut the range of angles into equal quarters we’d find that the same arguments apply, and the needle would land in each quarter with chance $\frac{1}{4}$.
So, if we break the range of all angles into $n$ equal segments, then the chance that the needle ends in any specific segment is $\frac{1}{n}$.
We can use this rule to find the chance that the needle lands in any interval. Suppose that we’d asked, what’s the chance that $X \in [0, b)$ for some upper bound $b$? This interval has length $b$. It covers the proportion $\frac{b}{360}$ of the range of possible angles. It turns out that its chance must also be $\frac{b}{360}$. We’d observed this was true whenever $b = \frac{360}{n}$ for some integer $n$. Open the dropdown provided below for a proof of the general statement.
Formal Argument
Pick some large $n$. Then partition the interval $[0, 360)$ into $n$ equal segments of length $\frac{360}{n}$.
Then, we can build, to arbitrarily accurate approximation, the interval $[0, b)$ as a union of $\lfloor \frac{bn}{360} \rfloor$ of those segments, where $\lfloor \cdot \rfloor$ means “round down.” By picking $n$ sufficiently large, we can make the approximation error arbitrarily small. Since we assumed continuity, any statement that holds for a sequence of events holds for the limiting event, so the chance that $X \in [0, b)$ equals the limiting chance that $X$ lands in one of $\lfloor \frac{bn}{360} \rfloor$ interchangeable segments of length $\frac{360}{n}$. Then, by additivity:

$$P(X \in [0, b)) = \lim_{n \to \infty} \left\lfloor \frac{bn}{360} \right\rfloor \cdot \frac{1}{n} = \frac{b}{360}.$$
In other words, the probability that the angle lands in an interval of length $\ell$ is the length of the interval, $\ell$, divided by the length of $[0, 360)$, which is 360.
It follows that:

$$P(X \in [a, b)) = \frac{b - a}{360} \quad \text{for any } 0 \leq a \leq b \leq 360.$$
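If you’d rather check this with a simulation than with the symmetry argument, here is a sketch in Python; the use of NumPy’s random number generator and the particular intervals are just conveniences.

```python
import numpy as np

# Monte Carlo sketch: for a uniform spinner, the chance of landing in [a, b)
# should match the proportion of the circle it covers, (b - a) / 360.
rng = np.random.default_rng(seed=0)
angles = rng.uniform(0.0, 360.0, size=1_000_000)

for a, b in [(0.0, 180.0), (0.0, 120.0), (45.0, 100.0)]:
    empirical = np.mean((angles >= a) & (angles < b))
    print(f"[{a:5.1f}, {b:5.1f})  empirical {empirical:.4f}  proportion {(b - a) / 360:.4f}")
```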
This is probability by proportion with a different notion of size. In finite sets we count size by the number of elements in the set. When our sets are intervals on a line, it is more natural to count size with length.
We can easily extend this rule to higher dimensions. A uniform random point in a two-dimensional region is a point chosen so that the probability the point lands in any subregion is the ratio of the area of the subregion to the area of the full region. In three dimensions, we use a ratio of volumes. In all cases, the probability of an event is the ratio of the size of the event, measured using count, length, area, or volume as appropriate, to the size of the outcome space.
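Here is a sketch of the two-dimensional version: a uniform random point in the square $[-1, 1] \times [-1, 1]$ should land inside the inscribed unit circle with chance equal to the ratio of areas, $\frac{\pi}{4}$. The particular square and circle are chosen only for illustration.

```python
import numpy as np

# Sketch: a uniform random point in the square [-1, 1] x [-1, 1] lands inside
# the inscribed unit circle with probability (area of circle) / (area of square),
# which is pi / 4.
rng = np.random.default_rng(seed=0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = rng.uniform(-1.0, 1.0, size=1_000_000)

inside = x**2 + y**2 <= 1.0
print(f"empirical chance {inside.mean():.4f}   area ratio pi/4 = {np.pi / 4:.4f}")
```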
Exact Events Have Probability Zero¶
We’ve just seen a sensible procedure for building uniform probability models in continuous spaces. Notably, we built our model by posing a measure first. We never assigned chances to exact outcomes, as we did in the finite case, or when working with discrete random variables. Instead of starting from the bottom and building up, e.g. assigning chances to each outcome then adding them together, we started from the top and worked down. We assigned chances to intervals, then refined them.
In this section we’ll take this argument to its extreme limit, and show, by refining all the way down to an exact event, why we can’t build up continuous models by assigning chances to each exact event.
An exact event is an event of the kind $\{X = x\}$, where $X$ is drawn continuously. It is the event that $X$ equals some value exactly, to every decimal place. Geometrically, it’s the event that $X$ lands exactly at some point on the real number line. Notice that exact events are extremely detailed. They demand infinite precision.
Here’s an example. Suppose that we draw a random number $X$ uniformly between 0 and 1. I want to know whether the random number is 0.123456789012345678901234567890.... To check, I ask you to read off the digits one at a time. You read:
The tenths digit is a 1. So far so good. We’ve just satisfied the event that the tenths place matches: $X \in [0.1, 0.2)$.
The hundredths digit is a 2. Great! We’ve just satisfied the event that the tenths and hundredths places match: $X \in [0.12, 0.13)$.
The thousandths digit is a 3. Spectacular! We’ve just satisfied the event that the tenths, hundredths, and thousandths places match: $X \in [0.123, 0.124)$.
At this point most people would be satisfied. In most problems, accuracy to three decimal places is more than sufficient. But, I didn’t ask whether $X$ was about 0.123. I asked whether $X$ is exactly 0.12345678901234567890.... So we still haven’t finished checking.
Now you might get worried. That’s useful intuition. Every time we check a new digit we could fail. Each new digit demands ten times more accuracy. We’ve gotten quite close, but with each new digit we need to check that we’re ten times closer than we knew we were. Every time we add a digit, we shrink the event. We already know that shrinking events cannot increase their chance. Adding detail cannot make an event more likely. Usually, adding detail makes an event less likely.
So, we continue, but with increasing doubt:
The ten-thousandths digit is a 4. Phew! What a relief!
The hundred-thousandths digit is a 5. Amazing... this really is remarkable. It can’t possibly keep going.
The millionths digit is a 7. Wrong.
Now, it is possible that we could have continued to match at the 6th digit, and the 7th, and the 8th, and so on, but for any reasonably continuous variable, we should fail eventually. It strains belief that we could check every digit, each time demanding ten times greater precision, and never fail a single check. That’s like tossing a fair coin forever and never seeing it land tails.
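A quick simulation makes the same point. The sketch below draws many uniform numbers on $[0, 1)$ and records how often they match a target to the first $k$ digits; the specific target value is arbitrary.

```python
import numpy as np

# Sketch of the digit-checking story: how often does a uniform draw on [0, 1)
# match the target to its first k decimal digits? Roughly a 10^(-k) fraction.
rng = np.random.default_rng(seed=0)
target = 0.1234567890
draws = rng.uniform(0.0, 1.0, size=1_000_000)

for k in range(1, 6):
    # Matching the first k digits means landing in an interval of length 10^(-k).
    width = 10.0 ** (-k)
    lower = np.floor(target * 10**k) / 10**k
    frac = np.mean((draws >= lower) & (draws < lower + width))
    print(f"match to {k} digit(s): empirical {frac:.6f}   expected {width:.6f}")
```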
Exercise
We can run the experiment described above by hand. Write down 10 sequences of binary digits, all 1 or 0, each 10 digits long. For instance, one sequence could be 0110001011. Pretend you are producing the list at random, e.g. by flipping a coin.
Then, open the dropdown below to reveal a uniformly sampled sequence of 6 binary digits. Check the fraction of your list that matches it out to the first digit, then the first two digits, then the first three, and so on. You should see that each time you ask for a new digit of accuracy, about half of the sequences that still matched stop matching.
This experiment is analogous to the example described above. We can make them the same by relating binary strings to binary numbers. Set $X$ to $\frac{1}{2}$ times your first digit, plus $\frac{1}{4}$ times your second digit, plus $\frac{1}{8}$ times your third digit, and so on. Then your sequence acts like a decimal expansion, written in binary.
Hidden Sequence
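You can also run a much larger version of this exercise in code. The sketch below uses 10,000 random sequences instead of 10, so the halving at each extra digit is easy to see; the specific counts are arbitrary choices.

```python
import numpy as np

# Sketch of the exercise in code: many random binary sequences, one hidden
# target, and the fraction that still matches after each extra digit checked.
rng = np.random.default_rng(seed=0)
n_sequences, n_digits = 10_000, 6

sequences = rng.integers(0, 2, size=(n_sequences, n_digits))
hidden = rng.integers(0, 2, size=n_digits)

for k in range(1, n_digits + 1):
    matches = np.all(sequences[:, :k] == hidden[:k], axis=1)
    print(f"match first {k} digit(s): {matches.mean():.4f}   expected {0.5**k:.4f}")
```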
Let’s prove this claim for our uniform example.
Proof
Let’s follow the logic we sketched above. First ask for the chance we match to the tenths place, then the tenths and hundredths places, and so on. To avoid annoying boundary issues, pick $x$ away from one. Choosing $x$ closer to one only converts all the equalities below to upper bounds.

$$P(X \text{ matches } x \text{ to 1 digit}) = \frac{1}{10}, \qquad P(X \text{ matches } x \text{ to 2 digits}) = \frac{1}{100}, \qquad \ldots \qquad P(X \text{ matches } x \text{ to } k \text{ digits}) = \frac{1}{10^k}.$$
By now the pattern should be apparent. Since uniform measures equate probability to proportion, and we started with an interval of length one, the chance that $X$ matches $x$ to $k$ digits is the chance that $X$ falls in an interval of length $10^{-k}$. Each time we demand an extra digit of precision, the chance decreases by a factor of ten. That’s the multiplication rule coming in! The conditional chance that we match to three digits, having matched to two digits, is $\frac{1}{10}$, since there are 10 possible choices for the next digit, and, when $X$ is uniform, the digits are independent and uniformly distributed.
Then, since the length of the interval, $10^{-k}$, vanishes as we demand infinite precision, the chance that $X = x$ must also vanish. So, $P(X = x) = 0$ for all $x$.
The same argument would have worked had we used any initial interval, not just $[0, 1)$. Demanding an exact equality in a continuous space demands infinite precision, and defines an infinitely detailed event. Since each added detail makes the event smaller, the chance of an infinitely detailed event is zero, no matter the event!
This is a natural property we might want for any model we call continuous. The digits corresponding to very, very fine details (e.g. the digits in the $k$-th decimal place for large $k$) should be approximately uniformly distributed, and should be independent of one another. Let’s see that exact events having chance zero is a necessary consequence of continuity, so it does not require uniformity.
Proof
If $X$ is a continuous random variable, then $P(X \in A)$ must converge to $P(X \in B)$ if we let the event $A$ approach the event $B$. Let’s pick an easy case.
Consider the events $\{X \leq x + \epsilon\}$ and $\{X \leq x - \epsilon\}$. These are the event that $X$ is at most a little bigger than $x$, and the event that $X$ is at most a little smaller than $x$. We will assume that $\epsilon$ is small and positive, and consider what happens as $\epsilon$ approaches zero.
Now, two things are true:
First, we can partition the larger event: $\{X \leq x + \epsilon\} = \{X \leq x - \epsilon\} \cup \{x - \epsilon < X \leq x + \epsilon\}$. This is an or statement for disjoint events, so:

$$P(X \leq x + \epsilon) = P(X \leq x - \epsilon) + P(x - \epsilon < X \leq x + \epsilon).$$
Rearranging:

$$P(x - \epsilon < X \leq x + \epsilon) = P(X \leq x + \epsilon) - P(X \leq x - \epsilon).$$
The left hand side is the chance that $X$ is within $\epsilon$ of $x$, so it is the chance that $X$ is approximately $x$. The right hand side is the difference between the chance that $X$ is at most a little more than $x$ and the chance that $X$ is at most a little less than $x$.
Second, by continuity, $P(X \leq x + \epsilon)$ and $P(X \leq x - \epsilon)$ both converge to $P(X \leq x)$ as $\epsilon$ goes to zero, because similar events must have similar chances. But then:

$$P(X \leq x + \epsilon) - P(X \leq x - \epsilon) \to 0 \quad \text{as } \epsilon \to 0.$$
It follows that:

$$P(X = x) = \lim_{\epsilon \to 0} P(x - \epsilon < X \leq x + \epsilon) = 0.$$
In other words, the chance that $X = x$ is the chance that $X$ is arbitrarily close to $x$, and the chance that $X$ is arbitrarily close to $x$ approaches zero as we demand higher accuracy, since the chances of similar statements must converge when we take a limit that makes the statements identical. As a result, the chance of every exact event must equal zero!
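Here is the same limit computed numerically for the uniform spinner from earlier in the section; the CDF written below is just the chance-equals-proportion rule expressed as a function.

```python
# Sketch: for the uniform spinner on [0, 360), the CDF is F(v) = v / 360 on
# that range. The chance of being within eps of x is F(x + eps) - F(x - eps),
# which vanishes as eps shrinks, so the exact event {X = x} has chance zero.
def uniform_spinner_cdf(v):
    """P(X <= v) for a uniform angle on [0, 360)."""
    return min(max(v / 360.0, 0.0), 1.0)

x = 123.456
for eps in [10.0, 1.0, 0.1, 0.01, 0.001]:
    near = uniform_spinner_cdf(x + eps) - uniform_spinner_cdf(x - eps)
    print(f"eps = {eps:7.3f}   P(x - eps < X <= x + eps) = {near:.6f}")
```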
Consider the contradicting case. If we assigned an exact event a nonzero chance, $P(X = x) = \frac{1}{5}$ for example, then the CDF, $P(X \leq v)$, would jump discontinuously at $v = x$, from $P(X < x)$ up to $P(X < x) + \frac{1}{5}$.
Events arbitrarily close to, but not including, the exact value would be 1/5th less probable than essentially identical events that include it. This is, for many models, not sensible behavior. It is certainly discontinuous.
A Discontinuous Example
Here’s an example that includes discontinuous and continuous behavior.
Consider the spinners used in game shows like Wheel of Fortune. These are large versions of the spinners described above, with one key difference. There are baffles around the outside of the spinner that divide the circle into even segments. These baffles are designed so that they bend when the spinner hits them, while slowing it down. They try to bounce the spinner into one of the segments.
Occasionally, the spinner can get stuck on a baffle. Since the baffles are at exact locations, there are finitely many baffles, and the spinner sticks on a baffle with nonzero frequency, the chance that the spinner points to a baffle, e.g. ends at an exact location, is nonzero. If the spinner does not stick on a baffle, then it behaves continuously.
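Here is a sketch of such a mixed model in code. The number of baffles and the sticking probability are made-up values, chosen only to show that exact baffle angles occur with nonzero frequency while every other exact angle essentially never repeats.

```python
import numpy as np

# Sketch of a mixed model (assumed numbers): with probability P_STICK the
# spinner sticks on one of N_BAFFLES equally spaced baffles; otherwise the
# final angle is uniform on [0, 360).
rng = np.random.default_rng(seed=0)
N_BAFFLES, P_STICK = 24, 0.02
baffle_angles = np.arange(N_BAFFLES) * (360.0 / N_BAFFLES)

def spin(n):
    stuck = rng.random(n) < P_STICK
    angles = rng.uniform(0.0, 360.0, size=n)
    angles[stuck] = rng.choice(baffle_angles, size=stuck.sum())
    return angles

angles = spin(1_000_000)
# Exact baffle angles are hit with nonzero frequency; any other exact angle
# essentially never repeats.
print("fraction exactly at a baffle:", np.isin(angles, baffle_angles).mean())
print("fraction exactly at 123.456 :", np.mean(angles == 123.456))
```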
Improbable Does Not Mean Impossible¶
We’ve just seen the most subtle fact about continuous random variables. When we sample a random variable, it returns some exact value. Yet, the probability that the random variable equals any exact value is zero, no matter the value.
This feels deeply paradoxical. Something must happen. Yet, the chance of any specific thing happening is zero! When we sample, we produce outcomes, but all outcomes are infinitely improbable.
On the one hand, the reasoning that brought us here is solid, and the basic modeling impulse that arbitrarily fine changes to events should not produce large changes in their chance is reasonable. It is the simplest ideal for a wide variety of cases. On the other, the conclusion feels absurd.
Somehow, in the continuous setting, an outcome can be possible, and completely improbable. Outcomes may have chance zero, yet could happen. In other words, just because a thing is improbable, does not mean it is impossible. This doesn’t feel strange if by improbable we mean, has a very small chance. It does feel strange if we mean, has no chance at all.
This distinction seems to separate two ideas most people think should be the same:
Something never happens.
Something cannot happen.
The first implies an event has chance zero, since it never occurs in a long string of trials. The second implies that it is impossible.
This distinction is not paradoxical since, when we work with continuous random variables, we should never really ask about exact outcomes. We can never measure a value to infinite precision with complete accuracy, so we could never check equality anyway. Moreover, in most continuous problems, the questions we ask are continuous, so our answers never depend on exactly assigning properties to points. Exact events, or points, only tend to matter in continuous problems in a limiting sense. We’ve seen that it is reasonable to think that, if a variable is continuous, then it can fall in any small interval, but the smaller we make the interval, the smaller the chance it lands there.
Let’s try to make this distinction clearer. Here are two ways to make it seem familiar.
Building Continuous Measures¶
We’ve already seen how to build a uniform model in a continuous outcome space. Just equate probability to proportion, and measure size appropriately. In one dimension, evaluate a ratio of lengths. In two dimensions, a ratio of areas. In three dimensions, a ratio of volumes.
What should we do if we want to model a continuous random variable, but don’t have, or want, the symmetries needed to equate probability to proportion? After all, continuous random variables would be rather boring if they were always uniform.
Remember that, to specify a probability model, we only need to define the outcome space, collection of events, and a measure that assigns chances to events. We need to pick the measure carefully to ensure that we obey all the probability axioms, in particular, additivity.
The simplest approach is to assign chances to each outcome. This is what we did with the PMF. Recall that:

$$p_X(x) = P(X = x).$$
However, life is not so easy when every exact event has probability zero. Then $p_X(x) = 0$ for all $x$, so all we learn from the PMF is the fact that the random variable is continuous.
Recall that the PMF is not the only distribution function available. We can also define a random variable explicitly by fixing its support and its cumulative distribution function (CDF):

$$F_X(x) = P(X \leq x).$$
The CDF is immune to the definitional problems that plagued the PMF for continuous variables. It remains well defined. We did not have any issues imagining that whether a long jumper jumps less or more than 20 feet could have well-defined chances. The CDF remains well defined since it returns the chance that $X$ lands in an interval, rather than at an exact point.
Let’s restate the definition of continuity using the CDF, then show that, if we are given a CDF, we can use it to compute the chance of any event. Thus, the CDF fully specifies a probability model. In terms of the CDF, $X$ is continuous if $F_X$ is a continuous function: for every $x$,

$$\lim_{\epsilon \to 0^+} F_X(x + \epsilon) = \lim_{\epsilon \to 0^+} F_X(x - \epsilon) = F_X(x).$$
This is, in essence, the same statement we made about measures, but for a fixed type of interval that can be specified by a single number (an upper bound). The two definitions are equivalent since we can use the CDF to compute the chance of any event. For instance:

$$P(a < X \leq b) = F_X(b) - F_X(a).$$
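As a small sketch in code, here is the interval rule applied to the uniform spinner’s CDF; the same two lines would work for any CDF you write down.

```python
# The CDF of the uniform spinner angle: F(v) = v / 360 on [0, 360).
def F(v):
    return min(max(v / 360.0, 0.0), 1.0)

def prob_interval(a, b):
    """P(a < X <= b), computed from the CDF alone."""
    return F(b) - F(a)

print(prob_interval(0.0, 180.0))   # 0.5
print(prob_interval(45.0, 100.0))  # (100 - 45) / 360, about 0.153
```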