In Section 6.1 we derived the Taylor series for the exponential and the logarithm. This section uses those series to derive the limit definition of the exponential:

$$ e^x = \lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n $$
We will apply this formula to approximate probabilities that involve ratios of factorial terms and to derive the exponential distribution from first principles.
The Limiting Definition of $e^x$¶
Consider, for some fixed $x$, the limit:

$$ \lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n $$
At first glance it is hard to see how to evaluate this limit.
If $x > 0$, then the argument inside the parentheses is greater than 1, so it is natural to think that, as $n$ gets large, the limit should diverge. However, as $n$ gets large the term inside the parentheses also approaches 1, and 1 raised to any power is 1. In other words, applying the limit on the outside first produces a different answer than applying the limit on the inside.
To find the limit, note that logarithms are continuous functions. So, the logarithm of a limit is the limit of the logarithm:

$$ \log\left(\lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n\right) = \lim_{n \rightarrow \infty} \log\left(\left(1 + \frac{x}{n}\right)^n\right) $$
Applying log rules:

$$ \lim_{n \rightarrow \infty} \log\left(\left(1 + \frac{x}{n}\right)^n\right) = \lim_{n \rightarrow \infty} n \log\left(1 + \frac{x}{n}\right) $$
Consider $n \log(1 + x/n)$ for fixed $x$. As $n$ gets big, $x/n$ must get small. So, provided $|x/n| < 1$, we can replace the logarithm with its Taylor series about 1:

$$ \log\left(1 + \frac{x}{n}\right) = \frac{x}{n} - \frac{1}{2}\left(\frac{x}{n}\right)^2 + \frac{1}{3}\left(\frac{x}{n}\right)^3 - \ldots $$
To isolate the leading term, notice that, for large $n$, $x/n$ is small, so $x/n$ will dominate $(x/n)^k$ for $k \geq 2$. Therefore, using the order notation from Section 5.2:

$$ \log\left(1 + \frac{x}{n}\right) = \frac{x}{n} + O\left(\frac{1}{n^2}\right) $$
where $O(\cdot)$ means on the order of, and $O(1/n^2)$ means that the errors in the linear approximation to the logarithm decay proportionally to $1/n^2$ as $n$ diverges.
Now:

$$ n \log\left(1 + \frac{x}{n}\right) = n \left(\frac{x}{n} + O\left(\frac{1}{n^2}\right)\right) = x + O\left(\frac{1}{n}\right) $$
So:

$$ \lim_{n \rightarrow \infty} n \log\left(1 + \frac{x}{n}\right) = x $$
Exponentiating both sides returns the limiting definition of the exponential function:

$$ \lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n = e^x $$
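The convergence is easy to check numerically. The sketch below (plain Python; the variable names are ours) compares $(1 + x/n)^n$ to $e^x$ for increasing $n$, and the error should shrink like $1/n$, matching the $O(1/n)$ term above:

```python
import math

x = 2.0
# (1 + x/n)^n should approach e^x as n grows; the error decays like 1/n.
for n in [10, 1_000, 100_000]:
    approx = (1 + x / n) ** n
    print(f"n = {n:>6}: (1 + x/n)^n = {approx:.6f}, error = {math.exp(x) - approx:.2e}")
```

Multiplying $n$ by 100 shrinks the error by roughly a factor of 100, exactly as the order notation predicts.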
Approximating Proportions¶
Let’s use these ideas to estimate some probabilities. Each example below involves a chance we can compute explicitly by counting. In each case, the combinatorial formula can be approximated accurately using an exponential function derived using the same logic we applied above.
Sampling With and Without Replacement¶
Suppose you are a statistician working for the census. You are asked to run a survey. You can choose your sample size, $n$. You sample from a large population of $N$ individuals. You may assume $N$ is much larger than $n$.
You sample without replacement, but would like to approximate chances by pretending you sampled with replacement. To justify your approximation, you argue that, since $n$ is much smaller than $N$, the chance you would sample any individual twice had you sampled with replacement is very small.
What is the chance that, if you sample $n$ individuals uniformly with replacement from a population of $N$ individuals, you never draw an individual more than once?
We can find this chance using probability by proportion. You have $N$ options on each draw, and draw $n$ times. So, when drawing with replacement, there are $N^n$ samples you could collect. To avoid a duplicate on $n$ draws, count in sequence. There are $N$ options for your first draw, $N - 1$ for your second, $N - 2$ for your third, and so on. Therefore:

$$ \text{Pr}(\text{no duplicates}) = \frac{N (N - 1)(N - 2) \cdots (N - n + 1)}{N^n} $$
We could derive this chance using probability rules instead. By the multiplication rule:

$$ \text{Pr}(\text{no duplicates}) = \prod_{j=0}^{n-1} \frac{N - j}{N} = \prod_{j=1}^{n-1} \left(1 - \frac{j}{N}\right) $$
It seems sensible that, when $N$ is much bigger than $n$, this chance should be close to one. After all, it is a product of numbers close to one. However, it's not so easy, from the formula we've written, to actually know whether $n$ is small enough relative to $N$. Is a population ten times the sample size enough of a difference? A hundred times? A thousand?
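The product is easy to evaluate directly, and doing so shows how unreliable the "product of numbers near one" intuition is. A sketch (the function name and the example sizes are ours, chosen for illustration):

```python
def no_duplicate_chance(n, N):
    """Chance that n uniform draws with replacement from N individuals are all distinct."""
    chance = 1.0
    for j in range(n):
        chance *= (N - j) / N
    return chance

# A population only ten times the sample size is nowhere near enough:
print(no_duplicate_chance(100, 1_000))      # well below 1%: a duplicate is nearly certain
# A population ten thousand times the sample size works well:
print(no_duplicate_chance(100, 1_000_000))  # ≈ 0.995
```

Even though every factor in the first product exceeds $0.9$, there are a hundred of them, and the product collapses toward zero.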
This is an applied example of a famous toy problem. We introduce the problem below, then work out the general answer for arbitrary $n$ and $N$.
The Birthday Problem¶
Suppose you are in a room with 20 people. What's the chance that any two of the 20 people share a birthday?
Before reading further, check your intuition. Do you think this chance is small? After all, there are 365 days in a year and 365 is much bigger than 20.
How would we find the chance? Well, let’s start with some reasonable modeling assumptions.
We’ll assume that:
No-one is born on a leap day, so there are 365 possible birthdays.
All birthdays are equally likely.
The birthdays for distinct individuals are independent. My birthday has no relation to yours.
Under these assumptions we can compute the chance that any two people share a birthday using rules of chance.
First, there are $365$ possible birthdays. Each individual could have any of the 365 birthdays, so there are $365^{20}$ different possible assignments of birthdays to individuals. So, the total number of equally likely outcomes is $365^{20}$.
Second, the complement of the event $\{\text{some pair shares a birthday}\}$ is $\{\text{no pair shares a birthday}\}$, so:

$$ \text{Pr}(\text{some pair shares a birthday}) = 1 - \text{Pr}(\text{no pair shares a birthday}) $$
How many ways can we assign birthdays without repeating a day?
Well, there are $365$ options for the first individual, then $364$ for the second, then $363$ for the third, and so on. For the $j^{\text{th}}$ individual there are $365 - (j - 1)$ birthdays left to pick since we've already picked $j - 1$ dates.
Therefore:

$$ \text{Pr}(\text{no pair shares a birthday}) = \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n} = \frac{365!}{(365 - n)! \, 365^n} $$
So, for $n = 20$ people and $365$ possible birthdays:

$$ \text{Pr}(\text{no pair shares a birthday}) = \frac{365!}{345! \, 365^{20}} $$
This formula is correct, but it's pretty hard to work with. First, the terms involved are enormous. Factorials of large numbers are absurdly large. For instance, $365!$ is roughly $10^{778}$. Many computer algebra systems overflow for numbers larger than about $1.8 \times 10^{308}$, the largest double-precision floating point number. So, the numbers involved are too large to compute as written.
Worse, it is very hard to get any intuition for whether this number is close to 1, or close to 0, without somehow computing the ratio of each term exactly. We certainly can't compute these ratios by hand and we can't guess their magnitude by gut. Try it. Do you think $\frac{365!}{345! \, 365^{20}}$ is close to one, close to zero, or somewhere in between?
Even worse still, if we changed some minor feature of the problem, for instance, if the room had a few more, or a few fewer, individuals, then this equation does not provide any insight into how the chance should change. The chance of a repeat birthday should increase as we add more people, but it is not clear, from this result, whether it increases quickly, or slowly, in $n$.
So, let's try to simplify the chance, and, where possible, approximate it with a more transparent function of $n$, the number of people in the room.
First, expand it as a product:

$$ \text{Pr}(\text{no pair shares a birthday}) = \frac{365}{365} \cdot \frac{364}{365} \cdots \frac{346}{365} = \prod_{j=1}^{19} \left(1 - \frac{j}{365}\right) $$
This is computationally much easier since the big product is a product of numbers closer to, and smaller than, 1.
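In product form the chance takes only a short loop to evaluate, with no enormous intermediate numbers. A sketch (the function name is ours):

```python
def no_shared_birthday(n, days=365):
    """Chance that n people, with uniform independent birthdays, all have distinct birthdays."""
    chance = 1.0
    for j in range(1, n):
        chance *= 1 - j / days   # each factor is close to, and smaller than, 1
    return chance

print(no_shared_birthday(20))  # ≈ 0.589, so some pair shares a birthday with chance ≈ 0.411
```

This gives the exact answer numerically, but it still offers no formula-level insight into how the chance varies with $n$, which is what the approximation below provides.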
The product representing the chance of no duplicate birthdays looks a lot like the product involved in the limiting definition of the exponential. Let's try to approximate it using the same strategy. Writing $N$ for the number of possible birthdays (here $N = 365$), take logarithms and linearize:

$$ \log\left(\prod_{j=1}^{n-1} \left(1 - \frac{j}{N}\right)\right) = \sum_{j=1}^{n-1} \log\left(1 - \frac{j}{N}\right) \approx -\sum_{j=1}^{n-1} \frac{j}{N} $$
The last approximation is valid in the limit as $N$ diverges, since then $j/N$ is small for all $j \leq n - 1$.
Now, we can approximate:

$$ \log\left(\text{Pr}(\text{no pair shares a birthday})\right) \approx -\frac{1}{N} \sum_{j=1}^{n-1} j $$
The sum $\sum_{j=1}^{n-1} j = \frac{n(n-1)}{2}$ since it can be expressed as a sum of pairs:

$$ 2 \sum_{j=1}^{n-1} j = \underbrace{(1 + (n-1))}_{= n} + \underbrace{(2 + (n-2))}_{= n} + \cdots + \underbrace{((n-1) + 1)}_{= n} = (n-1) \, n $$
Therefore:

$$ \log\left(\text{Pr}(\text{no pair shares a birthday})\right) \approx -\frac{n(n-1)}{2N} $$
So, exponentiating both sides:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) $$
This function decays very quickly in $n$ for fixed $N$. It suggests that we won't need a particularly large $n$ to see a duplicate birthday. For instance, since $n(n-1) \approx n^2$, if $n \approx \sqrt{2N}$ then $n^2/(2N) \approx 1$ and $\exp(-n^2/(2N)) \approx e^{-1} \approx 0.37$.

So, with $N = 365$ and $n = 20$:

$$ \frac{n^2}{2N} = \frac{400}{730} \approx 0.55 $$

Therefore:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx e^{-0.55} \approx 0.58 $$

Using the more accurate approximation:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) = e^{-190/365} \approx 0.59 $$
That’s a surprisingly large chance. With only 20 people there is almost a 50-50 chance that at least two individuals share a birthday!
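It is worth checking how close the exponential approximation comes to the exact product. A quick sketch (all names ours):

```python
import math

n, N = 20, 365

# Exact product: (1 - 1/N)(1 - 2/N)...(1 - (n-1)/N)
exact = 1.0
for j in range(1, n):
    exact *= 1 - j / N

# Exponential approximation derived above
approx = math.exp(-n * (n - 1) / (2 * N))

print(f"exact = {exact:.4f}, approximation = {approx:.4f}")
```

The two agree to within about half a percentage point, so the approximation is accurate enough to trust for back-of-the-envelope reasoning.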
The exponential approximation:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) $$
is also useful since it is much easier to analyze.
For instance, if we change $n$ and $N$, but keep the ratio $n(n-1)/N$ fixed, then the chance of no duplicates will remain about constant. Alternately, to solve for the size $n$ at which the chance of a duplicate first exceeds 0.5 we should use:

$$ \exp\left(-\frac{n(n-1)}{2N}\right) = \frac{1}{2} \quad \Longleftrightarrow \quad n(n-1) = 2 N \log(2) $$
For $N = 365$ this gives $n \approx 23$. So, in a room with more than 22 individuals there is more than a 50% chance that at least two will share a birthday!
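The threshold can be found directly from the condition $n(n-1) = 2N\log(2)$ by scanning upward (a sketch; the variable names are ours):

```python
import math

N = 365
target = 2 * N * math.log(2)   # from exp(-n(n-1)/(2N)) = 1/2

# Smallest n with n(n-1) >= 2 N log(2)
n = 1
while n * (n - 1) < target:
    n += 1
print(n)  # 23: the smallest room size with a >50% chance of a shared birthday
```

The exact product agrees: with 22 people the chance of no shared birthday is still above one half, and with 23 it drops below.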
Sampling With Replacement $\approx$ Sampling Without Replacement¶
We can now solve our original problem. In order to approximate chances by pretending a sample drawn without replacement was drawn with replacement, we need:

$$ \text{Pr}(\text{no duplicates}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) \approx 1 $$
Since $\exp(0) = 1$, and since we are looking for a chance near 1, we will want to choose $n$ so that $\frac{n(n-1)}{2N}$ is small. Since we are looking at an exponential with a small argument, we can replace the exponential with its Taylor series:

$$ \exp\left(-\frac{n(n-1)}{2N}\right) \approx 1 - \frac{n(n-1)}{2N} $$
Then, if we want $\text{Pr}(\text{no duplicates}) \geq 1 - \epsilon$, where $\epsilon$ is some small acceptable chance that our sample produced a duplicate, we should set:

$$ \frac{n(n-1)}{2N} \leq \epsilon $$
This requires $n(n-1) \leq 2 N \epsilon$, or, approximately:

$$ n \leq \sqrt{2 N \epsilon} $$
This gives a fairly simple rule of thumb. If, for example, we want to approximate a sample, drawn without replacement, from a population of 10,000 individuals with a sample drawn with replacement, and want to guarantee that the sample with replacement produces no duplicates with at least a 98% chance, then $\epsilon = 0.02$ and $2 N \epsilon = 2 \cdot 10{,}000 \cdot 0.02 = 400$, so we should not use a sample size larger than $\sqrt{400} = 20$.
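The rule of thumb is a one-line function, and we can sanity-check it against the exact no-duplicate chance (a sketch; the function name is ours):

```python
import math

def max_sample_size(N, eps):
    """Largest n, roughly, keeping the chance of any with-replacement duplicate below eps."""
    return int(math.sqrt(2 * N * eps))   # from n(n-1)/(2N) <= eps

n = max_sample_size(10_000, 0.02)
print(n)  # 20

# Sanity check: the exact chance of no duplicates at this sample size
chance = 1.0
for j in range(n):
    chance *= (10_000 - j) / 10_000
print(round(chance, 4))  # ≈ 0.98, as requested
```

The exact chance slightly exceeds the 98% target, as expected: the Taylor-series bound is conservative because $n(n-1) < n^2$.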
Exponential Distributions from First Principles¶
So far, we have only ever introduced continuous random variables by explicitly fixing their distribution functions and support. In contrast, we introduced all of our discrete distributions with a process that produced the random variable.
We finally have enough mathematical muscle to start deriving continuous PDFs from processes without starting from a uniformity assumption. Here we will show how to derive the exponential distribution from a waiting time process via the limiting definition of the exponential.
Suppose a process produces incidents at random times, such that:

1. Incidents occur one at a time.
2. Incidents occurring in non-overlapping time intervals are independent.
3. The chance an incident occurs in an interval depends only on the duration of the interval; for a short interval of duration $\delta$, this chance is approximately $\lambda \delta$ for some fixed rate $\lambda$.

Examples could include bit flips in a computer, earthquake occurrences in a city, radioactive decay in a collection of atoms, binding events of a neurotransmitter, and many other processes that produce incidents at random times. Note, in each case, some of the assumptions are suspect. This model is often adopted since it is simple and reasonably accurate. If, in particular, the times between successive incidents are dependent, or the chance an incident occurs increases as time passes, then this is not a good model.
To show that these assumptions force the waiting time between successive incidents, $T$, to follow an exponential distribution, start by imagining that you check for incidents once every $\delta$ seconds. For example, if you are studying earthquakes, you might check your seismometer once every hour for evidence of an earthquake. You keep a record of the number of times, $K$, you needed to check between successive records marking an incident.
How is $K$ distributed?
$K$ is a discrete random variable since it is a count. At smallest, $K = 1$.
At each check-in interval you ask the same binary question: did an incident occur during the last $\delta$ seconds?
Since your intervals are non-overlapping, assumption (2) suggests that your answers are independent. Moreover, since all of your intervals have equal duration, assumption (3) suggests that you are equally likely to see an incident in any of the intervals.
So, under assumptions (2) and (3), each interval is an independent, identical, binary trial.
Therefore, $K$ is the number of independent, identical, binary trials up until a first success (in this case, an observed incident). It follows that $K$ must be a geometric random variable:

$$ \text{Pr}(K = k) = (1 - p)^{k - 1} p $$
where $p$ is the chance an incident occurs in an interval of length $\delta$. Then:

$$ \text{Pr}(K > k) = (1 - p)^k $$
Now that we can describe the distribution of $K$, let's try to work back to the distribution of $T$.
The random time, $T$, is continuous, so there's no point in pursuing its PDF directly. Let's try to find its CDF instead. Recall that $F_T(t) = \text{Pr}(T \leq t)$.
The event $\{T > k\delta\}$ is the same event as $\{K > k\}$, since the first incident occurs after time $k\delta$ exactly when the first $k$ check-ins observe no incident.
Let $F_T$ denote the CDF of $T$. Then:

$$ 1 - F_T(k\delta) = \text{Pr}(T > k\delta) = \text{Pr}(K > k) = (1 - p)^k $$
The duration $\delta$ was arbitrary. You could check once a day, once an hour, or once a minute. The frequency with which you check should not change the distribution of $T$. So, let's try to find the CDF of $T$ at some fixed $t$, while taking $\delta$ to zero. To hold $k\delta$ fixed at $t$ while $\delta$ vanishes, we need $k = t/\delta$ to diverge, or, $\delta = t/k$. Set $\delta = t/n$ and take $n$ large:

$$ 1 - F_T(t) = \lim_{n \rightarrow \infty} (1 - p)^n $$

where $p$ depends on the interval length $\delta = t/n$.
When $n$ is large, $\delta = t/n$ is small, so by assumption (3), $p$ should approach $\lambda \delta = \lambda t / n$. Therefore:

$$ 1 - F_T(t) = \lim_{n \rightarrow \infty} \left(1 - \frac{\lambda t}{n}\right)^n $$
Now we can use the limiting definition of the exponential!

$$ 1 - F_T(t) = e^{-\lambda t} \quad \Longrightarrow \quad F_T(t) = 1 - e^{-\lambda t} $$
To find the PDF, take a derivative:

$$ f_T(t) = \frac{d}{dt} F_T(t) = \frac{d}{dt}\left(1 - e^{-\lambda t}\right) = \lambda e^{-\lambda t} $$
So, $T$ is a nonnegative continuous random variable with density function proportional to $e^{-\lambda t}$. It follows that:

$$ T \sim \text{Exponential}(\lambda) $$
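The limiting argument itself can be checked numerically: the survival chance from the discrete check-in model, $(1 - \lambda\delta)^{t/\delta}$, should settle onto $e^{-\lambda t}$ as $\delta$ shrinks. A sketch (the rate and test time are values we chose for illustration):

```python
import math

lam, t = 2.0, 0.5   # rate and a test time, chosen for the sketch

# Discrete check-in model: Pr(T > t) = (1 - p)^k with p = lam * delta and k = t / delta checks.
for delta in [0.1, 0.01, 0.0001]:
    k = round(t / delta)
    survival = (1 - lam * delta) ** k
    print(f"delta = {delta:>6}: Pr(T > t) = {survival:.6f}")

print(f"limit:  exp(-lam * t) = {math.exp(-lam * t):.6f}")
```

Each hundredfold refinement of the check-in interval brings the discrete survival chance visibly closer to the exponential survival function, mirroring the $\delta \to 0$ limit in the derivation.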