
6.2 Exponential Approximation

In Section 6.1 we derived the Taylor series for the exponential and the logarithm. This section uses those series to derive the limit definition of the exponential:

$$\lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} x \right)^n = e^x.$$

We will apply this formula to approximate probabilities that involve ratios of factorial terms and to derive the exponential distribution from first principles.

The Limiting Definition of $e$

Consider the limit:

$$\lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} x \right)^n.$$

At first glance it is hard to see how to evaluate this limit.

If $x > 0$, then the argument inside the parentheses is greater than 1, so it is natural to think that, as $n$ gets large, the limit should diverge. However, as $n$ gets large, the term inside the parentheses also approaches 1, which suggests a limit of 1. In either case, applying the limit on the outside first produces a different answer than applying the limit on the inside.

To find the limit, note that logarithms are continuous functions. So, the logarithm of a limit is the limit of the logarithm:

$$\log \left(\lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} x \right)^n \right) = \lim_{n \rightarrow \infty} \log \left(\left(1 + \frac{1}{n} x \right)^n \right)$$

Applying log rules:

$$\lim_{n \rightarrow \infty} \log\left(\left(1 + \frac{1}{n} x \right)^n \right) = \lim_{n \rightarrow \infty} n \log\left(1 + \frac{1}{n} x \right)$$

Consider $\log(1 + x/n)$ for fixed $x$. As $n$ gets big, $x/n$ must get small. So, provided $n > |x|$, we can replace the logarithm with its Taylor series about 1:

$$\log\left(1 + \frac{1}{n} x \right) = \frac{1}{n} x - \frac{1}{2}\left(\frac{1}{n} x \right)^2 + \frac{1}{3}\left(\frac{1}{n} x \right)^3 - \ldots$$

To isolate the leading term, notice that, for large $n$, $|x|/n$ is small, so $(|x|/n)^2$ will dominate $(|x|/n)^m$ for $m > 2$. Therefore, using the order notation from Section 5.2:

$$\log\left(1 + \frac{1}{n} x \right) = \frac{1}{n} x + \mathcal{O}\left( \frac{|x|^2}{n^2} \right)$$

where $\mathcal{O}$ means on the order of, and $\mathcal{O}\left( \frac{|x|^2}{n^2} \right)$ means that the errors in the linear approximation to the logarithm decay proportionally to $\frac{|x|^2}{n^2}$ as $n$ diverges.
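We can sanity-check this error rate numerically (a quick sketch; the specific values of $x$ and $n$ are arbitrary). If the error decays like $1/n^2$, then doubling $n$ should shrink it by roughly a factor of four:

```python
import math

# If the error in log(1 + x/n) ≈ x/n is O(1/n^2), doubling n
# should cut the error by roughly a factor of four.
x = 1.5
errors = [abs(math.log(1 + x / n) - x / n) for n in (100, 200, 400)]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(ratios)  # both ratios should be close to 4
```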

Now:

$$\begin{aligned} \lim_{n \rightarrow \infty} n \log\left(1 + \frac{1}{n} x \right) & = \lim_{n \rightarrow \infty} n \left(\frac{1}{n} x + \mathcal{O}\left( \frac{|x|^2}{n^2} \right)\right) \\ & = \lim_{n \rightarrow \infty} x + \mathcal{O}\left(\frac{|x|^2}{n} \right) \\ & = x + \lim_{n \rightarrow \infty} \mathcal{O}\left(\frac{|x|^2}{n} \right) \\ & = x + 0 \\ & = x. \end{aligned}$$

So:

$$\log \left(\lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} x \right)^n \right) = x.$$

Exponentiating both sides returns the limiting definition of the exponential function:

$$\lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} x \right)^n = e^x.$$
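As a quick numerical check (a sketch; the choice $x = 2$ is arbitrary), the finite products approach $e^x$ as $n$ grows:

```python
import math

x = 2.0
for n in (10, 100, 1000, 10000):
    print(n, (1 + x / n) ** n)
print("e^x =", math.exp(x))  # the products above converge to this value
```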

Approximating Proportions

Let’s use these ideas to estimate some probabilities. Each example below involves a chance we can compute explicitly by counting. In each case, the combinatorial formula can be approximated accurately by an exponential function derived with the same logic we applied above.

Sampling With and Without Replacement

Suppose you are a statistician working for the census. You are asked to run a survey. You can choose your sample size, $m$. You sample from a large population of $n$ individuals. You may assume $n$ is much larger than $m$.

You sample without replacement, but would like to approximate chances by pretending you sampled with replacement. To justify your approximation, you argue that, since $m$ is much smaller than $n$, the chance you would sample any individual twice had you sampled with replacement is very small.

What is the chance that, if you sample $m$ individuals uniformly with replacement from a population of $n$ individuals, you never draw an individual more than once?

We can find this chance using probability by proportion. You have $n$ options on each draw, and draw $m$ times. So, when drawing with replacement, there are $n^m$ samples you could collect. To avoid a duplicate on $m$ draws, count in sequence. There are $n$ options for your first draw, $n - 1$ for your second, $n - 2$ for your third, and so on. Therefore:

$$\text{Pr}(\text{no duplicates}) = \frac{n \times (n - 1) \times (n - 2) \times \ldots \times (n - (m - 1))}{n^m}.$$

We could derive this chance using probability rules instead. By the multiplication rule:

$$\text{Pr}(\text{no duplicates}) = \text{Pr}(\text{draw 1 unique}) \times \text{Pr}(\text{draw 2 unique}\,|\,\text{draw 1 unique}) \times \ldots = \prod_{j=0}^{m-1} \frac{n-j}{n}.$$

It seems sensible that, when $n$ is much bigger than $m$, this chance should be close to one. After all, it is a product of numbers close to one. However, it’s not so easy, from the formula we’ve written, to actually know whether $m$ is small enough relative to $n$. Is $m = 20$, $n = 100$ enough of a difference? What about $m = 10$, $n = 400$?
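We can answer these two questions directly by multiplying out the product (a minimal sketch; the helper name `prob_no_duplicates` is ours):

```python
def prob_no_duplicates(m, n):
    """Chance that m uniform draws with replacement from n items are all distinct."""
    p = 1.0
    for j in range(m):
        p *= (n - j) / n
    return p

print(prob_no_duplicates(20, 100))  # ≈ 0.13: a duplicate is very likely
print(prob_no_duplicates(10, 400))  # ≈ 0.89: a duplicate is fairly unlikely
```

So $m = 20$, $n = 100$ is not enough of a difference, while $m = 10$, $n = 400$ nearly is. The raw product hides how the answer depends on $m$ and $n$.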

This is an applied example of a famous toy problem. We introduce the problem below, then work out the general answer for arbitrary $m$ and $n$.

The Birthday Problem

Suppose you are in a room with $m = 20$ people. What’s the chance that any of the 20 people share a birthday?

Before reading further, check your intuition. Do you think this chance is small? After all, there are 365 days in a year and 365 is much bigger than 20.

How would we find the chance? Well, let’s start with some reasonable modeling assumptions.

We’ll assume that:

  1. No-one is born on a leap day, so there are 365 possible birthdays.

  2. All birthdays are equally likely.

  3. The birthdays for distinct individuals are independent. My birthday has no relation to yours.

Under these assumptions we can compute the chance that any two people share a birthday using rules of chance.

First, there are $n = 365$ possible birthdays. Each individual could have any of the 365 birthdays, so there are $n^m = 365^{20}$ different possible assignments of birthdays to individuals. So, $|\Omega| = n^m = 365^{20}$.

Second, the complement of the event $(\text{at least one duplicate birthday})$ is $(\text{no duplicates})$, so:

$$\text{Pr}(\text{at least one duplicate birthday}) = 1 - \text{Pr}(\text{no duplicates}).$$

How many ways can we assign birthdays without repeating a day?

Well, there are $n$ options for the first individual, then $n - 1$ for the second, then $n - 2$ for the third, and so on. For the $m^{th}$ individual there are $n - (m - 1)$ birthdays left to pick since we’ve already picked $m - 1$ dates.

Therefore:

$$\begin{aligned} \text{Pr}(\text{at least one duplicate birthday}) & = 1 - \frac{n \times (n - 1) \times \ldots \times (n - (m - 1))}{n^m} \\ & = 1 - \frac{n!}{(n - m)!\, n^m} \end{aligned}$$

So, for $n = 365$ and $m = 20$:

$$\text{Pr}(\text{at least one duplicate birthday}) = 1 - \frac{365!}{345! \times 365^{20}}$$

This formula is correct, but it’s pretty hard to work with. First, the terms involved are enormous. Factorials of large numbers are absurdly large. For instance, $365! \approx 2.5 \times 10^{778}$. Standard double-precision floating-point arithmetic overflows for numbers larger than about $1.8 \times 10^{308}$. So, the numbers involved are too large to compute as written.

Worse, it is very hard to get any intuition for whether this number is close to 1 or close to 0 without somehow computing the ratio of each term exactly. We certainly can’t compute these ratios by hand, and we can’t guess their magnitude by gut. Try it. Do you think $\frac{365!}{345! \times 365^{20}}$ is close to one, close to zero, or somewhere in between?

Even worse still, if we changed some minor feature of the problem, for instance, if the room had $m = 10$, or $m = 30$ individuals, then this equation does not provide any insight into how the chance should change. The chance of a repeat birthday should increase as we add more people, but it is not clear, from this result, whether it increases quickly or slowly in $m$.

So, let’s try to simplify the chance, and, where possible, approximate it with a more transparent function of $n$ and $m$.

First, expand it as a product:

$$\text{Pr}(\text{at least one duplicate birthday}) = 1 - \prod_{j=0}^{m-1} \frac{n-j}{n} = 1 - \prod_{j=1}^{m-1} \left(1 - \frac{j}{n} \right).$$

This is computationally much easier, since the big product is a product of numbers that are close to, and smaller than, 1.
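For instance, the chance for the birthday problem can be computed from the product form in a few lines (a sketch), with no huge factorials required:

```python
n, m = 365, 20

# Multiply out the product form directly.
p_no_dup = 1.0
for j in range(1, m):
    p_no_dup *= 1 - j / n

print(p_no_dup)       # ≈ 0.59
print(1 - p_no_dup)   # ≈ 0.41, the chance of a shared birthday
```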

The product representing the chance of no duplicate birthdays looks a lot like the product involved in the limiting definition of the exponential. Let’s try to approximate it using the same strategy:

$$\log\left( \prod_{j=1}^{m-1} \left(1 - \frac{j}{n} \right) \right) = \sum_{j=1}^{m-1} \log\left(1 - \frac{j}{n} \right) \approx -\sum_{j=1}^{m-1} \frac{j}{n}$$

The last approximation is valid in the limit as $n/m$ diverges, since then $j/n$ is small for all $j < m$.

Now, we can approximate:

$$\log\left( \text{Pr}(\text{no duplicates}) \right) \approx - \frac{1}{n} \sum_{j=1}^{m-1} j.$$

The sum $\sum_{j=1}^{m-1} j = m(m-1)/2$, since it can be expressed as a sum of $(m-1)/2$ pairs, each summing to $m$: $(1 + (m - 1)) + (2 + (m - 2)) + (3 + (m - 3)) + \ldots = m + m + m + \ldots = m(m-1)/2.$

Therefore:

$$\log\left( \text{Pr}(\text{no duplicates}) \right) \approx - \frac{m (m - 1)}{2 n} \approx - \frac{m^2}{2 n}$$

So, exponentiating:

$$\text{Pr}(\text{no duplicates}) \approx e^{- \frac{1}{2} \frac{m^2}{n}} = e^{- \frac{1}{2} \left(\frac{m}{\sqrt{n}}\right)^2}.$$

This function decays very quickly in $m$ for fixed $n$. It suggests that we won’t need particularly large $m$ to see a duplicate birthday. For instance, if $m = 20$ then $m^2 = 400$, so $m^2/n = 400/365 > 1$ and $-\frac{1}{2} m^2/n < -\frac{1}{2}$.

So, with $m = 20$:

$$\text{Pr}(\text{no duplicates}) \approx e^{-\frac{1}{2}} = \frac{1}{\sqrt{e}} \approx \frac{1}{\sqrt{2.7}} \approx 0.6$$

Therefore:

$$\text{Pr}(\text{at least one duplicate}) \approx 1 - 0.6 = 0.4.$$

Using the more accurate approximation:

$$\text{Pr}(\text{at least one duplicate}) \approx 1 - e^{- \frac{1}{2} \frac{20^2}{365}} = 0.42.$$

That’s a surprisingly large chance. With only 20 people there is almost a 50-50 chance that at least two individuals share a birthday!

The exponential approximation:

$$\text{Pr}(\text{no duplicates}) = \frac{n!}{(n - m)!\, n^m} \approx e^{- \frac{1}{2} \frac{m^2}{n}}$$

is also useful since it is much easier to analyze.

For instance, if we change $n$ and $m$, but keep the ratio $m/\sqrt{n}$ fixed, then the chance of no duplicates will remain about constant. Alternately, to find the size $m$ at which the chance of a duplicate first exceeds 0.5, we should solve:

$$e^{- \frac{1}{2} \frac{m^2}{n}} = \frac{1}{2} \Rightarrow m = \sqrt{2 \log(2)\, n} \approx \sqrt{1.4 \times n}$$

For $n = 365$ this gives $m \approx 22$. So, in a room with more than 22 individuals there is more than a 50% chance that at least two will share a birthday!
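The threshold computation, sketched numerically:

```python
import math

n = 365
m_threshold = math.sqrt(2 * math.log(2) * n)
print(m_threshold)  # ≈ 22.5: with 23 or more people, a shared birthday is favored
```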

Sampling With Replacement ≈ Sampling Without Replacement

We can now solve our original problem. In order to approximate chances by pretending a sample drawn without replacement was drawn with replacement, we need:

$$\text{Pr}(\text{no duplicates}) = \frac{n!}{(n - m)!\, n^m} \approx e^{- \frac{1}{2} \frac{m^2}{n}} \approx 1$$

Since $e^0 = 1$, and since we are looking for a chance near 1, we will want to choose $m$ so that $m^2/(2n)$ is small. Since we are looking at an exponential with a small argument, we can replace the exponential with its Taylor series:

$$e^{- \frac{1}{2} \frac{m^2}{n}} \approx 1 - \frac{1}{2} \frac{m^2}{n}.$$

Then, if we want $\text{Pr}(\text{no duplicates}) \geq 1 - q$, where $q$ is some small chance that our sample produced a duplicate, we should set:

$$1 - \frac{1}{2} \frac{m^2}{n} \geq 1 - q$$

which requires $\frac{1}{2} \frac{m^2}{n} \leq q$, or:

$$m \leq \sqrt{2 q n}.$$

This gives a fairly simple rule of thumb. If, for example, we want to approximate a sample, drawn without replacement, from a population of 10,000 individuals with a sample drawn with replacement, and want to guarantee that the sample with replacement produces no duplicates with at least a 98% chance, then $n = 10{,}000$ and $q = 0.02$, so we should not use a sample size $m$ larger than $\sqrt{2 \times 0.02 \times 10{,}000} = \sqrt{400} = 20$.
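We can sanity-check this rule of thumb against the exact product (a sketch):

```python
n, q = 10_000, 0.02
m = int((2 * q * n) ** 0.5)  # the rule of thumb gives m = 20

# Exact chance that a with-replacement sample of size m has no duplicates.
p_no_dup = 1.0
for j in range(m):
    p_no_dup *= (n - j) / n

print(m, p_no_dup)  # p_no_dup comes out just above the 0.98 target
```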

Exponential Distributions from First Principles

So far, we have only ever introduced continuous random variables by explicitly fixing their distribution functions and support. In contrast, we introduced all of our discrete distributions via a process that produced the random variable.

We finally have enough mathematical muscle to start deriving continuous PDFs from processes, without starting from a uniformity assumption. Here we will show how to derive the exponential distribution from a waiting-time process via the limiting definition of the exponential.

Suppose incidents occur at random points in time, and let $T$ be the waiting time between successive incidents. We will assume that:

  1. Incidents occur at a constant average rate of $\lambda$ incidents per unit time.

  2. The occurrence of incidents in non-overlapping time intervals is independent.

  3. The chance that an incident occurs in an interval depends only on the interval's duration, and, for a short interval of duration $\Delta t$, is approximately $p(\Delta t) \approx \lambda \Delta t$.

Examples could include bit flips in a computer, earthquake occurrences in a city, radioactive decay in a collection of atoms, binding events of a neurotransmitter, and many other processes that produce incidents at random times. Note that, in each case, some of the assumptions are suspect. This model is often adopted since it is simple and reasonably accurate. If, in particular, the times between successive incidents are dependent, or the chance an incident occurs increases as time passes, then this is not a good model.

To show that these assumptions require $T \sim \text{Exponential}(\lambda)$, start by imagining that you check for incidents once every $\Delta t$ seconds. For example, if you are studying earthquakes, you might check your seismometer once every hour for evidence of an earthquake. You keep a record of the number of times, $N$, you needed to check between successive incidents.

How is $N$ distributed?

  1. $N$ is a discrete random variable since it is a count. At smallest, $N = 1$.

  2. At each check-in interval you ask the same binary question: did an incident occur during the last $\Delta t$ seconds?

    Since your intervals are non-overlapping, assumption (2) suggests that your answers are independent. Moreover, since all of your intervals have equal duration, assumption (3) suggests that you are equally likely to see an incident in any of the intervals.

    So, under assumptions (2) and (3), each interval is an independent, identical, binary trial.

Therefore, $N$ is the number of independent, identical, binary trials up until a first success (in this case, an observed incident). It follows that $N$ must be a geometric random variable:

$$N \sim \text{Geometric}(p(\Delta t))$$

where $p(\Delta t)$ is the chance an incident occurs in an interval of length $\Delta t$. Then:

$$\text{PMF}(n) = (1 - p(\Delta t))^{n - 1} p(\Delta t), \quad \text{CDF}(n) = 1 - (1 - p(\Delta t))^n$$

Now that we can describe the distribution of $N$, let’s try to work back to the distribution of $T$.

The random time $T$ is continuous, so there’s no point in pursuing its PDF directly. Let’s try to find its CDF instead. Recall that $\text{CDF}(t) = \text{Pr}(T \leq t)$.

The event $T \leq t = n \Delta t$ is the same event as $N \leq n$, since $(N - 1) \Delta t \leq T \leq N \Delta t$.

Let $F_T(t)$ denote the CDF of $T$. Then:

$$F_T(n \Delta t) = 1 - (1 - p(\Delta t))^n$$

The duration $\Delta t$ was arbitrary. You could check once a day, once an hour, or once a minute. The frequency with which you check should not change the distribution of $T$. So, let’s try to find the CDF of $T$ at some fixed $t$, while taking $\Delta t$ to zero. To hold $t$ fixed at $n \Delta t$ while $\Delta t$ vanishes, we need $n = t/\Delta t$, or, $\Delta t = t/n$. Set $\Delta t = t/n$ and take $n$ large:

$$F_T(t) = \lim_{n \rightarrow \infty} 1 - (1 - p(t/n))^n = 1 - \lim_{n \rightarrow \infty} (1 - p(t/n))^n.$$

When $n$ is large, $\Delta t = t/n$ is small, so by assumption (3), $p(t/n)$ should approach $\lambda t/n$. Therefore:

$$\begin{aligned} F_T(t) & = 1 - \lim_{n \rightarrow \infty} \left(1 - \lambda \frac{t}{n} \right)^n \\ & = 1 - \lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} (-\lambda t) \right)^n \end{aligned}$$

Now we can use the limiting definition of the exponential!

$$\begin{aligned} F_T(t) & = 1 - \lim_{n \rightarrow \infty} \left(1 + \frac{1}{n} (-\lambda t) \right)^n \\ & = 1 - e^{-\lambda t}. \end{aligned}$$

To find the PDF, take a derivative:

$$\text{PDF}(t) = f_T(t) = \frac{d}{dt} F_T(t) = \frac{d}{dt} (1 - e^{-\lambda t}) = \lambda e^{-\lambda t}.$$

So, $T$ is a nonnegative continuous random variable with density function proportional to $e^{-\lambda t}$. It follows that:

$$T \sim \text{Exponential}(\lambda).$$
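To close the loop, here is a Monte Carlo sketch of the whole argument: simulate the check-in process with a small $\Delta t$ and $p(\Delta t) = \lambda \Delta t$, then compare the empirical CDF of $T = N \Delta t$ with $1 - e^{-\lambda t}$. (The names `lam` and `dt`, and the specific parameter values, are illustrative.)

```python
import math
import random

random.seed(0)
lam, dt = 2.0, 0.01      # rate λ and check-in interval Δt
p = lam * dt             # chance of an incident in each interval

def waiting_time():
    """Run binary trials until the first incident; return T = N * Δt."""
    n = 1
    while random.random() >= p:
        n += 1
    return n * dt

samples = [waiting_time() for _ in range(50_000)]

t = 0.5
empirical = sum(s <= t for s in samples) / len(samples)
print(empirical, 1 - math.exp(-lam * t))  # the two values should nearly agree
```

Shrinking `dt` further makes the discretized geometric CDF match the exponential CDF more closely, exactly as the limiting argument predicts.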