In Section 6.1 we derived the Taylor series for the exponential and the logarithm. This section uses those series to derive the limit definition of the exponential:

$$ e^x = \lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n $$
We will apply this formula to approximate probabilities that involve ratios of factorial terms and to derive the exponential distribution from first principles.
The Limiting Definition of $e^x$¶
Consider, for some fixed $x$, the limit:

$$ \lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n $$
At first glance it is hard to see how to evaluate this limit.
If $x > 0$, then the argument inside the parentheses is greater than 1, so it is natural to think that, as $n$ gets large, the limit should diverge. However, as $n$ gets large the term inside the parentheses also approaches 1, and 1 raised to any power is 1. In other words, applying the limit on the outside first produces a different answer than applying the limit on the inside.
To find the limit, note that logarithms are continuous functions. So, the logarithm of a limit is the limit of the logarithm:

$$ \log\left(\lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n\right) = \lim_{n \rightarrow \infty} \log\left(\left(1 + \frac{x}{n}\right)^n\right) $$
Applying log rules:

$$ \lim_{n \rightarrow \infty} \log\left(\left(1 + \frac{x}{n}\right)^n\right) = \lim_{n \rightarrow \infty} n \log\left(1 + \frac{x}{n}\right) $$
Consider $n \log(1 + x/n)$ for fixed $x$. As $n$ gets big, $x/n$ must get small. So, provided $|x/n| < 1$, we can replace the logarithm with its Taylor series about 1:

$$ \log\left(1 + \frac{x}{n}\right) = \frac{x}{n} - \frac{1}{2}\left(\frac{x}{n}\right)^2 + \frac{1}{3}\left(\frac{x}{n}\right)^3 - \ldots $$
To isolate the leading term, notice that, for large $n$, $x/n$ is small, so $x/n$ will dominate $(x/n)^k$ for $k \geq 2$. Therefore, using the order notation from Section 5.2:

$$ \log\left(1 + \frac{x}{n}\right) = \frac{x}{n} + O\left(\frac{1}{n^2}\right) $$
where $O(\cdot)$ means on the order of, and $O(1/n^2)$ means that the errors in the linear approximation to the logarithm decay proportionally to $1/n^2$ as $n$ diverges.
Now:

$$ n \log\left(1 + \frac{x}{n}\right) = n \left(\frac{x}{n} + O\left(\frac{1}{n^2}\right)\right) = x + O\left(\frac{1}{n}\right) $$
So:

$$ \lim_{n \rightarrow \infty} n \log\left(1 + \frac{x}{n}\right) = x $$
Exponentiating both sides returns the limiting definition of the exponential function:

$$ \lim_{n \rightarrow \infty} \left(1 + \frac{x}{n}\right)^n = e^x $$
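The convergence is easy to check numerically. The sketch below (plain Python; the variable names are ours) compares $(1 + x/n)^n$ to $e^x$ for increasing $n$, and the error should shrink like $1/n$, matching the $O(1/n)$ term above:

```python
import math

x = 2.0
# (1 + x/n)^n should approach e^x as n grows; the error decays like 1/n.
for n in [10, 1_000, 100_000]:
    approx = (1 + x / n) ** n
    print(f"n = {n:>6}: (1 + x/n)^n = {approx:.6f}, error = {math.exp(x) - approx:.2e}")
```

Multiplying $n$ by 100 shrinks the error by roughly a factor of 100, exactly as the order notation predicts.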
Approximating Proportions¶
Let’s use these ideas to estimate some probabilities. Each example below involves a chance we can compute explicitly by counting. In each case, the combinatorial formula can be approximated accurately using an exponential function derived using the same logic we applied above.
Sampling With and Without Replacement¶
Suppose you are a statistician working for the census. You are asked to run a survey. You can choose your sample size, $n$. You sample from a large population of $N$ individuals. You may assume $N$ is much larger than $n$.
You sample without replacement, but would like to approximate chances by pretending you sampled with replacement. To justify your approximation, you argue that, since $n$ is much smaller than $N$, the chance you would sample any individual twice had you sampled with replacement is very small.
What is the chance that, if you sample $n$ individuals uniformly with replacement from a population of $N$ individuals, you never draw an individual more than once?
We can find this chance using probability by proportion. You have $N$ options on each draw, and draw $n$ times. So, when drawing with replacement, there are $N^n$ samples you could collect. To avoid a duplicate on $n$ draws, count in sequence. There are $N$ options for your first draw, $N - 1$ for your second, $N - 2$ for your third, and so on. Therefore:

$$ \text{Pr}(\text{no duplicates}) = \frac{N (N - 1)(N - 2) \cdots (N - n + 1)}{N^n} $$
We could derive this chance using probability rules instead. By the multiplication rule:

$$ \text{Pr}(\text{no duplicates}) = \prod_{j=0}^{n-1} \frac{N - j}{N} = \prod_{j=1}^{n-1} \left(1 - \frac{j}{N}\right) $$
It seems sensible that, when $N$ is much bigger than $n$, this chance should be close to one. After all, it is a product of numbers close to one. However, it's not so easy, from the formula we've written, to actually know whether $n$ is small enough relative to $N$. Is a population ten times the sample size enough of a difference? A hundred times? A thousand?
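The product is easy to evaluate directly, and doing so shows how unreliable the "product of numbers near one" intuition is. A sketch (the function name and the example sizes are ours, chosen for illustration):

```python
def no_duplicate_chance(n, N):
    """Chance that n uniform draws with replacement from N individuals are all distinct."""
    chance = 1.0
    for j in range(n):
        chance *= (N - j) / N
    return chance

# A population only ten times the sample size is nowhere near enough:
print(no_duplicate_chance(100, 1_000))      # well below 1%: a duplicate is nearly certain
# A population ten thousand times the sample size works well:
print(no_duplicate_chance(100, 1_000_000))  # ≈ 0.995
```

Even though every factor in the first product exceeds $0.9$, there are a hundred of them, and the product collapses toward zero.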
This is an applied example of a famous toy problem. We introduce the problem below, then work out the general answer for arbitrary $n$ and $N$.
The Birthday Problem¶
Suppose you are in a room with 20 people. What's the chance that any two of the 20 people share a birthday?
Before reading further, check your intuition. Do you think this chance is small? After all, there are 365 days in a year and 365 is much bigger than 20.
How would we find the chance? Well, let’s start with some reasonable modeling assumptions.
We’ll assume that:
No-one is born on a leap day, so there are 365 possible birthdays.
All birthdays are equally likely.
The birthdays for distinct individuals are independent. My birthday has no relation to yours.
Under these assumptions we can compute the chance that any two people share a birthday using rules of chance.
First, there are $365$ possible birthdays. Each individual could have any of the 365 birthdays, so there are $365^{20}$ different possible assignments of birthdays to individuals. So, the total number of equally likely outcomes is $365^{20}$.
Second, the complement of the event $\{\text{some pair shares a birthday}\}$ is $\{\text{no pair shares a birthday}\}$, so:

$$ \text{Pr}(\text{some pair shares a birthday}) = 1 - \text{Pr}(\text{no pair shares a birthday}) $$
How many ways can we assign birthdays without repeating a day?
Well, there are $365$ options for the first individual, then $364$ for the second, then $363$ for the third, and so on. For the $j^{\text{th}}$ individual there are $365 - (j - 1)$ birthdays left to pick since we've already picked $j - 1$ dates.
Therefore:

$$ \text{Pr}(\text{no pair shares a birthday}) = \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n} = \frac{365!}{(365 - n)! \, 365^n} $$
So, for $n = 20$ people and $365$ possible birthdays:

$$ \text{Pr}(\text{no pair shares a birthday}) = \frac{365!}{345! \, 365^{20}} $$
This formula is correct, but it's pretty hard to work with. First, the terms involved are enormous. Factorials of large numbers are absurdly large. For instance, $365!$ is roughly $10^{778}$. Many computer algebra systems overflow for numbers larger than about $1.8 \times 10^{308}$, the largest double-precision floating point number. So, the numbers involved are too large to compute as written.
Worse, it is very hard to get any intuition for whether this number is close to 1, or close to 0, without somehow computing the ratio of each term exactly. We certainly can't compute these ratios by hand and we can't guess their magnitude by gut. Try it. Do you think $\frac{365!}{345! \, 365^{20}}$ is close to one, close to zero, or somewhere in between?
Even worse still, if we changed some minor feature of the problem, for instance, if the room had a few more, or a few fewer, individuals, then this equation does not provide any insight into how the chance should change. The chance of a repeat birthday should increase as we add more people, but it is not clear, from this result, whether it increases quickly, or slowly, in $n$.
So, let's try to simplify the chance, and, where possible, approximate it with a more transparent function of $n$, the number of people in the room.
First, expand it as a product:

$$ \text{Pr}(\text{no pair shares a birthday}) = \frac{365}{365} \cdot \frac{364}{365} \cdots \frac{346}{365} = \prod_{j=1}^{19} \left(1 - \frac{j}{365}\right) $$
This is computationally much easier since the big product is a product of numbers closer to, and smaller than, 1.
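In product form the chance takes only a short loop to evaluate, with no enormous intermediate numbers. A sketch (the function name is ours):

```python
def no_shared_birthday(n, days=365):
    """Chance that n people, with uniform independent birthdays, all have distinct birthdays."""
    chance = 1.0
    for j in range(1, n):
        chance *= 1 - j / days   # each factor is close to, and smaller than, 1
    return chance

print(no_shared_birthday(20))  # ≈ 0.589, so some pair shares a birthday with chance ≈ 0.411
```

This gives the exact answer numerically, but it still offers no formula-level insight into how the chance varies with $n$, which is what the approximation below provides.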
The product representing the chance of no duplicate birthdays looks a lot like the product involved in the limiting definition of the exponential. Let's try to approximate it using the same strategy. Writing $N$ for the number of possible birthdays (here $N = 365$), take logarithms and linearize:

$$ \log\left(\prod_{j=1}^{n-1} \left(1 - \frac{j}{N}\right)\right) = \sum_{j=1}^{n-1} \log\left(1 - \frac{j}{N}\right) \approx -\sum_{j=1}^{n-1} \frac{j}{N} $$
The last approximation is valid in the limit as $N$ diverges, since then $j/N$ is small for all $j \leq n - 1$.
Now, we can approximate:

$$ \log\left(\text{Pr}(\text{no pair shares a birthday})\right) \approx -\frac{1}{N} \sum_{j=1}^{n-1} j $$
The sum $\sum_{j=1}^{n-1} j = \frac{n(n-1)}{2}$ since it can be expressed as a sum of pairs:

$$ 2 \sum_{j=1}^{n-1} j = \underbrace{(1 + (n-1))}_{= n} + \underbrace{(2 + (n-2))}_{= n} + \cdots + \underbrace{((n-1) + 1)}_{= n} = (n-1) \, n $$
Therefore:

$$ \log\left(\text{Pr}(\text{no pair shares a birthday})\right) \approx -\frac{n(n-1)}{2N} $$
So, exponentiating both sides:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) $$
This function decays very quickly in $n$ for fixed $N$. It suggests that we won't need a particularly large $n$ to see a duplicate birthday. For instance, since $n(n-1) \approx n^2$, if $n \approx \sqrt{2N}$ then $n^2/(2N) \approx 1$ and $\exp(-n^2/(2N)) \approx e^{-1} \approx 0.37$.

So, with $N = 365$ and $n = 20$:

$$ \frac{n^2}{2N} = \frac{400}{730} \approx 0.55 $$

Therefore:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx e^{-0.55} \approx 0.58 $$

Using the more accurate approximation:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) = e^{-190/365} \approx 0.59 $$
That’s a surprisingly large chance. With only 20 people there is almost a 50-50 chance that at least two individuals share a birthday!
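It is worth checking how close the exponential approximation comes to the exact product. A quick sketch (all names ours):

```python
import math

n, N = 20, 365

# Exact product: (1 - 1/N)(1 - 2/N)...(1 - (n-1)/N)
exact = 1.0
for j in range(1, n):
    exact *= 1 - j / N

# Exponential approximation derived above
approx = math.exp(-n * (n - 1) / (2 * N))

print(f"exact = {exact:.4f}, approximation = {approx:.4f}")
```

The two agree to within about half a percentage point, so the approximation is accurate enough to trust for back-of-the-envelope reasoning.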
The exponential approximation:

$$ \text{Pr}(\text{no pair shares a birthday}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) $$
is also useful since it is much easier to analyze.
For instance, if we change $n$ and $N$, but keep the ratio $n(n-1)/N$ fixed, then the chance of no duplicates will remain about constant. Alternately, to solve for the size $n$ at which the chance of a duplicate first exceeds 0.5 we should use:

$$ \exp\left(-\frac{n(n-1)}{2N}\right) = \frac{1}{2} \quad \Longleftrightarrow \quad n(n-1) = 2 N \log(2) $$
For $N = 365$ this gives $n \approx 23$. So, in a room with more than 22 individuals there is more than a 50% chance that at least two will share a birthday!
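The threshold can be found directly from the condition $n(n-1) = 2N\log(2)$ by scanning upward (a sketch; the variable names are ours):

```python
import math

N = 365
target = 2 * N * math.log(2)   # from exp(-n(n-1)/(2N)) = 1/2

# Smallest n with n(n-1) >= 2 N log(2)
n = 1
while n * (n - 1) < target:
    n += 1
print(n)  # 23: the smallest room size with a >50% chance of a shared birthday
```

The exact product agrees: with 22 people the chance of no shared birthday is still above one half, and with 23 it drops below.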
Sampling With Replacement $\approx$ Sampling Without Replacement¶
We can now solve our original problem. In order to approximate chances by pretending a sample drawn without replacement was drawn with replacement, we need:

$$ \text{Pr}(\text{no duplicates}) \approx \exp\left(-\frac{n(n-1)}{2N}\right) \approx 1 $$
Since $\exp(0) = 1$, and since we are looking for a chance near 1, we will want to choose $n$ so that $\frac{n(n-1)}{2N}$ is small. Since we are looking at an exponential with a small argument, we can replace the exponential with its Taylor series:

$$ \exp\left(-\frac{n(n-1)}{2N}\right) \approx 1 - \frac{n(n-1)}{2N} $$
Then, if we want $\text{Pr}(\text{no duplicates}) \geq 1 - \epsilon$, where $\epsilon$ is some small acceptable chance that our sample produced a duplicate, we should set:

$$ \frac{n(n-1)}{2N} \leq \epsilon $$
This requires $n(n-1) \leq 2 N \epsilon$, or, approximately:

$$ n \leq \sqrt{2 N \epsilon} $$
This gives a fairly simple rule of thumb. If, for example, we want to approximate a sample, drawn without replacement, from a population of 10,000 individuals with a sample drawn with replacement, and want to guarantee that the sample with replacement produces no duplicates with at least a 98% chance, then $\epsilon = 0.02$ and $2 N \epsilon = 2 \cdot 10{,}000 \cdot 0.02 = 400$, so we should not use a sample size larger than $\sqrt{400} = 20$.
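The rule of thumb is a one-line function, and we can sanity-check it against the exact no-duplicate chance (a sketch; the function name is ours):

```python
import math

def max_sample_size(N, eps):
    """Largest n, roughly, keeping the chance of any with-replacement duplicate below eps."""
    return int(math.sqrt(2 * N * eps))   # from n(n-1)/(2N) <= eps

n = max_sample_size(10_000, 0.02)
print(n)  # 20

# Sanity check: the exact chance of no duplicates at this sample size
chance = 1.0
for j in range(n):
    chance *= (10_000 - j) / 10_000
print(round(chance, 4))  # ≈ 0.98, as requested
```

The exact chance slightly exceeds the 98% target, as expected: the Taylor-series bound is conservative because $n(n-1) < n^2$.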
Exponential Distributions from First Principles¶
So far, we have only ever introduced continuous random variables by explicitly fixing their distribution functions and support. In contrast, we introduced all of our discrete distributions with a process that produced the random variable.
We finally have enough mathematical muscle to start deriving continuous PDFs from processes without starting from a uniformity assumption. Here we will show how to derive the exponential distribution from a waiting time process via the limiting definition of the exponential.
Suppose a process produces incidents at random times, such that:

1. Incidents occur one at a time.
2. Incidents occurring in non-overlapping time intervals are independent.
3. The chance an incident occurs in an interval depends only on the duration of the interval; for a short interval of duration $\delta$, this chance is approximately $\lambda \delta$ for some fixed rate $\lambda$.

Examples could include bit flips in a computer, earthquake occurrences in a city, radioactive decay in a collection of atoms, binding events of a neurotransmitter, and many other processes that produce incidents at random times. Note, in each case, some of the assumptions are suspect. This model is often adopted since it is simple and reasonably accurate. If, in particular, the times between successive incidents are dependent, or the chance an incident occurs increases as time passes, then this is not a good model.
To show that these assumptions force the waiting time between successive incidents, $T$, to follow an exponential distribution, start by imagining that you check for incidents once every $\delta$ seconds. For example, if you are studying earthquakes, you might check your seismometer once every hour for evidence of an earthquake. You keep a record of the number of times, $K$, you needed to check between successive records marking an incident.
How is $K$ distributed?
$K$ is a discrete random variable since it is a count. At smallest, $K = 1$.
At each check-in interval you ask the same binary question: did an incident occur during the last $\delta$ seconds?
Since your intervals are non-overlapping, assumption (2) suggests that your answers are independent. Moreover, since all of your intervals have equal duration, assumption (3) suggests that you are equally likely to see an incident in any of the intervals.
So, under assumptions (2) and (3), each interval is an independent, identical, binary trial.
Therefore, $K$ is the number of independent, identical, binary trials up until a first success (in this case, an observed incident). It follows that $K$ must be a geometric random variable:

$$ \text{Pr}(K = k) = (1 - p)^{k - 1} p $$
where $p$ is the chance an incident occurs in an interval of length $\delta$. Then:

$$ \text{Pr}(K > k) = (1 - p)^k $$
Now that we can describe the distribution of $K$, let's try to work back to the distribution of $T$.
The random time, $T$, is continuous, so there's no point in pursuing its PDF directly. Let's try to find its CDF instead. Recall that $F_T(t) = \text{Pr}(T \leq t)$.
The event $\{T > k\delta\}$ is the same event as $\{K > k\}$, since the first incident occurs after time $k\delta$ exactly when the first $k$ check-ins observe no incident.
Let $F_T$ denote the CDF of $T$. Then:

$$ 1 - F_T(k\delta) = \text{Pr}(T > k\delta) = \text{Pr}(K > k) = (1 - p)^k $$
The duration $\delta$ was arbitrary. You could check once a day, once an hour, or once a minute. The frequency with which you check should not change the distribution of $T$. So, let's try to find the CDF of $T$ at some fixed $t$, while taking $\delta$ to zero. To hold $k\delta$ fixed at $t$ while $\delta$ vanishes, we need $k = t/\delta$ to diverge, or, $\delta = t/k$. Set $\delta = t/n$ and take $n$ large:

$$ 1 - F_T(t) = \lim_{n \rightarrow \infty} (1 - p)^n $$

where $p$ depends on the interval length $\delta = t/n$.
When $n$ is large, $\delta = t/n$ is small, so by assumption (3), $p$ should approach $\lambda \delta = \lambda t / n$. Therefore:

$$ 1 - F_T(t) = \lim_{n \rightarrow \infty} \left(1 - \frac{\lambda t}{n}\right)^n $$
Now we can use the limiting definition of the exponential!

$$ 1 - F_T(t) = e^{-\lambda t} \quad \Longrightarrow \quad F_T(t) = 1 - e^{-\lambda t} $$
To find the PDF, take a derivative:

$$ f_T(t) = \frac{d}{dt} F_T(t) = \frac{d}{dt}\left(1 - e^{-\lambda t}\right) = \lambda e^{-\lambda t} $$
So, $T$ is a nonnegative continuous random variable with density function proportional to $e^{-\lambda t}$. It follows that:

$$ T \sim \text{Exponential}(\lambda) $$
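The limiting argument itself can be checked numerically: the survival chance from the discrete check-in model, $(1 - \lambda\delta)^{t/\delta}$, should settle onto $e^{-\lambda t}$ as $\delta$ shrinks. A sketch (the rate and test time are values we chose for illustration):

```python
import math

lam, t = 2.0, 0.5   # rate and a test time, chosen for the sketch

# Discrete check-in model: Pr(T > t) = (1 - p)^k with p = lam * delta and k = t / delta checks.
for delta in [0.1, 0.01, 0.0001]:
    k = round(t / delta)
    survival = (1 - lam * delta) ** k
    print(f"delta = {delta:>6}: Pr(T > t) = {survival:.6f}")

print(f"limit:  exp(-lam * t) = {math.exp(-lam * t):.6f}")
```

Each hundredfold refinement of the check-in interval brings the discrete survival chance visibly closer to the exponential survival function, mirroring the $\delta \to 0$ limit in the derivation.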