
1.5 Conditional Probability

Sections 1.3 and 1.4 established rules for “not”, “or”, and “and” statements. However, we didn’t really finish the job for “and” statements. We showed how to organize joint probabilities, and how to use the rules for “or” to relate joints and marginals, but we didn’t derive any new rules that help us compute joint probabilities directly. We didn’t answer the question: how is $\text{Pr}(A,B)$ related to $\text{Pr}(A)$ and $\text{Pr}(B)$?

In this section we’ll see that, to work out the probability that $A$ and $B$ both happen, it is easier to first work out the probability that $A$ happens if $B$ happens (or vice versa). The probability that $A$ happens if $B$ happens is a conditional probability. We call it a conditional probability since the statement conditions on some other outcome, i.e. adds an additional condition that restricts the outcome space.

If Statements and Conditional Probability

What is the probability that it rains tomorrow if the weather is cold?

Suppose that:

| Event | Rain | Clouds | Sun | Marginals |
| --- | --- | --- | --- | --- |
| Cold | 2/10 | 3/10 | 1/10 | 6/10 |
| Warm | 1/10 | 0 | 1/10 | 2/10 |
| Hot | 0 | 0 | 2/10 | 2/10 |
| Marginals | 3/10 | 3/10 | 4/10 | 1 |

Then, when we condition on the assumption that it is cold, we are rejecting the possibility that it is warm or hot. In essence, we are restricting our outcome space. If it is cold, then it cannot be warm or hot, so any event in the sets “warm” or “hot” is not possible after conditioning. So, we can drop the bottom two rows of the table:

| Event | Rain | Clouds | Sun | Marginals |
| --- | --- | --- | --- | --- |
| Cold | 2/10 | 3/10 | 1/10 | 6/10 |

Unlike the operations “not”, “or”, and “and”, which act on the definition of the event, the logical operation “if” acts on the space of possible outcomes, $\Omega$. So, unlike the first three operations, which change the composition of the event, “if” changes the list of outcomes that could occur. As a result, conditioning will change both the numerator and the denominator when we equate probability to proportion or frequency. All other operations only act on the numerator.

Normalization

Take a look at the conditioned table. All of the numbers in the table are nonnegative and less than one, so they could be chances. However, they don’t add to one, so they fail to form a distribution. The marginal, at the far right, is $6/10$, not 1, so the list $[2/10, 3/10, 1/10]$ can’t define a full distribution.

There’s an easy fix here. The list of joints adds to $6/10$, so, if we scale them all by $10/6$, they’ll add to 1:

$$\frac{10}{6} \left(\frac{2}{10} + \frac{3}{10} + \frac{1}{10} \right) = \frac{2}{6} + \frac{3}{6} + \frac{1}{6} = 1$$

So, we can get a valid distribution if we rescale the joint entries of the row by their sum. Concretely: keep the numerators, drop the common denominator, and replace it with the sum of the numerators:

[0.2,0.3,0.1][2/10,3/10,1/10][2,3,1]2+3+1=6[2/6,3/6,1/6].[0.2, 0.3, 0.1] \rightarrow [2/10, 3/10, 1/10] \rightarrow [2, 3, 1] \rightarrow 2 + 3 + 1 = 6 \rightarrow [2/6, 3/6, 1/6].

The same operation will work for any list of nonnegative numbers with a finite sum. If we have a list $[a_1, a_2, a_3, \ldots, a_n]$ where $a_j \geq 0$ for all $j$, then:

$$\frac{1}{\sum_{j=1}^n a_j} [a_1, a_2, a_3, \ldots, a_n]$$

is a valid categorical distribution. This operation is called normalization since it rescales the entries to make sure they are normalized (add to 1).
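Normalization is a one-line operation in code. Here is a minimal Python sketch (the helper name `normalize` is introduced here for illustration), applied to the conditioned cold row from the table above:

```python
def normalize(weights):
    """Rescale a list of nonnegative numbers so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

# The isolated "cold" row of joint probabilities from the weather table
cold_row = [2/10, 3/10, 1/10]
print(normalize(cold_row))  # [2/6, 3/6, 1/6], roughly [0.333, 0.5, 0.167]
```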

Conditional Probability

When Outcomes are Equally Likely

While we could normalize our list $[2/10, 3/10, 1/10]$ to make a valid categorical distribution, it is not clear that we should. Why would normalizing by the marginal correctly return the conditional probabilities?

To answer this question we need a probability model that directs our calculation. Without a model, we could define conditional probabilities however we like. With a model, conditional probabilities have to behave in a sensible way.

Our first probability model is probability as proportion. If all outcomes are equally likely, then the probability of an event is the number of ways it can occur divided by the number of possible outcomes. In other words, the probability of an event is the proportion of the outcome space contained in the event. We can use this model to define conditional probability for equally likely events.

Think again about what “if” does to our model. When we condition, we are restricting the set of possible outcomes. For instance, in the weather example, we reject all outcomes where the temperature is warm or hot. If we roll a die, and condition on an even roll, then we are rejecting all odd outcomes.

Since we have a rule that assigns chances to outcomes when the outcomes are equally likely, we can compute conditional probabilities using this rule:

  • The probability a fair die lands on a 2 given that the roll is even:

    $$\text{Pr}(\{2\} \mid \text{even}) = \frac{|\{2\}|}{|\{2,4,6\}|} = \frac{1}{3}$$

  • The probability a fair die lands on a 2 or 4 given that the roll is even:

    $$\text{Pr}(\{2,4\} \mid \text{even}) = \frac{|\{2,4\}|}{|\{2,4,6\}|} = \frac{2}{3}$$

  • The probability a fair die roll is less than 3 given that the roll is even:

    $$\text{Pr}(\{1,2,3\} \mid \text{even}) = \frac{|\{2\}|}{|\{2,4,6\}|} = \frac{1}{3}$$

  • The probability a fair die roll is equal to 3 given that the roll is even:

    $$\text{Pr}(\{3\} \mid \text{even}) = \frac{|\emptyset|}{|\{2,4,6\}|} = 0$$
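These counting calculations are easy to mechanize with Python sets. A minimal sketch (the helper name `cond_prob` is an assumption for illustration):

```python
def cond_prob(B, A):
    """Pr(B|A) for equally likely outcomes: |B intersect A| / |A|."""
    return len(B & A) / len(A)

even = {2, 4, 6}
print(cond_prob({2}, even))        # 1/3
print(cond_prob({2, 4}, even))     # 2/3
print(cond_prob({1, 2, 3}, even))  # 1/3
print(cond_prob({3}, even))        # 0.0
```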

In other words, when outcomes are equally likely, the conditional probability of an event $B$ given another event $A$ is:

$$\text{Pr}(B|A) = \frac{\text{the numb. of ways } B \text{ and } A \text{ can happen}}{\text{the numb. of ways } A \text{ can happen}} = \frac{|B \cap A|}{|A|}.$$

We can rewrite the equation to recover the normalization rule we suggested earlier:

$$\text{Pr}(B | A) = \frac{|\Omega|}{|\Omega|}\frac{|B \cap A|}{|A|} = \frac{|B \cap A|}{|\Omega|} \times \frac{|\Omega|}{|A|} = \frac{\text{Pr}(B,A)}{\text{Pr}(A)}.$$

So, when outcomes are equally likely, we can compute conditional probabilities by isolating all outcomes that are consistent with the conditioning statement, then matching probability to proportion in the restricted space. In other words, just normalize the necessary collection of probabilities.

Does the rule $\text{Pr}(B|A) = \frac{\text{Pr}(B,A)}{\text{Pr}(A)}$ work if the underlying outcomes are not equally likely?

Consider our weather example again. We can isolate the appropriate row of the joint table:

| Event | Rain | Clouds | Sun | Marginals |
| --- | --- | --- | --- | --- |
| Cold | 2/10 | 3/10 | 1/10 | 6/10 |

but we don’t know anything about the background outcome space $\Omega$ that produced these joint probabilities. Moreover, trying to spell out a detailed weather model in which all microscopic outcomes are equally likely is both far too much work for this problem and would be impractical in almost all settings. For conditional probability to be useful, we should be able to compute it in categorical settings, without somehow expanding our outcome space. So, while the derivation provided above works for equally likely outcome models, it is too restricted to work for general applications.

By Conditional Frequency

Thankfully, we have at hand an alternate model: probability as frequency. Recall that the probability an event occurs should be approximated by, and equal in the long run, the frequency with which it occurs in a sequence of repeated trials. This relation should hold for any valid probability model. So, let’s use it to show that the normalization approach correctly computes conditional probabilities.

Consider a long weather record. Say, the weather in Berkeley over the last year. Let’s try to find the conditional probability that it rains, given that it is cold, on some future day selected at random. We’ll assume that the climate is fixed, the process that produces weather does not change (is stationary), and the process doesn’t remember its past forever (e.g. the probability that it rains today given that it rained on this day a century ago is the same as the probability that it rains today). Then the probability of any event should be approximated by the frequency with which the event occurs in the weather record.

To keep our record short we’ll use the following visuals:

| Event | Rain | Clouds | Sun | Cold | Warm | Hot |
| --- | --- | --- | --- | --- | --- | --- |
| Emoji | 💧 | ☁️ | ☀️ | 🥶 | 😎 | 🥵 |
| Symbol | R | Cl | S | Co | W | H |

Here’s an example two-week record:

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Precip. | 💧 | ☁️ | ☁️ | ☁️ | ☀️ | ☀️ | ☁️ | 💧 | 💧 | ☁️ | ☀️ | ☀️ | ☀️ | ☁️ |
| Temp | 🥶 | 🥶 | 🥶 | 😎 | 😎 | 🥵 | 😎 | 😎 | 🥶 | 🥶 | 😎 | 🥵 | 🥵 | 😎 |

We can compute frequencies from this record. For example, $\text{Fr}(\text{R}) = 3/14$ and $\text{Fr}(\text{Cl}) = 6/14$, since it rained on 3 days, and was cloudy on 6, out of the last 14.

We can also use this record to compute joint frequencies. For example, $\text{Fr}(\text{R and Co}) = 2/14$, since it was rainy and cold on 2 of the 14 days. These were days 1 and 9.

Here’s the good part. We can also compute conditional frequencies from the record. Suppose I wanted to find the frequency with which it rained, given that it was cold. Then, I would disregard all days when it wasn’t cold, and compute the frequency out of the remaining days. Disregarding the days when it was not cold is equivalent to filtering for only the days when it was cold:

| Day | 1 | 2 | 3 | 9 | 10 |
| --- | --- | --- | --- | --- | --- |
| Precip. | 💧 | ☁️ | ☁️ | 💧 | ☁️ |
| Temp | 🥶 | 🥶 | 🥶 | 🥶 | 🥶 |

Now that we’ve filtered the record for only cold days, the conditional frequencies are apparent:

$$\text{Fr}(\text{R}|\text{Co}) = 2/5$$

since it rained on 2 of the 5 days when it was cold. Notice, $2/5 = 0.4$ is not a bad estimate of the value we got by normalizing, $2/10 \times 10/6 = 2/6 = 0.333\ldots$. These two numbers are different because our record of cold days was short, so the frequencies are only rough estimates of the true probabilities.

Nevertheless, the algebra for conditional frequencies is clear, and should recover the appropriate probabilities if we run enough trials/collect a long enough record.
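The filtering procedure translates directly into code. A Python sketch, with the two-week record transcribed into symbol lists (the names `precip` and `temp` are introduced here for illustration):

```python
# The two-week record, one symbol per day (R/Cl/S and Co/W/H as in the legend)
precip = ["R", "Cl", "Cl", "Cl", "S", "S", "Cl", "R", "R", "Cl", "S", "S", "S", "Cl"]
temp   = ["Co", "Co", "Co", "W", "W", "H", "W", "W", "Co", "Co", "W", "H", "H", "W"]

# Filter for cold days, then compute the frequency of rain among them
cold_days = [p for p, t in zip(precip, temp) if t == "Co"]
fr_rain_given_cold = cold_days.count("R") / len(cold_days)
print(fr_rain_given_cold)  # 2/5 = 0.4
```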

Take a look at the frequency calculation again:

$$\text{Fr}(\text{R}|\text{Co}) = \frac{\text{the numb. of times R and Co happened}}{\text{the numb. of times Co happened}}$$

This expression looks a lot like what we wrote for equally likely outcomes. All we’ve done is changed the way we count. Instead of counting ways an outcome can occur, we are counting the number of times it did occur in a sequence.

Let $N_{R,Co}$ be the number of times it rained and was cold (2), and $N_{Co}$ be the number of times it was cold (5). Let $n$ be the length of the record (14). Then:

$$\begin{aligned} \text{Fr}(\text{R}|\text{Co}) & = \frac{N_{R,Co}}{N_{Co}} = \frac{n}{n} \times \frac{N_{R,Co}}{N_{Co}} = \frac{N_{R,Co}}{n} \times \frac{n}{N_{Co}} \\ & = \frac{\text{Fr}(\text{R},\text{Co})}{\text{Fr}(\text{Co})}. \end{aligned}$$

Compare these statements to what we wrote for equally likely outcomes. They are identical, up to substituting frequency for probability. Since frequencies should match probabilities in the long run (in the limit as $n$ goes to infinity), we’ve just derived the general definition of conditional probability:

$$\text{Pr}(B|A) = \frac{\text{Pr}(B,A)}{\text{Pr}(A)}.$$

In our example, $\text{Pr}(\text{R}|\text{Co}) = (2/10)/(6/10) = 2/6$, exactly as we predicted by normalizing.

If we want probabilities to match long run frequencies, this is the only sensible definition. You should remember it: conditional equals joint over marginal.
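The joint-over-marginal rule can be checked against the full weather table in code. A Python sketch (the dictionary layout and function name are assumptions for illustration):

```python
# Joint probabilities from the weather table, keyed by (temperature, precipitation)
joint = {("Co", "R"): 2/10, ("Co", "Cl"): 3/10, ("Co", "S"): 1/10,
         ("W",  "R"): 1/10, ("W",  "Cl"): 0,    ("W",  "S"): 1/10,
         ("H",  "R"): 0,    ("H",  "Cl"): 0,    ("H",  "S"): 2/10}

def pr_precip_given_temp(precip, temp):
    """Conditional equals joint over marginal."""
    marginal = sum(p for (t, _), p in joint.items() if t == temp)
    return joint[(temp, precip)] / marginal

print(pr_precip_given_temp("R", "Co"))  # 2/6, roughly 0.333
```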

Conditioning Preserves Odds

The following section is optional. It suggests an axiomatic method for deriving the conditional probability formula by requiring that relative chances are unchanged by conditioning.

If the relative likelihood of two events that are consistent with the condition is unchanged by conditioning, then it must be true that conditional distributions are recovered from joint distributions by:

  1. isolating the appropriate row or column of the joint probability table

  2. dividing all joint entries by their sum, which is the associated marginal

Conditional Distributions

Let’s practice using this rule. Here’s the weather example again:

  1. Write down the joint table:

| Event | Rain | Clouds | Sun | Marginals |
| --- | --- | --- | --- | --- |
| Cold | 2/10 | 3/10 | 1/10 | 6/10 |
| Warm | 1/10 | 0 | 1/10 | 2/10 |
| Hot | 0 | 0 | 2/10 | 2/10 |
| Marginals | 3/10 | 3/10 | 4/10 | 1 |

  2. Filter for only the cold events:

| Event | Rain | Clouds | Sun | Marginals |
| --- | --- | --- | --- | --- |
| Cold | 2/10 | 3/10 | 1/10 | 6/10 |

  3. Normalize by the marginal:

| Event | Rain | Clouds | Sun |
| --- | --- | --- | --- |
| Cold | 2/6 | 3/6 | 1/6 |

Notice that the conditional distribution is proportional to the list of joint probabilities in the isolated row. This is a nice visual rule of thumb. If you have a joint table, and want the conditionals, just look up the appropriate row or column and scale it.

For instance, if we conditioned on sun, we’d isolate the column:

| Event | Sun |
| --- | --- |
| Cold | 1/10 |
| Warm | 1/10 |
| Hot | 2/10 |
| Marginals | 4/10 |

Then rescale to find the conditional probabilities:

| Event | Sun |
| --- | --- |
| Cold | 1/4 |
| Warm | 1/4 |
| Hot | 2/4 |
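The row-or-column recipe can be written as a single function over the joint table. A Python sketch (the table layout and the name `condition_on` are assumptions for illustration):

```python
# Joint probabilities from the weather table, keyed by (temperature, precipitation)
joint = {("Co", "R"): 2/10, ("Co", "Cl"): 3/10, ("Co", "S"): 1/10,
         ("W",  "R"): 1/10, ("W",  "Cl"): 0,    ("W",  "S"): 1/10,
         ("H",  "R"): 0,    ("H",  "Cl"): 0,    ("H",  "S"): 2/10}

def condition_on(axis, value):
    """Isolate the row (axis=0) or column (axis=1) matching value, then normalize."""
    selected = {key: p for key, p in joint.items() if key[axis] == value}
    marginal = sum(selected.values())
    return {key: p / marginal for key, p in selected.items()}

print(condition_on(1, "S"))  # Cold: 1/4, Warm: 1/4, Hot: 2/4
```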

The Multiplication Rule

Now that we know how to handle “if” statements, we can go back to our original aim, understanding “and” statements. Consider the definition of a conditional probability:

$$\text{Pr}(B|A) = \frac{\text{Pr}(A,B)}{\text{Pr}(A)}$$

Rearranging the expression gives our next fundamental probability rule, the multiplication rule:

$$\text{Pr}(A,B) = \text{Pr}(A) \times \text{Pr}(B|A)$$

This rule is sensible. Suppose that four in every ten days are sunny, and half of all sunny days are hot. Then it is sensible that the fraction of all days that are both hot and sunny should equal the fraction of all days that are sunny, times the fraction of all sunny days that are hot. This is precisely the calculation we performed to find the probability that a sunny day is hot, run in reverse:

$$\text{Pr}(\text{S}, \text{H}) = \text{Pr}(\text{S}) \times \text{Pr}(\text{H}|\text{S}) = \frac{4}{10} \times \frac{2}{4} = \frac{2}{10}.$$

The multiplication rule should be used in the same fashion as the addition rule or the complement rule. Use it if:

  1. You are asked for the probability of some event,

  2. that event is naturally expressed as a joint event or intersection (it can be expanded as a sequence of “and” statements) where,

  3. the individual parts are each simpler to work with.

In particular, look for examples where the event is best described with a conditional sequence. Then you can find the joint probability by simply walking through the sequence.

Notice: ⚠️ the multiplication rule does not tell you to directly multiply marginal probabilities. This is a common mistake. Always multiply a marginal with a conditional. Otherwise, your calculation will be incorrect. For example, when drawing two cards from a shuffled deck, the chance of drawing an ace on the first draw is $4/52 = 1/13$, and the marginal chance of an ace on the second draw is also $1/13$, but the chance of drawing two aces is $1/13 \times 3/51$, not $1/13 \times 1/13$.
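A quick numerical check of this warning, sketched in Python:

```python
# Drawing two cards from a shuffled 52-card deck
p_first_ace  = 4 / 52   # marginal: ace on the first draw
p_second_ace = 3 / 51   # conditional: ace on the second draw, given ace first

p_two_aces = p_first_ace * p_second_ace   # multiplication rule: marginal times conditional
p_wrong    = (4 / 52) * (4 / 52)          # multiplying two marginals -- incorrect

print(p_two_aces)  # 1/221, roughly 0.00452
print(p_wrong)     # roughly 0.00592, an overestimate
```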

We can now complete our summary table of probability rules:

Reasoning with Sequences

The multiplication rule, and its extension to sequences of events, gives us a new visual tool for computing probabilities.

Consider the weather example again. If we rewrite the table thinking first about the marginal chance of temperature, then the conditional chances of precipitation, we can express the probability model:

| Event | Marginal Probability |
| --- | --- |
| Cold | 6/10 |
| Warm | 2/10 |
| Hot | 2/10 |

and

| Event | Rain | Clouds | Sun |
| --- | --- | --- | --- |
| if Cold | 2/6 | 3/6 | 1/6 |
| if Warm | 1/2 | 0 | 1/2 |
| if Hot | 0 | 0 | 1 |

This information can be represented with an outcome tree. The outcome tree works like a decision tree. First ask, what is the temperature? Then ask, given the temperature, what is the precipitation? Label each edge in the decision tree with the matching marginal or conditional probability:

Outcome tree for the weather model.

To find the joint probabilities of the events at the far end of the outcome tree, simply multiply the probabilities down the matching path.

For example:

$$\text{Pr}(\text{W}, \text{R}) = \frac{2}{10} \times \frac{1}{2} = \frac{1}{10}.$$

If you consult the joint table in Section 1.4, you’ll find the same value.
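Multiplying down a path of the tree is a two-factor product, which is easy to script. A Python sketch transcribing the marginal and conditional tables above (the dictionary names are assumptions for illustration):

```python
# Marginal probability of each temperature
p_temp = {"Co": 6/10, "W": 2/10, "H": 2/10}

# Conditional probability of each precipitation type given the temperature
p_precip_given = {"Co": {"R": 2/6, "Cl": 3/6, "S": 1/6},
                  "W":  {"R": 1/2, "Cl": 0,   "S": 1/2},
                  "H":  {"R": 0,   "Cl": 0,   "S": 1}}

def joint(temp, precip):
    """Multiply down the path of the outcome tree: marginal times conditional."""
    return p_temp[temp] * p_precip_given[temp][precip]

print(joint("W", "R"))  # 2/10 * 1/2 = 1/10
```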

Bayes’ Rule

How would we find the conditional probability that it is warm if it rains?

Notice that the outcome tree sketched above does not provide this conditional directly. Nor does the specification:

| Event | Marginal Probability |
| --- | --- |
| Cold | 6/10 |
| Warm | 2/10 |
| Hot | 2/10 |

and

| Event | Rain | Clouds | Sun |
| --- | --- | --- | --- |
| if Cold | 2/6 | 3/6 | 1/6 |
| if Warm | 1/2 | 0 | 1/2 |
| if Hot | 0 | 0 | 1 |

Nevertheless, we can always find the desired conditional by first solving for the appropriate joint and marginal, then scaling the joint by the marginal. In many ways, this procedure is the same as what we’ve done before, only we start with different information.

Suppose that we know $\text{Pr}(A)$ and the conditionals $\text{Pr}(B|A)$ and $\text{Pr}(B|A^c)$. How can we find $\text{Pr}(A|B)$?

Well, let’s use our rules.

  1. Always start from what you need to find. By definition:

    $$\text{Pr}(A|B) = \frac{\text{Pr}(A,B)}{\text{Pr}(B)}$$

  2. Next, find the joint probability. If we have a complete joint probability table then we can find any conditionals we want. To find the conditional probability of $A$ given $B$ we need the joint $\text{Pr}(A,B)$. We can find it by multiplying down the appropriate path in the outcome tree:

    $$\text{Pr}(A,B) = \text{Pr}(B,A) = \text{Pr}(A) \times \text{Pr}(B|A)$$

  3. We now have two ways of expressing the joint. Both are valid applications of the multiplication rule:

    $$\text{Pr}(A,B) = \text{Pr}(B,A) = \begin{cases} \text{Pr}(A) \times \text{Pr}(B|A) \\ \text{Pr}(B) \times \text{Pr}(A|B) \end{cases}$$

    We can compute the top line, and want the last term in the bottom line. Since the two lines return the same joint, they are equal, and:

    $$\text{Pr}(A|B) = \frac{\text{Pr}(A,B)}{\text{Pr}(B)} = \frac{\text{Pr}(A) \times \text{Pr}(B|A)}{\text{Pr}(B)}$$

  4. By assumption, we know all the values in the numerator. That leaves the denominator. The denominator is a marginal, so we can always expand it as we did in Section 1.4:

    $$\text{Pr}(B) = \text{Pr}(A,B) + \text{Pr}(A^c,B)$$

    Then, since both terms on the right hand side are joint probabilities, we can find them with the multiplication rule:

    $$\text{Pr}(B) = \text{Pr}(A) \times \text{Pr}(B|A) + \text{Pr}(A^c) \times \text{Pr}(B|A^c)$$

Putting the numerator and denominator together gives Bayes’ rule:

$$\text{Pr}(A|B) = \frac{\text{Pr}(A) \times \text{Pr}(B|A)}{\text{Pr}(A) \times \text{Pr}(B|A) + \text{Pr}(A^c) \times \text{Pr}(B|A^c)}$$

It is usually more helpful to think about Bayes’ rule in stages. First, find all the joint probabilities by multiplying down the paths of the outcome tree that point to an event where $B$ occurs. Then, find the marginal probability that $B$ occurs by summing over the joint probabilities. Finally, take the ratio of joint to marginal that recovers the desired conditional.
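The derivation above fits in a few lines of Python. The sketch below (function name is an assumption) checks itself against the weather model, where $\text{Pr}(\text{R}|\text{Warm}^c) = (2/10)/(8/10) = 1/4$ follows from the joint table:

```python
def bayes(p_A, p_B_given_A, p_B_given_notA):
    """Pr(A|B) from the prior Pr(A) and the conditionals Pr(B|A), Pr(B|A^c)."""
    p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA  # expand the marginal
    return p_A * p_B_given_A / p_B

# Probability it is warm given rain, in the weather model:
# Pr(W) = 2/10, Pr(R|W) = 1/2, Pr(R|not W) = (2/10)/(8/10) = 1/4
print(bayes(2/10, 1/2, 1/4))  # 1/3
```

The result agrees with the joint table: $\text{Pr}(\text{W},\text{R}) = 1/10$ and $\text{Pr}(\text{R}) = 3/10$, whose ratio is $1/3$.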

Here are two examples:

Example: Base Rate Neglect

Let’s look at a practical problem where the Bayesian approach is necessary.

Suppose you are subject to a medical test that is designed to determine whether or not you have some medical condition. For example, you take a Covid or Flu test. The result of the test is important, since it will impact your behavior. For example, if you are sick, you might decide to stay home, or may invest in a medical intervention which is costly.

No test is perfectly accurate. In essentially all cases a test could predict that a healthy patient is sick, or that a sick patient is healthy. Let $H$ denote the event that the recipient is healthy, $S$ the event they are sick, $N$ the event that the test returns negative (predicts healthy), and $P$ the event that the test returns positive (predicts sick). Then, there are four possible outcomes. We can arrange them just like we did a joint probability table:

| Event | N | P |
| --- | --- | --- |
| H | TN | FP |
| S | FN | TP |

Here the labels TN, TP, FN, FP stand for (true/false) and (positive/negative). Notice that there are two ways the test can make a mistake. Either it falsely predicts positive or falsely predicts negative. Both rates matter. False positives are costly and can be harmful to the recipient if they take actions assuming they are sick. At worst, a false positive can lead to unnecessary medical intervention. False negatives are dangerous, since the recipient may act as if they are healthy, so may risk others’ health, or not take medical action that could address their condition.

It is standard in test design to control the false positive rate. That is, the conditional probability that the test returns positive if the truth is negative (i.e. the patient is healthy). The smaller the false positive rate, the more significant the test result, and the more specific the procedure. The other error rate, the probability the test misses a sick patient, controls the power of the test (its ability to detect sick patients) and its sensitivity (how sensitive it is to evidence of disease).

Let’s name these probabilities:

$$\text{Pr}(P|H) = p_{FP}, \quad \text{Pr}(N|S) = p_{FN}$$

Suppose that:

$$\text{Pr}(P|H) = p_{FP} = 0.05, \quad \text{Pr}(N|S) = p_{FN} = 0.01$$

This looks like a good test. It is reasonably selective/specific, since it only makes a false detection for 5 percent of healthy patients. It’s also pretty powerful/sensitive, since it only misses a true detection in 1 percent of sick patients. For reference, the significance of a mammogram is 90%, so about 10% of healthy women are falsely flagged for breast cancer. The sensitivity of mammograms is 87%, so they have a false negative probability of 0.13.

Now suppose you take the test, and it returns positive. Should you act as if you are sick? What is the probability you were a false positive, and are actually healthy? In other words, what is $\text{Pr}(H|P)$?

Imagine that, of all patients who take the test, a fraction $p_S$ are actually sick. This fraction is sometimes called a base rate. Base rates can influence conditional chances in surprising ways.

In many settings, base rates are quite small. For example, only 0.5% of women screened for breast cancer in a mammogram are diagnosed with breast cancer in a follow up test. Since the mammogram is meant to filter for women with breast cancer, the population screened post mammogram should include more cases of breast cancer, so should have a higher base rate than the population of all women who take the mammogram. So, let’s be conservative, and assume that the second stage test is perfect. Then we can put a conservative upper bound on the base rate of cancer in the population of women undergoing the mammogram at $p_S \leq 0.005$.

Let’s compute a lower bound on the chance that a woman who receives a positive mammogram does not have breast cancer via Bayes’ rule:

$$\begin{aligned} \text{Pr}(H|P) & = \frac{\text{Pr}(H,P)}{\text{Pr}(P)} = \frac{\text{Pr}(H) \times \text{Pr}(P|H)}{\text{Pr}(H) \times \text{Pr}(P|H) + \text{Pr}(S) \times \text{Pr}(P|S)} \\ & \geq \frac{0.995 \times 0.1}{0.995 \times 0.1 + 0.005 \times 0.87} = \left(1 + \frac{0.005}{0.995} \cdot \frac{0.87}{0.1} \right)^{-1} \\ & \approx (1 + 0.04)^{-1} = \frac{100}{104} = \frac{25}{26} \approx 0.96 \end{aligned}$$

That’s a shocking number. Read it again. Don’t skim it.

Even though only 10% of healthy women get flagged by the mammogram, the fraction of women flagged by the mammogram who do not actually have breast cancer is at least 95%!

Pause to let that sink in. What’s happened here?

The problem is the base rate. The base rate of actual sick cases is so low that, even with a reasonably accurate test, the small fraction of healthy cases who are flagged as sick vastly outweighs the large fraction of actually sick cases who are flagged as sick. Why? Because there are 995 healthy cases for every 5 sick cases.

What about our hypothetical test with $\text{Pr}(P|H) = p_{FP} = 0.05$ and $\text{Pr}(N|S) = p_{FN} = 0.01$? This is a much more accurate test. Can it filter out enough healthy cases so that a patient who receives a positive result is usually sick?

$$\begin{aligned} \text{Pr}(H|P) & = \frac{\text{Pr}(H,P)}{\text{Pr}(P)} = \frac{\text{Pr}(H) \times \text{Pr}(P|H)}{\text{Pr}(H) \times \text{Pr}(P|H) + \text{Pr}(S) \times \text{Pr}(P|S)} \\ & \geq \frac{0.995 \times 0.05}{0.995 \times 0.05 + 0.005 \times 0.99} = \left(1 + \frac{0.005}{0.995} \cdot \frac{0.99}{0.05} \right)^{-1} \\ & \approx (1 + 0.1)^{-1} = \frac{10}{11} \approx 0.91 \end{aligned}$$

Even with the better test, the chance that a patient who received a positive result is actually healthy is still greater than 90%.
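Both bounds can be reproduced with a short calculation. A Python sketch using the numbers from the text (the function name is an assumption for illustration):

```python
def pr_healthy_given_positive(base_rate, p_fp, p_fn):
    """Pr(H|P) by Bayes' rule, where base_rate = Pr(S)."""
    p_healthy, p_sick = 1 - base_rate, base_rate
    p_positive = p_healthy * p_fp + p_sick * (1 - p_fn)  # expand the marginal Pr(P)
    return p_healthy * p_fp / p_positive

# Mammogram: 10% false positive rate, 13% false negative rate
print(pr_healthy_given_positive(0.005, 0.10, 0.13))  # roughly 0.958
# Hypothetical sharper test: 5% false positives, 1% false negatives
print(pr_healthy_given_positive(0.005, 0.05, 0.01))  # roughly 0.909
```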

This is why we should use multistage test procedures when false positives are a problem, and base rates are low. Forgetting to account for the base rate is sometimes called base rate neglect.