
1.7 Chapter Summary

Phew. That was a lot to think through.

Here’s a summary of the important definitions and results from Chapter 1. Reading the summary is not a substitute for reading the chapter. It will provide most of the information to complete your studysheets, but copying results from the summary to the studysheet is not a substitute for completing the sheet yourself. Make sure you can find where each result listed here was explained in the main chapter text.

If there is one table to summarize the chapter, it is the logic-to-set-operation table at the end of the first section below.

Outcomes, Events, and Sets

These results are all explained in Section 1.1.

  1. A random process is some process that produces unpredictable outcomes

    • An outcome is a specific, distinct result of the process

    • The outcome space, $\Omega$, is the set of all possible outcomes

    • An event is any collection of outcomes. Events, $E$, are subsets of $\Omega$.

  2. Sets may be defined:

    • explicitly by listing their entries. For example, $A = \{a,b,c\}$.

    • implicitly by defining rules that all entries must satisfy, and that, if satisfied, ensure membership in the set. For example, $A = \{\text{all letters before } d \text{ in the alphabet}\}$.

    • The size of a set, $A$, is denoted $|A|$. If a set is finite, then its size is the number of entries in the set.

  3. Logic and set operations

    • Sets can be defined by combining a collection of rules into logical sentences. For instance, $S = \{\text{all letters before } d \text{ or after } w\} = \{a,b,c,y,z\}$

    • Appending not before a set’s implicit definition produces the set complement. For example: $\text{not } A = A^c = \{\text{all outcomes not in } A\}$

    • Concatenating sets with an or produces their union. For example, $S = A \cup V$ where $V = \{y,z\}$.

    • Concatenating sets with an and produces their intersection. For example, if $B = \{b,c,d,e\}$, then $A \cap B = \{b,c\}$.

    • Modifying a probability statement with an if adds conditions that restrict the space of possible outcomes $\Omega$. We denote if with a vertical bar, $\mid$.

    • In summary:

| Logical | Set Operation | Notation |
| --- | --- | --- |
| not | complement | $^c$ |
| or | union | $\cup$ |
| if | restrict $\Omega$ | $\mid$ |
| and | intersect | $\cap$ |
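The logical connectives in the table map directly onto Python’s built-in set operators. Here is a minimal sketch using the chapter’s letter examples (the variable names are my own):

```python
# The chapter's letter examples, expressed with Python sets.
Omega = set("abcdefghijklmnopqrstuvwxyz")  # outcome space: all 26 letters
A = {"a", "b", "c"}                        # letters before d
V = {"y", "z"}                             # letters after w
B = {"b", "c", "d", "e"}

complement_A = Omega - A   # "not A" -> set complement, A^c
union_AV = A | V           # "or"    -> union
intersection_AB = A & B    # "and"   -> intersection

print(sorted(union_AV))         # ['a', 'b', 'c', 'y', 'z']
print(sorted(intersection_AB))  # ['b', 'c']
```

Note that the complement is always taken relative to the outcome space, which is why it is computed as a set difference from `Omega`.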

Probability as Proportion

These results are all explained in Section 1.2.

  1. A probability measure is a function that accepts events and returns their chance. We denote the measure $\text{Pr}(\cdot)$, so $\text{Pr}(A)$ is the chance the event $A$ occurs.

  2. Probability as Frequency: The chance of an event equals the long run frequency with which it would occur in an arbitrarily long sequence of trials.

    • It follows that:

      • All chances are between 0 and 1

      • The chance that something happens, $\text{Pr}(\Omega)$, equals 1

      • Chances for disjoint events add: $\text{Pr}(A \cup B) = \text{Pr}(A) + \text{Pr}(B)$ if $A$ and $B$ are disjoint.

      • Expanding an event to include more outcomes never makes it less likely. Contracting an event so it includes fewer outcomes never makes it more likely.

  3. We say that all outcomes are equally likely if:

    • They would occur with the same long run frequency

    • We have no better model and want to start simple

    • The features that distinguish outcomes cannot possibly influence their frequency, or the process that selects outcomes

  4. If all outcomes are equally likely then probability is equivalent to proportion:

    • The probability of every outcome is $1/|\Omega|$, where $|\Omega|$ is the number of possible outcomes

    • The probability of every event is:

    $$\text{Pr}(E) = \frac{|E|}{|\Omega|} = \frac{\text{the number of ways } E \text{ can happen}}{\text{the number of distinct things that can happen}}$$
    • So, if all outcomes are equally likely, we can compute probabilities by (a) enumerating the outcome space, (b) counting the number of possible outcomes, (c) enumerating the event, (d) counting the number of ways the event can happen, and (e) evaluating their ratio.
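The five-step recipe above can be sketched in a few lines of Python. The two-dice outcome space is an illustrative assumption, not an example from the chapter:

```python
from itertools import product
from fractions import Fraction

# (a) enumerate the outcome space: all 36 ordered rolls of two fair dice
Omega = list(product(range(1, 7), repeat=2))
# (b) count it: len(Omega) == 36
# (c) enumerate the event: rolls whose faces sum to 7
E = [(i, j) for (i, j) in Omega if i + j == 7]
# (d) count it: len(E) == 6
# (e) evaluate the ratio |E| / |Omega| exactly
prob = Fraction(len(E), len(Omega))
print(prob)  # 1/6
```

Using `Fraction` keeps the ratio exact, which makes it easy to compare computed probabilities against answers worked by hand.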

The Rules of Chance

All of these results are explained in Section 1.3.

  1. A probability model is a choice of outcome space, all relevant events, and probability measure, such that:

    1. Nonnegativity: $\text{Pr}(E) \geq 0$ for all events $E$.

    2. Normalization: $\text{Pr}(\Omega) = 1$.

    3. Additivity: $\text{Pr}(A \cup B) = \text{Pr}(A) + \text{Pr}(B)$ if $A$ and $B$ are disjoint.

  2. Probability rules that follow from the axioms:

    1. Complements: $\text{Pr}(E^c) = 1 - \text{Pr}(E)$

    2. Sub-additivity: $\text{Pr}(A \cup B) = \text{Pr}(A) + \text{Pr}(B) - \text{Pr}(A \text{ and } B) \leq \text{Pr}(A) + \text{Pr}(B)$.
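Both rules can be checked numerically. Here is a sketch on an assumed two-dice outcome space (the specific events are my own choices):

```python
from itertools import product
from fractions import Fraction

# Probability as proportion on the 36 equally likely rolls of two dice.
Omega = set(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event as the proportion |event| / |Omega|."""
    return Fraction(len(event), len(Omega))

A = {w for w in Omega if w[0] == 6}      # first die shows 6
B = {w for w in Omega if sum(w) >= 10}   # total is at least 10

# Complements: Pr(A^c) = 1 - Pr(A)
assert pr(Omega - A) == 1 - pr(A)
# Inclusion-exclusion, hence sub-additivity:
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
assert pr(A | B) <= pr(A) + pr(B)
print(pr(A | B))  # 1/4
```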

Joint and Marginal Probability

All of these results are explained in Section 1.4.

  1. A joint probability is the probability that two events both happen: $\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) = \text{Pr}(A \cap B)$

    • Since $A \cap B$ is contained in both $A$ and $B$, $\text{Pr}(A,B) \leq \min\{\text{Pr}(A), \text{Pr}(B)\}$.

    • Given a collection of joint probabilities, $\text{Pr}(A,B), \text{Pr}(A,B^c), \text{Pr}(A^c,B), \text{Pr}(A^c,B^c)$, the marginal probabilities are the chances of the individual events, $\text{Pr}(A), \text{Pr}(A^c), \text{Pr}(B), \text{Pr}(B^c)$.

    • The act of breaking an event into all the ways it can occur is called partitioning (breaking into disjoint parts)

    • The act of summing the chances of disjoint parts is called marginalization

  2. Joint and marginal probabilities may be arranged into a joint probability table where

    • The sum of the joint probabilities in any row or column must add to the corresponding marginal

    • The sum of all joint probabilities must equal 1

    • The sum of any pair of marginals must equal 1

For example:

| Event | $A$ | not $A$ | $B$ Marginals |
| --- | --- | --- | --- |
| $B$ | $\text{Pr}(A,B)$ | $\text{Pr}(A^c,B)$ | $\text{Pr}(B)$ |
| not $B$ | $\text{Pr}(A,B^c)$ | $\text{Pr}(A^c,B^c)$ | $\text{Pr}(B^c)$ |
| $A$ Marginals | $\text{Pr}(A)$ | $\text{Pr}(A^c)$ | 1 |
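The consistency checks on a joint probability table can be sketched directly. The numbers below are hypothetical, chosen only so the sums work out:

```python
from fractions import Fraction as F

# A hypothetical joint probability table, keyed by (A-side, B-side).
joint = {
    ("A", "B"): F(1, 4), ("Ac", "B"): F(1, 4),
    ("A", "Bc"): F(1, 8), ("Ac", "Bc"): F(3, 8),
}

# Marginalization: sum the joints along a row or column.
pr_A = joint[("A", "B")] + joint[("A", "Bc")]
pr_Ac = joint[("Ac", "B")] + joint[("Ac", "Bc")]
pr_B = joint[("A", "B")] + joint[("Ac", "B")]

assert sum(joint.values()) == 1   # all joint probabilities sum to 1
assert pr_A + pr_Ac == 1          # paired marginals sum to 1
print(pr_A, pr_B)  # 3/8 1/2
```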

Conditional Probability

All of these results are explained in Section 1.5.

  1. A conditional probability is the probability of one event given that another occurs: $\text{Pr}(B|A) = \text{Pr}(B \text{ if } A)$

    • Conditioning on an event, $A$, restricts the set of possible outcomes to $A$

    • Conditioning on $A$ does not change the relative likelihood (i.e. the odds) of any outcomes in $A$

  2. Normalization is the action of scaling a list of nonnegative numbers by their sum

  3. To find conditional probabilities from a joint probability table:

    1. Excerpt the appropriate rows or columns of the joint table

    2. Scale all entries by their sum, which equals the marginal assigned to the row/column (i.e. normalize)

  4. The conditional probability of $B$ given $A$ is always the ratio of a joint to a marginal:

$$\text{Pr}(B|A) = \frac{\text{Pr}(B,A)}{\text{Pr}(A)}$$
  5. The multiplication rule expresses any joint as a product of a marginal and a conditional:

$$\text{Pr}(A,B) = \text{Pr}(A) \times \text{Pr}(B|A)$$
  6. An outcome tree is a diagram with one node for every possible event in a sequence of events, and arrows for the possible transitions between nodes, each labelled with the marginal or conditional probability of that transition.

    • We can use the multiplication rule to compute chances by evaluating products along paths in outcome trees

  7. Bayes Rule recovers $\text{Pr}(A|B)$ from marginals for $A$ and conditionals for $B$ given $A$:

$$\text{Pr}(A|B) = \frac{\text{Pr}(A,B)}{\text{Pr}(B)} = \frac{\text{Pr}(A)\,\text{Pr}(B|A)}{\text{Pr}(A,B) + \text{Pr}(A^c,B)} = \frac{\text{Pr}(A)\,\text{Pr}(B|A)}{\text{Pr}(A)\,\text{Pr}(B|A) + \text{Pr}(A^c)\,\text{Pr}(B|A^c)}$$
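The last form of Bayes Rule, in which the denominator is built by marginalizing over the partition $\{A, A^c\}$, can be evaluated directly. The numbers below are illustrative assumptions, not values from the chapter:

```python
from fractions import Fraction as F

# Hypothetical inputs: a marginal for A and conditionals for B given A, A^c.
pr_A = F(1, 100)           # Pr(A)
pr_B_given_A = F(9, 10)    # Pr(B|A)
pr_B_given_Ac = F(1, 10)   # Pr(B|A^c)

# Denominator: Pr(B) = Pr(A)Pr(B|A) + Pr(A^c)Pr(B|A^c), by marginalization.
pr_B = pr_A * pr_B_given_A + (1 - pr_A) * pr_B_given_Ac

# Bayes Rule: Pr(A|B) = Pr(A)Pr(B|A) / Pr(B)
pr_A_given_B = pr_A * pr_B_given_A / pr_B
print(pr_A_given_B)  # 1/12
```

Even with a strong conditional $\text{Pr}(B|A) = 9/10$, the small marginal $\text{Pr}(A) = 1/100$ keeps the posterior modest, which is the usual base-rate lesson these computations illustrate.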

Independent Events

All of these results are explained in Section 1.6.

  1. Events $A$ and $B$ are independent if and only if any of the following are true:

    • Knowing the outcome of one tells us nothing about the other.

    • $\text{Pr}(A|B) = \text{Pr}(A), \quad \text{Pr}(B|A) = \text{Pr}(B)$

      • that is, the conditionals equal the marginals because we learn nothing by conditioning

    • $\text{Pr}(A|B) = \text{Pr}(A|B^c), \quad \text{Pr}(B|A) = \text{Pr}(B|A^c)$

      • that is, the conditionals don’t depend on the conditioning statement, since the events tell us nothing about each other

    • $\text{Pr}(A,B) = \text{Pr}(A) \times \text{Pr}(B)$

      • that is, the joint is the product of the marginals

      • This is a special case of the general multiplication rule. Only use it for independent events.

      • This is useful for computing joint probabilities and checking independence.

      • Do not take this as the definition of independence; it is really a consequence.

  2. If two events are not independent, then they are dependent.
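The product check $\text{Pr}(A,B) = \text{Pr}(A) \times \text{Pr}(B)$ can be sketched on an assumed two-dice outcome space, where events on separate dice are independent but overlapping events are not:

```python
from itertools import product
from fractions import Fraction

# Probability as proportion on the 36 equally likely rolls of two dice.
Omega = set(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(len(event), len(Omega))

A = {w for w in Omega if w[0] == 6}      # first die shows 6
B = {w for w in Omega if w[1] == 6}      # second die shows 6
C = {w for w in Omega if sum(w) >= 10}   # total is at least 10

print(pr(A & B) == pr(A) * pr(B))  # True: separate dice -> independent
print(pr(A & C) == pr(A) * pr(C))  # False: A and C are dependent
```

Here $\text{Pr}(A, C) = 3/36$ while $\text{Pr}(A)\,\text{Pr}(C) = 1/36$: knowing the first die shows 6 makes a total of at least 10 much more likely, so the events are dependent.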