A random vector is an ordered list of finitely many random variables, . Random vectors may be discretely or continuously distributed. If discretely distributed, the random vector is selected from a finite or countably infinite set of possible vectors. If continuously distributed, the vector is selected from an uncountably infinite set of possible vectors, such that for any specific .
Like random variables, random vectors may be characterized using distribution functions. In the discrete case, we use an analog to the probability mass function. In the continuous case we use an analog to the probability density function.
The Discrete Case¶
In the discrete case, we can define a joint probability mass function. Given a possible vector , the joint probability mass function returns the chance:
In the bivariate setting, we can represent the joint probability mass function with a joint distribution table (see Section 1.4). The columns correspond to possible values for the first entry, and the rows correspond to possible values for the second entry. For instance, if and , then we could represent the joint distribution with the table:
| Event | |||
|---|---|---|---|
| 0 | |||
| 0 |
As usual, we can expand the table by appending the marginal probabilities. These are probabilities like (sum down the first column), or (sum across the second row).
| Event | Marginals | |||
|---|---|---|---|---|
| 0 | ||||
| 0 | ||||
| Marginals | 1 |
The bottom row provides the marginal mass function for :
The rightmost column provides the marginal mass function for :
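As a concrete sketch of the summing procedure, the code below computes marginal mass functions from a joint distribution table stored as a dictionary. The probabilities in `joint` are invented for illustration; they are not the values in the table above.

```python
# Hypothetical joint mass function for a pair (X, Y); these probabilities
# are made up for illustration, not taken from the table above.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30,
}

def marginal(pmf, axis):
    """Sum out the other coordinate: axis=0 gives the marginal of X."""
    out = {}
    for pair, p in pmf.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

p_x = marginal(joint, axis=0)   # sum down each column of the table
p_y = marginal(joint, axis=1)   # sum across each row of the table
```

Each marginal mass function sums to 1, since the joint probabilities do.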
To isolate conditional mass functions from the joint mass function, isolate either a row or column by fixing either or , then normalize the isolated row/column. For example, if we fix , then we should isolate the first row of the table:
| Event | |||
|---|---|---|---|
If we fix , then we should isolate the second row:
| Event | |||
|---|---|---|---|
| 0 |
If we’d fixed , then we would isolate the first column.
Notice that isolating a row corresponds to fixing one input variable while letting the other vary. Similarly, isolating a column corresponds to fixing one input and letting the other vary. This is the same procedure we used to define cross-sections: start with a scalar-valued function of multiple inputs (in this case, a row and column index), fix one input, and let the other vary.
Since isolating a row or column is equivalent to extracting a cross-section, each conditional mass function is proportional to a cross-section of the joint mass function. The particular cross-section (which row or column is isolated) is selected by the conditioning statement. We’ll see that the same intuition extends to the continuous case.
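The isolate-then-normalize procedure can be sketched in code. Again, the joint probabilities below are hypothetical:

```python
# Hypothetical joint pmf; probabilities invented for illustration.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.25, (1, 1): 0.05, (1, 2): 0.30}

def conditional_y_given_x(pmf, x):
    """Isolate the cross-section where X == x, then normalize it."""
    section = {y: p for (xv, y), p in pmf.items() if xv == x}
    total = sum(section.values())       # the marginal chance that X == x
    return {y: p / total for y, p in section.items()}

cond = conditional_y_given_x(joint, 0)
# Dividing by the cross-section's total makes the conditional sum to 1.
assert abs(sum(cond.values()) - 1.0) < 1e-9
```

The normalizing constant, `total`, is exactly the marginal probability of the conditioning event.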
The Continuous Case¶
In the continuous setting, the probability of any exact event is zero (see Section 2.3). So:
So, just as continuous random variables are characterized by density functions, continuous random vectors are characterized by joint density functions.
This definition extends the one-dimensional definition. In one dimension, a probability density is the chance that a random variable lands in a small interval, relative to the length of the interval. In two dimensions, is the chance that a random vector lands in a small square, relative to the area of the square. In three dimensions, is the chance that a random vector lands in a small cube, relative to the volume of the cube. In each case, we recover a density by computing the chance that the random vector lands in some small region, relative to the size of the region, in the limit as the region contracts to a point, .
It follows that joint density functions don’t return chances; they return chances per unit volume. In two dimensions, returns chance per unit area.
In the special case when , is a function of two inputs, and , that returns a scalar value. So, in two dimensions, defines a surface over two variables. More generally, the joint density function is a scalar-valued function of multiple variables. So joint density functions are surfaces.
So, to visualize density functions, we can borrow the same techniques we developed in Section 8.2.
Examples¶
You can experiment with the case by running the code cell below, then selecting “Independent Exp.”
```python
from utils_lsg import show_level_sets
show_level_sets()
```

Notice that all of the level sets are lines with slope negative one. This follows since is constant if is constant. When , we have so . So, every level set is a line with slope -1.
You can experiment with the case by running the code cell below, then selecting “Independent Laplace.”
```python
from utils_lsg import show_level_sets
show_level_sets()
```

Try computing the level sets of . Convince yourself that these should form concentric diamonds centered at the origin.
You can experiment with the associated density function by running the code cell below. Select “Independent Normal.”
```python
from utils_lsg import show_level_sets
show_level_sets()
```

Why are the level sets concentric circles? What would change if we’d used a density function proportional to ?
The three previous examples illustrate a useful fact. If is a composition of functions, then its level sets are all level sets of the inner function : whenever two inputs produce the same value of the inner function, they produce the same value of the composition.
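We can check this composition fact numerically. The density below is a hypothetical example, proportional to an independent standard normal density, whose inner function is x² + y²:

```python
import math

# If a density is a composition f(x, y) = g(h(x, y)), equal values of the
# inner function h give equal density values. Hypothetical example:
# h(x, y) = x**2 + y**2, so level sets are circles about the origin.
def h(x, y):
    return x**2 + y**2

def f(x, y):
    return math.exp(-h(x, y) / 2)  # proportional to an independent-normal density

# Three points on the circle of radius 2 give identical density values.
points = [(2.0, 0.0), (0.0, 2.0), (2**0.5, 2**0.5)]
values = [f(x, y) for x, y in points]
assert max(values) - min(values) < 1e-12
```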
For example:
This density also has circular level sets, since it is a composition whose innermost function is . You can experiment with it by running the code cell below. Select “Student-t”. Like the normal distribution, it is bell-shaped; however, unlike the normal, its tails decay at power-law rates.
```python
from utils_lsg import show_level_sets
show_level_sets()
```

Working with Joint Density Functions¶
Recall that, to find the chance a random variable lies in an interval, we compute the area under the density function over the interval:
The same idea extends to random vectors.
We will study integration in multiple variables in detail later in the course. For now, it is enough to recognize the analogy to the univariate case. Chances are given by integrating density functions over the region defined by an event. So, when we are in multiple dimensions, picture the volume under a segment of a surface, rather than the area under a segment of a curve.
We’ll see that essentially every equation used to answer probability questions with a joint density is either directly analogous to the corresponding equation in a single variable, or is a natural analog of a procedure applied to joint distribution tables, once we substitute integrals for sums. For example,
In each case, we integrate out the variable we are not interested in. This is exactly analogous to summing across a row or down a column. So marginal densities return the area under cross-sections of the joint density surface. The same definition extends easily to higher dimensions. For example, if , we can find marginal densities by integrating out two of the three variables.
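As a numeric sketch of integrating out a variable, take two independent Exponential(1) variables (an assumed example, with joint density exp(-(x + y)) on the positive quadrant) and approximate the marginal of the first coordinate with a Riemann sum:

```python
import numpy as np

# Assumed joint density: two independent Exponential(1) variables,
# f(x, y) = exp(-(x + y)) for x, y > 0.
def f(x, y):
    return np.exp(-(x + y)) * (x > 0) * (y > 0)

# Integrate out y on a fine grid to approximate the marginal density of X.
y = np.linspace(0.0, 30.0, 100001)   # the density is negligible beyond y = 30
dy = y[1] - y[0]

def marginal_x(x):
    return np.sum(f(x, y)) * dy      # area under the cross-section at this x

print(marginal_x(1.0))   # close to the exact marginal exp(-1), about 0.368
```

The sum over the grid is the area under the cross-section of the joint density at the chosen value of x, exactly as summing down a column gives a marginal probability in a table.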
Conditional densities follow in the same fashion:
Notice the immediate analogy to the division rule for conditional distributions (see Section 1.5). As usual, conditional is joint divided by marginal.
Notice also that conditional densities are, as functions of their input, proportional to cross-sections of the joint density surface. For example, fixing isolates the cross-section of where can vary, but is fixed at 3. So, conditional density functions are proportional to cross-sections of the joint density. The proportionality constant is the associated marginal density: the area under the selected cross-section.
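The normalize-a-cross-section procedure can be sketched numerically. The joint density below, f(x, y) = x + y on the unit square, is invented for illustration (it integrates to 1 over the square, so it is a valid density):

```python
import numpy as np

# Hypothetical joint density on the unit square: f(x, y) = x + y.
def f(x, y):
    return x + y

x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]

# Fix y = 0.3: this cross-section is proportional to the conditional density.
section = f(x, 0.3)

# The proportionality constant is the area under the cross-section,
# which is the marginal density of Y at 0.3 (exactly 0.8 here).
area = np.sum(section) * dx
conditional = section / area

# The normalized cross-section integrates to 1, as a density must.
assert abs(np.sum(conditional) * dx - 1.0) < 1e-9
```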
It follows that the division and multiplication rules introduced in Section 1.5 extend naturally to joint densities. In particular:
If and are jointly distributed, continuous random variables, then is a continuously distributed random vector with joint density function:
So, as usual, joint densities equal marginal densities times conditional densities.
Applying either the joint density definition or the multiplication rule to independent variables establishes the usual result: in the continuous setting, a joint density is the product of its marginals if and only if the components of the associated random vector are independent.
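A quick numeric check of this factorization, using two independent standard normals as an assumed example:

```python
import math

# Assumed example: two independent standard normal variables.
def phi(t):
    """Marginal density of a standard normal."""
    return math.exp(-t**2 / 2) / math.sqrt(2 * math.pi)

def joint(x, y):
    """Joint density of two independent standard normals."""
    return math.exp(-(x**2 + y**2) / 2) / (2 * math.pi)

# The joint density factors exactly into the product of the marginals.
for x, y in [(0.0, 0.0), (1.0, -2.0), (0.5, 3.0)]:
    assert abs(joint(x, y) - phi(x) * phi(y)) < 1e-12
```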
Expectations are also defined in the usual fashion:
So,
The rules for computing chances, finding marginal densities, finding conditional densities, checking independence, and evaluating expectations are analogous to all of the same procedures we developed in Sections 1.5 and 1.6, as long as we:
work with densities instead of mass functions, and,
replace all sums over possible values with integrals over possible vectors.
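Putting the last rule to work, here is a sketch that approximates an expectation by integrating the function times the joint density on a grid, assuming two independent Exponential(1) variables, for which E[XY] = E[X] E[Y] = 1:

```python
import numpy as np

# Assumed example: two independent Exponential(1) variables, joint density
# f(x, y) = exp(-(x + y)) on the positive quadrant.
t = np.linspace(0.0, 30.0, 1001)   # the density is negligible beyond 30
dt = t[1] - t[0]
X, Y = np.meshgrid(t, t)

density = np.exp(-(X + Y))   # joint density values on the grid
g = X * Y                    # function whose expectation we want

# E[g(X, Y)] is the double integral of g * f, here a Riemann sum.
expectation = np.sum(g * density) * dt**2
print(expectation)           # close to 1, since E[XY] = E[X] E[Y] = 1
```

Replacing the double sum over grid points with a double sum over possible values recovers the discrete expectation formula, which is exactly the sums-to-integrals analogy above.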