Estimators
All definitions and results are available in Section 12.1.
In an estimation problem we aim to estimate an unknown quantity, $\theta$, from observed data, $x = (x_1, \dots, x_n)$. The function that accepts the observed data and returns an estimate is the estimator, $\hat{\theta}(x)$.
If the unknown quantity is a summary characteristic of the distribution that produced the sample data (e.g. an unknown expected value, median, standard deviation, variance, covariance, or correlation), and the data are drawn independently and identically, then it is standard practice to estimate the summary quantity by computing the corresponding summary of the empirical distribution instead.
The empirical distribution is the distribution obtained when we select an observed data point uniformly at random: it is the distribution of $x_I$ when $I$ is drawn uniformly from $\{1, 2, \dots, n\}$. In other words, the empirical distribution is the distribution of the observed data.
Then we can estimate unknown expectations with sample averages:

$$\mathbb{E}[X] \approx \frac{1}{n}\sum_{i=1}^{n} x_i.$$
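For instance, here is a minimal sketch of this plug-in approach in Python; the normal data-generating distribution and its parameters are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000)  # i.i.d. sample (illustrative)

mean_hat = x.mean()        # sample average estimates E[X]
median_hat = np.median(x)  # empirical median estimates the population median
var_hat = x.var()          # variance of the empirical distribution estimates Var[X]

print(mean_hat, median_hat, var_hat)
```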
If we hypothesize that our data were generated by a distribution with some unknown parameters, then we can estimate those parameters with the parameter values that, if true, would have made the observed data most likely. These are the maximum likelihood estimates.
Note, the maximum likelihood estimate is the parameter value that, if true, would maximize the likelihood of the observed data, not the most likely parameter value conditional on the observed data.
Some, but not all, maximum likelihood estimators are sample averages.
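As a sketch, assuming the data come from an exponential distribution with an unknown rate, the maximum likelihood estimate can be found by numerically minimizing the negative log-likelihood; for this particular model it coincides with a sample-average based formula (the generating rate of 2.5 below is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=500)  # illustrative data, true rate 2.5

def neg_log_likelihood(rate):
    # Exponential(rate) log-likelihood of the whole sample: n*log(rate) - rate*sum(x)
    return -(len(x) * np.log(rate) - rate * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x)         # numerical maximum likelihood estimate of the rate
print(1 / x.mean())  # closed-form MLE for the exponential rate, a sample-average based estimate
```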
Properties of Estimators
All definitions and results are available in Section 12.2.
An estimator, $\hat{\theta}$, is consistent if it is guaranteed to converge to the true, unknown value, $\theta$, in the limit of infinitely many observations.
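For example, a short simulation sketch (with an assumed true mean of 5.0) illustrating the consistency of the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (10, 100, 10_000, 1_000_000):
    x = rng.normal(loc=5.0, scale=2.0, size=n)
    print(n, x.mean())  # the sample mean approaches the true mean 5.0 as n grows
```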
The bias in an estimator is the expected error in its estimates: $\mathrm{bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$.
An estimator overestimates on average if the bias is positive, and underestimates on average if the bias is negative.
An estimator is unbiased if the bias is zero, i.e. if:

$$\mathbb{E}[\hat{\theta}] = \theta.$$
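A small simulation sketch of bias, comparing the plug-in variance estimator (dividing by $n$) with the corrected version (dividing by $n - 1$); the true variance of 4.0 and the sample size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
true_var, n, trials = 4.0, 10, 100_000

plug_in = np.empty(trials)
corrected = np.empty(trials)
for t in range(trials):
    x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=n)
    plug_in[t] = x.var()          # divides by n: the plug-in estimator
    corrected[t] = x.var(ddof=1)  # divides by n - 1: the corrected estimator

print(plug_in.mean() - true_var)    # negative: underestimates the variance on average
print(corrected.mean() - true_var)  # close to zero: (approximately) unbiased
```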
The precision in an estimator can be measured using the standard deviation in the estimator, $\mathrm{sd}(\hat{\theta}) = \sqrt{\mathrm{Var}[\hat{\theta}]}$.
There is often a trade-off between precision and bias: precise estimators, which do not vary much and are not highly sensitive to the particular data used to form the estimate, often come at the cost of biased estimates. To reduce the variance of an estimator, we often have to introduce a bias towards our preferred estimate in the absence of data, as in the sketch below.
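As a sketch of this trade-off, consider a hypothetical shrinkage estimator that pulls the sample mean halfway towards a default guess of 0; the true mean, noise level, and shrinkage weight are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
true_mean, n, trials = 1.0, 5, 100_000
weight = 0.5  # weight on the data; the rest goes to the default guess of 0

sample_means = rng.normal(true_mean, 3.0, size=(trials, n)).mean(axis=1)
shrunk_means = weight * sample_means  # shrink each estimate towards 0

for name, est in (("sample mean", sample_means), ("shrunk mean", shrunk_means)):
    print(name, est.mean() - true_mean, est.var())  # bias and variance of each estimator
```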
The accuracy of an estimator is the expected size of the errors in its estimates. Often we measure the expected size with the mean squared error:

$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big].$$
The mean squared error can be decomposed into a term associated with the bias in the estimator and the variance of the estimator:

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{bias}(\hat{\theta})^2 + \mathrm{Var}[\hat{\theta}].$$
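A simulation sketch checking this decomposition numerically, reusing the plug-in variance estimator from above (the settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
true_var, n, trials = 4.0, 10, 200_000

# Plug-in variance estimates computed from many independent samples.
estimates = np.array([
    rng.normal(0.0, np.sqrt(true_var), size=n).var() for _ in range(trials)
])

mse = np.mean((estimates - true_var) ** 2)
bias = estimates.mean() - true_var
variance = estimates.var()

print(mse)                   # mean squared error
print(bias ** 2 + variance)  # bias^2 + variance: should closely match the MSE
```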