Part III: Evaluating Models

You have learned a lot so far. You now know something about the concept of distribution, and you have explored two types of distributions: the distribution of sample data, and the distribution of the population or Data Generating Process (DGP). You have learned how to specify and fit statistical models to data, how to quantify the fit of a model and compare the fit of two models, and how to use models to make predictions and further sharpen your understanding of the DGP.

Although these models fit our data as well as they can, minimizing error, one problem remains: we don't really know how good they are as models of the Data Generating Process. We know how well they model our data, but not how well they model the population from which our data came.

In Part III of the course we take up this issue. We start by explaining the problem more fully, and then take you through the solutions statisticians use to evaluate models and quantify the error around our parameter estimates.

All of these solutions depend on the third distribution of the Distribution Triad: the sampling distribution (or, as we will also call it, the distribution of estimates). Sampling distributions will be our focus as we enter this part of the course. While the population distribution is hidden but real, this new kind of distribution is imaginary! Sampling distributions require you to think, "What if…?!" Let's dive in!
