Statistics and Data Science: A Modeling Approach
Part III: Evaluating Models
Introduction to Part III: Evaluating Models
You have learned a lot so far. You now know something about the concept of distribution, and you have a real sense of what a statistical model is. More importantly, you know many of the moves and ways of thinking used by data analysts: how to examine distributions and use them to help further your understanding of the Data Generating Process (DGP); how to specify and fit models to data; how to quantify the fit of a model, and compare the fit of two models; and how to use models to make predictions and further sharpen your understanding of the DGP.
Although the models you have fit so far describe our data as well as possible, minimizing error, there is still one problem: we don’t really know how accurate they are as models of the Data Generating Process. We know how accurate they are as models of our data, but not as models of the population from which our data came.
In Part III of the course we take up this question. We start by more fully explaining the problem, and then take you through the solutions that statisticians use to evaluate models and assess the accuracy of parameter estimates.
All of these solutions depend on the third distribution of the Distribution Triad, the sampling distribution (or, as we will also refer to it, the distribution of estimates). Sampling distributions will be our focus as we enter the next part of the course. While the population distribution is hidden but real, this new kind of distribution is imaginary! Sampling distributions require you to think, “What if…?!” Let’s dive in!
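As a first taste of that “What if…?” thinking, here is a minimal sketch of how a sampling distribution could be imagined through simulation. Everything specific in it is an assumption made for illustration and does not come from the course materials: the population is taken to be normal with a mean of 100 and a standard deviation of 15, each sample has 25 observations, and the function name `simulate_sampling_distribution` is hypothetical. The sketch is written in Python using only the standard library, which may not be the language used elsewhere in the course.

```python
import random
import statistics


def simulate_sampling_distribution(population_mean=100.0, population_sd=15.0,
                                   sample_size=25, n_samples=1000, seed=1):
    """Return the mean of each of n_samples imaginary random samples.

    All defaults are illustrative assumptions, not values from the course.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_samples):
        # "What if we had drawn a different sample from the same population?"
        sample = [rng.gauss(population_mean, population_sd)
                  for _ in range(sample_size)]
        # The estimate is the sample mean, the single number we would get
        # by fitting the simplest possible model to this one sample.
        estimates.append(statistics.mean(sample))
    return estimates


estimates = simulate_sampling_distribution()

# The collection of estimates is a simulated sampling distribution.
print("number of estimates:", len(estimates))
print("mean of the estimates:", statistics.mean(estimates))
print("standard deviation of the estimates:", statistics.stdev(estimates))
```

Each number in `estimates` comes from a sample we never actually collected, which is why the distribution is imaginary. In a simulation like this one we get to choose the population, so we can see how much the estimates vary from sample to sample. With real data the population stays hidden, and that is the problem the rest of Part III takes up.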