CourseKata - 4.14 Quantifying the Data Generating Process

So far we have done a lot: specifying what our model might be, writing word equations, and exploring possible explanations for the variation we see in our data. All of these models are qualitative, because we have not yet quantified anything about them.

At this point we have gone about as far as we can go with qualitative models of the DGP. We have practiced making visualizations of distributions of a single variable, and working to imagine and represent the DGP that might have generated the distribution. We have also experienced, in an intuitive way, what it means to explain variation in one variable with variation in another.

But there are important questions that we can’t answer until we are able to create quantitative statistical models. For example, although we now know intuitively that total variation can be partitioned into explained variation and unexplained variation, we have no way of specifying the percentage of variation that is explained or unexplained. If we wanted to compare two explanatory variables and ask which one explains more of the variation in a particular outcome variable, we would not have a ready way to answer.
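As a rough preview of what quantifying this partition could look like, here is a minimal sketch in Python. It is not the method the course develops later; the data, the group names, and the choice of squared deviations as the measure of variation are all assumptions made for illustration.

```python
# Hypothetical data (made up for illustration): an outcome measured in two groups.
groups = {
    "group_a": [56.0, 60.0, 64.0],
    "group_b": [62.0, 66.0, 70.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Total variation: squared deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Unexplained variation: squared deviations of each score from its own group mean.
ss_unexplained = sum(
    (x - mean(scores)) ** 2 for scores in groups.values() for x in scores
)

# Explained variation: whatever part of the total the grouping accounts for.
ss_explained = ss_total - ss_unexplained
proportion_explained = ss_explained / ss_total

print(f"total = {ss_total}, unexplained = {ss_unexplained}")
print(f"proportion explained = {proportion_explained:.3f}")
```

With a number like this in hand, two candidate explanatory variables could be compared by computing the proportion each one explains for the same outcome.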

By the same token, we have developed an intuitive idea that when we are able to explain variation in one variable with variation in another, knowing a score on the second variable will help us predict the score on the first. But we don’t yet know how to turn that prediction into a number, that is, a quantitative prediction. Even more important, we have no way of quantifying how far off our prediction might be.

Finally, we have presented an intuitive idea of what a Type I error is. But we haven’t developed any way of quantifying the likelihood, in a specific situation, that we have made a Type I error.

These are questions we can’t answer just by looking at graphs and tables. We will need to quantify the DGP and construct statistical models. That is where we are going in the next section of the course.