4.13 Quantifying the Data Generating Process

So far we have done a lot just by specifying what our model might be, writing word equations, and exploring possible explanations for the variation we see in our data. These are all qualitative models, because we have not yet quantified anything about them.

At this point we have gone about as far as we can go with qualitative models of the DGP. We have practiced visualizing the distribution of a single variable, and worked to imagine and represent the DGP that might have generated that distribution. We have also experienced, in an intuitive way, what it means to explain variation in one variable with variation in another.

But there are important questions that we can’t answer until we are able to create quantitative statistical models. For example, although we now know intuitively that total variation can be partitioned into explained variation and unexplained variation, we have no way of specifying the percentage of variation that is explained or unexplained. If we wanted to compare two explanatory variables and ask which one explains more of the variation in a particular outcome variable, we would not have a ready way to answer.
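To make the idea of a "percentage of variation explained" concrete, here is a minimal sketch of one way such a number could be computed. It is only a preview under stated assumptions: the data are made up, the group names are hypothetical, and measuring variation with sums of squared deviations is one common choice, not necessarily the method developed later in the course.

```python
# Hypothetical data: outcome values for two made-up groups.
# The explanatory variable is simply which group an observation is in.
data = {
    "group_a": [56.0, 60.0, 58.0, 62.0],
    "group_b": [64.0, 66.0, 70.0, 68.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sum_of_squares(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((v - center) ** 2 for v in values)


all_values = [v for group in data.values() for v in group]
grand_mean = mean(all_values)

# Total variation: deviations of every observation from the grand mean.
total = sum_of_squares(all_values, grand_mean)

# Unexplained variation: deviations of each observation from its own group mean.
unexplained = sum(sum_of_squares(g, mean(g)) for g in data.values())

# Explained variation: whatever part of the total is no longer left over.
explained = total - unexplained
proportion_explained = explained / total

print(f"total = {total}, explained = {explained}, unexplained = {unexplained}")
print(f"proportion explained = {proportion_explained:.2f}")
```

With a number like this for each candidate explanatory variable, we could compare two variables directly and say which one explains more of the variation in the outcome.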

By the same token, although we can intuitively see what it means to make a “better guess” about a particular observation’s value on the outcome if we know its value on an explanatory variable, we don’t yet have a method for making a precise quantitative prediction of what the outcome might be. Even more important, we do not yet have a quantifiable way of knowing how far off our prediction might be.
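As a rough illustration of what a "precise quantitative prediction" and a measure of "how far off" might look like, the sketch below compares two guesses for each observation: the overall mean, and the mean of the observation's own group. The data and group labels are made up, and using the average absolute error as the measure of how far off we are is an assumption chosen for simplicity, not a method taken from the text.

```python
from collections import defaultdict

# Hypothetical data: (explanatory group, outcome value) pairs.
observations = [
    ("a", 56.0), ("a", 60.0), ("a", 58.0), ("a", 62.0),
    ("b", 64.0), ("b", 66.0), ("b", 70.0), ("b", 68.0),
]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


outcomes = [y for _, y in observations]

# Guess 1: ignore the explanatory variable and predict the overall mean.
grand_mean = mean(outcomes)

# Guess 2: use the explanatory variable and predict the mean of the group.
by_group = defaultdict(list)
for group, y in observations:
    by_group[group].append(y)
group_means = {group: mean(ys) for group, ys in by_group.items()}

# How far off is each kind of guess, on average (mean absolute error)?
error_without = mean([abs(y - grand_mean) for y in outcomes])
error_with = mean([abs(y - group_means[group]) for group, y in observations])

print(f"predicting the overall mean: off by {error_without} on average")
print(f"predicting the group mean:   off by {error_with} on average")
```

In this made-up example, knowing the group cuts the average error in half, which is the intuitive "better guess" expressed as a number.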

Finally, we have presented an intuitive idea of what a Type I error is. But we haven’t developed any way of quantifying the likelihood, in a specific situation, that we might have made a Type I error. (Again, we will come back to this idea in later chapters, so no need to worry too much about it at this point.)

These are questions we can’t answer just by looking at graphs and tables. For this, we will need to construct statistical models of the DGP, use those models to make predictions, and evaluate the accuracy of our predictions. This is where we are going in the next section of the course.
