Course Outline

segmentGetting Started (Don't Skip This Part)

segmentStatistics and Data Science: A Modeling Approach

segmentPART I: EXPLORING VARIATION

segmentChapter 1  Welcome to Statistics: A Modeling Approach

segmentChapter 2  Understanding Data

segmentChapter 3  Examining Distributions

segmentChapter 4  Explaining Variation

segmentPART II: MODELING VARIATION

segmentChapter 5  A Simple Model

segmentChapter 6  Quantifying Error

segmentChapter 7  Adding an Explanatory Variable to the Model

7.1 Explaining Variation


segmentChapter 8  Models with a Quantitative Explanatory Variable

segmentPART III: EVALUATING MODELS

segmentChapter 9  The Logic of Inference

segmentChapter 10  Model Comparison with F

segmentChapter 11  Parameter Estimation and Confidence Intervals

segmentChapter 12  What You Have Learned

segmentFinishing Up (Don't Skip This Part!)

segmentResources
list High School / Advanced Statistics and Data Science I (ABC)
Chapter 7  Adding an Explanatory Variable to the Model
7.1 Explaining Variation
Having started with the empty model, you may be feeling frustrated. Statistics, we have said, is about explaining variation. But in what sense have we explained variation with the empty model? Yes, the mean is the point in the distribution that reduces the sum of squares to its lowest point. But surely that doesn’t count as explanation!
Indeed it does not! We started with the empty model in order to get some important ideas across, but certainly that’s not where we want to end up. It is time we start building models that include explanatory variables. We will still use the empty model, but only as a reference point.
Let’s start by reviewing what we mean by explaining variation. Earlier in the course, we developed an intuitive idea of explanation by comparing the distribution of one variable across two different groups. So, for example, we looked at the distribution of thumb length broken down by sex, which we can see in the two density histograms below.
You can clearly see that sex explains some of the variation in thumb length in our data. (This may not be true in the Data Generating Process, of course. It’s always possible that we are being fooled by a sample that doesn’t accurately represent what’s true in the DGP.) When we break up thumb length by sex it looks like two separate, though overlapping distributions. In general, males have longer thumbs than females in our data.
If we assume that this relationship (between sex and thumb length) exists in the population (or DGP), and not just in our data, we can use it to help us make a better prediction about a future observation. If you know that someone is male, you would make a different prediction of their thumb length than if you knew they were female.
Earlier in the course we expressed this idea in a word equation like this:
THUMB LENGTH = SEX + ERROR
What this says, is: sex explains some of the variation in thumb length, but other things also affect thumb length.
Building on the previous section, let’s now try to state this relationship more precisely (that is, quantitatively) as a statistical model.