8.11 Error and Inference from Models with Multiple Quantitative Predictors
Unpacking the ANOVA Table for FEV ~ HEIGHT + AGE
As with all statistical models, this one produces a predicted value on the outcome variable for every data point. Subtracting each predicted value from the actual value in the data gives the residuals. Squaring and summing the residuals gives the sums of squares, from which we compute PRE and F. Everything works the same way here as with previous models.
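Here is a minimal sketch in R of the chain from predictions to residuals to sums of squares, PRE, and F. It uses only base R. Because fevdata is loaded by the coursekata package, the built-in mtcars data (mpg predicted by wt and hp) stands in as a model with two quantitative predictors. All variable names are illustrative.

```r
# Stand-in model: two quantitative predictors on R's built-in mtcars data (not fevdata)
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
predicted <- predict(fit)                 # one predicted value per data point
residuals_by_hand <- y - predicted        # actual value minus predicted value

ss_total <- sum((y - mean(y))^2)          # error from the empty model
ss_error <- sum(residuals_by_hand^2)      # error left over after fitting the model
ss_model <- ss_total - ss_error           # error reduced by the model

df_model <- 2                             # two parameters estimated beyond the mean
df_error <- nrow(mtcars) - 3              # n minus the three estimated parameters

pre <- ss_model / ss_total                               # proportional reduction in error
f_value <- (ss_model / df_model) / (ss_error / df_error) # MS model divided by MS error

print(c(PRE = pre, F = f_value))
```

The PRE and F computed this way should match the Model (error reduced) row that supernova() prints for the same model.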
The code below uses supernova() to generate the ANOVA table for the FEV ~ HEIGHT + AGE model.
require(coursekata)
# saves the multivariate model
multi_model <- lm(FEV ~ HEIGHT + AGE, data = fevdata)
# produce the ANOVA table
supernova(multi_model)
Analysis of Variance Table (Type III SS)
Model: FEV ~ HEIGHT + AGE
                               SS  df      MS        F    PRE     p
------ --------------- | ------- --- ------- -------- ------ -----
  Model (error reduced) | 376.245   2 188.122 1067.956 0.7664 .0000
 HEIGHT                 |  95.326   1  95.326  541.157 0.4539 .0000
    AGE                 |   6.259   1   6.259   35.532 0.0518 .0000
  Error (from model)    | 114.675 651   0.176                      
 ------ --------------- | ------- --- ------- -------- ------ -----
  Total (empty model)   | 490.920 653   0.752
Several features of this table stand out. The PRE for the whole model is .77 (rounded), so together HEIGHT and AGE reduce about 77% of the error left over by the empty model. Height uniquely reduces more error than age does (a unique PRE of .45 versus .05). Every row also has a very large F. As a rough guide, an F larger than 4 is worth talking about, and these Fs are far bigger than that: for the degrees of freedom we spent, we reduced a lot of error.
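The Model row can be checked by hand from the other numbers in the table. This sketch copies the SS and df values from the output above, so its results can differ from the printed values in the last decimal place because of rounding. It uses base R's F-distribution functions pf() and qf(). The last computation shows one way to see why "larger than 4" is a reasonable rough guide: with 1 and 651 degrees of freedom, the F needed for p < .05 is just under 4.

```r
# Numbers copied from the supernova() output for FEV ~ HEIGHT + AGE
ss_model <- 376.245; df_model <- 2
ss_error <- 114.675; df_error <- 651
ss_total <- 490.920

# The error reduced by the model plus the leftover error adds back up to the total
ss_check <- ss_model + ss_error                                 # 490.920

pre <- ss_model / ss_total                                      # about 0.7664
f_value <- (ss_model / df_model) / (ss_error / df_error)        # about 1068
p_value <- pf(f_value, df_model, df_error, lower.tail = FALSE)  # far below .0001

# F needed for p < .05 when 1 df is spent and 651 df remain
f_cutoff <- qf(0.95, df1 = 1, df2 = df_error)

print(round(c(PRE = pre, F = f_value, cutoff = f_cutoff), 4))
```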
Comparing Models of the DGP
We’ve been able to explain a lot of the variation in the data with this model. But is this a good model of the DGP? We need to engage in some model comparison to decide which model we will select as our best model of the DGP.
Just because the p-values are below our .05 cutoff for rejecting the simpler models, however, doesn’t necessarily mean we should adopt the multivariate model as our preferred model of the DGP. In this case, it’s also smart to look at the single-predictor models for HEIGHT and AGE, especially since there is apparently a lot of overlap between these predictors.
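In base R, one way to compare a simpler model with a model that adds a predictor is to pass both fitted models to anova(). This sketch uses mtcars as a stand-in (hp alone versus wt + hp), not the FEV data. The F from this comparison equals the unique F that a Type III table reports for the added predictor. The Sum of Sq column is the error that predictor reduces beyond the simpler model.

```r
# Stand-in models on R's built-in mtcars data (not fevdata)
hp_model    <- lm(mpg ~ hp, data = mtcars)        # simpler model: one predictor
wt_hp_model <- lm(mpg ~ wt + hp, data = mtcars)   # adds wt as a second predictor

# Compare the nested models: is the error reduced by adding wt worth the 1 df spent?
comparison <- anova(hp_model, wt_hp_model)
print(comparison)

# "Sum of Sq" is the drop in leftover error from the simpler model to the fuller one
ss_unique_wt <- sum(resid(hp_model)^2) - sum(resid(wt_hp_model)^2)

# With one added parameter, the comparison F equals the squared t for wt in the fuller model
f_from_t <- summary(wt_hp_model)$coefficients["wt", "t value"]^2
print(c(anova_F = comparison$F[2], t_squared = f_from_t))
```

With coursekata loaded, the analogous call for this page's models would be anova(lm(FEV ~ AGE, data = fevdata), multi_model). It should reproduce the F in the HEIGHT row of the multivariate table.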
Below we have put the ANOVA tables for three models: the multivariate model, the height model, and the age model.
Multivariate Model: FEV ~ HEIGHT + AGE
                               SS  df      MS        F    PRE     p
 ------ --------------- | ------- --- ------- -------- ------ -----
  Model (error reduced) | 376.245   2 188.122 1067.956 0.7664 .0000
 HEIGHT                 |  95.326   1  95.326  541.157 0.4539 .0000
    AGE                 |   6.259   1   6.259   35.532 0.0518 .0000
  Error (from model)    | 114.675 651   0.176                      
 ------ --------------- | ------- --- ------- -------- ------ -----
  Total (empty model)   | 490.920 653   0.752   
Height Model: FEV ~ HEIGHT
                              SS  df      MS        F    PRE     p
 ----- --------------- | ------- --- ------- -------- ------ -----
 Model (error reduced) | 369.986   1 369.986 1994.731 0.7537 .0000
 Error (from model)    | 120.934 652   0.185                      
 ----- --------------- | ------- --- ------- -------- ------ -----
 Total (empty model)   | 490.920 653   0.752                      
Age Model: FEV ~ AGE
                              SS  df      MS       F    PRE     p
 ----- --------------- | ------- --- ------- ------- ------ -----
 Model (error reduced) | 280.919   1 280.919 872.184 0.5722 .0000
 Error (from model)    | 210.001 652   0.322                     
 ----- --------------- | ------- --- ------- ------- ------ -----
 Total (empty model)   | 490.920 653   0.752 
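Lining up the three tables shows where the unique rows of the multivariate table come from. The arithmetic below uses values copied from the tables above. It treats HEIGHT's unique SS as the error the age model leaves behind that the multivariate model removes, and it treats AGE's unique SS the same way relative to the height model. Within rounding, the results reproduce the SS, PRE, and F in the HEIGHT and AGE rows.

```r
# Error (SS) left over by each model, copied from the three tables
ss_error_multi  <- 114.675   # FEV ~ HEIGHT + AGE
ss_error_height <- 120.934   # FEV ~ HEIGHT
ss_error_age    <- 210.001   # FEV ~ AGE
df_error_multi  <- 651

# What HEIGHT adds beyond AGE
ss_height_unique  <- ss_error_age - ss_error_multi       # 95.326
pre_height_unique <- ss_height_unique / ss_error_age     # about 0.4539

# What AGE adds beyond HEIGHT
ss_age_unique  <- ss_error_height - ss_error_multi       # 6.259
pre_age_unique <- ss_age_unique / ss_error_height        # about 0.0518

# Unique F: each predictor spends 1 df; divide by MS error of the multivariate model
ms_error_multi <- ss_error_multi / df_error_multi
f_height <- ss_height_unique / ms_error_multi            # about 541.16
f_age    <- ss_age_unique / ms_error_multi               # about 35.53

print(round(c(PRE_HEIGHT = pre_height_unique, PRE_AGE = pre_age_unique,
              F_HEIGHT = f_height, F_AGE = f_age), 4))
```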
Same Model, Different Names
We have now learned how to fit models with quantitative outcome variables and various types and numbers of predictor variables (categorical, quantitative, or both). As we have seen, all of these models can be understood through the common framework of the General Linear Model.
Out in the world, however, people will often use specialized terms to refer to models with different numbers and types of variables. Here is a table with some of the examples we have looked at and the special names people give to those models.
| Example | Description | Common Name |
|---|---|---|
| PriceK ~ Neighborhood (with 2 possible neighborhoods) | a model with a single two-group predictor variable | t-test |
| PriceK ~ Neighborhood (with 3+ possible neighborhoods) | a model with a single predictor variable that has more than two groups | one-way ANOVA (Analysis of Variance) |
| PriceK ~ HomeSizeK | a model with a single quantitative predictor | simple regression |
| PriceK ~ Neighborhood + HomeSizeK | a model with at least one categorical and one quantitative predictor (the quantitative predictor is sometimes called the “covariate”) | ANCOVA (Analysis of Covariance) |
| tip_percent ~ condition + gender | a model with two categorical predictors | two-way ANOVA |
| FEV ~ HEIGHT + AGE | a model with multiple quantitative predictors | multiple regression |
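One way to see that these named tests are the same model is to run a named test and the equivalent lm() side by side. This sketch uses base R's t.test(), oneway.test(), and cor.test() on the built-in mtcars data. It uses transmission type am as a two-group predictor, cylinders cyl as a three-group predictor, and weight wt as a quantitative predictor. These variables are stand-ins, not the examples in the table.

```r
# Stand-in data: R's built-in mtcars (not the PriceK, tip, or FEV data)

# t-test: a single two-group predictor (equal variances assumed, as in lm)
t_result <- t.test(mpg ~ factor(am), data = mtcars, var.equal = TRUE)
am_model <- lm(mpg ~ factor(am), data = mtcars)
am_coefs <- summary(am_model)$coefficients

# one-way ANOVA: a single predictor with more than two groups
oneway_result <- oneway.test(mpg ~ factor(cyl), data = mtcars, var.equal = TRUE)
cyl_model     <- lm(mpg ~ factor(cyl), data = mtcars)

# simple regression: a single quantitative predictor, compared with a correlation test
cor_result <- cor.test(mtcars$mpg, mtcars$wt)
wt_model   <- lm(mpg ~ wt, data = mtcars)

# In each pair, the named test and lm() agree
print(c(t_test_p = t_result$p.value, lm_p = am_coefs[2, "Pr(>|t|)"]))
print(c(oneway_F = unname(oneway_result$statistic),
        lm_F = unname(summary(cyl_model)$fstatistic["value"])))
print(c(cor_test_p = cor_result$p.value,
        lm_p = summary(wt_model)$coefficients["wt", "Pr(>|t|)"]))
```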
It’s good for you to become familiar with some of these names. However, the understanding you have is much more powerful: you can see that all of these are variations of one super useful idea, the General Linear Model. These different names arose because each technique was originally developed to solve a specific problem in statistics and data analysis. Only later did people discover how they were connected.
Although some people prefer the specialized names, even experts have a hard time keeping them all straight. There are well-known “cheatsheets” (such as one called Common Statistical Tests Are Linear Models) that help people remember what all these different models can be called. But you know the truth: they are all just variants of the General Linear Model.