
### Statistics and Data Science: A Modeling Approach


## 13.4 Inference for Targeted Model Comparisons

By using targeted model comparisons, we can compare a complex model (with two predictors) to a simpler one with just a single predictor. This allows us to see how one variable (e.g., HomeSizeK) in the multivariate model uniquely improves the fit of the model to the data, even after controlling for the effect of other predictors (e.g., Neighborhood).

But the fact that adding HomeSizeK reduces error in our data, compared with a model that doesn’t include it, does not by itself show that the complex model is a better model of the DGP. For that, we need to rule out the possibility that the simple model of the DGP could have produced an F (or PRE) for the HomeSizeK effect as large as the one we observed in the data.

For the HomeSizeK effect, we are comparing these two models of the DGP (expressed in both R code and GLM notation):

| Model | R Code | GLM Notation |
|---|---|---|
| Complex | PriceK ~ Neighborhood + HomeSizeK | $$PriceK_i= \beta_0 + \beta_1NeighborhoodEastside_{i} + \beta_2HomeSizeK_{i} + \epsilon_i$$ |
| Simple | PriceK ~ Neighborhood | $$PriceK_i= \beta_0 + \beta_1NeighborhoodEastside_{i} + \colorbox{yellow}{(0)}HomeSizeK_{i} + \epsilon_i$$ |

We have highlighted a different way of describing the simple Neighborhood model. It is a model where the additional effect of HomeSizeK is 0. Could this simpler DGP produce an F as large as the one we observed in our data?

### F and p-value in the ANOVA Table

The answer to this question is summarized by the p-values in the ANOVA table below. The supernova() function uses a mathematical model of the F distribution, assuming that the simpler of the two models being compared is a true model of the DGP. It then calculates how likely an F as large as the observed one would be in a world in which the simpler model is true and any apparent effect of the additional predictor is due only to randomness.

    Analysis of Variance Table (Type III SS)
    Model: PriceK ~ Neighborhood + HomeSizeK

                                           SS df        MS      F    PRE     p
    ------------ --------------- | ---------- -- --------- ------ ------ -----
           Model (error reduced) | 124402.900  2 62201.450 17.216 0.5428 .0000
    Neighborhood                 |  27758.138  1 27758.138  7.683 0.2094 .0096
       HomeSizeK                 |  42003.739  1 42003.739 11.626 0.2862 .0019
           Error (from model)    | 104774.201 29  3612.903
    ------------ --------------- | ---------- -- --------- ------ ------ -----
           Total (empty model)   | 229177.101 31  7392.810


The p-value on the Model row (.0000) means that there is less than a .0001 chance that an F as large as the overall F (17) could be generated by the simple model (which, for this row, is the empty model). This small p-value indicates that we should reject the simple model.
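The Model row can be checked by hand from the values in the table. The arithmetic below is only a sketch: the subscripted names are our own labels, and the closed-form tail probability in the last line is a standard property of the F distribution when the numerator has 2 degrees of freedom, not necessarily how supernova() computes it.

$$F_{Model} = \frac{MS_{Model}}{MS_{Error}} = \frac{124402.900 / 2}{104774.201 / 29} = \frac{62201.450}{3612.903} \approx 17.216$$

$$PRE_{Model} = \frac{SS_{Model}}{SS_{Total}} = \frac{124402.900}{229177.101} \approx 0.5428$$

$$P(F_{2,29} \geq 17.216) = \left(1 + \frac{2 \times 17.216}{29}\right)^{-29/2} \approx 0.00001$$

Rounded to four decimal places, that probability is displayed as .0000.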

The p-value for HomeSizeK (.0019) is also very small, so here too we should reject the simpler model.

This p-value means that the probability of getting an F as large as 11.626 for HomeSizeK in the multivariate model, if HomeSizeK adds no predictive value in the DGP, is very low (.0019). Based on this, we would reject the simple model that only includes Neighborhood, and go with the complex model that includes both Neighborhood and HomeSizeK.
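The HomeSizeK row follows the same arithmetic. As a sketch (the subscripted labels are ours, not the book's): with Type III SS, the SS for HomeSizeK is the error removed by adding HomeSizeK to the Neighborhood-only model, so the error left by that simple model is the HomeSizeK SS plus the Error SS of the complex model.

$$SS_{Error,\,simple} = 42003.739 + 104774.201 = 146777.940$$

$$PRE_{HomeSizeK} = \frac{SS_{HomeSizeK}}{SS_{Error,\,simple}} = \frac{42003.739}{146777.940} \approx 0.2862$$

$$F_{HomeSizeK} = \frac{SS_{HomeSizeK} / 1}{SS_{Error} / 29} = \frac{42003.739}{3612.903} \approx 11.626$$

The p-value reported in the table is then the probability that an F distribution with 1 and 29 degrees of freedom produces a value this large or larger:

$$P(F_{1,29} \geq 11.626) \approx .0019$$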