College / Advanced Statistics with R (ABCD)
16.2 Fitting and Visualizing an Interaction Model with Two Quantitative Predictors
In the interaction model, we allow not only the intercepts of the regression lines to differ for different years built, but also the slopes. To the extent that the slopes do, in fact, differ for different values of YearBuilt, the relationship between price and home size depends on when the home was built (at least in the data, if not in the DGP). Allowing the slopes to differ costs us an additional degree of freedom, but may lead to a better-fitting model than the additive model.
The GLM notation for the interaction model with two quantitative predictors is the same as it was for the model with one categorical and one quantitative predictor. So is the R code! But because both predictor variables are quantitative, the interpretation of the model is a little different.
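As a reminder, the interaction model can be written out as below. This is a sketch that assumes the usual GLM notation, with X_1 standing for YearBuilt and X_2 for HomeSizeK; the assignment of subscripts to variables is our assumption and simply follows the order of the predictors in the R formula.

```latex
\[
Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i} + e_i
\]
```

The product term with coefficient b_3 is what distinguishes this model from the additive model, and it is the extra parameter that costs the additional degree of freedom.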
Add some code to the code window below to fit the interaction model and save it as interaction_model. Also, add gf_model() to the gf_point() call to visualize the predictions of the interaction model on the scatter plot.
require(coursekata)

# fit and save the interaction model
interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

# add the model to this scatter plot
gf_point(PriceK ~ HomeSizeK, data = Ames, color = ~YearBuilt) %>%
  gf_model(interaction_model)
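To see what the extra degree of freedom buys, it may help to fit the additive model alongside the interaction model and compare them. The sketch below is one way to do that; the object names additive_model, n_params, and sse are ours, and deviance() is used here simply because it returns the sum of squared residuals for a model fit with lm().

```r
require(coursekata)

# Additive model: one shared HomeSizeK slope at every value of YearBuilt
additive_model <- lm(PriceK ~ YearBuilt + HomeSizeK, data = Ames)

# Interaction model: the HomeSizeK slope is allowed to change with YearBuilt
interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

# Number of estimated parameters in each model
n_params <- c(
  additive = length(coef(additive_model)),
  interaction = length(coef(interaction_model))
)

# Sum of squared residuals (error left over) for each model
sse <- c(
  additive = deviance(additive_model),
  interaction = deviance(interaction_model)
)

n_params
sse
```

Because the additive model is nested inside the interaction model, the interaction model's sum of squares cannot be larger. Whether the reduction is worth the extra parameter is a separate model comparison question.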
Earlier, when we had one categorical predictor and one quantitative predictor, the gf_model() function overlaid two regression lines, one for each level of the categorical variable. When both predictors are quantitative, however, a different approach is required.
gf_model() can’t overlay a regression line for each possible value of YearBuilt because there are too many possible values! Instead, it selects three representative values of YearBuilt and overlays a line for each. The values it chooses are the mean of YearBuilt, one standard deviation above the mean, and one standard deviation below the mean.
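The three lines can also be worked out by hand from the fitted coefficients, which may make it clearer where they come from. The sketch below is not how gf_model() is implemented internally (we have not checked that); it only assumes the description above, that the representative values are the mean of YearBuilt and one standard deviation on either side. The names year_values and model_lines are ours.

```r
require(coursekata)

interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)
b <- coef(interaction_model)

# Three representative values: mean - 1 SD, mean, mean + 1 SD
year_mean <- mean(Ames$YearBuilt, na.rm = TRUE)
year_sd <- sd(Ames$YearBuilt, na.rm = TRUE)
year_values <- year_mean + c(-1, 0, 1) * year_sd

# Plug each value into the fitted model to get one line per value:
# an intercept and a HomeSizeK slope that both depend on YearBuilt
model_lines <- data.frame(
  YearBuilt = year_values,
  intercept = b[["(Intercept)"]] + b[["YearBuilt"]] * year_values,
  slope = b[["HomeSizeK"]] + b[["YearBuilt:HomeSizeK"]] * year_values
)
model_lines
```

Each row of model_lines describes one line of predicted PriceK as a function of HomeSizeK, for one fixed value of YearBuilt.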
In the graph, the middle (greenish) line shows the model predictions for the average YearBuilt (1978), while the two flanking lines show the predictions for one standard deviation above the mean (2014, yellowish) and one standard deviation below the mean (1942, bluish).
Just because we graphed three lines doesn’t mean there are only three possible lines. Theoretically, there could be an infinite number of lines, one for each possible value of YearBuilt. The gf_model() function just shows a few representative examples to help us see what the interaction pattern looks like.
The important thing to notice is that the slope of the line is steeper for newer homes than for older homes. One way to describe this pattern of increasing steepness is that the effect of home size on price gets larger as houses get newer. In other words, there is an interaction between HomeSizeK and YearBuilt.
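The same pattern can be read off the model's prediction equation by regrouping its terms. This is a sketch in assumed GLM notation, where X_1 stands for YearBuilt, X_2 for HomeSizeK, and b_0 through b_3 are the fitted parameter estimates (b_3 being the coefficient of the product term).

```latex
\[
\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{1i} X_{2i}
          = (b_0 + b_1 X_{1i}) + (b_2 + b_3 X_{1i}) X_{2i}
\]
```

For any fixed year built, the first parenthesis is the intercept of that year's line and the second is its slope for home size. When b_3 is positive, the slope b_2 + b_3 X_1 grows as YearBuilt increases, which is the increasing steepness seen in the graph.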
Different Graphs Can Highlight Different Interpretations
You might wonder why we chose to put HomeSizeK on the x-axis and represent YearBuilt with the different lines. Actually, there is no reason you couldn’t present the same model in a different way, as in the graph below.
interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

gf_point(PriceK ~ YearBuilt, data = Ames, color = ~HomeSizeK) %>%
  gf_model(interaction_model)
Now each line represents a particular value of HomeSizeK (the mean, +1 SD, and -1 SD). Although the model and the data are the same as in the previous graph, plotting them in a different way may lead to a different way of describing the pattern of results.
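To make the flipped view concrete, the lines in this second graph can be sketched from the same coefficients, this time holding HomeSizeK fixed and letting YearBuilt vary. As before, this is only an illustration that assumes the mean and plus or minus one SD are the representative values; size_values and size_lines are names we made up.

```r
require(coursekata)

interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)
b <- coef(interaction_model)

# Representative home sizes: mean - 1 SD, mean, mean + 1 SD
size_values <- mean(Ames$HomeSizeK, na.rm = TRUE) +
  c(-1, 0, 1) * sd(Ames$HomeSizeK, na.rm = TRUE)

# With YearBuilt on the x-axis, each home size gets its own line:
# an intercept and a YearBuilt slope that both depend on HomeSizeK
size_lines <- data.frame(
  HomeSizeK = size_values,
  intercept = b[["(Intercept)"]] + b[["HomeSizeK"]] * size_values,
  slope = b[["YearBuilt"]] + b[["YearBuilt:HomeSizeK"]] * size_values
)
size_lines
```

The same interaction coefficient appears in the slope here as in the earlier view. Read this way, the pattern can be described as the effect of year built on price depending on the size of the home.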