CourseKata - 16.2 Fitting and Visualizing an Interaction Model with Two Quantitative Predictors

College / Advanced Statistics with R (ABCD)

16.2 Fitting and Visualizing an Interaction Model with Two Quantitative Predictors

In the interaction model, we allow not only the intercepts of the regression lines to differ for different years built, but also the slopes. To the extent that the slopes do, in fact, differ for different values of YearBuilt, the relationship between price and home size depends on when the home was built, at least in the data if not in the DGP (data generating process). Allowing the slopes to differ costs us an additional degree of freedom, but may lead to a better-fitting model than the additive model.

The GLM notation for the interaction model with two quantitative predictors is the same as it was for the model with one categorical and one quantitative predictor. So is the R code! But because both predictor variables are quantitative, the interpretation of the model is a little different.
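To see that the R code really is the same, it can help to look at how R expands the formula. The sketch below uses only base R (no data needed) and a small helper, expand_terms(), which is a name made up for this illustration. It shows that the * shorthand produces the same three terms as writing the two predictors and their : interaction separately.

```r
# Hypothetical helper: list the terms R derives from a model formula
expand_terms <- function(f) attr(terms(f), "term.labels")

# The * shorthand ...
short_form <- expand_terms(PriceK ~ YearBuilt * HomeSizeK)

# ... and the fully written-out version
long_form <- expand_terms(PriceK ~ YearBuilt + HomeSizeK + YearBuilt:HomeSizeK)

print(short_form)
print(long_form)

# TRUE when both formulas specify the same terms in the same order
identical(short_form, long_form)
```

Either way of writing the formula asks lm() to estimate one parameter for each predictor plus one for their product, in addition to the intercept.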

Fit the interaction model and save it as interaction_model. Then chain gf_model() onto gf_point() to overlay the predictions of the interaction model on the scatter plot.

require(coursekata)

# fit and save the interaction model
interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

# add the model to this scatter plot
gf_point(PriceK ~ HomeSizeK, data = Ames, color = ~YearBuilt) %>%
  gf_model(interaction_model)

Earlier, when we had one categorical predictor and one quantitative predictor, the gf_model() function overlaid two regression lines – one for each level of the categorical variable. When both predictors are quantitative, however, a different approach is required.

gf_model() can’t overlay a regression line for each possible value of YearBuilt because there are too many possible values! Instead, it selects three representative values of YearBuilt and overlays one line for each. The values it chooses are the mean of YearBuilt, one standard deviation above the mean, and one standard deviation below the mean.
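The three lines described above can also be worked out by hand from the fitted coefficients. The sketch below is base R only and runs on simulated data: the houses data frame and all of its numbers are made up for illustration and are not the Ames data. It does not call gf_model(). It just computes the mean and the values one standard deviation either side of it, then finds the intercept and slope the model implies at each one.

```r
# Simulated stand-in for the Ames data: made-up numbers, for illustration only
set.seed(162)
n <- 200
houses <- data.frame(
  YearBuilt = round(runif(n, 1900, 2010)),
  HomeSizeK = runif(n, 0.5, 3)
)
houses$PriceK <- 20 + 0.3 * (houses$YearBuilt - 1900) +
  (40 + 0.6 * (houses$YearBuilt - 1900)) * houses$HomeSizeK +
  rnorm(n, sd = 20)

interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = houses)
b <- coef(interaction_model)

# Representative values: one SD below the mean, the mean, one SD above
rep_years <- mean(houses$YearBuilt) + c(-1, 0, 1) * sd(houses$YearBuilt)

# Holding YearBuilt fixed leaves a straight line in HomeSizeK
rep_lines <- data.frame(
  YearBuilt = rep_years,
  intercept = unname(b["(Intercept)"] + b["YearBuilt"] * rep_years),
  slope     = unname(b["HomeSizeK"] + b["YearBuilt:HomeSizeK"] * rep_years)
)
print(rep_lines)
```

Each row of rep_lines describes one line: its intercept and its slope with respect to HomeSizeK. Any other value of YearBuilt could be plugged in the same way, which is why the three plotted lines are only examples.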

In the graph, the middle (greenish) line shows the model predictions for the average YearBuilt (1978), while the two flanking lines represent one standard deviation above the mean (2014, yellowish) and below the mean (1942, bluish).

Just because we graphed three lines doesn’t mean there are only three possible lines. Theoretically there could be an infinite number of lines. The gf_model() function just shows a few representative examples to help us see what the interaction pattern looks like.

The important thing to notice is that the line is steeper for newer homes than for older homes. One way to describe this pattern of increasing steepness is that the effect of home size on price gets larger as houses get newer. In other words, there is an interaction between HomeSizeK and YearBuilt.
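One way to see why differing slopes signal an interaction is to write the model out and group its terms. This is only a sketch: the labels b_0 through b_3 are assumed here for illustration, with b_3 attached to the product term.

$$
\begin{aligned}
\text{PriceK}_i &= b_0 + b_1\,\text{HomeSizeK}_i + b_2\,\text{YearBuilt}_i + b_3\,\text{HomeSizeK}_i\,\text{YearBuilt}_i + e_i \\
&= (b_0 + b_2\,\text{YearBuilt}_i) + (b_1 + b_3\,\text{YearBuilt}_i)\,\text{HomeSizeK}_i + e_i
\end{aligned}
$$

For any fixed value of YearBuilt, the first parenthesis is the intercept and the second is the slope of the line relating PriceK to HomeSizeK. If b_3 is positive, that slope grows as YearBuilt increases, which corresponds to steeper lines for newer homes.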

Different Graphs Can Highlight Different Interpretations

You might wonder why we chose to represent HomeSizeK on the x-axis, and YearBuilt with the different lines. Actually, there is no reason you couldn’t present the same model in a different way, as in the graph below.

interaction_model <- lm(PriceK ~ YearBuilt*HomeSizeK, data = Ames)

gf_point(PriceK ~ YearBuilt, data = Ames, color = ~HomeSizeK) %>%
  gf_model(interaction_model)

Scatter plot of PriceK predicted by YearBuilt. The points are colored based on HomeSizeK. Three separate regression lines are drawn on the plot. The lines show a positive trend.

Now each line represents a particular value of HomeSizeK (the mean, +1 SD, and -1 SD). Although the model and the data are the same as in the previous graph, plotting them in a different way may lead to a different way of describing the pattern of results.
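A quick check can make it concrete that the two graphs show one and the same model. The sketch below uses base R and simulated data: the houses data frame, its numbers, and the single example house are all made up for illustration and are not the Ames data. It computes one prediction two ways, first as a point on a line in HomeSizeK for a fixed YearBuilt, then as a point on a line in YearBuilt for a fixed HomeSizeK.

```r
# Simulated stand-in for the Ames data: made-up numbers, for illustration only
set.seed(162)
n <- 200
houses <- data.frame(
  YearBuilt = round(runif(n, 1900, 2010)),
  HomeSizeK = runif(n, 0.5, 3)
)
houses$PriceK <- 20 + 0.3 * (houses$YearBuilt - 1900) +
  (40 + 0.6 * (houses$YearBuilt - 1900)) * houses$HomeSizeK +
  rnorm(n, sd = 20)

interaction_model <- lm(PriceK ~ YearBuilt * HomeSizeK, data = houses)
b <- coef(interaction_model)
b0     <- b[["(Intercept)"]]
b_year <- b[["YearBuilt"]]
b_size <- b[["HomeSizeK"]]
b_int  <- b[["YearBuilt:HomeSizeK"]]

# One hypothetical house
year <- 1990
size <- 1.5

# View 1: a line in HomeSizeK, with YearBuilt held fixed
view1 <- (b0 + b_year * year) + (b_size + b_int * year) * size

# View 2: a line in YearBuilt, with HomeSizeK held fixed
view2 <- (b0 + b_size * size) + (b_year + b_int * size) * year

# Prediction straight from the fitted model, for comparison
direct <- predict(interaction_model,
                  data.frame(YearBuilt = year, HomeSizeK = size))

all.equal(view1, view2)
all.equal(view1, unname(direct))
```

Both views regroup the same four coefficients, so they agree with each other and with predict(). What changes between the graphs is which predictor is treated as the x-axis and which is held at representative values.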
