10.3 Interpreting Parameter Estimates of Interaction Models with Two Quantitative Predictors

We know from the graph that the interaction model produces a larger effect of home size on price for newer homes than it does for older homes. This is what makes it an interaction model: the effects of one predictor on the outcome are different for different values of the second predictor.

But how is it that multiplying YearBuilt by HomeSizeK results in different slopes and intercepts for homes of different ages? Let’s dig in and see how this works. Write some code to fit and print out the parameter estimates from the interaction model of PriceK using HomeSizeK and YearBuilt as predictors.

require(coursekata)

# find the best-fitting parameter estimates for the interaction model
lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)

# or alternatively: lm(PriceK ~ HomeSizeK * YearBuilt, data = Ames)
Call:
lm(formula = PriceK ~ YearBuilt * HomeSizeK, data = Ames)
 
Coefficients:
        (Intercept)            YearBuilt            HomeSizeK  
          -157.0888               0.1037            -837.6424  
YearBuilt:HomeSizeK  
             0.4686 
 

How the Interaction Model Generates Predictions

Here, again, is the interaction model in GLM notation:

\[\text{PriceK}_i=b_0+b_1\text{YearBuilt}_i+b_2\text{HomeSizeK}_i+b_3\text{YearBuilt}_i*\text{HomeSizeK}_i+e_i\]

If we replace the \(b\)s with their corresponding parameter estimates we get this function that we can use to predict the price of any home based on its size and the year it was built:

\[\text{PriceK}=-157.09 + 0.1\text{YearBuilt}-837.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]

Because this is an interaction model, we know that it will generate many lines, one for each value of YearBuilt. To see how this works, it is helpful to label the part of the function that generates the predicted y-intercept and the part that generates the predicted slope. Let’s start by looking at the y-intercept.

\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{-837.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}}_\text{the part that produces slope}\]

The y-intercept part of the function generates a different y-intercept for each value of YearBuilt, just as it did in the additive model. To get the y-intercept, we start at -157.09, then add 0.1 for each year of YearBuilt.

To see how the interaction model generates a different slope for each value of YearBuilt, it helps to simplify the remaining part of the function by doing a little algebraic manipulation.

We start with the part that produces the slope:

\[-837.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]

and use the distributive property (\(ac+bc=(a+b)c\)) to turn it into this:

\[(-837.64+0.47\text{YearBuilt})\text{HomeSizeK}\]

We can put this rewritten slope back into the function to show more clearly how this equation generates a different slope for each value of YearBuilt:

\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{(-837.64+0.47\text{YearBuilt})}_\text{slope}\text{HomeSizeK}\]

Similar to the adjustment made for y-intercepts, the function gets the slope for each value of YearBuilt by starting with -837.64, then adding 0.47 for each year of YearBuilt. A home built in the year 2000, therefore, would have a predicted slope of -837.64 + (0.47*2000), or 102.36.
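
As a rough check on this arithmetic, the sketch below computes the y-intercept and slope of the HomeSizeK line for a given year. It copies the unrounded estimates from the lm() output above instead of refitting the model, and `line_for_year` is a hypothetical helper name, not a coursekata function. Because the interaction estimate is multiplied by a four-digit year, rounding 0.4686 to 0.47 shifts the result by a few units, so the unrounded slope for the year 2000 (about 99.56) differs somewhat from the hand calculation with rounded values.

```r
# Estimates copied from the lm() output above (unrounded)
b0 <- -157.0888  # (Intercept)
b1 <- 0.1037     # YearBuilt
b2 <- -837.6424  # HomeSizeK
b3 <- 0.4686     # YearBuilt:HomeSizeK

# Hypothetical helper: y-intercept and slope of the regression line
# of HomeSizeK predicting PriceK, for homes built in a given year
line_for_year <- function(year) {
  c(intercept = b0 + b1 * year,
    slope     = b2 + b3 * year)
}

line_for_year(2000)  # a newer home
line_for_year(1950)  # an older home: smaller intercept and smaller slope
```

Comparing the two calls shows the pattern from the graph: the later the year, the steeper the predicted line.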

We could use this same logic to make regression lines of YearBuilt predicting PriceK, yielding a different regression line for each value of HomeSizeK. The key is that the y-intercept and slope for the regression lines of one predictor are adjusted based on the value of the other predictor. This, indeed, is the very definition of an interaction model.
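
The same regrouping can be sketched in the other direction. Collecting the terms that multiply YearBuilt gives a y-intercept of \(b_0+b_2\text{HomeSizeK}\) and a slope of \(b_1+b_3\text{HomeSizeK}\). The code below again uses the unrounded estimates from the lm() output; the helper names and the value 1.5 for HomeSizeK are illustrative assumptions, chosen only to show that the regrouped line and the full equation agree.

```r
# Estimates copied from the lm() output above (unrounded)
b0 <- -157.0888  # (Intercept)
b1 <- 0.1037     # YearBuilt
b2 <- -837.6424  # HomeSizeK
b3 <- 0.4686     # YearBuilt:HomeSizeK

# Hypothetical helper: y-intercept and slope of the regression line
# of YearBuilt predicting PriceK, for a fixed value of HomeSizeK
line_for_size <- function(size_k) {
  c(intercept = b0 + b2 * size_k,
    slope     = b1 + b3 * size_k)
}

# Hypothetical helper: the full interaction model, written term by term
predict_price <- function(year, size_k) {
  b0 + b1 * year + b2 * size_k + b3 * year * size_k
}

size_line <- line_for_size(1.5)  # 1.5 is an arbitrary illustrative value
size_line

# The regrouped line and the full equation give the same prediction
from_line  <- size_line[["intercept"]] + size_line[["slope"]] * 2000
from_model <- predict_price(2000, 1.5)
all.equal(from_line, from_model)
```

Either grouping describes the same model; which predictor is treated as "the slope variable" is only a matter of how the terms are collected.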
