10.3 Interpreting Parameter Estimates of Interaction Models with Two Quantitative Predictors
We know from the graph that the interaction model produces a larger effect of home size on price for newer homes than it does for older homes. This is what makes it an interaction model: the effects of one predictor on the outcome are different for different values of the second predictor.
But how is it that multiplying YearBuilt by HomeSizeK results in different slopes and intercepts for homes of different ages? Let’s dig in and see how this works. Write some code to fit and print out the parameter estimates from the interaction model of PriceK using HomeSizeK and YearBuilt as predictors.
require(coursekata)
# find the best-fitting parameter estimates for the interaction model
lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)
# or alternatively: lm(PriceK ~ HomeSizeK * YearBuilt, data = Ames)
Call:
lm(formula = PriceK ~ YearBuilt * HomeSizeK, data = Ames)
Coefficients:
        (Intercept)            YearBuilt            HomeSizeK  YearBuilt:HomeSizeK
          -157.0888               0.1037            -837.6424               0.4686
How the Interaction Model Generates Predictions
Here, again, is the interaction model in GLM notation:
\[\text{PriceK}_i=b_0+b_1\text{YearBuilt}_i+b_2\text{HomeSizeK}_i+b_3\text{YearBuilt}_i*\text{HomeSizeK}_i+e_i\]
If we replace the \(b\)s with their corresponding parameter estimates we get this function that we can use to predict the price of any home based on its size and the year it was built:
\[\text{PriceK}=-157.09 + 0.1\text{YearBuilt}-838.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]
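As a minimal sketch, the prediction function can be written as a small R function. The coefficient values are the rounded estimates copied from the equation in the text; the function name predict_price_k and the example home (built in 2000, HomeSizeK of 1) are hypothetical and used only for illustration.

```r
# Rounded estimates copied from the prediction function in the text
b0 <- -157.09   # intercept
b1 <- 0.1       # YearBuilt
b2 <- -838.64   # HomeSizeK
b3 <- 0.47      # YearBuilt:HomeSizeK (the interaction term)

# Hypothetical helper: predicted PriceK for a given year and home size
predict_price_k <- function(year_built, home_size_k) {
  b0 + b1 * year_built + b2 * home_size_k + b3 * year_built * home_size_k
}

# Illustrative home: built in 2000, HomeSizeK of 1
predict_price_k(2000, 1)
```

With these rounded values, the illustrative home gets a predicted PriceK of about 144.27. A prediction from the fitted model itself would differ slightly, because the model uses unrounded estimates.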
Because this is an interaction model, we know that it will generate many lines, one for each value of YearBuilt. To see how this works, it is helpful to label the part of the function that generates the predicted y-intercept, and the part that generates the predicted slope. Let’s start by looking at the y-intercept.
\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{-838.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}}_\text{the part that produces slope}\]
The y-intercept part of the function generates a different y-intercept for each value of YearBuilt, just as it did in the additive model. To get the y-intercept, we start at -157.09, then add 0.1 for each year of YearBuilt.
To see how the interaction model generates a different slope for each value of YearBuilt, it helps to simplify the remaining part of the function by doing a little algebraic manipulation.
We start with the part that produces the slope:
\[-838.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]
and use the distributive property (\(ac+bc=(a+b)c\)) to turn it into this:
\[(-838.64+0.47\text{YearBuilt})\text{HomeSizeK}\]
We can put this re-written slope back into the function to more clearly show how this equation generates different slopes for each value of YearBuilt:
\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{(-838.64+0.47\text{YearBuilt})}_\text{slope}\text{HomeSizeK}\]
Similar to the adjustment made for y-intercepts, the function gets the slope for each value of YearBuilt by starting with -838.64, then adding 0.47 for each year of YearBuilt. A home built in the year 2000, therefore, would have a predicted slope of -838.64 + (0.47*2000), or 101.36.
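The same arithmetic can be repeated for several years at once. The sketch below uses the rounded estimates from the text and three illustrative build years (chosen here as examples, not taken from the data) to tabulate the y-intercept and slope of each year's regression line.

```r
# Rounded estimates copied from the prediction function in the text
b0 <- -157.09   # intercept
b1 <- 0.1       # YearBuilt
b2 <- -838.64   # HomeSizeK
b3 <- 0.47      # YearBuilt:HomeSizeK (the interaction term)

# Illustrative build years (assumed for this example)
years <- c(1900, 1950, 2000)

# Each year gets its own line of HomeSizeK predicting PriceK
lines_by_year <- data.frame(
  YearBuilt = years,
  intercept = b0 + b1 * years,  # start at b0, add b1 per year
  slope     = b2 + b3 * years   # start at b2, add b3 per year
)

lines_by_year
```

The row for 2000 reproduces the slope of 101.36 worked out above. Both the intercept and the slope increase with YearBuilt, which matches the larger effect of home size on price for newer homes.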
We could use this same logic to make regression lines of YearBuilt predicting PriceK, yielding a different regression line for each value of HomeSizeK. The key is that the y-intercept and slope for the regression lines of one predictor are adjusted based on the value of the other predictor. This, indeed, is the very definition of an interaction model.
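As a sketch of that alternative grouping, derived here from the same rounded estimates, collecting the terms that contain YearBuilt gives:
\[\text{PriceK}=\underbrace{(-157.09-838.64\text{HomeSizeK})}_\text{y-intercept}+\underbrace{(0.1+0.47\text{HomeSizeK})}_\text{slope}\text{YearBuilt}\]
For a hypothetical home with a HomeSizeK of 1, this yields a y-intercept of -157.09 - 838.64 = -995.73 and a slope of 0.1 + 0.47 = 0.57 for the line of YearBuilt predicting PriceK.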