16.3 Interpreting Parameter Estimates of Interaction Models with Two Quantitative Predictors
We know from the graph that the interaction model produces a larger effect of home size on price for newer homes than it does for older homes. This is what makes it an interaction model: the effects of one predictor on the outcome are different for different values of the second predictor.
But how is it that multiplying YearBuilt by HomeSizeK results in different slopes and intercepts for homes of different ages? Let’s dig in and see how this works. Write some code to fit and print out the parameter estimates from the interaction model of PriceK using HomeSizeK and YearBuilt as predictors.
require(coursekata)
# find the best-fitting parameter estimates for the interaction model
lm(PriceK ~ YearBuilt * HomeSizeK, data = Ames)
# or alternatively: lm(PriceK ~ HomeSizeK * YearBuilt, data = Ames)
Call:
lm(formula = PriceK ~ YearBuilt * HomeSizeK, data = Ames)
Coefficients:
        (Intercept)            YearBuilt            HomeSizeK
          -157.0888               0.1037            -837.6424
YearBuilt:HomeSizeK
             0.4686
How the Interaction Model Generates Predictions
Here, again, is the interaction model in GLM notation:
\[\text{PriceK}_i=b_0+b_1\text{YearBuilt}_i+b_2\text{HomeSizeK}_i+b_3\text{YearBuilt}_i*\text{HomeSizeK}_i+e_i\]
If we replace the \(b\)s with their corresponding parameter estimates we get this function that we can use to predict the price of any home based on its size and the year it was built:
\[\text{PriceK}=-157.09 + 0.1\text{YearBuilt}+-838.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]
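As a rough sketch of how this function turns a home's age and size into a predicted price, the R code below hard-codes the rounded estimates exactly as they appear in the equation above. The helper name predict_price and the example home (built in 2000, with HomeSizeK equal to 1.5) are hypothetical, chosen only for illustration.

```r
# Rounded parameter estimates copied from the prediction equation in the text
b0 <- -157.09   # intercept
b1 <- 0.1       # YearBuilt
b2 <- -838.64   # HomeSizeK
b3 <- 0.47      # YearBuilt:HomeSizeK (the interaction term)

# Hypothetical helper: predicted PriceK for a given build year and home size
predict_price <- function(YearBuilt, HomeSizeK) {
  b0 + b1 * YearBuilt + b2 * HomeSizeK + b3 * YearBuilt * HomeSizeK
}

# Hypothetical home: built in 2000, HomeSizeK = 1.5
predict_price(2000, 1.5)
```

With these rounded values the call returns about 194.95. Predictions from the fitted model itself would differ somewhat, because the model uses the unrounded estimates.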
Because this is an interaction model, we know that it will generate many lines, one for each value of YearBuilt. To see how this works, it is helpful to label the part of the function that generates the predicted y-intercept, and the part that generates the predicted slope. Let’s start by looking at the y-intercept.
\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{-838.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}}_\text{the part that produces slope}\]
The y-intercept part of the function generates a different y-intercept for each value of YearBuilt, just as it did in the additive model. To get the y-intercept, we start at -157.09, then add 0.1 for each year of YearBuilt.
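For example, using two build years picked only for illustration, the y-intercept part works out as:
\[\text{YearBuilt}=1950:\quad -157.09+0.1(1950)=37.91\]
\[\text{YearBuilt}=2000:\quad -157.09+0.1(2000)=42.91\]
With these rounded estimates, the line for a home built in 2000 starts 5 higher than the line for a home built in 1950.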
To see how the interaction model generates a different slope for each value of YearBuilt, it helps to simplify the remaining part of the function by doing a little algebraic manipulation.
We start with the part that produces the slope:
\[-838.64\text{HomeSizeK}+0.47\text{YearBuilt}*\text{HomeSizeK}\]
and use the distributive property (\(ac+bc=(a+b)c\)) to turn it into this:
\[(-838.64+0.47\text{YearBuilt})\text{HomeSizeK}\]
We can put this re-written slope back into the function to more clearly show how this equation generates different slopes for each value of YearBuilt:
\[\text{PriceK}=\underbrace{-157.09 + 0.1\text{YearBuilt}}_\text{y-intercept}+\underbrace{(-838.64+0.47\text{YearBuilt})}_\text{slope}\text{HomeSizeK}\]
Similar to the adjustment made for y-intercepts, the function gets the slope for each value of YearBuilt by starting with -838.64, then adding 0.47 for each year of YearBuilt. A home built in the year 2000, therefore, would have a predicted slope of -838.64 + (0.47*2000), or 101.36.
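The same arithmetic can be repeated for several build years at once. The sketch below again assumes the rounded estimates from the text; the three years and the name lines_by_year are illustrative and not taken from the data.

```r
# Rounded parameter estimates copied from the text's prediction equation
b0 <- -157.09   # intercept
b1 <- 0.1       # YearBuilt
b2 <- -838.64   # HomeSizeK
b3 <- 0.47      # YearBuilt:HomeSizeK

# Illustrative build years (not taken from the data)
years <- c(1950, 1975, 2000)

# Each build year gets its own line of PriceK on HomeSizeK
lines_by_year <- data.frame(
  YearBuilt = years,
  intercept = b0 + b1 * years,  # y-intercept part of the function
  slope     = b2 + b3 * years   # slope part, after factoring out HomeSizeK
)
lines_by_year
```

Both the intercept and slope columns rise with YearBuilt: every additional year adds 0.1 to the intercept and 0.47 to the slope, which is why newer homes get steeper lines.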
We could use this same logic to make regression lines of YearBuilt predicting PriceK, yielding a different regression line for each value of HomeSizeK. The key is that the y-intercept and slope for the regression lines of one predictor are adjusted based on the value of the other predictor. This, indeed, is the very definition of an interaction model.
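As a sketch of that reverse view, collecting the YearBuilt terms of the rounded prediction function (instead of the HomeSizeK terms) gives:
\[\text{PriceK}=\underbrace{(-157.09-838.64\text{HomeSizeK})}_\text{y-intercept}+\underbrace{(0.1+0.47\text{HomeSizeK})}_\text{slope}\text{YearBuilt}\]
For a home with HomeSizeK equal to 2 (a value chosen only for illustration), the slope for YearBuilt would be 0.1 + (0.47*2), or 1.04, and the y-intercept would be -157.09 - (838.64*2), or -1834.37.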