9.3 Representing the Interaction Model in GLM Notation
The Interaction Model in GLM Notation
In the additive model, we constrain the slopes of the two regression lines to be the same, but allow the two lines to have different y-intercepts depending on condition. We accomplish this by adding another term to the model that adjusts the y-intercept up or down if the patient is in the dog group (\(b_1\text{Dog}_i\) in the additive model, below).
Additive model: \(\text{later}_i=b_0+\hspace{-6pt}\underbrace{b_1\text{Dog}_i}_{\substack{\text{adjustment} \\ \text{to y-intercept} \\ \text{when Dog=1}}}\hspace{-6pt}+\hspace{6pt}b_2\text{base}_i\)
In the interaction model, we allow the two lines to have different slopes (as well as different y-intercepts) depending on condition. To make the slope differ based on condition, we can add another term to the model that adjusts the slope when \(\text{Dog}_i=1\), in much the same way we did for the y-intercept.
Interaction model: \(\text{later}_i = b_0 + b_1\text{Dog}_i + b_2\text{base}_i + \underbrace{b_3\text{Dog}_i*\text{base}_i}_\text{new term} + e_i\)
This new term, which includes the product of two variables (\(\text{Dog}_i*\text{base}_i\)), is called the interaction term. The other components of the interaction model (\(b_0 + b_1\text{Dog}_i + b_2\text{base}_i\)) should look familiar because we have already studied the additive model. Let’s take a closer look to see how adding this new term can give us a way to adjust the slope from one group to the next.
Interaction model: \[\text{later}_i=b_0+\hspace{-6pt}\underbrace{b_1\text{Dog}_i}_{\substack{\text{adjustment} \\ \text{to y-intercept} \\ \text{when Dog=1}}}\hspace{-6pt}+\hspace{2pt}b_2\text{base}_i+\hspace{-4pt}\underbrace{b_3\text{Dog}_i}_{\substack{\text{adjustment}\\ \text{to slope} \\ \text{when Dog=1} }}\hspace{-8pt}* \hspace{2pt}\text{base}_i\]
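As a rough sketch of how this model could be fit in R, the code below simulates a small data set and passes the interaction model to lm(). The data frame `d`, the simulated values, and the use of the GLM-notation names `later`, `Dog`, and `base` as column names are all assumptions made for illustration; they are not the study's data.

```r
# Simulated data for illustration only (NOT the study's data).
set.seed(1)
n <- 40
d <- data.frame(
  Dog  = rep(c(0, 1), each = n / 2),   # 0 = Control, 1 = Dog
  base = runif(n, min = 10, max = 50)  # hypothetical baseline scores
)
# Hypothetical "true" values used only to generate the outcome
d$later <- 5 + 2 * d$Dog + 0.8 * d$base - 0.3 * d$Dog * d$base +
  rnorm(n, sd = 3)

# Fit the interaction model: later = b0 + b1*Dog + b2*base + b3*Dog*base
interaction_model <- lm(later ~ Dog * base, data = d)

# The four estimates, in the order b0, b1, b2, b3
b <- coef(interaction_model)
print(b)
```

In an R formula, `Dog * base` is shorthand for `Dog + base + Dog:base`, so the four estimates that lm() returns line up with \(b_0\), \(b_1\), \(b_2\), and \(b_3\).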
Writing Simple Expressions
We know that the model predictions for the interaction model can be represented as two straight lines, each with its own y-intercept and slope. But it is not always easy to see the two lines when they are embedded in the complete model.
One way to help us see the two lines in the complete model is to write a separate model statement for patients in each group: Control and Dog. We will call these simplified model statements simple expressions.
Here, again, is the complete interaction model:
\[b_0+b_1\text{Dog}_i+b_2\text{base}_i+b_3\text{Dog}_i*\text{base}_i\]
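To write the simple expressions, we substitute the value of \(\text{Dog}_i\) for each group into the complete model: 0 for the Control group and 1 for the Dog group. Notice that \(\text{Dog}_i\) doesn’t appear in either of the resulting simple expressions. Because we have generated a separate expression for each group, we don’t need to include condition (i.e., \(\text{Dog}_i\)) in the expressions.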
Let’s focus in on the simple expression for the control condition:
\[b_0 + b_2\text{base}_i\]
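As a quick check on where this comes from, setting \(\text{Dog}_i=0\) in the complete model makes both terms that contain \(\text{Dog}_i\) drop out:

\[b_0+b_1(0)+b_2\text{base}_i+b_3(0)*\text{base}_i = b_0 + b_2\text{base}_i\]

What remains is the equation of a line with y-intercept \(b_0\) and slope \(b_2\).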
We can begin to summarize how to interpret the parameter estimates of the interaction model by filling in the first row of the table below.
condition | y-intercept | slope |
---|---|---|
Control | \(b_0\) | \(b_2\) |
Dog | – | – |
It stands to reason that if we look at the simple expression for the dog condition, we should be able to see that it too is an equation of a line (it would just have a different y-intercept and a different slope). Let’s delve into that:
\[b_0 + b_1 + b_2\text{base}_i + b_3\text{base}_i\]
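This expression can be worked out by setting \(\text{Dog}_i=1\) in the complete model, so that each \(\text{Dog}_i\) is replaced by 1:

\[b_0+b_1(1)+b_2\text{base}_i+b_3(1)*\text{base}_i = b_0 + b_1 + b_2\text{base}_i + b_3\text{base}_i\]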
Although these \(b\)s look like they are “variables”, they are actually numbers that don’t vary (we can easily get them by using lm()). That’s why we call them coefficients rather than variables. The only true variable here is \(\text{base}_i\). We can combine the two terms that have \(\text{base}_i\) in them (i.e., \(b_2\text{base}_i+b_3\text{base}_i\)) into this: \((b_2 + b_3)\text{base}_i\)
Here’s how we would re-write the simple expression for the dog condition: \[(b_0+b_1) + (b_2 + b_3)\text{base}_i\]
Now we can fill in the rest of our table:
condition | y-intercept | slope |
---|---|---|
Control | \(b_0\) | \(b_2\) |
Dog | \(b_0 + b_1\) | \(b_2 + b_3\) |
These y-intercepts and slopes fit into the simple expressions like this:
Control group model: \(\underbrace{b_0}_{\text{y-intercept}} + \underbrace{b_2}_{\text{slope}}\text{base}_i\)
Dog group model: \(\underbrace{(b_0 + b_1)}_{\text{y-intercept}} + \underbrace{(b_2+ b_3)}_{\text{slope}}\text{base}_i\)
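The table can also be built directly from the estimates that lm() returns. The sketch below is one way to do it, again using simulated data; the data frame `d`, the object names, and the baseline value of 30 are hypothetical choices for illustration, not part of the original analysis.

```r
# Simulated data for illustration only (NOT the study's data).
set.seed(2)
n <- 40
d <- data.frame(
  Dog  = rep(c(0, 1), each = n / 2),   # 0 = Control, 1 = Dog
  base = runif(n, min = 10, max = 50)
)
d$later <- 5 + 2 * d$Dog + 0.8 * d$base - 0.3 * d$Dog * d$base +
  rnorm(n, sd = 3)

fit <- lm(later ~ Dog * base, data = d)
b <- unname(coef(fit))  # b[1] = b0, b[2] = b1, b[3] = b2, b[4] = b3

# Y-intercept and slope of each group's line, as in the table
group_lines <- data.frame(
  condition   = c("Control", "Dog"),
  y_intercept = c(b[1], b[1] + b[2]),
  slope       = c(b[3], b[3] + b[4])
)
print(group_lines)

# The Dog simple expression should give the same prediction as the
# complete model for a Dog-group patient with a baseline score of 30.
pred_model  <- unname(predict(fit, newdata = data.frame(Dog = 1, base = 30)))
pred_simple <- (b[1] + b[2]) + (b[3] + b[4]) * 30
print(all.equal(pred_model, pred_simple))
```

The final comparison is only a sanity check: the simple expression is an algebraic rearrangement of the complete model, so the two predictions should agree up to rounding error.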
The \(b_0\) parameter estimate is the y-intercept for the Control group, whereas the \(b_1\) estimate is the adjustment, up or down, that must be made to get from the y-intercept of the Control group to that of the Dog group.
The \(b_2\) and \(b_3\) estimates in the interaction model work in exactly the same way, but for slopes. \(b_2\) is the slope of the regression line for the Control group. \(b_3\), the parameter estimate for our new interaction term, is the adjustment in slope, up or down, that should be made to get the slope of the regression line for patients in the Dog group.
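One way to convince yourself of this interpretation is to fit a separate one-predictor regression line to each group and compare it with the interaction model's estimates. The sketch below does this with simulated data; the data frame `d` and the object names are assumptions for illustration only.

```r
# Simulated data for illustration only (NOT the study's data).
set.seed(3)
n <- 40
d <- data.frame(
  Dog  = rep(c(0, 1), each = n / 2),   # 0 = Control, 1 = Dog
  base = runif(n, min = 10, max = 50)
)
d$later <- 5 + 2 * d$Dog + 0.8 * d$base - 0.3 * d$Dog * d$base +
  rnorm(n, sd = 3)

# Interaction model fit to everyone
fit <- lm(later ~ Dog * base, data = d)
b <- unname(coef(fit))  # b[1] = b0, b[2] = b1, b[3] = b2, b[4] = b3

# Separate regression lines, one per group
control_fit <- lm(later ~ base, data = subset(d, Dog == 0))
dog_fit     <- lm(later ~ base, data = subset(d, Dog == 1))

# Control line: y-intercept b0, slope b2
control_match <- all.equal(unname(coef(control_fit)), c(b[1], b[3]))

# Dog line: y-intercept b0 + b1, slope b2 + b3
dog_match <- all.equal(unname(coef(dog_fit)), c(b[1] + b[2], b[3] + b[4]))

print(c(control = control_match, dog = dog_match))
```

Because the interaction model lets both the y-intercept and the slope differ by group, its two lines coincide with the lines from the separate fits, which is why \(b_1\) and \(b_3\) can be read as adjustments from the Control line to the Dog line.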