10.8 Thinking of Factorial Models in Terms of Intercepts and Slopes

In earlier examples of multivariate models (e.g., those with one categorical and one continuous predictor, or those with two continuous predictors), we used y-intercepts and slopes to differentiate between additive and interaction models. For additive models, the y-intercepts were allowed to differ, but the slopes were constrained to be equal. These models looked like parallel lines. For interaction models, both the y-intercepts and slopes were allowed to vary (i.e., the lines were allowed to be non-parallel).

In factorial models, like the one predicting tip_percent with condition and gender (both categorical variables), we don’t typically draw lines that have slopes. (Some people even think it’s wrong to do so!) But doing so can help us, by analogy, deepen our understanding of the distinction between additive and interaction models.
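
To make the comparison concrete, here is a minimal sketch (not part of the exercise later in this section) of how the two factorial models could be fit in R, assuming the tip_exp data frame from the tipping experiment is already loaded (e.g., by the coursekata package):

require(coursekata)

# additive model: condition and gender each shift the predictions,
# but the "lines" for the two genders stay parallel
additive_model <- lm(tip_percent ~ condition + gender, data = tip_exp)

# interaction model: the effect of condition is allowed to differ by gender,
# so the two "lines" can have different slopes
interaction_model <- lm(tip_percent ~ condition * gender, data = tip_exp)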

In the figure below we re-graphed the same data from the tipping experiment, but this time connected the model predictions for each gender (female and male) with a different line. On the left we have represented the model predictions of the additive model, and on the right, the interaction model.

[Figure: two jitter plots of tip_percent by condition (control vs. smiley face), with points colored by gender (female vs. male). In each panel, the model's predicted tip_percent for each condition-by-gender combination appears as a colored point, and dashed lines connect the predictions across the two conditions. Left panel, "Additive Model": the two dashed lines are parallel. Right panel, "Interaction Model": the dashed lines are not parallel; they funnel apart as they go from the control condition to the smiley face condition.]
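
A plot like the one above can be roughly re-created in R. The sketch below is one possible approach (it is not the book's plotting code), using the ggformula functions loaded by coursekata and the tip_exp data frame. It draws the interaction model's predictions; swapping in the additive model (condition + gender) would produce the parallel-lines version.

require(coursekata)

interaction_model <- lm(tip_percent ~ condition * gender, data = tip_exp)

# one predicted tip_percent for each condition-by-gender combination
pred_df <- expand.grid(condition = unique(tip_exp$condition),
                       gender    = unique(tip_exp$gender))
pred_df$predicted <- predict(interaction_model, newdata = pred_df)

# jittered data points, with the model predictions added as larger points
# connected by dashed lines (one line per gender)
gf_jitter(tip_percent ~ condition, data = tip_exp, color = ~ gender, width = .1) %>%
  gf_point(predicted ~ condition, data = pred_df, color = ~ gender, size = 3) %>%
  gf_line(predicted ~ condition, data = pred_df, color = ~ gender,
          group = ~ gender, linetype = "dashed")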

Previously, we established that interaction models can be thought of as models with multiple “lines,” one for each value of a predictor variable, with each line having its own y-intercept and slope. We can use this same idea to think about interaction models in which both predictors are categorical.
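
One way to see this analogy in the model's estimates is sketched below. The exact coefficient names depend on how the levels of condition and gender are labeled in tip_exp; the reading given in the comments assumes R's default treatment (dummy) coding.

b <- coef(lm(tip_percent ~ condition * gender, data = tip_exp))
b
# b[1]: y-intercept of the "line" for the reference gender group
#       (its predicted tip_percent in the control condition)
# b[2]: the "slope" of that line (the change from control to smiley face)
# b[3]: how much the other gender group's y-intercept differs from b[1]
# b[4]: how much the other group's "slope" differs from b[2] -- the interaction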

Using the ANOVA Table to Compare the Interaction Model to the Additive Model

Finally, let’s look at the ANOVA table for the interaction model to get a quantitative comparison of the interaction model to the additive model. In the code window below, generate the ANOVA table for the interaction model. Also include the argument verbose = FALSE to generate a slightly less wordy ANOVA table.

require(coursekata)

# no models have been created for you
# generate the ANOVA table for the interaction model (remember to use verbose = FALSE)
supernova(lm(tip_percent ~ condition * gender, data = tip_exp), verbose = FALSE)

# alternatively: supernova(lm(tip_percent ~ gender * condition, data = tip_exp), verbose = FALSE)

ex() %>% check_or(
  check_function(., "supernova") %>% check_result() %>% check_equal(),
  override_solution(., "supernova(lm(tip_percent ~ gender * condition, data = tip_exp), verbose = FALSE)") %>%
    check_function("supernova") %>% check_result() %>% check_equal()
)
CK Code: D4_Code_Factorial_01
Analysis of Variance Table (Type III SS)
 Model: tip_percent ~ condition * gender

                           SS df       MS      F    PRE     p
 ---------------- | --------- -- -------- ------ ------ -----
 Model            |  3194.544  3 1064.848 11.394 0.2868 .0000
 condition        |   421.742  1  421.742  4.513 0.0504 .0365
 gender           |   291.965  1  291.965  3.124 0.0355 .0807
 condition:gender |   660.006  1  660.006  7.062 0.0767 .0094
 Error            |  7943.854 85   93.457                    
 ---------------- | --------- -- -------- ------ ------ -----
 Total            | 11138.398 88  126.573    


We can see from the ANOVA table that the interaction term accounts for about .08 (PRE = .0767) of the error that remains after fitting the additive model. This is the additional explanatory power we get in return for spending one additional degree of freedom.
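
If you want to verify that PRE by hand, it can be computed from the sums of squares in the table above (values copied from the supernova output): the interaction's SS divided by the error left over once the other terms are in the model.

# SS for condition:gender divided by (that SS plus the error SS)
660.006 / (660.006 + 7943.854)   # approximately 0.0767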

Moreover, the p-value for the condition:gender effect is quite low: .009. This means that if the additive model were true in the DGP, the probability of getting a PRE for the interaction term as high as the one we observed (about .08) would be less than 1 percent. We will probably want to adopt the interaction model.
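
The same test can also be run as a direct model comparison in base R. Because the additive and interaction models differ by only this one term, anova() reproduces the F and p-value from the condition:gender row above. A sketch, again assuming tip_exp is loaded:

additive_model    <- lm(tip_percent ~ condition + gender, data = tip_exp)
interaction_model <- lm(tip_percent ~ condition * gender, data = tip_exp)

# F test of the extra degree of freedom spent on the interaction;
# F and p should match the condition:gender row (F = 7.06, p = .009)
anova(additive_model, interaction_model)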
