6.12 Confidence Intervals for Pairwise Comparisons
In a previous chapter we discussed testing the pairwise comparisons in a three-group model. We looked at some data comparing students’ outcomes on a math test after playing three different educational games. We first used an F test to compare the three-group model with the empty model, and decided to reject the empty model (that the outcomes from all three games could be modeled with the same average score).
Knowing that at least some of the three games differed statistically from each other, but not knowing which ones, we conducted pairwise comparisons, testing the three possible pairings of the three games, A, B, and C.
Here is the code we used to conduct the pairwise comparisons for the game_model:
pairwise(game_model)
And here is the output:
Model: outcome ~ game
game
Levels: 3
Family-wise error-rate: 0.05
  group_1 group_2  diff pooled_se     q  df  lower upper p_adj
1 B       A       2.086     0.516 4.041 102  0.350 3.822 .0142
2 C       A       3.629     0.516 7.031 102  1.893 5.364 .0000
3 C       B       1.543     0.516 2.990 102 -0.193 3.279 .0920
Note that the p-values and the confidence intervals are adjusted (hence reported as p_adj) based on Tukey’s Honestly Significant Difference test to maintain an overall (or family-wise) Type I error rate of 0.05.
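The Tukey adjustment can be checked by hand with base R. The sketch below is a rough reconstruction, not the internals of pairwise(): it assumes, based on the printed numbers, that q is diff divided by pooled_se, and that the interval is diff plus or minus a critical value of the studentized range distribution (from qtukey()) times pooled_se. The inputs are typed in from the C versus B row of the output, and the variable names are chosen here for illustration.

```r
# Values copied from the C vs. B row of the pairwise() output
diff      <- 1.543
pooled_se <- 0.516
df_error  <- 102
n_groups  <- 3

# Assumption: q is the sample difference in units of pooled_se
q <- diff / pooled_se

# Critical value of the studentized range for a family-wise rate of .05
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_error)

# Adjusted interval and adjusted p-value under that assumption
margin <- q_crit * pooled_se
lower  <- diff - margin
upper  <- diff + margin
p_adj  <- ptukey(q, nmeans = n_groups, df = df_error, lower.tail = FALSE)

round(c(q = q, lower = lower, upper = upper, p_adj = p_adj), 3)
```

Because the inputs are rounded to three decimals, the results should land close to, but not necessarily exactly on, the printed lower, upper, and p_adj values.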
The mean difference between B and C in the sample is 1.54. But the p-value of .09 tells us that the observed difference is within the range of differences we would consider likely if the true difference between the games were 0. For this reason, we did not reject the empty model for this pairwise difference.
Because we have learned that model comparison (using the p-value) and confidence intervals are related, we would expect this finding to be mirrored in the 95% confidence interval. Specifically, because we did not reject the empty model based on the p-value, we should expect that the confidence interval would include 0, meaning that a \(\beta_1\) of 0 is one of a range of models we would consider likely to have generated the sample \(b_1\).
As shown below, the confidence interval of the difference between games C and B is centered at the sample difference (1.54) but extends from -0.19 to 3.28. As expected based on the p-value (greater than .05), this interval includes 0.
  group_1 group_2  diff pooled_se     q  df  lower upper p_adj
1 B       A       2.086     0.516 4.041 102  0.350 3.822 .0142
2 C       A       3.629     0.516 7.031 102  1.893 5.364 .0000
3 C       B       1.543     0.516 2.990 102 -0.193 3.279 .0920
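The link between the interval and the p-value can be checked row by row. This small sketch retypes the table into a data frame (the name pw and the two added columns are made up for illustration) and flags which intervals include 0 and which adjusted p-values exceed .05.

```r
# Table values retyped from the pairwise() output
pw <- data.frame(
  group_1 = c("B", "C", "C"),
  group_2 = c("A", "A", "B"),
  diff    = c(2.086, 3.629, 1.543),
  lower   = c(0.350, 1.893, -0.193),
  upper   = c(3.822, 5.364, 3.279),
  p_adj   = c(.0142, .0000, .0920)
)

# Does the interval include 0? Is the adjusted p-value above .05?
pw$includes_zero <- pw$lower <= 0 & pw$upper >= 0
pw$p_above_05    <- pw$p_adj > .05

pw
```

In every row the two flags agree: only the C and B comparison has an interval that includes 0, and it is also the only one with p_adj above .05.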
Try Adding plot = TRUE to the pairwise() Function
The pairwise() function has an option to help us visualize the pairwise confidence intervals in relation to each other. Just add the argument plot = TRUE to the function, like this:
pairwise(game_model, plot = TRUE)
Try it in the code window below.
library(coursekata)
# build game_data: 35 students for each of the three games (A, B, C)
students_per_game <- 35
game_data <- data.frame(
  outcome = c(16,8,9,9,7,14,5,7,11,15,11,9,13,14,11,11,12,14,11,6,13,13,9,12,8,6,15,10,10,8,7,1,16,18,8,11,13,9,8,14,11,9,13,10,18,12,12,13,16,16,13,13,9,14,16,12,16,11,10,16,14,13,14,15,12,14,8,12,10,13,17,20,14,13,15,17,14,15,14,12,13,12,17,12,12,9,11,19,10,15,14,10,10,21,13,13,13,13,17,14,14,14,16,12,19),
  game = c(rep("A", students_per_game), rep("B", students_per_game), rep("C", students_per_game))
)
# we have fit and saved game_model for you
game_model <- lm(outcome ~ game, data = game_data)
# add plot = TRUE
pairwise(game_model)
Notice that one of the 95% confidence intervals, the one for C - B, crosses the dotted line, which represents a pairwise difference of 0. The other two confidence intervals (C - A and B - A) do not include 0. This means that a mean difference of 0 in the DGP is not among the values we would consider likely for these two pairs. We would conclude that game A is indeed different from both games B and C in the DGP.
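For a cross-check that does not depend on coursekata, base R's TukeyHSD() computes the same kind of Tukey-adjusted pairwise intervals from an aov() fit. This is a sketch of an alternative route, not a description of how pairwise() is implemented. It rebuilds game_data from the code window, and the object names game_aov and tukey are chosen here for illustration.

```r
# Rebuild game_data with the same values as the code window
students_per_game <- 35
game_data <- data.frame(
  outcome = c(16,8,9,9,7,14,5,7,11,15,11,9,13,14,11,11,12,14,11,6,13,13,9,12,8,6,15,10,10,8,7,1,16,18,8,11,13,9,8,14,11,9,13,10,18,12,12,13,16,16,13,13,9,14,16,12,16,11,10,16,14,13,14,15,12,14,8,12,10,13,17,20,14,13,15,17,14,15,14,12,13,12,17,12,12,9,11,19,10,15,14,10,10,21,13,13,13,13,17,14,14,14,16,12,19),
  game = factor(rep(c("A", "B", "C"), each = students_per_game))
)

# Fit the three-group model with aov() and request Tukey intervals
game_aov <- aov(outcome ~ game, data = game_data)
tukey <- TukeyHSD(game_aov, conf.level = 0.95)

# Rows are named "B-A", "C-A", "C-B"; columns are diff, lwr, upr, p adj
round(tukey$game, 3)
```

The diff, lwr, and upr columns should match the diff, lower, and upper columns from pairwise() up to rounding. Calling plot(tukey) draws a comparable interval plot with a dashed reference line at 0.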