Course Outline
-
segmentGetting Started (Don't Skip This Part)
-
segmentStatistics and Data Science: A Modeling Approach
-
segmentPART I: EXPLORING VARIATION
-
segmentChapter 1 - Welcome to Statistics: A Modeling Approach
-
segmentChapter 2 - Understanding Data
-
segmentChapter 3 - Examining Distributions
-
segmentChapter 4 - Explaining Variation
-
segmentPART II: MODELING VARIATION
-
segmentChapter 5 - A Simple Model
-
segmentChapter 6 - Quantifying Error
-
segmentChapter 7 - Adding an Explanatory Variable to the Model
-
segmentChapter 8 - Digging Deeper into Group Models
-
segmentChapter 9 - Models with a Quantitative Explanatory Variable
-
segmentPART III: EVALUATING MODELS
-
segmentChapter 10 - The Logic of Inference
-
segmentChapter 11 - Model Comparison with F
-
segmentChapter 12 - Parameter Estimation and Confidence Intervals
-
12.12 Confidence Intervals for Pairwise Comparisons
-
segmentChapter 13 - What You Have Learned
-
segmentFinishing Up (Don't Skip This Part!)
-
segmentResources
list High School / Advanced Statistics and Data Science I (ABC)
12.12 Confidence Intervals for Pairwise Comparisons
In a previous chapter we discussed testing the pairwise comparisons in a three-group model. We looked at some data comparing students’ outcomes on a math test after playing three different educational games. We first used an F test to compare the three-group model with the empty model, and decided to reject the empty model (that the outcomes from all three games could be modeled with the same average score).
Knowing that at least some of the three games differed statistically from each other, but not knowing which ones, we conducted pairwise comparisons, testing the three possible pairings of the three games, A, B, and C.
Here is the code we used to conduct the pairwise comparisons for the game_model
:
pairwise(game_model)
And here is the output, on which we added some yellow highlighting:
Model: outcome ~ game
game
Levels: 3
Family-wise error-rate: 0.05
group_1 group_2 diff pooled_se q df lower upper p_adj
1 B A 2.086 0.516 4.041 102 0.350 3.822 .0142
2 C A 3.629 0.516 7.031 102 1.893 5.364 .0000
3 C B 1.543 0.516 2.990 102 -0.193 3.279 .0920
Note that the p-values and the confidence intervals are adjusted (hence reported as p_adj
) based on Tukey’s Honestly Significant Difference test to maintain an overall (or family-wise) Type I error rate of 0.05.
The mean difference between B and C in the sample is 1.54. But the p-value of .09 tells us that the observed difference is within the range of differences we would consider likely if the true difference between the games were 0. For this reason, we did not reject the empty model for this pairwise difference.
Because we have learned that model comparison (using the p-value) and confidence intervals are related, we would expect this finding to be mirrored in the 95% confidence interval. Specifically, because we did not reject the empty model based on the p-value, we should expect that the confidence interval would include 0, meaning that a \(\beta_1\) of 0 is one of a range of models we would consider likely to have generated the sample \(b_1\).
As shown below, the confidence interval of the difference between games C and B is centered at the sample difference (1.54) but extends from -0.19 to 3.28. As expected based on the p-value (greater than .05), this interval includes 0.
group_1 group_2 diff pooled_se q df lower upper p_adj
1 B A 2.086 0.516 4.041 102 0.350 3.822 .0142
2 C A 3.629 0.516 7.031 102 1.893 5.364 .0000
3 C B 1.543 0.516 2.990 102 -0.193 3.279 .0920
Try Adding plot=TRUE
to the pairwise()
Function
The pairwise()
function has an option to help us visualize the pairwise confidence intervals in relation to each other. Just add the argument plot = TRUE
to the function, like this:
pairwise(game_model, plot = TRUE)
Try it in the code window below.
require(coursekata)
# import game_data
students_per_game <- 35
game_data <- data.frame(
outcome = c(16,8,9,9,7,14,5,7,11,15,11,9,13,14,11,11,12,14,11,6,13,13,9,12,8,6,15,10,10,8,7,1,16,18,8,11,13,9,8,14,11,9,13,10,18,12,12,13,16,16,13,13,9,14,16,12,16,11,10,16,14,13,14,15,12,14,8,12,10,13,17,20,14,13,15,17,14,15,14,12,13,12,17,12,12,9,11,19,10,15,14,10,10,21,13,13,13,13,17,14,14,14,16,12,19),
game = c(rep("A", students_per_game), rep("B", students_per_game), rep("C", students_per_game))
)
# we have fit and saved game_model for you
game_model <- lm(outcome ~ game, data = game_data)
# add plot = TRUE
pairwise(game_model)
# we have fit and saved game_model for you
game_model <- lm(outcome ~ game, data = game_data)
# add plot = TRUE
pairwise(game_model, plot = TRUE)
ex() %>%
check_function(., "pairwise") %>%
check_arg("plot") %>%
check_equal()
Notice that one of the 95% confidence intervals crosses the dotted line, which represents a pairwise difference of 0: C and B. But the other two confidence intervals (C - A and B - A) do not include 0. This means that we are not confident that the mean difference in the DGP for these pairs could be 0. We would conclude that game A is indeed different from both games B and C in the DGP.