High School / Advanced Statistics and Data Science I (ABC)
12.12 Confidence Intervals for Pairwise Comparisons
In a previous chapter we discussed testing the pairwise comparisons in a three-group model. We looked at some data comparing students’ outcomes on a math test after playing three different educational games. We first used an F test to compare the three-group model with the empty model, and decided to reject the empty model (that the outcomes from all three games could be modeled with the same average score).
Knowing that at least some of the three games differed statistically from each other, but not knowing which ones, we conducted pairwise comparisons, testing the three possible pairings of the three games, A, B, and C.
Here is the code we used to conduct the pairwise comparisons for game_model:
pairwise(game_model)
And here is the output, on which we added some yellow highlighting:
Model: outcome ~ game

game
Levels: 3
Family-wise error-rate: 0.05

  group_1 group_2  diff pooled_se     q  df  lower upper p_adj
1 B       A       2.086     0.516 4.041 102  0.350 3.822 .0142
2 C       A       3.629     0.516 7.031 102  1.893 5.364 .0000
3 C       B       1.543     0.516 2.990 102 -0.193 3.279 .0920
Note that the p-values (reported as p_adj) and the confidence intervals are adjusted based on Tukey’s Honestly Significant Difference test to maintain an overall (or family-wise) Type I error rate of 0.05.
The mean difference between B and C in the sample is 1.54. But the p-value of .09 tells us that the observed difference is within the range of differences we would consider likely if the true difference between the games were 0. For this reason, we did not reject the empty model for this pairwise difference.
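One way to see where an adjusted p-value such as .09 comes from is to look up the q statistic in the studentized range distribution, the reference distribution for Tukey's HSD. The sketch below uses base R's ptukey() with the q and df values copied from the output above. That pairwise() computes p_adj in exactly this way is an assumption, so expect close agreement with the printed values rather than an exact match.

```r
# Values copied from the pairwise() output above
q_stat   <- c("B - A" = 4.041, "C - A" = 7.031, "C - B" = 2.990)
n_groups <- 3    # "Levels: 3"
df_error <- 102  # df column

# Upper-tail probability of the studentized range distribution.
# Assumption: this mirrors how p_adj is obtained under a Tukey correction.
p_adj <- ptukey(q_stat, nmeans = n_groups, df = df_error, lower.tail = FALSE)
names(p_adj) <- names(q_stat)
round(p_adj, 4)

# Which pairs fall below the family-wise alpha of 0.05?
p_adj < 0.05
```

Only the C - B comparison should come out above 0.05, which matches the decision not to reject the empty model for that pair.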
Because we have learned that model comparison (using the p-value) and confidence intervals are related, we would expect this finding to be mirrored in the 95% confidence interval. Specifically, because we did not reject the empty model based on the p-value, we should expect that the confidence interval would include 0, meaning that a \(\beta_1\) of 0 is among the range of values we would consider likely to have generated the sample \(b_1\).
As shown below, the confidence interval of the difference between games C and B is centered at the sample difference (1.54) but extends from -0.19 to 3.28. As expected based on the p-value (greater than .05), this interval includes 0.
  group_1 group_2  diff pooled_se     q  df  lower upper p_adj
1 B       A       2.086     0.516 4.041 102  0.350 3.822 .0142
2 C       A       3.629     0.516 7.031 102  1.893 5.364 .0000
3 C       B       1.543     0.516 2.990 102 -0.193 3.279 .0920
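The lower and upper bounds in the table can be approximately rebuilt by hand: each interval is the sample difference plus or minus a margin of error. The sketch below assumes the margin is the critical value of the studentized range distribution (from base R's qtukey()) multiplied by pooled_se. That is consistent with the printed numbers, but it is an inference from the output rather than a documented description of pairwise().

```r
# Values copied from the pairwise() output above
mean_diff <- c("B - A" = 2.086, "C - A" = 3.629, "C - B" = 1.543)
pooled_se <- 0.516
n_groups  <- 3
df_error  <- 102

# Critical q for a family-wise error rate of 0.05
# (assumed to match the method behind the table)
q_crit <- qtukey(0.95, nmeans = n_groups, df = df_error)

# pooled_se is identical for every pair in the table, so one margin serves all three
margin <- q_crit * pooled_se

ci <- data.frame(
  diff  = mean_diff,
  lower = mean_diff - margin,
  upper = mean_diff + margin
)

# An interval includes 0 when its bounds fall on opposite sides of 0
ci$includes_zero <- ci$lower <= 0 & ci$upper >= 0
ci
```

Because the table values are rounded to three decimals, the rebuilt bounds may differ from the printed ones in the last digit.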
Try Adding plot = TRUE to the pairwise() Function
The pairwise() function has an option to help us visualize the pairwise confidence intervals in relation to each other. Just add the argument plot = TRUE to the function, like this:
pairwise(game_model, plot = TRUE)
Try it in the code window below.
library(coursekata)

# set up game_data: 35 students played each game
students_per_game <- 35
game_data <- data.frame(
  outcome = c(
    16,8,9,9,7,14,5,7,11,15,11,9,13,14,11,11,12,14,11,6,13,13,9,12,8,6,15,10,10,8,7,1,16,18,8,
    11,13,9,8,14,11,9,13,10,18,12,12,13,16,16,13,13,9,14,16,12,16,11,10,16,14,13,14,15,12,14,8,12,10,13,
    17,20,14,13,15,17,14,15,14,12,13,12,17,12,12,9,11,19,10,15,14,10,10,21,13,13,13,13,17,14,14,14,16,12,19
  ),
  game = c(rep("A", students_per_game), rep("B", students_per_game), rep("C", students_per_game))
)

# we have fit and saved game_model for you
game_model <- lm(outcome ~ game, data = game_data)

# pairwise comparisons with plot = TRUE added
pairwise(game_model, plot = TRUE)
Notice that one of the 95% confidence intervals, the one for C - B, crosses the dotted line, which represents a pairwise difference of 0. The other two confidence intervals (C - A and B - A) do not include 0. This means that a mean difference of 0 in the DGP is not among the values we would consider likely for these pairs. We would conclude that game A is indeed different from both games B and C in the DGP.
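As a cross-check that does not depend on the coursekata package, the same comparisons can be run with base R's TukeyHSD() on an aov() fit. This is a sketch rather than the method the chapter uses. It reuses the data from the code window, and the names tukey and intervals are introduced here only for illustration.

```r
# Same data as in the code window above
students_per_game <- 35
game_data <- data.frame(
  outcome = c(
    16,8,9,9,7,14,5,7,11,15,11,9,13,14,11,11,12,14,11,6,13,13,9,12,8,6,15,10,10,8,7,1,16,18,8,
    11,13,9,8,14,11,9,13,10,18,12,12,13,16,16,13,13,9,14,16,12,16,11,10,16,14,13,14,15,12,14,8,12,10,13,
    17,20,14,13,15,17,14,15,14,12,13,12,17,12,12,9,11,19,10,15,14,10,10,21,13,13,13,13,17,14,14,14,16,12,19
  ),
  # stored as a factor here because TukeyHSD() compares the levels of factors
  game = factor(rep(c("A", "B", "C"), each = students_per_game))
)

# Tukey's HSD on the three-group model, with 95% family-wise confidence
tukey <- TukeyHSD(aov(outcome ~ game, data = game_data), conf.level = 0.95)
intervals <- as.data.frame(tukey$game)

# Flag the pairs whose interval (lwr to upr) contains 0
intervals$includes_zero <- intervals$lwr <= 0 & intervals$upr >= 0
intervals
```

Only the C-B row should be flagged, in line with the plot. Calling plot(tukey) draws a comparable interval plot in base R graphics.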