4.7 Contingency Tables
Bar graphs are one way to visualize the hypothesis WtLost = Condition + Other Stuff. Another way to explore the hypothesis is with a contingency table, which shows the distribution of cases across two categorical variables.
You already know the R function we use to make tables, tally(). Here we will extend its use to look at an outcome variable by an explanatory variable.
tally(WtLost ~ Condition, data = MindsetMatters)
Try using the tally() function in the code block below to generate the contingency table for our MindsetMatters hypothesis.
require(coursekata)

# Label each housekeeper "lost" if Wt2 is less than Wt, otherwise "not lost"
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Make a contingency table
tally(WtLost ~ Condition, data = MindsetMatters)
          Condition
WtLost     Informed Uninformed
  lost           28         20
  not lost       13         14
Each value in the table represents the frequency of a particular combination of levels in the data set (“lost” and “Informed”; “lost” and “Uninformed”; “not lost” and “Informed”; “not lost” and “Uninformed”).
If you want proportions instead of counts (more appropriate in this case because the two conditions have unequal sample sizes), you can add the argument format = "proportion":
tally(WtLost ~ Condition, data = MindsetMatters, format = "proportion")
          Condition
WtLost      Informed Uninformed
  lost     0.6829268  0.5882353
  not lost 0.3170732  0.4117647
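To see where these proportions come from, here is a minimal sketch in base R that rebuilds them from the four counts in the contingency table. It does not read the MindsetMatters data set, and the object names (counts, col_totals, col_props) are made up for illustration; they are not part of the course code.

```r
# Counts copied by hand from the contingency table (an assumption of this sketch)
counts <- matrix(
  c(28, 13, 20, 14),
  nrow = 2,
  dimnames = list(
    WtLost = c("lost", "not lost"),
    Condition = c("Informed", "Uninformed")
  )
)

# Number of housekeepers in each condition (the column totals)
col_totals <- colSums(counts)
col_totals

# Divide each column by its own total to get proportions within each condition
col_props <- sweep(counts, 2, col_totals, "/")
col_props
```

The two conditions have 41 and 34 housekeepers, so 28 of 41 (about 0.68) in the Informed condition lost weight, compared with 20 of 34 (about 0.59) in the Uninformed condition.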
In tables created by tally(), the proportions are normalized by column, meaning that the proportions in each column add up to 1. If the row proportions added up to 1, we would say they are normalized by row.
It is more informative to normalize by columns (that is, where levels of WtLost add up to 1 within each Condition) because our main interest is in comparing the proportion of housekeepers who lost weight between the two conditions. If the table were normalized by rows, we would not see the proportion of housekeepers in each condition who lost weight, but rather, among those who lost weight, the proportion who were in each condition.
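The difference between the two kinds of normalization can be sketched in base R with prop.table(), starting from the counts in the contingency table rather than from the data set. This is an illustration under that assumption, not a description of how tally() works internally; the names counts, by_column, and by_row are hypothetical.

```r
# Counts copied by hand from the contingency table (an assumption of this sketch)
counts <- matrix(
  c(28, 13, 20, 14),
  nrow = 2,
  dimnames = list(
    WtLost = c("lost", "not lost"),
    Condition = c("Informed", "Uninformed")
  )
)

# margin = 2 normalizes by column: proportions within each Condition sum to 1
by_column <- prop.table(counts, margin = 2)
by_column

# margin = 1 normalizes by row: proportions within each level of WtLost sum to 1
by_row <- prop.table(counts, margin = 1)
by_row

# Check which direction adds up to 1 in each table
colSums(by_column)
rowSums(by_row)
```

In by_row, the "lost" row says that of the 48 housekeepers who lost weight, 28 (about 0.58) were in the Informed condition. That describes who the weight losers were, not how likely a housekeeper in each condition was to lose weight.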
Recap of Visualizations
So far we have considered both quantitative (e.g., Thumb) and categorical (e.g., WtLost) outcomes. We have also looked at some categorical explanatory variables (e.g., Sex and Condition) and quantitative explanatory variables (e.g., Height).
We haven’t yet looked at any situations where there is a categorical outcome and a quantitative explanatory variable. But there isn’t any reason to think that we couldn’t! Perhaps a quantitative variable like age or initial weight might help us predict whether a housekeeper will lose weight or not.
Let’s review when each type of visualization is appropriate to use.
Variable | Visualization Type | R Code |
---|---|---|
Categorical | Frequency Table, Bar Graph | tally |
Quantitative | Histogram, Box Plot | gf_histogram |

Outcome Variable | Explanatory Variable | Visualization Type | R Code |
---|---|---|---|
Categorical | Categorical | Frequency Table, Faceted Bar Graph | tally |
Quantitative | Categorical | Faceted Histogram, Box Plot, Jitter Plot, Scatter Plot | gf_histogram %>% |
Categorical | Quantitative | | |
Quantitative | Quantitative | Jitter Plot, Scatter Plot | gf_jitter |
You have also learned a lot of R functions that you can use to create these visualizations of distributions of data. Even though we are only about halfway through Chapter 4, you have learned most of the code we will use in the entire course!