4.7 Contingency Tables
Bar graphs are one way to visualize the hypothesis WtLost = Condition + Other Stuff. Another way to explore the hypothesis is with a contingency table, which shows the distribution of cases across two categorical variables.
You already know the R function we use to make tables: tally(). Here we will extend its use to look at an outcome variable broken down by an explanatory variable.
tally(WtLost ~ Condition, data = MindsetMatters)
Try using the tally() function in the code block below to generate the contingency table for our MindsetMatters hypothesis.
require(coursekata)

# WtLost is "lost" when Wt2 (weight after the study) is less than Wt (weight before)
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Make a contingency table of WtLost by Condition
tally(WtLost ~ Condition, data = MindsetMatters)
          Condition
WtLost     Informed Uninformed
  lost           28         20
  not lost       13         14
Each value in the table is the frequency (count) of one particular combination of levels in the data set: “lost” and “Informed”, “lost” and “Uninformed”, “not lost” and “Informed”, or “not lost” and “Uninformed”.
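The column totals of this table are the number of housekeepers in each condition. As a minimal sketch (not part of the course code), the counts can be rebuilt in base R with as.table() and totaled with addmargins(); the object name counts is hypothetical, and the numbers are copied from the tally() output above.

```r
# Counts copied from the tally() output; `counts` is a hypothetical name.
# matrix() fills column by column: Informed (28, 13), then Uninformed (20, 14).
counts <- as.table(matrix(
  c(28, 13, 20, 14),
  nrow = 2,
  dimnames = list(
    WtLost = c("lost", "not lost"),
    Condition = c("Informed", "Uninformed")
  )
))

# Column totals: the number of housekeepers in each condition
colSums(counts)

# addmargins() appends a "Sum" row and a "Sum" column to the table
addmargins(counts)
```

The two columns sum to 41 (Informed) and 34 (Uninformed), so the raw counts from the two conditions are not directly comparable.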
If you want proportions instead of counts (more appropriate in this case because the two conditions have unequal sample sizes), you can add the argument format = "proportion":
tally(WtLost ~ Condition, data = MindsetMatters, format = "proportion")
          Condition
WtLost      Informed Uninformed
  lost     0.6829268  0.5882353
  not lost 0.3170732  0.4117647
In tables created by tally(), the proportions are normalized by column, meaning that the proportions in each column add up to 1. If the row proportions added up to 1 instead, we would say they are normalized by row.
It is more informative to normalize by column (so that the proportions for the levels of WtLost add up to 1 within each Condition) because our main interest is in comparing the proportion of housekeepers who lost weight between the two conditions. If the table were normalized by row, we would not see the proportion of housekeepers who lost weight; instead, we would see, among those who lost weight (or did not), the proportion who were in each condition.
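The difference between normalizing by column and by row can be checked directly. The sketch below swaps in base R's prop.table() as a stand-in for tally() (an assumption, not the course code) and uses the counts from the contingency table; the names counts, by_col, and by_row are hypothetical. With margin = 2 it should reproduce the column proportions shown by tally(); margin = 1 gives the row-normalized version.

```r
# Counts copied from the tally() output; `counts` is a hypothetical name.
counts <- as.table(matrix(
  c(28, 13, 20, 14),
  nrow = 2,
  dimnames = list(
    WtLost = c("lost", "not lost"),
    Condition = c("Informed", "Uninformed")
  )
))

# margin = 2: divide each count by its column total (normalized by column).
# Each column sums to 1; these match the tally() proportions.
by_col <- prop.table(counts, margin = 2)
by_col

# margin = 1: divide each count by its row total (normalized by row).
# Each row sums to 1: e.g., of the 48 who lost weight, 28/48 were Informed.
by_row <- prop.table(counts, margin = 1)
by_row

# The comparison of interest: proportion who lost weight, Informed minus Uninformed
by_col["lost", "Informed"] - by_col["lost", "Uninformed"]  # about 0.095
```

With margin = 1, the "lost" row shows about 0.583 Informed and 0.417 Uninformed, which describes who the weight losers were rather than how likely each condition was to lose weight.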
Recap of Visualizations
So far we have considered both quantitative (e.g., Thumb) and categorical (e.g., WtLost) outcomes. We have also looked at some categorical explanatory variables (e.g., Sex and Condition) and quantitative explanatory variables (e.g., Height).
We haven’t yet looked at any situations where there is a categorical outcome and a quantitative explanatory variable. But there isn’t any reason to think that we couldn’t! Perhaps a quantitative variable like age or initial weight might help us predict whether a housekeeper will lose weight or not.
Let’s review when each type of visualization is appropriate to use.
Variable | Visualization Type | R Code
Categorical | Frequency Table, Bar Graph | tally
Quantitative | Histogram, Box Plot | gf_histogram

Outcome Variable | Explanatory Variable | Visualization Type | R Code
Categorical | Categorical | Frequency Table, Faceted Bar Graph | tally
Quantitative | Categorical | Faceted Histogram, Box Plot, Jitter Plot, Scatter Plot | gf_histogram %>%
Categorical | Quantitative | (not yet covered) | (not yet covered)
Quantitative | Quantitative | Jitter Plot, Scatter Plot | gf_jitter

You have also learned many R functions that you can use to create these visualizations of distributions of data. Even though we are only about halfway through Chapter 4, you have already learned most of the code we will use in the entire course!