CourseKata - 4.7 Contingency Tables

High School / Advanced Statistics and Data Science I (ABC)

4.7 Contingency Tables

Bar graphs are one way to visualize the hypothesis WtLost = Condition + Other Stuff. Another way to explore the hypothesis is with a contingency table, which shows the distribution of cases across two categorical variables.

You already know the R function we use to make tables, tally(). Here we will extend its use to look at an outcome variable by an explanatory variable.

tally(WtLost ~ Condition, data = MindsetMatters)

Try using the tally() function in the code block below to generate the contingency table for our MindsetMatters hypothesis.

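# Load the CourseKata packages and data sets
library(coursekata)

# Label each housekeeper "lost" if the second weight measurement (Wt2)
# is lower than the first (Wt), and "not lost" otherwise
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))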

# Make a contingency table
tally(WtLost ~ Condition, data = MindsetMatters)

          Condition
WtLost     Informed Uninformed
  lost           28         20
  not lost       13         14

Each value in the table represents the frequency of a particular combination of levels (“lost” and “Informed”; “lost” and “Uninformed”; “not lost” and “Informed”; “not lost” and “Uninformed”) in the data set.
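To see how the four cells relate to the sample as a whole, here is a minimal base R sketch. It assumes nothing beyond the counts printed in the table above, which are typed in by hand rather than recomputed from MindsetMatters; the object name `counts` is just an illustrative choice.

```r
# Counts copied by hand from the tally() output above
# (an assumption of this sketch: they are not recomputed from the raw data)
counts <- matrix(
  c(28, 13, 20, 14),  # filled column by column: Informed first, then Uninformed
  nrow = 2,
  dimnames = list(
    WtLost = c("lost", "not lost"),
    Condition = c("Informed", "Uninformed")
  )
)

# Show the table with row and column totals added
addmargins(counts)

# Number of housekeepers in each condition
colSums(counts)

# Total number of housekeepers counted in the table
sum(counts)
```

The column totals show that the two conditions do not contain the same number of housekeepers, which matters when comparing the raw counts across conditions.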

If you want proportions instead of counts (more appropriate here because the two conditions have unequal sample sizes: 41 Informed and 34 Uninformed housekeepers), you can add the argument format = "proportion":

tally(WtLost ~ Condition, data = MindsetMatters, format = "proportion")

          Condition
WtLost      Informed Uninformed
  lost     0.6829268  0.5882353
  not lost 0.3170732  0.4117647

In two-way tables created by tally() with format = "proportion", the proportions are normalized by column, meaning that the proportions in each column add up to 1. If instead the proportions in each row added up to 1, we would say they are normalized by row.

It is more informative to normalize by columns (that is, so that the levels of WtLost add up to 1 within each Condition) because our main interest is in comparing the proportion of housekeepers who lost weight between the two conditions. If the table were normalized by rows, it would instead show, among the housekeepers who lost weight, the proportion who were in each condition.
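The difference between the two normalizations can be sketched in base R. This is only an illustration, not the way tally() computes its result: it assumes the counts from the table earlier on this page (typed in by hand) and uses the base function prop.table(), where margin = 2 divides by column totals and margin = 1 divides by row totals. The names `counts`, `by_column`, and `by_row` are illustrative.

```r
# Counts copied by hand from the contingency table earlier on this page
counts <- matrix(
  c(28, 13, 20, 14),  # filled column by column: Informed first, then Uninformed
  nrow = 2,
  dimnames = list(
    WtLost = c("lost", "not lost"),
    Condition = c("Informed", "Uninformed")
  )
)

# Normalized by column: each cell divided by its Condition total
by_column <- prop.table(counts, margin = 2)

# Normalized by row: each cell divided by its WtLost total
by_row <- prop.table(counts, margin = 1)

by_column
by_row

# Check which direction adds up to 1 in each table
colSums(by_column)
rowSums(by_row)
```

In `by_column`, the "lost" and "Informed" cell is 28/41, the proportion of Informed housekeepers who lost weight. In `by_row`, the same cell is 28/48, the proportion of housekeepers who lost weight who were in the Informed condition.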

Recap of Visualizations

So far we have considered both quantitative (e.g., Thumb) and categorical (e.g., WtLost) outcomes. We have also looked at some categorical explanatory variables (e.g., Sex and Condition) and quantitative explanatory variables (e.g., Height).

We haven’t yet looked at any situations where there is a categorical outcome and a quantitative explanatory variable. But there isn’t any reason to think that we couldn’t! Perhaps a quantitative variable like age or initial weight might help us predict whether a housekeeper will lose weight or not.

Let’s review when each type of visualization is appropriate to use.

**Visualizations with One Variable**
Variable	Visualization Type	R Code
Categorical	Frequency Table, Bar Graph	`tally`, `gf_bar`
Quantitative	Histogram, Box Plot	`gf_histogram`, `gf_boxplot`

**Visualizations with Two Variables**
Outcome Variable	Explanatory Variable	Visualization Type	R Code
Categorical	Categorical	Frequency Table, Faceted Bar Graph	`tally`, `gf_bar %>% gf_facet_grid`
Quantitative	Categorical	Faceted Histogram, Box Plot, Jitter Plot, Scatter Plot	`gf_histogram %>% gf_facet_grid`, `gf_boxplot`, `gf_jitter`, `gf_point`
Categorical	Quantitative
Quantitative	Quantitative	Jitter Plot, Scatter Plot	`gf_jitter`, `gf_point`

You have also learned a lot of R functions that you can use to create these visualizations of distributions of data. Even though we are only about halfway through chapter 4, you have learned most of the code we will use in the entire course!
