High School / Advanced Statistics and Data Science I (ABC)

4.7 Contingency Tables

Bar graphs are one way to visualize the hypothesis WtLost = Condition + Other Stuff. Another way to explore the hypothesis is with a contingency table, which shows the distribution of cases across two categorical variables.

You already know the R function we use to make tables, tally(). Here we will extend its use to look at an outcome variable by an explanatory variable.

tally(WtLost ~ Condition, data = MindsetMatters)

Try using the tally() function in the code block below to generate the contingency table for our MindsetMatters hypothesis.

require(coursekata)

# Create the categorical outcome WtLost from the two weight measurements
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Make a contingency table
tally(WtLost ~ Condition, data = MindsetMatters)

          Condition
WtLost     Informed Uninformed
  lost           28         20
  not lost       13         14

Each value in the table represents the frequency of a particular combination of levels (e.g., “lost” and “Informed”; “lost” and “Uninformed”, “not lost” and “Informed”; “not lost” and “Uninformed”) in the data set.
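To make the idea of "frequency of a combination of levels" concrete, here is a minimal sketch in base R. It assumes a stand-in data frame, hypothetically named housekeepers, rebuilt from the four counts in the table above; it is not the real MindsetMatters data, and it uses base R's table() rather than tally().

```r
# Rebuild one row per housekeeper from the four counts in the table above.
# (Illustrative reconstruction only; `housekeepers` is a hypothetical name.)
housekeepers <- data.frame(
  WtLost = rep(c("lost", "not lost", "lost", "not lost"),
               times = c(28, 13, 20, 14)),
  Condition = rep(c("Informed", "Uninformed"),
                  times = c(28 + 13, 20 + 14))
)

# Count a single combination of levels directly
n_lost_informed <- sum(housekeepers$WtLost == "lost" &
                         housekeepers$Condition == "Informed")

# Base R's table() counts every combination at once
counts <- table(WtLost = housekeepers$WtLost,
                Condition = housekeepers$Condition)

print(n_lost_informed)
print(counts)
```

Each cell of counts is simply the number of rows that share one value of WtLost and one value of Condition.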

If you want proportions instead of counts (more appropriate in this case due to the unequal sample sizes across conditions) you can add the argument format = "proportion":

tally(WtLost ~ Condition, data = MindsetMatters, format = "proportion")
          Condition
WtLost      Informed Uninformed
  lost     0.6829268  0.5882353
  not lost 0.3170732  0.4117647
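The proportions above can be checked by hand. The sketch below uses base R only, with the counts typed in from the earlier table (the object names are illustrative). It shows that the two conditions have different numbers of housekeepers, which is why raw counts are hard to compare, and that dividing each count by its condition's total gives the proportions.

```r
# Counts copied from the contingency table of counts shown earlier
counts <- matrix(
  c(28, 13, 20, 14),
  nrow = 2,
  dimnames = list(WtLost = c("lost", "not lost"),
                  Condition = c("Informed", "Uninformed"))
)

# The group sizes differ between the two conditions
group_sizes <- colSums(counts)

# Divide each column by its own total to get within-condition proportions
col_props <- sweep(counts, 2, group_sizes, "/")

print(group_sizes)
print(round(col_props, 7))
```

For example, 28 of the 41 Informed housekeepers lost weight, and 28 / 41 is the 0.6829268 shown in the table.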

In tables created by tally(), the proportions are normalized by column, meaning that the proportions in each column add up to 1. If the row proportions added up to 1, we would say they are normalized by row.

It is more informative to normalize by columns (that is, so that the levels of WtLost add up to 1 within each Condition) because our main interest is in comparing the proportion of housekeepers who lost weight between the two conditions. If the table were normalized by rows, we would not see the proportion of housekeepers in each condition who lost weight. Instead, we would see, among the housekeepers who lost weight, the proportion who were in each condition.
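To see the difference between the two normalizations, here is a small sketch using base R's prop.table() on the counts from this page. This is an illustration with base R, not a feature of tally() described in the text; the object names are made up for the example.

```r
# Counts copied from the contingency table of counts shown earlier
counts <- matrix(
  c(28, 13, 20, 14),
  nrow = 2,
  dimnames = list(WtLost = c("lost", "not lost"),
                  Condition = c("Informed", "Uninformed"))
)

# margin = 2: each column (Condition) adds up to 1
by_column <- prop.table(counts, margin = 2)

# margin = 1: each row (level of WtLost) adds up to 1
by_row <- prop.table(counts, margin = 1)

print(round(by_column, 3))
print(round(by_row, 3))
```

In by_row, the "lost" row answers a different question: of the 48 housekeepers who lost weight, 28 (about 0.583) were Informed. That figure partly reflects the larger Informed group, so it is less useful for comparing the two conditions.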

Recap of Visualizations

So far we have considered both quantitative (e.g., Thumb) and categorical (e.g., WtLost) outcomes. We have also looked at some categorical explanatory variables (e.g., Sex and Condition) and quantitative explanatory variables (e.g., Height).

We haven’t yet looked at any situations where there is a categorical outcome and a quantitative explanatory variable. But there isn’t any reason to think that we couldn’t! Perhaps a quantitative variable like age or initial weight might help us predict whether a housekeeper will lose weight or not.

Let’s review when each type of visualization is appropriate to use.

Visualizations with One Variable

Variable       Visualization Type   R Code
Categorical    Frequency Table      tally
               Bar Graph            gf_bar
Quantitative   Histogram            gf_histogram
               Box Plot             gf_boxplot


Visualizations with Two Variables

Outcome Variable   Explanatory Variable   Visualization Type    R Code
Categorical        Categorical            Frequency Table       tally
                                          Faceted Bar Graph     gf_bar %>% gf_facet_grid
Quantitative       Categorical            Faceted Histogram     gf_histogram %>% gf_facet_grid
                                          Box Plot              gf_boxplot
                                          Jitter Plot           gf_jitter
                                          Scatter Plot          gf_point
Categorical        Quantitative           (none covered yet)
Quantitative       Quantitative           Jitter Plot           gf_jitter
                                          Scatter Plot          gf_point

You have also learned a lot of R functions that you can use to create these visualizations of distributions of data. Even though we are only about halfway through chapter 4, you have learned most of the code we will use in the entire course!
