4.6 Categorical Outcomes

We have learned to express hypotheses in word equations and make appropriate data visualizations to explore these hypotheses with real data. Thus far, we have focused solely on hypotheses about quantitative outcome variables – e.g., thumb length. We can extend those same ideas to categorical outcome variables.

Example Study: MindsetMatters

The MindsetMatters data frame contains the results of an experiment in which a sample of housekeepers were randomly assigned to one of two conditions (recorded in the variable Condition). In the Informed condition (N=41) the housekeepers were told that the work they do satisfies the Surgeon General’s recommendations for an active lifestyle (which is true), and they were given some examples to illustrate why their work is considered good exercise. Housekeepers assigned to the Uninformed condition (N=34) were told nothing.

The researchers hypothesized that being informed in this way would lead housekeepers to actually become more fit and perhaps even to lose weight. Four weeks after the start of the study, researchers recorded whether each housekeeper lost weight in a categorical variable called WtLost (either lost or not lost). Below, we show a sample of data from 10 housekeepers for the two variables (Condition and WtLost).

   Condition   WtLost
1    Informed not lost
2    Informed  wt lost
3    Informed not lost
4    Informed  wt lost
5    Informed  wt lost
6    Informed  wt lost
7  Uninformed not lost
8  Uninformed  wt lost
9  Uninformed not lost
10 Uninformed not lost

Faceted Bar Graphs

Because WtLost is a categorical outcome, we can’t graph its distribution in a histogram. Instead, we can use a bar graph. Try running the code in the code block below. Then replace gf_histogram() with gf_bar() to make a bar graph.

require(coursekata)

# Create the categorical outcome WtLost from the two weight measurements
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Edit this code to make a more appropriate visualization
# for this outcome variable
gf_histogram(~WtLost, data = MindsetMatters)

# A more appropriate visualization for this outcome variable
gf_bar(~WtLost, data = MindsetMatters)

This graph shows us the outcome, whether housekeepers lost weight or not, but it doesn’t break the outcome down by Condition. To see if Condition might explain some of the variation in WtLost we can add on the function gf_facet_grid() (as we can with any gf_ plot). We can facet the bar graphs either vertically or horizontally.

gf_bar(~WtLost, data = MindsetMatters) %>%
  gf_facet_grid(Condition ~ .)
gf_bar(~WtLost, data = MindsetMatters) %>%
  gf_facet_grid(. ~ Condition)

Vertical top-and-bottom faceted bar graphs

Horizontal side-by-side faceted bar graphs

Try making both types of faceted bar graphs in the code block below. Submit code after you have created side-by-side faceted bar graphs (the graph on the right).

require(coursekata)

# Create the categorical outcome WtLost from the two weight measurements
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Create a faceted bar graph of WtLost by Condition
gf_bar(~ WtLost, data = MindsetMatters) %>%
  gf_facet_grid(. ~ Condition)

Horizontal side-by-side faceted bar graph of WtLost by Condition in MindsetMatters.

There is a limitation in this graph. Because the sample sizes are different between the two groups (41 in the Informed group, 34 in the Uninformed), you have to look at the relative difference in the number of housekeepers who lost weight between the two groups, mentally controlling for the difference in sample size.
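To make the role of group size concrete, here is a minimal sketch that tabulates counts using only the ten housekeepers listed in the sample table earlier in this section, not the full data set. It uses base R's table() rather than the course's plotting functions, the name sample_data is made up for illustration, and the outcome labels are copied from that table.

```r
# Ten-row sample copied from the table earlier in this section
# (illustration only; this is not the full MindsetMatters data frame)
sample_data <- data.frame(
  Condition = c(rep("Informed", 6), rep("Uninformed", 4)),
  WtLost = c("not lost", "wt lost", "not lost", "wt lost", "wt lost", "wt lost",
             "not lost", "wt lost", "not lost", "not lost")
)

# Number of housekeepers in each condition (the groups differ in size)
group_sizes <- table(sample_data$Condition)
print(group_sizes)

# Count of each outcome within each condition, i.e. the bar heights
# a faceted bar graph of counts would display for these ten rows
counts <- table(sample_data$Condition, sample_data$WtLost)
print(counts)
```

In this small sample, 4 of the 6 Informed housekeepers and 1 of the 4 Uninformed housekeepers lost weight, so the raw counts cannot be compared without also keeping the group sizes in mind.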

A simpler approach is to use the gf_props() function instead of gf_bar(). gf_props() shows the proportion of housekeepers who lost weight instead of the number of housekeepers. Use gf_props() instead of gf_bar() to create a bar graph depicting the proportion of each condition that lost weight in the code window below.

require(coursekata)

# Create the categorical outcome WtLost from the two weight measurements
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Edit this code
gf_bar(~ WtLost, data = MindsetMatters, fill = "purple") %>%
  gf_facet_grid(. ~ Condition)

# The same graph showing proportions instead of counts
gf_props(~ WtLost, data = MindsetMatters, fill = "purple") %>%
  gf_facet_grid(. ~ Condition)
gf_props(): Side-by-side faceted bar graphs depicting proportions

gf_bar(): Side-by-side faceted bar graphs depicting counts

The sample sizes between the two groups aren’t that different, but because there are fewer housekeepers in the Uninformed group, proportions are a better basis on which to compare the two groups. Roughly .68 of the Informed group lost weight while a little less than .60 of the Uninformed group lost weight.
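A proportion of this kind is just the number who lost weight divided by the size of the group. The sketch below shows that arithmetic in base R, assuming only the ten-row sample listed earlier in this section, so its results will not match the full-data values quoted above. The names sample_data and prop_lost are made up for illustration.

```r
# Ten-row sample copied from the table earlier in this section
# (illustration only; this is not the full MindsetMatters data frame)
sample_data <- data.frame(
  Condition = c(rep("Informed", 6), rep("Uninformed", 4)),
  WtLost = c("not lost", "wt lost", "not lost", "wt lost", "wt lost", "wt lost",
             "not lost", "wt lost", "not lost", "not lost")
)

# TRUE/FALSE indicator of whether each housekeeper lost weight
lost <- sample_data$WtLost == "wt lost"

# The mean of a logical vector is the proportion of TRUE values,
# so averaging within each condition gives the proportion who lost weight
prop_lost <- tapply(lost, sample_data$Condition, mean)
print(round(prop_lost, 2))
```

Because each proportion is already divided by its own group size, the two values can be compared directly even though the groups contain different numbers of housekeepers.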
