CourseKata - 4.6 Categorical Outcomes

High School / Advanced Statistics and Data Science I (ABC)

4.6 Categorical Outcomes

We have learned to express hypotheses in word equations and make appropriate data visualizations to explore these hypotheses with real data. Thus far, we have focused solely on hypotheses about quantitative outcome variables – e.g., thumb length. We can extend those same ideas to categorical outcome variables.

Example Study: `MindsetMatters`

The MindsetMatters data frame contains the results of an experiment in which a sample of housekeepers were randomly assigned to one of two conditions (recorded in the variable Condition). In the Informed condition (N=41) the housekeepers were told that the work they do satisfies the Surgeon General’s recommendations for an active lifestyle (which is true), and they were given some examples to illustrate why their work is considered good exercise. Housekeepers assigned to the Uninformed condition (N=34) were told nothing.

The researchers hypothesized that being informed in this way would lead housekeepers to actually become more fit and perhaps even to lose weight. Four weeks after the start of the study, researchers recorded whether each housekeeper lost weight in a categorical variable called WtLost (either lost or not lost). Below, we show a sample of data from 10 housekeepers for the two variables (Condition and WtLost).

   Condition   WtLost
1    Informed not lost
2    Informed  wt lost
3    Informed not lost
4    Informed  wt lost
5    Informed  wt lost
6    Informed  wt lost
7  Uninformed not lost
8  Uninformed  wt lost
9  Uninformed not lost
10 Uninformed not lost

Faceted Bar Graphs

Because WtLost is a categorical outcome, we can’t graph its distribution in a histogram. Instead, we can use a bar graph. Try running the code in the code block below. Then replace gf_histogram() with gf_bar() to make a bar graph.

require(coursekata)

MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Edit this code to make a more appropriate visualization
# for this outcome variable
gf_histogram(~WtLost, data = MindsetMatters)

# Solution: a bar graph is a more appropriate visualization
# for a categorical outcome variable
gf_bar(~WtLost, data = MindsetMatters)
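
The mutate() line in the code above builds WtLost with ifelse(). As a minimal sketch of how that recoding works, the base-R example below applies the same comparison to a tiny data frame. The data frame name toy and its four weights are hypothetical values invented for illustration; the sketch also assumes, as the code above suggests, that Wt is the starting weight and Wt2 is the later weight.

```r
# Hypothetical weights for four people; these values are invented
# for illustration and are NOT taken from MindsetMatters.
toy <- data.frame(
  Wt  = c(150, 142, 160, 135),  # assumed: weight at the start
  Wt2 = c(148, 142, 163, 130)   # assumed: weight at the follow-up
)

# ifelse() checks the condition row by row: "lost" where the later
# weight is lower than the starting weight, "not lost" otherwise.
toy$WtLost <- ifelse(toy$Wt2 < toy$Wt, "lost", "not lost")

print(toy)
```

Note that the second person, whose weight did not change, is labeled "not lost" because the comparison uses a strict less-than.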

This graph shows us the outcome, whether housekeepers lost weight or not, but it doesn’t break the outcome down by Condition. To see if Condition might explain some of the variation in WtLost we can add on the function gf_facet_grid() (as we can with any gf_ plot). We can facet the bar graphs either vertically or horizontally.

Left, facets stacked vertically: `gf_bar(~WtLost, data = MindsetMatters) %>% gf_facet_grid(Condition ~ .)`
Right, facets side by side: `gf_bar(~WtLost, data = MindsetMatters) %>% gf_facet_grid(. ~ Condition)`

Try making both types of faceted bar graphs in the code block below. Submit your code after you have created the side-by-side faceted bar graphs (the version that uses `. ~ Condition`).

require(coursekata)

MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Create a faceted bar graph of WtLost by Condition
gf_bar(~ WtLost, data = MindsetMatters) %>%
  gf_facet_grid(. ~ Condition)

Horizontal side-by-side faceted bar graph of WtLost by Condition in MindsetMatters.

There is a limitation in this graph. Because the sample sizes are different between the two groups (41 in the Informed group, 34 in the Uninformed), you have to look at the relative difference in the number of housekeepers who lost weight between the two groups, mentally controlling for the difference in sample size.
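
To illustrate the limitation with numbers, the base-R sketch below tallies only the ten sample rows printed earlier on this page (not the full data frame). The name sample_rows is an assumption made for this example, and table() is used here simply as a quick way to count; it is not part of the graphing code above.

```r
# The ten sample rows shown earlier on this page, typed in by hand.
sample_rows <- data.frame(
  Condition = c(rep("Informed", 6), rep("Uninformed", 4)),
  WtLost = c("not lost", "wt lost", "not lost", "wt lost", "wt lost",
             "wt lost", "not lost", "wt lost", "not lost", "not lost")
)

# Counts of each outcome within each condition; these correspond to the
# bar heights a faceted bar graph of counts would show for these rows.
counts <- table(sample_rows$Condition, sample_rows$WtLost)
print(counts)

# The two groups differ in size, so raw counts are hard to compare.
group_sizes <- rowSums(counts)
print(group_sizes)
```

In these ten rows, four Informed housekeepers and one Uninformed housekeeper lost weight, but the groups contain six and four rows, so the counts alone do not give a fair comparison.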

A simpler approach is to use the gf_props() function instead of gf_bar(). gf_props() shows the proportion of housekeepers in each outcome category instead of the number of housekeepers. In the code window below, use gf_props() instead of gf_bar() to create a bar graph depicting the proportion of each condition that lost weight.

require(coursekata)

MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Edit this code
gf_bar(~ WtLost, data = MindsetMatters, fill = "purple") %>%
  gf_facet_grid(. ~ Condition)

# Solution: gf_props() plots proportions instead of counts
gf_props(~ WtLost, data = MindsetMatters, fill = "purple") %>%
  gf_facet_grid(. ~ Condition)

Side-by-side comparison of the faceted bar graphs made with `gf_props()` and `gf_bar()`.

The sample sizes of the two groups aren’t that different, but because there are fewer housekeepers in the Uninformed group, proportions are a better basis on which to compare the two groups. Roughly .68 of the Informed group lost weight while a little less than .60 of the Uninformed group lost weight.
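
To make the idea of comparing proportions concrete, here is a small base-R sketch that uses only the ten sample rows printed earlier on this page, so its results will not match the full-data values quoted above. It assumes that the faceted gf_props() graph computes proportions separately within each condition, which is what those quoted values suggest; prop.table() stands in for that calculation, and the name sample_rows is an assumption made for this example.

```r
# The ten sample rows shown earlier on this page, typed in by hand.
sample_rows <- data.frame(
  Condition = c(rep("Informed", 6), rep("Uninformed", 4)),
  WtLost = c("not lost", "wt lost", "not lost", "wt lost", "wt lost",
             "wt lost", "not lost", "wt lost", "not lost", "not lost")
)

# margin = 1 divides each row by its own total, giving the proportion
# of each outcome within each condition.
props <- prop.table(table(sample_rows$Condition, sample_rows$WtLost),
                    margin = 1)
print(round(props, 2))
```

Within each condition the two proportions add up to 1, so the groups can be compared directly even though they differ in size.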
