4.6 Categorical Outcomes
We have learned to express hypotheses in word equations and make appropriate data visualizations to explore these hypotheses with real data. Thus far, we have focused solely on hypotheses about quantitative outcome variables – e.g., thumb length. We can extend those same ideas to categorical outcome variables.
Example Study: MindsetMatters
The MindsetMatters data frame contains the results of an experiment in which a sample of housekeepers were randomly assigned to one of two conditions (recorded in the variable Condition).
In the Informed condition (N=41) the housekeepers were told that the work they do satisfies the Surgeon General’s recommendations for an active lifestyle (which is true), and they were given some examples to illustrate why their work is considered good exercise. Housekeepers assigned to the Uninformed condition (N=34) were told nothing.
The researchers hypothesized that being informed in this way would lead housekeepers to actually become more fit and perhaps even to lose weight. Four weeks after the start of the study, researchers recorded whether each housekeeper lost weight in a categorical variable called WtLost (either lost or not lost). Below, we show a sample of data from 10 housekeepers for the two variables (Condition and WtLost).
Condition WtLost
1 Informed not lost
2 Informed wt lost
3 Informed not lost
4 Informed wt lost
5 Informed wt lost
6 Informed wt lost
7 Uninformed not lost
8 Uninformed wt lost
9 Uninformed not lost
10 Uninformed not lost
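These ten rows are enough to experiment with outside the course environment. The sketch below rebuilds them as a base R data frame and counts how many housekeepers fall into each WtLost category, which is the information a bar graph of WtLost would draw as bar heights. The names sample_rows and outcome_counts are illustrative and are not part of the course materials.

```r
# Rebuild the 10-row sample shown above.
# sample_rows is an illustrative name, not an object from the course.
sample_rows <- data.frame(
  Condition = c(rep("Informed", 6), rep("Uninformed", 4)),
  WtLost = c("not lost", "wt lost", "not lost", "wt lost", "wt lost",
             "wt lost", "not lost", "wt lost", "not lost", "not lost")
)

# Count the housekeepers in each outcome category
outcome_counts <- table(sample_rows$WtLost)
print(outcome_counts)
```

These counts describe only the ten sampled rows, not the full MindsetMatters data frame.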
Faceted Bar Graphs
Because WtLost is a categorical outcome, we can’t graph its distribution in a histogram. Instead, we can use a bar graph. Try running the code in the code block below. Then replace gf_histogram() with gf_bar() to make a bar graph.
require(coursekata)
# Code each housekeeper as "lost" when the later weight (Wt2) is below the starting weight (Wt)
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Edit this code to make a more appropriate visualization
# for this outcome variable
gf_histogram(~WtLost, data = MindsetMatters)

# After the edit: a bar graph suits a categorical outcome
gf_bar(~WtLost, data = MindsetMatters)
This graph shows us the outcome, whether housekeepers lost weight or not, but it doesn’t break the outcome down by Condition. To see if Condition might explain some of the variation in WtLost, we can add on the function gf_facet_grid() (as we can with any gf_ plot).
We can facet the bar graphs either vertically or horizontally.
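[Figure: two faceted bar graphs of WtLost by Condition, with the facets stacked vertically (left) and placed side by side (right)]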
Try making both types of faceted bar graphs in the code block below. Submit code after you have created side-by-side faceted bar graphs (the graph on the right).
require(coursekata)
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Create a faceted bar graph of WtLost by Condition
gf_bar(~ WtLost, data = MindsetMatters) %>%
  gf_facet_grid(. ~ Condition)
There is a limitation in this graph. Because the sample sizes are different between the two groups (41 in the Informed group, 34 in the Uninformed), you have to look at the relative difference in the number of housekeepers who lost weight between the two groups, mentally controlling for the difference in sample size.
A simpler approach is to use the gf_props() function instead of gf_bar(). gf_props() shows the proportion of housekeepers who lost weight instead of the number of housekeepers. Use gf_props() instead of gf_bar() to create a bar graph depicting the proportion of each condition that lost weight in the code window below.
require(coursekata)
MindsetMatters <- MindsetMatters %>%
  mutate(WtLost = ifelse(Wt2 < Wt, "lost", "not lost"))

# Edit this code
gf_bar(~ WtLost, data = MindsetMatters, fill = "purple") %>%
  gf_facet_grid(. ~ Condition)

# After the edit: gf_props() plots proportions instead of counts
gf_props(~ WtLost, data = MindsetMatters, fill = "purple") %>%
  gf_facet_grid(. ~ Condition)
[Figure: the faceted graph of WtLost by Condition made with gf_props() (left) next to the one made with gf_bar() (right)]
The sample sizes of the two groups aren’t that different, but because there are fewer housekeepers in the Uninformed group, proportions are a better basis on which to compare the two groups. Roughly .68 of the Informed group lost weight, while a little less than .60 of the Uninformed group lost weight.
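The same idea can be checked numerically. As a minimal sketch, the base R code below uses only the 10-row sample shown earlier (6 Informed, 4 Uninformed), not the full data frame, so its numbers will not match the proportions quoted above. It cross-tabulates the counts and then divides each count by the size of its own condition. The object names are illustrative.

```r
# The 10-row sample shown earlier (illustrative object name)
sample_rows <- data.frame(
  Condition = c(rep("Informed", 6), rep("Uninformed", 4)),
  WtLost = c("not lost", "wt lost", "not lost", "wt lost", "wt lost",
             "wt lost", "not lost", "wt lost", "not lost", "not lost")
)

# Two-way table of counts: rows are WtLost, columns are Condition
counts <- table(sample_rows$WtLost, sample_rows$Condition)
print(counts)

# Divide each column by its total to get proportions within each condition
props <- prop.table(counts, margin = 2)
print(round(props, 2))
```

In this small sample, 4 of the 6 Informed housekeepers and 1 of the 4 Uninformed housekeepers lost weight. Comparing the raw counts (4 versus 1) mixes the effect of group size with the effect of condition, while the column proportions put both groups on the same scale.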