High School / Advanced Statistics and Data Science I (ABC)
4.8 Adding More Explanatory Variables to a Plot
We’ve learned how to make various data visualizations to explore hypotheses with one outcome variable and one explanatory variable. We can express these hypotheses more generally with this word equation:
Outcome = Explanatory + Other Stuff
Because we typically put the outcome variable on the y-axis, we can also express such hypotheses in a word equation as:
Y = X + Other Stuff
But often we can make better predictions about an outcome variable (such as Thumb) if we have more than one explanatory variable in the model. For example, what if knowing both the height and the gender of a student would improve our prediction of their thumb length?
This is called a multivariate hypothesis because it has more than one explanatory variable. As a word equation, it could be written as Thumb = Height + Gender + Other Stuff.
Using Color to Add a Second Explanatory Variable to a Plot
We can explore multivariate hypotheses with data visualizations in a few ways. One way is to start with a basic scatter plot (such as the one below) and use color to represent the second explanatory variable (by adding the argument color = ~Gender).
Try adding a color argument in the code block below to color data points from female students differently from male students in the scatter plot of thumb length by height.
require(coursekata)
# add color according to the variable Gender
gf_point(Thumb ~ Height, data = Fingers)
# solution: add color according to the variable Gender
gf_point(Thumb ~ Height, data = Fingers, color = ~Gender)
You can also change the colors of the bars in histograms and bar graphs. To fill in the bars, we use the argument fill instead of color, because color only changes the outline of each bar. Try adjusting the histogram below so that the bars are filled with a different color for each value of the variable Gender.
require(coursekata)
# add a fill color according to the variable Gender
gf_histogram(~ Thumb, data = Fingers)
# solution: add a fill color according to the variable Gender
gf_histogram(~ Thumb, data = Fingers, fill = ~Gender)
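The text says fill also works for bar graphs, but it only shows a histogram. Here is a minimal sketch for a bar graph. It assumes gf_bar() from the same package and reuses the RaceEthnic and Gender variables from this section. Each bar counts the students in one RaceEthnic group, and fill splits that count by Gender; the pieces are stacked by default.

```r
require(coursekata)

# one bar per RaceEthnic group; fill colors each part of a bar by Gender
p <- gf_bar(~ RaceEthnic, data = Fingers, fill = ~Gender)
p
```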
Size and Shape
Color is not the only way to add more explanatory variables to a plot. With gf_point() and gf_jitter(), you might also want to explore the arguments size and shape.
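In the following code, we use color and shape to explore a hypothesis with three explanatory variables: Height (on the x-axis), RaceEthnic (color), and Gender (shape). We also added the argument size = 3 just to make the dots larger. Because 3 is a fixed value rather than a variable (there is no ~), it makes every dot the same larger size.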
gf_point(Thumb ~ Height, data = Fingers,
color = ~RaceEthnic, shape = ~Gender, size = 3)
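Size can also represent a variable instead of a fixed number. This plot is our own illustration, not one from the text, so treat it as a sketch. It switches to gf_jitter() with Gender on the x-axis and uses color for RaceEthnic. It then maps size to Height with ~, so taller students get bigger dots. The argument width = 0.2 limits how far the points are spread sideways within each group.

```r
require(coursekata)

# Gender on the x-axis; color shows RaceEthnic and dot size shows Height
p <- gf_jitter(Thumb ~ Gender, data = Fingers,
               color = ~RaceEthnic, size = ~Height, width = 0.2)
p
```

With ~, size shows a variable. With a bare number, as in size = 3 above, size sets one fixed size for every dot.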
More Facets
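Just for fun, we will teach you one more way to look at a multivariate hypothesis. We already know how to make separate facets (or panels) of a plot by piping (%>%) it into gf_facet_grid(). We can also use gf_facet_grid() to lay out plots in a grid of rows and columns. The variable before the ~ defines the rows, and the variable after it defines the columns. In the code below, each row is a Gender and each column is a RaceEthnic group. Because the panels already label the groups, the legend is turned off with show.legend = FALSE.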
gf_point(Thumb ~ Height, data = Fingers,
color = ~RaceEthnic, shape = ~Gender, show.legend = FALSE) %>%
gf_facet_grid(Gender ~ RaceEthnic)
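To see what each side of the formula does, here is a minimal sketch with only one faceting variable. A dot (.) means "no variable," so . ~ Gender gives a single row with one column per gender. Writing Gender ~ . instead would stack the same panels as rows.

```r
require(coursekata)

# nothing faceted in the rows (.), one column per level of Gender
p <- gf_point(Thumb ~ Height, data = Fingers, color = ~RaceEthnic) %>%
  gf_facet_grid(. ~ Gender)
p
```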
Let’s try using these options to explore a new hypothesis in the MindsetMatters data frame. The hypothesis is that we can make better predictions about the weight of housekeepers at the end of the study (Wt2) if we know both their BMI at the beginning of the study and which Condition they were in (Wt2 = BMI + Condition + Other Stuff).
In the code block below, make some visualizations to explore this multivariate hypothesis, and keep the one you think is most helpful.
require(coursekata)
# create a visualization that is helpful for exploring this hypothesis
# solution: there are a variety of correct solutions;
# here is one of our examples:
gf_point(Wt2 ~ BMI, data = MindsetMatters, color = ~Condition)
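There are many reasonable answers to this exercise. One alternative to coloring by Condition is to give each Condition its own panel with gf_facet_grid(), so the relationship between BMI and Wt2 can be compared side by side. This is a sketch, not the expected answer.

```r
require(coursekata)

# Wt2 by BMI, with one panel (column) per Condition
p <- gf_point(Wt2 ~ BMI, data = MindsetMatters) %>%
  gf_facet_grid(. ~ Condition)
p
```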