CourseKata - 4.8 Adding More Explanatory Variables to a Plot

High School / Advanced Statistics and Data Science I (ABC)


4.8 Adding More Explanatory Variables to a Plot

We’ve learned how to make various data visualizations to explore hypotheses with one outcome variable and one explanatory variable. We can express these hypotheses more generally with this word equation:

Outcome = Explanatory + Other Stuff

Because we typically put the outcome variable on the y-axis, we can also express such hypotheses in a word equation as:

Y = X + Other Stuff

But often we can make better predictions about an outcome variable (such as Thumb) if we include more than one explanatory variable in the model. For example, knowing both the height and the sex of a student might improve our prediction of their thumb length more than knowing height alone.

This is called a multivariate hypothesis because it has more than one explanatory variable.
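Applying the word equation above to the thumb example, a multivariate hypothesis with height and sex as the two explanatory variables could be written as:

Thumb = Height + Sex + Other Stuff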

Using Color to Add a Second Explanatory Variable to a Plot

We can explore multivariate hypotheses with data visualizations in a few ways. One way is to start with a basic scatter plot (such as the one below) and add in color to represent the other explanatory variable (by adding the argument color = ~Sex).

Try adding a color argument in the code block below to color data points from female students differently from male students in the scatter plot of thumb length by height.

require(coursekata)

# basic scatter plot of Thumb by Height
gf_point(Thumb ~ Height, data = Fingers)

# add color according to the variable Sex
gf_point(Thumb ~ Height, data = Fingers, color = ~Sex)


A scatter plot of Thumb predicted by Height. The data points for female students are colored teal and the male students are purple. The purple dots tend to be more to the right and a little higher up in the plot.

You can also change the colors of the bars in histograms and bar graphs, but instead of color we must use the argument fill. Try adjusting the histogram below by filling the bars with a different color according to the variable Sex.

require(coursekata)

# basic histogram of Thumb
gf_histogram(~ Thumb, data = Fingers)

# add a fill color according to the variable Sex
gf_histogram(~ Thumb, data = Fingers, fill = ~Sex)


A histogram of Thumb filled in by Sex. Within each bin, the part of the bar representing male students is purple and the part representing female students is teal.
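The same fill argument works for bar graphs. As a minimal sketch (assuming the coursekata package is installed, which loads ggformula and the Fingers data frame), the bar graph below counts students in each RaceEthnic category and splits each bar by Sex. The object name bar_plot is just an illustrative choice.

```r
library(coursekata)

# bar graph of counts for each RaceEthnic category;
# fill = ~Sex colors the portion of each bar belonging to each sex
bar_plot <- gf_bar(~ RaceEthnic, data = Fingers, fill = ~Sex)
bar_plot
```

As with histograms, color would only change the outline of the bars; fill is what colors them in.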

Size and Shape

Color is not the only way to add more explanatory variables to a plot. With gf_point() and gf_jitter(), you might also want to explore the arguments size and shape.

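In the following line of code, we use color and shape to explore a hypothesis with three explanatory variables: Height (on the x-axis), RaceEthnic (color), and Sex (shape). We also added the argument size = 3; because it is set to a fixed number rather than to a variable, it simply makes all the dots larger.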

gf_point(Thumb ~ Height, data = Fingers, 
  color = ~RaceEthnic, shape = ~Sex, size = 3)

A scatter plot of Thumb predicted by Height. The data points are colored differently depending on the RaceEthnic category and have a different shape depending on the Sex of the student. The dots are slightly larger than in the previous plots.
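Since gf_jitter() accepts the same arguments, here is a sketch of the shape idea with jittered points. It assumes the coursekata package and the same Fingers variables; width = 0.2 and height = 0 are our own illustrative values, chosen so points are nudged only horizontally and the Thumb values on the y-axis are not distorted.

```r
library(coursekata)

# jittered scatter plot: Height on the x-axis, Thumb on the y-axis
# shape = ~Sex maps a variable; size = 2 is a fixed value that enlarges every point
jitter_plot <- gf_jitter(Thumb ~ Height, data = Fingers,
                         shape = ~Sex, size = 2,
                         width = 0.2, height = 0)
jitter_plot
```

Note the difference in syntax: a variable is mapped with a tilde (~Sex), while a constant like 2 is written without one.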

More Facets

Just for fun, we will teach you one more way to look at a multivariate hypothesis. We already know how to split a plot into separate facets (or panels) by piping (%>%) it into gf_facet_grid(). We can also use gf_facet_grid() to lay out the panels in a grid of rows and columns, with one explanatory variable defining the rows and another defining the columns.

gf_point(Thumb ~ Height, data = Fingers, 
  color = ~RaceEthnic, shape = ~Sex, show.legend = FALSE) %>%
  gf_facet_grid(Sex ~ RaceEthnic)

A scatter plot of Thumb predicted by Height, faceted into columns by RaceEthnic category and faceted into rows by Sex.
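Facets can also stand in for color when there is only a second explanatory variable to show. The sketch below is our own variation (assuming the coursekata package and the Fingers data frame): instead of coloring points by Sex, it splits the Thumb by Height scatter plot into one column per level of Sex. The dot in . ~ Sex means "no variable for the rows."

```r
library(coursekata)

# one panel per level of Sex, arranged side by side as columns
facet_plot <- gf_point(Thumb ~ Height, data = Fingers) %>%
  gf_facet_grid(. ~ Sex)
facet_plot
```

Writing Sex ~ . instead would stack the panels as rows.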

Let’s try using these options to explore a new hypothesis in the MindsetMatters data frame, namely that we can make better predictions about the weight of housekeepers at the end of the study (Wt2) if we know their BMI at the beginning of the study as well as what Condition they were in.

In the code block below, make some visualizations to explore this multivariate hypothesis. Use the <Submit> button when you have a data visualization that you think is most helpful.

require(coursekata)

# create a visualization that is helpful for exploring this hypothesis
# there are a variety of correct solutions; here is one example:
gf_point(Wt2 ~ BMI, data = MindsetMatters, color = ~Condition)

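There is more than one reasonable way to explore this hypothesis. As a sketch (assuming the coursekata package, which includes MindsetMatters), one alternative puts Condition on the x-axis and shows BMI as a color gradient, using gf_jitter() to spread out points within each condition; another keeps BMI on the x-axis and uses a separate panel for each Condition. The jitter values (width = 0.1, height = 0) and the object names are illustrative choices.

```r
library(coursekata)

# Condition on the x-axis; BMI (a quantitative variable) becomes a color gradient
# height = 0 keeps the Wt2 values exact; width = 0.1 spreads points horizontally
cond_plot <- gf_jitter(Wt2 ~ Condition, data = MindsetMatters,
                       color = ~BMI, width = 0.1, height = 0)
cond_plot

# alternatively: BMI on the x-axis, with one panel per Condition
panel_plot <- gf_point(Wt2 ~ BMI, data = MindsetMatters) %>%
  gf_facet_grid(. ~ Condition)
panel_plot
```

Which version is most helpful is a judgment call: the first makes the two conditions easy to compare directly, while the second makes the relationship between BMI and Wt2 easier to see within each condition.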
