7.6 Graphing Residuals From the Model

You might wonder why we bother to generate and save residuals. There are many reasons, but one short answer is that examining residuals helps us understand the error around our model and can suggest ways to improve it.

Just as the first thing we do when looking at a data set is to examine the distributions of the variables, it is good to get in the habit of examining the distributions of residuals after we fit a new model.

In the code below, we have provided the code to create histograms of Thumb in a facet grid by Gender. Try modifying it to generate histograms of Gender_resid in a facet grid by Gender. Then compare the histograms of residuals from the Gender_model with the histograms of thumb length.

require(coursekata)

# this creates the residuals from the Gender_model
Gender_model <- lm(Fingers$Thumb ~ Fingers$Gender)
Fingers$Gender_resid <- resid(Gender_model)

# this creates histograms of Thumb for each Gender
# modify it to create histograms of Gender_resid for each Gender
gf_histogram(~Thumb, data = Fingers) %>%
  gf_facet_grid(Gender ~ .)

# modified version: histograms of Gender_resid for each Gender
gf_histogram(~Gender_resid, data = Fingers) %>%
  gf_facet_grid(Gender ~ .)

Here we’ve depicted the histograms of Thumb by Gender (in teal) next to the histograms of Gender_resid by Gender (in darker gray).

Left panel (Thumb): a histogram of Thumb faceted by Gender (female and male), in teal. Both distributions are roughly normal, but the male distribution sits slightly further to the right.

Right panel (Gender_resid): a histogram of Gender_resid faceted by Gender (female and male), in gray. Both distributions are roughly normal and mostly overlap.
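Why the residual histograms keep their shape but line up with each other can be checked numerically. The following is a minimal sketch in base R, assuming a small made-up data frame (`toy`, with invented values, not the Fingers data); the names `toy_model`, `group_mean`, `sd_thumb`, and `sd_resid` are hypothetical and used only for illustration.

```r
# Hypothetical toy data: these values are made up for illustration
# and do not come from the Fingers data set
toy <- data.frame(
  Thumb  = c(55, 58, 61, 62, 66, 70),
  Gender = factor(c("female", "female", "female", "male", "male", "male"))
)

# Fit a two-group model and save its residuals
toy_model <- lm(Thumb ~ Gender, data = toy)
toy$Gender_resid <- resid(toy_model)

# For a model with one categorical predictor, each residual is the
# observed value minus the mean of that observation's own group
group_mean <- ave(toy$Thumb, toy$Gender)
all.equal(unname(toy$Gender_resid), toy$Thumb - group_mean)

# Subtracting a constant within a group shifts the distribution
# but leaves its spread unchanged
sd_thumb <- tapply(toy$Thumb, toy$Gender, sd)
sd_resid <- tapply(toy$Gender_resid, toy$Gender, sd)
sd_thumb
sd_resid
```

Because each group is shifted by its own mean, the two residual distributions end up centered in the same place, which is consistent with the overlap seen in the residual histograms.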


The residuals of the Gender_model represent the variation left over after removing the part of the variation that can be explained by Gender. The figures below show the mean Thumb length and mean Gender_resid of the two Gender groups.

Left panel (mean Thumb of each group): histograms of Thumb faceted by Gender, with a vertical line marking the mean of each Gender group. The mean for the male group is higher than the mean for the female group.

Right panel (mean Gender_resid of each group): histograms of Gender_resid faceted by Gender, with a vertical line marking the mean of each Gender group. The means for both the male group and the female group are 0.
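The group means described above can also be computed directly. This is a small sketch in base R, assuming a made-up data frame (`toy`, with invented values standing in for the Fingers data); `thumb_means` and `resid_means` are hypothetical names chosen for this example.

```r
# Hypothetical toy data: invented values, not the Fingers data set
toy <- data.frame(
  Thumb  = c(55, 58, 61, 62, 66, 70),
  Gender = factor(c("female", "female", "female", "male", "male", "male"))
)

# Save the residuals from a two-group model of Thumb
toy$Gender_resid <- resid(lm(Thumb ~ Gender, data = toy))

# Mean of the outcome and mean of the residuals within each group
thumb_means <- tapply(toy$Thumb, toy$Gender, mean)
resid_means <- tapply(toy$Gender_resid, toy$Gender, mean)

thumb_means            # the group means differ
round(resid_means, 10) # rounding hides tiny floating-point error
```

The residual means are rounded before printing because the fitted values are computed numerically, so the raw means may differ from 0 by a negligible floating-point amount.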

