## 7.5 Error Leftover From the Group Model

Previously, we calculated residuals from the empty model of Thumb by starting with the actual thumb length for each student, and then subtracting the predicted value based on the model. In the empty model, all students had the same predicted value.

DATA = MODEL PREDICTION + RESIDUAL

RESIDUAL = DATA - MODEL PREDICTION

For the Sex model of Thumb we will use the same method, the only difference being that this time there will be two different model predictions, depending on the sex of the student. Still, the predicted thumb length for each student, which depends on their sex, is subtracted from their actual thumb length to get the residuals.
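As a minimal sketch of this subtraction, the base R code below computes group-model residuals by hand. It assumes a toy data frame, here called `six` (a hypothetical name), holding only the six example students used in this section. Because the group means are computed from just these six rows, they will not match the predictions from the full Fingers data.

```r
# Toy data (assumption): only the six example students, not the full Fingers data.
six <- data.frame(
  Sex   = factor(c("female", "female", "female", "male", "male", "male")),
  Thumb = c(64, 56, 52, 66, 70, 62)
)

# Group-model prediction: each student's prediction is the mean Thumb of their sex.
# ave() returns the group mean repeated for every row in that group.
six$group_predict <- ave(six$Thumb, six$Sex, FUN = mean)

# RESIDUAL = DATA - MODEL PREDICTION
six$group_resid <- six$Thumb - six$group_predict

print(six)
```

Within each group the residuals sum to zero, since each prediction is that group's own mean.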

The residuals for 6 example students are represented in the plots below for both the empty model (left) and Sex model (right). Notice that the placement of the 6 data points is the same from one model to the other; the actual thumb lengths of these students don’t change.

[Figure: two panels, "Residuals from the Empty Model" (left) and "Residuals from the Sex Model" (right)]

The predictions and residuals of the two models, however, are different. For the empty model, each student’s residual is calculated in relation to the mean Thumb of all students in the data set. For the Sex model, each student’s residual is calculated in relation to the predicted Thumb for their sex.
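To see the two sets of residuals side by side, here is a small sketch that fits both models with `lm()` and collects their residuals with `resid()`. It assumes the same hypothetical six-row data frame `six`, and it assumes the empty model can be written as the intercept-only formula `Thumb ~ 1`. The numbers come from six students only, so they illustrate the idea rather than reproduce the full Fingers results.

```r
# Toy data (assumption): only the six example students, not the full Fingers data.
six <- data.frame(
  Sex   = factor(c("female", "female", "female", "male", "male", "male")),
  Thumb = c(64, 56, 52, 66, 70, 62)
)

# Intercept-only model: one prediction (the overall mean) for everyone.
empty_model <- lm(Thumb ~ 1, data = six)

# Group model: one prediction per sex.
Sex_model <- lm(Thumb ~ Sex, data = six)

# Same data points, different residuals depending on the model.
comparison <- data.frame(
  Sex         = six$Sex,
  Thumb       = six$Thumb,
  empty_resid = unname(resid(empty_model)),
  Sex_resid   = unname(resid(Sex_model))
)

print(comparison)
```

In this toy data, the male student with a thumb length of 62 has a positive residual from the intercept-only model but a negative residual from the group model: he is above the mean of all six students and below the mean of the three males.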

Looking at residuals can also help you interpret your data. Take, for example, the male student whose thumb length is circled in the plots below. Comparing this student's residuals from the two models reveals something interesting.

In relation to the empty model, this student has a larger than average thumb length. In relation to the Sex model, the same student has a slightly below average thumb length given that they are male.

### Using R to Calculate Residuals from the Sex Model

Just as we earlier used the predict() function to generate a predicted thumb length for each student in the data frame, we can use the resid() function to calculate the residual for each student. We’ve done that with the code below, and printed out the data table for just the 6 students we have been looking at.

Fingers$Sex_predict <- predict(Sex_model)
Fingers$Sex_resid <- resid(Sex_model)
head(select(Fingers, Sex, Thumb, Sex_predict, Sex_resid))
     Sex Thumb Sex_predict Sex_resid
1 female    64    58.25585  5.744152
2 female    56    58.25585 -2.255848
3 female    52    58.25585 -6.255848
4   male    66    64.70267  1.297333
5   male    70    64.70267  5.297333
6   male    62    64.70267 -2.702667
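One quick check on this output is that DATA = MODEL PREDICTION + RESIDUAL holds for every row. The sketch below re-enters the six printed rows by hand into a hypothetical data frame called `printed` (it does not load Fingers) and adds the prediction and residual columns back together. Since the printed values are rounded, the sums are expected to match Thumb only up to a small rounding gap.

```r
# The six rows exactly as printed above, typed in by hand (rounded values).
printed <- data.frame(
  Sex         = c("female", "female", "female", "male", "male", "male"),
  Thumb       = c(64, 56, 52, 66, 70, 62),
  Sex_predict = c(58.25585, 58.25585, 58.25585, 64.70267, 64.70267, 64.70267),
  Sex_resid   = c(5.744152, -2.255848, -6.255848, 1.297333, 5.297333, -2.702667)
)

# DATA = MODEL PREDICTION + RESIDUAL
printed$reconstructed <- printed$Sex_predict + printed$Sex_resid

# Largest difference between the reconstructed and actual thumb lengths;
# anything this small is due to rounding in the printed output.
max_gap <- max(abs(printed$reconstructed - printed$Thumb))

print(printed)
print(max_gap)
```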