9.6 Sums of Squares in the ANOVA Table

Finally, let’s use the ANOVA table to examine the fit of the height model. We have saved Height_model in the code block below. Use supernova() to generate the ANOVA table.

require(coursekata)

# this saves the Height_model
Height_model <- lm(Thumb ~ Height, data = Fingers)

# print the ANOVA table for this model
supernova(Height_model)

Below we have printed out the resulting ANOVA table for the Height_model along with the one we produced earlier for the Height2Group_model.

Height Model

Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height

                               SS  df       MS      F    PRE     p
----- --------------- | --------- --- -------- ------ ------ -----
Model (error reduced) |  1816.862   1 1816.862 27.984 0.1529 .0000
Error (from model)    | 10063.349 155   64.925
----- --------------- | --------- --- -------- ------ ------ -----
Total (empty model)   | 11880.211 156   76.155                    

Height2Group Model

Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height2Group

                               SS  df      MS      F    PRE     p
----- --------------- | --------- --- ------- ------ ------ -----
Model (error reduced) |   830.880   1 830.880 11.656 0.0699 .0008
Error (from model)    | 11049.331 155  71.286
----- --------------- | --------- --- ------- ------ ------ -----
Total (empty model)   | 11880.211 156  76.155                    

SS Total is the sum of squared residuals from the empty model. It depends only on the outcome variable and isn’t affected by the explanatory variable or variables, which is why both tables above show the same SS Total (11,880.211). Whenever we use sums of squares to compare statistical models, the models must share the same outcome variable.
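As a minimal sketch of this idea, the code below computes SS Total two ways: from the residuals of an empty model, and directly from the outcome variable. The data frame `toy` is a small invented example, not the Fingers data, and the empty model is written as `Thumb ~ 1` (an intercept-only model) so the sketch runs in base R.

```r
# Toy data, invented for illustration (not the Fingers data set)
toy <- data.frame(
  Thumb  = c(55, 58, 57, 61, 60, 64, 63, 66),
  Height = c(60, 62, 64, 66, 68, 70, 72, 74)
)

# The empty model predicts the mean of Thumb for everyone
empty_model <- lm(Thumb ~ 1, data = toy)

# SS Total: sum of squared residuals from the empty model
ss_total <- sum(resid(empty_model)^2)

# The same quantity computed from the outcome alone;
# no explanatory variable is involved
ss_total_direct <- sum((toy$Thumb - mean(toy$Thumb))^2)

print(c(ss_total = ss_total, ss_total_direct = ss_total_direct))
```

Both values agree because the empty model's prediction is just the mean of the outcome, so Height never enters the calculation.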

SS Error from Three Models

The table below summarizes the sum of squares left over (SS Error) after fitting each of the three models we have been considering. All of these are calculated the same way: by summing the squared residuals from the model's predictions.

Model                Leftover SS   Statistic Name
Empty model          11,880        Sum of Squares Total (SST)
Height2Group model   11,049        Sum of Squares Error (SSE)
Height model         10,063        Sum of Squares Error (SSE)

The more error there is left over after fitting a model, the less of the total variation is explained. The empty model tells us how much total variation there is in the outcome variable. SS Error tells us how much of that variation remains unexplained after fitting a more complex model.
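The same comparison can be sketched in code. The example below fits three models to a small invented data set (not the Fingers data) and sums the squared residuals from each. The two-group split of Height and the helper name `leftover_ss` are assumptions made for illustration only.

```r
# Toy data, invented for illustration (not the Fingers data set)
toy <- data.frame(
  Thumb  = c(55, 58, 57, 61, 60, 64, 63, 66),
  Height = c(60, 62, 64, 66, 68, 70, 72, 74)
)

# Hypothetical two-group split of Height, standing in for Height2Group
toy$Height2Group <- factor(ifelse(toy$Height < 67, "short", "tall"))

# Helper (hypothetical name): sum of squared residuals left over after a model
leftover_ss <- function(model) sum(resid(model)^2)

empty_model        <- lm(Thumb ~ 1, data = toy)
Height2Group_model <- lm(Thumb ~ Height2Group, data = toy)
Height_model       <- lm(Thumb ~ Height, data = toy)

# Every entry is computed the same way; only the model differs
leftover <- c(
  empty        = leftover_ss(empty_model),
  Height2Group = leftover_ss(Height2Group_model),
  Height       = leftover_ss(Height_model)
)
print(leftover)
```

In this toy example, as in the table above, the empty model leaves the most error and the Height model leaves the least.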

SS Model

SS Model is the amount by which the error is reduced under the complex model (e.g., the Height model) compared with the empty model. As developed previously for group models, SS Model is easily calculated by subtracting SS Error from SS Total. The calculation is the same whether you are fitting a group model or a regression model.
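The subtraction can be checked against the two ANOVA tables above. The sketch below uses only the SS values printed in those tables; the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA tables above
ss_total <- 11880.211
ss_error <- c(Height = 10063.349, Height2Group = 11049.331)

# SS Model = SS Total - SS Error, for each model
ss_model <- ss_total - ss_error
print(round(ss_model, 3))
# Height: 1816.862, Height2Group: 830.880 (the Model rows in the tables)

# Dividing by SS Total gives the proportion of error reduced (the PRE column)
pre <- ss_model / ss_total
print(round(pre, 4))
# Height: 0.1529, Height2Group: 0.0699
```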

It is also possible to calculate SS Model in the regression model directly, in much the same way we did for the group model. We simply take each person’s predicted score under the regression model and calculate its distance from the prediction of the empty model. This is the amount by which the model has reduced each person’s error compared with the empty model. We then square these distances and add them up to get SS Model.

[Figure: two panels, "Height2Group model’s error reduced" and "Height model’s error reduced"]
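The direct calculation of SS Model for a regression model might look like the sketch below. It again uses a small invented data set rather than the Fingers data, and compares the direct result with the one obtained by subtraction.

```r
# Toy data, invented for illustration (not the Fingers data set)
toy <- data.frame(
  Thumb  = c(55, 58, 57, 61, 60, 64, 63, 66),
  Height = c(60, 62, 64, 66, 68, 70, 72, 74)
)

Height_model <- lm(Thumb ~ Height, data = toy)

# The empty model predicts the mean of Thumb for everyone
empty_prediction <- mean(toy$Thumb)

# Direct route: distance of each regression prediction from the
# empty model's prediction, squared and summed
ss_model_direct <- sum((predict(Height_model) - empty_prediction)^2)

# Subtraction route: SS Total minus SS Error
ss_total <- sum((toy$Thumb - empty_prediction)^2)
ss_error <- sum(resid(Height_model)^2)
ss_model_subtraction <- ss_total - ss_error

print(c(direct = ss_model_direct, subtraction = ss_model_subtraction))
```

The two routes give the same SS Model here, up to floating-point rounding.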