
Statistics and Data Science: A Modeling Approach

8.5 Assessing Model Fit with Sum of Squares

Finally, let’s examine the fit of our regression model by running the supernova() function on it. At the same time, let’s compare the table we get for the regression model (Height.model) with the one we produced earlier for Height2Group.model.

supernova(Height2Group.model)
supernova(Height.model)

Height2Group Model

Analysis of Variance Table
Outcome variable: Thumb
Model: lm(formula = Thumb ~ Height2Group, data = Fingers)
 
                                SS  df      MS      F    PRE     p
 ----- ----------------- --------- --- ------- ------ ------ -----
 Model (error reduced) |   830.880   1 830.880 11.656 0.0699 .0008
 Error (from model)    | 11049.331 155  71.286                    
 ----- ----------------- --------- --- ------- ------ ------ -----
 Total (empty model)   | 11880.211 156  76.155

Height Model

Analysis of Variance Table
Outcome variable: Thumb
Model: lm(formula = Thumb ~ Height, data = Fingers)
 
                               SS  df       MS      F    PRE     p
 ----- ----------------- -------- --- -------- ------ ------ -----
 Model (error reduced) |   1816.9   1 1816.862 27.984 0.1529 .0000
 Error (from model)    |  10063.3 155   64.925                    
 ----- ----------------- -------- --- -------- ------ ------ -----
 Total (empty model)   |  11880.2 156   76.155

Remember, the total sum of squares is the sum of squared deviations (or, more generally, residuals) from the empty model. SS Total depends only on the outcome variable. It isn’t affected by the explanatory variable or variables. When we compare statistical models, as we are doing here, we are always modeling the same outcome variable. That is why SS Total is the same (11880.2, after rounding) in both tables above.
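To see that SS Total depends only on the outcome, here is a small sketch in base R. It assumes the six-student applet data from the next subsection, stored in a hypothetical data frame called `thumb_data`. Because this is not the full Fingers data, the numbers will not match the tables above. The sketch computes SS Total from the empty model. It then adds up the `Sum Sq` column of each model's ANOVA table, using base R's `anova()` rather than `supernova()`.

```r
# Hypothetical data frame holding the six-student applet data from this section
# (an illustrative subset, not the full Fingers data frame)
thumb_data <- data.frame(
  Height2Group = factor(c(0, 0, 1, 0, 1, 1)),  # treat the 0/1 codes as two groups
  Height       = c(62, 66, 67, 63, 68, 71),
  Thumb        = c(56, 60, 61, 63, 64, 68)
)

# Empty model: predicts the Grand Mean of Thumb for everyone
empty_model <- lm(Thumb ~ 1, data = thumb_data)
ss_total <- sum(resid(empty_model)^2)

# Two different complex models of the same outcome variable
group_model  <- lm(Thumb ~ Height2Group, data = thumb_data)
height_model <- lm(Thumb ~ Height, data = thumb_data)

# Adding the Model and Error rows of each ANOVA table recovers the total.
# The total is identical for both models because it depends only on Thumb.
total_group  <- sum(anova(group_model)[["Sum Sq"]])
total_height <- sum(anova(height_model)[["Sum Sq"]])

cat("SS Total (empty model):  ", ss_total, "\n")
cat("Total from group model:  ", total_group, "\n")
cat("Total from height model: ", total_height, "\n")
```

For these six students, SS Total is 82 under every model. The empty model's prediction, the Grand Mean of 62, does not change when an explanatory variable is added.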

Partitioning Sums of Squares

If you want to try out the applet Dr. Ji uses in this video, you can find it at http://www.rossmanchance.com/applets/RegShuffle.htm. Copy and paste the data below into the small “sample data” box to reproduce Ji’s examples.

Height2Group Thumb
0 56
0 60
1 61
0 63
1 64
1 68

 

Height Thumb
62 56
66 60
67 61
63 63
68 64
71 68

For any model with an explanatory variable (what we have been calling “complex models”), the SS Total can be partitioned into the SS Error and the SS Model. The SS Model is the amount by which the error is reduced under the complex model (e.g., the Height model) compared with the empty model.
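Written as an equation, the partition is SS Total = SS Model + SS Error. As a worked check, the rows of the two ANOVA tables above satisfy it. The Height2Group table reports three decimals, and the Height table rounds SS to one decimal.

```latex
\begin{aligned}
\text{SS}_{\text{Total}} &= \text{SS}_{\text{Model}} + \text{SS}_{\text{Error}} \\[4pt]
\text{Height2Group model:}\quad 11880.211 &= 830.880 + 11049.331 \\
\text{Height model:}\quad 11880.2 &= 1816.9 + 10063.3
\end{aligned}
```

The total is fixed. The Height model leaves less in the Error row (10063.3 versus 11049.331), so more of the total ends up in the Model row.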

As we saw earlier for group models, SS Model can be calculated simply by subtracting SS Error from SS Total. This works the same way whether you are fitting a group model or a regression model. The only difference is how error is defined. In a group model, error is the residuals from the group means. In a regression model, error is the residuals from the regression line.
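The subtraction can be sketched in base R on the same six-student applet data, again stored in a hypothetical `thumb_data` data frame rather than the full Fingers data. The sketch computes SS Error from `resid()` for each model and then takes SS Model as SS Total minus SS Error. The code is identical for both models. Only the residuals differ: deviations from group means for one model, and deviations from the regression line for the other.

```r
# Hypothetical data frame holding the six-student applet data from this section
thumb_data <- data.frame(
  Height2Group = factor(c(0, 0, 1, 0, 1, 1)),
  Height       = c(62, 66, 67, 63, 68, 71),
  Thumb        = c(56, 60, 61, 63, 64, 68)
)

# SS Total: squared deviations from the Grand Mean (same for both models)
ss_total <- sum((thumb_data$Thumb - mean(thumb_data$Thumb))^2)

# Group model: residuals are deviations from each group's mean
group_model    <- lm(Thumb ~ Height2Group, data = thumb_data)
ss_error_group <- sum(resid(group_model)^2)

# Regression model: residuals are deviations from the regression line
height_model    <- lm(Thumb ~ Height, data = thumb_data)
ss_error_height <- sum(resid(height_model)^2)

# SS Model by subtraction, computed identically for both kinds of model
ss_model_group  <- ss_total - ss_error_group
ss_model_height <- ss_total - ss_error_height

cat("Group model:      SS Model =", ss_model_group,  " SS Error =", ss_error_group,  "\n")
cat("Regression model: SS Model =", ss_model_height, " SS Error =", ss_error_height, "\n")
```

For these six students, the regression line leaves less error than the group means do. This echoes the pattern in the two Fingers tables.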

It is also possible to calculate SS Model for the regression model directly, in much the same way we did for the group model. Recall that for the group model, SS Model was the sum of the squared deviations of each person’s predicted score (their group mean) from the Grand Mean. In the regression model, SS Model is calculated in exactly the same way, except that each person’s predicted score is the point on the regression line at their value of the explanatory variable. The Grand Mean is the same in both cases.
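Here is a sketch of the direct calculation in base R, on the same six-student applet data in a hypothetical `thumb_data` data frame. It assumes `fitted()` as the source of each person's predicted score. For the group model, that score is the person's group mean. For the regression model, it is the point on the line. The sketch then checks that the direct result matches SS Total minus SS Error.

```r
# Hypothetical data frame holding the six-student applet data from this section
thumb_data <- data.frame(
  Height2Group = factor(c(0, 0, 1, 0, 1, 1)),
  Height       = c(62, 66, 67, 63, 68, 71),
  Thumb        = c(56, 60, 61, 63, 64, 68)
)
grand_mean <- mean(thumb_data$Thumb)

group_model  <- lm(Thumb ~ Height2Group, data = thumb_data)
height_model <- lm(Thumb ~ Height, data = thumb_data)

# Direct SS Model: squared deviations of each predicted score from the Grand Mean
ss_model_group_direct  <- sum((fitted(group_model)  - grand_mean)^2)
ss_model_height_direct <- sum((fitted(height_model) - grand_mean)^2)

# The same quantities by subtraction: SS Total minus SS Error
ss_total <- sum((thumb_data$Thumb - grand_mean)^2)
ss_model_group_sub  <- ss_total - sum(resid(group_model)^2)
ss_model_height_sub <- ss_total - sum(resid(height_model)^2)

cat("Group model:      direct =", ss_model_group_direct,  " subtraction =", ss_model_group_sub,  "\n")
cat("Regression model: direct =", ss_model_height_direct, " subtraction =", ss_model_height_sub, "\n")
```

Because `fitted()` returns the predicted score for any `lm` model, the same line of code handles both cases. Only the source of the predictions changes.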