8.4 The F Ratio

On the prior page we discussed the limits of PRE as a measure of model fit. We can “overfit” a model by adding a lot of parameters to it. PRE alone, therefore, is not a sufficient guide in our quest to document a reduction in error. Yes, it tells us whether we are reducing error, but it does not take into account the cost of that reduction. The F ratio addresses this problem: it indicates how much error a model reduces, adjusted for the number of parameters spent to achieve that reduction.

To see how the F ratio is calculated, let’s go back to the analysis of variance table for the Height2Group model (reprinted below). We have already discussed how we interpret the SS column. Let’s now look at the next three columns in the table: df, MS, and F. Just a note: df stands for degrees of freedom, MS stands for Mean Square, and F, well, that stands for the F ratio.

Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height2Group

                               SS  df      MS      F    PRE     p
----- --------------- | --------- --- ------- ------ ------ -----
Model (error reduced) |   830.880   1 830.880 11.656 0.0699 .0008
Error (from model)    | 11049.331 155  71.286
----- --------------- | --------- --- ------- ------ ------ -----
Total (empty model)   | 11880.211 156  76.155                    
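To make the table concrete, here is a minimal R sketch of how it can be produced, assuming the supernova package (which bundles the Fingers data) and dplyr are installed, and that Height2Group splits the students into a shorter and a taller half, as in earlier chapters:

library(supernova)
library(dplyr)

# Divide the 157 students into two height groups (1 = shorter, 2 = taller)
Fingers$Height2Group <- factor(ntile(Fingers$Height, 2))

# Fit the Height2Group model and print its ANOVA table
Height2Group_model <- lm(Thumb ~ Height2Group, data = Fingers)
supernova(Height2Group_model)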

Degrees of Freedom (\(\text{df}\))

Technically, the degrees of freedom is the number of independent pieces of information that went into calculating a parameter estimate (e.g., \(b_0\) or \(b_1\)). But we find it helpful to think about degrees of freedom (also called \(df\)) as a budget. The more data (represented by \(n\)) you have, the more degrees of freedom you have, which you can use to estimate more parameters (i.e., build more complex models).

In the Fingers data, there are 157 students. When we estimated the single parameter for the empty model (\(b_0\)), we used 1 \(\text{df}\), leaving a balance of 156 \(\text{df}\) to spend (called \(\text{df}_\text{total}\)). The Height2Group model required us to estimate one additional parameter (\(b_1\)), which cost one additional \(\text{df}\). This is why, in the ANOVA table, \(\text{df}_\text{model}\) is 1. After fitting the Height2Group model we are left with 155 \(\text{df}\) (called \(\text{df}_\text{error}\)).
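To summarize the \(\text{df}\) bookkeeping for this model:

\[\text{df}_\text{total} = n - 1 = 157 - 1 = 156\]

\[\text{df}_\text{model} = 1\]

\[\text{df}_\text{error} = \text{df}_\text{total} - \text{df}_\text{model} = 156 - 1 = 155\]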

Mean Square (\(MS\))

The column labeled MS stands for mean square (also referred to as variance).

MS is calculated by dividing SS by degrees of freedom for each row of the table.

\[\text{MS}_\text{Model} = \text{SS}_\text{Model}/\text{df}_\text{Model}\]

\[\text{MS}_\text{Error} = \text{SS}_\text{Error}/\text{df}_\text{Error}\]

\[\text{MS}_\text{Total} = \text{SS}_\text{Total}/\text{df}_\text{Total}\]
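Plugging in the numbers from the ANOVA table:

\[\text{MS}_\text{Model} = 830.880/1 = 830.880\]

\[\text{MS}_\text{Error} = 11049.331/155 = 71.286\]

\[\text{MS}_\text{Total} = 11880.211/156 = 76.155\]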

Starting again with the bottom row, MS Total tells us how much error there is in the outcome variable, per degree of freedom, after fitting the empty model. MS Error tells us how much error still remains, per degree of freedom, after fitting the Height2Group model. MS Model represents the reduction in error by the Height2Group model per degree of freedom spent beyond the empty model.

The F Ratio

Now let’s get to the F ratio. In our table, we have produced two different estimates of variance under the Height2Group model: MS Model and MS Error.

MS Model tells us the variance of the predictions generated by the Height2Group model; MS Error tells us the variance of the residuals after subtracting out the model. The F ratio is calculated as MS Model divided by MS Error:

\[F = \frac{\text{MS}_\text{Model}}{\text{MS}_\text{Error}} = \frac{\text{SS}_\text{Model}/\text{df}_\text{Model}}{\text{SS}_\text{Error}/\text{df}_\text{Error}}\]
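Using the values from the table:

\[F = \frac{830.880}{71.286} \approx 11.656\]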

This ratio turns out to be a very useful statistic. If there were little effect of Height2Group on thumb length, we would expect the variance among the model predictions to be approximately the same as the variance of the residuals, resulting in an F ratio of approximately 1. The larger the F ratio, the more error the model reduces per degree of freedom spent, relative to the error that remains.
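One way to build intuition for this is to deliberately break any relationship between the two variables. Here is a sketch, assuming the mosaic package (whose shuffle() function randomly permutes a variable) is installed along with supernova:

library(mosaic)
library(supernova)

# Shuffle Height2Group so it has no real relationship with Thumb,
# then refit the model; the resulting F should hover around 1
shuffled_model <- lm(Thumb ~ shuffle(Height2Group), data = Fingers)
supernova(shuffled_model)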

Another way that we can think about F is as the variance between groups (because the group means are, in fact, the model predictions) divided by the variance within groups (the variation around the group means). If the variation between groups is greater than the variation within groups, the F ratio rises above 1.

Just as variance provides a way to adjust the sum of squares based on degrees of freedom, the F ratio provides a way to take degrees of freedom into account when judging the fit of a model. The F ratio gives us a sense of whether the degrees of freedom that we spent in order to make our model more complicated were “worth it”.

Another Way of Thinking About the F Ratio

There is another way of thinking about F that makes clearer the relationship between F and PRE. It is represented by this alternative formula for F:

\[F = \frac{\text{PRE}/\text{df}_\text{model}}{(1-\text{PRE})/\text{df}_\text{error}}\]

This formula produces the same result as the formula for F presented in the previous section, but makes it easier to think about the relation between PRE and F.
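For the Height2Group model, plugging in PRE = 0.0699, \(\text{df}_\text{model} = 1\), and \(\text{df}_\text{error} = 155\):

\[F = \frac{0.0699/1}{(1-0.0699)/155} = \frac{0.0699}{0.0060} \approx 11.65\]

which matches the F in the table (11.656), up to rounding in PRE.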

The numerator of this formula gives us an indicator of how much PRE we have achieved in our model per degree of freedom spent (i.e., the number of parameters estimated beyond the empty model). In the case of the Height2Group model, it would simply be PRE divided by 1, because the model used only one additional degree of freedom (to estimate \(b_1\)) beyond the empty model.

The denominator of the formula tells us how much error could still be reduced (i.e., the remaining unexplained error, \(1-\text{PRE}\)) per degree of freedom if we were to put all still-unused degrees of freedom (\(\text{df}_\text{error}\)) into the model. In other words, it tells us what the PRE would be, on average, if we just randomly picked a parameter to estimate instead of the one that we picked for our model.

The F ratio, thought of this way, compares the amount of PRE achieved by the particular parameters we included in our model (per parameter) to the average amount of remaining unexplained variation that could have been explained by adding all the possible remaining parameters into the model.

Put another way, the F ratio answers this question: How many times bigger is the PRE obtained by our best fitting model (per degree of freedom spent) than the PRE that could have been obtained (again, per degree of freedom) by spending all of the possible remaining degrees of freedom?
