8.4 The F Ratio
On the prior page we discussed the limits of PRE as a measure of model fit. We can “overfit” a model by adding a lot of parameters to it. PRE alone, therefore, is not a sufficient guide in our quest to document a reduction in error. Yes, it tells us whether we are reducing error. But it does not take into account the cost of that reduction. The F ratio provides a solution to this problem, giving us an indicator of the amount of error reduced by a model that adjusts for the number of parameters it takes to realize the reduction in error.
To see how the F ratio is calculated, let's go back to the analysis of variance table for the Height2Group model (reprinted below). We have already discussed how we interpret the SS column. Let's now look at the next three columns in the table: df, MS, and F. Just a note: df stands for degrees of freedom, MS stands for Mean Square, and F, well, that stands for the F ratio.
Analysis of Variance Table (Type III SS)
Model: Thumb ~ Height2Group

                              SS   df        MS       F     PRE      p
Model (error reduced)    830.880    1   830.880  11.656  0.0699  .0008
Error (from model)     11049.331  155    71.286
Total (empty model)    11880.211  156    76.155
Degrees of Freedom (\(df\))
Technically, the degrees of freedom is the number of independent pieces of information that went into calculating a parameter estimate (e.g., \(b_0\) or \(b_1\)). But we find it helpful to think about degrees of freedom (also called \(df\)) as a budget. The more data (represented by \(n\)) you have, the more degrees of freedom you have, which you can use to estimate more parameters (i.e., build more complex models).
In the Fingers data, there are 157 students. When we estimated the single parameter for the empty model (\(b_0\)), we used 1 \(df\), leaving a balance of 156 \(df\) left to spend (called \(df_{total}\)). The Height2Group model required us to estimate one additional parameter (\(b_1\)), which cost us one additional \(df\). This is why, in the ANOVA table, \(df_{model}\) is 1. After fitting the Height2Group model, we are left with 155 \(df\) (also called \(df_{error}\)).
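The degrees-of-freedom "budget" is simple arithmetic. Here is a minimal sketch in Python (the book's own analyses use R; this just mirrors the bookkeeping for the Fingers data):

```python
# Degrees-of-freedom budget for the Fingers data
n = 157                          # number of students

# Estimating b0 for the empty model spends 1 df
df_total = n - 1                 # 156 df left to spend

# The Height2Group model adds one parameter (b1)
df_model = 1
df_error = df_total - df_model   # 155 df remain after fitting

print(df_total, df_model, df_error)
```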
Mean Square (\(MS\))
The column labeled MS stands for mean square (also referred to as variance).
MS is calculated by dividing SS by degrees of freedom for each row of the table.
\[\text{MS}_\text{Model} = \text{SS}_\text{Model}/\text{df}_\text{Model}\]
\[\text{MS}_\text{Error} = \text{SS}_\text{Error}/\text{df}_\text{Error}\]
\[\text{MS}_\text{Total} = \text{SS}_\text{Total}/\text{df}_\text{Total}\]
Starting again with the bottom row, MS Total tells us how much error there is in the outcome variable, per degree of freedom, after fitting the empty model. MS Error tells us how much error still remains, per degree of freedom, after fitting the Height2Group model. MS Model represents the reduction in error by the Height2Group model per degree of freedom spent beyond the empty model.
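Each MS is just SS divided by df, row by row. As a quick check, here is a Python sketch using the SS and df values from the ANOVA table above:

```python
# SS and df values from the ANOVA table for Thumb ~ Height2Group
ss = {"model": 830.880, "error": 11049.331, "total": 11880.211}
df = {"model": 1, "error": 155, "total": 156}

# MS = SS / df for each row of the table
ms = {row: ss[row] / df[row] for row in ss}

# matches the MS column: 830.880, 71.286, 76.155
print({row: round(value, 3) for row, value in ms.items()})
```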
The F Ratio
Now let's get to the F ratio. In our table, we have produced two different estimates of variance under the Height2Group model: MS Model and MS Error.
MS Model tells us the variance of the predictions generated by the Height2Group model; MS Error tells us the variance of the residuals after subtracting out the model. The F ratio is calculated as MS Model divided by MS Error:
\[F = \frac{\text{MS}_\text{Model}}{\text{MS}_\text{Error}} = \frac{\text{SS}_\text{Model}/\text{df}_\text{Model}}{\text{SS}_\text{Error}/\text{df}_\text{Error}}\]
This ratio turns out to be a very useful statistic. If there were little effect of Height2Group on thumb length, we would expect the variance among the model predictions to be approximately the same as the variance of the residuals, resulting in an F ratio of approximately 1. The larger the F ratio, the more error the model has reduced relative to the degrees of freedom it spent.
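To make the calculation concrete, here is a small Python sketch that recomputes F from the SS and df values in the table above (the book fits these models in R; this is just the arithmetic):

```python
# SS and df values from the ANOVA table for Thumb ~ Height2Group
ss_model, df_model = 830.880, 1
ss_error, df_error = 11049.331, 155

ms_model = ss_model / df_model   # variance of the model's predictions
ms_error = ss_error / df_error   # variance of the residuals

f_ratio = ms_model / ms_error
print(round(f_ratio, 3))         # matches the F column: 11.656
```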
Another way that we can think about F is as the variance between groups (because the group means are in fact the model predictions) divided by the variance within groups (the variation around the group means). If the groups did not differ, these two variances would be about equal; but if the variation between groups is greater than the variation within groups, the F ratio rises above 1.
Just as variance provides a way to adjust the sum of squares based on degrees of freedom, the F ratio provides a way to take degrees of freedom into account when judging the fit of a model. The F ratio gives us a sense of whether the degrees of freedom that we spent in order to make our model more complicated were “worth it”.
Another Way of Thinking About the F Ratio
There is another way of thinking about F that makes clearer the relationship between F and PRE. It is represented by this alternative formula for F:
\[F = \frac{\text{PRE}/\text{df}_\text{model}}{(1-\text{PRE})/\text{df}_\text{error}}\]
This formula produces the same result as the formula for F presented in the previous section, but makes it easier to think about the relation between PRE and F.
The numerator of this formula gives us an indicator of how much PRE we have achieved in our model per degree of freedom spent (i.e., number of parameters estimated beyond the empty model). In the case of the Height2Group model, it would simply be PRE divided by 1, because the model used only one additional degree of freedom (\(b_1\)) beyond the empty model.
The denominator of the formula tells us how much error could still be reduced (i.e., the remaining unexplained error, \(1-\text{PRE}\)) per degree of freedom if we were to put all still-unused degrees of freedom (\(\text{df}_\text{error}\)) into the model. In other words, it tells us what the PRE would be, on average, if we just randomly picked a parameter to estimate instead of the one that we picked for our model.
The F ratio, thought of this way, compares the amount of PRE achieved by the particular parameters we included in our model (per parameter) to the average amount of remaining unexplained variation that could have been explained by adding all the possible remaining parameters into the model.
Put another way, the F ratio answers this question: How many times bigger is the PRE obtained by our best fitting model (per degree of freedom spent) than the PRE that could have been obtained (again, per degree of freedom) by spending all of the possible remaining degrees of freedom?
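The equivalence of the two formulas is easy to verify numerically with the values from the ANOVA table above; here is a Python sketch (again, just arithmetic on the table values):

```python
# SS and df values from the ANOVA table for Thumb ~ Height2Group
ss_model, ss_total = 830.880, 11880.211
df_model, df_error = 1, 155
ss_error = ss_total - ss_model           # 11049.331

# F computed from mean squares
f_ms = (ss_model / df_model) / (ss_error / df_error)

# F computed from PRE
pre = ss_model / ss_total                # about 0.0699
f_pre = (pre / df_model) / ((1 - pre) / df_error)

print(round(f_ms, 3), round(f_pre, 3))  # both match the table: 11.656
```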