7.10 Using Proportional Reduction in Error (PRE) to Compare Two Models
We have now quantified how much variation has been explained by our model: 1,334 square millimeters. But is that a lot of explained variation, or just a little? It would be easier to interpret if we knew the proportion of total error that has been reduced, rather than the raw amount of error reduced, measured in \(mm^2\).
If you take another look at the supernova() table (reproduced below) for the Gender_model, you will see a column labeled PRE. PRE stands for Proportional Reduction in Error.
Analysis of Variance Table (Type III SS)
Model: Thumb ~ Gender

                               SS  df       MS      F    PRE     p
----- --------------- | --------- --- -------- ------ ------ -----
Model (error reduced) |  1334.203   1 1334.203 19.609 0.1123 .0000
Error (from model)    | 10546.008 155   68.039
----- --------------- | --------- --- -------- ------ ------ -----
Total (empty model)   | 11880.211 156   76.155
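As a reminder, a table like this can be generated with a couple of lines of R (a minimal sketch, assuming the supernova package and the Fingers data frame have been loaded, as elsewhere in the course):

# Fit the gender model and print its ANOVA table
Gender_model <- lm(Thumb ~ Gender, data = Fingers)
supernova(Gender_model)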
PRE is calculated using the sums of squares. It is simply SS Model (i.e., the sum of squares reduced by the model) divided by SS Total (i.e., the total sum of squares in the outcome variable under the empty model). We can represent this in a formula:
\[\text{PRE}=\frac{\text{SS}_\text{Model}}{\text{SS}_\text{Total}}\]
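We can check this against the numbers in the table (a quick calculation in R, though any calculator would do): SS Model divided by SS Total gives the PRE reported by supernova().

1334.203 / 11880.211
[1] 0.1123047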
When we calculate PRE this way we are comparing a complex model (e.g., the gender model) to the empty model. Based on this formula, PRE can be interpreted as the proportion of total variation in the outcome variable that is explained by the explanatory variable. It tells us something about the overall strength of our statistical model. For example, in the Fingers data set, the effect of Gender on Thumb accounts for .11 (11%) of the variation in thumb length. Not too shabby.
It is important to remember that SS Model in the numerator of the formula above represents the reduction in error when going from the empty model to the more complex model, which includes an explanatory variable. To make this clearer we can rewrite the above formula like this:
\[\text{PRE}=\frac{(\text{SS}_\text{Total} - \text{SS}_\text{Error})}{\text{SS}_\text{Total}}\]
The numerator of this formula starts with the error from the simple (empty) model (SS Total), and then subtracts the error from the complex model (SS Error) to get the error reduced by the complex model. Dividing this reduction in error by the SS Total yields the proportion of total error in the empty model that has been reduced by the complex model.
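Again, we can verify this with the numbers from the table: subtracting the gender model's error (SS Error) from the empty model's error (SS Total) recovers the 1,334 square millimeters of error reduced, and dividing by SS Total gives the same PRE as before.

(11880.211 - 10546.008) / 11880.211
[1] 0.1123047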
The PRE in the ANOVA table above (.11) represents a comparison of the gender model to the empty model, but PRE can, more generally, represent a comparison of any complex model to a simpler one. To capture this, here is a more general version of the same formula:
\[\text{PRE}=\frac{(\text{SS}_\text{Error from Simple Model} - \text{SS}_\text{Error from Complex Model})}{\text{SS}_\text{Error from Simple Model}}\]
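To make the general version concrete, here is a minimal sketch in R of how you could compute PRE by hand for a pair of models, one simple and one complex (assuming, as elsewhere in the course, that the Fingers data frame is loaded). The resid() function returns a model's residuals; squaring and summing them gives that model's sum of squared error.

# Fit the simple (empty) model and a more complex model
empty_model  <- lm(Thumb ~ NULL, data = Fingers)
Gender_model <- lm(Thumb ~ Gender, data = Fingers)

# Sum of squared error for each model
SS_simple  <- sum(resid(empty_model)^2)    # SS Total: 11880.211
SS_complex <- sum(resid(Gender_model)^2)   # SS Error: 10546.008

# Proportional reduction in error from simple to complex
(SS_simple - SS_complex) / SS_simple       # 0.1123

Any simpler model and more complex model could be substituted for the empty model and the gender model here; the arithmetic stays the same.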
Just as a note: PRE goes by other names in other traditions. In the ANOVA tradition (Analysis of Variance) it is referred to as \(\eta^2\), or eta squared. In an upcoming chapter, we will introduce the same concept in the context of regression, where it is called \(R^2\). For now all you need to know is: these are different terms used to refer to the same thing, in case anyone asks you.