Course Outline

list High School / Statistics and Data Science II (XCD)

Book
  • College / Advanced Statistics and Data Science (ABCD)
  • College / Statistics and Data Science (ABC)
  • High School / Advanced Statistics and Data Science I (ABC)
  • High School / Statistics and Data Science I (AB)
  • High School / Statistics and Data Science II (XCD)

3.12 Comparing Models with PRE

Proportion Reduction in Error (PRE)

One disadvantage of using sums of squares to compare models is that they only make sense when comparing models that use the same outcome variable (e.g., PriceK) and the same data set (e.g., Ames). Just having a larger sample size in your data set will increase your sums of squares. That doesn’t mean, though, that your model is getting better or worse.

A more general way to characterize how good a model fits is by the proportion of the total variation (SS Total) that is explained, or reduced, by the complex model (SS Model). This metric is called the Proportion Reduction in Error, or PRE. You will find it in the supernova table.


Analysis of Variance Table (Type III SS)
Model: PriceK ~ Neighborhood
 
                                 SS  df         MS       F    PRE     p
 ----- --------------- | ---------- --- ---------- ------- ------ -----
 Model (error reduced) | 260904.012   1 260904.012 128.068 0.4117 .0000
 Error (from model)    | 372813.204 183   2037.231                     
 ----- --------------- | ---------- --- ---------- ------- ------ -----
 Total (empty model)   | 633717.215 184   3444.115                    

PRE is literally the proportion of error that is reduced by the complex model over the empty model. The total error to be explained is represented by SST. This can be thought of as 100% of the error in PriceK that potentially could be explained.

If we take the part of that total error that can be explained (or reduced) by Neighborhood (SSM) and then divide it by the total error in the outcome variable (SST) we will get PRE.

Total variation in PriceK is represented by the blue circle in the figure below, the part explained by Neighborhood is the overlap between the two circles (striped area). If we take SS Model (the striped area) and divide by SS Total (the whole blue circle), we get PRE.

Diagram showing the partitioning of Sum of Squares. On the left, a single circle represents SS Total from the Empty Model of Price. An arrow from that circle points to a Venn diagram of two partially overlapping circles to the right. One circle is labeled as SS Error from the Neighborhood Model of Price, and the other circle is labeled as Neighborhood. The intersection where the two circles overlap is labeled as SS Model, the Error Reduced by Neighborhood.

We can summarize the calculation of PRE with this formula:

\[P\!R\!E = \frac{S\!S_{Model}}{S\!S_{Total}}\]

A PRE of .41 tells us that a lot of the variation in price is explained by the neighborhood model. (In many fields of research, it’s exciting to get a PRE as high as .05.)

Responses