Statistics and Data Science: A Modeling Approach

11.7 F-Distribution and t-Distribution
Shapes of the F-Distribution
The shape of the F-distribution varies quite a bit depending on the degrees of freedom (df1 and df2). To illustrate, look at the plots below. On the left, we have depicted three F-distributions that have the same df1 (df1 = 2) but differ in df2 (2, 12, 1000). On the right, we have held df2 constant at 1000 and varied df1 (1, 5, 30).
When df1 (i.e., df Model) is held constant (left panel of the figure), the number of parameters estimated for the model is held constant. For the three-group model, df1 = 2, because the model estimates two parameters beyond the one estimated by the empty model. We can see that changing the sample size, and thus the value of df2 (i.e., df Error), has only a slight effect on the shape of the F-distribution when df1 is held constant. Even at a df2 of 12 (the blue line), the distribution is very similar to the F-distribution where df2 is 1000 (the black line). Once df2 gets above 30 or so, the shape barely changes at all.
Changing the number of parameters estimated for the model (df Model), on the other hand, has a more profound influence on the shape of the F-distribution. In the right panel of the figure above, where we hold the sample size constant at a fairly large df2 of 1000, increasing df1 from 1 to 5 to 30 produces a big difference in shape. As the number of parameters goes up (e.g., as high as 30), the F-distribution starts to look almost normal in shape.
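One way to get a feel for this pattern is to simulate it. An F statistic can be generated as a ratio of two independent chi-square variables, each divided by its degrees of freedom. The sketch below (written in Python rather than the book's R, and using only the standard library) draws F values with df1 = 1 versus df1 = 30, holding df2 at 1000; the variable names and the simulation itself are ours, not the book's.

```python
import random
import statistics

random.seed(11)

def r_chisq(df):
    # A chi-square variable with df degrees of freedom is
    # Gamma(shape = df/2, scale = 2)
    return random.gammavariate(df / 2, 2)

def r_f(df1, df2):
    # An F variable is a ratio of two independent chi-squares,
    # each divided by its own degrees of freedom
    return (r_chisq(df1) / df1) / (r_chisq(df2) / df2)

n = 20000
f_df1_1 = [r_f(1, 1000) for _ in range(n)]
f_df1_30 = [r_f(30, 1000) for _ in range(n)]

# Both distributions are centered near 1, but the df1 = 1 version
# is far more spread out and right-skewed than the df1 = 30 version
print(round(statistics.mean(f_df1_1), 2))
print(round(statistics.mean(f_df1_30), 2))
print(statistics.variance(f_df1_1) > statistics.variance(f_df1_30))  # True
```

Plotting histograms of the two lists reproduces the right panel of the figure: same center, very different shapes.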
The F-Distribution and the t-Distribution Are Actually the Same
We have now used one mathematical model for the sampling distribution of \(b_1\) (the t-distribution) and another for the sampling distribution of PRE and F (the F-distribution). But we found that in the tipping study, whether we use t or F, the p-value comes out exactly the same (.0762).
The reason is that, fundamentally, the F-distribution and the t-distribution are actually one and the same! If you randomly sample values from a t-distribution and then square each one, you will get exactly an F-distribution — specifically, one with df1 = 1 and df2 equal to the degrees of freedom of the t-distribution.
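This claim is easy to check by simulation. The sketch below (in Python rather than R, standard library only; our own illustration, not the book's code) generates t values with 42 degrees of freedom as a standard normal divided by the square root of a scaled chi-square, squares them, and compares their quantiles to values drawn directly from an F-distribution with df1 = 1 and df2 = 42.

```python
import random

random.seed(22)

def r_chisq(df):
    # Chi-square(df) as Gamma(shape = df/2, scale = 2)
    return random.gammavariate(df / 2, 2)

def r_t(df):
    # A t variable is a standard normal divided by the square
    # root of a chi-square over its degrees of freedom
    return random.gauss(0, 1) / (r_chisq(df) / df) ** 0.5

def r_f(df1, df2):
    # An F variable is a ratio of two scaled chi-squares
    return (r_chisq(df1) / df1) / (r_chisq(df2) / df2)

n = 50000
t_squared = sorted(r_t(42) ** 2 for _ in range(n))
f_values = sorted(r_f(1, 42) for _ in range(n))

# The sorted squared-t values and the sorted F(1, 42) values should
# agree quantile by quantile, up to sampling noise
for q in (0.25, 0.50, 0.90):
    i = int(q * n)
    print(round(t_squared[i], 2), round(f_values[i], 2))
```

Each printed pair comes out (nearly) equal: the squared t values and the F(1, 42) values describe the same distribution.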
In the graph below on the left we show the distribution of 1000 \(b_1\)s that we created using shuffle(). We know from the prior chapter that this distribution is well modelled by the t-distribution. We then squared each of the 1000 \(b_1\)s and graphed the distribution of the 1000 b1_squared values. As you can see, it now looks like the F-distribution.
In the case of the Condition model of Tip, we can calculate the t statistic using the t.test() function, and the F statistic using supernova().
t.test(Tip ~ Condition, data = TipExperiment, var.equal=TRUE)
supernova(Tip ~ Condition, data = TipExperiment)
data: Tip by Condition
t = 1.818, df = 42, p-value = 0.0762

Analysis of Variance Table (Type III SS)
Model: Tip ~ Condition

                              SS df      MS     F    PRE     p
----- ----------------- -------- -- ------- ----- ------ -----
Model (error reduced)    402.023  1 402.023 3.305 0.0729 .0762
Error (from model)      5108.955 42 121.642
----- ----------------- -------- -- ------- ----- ------ -----
Total (empty model)     5510.977 43 128.162
Notice two things. First, the p-value is exactly the same for the two-sample t-test as it is for the model comparison using F: .0762. Second, notice the values of t (1.818) and F (3.305). Guess what you would get if you squared 1.818? Yep, 3.305.
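The t² = F relationship is not an accident of this dataset; it holds exactly whenever the two statistics compare the same two models. The worked example below (in Python rather than R, with made-up numbers — NOT the actual TipExperiment data) computes the pooled two-sample t statistic the way t.test(..., var.equal = TRUE) does, then computes F for the same model comparison, and confirms they agree.

```python
import math

# Hypothetical tips (in dollars) for two conditions;
# these are NOT the actual TipExperiment data
control = [22, 27, 31, 18, 25, 29]
smiley_face = [30, 34, 26, 38, 29, 33]

n1, n2 = len(control), len(smiley_face)
m1 = sum(control) / n1
m2 = sum(smiley_face) / n2

# Pooled two-sample t statistic, as in t.test(..., var.equal = TRUE)
ss1 = sum((x - m1) ** 2 for x in control)
ss2 = sum((x - m2) ** 2 for x in smiley_face)
df_error = n1 + n2 - 2
pooled_var = (ss1 + ss2) / df_error
t = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# F statistic for the same model comparison (one-way ANOVA)
grand_mean = (sum(control) + sum(smiley_face)) / (n1 + n2)
ss_model = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
ms_model = ss_model / 1            # df1 = 1: one parameter beyond the empty model
ms_error = (ss1 + ss2) / df_error  # same quantity as pooled_var
F = ms_model / ms_error

print(abs(t ** 2 - F) < 1e-9)  # True: F is exactly t squared
```

Swap in any two groups of numbers you like; t² and F will always match, just as 1.818² matches 3.305 in the output above.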
Instead of trying to think about how these methods differ from one another (e.g., the F-test versus the t-test, or the permutation test versus mathematical functions), we want you, for now, to appreciate just how similar they are to each other. They all help us locate our parameter estimates in distributions of other estimates that could have been generated by the empty model.