11.3 Sampling Distribution of F

So far we have constructed the sampling distribution of PRE to examine the variation in PREs that could be generated by the empty model in the context of the tipping experiment. We can use the same method to construct the sampling distribution of F.

Using the sampling distribution of F is one of the most common ways to evaluate the empty model (that is, to do a Null Hypothesis Significance Test, or NHST). It is so popular, in fact, that it has its own name: the F-test. For this reason, we will spend a little time investigating the sampling distribution of F.

Just as we have an R function to directly calculate the PRE for a model, we also have one to calculate F: f(). The following line of code calculates the sample F ratio that results from fitting the Condition model to the tipping study data.

f(Tip ~ Condition, data = TipExperiment)
3.3049725526482

Armed with the f() function, we can use the same approach we used for PRE to construct the sampling distribution of F. We will use shuffle() to simulate a DGP in which the only difference between the groups is due to randomization, and then use f() to find the F for the shuffled data. We will then repeat this process many times to create the sampling distribution.
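To make the shuffle-then-calculate idea concrete, here is a minimal base R sketch of the same procedure. It makes several assumptions: the data are hypothetical (simulated, not the real TipExperiment values), base R's sample() stands in for shuffle(), and the helper functions f_ratio() and one_shuffled_f() are made up for illustration, with F read from the ANOVA table of lm() on the assumption that this matches what f() reports.

```r
# Hypothetical data: two groups of 22 tables (NOT the real TipExperiment values)
set.seed(1)
tips <- data.frame(
  Tip = round(rnorm(44, mean = 30, sd = 11)),
  Condition = rep(c("Control", "Smiley Face"), each = 22)
)

# Illustrative helper: F ratio for a one-predictor group model,
# read from the ANOVA table of the fitted lm()
f_ratio <- function(outcome, group) {
  anova(lm(outcome ~ group))[["F value"]][1]
}

# One shuffle: randomly reassign the tips to tables, then compute F
one_shuffled_f <- function(data) {
  f_ratio(sample(data$Tip), data$Condition)
}

# Repeat many times to build up a sampling distribution of F
shuffled_fs <- replicate(1000, one_shuffled_f(tips))

head(shuffled_fs)   # the first few shuffled Fs
mean(shuffled_fs)   # should land close to 1
```

Each call to one_shuffled_f() breaks any real connection between Tip and Condition, so the resulting Fs show how much F varies when only randomization is at work.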

Use the code block below to save 1000 randomly generated F ratios in a data frame called sdoF (an acronym for the sampling distribution of F). We’ve already provided some code that will display this distribution in a histogram.

require(coursekata)

# save 1000 randomly generated F ratios in a data frame called sdoF
sdoF <- do(1000) * f(shuffle(Tip) ~ Condition, data = TipExperiment)

# display the sampling distribution of F in a histogram
gf_histogram(~ f, data = sdoF, fill = "darkgoldenrod1")

Below on the right we have plotted a histogram of the randomized sampling distribution of F. On the left we have plotted the sampling distribution of PRE for purposes of comparison.

[Figure, left: A histogram of the sampling distribution of PRE. It is skewed right, with most PREs between 0 and 0.05, and the tail extends from about 0.05 to about 0.15.]

[Figure, right: A histogram of the sampling distribution of F. It is skewed right, with most Fs between 0 and 2.5, and the tail extends from about 2.5 to about 10.]

You may be happy to find that the shapes of the sampling distributions of PRE and F are very similar. Neither of these sample statistics can be negative. And a large effect of smiley face on Tip, whether positive or negative, would yield an extreme F in the upper tail of the distribution.
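The point about direction can be checked with a small base R sketch. The data are hypothetical (simulated, not the real TipExperiment values), the variable Swapped is made up for illustration, and reading F from anova(lm()) is assumed to match what f() reports. Swapping the group labels reverses the sign of the group difference but leaves F unchanged.

```r
# Hypothetical data (NOT the real TipExperiment values)
set.seed(2)
Condition <- rep(c("Control", "Smiley Face"), each = 22)
Tip <- rnorm(44, mean = 30, sd = 11) + 8 * (Condition == "Smiley Face")

# Swap the labels so the same-sized effect points in the opposite direction
Swapped <- ifelse(Condition == "Control", "Smiley Face", "Control")

fit_original <- lm(Tip ~ Condition)
fit_swapped  <- lm(Tip ~ Swapped)

# Group differences: equal in size, opposite in sign
b1_original <- unname(coef(fit_original)[2])
b1_swapped  <- unname(coef(fit_swapped)[2])

# F ratios: the same, because F does not depend on the direction of the effect
f_original <- anova(fit_original)[["F value"]][1]
f_swapped  <- anova(fit_swapped)[["F value"]][1]

c(b1_original = b1_original, b1_swapped = b1_swapped)
c(f_original = f_original, f_swapped = f_swapped)
```

Because both directions produce the same F, extreme effects of either sign pile up in the upper tail, which is why the sampling distribution of F has only one tail to look at.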

Both of these sampling distributions are based on the assumption that the empty model is true in the DGP. We have previously developed the idea that if there is no effect of smiley face in the DGP (i.e., the empty model is true) then PRE in the DGP would be equal to 0. This means that knowing which condition a table is in explains literally 0% of the variation in Tip, which is the same as saying \(\beta_1=0\).

But what would the expected value of F be if the empty model were true? F is a more difficult concept to understand, and so we won’t develop this fully here. But if the empty model were true, meaning that PRE were literally 0, then the expected value of F would be approximately 1. This is because the variance estimated using the model predictions would be roughly equal to the variance estimated based on the error within groups.
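One way to see why the expected value is close to 1 is to write F as a ratio of two variance estimates. The notation in this sketch (SS for sum of squares, MS for mean square, df for degrees of freedom) is assumed from the usual definition of the F ratio rather than taken from this page:

\[
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}} = \frac{SS_{\text{Model}} / df_{\text{Model}}}{SS_{\text{Error}} / df_{\text{Error}}}
\]

When \(\beta_1=0\), the group means differ only because of random variation, so the numerator and the denominator are both estimates of the same error variance in the DGP, and their ratio tends to land near 1.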

To confirm that this is true, you can use the code window below to calculate the mean of f for our sampling distribution of F. Because our sampling distribution, which we created using shuffle(), assumes that the empty model is true, the average of all the Fs we generated should be roughly equal to 1.

require(coursekata)

# we have created the sdoF for you
sdoF <- do(1000) * f(shuffle(Tip) ~ Condition, data = TipExperiment)

# calculate the mean of f
favstats(~ f, data = sdoF)

Because the sampling distribution of F is more commonly used than the sampling distribution of PRE, and the two are highly similar, we will focus on the sampling distribution of F. Everything we say in the next few sections, however, will also apply to PREs.
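The close link between the two statistics can be illustrated with a short base R sketch. It assumes hypothetical simulated data (not the real TipExperiment values), uses sample() in place of shuffle(), takes PRE to be the R-squared of the fitted lm(), and uses the standard conversion F = (PRE / df model) / ((1 - PRE) / df error), which is not stated on this page. The helper shuffled_pre_and_f() is made up for illustration.

```r
# Hypothetical data (NOT the real TipExperiment values)
set.seed(3)
Tip <- rnorm(44, mean = 30, sd = 11)
Condition <- rep(c("Control", "Smiley Face"), each = 22)

# Illustrative helper: for one shuffle, return PRE and F from the same model
shuffled_pre_and_f <- function() {
  fit <- lm(sample(Tip) ~ Condition)
  c(pre = summary(fit)$r.squared,
    f = anova(fit)[["F value"]][1])
}

# 1000 shuffles, one row per shuffle, with columns pre and f
results <- as.data.frame(t(replicate(1000, shuffled_pre_and_f())))

# Convert each PRE to an F using the degrees of freedom (1 and 42 here)
df_model <- 1
df_error <- length(Tip) - 2
f_from_pre <- (results$pre / df_model) / ((1 - results$pre) / df_error)

# The converted values should match the Fs computed directly
all.equal(f_from_pre, results$f)
```

Since each F can be computed from its PRE and a larger PRE always gives a larger F, a shuffled sample that is extreme on one statistic is extreme on the other as well.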
