High School / Advanced Statistics and Data Science I (ABC)
11.8 Using F to Test a Regression Model
We can use sampling distributions of \(b_1\) or \(F\) to compare the empty model against a two-group model. It turns out we can do the same for regression models. Previously, we used a sampling distribution of \(b_1\) to test the \(b_1\) in a regression model. Now let's use the F-distribution to see how the F ratio from a regression model compares to the Fs generated by the empty model of the DGP.
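As a reminder, the previous chapter's approach can be sketched like this (using the b1() and shuffle() functions that the coursekata package provides; the choice of 1000 shuffles is just a convention):

```r
# a sampling distribution of b1s generated from a DGP
# in which the true slope is 0
sdob1 <- do(1000) * b1(shuffle(Tip) ~ FoodQuality, data = TipExperiment)

# where does the sample b1 fall in this distribution?
gf_histogram(~ b1, data = sdob1)
```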
Recap: Using Food Quality to Predict Tips
In the previous chapter we created a regression model that used each table's average rating of food quality (FoodQuality) to predict the outcome Tip. We were interested in exploring the model: Tip = Food Quality + Other Stuff. To remind us of this example, here is a scatter plot of the data as well as the best-fitting regression model depicted as a blue line.
gf_point(Tip ~ FoodQuality, data = TipExperiment) %>%
gf_lm(color = "blue")
Using lm(), we fit the regression model and found the best-fitting parameter estimates:
lm(Tip ~ FoodQuality, data = TipExperiment)
Call:
lm(formula = Tip ~ FoodQuality, data = TipExperiment)
Coefficients:
(Intercept) FoodQuality
10.1076 0.3776
The best-fitting \(b_1\) estimate was 0.38, meaning that for every one-point increase in FoodQuality, Tip increased, on average, by 0.38 percentage points. This is the slope of the regression line. But while this is the best-fitting estimate of the slope based on the data, is it possible that this \(b_1\) could have been generated by the empty model, one in which the true slope in the DGP is 0?
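One concrete way to see the slope interpretation is to compare the model's predictions for two tables whose ratings differ by one point (the ratings 60 and 61 below are made-up values, chosen only for illustration):

```r
# fit the regression model
food_model <- lm(Tip ~ FoodQuality, data = TipExperiment)

# predicted tips for two hypothetical tables one rating point apart;
# the two predictions differ by b1, about 0.38
predict(food_model, newdata = data.frame(FoodQuality = c(60, 61)))
```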
In the previous chapter we asked this question using the sampling distribution of \(b_1\). But we can ask the same question using the sampling distribution of F. Let's start by finding the F for the FoodQuality model. You can do this in the code window below using the f() function.
require(coursekata)
TipExperiment <- select(TipExperiment, -Check)
# calculate the sample F for the food quality model
f(Tip ~ FoodQuality, data = TipExperiment)
4.42776650396444
The F ratio for the FoodQuality model is 4.43, which suggests that the variance explained by the model is 4.43 times the variance left unexplained by the model. Clearly, the FoodQuality model explains more variation in the data than the empty model.
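If you want to see where this ratio comes from, F can be reconstructed by hand from the sums of squares and degrees of freedom (the numbers below are taken from the supernova() table that appears later in this section):

```r
# F = MS_model / MS_error, where each MS is an SS divided by its df
SS_model <- 525.576; df_model <- 1
SS_error <- 4985.401; df_error <- 42

MS_model <- SS_model / df_model   # 525.576
MS_error <- SS_error / df_error   # about 118.70

MS_model / MS_error               # about 4.43
```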
But is it possible that the sample F could have been generated by a DGP in which there is no effect of FoodQuality on Tip? To answer this question we will need the sampling distribution of F generated under the empty model, in which the true slope of the regression line in the DGP is 0.
Creating a Sampling Distribution of F
Following the same approach we used for group models, we can use shuffle() to simulate many samples from a DGP in which there is no effect of FoodQuality on Tip, fit the FoodQuality model to each shuffled sample, and then calculate the corresponding value of F for each shuffled sample.
Write code to create a randomized sampling distribution of 1000 Fs (called sdof) under the assumption that FoodQuality has no relationship with Tip. Print the first six lines of sdof.
require(coursekata)
TipExperiment <- select(TipExperiment, -Check)
# create a sampling distribution of 1000 Fs generated from the empty model
sdof <- do(1000) * f(shuffle(Tip) ~ FoodQuality, data = TipExperiment)
# print the first 6 rows
head(sdof)
f
1 0.5689833
2 3.7782664
3 0.3390408
4 1.4000646
5 1.8874035
6 0.7876602
Use the code block below to make a histogram of the randomized F ratios.
require(coursekata)
TipExperiment <- select(TipExperiment, -Check)
# this creates a sampling distribution of 1000 Fs generated from the empty model
sdof <- do(1000) * f(shuffle(Tip) ~ FoodQuality, data = TipExperiment)
# make a histogram depicting the sdof
gf_histogram(~ f, data = sdof)
As you can see, the sampling distribution of F created under the assumption that the empty model is true is still shaped like the F distribution, even though the model is a regression model rather than a group model.
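One way to check this visually (a sketch; it assumes the gf_dhistogram() and gf_dist() helpers from ggformula, which coursekata loads) is to overlay the mathematical F distribution, with degrees of freedom 1 and 42, on a density histogram of the shuffled Fs:

```r
# density histogram of the shuffled Fs, with the theoretical
# F distribution (df1 = 1, df2 = 42) drawn on top
gf_dhistogram(~ f, data = sdof) %>%
  gf_dist("f", df1 = 1, df2 = 42, color = "red")
```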
Use the code window and the tally() function to find out how many (or what proportion) of the 1000 Fs were greater than the sample F of 4.43.
require(coursekata)
TipExperiment <- select(TipExperiment, -Check)
# we’ve saved the f value (4.43) as sample_f
sample_f <- f(Tip ~ FoodQuality, data = TipExperiment)
# we have also generated 1000 Fs from the empty model of the DGP
sdof <- do(1000) * f(shuffle(Tip) ~ FoodQuality, data = TipExperiment)
# how many Fs are larger than sample_f?
tally(~ f > sample_f, data = sdof)
f > sample_f
TRUE FALSE
41 959
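If you would rather get the proportion directly, tally() takes a format argument (this uses the mosaic version of tally() that coursekata loads):

```r
# the proportion of shuffled Fs greater than the sample F
# is our estimate of the p-value
tally(~ f > sample_f, data = sdof, format = "proportion")
```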
Only 41 of the 1000 randomized Fs (in our shuffled distribution; yours might be a little different) were greater than our sample F of 4.43. Based on this, we would estimate the p-value to be 0.041. And indeed, this is very close to the p-value we get in the ANOVA table for the regression model (reproduced below), which is calculated using the mathematical F distribution.
supernova(lm(Tip ~ FoodQuality, data = TipExperiment))
Analysis of Variance Table (Type III SS)
Model: Tip ~ FoodQuality
                               SS df      MS     F   PRE     p
 ----- --------------- | -------- -- ------- ----- ----- -----
 Model (error reduced) |  525.576  1 525.576 4.428 .0954 .0414
 Error (from model)    | 4985.401 42 118.700
 ----- --------------- | -------- -- ------- ----- ----- -----
 Total (empty model)   | 5510.977 43 128.162
Thus, it is unlikely (about a 4% chance), but not impossible, that the F value from the FoodQuality model could have been generated if the empty model were true.
Does FoodQuality Cause an Increase in Tip?
There appears to be a relationship between the average rating of food quality and the amount that a table tips. And the p-value of .041 indicates that the observed relationship would be unlikely to have appeared in the data if the empty model were true. But should the low p-value give us confidence that higher food quality actually caused the increase in Tip?
Causality cannot be inferred purely based on the results of statistical analysis; we must also take into account the research design.
When we investigated the effect of a smiley face on Tip, we had the advantage of random assignment: each table at the restaurant was randomly assigned to Condition, to either get a smiley face or not. This meant that any differences in tips could be attributed either to Condition (the one thing the researchers varied across conditions) or to randomness. A low p-value would have supported the inference that the smiley faces caused higher tips, because it would rule out randomness as a likely explanation.
Analyzing the relationship between FoodQuality and Tip is different, because this study used a correlational design: researchers measured FoodQuality without randomly assigning tables to levels of it. Without random assignment, other factors not measured by the researchers might have influenced both FoodQuality and Tip. For example, if tables with higher food quality ratings were celebrating special occasions, the celebration might have influenced both the tips and the perceived food quality.
Even in a correlational study, a low p-value can help rule out randomness as the only cause of variation in tips. But it cannot tell us that the variable FoodQuality is therefore a cause of variation in Tip. The observed effect could be due to FoodQuality or to other unmeasured variables, so a low p-value does not necessarily indicate a causal relationship. A low p-value in the context of a correlational design can rule out the empty model of the DGP, but it leaves open many other models that could have produced this pattern of variation in tips.