CourseKata - 10.9 Hypothesis Testing for Regression Models

High School / Advanced Statistics and Data Science I (ABC)


10.9 Hypothesis Testing for Regression Models

We have gone through the logic of hypothesis testing for group models. We have used shuffle() to create a sampling distribution assuming that \(\beta_1=0\), and then used the sampling distribution to calculate the probability of our sample \(b_1\) or one more extreme having come from the empty model.

Now let’s apply the same ideas to regression models. As you will see, the strategy is exactly the same. We still want to create a sampling distribution of \(b_1\)s, though this time the \(b_1\) will represent a slope, not a group difference. Let’s see how this works by adding a new variable to the tipping experiment data frame.

Tips = Food Quality + Other Stuff

We have explored the effect of a smiley face on how much people tip at a restaurant. But there surely are other factors that can help us explain the variation in tip percentage. One of these might be the perceived quality of the food. We can explore this hypothesis by looking at another variable available in the TipExperiment data frame: FoodQuality.

Each adult diner at each table was asked to rate the quality of the food on a 100-point scale. They were told to consider 50 (the middle of the scale) as “about average for this type of restaurant,” and then to go up or down the scale from there, where 100 would be the best food they’ve ever tasted in their life, and 0 would be the worst. FoodQuality is the average rating for each table of diners.

  TableID Tip Condition  FoodQuality
1       1  39   Control         54.9
2       2  36   Control         51.7
3       3  34   Control         60.5
4       4  34   Control         56.7
5       5  33   Control         51.0
6       6  31   Control         43.3

We created a scatter plot to explore the hypothesis that FoodQuality might explain some of the variation in Tip.

gf_point(Tip ~ FoodQuality, data = TipExperiment)

scatter plot of Tips as a function of FoodQuality

Modeling Variation in Tips as a Function of Food Quality

Use the code window below to fit a regression model in which FoodQuality is used to explain Tip.

require(coursekata)
TipExperiment <- select(TipExperiment, -Check)

# fit a regression model in which FoodQuality is used to explain Tip
lm(Tip ~ FoodQuality, data = TipExperiment)


Call:
lm(formula = Tip ~ FoodQuality, data = TipExperiment)

Coefficients:
 (Intercept)   FoodQuality  
     10.1076        0.3776

A .38 percentage point increase in tip for every additional point increase in FoodQuality does not seem like very much. In fact, it seems pretty close to 0. Is it possible that this \(b_1\) could have been generated by a DGP in which there is no effect of food quality, that is, a DGP where \(\beta_1=0\)? Or can we reject the empty model in favor of one in which FoodQuality does affect Tip?
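To get a feel for what a slope of .38 means, here is a minimal sketch in base R that plugs the coefficients from the output above into the model's prediction equation. The helper name predict_tip is our own (hypothetical) and is not part of coursekata.

```r
# coefficients copied from the lm() output above
b0 <- 10.1076  # intercept
b1 <- 0.3776   # slope for FoodQuality

# hypothetical helper: predicted Tip for a given FoodQuality rating
predict_tip <- function(food_quality) {
  b0 + b1 * food_quality
}

# a 1-point difference in FoodQuality changes the prediction by b1
one_point_diff <- predict_tip(51) - predict_tip(50)

# a 10-point difference changes the prediction by 10 * b1
ten_point_diff <- predict_tip(60) - predict_tip(50)

print(c(one_point = one_point_diff, ten_points = ten_point_diff))
```

Because FoodQuality is rated on a 100-point scale, a slope that looks small per point can still add up between tables that rate the food quite differently.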

Evaluating the Empty Model of the DGP

Just as we did with the Condition model, we can use shuffle() to simulate the case where the empty model is true (i.e., where the true value of the slope in the DGP is 0), create a sampling distribution of \(b_1\)s by shuffling Tip, and then use the sampling distribution to calculate the likelihood of a \(b_1\) as extreme as .38 being generated by the empty model.

In the code block below we have written code to create a scatter plot of the data. Add shuffle() around the outcome (Tip) to generate a sample of shuffled data from the empty model of the DGP and plot the data with the best-fitting regression line. Run it a few times just to see what kinds of slopes (\(b_1\)s) are generated by this DGP.

require(coursekata)
TipExperiment <- select(TipExperiment, -Check)

# starting code: scatter plot of the actual data with the best-fitting regression line
gf_point(Tip ~ FoodQuality, data = TipExperiment, color = "orangered") %>%
  gf_lm()

# modified code: shuffle(Tip) simulates data from the empty model of the DGP
gf_point(shuffle(Tip) ~ FoodQuality, data = TipExperiment, color = "orangered") %>%
  gf_lm()


6 scatter plots, description follows

The actual data from the tipping study are shown in blue (the panel in the upper left), along with the best-fitting regression line (the slope is .38). The 5 other plots (with red dots) show shuffled data, along with their best-fitting regression lines.

From the shuffled data, we saw that many of the regression lines are flatter than the line for the actual data. This makes sense given that we are simulating a DGP in which \(\beta_1=0\); we would expect many of the \(b_1\)s to be close to 0. Now let’s generate a sampling distribution of \(b_1\)s using the b1() function.

Complete the first line of code below to generate a sampling distribution of 1000 \(b_1\)s (sdob1) from the FoodQuality model fit to shuffled data. We have added some additional code to generate a histogram of the sampling distribution of \(b_1\)s and represent the sample \(b_1\) as a black dot.

require(coursekata)
TipExperiment <- select(TipExperiment, -Check)
sample_b1 <- b1(Tip ~ FoodQuality, data = TipExperiment)

# generate a sampling distribution of 1000 b1s from the shuffled data
# (the exercise starts from the incomplete line: sdob1 <- do() * b1())
sdob1 <- do(1000) * b1(shuffle(Tip) ~ FoodQuality, data = TipExperiment)

# histogram of the 1000 b1s
gf_histogram(~ b1, data = sdob1, fill = ~middle(b1, .95), bins = 100, show.legend = FALSE) %>%
  gf_point(x = sample_b1, y = 0, show.legend = FALSE)


A sampling distribution of b1, with a black dot representing the sample b1. The black dot is just outside the likely region of the distribution.

From this sampling distribution we can see that a value as extreme as .38 falls just outside the region of the sampling distribution we are considering likely. We might have thought a .38 percentage point increase per one-point increase in food quality was close to 0, but it is not one of the likely \(b_1\)s generated from a DGP where the true \(\beta_1\) is 0! This suggests the p-value is going to be relatively small.
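The histogram shows where the sample \(b_1\) falls, but we can also count how many shuffled \(b_1\)s are at least as extreme. The sketch below does this in base R rather than with the coursekata functions, so the helper names (slope, shuffled_b1s, shuffle_p_value) are our own, hypothetical ones. It assumes a two-tailed count, treating slopes far from 0 in either direction as extreme.

```r
# slope (b1) from a simple regression of outcome on predictor
slope <- function(outcome, predictor) {
  unname(coef(lm(outcome ~ predictor))[2])
}

# hypothetical helper: refit the model `reps` times,
# shuffling the outcome each time to mimic the empty model
shuffled_b1s <- function(outcome, predictor, reps = 1000) {
  replicate(reps, slope(sample(outcome), predictor))
}

# proportion of shuffled b1s at least as far from 0 as the sample b1
shuffle_p_value <- function(outcome, predictor, reps = 1000) {
  sample_b1 <- slope(outcome, predictor)
  null_b1s <- shuffled_b1s(outcome, predictor, reps)
  mean(abs(null_b1s) >= abs(sample_b1))
}
```

With coursekata loaded, a call such as shuffle_p_value(TipExperiment$Tip, TipExperiment$FoodQuality) returns a proportion that will vary a little from run to run, because each run produces a different set of shuffles.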

To make sure, let’s take a look at the p-value from the ANOVA table.

require(coursekata)
TipExperiment <- select(TipExperiment, -Check)

# here's the best-fitting FoodQuality model
FoodQuality_model <- lm(Tip ~ FoodQuality, data = TipExperiment)

# create the ANOVA table for this model
supernova(FoodQuality_model)
# Or
# supernova(lm(Tip ~ FoodQuality, data = TipExperiment))


Analysis of Variance Table (Type III SS)
Model: Tip ~ FoodQuality

                              SS df      MS     F   PRE     p
----- --------------- | -------- -- ------- ----- ----- -----
Model (error reduced) |  525.576  1 525.576 4.428 .0954 .0414
Error (from model)    | 4985.401 42 118.700                  
----- --------------- | -------- -- ------- ----- ----- -----
Total (empty model)   | 5510.977 43 128.162

The p-value is .04. If the empty model of the DGP is true, there is only about a 4% chance of getting a \(b_1\) as extreme as, or more extreme than, the observed .38 just by chance.
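The numbers in the ANOVA table are connected by simple arithmetic. As a rough check, this base R sketch recomputes MS, F, PRE, and p from the SS and df columns of the table above. The variable names are our own; pf() is base R's F distribution function.

```r
# SS and df values copied from the ANOVA table above
ss_model <- 525.576
ss_error <- 4985.401
ss_total <- 5510.977
df_model <- 1
df_error <- 42

# mean squares: SS divided by df
ms_model <- ss_model / df_model
ms_error <- ss_error / df_error

# F ratio: variation explained per df relative to leftover variation per df
f_ratio <- ms_model / ms_error

# PRE: proportion of total error reduced by the FoodQuality model
pre <- ss_model / ss_total

# p-value: upper-tail probability of an F this large when the empty model is true
p_value <- pf(f_ratio, df_model, df_error, lower.tail = FALSE)

print(round(c(F = f_ratio, PRE = pre, p = p_value), 4))
```

The results should match the F, PRE, and p columns of the table up to rounding.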

This sampling distribution of \(b_1\)s tells us that if the empty model of the DGP were true, a sample \(b_1\) like ours would be unlikely. Given that we actually got our sample, we would reject the empty model of the DGP in favor of a model that includes food quality as an explanatory variable.
