High School / Advanced Statistics and Data Science I (ABC)
10.3 Exploring the Sampling Distribution of b1
It’s hard to look at a long list of
The code below will save the 1000 shuffled b1s in a data frame called sdob1, which is an acronym for sampling distribution of b1s. (We made up this name for the data frame just to help us remember what it is. You could make up your own name if you prefer.) Add some code to this window to take a look at the first 6 rows of the data frame, and then run the code.
require(coursekata)
sdob1 <- do(1000) * b1(shuffle(Tip) ~ Condition, data = TipExperiment)
head(sdob1)
b1
1 -0.1363636
2 6.7727273
3 0.6818182
4 -0.5909091
5 -5.7727273
6 7.5000000
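To see what that one line of code is doing, here is a minimal base R sketch of the same idea that does not use the coursekata helpers. The tips data frame below is made up purely for illustration (it is not the TipExperiment data), and shuffled_b1() and sdob1_sketch are hypothetical names. The sketch assumes that b1 for a two-group model is the difference between the two group means.

```r
set.seed(10)  # make the shuffles reproducible

# Made-up data for illustration only (NOT the real TipExperiment data)
tips <- data.frame(
  Tip = c(20, 25, 31, 28, 35, 22, 27, 33, 30, 38, 24, 29),
  Condition = rep(c("Control", "Smiley Face"), each = 6)
)

# Hypothetical helper: shuffle the Tip values, then compute b1 as the
# difference in group means (Smiley Face minus Control)
shuffled_b1 <- function(data) {
  shuffled_tip <- sample(data$Tip)  # random reordering breaks any link to Condition
  mean(shuffled_tip[data$Condition == "Smiley Face"]) -
    mean(shuffled_tip[data$Condition == "Control"])
}

# Repeat the shuffle 1000 times and store the results in a data frame
sdob1_sketch <- data.frame(b1 = replicate(1000, shuffled_b1(tips)))

# Look at the first 6 rows
head(sdob1_sketch)
```

Because every shuffle is random, the six values you see will differ from run to run unless you set the seed, just as they do with the sdob1 data frame.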
In the window below, write an additional line of code to display the variation in b1 in a histogram.
require(coursekata)
# we created the sampling distribution of b1s for you
sdob1 <- do(1000) * b1(shuffle(Tip) ~ Condition, data = TipExperiment)
# visualize that distribution in a histogram
gf_histogram(~b1, data = sdob1)
Although this looks similar to other histograms you have seen in this book, it is not the same! This histogram visualizes the sampling distribution of
Because the sampling distribution is based on the empty model, where
You can see from the histogram that while it’s not impossible to generate a
Just eyeballing the histogram can give us a rough idea of the probability of getting a particular sample
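Eyeballing can be backed up with a quick count. The sketch below is only an illustration: it builds a stand-in sampling distribution by shuffling made-up tips (not the TipExperiment data), and the $2 and $6 thresholds are arbitrary values we picked. The proportion of shuffled b1s that land in a region is a rough estimate of the probability of getting a b1 in that region when the empty model is true.

```r
set.seed(20)  # make the shuffles reproducible

# Made-up tips for two groups of 6 (NOT the real TipExperiment data)
tip <- c(20, 25, 31, 28, 35, 22, 27, 33, 30, 38, 24, 29)
condition <- rep(c("Control", "Smiley Face"), each = 6)

# Stand-in sampling distribution: 1000 shuffled differences in group means
b1s <- replicate(1000, {
  shuffled <- sample(tip)
  mean(shuffled[condition == "Smiley Face"]) - mean(shuffled[condition == "Control"])
})

# Proportion of shuffled b1s that landed within $2 of zero
prop_near_zero <- mean(abs(b1s) <= 2)

# Proportion of shuffled b1s that were $6 or more away from zero
prop_far <- mean(abs(b1s) >= 6)

prop_near_zero
prop_far
```

Taking the mean of a TRUE/FALSE vector gives the proportion of TRUEs, which is why mean() works as a counting tool here.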
Using the Sampling Distribution to Evaluate the Empty Model
We used R to simulate a world where the empty model is true in order to construct a sampling distribution. Now let’s return to our original goal, to see how this sampling distribution can be used to evaluate whether the empty model might explain the data we collected, or whether it should be rejected.
The basic idea is this: using the sampling distribution of possible sample
If we judge the
Let’s see how this works in the context of the tipping study, where
Samples that are extreme in either a positive direction (e.g., average tips that are $8 higher in the smiley face group) or a negative direction (e.g., -$8, representing much lower average tips in the smiley face group) are unlikely to be generated if the true
Put another way: if we had a sample that fell in either the extreme upper tail or extreme lower tail of the sampling distribution (see figure below), we might reject the empty model as the true model of the DGP.
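Checking both tails can be sketched in a few lines of base R. This is an illustration under stated assumptions, not a fixed procedure: the tips are made up (not the TipExperiment data), in_either_tail() is a hypothetical helper, and treating the most extreme 2.5% in each tail as "extreme" is a cutoff we chose just for this example.

```r
set.seed(30)  # make the shuffles reproducible

# Made-up tips for two groups of 6 (NOT the real TipExperiment data)
tip <- c(20, 25, 31, 28, 35, 22, 27, 33, 30, 38, 24, 29)
condition <- rep(c("Control", "Smiley Face"), each = 6)

# Stand-in sampling distribution: 1000 shuffled differences in group means
b1s <- replicate(1000, {
  shuffled <- sample(tip)
  mean(shuffled[condition == "Smiley Face"]) - mean(shuffled[condition == "Control"])
})

# Assumed cutoffs: the values that mark off the lowest and highest 2.5% of b1s
cutoffs <- quantile(b1s, probs = c(0.025, 0.975))

# Hypothetical helper: TRUE if a sample b1 falls in either tail
in_either_tail <- function(sample_b1, cutoffs) {
  sample_b1 < cutoffs[[1]] || sample_b1 > cutoffs[[2]]
}

in_either_tail(8, cutoffs)    # a b1 far out on the positive side
in_either_tail(-8, cutoffs)   # a b1 far out on the negative side
in_either_tail(0.5, cutoffs)  # a b1 near the middle
```

The helper returns the same answer for a b1 of 8 and a b1 of -8 because it looks at both ends of the distribution, not just one.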
In statistics, this is commonly referred to as a two-tailed test: whether our actual sample falls in the extreme upper tail or the extreme lower tail of this sampling distribution, we would have reason to reject the empty model as the true model of the DGP. By rejecting the model in which
Of course, even if we observe a