High School / Advanced Statistics and Data Science I (ABC)
8.7 Using Shuffle to Compare Models of the DGP
Simulating the Empty Model with shuffle()
If there were truly no effect of drawing smiley faces on tips, then the empty model, in which β1 = 0, would be the true model of the DGP.
We can use the shuffle() function to investigate whether the empty model could have produced the data we observed. But now that you have learned to fit a model, we can take shuffling to a new level. Instead of shuffling the tips and looking at the resulting plots of the shuffled groups, we can fit a model to the shuffled data and calculate an estimate of b1.
In fact, we don’t even need the graphs. We can combine the b1() function with shuffle() to directly calculate the b1 for shuffled data.
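As a rough illustration of what this combination computes, here is a minimal base-R sketch that does not depend on the course’s packages. It rests on a few assumptions: tips is a small made-up data frame (not the TipExperiment data), b1_sketch() is a hypothetical helper that takes b1 to be the second coefficient estimated by lm(), and base R’s sample() stands in for shuffle().

```r
# Made-up tips for illustration only (NOT the TipExperiment data)
tips <- data.frame(
  Condition = factor(rep(c("Control", "Smiley Face"), each = 4)),
  Tip = c(20, 25, 30, 27, 31, 35, 28, 38)
)

# Hypothetical helper: b1 taken as the second coefficient lm() estimates,
# which for a two-group model is the difference between the group means
b1_sketch <- function(formula, data) unname(coef(lm(formula, data = data))[2])

# b1 for the data as collected: mean Smiley Face tip minus mean Control tip
observed_b1 <- b1_sketch(Tip ~ Condition, data = tips)

# b1 after randomly reassigning the tips to the two conditions;
# sample(Tip) plays the role of shuffle(Tip)
shuffled_b1 <- b1_sketch(sample(Tip) ~ Condition, data = tips)

observed_b1  # 7.5 for these made-up numbers
shuffled_b1  # varies from run to run
```

Because sample() is applied inside the formula, the tips are reshuffled every time the model is fit, which mirrors how shuffle(Tip) appears in the course code.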
We wrote code to directly calculate b1, first for the actual data and then for one shuffled version of the data.
6.045
0.036
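We got these two results when we ran the code. The 6.045 is familiar: it’s the b1 estimate that lm() gave us when we fit the model to the actual TipExperiment data. The 0.036 is the b1 we got when we fit the same model to a shuffled version of the data.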
Because shuffling is a random process, we will usually get a different b1 each time we run the code.
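A small base-R sketch of that randomness follows. As assumptions, tips is made-up data (not the TipExperiment data), b1_sketch() is a hypothetical helper returning the second lm() coefficient, and sample() stands in for shuffle(). It also shows set.seed(), a standard base-R function that the course code does not necessarily use, which replays the same random shuffle so a particular b1 can be reproduced.

```r
# Made-up tips for illustration only (NOT the TipExperiment data)
tips <- data.frame(
  Condition = factor(rep(c("Control", "Smiley Face"), each = 4)),
  Tip = c(20, 25, 30, 27, 31, 35, 28, 38)
)

# Hypothetical helper: b1 taken as the second coefficient lm() estimates
b1_sketch <- function(formula, data) unname(coef(lm(formula, data = data))[2])

# Two shuffles in a row: the b1s usually (but not always) differ
first_run  <- b1_sketch(sample(Tip) ~ Condition, data = tips)
second_run <- b1_sketch(sample(Tip) ~ Condition, data = tips)

# Fixing the seed replays the same random shuffle, so the b1 repeats
set.seed(42)
seeded_a <- b1_sketch(sample(Tip) ~ Condition, data = tips)
set.seed(42)
seeded_b <- b1_sketch(sample(Tip) ~ Condition, data = tips)

identical(seeded_a, seeded_b)  # TRUE
```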
If you put the do() function in front of the code that calculates a shuffled b1, it will repeat this random process a specified number of times, generating a new b1 each time. For example:
do(5) * b1(shuffle(Tip) ~ Condition, data = TipExperiment)
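For readers working outside the course environment, base R’s replicate() can play a role similar to do() here. This is a stand-in sketch under assumptions, not the course’s code: tips is made-up data (not the TipExperiment data), b1_sketch() is a hypothetical helper returning the second lm() coefficient, and sample() substitutes for shuffle().

```r
# Made-up tips for illustration only (NOT the TipExperiment data)
tips <- data.frame(
  Condition = factor(rep(c("Control", "Smiley Face"), each = 4)),
  Tip = c(20, 25, 30, 27, 31, 35, 28, 38)
)

# Hypothetical helper: b1 taken as the second coefficient lm() estimates
b1_sketch <- function(formula, data) unname(coef(lm(formula, data = data))[2])

# Repeat "shuffle the tips, refit, keep b1" five times, like do(5) * ...
shuffled_b1s <- replicate(5, b1_sketch(sample(Tip) ~ Condition, data = tips))

# Arrange the results in a data frame with a b1 column, one row per shuffle
data.frame(b1 = shuffled_b1s)
```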
Try modifying the do() code to produce 10 shuffles of the tips across conditions, and 10 randomized estimates of b1. Here are the 10 estimates one run produced (yours will differ, because shuffling is random):
b1
1 -3.7727273
2 -2.6818182
3 -1.4090909
4 -2.6818182
5 -4.6818182
6 2.0454545
7 -0.1363636
8 -1.1363636
9 0.5909091
10 4.6818182
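The ten b1s vary from one shuffle to the next: some are positive, some are negative, and they fall on both sides of 0.
If we continue to generate random b1s this way, they will tend to cluster around 0.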
Bear in mind: each of these b1s was produced by a shuffle() that mimics a DGP in which β1 = 0, that is, one in which drawing a smiley face has no effect on tips.
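One way to check that shuffled b1s center on 0 is to generate many of them. This is a hypothetical illustration: tips is made-up data (not the TipExperiment data), b1_sketch() is a hypothetical helper returning the second lm() coefficient, sample() stands in for shuffle(), and 1,000 repetitions is an arbitrary choice.

```r
# Made-up tips for illustration only (NOT the TipExperiment data)
tips <- data.frame(
  Condition = factor(rep(c("Control", "Smiley Face"), each = 4)),
  Tip = c(20, 25, 30, 27, 31, 35, 28, 38)
)

# Hypothetical helper: b1 taken as the second coefficient lm() estimates
b1_sketch <- function(formula, data) unname(coef(lm(formula, data = data))[2])

set.seed(123)
# 1,000 b1s, each from a shuffle in which Condition cannot affect Tip
many_b1s <- replicate(1000, b1_sketch(sample(Tip) ~ Condition, data = tips))

mean(many_b1s)     # close to 0
summary(many_b1s)  # spread of the shuffled b1s around 0
```

Shuffling keeps the same tips but breaks any link between a tip and its condition, so positive and negative differences between group means are equally likely.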
Using Simulated b1s to Help Us Understand the Data Better
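We can use these simulated b1s to get a better sense of how unusual the b1 we observed in the actual data is.
Here is a set of 10 shuffled b1s, this time arranged in order from low to high. (Because shuffling is random, these are not the same 10 values shown above.)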
b1
1 -3.7727273
2 -3.2272727
3 -1.6818182
4 -1.6818182
5 -1.5000000
6 -0.5000000
7 0.1363636
8 2.7727273
9 3.6818182
10 6.9545455
What we can see now, much more clearly than when we had only the graphs to go on, is that only one of the 10 randomly generated b1s (6.95) was larger than the b1 of 6.045 that we observed in the actual data.
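The counting done by eye here can also be done in code. This base-R sketch copies the 10 sorted shuffled b1s from the table and the observed b1 of 6.045 from the text; nothing else is assumed.

```r
# The 10 shuffled b1s from the sorted table
shuffled_b1s <- c(-3.7727273, -3.2272727, -1.6818182, -1.6818182, -1.5000000,
                  -0.5000000,  0.1363636,  2.7727273,  3.6818182,  6.9545455)

# b1 from the actual TipExperiment data
observed_b1 <- 6.045

# Arrange from low to high, as in the table
sort(shuffled_b1s)

# How many shuffled b1s are at least as large as the observed b1?
sum(shuffled_b1s >= observed_b1)   # 1

# The same count as a proportion of all 10 shuffles
mean(shuffled_b1s >= observed_b1)  # 0.1
```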
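Based on this, would we want to rule out the empty model as being the true model of the DGP? That’s a hard call, and one we will return to in an upcoming chapter. For now, just note that seeing the observed b1 alongside b1s generated under the empty model gives us a concrete way to judge how likely a b1 like ours would be if β1 were really 0.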