10.5 The p-Value
Locating the Sample in the Sampling Distribution
We have now spent some time looking at the sampling distribution of b1 generated by the empty model of the DGP. Let's place our sample b1 in the context of this sampling distribution and see where it falls.
Here's some code that will save the value of our sample b1 in an R object called sample_b1.
sample_b1 <- b1(Tip ~ Condition, data = TipExperiment)
If we printed it out, we would see that the sample b1 is 6.05. Is that a likely value to be generated by the empty model of the DGP, or an unlikely one? Let's find out by overlaying the sample b1 onto the histogram of the sampling distribution. The gf_point() code below (which you can chain onto your histogram code with the %>% operator) will place a black dot right at the sample b1 on the x-axis.
gf_point(x = 6.05, y = 0)
If you have already saved the value of the sample b1 (e.g., in the R object sample_b1 that we created above), you can also write the code like this:
gf_point(x = sample_b1, y = 0)
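To see the whole picture at once, here is a minimal sketch of what the complete chain could look like. It assumes that the shuffled b1s from the previous section were saved in a data frame called sdob1; that object name (and the commented setup line) is our assumption, not something defined on this page.
# Assumed setup from the previous section:
# sdob1 <- do(1000) * b1(shuffle(Tip) ~ Condition, data = TipExperiment)
# Histogram of the sampling distribution with the sample b1 overlaid as a black dot
gf_histogram(~ b1, data = sdob1) %>%
  gf_point(x = sample_b1, y = 0)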
We can see that our sample b1 is not in the unlikely zone. It's within the middle .95 of the sampling distribution of b1.
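If you want to check this with numbers rather than just eyeballing the histogram, one possible approach (again assuming the shuffled b1s are stored in a data frame called sdob1) is to find the cutoffs that bound the middle .95 of the sampling distribution and see whether 6.05 falls between them.
# Cutoffs that bound the middle .95 of the shuffled b1s (2.5th and 97.5th percentiles)
quantile(sdob1$b1, c(.025, .975))
# If sample_b1 (6.05) falls between these two cutoffs, it is not in the unlikely zone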
Recap of Logic with the Distribution Triad
The hard thing about inference is that we have to keep all three distributions (sample, DGP, and sampling distribution) in mind. It’s really easy to lose track. We’re going to introduce a new kind of picture that shows all three of these distributions together in relation to one another.
The picture below represents how we've used sampling distributions so far to evaluate the empty model (or null hypothesis). Let's start from the top of this image. The top blue horizontal line represents the possible values of β1, the parameter in the DGP that we are estimating with our sample b1. The empty model hypothesizes that β1 = 0.
Based on this hypothesized DGP, we simulated samples, generated by random shuffles of the tipping experiment data. These sample b1s, piled up together, form the sampling distribution of b1 shown in the middle of the picture. At the bottom, we locate our actual sample b1 (6.05) in relation to this sampling distribution.
The Concept of p-Value
We have located the sample b1 in the sampling distribution of b1 and seen that it falls in the middle .95, not in the unlikely tails. That gives us a simple yes/no answer to the question of whether our sample is unlikely to have been generated by the empty model.
But we can do better than this. We don't have to just ask a yes/no question of our sampling distribution. Instead of asking whether the sample b1 falls in the unlikely zone or not, we can ask exactly how likely a b1 as extreme as ours would be if the empty model were true. This probability is called the p-value.
Before we teach you how to calculate a p-value, let’s do some thinking about what the concept means.
The total area of the two tails shaded red in the histogram above represents the alpha level of .05. These regions represent the most extreme .05 of sample b1s that could be generated by the empty model of the DGP, the ones we have decided in advance to count as unlikely.
Whereas we know what alpha is before we even do a study – it's just a statement of our criterion for deciding what we will count as unlikely – the p-value is calculated after we do a study, based on actual sample data. We can illustrate the difference between these two ideas in the plots below, which zero in on just the upper tail of the sampling distribution of b1.
| alpha | p-value |
|---|---|
| This plot illustrates the concept of alpha. Having decided to set alpha at .05, the red area in the upper tail of the sampling distribution represents the .025 of largest b1s that could be generated by the empty model (the other .025 is in the lower tail). | This plot illustrates the concept of p-value. While the p-value is also a probability, it is not dependent on alpha. The p-value is represented by the purple area beyond our sample b1 of 6.05 in the upper tail of the sampling distribution. |
The dashed line in the plot on the left has been added to demarcate the cutoff between the b1s beyond it, which we will consider unlikely, and the middle .95 of the sampling distribution, which we consider not unlikely. We have kept the dashed line in the plot on the right to help you remember where the red alpha region started.
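To make the contrast concrete, here is a rough sketch, under the same assumption that the shuffled b1s live in a data frame called sdob1, of the two quantities for the upper tail only: the alpha cutoff is a fixed percentile of the sampling distribution, whereas the p-value depends on where our particular sample b1 landed.
# Upper-tail alpha cutoff: the b1 beyond which lie the largest .025 of shuffled b1s
quantile(sdob1$b1, .975)
# Upper-tail area beyond the sample: proportion of shuffled b1s at or above sample_b1
tally(~ b1 >= sample_b1, data = sdob1, format = "proportion")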
In the pictures above, we showed only the upper tail of the sampling distribution. But because a very low sample b1 (far below 0) would be just as unlikely to be generated by the empty model as a very high one, we consider both tails, as shown in the pictures below.
[Plots: the sampling distribution of b1 with the red alpha regions in both tails, and with the purple regions beyond the sample b1 in both directions]
Because the purple tails, which represent the area beyond the sample b1 in both directions (beyond 6.05 on the high end and beyond -6.05 on the low end), cover more area than the red alpha tails, we can see that the p-value for this sample must be greater than .05.
The p-value is the probability of getting a parameter estimate as extreme or more extreme than the sample estimate given the assumption that the empty model is true.
Thus, the p-value is calculated based on both the value of the sample estimate and the shape of the sampling distribution of the parameter estimate under the empty model. In contrast, alpha does not depend on the value of the sample estimate.
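We will get to the R tools for calculating p-values later, but as a rough sketch of what the two-tailed calculation described above amounts to (still assuming the shuffled b1s are saved in sdob1), you could count the proportion of shuffled b1s that are as extreme or more extreme than the sample b1 in either direction.
# Two-tailed p-value sketch: proportion of shuffled b1s at least as far from 0
# (in either direction) as the sample b1 of 6.05
tally(~ abs(b1) >= abs(sample_b1), data = sdob1, format = "proportion")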