Course Outline
-
segmentGetting Started (Don't Skip This Part)
-
segmentStatistics and Data Science: A Modeling Approach
-
segmentPART I: EXPLORING VARIATION
-
segmentChapter 1 - Welcome to Statistics: A Modeling Approach
-
segmentChapter 2 - Understanding Data
-
segmentChapter 3 - Examining Distributions
-
segmentChapter 4 - Explaining Variation
-
segmentPART II: MODELING VARIATION
-
segmentChapter 5 - A Simple Model
-
segmentChapter 6 - Quantifying Error
-
segmentChapter 7 - Adding an Explanatory Variable to the Model
-
segmentChapter 8 - Models with a Quantitative Explanatory Variable
-
segmentPART III: EVALUATING MODELS
-
segmentChapter 9 - The Logic of Inference
-
segmentChapter 10 - Model Comparison with F
-
segmentChapter 11 - Parameter Estimation and Confidence Intervals
-
segmentChapter 12 - What You Have Learned
-
segmentFinishing Up (Don't Skip This Part!)
-
segmentResources
list High School / Advanced Statistics and Data Science I (ABC)
9.4 The P-Value
Locating the Sample in the Sampling Distribution
We have now spent some time looking at the sampling distribution of
Let’s place our sample
Here’s some code that will save the value of our sample sample_b1
.
sample_b1 <- b1(Tip ~ Condition, data = TipExperiment)
If we printed it out, we would see that the sample
Let’s find out by overlaying the sample %>%
operator) will place a black dot right at the sample
gf_point(x = 6.05, y = 0)
If you have already saved the value of sample_b1
), you can also write the code like this:
gf_point(x = sample_b1, y = 0)
We can see that our sample is not in the unlikely zone. It’s within the middle .95 of
Recap of Logic with the Distribution Triad
The hard thing about inference is that we have to keep all three distributions (sample, DGP, and sampling distribution) in mind. It’s really easy to lose track. We’re going to introduce a new kind of picture that shows all three of these distributions together in relation to one another.
The picture below represents how we’ve used sampling distributions so far to evaluate the empty model (or null hypothesis). Let’s start from the top of this image. The top blue horizontal line represents the possible values of
Based on this hypothesized DGP, we simulated samples, generated by random shuffles of the tipping experiment data. These sample
The Concept of P-Value
We have located the sample
But we can do better than this. We don’t have to just ask a yes/no question of our sampling distribution. Instead of asking whether the sample
Before we teach you how to calculate a p-value, let’s do some thinking about what the concept means.
The total area of the two tails shaded red in the histogram above represent the alpha level of .05. These regions represent the
Whereas we know what alpha is before we even do a study – it’s just a statement of our criterion for deciding what we will count as unlikely – the p-value is calculated after we do a study, based on actual sample data. We can illustrate the difference between these two ideas in the plots below, which zero in on just the upper tail of the sampling distribution of
alpha | p-value |
---|---|
This plot illustrates the concept of alpha. Having decided to set alpha as .05, the red area in the upper tail of the sampling distribution represents .025 of the largest |
This plot illustrates the concept of p-value. While p-value is also a probability, it is not dependent on alpha. P-value is represented by the purple area beyond our sample |
|
|
The dashed line in the plot on the left has been added to demarcate the cutoff beyond which we will consider unlikely, and the middle .95 of the sampling distribution which we consider not unlikely. We have kept the dashed line in the plot on the right to help you remember where the red alpha region started.
In the pictures above, we showed only the upper tail of the sampling distribution. But because a very low
![]() |
---|
![]() |
Because the purple tails, which represent the area beyond the sample
The p-value is the probability of getting a parameter estimate as extreme or more extreme than the sample estimate given the assumption that the empty model is true.
Thus, the p-value is calculated based on both the value of the sample estimate and the shape of the sampling distribution of the parameter estimate under the empty model. In contrast, alpha does not depend on the value of the sample estimate.
Calculating the P-Value for a Sample
To calculate the probability of getting a
We can use tally()
to figure out how many simulated samples are more extreme than our sample sample_b1
($6.05), the second line, how many are more extreme than our sample on the negative side (-$6.05).
tally(~ b1 > sample_b1, data = sdob1)
tally(~ b1 < -sample_b1, data = sdob1)
(Coding note: R thinks of <-
with no space between the two characters as an assignment operator; it’s supposed to look like an arrow. For the second line of code, you need to be sure to put a space between the <
and -
so R interprets it to mean less than the negative of sample_b1
.)
The two lines of tally()
code will produce:
b1 > sample_b1
TRUE FALSE
38 963
b1 < -sample_b1
TRUE FALSE
41 959
When we add up the two tails (the extreme positive
Since there are about 80 randomly generated
tally(~ b1 > sample_b1, data = sdob1)
tally(~ b1 < -sample_b1, data = sdob1)
Instead of using two lines of code - one to find the number of
tally(sdob1$b1 > sample_b1 | sdob1$b1 < -sample_b1)
Note the use of the |
operator, which means or, to put the two criteria together: this code tallies up the total number of format = "proportion"
to get the proportion or p-value directly.
The p-value for the
If our p-value had been less than .05, we might have declared our sample unlikely to have been generated by the empty model of the DGP, and thus rejected the empty model.
What It Means to Reject – or Not – the Empty Model (or Null Hypothesis)
The concept of p-value, and using it to decide whether or not to reject the empty model in favor of the more complex model we have fit to the data, comes from a tradition known as Null Hypothesis Significance Testing (NHST). The null hypothesis is, in fact, the same as what we call the empty model. It refers to a world in which
While we want you to understand the logic of NHST, we also want you to be thoughtful in your interpretation of the p-value. The NHST tradition has been criticized lately because it often is applied thoughtlessly, in a highly ritualized manner (download NHST article by Gigerenzer, Krauss, & Vitouch, 2004 (PDF, 286KB)). People who don’t really understand what the p-value means may draw erroneous conclusions.
For example, we just decided, based on a p-value of .08, to not reject the empty model of Tip
. But does this mean that
It could, for example, be 6.05, which was the best-fitting estimate of
If both the empty model and the complex “best-fitting” model are possible true models of the DGP, how should we decide which model to use?
Some people from the null hypothesis testing tradition would say that if you cannot reject the empty model then you should use the empty model. From this perspective, we should avoid Type I error at all costs; we don’t want to say there is an effect of smiley face when there is not one in the DGP. In this tradition, it is worse to make a Type I error than a Type II error, to say there is no effect when there is, in fact, an effect in the DGP.
But this might not be the best course of action in some situations. For example, if you simply want to make better predictions, you might decide to use the complex model, even if you cannot rule out the empty model. On the other hand, if your goal is to better understand the DGP, there is some value in having the simplest theory that is consistent with your data. Scientists call this preference for simplicity “parsimony.”
Judd, McClelland, and Ryan (some statisticians we greatly admire) once said that you just have to decide whether a model is “better enough to adopt.” Much of statistical inference involves imagining a variety of models that are consistent with your data and looking to see which ones will help you to achieve your purpose.
We prefer to think in terms of model comparison instead of null hypothesis testing. With too much emphasis on null hypothesis testing, you might think your job is done when you either reject or fail to reject the empty model. But in the modeling tradition, we are always seeking a better model: one that helps us understand the DGP, or one that makes better predictions about future events.