4.6 Calculating the P-Value for a Sample
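To calculate the probability of getting a b1 as extreme as our sample b1, we count how many of the simulated b1s fall in the two tails of the sampling distribution. We can use tally() to figure out how many simulated samples are more extreme than our sample. The first line of code below counts the b1s that are greater than sample_b1 (6.05); the second line counts those that are more extreme on the negative side (less than -6.05).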
tally(~ b1 > sample_b1, data = sdob1)
tally(~ b1 < -sample_b1, data = sdob1)
(Coding note: R thinks of <- with no space between the two characters as an assignment operator; it's supposed to look like an arrow. For the second line of code, you need to be sure to put a space between the < and - so R interprets it to mean less than the negative of sample_b1.)
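To see why the space matters, here is a minimal sketch in base R. The value assigned to b1 is a single made-up number used only for illustration; it is not one of the simulated b1s.

```r
sample_b1 <- 6.05   # the sample estimate from the text
b1 <- 2             # hypothetical value, for illustration only

# With a space, < and - are read separately:
# "is b1 less than the negative of sample_b1?"
is_lower_extreme <- (b1 < -sample_b1)
is_lower_extreme    # FALSE, and b1 is unchanged

# Without the space, R reads <- as assignment and overwrites b1.
b1<-sample_b1
b1                  # b1 now holds 6.05
```

The second statement runs without any error or warning, which is what makes this mistake easy to miss.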
The two lines of tally() code will produce:
b1 > sample_b1
TRUE FALSE
38 962
b1 < -sample_b1
TRUE FALSE
41 959
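When we add up the two tails (the extreme positive b1s and the extreme negative b1s), we get 38 + 41 = 79, or about 80, of the 1,000 randomly generated b1s. Since about 80 out of 1,000 randomly generated b1s are more extreme than our sample b1, the p-value is about .08.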
Instead of using two lines of code, one to count the b1s in the upper tail and one to count those in the lower tail, we can use a single line of code:
tally(sdob1$b1 > sample_b1 | sdob1$b1 < -sample_b1)
Note the use of the | operator, which means or, to put the two criteria together: this code tallies up the total number of b1s that are either greater than sample_b1 or less than -sample_b1. We can also add the argument format = "proportion" to get the proportion or p-value directly.
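The same logic can be sketched in base R without tally(). The vector b1 below is a toy stand-in, not the real sdob1: it is built by hand so that 38 values fall above sample_b1 and 41 fall below -sample_b1, matching the counts reported above. The names in_upper, in_lower, and extreme are ours.

```r
sample_b1 <- 6.05

# Toy stand-in for 1,000 simulated b1s (NOT real simulation output):
# 38 values above 6.05, 41 below -6.05, and 921 in between.
b1 <- c(rep(7, 38), rep(-7, 41), rep(0, 921))

in_upper <- b1 > sample_b1      # TRUE for the upper tail
in_lower <- b1 < -sample_b1     # TRUE for the lower tail (note the space)
extreme  <- in_upper | in_lower # TRUE if either criterion is met

n_extreme <- sum(extreme)       # TRUE counts as 1, so this counts the extremes
p_value   <- mean(extreme)      # proportion of TRUEs

n_extreme   # 79
p_value     # 0.079
```

Because the two tails are symmetric around 0, abs(b1) > sample_b1 picks out the same values as the | expression.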
The p-value for the sample b1 is about .08. Because this is greater than .05, we do not reject the empty model.
If our p-value had been less than .05, we might have declared our sample unlikely to have been generated by the empty model of the DGP, and thus rejected the empty model.
What It Means to Reject – or Not – the Empty Model (or Null Hypothesis)
The concept of p-value, and using it to decide whether or not to reject the empty model in favor of the more complex model we have fit to the data, comes from a tradition known as Null Hypothesis Significance Testing (NHST). The null hypothesis is, in fact, the same as what we call the empty model. It refers to a world in which there is no effect in the DGP (in our example, no effect of smiley face on Tip).
While we want you to understand the logic of NHST, we also want you to be thoughtful in your interpretation of the p-value. The NHST tradition has been criticized lately because it often is applied thoughtlessly, in a highly ritualized manner (see Gigerenzer, Krauss, & Vitouch, 2004). People who don’t really understand what the p-value means may draw erroneous conclusions.
For example, we just decided, based on a p-value of .08, to not reject the empty model of Tip. But does this mean that the true effect in the DGP is actually 0? No. It could, for example, be 6.05, which was the best-fitting estimate of the effect based on our sample.
If both the empty model and the complex “best-fitting” model are possible true models of the DGP, how should we decide which model to use?
Some people from the null hypothesis testing tradition would say that if you cannot reject the empty model then you should use the empty model. From this perspective, we should avoid Type I error at all costs; we don’t want to say there is an effect of smiley face when there is not one in the DGP. In this tradition, it is worse to make a Type I error than a Type II error (saying there is no effect when there is, in fact, an effect in the DGP).
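To make the idea of Type I error concrete, here is a rough simulation sketch. It is not the method used in this chapter: for brevity it swaps in base R's t.test() to get each p-value instead of shuffling, and the group size, mean, and standard deviation are made-up values. Both groups are drawn from the same DGP, so the empty model is true by construction and every rejection is a Type I error.

```r
set.seed(1)          # arbitrary seed so the run is repeatable
n_per_group <- 20    # hypothetical group size
n_studies   <- 2000  # number of simulated studies
alpha       <- 0.05  # cutoff for rejecting the empty model

# Each simulated study draws both groups from the SAME distribution,
# so any "significant" difference is a Type I error.
p_values <- replicate(n_studies, {
  control <- rnorm(n_per_group, mean = 30, sd = 10)
  treated <- rnorm(n_per_group, mean = 30, sd = 10)
  t.test(treated, control)$p.value
})

# Proportion of studies that would wrongly reject the empty model
type1_rate <- mean(p_values < alpha)
type1_rate
```

The proportion should land near .05, which is what the cutoff means: when the empty model is true, about 5% of samples will still lead us to reject it.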
But this might not be the best course of action in some situations. For example, if you simply want to make better predictions, you might decide to use the complex model, even if you cannot rule out the empty model. On the other hand, if your goal is to better understand the DGP, there is some value in having the simplest theory that is consistent with your data. Scientists call this preference for simplicity “parsimony.”
Judd, McClelland, and Ryan (some statisticians we greatly admire) once said that you just have to decide whether a model is “better enough to adopt.” Much of statistical inference involves imagining a variety of models that are consistent with your data and looking to see which ones will help you to achieve your purpose.
We prefer to think in terms of model comparison instead of null hypothesis testing. With too much emphasis on null hypothesis testing, you might think your job is done when you either reject or fail to reject the empty model. But in the modeling tradition, we are always seeking a better model: one that helps us understand the DGP, or one that makes better predictions about future events.