Statistics and Data Science: A Modeling Approach
10.6 Calculating the P-Value for a Sample
To calculate the probability of getting a \(b_1\) within a particular region (e.g., greater than 6.05 or less than -6.05), we can simply calculate the proportion of \(b_1\)s in the sampling distribution that fall within those regions. In this way, we are using the simulated sampling distribution of 1000 \(b_1\)s as a probability distribution.
We can use tally() to figure out how many simulated samples are more extreme than our sample \(b_1\). The first line of code will tell us how many \(b_1\)s are more extreme on the positive side than our sample_b1 (6.05); the second line, how many are more extreme than our sample on the negative side (-6.05).
tally(~ b1 > sample_b1, data = sdob1)
tally(~ b1 < -sample_b1, data = sdob1)
(Coding note: R interprets <- with no space between the two characters as the assignment operator; it’s supposed to look like an arrow. For the second line of code, you need to be sure to put a space between the < and the - so R interprets it to mean less than the negative of sample_b1.)
The two lines of tally() code will produce:
b1 > sample_b1
TRUE FALSE
38 962
b1 < -sample_b1
TRUE FALSE
41 959
When we add up the two tails (the extreme positive \(b_1\)s and the extreme negative \(b_1\)s), there are 79 \(b_1\)s (38 + 41), or about 80, that are more extreme than our sample \(b_1\).
Since about 80 of the 1000 randomly generated \(b_1\)s are more extreme than our sample, we would say there is roughly a .08 likelihood of the empty model generating a sample as extreme as 6.05. This probability is the p-value.
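To make the arithmetic concrete, here is a quick base-R sketch of how the two tail counts combine into a p-value. The counts 38 and 41 are the ones from the tally() output above; because the sampling distribution is generated by random shuffling, your own counts will vary a bit from run to run.

```r
# combine the two tail counts into a two-tailed p-value
upper_tail <- 38    # simulated b1s greater than 6.05
lower_tail <- 41    # simulated b1s less than -6.05
n_samples  <- 1000  # total number of shuffled b1s we generated

p_value <- (upper_tail + lower_tail) / n_samples
p_value   # 0.079, roughly .08
```

Dividing the combined tail count by the total number of simulated \(b_1\)s is what turns the counts into a probability.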
Instead of using two lines of code (one to find the number of \(b_1\)s at the upper extreme, the other at the lower extreme), we can use a single line of code like this:
tally(sdob1$b1 > sample_b1 | sdob1$b1 < -sample_b1)
Note the use of the | operator, which means or, to put the two criteria together: this code tallies up the total number of \(b_1\)s that are either greater than positive 6.05 or less than negative 6.05. You can run the code in the code window below. Try adding the argument format = "proportion" to get the proportion, or p-value, directly.
require(coursekata)
# this saves the sample b1 and creates a sampling distribution using shuffle
sample_b1 <- b1(Tip ~ Condition, data = TipExperiment)
sdob1 <- do(1000) * b1(shuffle(Tip) ~ Condition, data = TipExperiment)
# change the code below to calculate the *proportion* of b1s
# as extreme (positive or negative) as the sample b1
tally(sdob1$b1 > sample_b1 | sdob1$b1 < -sample_b1)
Here is the solution, with format = "proportion" added to the final line:
sample_b1 <- b1(Tip ~ Condition, data = TipExperiment)
sdob1 <- do(1000) * b1(shuffle(Tip) ~ Condition, data = TipExperiment)
tally(sdob1$b1 > sample_b1 | sdob1$b1 < -sample_b1, format = "proportion")
The p-value for the \(b_1\) in the tipping experiment was .08, which is greater than our alpha of .05. Therefore, we would say our sample is not unlikely to have been generated by this DGP. Thus, we would consider the empty model a plausible model of the DGP and therefore not reject the empty model. Even a DGP where there is no effect of smiley face can produce a \(b_1\) as extreme as our sample, or more extreme, about .08 of the time.
If our p-value had been less than .05, we might have declared our sample unlikely to have been generated by the empty model of the DGP, and thus rejected the empty model.
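The decision rule itself can be written as a one-line comparison. This is just a sketch of the logic, using the p-value of roughly .079 computed above and the conventional alpha of .05:

```r
# compare the p-value to alpha to decide whether to reject the empty model
p_value <- 0.079
alpha   <- 0.05
p_value < alpha   # FALSE: .079 is not less than .05, so we do not reject
```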
What It Means to Reject – or Not – the Empty Model (or Null Hypothesis)
The concept of the p-value, and the practice of using it to decide whether or not to reject the empty model in favor of the more complex model we have fit to the data, comes from a tradition known as Null Hypothesis Significance Testing (NHST). The null hypothesis is, in fact, the same as what we call the empty model: it refers to a world in which \(\beta_1=0\).
While we want you to understand the logic of NHST, we also want you to be thoughtful in your interpretation of the p-value. The NHST tradition has been criticized because it often is applied thoughtlessly, in a highly ritualized manner (see Gigerenzer, Krauss, & Vitouch, 2004). People who don’t really understand what the p-value means may draw erroneous conclusions.
For example, we just decided, based on a p-value of .08, not to reject the empty model of Tip. But does this mean that \(\beta_1\) in the true DGP is actually equal to 0? No. It means that \(\beta_1\) could be 0, and the data are consistent with its being 0. But it could be something else instead.
It could, for example, be 6.05, which was the best-fitting estimate of \(\beta_1\) based on the sample data. If the true \(\beta_1\) were equal to 6.05, then 6.05 certainly would be among the many possible \(b_1\)s considered likely.
If both the empty model and the complex “best-fitting” model are possible true models of the DGP, how should we decide which model to use?
Some people from the null hypothesis testing tradition would say that if you cannot reject the empty model, then you should use the empty model. From this perspective, we should avoid Type I error at all costs: we don’t want to say there is an effect of smiley face when there is not one in the DGP. In this tradition, it is worse to make a Type I error than a Type II error (saying there is no effect when there is, in fact, an effect in the DGP).
But this might not be the best course of action in some situations. For example, if you simply want to make better predictions, you might decide to use the complex model, even if you cannot rule out the empty model. On the other hand, if your goal is to better understand the DGP, there is some value in having the simplest theory that is consistent with your data. Scientists call this preference for simplicity “parsimony.”
Judd, McClelland, and Ryan (some statisticians we greatly admire) once said that you just have to decide whether a model is “better enough to adopt.” Much of statistical inference involves imagining a variety of models that are consistent with your data and looking to see which ones will help you to achieve your purpose.
We prefer to think in terms of model comparison instead of null hypothesis testing. With too much emphasis on null hypothesis testing, you might think your job is done when you either reject or fail to reject the empty model. But in the modeling tradition, we are always seeking a better model: one that helps us understand the DGP, or one that makes better predictions about future events.