High School / Advanced Statistics and Data Science I (ABC)
10.4 What Counts as Unlikely?
All of this, however, raises the question of how extreme a sample b1 would need to be to count as unlikely.
One common standard used in the social sciences is that a sample
counts as unlikely if there is less than a .05 chance of generating one
that extreme (in either the negative or positive direction) from a
particular DGP. We notate this numerical definition of “unlikely” with
the Greek letter α (alpha).
Let’s try applying an alpha level of .05 to the sampling distribution
of b1.
In a two-tailed test, we will reject the empty model of the DGP if
the sample is not in the middle .95 of randomly generated b1s. We can use the middle() function
to fill the middle .95 of b1s in a different color:
gf_histogram(~b1, data = sdob1, fill = ~middle(b1, .95))
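To make the idea of "the middle .95" concrete, here is a minimal base R sketch of the kind of flag such a function produces. It is not the course package's implementation of middle(). The simulated sdob1 values, the alpha and cutoffs variables, and the in_middle column are all assumptions made for illustration.

```r
# Hypothetical stand-in for the sampling distribution of b1 (simulated values,
# not the course data).
set.seed(1)
sdob1 <- data.frame(b1 = rnorm(1000, mean = 0, sd = 3.5))

# A two-tailed alpha of .05 leaves .025 in each tail.
alpha <- .05
cutoffs <- unname(quantile(sdob1$b1, probs = c(alpha / 2, 1 - alpha / 2)))

# TRUE for b1s inside the middle .95, FALSE for b1s in either tail.
sdob1$in_middle <- sdob1$b1 >= cutoffs[1] & sdob1$b1 <= cutoffs[2]

# Proportion of b1s flagged as falling in the middle.
mean(sdob1$in_middle)
```

A logical flag like this is the sort of value a fill color can be conditioned on: one color where it is TRUE, another where it is FALSE.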
The fill= part tells R that we want the bars of the
histogram to be filled with particular colors. The ~ tells
R that the fill color should be conditioned on whether the b1 falls in the middle .95 of the distribution.
Here’s what the histogram of the sampling distribution looks like
when you add fill = ~middle(b1, .95) to gf_histogram().
You might be wondering why some of the bars of the histogram include both red and blue. This is because the data in a histogram is grouped into bins. The value 6.59, for example, is grouped into the same bin as the value 6.68, but while 6.59 falls within the middle .95 (thus colored blue), 6.68 falls just outside the .025 cutoff for the upper tail (and thus is colored red).
If you would like to see a sharper delineation, you could try making your bins smaller, or to put it another way, making more bins. Doing so would increase the chances of having just one color in each bin.
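The effect of bin width can be checked directly by counting how many bins hold both "middle" and "tail" values. The following base R sketch does this under stated assumptions: the b1 values are simulated, count_mixed_bins() is a hypothetical helper, and cut() places its equal-width bins somewhat differently from gf_histogram(), so the counts only approximate what the plot shows.

```r
# Hypothetical sampling distribution of b1 (simulated, not the course data).
set.seed(1)
b1 <- rnorm(1000, mean = 0, sd = 3.5)

# Flag the b1s that fall in the middle .95.
cutoffs <- unname(quantile(b1, probs = c(.025, .975)))
in_middle <- b1 >= cutoffs[1] & b1 <= cutoffs[2]

# Count bins that contain both flagged and unflagged values.
count_mixed_bins <- function(values, flag, n_bins) {
  bin <- cut(values, breaks = n_bins)   # n_bins equal-width intervals
  mixed <- tapply(flag, bin, function(f) any(f) && !all(f))
  sum(mixed, na.rm = TRUE)              # empty bins come back as NA
}

mixed_30 <- count_mixed_bins(b1, in_middle, 30)
mixed_100 <- count_mixed_bins(b1, in_middle, 100)
```

Only a bin that straddles one of the two cutoffs can be mixed, so there are never more than two such bins. Narrower bins simply make it less likely that values land on both sides of a cutoff within the same bin.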
We re-made the histogram, but this time added the argument bins = 100
to the code (the default number of bins is 30).
We also added show.legend = FALSE to get rid of the legend,
and thus provide more space for the plot.
gf_histogram(~b1, data = sdob1, fill = ~middle(b1, .95), bins = 100, show.legend = FALSE)
Increasing the number of bins resulted in each bin being represented
by only one color. But it also created some holes in the histogram,
i.e., empty bins in which none of the sample b1s fell.
Remember, this histogram represents a sampling distribution. All
these b1s were randomly generated from a DGP in which the empty model is true.
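In the actual experiment of course, we only have one sample. If our
actual sample b1 falls outside the middle .95 of the sampling distribution, we will reject the empty model of the DGP.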
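But it might be the wrong decision. If the empty model is true, .05
of the b1s it generates will still fall in the unlikely tails, and each of those would lead us to reject the empty model by mistake.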
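A small simulation can illustrate this risk. The sketch below is only an illustration under stated assumptions: the outcome and group data are made up, shuffled_b1() is a hypothetical helper that breaks any link between group and outcome by shuffling the group labels, and b1 is taken to be the difference between two group means.

```r
# Made-up data: 44 outcomes and two groups of 22 (purely illustrative).
set.seed(10)
outcome <- rnorm(44, mean = 30, sd = 11)
group <- rep(c("a", "b"), each = 22)

# One b1 from a DGP where the empty model is true: shuffle the group labels,
# then take the difference between the two group means.
shuffled_b1 <- function() {
  g <- sample(group)
  mean(outcome[g == "b"]) - mean(outcome[g == "a"])
}

# Build a sampling distribution and find the cutoffs for the middle .95.
sdob1 <- replicate(1000, shuffled_b1())
cutoffs <- unname(quantile(sdob1, probs = c(.025, .975)))

# Generate fresh b1s from the same empty-model DGP and see how often
# they land in the tails, where we would reject the empty model.
new_b1s <- replicate(1000, shuffled_b1())
rejection_rate <- mean(new_b1s < cutoffs[1] | new_b1s > cutoffs[2])
rejection_rate
```

Because every b1 here comes from a DGP where the empty model is true, each rejection is a wrong decision. The rejection rate should come out close to the alpha level of .05, give or take some random variation.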
What is the Opposite of Unlikely?
We’re going to be interested in whether our sample b1 falls in the unlikely tails or in the middle .95 of the sampling distribution.
To be precise, if the sample falls in the middle .95 of the sampling distribution, it means that the sample is not unlikely. But saying that it is likely is a little bit sloppy, and possibly misleading.
In statistics, even if an event has a probability of .06, we will say it is not unlikely because our definition of unlikely is .05 or lower. But a regular person would not call something with a probability of .06 “likely”.
It gets tiring to say not unlikely all the time, and sometimes sentences read a little bit easier if we just say likely. Just remember that when we say likely we usually mean not unlikely. But this is not what normal people mean by the word likely.