Course Outline
-
segmentGetting Started (Don't Skip This Part)
-
segmentStatistics and Data Science: A Modeling Approach
-
segmentPART I: EXPLORING VARIATION
-
segmentChapter 1 - Welcome to Statistics: A Modeling Approach
-
segmentChapter 2 - Understanding Data
-
segmentChapter 3 - Examining Distributions
-
segmentChapter 4 - Explaining Variation
-
segmentPART II: MODELING VARIATION
-
segmentChapter 5 - A Simple Model
-
segmentChapter 6 - Quantifying Error
-
segmentChapter 7 - Adding an Explanatory Variable to the Model
-
segmentChapter 8 - Digging Deeper into Group Models
-
segmentChapter 9 - Models with a Quantitative Explanatory Variable
-
segmentPART III: EVALUATING MODELS
-
segmentChapter 10 - The Logic of Inference
-
segmentChapter 11 - Model Comparison with F
-
segmentChapter 12 - Parameter Estimation and Confidence Intervals
-
12.7 Using the t-Distribution to Construct a Confidence Interval
-
segmentChapter 13 - What You Have Learned
-
segmentFinishing Up (Don't Skip This Part!)
-
segmentResources
list High School / Advanced Statistics and Data Science I (ABC)
12.7 Using the t-Distribution to Construct a Confidence Interval
Just as we used the t-distribution in the previous chapter to model the sampling distribution of supernova()
), we can use it here to calculate a 95% confidence interval.
In the figure below, we replaced the resampled sampling distribution of
The R function that calculates a confidence interval based on the t-distribution is confint()
.
Here’s the code you can use to directly calculate a 95% confidence interval that uses the t-distribution as a model of the sampling distribution of
confint(lm(Tip ~ Condition, data = TipExperiment))
The confint()
function takes as its argument a model, which results from running the lm()
function. In this case we simply wrapped the confint()
function around the lm()
code. You could accomplish the same goal using two lines of code, the first to create the model, and the second to run confint()
. Try it in the code block below.
2.5 % 97.5 %
(Intercept) 22.254644 31.74536
ConditionSmiley Face -0.665492 12.75640
As you can see, the confint()
function returns the 95% confidence interval for the two parameters we are estimating in the Condition
model. The first one, labelled Intercept, is the confidence interval for Control
group. The second line shows us what we want here, which is the confidence interval for
Using this method, the 95% confidence interval for
Margin of Error
One way to report a confidence interval is to simply say that it goes, for example, from -0.67 to 12.76. But another common way of saying the same thing is to report the best estimate (6.05) plus or minus the margin of error (6.72), which you could write like this:
The margin of error is the distance between the upper bound and the sample estimate. In the case of the tipping experiment this would be
We can always calculate the margin of error by using confint()
to get the upper bound of the confidence interval and then subtracting the sample estimate. But we can do a rough calculation of margin of error using the empirical rule. According to the empirical rule, 95% of all observations under a normal curve fall within plus or minus 2 standard deviations from the mean.
Applying this rule to the sampling distribution, the picture below shows that the margin of error is approximately equal to two standard errors. If we start with a t-distribution centered at the sample
If you have an estimate of standard error, you can simply double it to get the approximate margin of error. If, for example, we use the standard error generated by R (3.33) for the condition model, the margin of error would be twice that, or 6.66. That’s pretty close to the margin of error we calculated from confint()
: 6.72.
R uses the Central Limit Theorem to estimate standard error, but we also have other ways of getting the standard error. Using shuffle()
to create the sampling distribution resulted in a slightly larger standard error of 3.5. If we double that, we get a margin of error of 7, slightly larger than the 6.66 we got using R’s estimate of standard error. In general, if the standard error is larger, the margin of error will be larger and so will the confidence interval.