CourseKata - 12.7 Using the t-Distribution to Construct a Confidence Interval

High School / Advanced Statistics and Data Science I (ABC)

Book

12.7 Using the t-Distribution to Construct a Confidence Interval

Just as we used the t-distribution in the previous chapter to model the sampling distribution of $b_1$ for purposes of calculating a p-value (the approach used by supernova()), we can use it here to calculate a 95% confidence interval.

In the figure below, we replaced the resampled sampling distribution of $b_1$s with one modeled by the smooth t-distribution with its associated standard error. As before, we can mentally move the t-distribution down and up the scale to find the lower and upper bounds of the confidence interval.

The R function that calculates a confidence interval based on the t-distribution is confint().

Here’s the code you can use to directly calculate a 95% confidence interval that uses the t-distribution as a model of the sampling distribution of $b_1$:

confint(lm(Tip ~ Condition, data = TipExperiment))

The confint() function takes as its argument a model, which results from running the lm() function. In this case we simply wrapped the confint() function around the lm() code. You could accomplish the same goal using two lines of code, the first to create the model, and the second to run confint(). Try it in the code block below.

require(coursekata)

# create the condition model of tip and save it as Condition_model
Condition_model <-
confint(Condition_model)

# create the condition model of tip and save it as Condition_model
Condition_model <- lm(Tip ~ Condition, data = TipExperiment)
confint(Condition_model)

ex() %>%
  check_function("lm") %>%
  check_result() %>%
  check_equal()

                         2.5 %   97.5 %
(Intercept)          22.254644 31.74536
ConditionSmiley Face -0.665492 12.75640

As you can see, the confint() function returns the 95% confidence interval for the two parameters we are estimating in the Condition model. The first one, labelled Intercept, is the confidence interval for $\beta_0$, which we remind you is the mean for the Control group. The second line shows us what we want here, which is the confidence interval for $\beta_1$.

Using this method, the 95% confidence interval for $\beta_1$ is from -0.67 to 12.76. Let’s compare this confidence interval to the one we calculated above on the previous page using bootstrapping: 0 to 13. While these two confidence intervals aren’t exactly the same, they are darn close, which gives us a lot of confidence. Even when we use very different methods for constructing the confidence interval, we get very similar results.

Margin of Error

One way to report a confidence interval is to simply say that it goes, for example, from -0.67 to 12.76. But another common way of saying the same thing is to report the best estimate (6.05) plus or minus the margin of error (6.72), which you could write like this: $6.05 \pm 6.72$.

A diagram of beta-sub-1 and the margin of error depicted as a horizontal line. It shows the distance between the best estimate of beta-sub-1, of 6.05, to the lower bound point of negative 0.67 is a distance of negative 6.72. And the distance between the best estimate of beta-sub-1, of 6.05, to the upper bound point of 12.76 is a distance of 6.72.

The margin of error is the distance between the upper bound and the sample estimate. In the case of the tipping experiment this would be $12.76 - 6.05$, or $6.72. If we assume that the sampling distribution is symmetrical, the margin of error will be the same below the parameter estimate as it is above.

We can always calculate the margin of error by using confint() to get the upper bound of the confidence interval and then subtracting the sample estimate. But we can do a rough calculation of margin of error using the empirical rule. According to the empirical rule, 95% of all observations under a normal curve fall within plus or minus 2 standard deviations from the mean.

Applying this rule to the sampling distribution, the picture below shows that the margin of error is approximately equal to two standard errors. If we start with a t-distribution centered at the sample $b_1$, we would need to slide it up about two standard errors to reach the cutoff above which $b_1$ (6.05) falls into the lower .025 tail.

A diagram that shows the distance of negative 6.72 between the best estimate of 6.05 and the lower bound of negative 0.67 is also a distance of about 2 negative standard errors, and the distance of 6.72 between the best estimate of 6.05 and the upper bound of 12.76 is also a distance of about 2 standard errors.

If you have an estimate of standard error, you can simply double it to get the approximate margin of error. If, for example, we use the standard error generated by R (3.33) for the condition model, the margin of error would be twice that, or 6.66. That’s pretty close to the margin of error we calculated from confint(): 6.72.

R uses the Central Limit Theorem to estimate standard error, but we also have other ways of getting the standard error. Using shuffle() to create the sampling distribution resulted in a slightly larger standard error of 3.5. If we double that, we get a margin of error of 7, slightly larger than the 6.66 we got using R’s estimate of standard error. In general, if the standard error is larger, the margin of error will be larger and so will the confidence interval.

12.6 Shuffle, Resample, and Standard Error 12.8 Interpreting the Confidence Interval

Course Outline

High School / Advanced Statistics and Data Science I (ABC)

12.7 Using the t-Distribution to Construct a Confidence Interval

Margin of Error

Responses

list High School / Advanced Statistics and Data Science I (ABC)

12.7 Using the t-Distribution to Construct a Confidence Interval

Margin of Error

High School / Advanced Statistics and Data Science I (ABC)