12.2 Thinking With Sampling Distributions

Up to now, we have centered all of our thinking with sampling distributions around the empty model. In Chapters 9 and 10, we always started by assuming that \(\beta_1\) is 0 and then went on to make sampling distributions based on this assumption. In this chapter we will move beyond the empty model and consider other models that could have produced the sample \(b_1\).

Our basic strategy is illustrated in the animated gif below. We start with the same sampling distribution we constructed based on the empty model. But then, using our hypothetical thinking skills, we mentally move the sampling distribution up and down along the number line, imagining different possible values of \(\beta_1\).

The same three-layered diagram of beta-sub-1, the sampling distribution of b1, and the sample b1 that appears earlier on the page; however, the histogram of the sampling distribution is animated to slide horizontally, left and right. As it slides, the horizontal red dashed line that runs through the center is connected to a small red box with a question mark inside, rather than a zero as before, that appears above the top line in the area representing the unknown beta-sub-1. The red box moves with the sliding histogram to show that as the possible beta-sub-1 changes, the probability of getting a sample b1 of 6.05 also changes. If it slides too far to the left or right, the sample b1 no longer falls within the middle 95 percent of likely samples.

As we begin thinking about alternative models of the DGP, we will assume that the shape and spread of the sampling distribution stay constant across different hypothesized values of \(\beta_1\). This assumption lets us use a sampling distribution created based on one particular DGP (e.g., the empty model) for other DGPs up and down the scale. Later we will provide more justification for this assumption, but for now just go with us!

As we mentally move the sampling distribution up and down the measurement scale we consider different possible values of \(\beta_1\). For each of these possible values we ask the same question we asked using the sampling distribution centered at a \(\beta_1\) of 0: Given the new hypothesized value of \(\beta_1\), is such a DGP likely to generate our sample \(b_1\)?
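To make this question concrete, here is a minimal Python sketch of the idea. It is only an illustration under stated assumptions: the sampling distribution is simulated from a normal distribution with a spread of 3.5, a value we made up so that it roughly matches the pictures on this page, and the helper names `middle_95`, `shift`, and `is_likely` are hypothetical.

```python
import random

def middle_95(values):
    """Return the cutoffs that bound (approximately) the middle 95% of the values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]
    upper = ordered[int(0.975 * n) - 1]
    return lower, upper

def shift(sampling_distribution, hypothesized_beta1):
    """Slide a distribution centered at 0 so that it is centered at hypothesized_beta1.

    Adding a constant moves the center but leaves shape and spread unchanged.
    """
    return [b1 + hypothesized_beta1 for b1 in sampling_distribution]

def is_likely(sample_b1, sampling_distribution):
    """Is the sample b1 inside the middle 95% of this sampling distribution?"""
    lower, upper = middle_95(sampling_distribution)
    return lower <= sample_b1 <= upper

random.seed(1)
ASSUMED_SPREAD = 3.5  # hypothetical value, not taken from the text

# Stand-in for the sampling distribution built from the empty model (centered at 0).
empty_model_sdob1 = [random.gauss(0, ASSUMED_SPREAD) for _ in range(10000)]

sample_b1 = 6.05
for hypothesized_beta1 in (0, 6.05):
    moved = shift(empty_model_sdob1, hypothesized_beta1)
    print(hypothesized_beta1, is_likely(sample_b1, moved))
```

The same list of simulated \(b_1\)s is reused for every hypothesized \(\beta_1\); only its center changes, which is exactly the constant shape and spread assumption described above.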

Let us show you what we mean. In the figure below we have moved the sampling distribution we constructed based on the empty model for the tipping study up (to the right) until it is centered at a DGP where \(\beta_1=6.05\). We now pose the question, “If the true \(\beta_1\) is 6.05, is our sample \(b_1\) of 6.05 likely?”

The same three-layered diagram of beta-sub-1, the sampling distribution of b1, and the sample b1 that appears earlier on the page; however, there are two histograms of potential sampling distributions. They are partially overlapping, and one is slightly more transparent to distinguish them. The more transparent histogram represents a possible DGP where beta-sub-1 equals 0, so the sampling distribution is also centered at zero. The other histogram represents a possible DGP where beta-sub-1 equals 6.05, so the sampling distribution is also centered at 6.05. The sample b1 that is plotted as a dot on the bottom line falls right at the center of this distribution as well.

On the left, the same three-layered diagram of beta-sub-1, the sampling distribution of b1, and the sample b1 that appears earlier on the page. The beta-sub-1 is set to zero, so the sampling distribution is also centered at zero. The sample b1 of 6.05 falls near the upper tail but still within the middle 95 percent of samples. On the right, the same three-layered diagram of the beta-sub-1, the sampling distribution of b1, and the sample b1 that appears to the left, but the beta-sub-1 is set to 6.05, so the sampling distribution is also centered at 6.05. The sample b1 of 6.05 falls right in the middle of the distribution as well.

We saw before that a DGP in which \(\beta_1=0\) could produce the observed sample \(b_1\) of 6.05. That was our reason for not rejecting the empty model. But that does not mean the true \(\beta_1\) in the DGP is actually 0. The pictures above show it’s also possible that the true \(\beta_1\) is 6.05! And 6.05 was, after all, the best-fitting estimate of \(\beta_1\) based on the data.

From our musings so far, we can see that \(\beta_1\) could be 0 or it could be 6.05. But these are just two of the many possible DGPs that could have produced the sample estimate of 6.05. Once we start imagining different possible DGPs, and the sampling distributions each would generate, we will see more and more possibilities.

But using this strategy, we can also rule out some possibilities. There are values of \(\beta_1\) that are not likely to produce the sample estimate. Imagine a DGP with a \(\beta_1\) a lot larger than 6.05; for example, a world where the true difference between groups is 15.00 percentage points. To represent this world, we could slide the DGP as well as its corresponding sampling distribution further to the right (see the picture below).

Such a DGP could produce a variety of samples. But notice that the sample \(b_1\) of 6.05 is no longer in the middle .95 region – now it’s in the unlikely lower tail. We could say, therefore, that a DGP with \(\beta_1=15.00\) is unlikely to have generated the sample \(b_1\) because 6.05 is much lower than most of the \(b_1\)s generated by this DGP.

The same three-layered diagram of beta-sub-1, the sampling distribution of b1, and the sample b1 that appears earlier on the page; however, there are two histograms of potential sampling distributions. They are side-by-side, not overlapping, and the one on the left is slightly more transparent. The more transparent histogram represents a possible DGP where the sampling distribution is centered at 6.05, and the sample b1 of 6.05 falls right in the center as well. The histogram on the right represents a possible DGP where beta-sub-1 equals 15, so the sampling distribution is also centered at 15. The sample b1 of 6.05 that is plotted as a dot on the bottom line does not fall within the boundaries of this sampling distribution.
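The reasoning about a DGP with \(\beta_1=15.00\) can be checked with a short Python sketch. As before, this is a rough illustration rather than the analysis behind the figure: we simulate the sampling distribution from a normal distribution, and the spread of 3.5 is an assumed value chosen only to resemble the picture.

```python
import random

random.seed(2)
ASSUMED_SPREAD = 3.5  # hypothetical value, not taken from the text

hypothesized_beta1 = 15.0
sample_b1 = 6.05

# Simulated b1s from a DGP in which the true beta1 is 15.
sdob1 = [random.gauss(hypothesized_beta1, ASSUMED_SPREAD) for _ in range(10000)]

# What proportion of simulated b1s are as low as (or lower than) our sample b1?
proportion_as_low = sum(b1 <= sample_b1 for b1 in sdob1) / len(sdob1)

# The lower unlikely tail holds the lowest .025 of the b1s.
in_lower_tail = proportion_as_low < 0.025

print(proportion_as_low, in_lower_tail)
```

If only a tiny proportion of the simulated \(b_1\)s are as low as 6.05, the sample sits in the lower tail, and this DGP is an unlikely source of our data.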

By the same logic, if we slide the sampling distribution far down to the left (as in the figure below), we can see that it is unlikely that the \(b_1\) of 6.05 came from a DGP with a \(\beta_1\) as low as -2.00. By sliding the sampling distribution left and right, we can begin to see the range of possible \(\beta_1\)s that could have generated our sample \(b_1\).

The same three-layered diagram of beta-sub-1, the sampling distribution of b1, and the sample b1 that appears earlier on the page; however, there are two histograms of potential sampling distributions. They are side-by-side, not overlapping, and the one on the right is slightly more transparent. The more transparent histogram represents a possible DGP where the sampling distribution is centered at 6.05, and the sample b1 of 6.05 falls right in the center as well. The histogram on the left represents a possible DGP where beta-sub-1 equals negative 2, so the sampling distribution is also centered at negative 2. The sample b1 of 6.05 that is plotted as a dot on the bottom line does not fall within the boundaries of this sampling distribution.
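Sliding the sampling distribution left and right amounts to trying many hypothesized values of \(\beta_1\) and keeping the ones for which the sample \(b_1\) stays in the middle .95 region. The sketch below does this in Python. It swaps the simulated histogram for a smooth normal curve as a stand-in, and the spread of 3.5, the grid of candidate values, and the function name `in_middle_95` are all assumptions made for illustration.

```python
from statistics import NormalDist

ASSUMED_SPREAD = 3.5  # hypothetical value, not taken from the text
sample_b1 = 6.05

def in_middle_95(sample_b1, hypothesized_beta1, spread):
    """Does sample_b1 fall in the middle 95% of a normal sampling
    distribution centered at hypothesized_beta1?"""
    dist = NormalDist(mu=hypothesized_beta1, sigma=spread)
    lower = dist.inv_cdf(0.025)
    upper = dist.inv_cdf(0.975)
    return lower <= sample_b1 <= upper

# Candidate values of beta1 from -5.0 to 20.0 in steps of 0.5.
candidates = [i * 0.5 for i in range(-10, 41)]

plausible = [b for b in candidates
             if in_middle_95(sample_b1, b, ASSUMED_SPREAD)]

print(plausible[0], plausible[-1])
```

With this made-up spread, the scan keeps candidate values from -0.5 to 12.5. That range includes 0 and 6.05 but excludes -2.00 and 15.00, matching the pictures above. A different spread would give a different range.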
