list

# Statistics and Data Science: A Modeling Approach

## 9.5 Notation and Terminology

We should pause for a second and introduce some new terminology and notation that applies especially to sampling distributions. Recall that earlier in the course we discussed the importance of notation for keeping straight the distinction between the distribution of data (that is, a sample) and the distribution of the population (that is, the long-run result of a DGP). So, for example, $$\bar{Y}$$ represents the mean of a sample of data, whereas $$\mu$$ represents the mean of the population.

Sampling distributions are a third kind of distribution—completing our Distribution Triad—and so we need notation to specifically indicate when we are talking about a sampling distribution. Let’s fill out our notational toolbox, therefore, to include sampling distributions.

Sample / Data Population / DGP Sampling Distribution of Means
Mean $$\bar{Y}$$ $$\mu$$ (mu) $$\mu_\bar{Y}$$ (mu sub y-bar)
Standard Deviation $$s$$ $$\sigma$$ (sigma) $$\sigma_\bar{Y}$$ (sigma sub y-bar)
Model statement Estimated from sample
$$Y_i=b_0+e_i$$
Parameters being estimated
$$Y_i=\beta_0+\epsilon_i$$
Distribution of estimates
A lot of $$b_0$$s

Also note: we use a special word to refer to the standard deviation of the sampling distribution of the mean: the standard error, or standard error of the mean.

We can also have sampling distributions of other estimates besides the mean. For example, we could have a sampling distribution of standard deviations or sampling distributions of SS, PRE, F or any other statistic.

In general, Greek letters (e.g., $$\mu$$ or $$\sigma$$) are used to describe parameters that are unknown and estimated. The population mean ($$\mu$$) is unknown, for example, and so represented with a Greek letter. Things we calculate based on samples are generally represented with Roman letters. So, $$\bar{Y}$$ is the mean of a sample of data.

Sampling distributions are unknown (imaginary, in fact) and so the mean and standard deviation of a sampling distribution are represented with Greek letters. But the subscript (for example, $${\sigma_{\bar{Y}}}$$) represents the statistic that the sampling distribution is made out of. Because it’s a statistic, it is represented with a Roman letter. For example, if we analyze the notation for standard error ($${\sigma_{\bar{Y}}}$$), the $$\sigma$$ represents the standard deviation of this distribution and the tiny $$\bar{Y}$$ represents that the distribution is made up of sample means.