5.10 Summarizing Where We Are
Let’s take a moment to think about how far you have already come in this class. As a result of your hard work and perseverance, you’ve gained knowledge of how data in the world can be organized, visualized, and summarized!
Up until this chapter, we used the DATA = MODEL + ERROR idea in a qualitative way. In this chapter, we built on that qualitative approach to introduce our first statistical model, the simple (or empty) model, which we represented as DATA = MEAN + ERROR. Once we conceptualize a model as a number, we can be more specific about which number we use for our model and how to calculate it. We can also be more specific about the meaning of error, defining it as the gap between an actual observed score and our model prediction (i.e., the residual).
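
To make DATA = MEAN + ERROR concrete, here is a minimal sketch in Python. The thumb lengths are made-up values chosen for illustration, and the variable names (`thumb_lengths`, `model_prediction`, `residuals`) are assumptions of this sketch rather than anything defined in the course.

```python
from statistics import mean

# Hypothetical thumb lengths in millimeters (made-up values for illustration)
thumb_lengths = [56.0, 60.0, 61.0, 63.0, 65.0]

# MODEL: the empty model predicts the same number, the mean, for everyone
model_prediction = mean(thumb_lengths)

# ERROR: each residual is the observed score minus the model prediction
residuals = [y - model_prediction for y in thumb_lengths]

# DATA = MODEL + ERROR: adding each residual back to the mean recovers the data
reconstructed = [model_prediction + e for e in residuals]

print("model prediction (mean):", model_prediction)
print("residuals:", residuals)
print("reconstructed data:", reconstructed)
```

Each residual is the gap between one person's observed score and the single number the model predicts, so the mean plus the residual gives back the original score.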
But then we went and added a bunch of notation, which seems to complicate everything. In a sense, it does complicate everything. But in another sense, it simplifies everything, especially as we go forward. There are some key ideas we need to keep straight as we continue to work with models, and notation will help us do that.
Remember: our goal is to use a distribution of data to construct a statistical model of the population distribution.
| | Data | Population |
|---|---|---|
| | Model constructed based on data (estimated) | Model we are trying to estimate (unknown) |
| Word equation | Person i’s thumb = sample mean + error | Person i’s thumb = population mean + error |
| More specific statement; model is the mean | \(Y_i=\bar{Y}+e_i\) • \(Y_i\) is person i’s thumb • \(\bar{Y}\) is the sample mean • \(e_i\) is the difference between person i’s thumb length and the sample mean | \(Y_i=\mu+\epsilon_i\) • \(Y_i\) is person i’s thumb • \(\mu\) is the population mean (unknown) • \(\epsilon_i\) is the difference between person i’s thumb length and the population mean (unknown) |
| Most general; can be used for any one-parameter model | \(Y_i=b_0+e_i\) • Can be used to represent any one-parameter model, estimated from data, not just the mean | \(Y_i=\beta_0+\epsilon_i\) • Can be used to represent any one-parameter model of the population, not just the mean |
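
The distinction between the two columns can be sketched with a small simulation in Python. Everything here is an assumption made for illustration: the population is simulated (normal, centered at 60 mm with a standard deviation of 9 mm), so \(\mu\) is known only because we generated the population ourselves. With real data, \(\mu\) and \(\epsilon_i\) stay unknown, and we work with \(b_0\) and \(e_i\).

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of thumb lengths (mm); never fully observed in real work
population = [rng.gauss(60.0, 9.0) for _ in range(10_000)]
mu = mean(population)  # population mean; known here only because we simulated it

# DATA: a random sample drawn from the population
sample = rng.sample(population, 50)
b0 = mean(sample)  # estimate of the parameter, computed from the data

# e_i: residuals from the estimated model, Y_i = b0 + e_i
e = [y - b0 for y in sample]

# epsilon_i: errors from the population model, Y_i = mu + epsilon_i
epsilon = [y - mu for y in sample]

print(f"b0 (sample mean):     {b0:.2f}")
print(f"mu (population mean): {mu:.2f}")
print(f"sum of e_i:           {sum(e):.6f}")
print(f"sum of epsilon_i:     {sum(epsilon):.6f}")
```

The residuals \(e_i\) are measured from the sample mean, so they sum to zero apart from floating-point rounding. The errors \(\epsilon_i\) are measured from the population mean, so they generally do not sum to zero in any one sample, because \(b_0\) and \(\mu\) differ.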