High School / Advanced Statistics and Data Science I (ABC)
5.7 DATA = MODEL + ERROR: Notation
Now let’s see how mathematical notation is used to represent the simple (empty) model we introduced before. We have introduced the overarching concept that DATA = MODEL + ERROR. In our simple model, we are using one number, the mean, to model the distribution of scores.
We could represent this model in a word equation like this:
THUMB DATA = MEAN + ERROR
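But there are some real advantages to rewriting this statement in mathematical notation. Here’s one form this notation might take:
$$Y_i = \bar{Y} + e_i$$
This equation literally represents what we were doing with R before. It tells us that each value of Y in our data ($Y_i$) is equal to the mean of Y ($\bar{Y}$) plus that observation’s error ($e_i$). The subscript $i$ identifies the individual observation.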
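The decomposition of each score into mean plus error can be checked directly in R. The sketch below is a minimal illustration rather than the course's own code: the names `thumb`, `y_bar`, `e`, and `reconstructed` are hypothetical, and only the first value (56, student #1's thumb length) comes from the text. The remaining values are made up so the arithmetic is easy to follow.

```r
# Hypothetical tiny data set: only the first value (56) appears in the text;
# the other five values are invented for illustration.
thumb <- c(56, 60, 61, 63, 64, 68)

# MODEL: a single number, the mean of the scores
y_bar <- mean(thumb)

# ERROR: each score's deviation from the mean
e <- thumb - y_bar

# DATA = MODEL + ERROR: adding each error back to the mean recovers every score
reconstructed <- y_bar + e

print(y_bar)
print(e)
print(all.equal(reconstructed, thumb))
```

With these values the mean is 62, so student #1's error is 56 - 62 = -6, and 62 + (-6) gives back the original score of 56.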
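Going back to DATA = MODEL + ERROR, you might also see a version that looks like this:
$$Y_i = \hat{Y}_i + e_i$$
Here $Y_i$ is the observed score for observation $i$ and $\hat{Y}_i$ is the score the model predicts for that observation. In our tiny data set, for example, student #1 had a thumb length of 56. So, $Y_1 = 56$.
As we develop more complex models we still will end up with a single predicted value of Y ($\hat{Y}_i$) for each observation. In the empty model, that predicted value is the mean for every observation.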
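To see the predicted values and errors side by side, the empty model can be fit with base R's `lm()`. This is a sketch under stated assumptions: the data frame `tiny` and its column names are hypothetical, and only student #1's value of 56 comes from the text.

```r
# Hypothetical data frame: only student #1's value (56) comes from the text.
tiny <- data.frame(
  student = 1:6,
  thumb   = c(56, 60, 61, 63, 64, 68)
)

# Fit the empty model: thumb is modeled with a single number and no predictors
empty_model <- lm(thumb ~ 1, data = tiny)

# One predicted value per student; under the empty model they are all the mean
tiny$predicted <- predict(empty_model)

# Error = observed score minus predicted score
tiny$error <- resid(empty_model)

print(tiny)
```

Every row has the same predicted value, and in each row `predicted + error` equals the observed `thumb` value.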
Notation for the General Linear Model
Finally, we complicate things a little more, introducing one more form of our DATA = MODEL + ERROR formulation called the General Linear Model (GLM) notation:
$$Y_i = b_0 + e_i$$
This is a more abstract version of the equation above; we have substituted $b_0$ for the mean, $\bar{Y}$.
For our simple model (the empty model) $b_0$ represents the mean. But for other models, and other situations, it can represent other values. For example, if our outcome variable were categorical, the interpretation of $b_0$ would be different.
Indeed, this flexibility is what makes the General Linear Model general. Whenever you see a GLM model statement, you should think carefully about what, in the particular situation, each symbol represents.
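For the empty model, one way to check what the single estimate represents is to compare it with the mean. In the sketch below, base R's `lm()` reports the estimate under the label `(Intercept)`. The names `thumb` and `b0` are hypothetical, and all values other than 56 are invented for illustration.

```r
# Hypothetical scores: only the first value (56) comes from the text.
thumb <- c(56, 60, 61, 63, 64, 68)

# Fit the empty model (no explanatory variables)
empty_model <- lm(thumb ~ 1)

# Extract the single estimated parameter; lm() labels it "(Intercept)"
b0 <- coef(empty_model)[["(Intercept)"]]

print(b0)

# In the empty model this estimate matches the mean of the scores
print(isTRUE(all.equal(b0, mean(thumb))))
```

The comparison holds here because the model has no explanatory variables. In a model with predictors, the same label would refer to a different quantity, which is why each symbol needs to be interpreted in context.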