7.3 GLM Notation for the Group Model
Most of us are content to describe the Gender model simply as two means. But as with the empty model, it will be helpful to learn how a two-group model is represented in the notation of the General Linear Model, especially as we develop more complicated models.
The Gender Model Using GLM Notation
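The full GLM equation for the Gender model incorporates both \(b_0\) and \(b_1\):
$$Y_i = b_0 + b_1 X_i + e_i$$
We can also write it in a way more specific to the Gender model of Thumb, like this:
$$\text{Thumb}_i = b_0 + b_1 \text{Gendermale}_i + e_i$$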
Using the output from lm(), we can substitute the estimates into the model.
Call:
lm(formula = Thumb ~ Gender, data = Fingers)
Coefficients:
(Intercept)   Gendermale
     58.256        6.447
$$\text{Thumb}_i = 58.256 + 6.447\,\text{Gendermale}_i + e_i$$
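It's important to notice, first, that both the empty model and the two-group Gender model start with \(b_0\). In the empty model, \(Y_i = b_0 + e_i\), the MODEL part is just \(b_0\).
For the two-group model, the MODEL part of DATA = MODEL + ERROR is now more complicated:
$$b_0 + b_1 X_i$$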
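Note that \(b_0\) has a different meaning in the two models. In the empty model (with one parameter), it represents the mean of Thumb for the whole sample of data, whereas for the two-group model (with two parameters), it represents the mean of the first group (in this case, female).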
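To see the two-group meaning of the intercept (written \(b_0\) in GLM notation) in action, here is a minimal R sketch. It assumes a small made-up data frame called toy, not the real Fingers data, so the numbers are purely illustrative.

```r
# Hypothetical toy data (NOT the Fingers data set): thumb lengths in mm
toy <- data.frame(
  Thumb  = c(56, 58, 60, 62, 64, 66),
  Gender = factor(c("female", "female", "female", "male", "male", "male"))
)

# Fit the two-group model, using the same formula shape as Thumb ~ Gender above
fit <- lm(Thumb ~ Gender, data = toy)
coef(fit)

# Compute the two group means directly for comparison
group_means <- tapply(toy$Thumb, toy$Gender, mean)
group_means

# The intercept matches the mean of the first group (female)
all.equal(unname(coef(fit)["(Intercept)"]), unname(group_means["female"]))

# The Gendermale estimate matches the male mean minus the female mean
all.equal(unname(coef(fit)["Gendermale"]),
          unname(group_means["male"] - group_means["female"]))
```

Both comparisons should come back TRUE for this toy data, which suggests that the two estimates are just another way of writing the two group means.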
You might find it confusing to use the same symbol to represent two different ideas. But this flexibility is what makes the General Linear Model so powerful and so… general.
Unlike the empty model, this more complicated model (\(b_0 + b_1 X_i\)) includes an explanatory variable, \(X_i\).
Interpreting \(X_i\)
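We have developed the idea that \(b_1\) is the amount added to the mean of the first group to get the mean of the second group. It turns out we need \(X_i\) to represent Gender, but in a special way. \(X_i\) is called a dummy variable, which means that R creates it specifically to make the model work.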
R takes the variable Gender and recodes it into a new variable (Gendermale). It is coded 1 if the student is in the second group (i.e., male), and it is coded 0 if the student is not in the second group (i.e., not male).
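The recoding can be inspected directly. The sketch below assumes a small made-up data frame (again called toy, not the Fingers data) and uses base R's model.matrix() to show the columns that a formula like ~ Gender produces.

```r
# Hypothetical toy data: four students, two female and two male
toy <- data.frame(
  Gender = factor(c("female", "male", "male", "female"))
)

# Build the design matrix for a model with Gender as the explanatory variable
X <- model.matrix(~ Gender, data = toy)
X

# The Gendermale column is the dummy variable:
# 1 for rows where Gender is male, 0 for rows where it is not
X[, "Gendermale"]
```

Because female comes first alphabetically, R treats it as the first (reference) group, and the dummy variable is named after the second group, male.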
Although in this data set saying a student is not male is the same as saying the student is female, it's important to think of \(X_i\) as coding whether the student is male (1) or not male (0).
The Intercept is 58.256 in the lm() output because that is the predicted thumb length when Gendermale is 0 (that is, when Gender is female).
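As a worked example, the estimates from the lm() output can be plugged into the model to get the prediction for each group. The hat notation for a predicted value is an assumption here; it does not appear in the text above.

```latex
% Female student: Gendermale = 0, so only the intercept remains
\widehat{\text{Thumb}}_i = 58.256 + 6.447(0) = 58.256

% Male student: Gendermale = 1, so the increment is added to the intercept
\widehat{\text{Thumb}}_i = 58.256 + 6.447(1) = 64.703
```

The model therefore makes only two distinct predictions, one per group, and the Gendermale estimate of 6.447 is the difference between them.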