Statistics and Data Science: A Modeling Approach
Chapter 4 - Explaining Variation
4.0 Introduction to Explaining Variation
Examining distributions of single variables is always an important starting place. But as data analysts, our interests usually go beyond exploring patterns of variation in a single variable. We want to explain the variation. In this section we begin thinking about what it means to explain variation.
We can start with an intuitive definition of “explain”: if knowing someone’s score on one variable helps you make a better guess about that person’s score on another variable, then we can say that the first variable explains some variation in the second variable.
For example, if we knew someone’s sex, could that help us make a better prediction of their height? You probably already have a sense that males are taller, on average, than females. If we knew that someone was male, even without meeting them, we would predict a greater height for them than if we knew they were female.
This is what we mean when we say sex explains some of the variation in height. It doesn’t explain all the variation because some females are taller than some males. But it does explain some of the variation.
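To make this idea concrete, here is a minimal sketch in Python. The heights are invented purely for illustration (they are not real data), and the names people, group_means, and mean_abs_error are our own choices for this example. The sketch compares two ways of guessing each person's height: using the overall mean for everyone, or using the mean for that person's sex. Average absolute error is just one reasonable way to score the guesses.

```python
from statistics import mean

# Hypothetical (sex, height in cm) pairs, invented for illustration only.
# Note that some females here are taller than some males.
people = [
    ("female", 158), ("female", 163), ("female", 170), ("female", 175),
    ("male", 168), ("male", 174), ("male", 180), ("male", 186),
]

heights = [h for _, h in people]

# Guess 1: ignore sex and use the overall mean height for everyone.
overall_mean = mean(heights)

# Guess 2: use the mean height of the person's own group.
group_means = {
    sex: mean(h for s, h in people if s == sex)
    for sex in {s for s, _ in people}
}

def mean_abs_error(guesses):
    """Average distance between each guess and the actual height."""
    return mean(abs(h - g) for h, g in zip(heights, guesses))

error_ignoring_sex = mean_abs_error([overall_mean] * len(people))
error_using_sex = mean_abs_error([group_means[s] for s, _ in people])

print("Average error ignoring sex:", error_ignoring_sex)
print("Average error using sex:", error_using_sex)
```

With these made-up values, the guesses that use sex are closer on average, but the error does not drop to zero. That is the sense in which sex explains some, but not all, of the variation in height.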
Explaining variation can help us in three ways: it helps us understand what causes the variation in a variable; it helps us predict future observations; and it helps us change the system we are studying to produce different outcomes.
In this chapter we develop some informal methods for representing and exploring relationships among variables. We start by graphing relationships between two variables, looking for evidence that one variable explains variation in another, and representing these relationships with word equations. (In the next chapters we will introduce more quantitative methods for explaining variation using the concept of a statistical model.)