Chapter 4 - Explaining Variation

4.1 Introduction to Explaining Variation

Examining distributions of single variables is always an important starting place. But as data analysts, our interests usually go beyond exploring patterns of variation in a single variable. We want to explain the variation. In this section we begin thinking about what it means to explain variation.

We can start with an intuitive definition of “explain”: if knowing someone’s score on one variable helps you make a slightly better guess about that person’s score on another variable, then we can say that the first variable explains some variation in the second variable.

For example, if we knew someone’s sex, could that help us make a better prediction of their height? You probably already have a sense that males are taller, on average, than females. If we knew that someone was male, even without meeting them, we would predict that they would be taller than if we knew they were female.

This is what we mean when we say sex explains some of the variation in height. It doesn’t explain all the variation because some females are taller than some males. But it does explain some of the variation.
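To make the idea concrete, here is a minimal sketch in Python. The heights are invented purely for illustration (they are not real data), and the variable and function names are our own. The sketch compares two ways of guessing each person's height: using the overall mean for everyone, versus using the mean for that person's sex. If the second way produces smaller errors on average, then sex explains some of the variation in height.

```python
from statistics import mean

# Hypothetical heights in inches, invented for illustration only.
heights = {
    "female": [62, 64, 65, 66, 69],
    "male": [66, 68, 70, 71, 73],
}

# Pool everyone together, ignoring sex.
all_heights = [h for group in heights.values() for h in group]
overall_mean = mean(all_heights)

# Guess 1: we know nothing about the person, so we guess the overall mean.
error_without_sex = mean(abs(h - overall_mean) for h in all_heights)

# Guess 2: we know the person's sex, so we guess the mean of their group.
error_with_sex = mean(
    abs(h - mean(group))
    for group in heights.values()
    for h in group
)

print("Average error ignoring sex:", round(error_without_sex, 2))
print("Average error using sex:   ", round(error_with_sex, 2))
```

In this made-up sample the average error shrinks when we use sex but does not drop to zero, and the tallest female is still taller than the shortest male. That is the pattern described above: sex explains some, but not all, of the variation in height.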

Explaining variation can help us in three ways: it helps us understand what causes the variation in a variable; it helps us predict future observations; and it helps us change the system we are studying to produce different outcomes.

In this chapter we develop some informal methods for representing and exploring relationships among variables. We start by graphing relationships between two variables, looking for evidence that one variable explains variation in another, and representing these relationships with word equations. (In the next chapters we will introduce more quantitative methods for explaining variation using the concept of a statistical model.)

Note that we are now getting into the real meat of statistical analysis: using variation in one variable to explain variation in an outcome variable. This is where you start learning how your theories about the world can be supported by data, or not. This chapter is longer than the others so far, so take your time going through it. The concepts are important, and the effort you put into this chapter will pay off later as you learn how to create and test statistical models of the world!
