High School / Advanced Statistics and Data Science I (ABC)
Chapter 3 - Examining Distributions
3.1 The Concept of Distribution
Assuming we have a tidy data set to work with, the next step in data analysis is to begin looking at the variation in our measures. This leads us to one of the most fundamental concepts in statistics: the concept of distribution. Wild (2006) defines a distribution as “the pattern of variation in a variable or set of variables.” A distribution is like a “lens” through which we can view variation in data (figure from Wild, 2006).
Thinking about distributions requires you to think abstractly, at a higher level, about your data. You must shift your focus away from the individual observations in your data set (e.g., the 20 people you have sampled): first, to just one attribute along which the observations vary; and second, to the pattern of variation in that attribute across the sample.
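To make this shift in focus concrete, here is a minimal sketch in Python. The sample is hypothetical (an invented attribute, number of siblings, for 20 invented people), and the tally is just one simple way to look at a pattern of variation.

```python
from collections import Counter

# Hypothetical sample: number of siblings reported by 20 people.
# Each number is one observation of the same attribute.
siblings = [0, 1, 1, 2, 1, 3, 0, 2, 1, 1, 4, 2, 1, 0, 2, 1, 3, 1, 2, 0]

# Step back from the individual people and count how often each value occurs.
counts = Counter(siblings)

# Print a simple text tally: one row per value, one star per person.
for value in sorted(counts):
    print(f"{value}: {'*' * counts[value]}")
```

No single row of the tally describes any one person. The shape formed by all the rows together is the pattern of variation in the attribute.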
Note that not just any bunch of numbers can be thought of as a distribution. The numbers must all be measures of the same attribute. So, for example, if you have measures of height and weight on a sample of 20 people, you can’t just lump the height and weight numbers into a single distribution. You can, however, examine the distribution of height and the distribution of weight separately.
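As a rough illustration, the sketch below keeps height and weight apart. The five people and their measurements are invented for the example, and `describe` is a hypothetical helper, not a function from any library.

```python
from statistics import mean

# Hypothetical sample: each tuple is one person's (height in cm, weight in kg).
people = [(160, 55), (172, 68), (181, 80), (168, 62), (175, 74)]

# Separate the attributes so each list holds measures of one attribute only.
heights = [h for h, _ in people]
weights = [w for _, w in people]

def describe(values):
    """Summarize one attribute with a few simple features."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}

height_summary = describe(heights)
weight_summary = describe(weights)
print("height:", height_summary)
print("weight:", weight_summary)

# Lumping the two attributes together still produces a number,
# but centimeters and kilograms do not measure the same thing,
# so the result does not describe any distribution.
lumped = heights + weights
print("lumped mean (not meaningful):", mean(lumped))
```

The code will happily compute a mean for the lumped list. Whether a set of numbers forms a distribution is a question about what was measured, not something the software checks for you.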
Even a small data set includes lots of numbers and lots of variation. The concept of distribution allows us to move beyond the individual data points and think in terms of features that define the distribution as a whole. It lets us see the whole as greater than the sum of its parts: the forest, and not just the trees.
The features of a forest cannot be seen in a single tree. For example, measuring the height of a single tree does not allow you to see characteristics of the distribution of height across many trees. From a single measurement you can know the height of that one tree, but not the minimum, maximum, or average height of the trees in the forest. Statistics such as the mean do not themselves constitute a distribution; they are features of a distribution, features that don’t apply to individual trees.
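The forest-and-trees idea can be sketched in a few lines of Python. The tree heights below are invented for illustration.

```python
from statistics import mean

# Hypothetical heights (in meters) of six trees in one forest.
tree_heights = [12.0, 15.5, 9.8, 20.1, 14.3, 17.6]

# A single measurement tells us about one tree and nothing more.
one_tree = tree_heights[0]
print("one tree:", one_tree)

# Features of the distribution only exist once we look across all the trees.
forest_features = {
    "min": min(tree_heights),
    "max": max(tree_heights),
    "mean": mean(tree_heights),
}
print("forest:", forest_features)
```

Note that no tree in this sample has a height equal to the mean. The mean is a feature of the distribution as a whole, not of any individual tree.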
The concept of distribution is complex; most people do not understand it all at once. If you find it difficult or vague, don’t worry; that’s normal. Together, we will keep clarifying our understanding and adding more dimensions to this major concept as we progress.