High School / Advanced Statistics and Data Science I (ABC)
3.8 Boxplots and the Five-Number Summary
Boxplots are a handy tool for visualizing the five-number summary of a distribution (the minimum, Q1, the median, Q3, and the maximum). Boxplots made with the function gf_boxplot() will also clearly show you the IQR and any outliers.
Unlike histograms, where the values of the variable went on the x-axis, the boxplots made with gf_boxplot() put the values of the variable on the y-axis. Boxplots do not have to be made this way; this is just the way gf_boxplot() does it.
Here is the code for making a boxplot of Wt from MindsetMatters with gf_boxplot().
gf_boxplot(Wt ~ 1, data = MindsetMatters)
The 1 just means that there is only going to be one boxplot here. Later we will replace that as we explore methods of making multiple boxplots that appear next to each other.
The boxplot is made up of a few parts. There is a big white box that runs from Q1 up to Q3 and is split into an upper and a lower part by a line at the median. There are lines, called whiskers, above and below the box. Another name for a boxplot is a box-and-whisker plot.
This is a case where there are no outliers (defined as values more than 1.5 IQRs above Q3 or below Q1), so the whiskers simply end at the max and min values of Wt.
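To make the 1.5 IQR rule concrete, here is a minimal sketch in base R. The weights in wt are made up for illustration (they are not the MindsetMatters values), and names such as lower_fence and upper_fence are just labels chosen for this example.

```r
# Hypothetical weights, invented for illustration only
wt <- c(120, 135, 142, 150, 155, 160, 168, 175, 190)

# Five-number summary: min, Q1, median, Q3, max
five_num <- quantile(wt, probs = c(0, 0.25, 0.5, 0.75, 1))
iqr <- IQR(wt)

# Boundaries beyond which a value counts as an outlier
lower_fence <- five_num[["25%"]] - 1.5 * iqr
upper_fence <- five_num[["75%"]] + 1.5 * iqr

# Values outside the boundaries (empty here, so there are no outliers)
outliers <- wt[wt < lower_fence | wt > upper_fence]

# Whiskers reach the most extreme values that are still inside the boundaries
whisker_low <- min(wt[wt >= lower_fence])
whisker_high <- max(wt[wt <= upper_fence])

print(five_num)
print(outliers)
```

Because no value falls outside the boundaries in this made-up sample, the whiskers land exactly on the minimum and maximum, which is the situation described above for Wt.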
Modify this code to create a boxplot for Population from the HappyPlanetIndex data frame.
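One way the modified code might look is sketched below. It assumes the ggformula package (which provides gf_boxplot()) is installed, and that HappyPlanetIndex is available from the Lock5withR package; in the course's own code window the data frame may already be loaded for you, in which case the library() lines are unnecessary.

```r
library(ggformula)   # provides gf_boxplot()
library(Lock5withR)  # assumed source of the HappyPlanetIndex data frame

# Same pattern as the Wt example: outcome ~ 1 draws a single boxplot
population_boxplot <- gf_boxplot(Population ~ 1, data = HappyPlanetIndex)
population_boxplot
```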
Wow, this is a strange-looking boxplot. You can hardly see the box—it’s squished down on the bottom. And there are all these points here, even though it’s supposed to be depicting a box-and-whisker plot.
The points that appear on a boxplot are the outliers. If they appear above the top whisker, they are outliers because R has checked whether these values are greater than the upper boundary, Q3 + 1.5 × IQR.
There are a lot of large outlier countries. No wonder the histogram we looked at before put so many countries into the same bin! It looks as though most countries are at 0 millions. If only we could “zoom in” on these countries with a smaller population.
In the following code window, use filter() to get just the countries with populations smaller than this upper boundary (Q3 plus 1.5 IQRs). Save these countries in a data frame called SmallerCountries. Run the code to see a histogram of those Population data.
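The filtering step can be sketched as follows. This example uses a small made-up data frame called countries as a stand-in for HappyPlanetIndex, so the numbers are hypothetical; it also assumes the dplyr package is installed to supply filter(). The name upper_boundary is just a label chosen for this example.

```r
library(dplyr)  # provides filter()

# Hypothetical stand-in data; populations in millions, invented for illustration
countries <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Population = c(0.3, 2, 4, 5, 9, 11, 20, 1300)
)

# Upper boundary for outliers: Q3 + 1.5 * IQR
q3 <- unname(quantile(countries$Population, 0.75))
upper_boundary <- q3 + 1.5 * IQR(countries$Population)

# Keep only the countries below the boundary
SmallerCountries <- filter(countries, Population < upper_boundary)

print(upper_boundary)
print(SmallerCountries)
```

With the real data, the same pattern would apply with HappyPlanetIndex in place of countries, and the resulting SmallerCountries data frame could then be passed to a histogram or boxplot function.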
Ah, this is a very different histogram than the one that included outliers. Here we get a sense of how the countries that previously got lumped together in one bin actually vary in their population size.
Let’s re-run the boxplot for just these countries in the data frame SmallerCountries to see what that looks like. Just press the <Run> button.