High School / Advanced Statistics and Data Science I (ABC)
2.4 Frequency Tables and Sorting Data Frames
Using tally() to Create Frequency Tables
We can use the tally() function to create a frequency table of the Age variable (in the MindsetMatters data frame). This will tell us how many housekeepers there were of each age.
tally(MindsetMatters$Age)
We don’t have to use the $ notation. We could also specify the variable and data frame separately, like this:
tally(~ Age, data = MindsetMatters)
Age
  19   21   22   23   24   26   27   28   29   30   31   32   33   34   35   37
   1    1    1    1    4    3    4    2    6    1    3    1    4    2    1    1
  38   39   40   41   42   43   44   45   46   47   48   50   52   53   54   55
   5    2    2    3    3    1    2    3    2    1    3    1    1    1    2    1
  57   58   61   62   65 <NA>
   1    1    1    1    1    1
The rows that start with 19, 38, and 57 list the ages of the housekeepers, and the number underneath each age shows how many housekeepers of that age are in the data frame. For example, there is one housekeeper who is 19 years old, there are two who are 54 years old, and there are three who are 45 years old. The <NA> column counts the one housekeeper whose age is missing.
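To practice reading a table like this, here is a minimal sketch on a small made-up vector of ages (not the MindsetMatters data). It uses base R's table() as a stand-in for tally(), an assumption made so the snippet runs without the course packages. The argument useNA = "ifany" asks table() to add an <NA> column when missing values are present.

```r
# Hypothetical ages, invented for illustration; NA marks a missing age
ages <- c(24, 29, 24, 45, NA, 29, 29)

# Count how many times each distinct value occurs, including missing values
age_counts <- table(ages, useNA = "ifany")
print(age_counts)

# Look up the count for a single age by its label
age_counts["29"]
```

In this toy vector, 24 appears twice, 29 appears three times, 45 appears once, and one value is missing, so the counts add up to the seven values we started with.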
Try using the tally() function to make a frequency table of housekeepers by Condition.
# Setup: load the course packages and recode Cond (1/0) as a labeled factor
require(coursekata)
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))
# Use tally() with the MindsetMatters data frame to create a frequency table of housekeepers by Condition
tally(~Condition, data = MindsetMatters)
# Another solution
# tally(MindsetMatters$Condition)
The output of tally() shows us that there are 41 housekeepers who were in the Informed condition and 34 in the Uninformed condition. Taking a look at this frequency table, we might wonder why there were slightly more housekeepers who were informed that their daily work of cleaning was equivalent to getting adequate exercise.
Using arrange() to Sort a Data Frame
Let’s turn our attention to two variables in the MindsetMatters data frame: Age (the age of the housekeepers, in years, at the start of the study) and Wt (their weight, in pounds, at the start of the study).
We might want to sort the whole MindsetMatters data frame by Age. We can’t use the sort() function for this, because it works only with vectors, not with data frames. To sort a whole data frame, we will use a different function, arrange().
The arrange() function works similarly to sort(), except now you have to specify both the name of the data frame and the name of the variable you want to use for sorting the rows.
arrange(MindsetMatters, Age)
Importantly, when you use arrange() to sort on one variable (e.g., Age), the order of the rows (which in this case is housekeepers) will change, but the contents of each row will stay the same.
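A minimal sketch of this idea, using a small made-up data frame (the names and values below are invented for illustration). Because arrange() comes from the dplyr package, the sketch loads dplyr directly rather than the full course setup.

```r
library(dplyr)  # provides arrange()

# Hypothetical mini data frame, invented for illustration
people <- data.frame(
  name = c("Ana", "Ben", "Cal", "Dee"),
  Age  = c(41, 23, 35, 29),
  Wt   = c(150, 132, 168, 141)
)

# sort() reorders a single vector and knows nothing about the other columns
sort(people$Age)

# arrange() reorders whole rows, so each name keeps its own Age and Wt
by_age <- arrange(people, Age)
print(by_age)
```

After sorting, Ben (the youngest) comes first, and his row still shows his own weight of 132. Only the order of the rows has changed.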
If we print MindsetMatters again, it won’t be arranged by age, because we didn’t save our work. To save the new ordering, we need to assign the arranged version to an R object. We could assign it back to the existing object (MindsetMatters) or to a new object (e.g., Mindset2, MM2, MM_arrange, or any other name you want to make up). Assigning it to the existing object replaces the contents of MindsetMatters with the rows in the new order. In general, it’s good practice to save a changed data frame to a new R object in case you want to go back to the original version.
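The difference between printing a sorted copy and saving it can be sketched on a tiny made-up data frame (the object names scores and scores_sorted are hypothetical, chosen only for this illustration):

```r
library(dplyr)  # provides arrange()

# Hypothetical scores, invented for illustration
scores <- data.frame(id = c("a", "b", "c"), points = c(7, 3, 9))

# Calling arrange() without assignment only prints a sorted copy
arrange(scores, points)
scores$points          # still in the original order: 7 3 9

# Assigning to a new object keeps both versions available
scores_sorted <- arrange(scores, points)
scores_sorted$points   # sorted order: 3 7 9
```

Saving to a new name, as in the last step, leaves the original scores untouched in case we need it later.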
Let’s use the assignment operator (<-) to assign the arranged data frame to MM_arrange. See if you can edit the code below to save the version of MindsetMatters that is arranged by Age into MM_arrange. Then print out the first six rows of MM_arrange using head().
# Setup: load the course packages and recode Cond (1/0) as a labeled factor
require(coursekata)
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))
# save MindsetMatters, arranged by Age, to MM_arrange
arrange(MindsetMatters, Age)
# write code to print out the first 6 rows of MM_arrange
# One solution: save MindsetMatters, arranged by Age, to MM_arrange
MM_arrange <- arrange(MindsetMatters, Age)
# write code to print out the first 6 rows of MM_arrange
head(MM_arrange)
  Cond Age  Wt   Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2  Condition
1    0  19 123 124.2 19.6 19.7 26.6   NA 0.69 0.69  113   117    75     72 Uninformed
2    0  21 156 154.4 25.9 25.7 36.4   NA 0.78 0.78  116   135    67     65 Uninformed
3    1  22 127 124.6 25.6 25.2 34.6 31.6 0.74 0.73  110   103    65     69   Informed
4    1  23 161 161.4 26.8 26.9 38.1 37.1 0.90 0.86  126   101    74     64   Informed
5    0  24  90  91.8 16.5 16.8   NA   NA 0.73 0.73   NA    NA    78     76 Uninformed
6    1  24 166 169.0 28.5 29.0 41.3 41.1 0.88 0.90  114   123    56     55   Informed
Sorting a Data Frame in Descending Order
The function arrange() can also be used to arrange values in descending order by adding desc() around the variable name.
arrange(MindsetMatters, desc(Age))
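To compare the two orderings side by side, here is a minimal sketch on a small made-up data frame (the object toy and its values are hypothetical; dplyr is loaded directly because it supplies both arrange() and desc()):

```r
library(dplyr)  # provides arrange() and desc()

# Hypothetical weights in pounds, invented for illustration
toy <- data.frame(id = 1:4, Wt = c(150, 132, 168, 141))

ascending  <- arrange(toy, Wt)        # smallest weight first
descending <- arrange(toy, desc(Wt))  # largest weight first

ascending$Wt
descending$Wt
```

The descending version starts with the heaviest value (168) and ends with the lightest (132), and each id still travels with its own weight.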
Try arranging MindsetMatters by Wt in descending order. Save this to MM_desc. Print a few rows of MM_desc to check out what happened.
# Setup: load the course packages and recode Cond (1/0) as a labeled factor
require(coursekata)
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))
# arrange MindsetMatters by Wt in descending order
MM_desc <-
# write code to print out a few rows of MM_desc
# One solution: arrange MindsetMatters by Wt in descending order
MM_desc <- arrange(MindsetMatters, desc(Wt))
# write code to print out a few rows of MM_desc
head(MM_desc)
  Cond Age  Wt   Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2  Condition
1    1  34 196 198.2 33.7 33.5 45.7 44.7 0.83 0.81  164    83    73     57   Informed
2    1  39 189 183.2 34.6 34.4 47.0 46.7 0.80 0.77  185   154    99    102   Informed
3    0  65 187 186.2 34.2 34.1 47.3   NA 0.89   NA  176   188   106     83 Uninformed
4    1  29 184 182.8 35.9 35.7 44.4 45.0 0.89 0.89  120   124    75     70   Informed
5    0  38 183 186.4 34.6 35.2 44.2 42.8   NA   NA  115   125    70     72 Uninformed
6    0  45 182 180.0 33.8 33.5   NA 45.6 0.85 0.88  145   141    96     84 Uninformed