2.4 Frequency Tables and Sorting Data Frames

Using tally() to Create Frequency Tables

We can use the tally() function to create a frequency table of the Age variable (in the MindsetMatters data frame). This will tell us how many housekeepers there were of each age.

tally(MindsetMatters$Age)

We don’t have to use the $ notation. We could also specify the variable and data frame separately, like this:

tally(~ Age, data = MindsetMatters)
Age
  19   21   22   23   24   26   27   28   29   30   31   32   33   34   35   37
   1    1    1    1    4    3    4    2    6    1    3    1    4    2    1    1
  38   39   40   41   42   43   44   45   46   47   48   50   52   53   54   55
   5    2    2    3    3    1    2    3    2    1    3    1    1    1    2    1
  57   58   61   62   65 <NA>
   1    1    1    1    1    1

The rows that start with 19, 38, and 57 list the ages of the housekeepers, and the number beneath each age is how many housekeepers of that age are in the data frame. For example, there is one housekeeper who is 19 years old, there are two who are 54, and there are three who are 45. The final column, <NA>, counts the one housekeeper whose age is missing.
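To check this way of reading a frequency table on something small enough to count by hand, here is a minimal sketch. The `toy` data frame and its ages are made up for illustration (they are not from MindsetMatters), and the sketch assumes the mosaic package, which supplies the formula version of tally(), is installed.

```r
library(mosaic)  # assumption: supplies the formula-style tally() used in this chapter

# Hypothetical data: six made-up ages, one of them missing
toy <- data.frame(Age = c(24, 19, 24, 31, NA, 24))

# Style 1: pull the variable out of the data frame with $
by_dollar <- tally(toy$Age)

# Style 2: name the variable and the data frame separately
by_formula <- tally(~ Age, data = toy)

# Print the frequency table: ages on top, counts underneath
by_formula
```

Counting by hand, 24 appears three times, 19 and 31 appear once each, and one age is missing, so those are the counts to look for under each age in the printed table.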

Try using the tally function to make a frequency table of housekeepers by Condition.

require(coursekata)

# setup: label the 0/1 codes in Cond as a Condition factor
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# Use tally() with the MindsetMatters data frame to create a frequency table of housekeepers by Condition
tally(~Condition, data = MindsetMatters)

# Another solution
# tally(MindsetMatters$Condition)

The output of tally() shows us that there are 41 housekeepers who were in the Informed condition and 34 in the Uninformed condition. Taking a look at this frequency table, we might wonder why there were slightly more housekeepers who were informed that their daily work of cleaning was equivalent to getting adequate exercise.

Using arrange() to Sort a Data Frame

Let’s turn our attention to two variables in the MindsetMatters data frame: Age (the age of the housekeepers, in years, at the start of the study) and Wt (their weight, in pounds, at the start of the study).

We might want to sort the whole data frame MindsetMatters by Age. But we can’t use the sort() function here, because it only works with vectors, not with data frames. To sort a whole data frame, we will use a different function, arrange().

The arrange() function works similarly to sort(), except now you have to specify both the name of the data frame and the name of the variable you want to use for sorting the rows.

arrange(MindsetMatters, Age)

Importantly, when you use arrange() to sort on one variable (e.g., Age), the order of the rows (which in this case is housekeepers) will change, but the contents of each row will stay the same.
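A tiny example makes the point about rows staying intact concrete. This is only a sketch: the `toy` data frame is made up, its `Name` column is hypothetical (MindsetMatters has no such variable), and dplyr is loaded directly because that is the package arrange() comes from.

```r
library(dplyr)  # arrange() comes from dplyr

# Hypothetical data: three made-up people, one row each
toy <- data.frame(
  Name = c("Ana", "Ben", "Cy"),
  Age  = c(41, 19, 30),
  Wt   = c(150, 123, 166)
)

# Sort the rows from youngest to oldest
toy_by_age <- arrange(toy, Age)
toy_by_age
```

Ben, the youngest, moves to the top row, and his weight of 123 moves with him. Only the order of the rows changes; no value jumps from one person to another.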

If you print MindsetMatters again, it won’t be arranged by age: arrange() returned a sorted copy, and we didn’t save it. To keep the new ordering, we need to assign the arranged version to an R object. We could assign it back to the existing object (MindsetMatters) or to a new object (e.g., Mindset2, MM2, MM_arrange, or any other name you want to make up). If we assign it to the existing object, MindsetMatters will be overwritten with the rows in the new order. In general, it’s good practice to save a changed data frame to a new R object in case you want to go back to the original version.
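The difference between printing a sorted copy and saving it can be seen in a short sketch. The data frame `toy` and the object name `toy_arranged` are made up for illustration, and dplyr is loaded directly as the source of arrange().

```r
library(dplyr)  # arrange() comes from dplyr

# Hypothetical data for illustration
toy <- data.frame(Age = c(41, 19, 30), Wt = c(150, 123, 166))

# Without assignment, arrange() only prints a sorted copy
arrange(toy, Age)

# The original is untouched: Age is still in its original order
toy$Age

# Assigning to a new object keeps both the sorted version and the original
toy_arranged <- arrange(toy, Age)
toy_arranged$Age
```

Because the sorted rows go into a new object, `toy` is still available in its original order if you need it later.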

Let’s use the assignment operator (<-) to assign the arranged data frame to MM_arrange. See if you can edit the code below to save the version of MindsetMatters that is arranged by Age into MM_arrange. Then print out the first six rows of MM_arrange using head().

require(coursekata)
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# save MindsetMatters, arranged by Age, to MM_arrange
arrange(MindsetMatters, Age)

# write code to print out the first 6 rows of MM_arrange

# Solution
# save MindsetMatters, arranged by Age, to MM_arrange
MM_arrange <- arrange(MindsetMatters, Age)

# write code to print out the first 6 rows of MM_arrange
head(MM_arrange)
  Cond Age  Wt   Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2  Condition
1    0  19 123 124.2 19.6 19.7 26.6   NA 0.69 0.69  113   117    75     72 Uninformed
2    0  21 156 154.4 25.9 25.7 36.4   NA 0.78 0.78  116   135    67     65 Uninformed
3    1  22 127 124.6 25.6 25.2 34.6 31.6 0.74 0.73  110   103    65     69   Informed
4    1  23 161 161.4 26.8 26.9 38.1 37.1 0.90 0.86  126   101    74     64   Informed
5    0  24  90  91.8 16.5 16.8   NA   NA 0.73 0.73   NA    NA    78     76 Uninformed
6    1  24 166 169.0 28.5 29.0 41.3 41.1 0.88 0.90  114   123    56     55   Informed

Sorting a Data Frame in Descending Order

The function arrange() can also be used to arrange values in descending order by adding desc() around the variable name.

arrange(MindsetMatters, desc(Age))
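Here is a small sketch of desc() on made-up data, including one missing age, since MindsetMatters also has a missing Age value. The `toy` data frame and the name `oldest_first` are hypothetical, and dplyr is loaded directly because it provides both arrange() and desc().

```r
library(dplyr)  # provides arrange() and desc()

# Hypothetical data: four made-up rows, one with a missing age
toy <- data.frame(
  Age = c(41, NA, 19, 52),
  Wt  = c(150, 140, 123, 166)
)

# Sort from oldest to youngest
oldest_first <- arrange(toy, desc(Age))
oldest_first
```

The ages now run 52, 41, 19, and the row with the missing age comes last. arrange() puts rows with missing values at the end whether you sort in ascending or descending order.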

Try arranging MindsetMatters by Wt in descending order. Save this to MM_desc. Print a few rows of MM_desc to check out what happened.

require(coursekata)
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# arrange MindsetMatters by Wt in descending order
MM_desc <- arrange(MindsetMatters, desc(Wt))

# write code to print out a few rows of MM_desc
head(MM_desc)
  Cond Age  Wt   Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2  Condition
1    1  34 196 198.2 33.7 33.5 45.7 44.7 0.83 0.81  164    83    73     57   Informed
2    1  39 189 183.2 34.6 34.4 47.0 46.7 0.80 0.77  185   154    99    102   Informed
3    0  65 187 186.2 34.2 34.1 47.3   NA 0.89   NA  176   188   106     83 Uninformed
4    1  29 184 182.8 35.9 35.7 44.4 45.0 0.89 0.89  120   124    75     70   Informed
5    0  38 183 186.4 34.6 35.2 44.2 42.8   NA   NA  115   125    70     72 Uninformed
6    0  45 182 180.0 33.8 33.5   NA 45.6 0.85 0.88  145   141    96     84 Uninformed
