High School / Advanced Statistics and Data Science I (ABC)
2.3 A Data Frame Example: MindsetMatters
The data we looked at on the previous page were selected from a data frame called MindsetMatters. The full data frame is from a study that investigated the health of 75 female housekeepers from different hotels. You can read more about how these data were collected and organized here: [MindsetMatters R documentation].
A data frame is a kind of object in R, and as with any object, you can just type the name of it to see the whole thing.
Type the name of the data frame, MindsetMatters, and then press Run.
require(coursekata)

# Load the data and add a labeled Condition factor based on Cond
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# Type MindsetMatters to see what is in the data frame
MindsetMatters
You may need to scroll up to see the whole output. Once you do, you might think to yourself, “Wow, that’s a lot to take in!” This is usually the case when working with real data—there are a whole lot of things in a data set, including a lot of variables and values. And usually we don’t just sample one case (e.g., one housekeeper)—we have a bunch of housekeepers, each with their own values for a bunch of variables. So things get pretty complicated, pretty fast.
It’s always useful to take a quick peek at your data frame, but looking at the whole thing can be overwhelming. A helpful command is head(), which shows you just the first few rows of a data frame.
Press the <Run> button to see what happens when you run the command head(MindsetMatters).
require(coursekata)

# Load the data and add a labeled Condition factor based on Cond
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# Run this code to get the first 6 rows of MindsetMatters
head(MindsetMatters)
  Cond Age  Wt   Wt2  BMI BMI2  Fat Fat2  WHR WHR2 Syst Syst2 Diast Diast2  Condition
1    0  43 137 137.4 25.1 25.1 31.9 32.8 0.79 0.79  124   118    70     73 Uninformed
2    0  42 150 147.0 29.3 28.7 35.5   NA 0.81 0.81  119   112    80     68 Uninformed
3    0  41 124 124.8 26.9 27.0 35.1   NA 0.84 0.84  108   107    59     65 Uninformed
4    0  40 173 171.4 32.8 32.4 41.9 42.4 1.00 1.00  116   126    71     79 Uninformed
5    0  33 163 160.2 37.9 37.2 41.7   NA 0.86 0.84  113   114    73     78 Uninformed
6    0  24  90  91.8 16.5 16.8   NA   NA 0.73 0.73   NA    NA    78     76 Uninformed
By default, the head() function prints out just the first six rows of the data frame, arranged in rows and columns.
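If six rows is more (or fewer) than you want, head() accepts an optional n argument. The sketch below is only an illustration: mini is a hypothetical stand-in data frame built from a few columns of the six rows shown above, so the code runs without loading the full data set.

```r
# Hypothetical stand-in: a few columns from the six rows printed above
mini <- data.frame(
  Cond = c(0L, 0L, 0L, 0L, 0L, 0L),
  Age  = c(43L, 42L, 41L, 40L, 33L, 24L),
  Wt   = c(137L, 150L, 124L, 173L, 163L, 90L),
  Wt2  = c(137.4, 147.0, 124.8, 171.4, 160.2, 91.8)
)

# n sets how many rows head() returns (the default is 6)
first_three <- head(mini, n = 3)
print(first_three)

# tail() is the counterpart that returns the last rows instead
last_two <- tail(mini, n = 2)
print(last_two)
```

With the real data you would write head(MindsetMatters, n = 3) in the same way.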
Sometimes, it’s useful just to get an overview of what’s in the data frame. The function str() shows us the overall structure of the data frame, including the number of observations, the number of variables, the names of the variables, and so on. (We often use str() when first exploring a new data frame, just to see what’s in it.)
Run str() on MindsetMatters and look at the results.
require(coursekata)

# Load the data and add a labeled Condition factor based on Cond
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# Run this code to see the structure of MindsetMatters
str(MindsetMatters)
'data.frame': 75 obs. of 15 variables:
$ Cond : int 0 0 0 0 0 0 0 0 0 0 ...
$ Age : int 43 42 41 40 33 24 46 21 29 19 ...
$ Wt : int 137 150 124 173 163 90 150 156 141 123 ...
$ Wt2 : num 137 147 125 171 160 ...
$ BMI : num 25.1 29.3 26.9 32.8 37.9 16.5 27.5 25.9 27.5 19.6 ...
$ BMI2 : num 25.1 28.7 27 32.4 37.2 16.8 27.4 25.7 27.4 19.7 ...
$ Fat : num 31.9 35.5 35.1 41.9 41.7 NA 36.1 36.4 NA 26.6 ...
$ Fat2 : num 32.8 NA NA 42.4 NA NA 37.3 NA NA NA ...
$ WHR : num 0.79 0.81 0.84 1 0.86 0.73 0.9 0.78 0.87 0.69 ...
$ WHR2 : num 0.79 0.81 0.84 1 0.84 0.73 0.9 0.78 0.85 0.69 ...
$ Syst : int 124 119 108 116 113 NA 119 116 110 113 ...
$ Syst2 : int 118 112 107 126 114 NA 115 135 115 117 ...
$ Diast : int 70 80 59 71 73 78 75 67 73 75 ...
$ Diast2 : int 73 68 65 79 78 76 77 65 74 72 ...
$ Condition: Factor w/ 2 levels "Informed","Uninformed": 2 2 2 2 2 2 2 2 2 2 ...
Note that there is a $ in front of each variable name. In R, $ is often used to indicate that what follows is a variable name. If you want to specify the Age variable in the MindsetMatters data frame, for example, you would write MindsetMatters$Age. (R has its own way of categorizing variables, such as int, num, and Factor. You will learn more about these later.)
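Each piece of information that str() reports can also be requested on its own. The sketch below is only an illustration: mini is a hypothetical stand-in data frame built from a few columns of the first six rows, with Condition created the same way as in the exercise setup code.

```r
# Hypothetical stand-in: a few columns from the first six rows
mini <- data.frame(
  Cond = c(0L, 0L, 0L, 0L, 0L, 0L),
  Age  = c(43L, 42L, 41L, 40L, 33L, 24L),
  Wt2  = c(137.4, 147.0, 124.8, 171.4, 160.2, 91.8)
)
mini$Condition <- factor(mini$Cond, levels = c(1, 0),
                         labels = c("Informed", "Uninformed"))

n_obs     <- nrow(mini)   # number of observations (rows)
n_vars    <- ncol(mini)   # number of variables (columns)
var_names <- names(mini)  # names of the variables

# class() gives the full name of each type that str() abbreviates
age_class  <- class(mini$Age)        # str() shows this as int
wt2_class  <- class(mini$Wt2)        # str() shows this as num
cond_class <- class(mini$Condition)  # str() shows this as Factor

print(c(n_obs, n_vars))
print(var_names)
print(c(age_class, wt2_class, cond_class))
```

Here the $ does the same job as in the str() output: it picks out one variable from the data frame by name.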
Try using the $ to print out just the variable Age from MindsetMatters.
require(coursekata)

# Load the data and add a labeled Condition factor based on Cond
MindsetMatters <- Lock5withR::MindsetMatters %>%
  mutate(Condition = factor(Cond, levels = c(1, 0), labels = c("Informed", "Uninformed")))

# Use the $ sign to print out the contents of the Age variable in the MindsetMatters data frame
MindsetMatters$Age
That’s a lot of numbers! If you want a more organized list, you can sometimes get that by using the print() function, like this:
print(MindsetMatters$Age)
You can try adding the print() function in the window above. When you do, you get something like this:
[1] 43 42 41 40 33 24 46 21 29 19 41 33 44 48 38 42 38 46 45 35 30 38 41 54 65
[26] 58 29 45 57 61 38 53 45 62 48 50 40 32 54 24 24 52 34 28 31 29 31 34 26 37
[51] 28 44 26 29 47 27 42 39 27 NA 27 48 39 55 26 29 27 33 29 33 31 24 22 23 38
When R is asked to print out a single variable (such as Age), R prints out each person’s value on the variable all in a row. When it gets to the end of one row, it begins again on the next row. In contrast, when R is asked to print out multiple variables, it uses the rows and columns format, where rows are cases and columns are variables.
If you count the ages printed on the first row, you will find 25 of them. The [26] indicates that the second row starts with the 26th observation.
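The bracketed numbers are positions, and you can use the same positions to pull out individual values. In the sketch below, ages is a hypothetical name for a vector typed in from the printed output above; with the real data you would index MindsetMatters$Age directly.

```r
# Ages copied from the printed output above (three printed rows of 25 values)
ages <- as.integer(c(
  43, 42, 41, 40, 33, 24, 46, 21, 29, 19, 41, 33, 44, 48, 38, 42, 38, 46, 45, 35, 30, 38, 41, 54, 65,
  58, 29, 45, 57, 61, 38, 53, 45, 62, 48, 50, 40, 32, 54, 24, 24, 52, 34, 28, 31, 29, 31, 34, 26, 37,
  28, 44, 26, 29, 47, 27, 42, 39, 27, NA, 27, 48, 39, 55, 26, 29, 27, 33, 29, 33, 31, 24, 22, 23, 38
))

n_ages     <- length(ages)        # total number of observations
row2_start <- ages[26]            # the value printed right after [26]
row3_start <- ages[51]            # the value printed right after [51]
na_at      <- which(is.na(ages))  # position of the missing age

print(n_ages)      # 75
print(row2_start)  # 58
print(row3_start)  # 28
print(na_at)       # 60
```

How many values fit on one printed row depends on the width of your console, so the bracketed numbers you see may differ from the ones shown here.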