1.5 Save Your Work In R Objects
Have you ever had an experience where you have forgotten to save your work? It’s a terrible feeling. Saving your work is also important in R. In R, we don’t just do calculations and look at the results on the R console. We usually save the results of the calculations somewhere we can find them later.
Pretty much anything, including the results of any R function, can be saved in an R object. This is accomplished by using an assignment operator, which looks kind of like an arrow (<-). You can make up any name you want for an R object. Most combinations of upper case letters, lower case letters, numbers, or even a period or underscore can be used in the names of R objects, so long as you start the name with a letter.
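As a small sketch of these naming rules, here are three made-up object names (they are only illustrations, not names used elsewhere in this course). Each one starts with a letter and mixes in other allowed characters.

```r
# Hypothetical object names, chosen only to illustrate the naming rules
total_score <- 10   # lower case letters and an underscore
TotalScore2 <- 20   # upper and lower case letters plus a number
total.score <- 30   # a period is allowed too

# Each name refers to its own separate object
total_score
TotalScore2
total.score

# A name such as 2nd_score would not work, because it starts with a number
```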
Here’s a simple example to show how it’s done. Let’s make up a name for an R object; we will call it my_favorite_number. Then let’s think of what our favorite number is (say, 20), and save it in the R object. Go ahead and run the code below to see how this works.
# This code will assign the number 20 to the R object my_favorite_number
my_favorite_number <- 20
# you can revise the code to use your actual favorite number (if it's not 20).
Notice that after you run the code my_favorite_number <- 20, nothing is printed. That’s because you saved the number 20 in my_favorite_number, but you didn’t tell R to print it out. Go back and add this line of code to the window above, then run it again:
my_favorite_number
Now it not only saves your favorite number, but prints it out. Notice that you don’t need to use the print() function to print the contents of an R object; you can just type the name of the object.
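To see the two ways of printing side by side, here is a short sketch that reuses my_favorite_number. Both lines should show the same thing in the R console.

```r
my_favorite_number <- 20

# Typing the name of the object prints the stored value
my_favorite_number

# Calling print() on the object shows the same value
print(my_favorite_number)
```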
Now remember, R is case sensitive. Try assigning 5 to num and 10 to NUM.
# Assign 5 to num and 10 to NUM
num <-
NUM <-
# Write the name of the object that contains 10 and then press the <Run> button
# Doing so prints out the contents of that object
One possible solution:
# Assign 5 to num and 10 to NUM
num <- 5
NUM <- 10
# Write the name of the object that contains 10 to print its contents
NUM
NOTE: When you save an R object in one of the code windows it will only be saved until you leave the page. If you re-load the page later it won’t be there.
Vectors
We’ve used R objects so far to store a single number. But in statistics we are dealing with variation, which by definition means more than one—and sometimes many—numbers. An R object can also store a whole set of numbers, called a vector. You can think of a vector as a list of numbers (or values).
The R function c() can be used to combine a list of individual values into a vector. You could think of the “c” as standing for “combine.” So in the following code we have created two vectors (we just named them my_vector and my_vector_2) and put a list of values into each vector.
# Here is the code to create two vectors my_vector and my_vector_2. We just made up those names.
# Run the code and see what happens
my_vector <- c(1,2,3,4,5)
my_vector_2 <- c(10,10,10,10,10)
# Now write some code to print out these two vectors in the R console. Run the code and see what happens.
One possible solution:
my_vector <- c(1,2,3,4,5)
my_vector_2 <- c(10,10,10,10,10)
# Typing each name prints the vector; print(my_vector) would also work
my_vector
my_vector_2
If you ask R to perform an operation on a vector, it will assume that you want to work with the whole vector, not just one of the numbers.
So if you want to multiply each number in my_vector by 100, then you can just write my_vector * 100. Try it in the code window below.
my_vector <- c(1, 2, 3, 4, 5)
# write code to multiply each number in my_vector by 100
One possible solution:
my_vector <- c(1, 2, 3, 4, 5)
my_vector * 100
Notice that when you do a calculation with a vector, you’ll get a vector of numbers as the answer, not just a single number.
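Multiplication is not the only operation that works this way. As a brief sketch using the same two vectors from earlier, other arithmetic on a vector also gives back a whole vector.

```r
my_vector <- c(1, 2, 3, 4, 5)
my_vector_2 <- c(10, 10, 10, 10, 10)

# Adding a single number adds it to every element
my_vector + 1            # 2 3 4 5 6

# Adding two vectors of the same length works position by position
my_vector + my_vector_2  # 11 12 13 14 15
```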
After you multiply my_vector by 100, what will happen if you print out my_vector? Will you get the original vector (1,2,3,4,5), or one that has the hundreds (100,200,300,400,500)? Try running this code to see what happens.
# Run the code below to see what happens
my_vector <- c(1,2,3,4,5)
my_vector * 100
# This will print out my_vector
my_vector
Remember, R will do the calculations, but if you want something saved, you have to assign it somewhere. Try writing some code to compute my_vector * 100 and then assign the result back into my_vector. If you do this, it will replace the old contents of my_vector with the new contents (i.e., the product of my_vector and 100).
# This creates `my_vector` and stores 1, 2, 3, 4, 5 in it
my_vector <- c(1,2,3,4,5)
# Now write code to save `my_vector * 100` back into `my_vector`
my_vector <-
One possible solution:
my_vector <- c(1,2,3,4,5)
# Save `my_vector * 100` back into `my_vector`
my_vector <- my_vector * 100
There may be times when you just want to know one of the values in a vector, not all of the values. We can index a position in the vector by using brackets with a number in them, like this: [1]. So if we wanted to print out the contents of the first position in my_vector, we could write my_vector[1].
my_vector <- c(1,2,3,4,5)
my_vector <- my_vector * 100
# Write code to get the 4th value in my_vector
One possible solution:
my_vector[4]
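Indexing works the same way on any vector. Here is a small sketch with a made-up vector called daily_steps (the name and the numbers are invented just for this illustration).

```r
# daily_steps is a hypothetical vector used only for this illustration
daily_steps <- c(4200, 8100, 6500)

daily_steps[1]   # the first value: 4200
daily_steps[3]   # the third value: 6500

# Indexing only looks at a value; the vector itself is unchanged
daily_steps
```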
Many functions will take in a vector as the input. For example, try using sum() to total up the five values saved in my_vector. Note that we have already saved some values in my_vector for you.
my_vector <- c(100,200,300,400,500)
# Use sum() to total up the values in my_vector
One possible solution:
sum(my_vector)
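sum() is just one example. As a rough sketch, here are a few other base R functions that take a whole vector as input. These functions (length(), mean(), and max()) have not been introduced in this section; they appear here only to illustrate the idea.

```r
my_vector <- c(100, 200, 300, 400, 500)

length(my_vector)  # how many values are in the vector: 5
mean(my_vector)    # the average of the values: 300
max(my_vector)     # the largest value: 500
```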
We will learn about other R objects that help us organize and visualize data as we go along in the class.
What You Can Store in an R Object
You can think of R objects like buckets that hold values. An R object can hold a single value, or it can hold a group of values (as in the case of a vector). So far, we have only put numbers into R objects. But R objects can hold other kinds of values too. We will work with three types: numbers, characters, and Boolean values.
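If you are ever unsure which type of value you have, one way to check is the base R function class(). It is not covered in this section, so treat this as an optional sketch. Note that R uses the word "logical" for Boolean values.

```r
class(20)      # "numeric"
class("20")    # "character"
class(TRUE)    # "logical" (R's name for Boolean values)
```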
Numerical Values
If R knows that you are using numbers, it can do lots of things with them. We have seen, for example, that R can perform arithmetic operations on numbers: addition, subtraction, multiplication, and division.
# Here are two ways of creating a numeric vector with the numbers 1 to 10
my_num_1 <- c(1,2,3,4,5,6,7,8,9,10)
my_num_2 <- 1:10
# Write code to print out both of these numeric vectors
One possible solution:
my_num_1 <- c(1,2,3,4,5,6,7,8,9,10)
my_num_2 <- 1:10
my_num_1
my_num_2
Note that in R when we use a colon like this, 1:10, it means 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. That’s pretty convenient. Imagine if you needed a vector with the numbers from 1 to 10,000! The colon would be a big time saver.
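Here is a short sketch of that idea. It checks that the two ways of building the vector give the same numbers, then uses the colon to build a much longer one. The name big_vector is made up for this example, and all() and length() are base R functions that this section has not introduced.

```r
my_num_1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
my_num_2 <- 1:10

# Every position holds the same value in both vectors
all(my_num_1 == my_num_2)   # TRUE

# The colon scales up without any extra typing
big_vector <- 1:10000       # big_vector is a hypothetical name
length(big_vector)          # 10000
```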
Character Values
Character values are made up of text, such as words or sentences. (Numbers can also be treated as characters, depending on the context. For example, when 20 is in quotation marks, like this: "20", it will be treated as a character value, even though it includes a number.) Character values go in between quotation marks, " ". (R doesn’t usually care whether you use single quotes, 'like this', or double quotes, "like this".) We’ll mostly use double quotes for consistency.
If we forget the quotes, R will think that a word is a name of an object instead of a character value.
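The sketch below shows the difference. The object name greeting is made up for this example, and tryCatch() is used only so the code keeps running after the error; it is not something you need to learn right now. The example assumes that no object called hola has been created.

```r
greeting <- "hola"   # with quotes: "hola" is a character value
greeting

# Without quotes, R looks for an object named hola and cannot find one.
# tryCatch() catches the error and saves its message instead of stopping.
result <- tryCatch(hola, error = function(e) conditionMessage(e))
result
```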
many_hellos <- c("hi", "hello", "hola", "bonjour", "ni hao", "merhaba")
# Write code to print out the 5th way of saying hello in this vector
One possible solution:
many_hellos <- c("hi", "hello", "hola", "bonjour", "ni hao", "merhaba")
many_hellos[5]
Boolean Values
Boolean values are either TRUE or FALSE. Maybe we have a question such as: Is the first element in the vector many_hellos "hi"? We can ask R to find out and return the answer TRUE or FALSE. We can do that by using the comparison operator == (which asks whether two values are equal).
many_hellos <- c("hi", "hello", "hola", "bonjour", "ni hao", "merhaba")
# See what happens when you submit this code:
many_hellos[1]== "hi"
If we want, we can store that answer in an R object.
many_hellos <- c("hi", "hello", "hola", "bonjour", "ni hao", "merhaba")
# Write some code that will answer this question: Is the first element in the vector many_hellos "hi"?
# And store it in an R object called first_is_hi
One possible solution:
first_is_hi <- many_hellos[1] == "hi"
Most of the questions we ask R to answer with a TRUE or FALSE involve comparison operators such as >, <, >=, <=, and ==. The double == sign checks if two values are equal. There is even a comparison operator to check whether values are not equal: !=. For example, 5 != 3 is a TRUE statement.
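To make these operators concrete, here is a quick sketch that tries several of them on plain numbers. Each line gives back a single Boolean value.

```r
5 != 3    # TRUE: 5 is not equal to 3
5 == 3    # FALSE: 5 is not equal to 3, so "equal" is false
5 >= 5    # TRUE: 5 is greater than or equal to 5
5 > 5     # FALSE: 5 is not greater than itself
3 <= 5    # TRUE: 3 is less than or equal to 5
```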
# Read this code and predict what value will come out of the R console. Then run the code and see if you were right.
A <- 1
B <- 5
compare <- A > B
compare
Note that compare in the code above is not a function. You know this because there is no () after it. compare, in this case, is just a name we made up for an R object to store the Boolean result of the question, “Is A greater than B?”. The answer, as we can see, is FALSE.
We can also create Boolean vectors by subjecting a whole vector to a comparison. Let’s create a numeric vector with the numbers from 1 to 10 (we will call this vector my_numbers). Then let’s create a Boolean vector called my_booleans to store the results of checking whether each number in the my_numbers vector is greater than or equal to 5.
# Here's the code to create the my_numbers vector:
my_numbers <- 1:10
# And here's the code to check whether each element of the vector my_numbers is greater than or equal to 5, storing the result in a new vector called my_booleans.
my_booleans <- my_numbers >= 5
# This code prints out both vectors
my_numbers
my_booleans
# What do you expect from this code? Run the code to see what happens. Then, fix the bug and run again.
A <- 5
B <- 5
compare <- A = B
compare
The corrected code uses == for the comparison:
A <- 5
B <- 5
compare <- A == B
compare
In R, we will avoid using the single equal sign, =. If you want to know whether A is equal to B, use the double equal sign, ==. The single equal sign is sometimes used instead of the assignment operator, <-, which can get confusing, both to you and to R. Use the arrow <- to assign values to an R object, and == to ask whether two values are equal.
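The following sketch puts the three symbols next to each other so you can see why mixing them up is easy. It is only an illustration; in this course we will stick with <- and ==.

```r
A <- 5    # assignment: store 5 in A
A == 5    # comparison: TRUE
A == 6    # comparison: FALSE, and A still holds 5

B = 7     # a single = also assigns, which is why it is easy to misread
B
```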
R for Humans
Programming languages are primarily for communicating with computers. But there are a lot of things we do when we write R to communicate with humans. For example, R doesn’t care if we write spaces between things. We will write A <- 5 with spaces, but we don’t do it for R. R thinks that A<-5 is the same as A <- 5. We add the spaces to make it easier for a human to read. The same goes for comments (which begin with #); R will ignore everything after the # on that line, but a comment may be useful for a human reading the code.
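Here is a small sketch of both points. The object names A, B, and C are just placeholders for this example.

```r
A<-5      # no spaces around the arrow
B <- 5    # spaces around the arrow
A == B    # TRUE: R stored the same value either way

# This whole line is a comment, so R skips it
C <- 5    # a comment can also follow code on the same line
C
```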
Also, we are mindful that R is a computer language and doesn’t actually “think” or “care” or “ignore” anything, but we will commonly anthropomorphize R. Many readers of this course are new to programming and it might be helpful to think about programming as communicating with R.