Course Outline
Getting Started (Don't Skip This Part)
Statistics and Data Science: A Modeling Approach
PART I: EXPLORING VARIATION
Chapter 1 - Welcome to Statistics: A Modeling Approach
Chapter 2 - Understanding Data
Chapter 3 - Examining Distributions
Chapter 4 - Explaining Variation
PART II: MODELING VARIATION
Chapter 5 - A Simple Model
Chapter 6 - Quantifying Error
Chapter 7 - Adding an Explanatory Variable to the Model
Chapter 8 - Digging Deeper into Group Models
Chapter 9 - Models with a Quantitative Explanatory Variable
PART III: EVALUATING MODELS
Chapter 10 - The Logic of Inference
Chapter 11 - Model Comparison with F
Chapter 12 - Parameter Estimation and Confidence Intervals
Chapter 13 - What You Have Learned
Finishing Up (Don't Skip This Part!)
Resources
High School / Advanced Statistics and Data Science I (ABC)
1.3 Doing Statistics With R
Speaking of doing, how are you going to do the data analysis part of this course? The answer is: you are going to use R (yes, it’s just called R, the letter). R is a free, open-source coding language commonly used by statisticians. Open source means that R was developed, and is maintained, not by a company but by a community of users. So, in principle, anyone can contribute to R and help make it better.
Why R?
Technology is a fundamental part of doing statistics these days. In fact, most of what we do in terms of data analysis would not be possible without computers, and most statistics courses include learning to use software for data analysis. There are many different software packages available. We chose to use R for two reasons: first, it’s free; second, it’s a coding language.
You may already know a bit about computer coding (or programming). But if you don’t, it’s worth demystifying it a little. Computers manipulate data rapidly and accurately—something we need to do in statistics. A coding language is the language we use for telling a computer what to do. It’s really that simple.
You may be thinking: a coding language? That sounds hard! It may, in fact, be a little harder than learning to use a statistics package with a point-and-click interface. But don’t worry: we will take you through it slowly, step by step. You might even enjoy it. And you won’t have to install anything or do anything special to your computer, so you can just focus on learning R.
We want you to learn some R because we believe writing code will help you understand statistics better than simply clicking buttons in a statistics package. It will also give you a skill at the end of this course that you didn’t have before! You can even put it on your resume (as in, “Basic knowledge of data analysis with R”).
Representing the same concept in different forms (called “re-representation”) helps make learning more robust. In this course, you will use a number of different representations: words, graphs, tables, mathematical notation, and R. Making connections between these different representations will deepen your understanding.
Try Some R Code
For example, here’s a bit of R (what we sometimes refer to as “code”). Read the code in the window below. What do you think it will do?
(NOTE: Press the <Connect> button to load all the code windows on this page. The first time you press <Connect>, it could take 1 to 2 minutes, so please be patient. The code window is ready when you see a blue dot and the word Ready to the right of the <Submit> button.)
Press the <Run> button and see what happens.
print("Hello world!")
IF THE CODE WINDOW DOESN’T WORK: Try following the code window’s instructions (in particular, don’t refresh the page if it tells you not to). You might also try waiting a few minutes and then pressing the <Reconnect> button. If that doesn’t work, go back to the First Things First page at the beginning of the book to review your technology setup. Then try refreshing your browser page.
If you still can’t get it to work, click the diamond-shaped CK icon in the lower right corner of this page to file a tech support ticket. This will also give you access to a knowledge base, including a searchable list of all R functions used in the book and the page on which they are first introduced.
After you click the <Run> button, you will see that R displays the phrase “Hello world!” in an area below the <Run> and <Submit> buttons. Note: when we tell R to print(), R interprets that to mean, “Display on the screen.” You just figured out a little bit of R.
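print() is not limited to this one phrase. Here is a small sketch you could try in a code window or in the R Sandbox. The text and the number are arbitrary examples and are not part of the exercise. Text goes inside quotation marks, but numbers do not.

```r
# print() displays whatever is inside its parentheses
print("Statistics is about variation.")  # text needs quotation marks
print(2024)                              # numbers do not
```

Running this should display [1] "Statistics is about variation." followed by [1] 2024. The [1] is R’s way of labeling the first value shown on each line of output.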
Important Things to Notice About the Code Window
There are a few things worth noting about the way the code window works.
<Run> versus <Submit> buttons. When you press the <Run> button it will run all the code in the window above the button. You can run and re-run code as many times as you want. But to get credit for doing the assignment, and to get some feedback, you need to press the <Submit> button.
Go back to the code window above and press <Submit>. This time you get a blue checkmark and some feedback that depends on whether or not you succeeded in the code exercise. Be sure to submit your final work for each code window to your instructor by pressing <Submit> (unless no <Submit> button is available).
<Reset> button. The white <Reset> button on the right side of the code window will delete the work you have done so far and return the window to its original state. It’s a good button to push if you want to start over, or just try again without looking at your previous solution.
Try Some More R Code
Let’s try another one. Read the code and see if you can guess what it will do. Then press the <Run> button.
sum(1,5,10)
This bit of code printed out the sum of 1, 5, and 10 (that is, 16). You are already learning a bit of code!
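sum() also works with however many numbers you list, separated by commas, including decimals and negative numbers. The sketch below uses arbitrary example values. The comment after each line shows the result R should display.

```r
# sum() adds up every number listed inside the parentheses
sum(1, 5, 10)      # 16, as in the exercise above
sum(2, 4, 6, 8)    # 20
sum(0.5, 1.5, -2)  # 0
```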
You can also use R like a basic calculator. Try running the code in the window below. Just press Run.
# a few basic arithmetic things
5 + 1
10 - 3
2 * 4
9 / 3
Notice that you can put more than one line of code—or set of instructions—in a single R window. When you press the <Run> button, all the commands in the window will be run, one after the other, in the order in which they appear.
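Here is a small sketch of that top-to-bottom order, mixing print() with a bit of arithmetic. The messages are arbitrary examples. The output should appear in the same order as the lines: first the text “this line runs first”, then [1] 6, then the text “this line runs last”.

```r
# R runs these lines one after the other, from top to bottom,
# so each result appears in the same order as the lines themselves
print("this line runs first")
5 + 1
print("this line runs last")
```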
Comments in the R Window
Sometimes we will write things in the R code window that we want R to ignore. These are called comments, and they start with a #. R will ignore comments and just execute the code. In this book we will use comments as a way to give you instructions for R exercises. In the code window below, try typing whatever you want after a # at the front of the line. Then press <Run>.
require(coursekata)
# type whatever you want
Notice that nothing happens, because lines that start with a # are ignored by R.
If you want to write a comment that takes more than one line, it’s a good idea to put a # at the beginning of each line.
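The sketch below shows a multi-line comment with a # on every line. It also shows a comment placed after a line of code. In R, everything from a # to the end of that line is ignored, so R still runs sum(1, 5, 10) and displays 16. The comment wording is only an example.

```r
# This is a comment that runs over several lines.
# Each line starts with its own #,
# so R skips all three of them.
sum(1, 5, 10)  # R runs the code before the # and ignores the rest of the line
```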
How to Learn the Most from the Coding Exercises
The <Run> button will run your code in the code window. The <Submit> button will both run the code and submit your answer to be graded. You’ll learn the most by trying to write code, running it, and keeping on trying until it works. After you’ve figured it out, click <Submit>.
Feel free to try out different ideas, even after you’ve gotten the code to run. You can keep running code even after you have clicked <Submit>. The more you explore, the more you will learn. And if you feel frustrated, that just goes with the territory. Learn to enjoy your frustration; it’s part of getting better!
A Code Window Sandbox is Always Available
We will always provide a code window when you need one. But sometimes you may just want to try something out.
Go to the Resources folder and click on the page that says R Sandbox. This will open a page with an empty code window. This gives you a handy place to run some R code.
R in the Real World
In this online book we will run all of our R code in the embedded code windows. These windows are great for learning R. But later, when you start doing actual data analysis projects, you will use different software tools. Two widely used tools are RStudio and Jupyter Notebooks. Both are powerful, and each has advantages and disadvantages.
RStudio is an IDE (Integrated Development Environment), an application that lets you write and run R code on your computer. Jupyter Notebooks is a web application that can be installed either on your computer or on a server in the cloud. (Your instructor may have given you access to a version of Jupyter Notebooks in the cloud as part of this class.)
It’s possible to install these applications on your computer, but an easier way to get started is to use a cloud service called DeepNote.com (which is free for students). DeepNote is kind of like Google Docs for Jupyter notebooks. If you are using Jupyter notebooks for this class, you can download them and later upload them to DeepNote to run them there. Or you can log in to DeepNote and create a new notebook from scratch.
To get started with DeepNote, check out the R in the Real World page in the Resources folder at the end of this online book.