Chapter 1 - Welcome to Statistics and Data Science II
1.1 Welcome
Welcome to Statistics and Data Science II!
What Is Statistics?
Statistics is the study of variation – the fact that things of the same type (e.g., students, families, countries) have different outcomes (e.g., popularity, income, nuclear weapons). We are constantly trying to figure out why outcomes vary, and to use this understanding to make better predictions about future outcomes. We will be working with the tools and concepts that have been developed, over centuries, to help us understand variation.
Remembering “Bits” Versus Interconnected Understanding
Most people have learned something about statistics before they take a course in it. Many of you have even taken whole courses in statistics before this one. If you have, you have probably heard about some or all of these things: mean, variance, standard deviation, F, ANOVA, regression, normal distribution, and so on.
With such a long list, it’s no surprise that many students see remembering all these little “bits” and pieces as the most challenging part of learning statistics. It’s almost as if these “bits” are floating around in a student’s mind (as shown on the left in the picture below).
But actually, remembering is not the most challenging part. Understanding is the most challenging part. Even if you remember what all these things are, if you don’t understand how it all fits together you will probably forget it all as soon as you are done with the final exam. We don’t want that to happen!
We will discuss lots of these things you have heard of, or studied, before. But instead of emphasizing their particularity—how each is different from the other—we will work on understanding their coherence—how they are all connected together into a system of thinking.
How will we do this? We will emphasize the idea of statistical modeling (which we summarize as DATA = MODEL + ERROR). We hope that, by the end of the class, your mind will be organized more like that of the person on the right, for whom modeling is the main thing learned and everything else is connected to it. If you focus on the big idea of modeling, it will help organize your knowledge and make it more flexible and powerful.
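Here is one small sketch of what DATA = MODEL + ERROR can look like in practice. Everything in it is an assumption made for illustration: the numbers are made up, the mean is just one very simple choice of model, and the code is written in Python rather than taken from the course materials.

```python
# Hypothetical data: five made-up measurements (not from the course).
data = [56.0, 60.0, 61.0, 64.0, 59.0]

# A very simple model: predict the same value (the mean) for every observation.
model = sum(data) / len(data)

# ERROR is what the model leaves unexplained for each observation.
error = [x - model for x in data]

# Adding MODEL and ERROR back together recovers each original data point.
rebuilt = [model + e for e in error]

print("model:", model)
print("error:", error)
print("rebuilt:", rebuilt)
```

Each data point splits into a part the model accounts for and a leftover part. A better model would leave smaller leftovers, but the same equation would still hold.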
The Statistical Model
We don’t assume that you know what a statistical model is yet. Even if you have an idea of what it is, looking at different statistical models through this course will make this idea more useful. We expect your understanding of this powerful idea to increase gradually throughout the course. Statistical models help us in three main ways.
First, they help us to understand patterns in data and where they come from, or, what we will call the Data Generating Process (or DGP for short). The DGP is the process that causes variation in data, which we will discuss a lot more later in the course.
Second, they help us to predict what will happen in the future. Of course we can’t really predict the future very well; we aren’t psychic, and you probably aren’t either. But, using statistical models, we can make better predictions than we could without them, even if those predictions aren’t very good. Sometimes this is very useful. When Netflix recommends a movie you might like, it uses a statistical model. The recommendation may be wrong, but it does better than random guessing!
Finally, statistical models can help us improve the functioning of complex systems. In situations where everything seems to vary, and where the variation seems overwhelming, you can still use statistical models to help you identify changes you can make in one variable that will improve some outcome you are interested in. Some hospitals, for example, use statistical models to help reduce the time patients spend waiting to see a doctor.
Learning to Build Understanding
Learning in a way that builds understanding is hard. Even professional statisticians find it hard. They are always learning new things, and deepening their understanding. In this course, we want to get a little further along the pathway to understanding. At the end of the course you will understand more than you do now, and hopefully that will be useful to you.
Even though learning for understanding is hard, anyone can do it! Seriously, we have not found anyone who can’t understand the concepts in this course. If it feels hard or confusing, that just means you are making progress in building understanding. Professional statisticians feel confused whenever they are trying to further their understanding of modeling. When you feel confused, it means that your mind is trying to build a connection! That’s a good thing.
Learning by Doing
With all this talk about understanding, you may think this course is going to be just a big discussion of ideas. It’s not. At the same time you are learning about the core concepts of statistics, you will also be learning how to analyze data.
The reason for understanding statistical concepts in the first place is to guide you as you learn to make sense out of variation in data. As you work through the course, therefore, you will be constantly putting your knowledge to use: organizing, analyzing, and interpreting data.
How This Course Supports Understanding
In this course you will be asked to do things on every page: analyze data and answer questions. You may feel like you are constantly being “tested.”
While in a sense this is true, it’s important for you to know that doing things is often the best way to learn things. So, answering a question is not just so your teacher knows how you are doing. It is also an important learning opportunity—a part of the learning design.
The main reason for all the questions you will answer as you work through the course is just to help you learn more. Don’t worry if you get questions wrong on the first try. Use the questions to help you figure things out. Working hard and thinking through these questions will result in learning, and that learning will lead to higher grades.
Let’s get started!