# Statistics and Data Science: A Modeling Approach

## 7.3 Generating Predictions From the Model

### Predicting Future Observations

Now that you have fit the **Sex** model, you can use your estimates to make predictions about future observations. Doing this requires you to use your model as a function. Think of a function like a machine: you put something in, you get something out. In this case, you will put in a value (e.g., “female”) for your explanatory variable (**Sex**), and get out a predicted thumb length.

Let's think about how to use **TinySex.model** as a function. Recall that our model, once fit, looked like this:

\[Y_{i}=59+6X_{i}+e_{i}\]

To turn this into a function, we remove the error term. If our goal is to model the variation, we want the error term there. But if our goal is to predict, we are going to ignore error and just do our best! We also change the \(Y_{i}\) to \(\hat{Y}_{i}\), which indicates a predicted score for person *i*. Our prediction function, then, looks like this:

\[\hat{Y}_{i}=59+6X_{i}\]

We leave out the error term because every person will have a different error term. If we knew their error, we could predict their score exactly. But since we don’t—because remember, we are predicting a new observation—all we can do is predict their score based on their sex.

This prediction function is straightforward to use. If we want to predict what the next observed thumb length will be, we can see that if the next student sampled is female, their predicted thumb length is 59. If they are male, the prediction is (59 + 6), or 65.
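The prediction function above can also be written as a small R function. This is only an illustrative sketch: the name `predict_thumb` is hypothetical (it is not part of any package), the estimates 59 and 6 are typed in by hand, and it assumes \(X_{i}\) is coded 1 for male and 0 for female, which matches the predictions just described.

```r
# Hypothetical helper: the fitted model used as a prediction function.
# Assumes X is coded 1 for "male" and 0 for "female".
predict_thumb <- function(sex) {
  x <- as.numeric(sex == "male")  # dummy-code the explanatory variable
  59 + 6 * x                      # Y-hat = 59 + 6 * X
}

predict_thumb("female")  # 59
predict_thumb("male")    # 65
```

Notice that no error term appears anywhere in the function; it returns the same prediction for every person of a given sex.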

### Using R to Predict Future Observations

When the numbers are easy to add, it's not hard to do this in your head. For more complex models, though, we will want R to do the arithmetic for us.

We have a couple of new R functions that you can use to make it easier to generate a prediction: `b0()` and `b1()`. Run the three lines of R code marked below and see if you can figure out what these new functions do.

```
require(tidyverse)
require(mosaic)
#require(Lock5Data)
require(supernova)
TinyFingers <- data.frame(
Sex = rep(c("female", "male"), each = 3),
Thumb = c(56, 60, 61, 63, 64, 68)
)
```

```
# This creates the TinySex.model
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
# Run this code
TinySex.model
b0(TinySex.model)
b1(TinySex.model)
```

The `b0()` function takes a model as its input and returns the parameter estimate for the first parameter, which in this case is the mean Thumb of females. The function `b1()` returns the parameter estimate for the second parameter, the increment from the mean of females to the mean of males.
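For comparison, the same two estimates can be read off the fitted model with base R's `coef()` function. The sketch below uses base R only and rebuilds **TinyFingers** so it can run on its own. The names `estimates`, `intercept`, and `increment` are just illustrative choices.

```r
# Rebuild the tiny data set and fit the model (base R only)
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# coef() returns a named vector of the parameter estimates
estimates <- coef(TinySex.model)
intercept <- unname(estimates[1])  # first parameter: mean Thumb of females
increment <- unname(estimates[2])  # second parameter: male mean minus female mean

intercept
increment
```

Here the first estimate is the female mean (59) and the second is the increment to the male mean (6), the same two numbers that appear in the fitted model equation.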

In the window below, see if you can write a single line of R code that will use both of these new functions (`b0()` and `b1()`) to return the predicted value for a new male’s thumb length.

```
# This creates the TinySex.model
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)
# Write a line of R code that uses both b0() and b1() functions
# to return the predicted Thumb length of a male
b0(TinySex.model) + b1(TinySex.model)
```

`[1] 65`
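Another way to get the same prediction, sketched here with base R only, is to give `predict()` a small data frame describing the new observation through its `newdata` argument. The data frame name `new_student` and the variable `male_prediction` are made up for illustration.

```r
# Rebuild the tiny data set and fit the model (base R only)
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# One future observation: a student whose Sex is "male"
new_student <- data.frame(Sex = "male")

# predict() applies the fitted model (without the error term) to new_student
male_prediction <- predict(TinySex.model, newdata = new_student)
male_prediction
```

This approach becomes more convenient than adding estimates by hand once a model has more than two parameters.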

### Generating “Predicted” Values for the Sample Data

As we did in Chapter 5, we also will want to generate model predictions for our sample data. It seems odd to predict values when we already know the actual values. But it’s actually very useful to do so, because then we can calculate residuals from the model predictions.

To get predicted values from the **TinySex.model**, we use the `predict()` function:

`predict(TinySex.model)`

```
1 2 3 4 5 6
59 59 59 65 65 65
```

Let’s say you want to save these predicted values for each person as a variable called **Sex.predicted** (in the **TinyFingers** data frame). See if you can complete the R code to do this.

```
# save the model's predictions as a new variable in TinyFingers
TinyFingers$Sex.predicted <- predict(TinySex.model)
# this prints the TinyFingers data frame
TinyFingers
```

```
     Sex Thumb Sex.predicted
1 female    56            59
2 female    60            59
3 female    61            59
4   male    63            65
5   male    64            65
6   male    68            65
```

Notice that our predictions are a single number for each person: 59 for each female and 65 for each male. Each person gets a single predicted thumb length; we never predict both of these values for a single person. But different people will get different predicted outcomes based on their sex.
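To see why predictions for the sample are useful, here is a small sketch of the residual calculation mentioned at the start of this section: each residual is the observed thumb length minus the model's prediction. It rebuilds the tiny data set so it runs on its own, and the column name `Sex.resid` is only an illustrative choice.

```r
# Rebuild the tiny data set and fit the model (base R only)
TinyFingers <- data.frame(
  Sex = rep(c("female", "male"), each = 3),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Save the prediction for each person in the sample
TinyFingers$Sex.predicted <- predict(TinySex.model)

# residual = observed value minus predicted value
TinyFingers$Sex.resid <- TinyFingers$Thumb - TinyFingers$Sex.predicted
TinyFingers
```

The first female, for example, has an observed thumb length of 56 and a prediction of 59, so her residual is 56 - 59 = -3.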

Try the function `predict()` on the full data set. Recall that you fit the model to the full data set, **Fingers**, and saved the model as **Sex.model**. Now see if you can generate predictions from the model and save the predictions as a variable in the **Fingers** data frame.

```
# here is the model we fit before
Sex.model <- lm(Thumb ~ Sex, data = Fingers)
# generate a prediction from Sex.model for every case in Fingers
Fingers$Sex.predicted <- predict(Sex.model)
# this will print out 10 lines of Fingers
head(select(Fingers, Sex, Thumb, Sex.predicted), 10)
```

```
      Sex Thumb Sex.predicted
1    male 66.00      64.70267
2  female 64.00      58.25585
3  female 56.00      58.25585
4    male 58.42      64.70267
5  female 74.00      58.25585
6  female 60.00      58.25585
7    male 70.00      64.70267
8  female 55.00      58.25585
9  female 60.00      58.25585
10 female 52.00      58.25585
```

**We’ve learned how to specify and fit models. We then took those models and used them (as functions) to make predictions for future observations, and also to generate predictions for each person in our sample data. We turn next to examine the residuals from our model—the variation left over after we subtract out our model.**