High School / Statistics and Data Science II (XCD)
1.7 Selecting & Filtering Data in R
Sometimes you want to focus on a subset of the variables in a data frame. For example, you might want to look at just the variables PriceK and PriceR in the Ames data frame. PriceK represents the sale price of the home in thousands of dollars. PriceR represents the sale price in dollars.
We can use the select() function to look at just a subset of variables. When using select(), we first tell R which data frame to use, then which variables to select from that data frame.
select(Ames, PriceK, PriceR)
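The argument order described above can be tried on a small stand-in data frame. This is a minimal sketch, not the course's own code: ames_small is a hypothetical name, its six rows are copied from the head() output shown later in this section, and it assumes the dplyr package is installed and is the source of select().

```r
library(dplyr)  # assumption: select() comes from dplyr

# Hypothetical stand-in for Ames, built from six rows printed in this section
ames_small <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100),
  PriceR = c(260000, 210000, 155000, 125000, 110000, 100000),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown",
                   "OldTown", "CollegeCreek", "OldTown")
)

# Data frame first, then the variables to keep
price_and_place <- select(ames_small, PriceK, Neighborhood)

names(price_and_place)  # only the selected variables remain
nrow(price_and_place)   # every row is kept; select() drops columns, not rows
```

The PriceR column is gone from the result, but the number of rows is unchanged.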
Modify the select() code below to take a look at just the following variables in Ames: PriceK, PriceR, and Neighborhood.
require(coursekata)

# Modify this code
select(Ames, ...)

# Solution
select(Ames, PriceK, PriceR, Neighborhood)
Running the select() function will print out the values of the selected variables for every case. If you want to look at just the first six rows, you can combine the head() and select() functions like this: head(select(Ames, PriceK, PriceR, Neighborhood)).
  PriceK PriceR Neighborhood
1    260 260000 CollegeCreek
2    210 210000 CollegeCreek
3    155 155000      OldTown
4    125 125000      OldTown
5    110 110000 CollegeCreek
6    100 100000      OldTown
Whereas select() gives you a subset of variables (or columns of the data frame), the filter() function will give you a subset of observations (or rows) of the data frame based on some criteria. For example, here is some code that will return only the observations where the sale price is greater than $300,000:
filter(Ames, PriceK > 300)
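To see how the condition picks out rows, here is a minimal sketch on a hypothetical stand-in data frame (ames_small, built from the six rows printed by head() earlier in this section). It assumes the dplyr package is installed and supplies filter(); the 200 threshold is chosen only for illustration, since none of these six homes sold for more than $300K.

```r
library(dplyr)  # assumption: filter() comes from dplyr (it masks stats::filter)

# Hypothetical stand-in for Ames, built from six rows printed in this section
ames_small <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100),
  PriceR = c(260000, 210000, 155000, 125000, 110000, 100000),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown",
                   "OldTown", "CollegeCreek", "OldTown")
)

# Keep only the rows where the condition PriceK > 200 is TRUE
over_200 <- filter(ames_small, PriceK > 200)

over_200        # the homes priced at 260 and 210 (thousands of dollars)
ncol(over_200)  # all three columns are kept; filter() drops rows, not columns
```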
Edit the code below to filter for homes that cost more than 300K.
require(coursekata)

# Modify this code
filter()

# Solution
filter(Ames, PriceK > 300)
  YearBuilt YearSold Neighborhood HomeSizeR HomeSizeK LotSizeR LotSizeK Floors
1      2007     2007 CollegeCreek      2696     2.696     9965    9.965      2
2      2004     2007 CollegeCreek      2000     2.000    10386   10.386      1
3      2000     2009 CollegeCreek      2153     2.153    11050   11.050      2
4      2006     2007 CollegeCreek      2828     2.828     9965    9.965      2
  BuildQuality     Foundation HasCentralAir Bathrooms Bedrooms TotalRooms
1            7 PouredConcrete             1         2        4         10
2            8 PouredConcrete             1         2        3          8
3            9 PouredConcrete             1         2        3          8
4            8 PouredConcrete             1         3        4         11
  KitchenQuality HasFireplace GarageType GarageCars PriceR PriceK
1      Excellent            1   Attached          3 383970 383.97
2           Good            0   Attached          3 305900 305.90
3      Excellent            1   Attached          3 313000 313.00
4           Good            1   Attached          3 424870 424.87
The function filter(), like select(), returns a data frame. In this case, the data frame has only four rows because only four observations in Ames had sale prices greater than $300K.
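Because both functions return a data frame, the result of one can be saved or passed straight to the other. The sketch below illustrates this on a hypothetical stand-in data frame (ames_small, built from the six rows printed by head() earlier in this section), assuming the dplyr package is installed; the name expensive and the 200 threshold are illustrative choices, not part of the course materials.

```r
library(dplyr)  # assumption: select() and filter() come from dplyr

# Hypothetical stand-in for Ames, built from six rows printed in this section
ames_small <- data.frame(
  PriceK = c(260, 210, 155, 125, 110, 100),
  PriceR = c(260000, 210000, 155000, 125000, 110000, 100000),
  Neighborhood = c("CollegeCreek", "CollegeCreek", "OldTown",
                   "OldTown", "CollegeCreek", "OldTown")
)

# Save the filtered rows; the result is itself a data frame
expensive <- filter(ames_small, PriceK > 200)
is.data.frame(expensive)
nrow(expensive)  # counts the observations that met the condition

# Nest the calls, as with head(select(...)): filter rows, then keep two columns
expensive_summary <- select(filter(ames_small, PriceK > 200), PriceK, Neighborhood)
expensive_summary
```

Counting rows with nrow() is a quick way to check how many observations met the condition without printing the whole data frame.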