#> [1] 9 8 5
age[-c(1, 3)]
#> [1]  9 10  5
age[age >= 8 & age <= 10]
#> [1]  9  8 10
R’s subsetting operators are fast and powerful, and mastering them allows you to perform complex operations concisely.
Subsetting in R is easy to learn but hard to master, because you need to internalize a number of interrelated concepts:
There are three subsetting operators: [[, [, and $.
Subsetting operators interact differently with different vector types (e.g. atomic vectors, lists, factors, matrices, and data frames).
Subsetting can be combined with assignment.
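The points above can be sketched on a small example. The objects `scores` and `info` are hypothetical and were chosen only for illustration; they do not appear elsewhere in these notes.

```r
# Hypothetical named vector, used only for illustration
scores <- c(a = 3, b = 7, c = 12)

# `[` keeps the container type and the names
scores["b"]

# `[[` extracts a single element and drops the name
scores[["b"]]

# Subsetting combined with assignment: cap every value above 10 at 10
scores[scores > 10] <- 10
scores

# `$` works on lists and data frames, not on atomic vectors
info <- list(id = 1L, tags = c("x", "y"))
info$tags[2] <- "z"
info$tags
```

The same subsetting expression that selects elements can appear on the left-hand side of `<-` to modify them in place.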
mat <- matrix(1:9, nrow = 3)
mat
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
Select specific elements of a matrix
mat[1, 3]
#> [1] 7
mat[1:2, 3]
#> [1] 7 8
Select rows or columns of a matrix
mat[, 2:3]
#>      [,1] [,2]
#> [1,]    4    7
#> [2,]    5    8
#> [3,]    6    9
mat[c(1, 3), ]
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    3    6    9
mat
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
mat[1, ] # returns a vector
#> [1] 1 4 7
mat[, 1] # returns a vector
#> [1] 1 2 3
# returns a matrix
mat[1, , drop = FALSE]
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
# returns a matrix
mat[, 1, drop = FALSE]
#>      [,1]
#> [1,]    1
#> [2,]    2
#> [3,]    3
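One way to see what `drop = FALSE` changes is to inspect the dimensions of each result. The sketch below rebuilds the same `mat`; the names `row_vec`, `row_mat`, and `keep` are introduced here only for illustration.

```r
mat <- matrix(1:9, nrow = 3)

row_vec <- mat[1, ]                # dimensions are dropped
row_mat <- mat[1, , drop = FALSE]  # dimensions are kept

dim(row_vec)        # NULL, because the result is a plain vector
dim(row_mat)        # one row, three columns
is.matrix(row_vec)
is.matrix(row_mat)

# drop = FALSE matters when the number of selected columns may vary:
# the result is a matrix even when `keep` has length one
keep <- 2
ncol(mat[, keep, drop = FALSE])
```

Keeping the dimensions is usually the safer choice inside functions, where later code may call `nrow()` or `ncol()` on the result.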
# Creating a data frame
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)
df
#>   x y  z
#> 1 1 a 11
#> 2 2 b 12
#> 3 3 c 13
#> 4 4 d 14
# Variable names
names(df)
#> [1] "x" "y" "z"
Positional Indexing
df[1:2, 2:3]
#>   y  z
#> 1 a 11
#> 2 b 12
df[2:3, ] # not specifying column index means we want all columns
#>   x y  z
#> 2 2 b 12
#> 3 3 c 13
df[, 1:2] # similarly, not specifying row index means we want all rows
#>   x y
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d
Extract a specific variable x from a data frame
# first method
df$x
#> [1] 1 2 3 4
# second method
df[[1]]
#> [1] 1 2 3 4
# third method
df[["x"]]
#> [1] 1 2 3 4
# fourth method (returns a one-column data frame, not a vector)
df["x"]
#>   x
#> 1 1
#> 2 2
#> 3 3
#> 4 4
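The four methods do not all return the same kind of object, which a quick check can make visible. This is a minimal sketch that rebuilds the same `df`; the variable `col` is a hypothetical name used to show programmatic column access.

```r
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)

# `$` and `[[` return the column itself (here an integer vector)
identical(df$x, df[["x"]])
is.data.frame(df[["x"]])

# `[` with a single index returns a data frame with one column
is.data.frame(df["x"])

# `[[` also accepts a column name stored in a variable; `$` does not
col <- "z"
df[[col]]
```

Functions such as `mean()` expect a vector, so `df$x` or `df[["x"]]` is the form to pass to them.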
Logical Indexing
df
#>   x y  z
#> 1 1 a 11
#> 2 2 b 12
#> 3 3 c 13
#> 4 4 d 14
df[c(1, 3), ]
#>   x y  z
#> 1 1 a 11
#> 3 3 c 13
df$x > 2
#> [1] FALSE FALSE  TRUE  TRUE
# Rows that satisfy x > 2
df[df$x > 2, ]
#>   x y  z
#> 3 3 c 13
#> 4 4 d 14
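Logical conditions can be combined with `&` and `|`, and a column selection can be added in the same call. The sketch below rebuilds the same `df`; `keep` and `df_na` are hypothetical names, and the missing value is inserted only to illustrate how `NA` behaves in a logical index.

```r
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)

# Rows with x > 1 and z < 14, keeping only the columns x and z
keep <- df$x > 1 & df$z < 14
df[keep, c("x", "z")]

# A missing value in the condition produces an all-NA row ...
df_na <- df
df_na$x[2] <- NA
nrow(df_na[df_na$x > 2, ])

# ... whereas which() keeps only the positions that are TRUE
nrow(df_na[which(df_na$x > 2), ])
```

Wrapping the condition in `which()` is one common way to avoid the extra `NA` rows when a column contains missing values.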
How many variables are in mtcars? Show the list of these variables.
Extract the vector mpg from mtcars, and calculate its mean and standard deviation.
Check whether there are any missing values in wt of mtcars.
Obtain a data frame containing the rows of mtcars with mpg > 22.
Obtain a data frame from mtcars with gear = 5 and cyl = 4, keeping only the variables gear and cyl.
list() creates the most flexible data structure in R: vectors of different lengths and even data frames can be included in a list.
A data frame is a special case of a list, and [[ ]] is useful for extracting elements of a list.
A list is considered a heterogeneous vector, as its elements can be of different types.
The operators [[, [, and $ can be used to select elements from a list.
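The difference between the three operators on a list can be sketched as follows. The list `person` and its element names are hypothetical and serve only as an illustration.

```r
# Hypothetical named list with elements of different types and lengths
person <- list(name = "Ada", scores = c(90, 85), passed = TRUE)

# `[` returns a shorter list
sub_list <- person["scores"]
class(sub_list)
length(sub_list)

# `[[` returns the element itself
scores <- person[["scores"]]
class(scores)

# `$` is shorthand for `[[` with a literal name
identical(person$scores, person[["scores"]])
```

A useful mental picture: `[` gives back a smaller list, while `[[` and `$` reach inside the list and return what is stored there.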
Create a list
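The code that creates `my_list` is not shown in these notes. The following is one construction that would be consistent with the `str()` output; the exact original call is an assumption.

```r
# Assumed construction: four unnamed elements whose types match
# the str() output (int, chr, logi, int)
my_list <- list(
  1:3,
  "a",
  c(TRUE, FALSE, FALSE),
  c(2L, 5L, 9L)  # the L suffix makes these integers rather than doubles
)
```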
str(my_list)
#> List of 4
#>  $ : int [1:3] 1 2 3
#>  $ : chr "a"
#>  $ : logi [1:3] TRUE FALSE FALSE
#>  $ : int [1:3] 2 5 9
my_list[[1]]
#> [1] 1 2 3