[1] 9 8 5
age[-c(1, 3)]
[1] 9 10 5
age[age >= 8 & age <= 10]
[1] 9 8 10
(AST230) R for Data Science
R’s subsetting operators are fast and powerful, and mastering them allows you to perform complex operations concisely
Subsetting in R is easy to learn but hard to master because you need to internalize a number of interrelated concepts
There are three subsetting operators: [[, [, and $
Subsetting operators interact differently with different vector types (e.g., atomic vectors, lists, factors, matrices, and data frames)
Subsetting can be combined with assignment
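To make the last point concrete, here is a minimal sketch of subsetting combined with assignment. The objects `scores`, `m`, and `d` are hypothetical and do not come from the slides; `stringsAsFactors = FALSE` is set explicitly so the example behaves the same on older R versions.

```r
# Hypothetical example vector (not from the slides)
scores <- c(4, 7, NA, 10, 2)

# Replace an element selected by position
scores[1] <- 5

# Replace elements selected by a logical condition
scores[is.na(scores)] <- 0

# Replace a single cell of a matrix
m <- matrix(1:4, nrow = 2)
m[2, 2] <- 0L

# Replace a value in a data frame column for the rows matching a condition
d <- data.frame(id = 1:3, grp = c("a", "b", "c"), stringsAsFactors = FALSE)
d$grp[d$id == 2] <- "z"

scores
m
d
```

In each case the expression on the left of `<-` selects the elements to overwrite, so the same indexing rules apply to reading and to modifying.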
mat <- matrix(1:9, nrow = 3)
mat
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Select specific elements of a matrix
mat[1, 3]
[1] 7
mat[1:2, 3]
[1] 7 8
Select rows or columns of a matrix
mat[ , 2:3]
[,1] [,2]
[1,] 4 7
[2,] 5 8
[3,] 6 9
mat[c(1, 3), ]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 3 6 9
mat
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
mat[1, ] # returns a vector
[1] 1 4 7
mat[, 1] # returns a vector
[1] 1 2 3
# returns a matrix
mat[1, , drop = FALSE]
[,1] [,2] [,3]
[1,] 1 4 7
# returns a matrix
mat[, 1, drop = FALSE]
[,1]
[1,] 1
[2,] 2
[3,] 3
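The effect of `drop = FALSE` can also be checked programmatically. The following sketch rebuilds the same `mat` and inspects the results with `is.matrix()` and `dim()`; the names `row_vec` and `row_mat` are introduced here only for illustration.

```r
mat <- matrix(1:9, nrow = 3)

row_vec <- mat[1, ]                # default: the dimension attribute is dropped
row_mat <- mat[1, , drop = FALSE]  # the dimension attribute is kept

is.matrix(row_vec)  # FALSE
dim(row_vec)        # NULL
is.matrix(row_mat)  # TRUE
dim(row_mat)        # 1 3
```

Keeping the result as a matrix is mainly useful when later code expects two dimensions, for example when the number of selected rows may vary and can happen to be one.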
# Creating a data frame
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)
Positional Indexing
df[1:2, 2:3]
y z
1 a 11
2 b 12
df[2:3, ] # not specifying column index means we want all columns
x y z
2 2 b 12
3 3 c 13
df[, 1:2] # similarly, not specifying the row index means we want all rows
x y
1 1 a
2 2 b
3 3 c
4 4 d
Extract a specific variable x from a data frame
# first method
df$x
[1] 1 2 3 4
# second method
df[[1]]
[1] 1 2 3 4
# third method
df[["x"]]
[1] 1 2 3 4
# fourth method (returns a one-column data frame, not a vector)
df["x"]
x
1 1
2 2
3 3
4 4
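The four methods do not all return the same kind of object. A short sketch, reusing the `df` defined earlier, shows the difference with `class()`:

```r
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)

class(df$x)       # "integer"
class(df[["x"]])  # "integer"
class(df["x"])    # "data.frame"

# Functions that expect a vector work on the first three forms
mean(df[["x"]])   # 2.5
```

With single brackets the result is still a data frame, so a function such as `mean()` that expects a numeric vector will not give the intended result on `df["x"]`.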
Logical Indexing
df$x > 2
[1] FALSE FALSE TRUE TRUE
# Rows that satisfy x>2
df[df$x > 2, ]
x y z
3 3 c 13
4 4 d 14
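Logical conditions can be combined with `&` (and) or `|` (or), and a column selection can be given in the same call. The sketch below reuses the `df` from above; the names `keep` and `res` are introduced only for this illustration.

```r
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)

# TRUE only for rows where both conditions hold (rows 2 and 3)
keep <- df$x > 1 & df$z < 14
keep

# Keep the matching rows and only the columns x and z
res <- df[keep, c("x", "z")]
res
```

Storing the condition in a variable first makes it easy to inspect which rows will be selected before subsetting.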
How many variables are in mtcars? Show the list of these variables.
Extract the vector mpg from mtcars, and calculate its mean and standard deviation.
Check whether there is any missing value in wt of mtcars.
Obtain a data frame from mtcars with mpg > 22.
Obtain a data frame from mtcars with gear = 5 and cyl = 4, keeping only the variables gear and cyl.
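One possible way to approach these exercises with the operators covered so far is sketched below. This is not the only solution; the object names `mpg`, `high_mpg`, and `sel` are chosen here for illustration, and `mtcars` is the data set that ships with base R.

```r
# Number of variables (columns) and their names
ncol(mtcars)
names(mtcars)

# Extract mpg as a vector, then summarise it
mpg <- mtcars$mpg
mean(mpg)
sd(mpg)

# TRUE if wt contains at least one missing value
anyNA(mtcars$wt)

# Rows with mpg greater than 22, all columns
high_mpg <- mtcars[mtcars$mpg > 22, ]

# Rows with gear equal to 5 and cyl equal to 4, keeping two columns
sel <- mtcars[mtcars$gear == 5 & mtcars$cyl == 4, c("gear", "cyl")]
sel
```

Note that equality is tested with `==`; a single `=` inside the brackets would not compare values.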
list() creates the most flexible data structure in R: vectors of different lengths, and even data frames, can be included in a list
A data frame is a special case of a list, and [[ ]] is useful for extracting elements of a list
A list is considered a heterogeneous vector because its elements can be of different types
The operators [[, [, and $ can be used to select elements from a list
Create a list
my_list <- list(1:3, "a", c(TRUE, FALSE, FALSE), c(2L, 5L, 9L))
str(my_list)
List of 4
$ : int [1:3] 1 2 3
$ : chr "a"
$ : logi [1:3] TRUE FALSE FALSE
$ : int [1:3] 2 5 9
my_list[[1]]
[1] 1 2 3
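The three operators behave differently on a list. The sketch below uses a hypothetical named list `info` (the names `id`, `label`, and `flags` are not from the slides), because `$` requires element names.

```r
# Hypothetical named list for illustration
info <- list(id = 1:3, label = "a", flags = c(TRUE, FALSE, FALSE))

info[1]             # single brackets: a list of length 1
info[[1]]           # double brackets: the element itself, 1 2 3
info$label          # extraction by name: "a"
info[["flags"]][2]  # extract an element, then index into it: FALSE

class(info[1])      # "list"
class(info[[1]])    # "integer"
```

In short, `[` always returns a list containing the selected elements, while `[[` and `$` return the content of a single element.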