(AST230) R for Data Science
R's subsetting operators are fast and powerful, and mastering them allows you to perform complex operations concisely
Subsetting in R is easy to learn but hard to master because you need to internalize a number of interrelated concepts
There are three subsetting operators: [[, [, and $
The subsetting operators interact differently with different vector types (e.g. atomic vectors, lists, factors, matrices, and data frames)
Subsetting can be combined with assignment
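As a minimal sketch of combining subsetting with assignment, the example below overwrites selected elements of a vector in place. The vector name scores and its values are hypothetical, chosen only for illustration.

```r
# Hypothetical vector used only for illustration
scores <- c(11, 9, 8, 10, 5)

# Replace the element at position 2
scores[2] <- 90

# Replace every element smaller than 9, selected by a logical condition
scores[scores < 9] <- 0

scores
```

The same pattern (a subset expression on the left of <-) should also work with the other subsetting forms shown in these slides.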
age <- c(11, 9, 8, 10, 5)
age[c(2, 3, 5)]
[1] 9 8 5
age[-c(1, 3)]
[1]  9 10  5
age[age >= 8 & age <= 10]
[1]  9  8 10
mat <- matrix(1:9, nrow = 3)
mat
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Select specific elements of a matrix
mat[1, 3]
[1] 7
mat[1:2, 3]
[1] 7 8
Select rows or columns of a matrix
mat[, 2:3]
     [,1] [,2]
[1,]    4    7
[2,]    5    8
[3,]    6    9
mat[c(1, 3), ]
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    3    6    9
mat
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
mat[1, ] # returns a vector
[1] 1 4 7
mat[, 1] # returns a vector
[1] 1 2 3
# returns a matrix
mat[1, , drop = FALSE]
     [,1] [,2] [,3]
[1,]    1    4    7
# returns a matrix
mat[, 1, drop = FALSE]
     [,1]
[1,]    1
[2,]    2
[3,]    3
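One way to see what drop = FALSE changes is to compare the dimensions of the two results. The sketch below uses a hypothetical matrix m (same shape as the one above) and only base R functions.

```r
# Hypothetical matrix used only for illustration
m <- matrix(1:9, nrow = 3)

row_vec <- m[1, ]                # dimensions are dropped: plain vector
row_mat <- m[1, , drop = FALSE]  # dimensions are kept: 1 x 3 matrix

is.matrix(row_vec)  # FALSE
dim(row_vec)        # NULL
is.matrix(row_mat)  # TRUE
dim(row_mat)        # 1 3
```

Keeping the dimensions can matter when later code expects a matrix, for example when calling nrow() or ncol() on the result.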
# Creating a data frame
df <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)
df
  x y  z
1 1 a 11
2 2 b 12
3 3 c 13
4 4 d 14
# Variable names
names(df)
[1] "x" "y" "z"
str(df)
'data.frame': 4 obs. of 3 variables:
$ x: int 1 2 3 4
$ y: chr "a" "b" "c" "d"
$ z: int 11 12 13 14
# Row names
row.names(df)
[1] "1" "2" "3" "4"
Positional Indexing
df[1:2, 2:3]
  y  z
1 a 11
2 b 12
df[2:3, ] # not specifying a column index means we want all columns
  x y  z
2 2 b 12
3 3 c 13
df[, 1:2] # similarly, not specifying a row index means we want all rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
Extract a specific variable x from a data frame
# first method
df$x
[1] 1 2 3 4
# second method
df[[1]]
[1] 1 2 3 4
# third method
df[["x"]]
[1] 1 2 3 4
# fourth method (returns a one-column data frame, not a vector)
df["x"]
  x
1 1
2 2
3 3
4 4
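The four methods do not all return the same kind of object, which a quick class() check makes visible. The data frame d below is a hypothetical stand-in built only for this sketch.

```r
# Hypothetical data frame used only for illustration
d <- data.frame(x = 1:4, y = letters[1:4])

class(d$x)       # "integer": a plain vector
class(d[["x"]])  # "integer": a plain vector
class(d["x"])    # "data.frame": still a data frame with one column

# Functions that expect a numeric vector need $ or [[ ]]
mean(d[["x"]])   # 2.5
```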
names(df)
[1] "x" "y" "z"
# Select the variables x and y (matrix-style indexing, with a comma)
df[, c("x", "y")]
  x y
1 1 a
2 2 b
3 3 c
4 4 d
# Select the variables x and y (list-style indexing, without a comma)
df[c("x", "y")]
  x y
1 1 a
2 2 b
3 3 c
4 4 d
# Selecting both columns and rows
df[1:2, c("x", "y")]
  x y
1 1 a
2 2 b
Logical Indexing
df
  x y  z
1 1 a 11
2 2 b 12
3 3 c 13
4 4 d 14
df[c(1, 3), ]
  x y  z
1 1 a 11
3 3 c 13
df$x > 2
[1] FALSE FALSE  TRUE  TRUE
# Rows that satisfy x > 2
df[df$x > 2, ]
  x y  z
3 3 c 13
4 4 d 14
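Logical conditions can also be combined with & (and) or | (or) before being used as a row index. The sketch below assumes a hypothetical data frame d with the same columns as above and stores the condition in a helper vector named keep.

```r
# Hypothetical data frame used only for illustration
d <- data.frame(x = 1:4, y = letters[1:4], z = 11:14)

# TRUE only for rows where both conditions hold
keep <- d$x > 1 & d$z < 14

# Use the logical vector as the row index and pick columns by name
result <- d[keep, c("x", "z")]
result
```

Storing the condition in its own variable is a design choice rather than a requirement. It can make longer filters easier to read and reuse.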
How many variables are in mtcars? Show the list of these variables.
Extract the vector mpg from mtcars, and calculate its mean and standard deviation.
Check whether there is any missing value in wt of mtcars
Obtain a data frame with mpg > 22
Obtain a data frame from mtcars with gear = 5 and cyl = 4 and keep only the variables year, model, and cyl
A list, created with list(), is the most flexible data structure in R; vectors of different lengths and even data frames can be included in a list
A data frame is a special case of a list, and [[ ]] is useful for extracting elements of a list
A list is considered a heterogeneous vector because its elements can be of different types
The operators [[, [, and $ can be used to select elements from a list
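A short sketch of these points: a list holding vectors of different lengths together with a data frame, and a check that a data frame is itself a list. The names d and mixed are hypothetical and used only for illustration.

```r
# Hypothetical objects used only for illustration
d <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
mixed <- list(nums = 1:5, flags = c(TRUE, FALSE), table = d)

# Elements may have different lengths and types
lengths(mixed)

# A data frame is a special case of a list
is.list(d)  # TRUE

# Operators can be chained to reach into nested elements
mixed$table$y
mixed[["table"]][["y"]]
```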
Create a list
my_list <- list(1:3, "a", c(TRUE, FALSE, FALSE), c(2L, 5L, 9L))
str(my_list)
List of 4
$ : int [1:3] 1 2 3
$ : chr "a"
$ : logi [1:3] TRUE FALSE FALSE
$ : int [1:3] 2 5 9
my_list[[1]]
[1] 1 2 3
my_list[1]
[[1]]
[1] 1 2 3
typeof(my_list[[1]])
[1] "integer"
typeof(my_list[1])
[1] "list"
l2 <- list(x = 1:5,
           y = c(TRUE, FALSE),
           z = matrix(1:4, 2))
l2
$x
[1] 1 2 3 4 5

$y
[1]  TRUE FALSE

$z
     [,1] [,2]
[1,]    1    3
[2,]    2    4
Extracting x
l2$x
[1] 1 2 3 4 5
l2[["x"]]
[1] 1 2 3 4 5
l2[[1]]
[1] 1 2 3 4 5