(AST230) R for Data Science
Up until now we’ve been creating simple objects by directly assigning a single value to an object.
It’s very likely that you’ll soon want to progress to creating more complicated objects. Happily, R has a multitude of functions to help you do this.
The first function we will learn about is the c() function.
The c() function is short for concatenate, and we use it to join together a series of values and store them in a data structure called a vector (or “atomic vector”).
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)
my_vec
[1] 2 3 1 6 4 3 3 7
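As a small sketch of the same idea, c() can also join existing vectors, not just single values. The names more_vals and combined are made up for this illustration.

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

# more_vals and combined are hypothetical names used only in this example
more_vals <- c(10, 20)

# c() flattens its arguments into one vector
combined <- c(my_vec, more_vals)
combined
length(combined)  # 10
```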
Now that we’ve created a vector, we can use other functions to do useful stuff with this object.
For example, we can calculate the mean, variance, standard deviation and number of elements in our vector by using the mean(), var(), sd() and length() functions:
mean(my_vec) # returns the mean of my_vec
[1] 3.625
var(my_vec) # returns the variance of my_vec
[1] 3.982143
sd(my_vec) # returns the standard deviation of my_vec
[1] 1.995531
length(my_vec) # returns the number of elements in my_vec
[1] 8
A scalar (a single value) is simply a vector of length one.
vec_mean <- mean(my_vec) # stores the mean of my_vec in vec_mean
vec_mean
[1] 3.625
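One way to check this for yourself is sketched below; the object name x is just an example.

```r
# x is a hypothetical object holding a single value
x <- 5

length(x)           # 1: a single value is a vector of length one
is.vector(x)        # TRUE
identical(x, c(5))  # TRUE: wrapping a single value in c() changes nothing
```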
Atomic vectors come in six types:
logical
integer
double
character
complex
raw
Every vector has two key properties: its type, returned by typeof(), and its length, returned by length().
length(my_vec)
[1] 8
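The other property, the type, can be inspected in the same way. A minimal sketch, reusing my_vec from above:

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

# Numbers typed without a suffix are stored as doubles
vec_type <- typeof(my_vec)
vec_type  # "double"

# The two properties together
c(type = typeof(my_vec), length = length(my_vec))
```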
Logical vectors are the simplest type of atomic vector because they can take only two possible values, FALSE and TRUE (plus NA for a missing value).
x_l <- c(TRUE, FALSE, TRUE)
x_l
[1] TRUE FALSE TRUE
typeof(x_l)
[1] "logical"
is.logical(x_l)
[1] TRUE
comparison operator | symbol in R |
---|---|
equal to | == |
greater than, greater than or equal to | > , >= |
less than, less than or equal to | < , <= |
not equal to | != |
10 == 15
[1] FALSE
10 != 15
[1] TRUE
10 > 15
[1] FALSE
10 < 15
[1] TRUE
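These operators also work element by element on a whole vector and return a logical vector. A short sketch using my_vec from earlier (the names above_three and equal_three are illustrative):

```r
my_vec <- c(2, 3, 1, 6, 4, 3, 3, 7)

# Each element is compared with 3 in turn
above_three <- my_vec > 3
above_three       # FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE

equal_three <- my_vec == 3
equal_three       # FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs
sum(above_three)  # 3
```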
Integer and double vectors are known collectively as numeric vectors.
In R, numbers are doubles by default. To make an integer, place an L after the number (for example, 5L).
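The difference shows up when we ask for the type. A minimal sketch; the name x_i is chosen here to match x_l and x_c and is not from the original slides.

```r
typeof(1)    # "double"
typeof(1L)   # "integer"

# x_i is a hypothetical name for an integer vector
x_i <- c(1L, 2L, 3L)
typeof(x_i)      # "integer"
is.integer(x_i)  # TRUE
is.numeric(x_i)  # TRUE: integers and doubles are both numeric
```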
Character vectors are used to represent string values. You can think of a character string as something like a word (or multiple words).
A string is written as a collection of characters between double quotes (").
x_c <- c("boy", "boy", "girl")
x_c
[1] "boy" "boy" "girl"
typeof(x_c)
[1] "character"
is.character(x_c)
[1] TRUE
NULL is often used to represent the absence of a vector.
NULL typically behaves like a vector of length 0.
my_vec1 <- NULL
my_vec1 <- c(my_vec1, 10)
my_vec1
[1] 10
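A few quick checks, sketched below, show why NULL behaves like an empty vector:

```r
length(NULL)   # 0
is.null(NULL)  # TRUE

# Calling c() with no arguments returns NULL
is.null(c())   # TRUE

# Combining NULL with values simply drops the NULL
c(NULL, 1, 2)  # 1 2
```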
NA is used to represent the absence of a value in a vector.
my_vec2 <- c(18, 21, NA, 22)
my_vec2
[1] 18 21 NA 22
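Missing values affect the summary functions we met earlier. The sketch below uses is.na() to find the missing value and the na.rm argument of mean() to ignore it.

```r
my_vec2 <- c(18, 21, NA, 22)

is.na(my_vec2)               # FALSE FALSE TRUE FALSE
length(my_vec2)              # 4: the NA still counts as an element

mean(my_vec2)                # NA, because one value is unknown
mean(my_vec2, na.rm = TRUE)  # 20.33333, the mean of 18, 21 and 22
```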
integer and double → quantitative data
character → qualitative data
logical → binary data
Sometimes it can be useful to create a vector that contains a regular sequence of values in steps of one.
Here we can make use of a shortcut using the : (colon) symbol.
my_seq <- 1:10 # create regular sequence
my_seq
[1] 1 2 3 4 5 6 7 8 9 10
my_seq2 <- 10:1 # in descending order
my_seq2
[1] 10 9 8 7 6 5 4 3 2 1
-5:4
[1] -5 -4 -3 -2 -1 0 1 2 3 4
seq()
Other useful functions for generating vectors of sequences include the seq() and rep() functions.
For example, to generate a sequence from 1 to 5 in steps of 0.5:
seq(from = 1, to = 5, by = 0.5)
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
seq(from = 1, to = 5, length.out = 8)
[1] 1.000000 1.571429 2.142857 2.714286 3.285714 3.857143 4.428571 5.000000
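Here we’ve used the arguments from = and to = to define the limits of the sequence and the by = argument to specify the increment of the sequence. Alternatively, the length.out = argument sets how many values the sequence should contain, and R works out the increment.
Play around with other values for these arguments to see their effect.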
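For instance, here is a sketch of two variations to try. The object names down and quarters are made up for this example.

```r
# A negative by = gives a decreasing sequence
down <- seq(from = 10, to = 0, by = -2.5)
down      # 10.0 7.5 5.0 2.5 0.0

# length.out = fixes the number of values instead of the step size
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters  # 0.00 0.25 0.50 0.75 1.00
```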
rep()
The rep() function allows you to replicate (repeat) values a specified number of times. To repeat the value 2, 10 times:
rep(2, times = 10)
[1] 2 2 2 2 2 2 2 2 2 2
The arguments times, each and length.out are used in rep() to obtain different vectors.
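A short sketch of how the three arguments differ, using the small vector 1:3 (the object names are illustrative):

```r
# times: repeat the whole vector
r_times <- rep(1:3, times = 2)
r_times   # 1 2 3 1 2 3

# each: repeat every element before moving on to the next
r_each <- rep(1:3, each = 2)
r_each    # 1 1 2 2 3 3

# length.out: keep repeating until the result has this many elements
r_len <- rep(1:3, length.out = 7)
r_len     # 1 2 3 1 2 3 1

# each is applied first, then times
r_both <- rep(1:3, each = 2, times = 2)
r_both    # 1 1 2 2 3 3 1 1 2 2 3 3
```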
We can also repeat non-numeric values, e.g.
rep("boy", times = 3)
[1] "boy" "boy" "boy"
rep(c("boy", "girl"), each = 3)
[1] "boy" "boy" "boy" "girl" "girl" "girl"
rep(c("boy", "girl"), each = 2, times = 3)
[1] "boy" "boy" "girl" "girl" "boy" "boy" "girl" "girl" "boy" "boy"
[11] "girl" "girl"
Create the vector (101, 102, 103, 200, 205, 210, 1000, 1100, 1200) using a combination of the c() and seq() functions.
Create a vector that repeats the integers from 1 to 5, 10 times, i.e. (1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...). The length of the vector should be 50.
Create a vector with the same values as before, but this time repeat 1, 10 times, then 2, 10 times, etc., i.e. (1, 1, 1, ..., 2, 2, 2, ..., 5, 5, 5). The length of the vector should also be 50.