x <- c(10, 15)
c(1:5, x, 100, x)
[1]   1   2   3   4   5  10  15 100  10  15
(AST230) R for Data Science
A variable or an R object with more than one value is known as a vector. In R, there are two types of vectors: atomic vectors and lists.
An atomic vector consists of elements of the same type, e.g. all doubles or all characters.
A list can have elements of different data types, i.e. one element of a list could be a numeric value and another could be a character value. [More on lists later]
Most of the time, atomic vectors are just called vectors (we’ve already done this in the last section, and we’ll keep doing it throughout the course!).
While lists are also technically vectors, we like to keep things clear by simply calling them “lists.” It makes things easier to understand.
c() merges an arbitrary number of vectors into one vector.
It is not necessary to have vectors of the same length in an expression.
If two vectors in an expression are not of the same length, the shorter one is repeated (recycled) until it has the same length as the longer one.
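The recycling rule described above can be sketched as follows. The vectors `a` and `b` are hypothetical values chosen only for illustration.

```r
# Hypothetical vectors, chosen only for illustration.
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)

# b is recycled to 10 20 10 20 10 20 before the element-wise addition.
recycled_sum <- a + b
print(recycled_sum)   # 11 22 13 24 15 26

# A length-one vector is recycled across every element of a.
doubled <- a * 2
print(doubled)        # 2 4 6 8 10 12
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning.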
Function | Example | Output |
---|---|---|
sum(), prod() | sum(1:10) | 55 |
min(), max() | min(1:10) | 1 |
mean(), median() | median(1:10) | 5.5 |
sd(), var() | sd(1:10) | 3.0276504 |
quantile() | quantile(1:10) | 1, 3.25, 5.5, 7.75, 10 |
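The table shows one function from each pair. A short sketch of the remaining ones, using the same input `1:10` where it makes sense, might look like this; the variable names are only for illustration.

```r
v <- 1:10                      # same input as in the table

total_product <- prod(1:5)     # product of 1 to 5: 120
largest       <- max(v)        # 10
average       <- mean(v)       # 5.5
variance      <- var(v)        # sample variance (divides by n - 1)
spread        <- sd(v)         # square root of the sample variance

# quantile() also accepts specific probabilities through probs.
upper_decile  <- quantile(v, probs = 0.9)

print(c(total_product, largest, average))
print(variance)
print(upper_decile)
```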
[ ] notation
[] (Positional indexing)
A vector of the age of five children
Age of a specific child, say the third child
Note
The positional index starts at 1, not at 0 as in some other programming languages (e.g. C, Python).
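A minimal sketch of positional indexing is given below. The `age` values are hypothetical and are not the ones used in the original slides.

```r
# Hypothetical ages of five children (assumed values for illustration).
age <- c(10, 8, 11, 9, 8)

third      <- age[3]         # third child: 11
first_last <- age[c(1, 5)]   # first and fifth children: 10 8
middle     <- age[2:4]       # second to fourth children: 8 11 9
not_second <- age[-2]        # a negative index drops that position: 10 11 9 8

print(third)
print(first_last)
print(middle)
print(not_second)
```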
[] (Logical indexing)
Children with age 11 or 8 years
Children with ages not equal to 11 or 8 years
Observations with age greater than 9 or less than 8
Observations with ages between 8 and 10 inclusive
The mean age of observations between 8 and 10 inclusive
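The selections listed above could be written as follows. This is a sketch on a hypothetical `age` vector, so the results will differ from those on the original slides.

```r
# Hypothetical ages of five children (assumed values for illustration).
age <- c(10, 8, 11, 9, 8)

# Age 11 or 8 years: two equivalent forms.
is_11_or_8   <- age[age == 11 | age == 8]      # 8 11 8
is_11_or_8_b <- age[age %in% c(11, 8)]         # 8 11 8

# Ages not equal to 11 or 8.
not_11_or_8  <- age[!(age %in% c(11, 8))]      # 10 9

# Greater than 9 or less than 8.
outside      <- age[age > 9 | age < 8]         # 10 11

# Between 8 and 10 inclusive, and the mean of those values.
between      <- age[age >= 8 & age <= 10]      # 10 8 9 8
mean_between <- mean(between)                  # 8.75

print(mean_between)
```

The logical vector inside `[ ]` has one TRUE or FALSE per element, and only the elements marked TRUE are kept.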
The following code generates a vector nage of size 1000.
Show that the number of observations
greater than 70 is 176
less than 40 is 185
equal to 39 is 19
greater than 77 or less than 35 is 140
between 50 and 55 (inclusive) is 110
What percentage of observations lie between 70 and 75 (inclusive)?
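The code that generates `nage` is not reproduced here, so the sketch below uses a small hypothetical vector `scores` instead; its counts will not match the numbers in the exercise. It only illustrates the technique: summing a logical vector counts the TRUE values, and taking its mean gives a proportion.

```r
# Hypothetical data standing in for nage (assumed values for illustration).
scores <- c(35, 72, 39, 50, 55, 81, 39, 64, 70, 75)

n_above_70    <- sum(scores > 70)                      # 3
n_equal_39    <- sum(scores == 39)                     # 2
n_extreme     <- sum(scores > 77 | scores < 35)        # 1
n_50_to_55    <- sum(scores >= 50 & scores <= 55)      # 2

# mean() of a logical vector is the proportion of TRUE values.
pct_70_to_75  <- mean(scores >= 70 & scores <= 75) * 100   # 30

print(c(n_above_70, n_equal_39, n_extreme, n_50_to_55))
print(pct_70_to_75)
```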
In R, missing values are coded as NA, meaning ‘Not available’.
Most R functions return a missing value (i.e. NA) if any input vector contains a missing value.
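A short sketch of how NA propagates through a calculation, using a hypothetical vector `x`. It also uses `is.na()` to locate missing values and the `na.rm` argument that many summary functions accept.

```r
# Hypothetical vector with one missing value (assumed for illustration).
x <- c(4, NA, 7, 9)

total_raw   <- sum(x)                   # NA, because one input is missing
mean_raw    <- mean(x)                  # NA for the same reason

missing_at  <- is.na(x)                 # FALSE TRUE FALSE FALSE
n_missing   <- sum(is.na(x))            # 1

total_clean <- sum(x, na.rm = TRUE)     # 20, NA dropped before summing
mean_clean  <- mean(x, na.rm = TRUE)    # 20 / 3

print(total_raw)
print(total_clean)
```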
Many functions have an argument na.rm, which takes a logical value indicating whether missing values should be excluded from (TRUE) or included in (FALSE) the calculation.
In R, atomic vectors are homogeneous, i.e., all elements of an atomic vector will be of the same data type.
If you attempt to create an atomic vector with more than one data type, e.g. nvec <- c(1, 2, "all"), R will still create an atomic vector by converting all elements of nvec to a single data type (here, character). This conversion is known as coercion.
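The coercion of `nvec` can be checked with `typeof()`. The sketch below also shows two further cases that are not in the original text: a logical value mixed with a number, and converting back with `as.numeric()`, where the element that is not a number becomes NA.

```r
nvec <- c(1, 2, "all")
nvec_type <- typeof(nvec)        # "character": the numbers became "1" and "2"
print(nvec)

# Logical values mixed with numbers are coerced to numbers (TRUE -> 1, FALSE -> 0).
mixed <- c(TRUE, FALSE, 3)
mixed_type <- typeof(mixed)      # "double"

# Converting back explicitly: "all" is not a number, so it becomes NA.
# suppressWarnings() hides the warning R would otherwise print here.
back <- suppressWarnings(as.numeric(nvec))
print(back)
```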
Explicit coercion can be done with the as.* functions, if available; values that cannot be converted become NAs.
An attribute is a piece of information that you can attach to an atomic vector (or any R object). It won’t affect any of the values in the object, and it will usually not appear when displaying the object.
Attributes are metadata and R will normally ignore them, but some R functions will check for specific attributes.
Atomic vectors can be transformed into some other important R data structures, e.g., matrices, arrays, factors, or date-times, by adding attributes.
Attributes can be retrieved and modified with attr() or attributes().
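A minimal sketch of setting and reading an attribute. The attribute name `"source"` and its value are hypothetical, chosen only for illustration.

```r
x <- 1:6
before <- attributes(x)               # NULL: a plain vector has no attributes

# Attach a custom attribute ("source" is a made-up name for this example).
attr(x, "source") <- "hypothetical survey"

one_attr  <- attr(x, "source")        # retrieve a single attribute by name
all_attrs <- attributes(x)            # retrieve all attributes as a list

# The values themselves are unchanged by the attribute.
total <- sum(x)                       # 21

print(one_attr)
print(names(all_attrs))
```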
The two most commonly used attributes are:
names is one of the common attributes of an R object. We can set the names of an atomic vector in various ways. Two of them are:
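Two common ways of setting names are sketched below; they are not necessarily the same two shown in the original slides. The names and values are hypothetical.

```r
# Way 1: give the names when the vector is created.
ages <- c(Ann = 10, Bob = 8, Cy = 11)

# Way 2: assign names to an existing vector with names()<-.
heights <- c(120, 135, 128)
names(heights) <- c("Ann", "Bob", "Cy")

age_names  <- names(ages)             # "Ann" "Bob" "Cy"
bob_age    <- ages["Bob"]             # names can be used for indexing
bob_height <- heights[["Bob"]]        # [[ ]] returns the value without its name

print(ages)
print(heights)
```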
dim
R will always use the first value in dim for the number of rows and the second value for the number of columns.
R always fills up each matrix by columns, instead of by rows.
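The effect of the dim attribute can be sketched with a small vector; the values are chosen only for illustration.

```r
m <- 1:6
dim(m) <- c(2, 3)        # first value: 2 rows, second value: 3 columns

# The values fill column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
print(m)

n_rows    <- nrow(m)     # 2
n_cols    <- ncol(m)     # 3
first_row <- m[1, ]      # 1 3 5
is_mat    <- is.matrix(m)
```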
The R functions matrix() and array() can be used to control how the columns and rows of a matrix will be arranged (more in the next section).
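As a brief preview, and only as a sketch, the `byrow` argument of `matrix()` switches the fill order, and `array()` takes the full `dim` vector directly. The values are chosen only for illustration.

```r
# Default: filled by columns, as with dim().
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE fills row by row instead.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(by_col[1, ])       # 1 3 5
print(by_row[1, ])       # 1 2 3

# array() generalises this to more than two dimensions.
arr <- array(1:24, dim = c(2, 3, 4))
arr_dim   <- dim(arr)    # 2 3 4
last_cell <- arr[2, 3, 4]
```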