library(tidyverse)
10 Importing data
Importing data
Importing data is the process of loading data from external files into R for analysis.
Most real-world data is stored outside of R (e.g., spreadsheets, databases, web).
Effective data importing is crucial for data cleaning, analysis, and visualization.
Common Data Formats
- CSV (comma-separated values) (.csv)
- Excel (.xlsx, .xls)
- Text files (.txt)
- SPSS (.sav)
- Stata (.dta)
- SAS (.sas7bdat)
- R’s native data format (.RData)
- Download the data folder for this session.
1 Importing CSV Files
- Using
read.csv()
:
<- read.csv("data/prawnGR.csv")
data1 head(data1)
#> GRate diet
#> 1 9.7741 Natural
#> 2 10.2931 Natural
#> 3 10.0474 Natural
#> 4 10.0808 Natural
#> 5 9.3106 Natural
#> 6 10.4414 Natural
- Using
readr::read_csv()
:
library(readr)
<- read_csv("data/prawnGR.csv")
data1 head(data1)
#> # A tibble: 6 × 2
#> GRate diet
#> <dbl> <chr>
#> 1 9.77 Natural
#> 2 10.3 Natural
#> 3 10.0 Natural
#> 4 10.1 Natural
#> 5 9.31 Natural
#> 6 10.4 Natural
2 Importing Excel Files
- Install the
readxl
package:
install.packages("readxl")
- Load the package and read data:
library(readxl)
<- read_excel("data/whaledata.xls")
data2 head(data2)
#> # A tibble: 6 × 8
#> month time.at.station water.noise number.whales latitude longitude depth
#> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 May 1344 low 7 60.4 -4.18 520
#> 2 May 1633 medium 13 60.4 -4.19 559
#> 3 May 743 medium 12 60.5 -4.62 1006
#> 4 May 1050 medium 10 60.3 -4.35 540
#> 5 May 1764 medium 12 60.4 -5.2 1000
#> 6 May 580 high 10 60.4 -5.22 1000
#> # ℹ 1 more variable: gradient <dbl>
3 Importing Text Files
Using read.table()
:
<- read.table("data/atmosphere.txt", header = TRUE)
data3 head(data3)
#> moisture treatment
#> 1 300.6 seeded
#> 2 302.4 seeded
#> 3 298.6 seeded
#> 4 315.9 seeded
#> 5 306.9 seeded
#> 6 300.1 seeded
4 Importing SPSS Files
- Install the
haven
package:
install.packages("haven")
- Load the package and read data:
library(haven)
<- read_sav("data/hw_dat.sav")
data4 head(data4)
#> # A tibble: 6 × 13
#> year_birth age division residence religion edu wealth_index total_birth
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 1988 26 Barisal Rural Islam Primary Poorest 2
#> 2 1973 41 Barisal Rural Islam Primary Middle 4
#> 3 1976 38 Barisal Rural Islam Primary Poorest 2
#> 4 1996 18 Barisal Rural Islam Seconda… Poorest 0
#> 5 1986 28 Barisal Rural Islam Primary Poorest 2
#> 6 1980 34 Barisal Rural Islam Primary Poorer 3
#> # ℹ 5 more variables: current_pregnant <chr>, current_breast_feed <chr>,
#> # edu_husband <chr>, bmi <dbl>, overweight <dbl>
5 Importing Stata Files
- Using
haven
Package:
<- haven::read_dta("data/hw_dat.dta")
data5 head(data5)
#> # A tibble: 6 × 13
#> year_birth age division residence religion edu wealth_index total_birth
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 1988 26 Barisal Rural Islam Primary Poorest 2
#> 2 1973 41 Barisal Rural Islam Primary Middle 4
#> 3 1976 38 Barisal Rural Islam Primary Poorest 2
#> 4 1996 18 Barisal Rural Islam Seconda… Poorest 0
#> 5 1986 28 Barisal Rural Islam Primary Poorest 2
#> 6 1980 34 Barisal Rural Islam Primary Poorer 3
#> # ℹ 5 more variables: current_pregnant <chr>, current_breast_feed <chr>,
#> # edu_husband <chr>, bmi <dbl>, overweight <dbl>
6 Importing SAS Files
- Using
haven
Package:
<- haven::read_sas("data/airline.sas7bdat")
data6 head(data6)
#> # A tibble: 6 × 6
#> YEAR Y W R L K
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1948 1.21 0.243 0.145 1.41 0.612
#> 2 1949 1.35 0.260 0.218 1.38 0.559
#> 3 1950 1.57 0.278 0.316 1.39 0.573
#> 4 1951 1.95 0.297 0.394 1.55 0.564
#> 5 1952 2.27 0.310 0.356 1.80 0.574
#> 6 1953 2.73 0.322 0.359 1.93 0.711
7 Importing R’s native data format
- Simply
load()
the.Rdata
File
load("data/TemoraBR.RData")
- This loads all objects stored in the file into the R environment.
head(TemoraBR)
#> # A tibble: 6 × 3
#> temp beat_rate acclimitisation_temp
#> <dbl> <dbl> <dbl>
#> 1 5 3.76 5
#> 2 6 5.4 5
#> 3 7 8 5
#> 4 10 9.4 5
#> 5 11 16.6 5
#> 6 12 18.5 5
Data entry
Sometimes you’ll need to assemble a tibble “by hand” doing a little data entry in your R script.
There are two useful functions to help you do this which differ in whether you layout the tibble by columns or by rows.
tibble()
works by column.
tibble(
x = c(1, 2, 5),
y = c("h", "m", "g"),
z = c(0.08, 0.83, 0.60)
)
#> # A tibble: 3 × 3
#> x y z
#> <dbl> <chr> <dbl>
#> 1 1 h 0.08
#> 2 2 m 0.83
#> 3 5 g 0.6
Using R’s Built-in Datasets
R comes with several built-in datasets, and we can access them using the
data()
function.List all available datasets:
data()
- Load a dataset into the environment:
data(mtcars)
head(mtcars)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Using dataset from R Packages
Steps to Import Dataset from external R Packages:
- Install and Load the Package
- Load the Dataset
For example: Import the
gapminder
dataset from thegapminder
package
install.packages("gapminder")
library(gapminder)
data(gapminder)
head(gapminder)
#> # A tibble: 6 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Afghanistan Asia 1952 28.8 8425333 779.
#> 2 Afghanistan Asia 1957 30.3 9240934 821.
#> 3 Afghanistan Asia 1962 32.0 10267083 853.
#> 4 Afghanistan Asia 1967 34.0 11537966 836.
#> 5 Afghanistan Asia 1972 36.1 13079460 740.
#> 6 Afghanistan Asia 1977 38.4 14880372 786.
Summary:
- CSV, Excel, Text:
readr::read_csv()
,readxl::read_excel()
,read.table()
- SPSS, Stata, SAS:
haven::read_sav()
,haven::read_dta()
,haven::read_sas()
- RData:
load()
- R’s built-in data:
data()
- Data from any R Package: Install the package, use
data()
Exporting data
Exporting to R’s data formats
- Simply
save()
one or more R objects to an.Rdata
file:
save(mtcars, TemoraBR, file = "two_data_sets.RData")
Exporting to CSV files
To export tibbles and data frames, we can use the
write.csv()
orreadr::write_excel_csv()
functionThis creates CSV file that can opened by spreadsheet software such as Excel
write.csv(mtcars, "data/mtcars3.csv")
::write_excel_csv(mtcars, "data/mtcars4.csv") readr
Exporting to Text Files
- Use
write.table()