library(tidyverse)
(AST230) R for Data Science
Importing data is the process of loading data from external files into R for analysis.
Most real-world data is stored outside of R (e.g., spreadsheets, databases, web).
Effective data importing is crucial for data cleaning, analysis, and visualization.
Using read.csv():
Using the readxl package:
# A tibble: 6 × 8
month time.at.station water.noise number.whales latitude longitude depth
<chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 May 1344 low 7 60.4 -4.18 520
2 May 1633 medium 13 60.4 -4.19 559
3 May 743 medium 12 60.5 -4.62 1006
4 May 1050 medium 10 60.3 -4.35 540
5 May 1764 medium 12 60.4 -5.2 1000
6 May 580 high 10 60.4 -5.22 1000
# ℹ 1 more variable: gradient <dbl>
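A minimal sketch of both import routes. To stay self-contained it writes a small temporary CSV (the values are made up for illustration and are not the whale survey file shown above) and reads the example workbook that ships with readxl. In practice you would pass the path to your own file. The object names are hypothetical.

```r
library(readxl)

# Hypothetical toy data written to a temporary CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("month,depth", "May,520", "May,559"), csv_path)

# Base R: read.csv() returns a data.frame
toy_csv <- read.csv(csv_path)
str(toy_csv)

# readxl: readxl_example() locates a workbook bundled with the package
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)                      # list the sheet names
toy_xlsx <- read_excel(xlsx_path, sheet = 1) # read_excel() returns a tibble
head(toy_xlsx)
```

read.csv() gives a plain data frame, while read_excel() gives a tibble, which is why the printed output above carries the "# A tibble" header.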
Using read.table():
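As a rough illustration of read.table(), the sketch below creates a small tab-separated file on the fly and reads it back. The file contents and the object name are hypothetical; only the header and sep arguments matter here.

```r
# Hypothetical tab-separated file created in a temporary location
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup\tscore", "1\ta\t3.5", "2\tb\t4.0"), txt_path)

# header = TRUE treats the first line as column names; sep sets the delimiter
scores <- read.table(txt_path, header = TRUE, sep = "\t")
scores
```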
Using the haven package:
# A tibble: 6 × 13
year_birth age division residence religion edu wealth_index total_birth
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
1 1988 26 Barisal Rural Islam Primary Poorest 2
2 1973 41 Barisal Rural Islam Primary Middle 4
3 1976 38 Barisal Rural Islam Primary Poorest 2
4 1996 18 Barisal Rural Islam Seconda… Poorest 0
5 1986 28 Barisal Rural Islam Primary Poorest 2
6 1980 34 Barisal Rural Islam Primary Poorer 3
# ℹ 5 more variables: current_pregnant <chr>, current_breast_feed <chr>,
# edu_husband <chr>, bmi <dbl>, overweight <dbl>
Using the haven package:
# A tibble: 6 × 13
year_birth age division residence religion edu wealth_index total_birth
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
1 1988 26 Barisal Rural Islam Primary Poorest 2
2 1973 41 Barisal Rural Islam Primary Middle 4
3 1976 38 Barisal Rural Islam Primary Poorest 2
4 1996 18 Barisal Rural Islam Seconda… Poorest 0
5 1986 28 Barisal Rural Islam Primary Poorest 2
6 1980 34 Barisal Rural Islam Primary Poorer 3
# ℹ 5 more variables: current_pregnant <chr>, current_breast_feed <chr>,
# edu_husband <chr>, bmi <dbl>, overweight <dbl>
Using the haven package:
# A tibble: 6 × 6
YEAR Y W R L K
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1948 1.21 0.243 0.145 1.41 0.612
2 1949 1.35 0.260 0.218 1.38 0.559
3 1950 1.57 0.278 0.316 1.39 0.573
4 1951 1.95 0.297 0.394 1.55 0.564
5 1952 2.27 0.310 0.356 1.80 0.574
6 1953 2.73 0.322 0.359 1.93 0.711
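haven provides one reader per statistical-software format. The sketch below is not the code that produced the outputs above. It reads the small iris example files that are assumed to ship with the installed haven package, so that it runs without any external data.

```r
library(haven)

# Example files bundled with haven (assumed present in the installed package)
sav_path <- system.file("examples", "iris.sav", package = "haven")
dta_path <- system.file("examples", "iris.dta", package = "haven")
sas_path <- system.file("examples", "iris.sas7bdat", package = "haven")

iris_spss  <- read_sav(sav_path)  # SPSS (.sav)
iris_stata <- read_dta(dta_path)  # Stata (.dta)
iris_sas   <- read_sas(sas_path)  # SAS (.sas7bdat)

head(iris_spss)
```

Each function returns a tibble, so the result can go straight into the usual tidyverse workflow.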
Importing an .Rdata file with load()
Sometimes you’ll need to assemble a tibble “by hand” by doing a little data entry in your R script.
Two useful functions help you do this; they differ in whether you lay out the tibble by columns or by rows. tibble() works by column, and tribble() works by row.
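To make the column-wise versus row-wise distinction concrete, here is a small sketch that builds the same table both ways: tibble() by column and its row-wise counterpart tribble(). The column names and values are made up for illustration.

```r
library(tibble)

# Column-wise: each argument is one complete column
by_col <- tibble(
  x = c(1, 2, 5),
  y = c("h", "m", "g"),
  z = c(0.08, 0.83, 0.60)
)

# Row-wise: column headers are written as ~name, then values follow row by row
by_row <- tribble(
  ~x, ~y,  ~z,
   1, "h", 0.08,
   2, "m", 0.83,
   5, "g", 0.60
)

# Compare the two results
all.equal(by_col, by_row)
```

The row-wise layout tends to be easier to read for small hand-typed tables, because each line in the script corresponds to one observation.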
R comes with several built-in datasets, and we can access them using the data() function.
List all available datasets:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
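The output above is the first six rows of mtcars. A minimal sketch of how the listing and loading steps might look (the object name builtin is hypothetical, and the listing is restricted to the datasets package to keep it short):

```r
# data() with no arguments lists datasets in all attached packages;
# here the result is stored so the listing can be inspected as a matrix
builtin <- data(package = "datasets")
head(builtin$results[, c("Item", "Title")])

# Load one built-in dataset by name and look at its first rows
data(mtcars)
head(mtcars)
```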
Steps to Import Dataset from external R Packages:
For example, import the gapminder dataset from the gapminder package:
# A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
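The gapminder output above can be reproduced along these lines. The three-step breakdown is an assumption about the usual workflow (install, attach, load), not a reconstruction of the original slide.

```r
# Step 1 (one time only): install the package from CRAN
# install.packages("gapminder")

# Step 2: attach the package
library(gapminder)

# Step 3: load the dataset into the workspace and inspect it
data(gapminder)
head(gapminder)
```

After library(gapminder) the dataset is already available by name; the explicit data() call simply copies it into the workspace.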
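readr::read_csv(), readxl::read_excel(), read.table(): delimited text and Excel files
haven::read_sav(), haven::read_dta(), haven::read_sas(): SPSS, Stata, and SAS files
load(): .Rdata files
data(): built-in datasets
data(): datasets from external R packages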
Use save() to write one or more R objects to an .Rdata file.
To export tibbles and data frames, we can use the write.csv() or readr::write_excel_csv() function.
This creates a CSV file that can be opened by spreadsheet software such as Excel.
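A short sketch of these export functions, together with a save() and load() round trip. Everything is written to a temporary directory, and the data frame and file names are hypothetical.

```r
library(readr)

df <- data.frame(id = 1:3, group = c("a", "b", "c"))  # hypothetical toy data
out_dir <- tempdir()

# Base R: row.names = FALSE avoids an extra column of row numbers
write.csv(df, file.path(out_dir, "df_base.csv"), row.names = FALSE)

# readr: write_excel_csv() adds a UTF-8 byte order mark so Excel detects the encoding
write_excel_csv(df, file.path(out_dir, "df_excel.csv"))

# save() stores R objects by name; load() restores them under the same names
rdata_path <- file.path(out_dir, "objects.Rdata")
save(df, file = rdata_path)
rm(df)
load(rdata_path)
df
```

A CSV file can be opened by almost any tool, whereas an .Rdata file keeps R-specific details such as column types and factor levels but can only be read back into R.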
Using write.table():
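write.table() is the general-purpose counterpart of write.csv() and lets you choose the delimiter. A minimal sketch, assuming a tab-separated output file is wanted (the data frame and path are hypothetical):

```r
df <- data.frame(id = 1:3, score = c(3.5, 4.0, 2.5))  # hypothetical toy data
out_path <- tempfile(fileext = ".txt")

# sep sets the delimiter; quote = FALSE and row.names = FALSE keep the file plain
write.table(df, out_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Inspect the raw lines that were written
readLines(out_path)
```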