Wide-format data
| year | Adelie | Chinstrap | Gentoo |
|---|---|---|---|
| 2007 | 50 | 26 | 34 |
| 2008 | 50 | 18 | 46 |
| 2009 | 52 | 24 | 44 |
Long-format data
| year | species | n |
|---|---|---|
| 2007 | Adelie | 50 |
| 2007 | Chinstrap | 26 |
| 2007 | Gentoo | 34 |
| 2008 | Adelie | 50 |
| 2008 | Chinstrap | 18 |
| 2008 | Gentoo | 46 |
| 2009 | Adelie | 52 |
| 2009 | Chinstrap | 24 |
| 2009 | Gentoo | 44 |
The tidyr package (included in the tidyverse) has two very useful functions for reshaping data:
pivot_longer()
pivot_wider()
The pivot_longer() function converts wide-format data to long-format data.
You must specify which columns (variables) should be combined into a single variable; the function then returns two new variables built from the names and the values of the selected columns.
The first variable contains the names of the selected columns.
The second variable contains the values of the selected columns.
The syntax of the function pivot_longer():
data \(\rightarrow\) the data frame to reshape
cols \(\rightarrow\) the selected variables (columns) to pivot into longer format
names_to \(\rightarrow\) a character vector naming the new column that will hold the names of the selected columns
values_to \(\rightarrow\) a character string naming the new column that will hold the values of the selected columns
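The arguments above can be put together as in the following minimal sketch. The tibble name `penguins_wide` is hypothetical and is built by hand from the wide-format table shown earlier. The name `body_mass` for `values_to` is chosen only to match the column header in the printed output that follows; since the values are counts, a name such as `n` would be more descriptive.

```r
library(tidyr)
library(tibble)

# Hypothetical wide-format tibble holding the yearly counts per species
penguins_wide <- tibble(
  year      = c(2007L, 2008L, 2009L),
  Adelie    = c(50L, 50L, 52L),
  Chinstrap = c(26L, 18L, 24L),
  Gentoo    = c(34L, 46L, 44L)
)

# Stack the three species columns into a name column and a value column
penguins_long <- pivot_longer(
  data      = penguins_wide,
  cols      = c(Adelie, Chinstrap, Gentoo),  # columns to combine
  names_to  = "species",                     # new column for the old column names
  values_to = "body_mass"                    # new column for the cell values (counts here)
)

penguins_long
```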
#> # A tibble: 9 × 3
#> year species body_mass
#> <int> <chr> <int>
#> 1 2007 Adelie 50
#> 2 2007 Chinstrap 26
#> 3 2007 Gentoo 34
#> 4 2008 Adelie 50
#> 5 2008 Chinstrap 18
#> 6 2008 Gentoo 46
#> 7 2009 Adelie 52
#> 8 2009 Chinstrap 24
#> 9 2009 Gentoo 44
The pivot_wider() function converts long-format data to wide-format data.
You must specify two columns: one whose values become the names of the new columns and one whose values fill those new columns.
The first column (names_from) supplies the names of the new columns.
The second column (values_from) supplies the values of the new columns.
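The syntax of the function pivot_wider():
data \(\rightarrow\) the data frame to reshape
id_cols \(\rightarrow\) the column(s) that uniquely identify each observation (row) in the result
names_from \(\rightarrow\) the column whose values become the new column names
values_from \(\rightarrow\) the column whose values fill the new columns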
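As a minimal sketch of these arguments in use, the long-format table shown earlier can be spread back into wide format. The tibble name `penguins_long` is hypothetical and is entered by hand here so the example is self-contained.

```r
library(tidyr)
library(tibble)

# Hypothetical long-format tibble matching the long-format table
penguins_long <- tribble(
  ~year, ~species,    ~n,
  2007L, "Adelie",    50L,
  2007L, "Chinstrap", 26L,
  2007L, "Gentoo",    34L,
  2008L, "Adelie",    50L,
  2008L, "Chinstrap", 18L,
  2008L, "Gentoo",    46L,
  2009L, "Adelie",    52L,
  2009L, "Chinstrap", 24L,
  2009L, "Gentoo",    44L
)

# Spread species into separate columns, one row per year
penguins_wide <- pivot_wider(
  data        = penguins_long,
  id_cols     = year,     # identifies each row of the result
  names_from  = species,  # values become the new column names
  values_from = n         # values fill the new columns
)

penguins_wide
```

In this case `id_cols` could be omitted, because by default pivot_wider() uses every column not named in `names_from` or `values_from` as an identifier.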
Starting with penguins, find counts of observations by species, island and year.
Starting with penguins, filter to only keep Adelie and Gentoo penguins, then find counts by species and sex.
Add a new column to penguins called year that contains:
“Year 1” if the year is 2007
“Year 2” if the year is 2008
“Year 3” if the year is 2009
flipper_length_mm and body_mass_g variables. Add a new column called fm_ratio that contains the ratio of flipper length to body mass for each penguin.
Next, add another column named ratio_bin which contains the word “high” if fm_ratio is greater than or equal to 0.05, “low” if the ratio is less than 0.05, and “no record” in any other case (e.g. NA).
