15 ggplot2
Data visualization with ggplot2
ggplot2
- Base R plot functions require more effort and expertise to create high-quality, publishable graphs
- Over the years, many R packages (e.g. lattice, grid, etc.) were introduced to overcome the limitations of base R plot functions
- A more recent addition to R's plotting tools is the ggplot2 package, which can be used to produce elegant plots without much effort
- The ggplot2 package implements the grammar of graphics, a coherent system for describing and building graphs
- To load the ggplot2 package into the current R environment, call library(ggplot2)
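A minimal setup sketch for the examples in this chapter. It assumes both packages are already installed and that the penguins data frame comes from the palmerpenguins package, as the plot caption used later in this section suggests.

```r
# Load ggplot2 for plotting
library(ggplot2)

# Assumption: the penguins data frame is provided by palmerpenguins
library(palmerpenguins)

# Quick look at the variables used in the examples
str(penguins[, c("species", "bill_length_mm",
                 "flipper_length_mm", "body_mass_g")])
```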
1 Scatter plot
ggplot(data = penguins) +
geom_point(aes(x = bill_length_mm, y = flipper_length_mm))
Creating a ggplot
ggplot(data = penguins)
- A blank slate: it creates a coordinate system to which several layers can be added
- Every plot made with the ggplot2 package begins with a call to the ggplot() function
- data is the first argument of ggplot(), and it specifies the data frame to be used for the plot
- One or more layers can be added to ggplot() using a plus (+) sign
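As a small illustration of the points above, a plot can be stored in a variable and extended one layer at a time. The object names p_blank and p_points are hypothetical, and penguins is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# ggplot() alone creates an empty plot with no layers
p_blank <- ggplot(data = penguins)

# Adding a geom with + returns a new plot that has one layer
p_points <- p_blank +
  geom_point(aes(x = bill_length_mm, y = flipper_length_mm))

length(p_blank$layers)   # number of layers before adding a geom
length(p_points$layers)  # number of layers after adding a geom
```

Because + returns a new object, p_blank itself is unchanged and can be reused as the base for other plots.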
geom_point
- Geometric objects (called geoms) are the shapes we put on a plot (e.g. points, bars, etc.)
- A plot can have any number of layers, but it must have at least one geom
- geom_point() makes a scatter plot by adding a layer of points
- geom_line() adds a layer of lines connecting data points
- geom_col() adds bars for bar charts
- geom_histogram() makes a histogram
- geom_boxplot() adds boxes for boxplots
mapping = aes()
- Each type of geom usually has a required set of aesthetics. Aesthetic mappings are set with the aes() function. Examples include
  - x and y (the position on the x and y axes)
  - color (“outside” color, like the line around a bar)
  - fill (“inside” color, like the color of the bar itself)
  - shape (the type of point, like a dot, square, triangle, etc.)
  - linetype (solid, dashed, dotted, etc.)
  - size (of points and text; since ggplot2 3.4.0 the width of lines is set with linewidth)
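One way to check which aesthetics a geom requires is to inspect the ggproto object behind it. The sketch below relies on the exported objects GeomPoint and GeomCol; the help page of each geom (e.g. ?geom_point) lists the same information. The scores data frame is hypothetical and exists only for illustration.

```r
library(ggplot2)

# Required aesthetics of the geoms behind geom_point() and geom_col()
GeomPoint$required_aes
GeomCol$required_aes

# A tiny hypothetical data frame, used only for illustration
scores <- data.frame(group = c("a", "b", "c"), value = c(3, 5, 2))

# x and y are mapped to variables inside aes();
# fill and color are set to constants outside aes()
p_cols <- ggplot(data = scores) +
  geom_col(mapping = aes(x = group, y = value),
           fill = "steelblue", color = "black")
p_cols
```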
Adding labels, title, and caption to a graph
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm)
) +
labs(
x = "Bill length",
y = "Flipper length",
title = "Scatter plot of bill and flipper length",
caption = "R package palmerpenguins") +
theme_bw()

2 Histogram
- geom_histogram() draws a histogram
- Only the x value is needed in its aes() function
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g)
)
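By default geom_histogram() uses 30 bins and prints a message suggesting that a better value be chosen. A sketch of the two usual ways to control the binning, through the bins and binwidth arguments; the values 20 and 250 are arbitrary choices for illustration.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Fix the number of bins
p_bins <- ggplot(data = penguins) +
  geom_histogram(mapping = aes(x = body_mass_g), bins = 20)

# Fix the width of each bin (in grams here)
p_width <- ggplot(data = penguins) +
  geom_histogram(mapping = aes(x = body_mass_g), binwidth = 250)

p_bins
p_width
```

Rows with a missing body_mass_g are dropped with a warning, so the bar heights add up to the number of non-missing values.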
- The fill argument of geom_histogram() sets the fill color of the bars; mapping y to after_stat(density) rescales the bar heights from counts to a density
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g,
y = after_stat(density)),
fill = "steelblue") 
- The col argument of geom_histogram() sets the color of the bar outlines
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g,
y = after_stat(density)),
fill = "steelblue",
col = "white") 
Histogram and density function
- geom_density() adds a kernel density estimate of a variable
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g,
y = after_stat(density)),
fill = "steelblue",
col = "white") +
geom_density(
mapping = aes(x = body_mass_g,
y = after_stat(density)),
col = "brown", linewidth = 1)  # linewidth replaces size for lines since ggplot2 3.4.0
- A mapping supplied to ggplot() is shared by all geom_*() layers
ggplot(data = penguins,
mapping = aes(x = body_mass_g,
y = after_stat(density))) +
geom_histogram(fill = "steelblue",
col = "white") +
geom_density(col = "brown",
linewidth = 1)  # linewidth replaces size for lines since ggplot2 3.4.0
Exercise 3.2.1
(use the diamonds data to answer the following questions)
1. Create a histogram of carat and check the effect of bins on the histogram
2. Add a density line to the plot obtained in Question 1
3 Boxplot
- geom_boxplot() draws a boxplot
ggplot(data = penguins) +
geom_boxplot(
mapping = aes(x = species,
y = body_mass_g)
)
Boxplot
ggplot(data = penguins) +
geom_boxplot(
mapping = aes(x = species,
y = body_mass_g)
) +
coord_flip() 
Boxplot with original data points
ggplot(data = penguins,
mapping = aes(x = species,
y = body_mass_g)) +
geom_boxplot() +
geom_jitter(width = .2,
mapping = aes(col = species),
size = .75) 
- geom_jitter() adds a small amount of random variation to the position of each point, which makes overlapping points at each level easier to see
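geom_jitter() is shorthand for geom_point(position = "jitter"). A sketch of the longer form, which gives access to the seed argument of position_jitter() so that the random offsets are reproducible. The overlap data frame is hypothetical and was built so that all points in a group coincide.

```r
library(ggplot2)

# Hypothetical data: 50 identical points in each group
overlap <- data.frame(
  group = rep(c("a", "b"), each = 50),
  value = rep(c(1, 2), each = 50)
)

# width = .2 spreads points horizontally, height = 0 keeps y exact,
# and seed makes the jitter the same on every run
p_jitter <- ggplot(data = overlap, mapping = aes(x = group, y = value)) +
  geom_point(position = position_jitter(width = .2, height = 0, seed = 1))
p_jitter
```

Setting height = 0 is a deliberate choice here: jittering only along the categorical axis leaves the measured values untouched.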
Exercise 3.2.2
(use the diamonds data to answer the following questions)
1. Create a boxplot of carat at different levels of cut
2. Create a scatter plot to examine the effect of carat on price
Aesthetic mappings
- A third variable can be added to a two-dimensional scatter plot by mapping it to an aesthetic
- An aesthetic is a visual property of the plot (such as the size, shape, and color of the points)
- Points can be displayed in different ways by changing the values of their aesthetic properties (e.g. the size, shape, or color of the points)
- Variables can be linked to the graph using the following properties
  - positions (x, y)
  - colors (color, fill)
  - shapes (shape, linetype)
  - size (size)
  - transparency (alpha)
  - groupings (group)
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm,
col = species)
)
- col is determined by the levels of species
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm,
col = bill_length_mm > 50),
show.legend = FALSE
)
- col is determined by a logical condition on bill_length_mm
- show.legend is a logical argument of the geom_*() functions
- Besides col, some other aesthetics are useful in ggplot2
  - size assigns different point sizes to different values of the variable
  - alpha controls the transparency of the points
  - shape assigns different shapes (at most six by default) to different values of the variable
- ggplot2 creates a legend for the variables used in the arguments of aes(), except for x and y
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm,
alpha = species)
)
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm,
shape = species)
)
- Aesthetic properties can also be set manually, e.g. col = "blue" will make all the points blue, which does not convey any information about a variable but only changes the appearance of the plot
- To set an aesthetic manually, it needs to be defined outside of aes(), as an argument of the geom_*() function
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm),
col = "blue") 
ggplot(data = penguins) +
geom_point(
mapping = aes(x = bill_length_mm,
y = flipper_length_mm),
col = "blue", alpha = .5) 
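A common slip is to put the constant inside aes(). The sketch below contrasts the two cases and uses layer_data() to show which color each plot actually draws; penguins is assumed to come from palmerpenguins, and the object names are hypothetical.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Set: col is outside aes(), so every point is drawn in blue
p_set <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm, y = flipper_length_mm),
             col = "blue")

# Mapped: col is inside aes(), so "blue" is treated as a data value
# with a single level; it gets a palette color and a legend
p_mapped <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm, y = flipper_length_mm,
                           col = "blue"))

unique(layer_data(p_set)$colour)     # the constant that was set
unique(layer_data(p_mapped)$colour)  # a palette color, not "blue"
```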
geom_smooth()
ggplot(data = penguins,
mapping = aes(x = bill_length_mm,
y = flipper_length_mm)) +
geom_point() +
geom_smooth() 
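- geom_smooth() fits the relationship between two quantitative variables using a smoothing method (loess by default for fewer than 1,000 observations); method = "lm" fits a straight line instead, and se = FALSE hides the confidence band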
ggplot(data = penguins,
mapping = aes(x = bill_length_mm,
y = flipper_length_mm)) +
geom_point() +
geom_smooth(method = "lm") 
ggplot(data = penguins,
mapping = aes(x = bill_length_mm,
y = flipper_length_mm,
col = species)) +
geom_point(size = .75) +
geom_smooth(method = "lm", se = FALSE) 
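To see what geom_smooth(method = "lm") computes, the fitted line can be compared with a direct call to lm(). This is a sketch for checking the idea, not part of the original notes; formula = y ~ x is written out explicitly, and penguins is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

p_lm <- ggplot(data = penguins,
               mapping = aes(x = bill_length_mm, y = flipper_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# The same straight line fitted directly with lm()
fit <- lm(flipper_length_mm ~ bill_length_mm, data = penguins)

# Fitted values computed by the smooth layer (layer 2 of the plot)
smooth_data <- layer_data(p_lm, 2)

# Predictions from lm() at the same x positions
direct <- predict(fit, newdata = data.frame(bill_length_mm = smooth_data$x))

# Compare the two sets of fitted values
all.equal(smooth_data$y, unname(direct))
```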
Exercise 3.2.3
(use the diamonds data to answer the following questions)
1. Create a scatter plot to examine the effect of price on carat and assign different colors to different levels of cut
2. Show a fit of a linear model on the scatter plot of carat and price
3. Show different fits of linear models (price on carat) corresponding to different levels of cut on the scatter plot of price and carat
Facets
- Adding information about a new variable to an existing plot can be helpful for data analysis (e.g. through an aesthetic)
- Facets add information about a categorical variable to an existing plot by splitting the plot into panels according to the levels of the categorical variable
  - facet_wrap() wraps a sequence of panels, usually defined by a single variable, into rows and columns
  - facet_grid() lays the panels out in a grid defined by the combination of two variables (rows ~ columns)
ggplot(data = penguins) +
geom_point(mapping = aes(x = flipper_length_mm,
y = bill_length_mm)) +
facet_wrap(~species) 
ggplot(data = penguins,
mapping = aes(
x = flipper_length_mm, y = bill_length_mm, col = species)) +
geom_point() +
geom_smooth(
method = "lm", se = FALSE, col = "black") +
facet_wrap(~species) 
ggplot(data = penguins[!is.na(penguins$sex), ]) +
geom_point(mapping = aes(x = flipper_length_mm,
y = bill_length_mm)) +
facet_wrap(~ sex + species) 
ggplot(data = penguins[!is.na(penguins$sex), ]) +
geom_point(mapping = aes(x = flipper_length_mm,
y = bill_length_mm)) +
facet_grid(sex ~ species) 
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g),
col = "brown", fill = "yellow"
) +
facet_wrap(~species, ncol = 1)
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g),
col = "brown", fill = "yellow"
) +
facet_wrap(~species, ncol = 1) +
theme_minimal() 
ggplot(data = penguins) +
geom_histogram(
mapping = aes(x = body_mass_g),
col = "brown", fill = "yellow"
) +
facet_wrap(~species, ncol = 1) +
theme_minimal() +
theme(
panel.grid = element_blank()
)
Exercise 3.2.4
(use the diamonds data to answer the following question)
- Create a histogram of x at different levels of cut
4 Density plot
Distribution of penguins’ body mass
ggplot(data = penguins) +
geom_density(
mapping = aes(x = body_mass_g,
fill = species),
alpha = .5) 
Distribution of penguins’ body mass
ggplot(data = penguins) +
geom_density(
mapping = aes(x = body_mass_g,
fill = species),
alpha = .5) +
theme(
legend.position = "top",
legend.title = element_blank()
)
Distribution of penguins’ body mass
ggplot(data = penguins) +
geom_density(
mapping = aes(x = body_mass_g,
fill = species),
alpha = .5) +
theme_minimal(base_size = 7) +
theme(
legend.position = "top",
legend.title = element_blank()
)
Distribution of penguins’ body mass
ggplot(data = penguins) +
geom_density(
mapping = aes(x = body_mass_g,
fill = species),
alpha = .5) +
theme_minimal(base_size = 7) +
theme(
legend.position = "top",
legend.key.size = unit(.75, "lines"),
legend.title = element_blank()
)
5 Barchart
Frequency distribution of species
ggplot(data = penguins) +
geom_bar(aes(x = species)) +
theme_minimal(base_size = 16) 
Barchart with two variables
Distribution of species by year
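The code for this plot is not shown here. A possible sketch, assuming the chart counts penguins per year and fills the bars by species; the use of factor(year) on the x-axis and the three position values are assumptions made for illustration, and penguins is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Stacked bars (the default position for geom_bar)
p_stack <- ggplot(data = penguins) +
  geom_bar(aes(x = factor(year), fill = species)) +
  labs(x = "Year")

# Side-by-side bars
p_dodge <- ggplot(data = penguins) +
  geom_bar(aes(x = factor(year), fill = species), position = "dodge") +
  labs(x = "Year")

# Bars rescaled to proportions within each year
p_fill <- ggplot(data = penguins) +
  geom_bar(aes(x = factor(year), fill = species), position = "fill") +
  labs(x = "Year", y = "Proportion")

p_stack
p_dodge
p_fill
```

Stacking shows totals per year, dodging makes the species easier to compare within a year, and filling shows the composition of each year regardless of its total.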
Exercise 3.2.5
(use the diamonds data to answer the following questions)
1. Create a barplot of cut
2. Create a barplot of color
3. Create a barplot of cut showing the distribution of color at different levels of cut
4. Check the effect of three different values of the position argument when creating a barplot with cut and color
Homework
Use the gapminder package to get access to the gapminder data. gapminder has 6 variables and 1704 observations, where the variables are:
#> [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
1. Create a scatter plot to examine how gdpPercap affects lifeExp
2. Change the scale of the x-axis to log base 10
3. Add a color layer corresponding to continent to the previous graph
4. Create a scatter plot of gdpPercap versus lifeExp for different continents in different plotting regions
5. Add smooth lines to describe the relationship between gdpPercap and lifeExp for different continents separately
6. Draw a boxplot of lifeExp to compare the distribution of life expectancy for different continents
7. Draw a histogram of lifeExp and check its shape for different bin sizes
8. Draw density plots of lifeExp for different continents in a single plot
9. Make a scatter plot of lifeExp on the y-axis against year on the x-axis
10. Fit a straight line to estimate mean life expectancy for a year for different countries
11. Split the plot for different continents
12. Add a continent-specific mean line to the plot
Statistical layers geom_*() vs stat_*()
- Every geom has a default stat and every stat has a default geom, so a layer can be written from either side; each pair of calls below draws the same plot
ggplot(penguins,
aes(bill_length_mm, flipper_length_mm)) +
geom_smooth(stat = "smooth")
ggplot(penguins,
aes(bill_length_mm, flipper_length_mm)) +
stat_smooth(geom = "smooth")
ggplot(penguins,
aes(bill_length_mm, flipper_length_mm)) +
geom_point(stat = "identity")
ggplot(penguins,
aes(bill_length_mm, flipper_length_mm)) +
stat_identity(geom = "point")
ggplot(penguins, aes(species)) +
stat_count(geom = "bar")
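A short check of the geom and stat equivalence for the bar chart, using layer_data() to compare what the two layers compute. The object names are hypothetical, and penguins is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Written from the geom side: the default stat of geom_bar() is "count"
p_geom <- ggplot(penguins, aes(species)) +
  geom_bar()

# Written from the stat side: stat_count() drawn with the bar geom
p_stat <- ggplot(penguins, aes(species)) +
  stat_count(geom = "bar")

# Both layers compute the same counts per species
layer_data(p_geom)$count
layer_data(p_stat)$count
```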
Statistical summaries
ggplot(penguins,
aes(species, flipper_length_mm)) +
stat_summary()
ggplot(penguins,
aes(species, flipper_length_mm)) +
stat_summary(
fun.data = mean_se,
geom = "pointrange"
)
ggplot(penguins,
aes(species, flipper_length_mm)) +
geom_boxplot() +
stat_summary(
fun = mean,
geom = "point",
col = "red",
size = 3
)
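Called with no arguments, stat_summary() falls back to mean_se() (the mean plus and minus one standard error) and says so in a message. Other summaries can be supplied through fun, fun.min and fun.max. A sketch that shows the median with the interquartile range; the choice of quartiles is an assumption for illustration, and penguins is assumed to come from palmerpenguins.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data

# Median as the point, first and third quartiles as the range
p_median <- ggplot(penguins, aes(species, flipper_length_mm)) +
  stat_summary(
    fun = median,
    fun.min = function(y) quantile(y, .25),
    fun.max = function(y) quantile(y, .75),
    geom = "pointrange"
  )
p_median
```

Rows with a missing flipper_length_mm are removed with a warning before the summaries are computed.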




