14 Base R plots
Graphical presentations in Statistics
Graphs are helpful for presenting a summary of data and the results of a statistical analysis.
John W. Tukey, the father of exploratory data analysis, once said:
"The greatest value of a picture is when it forces us to notice what we never expected to see."
Base R plot functions
Graphical presentations of data
There are several graphs available to use for describing data, and the selection of the most appropriate graph depends on the data type and the research objectives.
- Quantitative data
  - Histogram
  - Boxplot
  - Scatter plot
- Qualitative data
  - Bar chart
  - Pie chart
Bivariate analysis involves two variables. Depending on the combination of variable types (qualitative or quantitative), there are different ways of presenting the data graphically:
- Quantitative-qualitative combination
  - A histogram or boxplot can be drawn for each level of the qualitative variable
- Quantitative-quantitative and qualitative-qualitative combinations
  - A scatter plot (quantitative-quantitative) or a bar chart (qualitative-qualitative) can be used
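To see which combination applies, it can help to check the type of each column first. The following is a minimal sketch, assuming the built-in ToothGrowth data frame as a stand-in (it is not used elsewhere in this chapter); the name var_type is only illustrative.

```r
# ToothGrowth ships with R; it stands in for any data frame here (assumption)
# label each column as quantitative (numeric) or qualitative (factor/character)
var_type <- sapply(ToothGrowth, function(col) {
  if (is.numeric(col)) "quantitative" else "qualitative"
})
print(var_type)
```

A numeric column with only a few distinct values (such as dose here) may still be better treated as qualitative, so the result is a starting point rather than a rule.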
1 Histogram
- hist(x, ...) draws a histogram, where x is a quantitative vector
hist(x = penguins$body_mass_g)
- Some useful arguments of hist(): xlab, main, probability, etc.
hist(
  x = penguins$body_mass_g,
  xlab = "Body mass",
  main = "Histogram of penguins'\nbody mass"
)
hist(
  x = penguins$body_mass_g,
  xlab = "Body mass",
  main = "Histogram of penguins'\nbody mass",
  breaks = 20,         # suggested number of bins
  probability = TRUE   # plot densities instead of counts
)
hist(
  x = penguins$body_mass_g,
  xlab = "Body mass",
  main = "Histogram of penguins'\nbody mass",
  breaks = 20,
  probability = TRUE
)
# overlay a kernel density estimate on the histogram
lines(
  density(x = penguins$body_mass_g, na.rm = TRUE),
  lwd = 3, col = "brown"
)
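The density curve lines up with the bars because probability = TRUE rescales the bar heights so that the total bar area equals 1. A small sketch to check this, assuming the built-in faithful data as a stand-in so the code runs without extra packages:

```r
# faithful is a built-in data set (assumption: used here only for illustration)
x <- faithful$waiting

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = 20, plot = FALSE)

# area of each bar = density (bar height) * bin width
bar_area <- h$density * diff(h$breaks)
total_area <- sum(bar_area)
print(total_area)
```

With the default counts on the y-axis, the bars would be on a different scale from the density estimate, and the curve would appear flat along the bottom of the plot.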
Exercise 3.1.1
(use the mtcars data frame to answer the following)
1. Create a histogram of mpg with appropriate labels.
2. Add a density line to the plot obtained in Question 1.
2 Boxplot
- A boxplot is a useful graphical tool that can be used to compare the distribution of a quantitative variable at different levels of a qualitative variable
  - E.g. examine the distribution of body mass over different species of penguins
- The boxplot() function can be used for both univariate and bivariate analysis
  - boxplot(x) is used to obtain a boxplot of a single quantitative vector x
  - boxplot(formula, data) is used for a bivariate analysis, where the formula specifies the quantitative and qualitative variables of interest
    - formula = quant_var ~ qual_var
    - data is a data frame that must contain quant_var and qual_var
boxplot(x = penguins$body_mass_g)
boxplot(formula = body_mass_g ~ species,
        data = penguins)
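The formula interface also accepts the usual labelling arguments, and boxplot() invisibly returns the summary values it draws. A hedged sketch, assuming the built-in iris data frame in place of penguins so that it runs without extra packages:

```r
# iris is built in: Sepal.Length is quantitative, Species is qualitative
b <- boxplot(
  Sepal.Length ~ Species,
  data = iris,
  xlab = "Species",
  ylab = "Sepal length (cm)",
  main = "Sepal length by species"
)

# each column of b$stats holds, for one group:
# lower whisker, lower hinge, median, upper hinge, upper whisker
colnames(b$stats) <- b$names
print(b$stats)
```

Inspecting b$stats is a quick way to read off the medians and hinges that the boxes display.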
Exercise 3.1.2
(use the mtcars data frame to answer the following)
3. Create a boxplot of qsec with appropriate labels.
4. Create a boxplot of mpg to compare its distribution at different levels of cyl.
3 Scatter plot
The function plot(x, y) is used to obtain a scatter plot of two quantitative variables x and y
- Some useful arguments of the plot() function
  - xlab, ylab, main
  - pch (point type)
  - cex (size of points), etc.
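# basic scatter plot: flipper length on the x-axis, bill length on the y-axis
plot(x = penguins$flipper_length_mm,
     y = penguins$bill_length_mm)
# same plot with an x-axis label
plot(x = penguins$flipper_length_mm,
     y = penguins$bill_length_mm,
     xlab = "Flipper length")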
plot(
  x = penguins$flipper_length_mm,
  y = penguins$bill_length_mm,
  xlab = "Flipper length",
  pch = 20,
  cex = 1.5,
  col = "brown"
)
Scatter plot with a linear model fit
plot(
  x = penguins$flipper_length_mm,
  y = penguins$bill_length_mm,
  xlab = "Flipper length",
  pch = 20,
  cex = 1.5,
  col = "brown"
)
# fit a linear model of bill length on flipper length
mod1 <- lm(bill_length_mm ~ flipper_length_mm,
           data = penguins)
# add the fitted regression line to the existing plot
abline(mod1, col = "blue", lwd = 4)
- lm() is for fitting a linear model
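When abline() receives a fitted simple linear model, it draws the line defined by the model's intercept and slope. The sketch below makes that explicit, assuming the built-in cars data frame as a stand-in; the names intercept and slope are only illustrative.

```r
# cars is a built-in data frame with columns speed and dist (assumption: stand-in data)
mod <- lm(dist ~ speed, data = cars)

# the fitted line is dist = intercept + slope * speed
intercept <- coef(mod)[["(Intercept)"]]
slope <- coef(mod)[["speed"]]

plot(x = cars$speed, y = cars$dist,
     xlab = "Speed", ylab = "Stopping distance",
     pch = 20, col = "brown")

# abline(mod) and abline(a = intercept, b = slope) draw the same line
abline(a = intercept, b = slope, col = "blue", lwd = 2)
```

Passing the model object directly is shorter; the a and b form is useful when the line comes from somewhere other than lm().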
Exercise 3.1.3
(use the mtcars data frame to answer the following)
5. Create a scatter plot to examine the association between mpg and disp.
6. Add the fit of a linear regression model of mpg on disp to the plot obtained in Question 5.
4 Bar chart
A bar chart is used to examine the distribution of a qualitative variable.
The function barplot(height, ...) is used to obtain a bar chart in R, where height represents a frequency.
The table() function takes a qualitative variable as an argument and returns height, the frequency corresponding to each level of the qualitative variable.
Frequency distribution of species
table(penguins$species)
#>
#>    Adelie Chinstrap    Gentoo
#>       152        68       124
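The frequency table can then be passed to barplot() as the bar heights. A minimal sketch, assuming the built-in chickwts data frame as a stand-in so that it runs without extra packages:

```r
# chickwts is built in; feed is a qualitative variable (factor) (assumption: stand-in data)
freq <- table(chickwts$feed)

# pass the frequency table as the bar heights
barplot(
  height = freq,
  xlab = "Feed type",
  ylab = "Frequency",
  main = "Number of chicks per feed type"
)
```

The level names of the table are used as the bar labels, so no separate names argument is needed here.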
Exercise 3.1.4
(use the mtcars data frame to answer the following)
- Create a bar chart of cyl
Bivariate analysis
The function par() has many arguments that can be used to produce high-quality graphs using base R plot functions
- The mfrow argument of par() is used to split a figure layout into a number of rows and columns
  - E.g. mfrow = c(2, 3) will split the figure layout into two rows and three columns
- E.g. distribution of bill_length_mm at different levels of species
par(mfrow = c(1, 3))
hist(x = penguins$bill_length_mm[penguins$species == "Adelie"])
hist(x = penguins$bill_length_mm[penguins$species == "Gentoo"])
hist(x = penguins$bill_length_mm[penguins$species == "Chinstrap"])
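Because par() changes stay in effect for later plots, it can be useful to save the old settings and restore them afterwards. The sketch below also loops over the levels and uses common bin breaks so the panels share an x-axis. It assumes the built-in iris data frame as a stand-in, and old_par and brks are illustrative names.

```r
# iris is built in and has three species (assumption: stand-in for penguins)
old_par <- par(mfrow = c(1, 3))   # par() returns the previous settings

# common bin breaks covering all species, so every panel shares the same x-axis
brks <- pretty(iris$Sepal.Length, n = 10)

for (sp in levels(iris$Species)) {
  hist(
    x = iris$Sepal.Length[iris$Species == sp],
    breaks = brks,
    xlab = "Sepal length (cm)",
    main = sp
  )
}

par(old_par)   # restore the previous layout
```

Sharing the breaks makes differences in location and spread between the groups easier to compare across panels.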
Exercise 3.1.5
- Association between bill and flipper lengths by species