9 Probability and Statistics

Generating random data

  • The function sample() is used to generate random values from a vector, and it has the following arguments:

    • x \(\rightarrow\) A vector of outcomes to sample from
    • size \(\rightarrow\) The number of values (observations) to draw
    • replace \(\rightarrow\) Whether to sample with replacement: TRUE allows the same element to be drawn more than once, FALSE (the default) does not
    • prob \(\rightarrow\) A vector of probability weights, one per element of x; if omitted, all elements are equally likely
sample(x = 1:10, size = 4, replace = FALSE)
#> [1] 9 4 3 2

Select 10 numbers from 0 to 100

sample(x = 0:100, size = 10, replace = FALSE) # without replacement
#>  [1] 26 30 19 65 43 53 90 76 23 39
sample(x = 0:100, size = 10, replace = TRUE) # with replacement
#>  [1] 30 68 57 14 20 21  1 87 28 55
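The effect of replace can be checked directly. The following is a minimal sketch, not part of the original examples: the object names no_repl and too_many are made up for illustration. Without replacement no value can appear twice, and asking for more values than the vector holds only works with replacement.

```r
# Without replacement, no value is drawn twice
no_repl <- sample(x = 0:100, size = 10, replace = FALSE)
anyDuplicated(no_repl) # 0 means there are no duplicates
#> [1] 0

# Drawing 10 values from only 5 fails without replacement
too_many <- tryCatch(
  sample(x = 1:5, size = 10, replace = FALSE),
  error = function(e) "error"
)
too_many
#> [1] "error"

# The same request succeeds with replacement
length(sample(x = 1:5, size = 10, replace = TRUE))
#> [1] 10
```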
  • Select students’ grades randomly
sample(replace = TRUE, x = LETTERS[1:4], size = 10)
#>  [1] "B" "A" "C" "A" "A" "A" "C" "A" "D" "B"
  • Tossing a fair coin 10 times
sample(replace = TRUE, x = c("H", "T"), size = 10)
#>  [1] "H" "H" "H" "H" "T" "H" "H" "T" "T" "H"
  • Tossing a biased coin 10 times
sample(replace = TRUE, x = c("H", "T"), size = 10, prob = c(.7, .3))
#>  [1] "T" "T" "H" "H" "H" "H" "T" "H" "H" "T"
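With only 10 tosses the bias is hard to see. A minimal sketch, assuming weights of 0.7 and 0.3 and an arbitrary seed chosen only for reproducibility: with many tosses, the observed proportions should settle near the weights passed to prob.

```r
set.seed(1) # arbitrary seed, used only to make the sketch reproducible

# Toss the biased coin many times
tosses <- sample(x = c("H", "T"), size = 10000, replace = TRUE,
                 prob = c(0.7, 0.3))

# Relative frequency of each outcome; "H" should be close to 0.7
freq <- prop.table(table(tosses))
freq
```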

Use of initial seed in generating random numbers

Without seed:

# No seed
sample(1:10, 3)
#> [1] 1 2 5
# No seed
sample(1:10, 3)
#> [1] 4 5 7
# No seed
sample(1:10, 3)
#> [1] 4 1 7

With seed:

set.seed(100)
sample(1:10, 3)
#> [1] 10  7  6
set.seed(100)
sample(1:10, 3)
#> [1] 10  7  6
set.seed(100)
sample(1:10, 3)
#> [1] 10  7  6
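
The same behaviour can be confirmed programmatically. In this small sketch the object names first, second and third are illustrative. Resetting the seed reproduces the draw exactly, while a further call without resetting continues from the generator's current state.

```r
set.seed(100)
first <- sample(1:10, 3)

set.seed(100)
second <- sample(1:10, 3)

identical(first, second) # same seed, same draw
#> [1] TRUE

# No reseed here: this draw continues the random number stream
third <- sample(1:10, 3)
```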
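
| Distribution | Density function | Cumulative distribution | Quantile | Random variates |
|--------------|------------------|-------------------------|----------|-----------------|
| Normal       | dnorm()          | pnorm()                 | qnorm()  | rnorm()         |
| Poisson      | dpois()          | ppois()                 | qpois()  | rpois()         |
| Binomial     | dbinom()         | pbinom()                | qbinom() | rbinom()        |
| Uniform      | dunif()          | punif()                 | qunif()  | runif()         |
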
  • The same d/p/q/r prefixes are available for other probability distributions, such as the exponential (dexp()), logistic (dlogis()), and chi-squared (dchisq()).
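
As an illustration of the naming pattern for another distribution, here is a minimal sketch using the exponential distribution. The rate of 2 is an arbitrary value chosen for the example. The closed-form expressions in the comments are the standard ones for this distribution.

```r
rate <- 2 # illustrative rate parameter

dens <- dexp(x = 1, rate = rate)   # density at 1: rate * exp(-rate * 1)
cdf  <- pexp(q = 1, rate = rate)   # P(X <= 1) = 1 - exp(-rate * 1)
med  <- qexp(p = 0.5, rate = rate) # median: log(2) / rate
draw <- rexp(n = 3, rate = rate)   # three random draws

c(density = dens, cdf = cdf, median = med)
```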

rbinom() and rnorm()

  • rbinom() is used to draw a sample from a binomial distribution

    • size \(\rightarrow\) number of Bernoulli trials

    • prob \(\rightarrow\) probability of success

    • n \(\rightarrow\) number of observations

  • Draw a sample of size 8 from \(Bin(10, 0.75)\)

rbinom(size = 10, prob = .75, n = 8)
#> [1] 9 8 8 6 8 7 9 7
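
The roles of size, prob and n are easier to see with a larger sample. A minimal sketch, assuming an arbitrary seed and n = 10000: the sample mean should be close to size × prob and the sample variance close to size × prob × (1 − prob).

```r
set.seed(42) # arbitrary seed for reproducibility

# 10000 observations, each the number of successes in 10 trials
draws <- rbinom(n = 10000, size = 10, prob = 0.75)

mean(draws)  # should be close to 10 * 0.75 = 7.5
var(draws)   # should be close to 10 * 0.75 * 0.25 = 1.875
range(draws) # every observation lies between 0 and size
```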
  • rnorm() is used to draw a sample from a normal distribution

    • mean \(\rightarrow\) mean of the distribution \((\mu)\)

    • sd \(\rightarrow\) standard deviation of the distribution \((\sigma)\)

    • n \(\rightarrow\) number of observations

  • Draw a sample of size 5 from \(N(10, 4^2)\), i.e. mean 10 and variance 16, so sd = 4

rnorm(mean = 10, sd = 4, n = 5)
#> [1] 14.743527  8.970948 11.748854  8.539669 11.986696
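
Because rnorm() takes the standard deviation rather than the variance, it is worth checking the scale of a large sample. A minimal sketch, assuming an arbitrary seed and n = 10000:

```r
set.seed(42) # arbitrary seed for reproducibility

x <- rnorm(n = 10000, mean = 10, sd = 4)

mean(x) # should be close to 10
sd(x)   # should be close to 4, the standard deviation
var(x)  # should be close to 16, the variance
```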

pnorm()

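  • For \(X \sim N(50, 3^2)\), find \(P(45 < X < 55)\).

  • \(P(a < X \leq b) = F(b) - F(a)\), where \(F\) is the cumulative distribution function. Because \(X\) is continuous, \(P(45 < X < 55) = F(55) - F(45)\).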

pnorm(q = 55, mean = 50, sd = 3) - 
  pnorm(q = 45, mean = 50, sd = 3) 
#> [1] 0.9044193
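
The same probability can be cross-checked in two other ways. This is a sketch, not the only route, and the object names are illustrative. It standardizes to \(Z = (X - 50)/3\) and then numerically integrates the density with integrate().

```r
mu <- 50
sigma <- 3

# Direct computation, as above
direct <- pnorm(q = 55, mean = mu, sd = sigma) -
  pnorm(q = 45, mean = mu, sd = sigma)

# Standardize the limits and use the standard normal CDF
z_lo <- (45 - mu) / sigma
z_hi <- (55 - mu) / sigma
standardized <- pnorm(z_hi) - pnorm(z_lo)

# Numerically integrate the density over (45, 55)
by_integration <- integrate(dnorm, lower = 45, upper = 55,
                            mean = mu, sd = sigma)$value

all.equal(direct, standardized)
#> [1] TRUE
```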

dbinom()

  • For \(X \sim Bin(10, 0.5)\), find \(P(X=5)\).
dbinom(x = 5, size = 10, prob = 0.5)
#> [1] 0.2460938
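
This value agrees with the binomial probability mass function \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\). A minimal sketch of the check, with illustrative object names:

```r
n <- 10
k <- 5
p <- 0.5

# Probability mass function written out by hand
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_formula
#> [1] 0.2460938

all.equal(by_formula, dbinom(x = k, size = n, prob = p))
#> [1] TRUE

# The probabilities over all possible outcomes 0, ..., n sum to 1
total <- sum(dbinom(x = 0:n, size = n, prob = p))
```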

qnorm()

  • Let \(Z\) follow a standard normal distribution. Then the 0.975-quantile is \(Z_{0.975} \approx 1.96\): the probability of drawing a value less than or equal to 1.96 is 0.975, or \(97.5\%\).
qnorm(p = 0.975, mean = 0, sd = 1)
#> [1] 1.959964
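
Since qnorm() is the inverse of pnorm(), feeding the quantile back into pnorm() should recover the probability. The short sketch below, with illustrative object names, checks this. It also shows the symmetric pair of quantiles that encloses the central 95% of the standard normal distribution.

```r
z <- qnorm(p = 0.975) # mean = 0 and sd = 1 are the defaults

pnorm(z) # round trip back to the probability
#> [1] 0.975

# By symmetry, the 0.025-quantile is the negative of the 0.975-quantile
limits <- c(lower = qnorm(p = 0.025), upper = z)

# Probability mass between the two limits
central <- pnorm(limits[["upper"]]) - pnorm(limits[["lower"]])
```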