9 Probability and Statistics

Generating random data

  • The function sample() draws a random sample from a vector. It takes the following arguments:

    • x A vector of outcomes you want to sample from
    • size The number of samples (observations) you want to draw
    • replace Whether to sample with replacement (TRUE) or without replacement (FALSE, the default)
    • prob A vector of probability weights for selecting the elements of x; if omitted, all elements are equally likely
sample(x = 1:10, size = 4, replace = FALSE)
#> [1] 4 8 7 9

Select 10 numbers from 0 to 100

sample(x = 0:100, size = 10, replace = FALSE) # without replacement
#>  [1] 73 65 57 85 51  2 72 55 80 86
sample(x = 0:100, size = 10, replace = TRUE) # with replacement
#>  [1] 75 29 76 85 42 13 32  6 31 43
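
The effect of replace can be checked directly. The following is a minimal sketch (the vector x and the names perm, result, and big are chosen here for illustration): with the defaults, sample() returns a random permutation of x, and asking for more values than x contains without replacement raises an error.

```r
x <- 1:5

# With the defaults (size = length(x), replace = FALSE),
# sample() returns a random permutation of x.
perm <- sample(x)

# Without replacement, size cannot exceed length(x); sample() stops with an error.
result <- tryCatch(
  sample(x, size = 6, replace = FALSE),
  error = function(e) "error"
)

# With replacement, any size is allowed.
big <- sample(x, size = 6, replace = TRUE)
```

Here result holds the string "error" because the call inside tryCatch() fails, while big contains six values, at least one of which must be repeated.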

  • Select students’ grades randomly
sample(replace = TRUE, x = LETTERS[1:4], size = 10)
#>  [1] "C" "B" "B" "C" "D" "C" "D" "D" "B" "A"
  • Tossing a fair coin 10 times
sample(replace = TRUE, x = c("H", "T"), size = 10)
#>  [1] "H" "T" "H" "T" "H" "H" "T" "H" "T" "H"
  • Tossing a biased coin 10 times
sample(replace = TRUE, x = c("H", "T"), size = 10, prob = c(.7, .3))
#>  [1] "T" "H" "H" "H" "T" "H" "H" "H" "T" "H"
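
Ten tosses are too few to see the bias clearly. As a rough sketch, the same call can be repeated with a larger size to compare the relative frequencies with the weights given in prob. The seed, the sample size of 10000, and the probabilities 0.7 and 0.3 are assumptions chosen here for illustration.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

tosses <- sample(x = c("H", "T"), size = 10000, replace = TRUE, prob = c(0.7, 0.3))

# Relative frequency of each outcome
freq <- prop.table(table(tosses))

# Proportion of heads; should be close to 0.7
p_heads <- mean(tosses == "H")
```

With this many tosses, p_heads should land close to 0.7, which is what the prob argument asks for.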

Use of initial seed in generating random numbers

Without seed:

# No seed
sample(1:10, 3)
#> [1] 1 2 5
# No seed
sample(1:10, 3)
#> [1] 4 5 7
# No seed
sample(1:10, 3)
#> [1] 4 1 7

With seed:

set.seed(100)
sample(1:10, 3)
#> [1] 10  7  6
set.seed(100)
sample(1:10, 3)
#> [1] 10  7  6
set.seed(100)
sample(1:10, 3)
#> [1] 10  7  6
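
The same idea can be verified programmatically instead of by eye. In this small sketch the seed value 2024 and the names first and second are arbitrary choices made for illustration.

```r
set.seed(2024)             # 2024 is an arbitrary seed
first <- sample(1:10, 3)

set.seed(2024)             # resetting the seed restarts the random number stream
second <- sample(1:10, 3)

identical(first, second)
#> [1] TRUE
```

Setting the seed once at the top of a script is usually enough to make the whole analysis reproducible.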

rbinom() and rnorm()

  • rbinom() is used to draw a sample from a binomial distribution

    • size number of Bernoulli trials

    • prob probability of success

    • n number of observations

  • Draw a sample of size 8 from B(10,0.75)

rbinom(size = 10, prob = .75, n = 8)
#> [1] 9 8 8 6 8 7 9 7
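
As a sanity check, the draws can be compared with the theoretical mean size × prob and variance size × prob × (1 − prob) of a binomial distribution. This is a sketch only; the seed and the choice of n = 10000 are assumptions made here for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

draws <- rbinom(n = 10000, size = 10, prob = 0.75)

sample_mean <- mean(draws)  # should be close to 10 * 0.75 = 7.5
sample_var  <- var(draws)   # should be close to 10 * 0.75 * 0.25 = 1.875
value_range <- range(draws) # every value lies between 0 and size
```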

  • rnorm() is used to draw a sample from a normal distribution

    • mean mean of the distribution (μ)

    • sd standard deviation of the distribution (σ)

    • n number of observations

  • Draw a sample of size 5 from N(10,16), that is, mean 10 and variance 16, so sd = 4

rnorm(mean = 10, sd = 4, n = 5)
#> [1] 14.743527  8.970948 11.748854  8.539669 11.986696
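
Note that rnorm() takes the standard deviation, not the variance. A quick way to convince yourself is to draw a large sample and compare its summary statistics with the parameters. The seed and the sample size of 10000 below are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

draws <- rnorm(n = 10000, mean = 10, sd = 4)

sample_mean <- mean(draws)  # should be close to 10
sample_sd   <- sd(draws)    # should be close to 4
sample_var  <- var(draws)   # should be close to 4^2 = 16
```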

pnorm()

  • For X ∼ N(50, 3²), find P(45 < X < 55).

  • P(a < X ≤ b) = F(b) − F(a), where F is the cumulative distribution function of X, which pnorm() evaluates

pnorm(q = 55, mean = 50, sd = 3) - 
  pnorm(q = 45, mean = 50, sd = 3) 
#> [1] 0.9044193
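
The same probability can be obtained by standardizing the endpoints, z = (x − μ)/σ, and using the standard normal distribution. The sketch below compares both routes; the variable names are chosen here for illustration.

```r
mu <- 50
sigma <- 3

# Direct computation on the original scale
direct <- pnorm(q = 55, mean = mu, sd = sigma) -
  pnorm(q = 45, mean = mu, sd = sigma)

# Standardize the endpoints: z = (x - mu) / sigma
z_upper <- (55 - mu) / sigma
z_lower <- (45 - mu) / sigma

# pnorm() defaults to mean = 0 and sd = 1
standardized <- pnorm(z_upper) - pnorm(z_lower)

all.equal(direct, standardized)
#> [1] TRUE
```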

dbinom()

  • For X ∼ Bin(10, 0.5), find P(X = 5).
dbinom(x = 5, size = 10, prob = 0.5)
#> [1] 0.2460938
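
The value returned by dbinom() can be cross-checked against the binomial probability mass function, P(X = k) = C(n, k) p^k (1 − p)^(n − k). The following sketch uses variable names chosen here for illustration.

```r
n <- 10
k <- 5
p <- 0.5

# Probability mass function written out by hand
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# Same quantity from the built-in function
by_dbinom <- dbinom(x = k, size = n, prob = p)

all.equal(by_formula, by_dbinom)
#> [1] TRUE

# The probabilities over all possible outcomes 0, 1, ..., n sum to 1
total <- sum(dbinom(x = 0:n, size = n, prob = p))
```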

qnorm()

  • Let Z follow a standard normal distribution. Then the 0.975-quantile is Z_0.975 ≈ 1.96. This means the probability of sampling a value less than or equal to 1.96 is 0.975, or 97.5%.
qnorm(p = 0.975, mean = 0, sd = 1)
#> [1] 1.959964
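
Since qnorm() is the inverse of pnorm(), feeding the quantile back into pnorm() should recover the original probability. The sketch below also computes the bounds of the central 95% region of the standard normal distribution; the names q, recovered, and bounds are chosen here for illustration.

```r
# 0.975-quantile of the standard normal distribution
q <- qnorm(p = 0.975, mean = 0, sd = 1)

# pnorm() undoes qnorm(): this recovers 0.975
recovered <- pnorm(q)

# Lower and upper bounds of the central 95% region;
# qnorm() defaults to mean = 0 and sd = 1
bounds <- qnorm(p = c(0.025, 0.975))
```

By symmetry of the standard normal distribution, the two bounds are equal in magnitude and opposite in sign, roughly −1.96 and 1.96.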