Assignment 1 is now available. Find it here. Submit it on Google Classroom by 11:59 PM, 24 December 2025.

Assignment 1

Solve all questions first using base R functions, then solve them again using tidyverse functions. Create two .qmd files, render each to PDF (one for base R and one for tidyverse), and submit both PDFs on the Google Classroom thread by 11:59 PM, 24 December 2025. You can use this sample qmd file. Name the PDF files as roll_07_baseR.pdf and roll_07_tidyverse.pdf (for example, for roll number 7).
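As a rough illustration of what "base R first, then tidyverse" can look like, the sketch below performs the same two steps (subsetting rows and adding a column) in both styles. The data frame `toy` and the column `score_sq` are hypothetical and unrelated to the assignment data.

```r
library(dplyr)

# Hypothetical toy data, not part of the assignment data
toy <- data.frame(id = 1:5, score = c(10, 25, 31, 8, 19))

# Base R: logical indexing, then direct column assignment
base_res <- toy[toy$score > 15, ]
base_res$score_sq <- base_res$score^2

# tidyverse: the same two steps as a dplyr pipeline
tidy_res <- toy %>%
  filter(score > 15) %>%
  mutate(score_sq = score^2)

# Both approaches keep the same rows and compute the same new column
nrow(base_res)
nrow(tidy_res)
```

Keeping each style in its own .qmd file, as requested above, makes the two sets of answers easy to compare side by side.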

Question 1

The 2014 BDHS data contains information on different variables obtained for 7886 ever-married Bangladeshi women of reproductive age. Below are the definitions of a few variables.

Variable | Description                | Value labels
caseid   | case identification        |
v012     | respondent’s current age   |
v102     | type of place of residence | 1=Urban, 2=Rural
v106     | highest educational level  | 0=No education, 1=Primary, 2=Secondary, 3=Higher
v130     | religion                   | 1=Islam, 2=Hinduism, 3=Christianity, 4=Buddhism, 96=Others
v190     | wealth index               | 1=Poorest, 2=Poorer, 3=Middle, 4=Richer, 5=Richest
  • Load the data and view the head:
bdhs2014 <- haven::read_dta("bdhs2014.dta")
head(bdhs2014, n = c(6, 10))
#> # A tibble: 6 × 10
#>   caseid             hidx v000   v001  v002  v003  v004  v008  v011  v012
#>   <chr>             <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 "      306 29  2"     1 BD6     306    29     2   306  1377  1066    25
#> 2 "      568 87  7"     1 BD6     568    87     7   568  1377  1161    18
#> 3 "      298 37  4"     1 BD6     298    37     4   298  1376  1076    25
#> 4 "      289 95  2"     1 BD6     289    95     2   289  1378  1100    23
#> 5 "      337 25  2"     1 BD6     337    25     2   337  1379   968    34
#> 6 "      500 95  4"     1 BD6     500    95     4   500  1376  1126    20
  • To view the label of a specific variable, use the var_label() function from the labelled package:
library(labelled)
var_label(bdhs2014$v012)
#> [1] "respondent's current age"
  • Alternatively, you can search for variable labels using keywords (e.g., age or weight):
labelled::look_for(bdhs2014, "weight")
#>  pos variable label                           col_type missing
#>  62  v437     respondent's weight in kilogra~ dbl+lbl  4      
#>                                                               
#>                                                               
#>  76  bwt      Weight at Birth in grams        dbl      0      
#>  values            
#>  [9994] not present
#>  [9995] refused    
#>  [9996] other      
#> 
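Value labels (such as 1=Urban, 2=Rural) can be inspected in a similar way. The sketch below uses a small hypothetical labelled vector `x` rather than the BDHS data, and assumes the haven and labelled packages are installed; `val_labels()` lists the code-to-label mapping and `haven::as_factor()` converts the codes to a factor.

```r
library(labelled)

# Hypothetical labelled vector mimicking a coded survey variable
x <- haven::labelled(c(1, 2, 2, 1), labels = c(Urban = 1, Rural = 2))

# Show the mapping between numeric codes and their labels
val_labels(x)

# Convert the codes to a factor whose levels are the labels
x_fct <- haven::as_factor(x)
levels(x_fct)
```

Converting to a factor before tabulating usually gives more readable frequency tables than the raw numeric codes.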

Questions:

  1. How many observations and variables do the bdhs2014 data have?

  2. Rename the variable v130 to religion.

  3. Create a subset of the data for women with age (v012) greater than 25, and save it as bdhs_age_20_plus.

  4. Create a new variable named age_gap, defined as the difference between the husband’s age (v730) and the woman’s age (v012).

  5. Create a new variable that indicates early childbearing, using the mother’s age at first birth (v212). The new variable, named ecb, is 1 if the mother’s age at first birth is less than 18 years; otherwise ecb = 0.

  6. Categorize the numeric variable age (v012) into groups using the intervals [15-17), [18-19), [20-29), [30-39), [40-50). Name the new variable age_category.

  7. Sort the dataset in ascending order of the household number (v002).

  8. Find the mean, median, mode, range, standard deviation, and IQR of the respondent’s age (v012).

  9. Create frequency tables for the variables ecb, age_category, wealth index (v190), and religion.

  10. Create a bivariate frequency table of the variables Education level (v106) and Type of residence (v025).
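Two of the tasks above have a small catch in base R: `cut()` builds right-closed intervals unless told otherwise, and `mode()` returns the storage type of an object rather than the most frequent value. The sketch below illustrates both points on hypothetical toy ages with hypothetical break points; `stat_mode` is an illustrative helper name, not a function from any package.

```r
# Hypothetical toy ages, not taken from the BDHS data
ages <- c(15, 17, 18, 22, 22, 29, 30, 41)

# right = FALSE makes each interval closed on the left and open on the right
age_group <- cut(ages, breaks = c(15, 20, 30, 50), right = FALSE)
table(age_group)

# Base R's mode() reports the storage type, so a small helper is needed.
# stat_mode is a hypothetical helper; on ties it returns the first value seen.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
stat_mode(ages)
```

The break points used here are only for illustration; the intervals required by the question should be substituted when solving it.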

Question 2

  • starwars is a tibble in the dplyr package containing 14 variables describing 87 characters from the Star Wars films.
library(dplyr)
data(starwars)
glimpse(starwars)
#> Rows: 87
#> Columns: 14
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
#> $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
#> $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
#> $ films      <list> <"A New Hope", "The Empire Strikes Back", "Return of the J…
#> $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
#> $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…

Questions:

  1. How many humans are contained in starwars overall?

  2. How many humans are contained in starwars by gender?

  3. Which homeworld do the most individuals (rows) come from?

  4. What is the mean height of all individuals with orange eyes from the most popular homeworld?

  5. Compute the median, mean, and standard deviation of height for all droids.
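Questions of this kind typically come down to counting rows per group and computing grouped summaries. The sketch below shows one possible pattern on a hypothetical tibble `pets` (unrelated to starwars); note the `na.rm = TRUE` argument, which matters whenever a numeric column contains missing values.

```r
library(dplyr)

# Hypothetical toy data, unrelated to starwars
pets <- tibble(
  species = c("cat", "dog", "cat", "bird", "dog", "cat"),
  weight  = c(4.1, 20.5, NA, 0.3, 18.0, 5.2)
)

# Rows per group, with the largest group first
counts <- pets %>% count(species, sort = TRUE)

# Grouped summaries; na.rm = TRUE drops missing values before computing
stats <- pets %>%
  group_by(species) %>%
  summarise(
    mean_weight = mean(weight, na.rm = TRUE),
    sd_weight   = sd(weight, na.rm = TRUE)
  )

counts
stats
```

A base R counterpart could use `table()` for the counts and `aggregate()` or `tapply()` for the grouped summaries.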