(AST230) R for Data Science
R is flexible and free to download (under the GNU General Public License), and has been widely used in academic environments over the last two decades
R is open source and is supported by an extensive user community
R is currently maintained by the R Development Core Team
CRAN (the Comprehensive R Archive Network) is a repository of additional R packages, contributed by the R user community
R is a command-line driven program. With R, all the steps used in an analysis (e.g. from reading the data to producing the final results) can be saved in a script, so the analysis can be redone without much effort
Working with scripts makes the steps used in the analysis clear, and the code can be inspected by others (this helps improve the code and remove mistakes, if there are any!)
Working with scripts helps to understand the associated statistical methods more clearly
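A minimal sketch of such a script follows, covering every step from reading the data to reporting a result. It assumes R's built-in mtcars data set stands in for your own data; the temporary file mtcars.csv and the choice of model (mpg explained by weight) are illustrative only.

```r
# analysis.R: a small script that runs from reading data to final results

# Setup for this sketch only: write the built-in mtcars data to a temporary
# CSV file so the script has a file to read (replace with your own data file)
data_file <- file.path(tempdir(), "mtcars.csv")
write.csv(mtcars, data_file, row.names = FALSE)

# Step 1: read the data
cars <- read.csv(data_file)

# Step 2: explore the variable of interest
print(summary(cars$mpg))

# Step 3: fit a simple linear model of fuel efficiency (mpg) on weight (wt)
fit <- lm(mpg ~ wt, data = cars)

# Step 4: report the estimated intercept and slope
print(coef(fit))
```

Because every step lives in the file, rerunning it (for example with source("analysis.R")) repeats the whole analysis.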
The term reproducibility is used when someone else (including your future self!) can obtain the same results from the same data set when using the same analysis (coded in scripts)
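When an analysis involves random numbers (simulation, resampling, random splits), reproducibility also requires fixing the random seed. A minimal sketch: the seed value 230 is an arbitrary choice, and any fixed integer works.

```r
# Fix the seed, then draw 5 standard normal values
set.seed(230)
first_run <- rnorm(5)

# Resetting the same seed reproduces exactly the same draws
set.seed(230)
second_run <- rnorm(5)

print(identical(first_run, second_run))  # TRUE
```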
Nowadays, funding agencies and peer-reviewed journals expect analyses to be reproducible (journals often ask for the data and code before publishing accepted manuscripts)
R has become an integral part of reproducible research, and it can be used to generate (dynamic) documents (e.g. manuscripts, reports) from code (i.e. after a small change in the data, analysis, or organization, the document is updated automatically by running the scripts again)
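Dynamic documents are usually produced with tools such as R Markdown or Quarto; the sketch below uses only base R to show the underlying idea: the numbers in the report are computed from the data each time the script runs, so changing the data and rerunning updates the report. The file name report.txt and the mtcars data are illustrative assumptions.

```r
# Hypothetical output file for the generated report
report_file <- file.path(tempdir(), "report.txt")

# The data the report is built from (replace with your own)
dat <- mtcars

# Every figure in the report is computed from dat, never typed by hand
report_lines <- c(
  "Fuel efficiency report",
  sprintf("Number of cars: %d", nrow(dat)),
  sprintf("Mean mpg: %.2f", mean(dat$mpg))
)

# Write the report and show it
writeLines(report_lines, report_file)
cat(readLines(report_file), sep = "\n")
```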
Download the R installer from the Comprehensive R Archive Network (CRAN): https://cran.r-project.org
To download the R installer for Windows OS
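Once R is installed, a quick way to confirm which version you are running is to ask R itself from the Console. A minimal sketch; the 4.0.0 threshold is an arbitrary example, not a course requirement.

```r
# Full version string of the running R installation
print(R.version.string)

# getRversion() returns a version object that can be compared with a string
if (getRversion() >= "4.0.0") {
  message("R 4.0.0 or later is installed")
} else {
  message("Consider installing a newer R from CRAN")
}
```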
Go to the page https://posit.co/download/rstudio-desktop/ to download RStudio
RStudio is an integrated development environment (IDE) for R. An IDE is a graphical user interface (GUI) where you can write your code, see the results, and inspect the variables created while you program.
R is the language
RStudio is software created to make R easier to use
The RStudio user interface has 4 primary panes:
Source pane: used to write and edit R code and other related documents
Console pane: This is the workhorse of R. This is where R evaluates all the code you write.
Environment pane, containing the Environment, History, Connections, Build, VCS, and Tutorial tabs
Output pane, containing the Files, Plots, Packages, Help, Viewer, and Presentation tabs
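To see how the Console and the Environment tab work together: objects created in the Console appear in the Environment tab, and ls() returns the same list of object names. A minimal sketch; the names x and x_mean are arbitrary.

```r
# Create two objects; both appear in the Environment tab
x <- c(2, 4, 6)
x_mean <- mean(x)

# ls() lists the objects in the current (global) environment
print(ls())
print(x_mean)  # 4
```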
R always points to a directory on your computer, called the working directory. You can check its file path by looking at the bar at the top of the Console pane.
We can also find out the working directory by running the getwd() function in the Console.
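A minimal sketch of checking the working directory from code. The file name data.csv in the comment is hypothetical; it only illustrates that relative paths are looked up in this directory.

```r
# getwd() returns the working directory as a single character string
wd <- getwd()
print(wd)

# Relative paths are resolved against this directory, so these are the files
# R can open by name alone, e.g. read.csv("data.csv")
print(list.files(wd))
```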
We can set the working directory manually in two ways:
Run setwd("directory/path") in the Console, giving the path of the directory you want RStudio to use as the working directory inside the double quotes.
In the Files tab of the Output pane, navigate to the directory you want to use, click the settings (gear) button labelled More, and select "Set As Working Directory" from the popup menu.
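A minimal sketch of the first way. It uses tempdir() only so that the code runs on any machine; in practice you would pass your own folder, e.g. setwd("C:/Users/yourname/AST230") (a hypothetical path). R accepts forward slashes in paths on Windows too.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Set a new working directory (replace tempdir() with your own folder path)
setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# Restore the original working directory
setwd(old_wd)
```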