Data Analysis: R and RStudio


Overview


R is a programming language that many researchers use for transforming and manipulating data. It is also used to perform statistical analyses.

RStudio is an integrated development environment (IDE) that helps you write, edit, and run R code.

This walkthrough will get you started with R and RStudio so that you can use them to transform data files.

Downloading and Installing R


R:

  1. Go to www.r-project.org
  2. Click the download R link under the “Getting Started” header.
  3. Select a mirror.
  4. Click on the Download R for Windows link at the top of the page.
  5. Click the base link at the top of the page.
  6. Click the Download R [version number] for Windows link at the top of the page.
  7. Follow the installation instructions.
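
Once R is installed, one quick way to confirm that it runs is to open R and type the line below into the console. This is only a sanity check; the version text you see will depend on the release you downloaded.

```r
# Print the version of R that is currently running
print(R.version.string)
```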

Downloading and Installing RStudio


RStudio:

  1. Go to www.rstudio.com
  2. Click on the Download link underneath the 'RStudio' image.
  3. Scroll to the bottom of the page and select the appropriate installer for your system.
  4. Follow the installation instructions.

Opening a new script


When you open RStudio, you will see a box on the left called Console, a box in the upper right with Environment and History, and a box in the lower right with Files, Plots, Packages and Help. To input commands easily, you will need to open a fourth box, called a Script.

A script is like a document in which you can write and save code. The code in your script can be run easily and repeatedly. You can also copy and paste code from others into your own script, and export your script as a .R file so others can use it.

To open a new script, click the New File icon in the top left-hand corner, then select R Script. You can also go to File > New File > R Script.


This is your new script. You can write your own code here, or copy and paste code from others into it.
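
If you would like something to try in your new script before moving on, the short example below is a minimal sketch. The object names "scores" and "mean.score" are made up for illustration and are not used anywhere else in this walkthrough.

```r
# R ignores everything after a "#" on a line, so this is where comments go
scores <- c(4, 8, 15)        # store three numbers in an object called "scores"
mean.score <- mean(scores)   # calculate their average
print(mean.score)            # show the result in the Console: [1] 9
```

After running these lines, "scores" and "mean.score" should also appear in the Environment box in the upper right.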

Combining Files


Running the script below in RStudio makes it easy to combine CSV files containing your task or questionnaire data into a single CSV file. The files are stacked on top of one another, so that the final row of one CSV is followed by the first row of the next CSV.


Requirements: The code below only works if the separate CSV files have identical column names, such as different questionnaires or different versions of the same questionnaire. For help combining CSVs with different column names, such as a task and a questionnaire, please get in touch with our support team.

We recommend creating a new folder containing only the CSVs you want to combine, as this code will combine every CSV in the folder you specify as your ‘working directory’ (see script).
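
If you are unsure whether your files meet the requirement above, the sketch below shows one possible way to check. It is separate from the combining script and is not needed to use it. To keep the example runnable, it creates two small made-up CSVs in a temporary folder; with your own data you would set "folder" to the folder holding your CSVs instead. The names "folder", "headers" and "same.columns" are chosen for illustration only.

```r
# Illustration only: build two small CSVs in a temporary folder.
# With your own data, set "folder" to the folder holding your CSVs instead.
folder <- tempfile("csv_check_demo")
dir.create(folder)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(folder, "b.csv"), row.names = FALSE)

# List every CSV in the folder, with its full path
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Read the column names of each file (nrows = 1 keeps this fast for large files)
headers <- lapply(files, function(f) names(read.csv(f, nrows = 1)))

# TRUE only if every file has the same column names in the same order.
# This is a deliberately strict check.
same.columns <- all(sapply(headers, identical, headers[[1]]))
print(same.columns)
```

If this prints FALSE, printing "headers" will show the column names of each file so you can see where they differ.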


To use the script below, copy and paste all of it into your new script, which appears in the top left-hand section of RStudio. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).


#Set your working directory to the folder in which all your separate CSV files are located
setwd("C:/User/Folder/Subfolder")

#This line creates a list of all the CSV files in the folder you selected above. This list is called "files" 
#(if you type "files" (without quotation marks) into R, it will return a list of your CSV files)
files<-list.files(pattern="\\.csv$")

#This line imports all of the CSVs listed in "files" above and binds them together so that the first row of the second CSV in
#"files" comes after the last row of the first CSV; the first row of the third CSV follows the last row of the second CSV
#and so on. This combined dataset is called "combined.data" 
combined.data<-do.call("rbind",lapply(files,read.csv,header=TRUE,fill=TRUE,fileEncoding="UTF-8-BOM"))

#OPTIONAL: your dataset may also have some rows that contain "END OF FILE" and nothing else. You can exclude these rows using this line.
#(fileEncoding="UTF-8-BOM" above stops stray characters such as "ï.." from being added to the first column name)
combined.data<-combined.data[combined.data$Event.Index!="END OF FILE",]

#This line exports your combined data as a CSV. This new CSV will be called "combineddata.csv" and will appear in your working directory
write.csv(combined.data,"combineddata.csv",row.names=FALSE)

Pressing CTRL+ENTER will run the line of code you are currently on and move you onto the next line. Making your way through the code using CTRL+ENTER will allow you to see how the dataset gradually takes shape after every line of code. CTRL+ENTER will also run any highlighted code, so if you want to run the whole script together, highlight it all and press CTRL+ENTER. You can also press CTRL+ALT+R to run the entire script without highlighting anything.

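You should now have a new CSV called “combineddata.csv” in the folder you specified as your working directory. This contains all the data from your separate CSVs. If you run the script again, move or delete “combineddata.csv” first; otherwise it will be picked up as one of the CSVs in the folder and its rows will be combined a second time.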
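
One way to confirm that nothing was lost is to compare the number of rows in the combined file with the total across the separate files. The sketch below illustrates the idea on two small made-up CSVs in a temporary folder, so it can be run as written; the folder, file contents, and the names "rows.per.file" and "rows.combined" are assumptions for illustration, not part of the script above.

```r
# Illustration only: two small CSVs in a temporary folder stand in for your data
folder <- tempfile("combine_demo")
dir.create(folder)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5, score = c(30, 40, 50)),
          file.path(folder, "b.csv"), row.names = FALSE)

# List the separate CSVs before the combined file is created
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Stack the files on top of one another and export the result
combined.data <- do.call("rbind", lapply(files, read.csv, header = TRUE))
write.csv(combined.data, file.path(folder, "combineddata.csv"), row.names = FALSE)

# Count the rows in each separate file and in the combined file
rows.per.file <- sapply(files, function(f) nrow(read.csv(f)))
rows.combined <- nrow(read.csv(file.path(folder, "combineddata.csv")))

# These two numbers should match
print(sum(rows.per.file))
print(rows.combined)
```

If you removed the "END OF FILE" rows with the optional line in the script, expect the combined count to be lower than the total by the number of rows removed.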

We will be adding more guides for data transformation using R soon. For more information about Gorilla, please consult our support page, which contains guides on metrics.