Data Analysis: R and RStudio


Overview


R is a programming language that many researchers use for transforming and manipulating data. It is also used to perform statistical analyses.

RStudio is an integrated development environment (IDE) that helps you write and edit R code.

This walkthrough will get you started with R and RStudio so that you can use them to transform data files.

Downloading and Installing R


R:

  1. Go to www.r-project.org
  2. Click the download R link under the “Getting Started” header.
  3. Select a mirror.
  4. Click on the Download R for Windows link at the top of the page.
  5. Click the base link at the top of the page.
  6. Click the Download R [version number] for Windows link at the top of the page.
  7. Follow the installation instructions.

Downloading and Installing RStudio


RStudio:

  1. Go to www.rstudio.com
  2. Click on the Download link underneath the 'RStudio' image.
  3. Scroll to the bottom of the page and select the appropriate installer for your system.
  4. Follow the installation instructions.
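
Once both programs are installed, one quick way to confirm that RStudio has found your R installation is to type the line below into the RStudio console (the console is described in the next section) and press ENTER. This is only an illustrative check and is not required for the rest of the walkthrough.

```r
# Prints the version of R that this session is using
R.version.string
```

If a version string is printed, R and RStudio are talking to each other.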

Opening a new script


When you open RStudio, there are two places where you can write R code: the console and a script.

A script is what we normally use. It is like an empty document: the code you write in it is saved, so you can run it again and again, and you can save the whole thing as a .R file.

The console is where all code actually runs. When you run a line of code from your script, RStudio sends it to the console. You can also type directly into the console, and the code will work, but those lines are not saved as part of a file, so you have to type them again to rerun them. This is why we use scripts: they keep a record of your code and let you rerun it without rewriting it.

To open a new script, click the Open New icon in the top left-hand corner, then select R Script.


This is your new script. You can write your R code in here.
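
As a minimal sketch of what a first script might contain, the lines below store a few numbers and calculate their mean. The variable name "scores" and its values are made up for illustration.

```r
# Store three made-up numbers in a variable called "scores"
scores <- c(2, 4, 6)

# Calculate the mean of the numbers; when run, the console shows [1] 4
mean(scores)
```

Because these lines live in a script, they stay in the file and can be run again later.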

Combining Files


The script below makes it easy to combine CSV files containing your task or questionnaire data into a single CSV file. The files are stacked so that the final row of one CSV is followed by the first row of the next CSV.
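
To illustrate what this stacking looks like, here is a small sketch that uses two made-up data frames in place of CSV files. The names "first", "second", "Participant" and "Response" are hypothetical; rbind is the same function that the full script further down relies on.

```r
# Two small made-up datasets with identical column names
first <- data.frame(Participant = c(1, 1), Response = c("a", "b"))
second <- data.frame(Participant = c(2, 2), Response = c("c", "d"))

# rbind stacks the rows of "second" underneath the rows of "first"
combined <- rbind(first, second)
combined
```

The result has four rows: the two rows from "first", followed by the two rows from "second".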


Requirements: The code below only works if the separate CSV files have identical column names, for example files from different questionnaires or from different versions of the same questionnaire. For help combining CSVs with different column names, such as a task and a questionnaire, please get in touch with our support team.

We recommend creating a new folder containing only the CSVs you want to combine, as this code will combine every CSV in the folder you specify as your ‘working directory’ (see script).
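
If you are unsure whether the files in your folder meet the requirement above, a check along the following lines may help. This is a sketch rather than part of the combining script: "mismatched_files" is a hypothetical helper name, and it assumes that comparing each file's column names against those of the first file is a good enough test.

```r
# Hypothetical helper: returns the names of the CSV files in "folder"
# whose column names differ from those of the first CSV file
mismatched_files <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) return(character(0))

  # Read only the first row of each file to get its column names
  headers <- lapply(files, function(f) {
    names(read.csv(f, nrows = 1, fileEncoding = "UTF-8-BOM"))
  })

  # Compare every file's column names with those of the first file
  same <- vapply(headers, function(h) identical(h, headers[[1]]), logical(1))
  basename(files[!same])
}
```

Calling mismatched_files("C:/User/Folder/Subfolder") with your own folder path returns an empty result when every file has the same column names.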


To use the script below, copy and paste everything in the box into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).


# Set your working directory to the folder in which all your separate CSV files are located
setwd("C:/User/Folder/Subfolder")

# This line creates a list of all the CSV files in the folder you selected above. This list is called "files"
# (if you type "files" (without quotation marks) into R, it will return a list of your CSV files).
# The pattern is a regular expression that matches file names ending in ".csv"
files <- list.files(pattern = "\\.csv$")

# This line imports all of the CSVs listed in "files" above and binds them together so that the first row of the second CSV in
# "files" comes after the last row of the first CSV; the first row of the third CSV follows the last row of the second CSV
# and so on. This combined dataset is called "combined.data".
# fileEncoding = "UTF-8-BOM" removes the byte order mark that some CSV files start with; without it,
# the first column name can be read as "ï..Event.Index" instead of "Event.Index"
combined.data <- do.call("rbind", lapply(files, read.csv, header = TRUE, fill = TRUE, fileEncoding = "UTF-8-BOM"))

# OPTIONAL: your dataset also has some rows that contain "END OF FILE" and nothing else. You can exclude these rows using this line.
combined.data <- combined.data[combined.data$Event.Index != "END OF FILE", ]

# This line exports your combined data as a CSV. This new CSV will be called "combineddata.csv" and will appear in your working directory.
# If you rerun the script, move or delete "combineddata.csv" first, otherwise it will be combined along with the other files
write.csv(combined.data, "combineddata.csv", row.names = FALSE)

Pressing CTRL+ENTER will run the line of code you are currently on and move you onto the next line. Making your way through the code using CTRL+ENTER will allow you to see how the dataset gradually takes shape after every line of code. CTRL+ENTER will also run any highlighted code, so if you want to run the whole script together, highlight it all and press CTRL+ENTER. You can also press CTRL+ALT+R to run the entire script without highlighting anything.

Now you should have a new CSV in the folder you specified as your working directory called “combineddata.csv”. This contains all the data from your separate CSVs.
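
One way to check the result is to read the new file back into R and look at its size and column names. The sketch below assumes a hypothetical helper called "describe_csv"; it is an illustration and not part of the combining script.

```r
# Hypothetical helper: reads a CSV file and reports its number of rows and its column names
describe_csv <- function(path = "combineddata.csv") {
  data <- read.csv(path, fileEncoding = "UTF-8-BOM")
  list(rows = nrow(data), columns = names(data))
}
```

Running describe_csv() with no arguments reads "combineddata.csv" from the current working directory. The number of rows should match what you expect from your separate files.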

We will be adding more guides for data transformation using R soon. For more information about Gorilla, please consult our support page, which contains guides on metrics.