Data Analysis: R and RStudio


Overview


R is a programming language that many researchers use for transforming and manipulating data. It is also used to perform statistical analyses.

RStudio is an integrated development environment (IDE) that helps you write, edit and run R code.

This walkthrough will get you started with R and RStudio so that you can use them to transform data files.

Downloading and Installing R


R:

  1. Go to www.r-project.org
  2. Click the download R link under the “Getting Started” header.
  3. Select a mirror.
  4. Click on the Download R for Windows link at the top of the page.
  5. Click the base link at the top of the page.
  6. Click the Download R [version number] for Windows link at the top of the page.
  7. Follow the installation instructions.

Downloading and Installing RStudio


RStudio:

  1. Go to www.rstudio.com
  2. Click on the Download link underneath the 'RStudio' image.
  3. Scroll to the bottom of the page and select the appropriate installer for your system.
  4. Follow the installation instructions.

Opening a new script


When you open RStudio, you will see a box on the left called Console, a box in the upper right with Environment and History, and a box in the lower right with Files, Plots, Packages and Help. To input commands easily, you will need to open a fourth box, called a Script.

A script is like a document in which you can write and save code. The code in your script can be run easily and repeatedly. You can also copy and paste code from others into your own script, and export your script as a .R file so others can use it.

To open a new script, click the Open New icon in the top left-hand corner, then select R Script.


This is your new script. You can write your code or copy and paste code from others into here.
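
If you would like to check that your new script runs before moving on, a few lines like the sketch below should be enough. The data frame "scores" and its values are made up for illustration and are not Gorilla data.

```r
# A made-up data frame (hypothetical example values, not Gorilla data)
scores <- data.frame(
  participant = c("p1", "p2", "p3"),
  reaction_time = c(512, 430, 675)
)

# Calculate the mean reaction time across the three participants
mean.rt <- mean(scores$reaction_time)

# Print the result in the Console
print(mean.rt)
```

After running these lines, "scores" and "mean.rt" should both appear in the Environment box in the upper right.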

Combining Files


Adding the script below to RStudio makes it easy to combine CSV files containing your task or questionnaire data into a single CSV file. The files are stacked on top of each other, so that the final row of one CSV is followed by the first row of the next CSV.


Requirements: The code below only works if the separate CSV files have identical column names, for example files from different questionnaires or from different versions of the same questionnaire. For help combining CSVs with different column names, such as a task and a questionnaire, please get in touch with our support team.

We recommend creating a new folder containing only the CSVs you want to combine, as this code will combine every CSV in the folder you specify as your ‘working directory’ (see script).
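
Before combining, it can help to confirm that every CSV in the folder really does have the same column names. The sketch below is one possible way of doing that in base R. The helper name "find.mismatched.csvs" and the demonstration files are hypothetical, and the check assumes each file has a header row and at least one row of data.

```r
# Hypothetical helper: returns the names of any CSV files in `folder` whose
# column names differ from those of the first CSV file in that folder.
find.mismatched.csvs <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    return(character(0))
  }
  # Read only the first data row of each file; we just need the column names
  headers <- lapply(files, function(f) names(read.csv(f, nrows = 1)))
  mismatched <- !vapply(headers, identical, logical(1), headers[[1]])
  basename(files[mismatched])
}

# Demonstration with made-up files in a temporary folder
demo.folder <- file.path(tempdir(), "csv-check-demo")
dir.create(demo.folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, answer = c("a", "b")),
          file.path(demo.folder, "first.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, response = c("c", "d")),
          file.path(demo.folder, "second.csv"), row.names = FALSE)

# "second.csv" has a column called "response" where "first.csv" has "answer"
mismatched.files <- find.mismatched.csvs(demo.folder)
print(mismatched.files)
```

To run the check on your own data, replace demo.folder with the path to the folder that holds your CSVs. An empty result means all the column names match.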


To use the script below, copy and paste everything in the box into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).


#Set your working directory to the folder in which all your separate CSV files are located
setwd("C:/User/Folder/Subfolder")

#This line creates a list of all the CSV files in the folder you selected above. This list is called "files" 
#(if you type "files" (without quotation marks) into R, it will return a list of your CSV files).
#The pattern is a regular expression that matches file names ending in ".csv".
files<-list.files(pattern="\\.csv$")

#This line imports all of the CSVs listed in "files" above and binds them together so that the first row of the second CSV in
#"files" comes after the last row of the first CSV; the first row of the third CSV follows the last row of the second CSV
#and so on. This combined dataset is called "combined.data" 
#fileEncoding="UTF-8-BOM" stops an invisible byte-order mark at the start of a file from being read into the first column
#name (which could otherwise appear as "ï..Event.Index" instead of "Event.Index").
combined.data<-do.call("rbind",lapply(files,read.csv,header=TRUE,fill=TRUE,fileEncoding="UTF-8-BOM")) 

#OPTIONAL: your dataset may also have some rows that contain "END OF FILE" and nothing else. You can exclude these rows using this line.
combined.data<-combined.data[!(combined.data$Event.Index %in% "END OF FILE"),]

#This line exports your combined data as a CSV. This new CSV will be called "combineddata.csv" and will appear in your working directory
write.csv(combined.data,"combineddata.csv",row.names=FALSE)

Pressing CTRL+ENTER will run the line of code you are currently on and move you onto the next line. Making your way through the code using CTRL+ENTER will allow you to see how the dataset gradually takes shape after every line of code. CTRL+ENTER will also run any highlighted code, so if you want to run the whole script together, highlight it all and press CTRL+ENTER. You can also press CTRL+ALT+R to run the entire script without highlighting anything.

Now you should have a new CSV in the folder you specified as your working directory called “combineddata.csv”. This contains all the data from your separate CSVs.
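
One simple way to confirm that nothing was lost is to compare the number of rows in the combined dataset with the total number of rows in the separate files. The sketch below illustrates the idea using the same rbind approach on made-up files in a temporary folder. The file names and contents are assumptions for illustration only.

```r
# Made-up input files in a temporary folder (for illustration only)
demo.folder <- file.path(tempdir(), "combine-check-demo")
dir.create(demo.folder, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, answer = c("a", "b", "c")),
          file.path(demo.folder, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:5, answer = c("d", "e")),
          file.path(demo.folder, "part2.csv"), row.names = FALSE)

# Import each file separately, then bind the rows together
files <- list.files(demo.folder, pattern = "\\.csv$", full.names = TRUE)
separate.data <- lapply(files, read.csv, header = TRUE, fill = TRUE)
combined.data <- do.call("rbind", separate.data)

# The combined dataset should have as many rows as all the inputs together
rows.expected <- sum(vapply(separate.data, nrow, integer(1)))
rows.match <- nrow(combined.data) == rows.expected
print(rows.match)
```

If you removed "END OF FILE" rows with the optional line in the script, expect the combined dataset to be shorter than the total by the number of rows removed.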

We will be adding more guides for data transformation using R soon. For more information about Gorilla please consult our support page which contains guides on metrics.

Mousetracking Data to Plots


This guide contains code for converting Gorilla mouse-tracking data to mousetrap format and using mousetrap functions to create plots.


Requirements: Make sure that all the individual mouse-tracking files (in xlsx format) are together in the same folder with no other unrelated xlsx files.
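
To check what is in the folder before importing, you could list its xlsx files first. The sketch below uses made-up, empty files in a temporary folder. It also flags temporary lock files, which Excel creates with names beginning "~$" while a workbook is open and which would otherwise be picked up alongside your data. The folder and file names are assumptions for illustration.

```r
# Made-up empty files in a temporary folder (for illustration only)
demo.folder <- file.path(tempdir(), "xlsx-check-demo")
dir.create(demo.folder, showWarnings = FALSE)
demo.names <- c("trial1.xlsx", "trial2.xlsx", "~$trial1.xlsx", "notes.txt")
created <- file.create(file.path(demo.folder, demo.names))

# List every file whose name ends in ".xlsx"
xlsx.files <- list.files(demo.folder, pattern = "\\.xlsx$")

# Separate Excel's temporary lock files from the real data files
lock.files <- xlsx.files[startsWith(xlsx.files, "~$")]
data.files <- setdiff(xlsx.files, lock.files)

print(data.files)
print(lock.files)
```

If any lock files are listed for your own folder, closing the workbooks in Excel should remove them.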


To use the script below, copy and paste everything in the box into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).


#Mouse tracking in Gorilla gives two types of dataset: response data and mouse-tracking data. 
#Response data are the normal Gorilla task data, showing what screens participants saw, how many answers they got correct etc.
#Mouse-tracking data present coordinates for zones and mouse movements within an individual trial. 
#There should be one response dataset in total, and one mouse-tracking dataset for each trial in the experiment.

#Before you start the guide below, make sure that all the individual mouse-tracking files (in xlsx format) are together in 
#the same folder with no other unrelated xlsx files.

#Note: the latest version of mousetrap at time of writing is v3.1.2.

############################## 1) Load all the packages you will need. ##############################
#library() loads packages. You need to install packages before you can load them. If you do not have any of the packages
#listed below installed, run the install.packages line below for each of the packages listed. 
#install.packages("mousetrap")
library(mousetrap)
library(readxl)
library(data.table)
library(dplyr)
library(ggplot2)

############################## 2) Import all the Gorilla mouse-tracking data into R. ##############################
#Here is an example way of importing all your mouse-tracking excel files into R. You can use other ways if you want to, as 
#long as they end up with a single data frame containing all the mouse-tracking data together.
#Set your working directory as the folder in which you have saved your data files. This tells R where to look for files.
setwd("C:/Users/Adam Flitton/Desktop/mousetracking")

#Create a list of the xlsx files in your working directory. The pattern is a regular expression that matches file names
#ending in ".xlsx".
xlsxfiles<-list.files(pattern="\\.xlsx$")

#Import all the Excel files in the list.
xlsx.df.list<-lapply(xlsxfiles,read_excel)

#Combine all these imported excel files into a single data frame. 
gorilla.traj<-rbindlist(xlsx.df.list)

############################## 3) Prepare the data for importing as a mousetrap object. ##############################
#Drop all the rows that contain coordinates for the zones. In these rows there is no mouse tracking data, so we do not need
#them for now.
gorilla.traj<-gorilla.traj[gorilla.traj$type=="mouse",]

#Gorilla data gives you normalised coordinates as well as raw ones. We can only import one set of coordinates at a time:
#either raw or normalised. Let's drop the normalised ones so we can just import the raw data.
gorilla.traj<-within(gorilla.traj,rm(x_normalised,y_normalised))

############################## 4) Use mousetrap's import function to read in the Gorilla data. ##############################
#This function requires you to specify columns in your dataset including x, y and timestamps. 
#It also requires a column in your dataset that shows the different trials (argument "mt_id_label"). 
#row_index should cover this in most datasets, as it prints what row of the spreadsheet the participant is on, which is more
#often than not a new trial.
#If you have a different trial column, replace "row_index" with whatever this column is called in the code below. 
gorilla.mt<-mt_import_long(gorilla.traj,xpos_label="x",ypos_label="y",timestamps_label="time_stamp",
                           mt_id_label="row_index")

#Now you have your mousetrap data object! It's a list containing two data frames:
gorilla.mt[["trajectories"]] 
#This contains your coordinates data in mousetrap format. The rows in this dataset are specified by the mt_id_label above.
gorilla.mt[["data"]] 
#This is basically empty at the moment, containing a row for each trial without much else. It isn't necessary
#to have much data in this data frame to use many of the functions in mousetrap (we will add a column to it later that 
#shows what condition the trial is in as this helps with plotting different conditions)

############################## 5) Use mousetrap's measures function to get summary statistics. ##############################
gorilla.mt<-mt_measures(gorilla.mt, use = "trajectories",save_as="measures",
                        dimensions = c("xpos", "ypos"), timestamps = "timestamps",
                        verbose = FALSE)

#This adds another data frame to your mousetrap list called "measures" that contains useful summary statistics. 
View(gorilla.mt[["measures"]])

############################## 6) Optional: conditions. ##############################
#If your study does not have conditions or anything else that you want to break the results down by, move on to the next step. 
#Below is one way of adding in conditions to your mouse-tracking dataset. This way works if you have a column in your Gorilla
#response dataset that shows what conditions participants are in. This might be a metadata column, for instance.
#This way also assumes that the Trial Number column in your Gorilla response dataset corresponds to different trials, just 
#like row_index does for the mouse-tracking data. 

#It may be that your data are in a different format. If so, see step 9 for an alternative (more manual) way of adding conditions.

#First, import the Gorilla response data. In this example this file is in the same working directory as before (as it is
#a CSV rather than an xlsx, it wasn't lumped in with the mouse-tracking data earlier). It may be that your response data are
#in a different place, in which case change your working directory again (remove the hash from the setwd line below and
#change its path) before you read in the data.
#setwd("C:/Users/Adam Flitton/Desktop/mousetracking/PLACE WHERE RESPONSE DATA ARE")
gorilla.response.data<-read.csv("BetaZoneMouseTrackingData.csv",na.strings=c("","NA"))

#Subset these data to only include rows in which a participant made an attempt. These rows are the response rows that give
#information about what answer they gave, how long it took them, whether it was correct etc. Other rows (that this line
#removes) just provide links to the mouse-tracking datasets and show fixations/continuation screens. 
gorilla.response.attempts<-gorilla.response.data[gorilla.response.data$Attempt %in% 1,]

#Now we have a dataset that shows what condition the different trials were in. In this example, the column with the condition
#information is called "Metadata" and the column with the trial information is called "Trial.Number". 
#We need to take this condition information and add it to our mousetrap data, making sure to match up the trials. 
#In the mousetrap dataset, trial number is shown in the column "row_index". So we need to make sure that we match up
#"Trial.Number" with "row_index".
#Remember: if you used a column that is not "row_index" to specify trials when you imported your data above, use the name of
#this column instead of "row_index" in all of the code below.
#The code below creates a new column in our mousetrap object called "condition" and populates it with the entries in the 
#"Metadata" column mentioned above, matching by "row_index" and "Trial.Number".

gorilla.mt[["data"]]$condition<-
  gorilla.response.attempts$Metadata[match(gorilla.mt[["data"]]$row_index,gorilla.response.attempts$Trial.Number)]

#You can add any column from your response dataset using this same matching. Below makes a different column called "Correct"
#then populates the column with whether a response was correct from the "Correct" column (rather than the "Metadata" column). 
#gorilla.mt[["data"]]$Correct<-
#  gorilla.response.attempts$Correct[match(gorilla.mt[["data"]]$row_index,gorilla.response.attempts$Trial.Number)]

#Note: you can also use the above to break data down by any other group, such as participant. Just use the column that
#distinguishes between participants instead of the column that distinguishes between conditions. 

############################## 7) Optional: rectangles for graphs. ##############################
#In the step after this one you will make some graphs from the data. It might be that you want to present rectangles on this
#graph that show where the buttons and stimuli were on the screen. If you don't want to do this, move on to the next step.

#Create a copy of the original unaltered mouse-tracking dataset that contains zone names. 
gorilla.traj.rectangles<-rbindlist(xlsx.df.list)

#Create a matrix from the zone coordinates in the mouse-tracking dataset. 
#Make a variable containing the names of the zones that you want to represent with rectangles on the graph.
#The zones presented to participants in the task are listed in the "zone_name" column of the mouse-tracking dataset. In this
#example I assume that the rectangles that will be presented on the graph correspond to the zones called "Stimulus", 
#"ButtonLeft", and "ButtonRight" in the data. You can change these names in the row below to be whatever you have called them. 
matrix.contents<-c("Stimulus","ButtonLeft","ButtonRight")

#Now extract the zone_x, zone_y, zone_width and zone_height values for the zones listed in the matrix.contents variable
#specified above.
matrix.data<-filter(gorilla.traj.rectangles, zone_name %in% matrix.contents)
matrix.data<-matrix.data[1:length(matrix.contents),c("zone_x","zone_y","zone_width","zone_height")]

#Put these data in a matrix that will be referred to in the plot function later. 
rectangles<-as.matrix(sapply(matrix.data, as.numeric)) 
rectangles<-unname(rectangles)

############################## 8) Use mousetrap's plot functions. ##############################
#a) All the mouse-tracking data together:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos")
#b) Colour the lines by the trial number:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",color="row_index")
#c) Colour the lines by condition:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",color="condition")
#d) Add points:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")
#e) Add rectangles:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")+
  mt_plot_add_rect(rect=rectangles)
#f) You can customise the rectangles in the same way that you would customise a geom_rect in ggplot:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")+
  mt_plot_add_rect(rect=rectangles,color="NA",fill="blue",alpha=0.2)
#g) You can add themes and axis labels just like any other ggplot:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")+
  mt_plot_add_rect(rect=rectangles,color="NA",fill="blue",alpha=0.2)+
  xlab("X axis position")+
  ylab("Y axis position")+
  theme_classic()
#h) If you add the argument "only_ggplot=TRUE", the plot will be blank, but you can add paths and points to it like you
#would for a normal ggplot, which lets you customise them.
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",color="condition",only_ggplot=TRUE)+
  mt_plot_add_rect(rect=rectangles,color="NA",fill="blue",alpha=0.2)+
  geom_path(size=1)+
  geom_point()+
  scale_color_manual(values=c("Congruent"="purple","Incongruent"="green"))+
  theme_classic()

############################## 9) Alternative way of adding conditions. ##############################
#A more manual way of adding conditions might be needed for certain data structures. Below replaces step 2 in the above guide. 
#This way works by importing the data files for the different conditions as separate datasets at the beginning, and manually
#adding a column to each of these condition-specific datasets that states which condition they are.
#So this code assumes that you have separate folders, each one containing all the xlsx files for an individual condition.

#Import all the data from these folders separately so we have two separate datasets in R, one for each condition.
setwd("C:/Users/Adam Flitton/Desktop/mousetracking/condition1")
xlsxfiles.cond1<-list.files(pattern="\\.xlsx$")
xlsx.df.list.cond1<-lapply(xlsxfiles.cond1,read_excel)
cond1<-rbindlist(xlsx.df.list.cond1)

setwd("C:/Users/Adam Flitton/Desktop/mousetracking/condition2")
xlsxfiles.cond2<-list.files(pattern="\\.xlsx$")
xlsx.df.list.cond2<-lapply(xlsxfiles.cond2,read_excel)
cond2<-rbindlist(xlsx.df.list.cond2)

#Add a column called "condition" to each of these separate datasets, and fill this column with a label showing which condition each
#dataset is. 
cond1$condition<-"Congruent"
cond2$condition<-"Incongruent"

#Then combine these datasets together. 
gorilla.traj<-rbind(cond1,cond2)

#Now in the guide above continue onto step 3 and skip step 6.

As with the previous script, pressing CTRL+ENTER runs the line of code you are currently on (or any highlighted code) and moves you onto the next line, so you can see how the dataset takes shape step by step. Pressing CTRL+ALT+R runs the entire script without highlighting anything.
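
Step 6 of the script relies on match() to line up trials in the mousetrap data with rows in the response data. If that step is unclear, the sketch below shows the same idea on two small made-up data frames. The names "trial.data" and "response.data" and all the values are hypothetical stand-ins, not real Gorilla output.

```r
# Made-up stand-in for gorilla.mt[["data"]]: one row per trial
trial.data <- data.frame(row_index = c(1, 2, 3, 4))

# Made-up stand-in for the response data, listed in a different order
# and with no entry for trial 4
response.data <- data.frame(
  Trial.Number = c(3, 1, 2),
  Metadata = c("Incongruent", "Congruent", "Congruent"),
  stringsAsFactors = FALSE
)

# For each row_index, match() gives the position of the same value in
# Trial.Number, or NA if that trial number is not found
positions <- match(trial.data$row_index, response.data$Trial.Number)

# Use those positions to pull the right condition label for each trial
trial.data$condition <- response.data$Metadata[positions]

# Trials with no matching response row end up with NA as their condition
unmatched.trials <- trial.data$row_index[is.na(trial.data$condition)]

print(trial.data)
print(unmatched.trials)
```

Checking for NA values in this way after running step 6 on your own data is a quick way to spot trials that did not match up.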
