Support Home

DATA ANALYSIS: R and Excel

  • R Overview
  • Downloading and Installing R
  • Downloading and Installing RStudio
  • Opening a New Script in R
  • Getting started with R
  • Combining CSV Files using R
  • Long-to-Short Data Transformation using R
  • Excel Overview
  • Filter Your Data using Excel
  • Excel Pivot Tables
  • Advanced Data Handling in Excel
  • Mousetracking Data to Plots in R
  • Analysis of Eye Tracking Data using R

R Overview


R is a programming language that many researchers use for transforming and manipulating data. It is also used to perform statistical analyses.

RStudio is an integrated development environment (IDE) that helps you write and edit R code.

This walkthrough will get you started with R and RStudio so that you can use them to transform data files.

Downloading and Installing R


R:

  1. Go to www.r-project.org
  2. Click the download R link under the “Getting Started” header.
  3. Select a mirror - you want to pick one that's relatively close to your location.
  4. Depending on your operating system, click on the Download R for Windows/Download R for Linux/Download R for (Mac) OS X link at the top of the page.
  5. Click the base link at the top of the page.
  6. Click the Download R [version number] link at the top of the page.
  7. Follow the installation instructions.

Downloading and Installing RStudio


RStudio:

  1. Go to www.rstudio.com
  2. Click on the Download link underneath the 'RStudio' image.
  3. Scroll to the bottom of the page and select the appropriate installer for your system.
  4. Follow the installation instructions.

Opening a New Script in R

When you open RStudio, you will see a box on the left called Console, a box in the upper right with Environment and History, and a box in the lower right with Files, Plots, Packages and Help. To input commands easily, you will need to open a fourth box, called a Script.

A script is like a document in which you can write and save code. The code in your script can be run easily and repeatedly. You can also copy and paste code from others into your own script, and export your script as a .R file so others can use it.

To open a new script, press the New File icon in the top left-hand corner, then select R Script.

This is your new script. You can write your code or copy and paste code from others into here.
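For example, a very first script might look like the few lines below (the variable names here are just illustrative). Type it in, and you can run it line by line as described later in this guide:

```r
# Assign a vector of reaction times (in milliseconds) to a variable
rt <- c(512, 430, 675, 390)

# Calculate the mean reaction time and print it to the console
mean_rt <- mean(rt)
print(mean_rt)
```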


Getting started with R


The script below allows you to get started with your Gorilla data in R. It explains how to import your file and how to tidy the data.

This is important because sometimes, when you download the available CSV data from your experiment, there are a lot of files with a lot of data - which can be overwhelming!

Adding the script below to R makes it easy to start analysing your data by tidying it and making it more manageable.

To use the script below, copy and paste everything in the box into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).

#This is a package that is used throughout our R scripts
#If this is your first time using R or this package, you will have to uncomment (remove the "#" from) the line below
#install.packages("tidyverse")
library(tidyverse)

################# Looking at one task/questionnaire #################
#This is a script to use to explore and tidy your data from a single task or questionnaire
#Set your working directory to the folder in which all your CSV files are located
setwd("C:/User/Folder/Subfolder")

#Read in your task or questionnaire
task <- read.csv("data_exp_1111-v1_task-1111.csv")
questionnaire <- read.csv("data_exp_1111-v1_questionnaire-1111.csv")

#To see all your data in a separate tab
View(task)
View(questionnaire)

################# Filtering rows #################
#The data produced has many rows, and not all the information may be relevant
#If we know that all the information we need for analysis occurs in a specific zone, e.g. a response zone, we can specifically select those rows to minimise the size of the dataset
#The task example below filters rows by 'Zone.Type', but you can filter on any column in your dataset
task <- task %>%
  filter(Zone.Type == "response_button_text")
#The questionnaire example here filters by a specific Question.Key, but you can filter on any column in your dataset
questionnaire <- questionnaire %>%
  filter(Question.Key == "important-1")

################# Selecting columns #################
#Gorilla provides a lot of information, and depending on your task/questionnaire, not everything may be relevant
#This will show you the names of all the headers in your dataset, so you can choose the ones you wish to keep
names(task)
#The lines below will select the key variables you specify
task <- task %>%
  select(Participant.Private.ID, UTC.Date, Zone.Type, Response, ANSWER, Reaction.Time)
#The example here selects these specific columns to focus on; which columns you keep will depend on your hypothesis

#You can also rename your column names if you want
task <- task %>% 
  rename(ID = Participant.Private.ID,
         date = UTC.Date,
         zone = Zone.Type,
         response = Response,
         answer = ANSWER,
         RT = Reaction.Time)

Pressing CTRL+ENTER will run the line of code you are currently on and move you to the next line. Making your way through the code using CTRL+ENTER will allow you to see how the dataset gradually takes shape after every line of code – this is also a good way of easily spotting any errors if you encounter them whilst running the script.

CTRL+ENTER will also run any highlighted code, so if you want to run the whole script together, highlight it all and press CTRL (or Command ⌘ on a Mac) + ENTER. You can also press CTRL (or Command ⌘ on a Mac) + ALT + R to run the entire script without highlighting anything.

We are constantly updating our R support pages. For more information about Gorilla metrics specifically, please consult our support page, which contains guides on metrics.

Combining CSV Files using R


Adding the script below to R makes it easy to combine CSV files containing your task or questionnaire data into a single CSV file. The CSV will be formatted so that the final row of one CSV is followed by the first row of the next.

The script below is divided into two parts, based on the CSV files you want to combine. If you want to combine questionnaires or identical tasks (that are separated due to the experiment tree, e.g. a randomiser node), the first part of the script will explain how to do this.

Combining CSV files of different tasks is a bit more complicated, due to the complexity of the task builder, so the second part of the script addresses how to combine such CSV files.

Combining CSVs of questionnaires or identical tasks:

#install.packages("tidyverse")
library(tidyverse)

#Set your working directory to the folder in which all your CSV files are located
setwd("C:/User/Folder/Subfolder")

################# Combining CSVs- Questionnaires/Same tasks #################
#This is the script to use if you want to combine questionnaires or identical tasks (differing only e.g. on counterbalancing)
#You list the files you want to combine, each with a "" around them
files <- c("data_exp_1111-v1_task-1111.csv",
           "data_exp_2222-v2_task-2222.csv")
#files <- c("data_exp_1111-v1_questionnaire-1111.csv",
#           "data_exp_2222-v2_questionnaire-2222.csv")

#You can combine the CSVs using either base R or tidyverse (subject to preference)
#using tidyverse
combined_data <- lapply(files, read.csv) %>% 
  bind_rows()
#using base R
combined_data<-do.call("rbind",lapply(files,read.csv,header=TRUE,fill=TRUE))

#Your dataset also has some rows that contain "END OF FILE" and nothing else. You can exclude these rows using this line.
#Note: depending on your file's encoding, R may read the first column in as "ï..Event.Index" rather than "Event.Index";
#check names(combined_data) and adjust the column name below if necessary
combined_data<-combined_data[combined_data$Event.Index!="END OF FILE",]

#This line exports your combined data as a CSV. This new CSV will be called "combineddata.csv" and will appear in your working directory
write.csv(combined_data,"combineddata.csv",row.names=FALSE)

Combining CSVs of different tasks and questionnaires:

#install.packages("tidyverse")
library(tidyverse)

#Set your working directory to the folder in which all your CSV files are located
setwd("C:/User/Folder/Subfolder")

################# Combining CSVs - Different tasks/questionnaires #################
# To combine tasks that are different, you need to go through the tasks sequentially, adding the relevant data from each
# Firstly, you import a task or questionnaire that will contain your base information (information you want to be present in the final dataset)
# In this example, I have 3 tasks and 1 questionnaire I want to combine
quest1 = read.csv("data_exp_1111-v11_questionnaire-1111.csv", header=T)
task1 = read.csv("data_exp_1111-v11_task-1111.csv", header=T)
task2 = read.csv("data_exp_2222-v22_task-2222.csv", header=T)
task3 = read.csv("data_exp_3333-v33_task-3333.csv", header=T)

#I choose my questionnaire as the base and create a new dataset 'final'
final <- quest1 %>%
  as_tibble() %>%
  group_by(Participant.Private.ID) %>%
  #Depending on your dependent variable, adjust the filter line below
  #If you use a task for the base, you can use 'Zone.Type' rather than Question.Key
  filter(Question.Key == "important-1") %>%
  #the information I want in my base is the device, the operating system and the date of the participant
  transmute(Participant.OS, Participant.Device, UTC.Date)

#Now, I take my first task and choose the specific columns I am interested in, before combining it with the data we already have from above
task1 <- task1 %>%
  filter(Zone.Type == "response_button_text") %>%
  select(Participant.Private.ID, ANSWER, Response, Reaction.Time)
final <- final %>% 
  full_join(task1, by = "Participant.Private.ID")

#I then do this for the remaining tasks:
task2 <- task2 %>%
  filter(Zone.Type == "response_button_text") %>%
  select(Participant.Private.ID, ANSWER, Response, Reaction.Time) #%>%
#You may want to change the names of the columns here; if so, uncomment the set_names line below and the pipe operator above (%>%)
#Important note: the renaming happens in the order specified in the select line above
#  set_names(c("Participant.Private.ID", "item", "item_response", "item_RT"))

final <- final %>% 
  full_join(task2, by = "Participant.Private.ID")

task3 <- task3 %>%
  filter(Zone.Type == "response_button_text") %>%
  select(Participant.Private.ID, ANSWER, Response, Reaction.Time) #%>%
#You may want to change the names of the columns here; if so, uncomment the set_names line below and the pipe operator above (%>%)
#Important note: the renaming happens in the order specified in the select line above
#  set_names(c("Participant.Private.ID", "item", "item_response", "item_RT"))

final <- final %>% 
  full_join(task3, by = "Participant.Private.ID")

write.csv(final,"final_combined.csv",row.names=FALSE)

Now you should have a new CSV in the folder you specified as your working directory, called “combineddata.csv” or "final_combined.csv". This contains all the data from your separate CSVs.


Long-to-Short Data Transformation using R


To use the script below, copy and paste everything in the box into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a ‘#’).

The scripts below show how to transform your data (be it a questionnaire or task) from long to short data.

Questionnaire: long to short data transformation

#install.packages("tidyverse")
#install.packages("reshape2")
library(tidyverse)
library(reshape2)

#Set your working directory to the folder in which all your CSV files are located
setwd("C:/User/Folder/Subfolder")

#This is a script to use to transform your data from long to short

################# Questionnaire #################
questionnaire <- read.csv("data_exp_1111-v1_questionnaire-1111.csv")

#This is a custom function to enable long-to-short conversion of string data. By default, dcast tries to aggregate
#values when it collapses them (e.g. by averaging). This doesn't work with strings, so this function is used to make
#R choose the first value rather than trying to aggregate.
select.first<-function(x){
  first(x)
}

#Transform the data from long to short.
#Here, you will specify certain lines depending on what you want to investigate
questionnaire<-dcast(questionnaire,                     #Our dataset                             
                     Participant.Private.ID+Task.Name   #Id variables (columns we want to preserve:Participant IDs, task name)
                     ~Question.Key,                     #Measured variables (column we want to make several columns from)
                     value.var="Response",              #Value variables (column we want to use to populate the new columns above)
                     fun.aggregate=select.first)        #Aggregation function (the function we defined above)

#Export as csv.
write.csv(questionnaire, "CombinedQuestionnaireData.csv", na="", row.names = FALSE)

Task: long to short data transformation

#install.packages("tidyverse")
#install.packages("reshape2")
library(tidyverse)
library(reshape2)

#Set your working directory to the folder in which all your CSV files are located
setwd("C:/User/Folder/Subfolder")

#This is a script to use to transform your data from long to short

################# Task #################

task <- read.csv("data_exp_1111-v1_task-1111.csv")

#This is a custom function to enable long-to-short conversion of string data. By default, dcast tries to aggregate
#values when it collapses them (e.g. by averaging). This doesn't work with strings, so this function is used to make
#R choose the first value rather than trying to aggregate.
select.first<-function(x){
  first(x)
}

#Transform the data from long to short. 
#Here, you will specify certain lines depending on what you want to investigate
task<-dcast(task,                         #Our dataset                             
            Participant.Private.ID             #Id variables (columns we want to preserve:Participant IDs)
            ~ANSWER,                           #Measured variables (column we want to make several columns from)
            value.var="Response",              #Value variables (column we want to use to populate the new columns above)
            fun.aggregate=select.first)        #Aggregation function (the function we defined above)

#Export as csv.
write.csv(task, "CombinedTaskData.csv", na="", row.names = FALSE)

Now you should have a new CSV in the folder you specified as your working directory, called “CombinedQuestionnaireData.csv” or "CombinedTaskData.csv". This contains all the data from your initial, separate CSVs.


Excel Overview


Excel can be used to transform and clean your data.

This walkthrough will show you how you can use Excel to filter and clean your data, allowing you to choose only the information you may need.

It also highlights how Excel's Pivot Tables can be used to get your data in the format you want.

Clean and Filter Your Data using Excel


The raw CSV file you download from Gorilla contains a lot of information that you probably won't need. We want to give you a complete picture of your participant data, so if you happen to be interested in, e.g., the local time when a participant completed your experiment, that information is available to you. However, you are probably only going to be interested in a few specific metrics.

The video below will walk you through what the different columns in your data file are, and how you can filter your data to get the information you need.


We have also produced a text document that will walk you through how to quickly clean and bulk-edit your data in Excel, by removing rows you are not interested in. You can download it here.

Excel Pivot Tables


If you're not familiar with Excel Pivot Tables, get ready to revolutionise the way you edit your data! Excel Pivot Tables are an incredibly useful way to get your data into the format you want it in. For instance, you can ask Excel to calculate the mean of each participant's score over a series of trials, and enter those means into a new table where each participant has one row.

Learn the basics of Excel Pivot Tables for transforming task data from long to short form in the video below.
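If you ever want to double-check a pivot-table result, the same per-participant mean can be computed in base R with the aggregate() function. The data frame and column names below are made up purely for illustration:

```r
# Long-form data: one row per trial, several trials per participant
trials <- data.frame(
  participant = c("P1", "P1", "P2", "P2"),
  rt = c(500, 700, 400, 600)
)

# Collapse to one row per participant, taking the mean of rt -
# this mirrors what a pivot table with a "mean" value field does
means <- aggregate(rt ~ participant, data = trials, FUN = mean)
print(means)
```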

Advanced Data Handling in Excel - Pivot Tables and Text Responses


One downside of pivot tables is that they don't allow you to pivot text responses. However, there are ways to get around this using Excel functions; watch the video below to find out how.
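The same idea can be sketched in R as a sanity check: because you cannot take a mean of text, you collapse to one row per participant by picking the first response instead. The data frame and column names here are illustrative only:

```r
# Long-form text responses: several rows per participant
responses <- data.frame(
  participant = c("P1", "P1", "P2"),
  answer = c("cat", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Collapse to one row per participant by taking the first answer
first_answers <- aggregate(answer ~ participant, data = responses,
                           FUN = function(x) x[1])
print(first_answers)
```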


Mousetracking Data to Plots in R


This guide contains code for converting Gorilla mouse-tracking data to mousetrap format and using mousetrap functions to create plots.

Requirements:

Make sure that all the individual mouse-tracking files (in xlsx format) are together in the same folder with no other unrelated xlsx files.

To use the script below, copy and paste all of it into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).

#Mouse tracking in Gorilla gives two types of dataset: response data and mouse-tracking data. 
#Response data are the normal Gorilla task data, showing what screens participants saw, how many answers they got correct etc.
#Mouse-tracking data present coordinates for zones and mouse movements within an individual trial. 
#There should be one response dataset in total, and one mouse-tracking dataset for each trial in the experiment.

#Before you start the guide below, make sure that all the individual mouse-tracking files (in xlsx format) are together in 
#the same folder with no other unrelated xlsx files.

#Note: the latest version of mousetrap at time of writing is v3.1.2.

############################## 1) Load all the packages you will need. ##############################
#library() loads packages. You need to install packages before you can load them. If you do not have any of the packages
#listed below installed, run the install.packages line below for each of the packages listed. 
#install.packages("mousetrap")
library(mousetrap)
library(readxl)
library(data.table)
library(dplyr)
library(ggplot2)

############################## 2) Import all the Gorilla mouse-tracking data into R. ##############################
#Here is an example way of importing all your mouse-tracking excel files into R. You can use other ways if you want to, as 
#long as they end up with a single data frame containing all the mouse-tracking data together.
#Set your working directory as the folder in which you have saved your data files. This tells R where to look for files.
setwd("C:/Users/Adam Flitton/Desktop/mousetracking")

#Create a list of the xlsx files in your working directory.
xslxfiles<-list.files(pattern="*.xlsx")

#Import all the excel files in the list.
xlsx.df.list<-lapply(xslxfiles,read_excel)

#Combine all these imported excel files into a single data frame. 
gorilla.traj<-rbindlist(xlsx.df.list)

############################## 3) Prepare the data for importing as a mousetrap object. ##############################
#Drop all the rows that contain coordinates for the zones. In these rows there is no mouse tracking data, so we do not need
#them for now.
gorilla.traj<-gorilla.traj[gorilla.traj$type=="mouse",]

#Gorilla data gives you normalised coordinates as well as raw ones. We can only import one of the two at a time.
#Let's drop the normalised ones so we can just import the raw data.
gorilla.traj<-within(gorilla.traj,rm(x_normalised,y_normalised))

############################## 4) Use mousetrap's import function to write in the Gorilla data. ##############################
#This function requires you to specify columns in your dataset including x, y and timestamps. 
#It also requires a column in your dataset that shows the different trials (argument "mt_id_label"). 
#row_index should cover this in most datasets, as it prints what row of the spreadsheet the participant is on, which is more
#often than not a new trial.
#If you have a different trial column, replace "row_index" with whatever this column is called in the code below. 
gorilla.mt<-mt_import_long(gorilla.traj,xpos_label="x",ypos_label="y",timestamps_label="time_stamp",
                           mt_id_label="row_index")

#Now you have your mousetrap data object! It's a list containing two data frames:
gorilla.mt[["trajectories"]] 
#This contains your coordinates data in mousetrap format. The rows in this dataset are specified by the mt_id_label above.
gorilla.mt[["data"]] 
#This is basically empty at the moment, containing a row for each trial without much else. It isn't necessary
#to have much data in this data frame to use many of the functions in mousetrap (we will add a column to it later that
#shows what condition the trial is in, as this helps with plotting different conditions).

############################## 5) Use mousetrap's measures function to get summary statistics. ##############################
gorilla.mt<-mt_measures(gorilla.mt, use = "trajectories",save_as="measures",
                        dimensions = c("xpos", "ypos"), timestamps = "timestamps",
                        verbose = FALSE)

#This adds another data frame to your mousetrap list called "measures" that contains useful summary statistics. 
View(gorilla.mt[["measures"]])

############################## 6) Optional: conditions. ##############################
#If your study does not have conditions or anything else that you want to break the results down by, move on to the next step. 
#Below is one way of adding in conditions to your mouse-tracking dataset. This way works if you have a column in your Gorilla
#response dataset that shows what conditions participants are in. This might be a metadata column, for instance.
#This way also assumes that the Trial Number column in your Gorilla response dataset corresponds to different trials, just 
#like row_index does for the mouse-tracking data. 

#It may be that your data are in a different format. If so, see step 9 for an alternative (more manual) way of adding conditions.

#First, import the Gorilla response data. In this example this excel file is in the same working directory as before (as it is
#a CSV rather than an xlsx, it wasn't lumped in with the mouse-tracking data earlier). It may be that your response data are
#in a different place, in which case change your working directory again (remove the hash from the line below and change its path)
#before you read in the data.
#setwd("C:/Users/Adam Flitton/Desktop/mousetracking/PLACE WHERE RESPONSE DATA ARE")
gorilla.response.data<-read.csv("BetaZoneMouseTrackingData.csv",na.strings=c("","NA"))

#Stratify these data to only include rows in which a participant made an attempt. These rows are the response rows that give
#information about what answer they gave, how long it took them, whether it was correct etc. Other rows (that this function
#removes) just provide links to the mouse-tracking datasets and show fixations/continuation screens. 
gorilla.response.attempts<-gorilla.response.data[gorilla.response.data$Attempt %in% 1,]

#Now we have a dataset that shows what condition the different trials were in. In this example, the column with the condition
#information is called "Metadata" and the column with the trial information is called "Trial.Number". 
#We need to take this condition information and add it to our mousetrap data, making sure to match up the trials. 
#In the mousetrap dataset, trial number is shown in the column "row_index", so we need to make sure that we match up
#"Trial.Number" with "row_index".
#Remember: if you used a column that is not "row_index" to specify trials when you imported your data above, use the name of
#this column instead of "row_index" in all of the code below.
#The code below creates a new column in our mousetrap object called "condition" and populates it with the entries in the 
#"Metadata" column mentioned above, matching by "row_index" and "Trial.Number".

gorilla.mt[["data"]]$condition<-
  gorilla.response.attempts$Metadata[match(gorilla.mt[["data"]]$row_index,gorilla.response.attempts$Trial.Number)]

#You can add any column from your response dataset using this same matching. Below makes a different column called "Correct"
#then populates the column with whether a response was correct from the "Correct" column (rather than the "Metadata" column). 
#gorilla.mt[["data"]]$Correct<-
#gorilla.response.attempts$Correct[match(gorilla.mt[["data"]]$row_index,gorilla.response.attempts$Trial.Number)]

#Note: you can also use the above to break data down by any other group, such as participant. Just use the column that
#distinguishes between participants instead of the column that distinguishes between conditions. 

############################## 7) Optional: rectangles for graphs. ##############################
#In the step after this one you will make some graphs from the data. It might be that you want to present rectangles on this
#graph that show where the buttons and stimuli were on the screen. If you don't want to do this, move on to the next step.

#Create a copy of the original unaltered mouse-tracking dataset that contains zone names. 
gorilla.traj.rectangles<-rbindlist(xlsx.df.list)

#Create a matrix from the zone coordinates in the mouse-tracking dataset. 
#Make a variable containing the names of the zones that you want to represent with rectangles on the graph.
#The zones presented to participants in the task are listed in the "zone_name" column of the mouse-tracking dataset. In this
#example I assume that the rectangles that will be presented on the graph correspond to the zones called "Stimulus", 
#"ButtonLeft", and "ButtonRight" in the data. You can change these names in the row below to be whatever you have called them. 
matrix.contents<-c("Stimulus","ButtonLeft","ButtonRight")

#Now extract the zone_x, zone_y, zone_width and zone_height values for the zones listed in the matrix.contents variable
#specified above.
matrix.data<-filter(gorilla.traj.rectangles, zone_name %in% matrix.contents)
matrix.data<-matrix.data[1:length(matrix.contents),c("zone_x","zone_y","zone_width","zone_height")]

#Put these data in a matrix that will be referred to in the plot function later. 
rectangles<-as.matrix(sapply(matrix.data, as.numeric)) 
rectangles<-unname(rectangles)

############################## 8) Use mousetrap's plot functions. ##############################
#a) All the mouse-tracking data together:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos")
#b) Colour the lines by the trial number:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",color="row_index")
#c) Colour the lines by condition:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",color="condition")
#d) Add points:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")
#e) Add rectangles:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")+
  mt_plot_add_rect(rect=rectangles)
#f) You can customise the rectangles in the same way that you would customise a geom_rect in ggplot:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")+
  mt_plot_add_rect(rect=rectangles,color="NA",fill="blue",alpha=0.2)
#g) You can add themes and axis labels just like any other ggplot:
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",points=TRUE,color="condition")+
  mt_plot_add_rect(rect=rectangles,color="NA",fill="blue",alpha=0.2)+
  xlab("X axis position")+
  ylab("Y axis position")+
  theme_classic()
#h) If you add the argument "only_ggplot=TRUE", the plot will be blank, but you can add paths and points to it like you
#would for a normal ggplot, which lets you customise them.
mt_plot(gorilla.mt,use="trajectories",use2="data",x="xpos",y="ypos",color="condition",only_ggplot=TRUE)+
  mt_plot_add_rect(rect=rectangles,color="NA",fill="blue",alpha=0.2)+
  geom_path(size=1)+
  geom_point()+
  scale_color_manual(values=c("Congruent"="purple","Incongruent"="green"))+
  theme_classic()

############################## 9) Alternative way of adding conditions. ##############################
#A more manual way of adding conditions might be needed for certain data structures. Below replaces step 2 in the above guide. 
#This way works by importing the data files for the different conditions as separate datasets at the beginning, and manually
#adding a column to each of these condition-specific datasets that states which condition they are.
#So this code assumes that you have separate folders, each one containing all the xlsx files for an individual condition.

#Import all the data from these folders separately so we have two separate datasets in R, one for each condition.
setwd("C:/Users/Adam Flitton/Desktop/mousetracking/condition1")
xslxfiles.cond1<-list.files(pattern="*.xlsx")
xlsx.df.list.cond1<-lapply(xslxfiles.cond1,read_excel)
cond1<-rbindlist(xlsx.df.list.cond1)

setwd("C:/Users/Adam Flitton/Desktop/mousetracking/condition2")
xslxfiles.cond2<-list.files(pattern="*.xlsx")
xlsx.df.list.cond2<-lapply(xslxfiles.cond2,read_excel)
cond2<-rbindlist(xlsx.df.list.cond2)

#Add a column called "condition" to each of these separate datasets, and fill this column with a label showing which condition each
#dataset is. 
cond1$condition<-"Congruent"
cond2$condition<-"Incongruent"

#Then combine these datasets together. 
gorilla.traj<-rbind(cond1,cond2)

#Now in the guide above continue onto step 3 and skip step 6.

Pressing CTRL+ENTER will run the line of code you are currently on and move you onto the next line. Making your way through the code using CTRL+ENTER will allow you to see how the dataset gradually takes shape after every line of code. CTRL+ENTER will also run any highlighted code, so if you want to run the whole script together, highlight it all and press CTRL+ENTER. You can also press CTRL+ALT+R to run the entire script without highlighting anything.

We will be adding more guides for data transformation using R soon. For more information about Gorilla please consult our support page which contains guides on metrics.


Analysis of Eye Tracking Data using R

This guide contains code for using R to analyse your eye-tracking data using the Saccades package from GitHub. For information about getting and processing your eye tracking data, please consult the Eye Tracking Zone in the eye tracking metrics section at the bottom of the page.


To use the script below, copy and paste everything in the box into the top left-hand section of your new RStudio script. Then follow the instructions written in the comments of the script itself (comments begin with a “#”).

#If you do not have the devtools package installed, run install.packages("devtools") first
library("devtools")
install_github("tmalsburg/saccades/saccades", dependencies=TRUE)
install.packages('tidyverse')
install.packages('jpeg')
library('saccades')
library('tidyverse')
library('ggplot2')
library('jpeg')

#Load in file -- this is a single trial of freeviewing 
data <- read.csv('Documents/puppy-1-2.csv')
#Drop rows that are not predictions 
preds <- data[grepl("prediction", data$type),]

#Make dataframe with just time, x,y and trial columns 
preds_minimal <- preds %>%
  select(time_stamp, x_pred_normalised, y_pred_normalised, screen_index)
preds_minimal <- preds_minimal %>%
  rename(time = time_stamp, x = x_pred_normalised, y = y_pred_normalised, trial = screen_index) 

#visualise trials -- note how noisy the predictions are 
#it is difficult to tell what is going on though without seeing the images 
ggplot(preds_minimal, aes(x, y)) +
  geom_point(size=0.2) +
  coord_fixed() +
  facet_wrap(~trial)

# Let's align it with the stimuli we presented
img <- readJPEG('Documents/puppy.jpg') # the image 

#But we need to align it with our eye coordinate space; fortunately, we have that information in our 'zone' rows
zone <- data[grepl("Zone2", data$zone_name),] # Zone2 was our image zone 

# we extract coordinate info 
orig_x <- zone$zone_x_normalised
orig_y <- zone$zone_y_normalised
width <- zone$zone_width_normalised
height <- zone$zone_height_normalised

# now we add this image using ggplot2 annotation raster with coordinates calculated for the image
m <- ggplot(preds_minimal, aes(x, y)) +
  annotation_raster(img, xmin=orig_x, xmax=orig_x+width, ymin=orig_y, ymax=orig_y+height) +
  geom_point()

# If you look at the image it makes a bit more sense now 

# Overlay some density plots to aid interpretation
m + geom_density_2d(data=preds_minimal)


# But this is not all we can do; let's try extracting some fixation data!
#Detect fixations 
fixations <- subset(detect.fixations(preds_minimal), event=="fixation")

#Visualise diagnostics for fixations -- again note the noise
diagnostic.plot(preds_minimal, fixations)


# plot the fixations onto our ggplot, with lines between them 
m+ geom_point(data=fixations, colour="red") + geom_path(data=fixations, colour="red")


#as you can see, this is pretty rough and ready, but this hopefully gives you an idea of how you can visualise eye tracking data

# You could filter the data using a convergence threshold, or use this value to throw out trials
preds <- preds[preds$convergence <= 10, ] 

# After running the above line, try all the plotting functions again and look at the difference; you should be able to see more fixations on the image


# But if the predictions are generally bad for a given participant, you may need to exclude them
# Unfortunately, this is an unavoidable limitation of online eye tracking at this time

# The best way to increase data quality is to give clear instructions on how to set up the camera, and to repeat validation and calibration frequently

Pressing CTRL+ENTER will run the line of code you are currently on and move you onto the next line. Making your way through the code using CTRL+ENTER will allow you to see how the dataset gradually takes shape after every line of code. CTRL+ENTER will also run any highlighted code, so if you want to run the whole script together, highlight it all and press CTRL+ENTER. You can also press CTRL+ALT+R to run the entire script without highlighting anything.

We will be adding more guides for data transformation using R soon. For more information about Gorilla please consult our support page which contains guides on metrics.