Online Experiments Behavioural Economists



Picking a wine in a restaurant can be a nightmare, even if there are just three to choose from. Do you opt for the cheapest one, which sounds dependable? Or the mid-priced one that apparently goes with your food better? Or the most expensive one, even though it has a less appealing description than the mid-priced one? Chances are you would opt for the middle wine, even though it costs more than you wanted to pay. If so, you have experienced the decoy effect. The presence of the expensive option made you more likely to opt for the mid-priced wine than the cheap one.

The decoy effect is a classic piece of behavioural economics, backed up by numerous studies stretching back to the 1980s, and is used as a sales and marketing technique by countless companies (Huber, Payne & Puto, 1982). And yet, for a while, it was on the ropes after attempts to replicate some well-known decoy studies failed.

Being able to replicate and validate the results of research is the cornerstone of science, and the replication crisis that has torn through many scientific fields has seen key studies overturned after their results couldn’t be repeated and many others come under the long shadow of suspicion (Camerer et al., 2016; Chang & Li, 2015).

Scientists have blamed the replication crisis on numerous factors, including small samples and weird samples. Firstly, it can take days, if not months, to collect data face-to-face in a lab. Consequently, labs have often collected the smallest sample that could detect the studied effect, which makes their results weaker than they could be. Secondly, if you are testing in a lab, then the easiest sample to test is your department’s undergraduate students. To support this, many departments require their students to take part in experiments.

However, undergraduate behavioural science students do not represent the full diversity of the wider population. Consequently, we know a lot about how behavioural science students behave, but there’s no guarantee that this will transfer to a general population.

To increase reproducibility, behavioural scientists have sought approaches that allow them to increase the size and diversity of their samples. One way to do this is to take research online, as it allows you to reach a large and diverse sample quickly and easily (Anwyl-Irvine et al., 2020). If you can get your behavioural task online, a wide range of market research agencies and participant recruitment services will provide participants for a fee. With these, you can get a representative sample of a thousand participants in a matter of hours.

Nevertheless, these gains come at a cost. Running experiments online requires researchers to give up control and accept a higher degree of uncertainty about the identity of participants and the testing conditions (Rodd, 2019). Additionally, while the timing accuracy provided by browsers as of 2015 is good enough for a wide range of behavioural research (Reimers & Stewart, 2015), it’s not as good as the timing accuracy of installed software typically used in the lab (Bridges et al., 2020).

By taking research online, behavioural researchers can trade a small amount of control and precision for a huge increase in experimental power, more representative samples and a dramatic increase in the pace of research. Online methods can then be used in conjunction with other research methods (natural experiments, field studies, focus groups, surveys, etc.) to provide a robust evidence base.

Rapid Deployment, Rapid Results

This increase in speed means that research can be done in response to current events and still give reliable findings that could be used to inform policy.

For example, experiments have already shown how subtle differences in messaging about the new coronavirus could influence how people respond to lockdown guidance and thus the rate of virus transmission. Shane Timmons at the Economic and Social Research Institute in Dublin, Ireland, and his colleagues have discovered two key things by showing people different posters in an online experiment.

They found that highlighting risks to people who are particularly vulnerable to covid-19, such as the elderly and healthcare workers, and focusing on the exponential rate of transmission made people more cautious about “marginal behaviours” related to social distancing, such as meeting up with friends outdoors, visiting parents and letting kids play together (Lunn et al., 2020).

This suggests that there are better ways to promote social distancing than current official advice, said Timmons on Twitter (2020).

Timmons’s study went from conception to pre-print in a matter of weeks, which would not have been possible with a lab-based study.

Ecological Validity

Mircea Zloteanu and his colleagues have been running experiments looking at people’s online behaviour on sharing economy platforms. His team has created a simulated AirBnB-style website to measure how people make decisions about hosts who are given different reviews or star ratings (Zloteanu et al., 2018). They found that participants over-weighed social information and under-weighed non-social information, drawing attention to a cognitive bias that can lead to poorer decision-making on a sharing economy platform.

As more of our lives happen online (social media, banking, shopping and dating, for instance), online environments open up as ecologically valid settings for psychological research. Creating facsimiles of the websites that we use, and using them to study behaviour, gives us the experimental control we need to understand how people behave in the digital world.

Embedding Digital Experimentation in Industry

Many companies have struggled to embrace digital experimentation because of the wide range of specialist skills needed to do it successfully. Until recently you’d need a behavioural science graduate, a programmer and potentially also a data scientist. A key part of changing this is to ensure that the next generation of behavioural scientists graduate with the skills and experience to create and analyse digital experiments independently.

“Online experiment builders have allowed our students to follow their scientific curiosity, and be rewarded with real data, from the very first stages of their degree,” says Daniel C. Richardson, an experimental psychologist at University College London.

He and his colleagues have used such tools in their lectures, seminars and lab modules. Their students generated their own hypotheses and used the tool to create experiments relating to what makes people donate money to charity and collected their own data.

Each experiment began with participants being told to imagine that they had just won £100. They were then shown one of two slightly different appeals for a charity, which could be an image, text or even a movie. Participants were then asked how much money they would want to donate.

Crucially, there was always a small difference between the two appeals, allowing the students to test a range of hypotheses relating to pro-social behaviour.

One of the most interesting findings was that in an advert for a domestic abuse charity, referring to someone as a “survivor” rather than a “victim” increased donation by more than 25%.

Students made posters of their results, and two of them were accepted to the British Psychological Society’s social psychology conference and won awards, even though they were first-year students competing against graduate students and established researchers.

As these students move on to careers in academia or industry, initiatives like this should help embed a culture of digital experimentation and evidence-based decision-making in a wide range of industries including marketing, advertising, recruitment, PR and policy making.

Large, Robust Study Sizes

A key aspect of reliable science is having a large enough sample size that you can be confident in whatever results are generated. This is an area in which digital experiments can really help. The speed, scale and reach of online research can be tremendous.

The large sample size made it possible for Richardson’s students to produce award-winning studies. “The students ran around 30 different experiments, crowd-sourcing data from over 1200 people, across more than 20 countries,” says Richardson. “I was astonished by this – that’s more data than my lab by itself would typically collect in a year. What was also impressive was the variety of ideas and theories that the students tested.”

If you don’t have a cohort of students willing to leverage their social networks, then pairing an online experiment platform with a recruitment service like Prolific makes it possible to get thousands of participants to take part in a study in a day. For small studies of 100 participants the main benefit is the time saving. It might take a lab 6 weeks to test 100 participants, but only an hour to do so online. But the more important revelation is that you can also test 1000 or 10,000 participants online in not much more time. Sample sizes of these magnitudes would be near impossible in a lab-based setting. The result is that researchers can ask and answer questions at pace, and build each new study on firm foundations of properly powered studies.
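
To give a rough sense of what “properly powered” means in practice, the sketch below estimates how many participants each group needs in a simple two-group comparison. It uses the standard normal approximation for a two-sided test. The function name, the default significance level of 0.05, the 80% power target and the example effect sizes are illustrative assumptions, not figures from the studies described here.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(effect_size: float,
                           alpha: float = 0.05,
                           power: float = 0.80) -> int:
    """Approximate participants needed per group to compare two group means.

    Uses the normal approximation for a two-sided test.
    effect_size is the standardised difference between groups (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Illustrative effect sizes only (conventional "medium", "small" and "very small").
for d in (0.5, 0.2, 0.1):
    print(f"d = {d}: {participants_per_group(d)} participants per group")
```

Under these assumptions, a medium effect needs 63 participants per group, while a very small effect needs more than 1,500 per group. That is hard to reach in a lab but routine online. An exact calculation based on the t distribution gives slightly larger numbers.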

The Intention-Action Gap

David Ogilvy famously said “People don’t think what they feel, don’t say what they think and don’t do what they say”. Behavioural research allows you to measure what people actually do, rather than what they say. A new generation of behavioural science consultancies are going beyond traditional surveys and embracing behavioural experimentation to bridge this gap.

In a revealing example, behavioural change consultancy MoreThanNow wanted to see if messaging tweaks could boost the number of women who want to go into science, technology, engineering or mathematics (STEM) jobs.

In STEM-focused organisations, women have only 5% of board positions, with little evidence of a shift on the horizon. MoreThanNow wanted to address the disparity in application rates for technology careers by focusing on the effectiveness of recruitment messages, and try to understand not just what people think, but also how they actually behave in a recruitment situation.

Using a large sample of 18 to 23-year-olds, they tested different recruitment adverts and messages using a survey, but also gave participants the option of leaving the survey to explore current technology graduate roles on a popular recruitment website, to understand if any of the messages changed behaviour.

By simply adding a button to the end of a survey, MoreThanNow added a behavioural measure for each job advert, allowing them to measure the gap between self-reported intentions and action.

Three types of message were tested: prosocial ones focusing on helping people and solving social problems; self-interest ones that talked about increasing personal reward and career opportunities; and communal ones talking about work in a close community and being supported by a tight-knit team.

The survey part of the experiment showed that, in line with most self-report research on this topic, women responded to pro-social messages and men to those of self-interest. In contrast, the behavioural measure showed a different result. There was no statistical difference in gendered response to pro-social or self-interest messaging. Instead men responded to the communal message “join a community that works together” far more than women.

By using behavioural insights, rather than survey data, MoreThanNow have created adverts that doubled the number of women exploring technology careers. These findings underline how self-report surveys could lead us to draw false conclusions if they aren’t backed up by experimentation that tests the reality of what is said (Women in Technology — A Behavioural Approach, 2019).

Refining Advisory Services with Context Specific Experimentation

When it comes to human behaviour, the rich pageant of our cultures, knowledge and languages can influence what we do or how we act. It may be that there aren’t many theories that seamlessly replicate across people, industries, contexts, personalities and emotional states, but rather there are subtle location-specific differences. This is where online experimentation can really come into its own.

For example, the Behavioural Science Unit of public relations firm Hill + Knowlton Strategies has tried to understand how changes to adverts about cold and flu remedies affect whether people buy certain cold and flu products from a certain healthcare firm.

Focus groups and interviews had proved time-consuming and generated insufficient insights to act on confidently. So the firm supplemented this work with digital experimentation, creating a virtual walkthrough of a realistic, cluttered pharmacy. With the click of a button, participants could choose where to go, which shelves to look at and which products to add to their basket. They could also interact with digital pharmacists, a bit like being in a computer game.

H+K used a between-subject design, in which different people test each variation of messaging, so that each person is exposed to only a single situation. An advert or a series of adverts were placed within the pharmacy, but other than that conditions were identical. The messages on the adverts differed in what behavioural insights they addressed.
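
A between-subject design depends on allocating each participant to exactly one condition at random. The sketch below shows one common way to do that: shuffle the participants, then deal them out to the conditions in turn so that group sizes stay balanced. It is not H+K's actual procedure, and the function name, the condition labels and the participant IDs are all hypothetical.

```python
import random
from collections import Counter


def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly allocate each participant to exactly one condition.

    Participants are shuffled, then dealt out to the conditions in turn,
    so group sizes never differ by more than one.
    """
    if not conditions:
        raise ValueError("at least one condition is required")

    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: conditions[i % len(conditions)]
            for i, pid in enumerate(shuffled)}


# Hypothetical example: 300 participants and three advert variations.
allocation = assign_conditions(
    range(1, 301),
    ["message_a", "message_b", "message_c"],
    seed=42,
)
print(Counter(allocation.values()))
```

Because allocation is random, any difference in purchases between the groups can be attributed to the message rather than to who happened to see it.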

Around half of participants didn’t notice the messages, which provides evidence for the validity of the simulation – in real-life, people tend not to consciously attend to such material.

The best-performing message increased purchases by around 10% compared with the worst-performing message. There was also evidence for variation in the effectiveness of different messages in different markets, which means the healthcare firm can now adapt messaging to different territories.

While behavioural literature can inform consultancies of the likely levers that will influence behaviour, behavioural experimentation can go further and allow companies to optimise interventions for maximum impact in their specific context.

The Promise of Impact

A wide range of challenges facing society have behavioural solutions: climate change, tax evasion and obesity, to name a few. Using behavioural insights to inform policy will allow the behavioural sciences to deliver on the promise of improving lives.

For example, the University of Oxford’s Nuffield Department of Primary Care and Health Sciences has used an online tool to design a virtual supermarket to test how people respond to tweaks to food labelling. The fundamental premise is that if we can change what people buy, we can change what people eat. And if we can change what people eat, we can improve diets and reduce lifestyle diseases.

“It would be very challenging, if not impossible, to run these studies in real online supermarkets,” says team member Dimitrios Koutoukidis. “The experimental supermarket platform allows us to test and optimise different interventions quickly and relatively cheaply.”

Until recently, any changes to messaging were largely tested in focus groups, if at all, and focus groups are only likely to reveal self-reported intentions, not the reality of a situation.

Containing all the features of a normal online supermarket, such as browsing for products, adding items to a basket and checking out, the specially designed online supermarket also contains features such as shopping lists and basket budgets. Behind the scenes, researchers can change adverts, add taxes and rebates, change the order in which lists of products appear, highlight nutritional information and change food labelling. They can also offer swaps for alternative items that might be a healthier or differently priced option.

The supermarket has revealed that fiscal policies that tax food or drinks may be an effective means of altering food purchasing, with a 20% rate being enough to significantly alter purchases of breakfast cereals and soft drinks (Zizzo et al., 2016).

The supermarket has also revealed that listing foods so that those with less saturated fat are at the top reduces the total amount of saturated fats in the shopping basket at checkout (Koutoukidis et al., 2019).
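
The ordering intervention itself is simple to express. The sketch below sorts a product list so that items with less saturated fat appear first. It is not the Oxford team's implementation: the `Product` class and the function name are hypothetical, and the product names and fat values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    saturated_fat_g_per_100g: float


def order_by_saturated_fat(products):
    """Return a new list with the lowest saturated fat first.

    sorted() is stable, so products with equal values keep their original order.
    """
    return sorted(products, key=lambda p: p.saturated_fat_g_per_100g)


# Invented values for illustration only (not real nutritional data).
shelf = [
    Product("Hypothetical cheddar", 21.0),
    Product("Hypothetical cottage cheese", 2.0),
    Product("Hypothetical mozzarella", 11.0),
]

ranked = order_by_saturated_fat(shelf)
for product in ranked:
    print(f"{product.name}: {product.saturated_fat_g_per_100g} g per 100 g")
```

In an experiment, one group of shoppers would see the re-ordered list and another the default order, and the saturated fat in the two groups' baskets would be compared at checkout.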

This exceptional degree of experimental control gives tools like this great power to inform public policy and ultimately improve lives.


All these case studies demonstrate how online tools – like the Gorilla Experiment Builder and Testable – are opening a new frontier for behavioural science. The ability to gain behavioural insights from experiments with large sample sizes in a short space of time eclipses what can be done in the lab and opens up new opportunities.

Online tools have already been used to investigate a wide range of topics, but they certainly haven’t reached their limits. As Bill Gates once said, “If you give people tools, and they use their natural abilities and their curiosity, they will develop things in ways that will surprise you very much beyond what you might have expected.”

Getting the science of behavioural economics right will have profound results. Academia has the opportunity to banish the ghost of the replication crisis and shift the evidence base back onto a firmer footing. Students can equip themselves for a future that will benefit from digital experimentation in a wide range of industries. Industry can use the insights gained to make better products and services that improve lives. And finally, policy makers can create evidence-informed regulations that improve society. Altogether these will combine to improve our health, wealth, happiness and education.


Jo Evershed

Jo is the CEO and co-founder of Cauldron and Gorilla. Her mission is to provide behavioural scientists with the tools needed to improve the scale and impact of the evidence-based interventions that benefit society.

Date Published: 29th Apr 2021
