How Online Behavioural Experiments Are Opening New Opportunities for Behavioural Economists

Picking a wine in a restaurant can be a nightmare, even if there are just three to choose from. Do you opt for the cheapest one, which sounds dependable? Or the mid-priced one that apparently goes with your food better? Or the most expensive one, even though it has a less appealing description than the mid-priced one? Chances are you would opt for the middle wine, even though it costs more than you wanted to pay. If so, you have experienced the decoy effect: the presence of the expensive option made you more likely to choose the mid-priced wine over the cheap one.

The decoy effect is a classic piece of behavioural economics, backed up by numerous studies stretching back to the 1980s, and is used as a sales and marketing technique by countless companies (Huber, Payne & Puto, 1982). And yet, for a while, it was on the ropes after attempts to replicate some well-known decoy studies failed.

Being able to replicate and validate the results of research is the cornerstone of science. The replication crisis that has torn through many scientific fields has seen key studies overturned after their results couldn’t be repeated, and many others fall under the long shadow of suspicion (Camerer et al., 2016; Chang & Li, 2015).

Scientists have blamed the replication crisis on numerous factors, including small samples and unrepresentative samples. Firstly, collecting data face-to-face in a lab can take days, if not months. Consequently, labs have often collected the smallest sample capable of detecting the studied effect, which makes their results weaker than they could be. Secondly, if you are testing in a lab, the easiest sample to recruit is your department’s undergraduate students. To support this, many departments require their students to take part in experiments.

However, undergraduate behavioural science students do not represent the full diversity of the wider population. Consequently, we know a lot about how behavioural science students behave, but there’s no guarantee that this will transfer to the general population.

To increase reproducibility, behavioural scientists have sought approaches that allow them to increase the size and diversity of their samples. One way to do this is to take research online, which allows you to reach a large and diverse sample quickly and easily (Anwyl-Irvine et al., 2020). If you can get your behavioural task online, a wide range of market research agencies and participant recruitment services will provide participants for a fee. With these, you can get a representative sample of a thousand participants in a matter of hours.

Nevertheless, these gains come at a cost. Running experiments online requires researchers to give up control and accept a higher degree of uncertainty about the identity of participants and the testing conditions (Rodd, 2019). Additionally, while the timing accuracy provided by browsers as of 2015 is good enough for a wide range of behavioural research (Reimers & Stewart, 2015), it’s not as good as the timing accuracy of the installed software typically used in the lab (Bridges et al., 2020).

By taking research online, behavioural researchers can trade a small amount of control and precision for a huge increase in experimental power, more representative samples and a dramatic increase in the pace of research. Online methods can then be used in conjunction with other research methods (natural experiments, field studies, focus groups, surveys, etc.) to provide a robust evidence base.

Rapid Deployment, Rapid Results

This increase in speed means that research can be done in response to current events and still produce reliable findings that can inform policy.

For example, experiments have already shown how subtle differences in messaging about the new coronavirus could influence how people respond to lockdown guidance, and thus the rate of virus transmission. Shane Timmons at the Economic and Social Research Institute in Dublin, Ireland, and his colleagues discovered two key things by showing people different posters in an online experiment.

They found that highlighting the risks to people who are particularly vulnerable to Covid-19, such as the elderly and healthcare workers, and focusing on the exponential rate of transmission made people more cautious about “marginal behaviours” related to social distancing, such as meeting up with friends outdoors, visiting parents or letting children play together (Lunn et al., 2020).

This suggests that there are better ways to promote social distancing than current official advice, said Timmons on Twitter (2020).

Timmons’s study went from conception to pre-print in a matter of weeks, which would not have been possible with a lab-based study.


Ecological Validity

Mircea Zloteanu and his colleagues have been running experiments looking at people’s online behaviour on sharing economy platforms. His team created a simulated AirBnB-style website to measure how people make decisions about hosts who are given different reviews or star ratings (Zloteanu et al., 2018). They found that participants over-weighted social information and under-weighted non-social information, drawing attention to a cognitive bias that can lead to poorer decision-making on sharing economy platforms.

As more of our lives happen online (social media, banking, shopping, dating), online environments open up as ecologically valid settings for psychological research. Creating facsimiles of the websites that we use, and using them to study behaviour, gives us the experimental control we need to understand how people behave in the digital world.

Embedding Digital Experimentation in Industry

Many companies have struggled to embrace digital experimentation because of the wide range of specialist skills needed to do it successfully. Until recently you’d need a behavioural science graduate, a programmer and potentially also a data scientist. A key part of changing this is to ensure that the next generation of behavioural scientists graduates with the skills and experience to create and analyse digital experiments independently.

“Online experiment builders have allowed our students to follow their scientific curiosity, and be rewarded with real data, from the very first stages of their degree,” says Daniel C. Richardson, an experimental psychologist at University College London.

He and his colleagues have used such tools in their lectures, seminars and lab modules. Their students generated their own hypotheses, used the tools to create experiments on what makes people donate money to charity, and collected their own data.

Each experiment began with participants being told to imagine that they had just won £100. They then saw one of two slightly different appeals for a charity, which could be an image, text or even a movie. Participants were then asked how much money they would want to donate.

Crucially, in each experiment there was one small difference between the two appeals, allowing the students to test a range of hypotheses relating to pro-social behaviour.

One of the most interesting findings was that, in an advert for a domestic abuse charity, referring to someone as a “survivor” rather than a “victim” increased donations by more than 25%.

Students made posters of their results, and two of them were accepted to the British Psychological Society’s social psychology conference and won awards, even though they were first-year students competing against graduate students and established researchers.

As these students move on to careers in academia or industry, initiatives like this should help embed a culture of digital experimentation and evidence-based decision-making in a wide range of industries, including marketing, advertising, recruitment, PR and policy making.

Large, Robust Study Sizes

A key aspect of reliable science is having a sample size large enough that you can be confident in the results generated. This is an area in which digital experiments can really help. The speed, scale and reach of online research can be tremendous.

The large sample size made it possible for Richardson’s students to produce award-winning studies. “The students ran around 30 different experiments, crowd-sourcing data from over 1200 people, across more than 20 countries,” says Richardson. “I was astonished by this – that’s more data than my lab by itself would typically collect in a year. What was also impressive was the variety of ideas and theories that the students tested.”

If you don’t have a cohort of students willing to leverage their social networks, then pairing an online experiment platform with a recruitment service like Prolific makes it possible to get thousands of participants to take part in a study in a day. For small studies of 100 participants, the main benefit is the time saving: it might take a lab six weeks to test 100 participants, but only an hour to do so online. The more important revelation, though, is that you can also test 1,000 or 10,000 participants online in not much more time. Sample sizes of these magnitudes would be near impossible in a lab-based setting. The result is that researchers can ask and answer questions at pace, and build each new study on the firm foundations of properly powered studies.
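To make “properly powered” concrete: the sample size a two-group comparison needs can be estimated before recruiting anyone. A minimal sketch, using the standard normal-approximation formula (it slightly underestimates the exact t-test requirement, and is not taken from any of the studies discussed here):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed per group for a two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is Cohen's d (standardised difference between group means).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "small" effect (d = 0.2) needs roughly 400 people per group:
# trivial to recruit online, but weeks of work in a lab.
print(n_per_group(0.2))  # 393 with these defaults
print(n_per_group(0.5))  # a "medium" effect needs far fewer
```

This is why online recruitment changes what questions are answerable: small effects that a lab of undergraduates could never reliably detect become routine to test.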

The Intention-Action Gap

David Ogilvy famously said, “People don’t think what they feel, don’t say what they think and don’t do what they say.” Behavioural research allows you to measure what people actually do, rather than what they say. A new generation of behavioural science consultancies is going beyond traditional surveys and embracing behavioural experimentation to bridge this gap.

In a revealing example, behavioural change consultancy MoreThanNow wanted to see if messaging tweaks could boost the number of women who want to go into science, technology, engineering or mathematics (STEM) jobs.

In STEM-focused organisations, women hold only 5% of board positions, with little evidence of a shift on the horizon. MoreThanNow wanted to address the disparity in application rates for technology careers by focusing on the effectiveness of recruitment messages, and to understand not just what people think, but also how they actually behave in a recruitment situation.

Using a large sample of 18 to 23-year-olds, they tested different recruitment adverts and messages using a survey, but also gave participants the option of leaving the survey to explore current technology graduate roles on a popular recruitment website, to see whether any of the messages changed behaviour.

By simply adding a button to the end of the survey, MoreThanNow attached a behavioural measure to each job advert, capturing the gap between self-reported intentions and action.

Three types of message were tested: prosocial ones focusing on helping people and solving social problems; self-interest ones that talked about increasing personal reward and career opportunities; and communal ones talking about working in a close community and being supported by a tight-knit team.

The survey part of the experiment showed that, in line with most self-report research on this topic, women responded to pro-social messages and men to those of self-interest. The behavioural measure, in contrast, showed a different result: there was no statistical difference between the genders in response to pro-social or self-interest messaging. Instead, men responded to the communal message (“join a community that works together”) far more than women did.
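A behavioural measure like this click-through is typically compared across groups with a two-proportion z-test. A minimal sketch with invented numbers (not MoreThanNow’s data):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)            # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))               # two-tailed p-value

# Hypothetical data: 90 of 300 women vs 84 of 300 men clicked through
# to the jobs site after seeing a pro-social message.
p = two_proportion_z(90, 300, 84, 300)
print(round(p, 3))  # well above 0.05: no detectable gender difference
```

The same test applied to the self-reported survey answers and to the click data is what reveals an intention-action gap: the two measures can disagree even on identical samples.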

By using behavioural insights rather than survey data, MoreThanNow have created adverts that doubled the number of women exploring technology careers. These findings underline how self-report surveys can lead us to draw false conclusions if they aren’t backed up by experimentation that tests the reality of what is said (Women in Technology – A Behavioural Approach, 2019).


Refining Advisory Services with Context-Specific Experimentation

When it comes to human behaviour, the rich pageant of our cultures, knowledge and languages can influence what we do and how we act. It may be that few theories replicate seamlessly across people, industries, contexts, personalities and emotional states; instead there are subtle, location-specific differences. This is where online experimentation can really come into its own.

For example, the Behavioural Science Unit of public relations firm Hill + Knowlton Strategies has tried to understand how changes to adverts for cold and flu remedies affect whether people buy certain cold and flu products from a particular healthcare firm.

Focus groups and interviews had proved time-consuming and generated insufficient insights to act on confidently. So the firm supplemented this work with digital experimentation, creating a virtual walkthrough of a realistic, cluttered pharmacy. Participants could choose where to go with the click of a button, pick which shelves to look at, add products to their basket and interact with digital pharmacists, a bit like being in a computer game.

H+K used a between-subject design, in which different people test each variation of the messaging, so that each person is exposed to only a single condition. An advert or a series of adverts was placed within the pharmacy, but otherwise conditions were identical. The messages on the adverts differed in which behavioural insights they addressed.
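At its core, a between-subject design comes down to randomly assigning each participant to exactly one condition, ideally keeping group sizes balanced. A minimal sketch (the condition names are illustrative, not H+K’s):

```python
import random

def assign_balanced(participant_ids, conditions, seed=42):
    """Assign each participant to exactly one condition, keeping group
    sizes as equal as possible (shuffled, repeated condition slots)."""
    rng = random.Random(seed)
    # Repeat the condition list to cover everyone, trim, then shuffle.
    n_blocks = -(-len(participant_ids) // len(conditions))  # ceiling division
    slots = (list(conditions) * n_blocks)[: len(participant_ids)]
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))

# Hypothetical: 9 participants split across three advert variants.
groups = assign_balanced(list(range(9)), ["advert_a", "advert_b", "control"])
# Each participant sees one and only one condition; sizes come out 3/3/3.
```

Online experiment platforms usually do this assignment automatically, but the logic is worth seeing: balanced randomisation is what licenses the claim that any difference between groups was caused by the advert.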

Around half of participants didn’t notice the messages, which provides evidence for the validity of the simulation: in real life, people tend not to consciously attend to such material.

The best-performing message increased purchases by around 10% compared with the worst-performing message. There was also evidence of variation in the effectiveness of different messages in different markets, which means the healthcare firm can now adapt its messaging to different territories.

While the behavioural literature can tell consultancies which levers are likely to influence behaviour, behavioural experimentation can go further and allow companies to optimise interventions for maximum impact in their specific context.

The Promise of Impact

A wide range of challenges facing society have behavioural solutions: climate change, tax evasion and obesity, to name a few. Using behavioural insights to inform policy will allow the behavioural sciences to deliver on the promise of improving lives.

For example, the University of Oxford’s Nuffield Department of Primary Care Health Sciences has used an online tool to design a virtual supermarket to test how people respond to tweaks to food labelling. The fundamental premise is that if we can change what people buy, we can change what people eat. And if we can change what people eat, we can improve diets and reduce lifestyle diseases.

“It would be very challenging, if not impossible, to run these studies in real online supermarkets,” says team member Dimitrios Koutoukidis. “The experimental supermarket platform allows us to test and optimise different interventions quickly and relatively cheaply.”

Until recently, any changes to messaging were tested in focus groups, if at all, and so were likely to capture only self-reported intentions, not actual behaviour.

Containing all the features of a normal online supermarket, such as browsing for products, adding items to a basket and checking out, the specially designed online supermarket also includes features such as shopping lists and basket budgets. Behind the scenes, researchers can change adverts, add taxes and rebates, change the order in which lists of products appear, highlight nutritional information and change food labelling. They can also offer swaps for alternative items that might be a healthier or differently priced option.

The supermarket has revealed that fiscal policies that tax foods or drinks may be an effective means of altering food purchasing, with a 20% rate being enough to significantly alter purchases of breakfast cereals and soft drinks (Zizzo et al., 2016).

The supermarket has also revealed that listing foods so that those with less saturated fat are at the top reduces the total amount of saturated fat in the shopping basket at checkout (Koutoukidis et al., 2019).
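Behind the scenes, interventions like these are simple transformations over a product list: apply a price change to one category, or re-rank the listing by a nutrient. A minimal sketch of the mechanics (the product data is invented, not the Oxford team’s):

```python
def apply_tax(products, category, rate=0.20):
    """Return a copy of products with prices in `category` raised by `rate`."""
    return [
        {**p, "price": round(p["price"] * (1 + rate), 2)}
        if p["category"] == category else p
        for p in products
    ]

def rank_by_saturated_fat(products):
    """List products with the least saturated fat first."""
    return sorted(products, key=lambda p: p["sat_fat_g"])

# Invented example catalogue
products = [
    {"name": "cola",     "category": "soft drinks", "price": 1.00, "sat_fat_g": 0.0},
    {"name": "granola",  "category": "cereals",     "price": 2.50, "sat_fat_g": 4.1},
    {"name": "porridge", "category": "cereals",     "price": 1.20, "sat_fat_g": 1.5},
]

taxed = apply_tax(products, "soft drinks")  # cola now costs 1.20
ordered = rank_by_saturated_fat(products)   # porridge listed above granola
```

The experimental question is then whether baskets checked out under the transformed catalogue differ from baskets under the original one, with participants randomised between the two.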

This exceptional degree of experimental control gives tools like this great power to inform public policy and ultimately improve lives.


All these case studies demonstrate how online tools, such as the Gorilla Experiment Builder and Testable, are opening a new frontier for behavioural science. The ability to gain behavioural insights from experiments with large sample sizes in a short space of time eclipses what can be done in the lab and opens up new opportunities.

Online tools have already been used to investigate a wide range of topics, but they certainly haven’t reached their limits. As Bill Gates once said, “If you give people tools, and they use their natural abilities and their curiosity, they will develop things in ways that will surprise you very much beyond what you might have expected.”

Getting the science of behavioural economics right will have profound results. Academia has the opportunity to banish the ghost of the replication crisis and shift the evidence base back onto a firmer footing. Students can equip themselves for a future that will benefit from digital experimentation in a wide range of industries. Industry can use the insights gained to make better products and services that improve lives. And finally, policy makers can create evidence-informed regulations that improve society. Altogether, these will combine to improve our health, wealth, happiness and education.


Jo Evershed

Jo is the CEO and co-founder of Cauldron and Gorilla. Her mission is to provide behavioural scientists with the tools needed to improve the scale and impact of the evidence-based interventions that benefit society.

List of References

Anwyl-Irvine, A.L., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J.K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407.

Bridges, D., Pitiot, A., MacAskill, M. R., & Peirce, J. W. (2020). The timing mega-study: comparing a range of experiment generators, both lab-based and online. PsyArXiv.

Camerer, C., Dreber, A., Forsell, E., Ho, T., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436.

Chang, A.C., & Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say “Usually Not”. Finance and Economics Discussion Series 2015-083, Board of Governors of the Federal Reserve System.

Huber, J., Payne, J. W., & Puto, C. (1982). Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis. Journal of Consumer Research, 9(1), 90–98.

Koutoukidis, D.A., Jebb, S.A., Ordóñez-Mena, J.M., Noreik, M., Tsiountsioura, M., Kennedy, S., Payne-Riches, S., Aveyard, P., & Piernas, C. (2019). Prominent positioning and food swaps are effective interventions to reduce the saturated fat content of the shopping basket in an experimental online supermarket: a randomized controlled trial. International Journal of Behavioral Nutrition and Physical Activity, 16(50).

Lunn, P.D., Timmons, S., Barjaková, M., Belton, C.A., Julienne, H., & Lavin, C. (2020). Motivating social distancing during the Covid-19 pandemic: An online experiment. PsyArXiv.

Reimers, S., & Stewart, N. (2015). Presentation and response timing accuracy in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods, 47, 309–327.

Rodd, J. (2019, February 27). How to Maintain Data Quality When You Can’t See Your Participants. Association for Psychological Science. Retrieved from:

Timmons, S. (2020, April 3). Some results from our first #COVID19 experiment! Twitter @_shanetimmons. Retrieved from

Women in Technology – A Behavioural Approach. (2019, December 9). MoreThanNow. Retrieved from

Zloteanu, M., Harvey, N., Tuckett, D., & Livan, G. (2018). Digital Identity: The effect of trust and reputation information on user judgement in the Sharing Economy. PLoS ONE, 13(12).

Zizzo, D., Parravano, M., Nakamura, R., Forwood, S., & Suhrcke, M.E. (2016). The impact of taxation and signposting on diet: an online field study with breakfast cereals and soft drinks. Centre for Health Economics Research Paper. Retrieved from