How Online Behavioural Experiments Are Opening New Opportunities for Behavioural Economists

Picking a wine in a restaurant can be a nightmare, even if there are just three to choose from. Do you opt for the cheapest one, which sounds dependable? Or the mid-priced one that apparently goes with your food better? Or the most expensive one, even though it has a less appealing description than the mid-priced one? Chances are you would opt for the middle wine, even though it costs more than you wanted to pay. If so, you have experienced the decoy effect: the presence of the expensive option made you more likely to opt for the mid-priced wine than the cheap one.

The decoy effect is a classic piece of behavioural economics, backed up by numerous studies stretching back to the 1980s, and is used as a sales and marketing technique by countless companies (Huber, Payne & Puto, 1982). And yet, for a while, it was on the ropes after attempts to replicate some well-known decoy studies failed.

Being able to replicate and validate the results of research is the cornerstone of science. The replication crisis that has torn through many scientific fields has seen key studies overturned after their results couldn't be repeated, and many others fall under the long shadow of suspicion (Camerer et al., 2016; Chang & Li, 2015).

Scientists have blamed the replication crisis on numerous factors, including small samples and unrepresentative samples. Firstly, collecting data face-to-face in a lab can take days, if not months. Consequently, labs have often collected the smallest sample capable of detecting the studied effect, and that makes their results weaker than they could be. Secondly, if you are testing in a lab, the easiest sample to recruit is your department's undergraduate students; indeed, many departments require their students to take part in experiments.

However, undergraduate behavioural science students do not represent the full diversity of the wider population. Consequently, we know a lot about how behavioural science students behave, but there's no guarantee that this will transfer to a general population.

To increase reproducibility, behavioural scientists have sought approaches that allow them to increase the size and diversity of their samples. One way to do this is to take research online, which allows you to reach a large and diverse sample quickly and easily (Anwyl-Irvine et al., 2020). If you can get your behavioural task online, a wide range of market research agencies and participant recruitment services will provide participants for a fee. With these, you can get a representative sample of a thousand participants in a matter of hours.

Nevertheless, these gains come at a cost. Running experiments online requires researchers to give up control and accept a higher degree of uncertainty about the identity of participants and the testing conditions (Rodd, 2019). Additionally, while the timing accuracy provided by browsers as of 2015 is good enough for a wide range of behavioural research (Reimers & Stewart, 2015), it's not as good as the timing accuracy of the installed software typically used in the lab (Bridges et al., 2020).

By taking research online, behavioural researchers can trade a small amount of control and precision for a huge increase in experimental power, more representative samples and a dramatic increase in the pace of research. Online methods can then be used in conjunction with other research methods (natural experiments, field studies, focus groups, surveys, etc.) to provide a robust evidence base.

Rapid Deployment, Rapid Results

This increase in speed means that research can be done in response to current events and still give reliable findings that could be used to inform policy.

For example, experiments have already shown how subtle differences in messaging about the new coronavirus could influence how people respond to lockdown guidance, and thus the rate of virus transmission. Shane Timmons at the Economic and Social Research Institute in Dublin, Ireland, and his colleagues discovered two key things by showing people different posters in an online experiment.

They found that highlighting risks to people who are particularly vulnerable to covid-19, such as the elderly and healthcare workers, and focusing on the exponential rate of transmission made people more cautious about "marginal behaviours" related to social distancing, such as meeting up with friends outdoors, visiting parents, or letting kids play together (Lunn et al., 2020).

This suggests that there are better ways to promote social distancing than current official advice, said Timmons on Twitter (2020).

Timmons's study went from conception to pre-print in a matter of weeks, which would not have been possible with a lab-based study.


Ecological Validity

Mircea Zloteanu and his colleagues have been running experiments looking at people's online behaviour on sharing economy platforms. His team created a simulated AirBnB-style website to measure how people make decisions about hosts who are given different reviews or star ratings (Zloteanu et al., 2018). They found that participants overweighted social information and underweighted non-social information, drawing attention to a cognitive bias that can lead to poorer decision-making on a sharing economy platform.

As more of our lives happen online (social media, banking, shopping, dating), online environments open up as ecologically valid settings for psychological research. Creating facsimiles of the websites that we use, and using them to study behaviour, gives us the experimental control we need to understand how people behave in the digital world.

Embedding Digital Experimentation in Industry

Many companies have struggled to embrace digital experimentation because of the wide range of specialist skills needed to do it successfully. Until recently you'd need a behavioural science graduate, a programmer and potentially also a data scientist. A key to changing this is ensuring that the next generation of behavioural scientists graduates with the skills and experience to create and analyse digital experiments independently.

"[Online experiment builders have] allowed our students to follow their scientific curiosity, and be rewarded with real data, from the very first stages of their degree," says Daniel C. Richardson, an experimental psychologist at University College London.

He and his colleagues have used such tools in their lectures, seminars and lab modules. Their students generated their own hypotheses, used the tools to create experiments on what makes people donate money to charity, and collected their own data.

Each experiment began with participants being told to imagine that they had just won £100. Then came one of two slightly different appeals for a charity, which could be an image, text or even a movie. Participants were then asked how much money they would want to donate.

Crucially, there was only ever a small difference between the two appeals, allowing the students to test a range of hypotheses relating to pro-social behaviour.

One of the most interesting findings was that in an advert for a domestic abuse charity, referring to someone as a "survivor" rather than a "victim" increased donations by more than 25%.

Students made posters of their results, and two of them were accepted to the British Psychological Society's social psychology conference and won awards, even though they were first-year students competing against graduate students and established researchers.

As these students move on to careers in academia or industry, initiatives like this should help embed a culture of digital experimentation and evidence-based decision-making in a wide range of industries, including marketing, advertising, recruitment, PR and policy making.

Large, Robust Study Sizes

A key aspect of reliable science is having a large enough sample size that you can be confident in whatever results are generated. This is an area in which digital experiments can really help. The speed, scale and reach of online research can be tremendous.

The large sample size made it possible for Richardson's students to produce award-winning studies. "The students ran around 30 different experiments, crowd-sourcing data from over 1200 people, across more than 20 countries," says Richardson. "I was astonished by this – that's more data than my lab by itself would typically collect in a year. What was also impressive was the variety of ideas and theories that the students tested."

If you don't have a cohort of students willing to leverage their social networks, then pairing an online experiment platform with a recruitment service like Prolific makes it possible to get thousands of participants to take part in a study in a day. For small studies of 100 participants, the main benefit is the time saving: it might take a lab six weeks to test 100 participants, but only an hour to do so online. The more important revelation is that you can also test 1,000 or 10,000 participants online in not much more time. Sample sizes of these magnitudes would be near impossible in a lab-based setting. The result is that researchers can ask and answer questions at pace, and build each new study on the firm foundations of properly powered studies.
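To make "properly powered" concrete, here is a minimal sketch of the standard normal-approximation formula for how many participants a two-group comparison needs; the effect sizes used are hypothetical and purely illustrative.

```python
# Sketch: approximate participants needed per group to detect an effect of
# size d (Cohen's d) in a two-group comparison, using the standard normal
# approximation to the two-sample t-test. Effect sizes are illustrative.
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size at the given alpha and power."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

for d in (0.8, 0.5, 0.2):  # conventionally: large, medium, small effects
    print(f"d = {d}: ~{round(n_per_group(d))} participants per group")
```

Roughly speaking, 25 participants per group will only reliably detect a large effect (d around 0.8), while a small effect (d around 0.2) needs around 400 per group, exactly the scale that online recruitment makes routine.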

The Intention-Action Gap

David Ogilvy famously said, "People don't think what they feel, don't say what they think and don't do what they say." Behavioural research allows you to measure what people actually do, rather than what they say. A new generation of behavioural science consultancies is going beyond traditional surveys and embracing behavioural experimentation to bridge this gap.

In a revealing example, behavioural change consultancy MoreThanNow wanted to see if messaging tweaks could boost the number of women who want to go into science, technology, engineering or mathematics (STEM) jobs.

In STEM-focused organisations, women hold only 5% of board positions, with little evidence of a shift on the horizon. MoreThanNow wanted to address the disparity in application rates for technology careers by focusing on the effectiveness of recruitment messages, and to understand not just what people think, but also how they actually behave in a recruitment situation.

Using a large sample of 18- to 23-year-olds, they tested different recruitment adverts and messages using a survey, but also gave participants the option of leaving the survey to explore current technology graduate roles on a popular recruitment website, to see whether any of the messages changed behaviour.

By simply adding a button to the end of a survey, MoreThanNow turned each job advert into a behavioural test, capturing the gap between self-reported intentions and action.

Three types of message were tested: prosocial ones focusing on helping people and solving social problems; self-interest ones that talked about increasing personal reward and career opportunities; and communal ones talking about working in a close community and being supported by a tight-knit team.

The survey part of the experiment showed that, in line with most self-report research on this topic, women responded to prosocial messages and men to those of self-interest. In contrast, the behavioural measure showed a different result. There was no statistically significant difference between genders in response to prosocial or self-interest messaging. Instead, men responded to the communal message "join a community that works together" far more than women did.
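A sketch of how such a behavioural click-through comparison might be analysed: a two-proportion z-test, written here with only the Python standard library. The counts below are invented for illustration; they are not MoreThanNow's data.

```python
# Sketch: a two-proportion z-test for click-through rates, using only the
# Python standard library. The counts below are invented for illustration.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 60 of 400 women vs. 57 of 400 men clicked through to the jobs site.
z, p = two_proportion_z(60, 400, 57, 400)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With similar click-through rates in two equally sized groups, the test returns a small z and a large p-value, i.e. no detectable difference; a genuinely gendered response would show up as a large absolute z.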

By using behavioural insights, rather than survey data, MoreThanNow created adverts that doubled the number of women exploring technology careers. These findings underline how self-report surveys could lead us to draw false conclusions if they aren't backed up by experimentation that tests the reality of what is said (Women in Technology – A Behavioural Approach, 2019).


Refining Advisory Services with Context-Specific Experimentation

When it comes to human behaviour, the rich pageant of our cultures, knowledge and languages can influence what we do or how we act. It may be that there aren't many theories that seamlessly replicate across people, industries, contexts, personalities and emotional states, but rather there are subtle location-specific differences. This is where online experimentation can really come into its own.

For example, the Behavioural Science Unit of public relations firm Hill + Knowlton Strategies has tried to understand how changes to adverts for cold and flu remedies affect whether people buy a particular healthcare firm's products.

Focus groups and interviews had proved time-consuming and generated insufficient insights to act on confidently. So the firm supplemented this work with digital experimentation, creating a virtual walkthrough of a realistic, cluttered pharmacy. Participants could choose where to go with the click of a button, decide which shelves to look at, and choose products to add to their basket. They could also interact with digital pharmacists, a bit like being in a computer game.

H+K used a between-subject design, in which different people test each variation of messaging, so that each person is exposed to only a single situation. An advert or a series of adverts was placed within the pharmacy, but otherwise conditions were identical. The messages on the adverts differed in which behavioural insights they addressed.
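In code, the core of a between-subject design is just randomisation: each participant is allocated to exactly one condition, independently of who they are. A minimal sketch, with condition labels that are hypothetical rather than taken from the study:

```python
# Sketch: randomly assigning participants to conditions in a between-subject
# design. The condition labels are hypothetical, not taken from the study.
import random

CONDITIONS = ["control", "advert_a", "advert_b"]

def assign(participant_ids, seed=0):
    """Map each participant to exactly one condition, keeping the groups
    as balanced as possible while the assignment order stays random."""
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}

groups = assign(range(9))
print(groups)
```

Shuffling first and then cycling through the conditions keeps group sizes balanced while leaving the allocation itself random, which is what makes any later difference between conditions attributable to the messaging rather than to who saw it.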

Around half of participants didn't notice the messages, which provides evidence for the validity of the simulation: in real life, people tend not to consciously attend to such material.

The best-performing message increased purchases by around 10% compared with the worst-performing message. There was also evidence of variation in the effectiveness of different messages in different markets, which means the healthcare firm can now adapt its messaging to different territories.

While the behavioural literature can inform consultancies of the levers likely to influence behaviour, behavioural experimentation can go further and allow companies to optimise interventions for maximum impact in their specific context.

The Promise of Impact

Many of the challenges facing society have behavioural solutions: climate change, tax evasion and obesity, to name a few. Using behavioural insights to inform policy will allow the behavioural sciences to deliver on the promise of improving lives.

For example, the University of Oxford's Nuffield Department of Primary Care Health Sciences has used an online tool to design a virtual supermarket to test how people respond to tweaks to food labelling. The fundamental premise is that if we can change what people buy, we can change what people eat. And if we can change what people eat, we can improve diets and reduce lifestyle diseases.

"It would be very challenging, if not impossible, to run these studies in real online supermarkets," says team member Dimitrios Koutoukidis. "The experimental supermarket platform allows us to test and optimise different interventions quickly and relatively cheaply."

Until recently, any changes to messaging were tested in focus groups, if at all, which can only discern self-reported intentions, not the reality of a situation.

Containing all the features of a normal online supermarket, such as browsing for products, adding items to a basket and checking out, the specially designed online supermarket also contains features such as shopping lists and basket budgets. Behind the scenes, researchers can change adverts, add taxes and rebates, change the order in which lists of products appear, highlight nutritional information and change food labelling. They can also offer swaps for alternative items that might be a healthier or differently priced option.

The supermarket has revealed that fiscal policies that tax foods or drinks may be an effective means of altering food purchasing, with a 20% rate being enough to significantly alter purchases of breakfast cereals and soft drinks (Zizzo et al., 2016).

The supermarket has also revealed that listing foods so that those with less saturated fat appear at the top reduces the total amount of saturated fat in the shopping basket at checkout (Koutoukidis et al., 2019).

This exceptional degree of experimental control gives tools like this great power to inform public policy and ultimately improve lives.


All these case studies demonstrate how online tools, such as the Gorilla Experiment Builder and Testable, are opening a new frontier for behavioural science. The ability to gain behavioural insights from experiments with large sample sizes in a short space of time eclipses what can be done in the lab and opens up new opportunities.

Online tools have already been used to investigate a wide range of topics, but they certainly haven't reached their limits. As Bill Gates once said, "If you give people tools, and they use their natural abilities and their curiosity, they will develop things in ways that will surprise you very much beyond what you might have expected."

Getting the science of behavioural economics right will have profound results. Academia has the opportunity to banish the ghost of the replication crisis and shift the evidence base back onto a firmer footing. Students can equip themselves for a future that will benefit from digital experimentation in a wide range of industries. Industry can use the insights gained to make better products and services that improve lives. And finally, policy makers can create evidence-informed regulations that improve society. Altogether these will combine to improve our health, wealth, happiness and education.


Jo Evershed

Jo is the CEO and co-founder of Cauldron and Gorilla. Her mission is to provide behavioural scientists with the tools needed to improve the scale and impact of the evidence-based interventions that benefit society.

List of References

Anwyl-Irvine, A.L., Massonnié, J., Flitton, A., Kirkham, N., & Evershed, J.K. (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407.

Bridges, D., Pitiot, A., MacAskill, M.R., & Peirce, J.W. (2020). The timing mega-study: comparing a range of experiment generators, both lab-based and online. PsyArXiv.

Camerer, C., Dreber, A., Forsell, E., Ho, T., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436.

Chang, A.C., & Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say "Usually Not". Finance and Economics Discussion Series 2015-083, Board of Governors of the Federal Reserve System.

Huber, J., Payne, J.W., & Puto, C. (1982). Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis. Journal of Consumer Research, 9(1), 90–98.

Koutoukidis, D.A., Jebb, S.A., Ordóñez-Mena, J.M., Noreik, M., Tsiountsioura, M., Kennedy, S., Payne-Riches, S., Aveyard, P., & Piernas, C. (2019). Prominent positioning and food swaps are effective interventions to reduce the saturated fat content of the shopping basket in an experimental online supermarket: a randomized controlled trial. International Journal of Behavioral Nutrition and Physical Activity, 16(50).

Lunn, P.D., Timmons, S., Barjaková, M., Belton, C.A., Julienne, H., & Lavin, C. (2020). Motivating social distancing during the Covid-19 pandemic: An online experiment. PsyArXiv.

Reimers, S., & Stewart, N. (2015). Presentation and response timing accuracy in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods, 47, 309–327.

Rodd, J. (2019, February 27). How to Maintain Data Quality When You Can't See Your Participants. Association for Psychological Science.

Timmons, S. (2020, April 3). Some results from our first #COVID19 experiment! Twitter @_shanetimmons.

Women in Technology – A Behavioural Approach. (2019, December 9). MoreThanNow.

Zizzo, D., Parravano, M., Nakamura, R., Forwood, S., & Suhrcke, M.E. (2016). The impact of taxation and signposting on diet: an online field study with breakfast cereals and soft drinks. Centre for Health Economics Research Paper.

Zloteanu, M., Harvey, N., Tuckett, D., & Livan, G. (2018). Digital Identity: The effect of trust and reputation information on user judgement in the Sharing Economy. PLoS ONE, 13(12).