Rethinking the Gold Standard With Multi-armed Bandits: Machine Learning Allocation Algorithms for Experiments

Abstract

In experiments, researchers commonly allocate subjects randomly and equally to the different treatment conditions before the experiment starts. While this approach is intuitive, it means that new information gathered during the experiment is not utilized until after the experiment has ended. Based on methodological approaches from other scientific disciplines such as computer science and medicine, we suggest machine learning algorithms for subject allocation in experiments. Specifically, we discuss a Bayesian multi-armed bandit algorithm for randomized controlled trials and use Monte Carlo simulations to compare its efficiency with randomized controlled trials that have a fixed and balanced subject allocation. Our findings indicate that a randomized allocation based on Bayesian multi-armed bandits is more efficient and ethical in most settings. We develop recommendations for researchers and discuss the limitations of our approach.
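The comparison described above can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes binary outcomes, uniform Beta(1, 1) priors, and Thompson sampling as the Bayesian bandit rule, and it uses the mean number of successful outcomes per experiment as a simple stand-in for the benefit to subjects. All function names and parameter values are hypothetical, and the balanced design is simplified to a round-robin assignment.

```python
import random


def thompson_allocate(successes, failures, rng):
    """Choose an arm by drawing once from each arm's Beta posterior."""
    # Beta(s + 1, f + 1) is the posterior under an assumed uniform prior.
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_trial(true_rates, n_subjects, adaptive, rng):
    """Simulate one experiment and return per-arm success and failure counts."""
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for i in range(n_subjects):
        if adaptive:
            arm = thompson_allocate(successes, failures, rng)
        else:
            # Fixed, balanced allocation (round-robin as a simplification).
            arm = i % k
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def simulate(true_rates, n_subjects=200, n_runs=200, seed=0):
    """Monte Carlo estimate of mean successes per experiment for both designs."""
    rng = random.Random(seed)
    totals = {"balanced": 0, "bandit": 0}
    for _ in range(n_runs):
        for name, adaptive in (("balanced", False), ("bandit", True)):
            successes, _failures = run_trial(true_rates, n_subjects, adaptive, rng)
            totals[name] += sum(successes)
    return {name: total / n_runs for name, total in totals.items()}
```

Calling `simulate([0.3, 0.5, 0.7])` returns the average number of successes under each design for three hypothetical treatments. The sketch covers only the outcome side of the comparison; it says nothing about statistical power, which would need a separate analysis of the estimates obtained from each design.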

Keywords

experiments, randomized controlled trial, multi-armed bandit, exploration versus exploitation, machine learning, ethics in research
