Sage Journals: Discover world-class research

Abstract

Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system’s task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.
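The proposed cost-benefit arbitration can be illustrated with a minimal sketch. It assumes, hypothetically, three things. First, each system's benefit equals the reward magnitude multiplied by its accuracy. Second, model-based control carries a fixed effort cost. Third, the choice between systems follows a softmax over net values. The function names, parameters, and numbers are illustrative only and are not the authors' fitted model.

```python
import math


def net_value(reward_scale: float, accuracy: float, effort_cost: float) -> float:
    """Net value of a controller: reward-weighted accuracy minus effort cost.

    The linear form is an illustrative assumption, not the paper's model.
    """
    return reward_scale * accuracy - effort_cost


def p_model_based(
    acc_mb: float,
    acc_mf: float,
    reward_scale: float,
    cost_mb: float,
    cost_mf: float = 0.0,
    temperature: float = 1.0,
) -> float:
    """Probability of allocating control to the model-based system.

    A two-option softmax over net values reduces to a logistic function of
    their difference. All parameters here are hypothetical.
    """
    v_mb = net_value(reward_scale, acc_mb, cost_mb)
    v_mf = net_value(reward_scale, acc_mf, cost_mf)
    return 1.0 / (1.0 + math.exp(-(v_mb - v_mf) / temperature))


if __name__ == "__main__":
    # Model-based control is more accurate, so amplifying reward raises P(MB).
    print(round(p_model_based(0.8, 0.6, reward_scale=1.0, cost_mb=0.1), 2))  # 0.52
    print(round(p_model_based(0.8, 0.6, reward_scale=5.0, cost_mb=0.1), 2))  # 0.71
    # Equal accuracy means amplifying reward leaves P(MB) unchanged.
    print(round(p_model_based(0.7, 0.7, reward_scale=1.0, cost_mb=0.1), 2))  # 0.48
    print(round(p_model_based(0.7, 0.7, reward_scale=5.0, cost_mb=0.1), 2))  # 0.48
```

In this sketch the reward multiplier scales both systems' benefits equally. It therefore cancels when their accuracies match, and only the effort cost then drives the choice. That mirrors the insensitivity to reward amplification described above.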

Keywords

reinforcement learning, decision making, cognitive control, open data, open materials




Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems
