Sage Journals: Discover world-class research

Abstract

General value functions (GVFs) in the reinforcement learning (RL) literature are long-term predictive summaries of the outcomes of agents following specific policies in the environment. Affordances, understood as perceived action possibilities with a specific valence, can be cast as predictions of policy-relative goodness and therefore modeled as GVFs. A systematic explication of this connection shows that GVFs, and especially their deep-learning embodiments, (1) realize affordance prediction as a form of direct perception, (2) illuminate the fundamental connection between action and perception in affordances, and (3) offer a scalable way to learn affordances using RL methods. Through an extensive review of existing literature on GVF applications and representative affordance research in robotics, we demonstrate that GVFs provide the right framework for learning affordances in real-world applications. In addition, we highlight several new avenues of research opened up by the perspective of “affordance as GVF,” including the use of GVFs to orchestrate complex behaviors.
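To make the notion of a GVF as a policy-relative prediction more concrete, here is a minimal sketch, not the authors' method. It assumes the simplest possible setting: a tabular, on-policy TD(0) learner in a hypothetical toy corridor where a fixed "move right" policy eventually brings the agent into contact with a wall. The GVF question is defined by a cumulant (1 on contact, else 0) and a continuation (0.9, dropping to 0 at contact). Its answer v(s) therefore estimates how imminent contact is under that policy, which is an affordance-like prediction. The names `TabularGVF`, `corridor_step`, and `train`, as well as the corridor itself, are illustrative assumptions. The deep and off-policy GVF learners discussed in the literature are considerably more involved.

```python
import random


class TabularGVF:
    """Hypothetical tabular GVF learned with on-policy TD(0).

    The GVF "question" is a cumulant c and a continuation gamma; the
    "answer" v(s) estimates the expected discounted sum of future
    cumulants while the (fixed) behavior policy is followed.
    """

    def __init__(self, num_states, step_size=0.1):
        self.values = [0.0] * num_states
        self.step_size = step_size

    def update(self, state, cumulant, continuation, next_state):
        # TD(0) target: cumulant plus the continued estimate of the next state.
        target = cumulant + continuation * self.values[next_state]
        td_error = target - self.values[state]
        self.values[state] += self.step_size * td_error
        return td_error


def corridor_step(state, wall=4):
    """Toy corridor under an assumed fixed 'move right' policy."""
    return min(state + 1, wall)


def train(num_steps=20000, wall=4, gamma=0.9, seed=0):
    rng = random.Random(seed)
    gvf = TabularGVF(num_states=wall + 1)
    state = rng.randrange(wall)  # start in a random non-wall cell
    for _ in range(num_steps):
        next_state = corridor_step(state, wall)
        hit_wall = next_state == wall
        cumulant = 1.0 if hit_wall else 0.0        # signal of interest: contact
        continuation = 0.0 if hit_wall else gamma  # prediction terminates at contact
        gvf.update(state, cumulant, continuation, next_state)
        # After contact, restart from a random non-wall cell.
        state = rng.randrange(wall) if hit_wall else next_state
    return gvf


if __name__ == "__main__":
    gvf = train()
    print([round(v, 3) for v in gvf.values[:4]])
```

In this deterministic toy, the learned values for cells 0 to 3 should approach gamma^(3 - s), that is 0.729, 0.81, 0.9, and 1.0. The prediction grows as the wall gets closer. Swapping in a different cumulant, continuation, or policy changes the question being asked without changing the learning rule. This separation is what lets many GVFs be learned in parallel.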

Keywords

Affordance; direct perception; general value function; robotics; predictive learning; reinforcement learning

Affordance as general value function: a computational model
