Sage Journals: Discover world-class research

Abstract

Reusing knowledge obtained in related but different tasks to accelerate reinforcement learning (RL) has attracted growing attention, and the transfer of expert knowledge is the main source of the benefit. Rather than acquiring that knowledge by RL training on the source tasks, this paper proposes to transfer the knowledge contained in human demonstrations of the source tasks. Three specific forms of knowledge are mined from the demonstration trajectories and reused in the target task to shape RL. All three depend on the similarity between states of the different tasks, which is measured by Euclidean distance after applying human-supplied inter-task mappings. In more detail, the following three quantities are each used to initialize the state-action value function: the similarity between the target state and the most similar state in the source samples; the proportion of each action among the k nearest neighbours (k-NN) of the target state in the source samples; and the proportion of each action among the source samples that lie within a constant similarity of the target state. Simulation experiments on mountain car problems of different difficulties and different dimensions suggest that all three shaping methods noticeably speed up RL. The comparison also shows that the latter two methods are more efficient and more robust to the quality of the human demonstrations, because they take more of the source samples' information into account.
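The three initialization schemes described above can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' implementation. It assumes the source samples have already been carried into the target state space by the inter-task mapping. The function names, the `scale` factor, the `exp(-distance)` similarity score, and the choice to credit only the nearest sample's action in the first scheme are all illustrative assumptions.

```python
import math
from collections import Counter

# Each source sample is a (state, action) pair taken from a human demonstration.
# Assumption: states are already mapped into the target task's state space.

def nearest_similarity_init(target_state, samples, actions, scale=1.0):
    """Scheme 1: bias the action of the single most similar source state."""
    nearest_state, nearest_action = min(
        samples, key=lambda sa: math.dist(target_state, sa[0])
    )
    # Hypothetical similarity score: 1 at distance 0, decaying with distance.
    similarity = math.exp(-math.dist(target_state, nearest_state))
    return {a: (scale * similarity if a == nearest_action else 0.0) for a in actions}

def knn_proportion_init(target_state, samples, actions, k=3, scale=1.0):
    """Scheme 2: proportion of each action among the k nearest source samples."""
    neighbours = sorted(samples, key=lambda sa: math.dist(target_state, sa[0]))[:k]
    counts = Counter(action for _, action in neighbours)
    return {a: scale * counts[a] / len(neighbours) for a in actions}

def radius_proportion_init(target_state, samples, actions, radius=0.5, scale=1.0):
    """Scheme 3: proportion of each action among samples within a fixed distance."""
    inside = [action for state, action in samples
              if math.dist(target_state, state) <= radius]
    if not inside:
        # No demonstration is close enough, so leave the values unshaped.
        return {a: 0.0 for a in actions}
    counts = Counter(inside)
    return {a: scale * counts[a] / len(inside) for a in actions}

# Toy data: two-dimensional states with two possible actions.
samples = [((0.0, 0.0), "left"), ((0.1, 0.0), "left"), ((1.0, 1.0), "right")]
actions = ["left", "right"]
q_init = knn_proportion_init((0.0, 0.0), samples, actions, k=3)
```

Each function returns initial Q-values for one target state. In a tabular learner these would replace the usual zero initialization before training begins.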

Keywords

Transfer, human demonstrations, shaping, reinforcement learning



Shaping in reinforcement learning by knowledge transferred from human-demonstrations of a simple similar task
