Abstract
Tetris has been an important testbed for research in deep reinforcement learning (DRL). However, most studies of Tetris focus on validation in simulation, and few attempts have been made in real-world environments. In this paper, DRL algorithms are trained in a constructed Tetris simulation environment and then deployed in real-world Tetris experiments. A dynamic timesteps method is integrated into the proximal policy optimization (PPO) algorithm to accelerate training, reaching the goal of the game within 1483 episodes. With the help of multiple recognition and segmented moving techniques, the robotic arm plays real-world Tetris accurately and robustly. The effectiveness of the developed system is experimentally verified; the results show that the proposed algorithm achieves superior performance compared with the conventional method and Deep
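The abstract does not specify how the "dynamic timesteps" mechanism works. As an illustration only, one plausible reading is that the per-update rollout length is adapted over training rather than held fixed; the function name, parameters, and linear schedule below are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a "dynamic timesteps" schedule for PPO:
# instead of collecting a fixed-length rollout before every policy
# update, the rollout horizon grows as training progresses.  The
# schedule and all parameter values here are illustrative assumptions.

def dynamic_timesteps(episode, t_min=128, t_max=2048, ramp_episodes=1000):
    """Linearly grow the per-update rollout length from t_min to t_max."""
    frac = min(episode / ramp_episodes, 1.0)
    return int(t_min + frac * (t_max - t_min))

# Early episodes use short rollouts (fast, high-variance updates);
# later episodes use long rollouts (slower, lower-variance updates).
for ep in (0, 500, 1000, 1500):
    print(ep, dynamic_timesteps(ep))
```

Under this sketch, the schedule would be queried once per episode to set the rollout buffer size before each PPO update; whether the paper ramps linearly, adaptively, or by another rule is not stated in the abstract.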
