Abstract
This paper proposes a combined reinforcement learning (RL) and motion planning framework to address a multi-class in-rack test tube rearrangement problem. RL works at the task level to plan a sequence of swap actions while ignoring the details of robotic motion. Motion planning works at the motion level to generate detailed robotic pick-and-place motions from the action sequences obtained at the task level. The two levels are combined in a closed loop with the help of a condition set maintained for each rack slot, which allows the framework to replan efficiently and thus find solutions effectively in the presence of failures. In particular, for RL, the framework leverages a distributed deep Q-learning structure with a dueling double deep Q network (D3QN) to learn policies from training data amplified by an A⋆-based post-processing technique. The D3QN and distributed learning increase training efficiency, while the post-processing salvages failed action sequences and removes redundancy, making the training data more effective. Simulation and real-world studies are carried out to evaluate the performance of the proposed framework. The results verify the advantages of the RL and post-processing components and show that the closed-loop combination improves robustness. The framework is also ready to incorporate various forms of sensory feedback, as demonstrated in the real-world studies.
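The abstract names two D3QN ingredients without implementation detail: the dueling value/advantage decomposition and the double-Q target (the online network selects the next action, the target network evaluates it). A minimal NumPy sketch of just those two pieces, with hypothetical parameter names (`W_h`, `w_v`, `W_a`) standing in for whatever architecture the paper actually uses, might look like:

```python
import numpy as np

def dueling_q(params, state):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    h = np.tanh(state @ params["W_h"])   # shared hidden features
    v = h @ params["w_v"]                # scalar state value V(s)
    a = h @ params["W_a"]                # per-action advantages A(s, .)
    return v + a - a.mean()              # subtract mean(A) for identifiability

def double_q_target(online, target, s_next, reward, gamma, done):
    """Double DQN target: online net picks the action, target net scores it."""
    if done:
        return reward
    a_star = int(np.argmax(dueling_q(online, s_next)))        # action selection
    return reward + gamma * dueling_q(target, s_next)[a_star]  # action evaluation
```

Decoupling selection from evaluation is what reduces the overestimation bias of vanilla Q-learning; the dueling split lets the network learn state values independently of per-action advantages.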
