Pose prediction of textureless objects for robot bin picking with deep learning approach

Abstract

Six-dimensional (6D) pose prediction of objects for robotic manipulation has received much attention in industrial applications. Despite considerable research on improving pose estimation, the problem remains challenging for industrial objects, which are textureless and mostly homogeneous in appearance and often appear in cluttered, occluded scenes. This article proposes a deep learning-based pose estimation method that jointly performs detection, segmentation and pose prediction. A binary classification branch is introduced as an attention module to improve the accuracy of instance segmentation. Using the instance information, an initial pose estimation network fuses depth information into the grayscale image to strengthen geometric features. Then, to obtain a highly accurate pose, an iterative network that takes point clouds as input refines the initial pose. The networks are used to predict the pose of textureless objects in synthetic and real scenes. Experimental results indicate that the method is efficient and robust for pose prediction of textureless objects in cluttered and occluded scenes.

Keywords

Pose prediction, instance segmentation, deep learning, textureless objects, robot bin picking
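
The abstract names two steps without giving details: fusing depth into the grayscale image within a segmented instance, and iteratively refining an initial pose from point clouds. The following is only a rough sketch of how such steps might look, not the authors' networks. The function names `fuse_gray_depth` and `refine_pose`, the two-channel stacking, the min-max depth normalization, and the callable `predict_delta` (a stand-in for a learned refinement network that returns a 4x4 residual transform) are all assumptions made for illustration.

```python
import numpy as np


def fuse_gray_depth(gray, depth, mask):
    """Stack grayscale and depth into one 2-channel array for a single instance.

    Assumption: fusion is plain channel stacking, with depth min-max
    normalized over the valid pixels inside the instance mask.
    """
    mask = mask.astype(bool)
    gray_n = gray.astype(np.float32) / 255.0
    depth_f = depth.astype(np.float32)

    # Only pixels inside the mask with a positive depth reading are used.
    valid = mask & (depth_f > 0)
    depth_n = np.zeros_like(depth_f)
    if valid.any():
        d_min = depth_f[valid].min()
        d_max = depth_f[valid].max()
        depth_n[valid] = (depth_f[valid] - d_min) / max(d_max - d_min, 1e-6)

    # Channel 0: masked grayscale, channel 1: masked, normalized depth.
    return np.stack([gray_n * mask, depth_n], axis=-1)


def refine_pose(initial_pose, model_points, predict_delta, n_iters=2):
    """Refine a 4x4 pose by repeatedly applying predicted residual transforms.

    predict_delta is a hypothetical callable: it receives the model points
    transformed by the current estimate and returns a 4x4 correction.
    """
    pose = np.array(initial_pose, dtype=np.float64)
    for _ in range(n_iters):
        # Move the model points into the current pose estimate.
        current = model_points @ pose[:3, :3].T + pose[:3, 3]
        delta = predict_delta(current)
        # Left-multiply so the correction acts on the already-posed points.
        pose = delta @ pose
    return pose
```

In this sketch each refinement step left-multiplies the current estimate, so the residual is expressed in the camera frame; a real system could equally parameterize the correction in the object frame.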


