Multimodal fusion for robotics

Introduction

The continuing development of sensor technology has led to the use of multimodal sensors in robots.¹⁻⁴ It is therefore highly desirable to develop methodologies capable of integrating information from multimodal sensors, with the goal of improving robotic perception.

Despite the significant progress made in intelligent robots that use various sensors, including cameras, infrared sensors, laser radar, and tactile arrays, the question of how to fuse multimodal information to improve perception capability remains both attractive and challenging. Multimodal information offers complementary characteristics that make it well suited to robust perception. In this special issue, the guest editors solicited novel and solid works on robotic perception using multimodal fusion technology. The topics of this special issue explicitly included (but were not limited to) the following aspects:

Cognitive mechanisms of multimodal fusion;

Joint representation and learning for multimodal information;

Bioinspired multimodal fusion methods for robotics;

Deep learning for multimodal fusion;

Crossmodal perception and learning for robotics;

New robotic applications of multimodal fusion; and

Benchmarks of multimodal fusion problems.

The papers

This special issue received a large number of submissions. After peer review, 19 papers were accepted for publication. The accepted papers are divided into three categories according to their content.

Part I, which consists of seven papers, deals with various modalities using advanced machine learning technology. The first paper⁵ proposes an objective method for quality assessment of colored multiexposure image fusion based on image saturation, together with texture and structure similarities, which can measure the perceived color, texture, and structure information of fused images. The second paper⁶ develops a method for mutual identification of multiple mobile robots. The method exchanges identification numbers among the robots via audible beeps multiplexed by frequency, and the identification sound is sent only when robots approach each other. The authors point out that the developed algorithm can also be used in other multiagent robotic systems. The key problems in front-vehicle detection and tracking based on computer vision are investigated in another paper,⁷ which proposes a new adaptive threshold segmentation algorithm that is resistant to interference from complex environments. The next work⁸ develops a novel hyperspectral image classification approach based on multiresolution segmentation with a few labeled samples. The approach is motivated by the fact that pixels within a homogeneous region are very likely to share the same class label, a property that can be used to increase the number of labeled samples. In another paper,⁹ a hand and club tracking framework is proposed to obtain movement trajectories in golf video; it is based on recognition with a complex descriptor that combines histograms of oriented gradients and a spatial–temporal vector. The sixth paper¹⁰ introduces a new image concept, named the superimposed image, for testing subjective and objective assessment methods. The last paper of this part¹¹ presents a probabilistic soft-assignment recognition scheme based on Gaussian mixture models to recognize similar actions. The authors first generate a representative posture dictionary based on the standard bag-of-words model and then use a Gaussian mixture model to recognize similar poses.

Part II, which is composed of five papers, is related to heterogeneous multimodal methods. The first paper¹² assembles a large-scale RGB-D activity data set by merging five public RGB-D data sets that differ from each other in many aspects, such as length of actions, nationality of subjects, and camera angles. This data set provides a new benchmark for research purposes. In another work,¹³ a stereovision-based robust free space detection method is proposed, which mainly depends on geometry information and Gaussian process regression. The authors apply a Bayesian framework and conditional random field inference to fuse multimodal information, including two-dimensional image information and three-dimensional point geometric information. The following paper¹⁴ proposes a novel alignment approach based on time-domain constraints. The authors of another paper¹⁵ propose a fast discriminative collaborative representation–based classification algorithm for multimodal fusion recognition. The last paper in this second part¹⁶ presents an up-to-date overview of visual–tactile fusion recognition.

Lastly, Part III, which consists of seven papers, is highly relevant to robotic applications. Its first work¹⁷ presents a strategy for hands-free control of a NAO humanoid robot via head gestures detected by multisensor fusion based on Google Glass. The second paper¹⁸ proposes a dynamic priority–based multitask algorithm to avoid self-collision during highly complex robotic whole-body motions and to rebalance after external disturbance using momentum compensation strategies. The third proposal¹⁹ establishes a robotic teleoperation system with a wearable multimodal fusion device. The following paper²⁰ presents a semantic approach for multimodal interaction between humans and industrial robots to enhance the dependability and naturalness of their collaboration in real industrial settings. Then, a work²¹ addresses semantic segmentation–aided visual odometry for urban autonomous driving. In another approach,²² the design of the shock absorber for the lunar probe soft landing is formulated as a single- or multi-objective optimization problem. A multi-objective optimization strategy is proposed, and the nondominated sorting genetic algorithm II (NSGA-II) is employed to find the best decision parameters for the shock absorber design. The last paper²³ introduces a novel longitudinal control method for the follower vehicle, composed of a learning-based acceleration decision phase and an internal model–based acceleration tracking phase.

Huaping Liu, Tsinghua University, China

Chenwei Deng, Beijing Institute of Technology, China

Antonio Fernández-Caballero, Universidad de Castilla-La Mancha, Spain

Fuchun Sun, Tsinghua University, China

References

1. Fernández-Caballero, Ferrández. Biologically inspired vision systems in robotics. Int J Adv Robot Syst 2017; 14(6): 1–2. DOI: 10.1177/1729881417745947.

2. Kubelka, Reinstein, Svoboda. Improving multimodal data fusion for mobile robots by trajectory smoothing. Robot Auton Syst 2016; 84: 88–96.

3. Djaid, Saadia, Ramdane-Cherif. Multimodal fusion engine for an intelligent assistance robot using ontology. Procedia Comput Sci 2015; 52: 129–136.

4. Almansa-Valverde, Castillo, Fernández-Caballero. Mobile robot map building from time-of-flight camera. Expert Syst Appl 2012; 39(10): 8835–8843.

5. Deng, Wang. Saturation-based quality assessment for colorful multi-exposure image fusion. Int J Adv Robot Syst 2017; 14(2): 1–15. DOI: 10.1177/1729881417694627.

6. Nemec, Janota, Hruboš. Mutual acoustic identification in the swarm of e-puck robots. Int J Adv Robot Syst 2017; 14(3): 1–10. DOI: 10.1177/1729881417710794.

7. Zhang, Gao, Xue. Real-time vehicle detection and tracking using improved histogram of gradient features and Kalman filters. Int J Adv Robot Syst 2018; 15(1): 1–9. DOI: 10.1177/1729881417749949.

8. Cui, Zhao. A novel hyperspectral image classification approach based on multiresolution segmentation with a few labeled samples. Int J Adv Robot Syst 2017; 14(4): 1–10. DOI: 10.1177/1729881417710219.

9. Weixian, Xiaoping, Mingli. Golf video tracking based on recognition with HOG and spatial–temporal vector. Int J Adv Robot Syst 2017; 14(3): 1–8. DOI: 10.1177/1729881417704544.

10. Jiao, Deng. A structural similarity-inspired performance assessment model for multisensor image registration algorithms. Int J Adv Robot Syst 2017; 14(4): 1–11. DOI: 10.1177/1729881417717059.

11. Liu, Yin. A new action recognition method by distinguishing ambiguous postures. Int J Adv Robot Syst 2018; 15(1). DOI: 10.1177/1729881417749482.

12. Zhang. Collecting public RGB-D datasets for human daily activity recognition. Int J Adv Robot Syst 2017; 14(4). DOI: 10.1177/1729881417709079.

13. Xiao, Dai. Gaussian process regression-based robust free space detection for autonomous vehicle by 3-D point cloud and 2-D appearance information fusion. Int J Adv Robot Syst 2017; 14(4). DOI: 10.1177/1729881417717058.

14. Yang, Sun, Liu. Sensor to sensor calibration of the integrated INS/vision navigation system: time-domain optimization. Int J Adv Robot Syst 2017; 14(3): 1–8. DOI: 10.1177/1729881417707322.

15. Sun, Wang, Yang. Discriminative collaborative representation for multimodal image classification. Int J Adv Robot Syst 2017; 14(3): 1–9. DOI: 10.1177/1729881417714211.

16. Liu, Sun. Recent progress on tactile object recognition. Int J Adv Robot Syst 2017; 14(4): 1–12. DOI: 10.1177/1729881417717056.

17. Mao, Wen, Song. Eliminating drift of the head gesture reference to enhance Google Glass-based control of an NAO humanoid robot. Int J Adv Robot Syst 2017; 14(2): 1–10. DOI: 10.1177/1729881417692583.

18. Liu, Ning. Active balance of humanoid movement based on dynamic task-prior system. Int J Adv Robot Syst 2017; 14(3): 1–13. DOI: 10.1177/1729881417710793.

19. Fang, Sun, Liu. Robotic teleoperation systems using a wearable multimodal fusion device. Int J Adv Robot Syst 2017; 14(4): 1–11. DOI: 10.1177/1729881417717057.

20. Maurtua, Fernández, Tellaeche. Natural multimodal communication for human–robot collaboration. Int J Adv Robot Syst 2017; 14(4): 1–12. DOI: 10.1177/1729881417716043.

21. Zhang, Gao. Semantic segmentation–aided visual odometry for urban autonomous driving. Int J Adv Robot Syst 2017; 14(5): 1–11. DOI: 10.1177/1729881417735667.

22. Liu, Song, Wang. Multi-objective optimization on the shock absorber design for the lunar probe using nondominated sorting genetic algorithm II. Int J Adv Robot Syst 2017; 14(4): 1–10. DOI: 10.1177/1729881417720467.

23. Zhu, Dai, Huang. An adaptive longitudinal control method for autonomous follow driving based on neural dynamic programming and internal model structure. Int J Adv Robot Syst 2017; 14(6): 1–13. DOI: 10.1177/1729881417740711.