Introduction
The rapid development of sensor technology has led to the widespread use of multimodal sensors in robots. 1–4 Methodologies that can integrate information from multimodal sensors are therefore needed to improve the performance of robotic perception.
Despite the significant progress made in intelligent robots that use various sensors, including cameras, infrared sensors, laser radar, and tactile arrays, the question of how to fuse multimodal information to improve perception capability remains both attractive and challenging. Multimodal information offers complementary characteristics that make it well suited to robust perception. In this special issue, the guest editors solicited novel and solid works on robotic perception using multimodal fusion technology. The topics of this special issue explicitly included (but were not limited to) the following aspects: cognitive mechanisms of multimodal fusion; joint representation and learning for multimodal information; bioinspired multimodal fusion methods for robotics; deep learning for multimodal fusion; crossmodal perception and learning for robotics; new robotic applications of multimodal fusion; and benchmarks of multimodal fusion problems.
The papers
This special issue received a large number of submissions. After peer review, 19 papers were accepted for publication. The accepted papers are divided into three categories according to their content.
Part I, which consists of seven papers, deals with various modalities using advanced machine learning technology. The first paper 5 proposes an objective method for quality assessment of colored multiexposure image fusion based on image saturation, together with texture and structure similarities, which can measure the perceived color, texture, and structure information of fused images. The second paper 6 develops a method for mutual identification of multiple mobile robots. The method exchanges identification numbers among multiple robots via audible beeps multiplexed by frequency. The identification sound is sent only when robots approach each other. The authors point out that the developed algorithm can also be used in other multiagent robotic systems. The key problems in front vehicle detection and tracking based on computer vision are investigated in another paper, 7 which proposes a new adaptive threshold segmentation algorithm that is resistant to interference from complex environments. The next work 8 develops a novel hyperspectral image classification approach based on multiresolution segmentation with a few labeled samples. It is motivated by the fact that pixels within a homogeneous region are very likely to have the same class label, a property that can be used to increase the number of labeled samples. In another paper, 9 a hand and club tracking framework is proposed to obtain their movement trajectories in golf videos; it is based on recognition with a complex descriptor that combines histograms of oriented gradients and a spatial–temporal vector. The sixth paper 10 introduces a new image concept named superimposed image for testing subjective and objective assessment methods. The last paper of this part 11 presents a probabilistic soft-assignment recognition scheme based on Gaussian mixture models to recognize similar actions. The authors first generate a representative posture dictionary based on the standard bag-of-words model and then use a Gaussian mixture model to recognize similar poses.
Part II, which is composed of five papers, is related to heterogeneous multimodal methods. The first paper 12 assembles a large-scale RGB-D activity data set by merging five public RGB-D data sets that differ from each other in many aspects, such as length of actions, nationality of subjects, or camera angles. This data set provides a new benchmark for research purposes. In another work, 13 a stereovision-based robust free space detection method, which mainly depends on geometry information and Gaussian process regression, is proposed. The authors apply a Bayesian framework and conditional random field inference to fuse multimodal information, including two-dimensional image information and three-dimensional point geometric information. The following paper 14 proposes a novel alignment approach based on time-domain constraints. The authors of another paper 15 propose a fast discriminative collaborative representation–based classification algorithm for multimodal fusion recognition. The last paper in this second part 16 presents an up-to-date overview of visual–tactile fusion recognition.
Lastly, Part III, which consists of seven papers, is highly relevant to robotic applications. Its first work 17 presents a strategy for hands-free control of a NAO humanoid robot via head gestures detected by multisensor fusion based on Google Glass. The second paper 18 proposes a dynamic priority–based multitask algorithm to avoid self-collision during highly complex robotic whole-body motions and to rebalance after external disturbance using momentum compensation strategies. The third proposal 19 establishes a robotic teleoperation system with a wearable multimodal fusion device. The following paper 20 presents a semantic approach for multimodal interaction between humans and industrial robots to enhance the dependability and naturalness of their collaboration in real industrial settings. Another work 21 proposes an objective method for quality assessment of colored multiexposure image fusion based on image saturation, together with texture and structure similarities, which can measure the perceived color, texture, and structure information of fused images. In another approach, 22 the design of the shock absorber for lunar probe soft landing is formulated as a single- or multi-objective optimization problem. A multi-objective optimization strategy is proposed, and the nondominated sorting genetic algorithm II is employed to find the best decision parameters of the shock absorber design. The last paper 23 introduces a novel longitudinal control method for the follower vehicle, composed of a learning-based acceleration decision phase and an internal model–based acceleration tracking phase.
