Abstract
Conventional visual perception technology is subject to many restrictions, such as illumination, background clutter, and occlusion. Many intrinsic properties of objects, such as stiffness, hardness, and internal state, cannot be effectively perceived by visual sensors. For robots, tactile perception is a key approach to obtaining environmental and object information. Unlike vision sensors, tactile sensors can directly measure some physical properties of objects and the environment. At the same time, humans also rely on touch sensory receptors as an important means to perceive and interact with the environment. In this article, we present a detailed discussion of the tactile object recognition problem. We divide current studies on tactile object recognition into three subcategories and provide a detailed analysis of each. In addition, we discuss some advanced topics such as visual–tactile fusion, exploratory procedures, and data sets.
Introduction
At present, many robots are equipped with visual sensors. However, visual sensing is subject to many restrictions in practical applications, such as lighting conditions and occlusions. Tactile sensing is another sensing modality widely used in robotics. Unlike visual sensing, tactile sensors are capable of directly perceiving some physical properties of an object (e.g. softness/hardness, texture, temperature). Humans also rely on touch sensory receptors as an important means to perceive and interact with the environment. As early as the 1980s, experiments with local anesthesia indicated the importance of human tactile sensing for a stable grasp. 1 It was found that the applied grip force was critically balanced to optimize motor behavior, so that slipping between the skin and the gripped object did not occur and the grip force did not reach exceedingly high values. Therefore, introducing tactile perception to robots not only simulates human perception and cognitive mechanisms to a certain extent but also meets a strong demand in practical robotic applications.
With the development of modern sensing, control, and artificial intelligence technology, extensive research has been conducted on tactile sensors, grasp stability analysis, and object recognition based on tactile sensing. One early review 2 elaborated the state of the art in tactile sensing and its likely research motivations. It also pointed out the increasing emphasis on, and understanding of, tactile sensing in the task of dexterous manipulation. Broadly speaking, tactile properties refer to any properties measured through contact, including pressure, force, temperature, and so on, while a narrow definition of tactile sensing requires that force measurements be involved.
Research on robotic tactile perception has expanded considerably in recent years, and the topic now appears regularly in major robotics journals and conferences.
Tactile sensors are essential to tactile perception. According to their operating principles, tactile sensors can be classified into several types, such as piezoresistive, capacitive, piezoelectric, quantum tunneling, optical, barometric, structure-borne sound, and so on. Kappassov et al. 5 compared the advantages and disadvantages of 28 kinds of tactile sensors in detail, while Denei et al. 7 developed a multimodal tactile sensor that contained a capacitive sensor array, audio measurement, and proximity perception for recognizing different materials. Although many researchers still use self-developed tactile sensors, commercial tactile sensors play an increasingly important role in robotics. The most representative examples are BioTac (http://www.syntouchllc.com/Products/BioTac/), PPS (http://www.pressureprofile.com/), Weiss (http://www.weiss-robotics.de/en.html), and Tekscan (http://www.tekscan.com/).
Tactile perception technology has been widely applied in robotics. For example, Bekiroglu et al. 8 used a three-fingered dexterous hand to grasp objects of different shapes; a tactile model built on hidden Markov models was used to judge object-grasping stability. Additionally, tactile perception technology has also been applied to sliding detection, 9 object localization, 10,11 tactile servoing, 12 and 3-D modeling. 13
Object recognition has always been a key problem in robotics and is also important for environment perception. Therefore, how to make use of tactile information to realize object recognition has attracted wide attention from researchers, and tactile object recognition has become a major research direction. However, these studies lack a unified theoretical framework. Tactile sensor data acquisition mechanisms differ from each other, and different grasping movements during tactile information acquisition also affect data characteristics. Therefore, no mature method for data collection and classification has been established. To help address this problem, Kappassov et al. 5 divided tactile object recognition research into three major categories: tactile object identification, texture recognition, and contact pattern recognition. In this article, we adopt a different guideline based on the tactile exploratory procedure (EP), which can be used to differentiate the shape of rigid objects, the texture of material surfaces, and the deformation of soft objects. Typical examples of such objects are metal parts, cloth, and fruits, respectively. Figure 1 shows representative objects extracted from research papers illustrating these three kinds of tasks. For rigid object recognition, shape information is primarily used. Objects for texture recognition generally have standard and regular shapes, so it is important to investigate their surface properties for recognition. In contrast, deformable objects are primarily recognized by their stiffness and damping characteristics.

Three kinds of representative objects usually used in tactile recognition research. Left: rigid objects used in the study by Luo et al. 14 ; middle: textured materials used in the study by Sinapov et al. 15 ; right: deformable objects used in the study by Drimus et al. 16 The figures in the left and middle panels have been reproduced with permission from IEEE. The figure in the right panel has been reproduced with permission from Elsevier.
In this article, we divide the current studies on tactile object recognition into three subcategories and present detailed discussions of them. In addition, we also discuss some advanced topics such as visual–tactile fusion, EPs, and data sets. Please note that in the study by Dahiya et al., 3 tactile sensing was classified with respect to criteria such as the sensing principle, the location of the sensor, the task to be done, and so on; according to that classification, our work focuses on extrinsic tactile sensing.
The rest of this article is organized as follows: In the "Exploratory procedure" section, we analyze the EP for tactile information acquisition. In the "Tactile perception for shape," "Tactile perception for texture," and "Tactile perception for deformable objects" sections, we discuss how to use tactile sensors to perceive shape, texture, and deformation. The "Visual–tactile fusion for object recognition" section is about visual–tactile fusion. The "Data sets" section gives information about some data sets. The final section gives the conclusion.
Exploratory procedure
Since tactile signals can only be obtained through contact and touch, the EP for collecting tactile data is very important. Generally speaking, recognizing shape requires contacting and grasping the object; recognizing materials requires sliding over the object's surface; and recognizing deformability requires squeezing the object. However, no unified strategy for the EP exists. According to the findings of experimental psychology, 17 humans use six kinds of actions to explore objects:
- Press the object to determine its stiffness.
- Laterally slide on the object to perceive the surface texture.
- Make static contact with the object to measure its temperature or heat conductivity.
- Enclose the object to determine its global shape and volume.
- Lift the object to estimate its weight.
- Follow the contour to perceive the local shape.
The above actions can be imitated by a robotic manipulator to perform tactile exploration. In the existing work, the main EPs can be roughly classified into the following categories:
Passive mode: In this case, the robotic manipulator is fixed, and a human operator hands the object to the manipulator. The robotic hand mounted on the manipulator blindly grasps the object and performs actions such as pressing and squeezing. For example, in the study by Chitta et al., 18 the human operator grasps a bottle and places it in the space between the two fingers of the hand, and a press action is then performed to obtain tactile sequences. This passive mode is very popular in recognition tasks for textured materials. Because of the diversity of materials, various EPs are combined to obtain sufficient tactile information; both Sinapov et al. 15 and Chu et al. 19 use five different EPs for this task.
Semi-active mode: In this case, objects are usually fixed in place. The manipulator detects and approaches the object along a prescribed trajectory and performs EPs. Since the approach procedure utilizes a closed-loop control strategy, there are uncertainties in the grasp point, while the EPs themselves are designed beforehand. Due to the grasp uncertainties and the relative motion between object and fingers, the semi-active mode may produce noisier tactile signals. However, this strategy is closer to the human grasp strategy. A representative work is the study by Soh and Demiris, 20 which divides the whole procedure into three stages: (1) open the fingers according to some preshape, (2) move the fingers along a planned path until they contact the object, and (3) perform EPs such as pressing or squeezing until some termination condition is satisfied.
Active mode: This strategy exhibits strong flexibility because the manipulator finds the object and explores it autonomously. There are many challenges in adopting this strategy, and no unified method has yet been presented. Schneider et al. 21 adopted decision theory to design an active grasp strategy that increases recognition performance. Pezzementi et al. 22 developed a tactile alignment method using object surface contour information. Xu et al. 23 adopted Bayesian inference to develop the EP. Gemici and Saxena 24 resorted to learning technology and designed an EP for cutting food.
From the above discussion, we find that different EPs should be performed to recognize an object's shape, stiffness, texture, and so on. It is difficult, if not impossible, to recognize all properties of an object using a single tactile EP. Therefore, we naturally classify the existing work on tactile object recognition according to the action of the tactile EP. In the following three sections, we discuss how to use tactile sensors to perceive an object's shape, texture, and deformability.
Tactile perception for shape
Shape classification of rigid objects using tactile signals has a long history. For example, Allen and Roberts 25 used touch point cloud data to fit a quadratic surface model and thereby realized tactile shape recognition. Russell and Wijaya 26 described the process of object recognition using a mobile robot equipped with eight tentacle tactile sensors. While moving, a tentacle tactile sensor can not only scan and instantly record the shape of objects but also recognize, catch, and move objects according to internal programs. After analyzing the tactile information of grasped objects, least squares fitting and circle fitting were used to classify and recognize objects. Jin et al. 27 proposed a Gaussian process classification method for sparse tactile point clouds. Generally, a single grasp can only perceive the shape of the part in contact with the tactile sensor, so it can only handle objects with simple shapes. Recently, Liu et al. 28 described the local shape of objects by extracting the covariance matrix of the tactile array signal. Most of the existing work can only deal with simply shaped objects.
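The covariance-matrix idea can be illustrated with a minimal sketch. Assuming the tactile array produces a (T, H, W) pressure sequence, each taxel contributes a small feature vector, and the covariance of these vectors yields a compact local shape descriptor whose size does not depend on the sequence length or array resolution. The particular features chosen below (taxel position, pressure, spatial gradients) are an illustrative assumption, not necessarily those of Liu et al. 28:

```python
import numpy as np

def covariance_descriptor(frames):
    """Covariance of per-taxel features over a tactile sequence.

    frames: (T, H, W) array of pressure readings from a tactile array.
    Returns a 5 x 5 symmetric matrix, independent of T, H, and W.
    """
    T, H, W = frames.shape
    ys, xs = np.mgrid[0:H, 0:W]          # taxel grid coordinates
    cols = []
    for frame in frames:
        gy, gx = np.gradient(frame.astype(float))
        # per-taxel features: x, y, pressure, and the two spatial gradients
        cols.append(np.stack([xs.ravel(), ys.ravel(), frame.ravel(),
                              gx.ravel(), gy.ravel()]))
    feats = np.concatenate(cols, axis=1)  # 5 x (T*H*W) feature matrix
    return np.cov(feats)                  # 5 x 5 descriptor

# Toy usage: a flat press versus an edge press give different descriptors.
flat = np.ones((3, 8, 8))
edge = np.tile(np.linspace(0.0, 1.0, 8), (3, 8, 1))  # pressure ramp along x
d_flat = covariance_descriptor(flat)
d_edge = covariance_descriptor(edge)
```

Because the descriptor is a fixed-size symmetric matrix, two grasps of different durations can be compared directly, e.g. with a matrix norm or a nearest-neighbor rule.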
When the tactile data are regarded as a low-resolution gray image, image processing and computer vision techniques can be used to extract features from the tactile array signal and classify different shapes. As mentioned above, tactile sensors can only perceive limited parts of objects, so information from different parts of the object is expected to be obtained through multiple grasps. A bag-of-words (BoW) model can then be constructed for the tactile signals, and based on the BoW feature, a naive Bayes classifier can be trained to classify the objects. However, this method requires a manipulator to grasp an object many times to collect tactile information from its different parts. Furthermore, the direct use of tactile image data introduces a lot of noise. To avoid these problems, Pezzementi et al. 22 used the SIFT feature to process the tactile image and relaxed restrictions on the position and orientation of objects. Ho et al. 29 applied this method to the task of sliding detection. Tactile sensors collect tactile signals through direct contact; however, since the position of an object may change during contact, directly applying feature extraction methods from image processing may cause problems. To address this, Luo et al. 14 proposed Tactile-SIFT, a new descriptor suitable for tactile image processing, and used it to construct a BoW model. Figure 2 lists some of the aforementioned experimental objects for tactile shape recognition. These objects usually have obvious shape features, and some are even specially designed for algorithm verification (see the first row of the left column in Figure 2). It is important to note that object recognition does not rely only on tactile sensors. In reality, visual sensors have already been widely used for this task; therefore, how to combine visual and tactile sensing to accurately recognize different objects is worth further research.
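The BoW pipeline described above can be sketched as follows: local tactile descriptors from multiple grasps are quantized against a codebook, and the resulting word histograms are fed to a multinomial naive Bayes classifier. The codebook and the synthetic two-class data below are illustrative stand-ins for real tactile patches, not the features of any cited paper:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors to their nearest codeword and count."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return np.bincount(d2.argmin(axis=1), minlength=len(codebook))

class MultinomialNB:
    """Minimal multinomial naive Bayes over word-count histograms."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        # Laplace-smoothed per-class word probabilities
        counts = np.array([X[y == c].sum(axis=0) + 1.0 for c in self.classes])
        self.log_theta = np.log(counts / counts.sum(axis=1, keepdims=True))
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self
    def predict(self, X):
        return self.classes[(X @ self.log_theta.T + self.log_prior).argmax(1)]

# Synthetic demo: two object classes whose local tactile descriptors
# cluster around different codewords.
rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
def sample(center, n=30):
    return center + 0.3 * rng.standard_normal((n, 2))
X = np.array([bow_histogram(sample(codebook[0]), codebook) for _ in range(5)] +
             [bow_histogram(sample(codebook[1]), codebook) for _ in range(5)])
y = np.array([0] * 5 + [1] * 5)
clf = MultinomialNB().fit(X, y)
pred = clf.predict(np.array([bow_histogram(sample(codebook[1]), codebook)]))
```

In a real system, the codebook would be learned (e.g. by k-means) from descriptors such as Tactile-SIFT collected over many grasps; the classification step is unchanged.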

Tactile perception for texture
Texture information reflects the distribution characteristics of microstructures on the surface of an object, so an object's material can be classified by its surface texture features. Tactile sensors can perceive a lot of material information that visual sensors have difficulty perceiving, and there is extensive research on tactile texture recognition. Classifying material texture usually requires actions like scraping, sliding, and rubbing to obtain vibration signals; therefore, it is necessary to consider the characteristics of the time series. The most intuitive solution is to employ signal processing methods. For example, Sinapov et al. 15 recognized 20 different textures using five scraping actions, Heyneman and Cutkosky 30 recognized eight different textured discs, Romano and Kuchenbecker 31 recognized 15 kinds of surface materials, and Jamali and Sammut 32 identified eight surface textures by simulating human behaviors through multiple contacts and majority voting. Dallaire et al. 33 directly used the slopes and peaks of the tactile time series to develop a perception model based on nonparametric Bayesian methods and used it to recognize 28 different discs. Additionally, Chitta et al. 18 learned 25 haptic adjectives for textured materials. All of these works extracted features using the discrete Fourier transform or other frequency-domain analysis tools. From another viewpoint, Strese et al. 34 pointed out the similarity between tactile and speech signals and resorted to Mel-frequency cepstral coefficients, which are commonly used in audio processing, to extract features. Common classification methods include the nearest-neighbor classifier, support vector machine (SVM), 15,35 and Gaussian process. 34 One example of using the SVM classifier is the study by Chathuranga et al. 36 In this study, three-dimensional tactile values were obtained from magnetic flux measurements, and the covariance matrix of these values was treated as the feature to classify eight texture materials with an SVM. More recently, Baishya and Bauml 37 developed a deep learning method for robust material classification with tactile skins. Figure 3 shows representative examples of experimental objects for texture recognition, and Figure 4 lists the corresponding operation scenes. Until now, most of the existing work has focused on regular textured objects, while complicated natural textured objects have received very little attention.
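The frequency-domain feature extraction common to these works can be sketched as follows: the vibration signal recorded during sliding is reduced to log band energies of its spectrum, which can then feed any of the classifiers mentioned above. The band count, sampling rate, and test frequencies below are arbitrary choices for illustration, and a simple nearest-neighbor match stands in for the cited classifiers:

```python
import numpy as np

def spectral_features(signal, n_bands=8):
    """Log energy in equal-width frequency bands of a 1-D vibration signal."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal))) ** 2
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.sum() for band in bands]))

# Two synthetic "textures": sliding over a fine texture excites higher
# vibration frequencies than a coarse one (illustrative frequencies).
t = np.arange(1024) / 1000.0           # 1024 samples at a nominal 1 kHz
rng = np.random.default_rng(1)
coarse = np.sin(2 * np.pi * 30 * t) + 0.1 * rng.standard_normal(t.size)
fine = np.sin(2 * np.pi * 180 * t) + 0.1 * rng.standard_normal(t.size)
f_coarse, f_fine = spectral_features(coarse), spectral_features(fine)

# Nearest-neighbor matching of a new recording against the two templates
probe = np.sin(2 * np.pi * 180 * t) + 0.1 * rng.standard_normal(t.size)
fp = spectral_features(probe)
match = ("fine" if np.linalg.norm(fp - f_fine) < np.linalg.norm(fp - f_coarse)
         else "coarse")
```

Mel-frequency cepstral coefficients, as used by Strese et al. 34, replace the equal-width bands with a perceptually spaced filter bank followed by a cosine transform, but the overall pipeline is the same.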

Representative textured objects used for texture-based tactile recognition. Left: the 20 material surfaces used in the study by Sinapov et al. 15 ; middle: the 8 material surfaces used in the study by Jamali and Sammut 32 ; right: the 43 material surfaces used in the study by Strese et al. 34 Those figures have been reproduced with permission from IEEE.

The manipulation scenarios. Left: scratching trajectories performed by the robot on the plastic kitchen roll. Middle: the robot drags the finger across each material at a constant speed. Right: hardware setup used for the texture recording in the study by Strese et al. 34 Those figures have been reproduced with permission from IEEE. 15,32,34
Texture is usually a surface characteristic and can therefore also be perceived by visual sensors. Compared to visual sensors, tactile sensors obtain finer texture characteristics and also yield additional information through vibration and sliding. However, most current texture recognition tasks are limited to simple object shapes. For objects with complex shapes, further work is required to investigate how to design reasonable contact actions and use the tactile information for analysis.
Tactile perception for deformable objects
Compared to tactile recognition based on shape and texture, object recognition for deformable objects pays more attention to surface characteristics and the internal state of the object, which cannot be directly obtained by normal visual sensors. Because of the complexity of object deformation, it is difficult to describe the deformation mechanism with a unified model.
Based on changes of the one-dimensional fingertip force, Chitta et al. 18 proposed a simple and useful method for tactile feature extraction and designed a classification algorithm to distinguish empty and full bottles, resulting in a method that can perceive the internal state of an object by touch. For the five-finger iCub robotic hand, Soh and Demiris 20 used the Gaussian process to extract features from tactile sequences and proposed an incremental object recognition method, which was used to identify soft toys, hard books, and bottles and cans containing different amounts of liquid. Drimus et al. 16 described a self-designed tactile sensor with 64 tactile sensing modules arranged in an 8 × 8 grid. This sensor was attached to the Schunk2 and Schunk3 fingers, which were used to grasp various objects to build a tactile data set. After data collection, distances between tactile sequences were measured using dynamic time warping (DTW), and a nearest-neighbor classifier was used to classify the objects.
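The DTW-plus-nearest-neighbor scheme can be sketched in a few lines. DTW aligns two tactile sequences of possibly different lengths before measuring distance, and a 1-NN classifier then assigns the label of the closest training sequence. Scalar sequences are used here for brevity; for 8 × 8 tactile frames, the absolute difference would simply be replaced by a frame-wise norm. The toy pressure profiles are assumptions for illustration:

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two scalar sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three possible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def nn_classify(query, train_seqs, train_labels):
    """1-NN classification under the DTW distance."""
    dists = [dtw(query, s) for s in train_seqs]
    return train_labels[int(np.argmin(dists))]

# Toy pressure profiles: a soft object ramps up slowly under squeezing,
# while a rigid one produces a sharp rise.
soft = [0, 1, 2, 3, 4, 4]
rigid = [0, 4, 8, 8, 8, 8]
label = nn_classify([0, 1, 1, 2, 3, 4], [soft, rigid], ["soft", "rigid"])
```

The warping makes the comparison robust to grasps executed at slightly different speeds, which is exactly why DTW suits squeeze sequences better than a point-wise distance.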
The aforementioned works primarily use hands with two, three, or five fingers, where each finger is able to collect data. However, most of these works only concatenate data from different fingers through simple splicing or counting statistics, paying more attention to individual fingers than to the relationship between fingers. In recent years, sparse coding and dictionary learning methods have been successfully applied to fields such as signal processing and pattern recognition, attracting the attention of many researchers. Classifiers based on sparse coding have become an important method for dealing with images, text, and videos, but they are rarely applied to tactile object recognition. This is because the tactile sequence does not satisfy the linear reconstruction assumption, so conventional linear sparse coding cannot be applied. To solve this problem, Nguyes et al. 38 introduced kernels into sparse coding. This method not only reduced the reconstruction error but also made dealing with tactile time sequences in non-Euclidean space more convenient. On the other hand, in practical applications, complex signals imply some latent structural information in addition to sparsity. The question of how to make full use of this information has increased academic attention on structured sparse coding. Recently, Liu et al. 39 proposed joint sparse coding to model the relationship between fingers. This method effectively improved object recognition accuracy and also provided a new unified processing approach for multifinger tactile sequences. In the studies by Liu et al., 40,41 the dictionary learning method was further introduced to improve the performance.
Figures 5 and 6 provide literature examples of experimental objects and the corresponding operation scenarios for deformation-based tactile recognition. Objects used in these studies are common household objects such as fruits or containers like cans and bottles. Furthermore, the operation patterns for these objects differ from each other. All these factors make object recognition very challenging. To address this, Madry et al. 42 proposed a spatiotemporal unsupervised feature learning method that makes full use of the correlation information in tactile sequences in both the time and space domains; the method was validated on different data sets. 16,20 Another study 24 considered the deformation of food and designed operations with knives and forks to obtain tactile information; after training, 12 different foods could be successfully recognized. On the other hand, Liarokapis et al. 43 used a soft manipulator and simplified the grasp process, designing a random forest classifier based on deformation and rigidity. Xu et al. 23 integrated force, oscillation, and temperature signals from the BioTac sensor to deal with both texture and deformation recognition.


Most of the aforementioned works used tactile sensors integrated on fingers and grasped objects with a precision grasp. However, many manipulators also have tactile sensors on their palms, which has inspired some researchers to grasp objects with a power grasp. For example, Navarro et al. 44 and Schmitz et al. 45 used this grasp to recognize multiple objects based on single-layer neural networks and deep learning technology. Ma et al. 46 also used the power grasp strategy to classify 16 different objects. Figures 7 and 8 list the experimental objects and grasp scenes used in these works. A future trend is to combine the two grasp methods to obtain better performance.


Visual–tactile fusion for object recognition
To achieve fine manipulation, robots are usually equipped with a variety of sensors. If each sensor perceives and understands the surrounding environment independently, the intrinsic relationship between the different modalities is cut off, which severely reduces the intelligence of the perception. Therefore, in order to determine an object's location and attribute information, we must study multimodal (e.g. visual and tactile) fusion theory and methods. The problem of visual–tactile fusion for perception has attracted the attention of many robotics researchers. The internationally renowned academic conference Robotics: Science and Systems 2015 specifically organized a workshop on this topic.
Visual and tactile modalities are quite different from each other. First, the format, frequency, and range of the obtained object information differ. Tactile sensing obtains information about an object through continuous contact, while the visual modality can simultaneously obtain multiple features of an object at a distance. Furthermore, some features can only be obtained by a single perceptual mode. For example, the color of an object can only be obtained visually, while the texture, hardness, and temperature of a surface are more naturally obtained through tactile sensing. The asynchronous information obtained from the two modalities and their different perception ranges bring great challenges to multimodal fusion. There are various methods for visual and tactile fusion. Son et al. 47 reported experimental results on using both visual and tactile information for manipulator grasping and pointed out that, with the introduction of tactile information, more precise results can be obtained. Boshra and Zhang 48 realized localization of polyhedra by integrating visual and tactile information.
The applications of visual–tactile fusion are very extensive. Ilonen et al. 49 used visual images and tactile signals for 3-D reconstruction. In the study by Bjorkman et al., 50 the visual image was first used for rough modeling, and the tactile signals were then used for fine-tuning. Grasp stability learning using visual–tactile fusion was investigated by Bekiroglu et al. 51 Work on dense mapping was recently developed by Bhattacharjee et al., 52 under the assumption that visually similar scenes should have similar tactile characteristics. In the study by Prats et al., 53 the authors developed a system that integrates visual, tactile, and force information to perform complicated door-opening tasks. Luo et al. 54 investigated the correspondence between visual and tactile features.
As for visual–tactile object recognition, the existing work is still limited. Woods and Newell 55 presented a detailed explanation of this problem. An early work was developed by Kroemer et al., 56 which focused on utilizing visual information to help tactile feature extraction. In that work, the authors identified a very important problem: the visual data and the tactile data are only weakly paired. In addition, they used the tactile information for classification, while the visual information was used only for training. Very recently, Gao et al. 57 and Zheng et al. 58 utilized deep learning methods for the joint learning of visual and tactile information. All of the above work deals only with textured materials. For deformable objects, Guler et al. 59 presented an interesting work which tried to infer the internal state of a container. The authors used a fixed Kinect camera to capture the object in order to detect the deformability, and tactile signals were then introduced to analyze the internal state. However, this method requires the 3-D model of the object, and it is therefore difficult to extend to unfamiliar objects. In Figure 9, we list some representative data collection scenes and algorithm architectures adopted in the corresponding references.

As to multimodality fusion, the joint sparse coding method has become a very effective strategy. 60 This method benefits from the sparse coding technique and is able to characterize the intrinsic relationship among different modalities by requiring the coding vectors to share a common sparsity pattern. Since the coding vectors are not required to be completely equal, the differences among them can also be effectively preserved. Such coding strategies have already been successfully used in face recognition and identity validation. 61 Very recently, Liu et al. 62 attempted to develop a joint sparse coding method for visual–tactile fusion and solved the intrinsic weak-pairing problem by relaxing the joint coding requirements. In the left panel of Figure 10, we list some representative experimental objects. In the right panel of Figure 10, we can clearly see the complementary roles of the two modalities: two bottles, one full and one empty, have very similar visual appearances while their tactile characteristics differ significantly. Liu et al. 62,63 provided a number of such examples to illustrate the motivation of visual–tactile fusion for object recognition. Currently, there still exist great challenges due to the gap between different perception modalities.

Left: representative objects used for visual–tactile fusion recognition. Right: two objects which exhibit similar appearances and different tactile signals. 62 Those figures have been reproduced with permission from IEEE.
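The shared-sparsity idea can be made concrete with a small greedy sketch: a simultaneous matching pursuit that forces the visual and tactile codes to use the same dictionary atoms while leaving their coefficients free, so the modality differences are preserved. The random dictionaries and two-atom synthetic signals below are illustrative assumptions, not the actual formulation of Liu et al. 62:

```python
import numpy as np

def joint_somp(dicts, signals, k):
    """Greedy joint sparse coding: all modalities share one support set.

    dicts: list of (d_m, N) dictionaries with unit-norm columns.
    signals: list of (d_m,) observations, one per modality.
    Returns per-modality codes (length N) and the shared support indices.
    """
    residuals = [s.astype(float).copy() for s in signals]
    codes = [np.zeros(D.shape[1]) for D in dicts]
    support = []
    for _ in range(k):
        # pick the atom with the largest total correlation over all modalities
        score = sum(np.abs(D.T @ r) for D, r in zip(dicts, residuals))
        score[support] = -np.inf          # do not reselect chosen atoms
        support.append(int(np.argmax(score)))
        for m, (D, s) in enumerate(zip(dicts, signals)):
            sub = D[:, support]
            coef, *_ = np.linalg.lstsq(sub, s, rcond=None)
            codes[m][:] = 0.0
            codes[m][support] = coef      # modality-specific coefficients
            residuals[m] = s - sub @ coef
    return codes, support

# Synthetic demo: "visual" and "tactile" signals built from the SAME two
# atoms (indices 1 and 3) but with different coefficients per modality.
rng = np.random.default_rng(2)
dicts = [rng.standard_normal((60, 6)) for _ in range(2)]
dicts = [D / np.linalg.norm(D, axis=0) for D in dicts]
signals = [2.0 * dicts[0][:, 1] + 1.5 * dicts[0][:, 3],
           -1.0 * dicts[1][:, 1] + 2.5 * dicts[1][:, 3]]
codes, support = joint_somp(dicts, signals, k=2)
```

The relaxation proposed by Liu et al. 62 loosens exactly this hard support-sharing constraint so that weakly paired samples no longer have to activate identical atoms.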
Data sets
Currently, there are many visual object recognition data sets developed as benchmarks. Some of them can be used for robotic applications, such as the 300-object data set developed by Lai et al. 64 By comparison, the construction of tactile object data sets is still at a very preliminary stage. Most of the existing work uses self-developed small-scale data sets that are not released for public usage, which prevents comparison under the same standards. Here, we summarize some representative data sets below:
Tactile data set
In the study by Goldfeder et al., 65 the authors utilized multiple types of robotic hands to implement grasp simulation. They generated grasp configurations for each object and constructed the famous Columbia Grasp Database. Furthermore, they utilized GraspIt!, a simulation tool, 66 to produce the tactile feedback for each grasp configuration and established a complete set of simulated tactile data. Dang 67 used this data set for learning grasp stability.
Bekiroglu et al. 8 utilized the RobWorkSim tool to develop the simulated tactile data set SDS. This data set includes tactile signals from five types of grasp cases. In addition, they established SD-5, which used the Schunk robotic hand to collect tactile signals for five objects. SDS and SD-5 were originally used for grasp stability analysis and learning.
For textured materials, the GRASP group at the University of Pennsylvania released the Penn Haptic Texture Toolkit (HaTT), 68 which includes 100 materials. In the study by Strese et al., 34 the authors introduced a haptic texture database which includes acceleration measurements collected during controlled and well-defined texture scans, as well as uncontrolled free-hand human texture explorations, for 43 different textures.
In the study by Drimus et al., 16 several tactile data sets were developed. The tactile signals were collected using their self-developed flexible tactile sensor. Seven deformable fruits (grape, kiwi, lime, mushroom, orange, plum, tomato) were grasped by the Schunk Parallel gripper, and the obtained tactile signals form the data set SPr-7. The other 10 household objects are rigid or deformable, and some of them are of similar shape and size. This set of objects consists of a rubber ball, a balsam bottle, a rubber duck, an empty plastic bottle, a full plastic bottle, a bad orange, a fresh orange, a juggling ball, a tape roll, and a small block of wood. They used the Schunk Parallel gripper to establish the data set SPr-10. In addition, they used a three-finger hand to collect a new data set, SD-10, using the same objects. Soh and Demiris 20 described the 10-object iCub data set, which was collected using a five-finger hand. It seems that existing data sets are of small scale due to the high cost of collection. Recently, Schmitz et al. 45 attempted to develop deep learning methods for tactile signal analysis and investigated 20 objects. However, this data set has not been released yet.
Visual–tactile fusion data sets
Since the study of visual–tactile fusion recognition started only recently, its data sets are in their infancy. Kroemer et al. 56 provided a small-scale data set which includes both image and tactile signals; the tactile signals were obtained by sliding a pin on the materials. In HaTT, 68 although texture images are provided, the focus is not on object recognition but on texture synthesis using both tactile and image information.
Very recently, Gao et al. 57 and Zheng et al. 58 developed deep learning methods for visual–tactile fusion; both data sets were collected for textured materials. To the best of the authors' knowledge, there is currently no publicly available visual–tactile data set for general household objects. It should be noted that Burka et al. 69 reported a plan at CVPR 2015 to achieve this, but their data set is still under construction.
Conclusions
In summary, tactile information and visual–tactile fusion information are very important for precise robot manipulation and are also a focus of research in the field of robot perception. However, work in this area currently faces many challenges, summarized as follows.

Although object recognition based on tactile information has made great progress, and many results have been achieved on the recognition of shape, texture, and deformation, there is still a long way to go before practical application. Meanwhile, the recognition of deformable objects has become a main direction of tactile object recognition because it combines the surface characteristics and the internal state of the object.

Machine learning has become the mainstream methodology for tactile object recognition. Reported methods include nearest-neighbor classification, SVM, decision trees, hidden Markov models, Gaussian processes, and Bayesian learning, and the application of deep learning to tactile signal processing has also begun. However, how to achieve more efficient and accurate classification by effectively extracting the tactile features of ordinary objects remains a challenging problem.

Although the EP is a key ingredient of tactile sensing, the related work is rather immature. Overall, the exploratory manner is very important for tactile perception, but a relevant theoretical basis is still lacking; current research focuses on developing exploration strategies based on human experience.

Visual–tactile fusion can provide more information for object recognition. In practical applications, finding the correlated properties of tactile and visual information, and learning joint representations for them, are very important problems, but research in this area has just started. This work is not only the foundation but also an important direction of future visual–tactile fusion perception.
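To make the machine learning pipeline discussed above concrete, the following is a minimal sketch of classifying objects from tactile time series with an SVM, one of the classifiers listed. The sensor layout (a 4 × 4 taxel array), the summary features, and the synthetic "soft" versus "rigid" classes are all illustrative assumptions, not taken from any surveyed paper.

```python
# Hedged sketch: tactile object classification with an SVM.
# All data here is synthetic; real pipelines would read sensor recordings.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def tactile_features(pressure_series):
    """Reduce a (time, taxel) pressure array to simple summary features."""
    return np.concatenate([
        pressure_series.mean(axis=0),  # mean pressure per taxel
        pressure_series.max(axis=0),   # peak pressure per taxel
        pressure_series.std(axis=0),   # temporal variation (grip dynamics)
    ])

# Synthetic stand-in data: 2 object classes ("soft" vs "rigid"),
# 40 grasps each, 50 time steps over a 4x4 taxel array.
X, y = [], []
for label, base, noise in [(0, 0.3, 0.15), (1, 0.8, 0.05)]:
    for _ in range(40):
        series = base + noise * rng.standard_normal((50, 16))
        X.append(tactile_features(series))
        y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice the feature extraction step, not the classifier, is usually the hard part, which is consistent with the challenge noted above.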
Currently, joint sparse coding provides an effective strategy for multimodality fusion, and deep learning provides effective methods for feature extraction. How to combine them to perform complicated visual–tactile fusion is a very interesting open problem. Both tactile object recognition and visual–tactile fusion object recognition face a lack of large-scale data sets. The lack of data samples, on the one hand, restricts the application of some machine learning methods (such as deep learning) and, on the other hand, hinders the performance evaluation of various methods. Theory and algorithm research, together with the construction of effective data sets, is therefore the focus of future research. From the publicly reported papers, we can see that visual–tactile fusion object recognition data sets have attracted great attention from many famous universities, including the University of Pennsylvania, 69 University of California Berkeley, 57 Technical University of Munich, 58 Waseda University, 45 and so on.
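As a rough illustration of the joint sparse coding strategy mentioned above, the sketch below learns a single dictionary over concatenated visual and tactile feature vectors, so that one shared sparse code jointly explains both modalities. The feature dimensions and the synthetic correlated data are assumptions for illustration only; this is not a reimplementation of any surveyed method.

```python
# Hedged sketch: joint sparse coding for visual-tactile fusion.
# Both modalities share one sparse code via a dictionary learned over
# their concatenated feature vectors.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
n_samples, d_visual, d_tactile = 200, 32, 16

# Synthetic correlated features: both modalities observe the same latent object.
latent = rng.standard_normal((n_samples, 8))
visual = latent @ rng.standard_normal((8, d_visual))
tactile = latent @ rng.standard_normal((8, d_tactile))
joint = np.hstack([visual, tactile])  # concatenate modalities per sample

# Each dictionary atom spans both visual and tactile dimensions, so the
# sparse code of a sample is a fused representation of the two modalities.
dico = MiniBatchDictionaryLearning(n_components=24, alpha=0.5, random_state=0)
codes = dico.fit_transform(joint)

# The joint sparse codes can then feed any downstream classifier.
sparsity = float(np.mean(codes == 0))
print(f"code shape: {codes.shape}, fraction zero: {sparsity:.2f}")
```

The design choice here is the simplest possible coupling (feature concatenation); more elaborate formulations constrain each modality's dictionary separately while tying their sparse supports together.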
Overall, visual–tactile fusion and other multimodal perception technologies provide more extensive and comprehensive perceptual information for robots to understand and interact with the environment, and they are of great significance for improving robots' level of operational intelligence. More research is expected to be conducted in the areas of signal processing, machine learning, and robotic manipulation.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under grants U1613212, 61673238, 91420302, and 61327809, and in part by the National High-Tech Research and Development Plan under grant 2015AA042306.
