Abstract
Sensory perception for dexterous robotic hands is an active research area in robotics that has seen rapid recent progress. Effective dexterous manipulation requires robotic hands to accurately sense their own state and perceive the surrounding environment. This article reviews the state of the art of sensory perception for dexterous robotic manipulation. Two types of sensors, namely intrinsic and extrinsic sensors, are introduced according to their function and layout in robotic hands. These sensors provide rich information to a robotic hand, including its posture, contact information about objects, and physical information about the environment. Then, a comprehensive analysis of perception methods, including planning-level, control-level, and learning-level perception, is presented. The information obtained from sensory perception helps robotic hands make decisions effectively. Previously published reviews mainly focus on the design of tactile sensors, whereas we analyze and discuss the relationship among sensing, perception, and dexterous manipulation. Some potential research topics on sensory perception are also summarized and discussed.
Introduction
For an intelligent robotic hand system, an excellent tactile perception system is essential. 1 In interactions between the robotic hand and the surrounding environment, tactile perception provides an important channel for information transmission. Humans with impaired tactile perception have difficulty manipulating objects. 2 Similarly, to interact with the surrounding environment or perform dexterous manipulation, a human-like perception system is necessary for robotic hands. In recent years, benefiting from the development of sensor technology, the perception performance and operation ability of robotic hands have improved accordingly.
However, dexterous manipulation is still challenging, as it involves the coordinated movement of multiple fingers and contact control with objects. Effective dexterous manipulation requires sensory perception that accurately feeds back the hand's state and captures the surrounding environment. 3
Generally, it is simple for a robotic hand to make a grasping action, but it is complicated to perform a firm and stable grasp without damaging the object. Flexible robotic operation cannot be performed without the assistance of sensors.
According to their layout and function in robotic hands, sensors are classified into two groups: intrinsic and extrinsic sensors. Intrinsic sensors feed back the kinematic or dynamic information of the robot body, such as joint angle, joint torque, and tendon strain, whereas extrinsic sensors perceive the external environment, such as pressure, force, temperature, and smoothness. 4 A variety of sensors based on different principles, such as piezoresistive, 5 optical, 6 capacitive, 7 quantum tunnel effect, 8 and barometric, 9 have been designed to sense one specific type of information. Most previous reviews have been organized from this perspective. 10,11 Recent trends in sensor design are moving toward multimodal sensors. 12
One of the most important abilities of humans is to perceive the environment, understand it, and complete meaningful operations. We recognize the world around us through sight, hearing, taste, smell, and touch, and, through the information processing and fusion of the brain, form a cognition of the surrounding environment. Tactile sensors give robotic hands the ability to perceive the environment. The main functions of tactile perception are exploring objects and feeding back their tactile features. Because the sensory information available to robotic hands is relatively limited, the lack of tactile information is one of the main factors limiting a robot's ability to perform dexterous manipulation. Better tactile perception allows robots to recognize the surrounding environment effectively and manipulate objects dexterously.
Although several reviews on the design of tactile sensors have been previously published, little work has addressed perceptual methods and their relationship to dexterous robotic manipulation. Some literature has analyzed the hardware of sensors 11 and common perception methods, such as slip detection 13 and object recognition. 14 Those works generally ignore the specific application scenarios. 15 This work focuses on sensory perception methods for dexterous manipulation. Dahiya et al. 4,16 classified robotic tasks involving tactile sensing into three categories: perception for action, action for perception, and action–reaction. 17 However, no matter what type of task or application is performed, tactile perception can help robotic hands interact with objects. To analyze the relationship between perception and manipulation, this article summarizes the relevant information on the different kinds of sensors used for dexterous manipulation.
As shown in Figure 1, this article reviews state-of-the-art sensory perception to clarify the relationship among sensing, perception, and dexterous manipulation. Emerging trends in sensor design and perception methods are addressed. According to their function and layout in robotic hands, two types of sensors, namely intrinsic and extrinsic sensors, are introduced. To manipulate objects, different methods of planning, control, and learning are applied to compute the motion and control solutions for dexterous robotic hands. Correspondingly, sensory perception methods are classified into three groups: planning-level, control-level, and learning-level perception. The advantages and disadvantages of different perception methods are compared. This analysis is supported by a review of 70 journal and 50 conference papers published from 2010 to 2020.

Sensing, perception, and action for dexterous manipulation, with two demonstrations of dexterous manipulation.
The rest of this article is organized as follows: the second section introduces sensors in dexterous robotic hands. The application of the perception hierarchy for dexterous manipulators and the used methods is introduced in the third section. The fourth section discusses the challenge of sensory perception for dexterous manipulation. Finally, the conclusion and future work are summarized in the fifth section.
Sensor in dexterous robotic hand
The implementation of dexterous manipulation requires robotic hands to perceive the information of the external environment and their own state. According to their layout in robotic hands, sensors can be divided into two categories: intrinsic and extrinsic sensors. The functions of the two kinds of sensors are illustrated in Figure 2, and the two types are introduced in turn below.

Classification of the robotic sensor.
Intrinsic sensor in robotic hands
Intrinsic sensors feed back the robotic hand's own state information, including position and force. These two types of information are indispensable for robotic hands. In the assembly process of a mechanical system, various errors exist, and the actual transmission parts may be affected by them. Therefore, intrinsic sensors are required to enable the robotic hand to better understand its state. Three typical intrinsic sensors (joint angle sensor, bend sensor, and tension sensor) are shown in Figure 3.

Motion sensor
Displacement and angle are two common types of signals, which are used to detect the gesture of dexterous hands. When a robotic hand is used to perform an action, motion sensors can capture the hand’s motion information, for example, the joint angle and the position of a fingertip. In this subsection, two kinds of motion sensors are discussed as follows.
Position sensor. Because the joint space of robotic hands is small, the volume requirements for joint sensors are strict. A variety of joint sensors based on different principles have been presented. In the early stage, the joint position was usually measured by Hall effect sensors; the Cyber hand, 21 for example, was equipped with eight Hall effect sensors in its joints. In addition, by modifying the structure of the Hall effect sensor, the joint angle can be measured accurately. 22 Palli and Pirozzi 18 designed an optical angular position sensor for robotic fingers. Petkovic et al. 23 used conductive silicone rubber to design a passive compliant robotic joint, which can not only measure the joint angle but also act as a damper to protect the joint. The Hall effect sensor can also measure the motor position. 24
Bend sensor. For tendon-driven robotic hands, the displacement of the tendon varies with changes in the sheath shape. This is a position error that may decrease control accuracy; in this case, using only the tension information of the tendon for robot control may produce errors. To address this issue, Jeong and Cho 19 presented a bend sensor to measure the bend angle of the sheath. The bend sensor has a high resolution and a large bend-sensing range. In subsequent work, combining this bend sensor, they proposed a novel method to compensate for the changing nonlinearity of the output force of the Bowden-cable structure. Without directly measuring the output tension, the tension of the Bowden cable can be accurately controlled. 25
In the process of dexterous manipulation, motion sensors measure quantities such as the joint angles of the robotic hand and the displacement of the motors. Both types of information are essential for the kinematic analysis of the robot: more accurate joint angle information reduces the motion error of the fingertip. The main problem with motion sensors is miniaturization; it is still challenging to integrate a high-precision motion sensor into a narrow joint space. A promising direction is to achieve angle detection through new materials or new sensing principles. For example, electronic skin can detect angles when attached to the surface of the robotic hand. 26
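To illustrate why joint-angle accuracy matters for fingertip error, the following is a minimal planar forward-kinematics sketch. The three-link finger, its link lengths, and the 1-degree sensor bias are hypothetical values for illustration, not parameters of any hand discussed here.

```python
import numpy as np

def fingertip_position(joint_angles, link_lengths):
    """Planar forward kinematics for one finger: angle-sensor errors
    propagate through every link to the estimated fingertip position."""
    x, y, theta = 0.0, 0.0, 0.0
    for q, l in zip(joint_angles, link_lengths):
        theta += q
        x += l * np.cos(theta)
        y += l * np.sin(theta)
    return np.array([x, y])

# Hypothetical three-link finger (link lengths in meters).
links = [0.05, 0.03, 0.02]
true_q = np.deg2rad([30.0, 45.0, 20.0])
noisy_q = true_q + np.deg2rad(1.0)  # a 1-degree bias on every joint sensor

tip_error = np.linalg.norm(fingertip_position(noisy_q, links)
                           - fingertip_position(true_q, links))
print(f"fingertip error from a 1-degree joint bias: {tip_error * 1000:.2f} mm")
```

Even a small uniform bias accumulates across the kinematic chain, producing a fingertip error on the order of millimeters for a finger only 10 cm long.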
Force/torque sensor
For dexterous manipulation, force/torque sensors are essential for interacting with unknown objects, both for operational safety and for autonomy. The joints of manipulators and robotic hands are usually equipped with joint torque sensors. In addition, tension sensors, which measure the change in tension of the transmission rope, are also necessary for tendon-driven robotic hands.
Joint torque sensor. The DLR hand 27 was equipped with a strain gauge for measuring joint torque and a potentiometer for measuring the joint position. In the subsequent DLR hand II, 28 the authors used Hall effect sensors to replace the potentiometers for measuring the joint position. Each finger of the Shadow hand 29 is also equipped with a joint torque sensor, which supports precise operation. In Tsetserukou et al., 30 an optical joint torque sensor was presented to replace the strain gauge. With this sensor, the authors developed a position-based local impedance control method for dexterous grasping.
Tension sensor. A tendon can only provide force in one direction, and the friction loss along the path of the tendon is difficult to capture with an accurate mathematical model. Researchers therefore usually use a tension sensor to measure tendon tension. Salisbury and Craig 31 placed a strain gauge at the base of each finger as the cable tension-sensing mechanism to realize force control of the Stanford-JPL hand. Based on the tension information of each cable, they also presented several position and force control architectures. With the development of optoelectronic sensors, some works have applied them to tension measurement. Lotti et al. 32 used a pair of strain gauges to measure tendon force. Subsequently, novel tension sensors appeared gradually. For Robonaut 2, Bridgwater et al. designed a highly integrated tension sensor. 22 Palli and Pirozzi 33 developed an innovative optical sensor to control a tendon-driven robotic hand. The sensor can be mounted at any position along the tendon; compared with strain gauges, this scheme has the advantages of simplicity and compactness. A force sensor based on a photo-interrupter, with high resolution, low nonlinearity, and low hysteresis, was presented in Jeong et al. 20 The size of this sensor is small enough to embed in a robot finger, 34 where it measures the tension of the tendon to estimate the contact force between fingertip and object.
Force/torque sensors provide the dynamic information of the robotic hand, which is necessary for stable dexterous grasping and manipulation. For example, when a robotic hand grasps a paper cup or a glass cup, the sensors should indicate how much grasping force is appropriate in each situation. The acquired force information is also useful for dexterous manipulation tasks such as object pivoting. 35
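As a toy illustration of how tendon tension readings translate into a fingertip force estimate, consider a single-joint, friction-free finger. The pulley radius, lever arm, and tension values below are invented for illustration; real Bowden-cable transmissions additionally need the friction compensation discussed above.

```python
def fingertip_force_estimate(t_flexor, t_extensor, pulley_radius, tip_lever):
    """Estimate the normal fingertip contact force from measured tendon
    tensions for a single-joint, friction-free finger (a simplification;
    real tendon transmissions need friction compensation)."""
    joint_torque = pulley_radius * (t_flexor - t_extensor)  # N*m
    return joint_torque / tip_lever                          # N

# Hypothetical values: 20 N flexor tension, 5 N antagonist tension,
# 5 mm pulley radius, 40 mm from joint axis to contact point.
force = fingertip_force_estimate(20.0, 5.0, 0.005, 0.040)
print(f"estimated contact force: {force:.2f} N")
```

The net tension difference across the antagonistic pair sets the joint torque, and the lever arm from joint axis to contact point converts it into a contact force estimate.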
Extrinsic sensor for robotic hands
Extrinsic sensors are necessary components for a robot to obtain information about the surrounding environment. When facing unknown environments and objects, extrinsic sensors provide sufficient safety and operability. To achieve dexterous manipulation, the robotic hand needs to move close to the target object in the pre-operation phase and operate it with its fingers in the operation phase. In the pre-operation phase, proximal sensors detect the distance between the object and the robotic hand. When the robotic hand contacts an object, tactile sensors provide physical information about the object and the contact force. In this subsection, we discuss three types of extrinsic sensors widely used for dexterous manipulation, as shown in Figure 4: proximal sensors, tactile sensors, and multimodal sensors.

Proximal sensor
Proximal sensors give robots the ability to detect surfaces and the relative position between objects and robotic hands before grasping or manipulation. For humans, this ability is usually provided by visual feedback; for robots, it can be provided by proximity sensors. With proximity sensors, robotic hands can estimate the position, shape, and other physical information of an object before operating on it. For dexterous operations, knowing this information in advance contributes to the success rate of the operation.
Combining a hierarchical reactive controller with a proximity sensor, the grasping performance of the Baxter hand was improved. 40 Konstantinova et al. 38 designed a force and proximity sensor based on fiber-optic technology; through a two-finger pinch experiment, they demonstrated that this sensor can enhance grasping capabilities. Proximity sensors based on optical principles can also provide visual information to the robotic system. 41 In addition to optical proximity sensors, sensors based on acoustic principles 42 can be used for pre-touch sensing to collect spatial information. A variety of proximity sensors have been designed for predicting contact and improving grasping ability, and pre-touch sensing has been shown to improve grasp planning.
Tactile sensor
For human hands, the tactile system provides rich information, such as the force, texture, temperature, and hardness of the grasped object. The skillfulness of human hands depends on their sophisticated structure and powerful sensory system. Considering the importance of tactile information for dexterous manipulation, researchers have put much effort into manufacturing dexterous hands that can realize most of the functions of human hands. This part briefly reviews the development of tactile sensors and serves as the foreword for the perceptual methods that follow.
In the early stage, the major thrust of tactile sensor research was pressure and force measurement, and almost all robotic hands are equipped with such sensors. 27,43–48 These sensors are usually mounted at the fingertips to measure multidimensional forces, commonly based on the piezoelectric effect. 49,50 Although this type of sensor can detect the direction of force, it is less reliable for detecting the contact position. Gradually, more research has focused on array tactile sensors.
Multi-array tactile sensors are designed for tactile sensing research, and the sensors on most current robotic hands fall into this category. At present, many commercial tactile sensors are of this type, among which a representative one is the BioTac sensor designed by Syntouch. 51 Many researchers have also studied this type of sensor. The GelForce sensor 52 is a finger-shaped tactile sensor applicable to robotic hands; it can be easily integrated into a manipulator for grasping and other operations. Xie et al. 53 presented a 3 × 4 fiber-optic tactile array for measuring the normal force and pressure distribution. Such sensors can sense a wealth of information.
Multi-array tactile sensors generally perceive single-modal information. However, when humans grasp an object, its weight, size, temperature, texture, and so on are perceived concurrently, and human dexterous manipulation is based on this perception. It is therefore important for robotic hands to obtain multimodal tactile information simultaneously. Generally, the tactile features obtained by multiple single-modal sensors are independent of each other, which makes it difficult for robots to establish the intrinsic relationships among various tactile features.
Multimodal sensors that can obtain multiple kinds of tactile information simultaneously have been presented in response to this challenge. For example, a camera-based light-conductive plate sensor can detect contact position and force, 54 and can be combined with an elastomer marker to feed back a multidimensional force. 41 Another typical multimodal tactile sensor is the BioTac, which provides information on contact forces, microvibrations, and thermal fluxes induced by contact with external objects. 55 Dexterous manipulation requires the simultaneous acquisition and processing of multiple pieces of information.
Although optical tactile sensors were first presented approximately 40 years ago, 56,57 they were not widely applied in the early stage, limited by large volume, high price, and complex structure. As camera prices have declined and performance has improved in recent years, the advantages of camera-based tactile sensors have become apparent, and related research is gradually increasing. With the development of image processing hardware and software, researchers can obtain more useful information from an optical tactile sensor, including but not limited to contact position, temperature, roughness, stiffness, texture, and shape. At this stage, a series of multimodal optical tactile sensors have been presented. Yuan et al. 37 designed a high-resolution tactile sensor consisting of an elastomer and a camera, which can detect normal force, shear force, in-plane torque, and tilt torque. Based on the same principle, Padmanabha et al. 58 presented a highly sensitive multidirectional tactile fingertip using a micro-camera and gel coat, which provides high-resolution tactile signals; by optimizing its structure, this sensor can measure omnidirectional contact information. Lambeta et al. 59 designed a novel low-cost tactile sensor applied to an in-hand manipulation task; they used a deep model to learn to manipulate glass marbles from raw tactile inputs toward desired target positions. Sun et al. 60 designed a tactile sensor that can measure surface texture, force, and thermal information to enhance tactile performance. Similarly, by using image features of deformation and machine learning methods, Baimukashev et al. 61 presented an optical tactile sensor that can detect shear force, tension, and pressure. Multimodal perception is the characteristic of this type of sensor.
The tactile perception of robots is gradually developing from simple force perception to multimodal perception 60 and from a small contact area to a larger coverage area.
Multiregion perception is another research direction of tactile perception. Researchers have focused on electronic skin that can cover the whole robotic hand to provide haptic perception rather than restricting it to the fingertips. Lee et al. 62 presented a cross-reactive sensor for multimodal intelligent electronic skin that can identify multimodal information individually by utilizing machine learning methods. Yao et al. 26 used a triboelectric nanogenerator as a robotic tactile sensor. Owing to its flexibility and high sensitivity, this type of electronic skin can easily be assembled onto a robotic hand, where it can detect the force, roughness, and stiffness of the object in contact. Moreover, Yin et al. 63 reported a flexible, bioinspired sensor skin that can measure dynamic shear force with performance equivalent to a human fingertip. Based on the magnetic Halbach array, Yan et al. designed a soft magnetic skin with self-decoupling and super-resolution abilities, which can be applied to precise manipulation. 64
The tactile sensor transmits contact information with the object in real time. Through the feedback of tactile information, the controller can measure grasping quality and prevent sliding; some complex in-hand manipulations also rely on the collaboration of tactile sensors. 3 However, the information transmitted by current sensors is not rich enough, and it is still challenging for robotic hands to extract effective information from complex signals. In addition, for the tactile sensor itself, current research aims at sensors that are soft, thin, flexible, stretchable, and lightweight, so that the sensor fits well onto the robotic hand surface without interfering with mobility or contact mechanics. 65
Perception for dexterous manipulation
In this section, we discuss sensory perception methods, in which sensing information is processed to accomplish dexterous manipulation tasks. The information obtained from perception helps robots make decisions effectively. To manipulate objects, different methods from planning, control, and learning can be applied to compute the motion and control solutions for dexterous robotic hands. Correspondingly, the perception methods are classified into three groups, as shown in Figure 5: planning-level, control-level, and learning-level perception.
Planning-level perception. The planning of dexterous manipulation concerns the generation of a sequence of robot motions that meets the task requirements and avoids collision with obstacles. At the planning level, sensory perception feeds back information about the object and surrounding environment, for example, by reconstructing the object shape or recognizing the object material.
Control-level perception. The dexterous manipulation control of robotic hands drives the fingers to operate the target object and achieve the target posture. At this level, sensory perception provides information for motion feedback and adjustment of the robotic hand, such as adjusting the grasping pose and force.
Learning-level perception. Advanced machine learning methods have recently been used to allow robots to learn and acquire manipulation skills automatically, which addresses the defects of traditional planning methods. At the learning level, the robot usually perceives state information by directly interacting with the environment and then selects the correct action to achieve the learning goal.

The role of perception for the three levels.
Planning-level perception
This subsection explains how object information, such as shape and material, can be recognized and understood from tactile perception. We also discuss how the robot can reconstruct object models and estimate their pose using perception. In planning-level perception, we focus on how the robotic hand understands the target object and environment. The related research topics involve object model reconstruction based on tactile exploration and object material classification. Three examples of planning-level perception are shown in Figure 6.

Object model reconstruction based on tactile exploration
One of the main challenges for robotic hands is how to use tactile information to reconstruct accurate object models. Generally, a vision camera can provide most of the shape information of objects. 69 However, humans can perform exploratory tasks without relying on visual information, for example, taking a phone or keys out of a bag; in this process, humans can find what they need in a very short time even without observing the objects inside. To date, most robotic hands have demonstrated good performance in structured environments, such as factory assembly lines, but in unstructured environments they are presently not “smart” enough. Because a lack of surface information causes operational instability, researchers are driven to investigate novel exploration approaches.
More and more research studies have started to focus on tactile exploration and recognition. 70 Object shape provides useful information for dexterous manipulation; for example, the geometric model of the object determines the choice of grasping points. Hence, object model reconstruction is a widely studied problem. Sommer et al. 71 proposed a general method of bimanual compliant tactile exploration for guiding the movement of arms and fingers along object surfaces. The tactile information is collected to produce point clouds of the object, based on which the object can be identified and manipulated by an iCub robot. Some works 72 have defined exploration strategies based on the Gaussian process: a tactile sensor detects the contact between finger and object and then generates point clouds from the tactile information. Pezzementi et al. 73 proposed an approach for generating rich surface models from multiple images acquired from an array-type tactile sensor; based on the established surface model, a pose detection algorithm was also introduced to estimate an unknown object's pose.
Tactile information only provides local contact information; hence, obtaining accurate 3D models of objects using tactile contacts alone is usually tedious. There have been several ways to combine touch with other sensors to explore objects. Ottenhaus et al. 66 introduced a combined sensor concept consisting of an inertial measurement unit and a pressure sensor to detect the surface normal through tactile sensing. With the proposed combined sensor, they reconstructed the surface of an unknown object with low error. Yang and Lepora 74 presented an exploration approach combining passive vision and active touch. Park et al. designed a large-area electrical impedance tomography-based tactile sensor, which can be used for contact contour detection. 75 These methods also show good performance in tactile exploration. Many object exploration and recognition approaches have been developed based on tactile information and machine learning methods.
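The Gaussian-process exploration idea mentioned above can be sketched as a small implicit-surface regression: contact points are labeled 0 and points pushed outward along the sensed surface normal are labeled 1, so the zero level set of the learned function approximates the object surface. This is a deliberately simplified, numpy-only illustration (the kernel, length scale, and circular toy object are assumptions, not the cited authors' formulation):

```python
import numpy as np

def rbf_kernel(A, B, length=0.02):
    """Squared-exponential kernel between two 2D point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_implicit_surface(contacts, normals, query, eps=0.01, noise=1e-4):
    """Fit a GP whose zero level set approximates the object surface:
    contact points get target 0, points pushed outward along the sensed
    normal get target 1; the GP mean is evaluated at the query points."""
    X = np.vstack([contacts, contacts + eps * normals])
    y = np.concatenate([np.zeros(len(contacts)), np.ones(len(contacts))])
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return rbf_kernel(query, X) @ alpha

# Toy object: tactile contacts sampled on a circle of radius 0.03 m.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
contacts = 0.03 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
normals = contacts / np.linalg.norm(contacts, axis=1, keepdims=True)

value = gp_implicit_surface(contacts, normals, np.array([[0.03, 0.0]]))
print(f"implicit value at a surface point: {value[0]:.3f}")  # close to 0
```

Evaluating the implicit function on a grid and extracting its zero level set would yield a surface estimate, and the GP variance can guide where to touch next.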
Object material classification
Humans can recognize object materials using their sense of touch; it is easy to identify metal, glass, wood, and other materials by feel. The ability to recognize materials lays the foundation for humans to manipulate objects: even if two objects are similar in volume, humans apply different forces by distinguishing their materials. This ability is also important for robotic hands. When grabbing a glass cup and a metal cup, the robot should know how much force to use for each one. Hence, the ability to classify materials is necessary for dexterous manipulation.
The material can be directly classified from different tactile signals. Early researchers attempted to use tactile perception to establish a representation of objects. Chu et al. 76 trained robots to learn new objects and concepts through daily experience by equipping a PR2 robot with BioTac sensors. They established 34 adjective labels to describe each object and used a machine learning technique to train the robot to learn haptic properties from tactile signals and then generalize this understanding to previously unfelt objects. This demonstrated that robotic tactile perception can be related to subjective human labels and established a baseline for future work on material classification. Chin et al. 77 used contact force and pressure information between manipulators and objects to detect metal, paper, and plastic with a sorting algorithm; they noted that more sensor signals are needed to improve classification accuracy. Liu et al. 78 presented a joint kernel sparse coding method to address the tactile data representation and classification problem. The method uses multi-finger tactile perception to improve classification performance and establishes an intrinsic relationship between fingers, while also improving recognition performance. In follow-up work, Liu et al. fused vision information with tactile perception 79 and obtained better object recognition accuracy. Gao et al. 68 built a fused deep model for material classification by combining tactile with visual features, which helps robots better understand the features of objects. Currently, the deep neural network (DNN) is the most common method for material classification, and the accuracy of the classification results is relatively high, 80 although this approach often relies on large amounts of training data. In general, richer tactile information leads to more accurate identification of an object's material and texture. 81 In addition, the combination of visual and tactile features may better describe the texture of an object.
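The core of tactile material classification can be sketched with a deliberately simple nearest-centroid classifier. The feature vectors below (stiffness, thermal-flux, and micro-vibration proxies) are invented for illustration; a real system would extract such features from sensors like the BioTac and use the richer learned models described above.

```python
import numpy as np

# Toy tactile feature vectors: [stiffness proxy, thermal-flux proxy,
# micro-vibration energy]. All values are hypothetical.
train = {
    "metal":   np.array([[0.90, 0.90, 0.20], [0.85, 0.95, 0.25]]),
    "wood":    np.array([[0.60, 0.30, 0.50], [0.65, 0.35, 0.55]]),
    "plastic": np.array([[0.50, 0.20, 0.30], [0.55, 0.25, 0.35]]),
}

def classify(sample):
    """Nearest-centroid material classification from tactile features
    (a simple stand-in for the learned models discussed in the text)."""
    centroids = {m: x.mean(axis=0) for m, x in train.items()}
    return min(centroids, key=lambda m: np.linalg.norm(sample - centroids[m]))

label = classify(np.array([0.88, 0.92, 0.22]))
print(label)  # → metal
```

Metals conduct heat away from a warm fingertip faster than wood or plastic, which is why a thermal-flux feature separates the classes so cleanly in this toy setup.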
Understanding the characteristics of an object is a prerequisite for its dexterous manipulation. Relying solely on tactile perception, humans can perform dexterous manipulation or use tools, but this is still hard for robots because the information acquired by touch is currently insufficient: when robotic hands perceive an object through touch alone, they miss some important physical information. Hence, at the planning level, tactile sensing should move toward multimodal perception to capture richer tactile information, and its combination with machine learning methods may provide the richer tactile knowledge needed to address this challenge. 82
Control-level perception
At the control level, the dexterous robotic hand can realize more accurate adjustments through the combination of real-time tactile feedback and the control system. The sense of touch no longer simply “tells” the robot about contact and object information; robotic hands use tactile information to adjust their fingers and drive the object state toward the target state. In control-level perception, the related research topics are tactile servoing, slip detection, and grasp quality measures, as shown in Figure 7.

Tactile servoing
In the process of dexterous manipulation, tactile servoing can be embedded in the control architecture, where the motors are driven by tactile feedback. Tactile servoing also gives the robotic hand a new way to explore new environments more safely. For people, touch is an important source of information when exploring the environment, and the same is true for robots; better understanding and utilizing tactile information is therefore an important research direction.
Su et al. 86 presented a machine learning approach to estimating the contact force between the robotic hand and objects. In that work, robotic hands can detect different slip events and adjust the gripping force according to the contact force. Based on large-scale data and feed-forward neural networks, Sundaralingam et al. 87 built a robust learning model to estimate fingertip force from tactile sensor information. Combined with a force-feedback grasp controller, the robotic hand can perform grasp-and-place tasks stably. Delgado et al. 88 presented a universal framework to control finger movements according to tactile images, which are modeled as a combination of dynamic Gaussians. In addition, the tactile servoing technique is used to stabilize grasp points, avoid sliding, and adjust grasp points.
The application of tactile servoing extends beyond force estimation to active exploration and complex tasks. Using a real-time tactile feedback control framework, She et al. 89 achieved the task of following a dangling cable. They presented two tactile-based controllers to adjust grasping force and grasping posture, respectively. The perception-and-control framework allowed the robotic hand to accomplish more complex grasping tasks. Researchers have also employed tactile sensors to actively explore object information. 90 Active exploration requires both actuator control and tactile perception, with the latter used to adjust the robot's movement strategies.
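The core loop shared by these tactile-servoing controllers, sense contact, compare against a target, command the actuators, can be sketched with a proportional controller regulating grip force. The linear contact model and all gains below are invented for illustration, not taken from any cited work.

```python
# Minimal tactile-servo sketch: a proportional controller closes the
# fingers until the sensed normal force reaches a target value.

def sense_force(aperture, contact_at=0.02, stiffness=500.0):
    # Hypothetical contact model: force rises linearly once the fingers
    # close past the point of first contact (aperture in meters, force in N).
    return max(0.0, (contact_at - aperture) * stiffness)

def tactile_servo(target_force, aperture=0.05, gain=1e-4, steps=200):
    for _ in range(steps):
        error = target_force - sense_force(aperture)
        aperture -= gain * error  # close fingers if force is too low
    return aperture, sense_force(aperture)
```

A real controller would add derivative/integral terms, slip triggers, and sensor filtering, but the feedback structure is the same.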
Slip detection
Dexterous manipulation with a multi-fingered robotic hand is challenging due to uncertainties arising from sensor noise, slippage, and external disturbances. External disturbances caused by environmental changes may turn a grasp planned to be stable into an unstable one. Humans react quickly to such instabilities through tactile sensing. The ability to detect slip through tactile perception is therefore essential for further dexterous manipulation.
Most current work on slip detection is based on two kinds of sensors: commercial tactile sensors (such as the BioTac) and visual-tactile sensors. Van Wyk and Falco 91 revealed the difference between non-slipping and slipping stimuli through spectral analysis of univariate tactile signal gradients. They trained an LSTM network to classify slipping versus non-slipping, with accuracy above 90% across three commercially available sensors. Huh et al. 92 presented a dynamically reconfigurable tactile sensor that can detect and distinguish between linear and rotational sliding. Veiga et al. 85 developed a generalizable random forest classifier to predict the slipping of unknown objects. In follow-up work, 67 they presented a modular, finger-independent grip stability control approach inspired by neurophysiological findings; using the tactile signal, they achieved stable grasp control for in-hand manipulation. Zapata-Impata et al. 93 proposed a learning methodology for detecting the type of slipping (translational or rotational) and its direction. They used a convolutional long short-term memory (ConvLSTM) network to learn spatiotemporal features from a tactile sensor, detecting seven categories of slipping. Li et al. 94 explored a method combining an SVM with window pursuit matching to classify slip events. To stabilize unknown objects without prior knowledge of their shape or physical properties, Deng et al. 95 proposed an online DNN-based detection architecture that identifies contact events and object material simultaneously from tactile data. Based on the tactile sensing results, they employed a PID-based controller to adjust the contact configuration of the robotic hand.
Visual-tactile sensors are also often used for slip detection. By analyzing the sequence of images obtained by a GelSight sensor, information on force loading and partial slipping was obtained. 96 Dong et al. 97 acquired combined information about relative motions on the membrane surface and shear distortions with the GelSight sensor, and used this information to accomplish slip detection and grasp control. Slip detection based on visual-tactile sensors combined with DNN methods also achieves good results. Li et al. 98 trained a DNN on captured image sequences to detect whether a slip occurred. Zhang et al. 99 proposed a novel finger-vision tactile sensor and designed a slip detection framework based on a convolutional LSTM network. In addition, a pixel motion network based on LSTM was proposed by Zhang et al. 100 to predict the occurrence of sliding; when slipping occurred, the gripper adjusted its output force to maintain a stable grip. No matter which sensor is used, most slip detection methods are data-driven, so the data set is important. Wang et al. 101 presented a multimodal grasp data set for robotic manipulation that can be used to train slip detection. Based on slip event detection, robotic hands can gently grip different objects without damaging them.
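The learned detectors above exploit the fact that incipient slip shows up as high-frequency vibration in the tactile signal. A crude non-learned baseline makes this concrete: threshold the variance of the signal inside a sliding window. The signal, window size, and threshold below are synthetic and illustrative only.

```python
import numpy as np

def detect_slip(signal, window=20, threshold=0.01):
    # Flag sample i as "slipping" when the variance of the preceding
    # window of tactile readings exceeds a fixed threshold.
    flags = np.zeros(len(signal), dtype=bool)
    for i in range(window, len(signal)):
        flags[i] = np.var(signal[i - window:i]) > threshold
    return flags

# Synthetic test signal: steady contact, then vibration (slip onset).
rng = np.random.default_rng(1)
steady = 1.0 + 0.001 * rng.standard_normal(100)
slipping = 1.0 + 0.5 * rng.standard_normal(100)
signal = np.concatenate([steady, slipping])
```

The LSTM- and ConvLSTM-based methods cited above effectively learn richer spatiotemporal versions of this statistic directly from data.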
Grasp quality measure
Correctly grasping objects is critically important for dexterous manipulation. Planning a good grasp involves determining grasping points on the object surface and choosing a hand configuration. Given an object and a hand, the key step in grasp planning is to measure grasp quality. 102 A review of grasp quality measures can be found in the work of Roa and Suarez. 103 In this work, we focus on quality measures that use tactile information to evaluate grasp quality.
Researchers have presented many approaches that use tactile information for grasp quality measurement. Romano et al. 104 presented a human-inspired grasp controller that relies only on tactile signals; the controller decides which of six actions the manipulator performs: Close, Load, Lift and Hold, Replace, Unload, and Open. Schill et al. 105 proposed an approach for grasp-quality learning based on the temporal filtering of a support vector machine (SVM) classifier; the method can be extended to flexible hands. Wan et al. 106 used a simple SVM to extract grasping information from tactile signals and reached 90% accuracy in successful grasp prediction. Li et al. 107 proposed an adaptive grasp framework to deal with uncertainties in physical properties, using a Gaussian mixture model to estimate grasp stability. Neural network methods have also been used for grasp quality detection. Hogan et al. 108 presented a novel regrasp control policy combined with a deep convolutional neural network (CNN) to plan local grasp adjustments of robotic hands. Krug et al. 109 used a wrench-based classifier and tactile feedback to evaluate grasp success without the need for training data. Garcia-Garcia et al. 84 presented a graph convolutional network to predict grasp stability. These findings indicate that grasp stability can be assessed from tactile information, and that auxiliary visual information can improve it further.
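Behind both the learned classifiers and the wrench-based measures lies classic grasp geometry. As a minimal illustration (not the method of any cited work), a two-finger antipodal grasp resists slip when the line between the contacts lies inside both friction cones, i.e. the angle between each contact normal and the connecting line stays below arctan(mu):

```python
import numpy as np

def antipodal_quality(p1, n1, p2, n2, mu=0.5):
    # p1, p2: contact positions; n1, n2: contact normals pointing from
    # each finger into the object; mu: assumed friction coefficient.
    d = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    d /= np.linalg.norm(d)
    cone = np.arctan(mu)  # half-angle of the friction cone
    a1 = np.arccos(np.clip(np.dot(n1, d), -1.0, 1.0))
    a2 = np.arccos(np.clip(np.dot(n2, -d), -1.0, 1.0))
    # Quality: margin (radians) by which both contacts stay inside their
    # friction cones; negative means the grasp cannot resist slip.
    return cone - max(a1, a2)
```

Tactile sensing enters precisely here: it provides the measured contact locations and normals that such a metric, or its learned counterparts, consumes at run time.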
Visual information, alongside tactile perception, is useful in grasp quality measurement, and some researchers have attempted to combine the two to detect grasp quality. Guo et al. 110 presented a grasp detection deep network to find the best grasp configuration from an image, and then assessed grasp stability with a metric derived from tactile sensing; the grasp performance was verified with real robots in a real environment. Calandra et al. 111 used deep multimodal convolutional networks to predict grasp quality from raw tactile-visual information; the network can then continuously optimize the grasp action until the best one is found. Liang et al. 112 proposed PointNetGPD to detect grasp configurations from point sets and evaluate grasp quality; compared with a CNN, PointNetGPD has fewer network parameters. By using deep network methods, robots can find an efficient grasp action without modeling the contact force or calibrating the sensor, which improves the robustness of the whole robotic system.
By combining tactile perception with the control system, the robotic hand gradually acquires a certain dexterity. Slip detection and grasp quality measurement give the robotic hand the ability to grasp steadily, while tactile servoing gives it the ability to explore objects safely and manipulate them dexterously. Control-level perception thus enables the robotic hand to operate autonomously.
Learning-level perception
The goal of robot learning is to acquire a behavior, such as dexterous manipulation, which is usually represented by a skill policy. This section discusses how tactile information can be used for robot skill learning and how it is represented in skill policies. Two types of robot learning algorithms, namely imitation learning and reinforcement learning (RL), are considered. Figure 8 gives two examples of learning-level perception. Compared with traditional methods, learning-based methods have decision-making ability for manipulating complex objects and improve the generalization of robotic manipulation. 113

Imitation learning with tactile information
When a robot starts to perform a task in a new environment, an effective approach is to bootstrap robot behaviors from human behavior strategies. Here, tactile information is required to help the robot understand both the behavior and the new environment.
Imitation learning methods use actions from human demonstrations as a reference to control a robot. The method first records the demonstrated human movement; an optimized robot trajectory is then obtained from the demonstration through motion encoding and regression. In this process, tactile feedback plays an important role and is closely tied to the human demonstration. Beyond imitating, robotic hands may also correct their movements according to tactile feedback. Chebotar et al. 90 made a robotic hand learn to perform a complex task in an altered environment by feeding tactile information back into the control loop. Argall et al. 114 presented a tactile policy correction algorithm to improve the performance of a simple demonstrated policy. Tian et al. 116 proposed a touch-based control method that enables robots to use tactile perception effectively for object repositioning tasks.
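The regression step of this pipeline can be sketched at its simplest: behavior cloning, fitting a linear policy that maps a state (here an invented three-dimensional vector, e.g. object position plus a scalar tactile reading) to a demonstrated action via least squares. The "expert" and its demonstrations are entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
W_expert = np.array([[0.5, -0.2, 0.1],
                     [0.0,  0.8, -0.3]])   # hidden expert policy (unknown to learner)
states = rng.standard_normal((200, 3))      # 200 demonstrated states
actions = states @ W_expert.T + 0.01 * rng.standard_normal((200, 2))  # noisy demos

# Least-squares behavior cloning: find W such that actions ≈ states @ W.
W_fit, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(state):
    # Learned imitation policy: maps a state to a two-dimensional action.
    return np.asarray(state) @ W_fit
```

The cited works replace this linear fit with dynamic movement primitives or neural policies and add tactile-driven corrections, but the demonstrate-then-regress structure is the same.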
As things stand, most tasks achievable by imitation learning are relatively simple. In the context of dexterous manipulation, the corresponding manipulation algorithms will also become more complex. In addition, many tasks are evaluated only in simulation; more real-world experiments are needed to verify robotic dexterous manipulation ability and the performance of imitation learning algorithms.
RL with tactile information
RL can learn skill policies directly from the interaction between robots and environments. Learning skills from tactile data is a core research direction in robotics: robots should be able to learn useful information and policies from touch or grasp experiments. For robots, tactile perception can be used to learn manipulation skills, such as in-hand manipulation 117 and grasp quality measurement. 115,118,119 A key feature of RL is that it allows a robot to learn from experience by trial and error, without a kinematic or dynamic model of the robotic hand. This helps improve manipulation accuracy and reduces the modeling difficulty of the robotic system. Pape et al. 120 presented a curious-exploration RL framework that enables robots to learn multiple manipulation skills from tactile perception; the learning method showed better performance in an unstructured environment than traditional engineered solutions, and can be trained in a real environment. Church et al. 121 built a platform to train a deep reinforcement learning (DRL) agent to type on a braille keyboard from high-resolution tactile images, laying a foundation for the practical application of robotic hands with tactile perception and DRL methods. In addition, tactile and action feedback combined with RL can complete delicate tasks in the absence of vision. 122 Li et al. 123 presented a hierarchical approach that relies on traditional model-based controllers at the low level and learned policies at the mid level to improve robotic in-hand manipulation.
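The trial-and-error principle can be made concrete with a toy example: tabular Q-learning over a discretized tactile "slip level" state (0 = stable, 1 = slipping) and two actions (0 = hold force, 1 = increase force). The environment dynamics and rewards below are invented solely to illustrate the learning loop, not any cited system.

```python
import random

def step(state, action):
    # Hypothetical grip-force environment.
    if state == 1 and action == 1:      # tightening while slipping stabilizes
        return 0, 1.0
    if state == 1:                      # ignoring slip drops the object
        return 1, -1.0
    if action == 1:                     # over-squeezing a stable grasp
        return 0, -0.1
    return random.choice([0, 1]), 0.0   # a stable grasp may start to slip

random.seed(0)
Q = [[0.0, 0.0], [0.0, 0.0]]            # Q[state][action]
alpha, gamma, eps = 0.1, 0.9, 0.2       # learning rate, discount, exploration
state = 0
for _ in range(5000):
    # Epsilon-greedy action selection, then a standard Q-learning update.
    if random.random() < eps:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: Q[state][a])
    nxt, reward = step(state, action)
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = nxt
```

After training, the greedy policy tightens its grip when the tactile state indicates slip and holds force otherwise, learned purely from interaction, with no model of the hand.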
The main challenge for RL with tactile information is that no object or environment model is directly available for a manipulation task. Many learning methods have been presented to address this problem. Compared with traditional hand-coded tactile feedback policies, learned policies show better performance. Learned policies also enhance the robot's generalization, allowing the manipulator to accomplish a given task better on new objects. The RL method lets the robot learn new object information and produce corresponding action feedback based on it. 124
Compared with control-level perception, learning-level perception endows robotic hands with higher intelligence. The robotic hand can control its behavior using information gathered from its interaction with the environment, and its decision-making can be improved by learning-level perception. 125 However, adopting a learning approach usually requires a large number of samples, which is a time-consuming and expensive process. Moreover, compared with human learning efficiency, robot learning efficiency still has great room for improvement. 126
Discussion and challenges of perception in dexterous manipulation
We have surveyed the state-of-the-art research on sensory perception for dexterous robotic manipulation, grouping and discussing these works according to their characteristics. Despite the great progress achieved in sensory perception, the challenges of perception, especially for dexterous manipulation, are still far from resolved. In this section, we summarize some challenges that need further attention.
Safety during tactile exploration
To facilitate dexterous manipulation, robots need to move their hands to recognize the characteristics of the objects being operated. Different exploratory operations, such as pressing, contour following, and sliding, are used to recognize object properties and reconstruct object shapes. During this physical interaction between robotic hands and objects, it is necessary to ensure the safety of the robotic hand and its environment. On the one hand, tactile signals should be collected and transmitted to the controller in time. Humans can detect vibrations up to 700 Hz, 127 which corresponds to a tactile feedback latency of about 1.43 ms. 4 Such a sensitive tactile response allows humans to avoid injury during tactile exploration (e.g. when touching an unknown object, humans quickly move away if it feels overheated). Similarly, a fast tactile sensor response is essential for robotic hands to react to the environment in time and avoid unnecessary damage. On the other hand, because the object model is unknown, the robot may violate some critical constraints. If accidental damage occurs during exploration, a certain capacity for self-healing and stretchability may ensure the safety of the system. 128,129 Both of these abilities help guarantee the safety of the robot during exploration.
Integrating multiple sensory modalities and acquiring rich information
The implementation of dexterous manipulation requires multiple sensory modalities. Integrating multiple sensor modalities can overcome the limitations of any single modality and acquire richer information about the environment; for example, in a low-light scenario a tactile sensor can be greatly superior to a visual camera. In recent years, several multimodal sensors have been proposed for different applications, such as assisted living, surgical operation, and telepresence. In robot applications, vision and touch are often combined to perceive the environment, but tactile perception has mostly served only as a supplement to visual perception due to the low resolution of tactile sensors. To fuse visual and tactile modalities effectively, some optical tactile sensors, like the GelSight, have been developed; these use the image obtained by the tactile sensor to provide multimodal, high-resolution information. Different sensory modalities have different dimensions, frequencies, and characteristics. Therefore, fusion strategies should be further explored to combine multiple modalities for effective dexterous manipulation, and the feature-level correlation among different sensory modalities should be exploited more effectively.
Understanding perception for high-level semantic information
How can high-level semantic information, such as task requirements and human preferences, be extracted from low-level sensory data? To perform dexterous manipulation, a robot must first understand what the object being manipulated is and what kind of operation is required, i.e. the task requirement. Semantic information can provide a representation of the environment at a high level of abstraction. One important issue is how to represent high-level semantic information. One possible way is to define a hierarchy in which a spatial hierarchy associates high-level semantic information with sensory information such as images or tactile data. In addition, the robot's cognitive ability can be enhanced by developments in natural language processing, human-robot interaction, and other technologies.
Novel tactile sensors for dexterous manipulation
Tactile sensors have been integrated into robotic hands to detect the interaction force and its location. 130 However, some limitations remain. The first issue is how to improve the flexibility of tactile sensors. Because the surface of the hand is usually irregular, a flexible sensor is easier than a rigid one to integrate with the hand's surface. 131 A flexible sensor can cover the whole surface of the hand rather than only the fingertip, yielding richer contact information. Other properties of tactile sensors should also be improved, such as self-healing and self-powering. Tactile sensors with self-healing ability can better adapt to unstructured environments, while self-powering can enable low power consumption and long-term use of a manipulator system.
Cross-modal algorithms that transfer knowledge across different sensory modalities
Humans perceive objects not from a single sensory modality but from multiple modalities: tactile, taste, visual, and sound information can all help humans understand objects. 101,117 Generally, visual data provide the geometric properties of objects, whereas tactile data provide physical properties such as weight or hardness. These two sensory modalities are complementary and concurrent. Falco et al. 132 proposed a transfer learning approach to cross-modal object recognition; they learn a visuotactile common representation that can transfer features across the domains. Methods for establishing unified feature representations and association relations across modalities still need further exploration, for example, novel visuotactile representations that transfer knowledge acquired from a huge amount of visual data to a small amount of tactile data.
Conclusion
This article provides a comprehensive survey of sensory perception with a focus on its application in dexterous robotic manipulation. First, the two main types of sensors (intrinsic and extrinsic) that can be used in robotic hands are introduced, and the application of each category is described in detail, with emphasis on the use of tactile sensors in robot operation. Based on these tactile sensors, perception methods that process raw sensory data for dexterous manipulation are then explained. Three perception levels, namely planning-level, control-level, and learning-level perception, cover the whole anthropomorphic process of a robotic hand, from exploration to feedback control and finally to learning. With the application of new materials and methods, tactile sensors are bound to become the necessary medium for robotic hands to interact with the environment.
Although substantial progress has been made in tactile sensing for robotic hand manipulation, there is still plenty of room for improvement, and several promising research topics remain. From the perspective of hardware, high integration, multifunctionality, and wide sensor coverage will continue to be main challenges. In terms of software, better representation algorithms to process and interpret tactile information for safe and dexterous manipulation will be another challenge for the robotics field. In addition, how to combine tactile data processing methods with autonomous control algorithms to perform tactile-based dexterous manipulation is an important development direction; the combination of the two will improve the intelligence and dexterity of robotic hands.
Footnotes
Author contribution
ZX and ZD contributed equally to this work.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Key Research Program of China with 2019YFB1309800 and the National Natural Science Foundation of China (Grant No. 62173197).
