Abstract
Vision-based grasping plays an important role in enabling robots to provide better services. It remains challenging in disturbed scenes, where the target object cannot be grasped directly because of interference from other objects. In this article, a robotic grasping approach that first moves the interference objects is proposed based on elliptical cone-based potential fields. The single-shot multibox detector (SSD) is adopted to detect objects, and, considering the scene complexity, Euclidean clustering is also employed to extract objects that SSD has not been trained on. We then acquire the vertical projection of the point cloud of each object. Since different objects have different shapes and orientations, the vertical projection is performed along the major axis obtained by principal component analysis. On this basis, the minimum projected envelope rectangle of each object is computed. To construct continuous potential field functions, an ellipse-based functional representation is introduced, because among continuous closed convex curves the ellipse matches the envelope rectangle best. Guided by the design principles of continuity, same-eccentricity equivalence, and monotonicity, potential fields based on elliptical cones are designed. The interference object currently to be grasped generates an attractive field, whereas the other objects generate repulsive ones, and their resultant field is used to solve for the best placement of the current interference object. The effectiveness of the proposed approach is verified by experiments.
Introduction
With the rapid development of artificial intelligence, robots are increasingly expected in our daily lives. They are required to operate in complex environments and carry out different assignments. 1–3 To accomplish a given task, the robot has to interact with the environment, where autonomous grasping is an important capability. With this capability, the robot can provide better services.
In robotic grasping, the grasp detection can be performed directly on point clouds. 4,5 Zapata-Impata et al. presented a method to find the best pair of grasping points in a three-dimensional point cloud of an unknown object, where a set of geometric rules is employed to explore the cloud. 4 Considering the raw, incomplete 3D point cloud, Gori et al. first reconstruct the object in 3D and then obtain candidate triplets using discrete particle swarm optimization for three-finger manipulation; finally, the best grasp triplet is selected. 5 Suzuki and Oka presented a method for untrained objects using a single depth image. 6 The planar surface and the object are extracted by random sample consensus (RANSAC), and the robot grasps the object along the principal axis obtained by principal component analysis (PCA). Such solutions are suitable for untrained objects in some tasks, such as table clearing, but they cannot handle a specific target object because raw point clouds do not discriminate object categories. To solve this problem, object detection provides a preferable scheme.
For object detection, visual perception is commonly used. Traditionally, the robot can use depth and appearance features to recognize an object, 7,8 but the detection accuracy may be affected by illumination variations. With the development of deep learning, researchers have proposed abundant deep networks. Representative methods include the two-stage faster regions with convolutional neural networks (Faster R-CNN), 9 the single-stage You Only Look Once (YOLO), 10 and the single-shot multibox detector (SSD). 11 SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, achieving competitive accuracy with fast processing. On the basis of object detection, grasp detection is then executed. One grasp detection solution is the model-based method. Morales et al. determined the best grasp based on a global model database that contains computer-aided design object models and a grasp set for each object. 12 However, the construction process is time consuming. Asif et al. 13 presented a framework of hierarchical cascaded forests with CNN features to perform object recognition and grasp detection on RGB-D images. Chao et al. applied a direct predictor and a multimodal grasp predictor to obtain the best graspable region, where Faster R-CNN and SSD are used to locate the grasp objects. 14 Chu et al. proposed a deep learning architecture to predict grasps for robotic manipulation, where the learning problem is posed as classification with null hypothesis competition instead of regression, so the deep neural network can predict multiple grasp candidates for multiple objects. 15 These methods mainly concentrate on the grasp detection of the target object, and the influence of other objects on the target is seldom considered.
Actually, in everyday environments, the scenes tend to be complex, and the robot should grasp the target object while taking the other objects into account. Kuehnle et al. proposed a collision-free method to grasp objects in the presence of obstacles, where each object is represented by a 3D scale-invariant feature transform (SIFT) model. 16 A problem is the computational burden of updating the scene model. Berenson et al. presented a framework to find valid grasps in cluttered environments, which considers the robot kinematics, the local environment around the object, as well as the grasp force-closure quality. 17 Nagata et al. found a grasp point by evaluating grasps using geometric information about the target object together with a 3D model of the surrounding environment obtained by stereo vision; after the target object is assigned by the user, the user also determines its category from a list of object models and selects the grasp mode from the corresponding list on the monitor. 18 These methods tend to fail in disturbed scenes, where interference objects severely affect the grasping and the target cannot be grasped directly. Some pioneering works have been proposed to handle grasping under disturbed scenes. 19–21
Dogar and Srinivasa presented a push-grasp planner, which can push the target object away from clutter. 19 The object is pushed using a robot finger, and a capture region is introduced to enable the object to roll into the robot hand. An assumption is that the pressure distribution lies at the object's periphery. This method also places a critical requirement on the contact point of the finger on the object, and an improper contact point causes the object to deviate out of the hand. Different from the above push-grasp scheme, moving the interference object away from the target object provides another choice. Stilman et al. presented a resolve spatial constraints (RSC) algorithm for manipulation planning, 20 where the placement of an interference object is determined from feasible sampled placements in a probabilistic way: the larger the object-free area of a region is, the higher the probability that sampled placements in that region are chosen. This placement scheme neglects the energy consumption of the manipulator, and sometimes the interference object is placed at a farther position, so the evaluation of placements still needs to be improved. Zhao et al. proposed to move the interference object, whose best placement is determined based on an artificial potential field. 21 One drawback is that the object pose is not considered: the object is assumed non-inclined and is expressed by a projected envelope rectangle on the table plane whose two neighboring sides coincide with the Xw and Yw axes of the base coordinate system OwXwYwZw of the robot. In practice, an object can be placed in an arbitrary orientation, and the adaptability of this method to arbitrarily oriented objects is poor. How to better select the placement of the interference object is still a challenge.
To match the projected envelope rectangle to the object pose, a natural solution is to generate an inclined projected envelope rectangle along the object orientation. However, it is difficult to express such a rectangle in equation form, and thus the corresponding artificial potential field has to be represented discretely. This discretization in the inclined state brings a computational burden, especially for fine-grained fields, so a nondiscretized functional representation becomes crucial. In this article, a circumscribed ellipse corresponding to the minimum projected envelope rectangle is adopted because of its small occupation space among continuous closed convex curves. Then, the design principles of the potential field are presented: continuity, same-eccentricity equivalence, and monotonicity. Guided by these principles, the elliptical cone is introduced, and the attractive and repulsive potential field functions are constructed according to the object type: the current interference object to be grasped versus the other objects. Finally, the resultant potential field is acquired to determine the best placement of the current interference object.
This article is organized as follows. The problem statement is first given. Then, an object grasping method based on elliptical cone potential field model is presented. The effectiveness of the proposed method is verified by experiments.
Problem statement
This article is motivated by robotic grasping under disturbed scenes, where the target object cannot be grasped directly, and we focus on the moving of interference objects by the manipulator. Figure 1 shows an illustration, where a robot is required to grasp the target object with its manipulator. OwXwYwZw is the base coordinate system of the robot, where Ow is at the center of the robot base. We label OcXcYcZc as the camera coordinate system, with its origin Oc at the center of the camera. Besides, the pixel coordinate system is expressed by o-uv. The joint angles of the manipulator are denoted by θ1, θ2, …, θ6.

Illustration of the robot grasping the target object, where Og is the target object, O1 is an untrained object, and O2 refers to the detectable object. Wm describes the workspace of the manipulator.
A point p(up, vp) in the o-uv system can be converted to cP(cxP, cyP, czP) under OcXcYcZc according to the camera's intrinsic matrix T = [fx, 0, uc; 0, fy, vc; 0, 0, 1], and its coordinate wP(wxP, wyP, wzP) in OwXwYwZw is then obtained as follows

cxP = (up − uc)·czP/fx,  cyP = (vp − vc)·czP/fy  (1)

wP = wRc·cP + wtc  (2)

where czP is the depth value measured at p, and wRc and wtc denote the rotation matrix and translation vector of OcXcYcZc with respect to OwXwYwZw.
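As a concrete illustration, the back-projection and frame change above can be sketched in Python; the intrinsic values and the extrinsic pose below are placeholder assumptions, not the calibration used in this article:

```python
import numpy as np

# Assumed Kinect-V2-like intrinsics (fx, fy, uc, vc are illustrative values).
fx, fy, uc, vc = 365.0, 365.0, 256.0, 212.0

def pixel_to_camera(up, vp, depth):
    """Back-project a pixel (up, vp) with measured depth into OcXcYcZc."""
    cz = depth
    cx = (up - uc) * cz / fx
    cy = (vp - vc) * cz / fy
    return np.array([cx, cy, cz])

def camera_to_world(p_cam, R_wc, t_wc):
    """Transform a camera-frame point into the robot base frame OwXwYwZw."""
    return R_wc @ p_cam + t_wc

# Usage: a pixel at the principal point lands on the camera's optical axis.
R_wc = np.eye(3)                   # assumed extrinsic rotation
t_wc = np.array([0.0, 0.0, 1.0])   # assumed camera position in the base frame
p = camera_to_world(pixel_to_camera(256.0, 212.0, 0.8), R_wc, t_wc)
# → array([0. , 0. , 1.8])
```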
To accomplish the grasping of the target object, the robot should possess the capability of object detection. In this article, SSD 11 is adopted and each detectable object is enclosed by a bounding box. In Figure 1, an apple and a cup are the detectable objects, with their bounding boxes described in the o-uv coordinate system. With the depth map provided by the Kinect V2 camera, one can obtain the point cloud corresponding to each detectable object. In practice, the environments are complex and there inevitably exist some objects that SSD has not been trained on. These untrained objects may disturb the grasping of the detectable objects, and thus their point cloud information is also required. The table plane on which the objects are placed is first fitted by RANSAC, 22 and then a straight-pass filter is applied to obtain the point clouds of the objects. Next, a 2D coordinate system OdXdYd on the table plane is defined; it is parallel to OwXwYw, and the projection of Od onto OwXwYw is Ow. To better extract the untrained objects, a prerequisite step is to remove the point clouds related to the detectable objects. Then, Euclidean clustering 23
is utilized to acquire the point cloud of each untrained object. Whether for a detectable object or an untrained one, PCA 24 is applied to its point cloud to obtain the respective principal axis. Moreover, a minimum 3D bounding box is constructed for each object. For an object Oj, j = 1, 2, …, No, the vertex coordinates of its 3D bounding box are calculated and then vertically projected onto Wm to form a closure region, where No is the number of objects and Wm refers to the manipulator's workspace (see Figure 1), a predefined zone in OdXdYd. This closure region is bounded by a minimum envelope rectangle, which is expressed by Rs. The four vertexes of Rs are described as
Algorithm 1 presents the information extraction process for all the detectable and untrained objects, where Num is the number of detectable objects, whose information is stored in Det_Obj. CO[i] refers to the point cloud of the detectable object Det_Obj[i] for i = 1, 2, …, Num, or that of an untrained object for i = Num + 1, …, No. The function minbox(·) returns the minimum 3D bounding box of a point cloud.

The information extraction of detectable objects and untrained objects.
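The untrained-object extraction step in Algorithm 1 can be sketched with a minimal Euclidean clustering routine. The brute-force neighbor search, the distance tolerance, and the minimum cluster size below are illustrative assumptions; a practical implementation would use a k-d tree for the neighbor queries:

```python
import numpy as np
from collections import deque

def euclidean_cluster(points, tol=0.02, min_size=10):
    """Greedy region growing: points closer than `tol` (meters) join the
    same cluster; clusters smaller than `min_size` are discarded as noise."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor search within tol of point i (for clarity)
            d = np.linalg.norm(points - points[i], axis=1)
            for j in list(unvisited):
                if d[j] < tol:
                    unvisited.remove(j)
                    queue.append(j)
                    cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append(np.array(cluster))
    return clusters
```

Two well-separated point blobs are returned as two distinct clusters, mirroring how each untrained object on the table yields its own point cloud.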
Object grasping based on potential field model with elliptical cone
To achieve grasping under disturbed scenes, the robot should first move the interference objects with consideration of the other objects. The artificial potential field method is a general solution to express the interaction among the objects. In reality, different objects have different shapes and orientations, so the design of the potential field should consider the object information, including its orientation and height.
Elliptical potential field
For an inclined object, constructing the potential field from an arbitrary projected envelope rectangle is inaccurate; a minimum envelope rectangle aligned with its major axis is preferable. Instead of adopting a rectangle-based discrete potential field, in this article we first obtain a circumscribed ellipse for this minimum rectangle, and then an ellipse-based continuous potential field is proposed.
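The PCA-aligned minimum envelope rectangle described earlier can be sketched as follows; the function name and the (center, angle, size) return convention are ours, not the article's:

```python
import numpy as np

def min_envelope_rectangle(points_2d):
    """PCA-aligned minimum envelope rectangle of 2D projected points.

    Returns (center, theta, size): the rectangle center in the table frame,
    the major-axis angle theta, and the side lengths along the major and
    minor axes."""
    c = points_2d.mean(axis=0)
    cov = np.cov((points_2d - c).T)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    major = eigvecs[:, np.argmax(eigvals)]   # principal (major) axis
    if major[0] < 0:                         # fix the sign for a stable angle
        major = -major
    theta = np.arctan2(major[1], major[0])
    R = np.array([[np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
    local = (points_2d - c) @ R.T            # rotate the major axis onto x
    mins, maxs = local.min(axis=0), local.max(axis=0)
    center = c + ((mins + maxs) / 2) @ R     # rectangle center, table frame
    return center, theta, maxs - mins
```

For a roughly horizontal scatter of points, the recovered angle is near zero and the major-axis extent dominates, as expected.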
As shown in Figure 2, for the minimum envelope rectangle Rs of the object Oj, the minimum circumscribed ellipse El can be obtained, where F1 and F2 are the focal points of El, and the points
where

The schematic minimum circumscribed ellipse El .
According to the standard ellipse equation

x²/a² + y²/b² = 1  (3)

where a and b denote the semi-major and semi-minor axes of El.
For the object Oj, combining with
where
Substituting a and b in equations (4) into (3), the general elliptical equation of El is then acquired
where
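One common construction, which we assume here, takes the minimum circumscribed ellipse to be the minimum-area ellipse sharing the rectangle's center and orientation; a standard result then gives semi-axes √2 times the rectangle's half-sides, so every corner of Rs lies exactly on El:

```python
import numpy as np

def circumscribed_ellipse(w, h):
    """Semi-axes of the minimum-area ellipse, with the same center and
    orientation, that circumscribes a w-by-h rectangle: each rectangle
    corner lies exactly on the ellipse."""
    a = (w / 2) * np.sqrt(2)   # semi-major axis, along the long side
    b = (h / 2) * np.sqrt(2)   # semi-minor axis, along the short side
    return a, b

# The four corners satisfy the standard ellipse equation x²/a² + y²/b² = 1:
a, b = circumscribed_ellipse(4.0, 2.0)
val = 2.0**2 / a**2 + 1.0**2 / b**2   # corner (2, 1) of the 4x2 rectangle
```

Here `val` evaluates to 1, confirming that the corner lies on the ellipse; note that the eccentricity depends only on the rectangle's aspect ratio, which is what the same-eccentricity expansion below relies on.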
After the minimum circumscribed ellipse El of the object Oj is obtained, the influence of Oj can be modeled by concentrically expanding El with the same eccentricity in a discrete way (see El1 in Figure 2). However, such a solution based on an incremental focal distance leads to a complicated calculation process. To solve this problem, a functional solution is required. Note that the potential field function is constant within the ellipse El. In the region outside El, the potential field function should conform to the following design principles:
1. The potential field function is continuous;
2. For any two positions on the plane OdXdYd, if they lie on an ellipse whose center and eccentricity are the same as those of El, the influences of Oj on them are equal, which is called same-eccentricity equivalence;
3. The variation trends influenced by the object Oj differ between the major-axis and minor-axis directions of El;
4. The function is monotonically increasing if we expect that the farther a position is from the center of El, the larger the influence of Oj is; it is referred to as the attractive potential field function;
5. The function is monotonically decreasing if we expect that the farther a position is from the center of El, the smaller the influence of Oj is; in this case, it corresponds to the repulsive potential field function.
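These principles can be illustrated with a simple sketch: an "elliptical radius" that is constant on every concentric same-eccentricity ellipse, wrapped by monotone profiles. The specific functional forms below are illustrative stand-ins, not the article's actual attractive and repulsive functions:

```python
import numpy as np

def elliptic_level(p, center, theta, a, b):
    """Same-eccentricity 'elliptical radius': constant on every ellipse
    concentric and co-oriented with El (principle 2), continuous
    (principle 1), and scaled differently along the major and minor
    axes (principle 3)."""
    d = np.asarray(p, dtype=float) - np.asarray(center, dtype=float)
    x = d[0] * np.cos(theta) + d[1] * np.sin(theta)    # major-axis coordinate
    y = -d[0] * np.sin(theta) + d[1] * np.cos(theta)   # minor-axis coordinate
    return np.sqrt((x / a) ** 2 + (y / b) ** 2)

def attractive(p, center, theta, a, b, k=1.0):
    """Monotonically increasing outside El (principle 4); constant inside."""
    return k * max(elliptic_level(p, center, theta, a, b), 1.0)

def repulsive(p, center, theta, a, b, k=1.0):
    """Monotonically decreasing outside El (principle 5); constant inside."""
    return k / max(elliptic_level(p, center, theta, a, b), 1.0)
```

On the level set El itself (elliptical radius 1) both profiles equal k, and they diverge monotonically outward, matching the principles above.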
Calculation of the potential fields
When the manipulator cannot grasp the target object, it has to select an interference object and move it away first. The selection criterion is the distance to the center of the target object, and smaller values represent higher priorities. The chosen interference object is called the current grasped object, and the other objects are subdivided into two categories: the residual interference objects together with the target, and the noninterference objects. The object numbers of these two categories are labeled Nr1 and Nr2, respectively. It should be noted that the less the current grasped object is moved, the less energy the manipulator consumes. Therefore, the influence of the current grasped object is consistent with the fourth design principle. On the contrary, every other object exerts a repulsion on the current grasped object, and its influence coincides with the fifth design principle.
Meanwhile, for any object Oj with its minimum circumscribed ellipse El, the first three design principles are the basis. To satisfy these requirements, we introduce an elliptical cone.
For the current grasped object Oig, its elliptical potential field function Fat should conform to the fourth design principle, and the object tends to be moved along the short-edge direction of the projected rectangle. The attractive field of Oig is designed as follows

where dcg refers to the distance between the object Oig and its nearest object, and μ is a given value.
Different from the object Oig, every other object Or generates a repulsive elliptical field
where
Considering all the objects, we calculate the resultant potential field Fres as follows
where
where p represents a position in the manipulator's workspace Wm. If there exists more than one minimum solution, the robot chooses the one closest to the object Oig. Notice that the point
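The placement solution can be sketched as a grid search over Wm with the tie-breaking rule stated above; `f_res` stands in for any resultant-field function, and the grid step is an illustrative choice:

```python
import numpy as np

def best_placement(f_res, workspace, cur_center, step=0.05):
    """Grid-search the resultant field over the rectangular workspace
    ((xmin, xmax), (ymin, ymax)); among equal minima, return the position
    closest to the current grasped object (the tie-breaking rule)."""
    (xmin, xmax), (ymin, ymax) = workspace
    xs = np.arange(xmin, xmax + step / 2, step)
    ys = np.arange(ymin, ymax + step / 2, step)
    grid = [np.array([x, y]) for x in xs for y in ys]
    vals = np.array([f_res(p) for p in grid])
    vmin = vals.min()
    minima = [p for p, v in zip(grid, vals) if v <= vmin + 1e-9]
    # Tie-break: among equal minima, pick the one closest to Oig's center
    return min(minima, key=lambda p: np.linalg.norm(p - cur_center))

# Usage with a toy resultant field whose minimum sits at (0.3, 0.2):
f = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.2) ** 2
best = best_placement(f, ((0.0, 1.0), (0.0, 1.0)), np.array([0.0, 0.0]), step=0.1)
```

A finer step sharpens the placement at the cost of more field evaluations, which is precisely why the article argues for a continuous functional field rather than a finely discretized one.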
The flowchart of the proposed approach is shown in Figure 3. The image and depth information of the scene are first provided by the Kinect V2. The robot recognizes the detectable objects, and the remaining ones are regarded as untrained objects. Combining with the depth information, the point cloud of each detectable or untrained object is then obtained by Euclidean clustering. Using PCA, we acquire the minimum 3D bounding box of each object, including its size and pose information. By comparing the Euclidean distance Dis between the target object and every nontarget object against a given threshold Dt, the nontarget objects are classified into interference objects and noninterference objects. If the number Nfer of interference objects is zero, the robot may grasp the target object directly; otherwise, the robot has to move the interference objects according to the placement position solution based on elliptical cone potential fields. Note that an interference object with a smaller Dis has a higher priority. The above process is repeated until the robot seizes the target object.

The flowchart of the proposed approach.
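The classification and priority-ordering steps of the flowchart can be sketched as follows; the object names and distance values in the usage example are hypothetical:

```python
import numpy as np

def classify_and_order(target_center, object_centers, Dt):
    """Split non-target objects into interference (Dis < Dt) and
    noninterference sets; interference objects are ordered by ascending
    Dis, since a smaller distance means a higher moving priority."""
    target_center = np.asarray(target_center, dtype=float)
    dists = {name: np.linalg.norm(np.asarray(c, dtype=float) - target_center)
             for name, c in object_centers.items()}
    interference = sorted((n for n, d in dists.items() if d < Dt),
                          key=lambda n: dists[n])
    noninterference = [n for n, d in dists.items() if d >= Dt]
    return interference, noninterference

# Usage with hypothetical table-plane centers (meters):
inter, noninter = classify_and_order(
    (0.0, 0.0),
    {"cup": (0.1, 0.0), "box": (0.2, 0.0), "beverage": (0.5, 0.0)},
    Dt=0.3)
# inter is ["cup", "box"] (cup first: closer, higher priority); noninter is ["beverage"]
```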
Experiments
In the experiments, a Kinect V2 is used to obtain the scene information, and a service robot with a six-degree-of-freedom Kinova manipulator is required to grasp the target object. SSD is adopted to detect objects with 2D red bounding boxes, and then we acquire the point clouds of the untrained objects. The experiments involve the following objects: the target apples, a cup, a beverage, and a box, where the box and the beverage are untrained objects.
Experiment 1 considers the box and the beverage as noninterference objects. Figure 4 shows the detection result of experiment 1, and Figure 5 describes the potential fields. The potential field of each object is shown in Figure 5(a) to (d), and the resultant field is shown in Figure 5(e). In Figure 5(b), the beverage is close to the border of the robot workspace, and thus its potential field appears cut off. The video snapshots of the manipulator grasping the target apple are shown in Figure 6. It can be seen that the manipulator first moves the cup away and places it at a new position (−0.24, −0.49) determined by equation (10), and then the apple is grasped smoothly.

The detection result of experiment 1.

The potential fields of experiment 1: (a) The repulsive field of the apple, (b) the repulsive field of the beverage, (c) the repulsive field of the box, (d) the attractive field of the cup, and (e) the resultant field.

(a–f) The video snapshots of experiment 1.
In experiment 2, the target apple is located at the bottom-right corner of the scene. The beverage is regarded as a noninterference object and the box as an interference object. Figure 7 shows the detection result of experiment 2. All the potential fields are shown in Figure 8. By combining the potential fields of all objects shown in Figure 8(a) to (c), one can obtain the resultant field shown in Figure 8(d). The video snapshots of the manipulator grasping are shown in Figure 9. It can be seen that the manipulator first moves the box to a new position (−0.12, −0.46), and then the apple is grasped smoothly.

The detection result of experiment 2.

The potential fields of experiment 2: (a) The repulsive field of the apple, (b) the repulsive field of the beverage, (c) the attractive field of the box, and (d) the resultant field.

(a–f) The video snapshots of experiment 2.
Experiment 3 considers two interference objects. The detection result is shown in Figure 10. Figures 11 and 12 demonstrate the potential fields and video snapshots of experiment 3, respectively. For a better description, the process is divided into two stages: in stage I the interference cup is moved away, and in stage II the interference box is moved. The potential fields of the box, apple, and cup in stage I are shown in Figure 11(a) to (c); at this moment, the current grasped object, the cup, corresponds to the attractive one. Based on the best placement obtained from the resultant field shown in Figure 11(d), the cup is moved to its new placement position, and the moving process is shown in Figure 12(a) to (d). After the cup is released, the robot continues with stage II. In this stage, the box becomes the current grasped object and thus generates an attractive field, whereas the cup corresponds to a repulsive one. Note that the target apple always generates a repulsive field. Combining the fields shown in Figure 11(b), (e), and (f), one can obtain the resultant field (see Figure 11(g)). On this basis, the robot grasps the box and moves it to its new placement position, as shown in Figure 12(e) to (g). Finally, the apple is grasped, as shown in Figure 12(h) to (i).

The detection result of experiment 3.

The potential fields of experiment 3 with two-stage interference objects moving. Stages I and II correspond to panels (a)–(d) and (e)–(g), respectively. (a) The repulsive field of the box, (b) the repulsive field of the apple, (c) the attractive field of the cup, (d) the resultant field of stage I, (e) the repulsive field of the cup, (f) the attractive field of the box, and (g) the resultant field of stage II.

(a–i) The video snapshots of experiment 3.
Conclusion
In this article, a robotic grasping approach with elliptical cone-based potential fields is proposed to handle the challenge of disturbed scenes. Based on the extraction results of SSD and Euclidean clustering for the detectable and untrained objects, the robot acquires the attractive or repulsive potential field of each object and determines the placements of the interference objects. Compared with the conventional circumscribed-circle envelope, the circumscribed-ellipse envelope adopted in this article is better because of its small occupation space among continuous closed convex curves, and the elliptical form reflects the poses of different objects with a better fitting degree. The resulting continuous elliptical cone fields improve the placement positions of the interference objects. The experimental results verify the effectiveness of the proposed approach. In the near future, we shall conduct deeper research on the moving sequence of interference objects as well as on grasping in larger environments, which also relies on the navigation ability of the robot.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China under grant nos 62073322, 61633020, 61633017, 61836015, and in part by the Open Foundation of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences under grant no. 20190106.
