Abstract
Grasping objects in clutter is more difficult than grasping a separated single object. An important issue is that unsafe grasps may occur, in case, one object sits or leans on another, which could cause the collapse of objects. In addition, reachability of each object surrounded by other obstacles also has to be considered. So the order of multiple objects for grasping and the grasp configuration of each object must be planned simultaneously. This article combines grasp order and grasp configuration planning to perform fast and safe multiobject grasping in cluttered scenes. First, a comprehensive grasp configuration database is built to provide enough feasible grasp configurations for the objects. Then, we propose an obstruction degree to estimate the likelihood of reachability of each grasp configuration as well as each object. This measurement also implicitly infers object interactions. Finally, grasp order and grasp configurations are planned together to deal with the constraints caused by reachability and object interaction. Simulations and experiments in a series of cluttered scenes demonstrate that our method can grasp objects efficiently and can greatly reduce unsafe grasps.
Introduction
Multiobject grasp planning in cluttered scenes has been regarded as a key problem in robot manipulation field. Grasping multiple objects is constrained by reachability and object interaction. On the one hand, there are various grasp configurations for different objects, and each object is surrounded by other objects, so one reachable grasp configuration must be chosen for each object. On the other hand, objects will collapse if an object that is supporting other objects is removed first, and objects may be damaged and may roll to unreachable places. Therefore, an appropriate grasp order and reachable grasp configurations for all objects are required to achieve safe and fast grasping in cluttered scenes.
Many researchers have solved grasping known objects by dividing the whole process into two steps: offline grasp generation and online grasp planning. Shape primitive 1 –3 is widely used in grasp generation. It approximates the target object to one or more shape primitives, which include an appropriate set of grasp configurations. Region masking 4 –6 is also a popular methodology. It can quickly locate the region, which has a high probability of containing high-quality grasp configurations. Tsuji et al. 7 combine the above two methods, select the constricted region of the object as the grasp interest region, and fit the local model of the object near the grasp interest region to the grasp primitives. Li et al. 8 wrap ropes around the object to find possible grasp regions and then computes the contacts of a multifingered hand with object surfaces around these regions. Wan et al. 9 apply superimposed segmentation to the object mesh model and use the uniform facets to locate contacts and generate grasp poses for grippers and suction cups. Grasp generation only provides possible grasp configurations but does not decide the final grasp configuration to grasp an object in a cluttered scene.
Online grasp planning is usually used to decide how to grasp each object in clutter. One type is planning on action level, which finds the best way to grasp one object in the current scene. Most studies usually choose some metric as objective function to optimize grasp configurations in the scene. Berenson et al. 10 propose a grasp objective function that takes into account the kinematics of the robot, the environment around the object, and the grasp force-closure quality. Some studies are interested in planning a path that approaches the target while pushes away other obstacles 11 –13 or pushing the target itself to make it graspable. 14,15 The other type is planning on task level, which finds an order of objects to grasp in the current scene. Stilman et al. 16 and Dogar and Srinivasa 17 consider occlusion between objects and plan how to move the objects to get a target object. But they cannot deal with complex object interactions like stacking or slanting. Some studies work on recognizing complex object interactions, 18 –22 which is used to decide a safe grasping order. Due to neglect of concrete grasp configuration planning, the actual feasibility of the plan cannot be guaranteed.
In addition, a lot of research also solves grasping unknown objects, which usually generates grasp configurations online. Lippiello et al. 23 reconstruct object surface through images taken by a camera and move the fingertips on the surface to find optimal grasp. Lei and Wisse 24 extract object’s concave hull contour from the point cloud and calculate suitable grasping by maximizing the coefficient of force balance. Lin et al. 25 transfer example grasps taught by human demonstration to similar objects. Now deep learning 26 –28 is becoming the most prevailing approach to solve grasping problems. However, these methods usually consider scenes where objects are sparsely placed or object collapse is not detrimental.
In summary, most of the current research on multiobject grasp planning separates task level and action level or only involves one of the two levels. We aim at combining the two planning levels. Concretely, we focus on a task that a robotic arm with a two-finger parallel gripper is commanded to grasp all objects on a table. The order of objects to grasp and reachable grasp configuration for each object are planned simultaneously, for the purpose that the risk of object collapse is reduced and reachable grasp configurations are found quickly. We hold several assumptions: Object three-dimensional (3D) models are known. Identifications (IDs) and poses of visible objects can be recognized. No object belongs to containers, that is, an object is never in another object.
Grasp planning based on obstruction degree (OD) is proposed to realize safe grasp order and fast grasp configuration search. Differently from previous works, we do not infer object interactions explicitly using complex algorithms. Our method is very simple and easy for implementation and has high efficiency in planning.
The structure of this article is as follows: The second section gives a glimpse of the overall planning framework; the third section describes the method used to generate grasp configurations offline; the fourth section proposes OD computation and online planning algorithms; the fifth section evaluates our planning framework by simulations and experiments; and the sixth section gives a conclusion and discusses the future work.
Grasp planning framework
The whole planning framework is shown in Figure 1. Grasp generation only needs to be computed once offline and provides a grasp database with widely distributed grasp center points and grasp approach directions. Grasp planning is performed online to decide grasp order and grasp configurations for all objects based on ODs.

Proposed grasp planning framework.
Offline grasp generation
Region masking extracts the object mean curvature skeleton
Online grasp planning
Grasp candidates selection selects possible grasp candidates
Grasp generation
Grasp parameterization
A grasp is commonly parameterized by gripper-specific information and gripper pose. Following parameters define a grasp configuration P for a two-finger parallel gripper:
Pr
: gripper roll angle: the roll angle of gripper around the approach direction.
Ps
: opening degree: the distance between the two parallel fingers before grasping.
Region masking
We assume that 3D models of the objects are available. To reduce computation effort of grasp generation and online planning, the models can be simplified, for example, as unions of primitive shapes. 1 –3
According to human grasp experience, the surrounding areas of skeleton vertices are usually used as preferred areas for humans to grasp objects.
29
We choose the mean curvature skeleton
30
as the interested region, like Vahrenkamp et al.
6
have done. A resulting skeleton is a graph
Hypotheses generation
Grasp parameters are sampled according to Algorithm 1. First, a spherical coordinate frame is established at an interested vertex (line 2), and the

Spherical coordinate frame.
Search in Spherical Grid.
A grasp hypothesis for a two-finger parallel gripper can also be seen as
where
Grasp stability analysis
Inspired by force balance,
24
we propose a metric to evaluate grasp stability for a two-finger parallel gripper. As shown in Figure 3, we establish a set of line segments connecting corresponding points between the two fingers in the region, which may contact the object. These line segments are perpendicular to

Grasp stability analysis. A set of line segments connect corresponding points between the two fingers. They are parallel with
Then the gripper is put at one grasp configuration, all intersection points of the line segments and the object surface are collected. The intersection points closest to finger 1 and finger 2 are labeled by
For an ideal grasp configuration,
where operator
A threshold Mt
is set for grasp stability filter. All grasp hypotheses in H are evaluated by equation (2), and those whose
Grasp planning
Grasp candidates selection
Grasp candidates
where

Grasp candidate selection.
For each grasp candidate P, its relative pose score is defined as follows
where
Obstruction analysis
We utilize the distance map
10
to identify obstruction relationships between grasp candidates and objects. We build a set of rays
where

Then the OD of P by O is defined as follows
where
Suppose
where
Grasp order planning
Suppose
It actually minimizes the sum of the upper right part of the matrix
This problem is similar to the traveling salesman problem (TSP), and any planning method, which can solve the TSP, can also solve our problem. In our implementation, branch and bound method 31 is utilized as shown in Algorithm 2.
GraspOrderPlanning.
The branch and bound method incrementally builds a tree to search for an optimal plan. A node q on the tree contains a grasp order
At the beginning, an upper bound is initialized by a greed method, which chooses the next object to be grasped with the least scene OD. Then a set Q is created to save all expandable leaf nodes on the tree, it initially contains a single root node q
0, which includes no object. At each loop, the leaf node
Actually, the planning is driven by inequality of ODs between two objects:

Basic object interactions. (a) Occlusion, (b) stacking, and (c) slanting.
Grasp configuration planning
After planning the grasp order, we should decide which grasp candidate is used to grasp each object. Suppose
where
For object xi
, its final grasp configuration is computed by Algorithm 3. A motion planner is required to compute collision-free trajectories of the robot. If there is at least one trajectory that arrives at some grasp configuration, then this grasp configuration is reachable. First, yi
is given a default P
0, which cannot be executed, and
GraspConfigPlanning.
The complete
Obstruction Degree based Planning.
Complexity analysis
Suppose one object has at most m grasp configurations, that is,
Grasp order planning cannot promise to run in polynomial time, as in the worst-case
In grasp configuration planning, at most
Simulation and experiment
Setup
We built an experiment platform as shown in Figure 7(a). The robot is an ABB IRB 120 manipulator with a Robotiq 2F-140 gripper. A Microsoft Kinect is mounted on the manipulator. A small table with the size of 40 cm

(a) Experiment and (b) simulation platforms.
Six objects and their IDs are shown in Figure 8. We measured the sizes of these objects and directly modeled them as cylinders or boxes. Approximate models are used not only for grasp generation, obstruction analysis, and collision checking but also for grasping in simulations.

Objects and IDs. 3D models are shown under the real objects. (a) object 1, (b) object 2, (c) object 3, (d) object 4, (e) object 5, and (f) object 6. ID: identification; 3D: three-dimensional.
The parameters for grasp generation are
We would like to compare our planning method based on OD with a baseline method based on grasp ranking (GR), which ranks all grasp candidates by some metric and selects the best one.
33
Here, we use the relative pose score as the metric and Algorithm 5 shows the details. Grasp candidate selection is the same as our method, then
Grasp Ranking based Planning.
To evaluate performance, the following data are recorded during simulations and experiments:
Number of picked objects: the number of objects that are successfully grasped.
Number of fallen objects: the number of objects that fall from their original poses.
Motion time: the time that is spent on robot motion.
Obstruction analysis time: the time that is spent on obstruction analysis.
Grasp order planning time: the time that is spent on grasp order planning.
Grasp configuration planning time: the time that is spent on grasp configuration planning.
Number of motion planning trials: the number of grasp candidates that are tested by motion planning.
Simulation
We generated four sets of scenes for 3, 4, 5, and 6 objects, respectively, and each set has 50 random scenes. To generate a scene, a first certain number of objects are randomly selected, then they are dropped one by one from random positions over the table. Figure 9 shows some examples of the scenes. In the simulation, the robot can completely get IDs and poses of all objects on the table. Each scene was tested once under methods OD and GR, respectively, and the results are presented in Table 1. The source code for simulation is available at https://github.com/Kazfyx/grasp-planning.

Examples of simulation scenes.
Simulation results.
Source: Each datum is the sum of completing all tasks in the 50 scenes.
OD: obstruction degree; GR: grasp ranking.
Unsuccessful grasping may be due to two reasons: (1) the object is initially at an unreachable pose, that is, all of its grasp candidates are not reachable, and (2) the object falls and rolls to an unreachable pose. We can see that more objects were grasped under method OD since less objects fell down. Method GR caused more fallen objects because it is originally designed for grasping a single object and no appropriate order of grasping is planned. As the number of objects increased, more objects fell because complex object interactions occurred more often.
Planning time (including time of obstruction analysis, grasp order planning, and grasp configuration planning) increased as the number of objects increased. Obstruction analysis time conformed to the time complexity
Table 2 presents some average data computed from Table 1. The average planning time per grasp has no obvious difference between the two methods, in the condition that GR does not do obstruction analysis and grasp order planning. Less motion planning trials were taken in OD since obstruction analysis gives which object is more likely to be graspable and which grasp candidate is more likely to be reachable. Besides, the average motion time per grasp is nearly unchanged over all simulations. Because the grasp candidates that likely generate short paths are tested first, and object positions did not vary much on a not big table.
Average data per successful grasp.
OD: obstruction degree; GR: grasp ranking.
Infrequently, there were two fallen objects under method OD. In Figure 10(a), object 1 sat on object 6 while leaned against object 4. Slanting angle of object 1 was too little to filter grasp configurations toward object 4, so object 4 was grasped first as it had more unobstructed grasp candidates. In Figure 10(b), object 2 leaned against object 6 from the back, and object 6 was not obstructed at all and was grasped first. In this case, object 2 can never be grasped first as it was unreachable.

(a) and (b) Scenes causing object falling under method OD. OD: obstruction degree.
Experiment
We designed four scenes as shown in Figure 11 for experiments. Object recognition 34 is based on 3D point cloud captured by the Kinect, and occluded objects may not be recognized. Scene 1 represents the situation that objects lean against others, scene 2 represents the situation that objects sit on others, and scenes 3 and 4 are complex situations including multiple object interactions.

Scenes for experiments. Left part of each subfigure shows the arrangement of objects. Right part of each subfigure shows the result of object recognition before the first grasp. (a) Scene 1, (b) scene 2, (c) scene 3, and (d) scene 4.
Each scene was tested once under methods OD and GR, respectively, and the results are presented in Table 3. Method OD picked all the objects in the four scenes with no object collapse and taken less time for planning. Method GR led to object collapse in 3 scenes, 3 objects fell down and were not able to be grasped later.
Experiment results.
OD: obstruction degree; GR: grasp ranking.
In scene 1, all objects were recognized at the beginning. Figure 12 shows the selected grasp candidates, and Table 4 presents the object obstruction set. The execution processes are shown in Figure 13. OD planned a grasp order of

Selected grasp candidates in scene 1 before the first grasp. Each arrow represents the approach direction of one grasp candidate.
Object obstruction set of scene 1 before the first grasp.

Execution processes in scene 1 under methods (a) OD and (b) GR. OD: obstruction degree; GR: grasp ranking.
Conclusion and discussion
This article proposes a grasp planning method based on OD. First, a grasp configuration database, which has widely distributed grasp approach directions, is built and then it facilitates the search of reachable grasp configurations in cluttered scenes. Then ODs of grasp configurations and objects are analyzed according to the geometric relation between grasp configurations and object models. Finally, grasp order is planned in which object interactions are implicitly inferred by the ODs. At the same time, reachable grasp configuration is searched quickly for each object. Simulations and experiments in a series of scenes demonstrate that our method can grasp objects in clutter efficiently and safely.
Compared with previous grasp action planning, we consider the relationships among objects and give proper order of grasping, which reduces the risk of object collapse. Compared with previous work on object support relation recognition, our method does not need complex computer vision algorithms or mechanical analysis and plans concrete grasp configurations. Also because of that, it cannot completely avoid improper grasp order, as object interaction is not explicitly recognized.
In the future, we are going to add more constraints in grasp planning and introduce risk estimation to allow the robot to terminate task execution when object collapse may happen. In addition, the uncertainty of perception is common, so possible unseen objects are also going to be taken into account for grasp order planning.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Key R&D Program of China under grant 2017YFB1303600.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
