Abstract
A Gaussian Mixture Model (GMM)-based grasp-pose generation method is proposed in this paper. Through offline training, the GMM is set up and used to depict the distribution of the robot's reachable orientations. By dividing the robot's workspace into small 3D voxels and training the GMM for each voxel, a look-up table covering all the workspace is built with the x, y and z positions as the index and the GMM as the entry. Through the definition of Task Space Regions (TSR), an object's feasible grasp poses are expressed as a continuous region. With the GMM, grasp poses can be preferentially sampled from regions with high reachability probabilities in the online grasp-planning stage. The GMM can also be used as a preliminary judgement of a grasp pose's reachability. Experiments on both a simulated and a real robot show the superiority of our method over the existing method.
Introduction
The ability to grasp objects autonomously is an important function for robots. For industrial robots, most grasp targets are placed on conveyor belts or on a flat surface with known height [1–3]. In this case the robot only needs to determine a grasp pose constituted by three degrees of freedom (DOF): the x and y positions, and the rotation around the z axis. For service robots, the working space is usually unstructured and hence more complicated. While three-DOF grasping is still the focus for some researchers [4–6], more and more are placing the emphasis on six-DOF grasping [7–10]. Mannheim et al. presented a parallel gripping method for a robot system including several robotic arms [11]. Miller and Allen developed an open-source software named Grasplt [12], which can produce a set of grasp poses in the offline stage based on the object's shape and the end effector's configuration. In the online grasp stage, grasp poses are extracted from the pose set according to the current environment. For most household objects, however, feasible grasp poses usually form a continuous region. The set of discrete grasp poses was therefore unable to cover all feasible poses. Based on this consideration, Berenson et al. proposed the concept of Task Space Regions (TSR) [9, 13, 14]. In the TSR definition, the end effector's pose is expressed as a six-dimensional vector constituted by x, y, z, roll, pitch and yaw, with respect to the object's coordinate system. After defining the boundary in each dimension of the vector, the continuous region formed by the grasp poses can be clearly expressed mathematically. In the online stage, grasp poses are extracted from TSR randomly with uniform sampling. After a concrete grasp pose is obtained, collision detection is executed to check the pose's validity. If it is valid, inverse kinematics is invoked to convert the goal from task space to configuration space. Subsequently, motion-planning algorithms such as CBiRRT2 [9] are used to find a collision-free trajectory from the start configuration to the goal configuration. Given enough time, uniform random sampling can always find a valid grasp pose, as long as one exists. However, considering the requirements of real time, it is vital to sample a good grasp pose efficiently. Generally speaking, there are two cases in which a grasp pose is invalid. One is the case of collision, including collision between the grasp pose and the environment and between the configuration of the corresponding robot – obtained through inverse kinematics – and the environment. The other is where no inverse kinematics solution exists for the required grasp pose. This is usually because the pose is out of the robot's reachable space. Berenson et al. proposed casting rays from the surface of the object and computing the minimum distance between the object and the environment's obstacles [15]. With these distances an environment clearance score was determined. Grasp poses with high environment clearance scores were selected preferentially. While this method was able to generate collision-free grasp poses with high probability, it could not provide information on whether an inverse kinematics solution existed for the selected grasp pose. In most current service robots, the arms are of seven or more DOF. Generally there is no analytical inverse kinematics solution for these kinds of redundant arms. The numerical solutions, which are usually based on Jacobian pseudo inverse or Jacobian transpose, are time consuming. Therefore, if the grasp pose's reachability could be determined beforehand, the number of invocations to inverse kinematics could be decreased, and the efficiency thus improved. Vahrenkamp et al. proposed the concepts of reachability distributions and inverse reachability distributions to judge whether or not a grasp pose is reachable [16]. Zacharias et al. proposed the use of shape primitives including cones and cylinders to represent reachable orientations [17]. But this kind of representation lacks clear mathematical expression. In this paper, we propose a Gaussian mixture-model-based grasp-pose generation method. In the offline stage, the robot's reachable space is divided into 3D voxels based on x, y and z coordinates. For each voxel, thousands of orientations are generated with random sampling over the dimensions of roll, pitch and yaw. Through inverse kinematics computation, the orientations with valid inverse kinematics solutions are stored. These reachable orientations form a set {x | x = [roll, pitch, yaw] T }. The Gaussian mixture model is used to depict the distribution of this set. A look-up table is built to store the GMM of each voxel. In the online grasp-planning stage, a GMM is queried in the look-up table based on the target object's x, y and z positions. After transforming the GMM to the object's coordinate system, grasp poses are sampled from the object's TSR by Gaussian sampling. Through the Gaussian sampling, grasp poses with high reachability probabilities are selected preferentially. To the best of our knowledge, this is the first time that the GMM has been used to model the robot's reachable orientations. The merits of this method are twofold. In one aspect, reachable orientations can be sampled with high probability. In the other aspect, the reachability of a concrete grasp pose can be determined with this model.
The remainder of this paper is organized as follows. The reachability data collection process is explained in section 2. Section 3 and section 4 introduce the offline GMM modelling and the online grasp-pose generation procedure, respectively. The experiments are presented in section 5. Different rotation representations are further discussed in section 6. In section 7 we conclude our work.
Collecting Reachability Data
In general, there are two ways to generate the reachability data. One is to collect thousands of joint configurations through random sampling in the robot's configuration space. The pose of the end effector corresponding to each configuration is calculated through forward kinematics [16]. The other way is to sample in the workspace directly and use inverse kinematics to determine the reachability of each sampled pose [18]. The drawback of the first method is that it is inclined to introduce a preference towards singular configurations, since large displacements in the configuration space are mapped to identical voxels in such configurations [17]. Therefore, we use the second method to gather the reachability data. Figure 1 shows our robot arm and its kinematic configuration. It is the left arm of a SmartPalIV robot manufactured by the Yaskawa Electrical Corporation and has seven DOF. The end effector is a gripper with one DOF in the form of a rotating thumb. In Figure 1 the arm's coordinate system is labelled as O-xyz. The seven revolute joints are labelled as q1 to q7.

The robot arm and its kinematic configuration
Two expressions are used to represent the gripper's pose in the arm's coordinate system. One is a four-by-four matrix denoted as
where c stands for cos() and s stands for sin().
The reachability data are gathered through two steps. In the first, the robot's workspace is divided into 3D voxels along the x-, y- and z-axes with fixed grid size δ
x
, δ
y
and δ
z
. The centroid of each voxel is c
i
= [x
i
, y
i
, z
i
]
T
. In the second step, for each voxel thousands of grasp poses are generated through random sampling over the six dimensions of
Gaussian mixture models are widely used in image processing, especially in image segmentation for the modelling of the background [19–21]. A GMM is expressed as a weighted sum of several Gaussian models, as shown in (2):
where
As shown in (2), a Gaussian mixture model is completely determined by four variables: μ, σ, w k and m. The Expectation-Maximization (EM) algorithm is used to estimate these variables [22], as described below.
Set m to a certain value. With the K−means algorithm, initialize the variables μ, σ and w k .
Compute the responsibilities that the k−th Gaussian model takes for explaining the observation
Re-estimate the parameters using the current responsibilities:
Repeat step 2 and step 3 until the difference between the values of log p
For Gaussian mixture models, the number of Gaussian models plays an important role. In general, with more Gaussian models, the distribution of the sample set can be approximated more accurately. However, having too many Gaussian models will introduce the over-fitting problem. To solve this problem, we use the above EM algorithm to estimate the GMM with m varying from 1 to a certain upper limit M. For each estimated GMM, cross-validation is used to determine the best m.
One GMM is used to depict the distribution of the orientation set
The Gaussian mixture model depicts the distribution of the grasp-pose set
Experiments
The robot arm used in our experiments is shown in Figure 1. Its workspace is roughly a semi-sphere with the radius equal to 0.65 m. With the grid size δ
x
= δ
y
= δ
z
= 0.05m, the look-up table contains 4712 entries. For each grid, 8000 reachable poses were collected and the parameters of the corresponding GMM were estimated using the algorithm presented in section 3. The object to be grasped was a can of CocaCola, which was approximated as a cylinder with radius equal to 33 mm and height equal to 115 mm. The object's coordinate system was set up in such a way that the origin was at the cylinder's centroid, and the z-axis was aligned with the cylinder's rotation axis. According to the TSR definition, the boundary of the feasible grasp region was

The generated grasp poses, as indicated by green arrows (a) with GMM sampling, (b) with uniform sampling
To compare the performances of the two methods quantitatively for each generated grasp pose, inverse kinematics was used to obtain the corresponding joint configuration. If an inverse kinematics solution exists for a grasp pose, the pose is marked as reachable and the IK success number is increased by 1. Table 1 shows the IK success number of each method. Since the GMM is a combination of several Gaussian models, we tried different numbers of Gaussian model. The results of the methods with Gaussian model number varying from one to four are presented below. For each position, the maximum IK success number is highlighted in red. The improvement of the best GMM over the uniform sampling is listed in the last row.
Comparison of the IK success number for sampling with different methods
From Table 1 we can see that the best number of Gaussian models varies from position to position. The GMM sampling method has an average improvement of 73.5% over the uniform sampling method.
Next we applied our method in grasp planning with the real robot. The experiment set-up is shown in Figure 3. A Microsoft Kinect was used as the perception sensor and the object's six-DOF pose was calculated with the method proposed in [23]. With the object's six-DOF pose, grasp poses were generated from TSR with the GMM. Inverse kinematics was used to convert the pose to the robot's joint configuration and RRT-Connect [24] was used to plan a trajectory for the arm in the configuration space. The time from the start of the grasp-pose generation to the first inverse kinematics solution being found was recorded as the grasp-planning time. Fifty trials were conducted with varying object poses. The average grasp-planning time with our method was 14.3 ms, compared to 41.7 ms with the method proposed in [9].

The experiment set-up for grasping with a real robot
Besides generating grasp poses, the GMM could also be used in preliminary judgements on the reachability of a grasp pose. To show this we collected 5000 grasp poses with randomly sampled roll-pitch-yaw angles. Through an inverse kinematics calculation, 1842 poses were marked as reachable and 3158 poses were marked as unreachable. The reachability probability of each pose was calculated with the GMM with equation (2). Figure 4 shows the distributions of the reachable poses and the unreachable poses. The blue dots are reachable poses and the red dots are unreachable poses. The x-axis represents the reachability probability of each pose calculated with equation (2). The y-axis represents the number of reachable and unreachable poses at that probability. From Figure 4 we can see that most of the unreachable poses are of low reachability probabilities. Setting the threshold th=0.0075, a precision of 80.8% at a recall of 90.1% was achieved when using the probability as the judgement for a pose's reachability.

The probability distribution of randomly generated poses
In the collection of the reachability data and the subsequent offline GMM modelling process we used the Euler angle to represent the rotation because it is intuitively straightforward. We also tried another two kinds of rotation representations: the Quaternion [q w , q x , q y , q z ] and the Rodrigues parameters [v i , v j , v k ]. For the Rodrigues representation, a vector of three elements is used whose direction encodes the direction of the rotation axis, and the size of the vector encodes the amount of rotation. For the two rotation representations, in the reachability data collection process, poses are generated through random sampling with 0≤v w ≤1, −≤v x ≤1, −1≤v y ≤1, −1≤v z ≤1 and −π<v i ≤π, −π<v j ≤π, −π <v k ≤π, respectively. Similar to the set-up in section 5, 8000 reachable poses were collected for each voxel and the parameters of the corresponding GMM were estimated using the algorithm presented in section 3. With the GMM models estimated for each of the three rotation representations, 1000 grasp poses were generated and their reachabilities were verified with an inverse kinematics calculation. Table 2 shows the result.
Comparison of the IK success number for sampling with different rotation representations
Comparison of the IK success number for sampling with different rotation representations
From Table 2 we can see that the performances of the Euler angle and the Rodrigues parameters are almost the same, while the Quaternion obtains fewer reachable poses. We think this is because the Quaternion representation has one more parameter than the other two. Hence it is harder for the GMM to model the distributions.
Currently, our method has only been tested on objects of simple shapes such as cylinders and rectangles. In the future we may consider dealing with objects of complex shapes by chaining several TSR to describe an object's graspable region.
This paper has presented a GMM-based grasp-pose generation method for a robot's autonomous grasping. It is achieved through the combination of TSR and random sampling with GMM models. In the offline stage, the robot's workspace is divided into 3D grids along the x-, y-and z-axes. For each grid, thousands of orientations are collected with varying roll, pitch and yaw angles. These orientations are used to train a GMM model with an EM algorithm. A look-up table is set up to store the GMM models for each grid. In the online grasp-planning stage, the GMM model is queried from the look-up table with the object's position, and grasp poses are sampled from the object's TSR with the GMM model. This method achieves an average improvement of 73.5% over the uniform sampling method in [9] in the sense of the IK success number. The average grasp-planning time is reduced from 41.7 ms to 14.3 ms. The GMM could also be directly used as a preliminary judgement of the reachability of a grasp pose, which could allow the rejection of unreachable grasp poses before using time-consuming inverse kinematics, and thus improve the efficiency.
Footnotes
8.
This research was partially funded by Shanghai universities young teacher funding programme (No. ZZSDJ13020) and equipment manufacturing system optimization computing subject foundation (No. 13XKJC01).
