Abstract
This paper presents a real-time remote robot teleoperation method using markerless Kinect-based hand tracking. Using this tracking algorithm, the 3D positions of the index finger and thumb can be estimated by processing depth images from the Kinect. The hand pose is used as a model to specify the pose of the remote robot's end-effector in real time. Unlike gesture-based approaches, which send only a limited set of motion commands, this method provides a way to send a whole task to a remote robot. The method has been tested in pick-and-place tasks.
1. Introduction
If a task is too complex for an autonomous robot to complete, then human intelligence is required to make a decision and control the robot, especially when it is in unstructured dynamic environments. Furthermore, when the robot is in a dangerous environment, robot teleoperation may be necessary. Human-robot interfaces such as joysticks, dials and robot replicas (Yussof et al. [1]; Mitsantisuk et al. [2]) have been commonly used, but these contacting mechanical devices require unnatural hand and arm motions to complete a teleoperation task.
A more natural way to communicate complex motions to a remote robot is to track the operator's hand-arm motion while the operator performs the required task, using contacting electromagnetic tracking sensors, inertial sensors and gloves instrumented with angle sensors (Hirche et al. [3]; Villaverde et al. [4]; Wang et al. [5]). However, these contacting devices may hinder natural human-limb motions.
Because vision-based techniques are non-contact and hinder hand-arm motions less, they have also been used. Vision-based methods typically use physical markers placed on the anatomical body part (Kofman et al. [6]; Lathuilière and Hervé [7]; GuangLong Du et al. [8]). There are many applications (Peer et al. [9]; Borghese and Rigiroli [10]; Kofman et al. [6]) that use marker-based human motion tracking. However, because body markers may hinder the motion in highly dexterous tasks and may become occluded, marker-based tracking is not always practical. Thus, a markerless approach seems better for many applications.
Compared with marker-based image tracking, markerless tracking is not only less invasive, but also eliminates the problems of marker occlusion and identification (Verma [11]). Thus, markerless tracking may be a better approach for remote robot teleoperation. However, existing markerless human-limb tracking techniques have limitations that may make them difficult to use in robot teleoperation applications. Many existing markerless tracking techniques capture images and compute the motion afterwards as a post-process (Goncalves et al. [12]; Kakadiaris et al. [13]; Ueda et al. [14]; Rosales and Sclaroff [15]). For remote robot teleoperation with continuous robot motion, however, markerless tracking has to run in real time. To allow the human operator to perform hand-arm motions for a task in a natural way without interruption, the position and orientation of the hand and arm should be provided immediately. Many techniques can only provide 2D image information of the human motion (Koara et al. [16]; MacCormick and Isard [17]), and these tracking methods cannot be extended to provide accurate 3D joint-position data. Controlling the end-effector of a remote robot requires the 3D position and orientation of the operator's limb-joint centres with respect to a fixed reference system, and identifying human body parts in different orientations has always been a significant challenge (Kakadiaris et al. [13]; Goncalves et al. [12]; Triesch and Malsburg [18]).
For robot teleoperation, there is limited research on markerless human tracking. Most techniques have used a human-robot interface based on hand-gesture recognition to control robot motion (Fong et al. [19]; Hu et al. [20]; Moy [21]). Coquin et al. and Ionescu et al. [22] developed markerless hand-gesture recognition methods that can be used for mobile robot control, where a few commands such as “go,” “stop,” “left” and “right” are sufficient. However, for object manipulation in 3D space, it is not possible to achieve natural control and flexible robot motion using gestures only. An operator who uses gestures must think in terms of the limited, separate commands that the human-robot interface can understand, such as move up, down or forward. A better way of human-robot interaction would be to let the operator focus on the complex global task, as a human naturally does when grasping and manipulating objects in 3D space, instead of thinking about what type of hand motions are required. To achieve this goal, a method is needed that allows the operator to complete the task using natural hand-arm motions while providing the robot with real-time information about the hand-arm motion, such as the anatomical position and orientation of the hand and arm (Kofman et al. [23]). However, to initialize the method of Kofman et al. [23], the human operator must assume a simple posture with an unclothed arm in front of a dark background, with the hand placed higher than the shoulder. A precise result cannot be obtained with a complex background. In addition, the operator would find it hard to work in cold weather because the arm is unclothed. The method is also sensitive to lighting, i.e., it is difficult to use when the scene is too bright or too dark.
This paper presents a method of remote robot teleoperation using markerless Kinect-based 3D hand tracking of the human operator (Figure 1). Markerless Kinect-based hand tracking is used to acquire the 3D anatomical position and orientation of the hand, and the data are sent to the robot manipulator through a human-robot interface so that the robot end-effector copies the operator's hand motion in real time. This natural way of communicating with the robot allows the operator to focus on the task instead of thinking in terms of the limited, separate commands that a gesture-based human-robot interface can understand. Non-invasive Kinect-based tracking avoids the hindrance to natural motion caused by physical sensors, cables and other contacting interfaces, as well as the marker occlusion and identification problems of marker-based approaches.

Non-invasive robot teleoperation system based on the Kinect
2. Human hand tracking and positioning system
Human hand tracking and positioning is carried out by continuously processing RGB images and depth images of an operator who performs the hand motion needed to complete a robot manipulation task. The RGB and depth images are captured by the Kinect, which is fixed in front of the operator.
The Kinect has three autofocus cameras: two infrared cameras optimized for depth detection and one standard visual-spectrum camera used for visual recognition.
2.1 Kinect coordinate system
In Figure 2, an operator stands in front of the Kinect and controls a robot. We define the Kinect coordinate system as shown in Figure 2: axis X points upward, axis Y points rightward and axis Z is vertical. The Kinect can capture the depth of any object in its workspace. In Figure 2 we can see the index finger tip(

depth of objects. K: Kinect;
2.2 Image capture and segmentation of hand
To capture the hand motion used for controlling the robot manipulator, we need to separate the hand from the rest of the depth image. The arm is segmented from the body by thresholding the raw depth image.
A depth image

Segmentation of hand and determination of thumb and index-finger tip positions
Where
When the human operator holds out the hand to control the robot manipulator, the arm is closer to the Kinect than the body, so we can first compute the mean value
Then we can divide the arm region
The arm region
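The depth-threshold segmentation described in this subsection can be illustrated with a minimal sketch. The equations of the original method are not reproduced here, so the following assumes the simplest reading: the mean of all valid depth values is used as the threshold, and every pixel nearer than that mean is labelled as arm. The function name `segment_arm`, the `margin` parameter and the convention that a depth of 0 marks an invalid pixel are assumptions made for illustration only.

```python
from statistics import mean


def segment_arm(depth, margin=0.0, invalid=0):
    """Return a binary mask (1 = arm) for a depth image given as rows of numbers.

    Assumption: pixels nearer to the sensor than (mean depth - margin)
    belong to the outstretched arm; pixels equal to `invalid` carry no depth.
    """
    valid = [d for row in depth for d in row if d != invalid]
    if not valid:
        # No usable depth data: nothing can be labelled as arm.
        return [[0] * len(row) for row in depth]
    threshold = mean(valid) - margin
    return [
        [1 if (d != invalid and d < threshold) else 0 for d in row]
        for row in depth
    ]


# Toy depth image in millimetres: body at 2000 mm, arm at about 900 mm.
depth_mm = [
    [0, 2000, 2000, 2000],
    [2000, 900, 950, 2000],
    [2000, 2000, 2000, 2000],
]
mask = segment_arm(depth_mm)
```

In this toy image only the two pixels at 900 mm and 950 mm fall below the mean depth, so only they are marked as arm. A real depth frame would likely also need noise filtering and connected-component selection, which this sketch leaves out.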
2.3 Determination of thumb and index- finger tip positions
The positions of thumb tip and index-finger tip are determined by an image that contains the arm. The arm region
For all 2D points (
Then project the 3D points
Define the minimize project function
Determine the one maximum (at
3. Position model
To avoid large-scale motion when the operator performs manipulation, we need to confine the working space of the operator to a relatively small space. However, the working space of the remote robot should not be limited. This means that a mapping from a relatively small space to an unconfined large space is necessary. A direct mapping from a small space to a larger one loses some precision. To avoid this problem, we adopt a differential positioning method.
Similar to the mouse and the keyboard, the position of the hand can be calculated by the incremental method. From section 2, the 3D position of
Define the 3D position of the

Hand pose

Positioning model
The 3D position of B in the last frame is
Where u is a threshold that determines whether the robot keeps moving or pauses. When
Because σ is an adjustable parameter, theoretically the space manipulated by the operator is an infinite space and we can obtain coarse-control and fine-control through adjustment to the value of σ.
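The incremental idea can be sketched as follows. Since the positioning equations are not reproduced here, this is only one plausible reading, not the authors' formulation: the hand displacement between two consecutive frames is scaled by σ and added to the robot position, and displacements shorter than the threshold u are ignored so that the robot pauses. The function name `update_position` and the default values are hypothetical.

```python
import math


def update_position(robot_pos, hand_prev, hand_now, sigma=1.0, u=5.0):
    """Return the new end-effector position after one tracking frame.

    Assumptions: `sigma` is the adjustable gain (large for coarse control,
    small for fine control) and `u` is a dead-band threshold in the same
    units as the hand positions.
    """
    delta = [n - p for n, p in zip(hand_now, hand_prev)]
    if math.sqrt(sum(d * d for d in delta)) < u:
        # Hand movement below the threshold: the robot pauses.
        return list(robot_pos)
    return [r + sigma * d for r, d in zip(robot_pos, delta)]


# Coarse control: a 10 mm hand movement becomes a 20 mm robot movement.
coarse = update_position([100.0, 0.0, 0.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0], sigma=2.0)
# A small movement stays inside the dead band and is ignored.
paused = update_position([100.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 0.0], sigma=2.0)
```

Because the robot position accumulates scaled increments rather than mirroring the absolute hand position, a small operator workspace can cover an arbitrarily large robot workspace, which is the property the section relies on.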
4. Orientation Model
As described in Figure, the orientation of the end-effector follows the orientation formed by the thumb tip, the index-finger tip and the part between the thumb and the index finger

Orientation model
The orientation of the end-effector is calculated using the 3D positions of the
This means that if we only get the transformation matrix from the coordinate system of the console to the coordinate system of the operator's hand, we can obtain the transformation matrix from the base coordinate system to the end-effector. The details of the derivation of the orientation matrix are given below:
Assuming the origin of the operator's hand coordinate system is identical to the one in the console coordinate system and the transformation matrix is a 3×3 matrix
In hand tracking and positioning, the unit vector [
Through (13), (14), (15), we can get:
As stated before, the transformation matrix from the console coordinate system to the operator's hand coordinate system is identical to the one from the base coordinate system to the end-effector coordinate system, and the translation between the end-effector and the base coordinate system is already given by the positioning model, so the transformation matrix of orientation is:
Notice that the [
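One way to build an orientation matrix from three tracked points is sketched below. The axis assignment is an assumption made for illustration, since the derivation of the original matrix is not reproduced here: the x axis points from the part between the thumb and index finger (called `base` in the code) to the index-finger tip, the z axis is normal to the plane spanned by the two fingers, and the y axis completes a right-handed frame. All function names are hypothetical.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("degenerate hand pose: points coincide or are collinear")
    return [c / length for c in v]


def hand_frame(thumb, index, base):
    """Return a 3x3 rotation matrix whose columns are the hand's unit axes."""
    x = normalize(sub(index, base))
    z = normalize(cross(sub(index, base), sub(thumb, base)))
    y = cross(z, x)  # already unit length because z and x are orthonormal
    return [[x[i], y[i], z[i]] for i in range(3)]


# Index finger along +X and thumb along +Y give the identity orientation.
R = hand_frame(thumb=[0.0, 1.0, 0.0], index=[1.0, 0.0, 0.0], base=[0.0, 0.0, 0.0])
```

Building the frame from cross products keeps the matrix orthonormal even when the thumb and index finger are not perpendicular, which matters because tracked fingertip positions are noisy.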
5. Virtual Robot Manipulation System
We use a six degree-of-freedom industrial robot to perform this experiment, as shown in Figure 7. The task is to grab the target object which is in the robot's working space and then place the object at the destination.

Six-axis robot-manipulator used at the remote robot site
There are two working modes for the robot. In the first mode, the angle of every joint is calculated by inverse kinematics from the position of the end-effector. After the joints have executed all the requested angles, the end-effector of the virtual robot reaches the destination. This mode is suitable when no obstacle is present in the working space of the virtual robot. The second mode is suitable when an obstacle appears in the virtual robot's working space. In this mode the virtual robot has to move along a safe path, which ensures that it will not collide with the obstacle.
In DH representation,
For a robot with six joints, the homogeneous coordinate transformation matrix from the base coordinate system to the end-effector's coordinate system is defined as:
Where
Through (8) we can have the angle of six joints: (
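The chaining of per-joint transformations can be sketched with the standard (classic) Denavit-Hartenberg matrix. The DH parameters of the robot used in this work are not reproduced here, so the example uses a hypothetical two-link planar arm; the same function accepts six rows for a six-joint robot. Function names are chosen for illustration.

```python
import math


def dh_matrix(theta, d, a, alpha):
    """Homogeneous transform of one joint in the classic DH convention."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def forward_kinematics(dh_rows):
    """Multiply the joint transforms in order, from base to end-effector."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, d, a, alpha in dh_rows:
        T = matmul4(T, dh_matrix(theta, d, a, alpha))
    return T


# Hypothetical planar arm: two unit-length links, first joint rotated 90 degrees.
planar_arm = [(math.pi / 2, 0.0, 1.0, 0.0), (0.0, 0.0, 1.0, 0.0)]
T = forward_kinematics(planar_arm)
position = [T[0][3], T[1][3], T[2][3]]
```

With the first joint at 90 degrees and the second at 0, both links point along the base Y axis, so the end-effector sits at approximately (0, 2, 0). Inverse kinematics, as used in the first working mode, solves the opposite problem: finding the joint angles that produce a requested transform.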
6. Experiments
We evaluated the algorithm on our robot platform. For testing, we built an experimental teleoperation environment. At the local site, we built a set of emulation environments for the technical robot and a set of video-based virtual reality systems. The remote site is the real robot in its working environment. To reflect real teleoperation conditions, we limited the bandwidth to 30 kB/s, and the delay was approximately 3 seconds.
To evaluate the Kinect-based teleoperation algorithm described in this paper, we developed a Kinect-based human-robot interface system in C++ (Figure 8) and used it for the teleoperation of a six-axis technical robot. This experimental system includes three modules:
Use the human hand tracking and positioning system to get the hand images, and then calculate the 3D positions of
The virtual robot manipulation system drives the virtual robot based on the joint angles, which are calculated through inverse kinematics. If the commands are safe, they are transmitted to the remote site to control the real robot.
The remote site transmits the video to the local site, and the video fusion system displays the virtual environment together with the real environment. The edges of the virtual robot are then overlaid on the video frame transmitted from the remote site.
In the experiment, the operator placed his hand in the workspace to control the virtual robot. The orientation of the virtual robot's end-effector coincided with the human hand. The position of the virtual robot's end-effector was adjusted by moving the human hand through different faces of the direction space, as shown in Figure 3.
As shown in Figure 3, the way the operator controls the robot is natural and intuitive. Because the system uses an incremental method similar to keyboard control, the operator is not required to make large-scale movements to control the robot.
7. Result
After the robot is reconstructed and controlled through inverse kinematics, the precision of manipulation decreases because of the coordinate-system transformation and the solving of the set of equations.
Figure shows the position and orientation of the robot's end-effector and the operator's hand during teleoperation experiments. The dashed line represents the end-effector's path. The solid line with green squares represents the path of the operator's hand. The virtual robot was manipulated to grab a ball placed on a square. The data from this experiment show that the position errors ranged from −13 to +13 mm and the orientation errors ranged from −2 to +2 degrees. Figure (c,d,e) shows the

Non-invasive vision-based teleoperation system

Analysis of the experiment
8. Discussion
In the remote unstructured environment of robot teleoperation, we assume that all the remote robot site components, including the robotic arm, robot controller, cameras on the end-effector and other cameras, can be installed on a mobile platform that enters the unstructured environment. The method presented here has been demonstrated on grabbing objects, picking up objects and accurate positioning during grabbing in the fine-adjustment control mode. One advantage of this system is that it includes the operator in the decision and control loop. It allows a robot to grab, move and place an object without any prior knowledge such as the starting location or even the destination location. Similar tasks require decision making when picking objects and targets from among multiple objects, such as packing or cleaning up objects that may include dangerous items. It is expected that this system can be used to achieve those more complex poses when the joints of the robot are limited. The hole task shows how to determine the position of an extruded body and a target hole randomly. Assembly and disassembly may include more limited hole tasks. We may need an appropriate grab hook, bigger hole and groove unless this system includes force feedback.
Compared with the automatic capture (Kofman et al. [6]), this algorithm uses manual positioning. Considering hand tremor, this algorithm includes a coarse adjustment and fine adjustment function. When guiding the robot, we can use the coarse adjustment to move the robot close to the target quickly. When grabbing the target, we can use the fine adjustment to position the robot accurately. That can ensure the safety and the efficiency of the teleoperation, and solve the problem of inaccuracy caused by manual operation.
This paper contributes a guided teleoperation system based on non-contact measurement. With Kinect-based tracking, the operator can control the robot in a more natural way. Generally speaking, the same hand motion that would naturally be used in a task can accomplish the teleoperation task, and the Kinect-based tracking is non-contact. Thus, compared with the commonly used contacting electromagnetic devices, sensor-based devices and data gloves, non-contact devices may cause less hindrance to natural human-limb motion. The method proposed here allows the operator to focus on the task instead of thinking about how to decompose it into simple commands that the voice recognition teleoperation system can understand. This method is more natural and intuitive than the operation in Kofman et al. [23]. The system can be used immediately without any initialization, and this non-contact control system can be used outdoors. Because the algorithm uses infrared distance measurement to obtain the arm information, it is not affected by lighting and does not need to extract the 3D coordinates through accurate image processing. This allows the system to be used in more severe environments, such as scenes that are too bright or too dark. In addition, the algorithm of Kofman et al. [23] needs a bare hand in order to recognize the colour of the skin; otherwise, it cannot extract the hand data. In contrast, the algorithm presented here does not require a bare hand, so the operator can wear gloves when using the system in a cold outdoor working environment. This enlarges the field of application of the system.
9. Conclusion
A method of human-robot interaction using markerless Kinect-based tracking of the human hand for robot-manipulator teleoperation has been presented. By tracking the thumb tip, the index-finger tip and the part of the hand between the thumb and the index finger in real time, the 3D position and orientation of the hand are computed, and the robot manipulator can be controlled by hand to perform pick-and-place tasks. To complete more complex tasks, multiple Kinects will be used together in future work.
