Abstract
This article presents a two-camera bi-axial parallel vision configuration to realize uncalibrated visual servoing of a robot manipulator. The intrinsic and extrinsic parameters of the cameras and the model of the robot manipulator are unknown, and no real-time computation of the inverse of the image Jacobian is required during image-based dynamic control of the manipulator. First, the proposed vision configuration is applied to transform the Cartesian posture information into point and angle parameters in the two image planes. After that, a qualitative mathematical model is established that analyzes the relationship between the feature point and line of the robot manipulator in the Cartesian space and the corresponding point and angle in the image plane. In addition, the monotonic property of the point and line features in the mathematical model is proved. Then, an image-based controller is designed and realizes five-degrees-of-freedom uncalibrated visual servoing control of the robot manipulator on the simulation platform. Moreover, Lyapunov theory is used to prove the global asymptotic stability of the proposed method in the image plane. Finally, the proposed method is compared with the image Jacobian matrix method in an experiment on the actual platform, and the comparative experimental results show the effectiveness of uncalibrated visual servoing of the robot manipulator under the proposed vision configuration.
Introduction
Uncalibrated visual servoing of a robot manipulator is a method that drives the manipulator to the desired posture by accessing and updating image feature changes in the camera image plane in real time, without explicitly calculating the intrinsic and extrinsic parameters of the camera. 1 According to the posture relation between the camera and the robot manipulator in the Cartesian space, visual servoing systems are divided into eye-in-hand, where the camera is mounted on the end of the manipulator, and eye-to-hand, where the camera is fixed in the Cartesian space. Moreover, on the basis of the number of cameras, visual servoing systems can be divided into monocular, binocular, and multi-camera systems.2,3 In a binocular vision system, the vision configuration directly affects how the point and line features project onto the image planes. In our experience, a suitable two-camera configuration can transform the Cartesian posture information into image feature information in the two image planes, which reduces the difficulty of uncalibrated visual servoing control of the robot manipulator in the Cartesian space.
The two-camera orthogonal vision configuration can capture the posture information of the robot manipulator in the Cartesian space, and it has many successful applications. For instance, Wang et al. 4 decomposed the manipulator's movement into position parameters in the two camera planes, and the manipulator succeeded in tracking a circular path. In the article by Chen et al., 5 an automatic marking control system with the orthogonal vision configuration was successfully applied to the automatic marking of gas bottles. In the article by Xu et al., 6 Harbin Institute of Technology used the orthogonal vision configuration to develop a microscopic visual micro-manipulator control system, which has been successfully used in optical fiber butt-joining. Although the orthogonal vision configuration avoids estimating depth information from the image features of the camera image plane, the two cameras need to be strictly orthogonal for the manipulator to complete its task. Qian et al. 7 achieved precise tracking of a moving object by a robot manipulator under combined eye-in-hand and eye-to-hand vision configurations. Pan et al. 8 established a nonlinear visual mapping model between the Cartesian space and the image plane and designed a controller based on an artificial neural network to eliminate the tracking error. Recently, Liu et al. 9 used a Kalman filter algorithm to estimate the image Jacobian matrix and realized five-degrees-of-freedom uncalibrated visual servoing control of a robot manipulator in a MATLAB simulation. Assa et al. 10 employed a weighted-average data fusion method and an extended Kalman filter algorithm to estimate the posture of the robot manipulator in the Cartesian space in order to improve the robustness of the controller. These previous works7–10 achieve uncalibrated visual servoing control of a robot manipulator, but they require substantial computation or a complex nonlinear mapping model. Chang et al. 11 calibrated the relationship between the camera and the robot manipulator in advance, and the manipulator realized automated assembly of cell phone covers with an eye-in-hand vision configuration. Wang et al. 12 used stereo vision to obtain the target's depth information and grasp arbitrary objects in the Cartesian space. However, the methods of Chang et al. 11 and Wang et al. 12 require calibrating the relationship between the camera and the robot manipulator, and this relationship determines whether the controller can converge stably.
This article proposes a two-camera bi-axial parallel vision configuration that neither requires the two cameras to be strictly orthogonal nor needs to compute the image Jacobian or estimate a nonlinear mapping model. Moreover, the simulation and the experiment realized five-degrees-of-freedom uncalibrated visual servoing control of a robot manipulator.
Vision configuration
Two-camera bi-axial parallel vision configuration
Ideally, the two cameras use the orthogonal vision configuration shown in Figure 1, in which camera 1 and camera 2 are mutually orthogonal in the xy plane of the Cartesian space. However, the configuration in Figure 1 has certain limitations in some engineering applications; for instance, the two cameras cannot always be made completely orthogonal in the same plane. Therefore, this article proposes a two-camera bi-axial parallel vision configuration, as shown in Figure 2, which is based on the orthogonal vision configuration. In the proposed configuration, the x axis of camera 1 is parallel to the x axis of the Cartesian space and the x axis of camera 2 is parallel to the y axis of the Cartesian space. Besides, camera 1 is rotated around the x axis by ψ1 and camera 2 is rotated around the y axis by ψ2. It should be pointed out that the proposed vision configuration does not reconstruct the posture of the robot manipulator in the Cartesian space; rather, it compensates for the depth information of the target feature and decomposes the manipulator's movement in the Cartesian space into image feature information in the two image planes. Compared with the orthogonal vision configuration, the proposed configuration is still able to realize the uncalibrated visual servoing control task and achieves an excellent control effect.

The orthogonal vision configuration.

The bi-axial parallel vision configuration.
Camera model
The pinhole imaging model, shown in Figure 3, is commonly used to represent the camera model in machine vision and is also referred to as the central perspective projection model.6,13 Denote the coordinates of a point P in the Cartesian space as (Xw, Yw, Zw) and its coordinates in the camera frame as (Xc, Yc, Zc).

The central perspective model.
Denote the coordinates of a point in the imaging plane as (x, y) and let f be the focal length of the camera. Represent the pixel coordinates of the image plane by (u, v), where (u0, v0) is the principal point of the pixel coordinates. According to the central perspective projection model, equation (1) gives the projection of the point P (Xc, Yc, Zc) onto the imaging plane: x = f Xc/Zc, y = f Yc/Zc
Let dx and dy denote each pixel's physical size along the u axis and v axis of the image plane, respectively. Then, the imaging plane coordinates (x, y) are converted into the pixel coordinates (u, v) of the image plane by equation (2): u = x/dx + u0, v = y/dy + v0
Combining equations (1) and (2), the transformation from the camera-frame coordinates to the pixel coordinates in the image plane can be represented by equation (3): u = ax Xc/Zc + u0, v = ay Yc/Zc + v0
where ax = f/dx and ay = f/dy are the amplification factors in the x axis and y axis from the imaging plane to the image plane, respectively. The parameters (ax, ay, u0, v0) are the structural parameters only related to the camera itself.
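As a concrete illustration of equations (1)–(3), the projection from camera-frame coordinates to pixel coordinates can be sketched as follows; the numeric intrinsics (ax, ay, u0, v0) are illustrative values, not the paper's calibrated parameters:

```python
def project(Xc, Yc, Zc, ax, ay, u0, v0):
    """Central perspective projection: camera-frame point -> pixel coordinates.

    Combines x = f*Xc/Zc, y = f*Yc/Zc with u = x/dx + u0, v = y/dy + v0,
    using the amplification factors ax = f/dx and ay = f/dy.
    """
    if Zc <= 0:
        raise ValueError("point must lie in front of the camera (Zc > 0)")
    u = ax * Xc / Zc + u0
    v = ay * Yc / Zc + v0
    return u, v

# A point 0.1 m right of and 0.05 m below the optical axis, at 1 m depth:
u, v = project(0.1, 0.05, 1.0, ax=2560.0, ay=2560.0, u0=512.0, v0=384.0)
print(u, v)  # 768.0 512.0
```

Note how depth Zc divides both pixel coordinates; this is the depth dependence that the proposed two-camera configuration compensates for.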
In this article, the xyz-order rotation matrix of Euler angles was selected to represent the rigid body's posture in the Cartesian space.14 The parameters (ψ, θ, φ) are defined as the angles of rotation of the rigid body around the x, y, and z axes, respectively.14 Composing the three elemental rotations in xyz order yields the overall rotation matrix of equation (4)
According to equation (4) and the vision configuration in Figure 2, the rotation matrix and translation vector of camera 1 are
where ψ1 is the tilt angle of camera 1 and the parameters (x1, y1, z1) give the position of camera 1 in the Cartesian space. Similarly, the rotation matrix and translation vector of camera 2 are
where ψ2 is the angle that camera 2 rotates around the y axis and the variables (x2, y2, z2) are the position of camera 2 in the Cartesian space.
Therefore, the homogeneous coordinate transformation from the Cartesian coordinates (Xw, Yw, Zw) to the camera coordinates (Xc, Yc, Zc) can be represented by equation (7)
where
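The world-to-camera mapping of equation (7) can be sketched numerically. The elemental rotation about the x axis follows the xyz Euler convention of equation (4); the composition Pc = R·Pw + t and all numeric values below are illustrative assumptions, since the exact matrices of equations (5)–(7) are not reproduced here:

```python
import math

def rot_x(psi):
    """Elemental rotation about the x axis by psi radians (xyz Euler order)."""
    c, s = math.cos(psi), math.sin(psi)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def world_to_camera(Pw, R, t):
    """Pc = R * Pw + t: map a Cartesian point into a camera frame."""
    return [sum(R[i][j] * Pw[j] for j in range(3)) + t[i] for i in range(3)]

# Camera 1 tilted by psi1 = 20 degrees about the x axis, placed at t1:
R1 = rot_x(math.radians(20.0))
t1 = [0.0, 0.0, 1.0]
Pc1 = world_to_camera([0.1, 0.2, 0.3], R1, t1)
print(Pc1)
```

Because the rotation is about the x axis only, the Xw coordinate passes through unchanged, which is the geometric fact exploited by the monotonicity proofs below.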
Image feature selection
The selection and extraction of image features directly determine the controller design and the robustness of the closed-loop control system. Frequently used image features include points, lines, angles, areas, optical flow fields, and Fourier descriptors. Local image features, such as points and angles, are relatively easy to extract and adapt well to changing environments. Hence, this article selects a point and an angle as the image features that indicate how the robot manipulator moves in the Cartesian space. The number of image features must be greater than or equal to the number of degrees of freedom controlled on the robot manipulator in order to achieve the uncalibrated visual servoing control task. Therefore, at least five image features are needed to realize five-degrees-of-freedom uncalibrated visual servoing control. According to the above discussion, the image feature set of this article is (u1p1, v1p1, u2p1, v2p1, θ1, θ2). The parameters (u1p1, v1p1) are the pixel coordinates of point P in camera 1 and θ1 is the angle between line p1p2 and the u axis in camera 1. In addition, the parameters (u2p1, v2p1) are the pixel coordinates of point P in camera 2 and θ2 is the angle between line p1p2 and the u axis in camera 2. The abstract projection model of the image feature set is shown in Figure 4. Moreover, magenta and orange color blocks on the robot manipulator's end were selected to represent the point and angle image features in the actual platform experiment, and the actual projection model of the image feature set is shown in Figure 5.

The abstract projection model of image feature set.

The actual projection model of image feature set.
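The feature set described above can be assembled from the two projected feature points in each view. Using atan2 for the angle between line p1p2 and the u axis is an implementation choice in this sketch, not necessarily the paper's extraction method:

```python
import math

def angle_feature(p1, p2):
    """Angle (degrees) between the projected line p1p2 and the image u axis."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

def feature_set(p1_cam1, p2_cam1, p1_cam2, p2_cam2):
    """Assemble (u1p1, v1p1, u2p1, v2p1, theta1, theta2) from the two views."""
    return (p1_cam1[0], p1_cam1[1],
            p1_cam2[0], p1_cam2[1],
            angle_feature(p1_cam1, p2_cam1),
            angle_feature(p1_cam2, p2_cam2))

feats = feature_set((100, 100), (200, 200), (50, 80), (150, 80))
print(feats[4], feats[5])  # 45.0 0.0
```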
Vision model analysis and controller design
Vision model analysis
The movement of the robot manipulator is synthesized from translational components along the x, y, and z axes of the Cartesian space. Thus, it is of great significance to analyze the changing trend of each specific feature in the image plane when the robot manipulator translates along the x, y, or z axis of the Cartesian space. Denote the point at the robot manipulator's end by P, with coordinates Pw = (Xw, Yw, Zw) in the Cartesian space. The parameters Pc1 = (Xc1, Yc1, Zc1) and Pc2 = (Xc2, Yc2, Zc2) are the coordinates of point P in the camera 1 and camera 2 frames, respectively, and Pimg1 = (u1, v1) and Pimg2 = (u2, v2) are the projected coordinates of point P in the image planes of camera 1 and camera 2. Note that the parameters (P′w, P′c1, P′c2, P′img1, P′img2) represent the feature point coordinates of the robot manipulator's end at the previous time and the parameters (Pw, Pc1, Pc2, Pimg1, Pimg2) are the feature point coordinates at the current time. To guarantee that the image feature always stays within the visual area of camera 1 and camera 2, the depths Zc and Z′c must satisfy Zc > 0 and Z′c > 0. With reference to the camera model, ax and ay are physical parameters of the camera and satisfy ax > 0 and ay > 0. According to the two-camera bi-axial parallel vision configuration, a qualitative mathematical model has been established in this article, and it has the following properties.
Property 1
The pixel coordinate function u1 = f (Xw) of camera 1 image plane has a monotonically increasing or decreasing nature when the robot manipulator end’s feature point translates ΔXw along the Xw axis of the Cartesian space.
Proof
The current coordinates of the feature point are Pw = (X′w+ΔXw, Y′w, Z′w) after the robot manipulator translates by ΔXw along the Xw axis of the Cartesian space. Substituting equation (2), equation (3), and the point coordinates Pw into equation (7), the feature point's coordinates Pc1 and Pc2 in the camera 1 and camera 2 frames are given by formula (8)
By combining equations (5) and (8), the difference between the previous and the current pixel coordinates in the image plane of camera 1 is formula (9)
For ax1> 0 and Z′c1> 0, there is
Therefore, the function u1 = f (Xw) has a monotonically increasing or decreasing nature, as analyzed by formula (10).
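Property 1 can be checked with a toy numerical example. Under the proposed configuration, camera 1's x axis is parallel to the world x axis, so a translation ΔXw changes Xc1 one-for-one while Zc1 stays fixed; u1 is then affine, hence monotonic, in Xw. The intrinsic and depth values below are illustrative, not the paper's:

```python
# Illustrative intrinsics and depth, not calibrated values from the paper.
ax1, u0 = 2560.0, 512.0
Zc1 = 1.5                                      # fixed depth during the x translation, Zc1 > 0
Xw = [0.05 + 0.01 * k for k in range(6)]       # sweep Xw in 10 mm steps
u1 = [ax1 * x / Zc1 + u0 for x in Xw]          # u1 = f(Xw) is affine in Xw
assert all(b > a for a, b in zip(u1, u1[1:]))  # strictly increasing
print(u1[0], u1[-1])
```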
Property 2
The pixel coordinates function u2 = f (Yw) of camera 2 image plane has a monotonically increasing or decreasing nature when robot manipulator end’s feature point translates ΔYw along the Yw axis of the Cartesian space.
Proof
Similarly, the current coordinates of feature point are Pw = (X′w, Y′w+ΔYw, Z′w) after the robot manipulator translates ΔYw along the Yw axis of the Cartesian space. Afterward, equation (2), equation (3), and point coordinates Pw are substituted into equation (7). Then, the feature point’s coordinates Pc1 and Pc2 in camera 1 and camera 2 can be expressed by formula (11)
From equations (5) and (11), the difference between the previous and current pixel coordinates in the image plane of camera 2 is given by formula (12)
For ax2> 0 and Z′c2> 0, there is
Thus, the function u2 = f (Yw) has a monotonically increasing or decreasing nature according to formula (12).
Property 3
The pixel coordinates function v1 = f (Zw) of camera 1 image plane has a monotonically increasing or decreasing nature when robot manipulator end’s feature point translates ΔZw along the Zw axis of the Cartesian space.
Proof
The proof process is similar to those of properties 1 and 2. First, the current coordinates of the feature point are Pw = (X′w, Y′w, Z′w+ΔZw) after the robot manipulator translates by ΔZw along the Zw axis of the Cartesian space. After that, substituting equation (2), equation (3), and the point coordinates Pw into equation (7) yields formula (14) for the feature point's coordinates Pc1 and Pc2 in the camera 1 and camera 2 frames
By combining equations (5) and (14), the difference between the previous and current pixel coordinates in the image plane of camera 1 is given by formula (15)
Assuming the camera’s resolution is R1 × R2 and the camera sensor size is S1 × S2, there is formula (16) according to the pinhole image model
From formulas (3) and (16), we obtain
where (v, v0) are pixel coordinates and f is the focal length of the camera. It should be pointed out that the numerator of formula (17) is far greater than the denominator for the parameters of a high-resolution industrial camera. Besides, the tilt angle ψ1 of camera 1 (0° ≤ ψ1 ≤ 20°) is small and ΔZw is a very small translation. Therefore, formula (17) simplifies to formula (18) when the tilt angle ψ1 of camera 1 is small
By submitting formula (18) into formula (15), formula (15) can be re-written as follows
For ay1> 0 and Z′c1> 0, there is
According to formula (19), the function v1 = f (Zw) has a monotonically increasing or decreasing nature.
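The magnitude claim in the proof above — that the amplification factors are large for a high-resolution industrial camera, so the numerator of formula (17) dominates — can be checked with the camera parameters used later in the simulation section (resolution 1024 × 768 px, sensor 4.8 × 3.6 mm, f = 12 mm):

```python
# Amplification factors from the simulation camera parameters.
R1, R2 = 1024, 768
S1, S2 = 4.8, 3.6            # sensor size in mm
f = 12.0                     # focal length in mm
dx, dy = S1 / R1, S2 / R2    # physical pixel size in mm, as in formula (16)
ax, ay = f / dx, f / dy      # amplification factors, as in equation (3)
print(ax, ay)  # 2560.0 2560.0
```

With ay on the order of thousands while pixel offsets (v − v0) are at most a few hundred, the small-angle simplification of formula (18) is numerically justified.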
Property 4
The line feature at the robot manipulator's end is composed of two feature points, and there is an angle between this line feature and the z axis of the Cartesian space
Proof
The posture of the line can be described by the spherical coordinates ϕ and θ

The spherical coordinate.
Note that unit vector
Denote the projection vector of unit vector
Suppose that the angle θ is a function of a; then formula (22) can be re-written as
The function y = arctan(x) is monotonically increasing, and the monotonicity of
Hence, the monotonicity of the function θ = f (φ) is related to the quadrant of the angle φ, and it has a piecewise monotonic nature.
According to the homogeneity of the spherical coordinate notation, there is likewise an angle between the line feature and the x axis of the Cartesian space
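The piecewise-monotonicity argument of Property 4 can be sketched numerically. Here the projected angle is modeled as θ = arctan(a·tan(φ)) for a positive scale factor a, a stand-in for the camera-dependent constant derived in formulas (20)–(24); within one quadrant of φ both tan and arctan are monotonic, so θ is monotonic on that branch:

```python
import math

def projected_angle(phi, a):
    """theta = arctan(a * tan(phi)); a > 0 stands in for the camera constant."""
    return math.atan(a * math.tan(phi))

# Sweep phi inside the first quadrant: tan and arctan are both monotonic there,
# so the projected angle is monotonic on this branch. Crossing a quadrant
# boundary of phi starts a new monotonic branch.
phis = [math.radians(d) for d in range(5, 86, 10)]
thetas = [projected_angle(p, 0.7) for p in phis]
assert all(t2 > t1 for t1, t2 in zip(thetas, thetas[1:]))
```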
Controller design
Taking the above discussion into account, the robot manipulator was made to translate along the Xw, Yw, and Zw axes and to rotate around the Xw and Zw axes in the Cartesian space; the resulting trajectories of the feature point in the image plane are shown in Figures 7 and 8. As can be seen in Figure 7, the variables (u1, u2, v1) are the main characteristic variations reflecting the manipulator's translation along the Xw, Yw, and Zw axes, respectively. Besides, the feature point's trajectory in the image plane demonstrates the correctness of properties 1, 2, and 3. Similarly, in Figure 8, the angles (θ1, θ2) are the main characteristic variations when the manipulator rotates around the Xw and Zw axes, respectively. Moreover, the monotonicity of the trajectory in the image plane also proves the validity of property 4.

Robot manipulator translates along the (a) x, (b) y, and (c) z axes.

Robot manipulator rotates around the (a) x and (b) z axes.
According to the experimental trajectories in Figures 7 and 8, the controller selects one main characteristic variation to reflect the robot manipulator translating along or rotating around each of the Xw, Yw, and Zw axes in the Cartesian space. Thus, the controller selects the variables (u1, u2, v1) to represent translation along the Xw, Yw, and Zw axes, respectively, and chooses the angles θ1 and θ2 to reflect rotation around the Xw and Zw axes. For simplicity, the quantitative control model based on the two-camera bi-axial parallel vision configuration can be described by formula (25)
Based on the quantitative control model above, this article proposes the following controller
where
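The decoupled control idea behind formulas (25) and (26) — each selected image feature drives exactly one Cartesian degree of freedom, so no image Jacobian has to be estimated or inverted — can be sketched as a proportional law. The proportional form and the gain values below are illustrative assumptions, not the paper's exact controller:

```python
# GAINS pairs each feature error with its Cartesian degree of freedom (assumed values).
GAINS = (0.5, 0.5, 0.5, 0.1, 0.1)

def control_step(feat, feat_des):
    """feat = (u1, u2, v1, theta1, theta2) -> (vx, vy, vz, wx, wz).

    The u1 error drives translation along Xw, u2 along Yw, v1 along Zw,
    theta1 rotation about Xw, and theta2 rotation about Zw.
    """
    return tuple(k * (d - f) for k, f, d in zip(GAINS, feat, feat_des))

cmd = control_step((100.0, 200.0, 300.0, 10.0, 20.0),
                   (150.0, 200.0, 250.0, 10.0, 25.0))
print(cmd)  # (25.0, 0.0, -25.0, 0.0, 0.5)
```

Because each error channel is scalar and monotone by properties 1–4, a sign-correct gain per channel suffices; this is the sense in which the method avoids inverting the image Jacobian.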
Stability analysis
This section analyzes the stability of the robot manipulator under the proposed vision configuration and controller. According to the analysis and proofs in section “Vision model analysis,” asymptotic convergence of the image error to 0 means that the robot manipulator stably converges to the desired posture in the Cartesian space. Thus, stability in the image plane is equivalent to stability of the robot manipulator in the Cartesian space. For simplicity, we assume that the feature point remains visible during the motion. The stability proof in the Cartesian space proceeds as follows.
Proof
It is well-known that the dynamic equation of robot manipulator is equation (27).16–18
where
The proposed controller in this article is
Introduce the following non-negative energy function
From formula (29), the value of
By combining formulas (27) and (28), the closed-loop dynamics equation is obtained
Multiplying the
Differentiating the function
The following equation was established through structural analysis of the Lagrange equations of motion19–21
By combining formulas (31)–(33), the derivation of Lyapunov function can be simplified
Because the parameter of
Because the matrix
Therefore, this article can achieve the global asymptotic stability of robot manipulator in the Cartesian space under the proposed vision configuration and the controller.
Simulation and experiment
Simulation
A simulation platform was developed in order to verify the validity of the proposed controller under the two-camera bi-axial parallel vision configuration. The simulation platform is shown in Figure 9 and is composed of a three-dimensional (3D) scene platform model in Open Inventor, a vision module in OpenCV, and a robot control module in Orocos. All of these modules are open-source software widely used by researchers. Moreover, the graphical user interface module and the human–machine interaction module were developed with MFC and the DirectInput development kit.

The simulation platform.
The robot manipulator model used in the simulation platform is the Puma560. The intrinsic parameters of the two cameras follow those of the actual industrial camera: the image resolution is 1024 × 768, the camera sensor size is 4.8 × 3.6 mm, and the focal length is 12 mm. Camera 1 is counter-rotated around the x axis of the Cartesian space by 20° and camera 2 is rotated around the y axis of the Cartesian space by 60°. The extrinsic parameters of the two cameras are

The flow chart of the simulation program.
In the simulation experiment, denote the initial image feature vector in the image plane as (970.9, 178.9, −121.8, 961.0, 624.9, −36.1) and the initial posture of the robot manipulator in the Cartesian space as (242.8, −439.6, 368.7, 46.9, 36.0, 0.0). The desired image feature vector is (387.2, 522.3, 177.4, 527.1, 226.4, −96.6) and the expected posture of the robot manipulator is (680.3, −114.1, 269.5, 46.9, 89.2, 52.2). After the simulation, the error curves of the image features and the robot manipulator's posture are shown in Figures 11 and 12, respectively.

Image feature error’s curve.

Robot manipulator pose error’s curve.
After the simulation, the final image feature vector is (386.9, 522.2, 177.4, 527.8, 226.4, −96.7) and the final posture of the robot manipulator in the Cartesian space is (680.6, −114.8, 269.0, 46.9, 89.5, 52.1). As a result, the error vector with respect to the desired image features is (−0.3, −0.1, 0.0, 0.7, 0.0, −0.1) and the error vector compared with the expected posture of the robot manipulator is (0.3, −0.7, −0.5, 0.0, 0.3, −0.1). According to these error vectors, the image feature error in the image plane is within 1 pixel or 1° and the posture error in the Cartesian space is less than 1 mm or 1°. Besides, the point trajectory in the Cartesian space is shown in Figure 13. In conclusion, the simulation result shows that the robot manipulator realized five-degrees-of-freedom uncalibrated visual servoing control with minimal error both in the image plane and in the Cartesian space.

Point trajectory in Cartesian space.
Experiment
A comparison experiment was conducted on the actual platform in order to verify the effectiveness of the proposed vision configuration. The actual physical platform, shown in Figure 14, is composed of a Denso VS-6556 robot, MV-VS078FC industrial cameras, and an Advantech IPC 610L industrial PC. Similar to the simulation platform, the graphical user interface module and the human–machine interaction module of the actual physical platform are based on MFC and VS2008. Moreover, the intrinsic parameters of the industrial cameras are identical to those of the cameras in the simulation platform, and the posture of the two industrial cameras in the Cartesian space is shown in Figure 14.

The actual platform.
Similarly, the two-camera bi-axial parallel vision configuration was used in the actual platform experiment. Denote the initial image feature vector in the image plane as (204.7, 257.1, 22.0, 790.4, 579.7, 66.6) and the initial posture of the robot manipulator in the Cartesian space as (257.7, −190.3, 406.0, −68.3, 35.3, 166.4). Beyond that, the desired image feature vector in the image plane is (676.2, 442.8, −1.4, 422.6, 340.3, 88.3) and the desired posture of the robot manipulator in the Cartesian space is (333.4, −275.7, 408.6, −91.8, 35.5, 175.0). After the experiment, under the image Jacobian matrix method, the image feature vector is (676.1, 442.5, −1.3, 423.0, 339.8, 88.3) and the robot manipulator's posture is (333.9, −276.0, 409.0, −92.3, 35.7, 174.1). On this basis, the image error vector compared with the desired image features is (0.1, −0.3, −0.1, −0.4, 0.5, 0), and the posture error vector with respect to the expected posture is (−0.5, 0.3, −0.4, 0.5, −0.2, 0.9). Under the proposed two-camera bi-axial parallel vision configuration method, the image feature vector is (676.1, 442.5, −1.3, 423.0, 339.8, 88.3) and the robot manipulator's posture is (333.4, −275.8, 409.0, −91.9, 35.4, 175.0). Moreover, the image error vector with respect to the desired image features is (0.3, 0.3, 0.2, 0.8, 0.4, 0) and the posture error vector compared with the desired posture is (0, −0.1, −0.4, 0.1, 0.1, 0). In addition, the point and angle feature error contrast curves in camera 1 are shown in Figures 15 and 16, respectively, and those in camera 2 are shown in Figures 17 and 18, respectively. For convenience, in the figures of this article, the symbol KP represents the image Jacobian matrix method and the symbol DP stands for the proposed vision configuration.

Point error’s contrast curve in camera 1.

Angle error’s contrast curve in camera 1.

Point error’s contrast curve in camera 2.

Angle error’s contrast curve in camera 2.
According to the comparison of the experimental data, the image feature error in the image plane is within 1 pixel or 1° and the posture error in the Cartesian space is less than 1 mm or 1° under either method. Figure 19 shows the two methods' error contrast curves in the Cartesian space, and Figure 20 shows the contrast trajectories of the robot manipulator in the Cartesian space. Therefore, the two methods differ little in effectiveness. However, the image Jacobian matrix method requires estimating the image Jacobian matrix and computing its inverse in real time; its disadvantages are the complex matrix calculations and the possible non-invertibility of the image Jacobian matrix. In contrast, the proposed method completely avoids complex mathematical operations and the matrix invertibility problem.

Posture error curve in Cartesian space.

Point contrast curve in Cartesian space.
Conclusion
In this article, a two-camera bi-axial parallel vision configuration was proposed to realize uncalibrated visual servoing of a robot manipulator. A qualitative mathematical model was established under the proposed vision configuration, and the relationship between the feature point and line in the Cartesian space and the corresponding point and angle in the image space was analyzed. Then, the controller was designed by selecting the specific feature point and line in the image plane. Taking the nonlinear dynamic forces of the robot manipulator into account, Lyapunov theory was used to prove that the robot manipulator achieves global asymptotic stability in the Cartesian space. Finally, the simulation and physical experiments realized five-degrees-of-freedom uncalibrated vision positioning of the robot manipulator, and the experimental results verify the validity of the proposed control method and vision configuration.
Footnotes
Academic Editor: Seung-Bok Choi
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
This work was supported by National Natural Science Foundation of China (61174104), the Fundamental Research Funds for the Central Universities (Project No. 1061120131706), and the Research Foundation for Talents of Chongqing University.
