Depth Adaptive Zooming Visual Servoing for a Robot with a Zooming Camera

Abstract

To solve the view visibility problem and keep the observed object in the field of view (FOV) during the visual servoing, a depth adaptive zooming visual servoing strategy for a manipulator robot with a zooming camera is proposed. Firstly, a zoom control mechanism is introduced into the robot visual servoing system. It can dynamically adjust the camera's field of view to keep all the feature points on the object in the field of view of the camera and get high object local resolution at the end of visual servoing. Secondly, an invariant visual servoing method is employed to control the robot to the desired position under the changing intrinsic parameters of the camera. Finally, a nonlinear depth adaptive estimation scheme in the invariant space using Lyapunov stability theory is proposed to estimate adaptively the depth of the image features on the object. Three kinds of robot 4DOF visual positioning simulation experiments are conducted. The simulation experiment results show that the proposed approach has higher positioning precision.

Keywords

Zooming Control, Invariant Visual Servoing, Depth Adaptive Estimation, Lyapunov Stability Theory

1. Introduction

Visual servoing techniques increase the diversity of tasks a robot can perform and extend its fields of application. The aim of robot visual servoing is to control the relative pose of the robot's end-effector with respect to the manipulated object, using real-time visual information captured from a camera that is either fixed in the robot workspace (eye-to-hand configuration) or mounted at the robot's end-effector (eye-in-hand configuration). In the last few years, many robot visual servoing methods have been proposed to solve the problem of robot positioning and tracking with respect to the manipulated object in dynamic and unknown environments. These methods can be roughly classified into three types according to the definition of the system error function: position-based visual servoing (PBVS), image-based visual servoing (IBVS) and hybrid visual servoing (HVS or 2.5D VS). Each kind of visual servoing method has its own pros and cons. A comprehensive review of robot visual servoing can be found in [1].

The basic requirement of these visual servoing algorithms is to keep the observed object in the field of view of the camera during the visual servoing (also known as the visibility problem). When the robot's initial position is far from the desired position, the object may leave the camera's field of view during the visual servoing. This causes the visual servoing task to fail and limits its further application ^{[2, 3]}. There are two common kinds of solution to this view visibility problem: trajectory planning and active vision techniques such as active zoom control. The main idea of trajectory planning is to plan the trajectories of a set of feature points on the object in the image space and then to track these trajectories. At present, many trajectory planning methods have been proposed. In general, these methods can be roughly classified into four types: image space-based planning ^[5], global path planning ^[6], optimization-based planning ^[7] and potential field-based planning ^[8]. Tracking a planned trajectory keeps the change of the robot pose within a small range. Thus, it is possible to keep the object in the camera's field of view by enforcing such constraints on the trajectories ^[4], and the visibility problem can be solved. However, this kind of method is sensitive to changes in the camera intrinsic parameters. A comprehensive review of trajectory planning can be found in [9].

Another solution is to combine the active vision technique with the robot visual servoing system, e.g., an active zoom control technique. The main idea of active zooming could be described as follows:

Zooming out when one of the interest points, which are also called the feature points on the object, is close to leaving the image.

Zooming in when the interest points are well centred in the image, to get good resolution of the interest points.

It can be seen that active zoom control can not only solve the view visibility problem, but also improve the accuracy of feature extraction from the object of interest and the precision of the whole robot visual servoing system ^[2]. This has been widely used in the computer vision field ^{[10, 11]}. For example, Kumar et al. constructed a stereo vision system from two pan-tilt-zoom (PTZ) cameras and localized a moving target precisely in a complex and large environment ^[11]. If we combine the active zooming idea with the robot visual servoing method, we can solve the view visibility problem in robot visual servoing and improve the accuracy of the whole visual servoing system. However, it is difficult to apply this active vision technique directly to traditional visual servoing approaches because they are based on a teaching-by-showing technique. This means that we must first store the desired image captured in the desired position and then control the robot, starting from a position other than the desired one, until the current image coincides with the desired image. In other words, traditional visual servoing methods are “camera-dependent”. To solve this “camera-dependency” problem, one good solution is to employ the invariant visual servoing (or intrinsic-free visual servoing) method proposed in [12]. The key idea of invariant visual servoing is to construct a system error function using a projective invariant property. The error function is invariant to changes in the camera intrinsic parameters and depends only on the relative position and pose between camera and object. An image Jacobian matrix in the error function describes the relationship between the camera velocity and the image feature error in the invariant space. This invariant visual servoing approach provides a good methodology for controlling the robot with a zooming camera, but it cannot be used on a planar object. In addition, it doesn't consider the focal length control problem. This means that we can control the robot using a focal length different from the one used in the learning stage, but we can't control or change the focal length during the visual servoing. In order to deal with this problem, a zooming visual servoing method is proposed in [13]. This method allows us to control the focal length during the visual servoing by using a focal length control strategy and then to recover the focal length value corresponding to the desired image at the end of the visual servoing. In addition, this method can also solve the planarity problem of the method proposed in [12]. However, the methods proposed in [12] and [13] do not take the depth estimation problem into account: they estimate the object depth at the very beginning of the visual servoing and then keep it constant throughout. Since the image Jacobian matrix depends on the depth parameter, which is time-varying, a constant depth estimate can only locally guarantee the convergence of the method. In order to further improve the invariant visual servoing methods proposed in [12] and [13], we propose a nonlinear depth adaptive estimation algorithm in the invariant space using Lyapunov stability theory. Some depth estimation algorithms using Lyapunov stability theory have been proposed in past years ^{[14, 15]}. These estimation methods work well when the camera intrinsic parameters are fixed but cannot be used when the camera intrinsic parameters vary. In this paper we extend these depth estimation methods to zooming visual servoing, where the camera intrinsic parameters change during the visual servoing, and give a detailed derivation. Finally, we design a zooming visual servoing strategy for a manipulator robot with a zooming camera. Simulation results show that our approach has a better convergence performance of the image error.

The remaining part of this paper is organized as follows. The basic principle of the invariant visual servoing method is briefly introduced in Section 2. The details of our algorithm are described in Section 3, where we propose an efficient depth adaptive zooming visual servoing approach. Simulation results and analysis are given in Section 4. Section 5 concludes the paper by discussing the proposed approach and suggesting some improvements for further research.

2. Basic Principles of Invariant Visual Servoing

In this section, we briefly introduce the basic principles of the invariant visual servoing method, related space notations and a pin-hole imaging model.

2.1 Space Notations

The 3D points with homogeneous coordinates χ_i = (X_i, Y_i, Z_i, 1) (i ∊ {1,2,…n}) are projected in the absolute camera frame F* to the points m_i* ∊ P² (i ∊ {1,2,…n}). These points are projected in the current camera frame F to the points m_i ∊ P² (i ∊ {1,2,…n}). Then according to the projection model:

m_{i} = \frac{1}{Z_{i}} [R^{*} t^{*}] χ_{i} = (x_{i}, y_{i}, 1)

(1)

where Z_i is the positive depth and R* and t* are the rotation and translation, respectively, between F* and F.

2.2 Camera Model

The image point p = [u v 1]^T is given by a pinhole camera in the current frame F. The relationship between p and m can be written as:

p (Z, K) = K m (Z), \quad p \in I (Z, K)

(2)

where K is the camera intrinsic parameter matrix:

K = [\begin{matrix} f k_{u} & - f k_{u} \cot (θ) & u_{0} \\ 0 & f k_{v} / \sin (θ) & v_{0} \\ 0 & 0 & 1 \end{matrix}]

(3)

f is the focal length (in metres), k_u and k_v are the magnifications in the u and v directions respectively (in pixels/m), u₀ and v₀ are the coordinates of the principal point (in pixels) and θ is the angle between the u and v axes.
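The camera model of Equations (2) and (3) can be sketched in a few lines of plain Python. This is only an illustration: the function names are hypothetical, the point is assumed to be already expressed in the current camera frame (so the rigid transformation of Equation (1) is omitted), and the numerical values are borrowed from the parameter settings of Section 4.

```python
import math

def intrinsic_matrix(f, k_u, k_v, u0, v0, theta):
    """Build K as in Equation (3); theta is given in radians."""
    return [
        [f * k_u, -f * k_u * math.cos(theta) / math.sin(theta), u0],
        [0.0, f * k_v / math.sin(theta), v0],
        [0.0, 0.0, 1.0],
    ]

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def project(K, X, Y, Z):
    """Pixel coordinates p = K m of a camera-frame point with depth Z > 0."""
    m = [X / Z, Y / Z, 1.0]  # normalized coordinates (x, y, 1)
    return mat_vec(K, m)

# Assumed example: f = 15 mm, 20000 pixels/m, principal point (256, 256), theta = 90 degrees.
K = intrinsic_matrix(0.015, 20000.0, 20000.0, 256.0, 256.0, math.pi / 2)
p = project(K, 0.2, 0.4, 2.0)
```

With θ = 90° the skew term is numerically negligible, so the pixel coordinates reduce to u ≈ f k_u x + u₀ and v ≈ f k_v y + v₀.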

2.3 Basic Principle of the Invariant Visual Servoing

The main idea of the invariant visual servoing proposed in [12] is to construct an error function in the invariant space that is independent of changes in the camera intrinsic parameters and depends only on the relative pose of the camera/robot with respect to an observed object. The image Jacobian matrix in the error function describes the relationship between the camera velocity and the image feature error in the invariant space. The visual servoing law can then be designed from the error function. Therefore, the construction of the invariant space and of the image Jacobian matrix are the two important elements of invariant visual servoing.

2.3.1 Construction of the Invariant Space

The aim of the construction of the invariant space is to make the feature points invariant to the camera intrinsic parameters. The construction process proceeds as follows.

Step 1: to obtain the pixel coordinates of the target in the desired frame.

Step 2: to choose three non-collinear 3D points χ₁, χ₂, χ₃ among the n non-coplanar points χ_i (i ∊ {1,2,…n}), which are visible on the observed object.

Step 3: to obtain the corresponding three normalized projective points m₁, m₂, m₃ (unit: m) in the current frame and another three points m₁*, m₂*, m₃* (unit: m) in the desired frame respectively.

Step 4: to compute the corresponding image pixel coordinates p₁, p₂, p₃ and p₁*, p₂*, p₃* respectively according to the camera model defined in (2), as follows:

Q (Z, K) = K M (Z)

(4)

Q^{*} (Z^{*}, K^{*}) = K^{*} M^{*} (Z^{*})

(5)

Here M = [m₁ m₂ m₃] and Q = [p₁ p₂ p₃] collect the three reference points as columns (and similarly for M* and Q*). The non-singular matrices Q and Q* are called the invariant space transformation matrices.

The image feature points p and p* in the original image space can be converted into the invariant image feature points q and q* in the invariant space via the invariant space transformation matrices Q and Q* as follows:

q (Z) = Q^{- 1} p = M^{- 1} K^{- 1} K m = M^{- 1} m

(6)

q^{*} (Z^{*}) = Q^{* - 1} p^{*} = M^{* - 1} K^{* - 1} K^{*} m^{*} = M^{* - 1} m^{*}

(7)

From the equations above we know that both q and q* do not depend on the camera intrinsic parameters, but are only related to the camera position with respect to the observed object and the 3D structure itself. Now we can use the error in invariant space to control the camera's motion, even if the camera intrinsic parameters change during the visual servoing.
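The invariance stated by Equations (6) and (7) can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: the helper names, the three reference points and the test point are all made-up values, and a zero-skew intrinsic matrix is assumed. It computes q = Q⁻¹p for two different focal lengths and compares the results.

```python
def mat_mul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def mat_vec(A, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[r][k] * x[k] for k in range(3)) for r in range(3)]

def inv3(A):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular (reference points collinear?)")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]

def simple_K(f, k=20000.0, u0=256.0, v0=256.0):
    """Zero-skew intrinsic matrix (assumption: theta = 90 degrees)."""
    return [[f * k, 0.0, u0], [0.0, f * k, v0], [0.0, 0.0, 1.0]]

def invariant_coords(K, refs, m):
    """q = Q^{-1} p with Q = K M, where M has the reference points as columns."""
    M = [[refs[c][r] for c in range(3)] for r in range(3)]
    Q = mat_mul(K, M)
    p = mat_vec(K, m)
    return mat_vec(inv3(Q), p)

# Hypothetical normalized points (x, y, 1); the three references are non-collinear.
refs = [(0.1, 0.0, 1.0), (0.0, 0.2, 1.0), (-0.1, -0.1, 1.0)]
m = (0.05, -0.02, 1.0)
q_a = invariant_coords(simple_K(0.012), refs, m)  # f = 12 mm
q_b = invariant_coords(simple_K(0.015), refs, m)  # f = 15 mm
```

Up to rounding, `q_a` and `q_b` coincide even though the pixel coordinates differ. Because the third row of M is all ones, the components of q also sum to one.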

2.3.2 Construction of the Image Jacobian Matrix in the Invariant Space

Just like in the IBVS (image-based visual servoing) approach, the camera is now controlled in the invariant space. The derivative of the vector q is

\dot{q} = J_{q} v_{c}

(8)

J_q is the (3×6) Jacobian matrix in the invariant space and is similar to the image Jacobian matrix. Differentiating Equation (6), we get:

{\dot{q}}_{i} = \frac{d Q^{- 1}}{dt} p_{i} + Q^{- 1} {\dot{p}}_{i}

By plugging Equation (6) into the above equation, we can get:

{\dot{q}}_{i} = \frac{d Q^{- 1}}{dt} p_{i} + Q^{- 1} {\dot{p}}_{i} = \frac{d Q^{- 1}}{dt} Q q_{i} + Q^{- 1} {\dot{p}}_{i}

(9)

Since $\frac{d Q^{- 1}}{dt} Q = - Q^{- 1} \frac{d Q}{dt}$ , by plugging this into Equation (9) we could obtain:

\begin{array}{l} {\dot{q}}_{i} = Q^{- 1} ({\dot{p}}_{i} - \frac{d Q}{dt} q_{i}) = Q^{- 1} ({\dot{p}}_{i} - [{\dot{p}}_{1} \; {\dot{p}}_{2} \; {\dot{p}}_{3}] [\begin{matrix} q_{1 i} \\ q_{2 i} \\ q_{3 i} \end{matrix}]) \\ = Q^{- 1} ({\dot{p}}_{i} - q_{1 i} {\dot{p}}_{1} - q_{2 i} {\dot{p}}_{2} - q_{3 i} {\dot{p}}_{3}) \end{array}

(10)
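For completeness, the identity used to pass from Equation (9) to Equation (10) follows from differentiating the relation Q⁻¹Q = I with respect to time:

```latex
\frac{d}{dt} \left( Q^{-1} Q \right)
  = \frac{d Q^{-1}}{dt} Q + Q^{-1} \frac{d Q}{dt} = 0
\quad \Longrightarrow \quad
\frac{d Q^{-1}}{dt} Q = - Q^{-1} \frac{d Q}{dt}
```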

According to the IBVS approach, the derivative of the current image point is ṗ_i = J_i v_c. By plugging this into Equation (10), we can obtain:

J_{q i} = Q^{- 1} (J_{i} - q_{1 i} J_{1} - q_{2 i} J_{2} - q_{3 i} J_{3}), i = 1, ... n

(11)

where J_i is the standard image Jacobian matrix.

J_{i} = [\begin{matrix} - \frac{f}{Z_{i}} & 0 & \frac{x_{i}}{Z_{i}} & \frac{x_{i} y_{i}}{f} & - f - \frac{x_{i}^{2}}{f} & y_{i} \\ 0 & - \frac{f}{Z_{i}} & \frac{y_{i}}{Z_{i}} & f + \frac{y_{i}^{2}}{f} & - \frac{x_{i} y_{i}}{f} & - x_{i} \end{matrix}], \quad i = 1, 2, ... n

(12)
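As an illustration of how a point-feature image Jacobian maps a camera velocity to an image velocity, the following sketch uses the standard point-feature form found in the IBVS literature, with the row and column ordering of Equation (12). The function names and numerical values are hypothetical, and x, y and f are assumed to be expressed in the same unit (pixels here).

```python
def image_jacobian(x, y, Z, f):
    """Standard 2x6 point-feature image Jacobian (textbook IBVS form, assumed here).

    x, y: image coordinates relative to the principal point, same unit as f.
    Z: depth of the point in the camera frame (must be positive).
    """
    return [
        [-f / Z, 0.0, x / Z, x * y / f, -(f + x * x / f), y],
        [0.0, -f / Z, y / Z, f + y * y / f, -x * y / f, -x],
    ]

def image_velocity(J, v_c):
    """p_dot = J v_c for v_c = (vx, vy, vz, wx, wy, wz)."""
    return [sum(j * v for j, v in zip(row, v_c)) for row in J]

# A point at the image centre under a pure sideways translation of the camera.
J_centre = image_jacobian(0.0, 0.0, 2.0, 300.0)
p_dot_side = image_velocity(J_centre, (0.1, 0.0, 0.0, 0.0, 0.0, 0.0))

# An off-centre point under a pure translation along the optical axis.
J_off = image_jacobian(30.0, -60.0, 2.0, 300.0)
p_dot_fwd = image_velocity(J_off, (0.0, 0.0, 0.1, 0.0, 0.0, 0.0))
```

The first three columns scale with 1/Z, which is why the depth enters the control law and has to be estimated.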

J_qi can be written just as follows:

J_{q i} = [φ_{i} (Z_{1}, Z_{2}, Z_{3}, Z_{i}) \;\; ψ_{i}] [\begin{matrix} K & 0 \\ 0 & F (K) \end{matrix}], \quad i = 1, 2, ... n

where

φ_{i} = Q^{- 1} [\begin{matrix} \sum_{j = 1}^{3} \frac{q_{j i}}{z_{i}} - \frac{1}{z_{i}} & 0 & \frac{u_{i}}{z_{i}} - \sum_{j = 1}^{3} \frac{u_{j i}}{z_{i}} \\ 0 & \sum_{j = 1}^{3} \frac{q_{j i}}{z_{i}} - \frac{1}{z_{i}} & \frac{v_{i}}{z_{i}} - \sum_{j = 1}^{3} \frac{v_{j i}}{z_{i}} \\ 0 & 0 & 0 \end{matrix}]

ψ_{i} = Q^{- 1} [\begin{matrix} u_{i} v_{i} - \sum_{j = 1}^{3} u_{i} v_{i} q_{j i} & \sum_{j = 1}^{3} u_{i}^{2} q_{j i} - u_{i}^{2} & v_{i} - \sum_{j = 1}^{3} v_{i} q_{j i} \\ v_{i}^{2} - \sum_{j = 1}^{3} v_{i}^{2} q_{j i} & \sum_{j = 1}^{3} u_{i} v_{i} q_{j i} - u_{i} v_{i} & \sum_{j = 1}^{3} u_{i} q_{j i} - u_{i} \\ 0 & 0 & 0 \end{matrix}]

F (K) = [\begin{matrix} \frac{\sin θ}{f k_{v}} & \frac{- \cos θ}{f k_{u}} & 0 \\ 0 & \frac{1}{f k_{u}} & 0 \\ 0 & 0 & \frac{1}{k_{u} k_{v}} \end{matrix}]

2.3.3 Definition of the Error Function in the Invariant Space

The error function in the invariant space can be defined as follows:

e = {\hat{J}}_{q}^{+} (s - s^{*})

(13)

where s = (q₁, q₂,…q_n) is the coordinate vector in the invariant space of the n points of the observed object at the current position and s* = (q₁*, q₂*,…q_n*) is the coordinate vector in the invariant space of these points at the desired position. J_q is the (3n×6) image Jacobian matrix in the invariant space and Ĵ⁺_q is the pseudo-inverse of the estimated image Jacobian matrix. If the visual servoing law is:

u = - λ e

(14)

where λ is a positive scalar factor, the robot is driven back to the desired position as s converges to s*, meaning that the end-effector of the robot arrives at the desired position.
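The effect of the gain λ in the law (14) can be illustrated with a deliberately idealized sketch. It assumes a perfect Jacobian estimate, so that the rate of change of the error equals the commanded u, and it integrates with an explicit Euler step; the function name, time step and initial error are hypothetical. Under these assumptions the error norm decays approximately as exp(-λt).

```python
import math

def simulate_error_decay(e0, lam, dt, steps):
    """Integrate e_dot = u with u = -lam * e and return the error-norm history."""
    e = list(e0)
    history = [math.sqrt(sum(x * x for x in e))]
    for _ in range(steps):
        u = [-lam * x for x in e]                 # control law (14)
        e = [x + dt * ux for x, ux in zip(e, u)]  # idealized closed loop, Euler step
        history.append(math.sqrt(sum(x * x for x in e)))
    return history

# Gain 0.3 as in the simulations of Section 4; 10 s simulated with a 10 ms step.
history = simulate_error_decay([0.6, -0.8, 0.0], lam=0.3, dt=0.01, steps=1000)
```

In the real system the estimated Jacobian differs from the true one, so the decay is only approximately exponential; this is what motivates the depth estimation in Section 3.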

3. Basic Principle of the Depth Adaptive Zooming Visual Servoing

The invariant visual servoing approach described above provides a good methodology to control the robot with a zooming camera. In this section, we bring zoom control and depth adaptive estimation into the invariant visual servoing method and propose a depth adaptive zooming visual servoing method. The purpose of the proposed method is to keep the observed object in the field of view of the camera during the visual servoing by changing the camera focal length, and to adaptively estimate the depth of the image feature points on the object during the visual servoing.

3.1 The Basic Structure of the Depth Adaptive Zooming Visual Servoing

The overall structure of the proposed robot depth adaptive zooming visual servoing method is depicted in Fig. 1. The desired input p* denotes the image feature points on the observed object with respect to the camera in the desired position. The transformed points s* are the corresponding features in the invariant image space, which are independent of the camera intrinsic parameters. Similarly, s are the features in the invariant image space of the same image feature points in the current frame. The visual servoing law $u = - λ e = - λ {\hat{J}}_{q}^{+} (s - s^{*})$ can be calculated from the error in the invariant image space. This visual servoing law maps the error in the invariant space to the robot motion space and drives the robot to the goal position. The robot visual positioning task is accomplished once the image features in the invariant space coincide with those at the desired position ^[16]. The main aim of the zooming control (see details in subsection 3.2) is to ensure that the object of interest lies in the field of view of the camera during the visual servoing. The aim of the depth estimation is to adaptively estimate the depth of the image feature points on the object during the visual servoing (see details in subsection 3.3).

Figure 1.

Structure diagram of depth adaptive zooming visual servoing.

3.2 Zooming Control

The purpose of zooming control can be described as follows:

During the robot visual positioning process, zooming out when the object moves away from the centre of the image (or even out of the field of view) and zooming in to produce an enlarged image of the object of interest can achieve high precision positioning ^[17–19]. Note that the zooming control can't change the resolution of the image captured from the camera, but can change the local resolution of the object of interest.

First, we define δ as the distance from the feature point nearest to the image border to that border: δ = min(u_i, v_i, u_max - u_i, v_max - v_i), where u_max and v_max are the image dimensions in the u and v directions respectively. We can control the zoom in order to keep the point nearest to the image borders at a given distance. The zooming areas in the image plane are shown in Fig. 2.

Figure 2.

Zooming areas.

If the object is in region 1, i.e., δ > δ_max, we can enlarge the focal length in order to improve the image resolution of the object of interest. Zooming in means

{\frac{\partial δ}{\partial t} |}_{δ > δ_{\max}} > 0 .

If the object is in the dark grey area, i.e., δ is in region 2, no zooming is applied in this area, which keeps all the feature points stably in the view of the camera. That is

{\frac{\partial δ}{\partial t} |}_{δ_{\min} < δ < δ_{\max}} = 0 .

If the object is in the light grey area, i.e., δ is in region 3, the object is about to escape from the field of view of the camera, so we should decrease the zoom in this area to obtain a sufficient field of view. The corresponding mathematical expression can be written as

{\frac{\partial δ}{\partial t} |}_{δ < δ_{\min}} < 0 .

According to the different regions where the object is, we can adopt the corresponding zooming control rules to improve the image resolution on the premise of guaranteeing all the features on the object stay in the field of view of the camera.
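The three-region rule can be sketched as a small decision function acting on the focal length, following the textual description (enlarge the focal length in region 1, hold it in region 2, reduce it in region 3). The function names, the constant gain, the thresholds and the 512×512 image size are assumptions made for illustration only.

```python
def border_distance(points, u_max, v_max):
    """delta: smallest distance from any feature point (u, v) to the image border."""
    return min(min(u, v, u_max - u, v_max - v) for u, v in points)

def focal_length_rate(delta, delta_min, delta_max, gain):
    """Signed rate of change of the focal length for the three zooming regions."""
    if delta > delta_max:
        return gain    # region 1: features well inside the image, zoom in
    if delta < delta_min:
        return -gain   # region 3: a feature is close to (or past) the border, zoom out
    return 0.0         # region 2: dead zone, keep the current focal length

# Hypothetical pixel coordinates in a 512x512 image.
points = [(100.0, 200.0), (400.0, 480.0)]
delta = border_distance(points, 512.0, 512.0)
rate = focal_length_rate(delta, delta_min=40.0, delta_max=120.0, gain=0.001)
```

A point that has already left the image yields a negative δ, so the same rule keeps zooming out until it re-enters. The dead zone between δ_min and δ_max avoids chattering between zooming in and zooming out.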

3.3 Depth Adaptive Estimation

As described in subsection 2.3, the visual servoing law is u = -λe = -λĴ⁺_q (s – s*), where J_q is an image Jacobian matrix that depends on the target depth, which is a time-varying parameter. Depth estimation is a challenging problem in computer vision and robotics. Many studies have presented methods to estimate the unknown depth during robot motion in a 3D environment ^{[14, 15]}. However, these depth estimation methods are only suitable for fixed camera intrinsic parameters. In this paper, we extend these depth estimation methods to zooming visual servoing, where the camera intrinsic parameters change during the visual servoing. We propose a depth adaptive estimation method in the invariant space. The basic steps of this method can be described as follows:

Step 1: to transform the image features in the original image space into the ones in the invariant space by using (6) and (7).

Step 2: to choose a state function not only related to the feature error, but also to the depth parameter.

Step 3: to design a nonlinear depth adaptation scheme according to Lyapunov stability theory and prove its validity.

Next we describe this depth adaptive estimation method in the invariant space in detail. We define the function of the whole state $(\tilde{s}, \tilde{ε})$ as follows:

V (\tilde{s}, \tilde{ε}) = \frac{1}{2} {\tilde{s}}^{T} \tilde{s} + \frac{1}{2 γ_{1}} {\tilde{ε}}^{T} \tilde{ε}

(15)

where $\tilde{s} = s - s^{*}$ and the definitions of s and s* are given in subsection 2.3. The error vector $\tilde{ε}$ is defined as $\tilde{ε} = {[{\tilde{ε}}_{1}, ..., {\tilde{ε}}_{n}]}^{T}$, where ${\tilde{ε}}_{i} = ε_{i} - {\hat{ε}}_{i}$, $i = 1, ... n$; $ε_{i} = 1 / Z_{i}$ is the reciprocal of the target depth and ${\hat{ε}}_{i}$ is the estimated value of $ε_{i}$.

In order to ensure system stability, the system must satisfy Lyapunov stability theory.

Observing the matrix J_q, the first three columns of the matrix are proportional to the reciprocal of the depth.

By expanding Equation (15), we can observe:

V (\tilde{s}, \tilde{ε}) = \frac{1}{2} \sum_{i = 1}^{n} ({\tilde{q}}_{1 i}^{2} + {\tilde{q}}_{2 i}^{2} + {\tilde{q}}_{3 i}^{2} + \frac{1}{γ_{1}} {\tilde{ε}}_{i}^{2})

(16)

The derivative of Equation (16) is:

\begin{array}{l} \dot{V} = \sum_{i = 1}^{n} ({\tilde{q}}_{1 i} {\dot{\tilde{q}}}_{1 i} + {\tilde{q}}_{2 i} {\dot{\tilde{q}}}_{2 i} + {\tilde{q}}_{3 i} {\dot{\tilde{q}}}_{3 i} + \frac{1}{γ_{1}} {\tilde{ε}}_{i} {\dot{\tilde{ε}}}_{i}) \\ = \sum_{i = 1}^{n} (({\tilde{q}}_{1 i} (Q^{- 1} φ_{i} K) |_{r 1} V_{T} + (Q^{- 1} ψ_{i} F (K)) |_{r 1} V_{R} \\ + {\tilde{q}}_{2 i} ((Q^{- 1} φ_{i} K) |_{r2} V_{T} + (Q^{- 1} ψ_{i} F (K)) |_{r 2} V_{R}) \\ + {\tilde{q}}_{3 i} ((Q^{- 1} φ_{i} K) |_{r3} V_{T} + (Q^{- 1} ψ_{i} F (K)) |_{r3} V_{R}) \\ + \frac{1}{γ_{1}} ({\tilde{ε}}_{i} {\dot{\hat{ε}}}_{i} - ε_{i} {\dot{\hat{ε}}}_{i} - {\tilde{ε}}_{i} {\dot{ε}}_{i})) \\ = \sum_{i = 1}^{n} ((ε_{i} ({\tilde{q}}_{1 i} ((Q^{- 1} C_{i} K)) |_{r 1} + {\tilde{q}}_{2 i} ((Q^{- 1} C_{i} K)) |_{r2} \\ + {\tilde{q}}_{3 i} ((Q^{- 1} C_{i} K)) |_{r3}) V_{T} - \frac{{\hat{ε}}_{i}}{γ_{1}}) \\ + (Q^{- 1} ψ_{i} F (K)) |_{r1} V_{R} +(Q^{- 1} ψ_{i} F (K)) |_{r2} V_{R} \\ + (Q^{- 1} ψ_{i} F (K)) |_{r3} V_{R} + \frac{1}{γ_{1}} ({\hat{ε}}_{i} {\dot{\hat{ε}}}_{i} - {\tilde{ε}}_{i} {\dot{ε}}_{i})) \end{array}

If we use the depth adjusting law:

{\dot{\hat{ε}}}_{i} = γ_{1} {(Q^{- 1} (C_{i} K) v_{T})}^{T} {\tilde{q}}_{i} - γ_{2} {(Q^{- 1} (C_{i} K) v_{T})}^{T} e_{i}

(17)

where γ₁ > 0, γ₂ > 0, i = 1,2,…,n; e_i = d̂_i – d_i* is called the prediction error and d_i = Q⁻¹(C_i K) v_T ε_i, i = 1,2,…n.

${\dot{ε}}_{i}$ is the derivative of the reciprocal of the depth parameters and $\tilde{ε} = {[{\tilde{ε}}_{1}, ..., {\tilde{ε}}_{n}]}^{T}$ is the error of the reciprocal of the depth parameters.
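One evaluation of the depth adjusting law (17) for a single feature point can be sketched as follows. The vector `a_i` is shorthand introduced here for Q⁻¹(C_i K) v_T and is assumed to be computed elsewhere; the function names, the explicit Euler step and all numerical values are hypothetical and serve only to show the structure of the update.

```python
def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))

def depth_update_rate(a_i, q_tilde_i, e_i, gamma1, gamma2):
    """Right-hand side of (17): gamma1 * a_i^T q_tilde_i - gamma2 * a_i^T e_i."""
    return gamma1 * dot(a_i, q_tilde_i) - gamma2 * dot(a_i, e_i)

def euler_step(eps_hat, rate, dt):
    """Advance the inverse-depth estimate by one explicit Euler step (assumption)."""
    return eps_hat + dt * rate

# Made-up values: a_i, the invariant-space error of point i and its prediction error.
a_i = (1.0, 0.0, 2.0)
q_tilde_i = (0.1, 0.2, 0.3)
e_i = (0.01, 0.0, 0.02)
rate = depth_update_rate(a_i, q_tilde_i, e_i, gamma1=100.0, gamma2=1000.0)
eps_hat_next = euler_step(0.5, rate, dt=0.01)  # 0.5 corresponds to a 2 m depth estimate
```

The first term is driven by the feature error in the invariant space and the second by the prediction error, so the two gains set the relative weight of the two sources of information.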

Substituting the adjusting law (17), the derivative V̇ can be simplified as follows:

\dot{V} = \sum_{i = 1}^{n} ({\tilde{q}}_{i}^{T} {\hat{J}}_{q i} V_{C} - \frac{1}{γ_{1}} {\tilde{ε}}_{i} {\dot{ε}}_{i} - \frac{γ_{2}}{γ_{1}} {‖ Q^{- 1} C_{i} K V_{T} ‖}^{2} {\tilde{ε}}_{i}^{2})

(18)

where:

C_{i} = [\begin{matrix} \sum_{j = 1}^{3} q_{j i} - 1 & 0 & u_{i} - \sum_{j = 1}^{3} u_{j i} \\ 0 & \sum_{j = 1}^{3} q_{j i} - 1 & v_{i} - \sum_{j = 1}^{3} v_{j i} \\ 0 & 0 & 0 \end{matrix}]

The camera velocity v_c, relative to the target, can be described as v_c = [v_T^T v_R^T]^T, where v_T = [v₁ v₂ v₃]^T is the translation velocity vector of the camera and v_R = [v₄ v₅ v₆]^T is the rotation velocity vector of the camera.

If we define the vector $\hat{\tilde{y}} = {\hat{J}}_{q} ^{T} \tilde{s} \in ℜ^{6}$ , Equation (18) can be reduced to:

\begin{array}{l} \dot{V} = - λ {\tilde{s}}^{T} {\hat{J}}_{q} {\hat{J}}_{q}^{+} \tilde{s} - \frac{1}{γ_{1}} {\tilde{ε}}^{T} \dot{ε} - \frac{γ_{2}}{γ_{1}} {‖ {\tilde{ε}}_{γ} ‖}^{2} \\ = - λ {\hat{\tilde{y}}}^{T} {({\hat{J}}_{q}^{T} {\hat{J}}_{q})}^{- 1} \hat{\tilde{y}} - \frac{1}{γ_{1}} {\tilde{ε}}^{T} \dot{ε} - \frac{γ_{2}}{γ_{1}} {‖ {\tilde{ε}}_{γ} ‖}^{2} \\ \leq - \frac{λ}{{\hat{σ}}_{M}^{2}} {‖ \hat{\tilde{y}} ‖}^{2} + \frac{1}{γ_{1}} ‖ \tilde{ε} ‖ ‖ \dot{ε} ‖ - \frac{γ_{2}}{γ_{1}} {‖ {\tilde{ε}}_{γ} ‖}^{2} \end{array}

where ${\hat{σ}}_{M} = \max ({\hat{J}}_{q})$ , ${\hat{σ}}_{m} = \min ({\hat{J}}_{q})$ and ${\tilde{ε}}_{γ} = {[‖ Q^{- 1} C_{1} K V_{T} ‖ {\tilde{ε}}_{1}, ..., ‖ Q^{- 1} C_{n} K V_{T} ‖ {\tilde{ε}}_{n}]}^{T}$ .

We know that V ≥ 0. If V̇ ≤ 0 in a region then the system with the adaptive estimation method is locally asymptotically stable. If:

\frac{1}{γ_{1}} ‖ \tilde{ε} ‖ ξ \leq \frac{λ}{{\hat{σ}}_{M} ​^{2}} {‖ \hat{\tilde{y}} ‖}^{2} + \frac{γ_{2}}{γ_{1}} {‖ {\tilde{ε}}_{γ} ‖}^{2}

(19)

where $ξ$ is an upper bound on $‖ \dot{ε} ‖$ and $‖ {\tilde{ε}}_{γ} ‖ \geq δ ‖ \tilde{ε} ‖, δ = \min (C_{i} V_{T}) > 0$ , then V̇ is negative in the region:

ℜ = {\tilde{s} \in Ω, \tilde{ε} \in R^{n} : \frac{1}{γ_{1}} ‖ \tilde{ε} ‖ ξ - \frac{γ_{2}}{γ_{1}} δ^{2} ‖ \tilde{ε} ‖ \leq \frac{k {\hat{σ}}_{m}}{{\hat{σ}}_{M} ​^{2}} {‖ \tilde{s} ‖}^{2}}

meaning that the state error s˜ converges to the residual set $B (\tilde{s}, \frac{ξ {\tilde{σ}}_{M}}{2 δ {\tilde{σ}}_{m} \sqrt{{λγ}_{1} γ_{2}}})$ .

Therefore, for an unknown object depth, if the visual servoing law is $u = - λ e = - λ {\hat{J}}_{q}^{+} (s - s^{*})$ , the adaptive adjustment mechanism can guarantee that the states of the system are uniformly bounded.

4. Simulation Results

In this section we build a simulation model of robot depth adaptive zooming visual servoing in a MATLAB environment, based on Peter Corke's Robotics Toolbox ^[20], and conduct three kinds of robot visual positioning simulation experiments to validate the performance of the proposed depth adaptive zooming visual servoing.

The first simulation experiment aims to demonstrate the soundness of the whole structure with zooming control. The second simulation adds the depth adaptive method and is compared with the first simulation in order to validate the visual servoing performance of the proposed approach. The third simulation aims to demonstrate the validity of the proposed approach from a different initial position.

4.1 Related Parameter Settings

The camera is in the eye-in-hand configuration. The related parameters used in the simulation experiments are as follows:

the number of the feature points n=12

coordinates of the principal point u₀=256, v₀=256 (in pixels)

magnifications in the u and v directions k_u=20000, k_v=20000 (in pixels/m)

angle between the u and v axes θ=90°

initial focal length f₀=0.015 (in meters)

desired focal length f_d=0.012 (in meters)

coordinates of the 12 feature points in 3D space, at different depths: points =

[0.5 -0.5 2.01; 0.9 0.2 2.04; -0.2 -0.2 2;

0.9 -0.5 2.01; 0.9 -0.9 2.01; 0.5 -0.9 2.01;

0.7 -0.7 2.01; 0.5 -0.2 2.04; 0.7 0 2.04;

0.2 0.2 2; 0.2 -0.2 2; -0.2 0.2 2],

the initial camera pose

\begin{array}{l} Tcamera_0 = [0 \;\; 0.0001 \;\; 1 \;\; 0.5; \\ -0.0001 \;\; 1 \;\; -0.0001 \;\; -0.3; \\ -1 \;\; -0.0001 \;\; 0 \;\; 0.6; \\ 0 \;\; 0 \;\; 0 \;\; 1] \end{array}

Suppose the robot moves 700mm in the X-minus direction, 300mm in the Y direction and 100mm in the Z-minus direction, and rotates by −0.3rad about the Z direction. Then we get the goal position:

\begin{array}{l} Tcamera_e = [0 \;\; 0.0001 \;\; 1 \;\; 0.4; \\ -0.2956 \;\; 0.9553 \;\; -0.0001 \;\; 0; \\ -0.9553 \;\; -0.2956 \;\; 0 \;\; 0.1; \\ 0 \;\; 0 \;\; 0 \;\; 1] \end{array}

The visual servoing law used is $u = - λ e = - λ {\hat{J}}_{q}^{+} (s - s^{*})$ , where the control gain is λ=0.3 and the image Jacobian matrix J_q can be computed by (11).

Depth estimation is not used in the first simulation experiment. In the second simulation it is enabled, with the depth adjusting gains γ₁ = 10², γ₂ = 10³.

4.2 Results and Discussion

4.2.1 Simulation experiment 1: robot zooming visual servoing without depth estimation

In this simulation experiment, the 12 image feature points on the target object are obtained at the desired position and at the initial position. Then the robot moves from the initial position to the desired position using zooming visual servoing without depth estimation. The motion trajectory of the 12 feature points in the image plane is shown in Fig. 3. At the very beginning of the servoing, the focal length is 12mm and four points are outside the field of view. The camera zooms out to ensure that all the points of interest on the object are in view at the beginning of the servoing. Late in the servoing, zooming in improves the image resolution and the final focal length is 17.3mm (Fig. 5 (b)). The simulation results show that the control method drives the error of the characteristic vectors of the target object in the invariant space to zero (Fig. 5 (a)). As a consequence, the positioning errors in the X, Y and Z directions are Xerror=1.66mm, Yerror=0.59mm and Zerror=6.44mm, respectively. The rotation error converges to 0 (Fig. 4 (a) and Fig. 4 (b)), so the robot can reach the desired position.

Figure 3.

The image points trajectory (blue points) without depth estimation.

Figure 4.

Visual positioning error without depth estimation; (a) Translation error. (b) Rotation error around the Z axis.

Figure 5.

Error in the invariant space and change of focal length without depth estimation; (a) The total error in the invariant space. (b) Focal length changes in visual servoing.

4.2.2 Simulation experiment 2: robot zooming visual servoing with depth adaptive estimation

In this section, we conducted another robot 4DOF visual positioning simulation experiment using zooming visual servoing with a depth adaptive estimation method. The experiment parameters used in this experiment are the same as the ones used in the first simulation experiment. The visual positioning results using zooming visual servoing with a depth adaptive estimation are shown in Fig. 6 – Fig. 8. The motion trajectory of the 12 feature points in the image plane is shown in Fig. 6.

Figure 6.

The image points trajectory (blue points) with depth estimation

Figure 7.

Visual positioning error with depth estimation

Figure 8.

Error in the invariant space and change of focal length with depth estimation; (a) The total error in the invariant space; (b) Focal length changes in visual servoing.

The errors of the characteristic vectors of the target object in the invariant space converge to zero (Fig. 8 (a)) and the focal length finally converges to 17.1mm (Fig. 8 (b)). The errors along the X, Y and Z axes are Xerror=0.29mm, Yerror=0.35mm and Zerror=2.59mm, and the rotation error converges to 0 (Fig. 7 (a) and Fig. 7 (b) respectively). The simulation results show that the positioning error along the Z axis is smaller when depth estimation is considered in the zooming visual servoing. The whole depth convergence process is shown in Fig. 10. The convergence performance of the two zooming visual servoing methods is compared in Fig. 9. It can be seen from Fig. 9 that the zooming visual servoing with the depth adaptive estimation method has a better convergence performance.

Figure 9.

Comparison of the convergence of the two zooming visual servoing methods

Figure 10.

Depth convergence curve

4.2.3 Simulation experiment 3: robot zooming visual servoing with depth adaptive estimation from a different initial position

To further demonstrate the validity of the proposed approach, we conduct another robot 4DOF visual positioning simulation experiment with a different initial camera pose.

Suppose the robot moves 300mm in the X-minus direction, 200mm in the Y direction and −200mm in the Z-minus direction.

Then we get the new initial camera pose:

Tcamera_1 = \begin{bmatrix}
0 & 0.0001 & 1 & 0.7 \\
-0.0001 & 1 & -0.0001 & 0.5 \\
-1 & -0.0001 & 0 & 0.3 \\
0 & 0 & 0 & 1
\end{bmatrix}
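As a quick sanity check on the pose above, the following minimal sketch splits the 4x4 homogeneous transform into its rotation block and translation column and verifies that the rotation block is orthonormal with determinant close to +1. The function names are illustrative and not taken from the paper; only the matrix entries come from the text.

```python
# Rows of the 4x4 homogeneous transform Tcamera_1, copied from the text.
T_CAMERA_1 = [
    [0.0, 0.0001, 1.0, 0.7],
    [-0.0001, 1.0, -0.0001, 0.5],
    [-1.0, -0.0001, 0.0, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]


def split_pose(T):
    """Return (R, t): the 3x3 rotation block and the translation column."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    return R, t


def orthonormality_error(R):
    """Largest absolute entry of R^T R - I (0 for an exact rotation)."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            dot = sum(R[k][i] * R[k][j] for k in range(3))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst


def determinant3(R):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
        - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
        + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0])
    )


R, t = split_pose(T_CAMERA_1)
print("translation:", t)
print("orthonormality error:", orthonormality_error(R))
print("determinant:", determinant3(R))
```

The small 0.0001 entries mean the rotation block is only approximately orthonormal, so the check uses a tolerance rather than exact equality.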

Other experimental parameters used in this experiment are the same as the ones used in the first two simulation experiments.

The visual positioning results using zooming visual servoing with depth adaptive estimation from the new initial camera position are shown in Fig. 11 – Fig. 13.

Figure 11.

Visual positioning error from a different initial camera position

Figure 12.

Error in the invariant space and change of focal length from a different initial camera position; (a) The total error in the invariant space; (b) Focal length changes in visual servoing.

Figure 13.

Depth convergence curve from a different initial camera position

The translation errors along the X, Y and Z axes and the rotation error around the Z axis are shown in Fig. 11 (a) and Fig. 11 (b) respectively. The errors of the characteristic vectors in the invariant space of the target object converge to zero (Fig. 12 (a)), and the focal length finally converges to 23.0mm when starting from Tcamera_1, compared with 17.1mm when starting from Tcamera_0 (Fig. 12 (b)). The whole depth convergence process is shown in Fig. 13. The simulation results show that our approach has good visual positioning performance from different initial camera positions.
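To illustrate what the two final focal lengths mean for the field of view, the sketch below applies the standard pinhole relation FOV = 2·atan(w / (2f)). The sensor width w is not given in this section, so the value of 6.4mm used here is purely a hypothetical placeholder, and the resulting angles should not be read as the paper's results.

```python
import math


def horizontal_fov_deg(focal_length_mm, sensor_width_mm):
    """Horizontal field of view (degrees) of an ideal pinhole camera."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


# Hypothetical sensor width; NOT taken from the paper.
SENSOR_WIDTH_MM = 6.4

# Final focal lengths reported in the text for the two initial poses.
FINAL_FOCAL_LENGTHS_MM = {"Tcamera_0": 17.1, "Tcamera_1": 23.0}

for pose, f_mm in FINAL_FOCAL_LENGTHS_MM.items():
    fov = horizontal_fov_deg(f_mm, SENSOR_WIDTH_MM)
    print(f"{pose}: f = {f_mm} mm -> horizontal FOV = {fov:.2f} deg")
```

Whatever the actual sensor width, the longer final focal length corresponds to a narrower field of view and therefore a higher local resolution of the target.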

The simulation results show that the zoom control method can be used in the invariant visual servoing system. The purpose of the zooming control is to extend the camera's field of view when the target is moving out of range and to improve the local resolution of the object of interest when the target is within the camera view. In addition, the proposed depth adaptive zooming visual servoing method can guarantee closed-loop stabilization to achieve the 4-DOF robot visual positioning task. It can be seen from Table 1 that, in contrast to simulation 1, the second simulation, which uses depth adaptive estimation, achieves higher visual servoing precision, and the whole state of the robot visual servoing system remains bounded, although the depth converges to a stable value rather than the true value.

Table 1.

The translation convergence errors along the X, Y, Z axes and the run times in the two simulation experiments.

	X error (mm)	Y error (mm)	Z error (mm)	Run time (s) (in MATLAB)
Simulation 1	1.66	0.59	6.44	22.84
Simulation 2	0.29	0.35	2.59	21.82

Table 2.

The translation convergence errors along the X, Y, Z axes for the two different initial camera positions.

Error(mm)	X	Y	Z
Simulation 2	0.29	0.35	2.59
Simulation 3	0.49	0.31	3.61
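The entries of Table 1 and Table 2 can be condensed into a single figure per experiment. The sketch below computes the Euclidean norm of each translation error vector and the per-axis relative reduction from simulation 1 to simulation 2. The error values are taken from the tables; the summary metrics and function names are our own illustrative choice and are not used in the paper.

```python
import math

# Translation convergence errors (mm) along X, Y, Z, from Table 1 and Table 2.
ERRORS_MM = {
    "Simulation 1": (1.66, 0.59, 6.44),
    "Simulation 2": (0.29, 0.35, 2.59),
    "Simulation 3": (0.49, 0.31, 3.61),
}


def error_norm(errors):
    """Euclidean norm of an (X, Y, Z) error vector."""
    return math.sqrt(sum(e * e for e in errors))


def relative_reduction(baseline, improved):
    """Per-axis fractional reduction of `improved` with respect to `baseline`."""
    return tuple((b - i) / b for b, i in zip(baseline, improved))


for name, errors in ERRORS_MM.items():
    print(f"{name}: |error| = {error_norm(errors):.2f} mm")

reduction = relative_reduction(ERRORS_MM["Simulation 1"], ERRORS_MM["Simulation 2"])
for axis, r in zip("XYZ", reduction):
    print(f"{axis} error reduced by {100.0 * r:.0f}% with depth adaptive estimation")
```

With these values, the norm of the translation error drops from about 6.68mm without depth estimation to about 2.63mm with it, and the Z axis accounts for most of the remaining error in all three experiments.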

5. Conclusion

In this paper, a depth adaptive zooming visual servoing method is proposed to solve the view visibility problem. Visual servoing is performed in the invariant space with zooming control, which enlarges the field of view when needed and can improve the image resolution of the target. To overcome the defects of the constant depth method, a nonlinear depth adaptive estimation method for robot zooming visual servoing is proposed. Because it is robust in the invariant space, this adaptive method can improve the accuracy of the servoing system while the camera zooms. The adaptive adjustment mechanism guarantees that the states of the system are uniformly bounded. The simulation results of robot 4DOF visual positioning show that the proposed method has better convergence performance for the image error.

No specific feature extraction algorithm is considered in the simulation experiments; instead, some simulated points are defined as the image feature points. In practice, the performance of feature point extraction will affect the final robot visual servoing. In some cases, the whole robot visual servoing system could become unstable if there is a large error in extracting these points. Changing lighting conditions would also affect the performance of feature point extraction. In addition, for a robot/camera system with an eye-in-hand configuration, the image captured by the camera will undergo some deformations because of the changes in robot pose and camera focal length during the robot zooming visual servoing. These deformations can be locally well approximated by affine transformations of the image plane. In the future, we will apply the affine invariant feature extraction algorithm ^{[21, 22]} to extract affine invariant feature points on the object and extend the proposed robot depth adaptive zooming visual servoing method to real natural scenes.

6. Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grants No. 61203345 and No. 61174101, the Shaanxi Provincial Natural Science Foundation of China under Grant No. 2009JQ8011 and the Educational Commission of Shaanxi Province of China under Grant No. 2010JK737.

References

1. Chaumette & Hutchinson (2006) Visual Servo Control, Part I: Basic Approaches. IEEE Robotics & Automation Magazine 13(4): 82–90.

2. Fayman J. A., Sudarsky & Rivlin (1998) Zoom tracking. Proceedings of the 1998 IEEE International Conference on Robotics and Automation, Leuven, Belgium: 2783–2788.

3. Fang Y. C. (2008) A survey of robot visual servoing. CAAI Transactions on Intelligent Systems 3(2): 109–113.

4. Mezouar & Chaumette (2000) Path planning in image space for robust visual servoing. Proceedings of the 2000 IEEE International Conference on Robotics and Automation, San Francisco: 2759–2764.

5. Schramm & Morel (2006) Ensuring visibility in calibration-free path planning for image-based visual servoing. IEEE Transactions on Robotics 22(4): 848–854.

6. Yao & Gupta (2007) Path planning with general end-effector constraints. Robotics and Autonomous Systems 55(4): 316–327.

7. Chesi (2009) Visual Servoing Path Planning via Homogeneous Forms and LMI Optimizations. IEEE Transactions on Robotics 25(2): 281–291.

8. Deng, Janabi-Sharifi & William (2005) Hybrid Motion Control and Planning Strategies for Visual Servoing. IEEE Transactions on Industrial Electronics 52(4): 1024–1040.

9. Kazemi, Gupta & Mehrandezh (2010) Path Planning for Visual Servoing: A Review and Issues. Visual Servoing via Advanced Numerical Methods, Lecture Notes in Control and Information Sciences 401: 189–207.

10. Tordoff B. J. & Murray D. W. (2007) A method of reactive zoom control from uncertainty in tracking. Computer Vision and Image Understanding: 131–144.

11. Kumar, Micheloni & Piciarelli (2009) Stereo Localization Using Dual PTZ Cameras. Proceedings of the IEEE International Conference on Computer Analysis of Images and Patterns, Munster, Germany.

12. Malis (2004) Visual servoing invariant to changes in camera-intrinsic parameters. IEEE Transactions on Robotics and Automation 20(1): 72–81.

13. Malis & Benhimane (2003) Vision-based control with respect to planar and non-planar objects using a zooming camera. Proceedings of the IEEE International Conference on Advanced Robotics, Coimbra, Portugal.

14. Conticelli & Allotta (2001) Nonlinear Controllability and Stability Analysis of Adaptive Image-Based Systems. IEEE Transactions on Robotics and Automation 17(2): 208–214.

15. Zachi A. R. L., Liu & Lizarralde (2004) Performing stable 2D adaptive visual positioning/tracking control without explicit depth measurement. Proceedings of the 2004 IEEE International Conference on Robotics and Automation 3: 2297–2302.

16. J. B. (2010) Uncalibrated robotic hand-eye coordination. Beijing: Electronic industry press.

17. Hatano & Hashimoto (2003) Image-Based Visual Servo Using Zoom Mechanism. SICE Annual Conference, Fukui.

18. Benhimane & Malis (2003) Vision-based control with respect to planar and non-planar objects using a zooming camera. The 11th International Conference on Advanced Robotics, Coimbra, Portugal.

19. Radchenko, Pape & Graser (2003) Visual servoing with adjustable zoom cameras. The 8th International Conference on Rehabilitation Robotics: 22–25.

20. Corke P. J. (2007) MATLAB toolboxes: Robotics and vision for students and teachers. IEEE Robotics and Automation Magazine 14(4): 16–17.

21. Morel & Yu (2009) ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences 2(2): 438–469.

22. & Morel (2011) ASIFT: An Algorithm for Fully Affine Invariant Comparison. Image Processing On Line: 47–52.