Abstract
In this paper we present a visual-control algorithm for driving a mobile robot along a reference trajectory. The system consists of a two-wheeled differentially driven mobile robot that is observed by an overhead camera, which can be placed at an arbitrary, but reasonable, inclination with respect to the ground plane. The controller must generate appropriate tangential and angular control velocities for the trajectory-tracking problem, based on the robot position measured in the image. To track the position of the robot through a sequence of images in real time, the robot is equipped with an artificial marker that can be unambiguously recognized by the image recognition subsystem.
Using the property of differential flatness, a dynamic feedback compensator can be designed for the system, thereby extending the system into a linear form. The presented control algorithm for reference tracking combines a feedforward and a feedback loop, a structure also known as a two-DOF control scheme. The feedforward part should drive the system to the vicinity of the reference trajectory, and the feedback part should eliminate any errors that occur due to noise and other disturbances. Feedforward control alone cannot achieve accurate reference following, but this deficiency can be eliminated by introducing the feedback loop. The design of the model predictive control is based on the linear error model. The model predictive control is given in analytical form, so the computational burden is kept at a reasonable level for real-time implementation. The control algorithm requires the reference trajectory to be an at least twice differentiable function. A suitable approach to designing such a trajectory is to exploit some useful properties of the Bernstein-Bézier parametric curves. Simulation experiments as well as real-system experiments on a robot normally used in the robot soccer small league demonstrate the applicability of the presented control approach.
Introduction
Visual servoing (VS) is a technique which uses an image sensor in a feedback loop for motion control of a robot. The field of VS combines robotics, machine vision and control theory. An extensive overview of the VS applications and methodology was given by Corke [1, 2], Chaumette and Hutchinson [3, 4], Kragic and Christensen [5], and Chang [6] etc. Visual servoing can be found in a variety of applications: cooperative movement of mobile soccer robots [7], navigation of autonomous mobile robots [8, 9], docking of autonomous water surface vehicles [10], helicopter and quadrocopter hovering and guidance [11–14], autonomous landing of aeroplanes [15–17], attitude control of satellites [18], grasping and movement of objects [19, 20], and keeping the relative view in a dynamic environment [21] etc.
The VS approaches are normally divided into three main groups [1]: position-based visual servoing (PBVS), image-based visual servoing (IBVS) and hybrid visual servoing. The methods differ in the definition of the control error. In the PBVS case, the control error is defined as the difference between the desired and current pose in the world (Euclidean) space. As opposed to the classical approach of the PBVS, the IBVS (implemented in this paper) defines the control error between the desired and current pose directly in the image coordinate frame (in pixels). The PBVS usually generates better motion, but if the system is not precisely calibrated it may not be able to eliminate the steady-state error. The IBVS can usually achieve an error-free motion, but the generated motion may not be optimal and may sometimes include unnecessary or even undesirable movements. The control algorithms that try to combine the useful properties of both approaches are called hybrid VS methods [22]. The VS algorithms may further be divided based on the configuration of the camera and the robot (eye-in-hand and eye-to-hand configuration) and on the number of cameras: one camera, two cameras (stereo configuration), or multiple cameras (more than two). Some visual servoing approaches take advantage of special structural properties that can be used implicitly in the design of the control algorithm, e.g., planar surfaces [23]. In this paper we consider an eye-to-hand configuration with one camera. The mobile robot is observed by an overhead camera that is placed at an arbitrary inclination with respect to the ground plane. In other words, an arbitrarily positioned camera is used to provide information about the position of the mobile robot that moves on a flat surface.
A camera is considered to be an inexpensive and non-invasive sensor, and the information about the environment it provides is extremely rich compared to other sensors, e.g., laser or ultrasonic distance sensors. This makes the camera an extremely appealing sensor for a broad range of applications. The huge amount of data that images provide represents a challenge, particularly when it comes to extracting the relevant information for a specific task in real time. A classical approach to the design of visual servoing consists of image acquisition, image segmentation and classification, high-level reasoning and decision-making, movement planning and, finally, execution of appropriate actions [1]. Although some approaches for visual servoing describe the task on a high (abstract) level [24], the modern approaches generate the actions directly from the acquired image features to reduce the computational burden. The main idea is that the complex image signal can be described by a relatively small set of image features (like SIFT [25]), and these features are then used in the controller for the calculation of the appropriate actions [8, 18]. The classical high-level approach remains useful while the system is learning a new task, but once the task is learned, the system should carry it out at a low level to speed up the execution. In this paper, the machine vision is given the task of measuring the position of the robot in the image frame. To simplify and speed up the segmentation of the robot from the surrounding environment, the robot can be equipped with a marker (a special colour or pattern) that can be unambiguously detected by the image recognition system [7, 13].
The problem of designing a controller for trajectory-tracking (smooth movement along the predefined path) is one of the fundamental problems in robotics. Another fundamental problem in robotics is posture stabilization (movement from point-to-point), but we do not discuss this here. Over the years many different approaches have been developed to tackle the trajectory-tracking problem [26–29]. Some methods for trajectory-tracking are also able to accommodate the path on-line for obstacle avoidance [30, 31]. A good overview of the path planning methods can be found in [32].
The main emphasis of this paper is on the following topics:
- The trajectory-tracking task is supposed to be given in the image space, and hence the IBVS is adopted as the control scheme.
- The design of the control algorithm takes into account arbitrary inclination of the camera with respect to the ground plane.
- The system states are estimated from the delayed measurements using a Kalman filter.
- The states of the robot needed in the control law are all estimated in the image frame (no explicit conversion to the world frame is made).
- The trajectory-tracking control law is developed in the model predictive framework.
- The entire control algorithm is designed in the discrete space for optimum performance on a digital computer.
- For the path planning, the use of parametric curves under perspective projection is studied.
- The overall control algorithm was experimentally tested for robustness.
This paper is structured as follows. Section 2 gives an overview of the system along with all the mathematical equations describing the mobile robot and camera. This is followed by section 3, which presents the design of the controller. The system is linearized with the introduction of a non-linear compensator, which is described in section 3.1. In section 3.2 the model predictive control for trajectory-tracking is presented. Since not all the states required by the control algorithm are directly measurable, a state observer in the form of a Kalman filter is needed, which is presented in section 3.3. In section 4 the approach for designing a trajectory based on Bernstein-Bézier splines is presented. Afterwards, section 5 presents experimental results. Finally, in section 6, some conclusions are drawn and ideas for future work are presented.
System overview
If not specified differently, we use small bold letters (e.g., $\mathbf{x}$) for column vectors, and big bold letters for matrices (e.g., $\mathbf{X}$). If needed, we use the subscript $(\cdot)_w$ to denote the world coordinate frame, $(\cdot)_p$ for the picture coordinate frame, and $(\cdot)_c$ for the camera coordinate frame. To describe the position and orientation of an object in a plane, we use generalized coordinates $\mathbf{q}_w^T = [x\ y\ \varphi]$ and $\mathbf{q}_p^T = [u\ v\ \theta]$ with respect to the world and image (picture) frame, respectively. For denoting just a point in a plane we use $\mathbf{p}_w^T = [x\ y\ 1]$ and $\mathbf{p}_p^T = [u\ v\ 1]$ with respect to the world and image coordinate frame, respectively. Note that we distinguish between bold and regular face symbols, and that we may use the same symbol to denote points in homogeneous and non-homogeneous coordinates interchangeably.
The system consists of a two-wheeled differentially driven mobile robot and a camera that observes the robot from an inclined angle with respect to the ground plane, on which the robot can move freely. Next, we describe the kinematic model of the robot and projective transformation of the camera.
Mobile robot kinematics
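The robot's architecture is shown in Figure 1. The kinematic motion equations of a two-wheeled differentially driven mobile robot are the same as those of a unicycle [32]. The kinematic model has a non-integrable constraint:

$$\dot{x}\sin\varphi - \dot{y}\cos\varphi = 0, \tag{1}$$

which results from the assumption that the robot cannot slip in the lateral direction. The kinematic model of the robot is:

$$\dot{x} = v\cos\varphi, \qquad \dot{y} = v\sin\varphi, \qquad \dot{\varphi} = \omega, \tag{2}$$

where $v$ and $\omega$ are the tangential and angular velocity of the robot, respectively.

Two-wheeled differentially driven mobile robot.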
The transformation between the world point $\mathbf{p}_w^T = [x\ y\ z\ 1]$ and the corresponding point in the image frame $\mathbf{p}_p^T = [u\ v\ 1]$ (in pixels) can be described by a pinhole camera model [33]:
where the λ is a scalar weight, the matrix S holds intrinsic camera parameters, while the matrix
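If the world points are confined to a common plane, the relation (3) simplifies. Without loss of generality, we can assume that the plane spans the axis vectors x and y (z = 0):

$$\lambda\,\mathbf{p}_p = \mathbf{H}\,\mathbf{p}_w, \tag{4}$$

where we have taken advantage of the notation by redefining $\mathbf{p}_w^T = [x\ y\ 1]$. The matrix $\mathbf{H}$ is the homography between the world plane and the image. In the canonical configuration the homography takes the form:

$$\mathbf{H} = \begin{bmatrix} s_u & 0 & t_u \\ 0 & s_v & t_v \\ 0 & 0 & 1 \end{bmatrix}, \tag{5}$$

which means that the coordinates of a point in the world plane are just scaled and translated to the image: $u = s_u x + t_u$ and $v = s_v y + t_v$, where the tuples of values $(s_u, t_u)$ and $(s_v, t_v)$ represent the scaling and translation factor in the horizontal and vertical direction, respectively. Such a canonical configuration can greatly simplify the design of in-plane object tracking, and it is the configuration used in the mobile soccer small league [7]. Denoting $\mathbf{H}^T = [\mathbf{h}_1\ \mathbf{h}_2\ \mathbf{h}_3]$, from the equation (4) follows:

$$u = \frac{\mathbf{h}_1^T\mathbf{p}_w}{\mathbf{h}_3^T\mathbf{p}_w}, \qquad v = \frac{\mathbf{h}_2^T\mathbf{p}_w}{\mathbf{h}_3^T\mathbf{p}_w}. \tag{6}$$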
Note that the denominator in the equations (6) is equal to the factor $\lambda = \mathbf{h}_3^T\mathbf{p}_w$ in the equation (4).
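The planar projection described above can be sketched in a few lines of Python. This is only an illustration: the function name `project_to_image` and the numerical values of the homography are assumptions made for the example, not values taken from the experiments.

```python
import numpy as np

def project_to_image(H, xy):
    """Map planar world points (x, y) to pixel coordinates (u, v) through a homography H."""
    xy = np.atleast_2d(np.asarray(xy, dtype=float))
    pw = np.column_stack([xy, np.ones(len(xy))])   # homogeneous world points [x y 1]
    q = pw @ H.T                                   # columns: h1^T pw, h2^T pw, h3^T pw
    lam = q[:, 2]                                  # scalar weight lambda = h3^T pw
    return q[:, :2] / lam[:, None], lam

# Hypothetical homography of an inclined camera (illustrative values only)
H = np.array([[400.0, 30.0, 320.0],
              [10.0, 350.0, 240.0],
              [0.05, 0.20, 1.0]])
uv, lam = project_to_image(H, [[0.0, 0.0], [1.0, 0.5]])

# Canonical configuration: only scaling and translation remain
H_canonical = np.array([[200.0, 0.0, 320.0],
                        [0.0, 180.0, 240.0],
                        [0.0, 0.0, 1.0]])
uv_canonical, _ = project_to_image(H_canonical, [[1.0, 0.5]])
```

In the canonical case the weight is always one, so the division has no effect and the mapping is affine.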
The whole system given with the equations (2) and (6) is multi-variable and non-linear. Furthermore, the system (2) is non-holonomic which makes the visual servoing non-trivial since Brockett's theorem (1983) shows that no linear time-invariant controller can control it [2].
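The non-holonomic nature of the robot model can be illustrated with a minimal simulation sketch. It assumes the standard unicycle kinematics with inputs `v` and `omega` and a simple Euler integration step; the function names and the step size are chosen only for this example.

```python
import math

def unicycle_step(q, v, omega, dt):
    """One Euler integration step of the unicycle model; q = (x, y, phi)."""
    x, y, phi = q
    return (x + dt * v * math.cos(phi),
            y + dt * v * math.sin(phi),
            phi + dt * omega)

def lateral_velocity(q, v):
    """Left-hand side of the no-slip constraint; it is zero for any heading."""
    _, _, phi = q
    x_dot, y_dot = v * math.cos(phi), v * math.sin(phi)
    return x_dot * math.sin(phi) - y_dot * math.cos(phi)

# Drive straight along the x axis for one second
q = (0.0, 0.0, 0.0)
for _ in range(10):
    q = unicycle_step(q, v=1.0, omega=0.0, dt=0.1)
```

Whatever inputs are applied, the robot can never move sideways, which is why the position and the orientation cannot be controlled independently at any given instant.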
The design of the control law is divided into several subsystems. First, it can be shown that the system under consideration, given in the equations (2) and (6), is a differentially flat system with respect to the position of the robot in the image reference frame [27]. Therefore, the system can be linearized by a non-linear dynamic compensator. Based on the obtained linear model, a model predictive control for trajectory-tracking is derived. Since some states needed in the controller are not directly measurable, they must be estimated, and this is achieved with the use of a Kalman filter. The overall control scheme is shown in Figure 2.

Control scheme. (Operator D denotes the differentiation.)
How the property of differential flatness can be used in the design of trajectory-tracking and posture stabilization controllers for mobile robots has already been shown in [34]. The control method was later extended to a case where the camera is observing the mobile robot from an arbitrary inclination [26, 27]. Here, we give a brief summary of the approach for dynamic feedback linearization of the system.
A general guideline to obtaining a dynamic feedback compensator is to successively differentiate the system outputs until the system inputs appear in a non-singular way [34]. At some stage, an introduction of integrators on some of the inputs may be necessary to avoid subsequent differentiation of the original inputs.
Let us find the first derivative of the equation (6) with respect to time:
where $\mathbf{F} \in \mathbb{R}^{2\times 2}$, since the matrix
Not all the inputs have appeared in the equation (8) yet, so we need to continue with the differentiation. However, another differentiation of the equation (8) would differentiate the system input v, so we need to introduce a new state ξ = v before continuing. The second derivative is then:
In the equation (9) the other system input (the angular velocity
It can be shown that in a case where the tangential velocity differs from zero v = ξ ≠ 0, and the matrix
The elements of the matrix
all the terms on the right-hand side of the equation (11) can also be expressed in terms of the values in the image frame:
Since the number of differentiations of all the outputs (2 + 2) equals the order of the original system plus the number of integrators added during the process of differentiation (3 + 1), the overall extended system can be written in the following form:
for i ∊ {1, 2}, where
With respect to the new inputs u and flat outputs pp, the extended system is not only linear but also without input-output cross-coupling. The system is represented by two uni-variable subsystems, each consisting of two integrators connected in series, a scheme known as the chain form.
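The first differentiation step of the derivation relies on the Jacobian of the projection (6) with respect to the planar world coordinates. The following sketch computes that Jacobian analytically and maps the tangential velocity of the robot to a velocity in the image. It is assumed here that this Jacobian corresponds to the matrix F of the derivation; the function names and the homography values are hypothetical.

```python
import numpy as np

def project(H, x, y):
    """Pixel coordinates (u, v) of the planar world point (x, y)."""
    q = H @ np.array([x, y, 1.0])
    return q[:2] / q[2]

def projection_jacobian(H, x, y):
    """Jacobian d(u, v)/d(x, y) of the projection through the homography H."""
    q = H @ np.array([x, y, 1.0])
    lam = q[2]
    u, v = q[:2] / lam
    return np.array([[H[0, 0] - u * H[2, 0], H[0, 1] - u * H[2, 1]],
                     [H[1, 0] - v * H[2, 0], H[1, 1] - v * H[2, 1]]]) / lam

def image_velocity(H, q_world, v_tangential):
    """Velocity of the robot in the image for a given tangential velocity."""
    x, y, phi = q_world
    heading = np.array([np.cos(phi), np.sin(phi)])
    return projection_jacobian(H, x, y) @ heading * v_tangential

# Hypothetical homography (illustrative values only)
H = np.array([[400.0, 30.0, 320.0],
              [10.0, 350.0, 240.0],
              [0.05, 0.20, 1.0]])
F = projection_jacobian(H, 0.4, 0.2)
```

The angular velocity does not appear in this first derivative, which is consistent with the need for a second differentiation.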
The feedback part of the controller is designed separately for each of the subsystems, indexed i ∊ {1, 2} in the equation (13). Since both subsystems in the equation (13) have the same form, we omit the subscript index i.
We are to develop the controller for the extended system as a sum of the feedforward and feedback actions u = uF + (–uB). We assume the reference signal has the same dynamics as the subsystem in the equation (13):
where the state-error vector has been introduced: $\boldsymbol{\epsilon} = \mathbf{x}_r - \mathbf{x}$.
In the development of the error model predictive controller, the discrete equivalent of the error model (15) is needed. By expanding the step-invariant transformation equations into a Taylor series [35, p. 52], the following discrete error model with the sample time Ts is obtained:
where all the matrices are:
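As an illustration of the discretization step, the sketch below evaluates the Taylor series of the step-invariant (zero-order-hold) transformation. It assumes that the continuous error model of one subsystem is a chain of two integrators; the function name `step_invariant` and the number of series terms are choices made for this example.

```python
import numpy as np

def step_invariant(Ac, Bc, ts, terms=10):
    """Zero-order-hold discretization by a truncated Taylor series.

    Ad = sum_k Ac^k ts^k / k!,   Bd = (sum_k Ac^k ts^(k+1) / (k+1)!) Bc
    """
    n = Ac.shape[0]
    Ad = np.zeros((n, n))
    S = np.zeros((n, n))
    term = np.eye(n)                     # holds Ac^k ts^k / k!
    for k in range(terms):
        Ad += term
        S += term * ts / (k + 1)         # Ac^k ts^(k+1) / (k+1)!
        term = term @ Ac * ts / (k + 1)
    return Ad, S @ Bc

# Chain of two integrators (assumed form of one linearized subsystem)
Ac = np.array([[0.0, 1.0], [0.0, 0.0]])
Bc = np.array([[0.0], [1.0]])
Ad, Bd = step_invariant(Ac, Bc, ts=0.1)
```

For a double integrator the series terminates after two terms because the system matrix is nilpotent, so the result is exact.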
The predictive control can be formulated as an optimization problem where we search for the control signal that minimizes some penalty function over finite prediction horizon h > 0 [28, 31]:
where we have defined the states' error
At the current time step k, an estimate of the error at the future time moment k + h can be obtained from the model (17), provided that the current states and the inputs up to the time k + h − 1 are known:
where
where the predicted states are gathered in the augmented vector
and the unknown control inputs in
The matrices
Stacking the reference error states over the prediction horizon h into a vector
the penalty function (20) can be rewritten:
where
The operator ⊗ denotes the direct sum. We have the ability to choose the reference error vector
Another option is to choose
With a search for the minimum of the penalty function
According to the receding horizon control strategy, at the time instant k only the first m inputs are applied to the system: uB(k) = [
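The analytical solution of such a predictive controller can be sketched as follows. The sketch assumes a quadratic penalty on the predicted error and on the feedback inputs, a discrete double-integrator error model that is driven by the feedback input, and block-diagonal weighting matrices built from hypothetical weights `Q` and `R`. It is not necessarily identical to the penalty function used in the paper.

```python
import numpy as np

def prediction_matrices(A, B, h):
    """Stack the model so that E = F eps(k) + G U over the horizon h."""
    n, m = B.shape
    F = np.vstack([np.linalg.matrix_power(A, i) for i in range(1, h + 1)])
    G = np.zeros((h * n, h * m))
    for i in range(h):
        for j in range(i + 1):
            G[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    return F, G

def cost(A, B, h, Q, R, eps, U, eps_ref=None):
    """Assumed penalty: weighted squared error deviation plus control effort."""
    F, G = prediction_matrices(A, B, h)
    Qb, Rb = np.kron(np.eye(h), Q), np.kron(np.eye(h), R)   # block-diagonal weights
    ref = np.zeros(F.shape[0]) if eps_ref is None else eps_ref
    dev = ref - (F @ eps + G @ U)
    return dev @ Qb @ dev + U @ Rb @ U

def mpc_inputs(A, B, h, Q, R, eps, eps_ref=None):
    """Closed-form minimizer of the penalty over the whole horizon."""
    F, G = prediction_matrices(A, B, h)
    Qb, Rb = np.kron(np.eye(h), Q), np.kron(np.eye(h), R)
    ref = np.zeros(F.shape[0]) if eps_ref is None else eps_ref
    return np.linalg.solve(G.T @ Qb @ G + Rb, G.T @ Qb @ (ref - F @ eps))

ts = 0.1
A = np.array([[1.0, ts], [0.0, 1.0]])
B = np.array([[ts**2 / 2], [ts]])
Q, R = np.eye(2), np.array([[1e-3]])
eps = np.array([1.0, -0.5])
U = mpc_inputs(A, B, 5, Q, R, eps)
u_feedback = U[:B.shape[1]]          # receding horizon: apply only the first m inputs
```

Because the solution is a single linear solve with constant matrices, the gain can be computed once in advance, which keeps the on-line computational burden low.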
We assume that the image recognition system can provide us with the information about the robot position given in the image frame. However, not all the states required for the control are directly measurable from the image. The unmeasurable states must be estimated, and because we have written the system in a linear form, a Kalman filter can be used. In visual servoing systems, delays are always present, so here we present a design of a Kalman filter that can take delays into account [36].
We assume the outputs of the system (18) are disturbed by a normally distributed white noise with zero mean and covariance matrix
According to the error model (18), the predictions from the delayed step to the current step can be made:
where we have gathered the last d control signals into augmented vector
where
When the Kalman filter is used, it is important that we have accurate estimation of the covariance matrices, otherwise the Kalman filter may return state estimations that are either overconfident or too pessimistic [37].
In multitasking and/or distributed systems it may not be easy to maintain a sampling time with a constant period, which can lead to large errors when we use the equation (36). If the model (18) is known for time-dependent sampling times, the problem can be solved by measuring the time between the samples [13, 36].
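The idea of estimating a delayed state and predicting it forward to the current step can be sketched as follows. The class `DelayedKalmanFilter`, its method names and the noise covariances are assumptions made for this illustration; the measurement at step k is assumed to observe the state at step k − d, and the inputs before the start are assumed to be zero.

```python
import numpy as np
from collections import deque

class DelayedKalmanFilter:
    """Kalman filter for a measurement that is delayed by a fixed number of samples."""

    def __init__(self, A, B, C, Qw, Rv, x0, P0, delay):
        self.A, self.B, self.C, self.Qw, self.Rv = A, B, C, Qw, Rv
        self.x = np.asarray(x0, dtype=float)     # estimate of x(k - delay)
        self.P = np.asarray(P0, dtype=float)
        # stored inputs u(k - delay), ..., u(k - 1)
        self.past_u = deque([np.zeros(B.shape[1])] * delay, maxlen=delay)

    def correct(self, y):
        """Measurement update with y(k) = C x(k - delay) + noise."""
        S = self.C @ self.P @ self.C.T + self.Rv
        K = self.P @ self.C.T @ np.linalg.inv(S)
        self.x = self.x + K @ (y - self.C @ self.x)
        self.P = (np.eye(len(self.x)) - K @ self.C) @ self.P

    def current_estimate(self):
        """Predict from the delayed step to the current step with the stored inputs."""
        x = self.x
        for u in self.past_u:
            x = self.A @ x + self.B @ u
        return x

    def advance(self, u):
        """Store the input u(k) and move the delayed estimate one sample forward."""
        u = np.atleast_1d(np.asarray(u, dtype=float))
        u_old = self.past_u[0] if self.past_u.maxlen else u
        self.past_u.append(u)
        self.x = self.A @ self.x + self.B @ u_old
        self.P = self.A @ self.P @ self.A.T + self.Qw
```

In each sample the filter would be used in the order `correct`, `current_estimate`, `advance`, where the current estimate is the one passed to the control law.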
In the development of the control law in section 3.1 we have assumed the reference signal comes from the space of twice differentiable functions. One way of determining the reference trajectory is in the form of a parametric curve (dependent on time) [27, 31, 32]. A convenient way of constructing a reference trajectory is with the use of the Bernstein-Bézier (BB) parametric curves [31]. The BB curves are completely determined by a set of control points, and the number of the control points determines the order of the BB curve. High-order BB curves can be numerically unstable, but it usually suffices to use just low-order curves, since an approximation of an arbitrary curve can be achieved by joining several low-order BB curves into a BB spline.
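A general D-dimensional BB curve $\mathbf{r}^T = [r_1\ r_2\ \dots\ r_D]$ of order $b \in \mathbb{N}$ in parametric form is defined as follows:

$$\mathbf{r}(\lambda) = \sum_{i=0}^{b} B_{i,b}(\lambda)\,\mathbf{p}_i, \qquad B_{i,b}(\lambda) = \binom{b}{i}\lambda^i(1-\lambda)^{b-i}, \tag{41}$$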
where the parameter λ represents the normalized time, which takes real values from the interval [0, 1] and is related to the relative time t by the linear equation t = λΛ, where Λ is the time it takes to travel from the beginning (λ = 0) to the end (λ = 1) of the curve. The BB curve (41) is of class C∞ (smooth curve), and it is completely defined by a set of control points $\{\mathbf{p}_i^T = [p_{1,i}\ p_{2,i}\ \dots\ p_{D,i}]\}_{i=0,1,\dots,b}$ that form a Bézier polygon. The BB curve is always bounded by the convex hull of the Bézier polygon. The derivative of the BB curve with respect to relative time t of order d ≥ 0 is as follows:
The BB curve (41) always begins (λ = 0) at the first control point p0 and ends (λ = 1) at the last control point pb; the so-called end point interpolation property. All the other control points in general do not lie on the BB curve. Next, we show how the other control points influence the behaviour of the curve at the end points. At the end points, the following limits can be derived for d ≥ 0, b ≥ d:
The derivative of order d at the end points depends only on the first or last d + 1 control points, respectively.
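The evaluation of a BB curve and of its time derivatives can be sketched as follows. The sketch uses the well-known forward-difference form of the Bézier derivative, which may be written differently in the equations of this paper, and the function name `bb_curve` is an assumption made for the example.

```python
import numpy as np
from math import comb, factorial

def bb_curve(control_points, lam, order=0, duration=1.0):
    """Evaluate a BB curve, or its derivative of the given order with respect to time.

    control_points: (b + 1, D) array; lam: normalized time in [0, 1];
    duration: the time Lambda needed to traverse the curve (t = lam * Lambda).
    """
    P = np.asarray(control_points, dtype=float)
    b = P.shape[0] - 1
    if order > b:
        return np.zeros(P.shape[1])
    diffs = np.diff(P, n=order, axis=0)            # forward differences of the control points
    scale = factorial(b) / factorial(b - order) / duration**order
    deg = b - order
    basis = np.array([comb(deg, i) * lam**i * (1 - lam)**(deg - i)
                      for i in range(deg + 1)])
    return scale * (basis @ diffs)

# Fifth-order curve with arbitrary control points, traversed in 5 s
P = np.array([[0, 0], [1, 2], [3, 3], [4, 0], [5, 1], [6, 4]], dtype=float)
start, end = bb_curve(P, 0.0), bb_curve(P, 1.0)
velocity_at_start = bb_curve(P, 0.0, order=1, duration=5.0)
```

At the start of the curve the first derivative depends only on the first two control points and the second derivative only on the first three, in line with the end-point properties discussed above.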
Let us confine ourselves to a two-dimensional space
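The tangential velocity along the curve is calculated as follows:

$$v(t) = \sqrt{\dot{r}_1^2(t) + \dot{r}_2^2(t)}, \tag{46}$$

and from the differentiation of the orientation (45), the angular velocity along the curve is obtained:

$$\omega(t) = \frac{\dot{r}_1(t)\,\ddot{r}_2(t) - \dot{r}_2(t)\,\ddot{r}_1(t)}{\dot{r}_1^2(t) + \dot{r}_2^2(t)}. \tag{47}$$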
We consider only the relations at the starting point of the BB curve (41) (λ = 0), since the relations at the end point can be obtained in the same way. Taking into account the equations (42) and (43), the tangential velocity (46) at the starting point of the BB curve can be expressed in terms of the first two control points:
We assume the tangential velocity never reaches zero v(t) ≠ 0. The orientation (45) at the starting point is then given as:
and the angular velocity (47) at the starting point as:
The relation (50) is the equation of a line that is parallel to the line that passes through the control points p0 and p1. When the point p2 lies on the line passing through the points p0 and p1, the angular velocity ω0 becomes zero. It can be shown that the lines perpendicular to the line that is passing through the points p0 and p1 represent the lines of constant tangential acceleration
When the following condition is met p2 = 2p1 – p0, the angular velocity ω0 and tangential acceleration a0 are both zero.
The equations (48) to (51) can be expressed more conveniently in terms of the wanted properties at the starting point of the curve:
The minimum order of the BB curve required to design a curve with the desired values of the properties (48) to (51) at the beginning and end of the curve, independently, is five. In the case of a higher-order curve, the additional central control points can be used to change the shape of the curve without changing the desired properties at the curve boundaries. For example, these additional control points can be positioned in a way that achieves an obstacle-free path [31].
To summarize, the first control point defines the curve starting position, the first two control points define the orientation φ0 and tangential velocity v0, and the first three control points define the angular velocity ω0 and tangential acceleration a0. Hence, the first three control points (p0, p1, p2) can be positioned by specifying the starting position, orientation, tangential and angular velocity, and tangential acceleration. In the same way the last three control points (pb, pb–1, pb–2) define the curve at the end point.
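The relation between the first three control points and the motion properties at the start of a planar curve can be sketched in both directions. The formulas in the sketch are derived from the first and second derivatives of a BB curve at λ = 0 and may be arranged differently from the equations in the paper; the function names are hypothetical.

```python
import numpy as np

def start_properties(p0, p1, p2, b, duration):
    """Orientation, tangential velocity, angular velocity and tangential acceleration at lam = 0."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    d1 = b / duration * (p1 - p0)                          # first derivative at the start
    d2 = b * (b - 1) / duration**2 * (p2 - 2 * p1 + p0)    # second derivative at the start
    v0 = np.hypot(d1[0], d1[1])
    phi0 = np.arctan2(d1[1], d1[0])
    omega0 = (d1[0] * d2[1] - d1[1] * d2[0]) / v0**2
    a0 = (d1 @ d2) / v0
    return phi0, v0, omega0, a0

def start_control_points(p0, phi0, v0, omega0, a0, b, duration):
    """Place the first three control points from the wanted properties (requires v0 != 0)."""
    p0 = np.asarray(p0, dtype=float)
    tangent = np.array([np.cos(phi0), np.sin(phi0)])
    normal = np.array([-np.sin(phi0), np.cos(phi0)])
    p1 = p0 + duration * v0 / b * tangent
    accel = a0 * tangent + v0 * omega0 * normal            # wanted second derivative
    p2 = 2 * p1 - p0 + duration**2 / (b * (b - 1)) * accel
    return p0, p1, p2

points = start_control_points([1.0, 2.0], 0.7, 0.4, -0.3, 0.1, b=5, duration=5.0)
recovered = start_properties(*points, b=5, duration=5.0)
```

The round trip returns the requested values, and choosing p2 = 2p1 – p0 gives zero angular velocity and zero tangential acceleration.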
To define an arbitrary curve one could increase the order of the BB curve, but this may introduce numerical instability. A better approach for defining an arbitrary path is to compose it from a set of connected low-order curves. This also allows us to define additional properties at the joint points as shown in section 4.1. The connections should be carried out in a way that achieves continuity of the curve and its derivatives up to some order. We consider three cases for joining two consecutive BB curves, demanding the continuity of the curve, continuity of the first-order derivative and also continuity of the second-order derivative. We describe how all these demands are related to the so-called tangential and angular velocities of the curve at the junctions. We denote two consecutive curves with $\mathbf{r}_j(\lambda)$ and $\mathbf{r}_{j+1}(\lambda)$, respectively. Here we allow the relative time to be dependent on the curve part, $t_j = \Lambda_j\lambda$, which gives us an additional degree of freedom. We also assume that the relative time t is reset to zero at the beginning of each BB curve.
To achieve continuity of a spline the following condition has to be met:
which yields the condition for selecting the first control point in the next curve part:
This means that the curve is continuous, but the orientation of the tangent coming into the junction may or may not be the same as the one leaving the junction,
To achieve continuity of the BB spline up to the first order in addition to the condition (55) the following condition has to be met:
which yields an additional condition for selecting the second control point in the next curve part:
Assume that the tangential velocity at the end point of the previous curve is not zero, $v_{b,j} \neq 0$. Then, in addition to the continuous position, the orientation and tangential velocity are also continuous at the junction point, $\varphi_{b,j} = \varphi_{0,j+1}$ and $v_{b,j} = v_{0,j+1}$; but the angular velocity and tangential acceleration may still not be continuous at the junction point.
To achieve continuity of a spline up to the second order, in addition to the conditions (55) and (57) the following condition has to be met:
which yields an additional condition for selecting the third control point in the next curve part:
In this case, in addition to the continuity of the position, orientation and tangential velocity, the angular velocity and tangential acceleration are also continuous at the junction point, ωb,j = ω0,j+1 and ab,j = a0,j+1.
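The three continuity conditions can be turned into a rule for placing the first three control points of the next curve. The sketch below assumes that both curves have the same order and allows different durations; the formulas follow from equating the derivatives at the junction and are not copied from the paper. The function names are hypothetical.

```python
import numpy as np
from math import comb, factorial

def bb_derivative(P, lam, order, duration):
    """Time derivative of the given order of a BB curve (order 0 returns the point)."""
    P = np.asarray(P, dtype=float)
    b = P.shape[0] - 1
    diffs = np.diff(P, n=order, axis=0)
    deg = b - order
    basis = np.array([comb(deg, i) * lam**i * (1 - lam)**(deg - i)
                      for i in range(deg + 1)])
    return factorial(b) / factorial(deg) / duration**order * (basis @ diffs)

def next_curve_start(prev_points, prev_duration, next_duration):
    """First three control points of the next curve for continuity up to the second order."""
    pb2, pb1, pb = np.asarray(prev_points, dtype=float)[-3:]
    ratio = next_duration / prev_duration
    q0 = pb                                                  # continuous position
    q1 = q0 + ratio * (pb - pb1)                             # continuous first derivative
    q2 = 2 * q1 - q0 + ratio**2 * (pb - 2 * pb1 + pb2)       # continuous second derivative
    return q0, q1, q2

prev = np.array([[0, 0], [1, 2], [3, 3], [4, 0], [5, 1], [6, 4]], dtype=float)
q0, q1, q2 = next_curve_start(prev, prev_duration=5.0, next_duration=3.0)
# The remaining control points of the next curve can be chosen freely
nxt = np.vstack([q0, q1, q2, [9.0, 6.0], [10.0, 4.0], [11.0, 5.0]])
```

Since the position and the first two derivatives match at the junction, the tangential and angular velocities are continuous there as well, provided the tangential velocity is not zero.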
Since the perspective transformation of homography (4) is a non-linear transformation, the BB curve (41) cannot be transformed through homography from the world to image frame, or vice versa, in terms of the control points. However, the curve and also the curve's derivatives can be transformed from one frame to another, and the homography transformation does not break apart the continuity of the curve to the specific order, since the continuity and smoothness are properties of the curve that are invariant under homography transformation.
Since the Bernstein basis polynomials Bi,b(λ) form a partition of unity:
It is clear that the rational expression (61) cannot be written in a polynomial form like (41), so the transformed BB curve (rational BB curve) cannot be written in terms of the control points in the transformed frame.
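The effect of the homography on a BB curve can be checked numerically. The sketch compares three quantities: the pointwise projection of the world curve, the rational form obtained by weighting the projected control points, and a naive polynomial curve built from the projected control points. The homography values and the function names are assumptions made for this example.

```python
import numpy as np
from math import comb

def bernstein(b, lam):
    """Values of all Bernstein basis polynomials of order b at lam."""
    return np.array([comb(b, i) * lam**i * (1 - lam)**(b - i) for i in range(b + 1)])

def bb_point(P, lam):
    return bernstein(len(P) - 1, lam) @ np.asarray(P, dtype=float)

def project(H, xy):
    q = H @ np.array([xy[0], xy[1], 1.0])
    return q[:2] / q[2]

def rational_bb_point(H, P, lam):
    """Image of the world BB curve written as a ratio of two Bernstein sums."""
    B = bernstein(len(P) - 1, lam)
    Pw = np.column_stack([P, np.ones(len(P))])
    numerator = B @ (Pw @ H[:2].T)       # sum of B_i [h1^T p_i, h2^T p_i]
    denominator = B @ (Pw @ H[2])        # sum of B_i h3^T p_i
    return numerator / denominator

H = np.array([[400.0, 30.0, 320.0],
              [10.0, 350.0, 240.0],
              [0.05, 0.20, 1.0]])         # hypothetical inclined camera
P = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

exact = project(H, bb_point(P, 0.5))
rational = rational_bb_point(H, P, 0.5)
naive = bb_point(np.array([project(H, p) for p in P]), 0.5)
```

The rational form reproduces the projected curve, while the naive polynomial curve through the projected control points deviates from it whenever the weights of the control points differ.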
Two distinctive approaches can be used in the design of the reference trajectory. The reference trajectory can be designed in terms of the control points in the world plane, and then transformed into the image frame according to the equation (61). This approach requires knowledge of the homography matrix
We first tested the presented control algorithm in a simulation environment. The camera was positioned at an arbitrary location in space, in such a way that the ground plane was in the camera's field of view. We defined the reference trajectory with a fifth-order BB spline, and required the trajectory to have continuous derivatives up to the second order, which means that the tangential and angular velocity are both continuous functions. We selected the time length of all the BB curves in the spline to be Λ = 5 s.
Normally distributed white noise with zero mean and variance $\sigma_n^2 = 2\ \mathrm{px}^2$ was added to the outputs of the system. The sampling time was selected to be Ts = 0.1 s. The system was assumed to have a delay of two samples. The initial tangential velocity in the dynamic feedback compensator was set to the tangential velocity at the beginning of the reference trajectory in the world space. The error model predictive controller was initialized with the following values:
The Kalman filter was initialized with the following values of the covariance matrices:
The robot was displaced from the starting location of the reference trajectory (non-zero initial condition).
With the selected parameters, we made several experiments. For the control performance assessment, we defined several performance evaluation functions:
where $e_u(k) = r_u(k) - u(k)$ and $e_v(k) = r_v(k) - v(k)$, and $c$ can be either $v$ or $\omega$. The criterion functions SSE and SAE evaluate the trajectory-tracking error, while the cost functions SSdU$_c$ and SAdU$_c$ evaluate the control effort.
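Evaluation functions of this kind can be sketched as follows. The exact definitions are an assumption inferred from the names: sums of squared and absolute tracking errors in both image coordinates, and sums of squared and absolute differences of a control signal between consecutive samples.

```python
def tracking_metrics(ref_uv, meas_uv):
    """Assumed SSE and SAE over sequences of (u, v) pairs."""
    sse = sae = 0.0
    for (ru, rv), (u, v) in zip(ref_uv, meas_uv):
        eu, ev = ru - u, rv - v
        sse += eu**2 + ev**2
        sae += abs(eu) + abs(ev)
    return sse, sae

def effort_metrics(c):
    """Assumed SSdU and SAdU of one control signal c (v or omega)."""
    steps = [b - a for a, b in zip(c, c[1:])]
    return sum(s**2 for s in steps), sum(abs(s) for s in steps)

sse, sae = tracking_metrics([(0, 0), (1, 1)], [(1, 0), (1, 3)])
ssdu, sadu = effort_metrics([0.0, 1.0, -1.0])
```

With these definitions a smooth control signal gives small effort values, while a noisy one is penalized even if its tracking error is small.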
Since the control algorithm demands knowledge of the homography (the mapping between the points in the image and world plane), we first evaluated the robustness of the proposed algorithm to an imprecise estimation of the homography matrix. The bold quadrilateral in Figure 3(b) shows the boundaries of the plane that are visible in the image, and the other ten thin quadrilaterals represent the estimated visible area in the image, where each quadrilateral represents one homography. The performance of the tracking algorithm for all the inaccurately estimated homography matrices is depicted in the other three panels of Figure 3, and Table 1 gathers some statistics about the control performance for all the cases.

Trajectory-tracking (a) with respect to the imprecise estimation of the homography; (b) boundaries of the plane that are visible in the image for different estimations of the homography (quadrilateral with strong edges corresponds to the true homography); (c) extended inputs and (d) control inputs. In all the figures all ten signals are overlaid.
Evaluation of the trajectory-tracking performance with respect to the imprecise estimations of the homography.
The model predictive control algorithm can be tuned in terms of the prediction horizon. To evaluate the influence of the prediction horizon h on the control quality, we considered cases where the prediction horizon takes values in the range from one to ten. The trajectory-tracking performance is evaluated in Table 2, and also shown in Figure 4 for the length of the prediction horizon one and five. The measurements were repeated twenty times. In this case we assumed that the homography matrix is known precisely.
Evaluation of trajectory-tracking performance with respect to different lengths of the prediction horizon h.

Trajectory-tracking (a), (c) for the length of the prediction horizon h = 1 (top row) and h = 5 (bottom row); (b), (d) control inputs. In all the figures all the signals from 20 repeats of the experiment are overlaid.
In all the experiments considered, the output of the system was corrupted by white Gaussian noise of variance $\sigma_n^2 = 2\ \mathrm{px}^2$. Furthermore, we also evaluated the performance for different levels of noise, and the results are gathered in Table 3. We again assumed that the homography matrix is precisely known. Figure 5 shows reference-tracking in a case where the level of the noise is $\sigma_n^2 = 3\ \mathrm{px}^2$, and this figure can be compared to Figure 4(c) and Figure 4(d), where the level of noise is $\sigma_n^2 = 2\ \mathrm{px}^2$ but the other simulation conditions are the same.
Evaluation of trajectory-tracking performance with respect to different levels of noise.

Trajectory-tracking (a) with respect to different levels of noise; (b) control inputs. In all the figures all the signals from 20 repeats of the experiment are overlaid.
To test the performance of the tracking controller on the real system, we implemented a simple image-tracking algorithm. To simplify the image-based robot tracking, we equipped the robot with a colour marker that can be recognized easily by the image recognition system [7]. The position of the robot is assumed to be at the centre of the largest detected patch. The recognized robot in the image is shown in Figure 6(a). In Figure 6(b) the results of the trajectory-tracking on the real system are presented. To improve readability, only every fifth sample in Figure 6(b) is marked with a symbol, and all the samples are connected with a line.
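Taking the centre of the largest detected patch as the robot position can be sketched as below. The sketch assumes that a binary mask of marker-coloured pixels is already available (for example from colour thresholding) and uses a plain 4-connected flood fill. It only illustrates the idea and is not the image-tracking implementation used in the experiments.

```python
from collections import deque

def largest_patch_centre(mask):
    """Return the centroid (u, v) of the largest 4-connected patch in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            patch, queue = [], deque([(r, c)])
            while queue:                              # breadth-first flood fill
                i, j = queue.popleft()
                patch.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < rows and 0 <= b < cols
                            and mask[a][b] and not seen[a][b]):
                        seen[a][b] = True
                        queue.append((a, b))
            if len(patch) > len(best):
                best = patch
    if not best:
        return None                                   # marker not found
    u = sum(j for _, j in best) / len(best)           # column index, horizontal coordinate
    v = sum(i for i, _ in best) / len(best)           # row index, vertical coordinate
    return u, v

mask = [[1, 0, 0, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 0, 0]]
centre = largest_patch_centre(mask)
```

Choosing the largest patch makes the detection less sensitive to small, isolated groups of pixels with a similar colour.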

Trajectory-tracking on a real robot: (a) view from the camera with the detected robot and (b) trajectory-tracking results in the image frame.
We presented the design of a visual controller for trajectory-tracking of a mobile robot that is observed by an overhead camera that can be placed at arbitrary inclination with respect to the ground plane. An image-based visual servoing principle was used in the design of the control algorithm. The presented algorithm includes a Kalman filter for state estimation and a model predictive control for reference-tracking. Since the proposed control algorithm requires a special trajectory that is at least twice differentiable, we gave an extensive description for the use of Bernstein-Bézier curves to tackle the trajectory design problem. The presented control algorithm was designed in the discrete-time space.
The control algorithm was derived assuming the homography
Another important parameter of the control algorithm is the length of the prediction horizon h of the model predictive control. The evaluation of reference-tracking performance with respect to the length of the prediction horizon (Figure 4 and Table 2) shows that good tracking performance is not achieved when the horizon is only of unit length. When we increase the length of the prediction horizon the tracking performance improves, but too long a horizon may smooth the sharp turns of the trajectory more than is desirable. A longer horizon also increases the high-frequency gain of the control (Figure 4(d)). In our case, a prediction horizon of length five seems to give a good trade-off between tracking performance and the control gain (Table 2).
The measured position of the robot in the image, obtained by the image-tracking algorithm, is inherently corrupted with noise. The experimental results in Figure 5 and Table 3 show the influence of the noisy measurements on the trajectory-tracking performance, and confirm that the controller is robust to noise.
The simulation experiments as well as experiments made on the real system (Figure 6) confirm that the presented control approach is suitable for solving the trajectory-tracking task. The results prove the applicability of the presented control design, even in the case of non-ideal conditions.
