Abstract
This paper presents an analysis of planar bearing localization and mapping for visual servoing with known camera velocities. In particular, we investigate which subset of camera locations and environmental features can be retrieved from dynamic observations obtained by a planar bearing sensor (e.g., a pinhole camera). Results assume that the camera's linear and angular velocities are available, which is equivalent to considering a unicycle vehicle carrying an onboard camera; they continue to hold if additional system inputs are considered, e.g., for an omnidirectional vehicle. The theoretical results may guide the design of nonlinear observers that estimate the variables of interest in real time for use in visual servoing schemes. An example of such an observer is discussed and simulated.
Keywords
1. Introduction
Vision systems are versatile, powerful, and cheap, providing a minimal sensing framework for fundamental robotic problems such as localization, environment mapping, and motion control. One measurement that can be collected quite accurately from a vision system is the horizontal bearing. This paper aims at an analytical description of the information, i.e., robot locations (localization problem) and environment landmark positions (mapping problem), that can be inferred from landmarks observed through planar bearings.
It is well known that the observability of localization and landmark positions, a problem known as Simultaneous Localization and Mapping (SLAM), is guaranteed when using stereo cameras [1]. With a known configuration of the stereo pair, observability is preserved even in the static case [2]. This is mainly due to the stereo camera's capability of providing more than mere scene appearance by capturing three-dimensional images, undoubtedly more informative than those grabbed by monocular cameras. However, this larger amount of information comes at the cost of increased system complexity: fine calibration of the stereo pair as well as complex image processing algorithms are unavoidable.
To bound the system complexity, we aim at determining the minimal amount of information needed by a vision system to solve the localization and mapping problem. In particular, we analyze images coming from a monocular camera fixed on a robot chassis to retrieve planar bearing measurements, and then study the system observability (the knowledge of the system inputs is taken for granted). For vision problems, observability is the first and main question to investigate in order to verify under which conditions visual servoing is possible. In this respect, some results have been presented in the literature in which observability is obtained by design in a monocular SLAM-for-servoing approach, as in [3, 4]. This paper follows a different direction by analyzing in detail the observability and mapping problems as a function of the knowledge of the positions of the observed features.
While the observability question can be investigated using system-theoretic tools [5], a specific approach for vision problems has been presented only recently. The first observability analysis of the monocular SLAM problem using planar bearing measurements was discussed in [6, 7], where landmark positions are considered known. Among others, characterizations of observability have been presented in [8] for bearing-only measurements with unknown landmark motions, in [9] for multi-robot localization, and in [10] for on-line parameter identification and odometry self-calibration. In [11], only one landmark is used for localization, assuming that the vehicle orientation w.r.t. a fixed reference frame is available.
The knowledge of the input signals is not necessary for localization if structure-from-motion (SFM) [12] techniques are adopted. In such a case, the camera trajectory in space is reconstructed from a series of images. Building a map using SFM is time-consuming and hence usually carried out off-line, while localization with SFM is faster if the map has been built beforehand. An interesting alternative, called Visual Odometry, has been proposed by Nistér [13, 14], where motion estimation is performed by tracking selected landmarks. This way, the motion of stereo or monocular cameras can be computed in real time using only visual data.
In practice, the main difference between visual odometry and SFM is that the latter was originally conceived as an off-line algorithm. Implementation differences aside, from a theoretical point of view monocular visual odometry and monocular Visual SLAM [15] can both be seen as particular solutions of SFM. One drawback of SFM, and of visual odometry as well, is the strong assumptions on the environment and on the camera motion: both assume a rigid scene and a constant camera velocity along the trajectory. Preliminary results that overcome these assumptions have been presented in [16], where an unknown-input observability analysis is proposed for measurements taken from 3 known landmarks, without any other information.
In this paper, a detailed and complete analysis of the localization and mapping observability problem with planar bearings is presented, following the same methodology as [16] and assuming general configurations of the observed landmarks with both known and unknown positions. Configurations that are not observable are decomposed into Kalman form, in order to give a clear picture of the observable and unobservable spaces. To the best of the authors' knowledge, this is the first planar bearing SLAM analysis that takes into account all the different aspects of the problem. Theoretical results are verified via simulation by adapting the nonlinear observer presented in [16]. A remarkable difference with respect to [16] is the relaxed assumption on the knowledge of the camera velocities: results apply whenever at least the camera's linear and angular velocities are available, which is equivalent to considering a unicycle vehicle carrying an onboard camera. All results still hold if additional system inputs are available, e.g., for an omnidirectional vehicle. The presented results are applicable to a range of problems, in particular to visual servoing.
2. Problem definition
2.1. System Dynamics
Consider a vehicle whose configuration is denoted by
The observability problem under analysis is considered in different configurations regarding the number of known and unknown landmarks being observed. The system state variable of the problem at hand comprises the vehicle configuration and the unknown positions of the N targets
We consider vehicles equipped with a sensor head measuring the angles, in the horizontal plane, between the line joining each landmark with the head position and the forward direction of the vehicle (see Fig. 1). A vision system equipped with a simple point-feature detection and tracking algorithm clearly falls into this category. The measurement process is modeled by equations of the form

where
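In code, a bearing measurement of this kind can be sketched as follows. This is a minimal model, not the paper's exact equations: the pose convention (x, y, θ) and landmark coordinates (x_l, y_l) are our own illustrative assumptions.

```python
import math

def wrap(a):
    # Wrap an angle to (-pi, pi]
    return math.atan2(math.sin(a), math.cos(a))

def bearing(pose, landmark):
    # Angle, in the horizontal plane, between the vehicle's forward
    # direction and the line joining the sensor head to the landmark
    x, y, theta = pose
    xl, yl = landmark
    return wrap(math.atan2(yl - y, xl - x) - theta)

print(bearing((0.0, 0.0, 0.0), (1.0, 1.0)))  # pi/4: landmark 45 deg to the left
```

A pinhole camera with a point-feature tracker provides exactly this quantity (up to calibration) for each tracked landmark.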
2.2. System Observability
Let us consider a generic continuous time-invariant control affine system
Let
the system observability codistribution
In the rest of the paper we will refer to
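While the codistributions in this paper are derived symbolically, the rank condition can also be checked numerically. The sketch below does so for a unicycle observing a single bearing, with gradients taken by central differences; the landmark position, state values, and all names are illustrative assumptions, not the paper's notation.

```python
import math

def bearing(s, lm):
    # s = (x, y, theta); planar bearing to a landmark at lm = (xl, yl)
    return math.atan2(lm[1] - s[1], lm[0] - s[0]) - s[2]

def grad(f, s, eps=1e-5):
    # Central-difference gradient of a scalar function of the state
    g = []
    for i in range(len(s)):
        sp, sm = list(s), list(s)
        sp[i] += eps
        sm[i] -= eps
        g.append((f(sp) - f(sm)) / (2 * eps))
    return g

def lie(f, vf):
    # Lie derivative of the scalar f along the vector field vf
    return lambda s: sum(gi * vi for gi, vi in zip(grad(f, s), vf(s)))

def g1(s):  # input vector field for the linear velocity (unicycle)
    return [math.cos(s[2]), math.sin(s[2]), 0.0]

def g2(s):  # input vector field for the angular velocity
    return [0.0, 0.0, 1.0]

def rank(rows, tol=1e-3):
    # Numerical rank by Gaussian elimination with partial pivoting
    m = [r[:] for r in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        piv = max(range(r, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) < tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r:
                k = m[i][c] / m[r][c]
                m[i] = [a - k * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

lm = (2.0, 1.0)                      # one landmark with known position
s0 = (0.3, -0.2, 0.7)                # a generic vehicle configuration
h = lambda s: bearing(s, lm)
rows = [grad(h, s0),                 # zeroth level: dh
        grad(lie(h, g1), s0),        # first level:  d(L_g1 h)
        grad(lie(h, g2), s0)]        # first level:  d(L_g2 h)
print(rank(rows))  # 2: with a single bearing a 1-dim direction stays unobservable
```

The rank obtained this way at a generic state gives the dimension of the locally observable subspace at that point, matching the role of the observability codistribution described above.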
2.3. Local Decomposition
If a control affine system is not observable in the sense of rank condition [17], there exists a coordinate mapping
where the observable state, i.e., the one that satisfies the rank condition, is given by
3. Planar bearing SLAM observability
In this section the planar bearing SLAM observability problem, assuming knowledge of the control inputs, is discussed. The results reported here extend those in [6] by detailing all possible cases, from 3 + N markers to 3 + N targets (i.e., from all landmark positions known to all unknown), thus including the unobservable cases and the related Kalman form decompositions.
3.1. Codistribution form
A generic form for the observability codistribution of the systems under investigation is
In all cases, the rank of the observability codistribution reaches its maximum within the first level of Lie differentiation.
3.2. Observability Analysis
Each feature configuration is now analyzed separately. Before going into details, recall that the state space of a vehicle moving on a plane has dimension 3, while each landmark adds 2 variables on the plane of motion.
Case A: 3 or more markers: The observability codistribution rank is equal to 3 for
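As an illustrative numerical check (the pose and marker positions below are our own choices, not the paper's examples), the gradients of three bearing outputs alone already span a rank-3 codistribution away from singular configurations, while a configuration with the vehicle collinear with the markers is singular:

```python
import math

def dh(pose, lm):
    # Gradient w.r.t. (x, y, theta) of the bearing h = atan2(yl - y, xl - x) - theta
    x, y, _ = pose
    a, b = lm[0] - x, lm[1] - y
    r2 = a * a + b * b
    return [b / r2, -a / r2, -1.0]

def det3(m):
    # Determinant of a 3x3 matrix given as a list of rows
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

pose = (0.5, -0.4, 0.3)
markers = [(2.0, 0.0), (0.0, 3.0), (-1.0, -2.0)]
generic = det3([dh(pose, m) for m in markers])     # nonzero: rank 3
collinear = det3([dh((0.0, 0.0, 0.0), m) for m in [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]])
print(abs(generic) > 1e-9, abs(collinear) < 1e-12)  # True True
```

A nonzero determinant of the stacked gradients means the three bearings alone pin down the full vehicle configuration, consistent with classical resection from three known landmarks.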
Case B: 2 markers: After 1 level of Lie differentiation, the observability codistribution rank reaches its maximum of 3, apart from configuration singularities, and the system is completely locally weakly observable. The problem is not statically invertible; state reconstruction is only possible under vehicle motion.
Case C: 1 and a half markers and half target: For this case, the output function is given by the measurements from two landmarks: one landmark position is completely known (a marker); the other is only partially known (half marker, half target), i.e., only one of its 2 plane coordinates is assumed to be known. Without loss of generality, we will assume that the coordinate
Case D: 1 marker: After 1 level of Lie differentiation, the observability codistribution rank reaches its maximum of 2, apart from configuration singularities. Hence, ξ is not fully observable and the unobservable space dimension is 1. From geometric analysis, the unobservable space is a circle centered at the marker.
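This geometric claim is easy to verify directly: rotating the vehicle position about the marker by any angle, while adding the same angle to the heading, leaves the bearing unchanged. A minimal sketch with illustrative numbers:

```python
import math

def bearing(pose, lm):
    x, y, th = pose
    return math.atan2(lm[1] - y, lm[0] - x) - th

def rotate_about(pose, center, phi):
    # Rotate the vehicle position about `center` by phi, adding phi to the heading
    x, y, th = pose
    dx, dy = x - center[0], y - center[1]
    c, s = math.cos(phi), math.sin(phi)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy, th + phi)

marker = (2.0, 1.0)
pose = (0.3, -0.2, 0.7)
moved = rotate_about(pose, marker, 0.9)
print(bearing(pose, marker), bearing(moved, marker))  # identical bearings
```

Motion along this circle therefore produces no change in the output, which is exactly the 1-dimensional unobservable direction.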
Kalman form decomposition
Consider a reference frame

Reference frame ⟨P⟩ with axes parallel to ⟨W⟩ and vehicle configuration represented using polar coordinates
where ρ represents the Cartesian distance from the vehicle to the point
We are now in a position to decouple observable and unobservable subsystems. Indeed, under such coordinate transformation, the system output becomes
Case E: Half marker and half target: We are now interested in a robot whose output measurement comes from a single landmark whose position is only partially known. Without loss of generality, the coordinate
Kalman form decomposition
With reference to the reference frame
Using the new set of coordinates, after 1 level of Lie differentiation, the observability codistribution for ζ is
Case F: 1 target: After 1 level of Lie differentiation, the observability codistribution rank reaches its maximum of 2, apart from configuration singularities. Hence, ξ is not completely observable and the unobservable space dimension is 3.
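The observable part of this case, i.e., the target position relative to the robot, can be illustrated with a simple two-view triangulation under known motion (this is a hedged geometric sketch, not the observer of [16]; the constant unicycle inputs and noiseless bearings are our own illustrative assumptions):

```python
import math

def unicycle_arc(v, w, t):
    # Closed-form displacement of a unicycle (from the identity pose)
    # under constant linear velocity v and angular velocity w != 0
    return (v / w * math.sin(w * t), v / w * (1.0 - math.cos(w * t)), w * t)

def bearing(pose, lm):
    x, y, th = pose
    return math.atan2(lm[1] - y, lm[0] - x) - th

def triangulate(p1, b1, p2, b2):
    # Intersect the two bearing rays p_i + t_i * (cos(th_i + b_i), sin(th_i + b_i))
    a1, a2 = p1[2] + b1, p2[2] + b2
    u1 = (math.cos(a1), math.sin(a1))
    u2 = (math.cos(a2), math.sin(a2))
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    # Solve t1 * u1 - t2 * u2 = (rx, ry) by Cramer's rule
    den = -u1[0] * u2[1] + u1[1] * u2[0]
    t1 = (-rx * u2[1] + ry * u2[0]) / den
    return (p1[0] + t1 * u1[0], p1[1] + t1 * u1[1])

target = (2.0, 1.0)                # true target, in the frame of the first pose
p1 = (0.0, 0.0, 0.0)               # the first pose anchors the local frame
p2 = unicycle_arc(1.0, 0.5, 1.0)   # known motion between the two measurements
est = triangulate(p1, bearing(p1, target), p2, bearing(p2, target))
print(est)  # recovers the target at (2.0, 1.0) up to floating-point error
```

Note that the reconstruction is only relative to the first pose: the global vehicle pose (3 dimensions) remains undetermined, consistent with the unobservable space dimension above.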
Kalman form decomposition
With reference to Fig. 2, consider again the frame
Using the new set of coordinates, after 1 level of Lie differentiation, the observability codistribution for ζ is
Case G: 2 targets: After 1 level of Lie differentiation, the observability codistribution rank reaches its maximum of 4, apart from singularities. Hence, ξ is not fully observable, with an unobservable subspace of dimension 3 (the state has dimension 7: 3 for the vehicle plus 2 per target).
Kalman form decomposition
Consider a reference frame

Reference frame ⟨P⟩ of the 2-target problem
Consider the coordinate transformation
Φ is not a global diffeomorphism, since it is not defined when the robot is at the feature position
The null space of Ω is
3.3. Extension of results
Results presented in this section are extended to any number of targets. Let
Proposition 1: Consider a system
Proof: Given a generic observability codistribution
Observability analysis summary: M – number of markers; N – number of targets; K – minimum level of Lie differentiation required to cover the observable space; n – system dimension.
Corollary 1: If
Table I presents an overview of the results obtained in this section for any number of targets.
4. Results
Theoretical results were evaluated by simulations, implementing the nonlinear observer described in [16] to reconstruct the observable space in the cases analyzed in Section 3. Simulation results for arbitrary configurations are summarized in Fig. 4. Notice that the nonlinear observer converges in all cases, i.e., it always succeeds in reconstructing the observable space.

Observed state errors
In particular, when only one landmark is being observed (Cases D, E, and F), the observable subsystem is
When 2 landmarks
4.1. Visual Servoing
In this section we validate the use of the nonlinear observer described in [16] in a Position-Based Visual Servoing scheme (see Fig. 5) for the case of three measured markers (Case A of Section 3.2). The controller used is the Visual Servoing with Omnidirectional Sight presented in [18].

Position-Based Visual Servoing (PBVS) scheme
The desired configuration of the robot is taken to be coincident with the origin of the world frame ⟨W⟩. The initial vehicle configuration is
Results can be seen in Fig. 6. The simulation clearly shows that the pose regulation is successfully achieved.

Visual Servoing Results: Observer error
5. Conclusions and future work
In this paper we have presented a complete observability analysis of the planar bearing-only localization and mapping problem for all configurations of landmarks with known positions (markers) and unknown positions (targets). Theoretical results are supported by simulations.
Future work will concentrate mainly on the singularity analysis and on observability without input knowledge.
Footnotes
6. Acknowledgements
The research leading to these results has received funding from the European Union Seventh Framework Programme [FP7/2007-2013] under grant agreement n. 257462, HYCON2 Network of Excellence.
