Abstract

The development of vision-based navigation systems for mobile robotics applications in outdoor scenarios is a very challenging problem due to frequent changes in contrast and illumination, image blur, pixel noise, lack of image texture, low image overlap and other effects that lead to ambiguity in the interpretation of motion from image data. To mitigate the problems arising from multiple possible interpretations of the data in outdoor stereo egomotion, we present a fully probabilistic method denoted as probabilistic stereo egomotion transform. Our method is capable of computing 6-degree-of-freedom motion parameters solely based on probabilistic correspondences, without the need to track or commit to key point matches between two consecutive frames. The use of probabilistic correspondence methods allows several match hypotheses to be maintained for each point, which is an advantage when ambiguous matches occur (which is the rule in image feature correspondence problems), because no commitment is made before analysing all image information. Experimental validation is performed in simulated and real outdoor scenarios in the presence of image noise and image blur. A comparison with other current state-of-the-art visual motion estimation methods is also provided. Our method achieves a significant reduction of estimation errors, mainly in harsh conditions of noise and blur.

Keywords

Stereo vision, egomotion, visual odometry

Introduction

In this article, we focus on the inference of robot self-motion (egomotion) based on visual observations of the environment. Although egomotion can be estimated without visual information using sensors such as inertial measurement units (IMUs) or global positioning systems (GPSs), the use of visual information plays an important role, especially in IMU/GPS-denied environments, for example, crowded urban areas, or in other environments with challenging imaging conditions such as aerial and underwater scenarios. In Figure 1, we present some examples of mobile robotic platforms equipped with vision sensors, spanning applications on land, at sea, in the air and underwater (courtesy of INESC TEC).

Figure 1.

INESC-TEC mobile robotics platforms on land, sea and air application scenarios. All robotic platforms are equipped with one or more visual sensors to perform visual navigation or other complementary tasks.

Egomotion estimation from outdoor imagery is extremely challenging due to multiple factors that generate blur, ambiguities and low signal-to-noise ratio in images. In land robots, camera vibration produces significant motion blur. In sea and underwater robots, repetitive image patterns and low texture generate serious matching ambiguities. In all cases, low lighting conditions, shadows and other illumination artefacts lead to unfavourable signal-to-noise ratios. It is thus essential to develop robust algorithms capable of mitigating some of the aforementioned effects.

This article is an extension of the work by Silva et al.,¹ where we introduced the probabilistic stereo egomotion transform (PSET), a fully probabilistic algorithm for the computation of image motion from stereo vision systems that provides better estimates than alternative approaches. This article provides a deeper explanation, analysis and performance evaluation of PSET. In particular, it focuses on the advantages of PSET in images with severe amounts of noise and blur, which often characterize outdoor operating conditions.

The article outline is as follows: in the following section, the related work is presented. We then give a brief introduction to the probabilistic egomotion estimation problem and an outline of the rationale of the method. Afterwards, we present in detail the steps of the probabilistic stereo egomotion approach and the results obtained on both synthetic and real image data sets, with emphasis on the results obtained under extreme image conditions (presence of image noise and blur). In the final section, we present the conclusions and future work.

Related work

In robotics applications, egomotion estimation is directly linked to visual odometry (VO) applications as described by Scaramuzza.² The use of VO methods for estimating robot motion has been a subject of research by the robotics community in recent years. One way of performing VO is by determining instantaneous camera displacement on consecutive frames and integrating over time the estimated linear and angular velocities. The need to develop such applications arose from the increased use of mobile robots in real-world tasks across different application scenarios. Robots need to extend their perception capabilities to be able to navigate in complex scenarios where typical inertial navigation system information cannot be used, for example, urban areas or underwater GPS-denied environments.

Visual motion perception is achieved by measuring image point displacement on consecutive frames. In monocular egomotion estimation, there is translation scale ambiguity, that is, in the absence of other sources of information, only the linear velocity direction can be measured in a reliable manner. Whenever a calibrated stereo setup is used, the full angular and translational velocity components can be extracted, which is denoted by stereo VO.

Much of the work on stereo VO methods was started by Maimone et al.³ and Maimone et al.⁴ on the famous Mars Rover Project. The proposed method was able to determine all 6 degrees of freedom (DOF) of the rover (x, y, z, roll, pitch and yaw) by tracking 2D image key points between stereo image pairs and obtaining their 3D coordinates by triangulation. Concerning the way image motion information is obtained, the method employs a key point detector based on a corner detector^5,6 combined with a grid scheme to sample key points over the image. After the 3D point positions are triangulated using stereo correspondence, a fixed number of points is used within a RANSAC⁷ framework to obtain an initial motion estimate using least squares. Subsequently, a maximum likelihood estimation (batch estimation) procedure uses the rotation matrix and translation vector obtained by least squares as well as the ‘inlier’ points to produce a more accurate motion estimate.

The stereo VO method implemented in the Mars Rover Project was inspired by Olson et al.⁸ At the time, VO methods appeared as replacements for wheel odometry dead reckoning methods to overcome long distance limitations. To avoid large drift in robot position over time, Olson's method combined a primitive form of the stereo egomotion estimation procedure, also used by Maimone et al.,³ with absolute orientation (AO) sensor information.

The taxonomy adopted by the robotics and computer vision community classifies stereo VO methods into two categories based on either feature detection scheme or pose estimation procedure. The most utilized methods for pose estimation are 3D AO methods and perspective-n-point (PnP) methods.

The AO method consists of triangulating 3D points for every stereo pair and then solving the motion estimation using point alignment algorithms, for example, the Procrustes method,⁹ the AO using unit quaternions method by Horn,¹⁰ the iterative-closest-point method¹¹ or the one utilized by Milella and Siegwart¹² for estimating motion of an all-terrain rover.

In the study by Alismail et al.,¹³ a benchmark study is performed to evaluate both AO and PnP techniques for robot pose estimation using stereo VO methods. The authors concluded that PnP methods perform better than AO methods due to stereo triangulation uncertainty, especially in the presence of small stereo rig baselines.

The influential work of Nister et al.¹⁴ was one of the first PnP method implementations. It utilized the perspective-three-point method (P3P) developed by Haralick et al.¹⁵ combined with an outlier rejection scheme (RANSAC). Despite having instantaneous 3D information from a stereo camera setup, the authors use a P3P method instead of a more easily implementable AO method. The authors concluded that P3P pose estimation deals better with depth estimation ambiguity, which corroborates the conclusions drawn by Alismail et al.¹³

In a similar line of work, and in order to avoid a strong dependency on feature matching and tracking algorithms, Kai and Dellaert¹⁶ tested both three-point and one-point stereo VO implementations using a quadrifocal setting within a RANSAC framework. Later on, Ni et al.¹⁷ decoupled the rotation and translation estimation into two different estimation problems. The method starts with the computation of a stereo putative matching, followed by a classification of features based on their disparity. Afterwards, distant points are used to compute the rotation using a two-point RANSAC method. The underlying idea is to reduce the problem of the rotation estimation to the monocular case. The closer points, with a disparity above a given threshold, are used together with the estimated rotation to compute the translation using a one-point RANSAC method.

Recent efforts on stereo VO are being driven by novel intelligent vehicles and by automotive industry applications. One example is the work developed by Kitt et al.¹⁸ The proposed method is available as an open-source VO library named LIBVISO. The stereo egomotion estimation approach is based on image triples and online estimation of the trifocal tensor.¹⁹ It uses rectified stereo image sequences and outputs a 6D vector with linear and angular velocity estimation using an iterative extended Kalman filter. Comport et al.²⁰ also developed a stereo VO method based on the quadrifocal tensor.¹⁹

Other recent developments on VO have been achieved by the extensive research conducted at the Autonomous System Laboratory of ETH Zurich University.^21–25 The work developed by Scaramuzza and Fraundorfer²⁶ and Scaramuzza et al.²¹ takes advantage of motion constraints (planar motion) to reduce model complexity and allow a much faster estimation. Also, since the camera is installed on a non-holonomic wheeled vehicle, motion complexity can be further reduced to a single-point correspondence. More recently, the work of Kneip et al.²⁷ introduced a novel parameterization for the P3P problem. The method differs from standard algebraic solutions for the P3P estimation problem¹⁵ by computing the aligning transformation directly in a single stage without the intermediate derivation of the points in the camera frame. This pose estimation method combined with key point detectors^28–30 and with IMU information was used to estimate monocular VO²² and stereo VO by Voigt et al.²³ In the study by He et al.,³¹ a visual-inertial egomotion estimation method is used to estimate an arbitrary body motion in an indoor environment. Vision is used to estimate the camera motion from a sequence of feature correspondences using bundle adjustment, while the inertial estimation outputs the orientation using an adaptive-gain orientation filter.

Most of the previously mentioned state-of-the-art algorithms use deterministic methods to find matches between images and then compute the motion. Our approach, on the contrary, does not define the correspondence at an early stage but keeps multiple correspondence hypotheses that together contribute to a more accurate egomotion estimate, especially when non-ideal imaging conditions produce many ambiguous and unreliable correspondences.

Probabilistic monocular egomotion estimation

The seminal work of Domke and Aloimonos³² has introduced the notion of probabilistic correspondence in the context of the single camera egomotion estimation problem. The authors introduced the term probabilistic (which is actually a belief) to code the distance between Gabor filters using an exponential transformation. In this setting, it is possible to compute the angular velocity of the vehicle and the direction of the linear velocity (5-DOF) overall, but it is not possible to determine the amplitude (scale) of the linear velocity.

In this section, we briefly describe Domke and Aloimonos’³² approach and introduce the notation required for the remaining sections.

Probabilistic correspondence

Given two images taken at different times, I_k and I_k+1, the probabilistic correspondence between a point $s \in R^{2}$ in image I_k and point $q \in R^{2}$ in image I_k+1 is defined as a belief image

ρ_{s} (q) = match (s, q | I_{k}, I_{k + 1})

The belief image $ρ_{s} (q)$ contains in each pixel q a value between 0 and 1 expressing similarity of appearance between local neighbourhoods around s in I_k and q in I_k+1. In the study by Domke and Aloimonos,³² the match function was implemented by the correlation of a Gabor filter bank response in the two points. In our work, we use the zero-mean normalized cross-correlation function (ZNCC)

ZNCC (s, q) = \frac{\sum_{δ \in W} [I_{k} (s + δ) - {\bar{I}}_{k}] [I_{k + 1} (q + δ) - {\bar{I}}_{k + 1}]}{\sqrt{\sum_{δ \in W} {[I_{k} (s + δ) - {\bar{I}}_{k}]}^{2}} \sqrt{\sum_{δ \in W} {[I_{k + 1} (q + δ) - {\bar{I}}_{k + 1}]}^{2}}}

where $W \subset R^{2}$ denotes a 2D window centred at the origin whose size defines the neighbourhood of analysis around points s and q, and ${\bar{I}}_{k}$ and ${\bar{I}}_{k + 1}$ are the mean values of those patches. In practice, we use a fast recursive implementation of the ZNCC developed by Huang et al.³³ The probabilistic correspondence is then computed as

ρ_{s} (q) = \frac{ZNCC (s, q) + 1}{2}

so that it maps to the range 0–1.
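As an illustration of equations (2) and (3), the following minimal sketch computes the ZNCC between two square patches and maps it to a belief value. It is a direct, non-recursive evaluation rather than the fast recursive implementation cited above; the function names `zncc` and `belief`, the `half` window parameter and the handling of flat patches are assumptions made for this example.

```python
import numpy as np

def zncc(img_k, img_k1, s, q, half=3):
    """ZNCC between the (2*half+1)^2 windows centred at s in img_k and q in img_k1.

    s and q are (row, col) integer pixel coordinates; both windows are assumed
    to lie fully inside their images.
    """
    (sr, sc), (qr, qc) = s, q
    a = img_k[sr - half:sr + half + 1, sc - half:sc + half + 1].astype(float)
    b = img_k1[qr - half:qr + half + 1, qc - half:qc + half + 1].astype(float)
    a = a - a.mean()                      # subtract the patch means
    b = b - b.mean()
    denom = np.sqrt((a * a).sum()) * np.sqrt((b * b).sum())
    if denom == 0.0:                      # flat patch: treated as uncorrelated (assumption)
        return 0.0
    return float((a * b).sum() / denom)

def belief(img_k, img_k1, s, q, half=3):
    """Probabilistic correspondence rho_s(q) = (ZNCC + 1) / 2, in the range 0 to 1."""
    return 0.5 * (zncc(img_k, img_k1, s, q, half) + 1.0)
```

Evaluating `belief` for every candidate q in the second image yields the belief image of a point s.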

Probabilistic motion

Motion hypotheses are defined as a set of incremental rotation matrices R and translation directions $\hat{t} = \frac{t}{∥ t ∥}$ . The likelihood of a particular motion hypothesis $(R, \hat{t})$ is evaluated by analysing the probabilistic correspondences $ρ_{s} (q)$ along epipolar lines.³² A correspondence q for point s must satisfy the epipolar constraint denoted by

{\tilde{s}}^{T} E \tilde{q} = 0

where $\tilde{s}$ and $\tilde{q}$ are homogeneous representations of s and q, respectively, and E is the essential matrix,¹⁹ a 3 × 3 matrix of rank 2 with 5 DOF that encodes rigid camera motion

E = R {[\hat{t}]}_{\times}

where ${[\hat{t}]}_{\times}$ is the skew symmetric matrix

{[\hat{t}]}_{\times} = [\begin{matrix} 0 & - {\hat{t}}_{z} & {\hat{t}}_{y} \\ {\hat{t}}_{z} & 0 & - {\hat{t}}_{x} \\ - {\hat{t}}_{y} & {\hat{t}}_{x} & 0 \end{matrix}]
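A small sketch of equations (4) to (6) may help to fix the notation. It builds the skew-symmetric matrix, composes the essential matrix in the order written above and evaluates the algebraic epipolar residual for points in normalized coordinates. The function names are hypothetical and chosen only for this example.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x, such that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz,  ty],
                     [ tz, 0.0, -tx],
                     [-ty,  tx, 0.0]])

def essential(R, t_hat):
    """Essential matrix E = R [t_hat]_x, following the ordering used in the text."""
    return R @ skew(t_hat)

def epipolar_residual(E, s, q):
    """Algebraic residual s~^T E q~ for image points s and q in normalized coordinates."""
    s_h = np.array([s[0], s[1], 1.0])
    q_h = np.array([q[0], q[1], 1.0])
    return float(s_h @ E @ q_h)
```

For a unit translation direction, the resulting matrix has two equal non-zero singular values and one zero singular value, which is the rank-2 property mentioned above.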

In order to obtain an estimate of the essential matrix (E) from the probabilistic correspondences, Domke and Aloimonos³² propose a maximum likelihood search on a probability distribution over the 5D space of essential matrices. Initially, likelihood values are measured on a grid where each dimension is divided into 10 bins, thus leading to 10⁵ hypotheses E_i.

For each point s in image I_k, the likelihood of a motion hypothesis (E_i) is proportional to the belief of the best probabilistic correspondence along the epipolar constraint in I_k+1, generated by the essential matrix

ρ (E_{i} | s) \propto max_{{(\tilde{q})}^{T} E_{i} \tilde{s} = 0} ρ_{s} (q)

If one assumes statistical independence between the measurements obtained at each point s, the overall likelihood of a motion hypothesis is proportional to the product of the likelihoods for all points

ρ (E_{i}) \propto \prod_{s} ρ (E_{i} | s)

In Figure 2, an illustration of these steps is presented.

Figure 2.

Left: a point s in image I_k generates epipolar lines e_i in image I_k+1 corresponding to motion hypotheses represented by the essential matrices E_i, see equation (4). Centre: at each point s, motion hypotheses E_i are evaluated by computing the highest probabilistic correspondence at points q_i along epipolar line e_i, see equation (7). Right: the overall motion likelihood is computed by collecting the information from all considered points, see equation (8).
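The evaluation of one motion hypothesis, equations (7) and (8), can be sketched as follows. The sketch assumes that the epipolar line of each point has already been converted to pixel coordinates in the form a·x + b·y + c = 0 (x is the column, y is the row, and a and b are not both zero); the tolerance `tol`, the small constant `eps` and the function names are assumptions of this example. The product in equation (8) is accumulated as a sum of logarithms to avoid numerical underflow.

```python
import numpy as np

def best_belief_on_line(rho, line, tol=0.5):
    """Maximum of the belief image rho over pixels within tol pixels of the line.

    Returns 0.0 when the line does not cross the image.
    """
    a, b, c = line
    rows, cols = np.indices(rho.shape)
    dist = np.abs(a * cols + b * rows + c) / np.hypot(a, b)
    mask = dist <= tol
    return float(rho[mask].max()) if mask.any() else 0.0

def log_likelihood(belief_images, lines, eps=1e-12):
    """Sum over points of the log of the best belief along each point's epipolar line."""
    return sum(np.log(best_belief_on_line(rho, ln) + eps)
               for rho, ln in zip(belief_images, lines))
```

Ranking the hypotheses E_i by this value gives the same ordering as ranking them by the product of the per-point likelihoods.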

Finally, once all the motion hypotheses have been evaluated, an optimization method³⁴ is used to refine the motion estimate around the highest scoring samples E_i. The Nelder–Mead simplex method is a local search method for problems whose derivatives are not known. The method was already applied by Domke and Aloimonos³² to search for the local maxima of likelihood around the top-ranked motion hypotheses

E_{i}^{*} = arg max_{E_{i} + δ E} ρ (E_{i} + δ E)

where δE denotes perturbations to the initial solution E_i computed by the Nelder–Mead optimization procedure.

Then, the output of the algorithm is the solution with the highest likelihood, defined by

E^{*} = {argmax}_{E_{i}^{*}} ρ (E_{i}^{*})

Probabilistic stereo egomotion estimation

Now we extend the notion of probabilistic correspondence and probabilistic egomotion estimation to the stereo case. This allows us to compute the whole 3D motion information in a probabilistic way. Let us consider images $I_{k}^{L}$, $I_{k + 1}^{L}$, $I_{k}^{R}$ and $I_{k + 1}^{R}$, where superscripts L and R denote, respectively, the left and right images of the stereo pair. Probabilistic matches of a point s in $I_{k}^{L}$ are now computed not only for points q in $I_{k + 1}^{L}$ but also for points r in $I_{k}^{R}$ and p in $I_{k + 1}^{R}$ (see Figures 3 and 4)

ρ_{s} (r) = \frac{ZNCC (s, r) + 1}{2}, \quad ρ_{s} (p) = \frac{ZNCC (s, p) + 1}{2}

Figure 3.

ZNCC matching used to compute the PSET. ZNCC: zero-mean normalized cross-correlation function; PSET: probabilistic stereo egomotion transform.

Figure 4.

Example of probabilistic correspondence images ($ρ_{s} (r)$, $ρ_{s} (q)$ and $ρ_{s} (p)$) obtained by ZNCC matching of a given point s from $I_{k}^{L}$ in images $I_{k}^{R}$, $I_{k + 1}^{L}$ and $I_{k + 1}^{R}$, respectively. The $ρ_{s} (r)$ correspondence can be limited to a band, since the epipolar geometry is known by stereo calibration. ZNCC: zero-mean normalized cross-correlation function.

For the sake of computational efficiency, analysis can be limited to subregions of the images given prior knowledge about the geometry of the stereo system or bounds on the motion given by other sensors such as IMUs. In particular, for each point s, coordinates r can be limited to a band around the epipolar lines according to the fixed stereo setup epipolar geometry, as illustrated in Figure 3.

The geometry of stereo egomotion

In this section, we describe the geometry of the stereo egomotion problem, that is, we analyse how world points project onto the four images acquired from the stereo setup at two consecutive instants of time according to its motion. This analysis is required to derive the expressions to compute the translational motion amplitude.

Let us consider the 4 × 4 rototranslations $T_{L}^{R}$ and $M_{k}^{k + 1}$ that describe, respectively, the rigid transformation between the left and right cameras of the stereo setup and the transformation describing the motion of the left camera from time k to k + 1 as described by

T_{L}^{R} = [\begin{matrix} R_{calib} & t_{calib} \\ 0 & 1 \end{matrix}], \quad M_{k}^{k + 1} = [\begin{matrix} R & t \\ 0 & 1 \end{matrix}]

We factorize the translational motion t in its direction $\hat{t}$ and amplitude α

t = α \hat{t}

The rotational motion R and translation direction $\hat{t}$ can be computed as described by Silva et al.³⁵ The computation of α is thus the main objective at this stage.

Let us consider an arbitrary 3D point $X = {(X_{x}, X_{y}, X_{z})}^{T}$ expressed in the left camera reference frame at time k. Considering normalized intrinsic parameters (unit focal distance f = 1, zero central point $c_{x} = c_{y} = 0$ , no skew), the homogeneous coordinates of the projection of X in the four images are given by

\left\{ \begin{array}{l} \tilde{s} = X \\ \tilde{r} = R_{calib} X + t_{calib} \\ \tilde{q} = R X + α \hat{t} \\ \tilde{p} = R_{calib} R X + α R_{calib} \hat{t} + t_{calib} \end{array} \right.

To illustrate the solution, let us consider the particular case of parallel stereo. This will allow us to obtain the form of the solution with simple equations but does not compromise generality because the procedure to obtain the solution in the non-parallel case is analogous. In parallel stereo, the cameras are displaced laterally with no rotation. The rotation component is the 3 × 3 identity $(R_{calib} = I_{3 \times 3})$ and the translation vector is an offset (baseline b) along the x coordinate, $t_{calib} = {(b, 0, 0)}^{T}$ . Solving the first two equations, in coordinates, we obtain solutions for $s = {(s_{x}, s_{y})}^{T}$ and $r = {(r_{x}, r_{y})}^{T}$

s = {(\frac{X_{x}}{X_{z}}, \frac{X_{y}}{X_{z}})}^{T}, \quad r = {(\frac{X_{x} + b}{X_{z}}, \frac{X_{y}}{X_{z}})}^{T}

Introducing the disparity d as $d = r_{x} - s_{x}$ , we have $d = \frac{b}{X_{z}}$ and we can reconstruct the 3D coordinates of point X as a function of image coordinates r and s and baseline value b

X = {(\frac{s_{x} b}{d}, \frac{s_{y} b}{d}, \frac{b}{d})}^{T} = \tilde{s} \frac{b}{d}
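The parallel stereo reconstruction above can be written in a few lines. This is a minimal sketch assuming normalized image coordinates, as in the derivation; the function name `triangulate_parallel` is chosen for this example.

```python
import numpy as np

def triangulate_parallel(s, r, b):
    """Reconstruct X from the left point s, the right point r and the baseline b.

    Uses d = r_x - s_x and X = s~ * b / d. Returns the 3D point and the disparity.
    """
    d = r[0] - s[0]
    if d == 0.0:
        raise ValueError("zero disparity: the 3D point is at infinity")
    s_h = np.array([s[0], s[1], 1.0])     # homogeneous representation of s
    return s_h * b / d, d
```

For example, a point X = (0.4, -0.2, 2.0) seen with baseline b = 0.12 projects to s = (0.2, -0.1) and r = (0.26, -0.1), so that d = 0.06 and the reconstruction returns the original X.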

Replacing equation (16) in the last two equations of equation (14), we obtain

\left\{ \begin{array}{l} \tilde{q} = R \tilde{s} \frac{b}{d} + α \hat{t} \\ \tilde{p} = R \tilde{s} \frac{b}{d} + α \hat{t} + t_{calib} \end{array} \right.

Let us write R in its constituent rows $R = {[r_{1}, r_{2}, r_{3}]}^{T}$ and $\hat{t}$ in coordinates $\hat{t} = ({\hat{t}}_{x}, {\hat{t}}_{y}, {\hat{t}}_{z})$ . Computing the coordinates of $p = (p_{x}, p_{y})$ and $q = (q_{x}, q_{y})$ , we get

q_{x} = \frac{r_{1}^{T} \tilde{s} b + α {\hat{t}}_{x} d}{r_{3}^{T} \tilde{s} b + α {\hat{t}}_{z} d}

q_{y} = \frac{r_{2}^{T} \tilde{s} b + α {\hat{t}}_{y} d}{r_{3}^{T} \tilde{s} b + α {\hat{t}}_{z} d}

p_{x} = \frac{(r_{1}^{T} \tilde{s} + d) b + α {\hat{t}}_{x} d}{r_{3}^{T} \tilde{s} b + α {\hat{t}}_{z} d}

p_{y} = \frac{r_{2}^{T} \tilde{s} b + α {\hat{t}}_{y} d}{r_{3}^{T} \tilde{s} b + α {\hat{t}}_{z} d}

Solving for α each of the previous equations, we get four possible solutions

α^{(1)} = \frac{(r_{1}^{T} - q_{x} r_{3}^{T}) \tilde{s}}{q_{x} {\hat{t}}_{z} - {\hat{t}}_{x}} \frac{b}{d}

α^{(2)} = \frac{(r_{2}^{T} - q_{y} r_{3}^{T}) \tilde{s}}{q_{y} {\hat{t}}_{z} - {\hat{t}}_{y}} \frac{b}{d}

α^{(3)} = \frac{(r_{1}^{T} - p_{x} r_{3}^{T}) \tilde{s} + d}{p_{x} {\hat{t}}_{z} - {\hat{t}}_{x}} \frac{b}{d}

α^{(4)} = \frac{(r_{2}^{T} - p_{y} r_{3}^{T}) \tilde{s}}{p_{y} {\hat{t}}_{z} - {\hat{t}}_{y}} \frac{b}{d}

Solutions exist whenever disparity d is not null, that is, the corresponding 3D point is not at infinity. The other potential singular case is

\left\{ \begin{matrix} q_{x} {\hat{t}}_{z} - {\hat{t}}_{x} & = 0 \\ q_{y} {\hat{t}}_{z} - {\hat{t}}_{y} & = 0 \\ p_{x} {\hat{t}}_{z} - {\hat{t}}_{x} & = 0 \\ p_{y} {\hat{t}}_{z} - {\hat{t}}_{y} & = 0 \end{matrix} \right.

This corresponds to the case when q and p are simultaneously aligned with the translation direction. However, for finite fields of view, this only happens when q = p, which again corresponds to zero disparity. If, for a certain combination of points r, s, p, q, all denominators are small (due to very low disparity or close-to-degenerate motion), that combination is not used for the estimation. In our implementation, we empirically set a predetermined minimal value for the disparity d, $d \geq d_{min}$ . If all disparities are very small, then all observed points are very far away and it is impossible to determine the linear velocity scale factor. Conversely, because our method uses all available image information, if at least one point has enough disparity, we will have a solution for α. In practice, to prevent numerical errors, we choose the solution with the largest denominator.
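The selection among the four candidate solutions can be sketched as below for the parallel stereo case. Each candidate is obtained by solving one of the projection equations of q_x, q_y, p_x and p_y for α; the p_x candidate is re-derived here directly from its projection equation, so its numerator carries the additional term d contributed by the baseline offset. The function names and the thresholds `d_min` and `den_min` are assumptions of this example, not values from the article.

```python
import numpy as np

def alpha_candidates(R, t_hat, s, q, p, b, d):
    """Return (alpha, denominator) pairs from the q_x, q_y, p_x and p_y equations."""
    s_h = np.array([s[0], s[1], 1.0])
    r1, r2, r3 = R                                   # rows of the rotation matrix
    tx, ty, tz = t_hat
    out = []
    # (image coordinate, row of R, component of t_hat, extra numerator term)
    for coord, row, t_c, extra in ((q[0], r1, tx, 0.0), (q[1], r2, ty, 0.0),
                                   (p[0], r1, tx, d),   (p[1], r2, ty, 0.0)):
        den = coord * tz - t_c
        num = (row - coord * r3) @ s_h + extra
        out.append((num / den * b / d if den != 0.0 else np.nan, den))
    return out

def estimate_alpha(R, t_hat, s, q, p, b, d, d_min=1e-3, den_min=1e-6):
    """Pick the candidate with the largest |denominator|; None if degenerate."""
    if abs(d) < d_min:                               # point too far: scale unobservable
        return None
    alpha, den = max(alpha_candidates(R, t_hat, s, q, p, b, d),
                     key=lambda c: abs(c[1]))
    return None if abs(den) < den_min else float(alpha)
```

With noise-free synthetic projections, all candidates with a non-zero denominator agree; with real data they differ, which is why the best-conditioned one is kept.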

One special case to take into account is when the translational component of motion is zero. When this happens, the value of $\hat{t}$ computed by the monocular egomotion estimation process is arbitrary but non-null, so it does not introduce any singularity into the problem. The computation of α can be made with the same expressions as before and should result in values very close to zero.

Probabilistic scale estimation

In the previous section, we demonstrated how to estimate the translation scale factor α from the observation of a single static point s, if point correspondences r, q and p are known and disparity is non-null. In practice, two major problems arise: (i) it is hard to determine which points in the environment are static, given that the cameras are also moving, and (ii) it is very hard to obtain reliable matches due to the noise and ambiguities present in natural images. Therefore, using a single point to perform this estimation is doomed to failure. We must instead use multiple points and apply robust methodologies to discard outliers.

Previously in Silva et al.,³⁵ this was achieved by computing the rigid transformation between point clouds obtained from stereo reconstruction at times k and k + 1. Point correspondences were deterministically assigned by searching for the best matches along epipolar lines in space (from camera L to camera R) and time (from time k to time k + 1).

In this article, we instead apply the notion of probabilistic correspondence to the stereo case. Rather than committing to matches in space and time, we create a probabilistic observation model for possible matches

P_{match} (s, r, p, q) = ρ_{s} (r) ρ_{s} (q) ρ_{s} (p)

where we assume statistical independence in the measurements obtained in the pairwise probabilistic correspondence functions $ρ_{s} (\cdot)$ . An example is shown in Figure 5 for the $ρ_{s} (r)$ case.

Figure 5.

Probabilistic correspondence $ρ_{s} (r)$ for a point s along a section of the epipolar line E_sr. On the top figure, we show the probabilistic correspondence values. In red, we have all points of the distribution (non-normalized). In green, we show the global maximum. In blue, we show all other local maxima of $ρ_{s} (r)$ . On the bottom figure, the sample point s in $I_{k}^{L}$ and the local maxima peaks in $I_{k}^{R}$ are displayed. Sampling is performed in a neighbourhood of the points on the epipolar line.

From the pairwise probabilistic correspondences, we obtain all possible combinations of corresponding matches. Then, because each four-tuple $(s, r, p, q)$ corresponds to a given hypothesis value of α, we create an accumulator of α hypotheses weighted by $P_{match} (s, r, p, q)$ . Searching for the global maximum in the accumulator provides a robust (most agreed) value for α.

PSET accumulator

For computing the accumulator, we assume E has been computed by the previously described methods and the system calibration is known.

First, a large set of points $s_{j}, j = 1 \dots J$ is selected in image $I_{k}^{L}$ . Selection can be random, uniform or based on key points, for example, Harris corner⁶ or scale-invariant features.²⁸

Point-wise computations. For each point s_j, the epipolar lines $E_{calib} = {\tilde{s}}_{j}^{T} S$ (where S is given by stereo calibration) and $E_{s q} = {\tilde{s}}_{j}^{T} E$ are sampled at L_j points $r_{j}^{l_{j}}, l_{j} = 1 \dots L_{j}$ and M_j points $q_{j}^{m_{j}}, m_{j} = 1 \dots M_{j}$ , in images $I_{k}^{R}$ and $I_{k + 1}^{L}$ , respectively. Again, sample point selection can be uniform along the epipolar lines or based on match quality. In our implementation, we compute local maxima over the epipolar lines. For each triple $(s_{j}, r_{j}^{l_{j}}, q_{j}^{m_{j}})$ , the geometric solution of p becomes uniquely determined and is denoted as $p_{j}^{l_{j} m_{j}}$ .

After all probabilistic correspondences have been computed for a point s_j, we create a 2D table $H_{j} (l_{j}, m_{j})$ to store disparity, likelihood and α values. Each entry $(l_{j}, m_{j})$ of table H_j corresponds to a tuple $(s_{j}, r_{j}^{l_{j}}, q_{j}^{m_{j}}, p_{j}^{l_{j} m_{j}})$ from which we compute the disparity value $d_{j}^{l_{j}}$ , the scale value $α_{j}^{l_{j} m_{j}}$ determined by equations (24) and (25), and the match likelihood (27) $λ_{j}^{l_{j} m_{j}}$

λ_{j}^{l_{j} m_{j}} = ρ_{s_{j}} (r_{j}^{l_{j}}) ρ_{s_{j}} (q_{j}^{m_{j}}) ρ_{s_{j}} (p_{j}^{l_{j} m_{j}})

Finally, for a particular s_j, we compute the global maximum of $λ_{j}^{l_{j} m_{j}}$ , which will indicate the best match hypothesis

(l_{j}^{*}, m_{j}^{*}) = {argmax}_{l_{j}, m_{j}} λ_{j}^{l_{j} m_{j}}

From this best match, we retrieve from H_j the solution α voted by this point

α_{j} = α_{j}^{l_{j}^{*} m_{j}^{*}}

and associated likelihood

λ_{j} = λ_{j}^{l_{j}^{*} m_{j}^{*}}

Thus, each point s_j votes for a certain motion scale factor α_j, according to the confidence λ_j collected from the probabilistic correspondences in the other images. As a side product, we also get the best disparity hypothesis at that point

d_{j} = d_{j}^{l_{j}^{*}}
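The point-wise vote can be sketched with array operations, assuming that the beliefs and the scale hypotheses of one point s_j have already been collected into arrays indexed by l_j and m_j. The function name `vote_for_point`, the array layout and the convention of marking degenerate scale hypotheses as non-finite values are assumptions of this example.

```python
import numpy as np

def vote_for_point(rho_r, rho_q, rho_p, alpha, disparity):
    """Select the best (l, m) hypothesis for one point s_j.

    rho_r:     (L,)   beliefs of the samples r^l
    rho_q:     (M,)   beliefs of the samples q^m
    rho_p:     (L, M) beliefs of the induced points p^{lm}
    alpha:     (L, M) scale hypotheses (non-finite where degenerate)
    disparity: (L,)   disparities of the samples r^l
    Returns (alpha_j, lambda_j, d_j).
    """
    lam = rho_r[:, None] * rho_q[None, :] * rho_p       # likelihood table
    lam = np.where(np.isfinite(alpha), lam, -np.inf)    # skip degenerate entries
    l_best, m_best = np.unravel_index(np.argmax(lam), lam.shape)
    return alpha[l_best, m_best], lam[l_best, m_best], disparity[l_best]
```

Here the three returned values play the roles of α_j, λ_j and d_j for the point under analysis.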

Image-wise computations. In the previous section, we described how each point s_j votes for a translation amplitude α_j with weight λ_j. We collect all these values in sets $A = \{α_{j}\}$ and $Λ = \{λ_{j}\}$, $j = 1 \dots J$ , and use a kernel smoothing method for estimating the highest density of α votes,³⁶ as described by

{\hat{f}}_{h} (α) = \frac{1}{J h} \sum_{i = 1}^{J} K (\frac{λ_{i} (α_{i} - α)}{h})

where K is a Gaussian kernel function and h is the bandwidth.
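A possible way to locate the densest region of the α votes is sketched below. Note that it uses the common form of a weighted kernel density estimate, in which each weight λ_i multiplies the contribution of its kernel and the density is normalized by the sum of the weights; this is an assumption of the example and may differ in detail from the expression above. The grid evaluation, `grid_size` and the function name `kde_mode` are likewise illustrative choices.

```python
import numpy as np

def kde_mode(alphas, lambdas, h, grid_size=512):
    """Mode of a weighted Gaussian KDE of the alpha votes, evaluated on a regular grid.

    Returns the location of the maximum, the density values and the grid.
    """
    alphas = np.asarray(alphas, dtype=float)
    w = np.asarray(lambdas, dtype=float)
    grid = np.linspace(alphas.min() - 3 * h, alphas.max() + 3 * h, grid_size)
    u = (grid[:, None] - alphas[None, :]) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K
    density = (kernel * w[None, :]).sum(axis=1) / (h * w.sum())
    return float(grid[np.argmax(density)]), density, grid
```

Isolated votes with low weight, such as those produced by wrong matches or moving objects, add little density and therefore have little influence on the location of the maximum.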

Dealing with calibration errors

A common source of errors in a stereo setup is the uncertainty in the calibration parameters. Both intrinsic and extrinsic parameter errors will deviate the epipolar lines from their nominal values and influence the computed correspondence probability values. To minimize these effects, we modify the correspondence probability function when evaluating sample points such that a neighbourhood of the point is analysed, instead of using only the exact coordinate of the sample point

{ρ'}_{s} (q) = max_{q' \in N (q)} [ρ_{s} (q') exp (- \frac{{∥ q - q' ∥}^{2}}{2 σ^{2}})]

where N(q) denotes a neighbourhood of the sample point q which, in our experiments, is defined as a 7 × 7 window, and σ controls how quickly the weight decays with the distance from q.
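The neighbourhood relaxation can be sketched as follows, assuming a Gaussian weight that decays with the distance to the sample point, so that nearby pixels with a strong belief can compensate for small deviations of the epipolar line. The value of `sigma` is not specified in the text and is an assumption of this example, as is the function name; `half=3` gives the 7 × 7 window.

```python
import numpy as np

def relaxed_belief(rho, q, sigma=1.5, half=3):
    """Max over a (2*half+1)^2 window of rho(q') * exp(-|q - q'|^2 / (2 sigma^2)).

    q is a (row, col) pixel coordinate; the window is clipped at the image border.
    """
    qr, qc = q
    r0, r1 = max(qr - half, 0), min(qr + half + 1, rho.shape[0])
    c0, c1 = max(qc - half, 0), min(qc + half + 1, rho.shape[1])
    rows, cols = np.mgrid[r0:r1, c0:c1]
    dist2 = (rows - qr) ** 2 + (cols - qc) ** 2          # squared distance to q
    weights = np.exp(-dist2 / (2.0 * sigma ** 2))
    return float((rho[r0:r1, c0:c1] * weights).max())
```

Since the weight at q itself is 1, the relaxed value is never smaller than the original belief at q.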

Another method used to diminish the uncertainty of the correspondence probability function when performing ZNCC is to use subpixel refinement methods, for example, parabola fitting and Gaussian fitting as presented by Debella-Gilo and Kaab.³⁷

Algorithm 1. PSET.

Input: Two stereo image pairs $(I_{k}^{L}, I_{k}^{R})$ and $(I_{k + 1}^{L}, I_{k + 1}^{R})$, $E_{rig}$ (stereo calibration)

Output: (Velocities) V, W

Step 1. Use a feature-based method to select a set of initial points or use the whole image.

Step 2. Compute the probabilistic correspondences $ρ_{s} (q)$ between images $I_{k}^{L}$ and $I_{k + 1}^{L}$ . Equations (1) to (3).

Step 3. Compute probabilistic egomotion, E. Equations (7) to (10).

Step 4. Compute the probabilistic correspondences $ρ_{s} (r)$ and $ρ_{s} (p)$ for the stereo case, between $I_{k}^{L}$ and $I_{k}^{R}$, $I_{k + 1}^{R}$ . Equation (11).

Step 5. Obtain the probabilistic observation model P_match using $ρ_{s} (r) ρ_{s} (q) ρ_{s} (p)$ to relate all possible four-tuple (s, r, q, p) matches. Equation (27).

Step 6. Create an accumulator array H for each point s_j, and perform point-wise computations to obtain the translation scale α and the associated likelihood λ for each point. Equations (28) to (31).

Step 7. Perform the image-wise computations and obtain the final translation scale factor α_max using weighted kernel density estimation. Equation (33).

Step 8. Estimate linear and angular velocities, V and W. Equations (35) to (37).

Step 9. Constant velocity Kalman filtering. Equations (38) and (39).

Velocities estimation

The linear and angular velocities are then estimated using the same procedure as Silva et al.³⁵ Having obtained the rotation (R), translation direction ( $\hat{t}$ ) and translation scale factor (α), the linear velocity is computed by

V = \frac{α \hat{t}}{Δ T}

where ΔT is the sampling interval

Δ T = T_{k + 1} - T_{k}

Likewise, the angular velocity is computed by

W = \frac{r}{Δ T}

where r = θu, the angle-axis representation of the incremental rotation R, as defined by Craig.³⁸
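The three expressions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB code: the function names `angle_axis` and `velocities` are hypothetical, R is assumed to be a proper 3 × 3 rotation matrix with rotation angle strictly below π, and the numbers in the usage lines are made up.

```python
import math

def angle_axis(R):
    """Return r = theta * u for a 3x3 rotation matrix R (nested lists).

    Assumes 0 <= theta < pi; the theta = pi case needs separate handling.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    # The antisymmetric part of R equals 2 * sin(theta) * u.
    v = (R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1])
    sin_theta = 0.5 * math.sqrt(sum(c * c for c in v))
    cos_theta = 0.5 * (trace - 1.0)
    theta = math.atan2(sin_theta, cos_theta)
    if sin_theta < 1e-12:
        # Very small rotation: r is approximately v / 2 (first order).
        return tuple(0.5 * c for c in v)
    scale = theta / (2.0 * sin_theta)
    return tuple(scale * c for c in v)

def velocities(R, t_hat, alpha, dt):
    """Linear velocity V = alpha * t_hat / dt and angular velocity W = r / dt."""
    V = tuple(alpha * c / dt for c in t_hat)
    W = tuple(c / dt for c in angle_axis(R))
    return V, W

# Usage with illustrative values: 0.02 rad rotation about z, 0.5 m forward motion,
# and a 0.1 s sampling interval.
theta = 0.02
R_example = [[math.cos(theta), -math.sin(theta), 0.0],
             [math.sin(theta), math.cos(theta), 0.0],
             [0.0, 0.0, 1.0]]
V, W = velocities(R_example, (0.0, 0.0, 1.0), alpha=0.5, dt=0.1)
print(V, W)
```

Using `atan2` on the sine and cosine of the angle, rather than `acos` of the trace alone, keeps the angle well conditioned for the small incremental rotations expected between consecutive frames.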

Kalman filter

To obtain a smoother estimate, we filter the linear and angular velocity estimates using a Kalman filter with a constant velocity model. The state transition model with zero-mean stochastic acceleration is given by

X_{k} = F X_{k - 1} + ξ_{k}

where the state transition matrix is the identity matrix, $F = I_{6 \times 6}$, and the stochastic acceleration vector ξ_k follows a zero-mean multivariate Gaussian distribution with covariance matrix Q, $ξ_{k} \sim N(0, Q)$.

The observation model considers state observations with additive noise

Y_{k} = H X_{k} + η_{k}

where the observation matrix H is the identity, $H = I_{6 \times 6}$, and the measurement noise η_k is zero-mean Gaussian with covariance R.

We set the covariance matrices Q and R empirically to

Q = \mathrm{diag}(q_{1}, \dots, q_{6})

R = \mathrm{diag}(r_{1}, \dots, r_{6})

where $q_{i} = 10^{-3}$, $i = 1, \dots, 6$, $r_{3} = 10^{-3}$ and $r_{i} = 10^{-4}$, $i \neq 3$.

The value of r₃ differs from the other measurement noise values because it corresponds to the translation along the z-axis, which is inherently noisier owing to the uncertainty of the t_z estimates in the stereo triangulation step. The complete PSET method is summarized in Algorithm 1.
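A minimal Python sketch of this filter follows. Because F and H are both the identity and Q and R are diagonal, the six-state filter separates into six independent scalar filters whenever the initial covariance is also diagonal, which is what the code exploits. The function name `kalman_step`, the state ordering (V_x, V_y, V_z, W_x, W_y, W_z) and the initialisation in the usage lines are assumptions made for illustration only.

```python
# Noise variances as stated in the text; index 2 is V_z under the assumed ordering.
Q_DIAG = [1e-3] * 6
R_DIAG = [1e-4, 1e-4, 1e-3, 1e-4, 1e-4, 1e-4]

def kalman_step(x, p, y, q=Q_DIAG, r=R_DIAG):
    """One predict/update cycle with F = H = I and diagonal Q, R.

    x: previous state estimate (six velocity components).
    p: previous per-component error variance (diagonal of P).
    y: new measurement of the six velocity components.
    Returns the updated (x, p).
    """
    x_new, p_new = [], []
    for xi, pi, yi, qi, ri in zip(x, p, y, q, r):
        p_pred = pi + qi                  # predict: state unchanged, variance grows
        gain = p_pred / (p_pred + ri)     # scalar Kalman gain
        x_new.append(xi + gain * (yi - xi))
        p_new.append((1.0 - gain) * p_pred)
    return x_new, p_new

# Usage: start from the first measurement (assumed initialisation) and filter the rest.
measurements = [
    [0.10, 0.00, 1.00, 0.00, 0.01, 0.00],
    [0.12, 0.01, 1.10, 0.00, 0.01, 0.00],
    [0.09, 0.00, 0.95, 0.00, 0.02, 0.00],
]
x, p = list(measurements[0]), list(R_DIAG)
for y in measurements[1:]:
    x, p = kalman_step(x, p, y)
print(x)
```

With these values the gain for the V_z component is smaller than for the others, so the noisier z translation is smoothed more strongly.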

Results

To evaluate the accuracy of PSET, we performed tests with synthetic and real image data. For comparison purposes, we used LIBVISO¹⁸ as a state-of-the-art deterministic egomotion estimation method. We chose it because it is an open-source 6D VO library with a filtering step equivalent to ours (constant velocity model).

Synthetic images results

As a first test of the egomotion estimation accuracy of the PSET method, we used a sequence of synthetic stereo images. The sequence was created using a VRML-based simulator and depicts a rather difficult scene (see Figure 6), in which the images contain a great deal of repetitive structure that causes ambiguity in image point correspondence. The sequence is composed of four linear tracks (see Figure 7), as we are mainly interested in evaluating the performance of the method in the estimation of the translation scale factor.

Figure 6.

Synthetic stereo image pairs for translation scale motion estimation.

Figure 7.

Generated and estimated trajectories in the synthetic image experiment.

We assume a calibrated stereo camera pair with a 10-cm baseline, 576 × 380 image resolution and a ZNCC window N_w = 7. For computational reasons, we used 1000 uniformly selected points s_j for the dense probabilistic egomotion estimation and a subgroup of 100 points $r_{j}^{l_{j}}$ and $q_{j}^{l_{j}}$ ($J = 1000$, $L_{j} = 100$, $M_{j} = 100$, $\forall j$). PSET was run on a 3.2 GHz dual-core Intel i5 processor and took about 20 s. The code was written in MATLAB as a proof of concept, without any kind of optimization. Nearly 70% of the processing time is spent on the 5D estimation and the probabilistic correspondence step; the voting scheme takes the remaining 30%.

In Figure 7, one can observe the generated and the estimated trajectories obtained using PSET and LIBVISO.

Table 1 shows that PSET obtains a more accurate egomotion estimate, with a lower root mean square (RMS) error than LIBVISO in all velocity components. This is most evident in the velocity norm over the global motion trajectory, where the PSET results are almost 50% more accurate than those of LIBVISO.

Table 1.

Comparison of the standard mean squared error between PSET and LIBVISO.

Method	V_x (m/frame)	V_y (m/frame)	V_z (m/frame)	||V|| (m/frame)
LIBVISO	0.000690	0.000456	0.0011	0.0022
PSET	0.000336	0.000420	0.000487	0.0012

PSET: probabilistic stereo egomotion transform.

In this experiment, we focused on the evaluation of translational motion estimation, since the angular velocity case was already demonstrated by Silva et al.³⁵

Figure 8 shows a box plot of the instantaneous linear velocity error distribution during the sequence. PSET clearly performs better in terms of the mean, median and variance of the error. Figure 9 shows the same information separated by coordinate axis, where the same tendency is observed, especially for the X and Z components.

Figure 8.

Error distribution ||V|| obtained by PSET and LIBVISO. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. PSET: probabilistic stereo egomotion transform.

Figure 9.

Error distribution of estimated linear velocities obtained by PSET and LIBVISO in all three axes ( $V_{x}, V_{y}, V_{z}$ ). PSET: probabilistic stereo egomotion transform.

Real image sequences

The evaluation of PSET on real images was performed using the KITTI data set,³⁹ which is composed of stereo image sequences. The KITTI data set uses a car as the robotic platform in different road scenarios (urban streets, countryside and highways) and provides stereo image sequences in colour or greyscale format at 10 fps and 1.4-MP image resolution (1334 × 391), with IMU/GPS (OXTS RT 3003) information to act as external validation. PSET and LIBVISO were already compared on that data set by Silva et al.,³⁵ and the results show that PSET outperforms LIBVISO in both linear and angular velocity estimation. In this work, we perform novel experiments with added Gaussian noise and image blur. We want to evaluate the egomotion estimation accuracy not only in normal conditions but also with unfavourable image characteristics typical of outdoor scenarios. The main argument we want to validate is that probabilistic methods, although requiring additional computations, can be more effective in robotic scenarios where image conditions are far from ideal and deterministic egomotion estimation methods tend to fail.

Experiment with added Gaussian noise. In principle, probabilistic egomotion estimation methods are less sensitive to image noise than their deterministic counterparts. In outdoor mobile robotic scenarios, images are frequently corrupted by factors such as sensor noise, poor scene illumination, highlights, specular reflections and other optical artefacts. To validate that probabilistic methods are more robust to image noise than deterministic methods, we performed a set of experimental trials. The procedure consisted of adding white Gaussian noise to all the images (346 stereo pairs) of the KITTI stereo sequence drive 2011-09-26-0091,³⁹ chosen because the scene contains high-contrast images and shadows, and comparing the egomotion estimation accuracy of PSET and LIBVISO.

In Figure 10, we show an example of an image of the KITTI data set and the corresponding corrupted images with different values of Gaussian noise. Table 2 shows results for PSET and LIBVISO with three different noise powers: 0.001, 0.002 and 0.005 variance in grey level units in the 0–255 range.

Figure 10.

Original image from KITTI data set drive 2011-09-26-0091, and corrupted versions with white Gaussian noise of variance 0.001, 0.002 and 0.005 in an image grey level range between 0 and 255.
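The corruption step can be sketched as follows. This is an illustrative guess at the procedure, not the authors' code: it assumes the variance is specified for intensities normalized to [0, 1] (the convention used by MATLAB's `imnoise`), so a variance of 0.005 corresponds to a standard deviation of about 18 grey levels on the 0–255 scale. The function name `add_gaussian_noise` and the toy image are hypothetical.

```python
import random

def add_gaussian_noise(image, variance, rng=None):
    """Add zero-mean white Gaussian noise to an 8-bit greyscale image.

    image: list of rows, each a list of ints in 0..255.
    variance: noise variance for intensities normalized to [0, 1] (assumed).
    rng: optional random.Random instance, for reproducible noise.
    """
    rng = rng or random.Random()
    sigma = variance ** 0.5
    noisy = []
    for row in image:
        out_row = []
        for pixel in row:
            value = pixel / 255.0 + rng.gauss(0.0, sigma)
            value = min(1.0, max(0.0, value))        # clip to the valid range
            out_row.append(int(round(value * 255.0)))
        noisy.append(out_row)
    return noisy

# Usage on a small uniform grey patch with the three variances used in the experiment.
patch = [[128] * 8 for _ in range(8)]
for variance in (0.001, 0.002, 0.005):
    corrupted = add_gaussian_noise(patch, variance, random.Random(0))
    print(variance, corrupted[0])
```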

Table 2.

RMS for PSET and LIBVISO under different values of image Gaussian noise.

Gaussian noise	0×		1× (0.001)		2× (0.002)		5× (0.005)
Egomotion	||V||	||W||	||V||	||W||	||V||	||W||	||V||	||W||
PSET	0.4170	0.9400	0.4436	0.9400	0.4556	0.9400	0.4899	0.9405
LIBVISO	0.4444	0.9605	0.5210	1.0068	0.5535	1.0600	0.6332	1.2712
Improvement	≈6%	≈2%	≈15%	≈7%	≈18%	≈12%	≈23%	≈26%

RMS: root mean square; PSET: probabilistic stereo egomotion transform.

We measure accuracy as the RMS error between the estimates and the IMU/GPS information. Quantitative results are shown in Table 2 and Figures 11 and 12. The PSET method is more accurate than LIBVISO for all values of added Gaussian noise. Furthermore, the improvement grows with the noise power. For the largest noise power tested, PSET reduces the error by 23% for the linear velocities and by 26% for the angular velocities.
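As a worked check of these figures, the sketch below computes an RMS error and the relative error reduction. It assumes that the "Improvement" row of Table 2 is the reduction relative to the LIBVISO error, an interpretation that matches the tabulated percentages; the function names are hypothetical.

```python
import math

def rms_error(estimates, reference):
    """Root mean square error between two equally long sequences."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, reference)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(rms_pset, rms_libviso):
    """Error reduction of PSET relative to LIBVISO, in percent (assumed definition)."""
    return 100.0 * (rms_libviso - rms_pset) / rms_libviso

# Table 2 values for the largest noise power (5x, variance 0.005).
print(round(improvement(0.4899, 0.6332)))   # linear velocity, prints 23
print(round(improvement(0.9405, 1.2712)))   # angular velocity, prints 26
```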

Figure 11.

Error distribution of the magnitude of linear velocity computed by PSET and LIBVISO, images corrupted with Gaussian noise with variance 0.001, 0.002, 0.005, denoted, respectively, 1×, 2×, 5×. PSET: probabilistic stereo egomotion transform.

Figure 12.

Error distribution of the magnitude of angular velocities computed by PSET and LIBVISO, images corrupted with Gaussian noise with variance 0.001, 0.002, 0.005, denoted, respectively, 1×, 2×, 5×. PSET: probabilistic stereo egomotion transform.

Figures 11 and 12 show the error distribution of the linear and angular velocity magnitudes computed by PSET and LIBVISO for all tested noise powers. PSET is more accurate, displaying a lower median error than LIBVISO in all cases.

Experiment with added blur. In outdoor robotics scenarios, image blur is fairly frequent. The use of visual egomotion estimation in those scenarios has been limited because deterministic egomotion methods tend to fail in the presence of image blur. One of the reasons for using probabilistic egomotion estimation methods is precisely their higher robustness to image blur compared with deterministic methods. To validate this claim, we conducted another experiment using both PSET and LIBVISO on the same KITTI data set sequence (2011-09-26-0091), this time with different values of blur, as illustrated in Figure 13.

Figure 13.

Original image from KITTI data set drive 2011-09-26-0091, and corrupted versions with blur of 1, 3 and 5 pixels standard deviation, denoted as 1×, 3×, 5×.

Table 3 shows the RMS error of PSET and LIBVISO with respect to the IMU/GPS information for different levels of blur. The corrupted images were created by applying a low-pass Gaussian filter of the image size with a standard deviation of 1, 3 and 5 pixels (denoted 1×, 3×, 5×).
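A Gaussian low-pass filter of this kind can be sketched with a separable convolution. The code below is an illustration under stated assumptions rather than the procedure actually used: the kernel is truncated at three standard deviations instead of spanning the whole image, border pixels are replicated, and all function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Normalized 1D Gaussian kernel; radius defaults to ceil(3 * sigma) (assumed)."""
    if radius is None:
        radius = int(math.ceil(3.0 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    blurred = []
    for row in image:
        n = len(row)
        blurred.append([
            sum(k * row[min(n - 1, max(0, x + j - radius))]
                for j, k in enumerate(kernel))
            for x in range(n)
        ])
    return blurred

def gaussian_blur(image, sigma):
    """Separable Gaussian low-pass filter: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows_done = blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*rows_done)]
    cols_done = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*cols_done)]

# Usage: blur a single bright pixel with the three standard deviations of Table 3.
impulse = [[0.0] * 31 for _ in range(31)]
impulse[15][15] = 255.0
for sigma in (1.0, 3.0, 5.0):
    print(sigma, round(gaussian_blur(impulse, sigma)[15][15], 3))
```

The peak value printed for each standard deviation decreases as the blur grows, which is the loss of local contrast that makes deterministic feature matching harder.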

Table 3.

RMS error for PSET and LIBVISO under different values of blur.

Image blur	0×		1× (1.0)		3× (3.0)		5× (5.0)
Egomotion	||V||	||W||	||V||	||W||	||V||	||W||	||V||	||W||
PSET	0.4170	0.9400	0.4176	0.9400	0.4490	0.9400	0.4820	1.2965
LIBVISO	0.4444	0.9605	0.4475	1.0092	0.5535	1.1163	0.9400	1.9194
Improvement	≈6%	≈2%	≈7%	≈7%	≈19%	≈15%	≈48%	≈32%

RMS: root mean square; PSET: probabilistic stereo egomotion transform.

Again, the results show that PSET is more accurate than LIBVISO, particularly for higher amounts of blur. The error difference between PSET and LIBVISO increases from 7% for low values of blur (1×) to 48% and 32% in the linear and angular velocity estimation, respectively, for high values of blur (5×).

In Figures 14 and 15, we can see the error distributions of linear and angular velocities both for PSET and LIBVISO. Again, PSET exhibits lower median error when compared to LIBVISO for all values of image blur.

Figure 14.

Error distribution of the magnitude of the linear velocities computed by PSET and LIBVISO, images corrupted with a Gaussian blur filter with 1×, 3×, 5× pixels standard deviation. PSET: probabilistic stereo egomotion transform.

Figure 15.

Error distribution of the magnitude of the angular velocities computed by PSET and LIBVISO, images corrupted with a Gaussian blur filter with 1×, 3×, 5× pixels standard deviation. PSET: probabilistic stereo egomotion transform.

The difference in accuracy between PSET and LIBVISO is larger in the synthetic image scenario than in the real image data set. This may be due to two causes. First, the ground-truth information used in the two experiments is different. In the synthetic image scenario, we generate the VRML simulation, and therefore the world points have a precise position that provides a reliable verification of the egomotion trajectory. For the real image data set, by contrast, IMU/GPS information is used. This information is subject to bias and noise and can therefore only be considered weak ground truth. Second, the KITTI sequence does not contain as much repetitive image structure, and therefore point correspondence ambiguity is lower.

Conclusions and future work

The probabilistic approach for stereo visual egomotion estimation described in this work has proven to be an accurate method of computing stereo egomotion. The proposed approach is very robust because no explicit matching or feature tracking is necessary to compute the vehicle motion. To the best of our knowledge, this is the first implementation of a fully dense probabilistic method to compute stereo egomotion. The results demonstrate that PSET is more accurate than other state-of-the-art 3D egomotion estimation methods, significantly improving the overall accuracy in linear and angular velocity estimation. We have shown improvements of up to 50% in a synthetic image scenario with highly repetitive texture and ground truth information, and above 20% in real images with large amounts of blur and noise with respect to an IMU/GPS reference. One of the main advantages of probabilistic egomotion estimation methods is their higher robustness in difficult imaging scenarios, for example, in the presence of image noise or blur. In the experiments conducted, PSET performed better than LIBVISO, and the improvement (error difference) between the two methods increased with higher values of image noise and blur. Despite these clear advantages over other state-of-the-art methods, the effectiveness and usefulness of PSET in mobile robotics scenarios requires further improvements in the computational implementation in order to achieve real-time functionality. Given the highly parallel nature of the algorithm, which is composed of many independent operations, we plan to develop a GPU implementation of PSET in future work to achieve real-time performance. Another objective is to pursue further validation of the PSET algorithm in other heterogeneous mobile robotics scenarios, especially in aerial and underwater robotics, where the lack of image texture combined with high matching ambiguity provides an ideal scenario for further assessing the robustness of the proposed methodology.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is financed by the Coral project NORTE-01-0145-FEDER-000036, and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013 and project UID/EEA/50009/2013.

References

1. Silva, Bernardino, Silva. Probabilistic stereo egomotion transform. In: IEEE international conference on robotics and automation, Hong Kong, 31 May–7 June 2014.

2. Scaramuzza. Performance evaluation of 1-point-RANSAC visual odometry. J Field Robot 2011; 28(5): 792–811.

3. Maimone, Matthies, Cheng. Visual odometry on the Mars Exploration Rovers. In: IEEE international conference on systems, man and cybernetics, Hawaii, November 2005.

4. Maimone, Matthies, Cheng. Two years of visual odometry on the Mars Exploration Rovers: field reports. J Field Robot 2007; 24(3): 169–186.

5. Förstner, Gülch. A fast operator for detection and precise location of distinct points, corners and centres of circular features. In: ISPRS intercommission workshop, 1987.

6. Harris, Stephens. A combined corner and edge detector. In: Proceedings of the fourth Alvey vision conference, 1988, pp. 147–151.

7. Fischler, Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communicat ACM 1981; 24(6): 381–395.

8. Olson, Matthies, Schoppers. Rover navigation using stereo ego-motion. Robot Autonom Syst 2003; 43: 215–229.

9. Goodall. Procrustes methods in the statistical analysis of shape. J Roy Stat Soc Ser B (Methodological) 1991; 53(2): 285–339.

10. Horn BKP. Closed-form solution of absolute orientation using unit quaternions. J Opt Soc Am A 1987; 4(4): 629–642.

11. Rusinkiewicz, Levoy. Efficient variants of the ICP algorithm. In: Proceedings of the third international conference on 3D digital imaging and modeling (3DIM), 2001, pp. 145–152.

12. Milella, Siegwart. Stereo-based ego-motion estimation using pixel tracking and iterative closest point. In: Fourth IEEE international conference on computer vision systems (ICVS'06), New York, 4–7 January 2006.

13. Alismail, Browning, Dias. Evaluating pose estimation methods for stereo visual odometry on robots. In: Proceedings of the 11th international conference on intelligent autonomous systems (IAS-11), Ottawa, Canada, January 2011.

14. Nister, Naroditsky, Bergen. Visual odometry for ground vehicle applications. J Field Robot 2006; 23(1): 3–20.

15. Haralick, Lee, Ottenberg. Review and analysis of solutions of the three point perspective pose estimation problem. Int J Comput Vision 1994; 13: 331.

16. Kai, Dellaert. Stereo tracking and three-point/one-point algorithms - a robust approach, visual odometry. In: International conference on image processing (ICIP), Atlanta, USA, October 2006.

17. Dellaert, Kaess. Flow separation for fast and robust stereo odometry. In: IEEE international conference on robotics and automation, Kobe, Japan, 12–17 May 2009.

18. Kitt, Geiger, Lategahn. Visual odometry based on stereo image sequences with RANSAC-based outlier rejection scheme. In: IEEE intelligent vehicles symposium (IV), San Diego, USA, June 2010.

19. Hartley, Zisserman. Multiple view geometry in computer vision. Cambridge: Cambridge University Press, 2004. ISBN: 0521540518.

20. Comport, Malis, Rives. Real-time quadrifocal visual odometry. Int J Robot Res SAGE Publicat 2010; 29: 245–266.

21. Scaramuzza, Fraundorfer, Siegwart. Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC. In: IEEE international conference on robotics and automation, Kobe, Japan, 12–17 May 2009.

22. Kneip, Chli, Siegwart. Robust real-time visual odometry with a single camera and an IMU. In: Proceedings of the British machine vision conference (BMVC), September 2011, pp. 16.1–16.11. BMVA Press.

23. Voigt, Nikolic, Huerzeler. Robust embedded egomotion estimation. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), San Francisco, 25–30 September 2011.

24. Rehder, Gupta, Nuske. Global pose estimation with limited GPS and long range visual odometry. In: IEEE conference on robotics and automation, St Paul, USA, 14–18 May 2012.

25. Kazik, Kneip, Nikolic. Real-time 6D stereo visual odometry with non-overlapping fields of view. In: IEEE international conference on computer vision and pattern recognition, Providence, RI, USA, 16–21 June 2012.

26. Scaramuzza, Fraundorfer. Visual odometry tutorial. IEEE Robot Autom Magaz 2011; 18(4): 80–92.

27. Kneip, Scaramuzza, Siegwart. A novel parameterization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In: Proceedings of the 2011 IEEE conference on computer vision and pattern recognition (CVPR), 2011, pp. 2969–2976.

28. Lowe. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004; 60: 91.

29. Bay, Ess, Tuytelaars. Speeded-up robust features (SURF). Comput Vision Image Understand Elsevier Sci Inc 2008; 110: 346–359.

30. Calonder, Lepetit, Strecha. BRIEF: binary robust independent elementary features. In: European conference on computer vision, Crete, Greece, 5–11 September 2010.

31. Guan. Wearable ego-motion tracking for blind navigation in indoor environments. IEEE Trans Autom Sci Eng 2015; 12(4): 1181–1190.

32. Domke, Aloimonos. A probabilistic notion of correspondence and the epipolar constraint. In: Proceedings of the third international symposium on 3D data processing, visualization, and transmission (3DPVT'06), 2006, pp. 41–48.

33. Huang, Zhu, Pan. A high-efficiency digital image correlation method based on a fast recursive scheme. Measur Sci Technol 2011; 21(3): 35101–35112.

34. Nelder, Mead. Simplex method for function minimization. Comput J 1965; 7(4): 308–313.

35. Silva, Bernardino, Silva. Probabilistic egomotion for stereo visual odometry. Int J Intell Robot Syst 2015; 77(2): 265–280.

36. Wand, Jones. Kernel smoothing. In: Chapman & Hall/CRC Monographs on Statistics and Applied Probability, 1994.

37. Debella-Gilo, Kaab. Sub-pixel precision image matching for measuring surface displacements on mass movements using normalized cross-correlation. Remote Sens Environ 2011; 115: 130–142.

38. Craig. Introduction to robotics: mechanics and control. Harlow: Addison-Wesley Longman Publishing Co, Inc, 1989.

39. Geiger, Lenz, Stiller. Vision meets robotics: The KITTI dataset. Int J Robot Res (IJRR) 2013; 32(11): 1–6.

A voting method for stereo egomotion estimation
