Abstract
Binocular stereovision–based positioning and inertial measurement–based positioning have their respective limitations. Asynchronous fusion of a binocular vision system and an inertial navigation system (INS) is therefore introduced for global positioning system–denied environments, aided by a fuzzy map, to provide sequential and progressive updates for mobile robot positioning and navigation. The system consists of two off-the-shelf cameras and a low-cost inertial measurement unit (IMU). The localization procedure fuses the inertial data from the IMU and the absolute position data from the binocular vision system based on corners. The main contribution of this article is a novel fusion method adaptive to the different data rates at which the two modules operate. Utilization of an asynchronous Kalman filter is proposed to fuse the results from the two modules, which can achieve intermittent correction for INS localization. Experiments were carried out in an indoor laboratory environment, where dynamic tests validated the reliability and effectiveness of the proposed asynchronous fusion algorithm.
Introduction
Research on robot positioning is of great importance. It is impossible for a robot to establish a feedback regulator for motion control or navigation without positioning and orientation information. Traditional positioning methods commonly include the use of a global positioning system (GPS), 1 an odometer, 2 and/or an inertial measurement unit (IMU). 3 These key technologies can be found in autonomous navigation, 4 motion planning, 5 and simultaneous localization and mapping (SLAM). 6
As one of the popular localization methods, GPS is widely used for land vehicle positioning, although the signal propagated by a satellite may experience discontinuity and blockage caused by high-rise buildings or other obstacles. An odometer is a device for measuring the distance and speed of a vehicle. However, the scale coefficient errors of these measurements have a significant negative influence on the precision of dead reckoning. Similarly, an IMU can provide angular velocity and specific force during inertial navigation, 7 from which position and attitude information are calculated through integration. Nevertheless, an IMU is susceptible to environmental noise that can cause positioning results to drift over time. Therefore, these traditional positioning methods cannot meet the demands of real-world applications due to systematic errors, lack of absolute precision, and drift.
Recently, researchers have incorporated other sensors to complement the use of IMUs, aiming at reducing systematic errors and improving positioning accuracy. For example, to obtain a high-accuracy state estimate of a vehicle, a navigation system with high integrity and reliability based on GPS/IMU fusion was presented. 8 In the work of Zhang and Meng, 9 a simple self-contained pedestrian tracking method using a foot-mounted inertial and magnetic sensor module was proposed to avoid acceleration noise and integration drift. To improve poor navigation accuracy, Zihajehzadeh et al. 10 investigated the integration of a microelectromechanical systems (MEMS) barometric pressure sensor with an MEMS-IMU for vertical position/velocity tracking in sports. However, these combinations are largely restricted to outdoor use and place relatively high demands on the sensors. Fusion of vision sensors with inertial sensors thus becomes an alternative that can be applied to outdoor as well as indoor environments, because orientation information can be extracted from visual data in all kinds of complex environments and is less affected by the movement restrictions of robots. Moreover, the positioning accuracy achieved by this fusion method is better than that of traditional relative positioning technologies, such as wheel-type mileage meters, inertial navigators, radars, and sonar, meaning that it is an effective alternative to the traditional positioning methods. 11 The common form of vision-aided inertial navigation systems (INSs) combines a monocular or binocular camera system with a low-cost IMU, which has been widely used in both robotics and autonomous vehicle navigation. 12–18 With this method, the positioning error caused by the uncertainty of visual feature extraction can be reduced, the drift phenomenon diminished, and the accumulated error of inertial positioning overcome.
In these integrated systems, the motion estimation procedure fuses the inertial data from the IMU with features (landmarks) detected in consecutive frames captured by the camera. Some methods are only used for indoor position estimation, 13 while others have realized SLAM with real-time monocular visual-inertial measurements. 16 Some methods are based on algorithms where the INS and camera are tightly coupled to avoid the loss of information possibly resulting from visual preprocessing. 17 Vision-based acquisition and estimation of subject orientation can be divided into monocular and binocular visual navigation positioning according to the number of cameras utilized. The former computes motion parameters from images acquired by a single camera, which is often uncomplicated and portable. However, it only supports 2D feature analysis and is unable to reconstruct the 3D information of the scene. 19 Depth estimation commonly deviates due to inaccurate estimation of camera motion parameters. Therefore, compared to binocular vision, monocular vision is unable to provide accurate depth information and is only applicable to situations where depth precision is not stringently demanded. 20 In this article, position estimation of the robot depends on accurate depth information of feature points. Therefore, we have adopted a binocular vision system to accurately reconstruct the 3D features.
Vision-aided inertial navigation methods mainly focus on using forward-looking cameras to capture salient features, which assist with navigation in man-made buildings by exploiting various features. For instance, in the work of Xian et al., 21 features both near and far from the camera are simultaneously taken into consideration in the visual-inertial method. In the work of Yu and Anastasios, 22 an algorithm for vision-aided inertial navigation is presented that employs both point and line features, which are prevalent in man-made environments. In the work of Panahandeh and Jansson, 23 inertial information from the IMU is integrated with the positions of ground plane feature points to estimate the motion parameters of the system. However, these approaches suffer in feature-poor environments or scenes that contain complex backgrounds. In this article, sparse 3D features are obtained by a binocular vision system; therefore, the proposed method that combines the binocular vision system and the INS can be applied to environments where corner features are prevalent. It should be pointed out that its real-time performance is limited by the heavy computation due to what is commonly known as the “correspondence problem.” 24 Owing to the trade-off between accuracy and real-time performance, this article places the focus on fast navigation while position errors are controlled within a limited range.
This article investigates the mobile robot navigation control based on visual understanding and inertial measurement of mobile robot positions in a real and complex indoor environment. Visual understanding refers to the process where a mobile robot imitates human beings to learn and perceive its surrounding environment using binocular vision sensors. In the first stage of visual understanding, scene images are captured and features extracted before positioning and navigation take place. To emulate human thinking and behaviors, the concept of fuzzy map is introduced such that a simple scene graph can be constructed in the navigation process. The fused sensory system consists of only two webcams and a low-cost IMU. The proposed approach takes advantage of the fast response of an inertial sensor and the accurate measurements of visual sensors. Consequently, the mobile robot is able to achieve rapid autonomous navigation to a particular location with the aid of a fuzzy environment map. In the experiments, the mobile robot first employs an internal fuzzy map to reach its destination, and meanwhile, the inertial measurements help it to realize indoor robot step gauge navigation. Continuously fusing the position data obtained by the binocular vision system with the location data from the IMU allows for the estimation of the current absolute position of the mobile robot and correction of inertial measurements. Data fusion methods include Kalman filters (KFs), 20 extended KFs, 12 sigma-point KFs, 17 and unscented KFs. 23 In this article, the inertial and visual modules operate at different data rates, and therefore, an asynchronous KF (AKF) is used to fuse the results from the two modules. The input of the fusion algorithm is therefore position information gained by the binocular vision system and linear acceleration vectors measured by the inertial sensor.
The main contributions of this work are: (1) a method to achieve absolute localization for a mobile robot by employing indoor corners as natural landmarks perceived by a binocular vision system; (2) a fusion method that relies on an AKF such that information from inertial navigation and from the binocular vision system become complementary; and (3) the design of a stereo camera IMU (SC-IMU) that can be fixed on a mobile robot to allow for rapid positioning and navigation in indoor environments with further assistance from a fuzzy map.
This article is organized as follows. “Binocular vision system localization” section introduces the Harris-scale invariant feature transform (SIFT) algorithm and absolute localization of mobile robot by a binocular vision system. “Asynchronous fusion of binocular vision system and INS” section delineates the navigation updates of INS and data fusion of binocular vision and INS using an AKF. “Experiments and discussions” section specifies the SC-IMU hardware structure and verifies the practicability and accuracy of the proposed fusion algorithm. Finally, “Conclusion” section concludes this study.
Binocular vision system localization
In this article, we use a binocular vision system, which employs two webcams to recognize and extract the feature points of corners based on Harris-SIFT. It then measures the distances from the cameras to the feature points using binocular ranging, and finally adopts the principle of trilateration to achieve absolute localization. The process of binocular vision system localization in indoor environments based on corners is shown in Figure 1. There are two offline phases, in which the internal and external parameters of the cameras are obtained and the feature point database is built up.

The flow diagram of binocular vision system localization.
Ranging model of binocular vision system
Camera model
The binocular vision system is composed of two webcams, and the geometric relationship of cameras with respect to the target object is shown in Figure 2, where

The model of binocular camera.
The relationship between the pixel coordinates and the world coordinates can be deduced by the imaging model
where
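For reference, a standard form of the pinhole imaging model relating pixel coordinates to world coordinates is given below; the symbols follow the usual convention ($f_x$, $f_y$, and $(c_x, c_y)$ are the internal parameters, and $\mathbf{R}$, $\mathbf{t}$ the external parameters) and may differ from the article's original notation:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix}
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\mathbf{R} & \mathbf{t}
\end{bmatrix}
\begin{bmatrix}
X_w \\ Y_w \\ Z_w \\ 1
\end{bmatrix}
```

Here $s$ is a scale factor, $(u, v)$ are pixel coordinates, and $(X_w, Y_w, Z_w)$ are world coordinates of the target point.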
Theory of binocular ranging
According to Figure 2, the distance
There is a linear correlation between the distance
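For a rectified, parallel binocular configuration with baseline $b$ and focal length $f$ (in pixels), the standard depth-from-disparity relation takes the form below; the notation here is the usual convention, not necessarily the article's:

```latex
Z = \frac{f\, b}{d}, \qquad d = u_l - u_r
```

where $u_l$ and $u_r$ are the horizontal pixel coordinates of the same feature point in the left and right images, and the depth $Z$ is inversely proportional to the disparity $d$.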
Harris-SIFT algorithm
Scale-invariant feature transform (SIFT) was summarized by Lowe. 26,27 The SIFT feature is invariant to image scaling, rotation, and affine transformation; however, its complex operations and slow processing speed affect the real-time performance of a binocular vision system. Among feature detection operators, the Harris corner detection algorithm is simple, stable, and relatively robust to illumination, rotation, and noise. Therefore, this article combines the advantages of SIFT feature description and Harris corner detection, that is, extracting corners with the Harris operator and then using the SIFT operator to build the feature descriptors. Finally, the Euclidean distance is used as the similarity metric between pairs of extracted feature points, and the nearest neighbor search strategy is applied for feature matching. More specifically, the steps are as follows:

(1) Detecting corners by the Harris operator
The Harris corner detection algorithm is based on the point features of the image signal, 28 whose corner response expression is given by

where det(·) denotes the matrix determinant.

(2) Confirming the feature vectors of corners
The direction parameters of each feature point are specified by using the gradient direction distribution characteristics of key points in the neighborhood pixels. At each corner (
where the scale space is defined by the convolution of a Gaussian kernel with the image.

(3) Calculating the SIFT feature descriptor
In the computational process, in order to enhance the stability of feature matching, the relative directions of sampling points and feature points are accumulated into a histogram with eight direction bins after Gaussian weighting, yielding a 128 (4 × 4 × 8)-dimensional descriptor. 26
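To make step (1) concrete, a minimal NumPy sketch of the Harris response R = det(M) − k·tr(M)² is given below. The window size, k value, and synthetic test image are illustrative assumptions; a deployed system would typically use an optimized library implementation.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    # Image gradients via central differences (axis 0 = rows = y)
    Iy, Ix = np.gradient(img.astype(float))
    # Products of gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def window_sum(a, r=1):
        # Simple box-filter window (a stand-in for Gaussian weighting)
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx, Syy, Sxy = window_sum(Ixx), window_sum(Iyy), window_sum(Ixy)
    det_M = Sxx * Syy - Sxy * Sxy
    trace_M = Sxx + Syy
    return det_M - k * trace_M ** 2

# A bright square on a dark background: its corners should score highest,
# while edges produce negative responses and flat regions near zero.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

In practice, the highest responses cluster at the four corners of the square, which is the behavior the Harris operator is chosen for here.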
Offline database establishment
In this article, the keypoints extracted by the Harris corner detection algorithm are represented by SIFT descriptors, which are stored as feature vectors. All these descriptors extracted in an indoor environment are collected as natural landmarks

The natural landmark collection
The system acquires images periodically through the binocular vision system as the robot navigates. When a new descriptor of a corner is produced, it is matched against those in the landmark database. If the match is successful, the corner is deemed recognized. In order to apply trilateration in the world coordinate system, at least three corner descriptors are required. Therefore, when the number of matched feature points is no less than three, the match is deemed a true match; otherwise, it is a false match. For a true match, the system extracts the world coordinates of the corner features and then calculates the distances between the camera and the feature points based on binocular ranging. Finally, the world coordinate of the robot can be acquired according to the trilateration principle.
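The trilateration step can be sketched as follows. Subtracting the first range-circle equation from the others linearizes the system, which is then solved in a least-squares sense; the landmark coordinates and ranges in the example are illustrative, not from the article.

```python
import numpy as np

def trilaterate(landmarks, distances):
    """Estimate a 2-D position from >= 3 landmark positions and ranges.

    Each landmark i with range d_i gives (x - x_i)^2 + (y - y_i)^2 = d_i^2.
    Subtracting the first equation from the rest yields a linear system
    A [x, y]^T = b, solved here by least squares.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    d = np.asarray(distances, dtype=float)
    x0, y0 = landmarks[0]
    A, b = [], []
    for (xi, yi), di in zip(landmarks[1:], d[1:]):
        A.append([2 * (xi - x0), 2 * (yi - y0)])
        b.append(d[0] ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2)
    pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return pos

# Example: three corner landmarks at known world coordinates, with ranges
# measured (here simulated) from a robot located at (2, 3).
lm = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = np.array([2.0, 3.0])
ranges = [np.linalg.norm(true_pos - np.array(p)) for p in lm]
est = trilaterate(lm, ranges)
```

With more than three matched corners, the same least-squares formulation absorbs the extra range measurements and averages out ranging noise.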
Asynchronous fusion of binocular vision system and INS
The schematic diagram of the asynchronous fusion of binocular vision and INS is shown in Figure 4. In this article, a strap-down INS (SINS) is adopted, which is essentially an IMU (accelerometer, gyroscope, and magnetometer) rigidly mounted on the carrier. The output of the IMU comprises linear acceleration, angular velocity, and magnetic field. The velocity and position of the robot at each step are calculated by integration, and the attitude update is completed by means of gradient descent. Then, according to the binocular vision algorithm utilized in this article, the absolute position of the mobile robot is obtained to assist with the correction of the inertial measurement. An accurate and efficient navigation process can then be achieved.

Schematic diagram of the asynchronous fusion of binocular vision and an INS. INS: inertial navigation system.
Navigation update
According to the measured values from the IMU, through coordinate transformation and navigation calculation, the position, velocity, attitude, and other navigation information for the robot can be obtained. The navigation process consists of the following three parts of the strap-down algorithm. 30
Velocity update
Based on the basic equation of an INS
where
Position update
Position
where
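The velocity and position updates above can be sketched as one integration step. This assumes the specific force is rotated into the navigation frame by the attitude matrix and gravity is then removed; the gravity sign convention and sample period are assumptions for illustration, not taken from the article.

```python
import numpy as np

def ins_update(p, v, a_body, R_nb, dt, g=np.array([0.0, 0.0, 9.81])):
    """One strap-down velocity/position update step (illustrative sketch).

    p, v      : navigation-frame position and velocity (3-vectors)
    a_body    : accelerometer specific force in the body frame
    R_nb      : body-to-navigation rotation matrix from the attitude update
    dt        : sample period (e.g. 1/125 s for a 125 Hz IMU)
    """
    a_nav = R_nb @ a_body - g                    # remove gravity in nav frame
    v_new = v + a_nav * dt                       # velocity update
    p_new = p + v * dt + 0.5 * a_nav * dt ** 2   # position update
    return p_new, v_new

# Stationary, level example: the accelerometer senses only the gravity
# reaction, so velocity and position should remain zero.
p, v = np.zeros(3), np.zeros(3)
p, v = ins_update(p, v, np.array([0.0, 0.0, 9.81]), np.eye(3), dt=0.008)
```

Because every step integrates the previous estimate, any accelerometer bias grows quadratically in position, which is exactly the drift the vision correction is introduced to bound.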
Attitude update
In the binocular vision system and the INS, measurements are taken in the body coordinate system because the IMU is rigidly fixed to the robot. The direction cosine matrix (DCM)
A quaternion
The DCM can be expressed uniquely in terms of the quaternion
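For a unit quaternion $\mathbf{q} = [q_0\ q_1\ q_2\ q_3]^{\mathrm T}$, a standard expression of the DCM (following the usual Hamilton convention, which may differ from the article's notation) is:

```latex
\mathbf{C}_b^n =
\begin{bmatrix}
q_0^2+q_1^2-q_2^2-q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2)\\
2(q_1 q_2 + q_0 q_3) & q_0^2-q_1^2+q_2^2-q_3^2 & 2(q_2 q_3 - q_0 q_1)\\
2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2-q_1^2-q_2^2+q_3^2
\end{bmatrix}
```

This mapping is unique for unit quaternions, which is why the quaternion must be renormalized after each update step.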
The relationship between the quaternion derivative
where ⊗ represents the multiplication of the quaternion. It is seen that the attitude quaternion of gyroscope
In this article, the algorithm for attitude update is the gradient descent method based on quaternions. It uses the differential equation of the quaternion to solve for the current attitude and then compensates the gyroscope drift by taking advantage of the stability of the accelerometer. Since the accelerometer cannot sense rotations around the vertical axis, the measurement from the magnetometer is adopted to correct the orientation about the vertical axis. Figure 5 describes the principle of the algorithm that updates the attitude. The attitude quaternion
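The gradient descent attitude update can be sketched as below. This simplified version combines only the gyroscope and accelerometer terms (the magnetometer yaw correction described above is omitted for brevity), and the gain beta, objective function form, and sample period are illustrative assumptions in the spirit of the article's method.

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product p ⊗ q, quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def attitude_update(q, gyro, accel, dt, beta=0.1):
    """One gradient-descent attitude step (accelerometer correction only)."""
    a = accel / np.linalg.norm(accel)
    qw, qx, qy, qz = q
    # Objective: mismatch between predicted and measured gravity direction
    f = np.array([
        2*(qx*qz - qw*qy) - a[0],
        2*(qw*qx + qy*qz) - a[1],
        2*(0.5 - qx**2 - qy**2) - a[2],
    ])
    J = np.array([
        [-2*qy,  2*qz, -2*qw, 2*qx],
        [ 2*qx,  2*qw,  2*qz, 2*qy],
        [ 0.0,  -4*qx, -4*qy, 0.0],
    ])
    grad = J.T @ f
    grad = grad / (np.linalg.norm(grad) + 1e-12)
    # Gyro propagation via the quaternion differential equation,
    # corrected along the negative gradient to suppress drift
    q_dot = 0.5 * quat_mult(q, np.array([0.0, *gyro])) - beta * grad
    q = q + q_dot * dt
    return q / np.linalg.norm(q)   # normalize the resultant quaternion

# Example: stationary, level IMU; the estimate should remain at identity.
q = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(100):
    q = attitude_update(q, np.zeros(3), np.array([0.0, 0.0, 9.81]), dt=0.008)
```

Adding the magnetometer term extends the objective function with a reference magnetic field vector, which constrains the heading that the accelerometer alone cannot observe.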

A block diagram that describes the principle of attitude update.
The objective error function is defined by 32
The magnetic field of earth can be considered to have components in one horizontal axis and vertical axis, which can be represented by
The gradient descent algorithm is adopted to calculate the attitude quaternion
where
Then, the attitude produced by the gyroscope, accelerometer, and magnetometer can be fused according to the following equation
In the final step of the attitude update procedure, it is necessary to normalize the resultant quaternion
Data fusion based on the AKF algorithm
An INS is a kind of autonomous navigation system. Various navigation parameters can be computed in real time from the specific force and angular rate measured by the IMU, as long as the initial conditions of the navigation system are provided. When the position of the robot is calculated by integration, errors in the acceleration values cause the position information to drift, which can make the navigation fail. Therefore, this article adds binocular vision to the system so that the robot can carry out intermittent corrections, which overcomes the disadvantages of a low-cost IMU.
In order to improve the navigation accuracy of the integrated binocular vision system and INS, an AKF for data fusion is designed. It fuses the current robot position acquired by the IMU and absolute localization from the binocular vision system, and therefore can achieve intermittent correction for the localization of INS. In the integrated navigation system, an AKF is used to adaptively adjust the motion state of the robot. An AKF is uncomplicated and hence improves the system efficiency.
For a KF to be applicable, the state vector of the target at time instant
Namely the state space model of this system is
where
A linear model for this measurement can be given by
where
A block diagram that illustrates the principle of the proposed KF is summarized in Figure 6. It can be seen that the KF includes the time update (projection) and the state update (rectification) aspects. The projection equation reflects the change in the estimation value

Block diagram of the proposed KF. KF: Kalman filter.
In the experiment, the measurement period is long while the dynamics of the system change rapidly. Therefore, this article treats the computation periods of these two kinds of equations differently. Assuming that the correction period is
where Δ
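The asynchronous scheme can be sketched as follows: the time update (projection) runs at every IMU sample, while the state update (rectification) runs only when a vision fix arrives. The constant-velocity state model, measurement matrices, and noise values below are illustrative assumptions, not the article's.

```python
import numpy as np

class AsyncKalmanFilter:
    """Asynchronous KF sketch: predict at the IMU rate, correct whenever
    a (slower) binocular vision fix arrives.

    State x = [px, py, vx, vy]; the INS acceleration is the control input.
    """

    def __init__(self, dt, q=1e-3, r=1e-2):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt          # position <- velocity
        self.B = np.array([[0.5*dt**2, 0], [0, 0.5*dt**2], [dt, 0], [0, dt]])
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)                    # process noise
        self.R = r * np.eye(2)                    # vision measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self, accel):
        """Time update, executed every IMU sample (e.g. 125 Hz)."""
        self.x = self.F @ self.x + self.B @ accel
        self.P = self.F @ self.P @ self.F.T + self.Q

    def correct(self, z_vision):
        """State update, executed only when a vision fix arrives (e.g. 1 Hz)."""
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z_vision - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P

# Robot moving at constant velocity (1, 0.5) m/s; acceleration input is zero.
kf = AsyncKalmanFilter(dt=0.008)
kf.x = np.array([0.0, 0.0, 1.0, 0.5])
rng = np.random.default_rng(0)
for k in range(1, 376):                 # 3 s of samples at 125 Hz
    kf.predict(np.zeros(2))
    if k % 125 == 0:                    # a noisy vision fix once per second
        t = k * 0.008
        kf.correct(np.array([1.0 * t, 0.5 * t]) + rng.normal(0, 0.01, 2))
```

Between fixes, only the projection runs and the covariance P grows; each intermittent vision correction then shrinks it again, which is exactly the intermittent-correction behavior described above.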

Sampling schematic diagram of AKF. AKF: asynchronous Kalman filter.
Experiments and discussions
In order to evaluate the performance of the proposed algorithm, extensive tests have been performed practically. In this section, we introduce the SC-IMU hardware design, preparation for the experiments, as well as analysis of the positioning experimental results.
SC-IMU hardware design
In the binocular vision system and the INS, the two cameras are rigidly attached to the front of the mobile robot. The IMU controller is placed at the center of the robot such that the navigation coordinate system is in accordance with the body coordinate system. It consists of the STMicroelectronics STM32F103 (ARM Cortex-M3 32-bit CPU) and an IMU module (including a tri-axis magnetometer HMC5883L and the MPU6050, which embeds a tri-axis MEMS gyroscope and a tri-axis MEMS accelerometer). The size of the IMU module is 16 × 21 mm², as shown in Figure 8(d). The binocular camera system and the IMU controller are directly connected to a laptop via Universal Serial Bus (USB) ports.

The cameras used in the binocular vision system and the IMU controller for positioning and navigation. (a) Mobile robot in the experiment. (b) The cameras used in the binocular vision system. (c) The IMU controller. (d) The IMU module. IMU: inertial measurement unit.
The binocular system comprises two webcams (model LifeCam HD-3000) with a resolution of 720 × 640 pixels. They are 55 mm apart and 740 mm off the ground. The MEMS gyroscope can function in four different ranges with a maximum of ±2000° s−1 output, which is sufficient to sense high-speed rotation. The MEMS accelerometer can also function at four different ranges with a maximum of ±2
IMU module measured specifications.
IMU: inertial measurement unit; Acc: accelerometer; Gyro: gyroscope; Mag: magnetometer.
Preparation of experiment
In this article, a feasible scheme for rapid positioning and navigation based on inertial measurements and a binocular vision system is proposed by introducing a fuzzy map into the mobile robot platform. This has been implemented and tested using Microsoft Visual Studio 2010. As shown in Figure 9, two fuzzy maps of the indoor environment with predefined paths are designed, each of which contains a number of corners (solid black spots) whose world coordinates are known. The dashed line represents the path, following which the robot navigates to the destination in the absence of obstacles. In the process of navigating to the destination, the mobile robot follows a circulatory navigation control mode of inertial measurement, visual correction, and inertial measurement, namely: (1) determining the initial position of the mobile robot in the fuzzy map using the IMU at the beginning of the experiment; (2) detecting and recognizing corners based on Harris-SIFT in the fuzzy map by employing the binocular vision system, and calculating the position of the robot according to the proposed estimation algorithm; (3) fusing the inertial position measurement using an AKF to correct the position of the mobile robot; and (4) positioning and navigating the mobile robot using inertial measurements sequentially, that is, executing steps (2) and (3) circularly until it reaches the destination.

The fuzzy map of indoor environment with predefined path.
Before the mobile robot localization and navigation experiment, two groups of experiments had been carried out to test the performance of the binocular vision system and the IMU as follows.
where

The short-distance experiment results. (a) Location results of IMU. (b) Positioning error. IMU: inertial measurement unit.
Binocular measurement result and position error (mm).
Real experiment and analysis
A two-wheel mobile robot equipped with the SC-IMU hardware module designed in this article was used as the experimental platform to collect data online. Two dynamic indoor experiments were conducted on the sixth floor and the first floor of the laboratory, respectively, to validate the reliability and effectiveness of the proposed algorithm. These experiments present the fusion of vision and inertial navigation in a GPS-denied condition. The vision information was fused at 1 Hz on average (i.e. vision information was employed for correction every second), and the output frequency of the IMU was fixed at 125 Hz. We performed many tests in which the mobile robot navigated in indoor environments along a predefined path. To illustrate the results intuitively, the following trajectories are given: the ground truth, INS alone, and the fusion of binocular vision and INS.
The results of experiment 1 are shown in Figure 11, which corresponds to the predefined path in Figure 9(a). The route length of this experiment is 32.5 m. The test process of this indoor experiment is shown in Figure 12. The estimated trajectory obtained by data fusion using the AKF algorithm is compared with the trajectory obtained by INS alone, by pure vision, and with the measured ground truth. It can be seen that the red trajectory of the integrated system is the closest to the true path, and the positioning errors of the blue trajectory from INS alone are larger than those of the red trajectory from the integrated system. Figure 13 shows the velocity estimates, comparing the integrated system with pure INS. The results prove that the proposed algorithm can improve the positioning accuracy of a mobile robot and can limit its velocity within a certain range.

The results of dynamic indoor experiments, which corresponds to the predefined path in Figure 9(a). (a) Comparison of the estimated trajectories between the proposed fusion algorithm, INS, and truth-value. (b) Comparison of positioning errors between the proposed fusion algorithm and INS. INS: inertial navigation system.

The test process of experiment in the indoor environment.

Comparison of the estimated velocities, which corresponds to the predefined path in Figure 9(a). (a)
To demonstrate the utility of the proposed fusion method, trajectories estimated by standard KF and AKF are summarized in Figure 14. It is observed from Figure 14(b) that the proposed AKF has a higher positioning accuracy than standard KF.

Comparison experiments between standard KF and the proposed AKF algorithm. (a) Trajectories estimated by standard KF and the proposed AKF algorithm. (b) Comparison of positioning errors.
Experimental results for another predefined path are shown in Figure 15, which corresponds to Figure 9(b). The route length of this experiment is 66.5 m. As expected, the positioning result of the mobile robot by the fusion of binocular vision system and INS shows a negligible deviation, as shown in Figure 15(b). The comparative evaluation shows that the proposed method can regulate the output from a pure INS-based method. While INS will very likely lead to the “drifting” problem, the proposed method can recover from large positioning errors and decrease them by data association. Figure 16 shows the velocity estimates which compare the integrated system with pure INS. This again proves that the integrated system can serve to improve the accuracy of localization.

The results of dynamic indoor experiments, which corresponds to the predefined path in Figure 9(b). (a) Comparison of the estimated trajectories between the proposed fusion algorithm, INS, and truth-value. (b) Comparison of the positioning errors between the proposed fusion algorithm and INS. INS: inertial navigation system.

Comparison of estimated velocities, which corresponds to the predefined path in Figure 9(b). (a)
To illustrate the utility of the proposed AKF, another comparison test between the standard KF and the proposed method is shown in Figure 17. It can be seen that the positioning error of the red trajectory from the AKF is smaller, which again proves the effectiveness of the proposed fusion method.

Comparison experiments between standard KF and the proposed AKF algorithm. (a) Trajectories estimated by standard KF and the proposed AKF algorithm. (b) Comparison of positioning errors. KF: Kalman filter; AKF: asynchronous Kalman filter.
To demonstrate the rapidity of the proposed navigation method, another trial has been conducted in comparison with the employment of a stereo visual odometry (VO), as shown in Figure 18. It can be seen that the positioning error of the fusion of binocular vision and INS is larger than that of VO; however, the navigation time of the proposed method is much shorter. The results prove that the proposed algorithm can realize fast navigation.

The results of comparison experiments. (a) Comparison of the estimated trajectories between the proposed fusion algorithm, VO, and truth-value. (b) Positioning error of the proposed fusion algorithm. (c) Positioning error of VO. VO: visual odometry.
Conclusion
This article presents a real-time system that integrates binocular vision and INS for the positioning control of a mobile robot, which can operate reliably in indoor environments. We propose an algorithm for the estimation of absolute robot positions based on a dedicated and integrated sensory system. The SC-IMU hardware is specifically designed for the system and enables it to acquire data via a combination of an accelerometer, a gyroscope, and a magnetometer. An AKF is proposed to fuse the information from inertial navigation with the data from the binocular vision system, which can mitigate errors by suppressing the drift of inertial measurements. Three dynamic indoor experiments were conducted to validate the reliability and effectiveness of the proposed algorithm. Compared with pure INS and VO, the results indicate that the proposed fusion method can significantly improve positioning accuracy and achieve fast navigation. It should be noted that although the proposed system can achieve high accuracy in most indoor environments, limiting factors such as non-stationary obstacles cannot yet be handled and remain to be investigated in our future study.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work is supported by four Projects from National Natural Science Foundation of China (60705035, 61075087, 61573263, 61273188), Scientific Research Plan Key Project of Hubei Provincial Department of Education (D20131105), Project supported by the Zhejiang Open Foundation of the Most Important Subjects, and supported by Science and Technology support plan of Hubei Province (2015BAA018), also supported by Zhejiang Provincial Natural Science Foundation (LY16F030007).
