Abstract
Due to the inherent limitations of optical sensors, their integration with a traditional Inertial Navigation System (INS) has become a focus of indoor navigation applications. In a low-cost INS/vision integrated navigation system, the relative position and attitude between the subsystems are important parameters. To solve the initial bias estimation problem in an INS/vision integrated system, this article proposes a novel alignment approach based on time-domain constraints. First, starting from the traditional initial alignment model, a time-domain constrained model is derived in which the time-related states and measurements are all modeled. To verify the improvement in state observability, both the traditional alignment model and the corresponding time-domain constrained model are analyzed with a nonlinear observability analysis method. Finally, two groups of numerical simulations are carried out, and the corresponding results validate the effectiveness of the proposed time-domain constrained model.
Introduction
Visual navigation has received a lot of research interest in recent years for a number of reasons. Cameras are cheap, light, and have low power requirements compared to other localization sensors such as laser scanners. 1,2 This makes them attractive sensors for low-cost platforms or for applications where size and weight must be tightly controlled, such as micro aerial vehicles in a global navigation satellite system (GNSS)-denied environment. 3 Cameras are also passive sensors, unlike sonar, radar, and laser scanners, which makes them useful in surveillance applications: they are difficult to detect and do not interfere with the environment they observe. Furthermore, a large amount of information can be extracted from a sequence of images, which allows the motion of the platform to be tightly constrained and highly detailed maps to be constructed. This is especially useful when loop closure occurs, as it can be easily detected by image matching.
Traditional visual navigation usually uses a stereo vision system, which can directly provide three-dimensional (3-D) information about the environment, and the camera position can be easily estimated from the disparity between the multiple vision sensors. 4 However, the accuracy of stereo visual navigation is limited by the length of the baseline; this problem is crucial in applications where the baseline is severely constrained, such as remote sensing, micro-unmanned aerial vehicles (UAVs), and so on. Therefore, monocular visual navigation tends to be more general and commonly used.
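The baseline limitation can be made concrete: for a rectified stereo pair, depth follows Z = f·b/d (focal length f in pixels, baseline b, disparity d), so the first-order depth uncertainty grows with the square of the range and inversely with the baseline. The following minimal sketch (not from the paper; the focal length and baseline values are hypothetical) illustrates why a short baseline degrades range accuracy:

```python
def stereo_depth(f_px, baseline_m, disparity_px):
    """Depth from disparity for a rectified stereo pair: Z = f * b / d."""
    return f_px * baseline_m / disparity_px

def depth_error(f_px, baseline_m, depth_m, disparity_err_px=1.0):
    """First-order depth uncertainty: dZ ~ Z^2 / (f * b) * d_err."""
    return depth_m ** 2 / (f_px * baseline_m) * disparity_err_px

# A 10 cm baseline versus a 2 cm baseline (e.g. on a micro-UAV) at 5 m range,
# assuming f = 500 px and 1 px disparity error:
err_wide = depth_error(500.0, 0.10, 5.0)    # 0.5 m
err_narrow = depth_error(500.0, 0.02, 5.0)  # 2.5 m
```

Shrinking the baseline by a factor of 5 inflates the depth error by the same factor, which is exactly the constraint that pushes size-limited platforms toward monocular setups.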
According to previous contributions, monocular visual navigation has been carried out either by local optimization of key frames 5 or by filtering. 6–8 Since cameras are projective sensors providing bearing-only observations, a single image cannot provide an estimate of the range to features, and if a single camera is the only sensor used, the true scale of the position and map is not observable, no matter which class of navigation algorithm is chosen. This is one of the main drawbacks of using cameras for navigation. 9,10 Comparatively, an Inertial Navigation System (INS) is capable of tracking the position, velocity, and attitude of a vehicle. This dead-reckoning process, however, cannot be used over extended periods of time because the errors in the computed estimates continuously increase. The high-dynamic motion measurements of the inertial measurement unit (IMU) can, in turn, support the vision algorithms by providing accurate predictions of where features can be expected in the upcoming frame. 11 The combination of vision and inertial sensors is therefore well suited to a wide range of robotics applications. 12,13
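The unbounded growth of dead-reckoning error can be sketched in one dimension: an uncorrected accelerometer bias integrates once into a linear velocity error and twice into a quadratic position error. The numbers below are illustrative (a 1 mg bias, the accelerometer grade quoted later in the experiments, and a hypothetical 100 Hz sample rate):

```python
def dead_reckon(accels, dt):
    """Integrate specific-force samples into velocity and position (1-D sketch)."""
    v, p = 0.0, 0.0
    for a in accels:
        v += a * dt   # velocity error grows linearly with time
        p += v * dt   # position error grows quadratically with time
    return v, p

bias = 0.0098          # 1 mg ~ 0.0098 m/s^2, left uncorrected
n, dt = 6000, 0.01     # 60 s at 100 Hz
v_err, p_err = dead_reckon([bias] * n, dt)
# After only one minute, v_err ~ 0.59 m/s and p_err ~ 17.6 m,
# which is why inertial-only navigation needs external aiding.
```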
In the aforementioned traditional integrated navigation approaches, the large number and high rate of observations from the cameras and the inertial unit require a large amount of computationally intensive processing to extract and match feature points in the images. 7 Yang and Shen 14 introduced a probabilistic, optimization-based initialization method in which the sensors work in a tightly coupled structure. This article proposes a novel time-domain-related model with a lower computational load. On the basis of this model, we also provide an effective solution to the resulting nonlinear optimization problem.
Problem formulation
In this section, we give a more formal formulation of the problem we are solving. The coordinate frames used are introduced as follows:

Camera frame (C): This coordinate frame is attached to the moving camera. Its origin is located at the optical center of the camera, with the z-axis pointing along the optical axis.

Body frame (B): This is the coordinate frame of the strapdown IMU, and it is rigidly connected to the C frame. All inertial measurements are resolved in this coordinate frame.

Image frame (I): This is the two-dimensional coordinate frame of the camera images. It lies on the image plane, which is perpendicular to the optical axis.

World frame (W): This is the only static coordinate frame involved in this article, and it serves as the reference frame. The poses of all the aforementioned frames are estimated with respect to the W frame. The 3-D feature positions are, without loss of generality, assumed to be constant and known in this frame. It is fixed to the environment and can be aligned in any direction; preferably, however, it should be vertically aligned.
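The frame chain above implies that a feature seen in the camera frame reaches world coordinates through two rigid transforms, C→B (the fixed extrinsics being estimated) and B→W (the time-varying pose). A minimal sketch, with entirely hypothetical extrinsics and pose values:

```python
import numpy as np

def transform(R, t, p):
    """Map a point p through the rigid transform (R, t): p' = R @ p + t."""
    return R @ p + t

# Hypothetical extrinsics: camera frame C to body frame B.
R_BC = np.eye(3)                   # C and B aligned, for simplicity
t_BC = np.array([0.1, 0.0, 0.0])   # camera 10 cm from the IMU origin

# Hypothetical pose: body frame B to world frame W (90-degree yaw).
R_WB = np.array([[0., -1., 0.],
                 [1.,  0., 0.],
                 [0.,  0., 1.]])
t_WB = np.array([2.0, 3.0, 0.0])

p_C = np.array([0.0, 0.0, 5.0])    # feature 5 m along the optical axis
p_W = transform(R_WB, t_WB, transform(R_BC, t_BC, p_C))
```

Because features are known and constant in W, any error in the C→B extrinsics shows up as a systematic inconsistency in this chain, which is what the alignment procedure exploits.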
The traditional alignment algorithm usually contains two steps, namely filter initialization and estimation. The basic model of the integrated system is introduced as follows.
In this article, the state of the system is selected as
The dynamic process of state can be depicted as
In equation (3), the biases of the gyroscopes and accelerometers are modeled as first-order Markov processes, so their derivatives can be written as white noise, as presented by
In equation (2),
Equations (2) to (4) are the traditional state equations of the integrated system. In the INS, the angular velocity and the acceleration can be measured directly in the body frame
Herein,
and
where
In the aforementioned integrated model, there exist several error sources, such as inertial device errors, image feature errors, and so on. To obtain a more accurate initial alignment result, this article develops a novel time-domain constrained optimization model in the following section.
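The bias dynamics assumed in equations (3) and (4) can be sketched in discrete time. The snippet below (illustrative only; the noise level and step size are hypothetical) propagates a bias whose derivative is white noise, i.e. a random walk, and also shows the finite-correlation-time variant of a first-order (Gauss-Markov) process for comparison:

```python
import numpy as np

def propagate_bias(b0, noise, dt, tau=None):
    """Discrete bias propagation.

    tau=None : pure random walk, b_dot = w (derivative is white noise).
    finite tau: first-order Gauss-Markov process, b_dot = -b/tau + w.
    Returns the bias history as an array.
    """
    b, hist = b0, []
    for w in noise:
        if tau is None:
            b = b + w * dt
        else:
            b = b * (1.0 - dt / tau) + w * dt
        hist.append(b)
    return np.array(hist)

rng = np.random.default_rng(0)
walk = propagate_bias(0.0, rng.normal(0.0, 1e-3, 1000), dt=0.01)
```

With tau=None the bias variance grows without bound, which is precisely why the bias states must be estimated online rather than calibrated once.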
Time-domain constraints description
In Figure 1, the relationship between the coordinate frames is illustrated. The camera and the IMU are rigidly connected, that is,

The sensor unit, shown at two time instants, t(k-1) and t(k), consists of an IMU (B frame) and a camera (C frame). These frames are rigidly connected. The position of the sensor unit with respect to the world frame (W) changes over time as the unit is moved.
According to the two-time-step motion, the relationship between the pose parameters can be modeled as follows.
At first, in two adjacent time steps, the relation between the rotation matrices is
Note that every rotation matrix R corresponds to an attitude (rotation) vector, which is depicted as
Suppose the camera attitude–related measurement is
In addition, the translation vector of this integrated system satisfies the following constraints
According to the rigid connection, suppose we have
The second group of measurements is the camera translation–related constraints, which can be written as
Equations (13) and (16) form the measurements in the traditional INS/vision integrated system. In fact, we can also utilize additional time-domain constraints over a longer image sequence to improve the ego-motion estimation accuracy
Equations (17) and (18) constitute the time-domain constraint-related measurements.
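The rotation-matrix/attitude-vector correspondence used throughout this section can be made explicit with the Rodrigues formula. The sketch below (a generic implementation, not the paper's code) converts an attitude vector to its rotation matrix, which is also how the incremental rotation between two adjacent time steps would be built from a gyro-integrated angle increment:

```python
import numpy as np

def rotvec_to_matrix(phi):
    """Rodrigues formula: attitude (rotation) vector -> rotation matrix.

    R = I + sin(a)/a * [phi]_x + (1 - cos(a))/a^2 * [phi]_x^2,  a = |phi|.
    """
    a = np.linalg.norm(phi)
    if a < 1e-12:
        return np.eye(3)          # small-angle limit
    K = np.array([[0.0, -phi[2], phi[1]],      # skew-symmetric [phi]_x
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    return np.eye(3) + np.sin(a) / a * K + (1 - np.cos(a)) / a ** 2 * (K @ K)

# A 90-degree rotation about z maps the x-axis to the y-axis:
R = rotvec_to_matrix(np.array([0.0, 0.0, np.pi / 2]))
```

Chaining such matrices, R(k) = R(k-1) @ dR, is the adjacent-time-step relation that the time-domain constraints exploit.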
Observability analysis
On the basis of the time-domain constrained model, the state estimation problem is analyzed in this section. System observability is an important issue in state estimation: observability analysis provides a direct understanding of the fundamental limits of the obtainable solutions, regardless of process and measurement noise. The standard pose estimation formulation is a strongly nonlinear process. 16 For this analysis, the nonlinear system is approximated at each time step as a linear system with time-varying coefficients. Researchers usually use the Hermann approach to analyze the observability of nonlinear systems. 17
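For the linearized-per-step approach described above, observability reduces to a rank test on the stacked matrix [H; H·Phi; H·Phi²; …]. The toy example below (a generic sketch, not the paper's specific alignment model) shows the qualitative effect discussed later: a state that is never coupled into the measurement by the dynamics drops the rank, while adding dynamic coupling restores full rank:

```python
import numpy as np

def observability_rank(Phi, H, steps=None):
    """Rank of the stacked observability matrix [H; H Phi; H Phi^2; ...]
    for the discrete linearized system x_{k+1} = Phi x_k, z_k = H x_k."""
    n = Phi.shape[0]
    steps = n if steps is None else steps
    rows, M = [], np.eye(n)
    for _ in range(steps):
        rows.append(H @ M)
        M = Phi @ M
    return np.linalg.matrix_rank(np.vstack(rows))

# Static case: two states (position, bias), only position measured, and
# dynamics that never mix the bias into the position -> unobservable.
Phi = np.eye(2)
H = np.array([[1.0, 0.0]])
r_static = observability_rank(Phi, H)       # rank 1 < 2

# With dynamic coupling (motion), the bias leaks into the measured state
# over time and the system becomes fully observable.
Phi2 = np.array([[1.0, 0.01], [0.0, 1.0]])
r_dynamic = observability_rank(Phi2, H)     # rank 2
```

This mirrors the conclusion of this section: without sufficient motion the alignment states cannot all be recovered, whatever estimator is used.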
Unlike the traditional integrated model, the time-domain constrained model has more time-related measurements, as depicted in the previous section. The involved measurements can be presented as
Their gradients are as follows
Herein,
Under the same conditions, the observability judgment matrix
From the above judgment matrix, we can find that the motion dynamics of the integrated platform affect the observability results in the initial alignment process. Thus, we consider this problem under different dynamics. At first, we suppose the rotation velocity is
In addition, we also consider the stricter situation in which the rotation velocity is zero during the initial alignment process. Under this condition, neither of the two involved models is of full rank. That is to say, in order to successfully estimate the initial alignment parameters of the INS/vision integrated system, rotation is necessary in either approach. Kelly proved that if rotation and translation are both included in the system motion, the initial alignment model is completely observable, as described in the study by Kelly and Sukhatme. 18 Compared with Kelly's work, this article simplifies the initial alignment process by introducing the time-domain-related measurements. Even in zero-velocity situations, complete observability is guaranteed as long as rotation is provided. This approach is more advantageous in applications where large-range maneuvers are inappropriate.
Experimental validation
Simulations
In order to verify the effectiveness of the proposed time-domain constrained approach, we implement two groups of simulations under different dynamics. At first, simulations of the integrated navigation system are implemented, where the ground truth of the sensor-to-sensor bias and the instrument parameters are all known. The basic specifications of the INS are described in Table 1.
Table 1. Accuracy parameters of the inertial navigation system.
The intrinsic parameters of the camera are assumed to be
In the first group of experiments, the system is assumed to be static, and the true value of the relative position is set as

Figure 2. Initial alignment results in the static circumstance: (a) estimation results of relative position; (b) estimation results of relative attitude.
From Figure 2, if the system is static during the whole alignment process, the initial bias of the state guess cannot be corrected well, no matter which algorithm is selected. That is to say, in the static situation, it is hard to implement the initial alignment successfully, which agrees well with the observability analysis results of the former section. In the second group of experiments, the rotation velocity is set to 10°/s, and the equipment specifications are the same as in the first simulation, as depicted in Table 1. Under this condition, the corresponding estimation results are depicted in Figure 3.

Figure 3. Initial alignment results in the dynamic circumstance: (a) estimation results of relative position; (b) estimation results of relative attitude.
From the aforementioned simulations, we can find that, provided the rotational motion is guaranteed, the initial alignment can be implemented without other sensors. The proposed time-domain constrained model is able to achieve higher accuracy in the initial alignment process.
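The camera measurement model underlying these simulations is the standard pinhole projection through the intrinsic matrix K. A generic sketch (the intrinsic values below are hypothetical, not the ones used in the paper's simulations):

```python
import numpy as np

def project(K, p_C):
    """Pinhole projection of a camera-frame point to pixel coordinates:
    normalize by depth, then apply the intrinsic matrix K."""
    u = K @ (p_C / p_C[2])
    return u[:2]

# Hypothetical intrinsics: focal lengths and principal point in pixels.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

px = project(K, np.array([0.5, -0.2, 5.0]))  # feature 5 m in front of the camera
```

Note that scaling p_C by any positive factor leaves px unchanged, which is the bearing-only (scale-free) property of monocular measurements discussed in the introduction.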
Experiments
In this section, we implement a group of experiments on the basis of data sampled from real equipment. The integrated system contains a micro IMU and an industrial camera. In the experimental system, the gyro random walk is 60°/h, the stochastic bias of the accelerometer is 1 mg, and the relative position between the camera and the IMU is

Initial alignment estimation results of the relative position between the INS and the camera: (a) estimation results of relative position; (b) estimation results of relative attitude.
This article also provides a demonstration of the experimental process; please see the attached file for more details. From the experimental results, we can find that the time-domain constraints are able to improve the accuracy of the initial alignment estimation.
Conclusion
According to the position constraints that exist between the INS and the visual system at different time steps, this article proposes a novel constrained model on the basis of the traditional parameterization, and the corresponding time-domain optimization is also designed. From this article, we can draw the following three conclusions:

1. The proposed approach needs no additional sensors or measuring equipment other than a commonly used chessboard.

2. The observability analysis of the two integrated models suggests that the system observability is related to the motion dynamics. Higher-dynamic motion is able to improve the state estimation accuracy through the higher observability.

3. Compared with the traditional initial alignment model, the time-domain constrained parameterization is able to improve the system observability. In the proposed alignment process, the system states are completely observable even if only rotational motion is provided, which improves the availability of the INS/vision integrated system, especially in applications where large-scale maneuvers are hard to implement.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China under Grants 61403398 and 61673017.
