Abstract
Due to the inherent limitations of optical sensors, their integration with a traditional Inertial Navigation System (INS) has become a focus of indoor navigation applications. In a low-cost INS/vision integrated navigation system, the relative position and attitude between the subsystems are important parameters. To solve the initial bias estimation problem in an INS/vision integrated system, this article proposes a novel alignment approach based on time-domain constraints. First, starting from the traditional initial alignment model, a time-domain constrained model is derived in which the time-related states and measurements are all modeled. To verify the improvement in state observability, both the traditional alignment model and the corresponding time-domain constrained model are analyzed with a nonlinear observability analysis method. Finally, two groups of numerical simulations are carried out, and the corresponding results validate the effectiveness of the proposed time-domain constrained model.
Introduction
Visual navigation has received a lot of research interest in recent years for a number of reasons. Cameras are cheap, light, and have low power requirements compared to other localization sensors such as laser scanners. 1,2 This makes them attractive sensors for low-cost platforms or for applications where size and weight must be tightly controlled, such as micro aerial vehicles in a global navigation satellite system (GNSS)-denied environment. 3 Cameras are also passive sensors, unlike sonar, radar, and laser scanners, which makes them useful in surveillance applications: they are difficult to detect and do not interfere with the environment they observe. Furthermore, a large amount of information can be extracted from a sequence of images, which allows the motion of the platform to be tightly constrained and highly detailed maps to be constructed. This is especially useful when loop closure occurs, as it can be easily detected by image matching.
Traditional visual navigation usually uses a stereo vision system, which can directly provide three-dimensional (3-D) information about the environment, and the camera position can be easily estimated from the disparity between the multiple vision sensors. 4 However, the accuracy of stereo visual navigation is limited by the length of the baseline; this problem is crucial in applications where the baseline is severely constrained, such as remote sensing, micro-unmanned aerial vehicles (UAVs), and so on. Therefore, monocular visual navigation tends to be more general and commonly used.
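The baseline limitation can be made concrete: for a rectified stereo pair, depth follows Z = f·b/d (focal length f in pixels, baseline b, disparity d), so the first-order depth uncertainty grows with the square of the range and inversely with the baseline. The following minimal sketch (not from the paper; the focal length and baseline values are hypothetical) illustrates why a short baseline degrades range accuracy:

```python
def stereo_depth(f_px, baseline_m, disparity_px):
    """Depth from disparity for a rectified stereo pair: Z = f * b / d."""
    return f_px * baseline_m / disparity_px

def depth_error(f_px, baseline_m, depth_m, disparity_err_px=1.0):
    """First-order depth uncertainty: dZ ~ Z^2 / (f * b) * d_err."""
    return depth_m ** 2 / (f_px * baseline_m) * disparity_err_px

# A 10 cm baseline versus a 2 cm baseline (e.g. on a micro-UAV) at 5 m range,
# assuming f = 500 px and 1 px disparity error:
err_wide = depth_error(500.0, 0.10, 5.0)    # 0.5 m
err_narrow = depth_error(500.0, 0.02, 5.0)  # 2.5 m
```

Shrinking the baseline by a factor of 5 inflates the depth error by the same factor, which is exactly the constraint that pushes size-limited platforms toward monocular setups.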
According to previous contributions, monocular visual navigation has been carried out either by local optimization of key frames 5 or by filtering. 6–8 Since cameras are projective sensors providing bearing-only observations, a single image cannot provide an estimate of the range to features, and if a single camera is the only sensor used, the true scale of the position and map is not observable, no matter which class of navigation algorithm is chosen. This is one of the main drawbacks of using cameras for navigation. 9,10 Comparatively, an Inertial Navigation System (INS) is capable of tracking the position, velocity, and attitude of a vehicle. This dead-reckoning process, however, cannot be used over extended periods of time because the errors in the computed estimates continuously increase. The high-dynamic motion measurements of the inertial measurement unit (IMU) can, in turn, support the vision algorithms by providing accurate predictions of where features can be expected in the upcoming frame. 11 The combination of vision and inertial sensors is therefore well suited to a wide range of robotics applications. 12,13
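The unbounded growth of dead-reckoning error can be sketched in one dimension: an uncorrected accelerometer bias integrates once into a linear velocity error and twice into a quadratic position error. The numbers below are illustrative (a 1 mg bias, the accelerometer grade quoted later in the experiments, and a hypothetical 100 Hz sample rate):

```python
def dead_reckon(accels, dt):
    """Integrate specific-force samples into velocity and position (1-D sketch)."""
    v, p = 0.0, 0.0
    for a in accels:
        v += a * dt   # velocity error grows linearly with time
        p += v * dt   # position error grows quadratically with time
    return v, p

bias = 0.0098          # 1 mg ~ 0.0098 m/s^2, left uncorrected
n, dt = 6000, 0.01     # 60 s at 100 Hz
v_err, p_err = dead_reckon([bias] * n, dt)
# After only one minute, v_err ~ 0.59 m/s and p_err ~ 17.6 m,
# which is why inertial-only navigation needs external aiding.
```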
In the aforementioned traditional integrated navigation approaches, the large number and high rate of observations from the cameras and the inertial unit require a large amount of computationally intensive processing to extract and match feature points in the images. 7 Yang and Shen 14 introduced a probabilistic, optimization-based initialization method in which the sensors work in a tightly coupled structure. This article proposes a novel time-domain-related model with a lower computational load. On the basis of this model, we also provide an effective solution to the resulting nonlinear optimization problem.
Problem formulation
In this section, we give a more formal formulation of the problem we are solving. The coordinate frames used are introduced as follows:

Camera frame (C): This coordinate frame is attached to the moving camera. Its origin is located at the optical center of the camera, with the z-axis pointing along the optical axis.

Body frame (B): This is the coordinate frame of the strapdown IMU, and it is rigidly connected to the C frame. All inertial measurements are resolved in this coordinate frame.

Image frame (I): This is the two-dimensional coordinate frame of the camera images. It lies on the image plane, which is perpendicular to the optical axis.

World frame (W): This is the only static coordinate frame involved in this article, and it serves as the reference frame. The poses of all the aforementioned frames are estimated with respect to the W frame. The 3-D feature positions are, without loss of generality, assumed to be constant and known in this frame. It is fixed to the environment and can be aligned in any direction; preferably, however, it should be vertically aligned.
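The frame chain above implies that a feature seen in the camera frame reaches world coordinates through two rigid transforms, C→B (the fixed extrinsics being estimated) and B→W (the time-varying pose). A minimal sketch, with entirely hypothetical extrinsics and pose values:

```python
import numpy as np

def transform(R, t, p):
    """Map a point p through the rigid transform (R, t): p' = R @ p + t."""
    return R @ p + t

# Hypothetical extrinsics: camera frame C to body frame B.
R_BC = np.eye(3)                   # C and B aligned, for simplicity
t_BC = np.array([0.1, 0.0, 0.0])   # camera 10 cm from the IMU origin

# Hypothetical pose: body frame B to world frame W (90-degree yaw).
R_WB = np.array([[0., -1., 0.],
                 [1.,  0., 0.],
                 [0.,  0., 1.]])
t_WB = np.array([2.0, 3.0, 0.0])

p_C = np.array([0.0, 0.0, 5.0])    # feature 5 m along the optical axis
p_W = transform(R_WB, t_WB, transform(R_BC, t_BC, p_C))
```

Because features are known and constant in W, any error in the C→B extrinsics shows up as a systematic inconsistency in this chain, which is what the alignment procedure exploits.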
The traditional alignment algorithm usually contains two steps, namely filter initialization and estimation. The basic model of the integrated system is introduced as follows.
In this article, the state of the system is selected as
The dynamic process of state can be depicted as
In equation (3), the biases of the gyroscopes and accelerometers are modeled as first-order Markov processes, so their derivatives can be written as white noise, as presented by
In equation (2),
Equations (2) to (4) are the traditional state equations of the integrated system. In the INS, the angular velocity and the acceleration can be measured directly in the body frame
Herein,
and
where
In the aforementioned integrated model, there exist several error sources, such as inertial device errors, image feature errors, and so on. To obtain a more accurate initial alignment result, this article develops a novel time-domain constrained optimization model in the following section.
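The bias dynamics assumed in equations (3) and (4) can be sketched in discrete time. The snippet below (illustrative only; the noise level and step size are hypothetical) propagates a bias whose derivative is white noise, i.e. a random walk, and also shows the finite-correlation-time variant of a first-order (Gauss-Markov) process for comparison:

```python
import numpy as np

def propagate_bias(b0, noise, dt, tau=None):
    """Discrete bias propagation.

    tau=None : pure random walk, b_dot = w (derivative is white noise).
    finite tau: first-order Gauss-Markov process, b_dot = -b/tau + w.
    Returns the bias history as an array.
    """
    b, hist = b0, []
    for w in noise:
        if tau is None:
            b = b + w * dt
        else:
            b = b * (1.0 - dt / tau) + w * dt
        hist.append(b)
    return np.array(hist)

rng = np.random.default_rng(0)
walk = propagate_bias(0.0, rng.normal(0.0, 1e-3, 1000), dt=0.01)
```

With tau=None the bias variance grows without bound, which is precisely why the bias states must be estimated online rather than calibrated once.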
Time-domain constraints description
In Figure 1, the relationship between the coordinate frames is illustrated. The camera and the IMU are rigidly connected, that is,

The sensor unit, shown at two time instants, t(k-1) and t(k), consists of an IMU (B frame) and a camera (C frame). These frames are rigidly connected. The position of the sensor unit with respect to the world frame (W) changes over time as the unit is moved.
According to the two-time-step motion, the relationship between the pose parameters can be modeled as follows.
At first, in two adjacent time steps, the relation between the rotation matrices is
Note that every rotation matrix R corresponds to an attitude (rotation) vector, which is depicted as
Suppose the camera attitude–related measurement is
In addition, the translation vector of this integrated system satisfies the following constraints
According to the rigid connection, suppose we have
The second group of measurements is the camera translation–related constraints, which can be written as
Equations (13) and (16) form the measurements in the traditional INS/vision integrated system. In fact, we can also utilize additional time-domain constraints over a longer image sequence to improve the ego-motion estimation accuracy
Equations (17) and (18) constitute the time-domain constraint-related measurements.
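The rotation-matrix/attitude-vector correspondence used throughout this section can be made explicit with the Rodrigues formula. The sketch below (a generic implementation, not the paper's code) converts an attitude vector to its rotation matrix, which is also how the incremental rotation between two adjacent time steps would be built from a gyro-integrated angle increment:

```python
import numpy as np

def rotvec_to_matrix(phi):
    """Rodrigues formula: attitude (rotation) vector -> rotation matrix.

    R = I + sin(a)/a * [phi]_x + (1 - cos(a))/a^2 * [phi]_x^2,  a = |phi|.
    """
    a = np.linalg.norm(phi)
    if a < 1e-12:
        return np.eye(3)          # small-angle limit
    K = np.array([[0.0, -phi[2], phi[1]],      # skew-symmetric [phi]_x
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    return np.eye(3) + np.sin(a) / a * K + (1 - np.cos(a)) / a ** 2 * (K @ K)

# A 90-degree rotation about z maps the x-axis to the y-axis:
R = rotvec_to_matrix(np.array([0.0, 0.0, np.pi / 2]))
```

Chaining such matrices, R(k) = R(k-1) @ dR, is the adjacent-time-step relation that the time-domain constraints exploit.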
Observability analysis
On the basis of the time-domain constrained model, the state estimation problem is analyzed in this section. System observability is an important issue in state estimation: observability analysis provides a direct understanding of the fundamental limits of the obtainable solutions, regardless of process and measurement noise. The standard pose estimation formulation is a strongly nonlinear process. 16 For this analysis, the nonlinear system is approximated at each time step as a linear system with time-varying coefficients. Researchers usually use the Hermann approach to analyze the observability of nonlinear systems. 17
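For the linearized-per-step approach described above, observability reduces to a rank test on the stacked matrix [H; H·Phi; H·Phi²; …]. The toy example below (a generic sketch, not the paper's specific alignment model) shows the qualitative effect discussed later: a state that is never coupled into the measurement by the dynamics drops the rank, while adding dynamic coupling restores full rank:

```python
import numpy as np

def observability_rank(Phi, H, steps=None):
    """Rank of the stacked observability matrix [H; H Phi; H Phi^2; ...]
    for the discrete linearized system x_{k+1} = Phi x_k, z_k = H x_k."""
    n = Phi.shape[0]
    steps = n if steps is None else steps
    rows, M = [], np.eye(n)
    for _ in range(steps):
        rows.append(H @ M)
        M = Phi @ M
    return np.linalg.matrix_rank(np.vstack(rows))

# Static case: two states (position, bias), only position measured, and
# dynamics that never mix the bias into the position -> unobservable.
Phi = np.eye(2)
H = np.array([[1.0, 0.0]])
r_static = observability_rank(Phi, H)       # rank 1 < 2

# With dynamic coupling (motion), the bias leaks into the measured state
# over time and the system becomes fully observable.
Phi2 = np.array([[1.0, 0.01], [0.0, 1.0]])
r_dynamic = observability_rank(Phi2, H)     # rank 2
```

This mirrors the conclusion of this section: without sufficient motion the alignment states cannot all be recovered, whatever estimator is used.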
Unlike the traditional integrated model, the time-domain constrained model has more time-related measurements, as depicted in the previous section. The involved measurements can be presented as
Their gradients are as follows
Herein,
Under the same conditions, the observability judgment matrix
From the above judgment matrix, we can find that the motion dynamics of the integrated platform affect the observability results in the initial alignment process. Thus, we consider this problem under different dynamics. At first, we suppose the rotation velocity is
In addition, we also consider the stricter situation in which the rotation velocity is zero during the initial alignment process. Under this condition, neither of the two involved models is of full rank. That is to say, in order to successfully estimate the initial alignment parameters of the INS/vision integrated system, rotation is necessary in either approach. Kelly proved that if rotation and translation are both included in the system motion, the initial alignment model is completely observable, as described in the study by Kelly and Sukhatme. 18 Compared with Kelly's work, this article simplifies the initial alignment process by introducing the time-domain-related measurements. Even in zero-velocity situations, complete observability is guaranteed as long as rotation is provided. This approach is more advantageous in applications where large-range maneuvers are inappropriate.
Experimental validation
Simulations
In order to verify the effectiveness of the proposed time-domain constrained approach, we implement two groups of simulations under different dynamics. At first, simulations of the integrated navigation system are implemented, where the ground truth of the sensor-to-sensor bias and the instrument parameters are all known. The basic specifications of the INS are described in Table 1.
Table 1. Accuracy parameters of the inertial navigation system.
The intrinsic parameters of the camera are assumed to be
In the first group of experiments, the system is assumed to be static, and the true value of the relative position is set as

Figure 2. Initial alignment results in the static circumstance: (a) estimation results of relative position; (b) estimation results of relative attitude.
From Figure 2, if the system is static during the whole alignment process, the initial bias of the state guess cannot be corrected well, no matter which algorithm is selected. That is to say, in the static situation, it is hard to implement the initial alignment successfully, which agrees well with the observability analysis results of the former section. In the second group of experiments, the rotation velocity is set to 10°/s, and the equipment specifications are the same as in the first simulation, as depicted in Table 1. Under this condition, the corresponding estimation results are depicted in Figure 3.

Figure 3. Initial alignment results in the dynamic circumstance: (a) estimation results of relative position; (b) estimation results of relative attitude.
From the aforementioned simulations, we can find that, provided the rotational motion is guaranteed, the initial alignment can be implemented without other sensors. The proposed time-domain constrained model is able to achieve higher accuracy in the initial alignment process.
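The camera measurement model underlying these simulations is the standard pinhole projection through the intrinsic matrix K. A generic sketch (the intrinsic values below are hypothetical, not the ones used in the paper's simulations):

```python
import numpy as np

def project(K, p_C):
    """Pinhole projection of a camera-frame point to pixel coordinates:
    normalize by depth, then apply the intrinsic matrix K."""
    u = K @ (p_C / p_C[2])
    return u[:2]

# Hypothetical intrinsics: focal lengths and principal point in pixels.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

px = project(K, np.array([0.5, -0.2, 5.0]))  # feature 5 m in front of the camera
```

Note that scaling p_C by any positive factor leaves px unchanged, which is the bearing-only (scale-free) property of monocular measurements discussed in the introduction.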
Experiments
In this section, we implement a group of experiments on the basis of data sampled from real equipment. The integrated system contains a micro IMU and an industrial camera. In the experimental system, the gyro random walk is 60°/h, the stochastic bias of the accelerometer is 1 mg, and the relative position between the camera and the IMU is

Initial alignment estimation results of the relative position between the INS and the camera: (a) estimation results of relative position; (b) estimation results of relative attitude.
This article also provides a demonstration of the experimental process; please see the attached file for more details. From the experimental results, we can find that the time-domain constraints are able to improve the accuracy of the initial alignment estimation.
Conclusion
According to the position constraints that exist between the INS and the visual system at different time steps, this article proposes a novel constrained model on the basis of the traditional parameterization, and the corresponding time-domain optimization is also designed. From this article, we can draw the following three conclusions:

1. The proposed approach needs no additional sensors or measuring equipment other than a commonly used chessboard.

2. The observability analysis of the two integrated models suggests that the system observability is related to the motion dynamics. Higher-dynamic motion is able to improve the state estimation accuracy through the higher observability.

3. Compared with the traditional initial alignment model, the time-domain constrained parameterization is able to improve the system observability. In the proposed alignment process, the system states are completely observable even if only rotational motion is provided, which improves the availability of the INS/vision integrated system, especially in applications where large-scale maneuvers are hard to implement.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China under Grants 61403398 and 61673017.
