A Flying Robot Localization Method Based on Multi-Sensor Fusion

Abstract

This paper proposes a novel localization method for a power-tower-inspection flying robot based on fusion of vision, IMU and GPS. First, the research background is introduced in relation to a visual localization algorithm derived from 3D-model-based tracking and a coordinate transformation model for related coordinate frames. Then, a multi-sensor fusion-based localization method is presented, in which two collaborative Kalman filters are designed to fuse IMU/GPS and visual information. Finally, experimental results are presented to show the robustness and precision of the proposed method.

Keywords

Flying Robot Power-tower Inspection Multi-sensor Fusion-based Localization

1. Introduction

Traditional artificial power-tower inspection suffers from both inefficiency and safety problems in harsh geographical environments. Despite the fact that the use of helicopters may now reduce inspection risks, pilot safety still cannot be guaranteed. Thus, autonomous flying robot inspection, with the advantage of both greater accuracy and safety, has become a major trend in power-tower inspection [1, 2].

A critical issue in autonomous-flying-robot inspection is pose estimation. Commercial GPS-based localization only provides accuracy at between 2 and 10 m, which not only constitutes a great threat to safety, but also means it is difficult to maintain interacted spatial relationship between tower and flying robot. Visual localization turns out to be an effective alternative. However, camera shaking, edge ambiguity and background complexity may degrade robustness and flexibility of visual pose estimation.

In order to promote the effectiveness of localization, this paper proposes a new method that can improve both precision and robustness. Section 2 gives a quick review of recent related research. Section 3 presents a short description of a 3D-model-based tracking system and a related coordinate transformation model involving IMU, GPS and corresponding frames. Section 4 explains a new localization approach based on multi-sensor fusion. Experimental results are demonstrated in Section 5, followed by our conclusions in the last section.

2. Related Research

Vision-based localization derives from multiple geometry. Hartley [3] presented classic ways of recovering camera pose using a set of correspondent features extracted from several image sequences; this is known as structure from motion. PTAM (Parallel Tracking and Mapping) is an excellent implementation of the algorithm by Klein [4]. Nützi [5] applied this method on a real flying robot, using an IMU sensor to acquire absolute scale estimation. Shen [6] built a system based on a fisheye camera and another ordinary camera, along with IMU information to yield state estimates. The author proposed a vision-based estimation approach that combines the advantages of monocular vision and stereo vision in a complex environment. However, simplicity and repeatability of background texture makes it hard to extract and match features. In 1990, Harris [7] proposed a 3D wireframe model-based tracking algorithm, which is theoretically suitable for power-tower inspection as it best fits the rigid structure of a tower. Lepetit [8] surveyed all model-based tracking methods and offered some advice for future work. Tater, Addrew [9] innovatively combined robust control law and model-based tracking developing a more generalized toolbox for real-time vision servo application. However, edge ambiguity and motion blur, which often occur during tower inspection due to instability of the flying robot, may directly lead to a tracking failure.

For these two problems, researchers have developed many solutions. For example, the visual algorithm by Choi [10], which involves a particle filter, and the multiple-edge hypothesis verification by Teulière [11]. These authors share the idea of constructing a hypothesis-searching space for all the possible edge combinations to calculate the best pose to handle edge mismatches. Kempter [12] created a system that integrates PTAM and a model-based tracking system, and which automatically switches between two systems according to tracking situation and condition. However, the most suitable solution for power-tower inspection is multi-sensor fusion. With the help of Kalman filters, one can fuse the sensor data from GPS and IMU to correct visual pose and gain optimal pose estimation. Klein [13] used a gyroscope to acquire angular velocity to provide an initial pose for the next frame tracking. Klein also proposed a parametric-blur edge detector with angular rate and camera exposure time as parameters. However, only camera rotations were considered and translation was still estimated from Kalman filters. You [14] chose a flat and distinguished landmark for tracking, which does not work for power tower inspection.

For power-tower inspection, tracking failure and wrong pose estimation may happen because of variation of viewpoint and camera shaking. Translation and rotation may happen at the same time. Hence, more sensors should be deployed, rather than only using a simple visual sensor. However, GPS/IMU-based localization cannot reach high precision. Therefore, we propose a new localization approach that fuses IMU/GPS and vision information, and which has the following advantages: (1) System error can be reduced and error accumulation due to edge ambiguity can be avoided by providing an initial pose guess for 3D-model-based tracking in the current frame. (2) The random error can be reduced by using a framework of two collaborative Kalman filters. This framework has the ability to automatically switch between IMU/GPS-aided localization and IMU/GPS/vision combined localization in accordance with the tracking condition. It is also capable of maintaining localization function and visual tracking recovery, whilst and after the visual tracking is unavailable, respectively.

3. Backgrounds

3.1 3D Model-based Visual Localization

An existing model-based tracking [9] method is brought into our method to perform pose estimation for the flying robot. On the one hand, a model of the power tower can be obtained off-line, for example a CAD model or LIDAR scan. The rigid structure of the power tower also fits model-based tracking. On the other hand, model-based visual localization is more accurate than pure GPS localization. The measurement unit is metres, which can be easily employed for inspection path planning.

The algorithm works as follows: Consider a point belonging to a 3D object with coordinate P_w in object frame and coordinate P_i in the projected image frame. Knowing the camera's intrinsic parameter ξ, pose r can be calculated as in Eq. (1).

obj. min (p r o j (r, ξ, P_{w}) - P_{i})

(1)

Through tracking the tower in the image, a relative pose between the camera and the tower E = H_{Tower→Camera}, can be acquired, which satisfies Eq. (2).

{\begin{cases} P_{C a m e r a} = H_{T o w e r \to C a m e r a} P_{T o w e r} \\ H_{T o w e r \to C a m e r a} = [\begin{matrix} R_{3 \times 3} & t_{3 \times 1} \\ 0_{1 \times 3} & 1_{1 \times 1} \end{matrix}] \end{cases}

(2)

P_Tower stands for a point coordinate in the tower frame; P_Camera indicates the same point in the camera frame. H_{Tower→Camera} extrinsic matrix, describing transformations between the above two frames. R is a rotation matrix, and t is a translation matrix. Here, we can get the coordinate of the camera centre in the tower frame in formula (3).

P_{T o w e r}^{C a m e r a C e n t e r} = - R^{T} t

(3)

This coordinate will be treated as one component of system state variables in subsequent computations.

3.2 Coordinate transformation

The approach proposed in this paper requires fusing of the data from IMU, GPS and visual pose estimation. Thus, a coordinate transformation is necessary. According to the chain rule of transformation, the overall transformation is divided into five successive frames: (1) world frame; (2) power-tower frame; (3) inertial frame; (4) robot body frame; (5) camera frame.

The transformation matrix between power-tower frame and world frame, which can be pre-calibrated manually as H_{Tower→World} when tracking starts, stays constant as in Eq.(4):

P_{W o r l d} = H_{T o w e r \to W o r l d} P_{T o w e r}

(4)

P_World and P_Tower present the same point in the world frame and power-tower frame, respectively.

During the inspection, the flying robot is always changing its pose relative to the tower. The transformations between camera frame and power-tower frame are therefore dynamic, and can be described as in Eq. (5):

P_{C a m e r a} = H_{T o w e r \to C a m e r a} P_{T o w e r}

(5)

where

\begin{array}{l} H_{T o w e r \to C a m e r a} \\ = [\begin{matrix} R_{B o d y \to C a m e r a} & t_{B o d y \to C a m e r a} \\ 0 & 1 \end{matrix}] [\begin{matrix} R_{I n e r t i a l \to B o d y} & 0 \\ 0 & 1 \end{matrix}] [\begin{matrix} R_{T o w e r \to I n e r t i a l} & t_{T o w e r \to I n e r t i a l} \\ 0 & 1 \end{matrix}] \end{array}

(6)

In formula (6), t_{Tower→Inertial} and R_{Inertial→Body} have to be calculated in real time, while the other variables can be pre-calibrated. $t_{T o w e r \to I n e r t i a l}$ is equal to the coordinate of the origin of the inertial frame in terms of the power-tower frame, for which t_{Tower→Inertial} can be obtained with data from GPS using Eq. (4), assuming that the origin of inertial frame coincide with GPS-positioned point. R_{Inertial→Body} can be calculated using roll, pitch and yaw from IMU through Eq. (7).

R_{I n e r t i a l \to B o d y} = R (α, β, γ) = R_{x} (- γ) R_{y} (- β) R_{z} (- α)

(7)

where γ is roll, β is pitch, α is yaw. R_x(−γ),R_y(−β),R_z(−α) are basic rotation matrices about the x, y and z axes, respectively.

H_{Tower→Camera} is obtained as in Eq. (2). Using Eq. (6), R_{Inertial→Body} can be easily obtained. Euler angles from visual localization algorithm are finally estimated as in Eq. (8):

{\begin{cases} ψ = - \tan^{- 1} (- R_{23} / R_{33}) \\ θ = - \tan^{- 1} (R_{13} / \sqrt{R_{23}^{2} + R_{33}^{2}}) \\ φ = - \tan^{- 1} (- R_{12} / R_{11}) \end{cases} where R = [\begin{matrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{matrix}]

(8)

where ψ is roll, θ is pitch, and ϕ is yaw. These three variables partially constitute the flying robot's system state. There may exist different solutions to Eq. (8) when it comes to the implementation of arctangent. In general, raw output of roll pitch and yaw stay with $(- π, π)$ . Specifically during the inspection, however, range of pitch and roll is (–π/2,π/2). Accounting for specific ranges of Euler angles, by using built-in atan and atan2 function in programming languages like C++, roll, pitch and yaw can be recovered as in Eq. (9). In the third formula of Eq. (9), the atan2 results should be mapped to (0,2π) by adding 2π and taking a modulo operation of 2π, before timing negative one.

{\begin{cases} ψ = - atan (- R_{23} / R_{33}) \\ θ = - atan (R_{13} / \sqrt{R_{23}^{2} + R_{33}^{2}}) \\ φ = - atan2 (- R_{12}, R_{11}) \end{cases}

(9)

4. Multi-Sensor Fusion-based Localization

Purely model-based tracking for pose estimation is executed for successive frames, which also means that the pose estimated in the current frame depends on the one in the last frame. However, due to the complexity of the tower model, when inspecting the tower from a different view, separated edges project closely together in the image. When the tracking algorithm searches the edges, multiple matching points will be found, as shown in Figure 1. If that happens only for few frames, the error stays within an acceptable range. If no pose correction is carried out, however, error will be accumulated, resulting in tracking failure.

Figure 1.

Edge ambiguity

In order to solve the problem, fusion of GPS and IMU based on Kalman filters is used to correct the pose for visual tracking.

The Kalman filter method is designed as follows. Say the system state variable X_k, as shown in formula (10):

X_{k} = {(x_{k}, y_{k}, z_{k}, v_{x, k}, v_{y, k}, v_{z, k}, ψ_{x, k}, ψ_{y, k}, ψ_{z, k}, ω_{x, k}, ω_{y, k}, ω_{z, k})}^{T}

(10)

where (x_k,y_k,z_k)^T indicates the coordinates of a flying robot in relation to the power-tower frame, (v_x,k,v_y,k, v_z,k)^T is flying speed, and (ψ_x,k,ψ_y,k,_z,k,ω_x,k,ω_y,k,ω_z,k)^T is the Euler angles and angular rate of the flying robot.

For a high sampling frequency, it is reasonable to consider that during interval time ΔT, the flying robot is moving at a constant speed. Predictions and updates are given in Eq. (11):

{\begin{cases} X_{k + 1}^{-} = F_{k} X_{k}^{+} \\ P_{k + 1}^{-} = F_{k} P_{k}^{+} F_{k}^{T} + Q_{k} \\ y_{k} = z_{k} - H_{k} X_{k + 1}^{-} \\ S_{k} = H_{k} P_{k}^{-} H_{k}^{T} + R_{k} \\ K_{k} = P_{k}^{-} H_{k}^{T} S_{k}^{- 1} \\ X_{k + 1}^{+} = X_{k + 1}^{-} + K_{k} y_{k} \\ P_{k + 1}^{+} = (I - K_{k} H_{k}) P_{k + 1}^{-} \end{cases}

(11)

where F_k is state transformation model, P_k is state covariance matrix, Q_k is process noise covariance matrix, R_k is measurement covariance matrix, H_k is mapping matrix of measurement, and z_k = (x_k, y_k, z_k, ψ_x,k, ψ_y,k, ψ_z,k)^T is observation matrix. In the following section, the pose of the flying robot is noted as x_k = (x_k,y_k,z_k,ψ_x,k,ψ_y,k,ψ_z,k).^T

The whole procedure of the proposed method is depicted in Figure 2. The details are as follows:

(1)

Recursively estimate a posteriori pose $x_{t - 1}^{+}$ from last frame. Priori Pose $x_{t}^{-}$ is predicted in current frame.

(2)

Gather data from IMU and GPS as observations, marked as z_t.

(3)

Update pose with observations in (2) and get a posteriori pose $x_{t}^{+}$ .

(4)

Set pose in (3) as an initial pose for visual tracking. Kalman filter for IMU/GPS ends here.

(5)

Track the model one more time to get a pose for the current frame. If it is untrackable, the pose from (4) is directly used as the new pose.

(6)

The pose from (5) is regarded as the observation for the vision Kalman filter. After one time update, final pose for current frame is estimated.

Two collaborative Kalman filters are used: (1) Kalman filters can reduce the side effect of random noise; (2) pose correction from IMU/GPS helps to avoid error accumulation for visual tracking; (3) visual-tracking-based pose estimation has higher precision than pure IMU/GPS.

Model-based tracking cannot recover itself once it fails. Thus, when facing with a motion blur, which makes it hard to identify enough feature points, the whole system has to be switched into IMU/GPS-aided localization. As long as the error stays within a specific range, visual tracking can automatically recover when the images become clear. It is worth noting that IMU/GPS-aided localization will increase error to some degree. But when visual tracking is available, error will be narrowed again.

Figure 2.

Fusion-based method

Figure 3 depicts one frame of the tracking procedure. In the IMU/GPS Kalman filter, we get a re-projected edge using predicted pose. Projection with observations means projected edge using purely IMU/GPS data. Through one filter update, a filter-updated pose is calculated. Then, one more tracking is performed to get the visual tracking pose, indicated as a projection with the tracking update. The final edge is located closer to the real-image edge, indicating a higher precision.

Figure 3.

Snapshot for stable visual tracking

5. Experiments and analysis

In order to prove the accuracy and robustness of the proposed method, a comparison between visual localization and fusion-based localization is made.

The whole experiment is carried out as in Figure 2. The experimental device is shown in Figure 4. A flying robot equipped with IMU and a camera with a 640*360 resolution at 30 FPS is used. The tracking target is a mini power-tower model. The operating system is Ubuntu Tinux 12.04. The programs run on a computer with Intel Core i5 2.5GHz and 4G RAM. An IGPS (Indoor Global Positioning System) is employed, simulating a real GPS.

Figure 4.

Indoor GPS, quadcopter and mini tower model

As mentioned before, purely visual tracking may fail due to motion blur or edge ambiguity. In order to prove the effectiveness of the proposed method, the experiment focuses on these two specific occasions.

5.1 Switching Condition

It is key to measure tracking quality so as to determine whether or not switch to IMU/GPS-aided tracking. For 3D-model-based visual-tracking pose estimation, corresponding points sampled along the edges for two successive frames are used to calculate the pose. Therefore, enough feature points must be used for pose estimation. Because of motion blur or camera shaking this number may fall rapidly, which disables visual tracking. Here, points available for pose estimation are defined as Trackable Points (TP), and those not available are defined as UnTrackable Points (UTP). In order to ensure visual-tracking accuracy, a threshold needs to be configured.

T h r e s h o l d < T P / (T P + U T P)

(12)

When Eq. (12) is satisfied, tracking is considered to be good; otherwise, the system will switch to IMU/GPS-aided localization. In the experiment, the threshold is set to 0.7, which performs well under this circumstance.

5.2 Edge Ambiguity

Purely model-based tracking suffers from error accumulation out of edge ambiguity, as shown in Figure 5, in which the white frame shows the visual localization result, showing that the error has accumulated hugely, deviating from the real tower target. The red frame stands for fusion-based localization, in which visual pose is corrected by IMU/GPS to avoid error accumulation, demonstrating higher accuracy and robustness.

Figure 5.

Error accumulation comparison

As described above, for evaluation of tracking quality, measurement of TP is involved. The amount of TP is somewhat proportional to tracking quality and localization accuracy. Figure 6 depicts the process from stable tracking to error accumulation due to edge ambiguity. TP of fusion-based localization is apparently higher than that of pure visual localization.

Figure 6.

TP comparison

5.3 Camera Shaking

Flying robots may inevitably suffer from sudden camera shaking. The experimental result for camera shaking is shown in Figure 7: the white frame stands for pure visual localization and the red frame for fusion-based localization. As shown, vertical shaking causes edge blurring, and pose difference between two frames is too big, making it unable to find point matches. During shaking TP is shown in Figure 8. TP for fusion-based localization stays high, whereas for pure visual localization, it drops enormously. The proposed method therefore has better performance.

Figure 9 indicates the tracking recovery result after tracking failure due to severe motion blur. The red frame shows the recovery consequence as aided by IMU/GPS; pure visual localization, marked with the white frame, is clearly unable to recover the right pose.

Figure 7.

Camera-shaking comparison

Figure 8.

TP number comprason

Figure 9.

Tracking recovery

6. Conclusion and future work

In this paper, we propose a new multi-fusion-based localization method for a power-tower-inspection flying robot, in which two collaborative Kalman filters are designed to fuse IMU/GPS and visual information for higher accuracy and robustness. The experimental results show the advantage of being immune to camera shaking and edge ambiguity, meeting the requirements for power-tower inspection.

It needs to be stressed that when the error of IMU/GPS stays within a particular range, the proposed method has relatively high accuracy and stability. However, if the system error for IMU/GPS gets too large, not following a Gaussian distribution, or fluctuates randomly with great magnitude, the initial pose provided by IMU/GPS will not be acceptable, which means visual tracking cannot depend on IMU/GPS anymore. Also, switching to IMU/GPS-aided localization will lose localization precision.

Our future work will have two aims. One is to use an accelerometer in the IMU to partially replace GPS for greater reliability. The other is to fuse the particle filters. With the help of sensors like the IMU, the search space for particle filters will be able to be constrained for better performance.

Footnotes

7.

This work is supported by the Programme for New-Century Excellent Talents in University (Grant No. NCET-11-0634), the National Natural Science Foundation of China (Grant No. 61105083) and the Fundamental Research Funds for the Central Universities (Programme No. 12ZX16).

References

Liu

C.A.

Wang

Liu

C.Y.

(2009) Mission planning of the flying robot for power line inspection. Progress in Natural Science, 19: 1357–1363.

Hua

Min

Liu

Chang-An

Liu

Chun-Yang

. (2012) Planning efficient and robust behaviors for model-based power tower inspection. IEEE International Conference on Applied Robotics for the Power Industry 2012, Sep 11-13, Zurich, Switzerland. pp. 163–166.

Hartley

Zisserman

(2004) Multiple View Geometry in Computer Vision, Second Edition. Cambridge University Press, March 2004, Cambridge, UK. pp. 262–278.

Klein

and Murray

(2007) Parallel Tracking and Mapping for Small AR Workspaces. 6th IEEE and ACM International Symposium on Mixed and Augmented Reality 2007, Nov 13–16, Nara, Japan. pp. 225–234.

Nützi

Weiss

Scaramuzza

Siegwart

(2012) Fusion of IMU and Vision for Absolute Scale Estimation in Monocular SLAM. Journal of Intelligent & Robotic Systems, 61: 287–299.

Shen

Mulgaonkar

Michael

Kumar

(2013) Vision-Based State Estimation for Autonomous Rotorcraft MAVs in Complex Environments. International Conference on Robotics and Automation 2013, May 6–10, Karlsruhe, Germany. pp. 1758–1764.

Harris

Stennett

(1990) RAPiD – A Video Rate Object Tracker. Proceedings of British Machine Vision Conference 1990, Oxford, England. pp. 73–78.

Lepetit

Fua

(2005) Monocular Model-Based 3D Tracking of Rigid Objects: A Survey. Foundations and Trends in Computer Graphics and Vision, 1: 1–89.

Comport

Andrew I.

Marchand

Chaumette

(2004) Robust model-based tracking for robot vision. IEEE International Conference on Intelligent Robots and Systems 2004, Sep 28 -Oct 2, Sendai, Japan. pp. 692–697.

10.

Choi

, and Christensen

H.I.

(2011) Robust 3D Visual Tracking Using Particle Filtering on the SE(3) Group. International Conference on Robotics and Automation 2011, Sep 25–30, San Francisco, CA, USA. pp. 4384–4390.

11.

Teulière

Eck

Marchand

Guénard

(2010) 3D model-based tracking for UAV position control. International Conference on Intelligent Robots and Systems 2010, Oct 18–22, Taipei, Taiwan. pp. 1084–1089.

12.

Kempter

Wendel

Bischof

(2012) Online Model-Based Multi-Scale Pose Estimation. Proceedings of 17th Computer Vision Winter Workshop 2012, Jan 1–3, Mala Nedelja, Slovenija. pp. 1–8.

13.

Klein

G.S.W.

Drummond

T.W.

(2004) Tightly integrated sensor fusion for robust visual tracking. Journal of Image and Vision Computing, 22: 769–776.

14.

You

Neumann

(2001) Fusion of vision and gyro tracking for robust augmented reality registration. Proceedings of the IEEE Conference on Virtual Reality 2001, Mar 17–17, Yokohama, Japan. pp. 71–78.