A hierarchical vision-based localization of rotor unmanned aerial vehicles for autonomous landing

Abstract

The vision-based localization of rotor unmanned aerial vehicles for autonomous landing is challenging because of the limited detection range. In this article, to extend the vision detection and measurement range, a hierarchical vision-based localization method is proposed for unmanned aerial vehicle autonomous landing. In this hierarchical framework, the landing is divided into three phases: “Approaching,” “Adjustment,” and “Touchdown,” in which artificial visual features at different scales are detected from the designed object pattern for unmanned aerial vehicle pose recovery. The corresponding feature detection and pose estimation algorithms are also presented. Finally, simulation and field experiments are carried out to illustrate the proposed method. The results show that the hierarchical vision-based localization can localize the unmanned aerial vehicle continuously over a wider working range, from far to near, which is significant for autonomous landing.

Keywords

Hierarchical vision localization, unmanned aerial vehicle, landing

Introduction

Unmanned aerial vehicles (UAVs) are widely used in civil and military situations that are hazardous to human operators. Automated localization is therefore highly desirable while the vehicles are landing on stationary or moving platforms. Specifically, this refers to UAVs that can localize themselves using information from onboard sensors, such as the Global Positioning System (GPS), an Inertial Measurement Unit (IMU), and vision. Currently, GPS, IMU, or their combination is the most common way to determine a UAV position. However, these require the transmission of information between the air vehicle and the landing platform. The use of vision sensors for localization has several advantages. Vision sensors are mostly passive and do not rely on an external signal. They can also reach millimeter-level accuracy and can determine not only the distance but also the relative orientation between two objects. This article describes a complete vision-based process and the key enabling technologies for a successful landing.

In recent years, a wealth of vision-based methods has been developed for this type of application. These include both feature-based methods and direct methods. Some of the approaches require prior knowledge of the targets, and some extract information from the surroundings in real time. However, it is difficult for any single vision-based method to maintain both detection range and precision because of the limits of the camera. UAVs can hardly keep detecting features at a single scale from the reference object when the altitude changes continuously from high to low. Moreover, the field of view of the camera is fixed regardless of the UAV altitude. For example, when the UAV is too far away from the landing spot, some features or patterns cannot be detected. Hence, to solve this problem, a hierarchical vision-based localization strategy is proposed for UAV landing, which is able to extract reliable visual cues in different landing phases.

This article describes a hierarchical localization demonstration in which a rotor UAV determines its own position and attitude relative to a designed landing object using only the onboard camera. A planar textured image pattern is employed as the landing object, which is composed of a set of signs with scale information. The reference object is easy to detect and provides sufficient information for UAV pose recovery. The main contributions of this article can be summarized as follows. First, to ensure both the precision and the range of the vision-based localization, a hierarchical vision localization framework is proposed that employs artificial markers at different scales as visual features for landing. Second, the landing process is divided into three phases, “Approaching,” “Adjustment,” and “Touchdown.” For each landing phase, the corresponding feature can be extracted from the designed landing object, and the UAV pose recovery algorithms using the extracted feature are presented. The hierarchical vision localization framework is shown to be beneficial for an open landing. Third, the hierarchical framework has been tested and evaluated by simulation and field experiment. The results show that the proposed method is able to estimate the UAV’s position and orientation over a wide vision range. As a result, the hierarchical framework is practical for autonomous landing.

The remainder of the article is organized as follows: In section “Previous work,” some related works are introduced. The designed landing object and the corresponding feature detection and pose recovery algorithms are introduced in section “Object detection and pose recovery.” Section “A hierarchical vision-based localization framework” describes a hierarchical vision-based localization framework to extend the vision range. Section “Experiments and analysis” presents the experiments and results to illustrate the proposed hierarchical vision-based framework. Finally, the conclusions are presented in section “Conclusion and future work.”

Previous work

Currently, vision-based localization is one of the most widely adopted and actively studied approaches to UAV autonomous landing. In general, an object is designed as the landing reference so that it can be clearly detected by the UAV onboard vision and can provide structure information for the UAV pose recovery with respect to the landing spot.¹

In related works, one assumes that the pattern and size of a landing object are given. And the relative localization can be acquired by its projection in the view of onboard vision. In particular, depending on the inertia moments of the object in the image, it is sufficient to distinguish the landing object from background.² The state of the UAV is calculated by matching the acquired real-time image with a dataset of stored labeled images that have been calibrated offline in advance. Image blurring would reduce the extraction accuracy of cooperative markers. As a result, a special pattern consisting of several concentric white rings on a black background was designed as the observed object so that at least one ring can be captured clearly.³ Each of the white rings is recognized with a unique ratio of its inner to outer border radius. A pose estimation algorithm based on the feature lines of the cooperative object was reported,⁴ where infrared (IR) light was employed to reduce the influence of weather and feature lines, and the vanishing lines were thought to be insensitive to image blurring. An initial 5 degree-of-freedom (DOF) pose of the camera coordinate frame for the UAV with respect to a landing pad could be obtained using the quadratic equation of the projected ellipse.⁵ The IMU data were integrated to eliminate the remaining geometric ambiguity. The final DOF of the camera pose, its yaw angle, was calculated by fitting an ellipse to the projected contour of the letter “H.” The homography between the image frame and the object reference plane was also used to calculate the UAV initial pose.⁶ With the four correspondences between the world plane and the image plane, the minimal solution for the homography could be estimated. Reported in Martinez et al.,⁷ a similar work decomposed the homography between the current and previous frames to accumulate for ego-motion estimation. 
The relative pose between the current and the previous frames could be tracked by observing an object of unknown structure.⁸–¹⁰ Based on speeded-up robust features (SURF) descriptors and a fast library for approximate nearest neighbors (FLANN) matcher, a template matching method¹¹ was presented to determine the relative position of the landing target. Artificial neural networks (ANN) have also been employed to estimate the state of the landing UAV in Moriarty et al.¹² Similar to our work, Araar et al.¹³ designed a dedicated pad to extend the detection range. The proposed pad is composed of patterns of different sizes, permitting their detection from high as well as very low altitudes. From an application point of view, the main advantage of the approach in Benini et al.¹⁴ is the use of a GPU for detecting the marker in order to achieve onboard high-frequency pose estimation and marker detection. The use of the GPU allows fast image analysis even in cluttered environments, which would otherwise be difficult to achieve with CPU-based algorithms. In Vetrella et al.,¹⁵ a guidance approach using the improved intrinsic tau guidance theory was presented to create spatio-temporal four-dimensional (4D) trajectories for a desired time-to-contact with a landing platform tracked by a visual sensor.

In another case, there is no reference object for UAV landing, and the localization is achieved by sensing the surrounding natural scene. These methods rely on a set of non-artificial features. Typical optic-flow-based methods¹⁶–²⁰ are used to track or stabilize the UAV pose. This strategy involves adjusting the speed of approach to maintain the magnitude of the optical flow generated from the ground. A biological guidance system, reported in Forster et al.,²¹ detected and analyzed visual cues from the natural environment, such as the horizon profile and sky compass. The image coordinates extrapolation (ICE) algorithm²² has been used to calculate the pixel-wise difference between the current view (panoramic image) and a snapshot taken at a reference location to estimate the UAV three-dimensional (3D) position and velocity in real time. An autopilot based on an optic-flow-based vision system is reported in Kendoul et al.²³ and Denuelle et al.,²⁴ where the optic flow is calculated and used for autonomous localization and scene mapping. The relevant controller using the vision information is also discussed in detail in these works. Image-based visual servoing (IBVS) has been used to track the platform in two-dimensional (2D) image space and generate a velocity reference command used as the input to an adaptive sliding mode controller, as reported in Lee et al.²⁵ In Ho and Chu et al.,²⁶ a fully autonomous system using visual measurements was presented for a quadrotor UAV to perform an automatic landing task. In Yang et al.,⁵ the landing site was defined by a template image and identified by matching detected oriented FAST and rotated BRIEF (ORB) features. The relative pose was estimated from a parallel tracking and mapping (PTAM) thread, by matching the landing site features to the PTAM map points. A monocular vision-based autonomous landing system²⁷ was implemented for emergencies and unstructured environments. In this system, 3D features and a mid-pass filter were combined to remove noise and construct an elevation map.

In addition, other approaches have been applied, such as stereo vision²⁸,²⁹ and the combination of vision and IMU.³⁰,³¹ It is assumed that the IMU can provide a good scale estimate for the mono-vision system. Wenzel et al.³² used a Wii remote IR camera as the main sensor, which allows robust tracking of a pattern of IR lights in conditions without direct sunlight. Our preliminary work³³ also reported a multi-view navigation system that employed hybrid vision measurements to estimate the UAV pose.

Based on the work described above, feature detection and recognition are key issues for localization precision. Moreover, for a complete landing, the vision system should work well both far from and close to the landing spot. To extend the vision detection and localization range, a hierarchical method is studied in this work.

Object detection and pose recovery

UAV localization with respect to an object in an unknown environment is a complex but solvable problem that can be addressed with either a single camera or stereo cameras. In this work, a set of easily detectable artificial markers is arranged to construct a landing object, as shown in Figure 1. The corresponding feature detection and pose recovery methods are explained in the following subsections.

Figure 1.

Landing object and feature description inside.

Feature description and detection method

The goal is to calculate the UAV position and orientation reliably using the designed artificial markers of the landing object. Inspired by the QR code, which is widely used for information recognition, three artificial markers (black and white rectangles), denoted as Top, Right, and Bottom, are designed so that they can be detected and recognized robustly. First, each marker is segmented from the background using contour extraction and statistics. The original gray image is converted to binary format by a thresholding operation. The “findContours” function in OpenCV retrieves contours from the binary image using the algorithm described in the literature.³⁴ A contour can be explained simply as a curve joining all the continuous points along a boundary that have the same color or intensity. Contours are used here for fiducial marker detection and recognition. Two contours are detected for each boundary between white and black, so a fixed number (six) of contours is obtained for each complete fiducial marker, as shown in Figure 1. The markers can therefore be recognized robustly by counting the detected contours. The contour c1, which has five child contours (c2–c6), is taken to be the boundary of one fiducial marker. The other two fiducial markers are recognized in the same way.
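The contour-counting rule can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the contour hierarchy is given in the layout OpenCV returns for the tree retrieval mode, where each row is `[next, previous, first_child, parent]` and `-1` means "none", and it assumes the five child contours are nested one inside another. The function names are hypothetical.

```python
def nested_depth(hierarchy, idx):
    """Count how many contours are nested, one inside another, below contour idx."""
    depth = 0
    child = hierarchy[idx][2]          # index of the first child, or -1
    while child != -1:
        depth += 1
        child = hierarchy[child][2]    # descend one nesting level
    return depth


def find_marker_contours(hierarchy, required_children=5):
    """Return indices of contours that enclose exactly `required_children` nested contours."""
    return [i for i in range(len(hierarchy))
            if nested_depth(hierarchy, i) == required_children]
```

With OpenCV, the hierarchy array has shape (1, N, 4), so a caller would pass `hierarchy[0]`. Only the first-child chain is followed here, which is enough for concentric marker boundaries but would need extending for patterns with sibling contours.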

Then, the Top marker is identified by calculating the straight-line distances between each pair of markers. The three distances are denoted $l_1$, $l_2$, and $l_3$, where the line $l_3$ between the Right and Bottom markers is the longest one. Therefore, the Top marker is the one that does not lie on the line $l_3$. Finally, the Right and Bottom markers are distinguished by analyzing the slope $k$ of $l_3$ and the signed distance $l_4$ from the Top marker to the line $l_3$. For example, when $k < 0$ and $l_4 < 0$, the object is considered to point north and the Right marker is to the right of the Bottom marker. Note that the top of the image is assumed to be north.
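A minimal sketch of this role assignment is given below. The Top marker is found exactly as described, as the marker opposite the longest side. To separate Right from Bottom, the sketch swaps in a cross-product sign test instead of the slope and signed-distance rule. It assumes a QR-like layout (Top in the top-left corner, Right in the top-right, Bottom in the bottom-left when the pattern is upright) and image coordinates with x to the right and y downward. The layout and the function name are assumptions for illustration.

```python
import math


def assign_marker_roles(centers):
    """Label three marker centers (x, y) as top, right, and bottom."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    # The longest side joins the Right and Bottom markers.
    i, j = max(pairs, key=lambda p: math.dist(centers[p[0]], centers[p[1]]))
    k = 3 - i - j                      # the remaining index is the Top marker
    top, a, b = centers[k], centers[i], centers[j]
    # z-component of (a - top) x (b - top); positive when a is Right (y axis points down).
    cross = (a[0] - top[0]) * (b[1] - top[1]) - (a[1] - top[1]) * (b[0] - top[0])
    right, bottom = (a, b) if cross > 0 else (b, a)
    return {"top": top, "right": right, "bottom": bottom}
```

The sign of the cross product is unchanged by in-plane rotation, so the labels stay consistent for any yaw of the UAV as long as the pattern is not viewed mirrored.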

As a result, redundant feature corners can be extracted from these fiducial markers. Since the actual sizes of the markers are assumed to be known, the corners can be used as corresponding points between the reference object and its image plane in the following pose recovery step. With the above operations, all feature points from the three fiducial markers are determined and define a fixed 3D coordinate frame on the landing object, which yields a unique yaw for UAV landing.

Pose recovery method

The UAV pose (position and orientation) can be extracted by homography decomposition. Here, the homography is a non-singular 3 × 3 matrix $H$ that defines the projection between the planar landing object and its image plane and can be calculated using the acquired corresponding points. Figure 2 shows the coordinate frames used in the homography calculation. Since the 3D coordinate system is built on the object plane, the Z-coordinates of all the extracted points are zero, and the 3D coordinates of the points on the landing object are $[X_{i} \; Y_{i} \; 0]^{T}$. With the corresponding image points $[u_{i} \; v_{i}]^{T}$, the homography relation, which holds up to a scale factor, can be described as follows

$$\begin{bmatrix} u_{i} \\ v_{i} \\ 1 \end{bmatrix} = H \begin{bmatrix} X_{i} \\ Y_{i} \\ 1 \end{bmatrix}, \quad \text{with } H = K \begin{bmatrix} r_{1} & r_{2} & t \end{bmatrix}$$

(1)

Figure 2.

Object-to-image plane homography based on the camera projection model.

Using the extracted corresponding points, a rough solution for the matrix $H$ can be obtained by singular value decomposition (SVD)³⁵ or Gaussian elimination (GE).³⁶ Then, the random sample consensus (RANSAC) method is used to refine $H$ and remove the errors caused by mismatched points. RANSAC iteratively selects a random subset of the original data points, fits a model to the subset, and evaluates the consensus of that model, which is the number of original data points that fit the model; the model with the largest consensus is kept.
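The SVD step can be illustrated with the direct linear transform, which is one common way to obtain such a rough solution and is not necessarily the exact formulation used by the authors. The sketch assumes NumPy, at least four point correspondences, and a non-zero bottom-right entry of $H$. Coordinate normalization and the RANSAC refinement are left out, and the function name is hypothetical.

```python
import numpy as np


def estimate_homography(obj_pts, img_pts):
    """Estimate H (3x3) mapping planar object points (X, Y) to image points (u, v)."""
    rows = []
    for (X, Y), (u, v) in zip(obj_pts, img_pts):
        # Each correspondence gives two linear equations in the nine entries of H.
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]                 # fix the free scale (assumes H[2, 2] != 0)
```

With noisy or mismatched points, this estimate would serve as the model-fitting step inside each RANSAC iteration.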

The matrix $H$ can be decomposed to obtain the UAV camera pose with respect to the landing object, since the homography contains both the intrinsic and extrinsic camera parameters. As shown in equation (1), assuming the camera intrinsic matrix $K$ is known, the 3 × 3 rotation matrix $R$ and the 3 × 1 translation vector $t$ are contained in the remaining part and can be calculated based on the camera projection model

$$\begin{cases} r_{1} = \lambda K^{-1} h_{1} \\ r_{2} = \lambda K^{-1} h_{2} \\ t = \lambda K^{-1} h_{3} \end{cases}, \quad \text{with } \lambda = \frac{1}{\lVert K^{-1} h_{1} \rVert} = \frac{1}{\lVert K^{-1} h_{2} \rVert}$$

(2)

where $h_{i}$ is the ith column of $H$ and $r_{i}$ is the ith column of $R$ . Since all the columns of the rotation matrix are orthonormal to each other, $r_{3}$ can be determined from $r_{1} \times r_{2}$ . However, the data noise causes the resulting matrix to not satisfy the orthonormality condition, and SVD is used to form a new optimal rotation matrix that is fully orthonormal.

With this, $- R^{-1} t$ and $R^{-1}$ represent the position and the orientation of the onboard camera in the 3D coordinate system of the landing object, respectively. As a result, the UAV’s pose can also be determined since the camera is fixed on the body.
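The decomposition in equation (2) and the orthonormalization step can be sketched as below, assuming NumPy and a known intrinsic matrix. The sign choice for λ, which keeps the object in front of the camera, is an added assumption that the text does not state. The function name is hypothetical.

```python
import numpy as np


def pose_from_homography(H, K):
    """Recover R, t and the camera position in the object frame from H = K [r1 r2 t]."""
    A = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(A[:, 0])
    if A[2, 2] < 0:                    # assumption: the object lies in front of the camera
        lam = -lam
    r1, r2, t = lam * A[:, 0], lam * A[:, 1], lam * A[:, 2]
    r3 = np.cross(r1, r2)
    # Noise breaks orthonormality, so project onto the nearest rotation via SVD.
    U, _, Vt = np.linalg.svd(np.column_stack([r1, r2, r3]))
    R = U @ Vt
    camera_position = -R.T @ t         # R^{-1} equals R^T for a rotation matrix
    return R, t, camera_position
```

For a noise-free homography the SVD step leaves the rotation unchanged, and with noisy data it returns the closest orthonormal matrix in the least-squares sense.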

A hierarchical vision-based localization framework

Hierarchical localization

Besides the vision detection and localization algorithms, the camera parameters, such as image resolution and focal length, influence the result of the UAV pose calculation as the detection range changes. To address this problem, a hierarchical vision-based localization framework is presented. In our work, a complete UAV landing is divided into three phases: “Approaching,” “Adjustment,” and “Touchdown.” Different visual features and the corresponding algorithms are selected for the three landing phases, as shown in Figure 3. The phases are switched according to the current UAV height with respect to the landing platform, the object projection ratio, and so on.

Figure 3.

Hierarchical vision-based localization framework.

At the beginning of a landing, the UAV is so far away from the reference object that the limited onboard vision can hardly capture the fiducial markers clearly. At this point, only the external contour or boundary of the landing object can be detected by vision. When the UAV is far away from the landing object in a real environment, many objects other than the reference object, such as buildings, are also seen by the onboard camera. As shown in Figure 1, the landing object has a distinctive color compared to the background. Thus, before contour detection, a color-based segmentation step is applied to remove the background. The remaining noise, such as irregular blobs, is filtered by shape analysis, because the landing object is designed to be a four-vertex polygon. The landing object is taken to be the longest closed contour that can be approximated by a four-vertex polygon. The four vertices of the contour are extracted as 2D image features and used for the UAV pose calculation. This phase is called “Approaching,” where the image contour of the landing object provides the relative motion information for the UAV.

In the “Adjustment” phase, the flying vehicle is sufficiently close to the landing object for the artificial markers to be detected clearly. In our field landing experiment, a square landing object with an actual size of 0.85 m × 0.85 m and a 1080p camera are employed. With this setup, the artificial markers gradually become detectable in the view of the onboard vision when the UAV altitude is below 7 m. Using the corresponding feature points extracted from the markers, the relative position and orientation of the UAV can be calculated through the homography-based method introduced in the previous section. The obtained real-time pose can be used to adjust the UAV to an appropriate state for precise landing. In particular, this phase is helpful when the UAV is required to touch down in a certain orientation, such as for charging or ranking.

When the UAV is near the landing object, the complete object can hardly be captured due to the limited view of the onboard vision, as shown in Figure 3(c). This phase is called “Touchdown,” in which neither the object-based nor the marker-based method of the first two phases works. Therefore, as soon as the markers can no longer be detected and the current height is near the ground, the UAV is considered to enter the “Touchdown” phase. In this phase, an optical-flow-based pose tracker determines the current pose by tracking the optic-flow points between the current and the previous image frames. A Harris corner detector is used to extract feature points from each image. Using the iterative Lucas–Kanade method with pyramids, these points are matched between the previous and current frames. When the UAV is near the landing object, the acquired images are fully occupied by the planar object, so almost all feature points lie on the plane. Hence, the matched points can be used to calculate the homography between the previous and current frames, and the relative pose change can be obtained by homography decomposition. To calculate the current pose with respect to the landing object, the homography between the current frame $i + 1$ and the previous frame $i$ is defined as $H_{i}^{i + 1}$. By accumulating these homographies as in equation (3), the homography between the current camera (UAV) frame and the landing object frame is obtained. Such a pose-tracking method is feasible because the landing object provides distinct corners for point tracking and the process is short, so that the accumulated error is negligible

$$H_{w}^{i + 1} = H_{w}^{0} H_{0}^{1} H_{1}^{2} \cdots H_{i}^{i + 1}$$

(3)
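The accumulation in equation (3) amounts to a running matrix product, sketched below with NumPy. The multiplication order follows equation (3) literally; the text does not spell out the direction in which each frame-to-frame homography maps points, so the order may need to be reversed for a different convention. Rescaling by the bottom-right entry at every step is an added assumption that keeps the free scale bounded, and the function name is hypothetical.

```python
import numpy as np


def accumulate_homography(H_w0, frame_to_frame):
    """Chain H_w^0 with the frame-to-frame homographies H_0^1, H_1^2, ... in order."""
    H = np.asarray(H_w0, dtype=float)
    for H_step in frame_to_frame:
        H = H @ np.asarray(H_step, dtype=float)   # right-multiply, as written in equation (3)
        H = H / H[2, 2]                           # homographies are defined up to scale
    return H
```

Because every step multiplies in a new estimate, errors compound along the chain, which is why the approach suits only the short final phase of the landing.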

In summary, different feature detection and pose extraction algorithms are presented for three corresponding landing phases, which constitute a hierarchical vision-based localization framework. The framework is practical and can guarantee a consecutive pose result for UAV landing.
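The phase switching can be sketched as a small selection function. This is an illustrative simplification rather than the authors' switching logic: it uses only the marker detection result and the current height, and the 3 m default is borrowed from the “Touchdown” measurement range reported in the field experiment. The names and the threshold parameter are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    APPROACHING = "Approaching"
    ADJUSTMENT = "Adjustment"
    TOUCHDOWN = "Touchdown"


def select_phase(height_m, markers_detected, touchdown_height_m=3.0):
    """Pick the landing phase from the current height and marker visibility."""
    if markers_detected:
        return Phase.ADJUSTMENT        # markers visible: homography-based pose
    if height_m <= touchdown_height_m:
        return Phase.TOUCHDOWN         # markers lost near the ground: optical-flow tracker
    return Phase.APPROACHING           # far away: contour-based pose
```

A fuller version would also consider the object projection ratio mentioned above and add hysteresis so that the phase does not oscillate near a threshold.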

False pose estimation detection

Failure detection for the visual parts is required to ensure the ongoing functionality of the whole framework. The detection of vision failures is explained below.

It should be noted that the rotation between the body (inertial) frame and the visual frame $q_{i}^{v}$ is constant during an estimation step. At each step $k$ , this rotation can be measured as

$$q_{i}^{v}(k) = \left( \hat{q}_{v}^{w}(k) \right)^{-1} \otimes q_{i}^{w}(k)$$

(4)

where $q$ refers to the quaternion of a rotation and ⊗ denotes the multiplication of quaternions. Since the drift is very slow compared to the vision frequency, we can smooth a sequence of measurements of $q_{i}^{v} (k)$ . A median filter for each vision node is suggested to model non-zero mean outlier jumps. The estimation of the rotation between inertial and visual frame ${\hat{q}}_{i}^{v} (k)$ using a window of size N is then

$$\hat{q}_{i}^{v}(k) = \operatorname{med}\left( q_{i}^{v}(j) \right), \quad j = k - N, \dots, k$$

(5)

Because the drift is slow, abrupt jumps of the measured orientation $q_{i}^{v}(k)$ with respect to the smoothed estimate $\hat{q}_{i}^{v}(k)$ can be identified as failures of the visual pose estimation. As soon as a measurement $q_{i}^{v}(k)$ lies outside the 3σ error bounds (σ being the standard deviation) of the past $N$ estimates $\hat{q}_{i}^{v}(k)$, it is considered a false vision pose solution.
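A minimal sketch of this check is given below. It makes several assumptions that the text does not specify: quaternions are unit length and stored scalar-first as (w, x, y, z), the median is taken per component and renormalized, all quaternions in the window lie in the same hemisphere, and σ is the root-mean-square angular deviation of the window from its median. The floor `min_sigma` and all function names are hypothetical.

```python
import math
from statistics import median


def q_mul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def q_inv(q):
    """Inverse of a unit quaternion (its conjugate)."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def relative_rotation(q_vw, q_iw):
    """Equation (4): inverse of q_v^w multiplied by q_i^w."""
    return q_mul(q_inv(q_vw), q_iw)


def q_angle(a, b):
    """Rotation angle (rad) between two unit quaternions."""
    dot = abs(sum(x * y for x, y in zip(a, b)))
    return 2.0 * math.acos(min(1.0, dot))


def is_vision_failure(window, q_meas, min_sigma=1e-3):
    """Flag q_meas if it deviates more than 3 sigma from the window median."""
    med = [median(c) for c in zip(*window)]        # component-wise median
    norm = math.sqrt(sum(c * c for c in med))
    med = tuple(c / norm for c in med)             # renormalize to unit length
    devs = [q_angle(q, med) for q in window]
    sigma = max(math.sqrt(sum(d * d for d in devs) / len(devs)), min_sigma)
    return q_angle(q_meas, med) > 3.0 * sigma
```

The floor on σ prevents a perfectly steady window from rejecting every slightly noisy measurement.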

Experiments and analysis

Simulation in Gazebo

This section presents UAV localization tests carried out with the hierarchical vision-based system. In the simulation, a virtual camera with a resolution of 800 × 800 pixels and a frame rate of 30 Hz is installed to look downward from the UAV. It is assumed that the camera is calibrated and the intrinsic parameters are known. The designed landing object is placed on the top of a ground vehicle to form a landing platform. The relative height of the platform is approximately 0.5 m. All the calculations are programmed as Robot Operating System (ROS) nodes, and all the experiments are executed in Gazebo,³⁷ which can export the ground truth of the UAV state. Figure 4(a) shows the simulated UAV landing scene. The experiments begin when the UAV flies over the landing object. The center and size of the three square fiducial markers are assumed to be known in advance.

Figure 4.

UAV vision sensor system and landing object: (a) simulation in Gazebo and (b) field experiment.

The objective of the first flight experiment is to verify the designed landing object and the corresponding algorithm for UAV pose recovery and adjustment. In the experiment, the UAV is required to perform a series of movements, such as forward, backward, left, right, up, and down, and several 360° spins. These movements are typical and cover the motions expected in a given flight. The localization result acquired from the vision and the ground truth of the UAV state are plotted over time in Figure 5. Z_vision is the estimated height with respect to the object, and Z_ground is the real height with respect to the ground, as can be seen in Figure 5(a). Since the landing object is set on a ground platform with a height of approximately 0.5 m, a constant bias between Z_vision and Z_ground is expected. The corresponding error is also analyzed, and the calculated position has a small error, with a root mean square error (RMSE) of 0.0239 m. In the same way, the RMSE of the calculated orientation is 0.0818 rad, as shown in Figure 5(b). The results show a good performance for the proposed vision-based pose recovery method with our designed landing object.
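For reference, the RMSE used here can be computed as in the short sketch below. The function name is hypothetical. Whether the authors removed the 0.5 m platform offset before computing the height error is not stated, so the usage note only illustrates how a constant offset could be compensated.

```python
import math


def rmse(estimates, ground_truth):
    """Root mean square error between two equally long sequences."""
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - g) ** 2 for e, g in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))
```

To compare Z_vision with Z_ground, the platform height could be added to the vision estimate first, for example `rmse([z + 0.5 for z in z_vision], z_ground)`, where `z_vision` and `z_ground` are placeholder lists of samples.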

Figure 5.

Vision-based pose recovery using the designed landing object: (a) 3D position with respect to the landing object (m) and (b) the orientation involving the roll, pitch, and yaw angles (rad).

The employed feature detection and extraction algorithms have also been tested, and the results are shown in Figure 6. In the initial “Approaching” phase of a landing, the detected object is so small in the field of view of the UAV that its contour is the only reliable information to guide the UAV. Accordingly, the four vertices extracted from the contour are used to calculate the UAV pose, as shown in Figure 6(a). In the “Adjustment” phase, the artificial markers of the landing object can be captured and detected completely, as shown in Figure 6(b). Finally, as the relative distance between the UAV and the landing object decreases, both the contour and the markers move almost out of the onboard view. At this moment, the pose tracker shown in Figure 6(c) starts generating a set of optical flows and provides the final pose in the “Touchdown” phase.

Figure 6.

Real-time captured images at three landing phases and the corresponding feature detections: (a) approaching phase, (b) adjustment phase, and (c) touchdown phase.

Field experiment

A field landing experiment has also been carried out to illustrate the performance of the hierarchical vision-based framework presented in the previous section. In the experiment, the employed UAV is a six-rotor aircraft that can fly autonomously along a predefined GPS trajectory, as shown in Figure 4(b). The aircraft uses a proportional–integral–derivative (PID)-based nested controller and is controlled using onboard GPS and IMU sensors. As a reference, an XSENS product (MTi-G-700)³⁸ is used, which is a fully integrated solution that includes a GPS receiver and an inertial navigation system (INS). The MTi-G-710-GPS/INS not only outputs GPS-enhanced 3D orientation but also outputs attitude and heading reference system (AHRS)-augmented 3D position and velocity, so that velocity and position accuracy improve significantly with respect to the accuracy of the GPS receiver alone. Furthermore, it provides 3D sensor data, such as acceleration and magnetic field. The employed XSENS module has a 450 deg/s full-range gyroscope and a 200 m/s² full-range accelerometer. The dynamic errors in roll/pitch and yaw are 0.3° and 1.0°, respectively, while the horizontal and vertical positioning accuracies are typically below 1 and 2 m, respectively. Data generated from the strapdown integration algorithm (orientation and velocity increments Δq and Δv) are available, like all other processed data, at 100–400 Hz with a low latency (<2 ms). As the onboard vision, a GOPRO4 action camera with a resolution of 1080p and a refresh rate of 50 Hz is installed to look downward from the UAV. The camera has been calibrated in advance. The designed square landing object, with a size of 0.85 m × 0.85 m, is placed on the ground, so its relative height is approximately 0 m. In the following field experiments, the vision measurements are compared with the integrated GPS/INS system.

The aircraft starts landing on the landing object from a height of 20 m, and the results of the whole process have been recorded. The pose solutions from the three landing phases are displayed using different colors in Figure 7. For the flight test, the rotation $q_{i}^{v}$ between the camera frame and the inertial frame has been calculated over time, and the red encircled areas in Figure 8 indicate failures of the vision algorithm. It can be seen that analyzing this parameter for failure detection is effective in eliminating the abrupt jumps. As shown in Figure 9, the corresponding errors are calculated by comparing the vision measurements with the onboard inertial measurements. The position and orientation errors vary with altitude, because the vision-based algorithms using different features each have their own localization precision within a given detection range. This is reasonable, because a vision algorithm with a single feature has a limited measurement range. For example, the valid vision measurement range of the “Adjustment” phase is 2–6 m, and that of the “Touchdown” phase is below 3 m. It is difficult to localize the UAV for an open landing by depending on any single feature-based method. As its main purpose, this article presented a hierarchical vision-based localization to extend the vision range. The hierarchical algorithm selects visual features at different scales when the landing UAV is at different altitudes. For localization during landing, both vision precision and range are important factors. In this work, the proposed vision localization method not only provides a 6-DOF pose solution with satisfactory accuracy, but also operates over a range of 0–20 m. This article is only preliminary work. In the future, a filter framework will be considered to fuse the vision measurements of the three landing phases and achieve a final estimate with stable precision for UAV autonomous landing.

Figure 7.

Proposed hierarchical vision-based localization: (a)–(c) 3D position (X, Y, Z) with respect to the landing object (m); (d)–(f) orientation angles involving roll, pitch, and yaw (rad).

Figure 8.

Calculated rotation $q_{i}^{v}$ between the camera frame and the inertial frame during the “Adjustment” phase.

Figure 9.

Localization error analysis of different landing phases (Approaching, Adjustment, and Touchdown): (a) 3D position error (m); (b) orientation angle error (deg).

Conclusion and future work

In this article, a hierarchical vision-based localization method has been presented for UAV autonomous landing in order to extend the detectable and measurable range. A landing object was designed to provide visual reference features at different scales for onboard vision at both high and low altitudes. Based on the designed object, the landing was divided into three phases: “Approaching,” “Adjustment,” and “Touchdown.” For each of the three phases, the corresponding feature extraction and pose recovery algorithms have been introduced. Simulation and field experiments have been performed and analyzed to illustrate the performance of the proposed hierarchical vision localization. Although this article is preliminary work, such a hierarchical method is considered a significant form of visual guidance for UAV landing. Future work will focus on seamless and smooth pose estimation and visual servoing using the acquired vision measurements.

Footnotes

Acknowledgements

Q. Li would like to acknowledge the support of a Virginia Microelectronics Consortium (VMEC) research grant.

Handling Editor: Jinsong Wu

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work reported in this paper is the product of several research stages at George Mason University and Wuhan University of Technology and has been sponsored in part by the Natural Science Foundations of China (51579204 and 51679180) and the Double First-rate Project of WUT (472-20163042).
