Abstract
Landing safely on the runway is a critical capability for unmanned aerial vehicles (UAVs). This paper concentrates on stereo vision localization for the autonomous landing of a fixed-wing UAV within global navigation satellite system (GNSS) denied environments. A ground stereo vision guidance system imitating the human visual system (HVS) is presented for the autonomous landing of fixed-wing UAVs. A saliency-inspired algorithm is developed to detect the flying UAV target in captured sequential images. Furthermore, an extended Kalman filter (EKF) based state estimation is employed to reduce localization errors caused by measurement errors in object detection and pan-tilt unit (PTU) attitudes. Finally, experiments on a stereo vision dataset are conducted to verify the effectiveness of the proposed visual detection method and error correction algorithm. Comparison between the visual guidance approach and a differential GPS-based approach indicates that the stereo vision system and detection method achieve a satisfactory guiding effect.
1. Introduction
With successful applications in many areas, unmanned aerial vehicles (UAVs) have become a popular research topic in the field of robotic systems. Landing safely is an important phase in the flight control of UAVs. Most traditional landing control methods are based on radio waves and the global navigation satellite system (GNSS). However, in environments with electromagnetic interference, the UAV landing process is mainly remote controlled by experienced operators. In this eye-in-the-loop process, operators estimate UAV flight states using the human visual system (HVS). Since this recovery mode depends on the experience of professional operators, dangerous or even fatal mistakes are inevitable when the operator's attention is required for an hour or more. It is therefore desirable to develop UAV autonomous landing technology using vision-like technologies. The development of a UAV autonomous landing system involves many areas of knowledge. Although there has been significant research [1,2] on designing analogous systems and algorithms in the area of UAV autonomous landing, guiding UAV landing remains a challenging task.
The design of visual detection, tracking and vision measurement algorithms is a key factor in constructing an autonomous guidance system using computer vision. Stereo visual localization is inspired by the HVS and supported by stereo vision measurement theory, which has been involved in many applications [3]. During flight, aircraft often move through a cluttered background (e.g., cloudy sky, forest, mountains, the ground). Furthermore, the rapid movement of the UAV leads to significant background variation, which increases the difficulty of target detection and tracking. Even if the flying vehicle is automatically detected, measurement error still affects localization and guidance accuracy. This inaccuracy is particularly noteworthy in long-distance scenarios.
To address the issues noted above, this paper proposes a saliency detection method combined with a filtering correction. This work builds on our previously developed ground-based stereo vision guidance system [16]. The combination of computer vision and autonomous landing control avoids the influence of electromagnetic interference and increases the robustness of the landing system. The successful realization of the proposed visual guidance system primarily depends on the following two contributions.
A long-baseline architecture with separated binocular cameras is proposed for the ground visual guidance system. Specifically, two cameras are independently installed on pan-tilt units on the ground to imitate the HVS, as shown in Fig.1. Under such circumstances, the ground stereo system captures sequential images of the flying vehicle. Target detection and localization algorithms are developed to obtain spatial coordinates for the landing of the aerial vehicle. The ground system sends the coordinates to the on-board autopilot via the specified data link in order to guide the landing of the UAV.
A saliency visual detection algorithm inspired by the human eye is employed for detecting the flying vehicle from the captured images. Furthermore, an extended Kalman filter (EKF) is implemented to estimate the spatial coordinates of the UAV and to reduce localization errors.

Architecture for the ground-based fixed-wing UAV guiding system
The remainder of this paper is organized as follows. In Section 2, works related to UAV landing technology and the saliency detection method are reviewed. Section 3 formulates the UAV autonomous landing problem and describes the architecture of the visual guidance system. A saliency-based target detection algorithm is proposed in Section 4, where parameter initialization and tuning for typical scenarios are also considered. Section 5 develops the flying vehicle localization algorithm by fusing the detected image coordinates and the PTU angles; a trajectory estimation and correction algorithm based on the EKF is also proposed there to improve localization accuracy. Outdoor flight experiments are conducted to validate the effectiveness of the proposed algorithms in Section 6. Finally, concluding remarks and future work are presented in Section 7.
2. Related Works
Developing autonomous landing technologies is an important trend for runway take-off-and-landing UAV systems. Autonomous guidance systems developed for UAV navigation include ground vision-based, onboard vision-based and ground radar-based systems. Flying aircraft navigation is mostly achieved using on-board sensors [4–5] such as the global positioning system (GPS), inertial measurement unit (IMU), on-board cameras and magnetometers. In recent years, airborne vision-based navigation methods [6–12] for accurate UAV landing have also been proposed. The tactical automated landing system (TALS), based on millimetre-wavelength ground radar, was developed by the Sierra Nevada Corporation for autonomous landing [13]; it operates as an all-weather ground station. However, the aforementioned autonomous landing systems also have disadvantages. (1) Compared to airborne guidance technology, ground systems possess stronger computational resources and save costs: the payload of the UAV is limited, which strongly restricts sensor selection in terms of weight. Moreover, the external artificial auxiliary markers included in several studies [6–12] are mainly used for guiding the UAV to land on the runway. (2) A GPS-based system depends on satellite signals; a GPS-denied environment can be quite dangerous for the UAV closed-loop control system during landing manoeuvres. (3) A radar-based detector primarily employs microwaves and is extremely sensitive; furthermore, as an active range sensor, its emissions can themselves be detected. Radar systems are also much more expensive than a visual system. Taking the above considerations into account, we therefore focused primarily on developing a ground stereo vision guidance system [15–17].
Many researchers have likewise developed ground binocular camera systems to guide UAV landing [14–18].
Another purpose of this research is to develop detection and tracking algorithms for the vision-based UAV guidance system. A range of methods have been presented [19] for the detection and tracking of a target using visual information, e.g., feature-based tracking, mean-shift tracking and tracking-by-detection. However, target detection in cluttered environments has shortcomings; for example, local point feature-based detection can return many spurious local features in a cluttered scene. To address this, Chan-Vese algorithms were applied to flying-object detection in ground-captured sequential images in our previous works [16, 17]. In this paper, we mainly draw inspiration from the HVS. In particular, the HVS responds sensitively to high textural contrasts, which are generally derived from discriminative directional patterns. Visual attention facilitates our ability to rapidly locate the most important information in a scene. The visual selective attention mechanism, also known as the visual saliency mechanism, enables humans to rapidly focus on a few striking visual objects in a complex scene and to process these objects with priority. For human eyes, the visual saliency mechanism [20] realizes highly efficient information acquisition and image processing; image regions that easily attract the attention of the HVS are said to be salient. The saliency detection method primarily originates from the biological implications of human vision [20–22], because it is easy for humans to identify interesting objects within their field of view using binocular vision. Motivated by the above, this paper introduces a detection framework combining a ground vision guidance scheme with a saliency detection algorithm.
3. Framework
The typical process of UAV landing can be divided into three stages: approaching, descending and taxiing. The ground stereo vision system usually works within the descending and taxiing stages, once the UAV's engine is turned off. The guidance system, consisting of two independent modules, is located symmetrically on both sides of the runway near the UAV landing point, as shown in Fig.1. Each module is equipped with one camera on an independent pan-tilt unit, and the two modules are independently connected to the computer. With two cameras, the binocular vision system functions similarly to human eyes and can obtain 3D information about targets.
The binocular vision measurement system mainly consists of image capture, aircraft detection and tracking, and aircraft position calculation modules. The workflow of the position calculation and vision systems is shown in Fig.2. The airborne navigation system guides the aircraft into the field of view of the stereo cameras, and the aircraft turns off its engine. The visual system, with its two cameras, has a wide view of the sky. Captured image sequences from the two cameras are sent to the data processing computer simultaneously. The detection algorithm extracts the UAV's image position from the captured sequential images. At the same time, the localization algorithm uses the calibration data, the target image coordinates and the feedback angles of the pan-tilt units to calculate the UAV's spatial coordinates and stores them. By combining the UAV's current spatial position with previous position information, a more precise trajectory prediction can be achieved by the filtering algorithm. Finally, the landing position and control information are sent to the guidance system of the UAV. The PTU vision system updates its rotation angles and drives the pan-tilt motors to track the UAV, keeping the image of the flying aircraft at the centre of the field of view.

Workflow of the visual guidance system
4. Visual Detection Algorithm
Target detection is an important technology for guiding UAV autonomous landing. Many object detection methods exist, including the background-difference method, the frame-difference method, feature extraction based methods, optical flow based methods, etc. The main challenge for detection in the UAV landing problem lies in finding the UAV within a varied background composed of sky and land. Furthermore, the binocular solution uses the UAV's image coordinates as its input. Inspired by the characteristics of human vision, this paper proposes a saliency method for target detection.
4.1. Saliency detection method
A good saliency algorithm should highlight the entire salient object. Several approaches have been explored in saliency research, e.g., frequency-domain and spatial-domain methods. The saliency detection method proposed in this paper belongs to the frequency domain.
The problem of finding all objects in a scene and separating them from the background is known as figure-ground separation. The natural images that the HVS is interested in often contain special content that occupies only part of the image and is sparsely distributed. The picture can therefore be regarded as consisting of a background plus sparsely distributed salient targets.
A grey-scale image will, for example, exhibit the following structure:
This method primarily involves seven key steps, as shown in Fig.3. The full image processing results of applying this algorithm are also shown in Fig.3.

Process of saliency detection algorithm: resize, DCT, SIGN, IDCT, mean, Gaussian convolution and normalization
In this paper, we use the nearest neighbour interpolation algorithm to resize the initial input image
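For illustration, the nearest-neighbour resizing step can be sketched as follows. The function name, the use of NumPy and the handling of the scale factor γ are our own choices, not part of the original implementation.

```python
import numpy as np

def resize_nearest(img, gamma):
    """Resize a 2-D grey-scale image by scale factor gamma using
    nearest-neighbour sampling (each output pixel copies the closest
    source pixel; no interpolation of values is performed)."""
    h, w = img.shape
    new_h = max(1, int(round(h * gamma)))
    new_w = max(1, int(round(w * gamma)))
    # Map each output index back to the nearest source index, clamped
    # so that rounding never runs past the image border.
    rows = np.minimum((np.arange(new_h) / gamma).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / gamma).astype(int), w - 1)
    return img[np.ix_(rows, cols)]
```

For a 640×480 input and γ = 0.1, this yields a 64×48 image, which is what makes the downstream DCT and convolution steps cheap.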

Saliency detection results and the algorithm running time for a different γ
The DCT is a type of orthogonal transformation that transforms data from the spatial domain to the frequency domain. The primary information of the signal is concentrated on a small number of low frequency coefficients. The definition of DCT can be described by:
where
The sign operator, defined by Eq.(2), is applied to the transformed image
The IDCT of the image is defined by the following equation:
A mean operator is applied to the three image channels using the following definition:
Gaussian convolution is adopted to remove noise from the image after the mean operation.
in which σ is the parameter of Gaussian variance and the convolution operator is defined by:
In this process, different values of σ also affect the final results, as shown in Fig.5.

Saliency detection results for different σ
Finally, a normalization operator, defined in Eq.(7), is applied to the filtered images. Following normalization, the pixel values of the images are re-scaled to the range
where
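As an illustration of the DCT, sign, IDCT, smoothing and normalization steps above, a minimal single-channel sketch follows. It adopts the common image-signature formulation, in which the reconstruction of the sign spectrum is squared before smoothing; the three-channel mean step is omitted here, and all names are our own rather than the paper's.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] = np.sqrt(1.0 / n)          # DC row has a smaller scale factor
    return c

def sign_saliency(img, sigma=1.0):
    """Saliency map of one channel: DCT -> sign -> IDCT -> square ->
    Gaussian smoothing -> normalization to [0, 1]."""
    h, w = img.shape
    Ch, Cw = dct_matrix(h), dct_matrix(w)
    freq = Ch @ img @ Cw.T               # 2-D DCT of the image
    recon = Ch.T @ np.sign(freq) @ Cw    # inverse DCT of the sign spectrum
    sal = recon * recon                  # energy of the reconstruction
    # Separable Gaussian smoothing with a small odd-sized kernel.
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    sal = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, sal)
    sal = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, sal)
    # Normalize to [0, 1].
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / rng if rng > 0 else np.zeros_like(sal)
```

The explicit DCT matrix keeps the sketch self-contained; in practice a fast transform (e.g., `scipy.fft.dct`) would be used on the resized image.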
4.2. Index for parameter selection
When people observe a scene, the focus of the eye is constantly adjusted, and the same scene appears different when viewed from a different distance. The scale factor γ and the parameter σ relate the detection results to this distance factor. In this paper, the acquired information is evaluated by calculating entropy. Entropy can discriminate different characteristics and texture structures in different regions of an image; in this way, it is used to quantify the anisotropy of the image. We choose parameters based on the variation trend of the entropy value. In particular, the entropy is calculated as the Renyi entropy [23], defined as follows:
where q is an adjustable parameter. As q approaches 1, the Renyi entropy converges to the Shannon entropy. By choosing different values of q, the entropy of different regions becomes more distinguishable. Considering computational complexity and full use of the parameter q, the value of q is set to 3 in this paper.
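The entropy computation can be sketched as follows, assuming grey levels normalized to [0, 1] and a 256-bin histogram as the probability estimate; both choices are ours, since the paper does not specify the binning.

```python
import numpy as np

def renyi_entropy(img, q=3, bins=256):
    """Renyi entropy of order q of an image's grey-level histogram.
    Grey levels are assumed to lie in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                          # ignore empty bins
    if q == 1:                            # limit case: Shannon entropy
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** q)) / (1.0 - q))
```

For a uniform histogram over n bins the value is log(n) regardless of q, while a constant image gives 0, so the measure grows with the diversity of grey levels in the saliency map.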
4.3. UAV image coordinates solution
The result obtained by the saliency detection algorithm comprises a salient image region. However, the precise location of the UAV must be known during the guiding process; obtaining the UAV's precise image coordinate position is key to UAV guidance. This paper employs threshold segmentation of the saliency image to deal with this problem. The threshold algorithm, used for determining the position of the UAV within the salient region, is defined by:
where δ is the threshold, selected empirically from multiple experiments on the image dataset.
In order to obtain precise image coordinates for the UAV, this paper uses the following averaging method to determine the UAV image coordinates
in which a is the number of nonzero pixels in
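The thresholding and averaging steps can be sketched together as follows, assuming a saliency map normalized to [0, 1]; the helper name and the example threshold value are illustrative only.

```python
import numpy as np

def uav_image_coordinates(saliency_map, delta=0.5):
    """Threshold the saliency map, then average the coordinates of the
    surviving (nonzero) pixels to obtain one (u, v) image position."""
    mask = saliency_map > delta              # binary segmentation by threshold
    vs, us = np.nonzero(mask)                # row (v) and column (u) indices
    if us.size == 0:
        return None                          # no salient pixels above threshold
    # a = us.size is the number of nonzero pixels; the centroid divides by it.
    return float(us.mean()), float(vs.mean())
```

Returning `None` when nothing survives the threshold lets the caller fall back on the tracking filter rather than emit a spurious coordinate.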
4.4. UAV detection and image coordinates' solution algorithm
Algorithm detection and image coordinates solution
5. Spatial Position Calculation Algorithm
In order to guide the UAV's landing, its spatial location must be calculated from the image coordinates. In this paper, the calculation algorithm is based on the binocular vision measurement principle. In addition, UAV detection must keep the UAV within the field of view (FOV) via real-time control of the PTUs; the updated rotation angles are also given in this section.
5.1. Binocular vision calculation algorithm
As shown in Fig.6, the world coordinate system is located at the origin of the coordinate system

Simplified coordinates based on the landing process shown in Figure 1
Based on the pinhole model, the relationship between the UAV's coordinates in the world frame
where λ is the normalization coefficient,
Using the camera projective transform model (11), we can derive:
and
where
Hence, for any given point in
where fl and fr are the two cameras' focal lengths.
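The paper's triangulation fuses the detected image coordinates with the PTU angles through Eqs.(11)-(15), which are not reproduced here. As a simplified illustration of the underlying binocular principle, the rectified case with parallel optical axes, per-camera focal length f (in pixels) and baseline d reduces to depth-from-disparity; the function and variable names are our own.

```python
def triangulate_rectified(xl, xr, y, f, d):
    """Recover (X, Y, Z) from matched image coordinates (xl, y) and
    (xr, y) in a rectified stereo pair with focal length f (pixels)
    and baseline d (metres). The left camera is the origin and Z
    points along the optical axis."""
    disparity = xl - xr
    Z = f * d / disparity      # depth is inversely proportional to disparity
    X = xl * Z / f             # back-project the left-image point
    Y = y * Z / f
    return X, Y, Z
```

The long baseline of the proposed system is what keeps the disparity, and hence the depth resolution, usable at the 250-500 m ranges reported in the experiments.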
5.2. Rotation angles of the PTUs calculation algorithm
In this paper, we assume that the roll angles of the PTUs are 0°. The pan angle θ and tilt angle φ of the PTUs are therefore updated automatically to keep the flying aircraft at the centre of the field of view. The baseline distance between the two PTUs is d, as shown in Fig.6. Based on the projection relationship of the UAV's spatial position, N is the projection of the UAV on the
Eq.(16) can be used to determine the updated PTUs' angles:
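Since Eq.(16) is not reproduced here, the following generic sketch shows how pointing a PTU at a known target position reduces to two arctangents, under an assumed axis convention (Y forward along the runway, X to the right, Z up) that may differ from the paper's.

```python
import numpy as np

def ptu_angles(X, Y, Z):
    """Pan and tilt angles (radians) that point a PTU located at the
    origin toward a target at (X, Y, Z), assuming zero roll."""
    pan = np.arctan2(X, Y)                  # rotation about the vertical axis
    tilt = np.arctan2(Z, np.hypot(X, Y))    # elevation toward the target
    return pan, tilt
```

In the real system each PTU would apply this with the target position expressed in its own frame (offset by ±d/2 along the baseline), so the two units converge on the UAV from both sides of the runway.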
5.3. EKF-based UAV spatial position fusion algorithm
5.3.1. EKF Prediction model
In order to improve the precision of our method, a data fusion algorithm based on the extended Kalman filter is discussed in this section. In the EKF design for this application, the estimation state vector is taken as follows:
where L and V are vectors describing the location of the UAV and the velocity in the world frame; Θ and Ω are vectors describing the angle and angular velocity of the PTUs.
Variables in Eq.(18) are defined by
More specifically, EKF-based visual tracking methods estimate the UAV's state given observations up to frame k. The relationship between the UAV's position and velocity is defined by
The relationship between the angular position and angular velocity of the PTUs is defined by
Then, the state transition equation is defined by
where
The estimated state covariance matrix at step k is defined by:
where
5.3.2. Measurement Model
In the binocular vision system, we can acquire the image data and the angles of the PTU. Thus, the measurement matrix is defined by
where
Based on Eq.(11), the measurement model is defined by:
where
5.3.3. Correction model
Xk is updated using the measurement Yk and the covariance matrices. According to the measurement model, the Jacobian matrix Hk of the measurement function fk can be represented as:
Then, we get the Kalman gain Kk:
where Rk is the Gaussian noise covariance matrix of the sensors and
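The correction step can be sketched generically as follows. The measurement function h, its Jacobian H and the noise covariance R are passed in as arguments, since their exact forms follow the paper's Eq.(11)-based projection model, which is not reproduced here.

```python
import numpy as np

def ekf_update(x_pred, P_pred, y, h, H, R):
    """Standard EKF correction: fold measurement y into the predicted
    state, where h(x) is the measurement function and H its Jacobian
    evaluated at x_pred."""
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x_pred + K @ (y - h(x_pred))              # state correction
    P = (np.eye(x_pred.size) - K @ H) @ P_pred    # covariance correction
    return x, P
```

When a frame yields a wrong detection (as in Fig.10(d)), the innovation y - h(x_pred) is large, and a gating test on it can reject the measurement before it corrupts the state.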
6. Experiments and Discussion
In order to verify the guiding effect of the binocular vision guidance system and the superiority of the saliency detection method, this paper adopts a fixed-wing UAV as the flight test platform and conducts offline and online experiments.
6.1. Experimental set-up
We constructed a ground-based guidance system for a fixed-wing aircraft. The fixed-wing UAV used in this paper is shown in Fig.7 and its parameters are listed in Table 1. The platform is equipped with an IFLY-F1 controller. The control and position signals are transmitted to the UAV flight control system via radio stations on the ground.

Experimental equipment in the present work: (a) fixed wing UAV; (b) DGPS module inside UAV; (c) antenna used for sending signal; (d) two PTUs systems with camera
The parameters of the fixed-wing UAV
In addition, the UAV is equipped with a differential global positioning system (DGPS) module in order to ensure its flight safety. The DGPS signal is also convenient for comparative data analysis. According to the DGPS product manual, the localization error of the DGPS is less than 2 cm 95% of the time. The DGPS signal is recorded at 10 Hz, which is similar to the video recording frequency. When the UAV's flight height is 60-100 m and its distance from the PTUs is 400-500 m, i.e., in the approaching and descending stages, respectively, the system begins to prepare for landing guidance. The vision system uses visible-light cameras. The two PTUs are stabilized biaxial units with a high rotary speed (50 degrees/sec) that accept command signals over an RS232 serial port. Additionally, real-time status feedback, including the pan angle θ and tilt angle φ, is provided by the PTUs.
6.2. Analysis of target detection results
6.2.1. Parameters selection experiments
UAV visible-light images captured by the camera are shown in Fig.8. A salient region for the UAV, flying at roughly 30 m/s, can be obtained using the saliency method. Each image's initial size is 640×480 pixels. First, we need to determine the corresponding parameters of the saliency detection algorithm based on historical data. A sample dataset was collected from a set of flight tests. As shown in Fig.8, we selected typical sample data to analyse: images in which the UAV appears at different positions in the air against different backgrounds.

Algorithm running time and Renyi entropy Re for different γ
In Fig.8, three typical images at different positions are used to illustrate the parameter selection process and conveniently describe the data processing results. The saliency detection algorithm of section 4.4 is applied to the image datasets using different scale factors.
In the parameter selection experiment, the range of scale factor γ is
Therefore, there are
Variations in entropy
Similarly, the corresponding saliency maps are obtained from the Gaussian convolution using different σ parameters. In this experiment, we also calculate the entropy and entropy changes of the saliency map. For all images in the dataset, the optimal size scale factor is set to 0.1. Parameter σ is in the range of

Saliency map and Renyi entropy Re for different σ: saliency map 1-6 for different images show when σ=0.01, 0.02, 0.03, 0.04, 0.05, 0.06
6.2.2. UAV detection experiment results
We applied the saliency detection algorithm to the flight experiments using parameters

Target detection using the saliency method and threshold algorithm: (a) the processing result for target detection of the UAV in the air; (b) the processing result for target detection of the UAV in the air with forest in the background; (c) the processing result for the UAV on the ground; (d) the wrong detection result for the UAV on the ground
We also found an incorrect detection result, shown in Fig.10(d): a chair with the same red colour as the UAV is detected by the saliency algorithm. The algorithm was verified as suitable for detecting the salient region in an image, so incorrect detections may give rise to errors in the calculated UAV spatial coordinates. A filter (the EKF proposed in section 5.3) is needed to reduce such errors based on recorded historical data.
6.3. Flight experiments
During the UAV flight test, the ground control station transfers the physical coordinates of the UAV via the antenna to guide the landing of the UAV at a speed of 30 m/s. In order to verify the effectiveness of the visual detection algorithm, the ground control station records the DGPS data, the saliency-based automatic detection results and the EKF-based estimation results during the landing process. Fig.11 shows the comparison curves of the DGPS data, the EKF estimation results and the UAV spatial coordinates obtained by the visual method. The UAV begins to land about 250 m away from the visual system.

3D flight path results including the visual guidance result, EKF-based fusion result and DGPS data
In order to analyse the landing precision, we define the absolute errors by
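From these absolute errors, the per-axis root-mean-square error reported later in Table 2 can be computed as follows; trajectories are assumed to be sampled as (N, 3) arrays of x, y, z coordinates, with the DGPS track as the reference.

```python
import numpy as np

def per_axis_rmse(estimate, reference):
    """Per-axis root-mean-square error between an estimated trajectory
    and a reference (e.g., DGPS) trajectory, both of shape (N, 3)."""
    err = np.asarray(estimate) - np.asarray(reference)   # absolute errors
    return np.sqrt(np.mean(err ** 2, axis=0))            # RMSE on x, y, z
```

This requires the two tracks to be time-aligned first; here the 10 Hz DGPS log and the video frames are recorded at similar rates, so nearest-timestamp matching suffices.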

3D flight path errors between the visual guidance results, EKF-based fusion result and DGPS data
6.4. Comparisons with other works
Compared to existing research, we find that our autonomous landing results are similar to the work presented in [6]. In Gui's work [6], an airborne vision-based navigation method is used for accurate UAV landing; in addition, many infrared lamps are used to increase landing precision. Our study instead imitates the HVS in order to avoid additional auxiliary materials.
On the other hand, in our previous work [17], different visual algorithms were applied to this guidance system: a Chan-Vese (CV) model-based approach was proposed for ground stereo vision detection, and a region-of-interest (ROI) set-up was presented to improve real-time capabilities. To verify effectiveness, we mainly compared the saliency-based calculation results with the methods proposed in [17], examining both calculation precision and detection algorithm running time. Fig.13 shows the running times of different detection algorithms on a PC with a 2.80 GHz CPU and 6.00 GB of RAM. The CV method takes longer than the saliency detection method, since its iterative process increases the running time. By decreasing the image size, the saliency method shows better detection efficiency than the CV, snake and gradient vector flow (GVF) snake algorithms. The running time of the location calculation algorithm with the EKF is 10 ms.

Running time for different detection algorithms
Table 2 shows the guidance calculation comparison between the saliency and CV methods. It can be seen that the root-mean-square error (RMSE) of the spatial coordinates based on saliency is larger than that of the CV-based method. In [16], the detection region of the UAV is the region-of-interest (ROI), which is smaller than that detected by the saliency method; thus, the image coordinates calculated for the UAV are more accurate than those identified in this paper, and detection errors decrease the accuracy of the calculated spatial coordinates. However, the EKF-based method used in the current paper improves the UAV spatial coordinate calculation results by 2.3%, 7.28% and 18.21% along the three axes.
The root-mean-square error (RMSE) and precision improvement with EKF at each axis for different algorithms
7. Conclusion
Realizing aircraft self-recovery using bionic vision technology is an important hallmark of high-performance UAVs. Addressing the UAV autonomous landing problem in a GPS-denied environment, this paper constructed a UAV autonomous guidance system based on the HVS. Focusing on visual detection, tracking and the vision measurement problem, this paper proposed a saliency method and a UAV spatial position calculation algorithm based on the visual measurement principle within the UAV autonomous guidance system. In addition, to handle interference in UAV detection, this paper presented an EKF-based forecasting and position correction method to reduce measurement errors. Finally, experiments with a fixed-wing UAV were carried out to test landing with this visual system. The experimental and comparison results show that the proposed algorithm framework is simple and that target detection is effective, satisfying the accuracy requirement.
However, a shortcoming of this study is that salient region selection depends on threshold parameters, which need pre-tuning using historical data. Future work will aim to continuously improve the algorithm in order to raise the precision of position calculation within this framework.
8. Acknowledgements
The authors wish to thank Zhang Daibing, Niu Yifeng and Kong Weiwei for their assistance in developing the guidance system. We also gratefully acknowledge the support of the National Natural Science Foundation of China (61403410).
