Abstract
Detection and tracking strategies based on monocular vision are proposed for autonomous aerial refuelling tasks. The drogue attached to the tanker aircraft has two important features: the grey values of its inner part differ clearly from those of the external umbrella ribs, and the shape of its inner dark part is nearly circular. Based on this crucial prior knowledge, rough and fine positioning algorithms are designed to detect the drogue, and a particle filter based on the drogue's shape is proposed to track it. A strategy for switching between detection and tracking is proposed to improve the robustness of the algorithms. The inner dark part of the drogue is segmented precisely during detection and tracking, and the segmented circular part can be used to measure the drogue's spatial position. Experimental results show that the proposed method runs in real time with satisfactory robustness and positioning accuracy.
1. Introduction
Aerial refuelling, also referred to as in-flight refuelling (IFR) or air-to-air refuelling (AAR), is an operation whereby fuel is transferred from one aircraft (the tanker) to another (the receiver) during flight. IFR is an important method for extending the flying distance and endurance of an aircraft and is widely used in military aviation. For unmanned aerial vehicles (UAVs), autonomous air-to-air refuelling is needed to ensure flight endurance. There are two kinds of hardware configurations used for aerial refuelling. The first, called the boom-and-receptacle refuelling system, includes a rigid boom extending from the tanker aircraft, with a probe and nozzle at its distal end. The boom also includes airfoils controlled by a boom operator stationed on the refuelling aircraft. The airfoils allow the boom operator to actively manoeuvre the boom with respect to the receiver aircraft, which flies in a fixed refuelling position below and aft of the tanker aircraft [1–5]. The second, called the probe-and-drogue refuelling system, includes a refuelling hose trailed behind the tanker aircraft, with a drogue attached at its end, and a probe installed on the receiver aircraft. The probe must be placed, or docked, into the drogue in order to refuel successfully [6–9]. Autonomous aerial refuelling relies on three key technologies: target detection, tracking and measurement, which allow the receiver aircraft to determine control strategies for a robust and safe approach and coupling. This paper provides detection and tracking strategies for probe-and-drogue autonomous aerial refuelling based on monocular vision.
In this paper, drogue detection and tracking strategies based on monocular vision are proposed for autonomous aerial refuelling tasks. Two important features of the drogue are used to design these strategies. The first is that the grey values of the drogue's inner part are almost uniform and differ from those of the external umbrella ribs. The second is that the shape of the drogue's inner dark part is nearly circular. The drogue detection algorithm includes two parts: a rough location algorithm and a fine positioning algorithm. The rough location algorithm defines the potential regions in which the drogue may be located, while the fine positioning algorithm accurately finds the drogue in the potential regions if it is present in the image. The particle filter is widely used in target tracking because of its robustness [10–13]. A new particle filter algorithm based on the drogue's shape is proposed to track the drogue. In the new algorithm, unique principles of state transition are defined to ensure tracking robustness even when the drogue's position or size changes significantly between two adjacent frames. A switching strategy between detection and tracking is proposed to improve the algorithm's robustness, providing the link between detection and tracking; this is critical when tracking has failed or the drogue is not in the image.
The paper is organized as follows. Section 2 gives an introduction to previous works. Section 3 describes the drogue's detection strategy. Section 4 describes the drogue's tracking strategy. Section 5 describes the switch strategy between detection and tracking. Section 6 presents the experimental results and Section 7 concludes the main points of the research.
2. Previous Works
Machine vision methods for autonomous aerial refuelling tasks are becoming increasingly popular [6,7,14–17]. Their advantages include the potential for installation without modification to the target aircraft and increased measurement precision.
Researchers have developed a variety of machine vision methods for the probe-and-drogue autonomous aerial refuelling system (shown in Figure 1). John Valasek et al. [6] developed a vision-based navigation sensor and system for autonomous aerial refuelling tasks. For the endgame docking problem of automated aerial refuelling, a VisNav sensor (a position-sensing diode) was mounted on the receiver aircraft and a set of LED beacons was mounted on a drogue trailed from the tanker aircraft. When light energy from an individual beacon on the drogue was focused on the surface of the position-sensing diode, it generated an electrical current, which was measured with four pickoff leads, one on each side. The six-degrees-of-freedom position and attitude of the receiver aircraft with respect to the drogue can be computed from the four position-sensing signals. The main disadvantage of the method proposed in [6] is that the tanker equipment must be modified to provide electrical power for the beacons, since there is no such power in the hose to which the drogue is attached. Fravolini et al. [18] proposed a docking control scheme for autonomous aerial refuelling of UAVs using a probe-and-drogue refuelling system. The docking control scheme was based on a fuzzy sensor fusion strategy featuring GPS and machine vision data: GPS was used to measure the relative position between the tanker and the receiver, and machine vision was used to measure the relative camera-drogue distance. Markers were placed on the drogue to measure its position and orientation. However, GPS receivers may be affected by interference from electronic devices, and GPS signals may be blocked by the tanker. Lorenzo Pollini et al. [19] placed light-emitting diodes (LEDs) on the drogue and used a CCD webcam with an infra-red filter to identify the LEDs.
The Lu, Hager and Mjolsness (LHM) algorithm [20] was used to iteratively determine the translation vector as well as the transformation matrix between the 3D reference systems on the object and the camera, respectively. As in [6], the main disadvantage of [19] is that some modifications to the tanker equipment are required. Carol Martinez et al. [7] proposed a vision-based strategy for autonomous aerial refuelling tasks. The proposed strategy consists of four stages: detection, initialization, tracking and 3D position estimation. The detection stage is composed of two algorithms: one based on edge-image template matching using the normalized cross-correlation (NCC) method, and the second based on image threshold segmentation. The detection method is time-consuming because drogue images under different variations, such as scale, illumination and position, must be contained in the template images. It is impossible to cover all possible conditions of the drogue in the template images, so, to reduce detection failures, an empirical threshold was used to segment the image and detect the inner part of the drogue when edge-image template matching failed. However, the empirical threshold is hard to define because the illumination of the scene may change significantly. The tracking algorithm was a Hierarchical Multi-Parametric and Multi-Resolution implementation of the Inverse Compositional Image Alignment technique (HMPMR-ICIA) [21].

The probe-and-drogue refuelling system
3. Detection Strategy
3.1. Rough Location of the Drogue
The aim of rough location is to define the potential regions in which the drogue may be located. According to prior knowledge, the rough location stage is composed of two algorithms: the first based on image segmentation using a series of thresholds, and the second based on contour features of the segmented image regions.
It is impossible to define a single accurate empirical threshold to segment the image and detect the inner part of the drogue, because the illumination of the scene may change significantly. Therefore, a series of thresholds is used to segment the same image, as follows:
where
where ∆
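As an illustrative sketch of this multi-threshold segmentation step (the threshold range, step size and the convention that the drogue's inner part is darker than the threshold are assumptions for illustration, not the paper's actual parameter values):

```python
def multi_threshold_segment(image, t_min, t_max, t_step):
    """Segment the same grey-scale image with a series of thresholds.

    Returns one (threshold, binary mask) pair per threshold; a pixel is
    marked 1 (candidate drogue interior) when its grey value is below
    the threshold, so dark regions survive over a range of thresholds.
    """
    masks = []
    t = t_min
    while t <= t_max:
        mask = [[1 if px < t else 0 for px in row] for row in image]
        masks.append((t, mask))
        t += t_step
    return masks
```

Each mask then feeds the contour-based stage, so a missed detection at one threshold can be recovered at another.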
Then all the contours of the output images are extracted, and the set of the contours is expressed as
where
In order to improve the speed of detection, the Multi-Resolution (MR) hierarchical structure [22] is used. The MR structure is created by repeatedly downsampling the images by a factor of two in order to create the different levels of the pyramid. The number of levels
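A minimal sketch of such a pyramid construction, assuming simple 2×2 block averaging as the downsampling filter (the paper does not specify the filter):

```python
def build_pyramid(image, levels):
    """Create a Multi-Resolution pyramid by repeatedly halving the image.

    Each level averages 2x2 blocks of the previous level, so level k has
    1/2^k the width and height of the original (truncated to integers).
    """
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        nxt = [[(prev[2 * r][2 * c] + prev[2 * r][2 * c + 1] +
                 prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]) // 4
                for c in range(w)] for r in range(h)]
        pyramid.append(nxt)
    return pyramid
```

Coarse levels are searched first, and candidate regions are refined at finer levels, which is what makes the rough location fast.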
3.2. Fine Positioning of the Drogue
3.2.1. Location of Edge Points of the Drogue's Inner Dark Part
To every contour in the set of
where (
where
In order to obtain the location of edge points of the drogue's inner dark part, some half-lines are assumed to start at the point (
where
Most of the half-line directions are neither horizontal nor vertical, so coordinate transformations are used to calculate the coordinates of the small rectangles along such half-lines. Figure 3 shows three coordinate systems; the coordinate system (

Location of edge points of the drogue's inner dark part

Three coordinate systems
The coordinates of the point
and the relationship between (
From (14) and (15), the relationship between (
The parameter
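The half-line search can be sketched as follows; this simplified version samples single pixels along each ray rather than the rotated small rectangles described above, and the seed point, grey-value jump threshold and maximum ray length are illustrative assumptions:

```python
import math

def cast_ray(image, cx, cy, angle, jump=60, max_len=200):
    """Walk along a half-line starting at (cx, cy) and return the first
    pixel whose grey value differs from the seed pixel by more than
    `jump` -- a candidate edge point of the drogue's inner dark part.

    Returns None if the ray leaves the image without finding an edge.
    """
    seed = image[cy][cx]
    dx, dy = math.cos(angle), math.sin(angle)
    for step in range(1, max_len):
        x = int(round(cx + step * dx))
        y = int(round(cy + step * dy))
        if not (0 <= y < len(image) and 0 <= x < len(image[0])):
            return None
        if abs(image[y][x] - seed) > jump:
            return (x, y)
    return None
```

Casting rays in many directions from a point inside the dark region yields the ring of edge points used in the following subsections.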
In order to improve the computing speed, the integral image is used to calculate the ratio of the small rectangle's two parts. The

The rectangle
where
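A minimal sketch of the integral-image (summed-area table) trick, which lets the sum over any axis-aligned rectangle, and hence the ratio of a small rectangle's two parts, be computed with a constant number of table lookups:

```python
def integral_image(image):
    """Summed-area table: ii[r][c] = sum of image[0..r-1][0..c-1]."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the half-open rectangle rows [top, bottom),
    columns [left, right), computed in O(1) with four lookups."""
    return (ii[bottom][right] - ii[top][right]
            - ii[bottom][left] + ii[top][left])
```

The table is built once per frame, after which every rectangle query along every half-line costs the same small constant time.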
3.2.2. Getting Rid of the Bad Edge Points Using Vector Angles
The bad edge points of the drogue's inner dark part may be detected because of image noise or the partial occlusion of the drogue. Assume that the detected edge points of the drogue's inner dark part are the red points {1, 2,…, 16} shown in Figure 5. We define the vectors as {

Vector angles of edge points
where
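A hedged sketch of vector-angle filtering, assuming a hypothetical rule that keeps points whose angle with their two neighbours on the closed contour is large (points on a smooth circular contour subtend large angles, outliers small ones); the paper's exact angle criterion and threshold are not reproduced here:

```python
import math

def vector_angle(p, q, r):
    """Angle at q between vectors q->p and q->r, in degrees."""
    v1 = (p[0] - q[0], p[1] - q[1])
    v2 = (r[0] - q[0], r[1] - q[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    cos_a = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_a))

def filter_edge_points(points, min_angle=120.0):
    """Drop edge points whose vector angle (formed with the two
    neighbouring points on the closed contour) is too small."""
    n = len(points)
    return [points[i] for i in range(n)
            if vector_angle(points[i - 1], points[i],
                            points[(i + 1) % n]) >= min_angle]
```

Note that a single bad point also perturbs the angles at its two neighbours, so a conservative threshold may discard a few good points as well; the subsequent RANSAC step tolerates this.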
3.2.3. Location of the Drogue Using RANSAC
The types of bad edge points are diverse, so some of them cannot be eliminated by vector angles alone. For example, bad point 9 in Figure 5 cannot be eliminated because the vector angle
A threshold
where
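The RANSAC circle fit can be sketched as follows; the iteration count, inlier tolerance and the three-point circumcircle construction are standard choices, not necessarily the paper's exact settings:

```python
import math
import random

def circle_from_3_points(p1, p2, p3):
    """Circumscribed circle (cx, cy, r) of three non-collinear points."""
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None  # collinear sample, no circle
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)

def ransac_circle(points, iters=200, tol=2.0, seed=0):
    """Fit a circle to edge points while ignoring outliers: repeatedly
    sample three points, fit the circle through them, count the points
    lying within `tol` of that circle, and keep the best model."""
    rng = random.Random(seed)
    best, best_inliers = None, 0
    for _ in range(iters):
        model = circle_from_3_points(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r = model
        inliers = sum(1 for (x, y) in points
                      if abs(math.hypot(x - cx, y - cy) - r) <= tol)
        if inliers > best_inliers:
            best, best_inliers = model, inliers
    return best, best_inliers
```

The inlier count also serves as a natural confidence score for the fitted circle, which the tracking stage can reuse as a particle weight.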

Location of the drogue using RANSAC
4. Tracking Strategy
The tracking algorithm is based on particle filters, which are sequential Monte Carlo methods [25] based on point-mass ("particle") representations of probability densities. Particle filters are suitable for any non-linear system that can be represented by a state-space model. The tracking object is the inner dark part of the drogue, so the centre of the inner dark part of the drogue can be defined as the state
4.1. Selection of Particles and State Transition
The disturbance of the drogue is uncertain when in the air, so it is hard to establish the accurate motion model of the drogue during air-to-air refuelling. The state

Principles of state transition.
where
where
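A minimal sketch of such a state transition, assuming a Gaussian random walk over the circle's centre and radius (the particle count and noise scales are illustrative, not the paper's values):

```python
import random

def transition_particles(state, n=100, sigma_pos=8.0, sigma_r=2.0, seed=0):
    """Propagate the state (cx, cy, r) with a Gaussian random walk.

    With no reliable motion model for the drogue in turbulent air, each
    particle perturbs position and radius independently, so the filter
    can follow sudden changes in the drogue's position or apparent size
    between adjacent frames.
    """
    rng = random.Random(seed)
    cx, cy, r = state
    return [(cx + rng.gauss(0, sigma_pos),
             cy + rng.gauss(0, sigma_pos),
             max(1.0, r + rng.gauss(0, sigma_r)))  # radius stays positive
            for _ in range(n)]
```

Larger noise scales widen the search region at the cost of more particles being wasted far from the drogue.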
4.2. Particle weights and posterior probability
The particle weights of state transition particles in Section 4.1 can be calculated through the following steps:
Obtain the set
Eliminate the bad edge points of every set
Calculate weights of state transition particles using the RANSAC method with the same method in Section 3.2.3, and the weight of
A threshold
Calculate the state weight at time
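The weighting and state-estimation steps can be sketched as below; here each particle's weight is taken as given (e.g. its RANSAC inlier count from the step above), and the state estimate is the weighted mean of the particles, which is one common choice rather than necessarily the paper's exact formula:

```python
def estimate_state(particles, weights):
    """Normalize particle weights and return the weighted mean state.

    Each weight would come from the RANSAC inlier count of the edge
    points found around that particle. A zero total weight means no
    particle found a credible circle, i.e. tracking has failed.
    """
    total = sum(weights)
    if total == 0:
        return None  # fall back to the detection stage
    norm = [w / total for w in weights]
    dims = len(particles[0])
    return tuple(sum(p[d] * w for p, w in zip(particles, norm))
                 for d in range(dims))
```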
5. Switch Strategy between Detection and Tracking
The detection stage must be enabled automatically to detect the drogue either at the start of the run or when the drogue has gone out of the field of view of the camera, or alternatively because the tracking algorithm has failed to track the drogue. Therefore, performance assessment criteria should be defined to switch between detection algorithm and tracking algorithm. The detection and tracking algorithm is initiated with a lost status
where
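An illustrative sketch of the switch logic, assuming a hypothetical lost-frame counter and limit (the paper's exact criterion is expressed through the lost status variable above):

```python
def update_mode(lost_count, frame_ok, lost_limit=3):
    """Decide between tracking and (re-)detection.

    Consecutive frames in which the tracker finds no credible drogue
    increment the lost counter; once it reaches `lost_limit`, the
    full-image detection stage is re-enabled. A successful frame
    resets the counter and returns control to the tracker.
    """
    lost_count = 0 if frame_ok else lost_count + 1
    mode = "track" if lost_count < lost_limit else "detect"
    return lost_count, mode
```

Requiring several consecutive failures before switching avoids needlessly restarting the (slower) whole-image detection after a single noisy frame.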

The proposed visual detecting and tracking system for air-to-air refuelling
6. Experimental Results
In this section, experiments were conducted on real air-to-air refuelling drogues in different air scenes. The performance of our method was compared with that of the algorithm proposed by Carol Martinez et al. [7]. Three experiments were carried out to detect and track different drogues in different air scenes. Processing speed and percentage of correct location are compared between Carol Martinez's method and ours. The speed indicators were the average time
6.1. At Air Scene 1
125 frames of images with 1440×900 pixel size were used in the experiment at air scene 1. The experimental data were obtained from the website http://www.youtube.com/watch?v=nWmFpLVl8MQ. Eleven edge templates as shown in Figure 9 were used to find the drogue in the lowest resolution image in the image pyramid, the number of pyramid levels was

Edge templates at air scene 1 in the method of Carol Martinez et al
Nine result frames produced by our method are shown in Figure 10, with the magnified target displayed in the top-right corner of each frame. The green circle segmented during detection and tracking is the inner dark part of the drogue, the white point is the circle's centre and the red points are the state transition particles. The comparison of processing time per frame between our method and the method of Carol Martinez et al. is shown in Figure 11. The processing time of each frame in our method is clearly less than that of the method of Carol Martinez et al., and it is not affected by the scale of the drogue in the image. In Figure 11, the cyan triangle is the detection time (about 16 ms) in the ROI in the 116th frame and the nine magenta pentagrams stand for the detection time (less than 172 ms) in the whole images in our method. The green squares represent the frames in which no data is output (i.e., the algorithm decides there is no target in the frame) in Carol Martinez's method. As shown in Figure 11, no data is output from the 32nd frame to the 39th frame in Carol Martinez's method, even though the targets in these frames are clear, while our method gives the drogue's position in all frames. In the method of Carol Martinez et al., the processing time is affected by the size of the reference image. The detection algorithm finds the first reference image, with pixel size 213×206, in the 0th frame as shown in Figure 11, and the average tracking time corresponding to the first reference image is 1099.5

Nine result frames at air scene 1 in our method

The processing time at the air scene 1
Comparison of processing time
6.2. At Air Scene 2
One hundred and ninety frames of images with pixel size 1440×900 were used in the experiment at air scene 2. The experimental data were obtained from the website http://www.youtube.com/watch?v=cG6rMZF6mIw. The appearance of the drogue at air scene 2 was different from the appearance of the drogue at air scene 1, but both had the two important features. Eleven edge templates were used to find the drogue as shown in Figure 12 and the threshold used to segment the image was 40 in Carol Martinez's method. The parameters in our method are the same as the parameters in Section 6.1. Nine result frames are shown in Figure 13 and the processing time of each frame is shown in Figure 14 in our method. The processing time of each frame in our method is clearly less than the processing time in the method of Carol Martinez et al. as shown in Figure 14.

Edge templates at air scene 2 in the method of Carol Martinez et al

Nine result frames at air scene 2 in our method

The processing time at air scene 2
In Figure 14, the seven cyan triangles are the detection time in the ROI and the 11 magenta pentagrams stand for the detection time in the whole frames in our method. The detection time in Carol Martinez's method, represented by the black pentagrams in Figure 14, is obviously greater than the detection time in our method. The green squares are the frames in which no data is output in Carol Martinez's method. As shown in Figure 14, no data is output from the 121st frame to the 129th frame, and a wrong target is detected and tracked from the 130th frame to the 136th frame, probably because the drogue is partially occluded. In our method, only in the 138th frame is no data output. The speed indicators in our method are better than those of the method of Carol Martinez et al., as shown in Table 1. The percentage of correct location in the method of Carol Martinez et al. is less than 20%, while in our method it is near 80% when the location error threshold is five pixels, as shown in Figure 18.
6.3. At Air Scene 3
123 frames of images with 1440×900 pixel size were used in the experiment at air scene 3. The experimental data were obtained from the website http://www.jokeroo.com/videos/cool/aerial-refueling.html.
The appearance of the drogue at air scene 3 was different from the appearance of the drogues at air scenes 1 and 2, but all of them had the two important features. Eleven edge templates were used to find the drogue as shown in Figure 15 and the threshold used to segment the image was 50 in Carol Martinez's method. The parameters in our method are the same as the parameters in Section 6.1. Nine result frames are shown in Figure 16 and the processing time of each frame in our method is shown in Figure 17. The processing time of each frame in our method is clearly less than that of the method of Carol Martinez et al., as shown in Figure 17, since the HMPMR-ICIA [21] algorithm adopted by Carol Martinez is a time-consuming iterative optimization method during tracking. The speed indicators in our method are better than those of the method of Carol Martinez et al., as shown in Table 1. The percentage of correct location in the method of Carol Martinez et al. is less than 70%, while in our method it is more than 80% when the location error threshold is five pixels, as shown in Figure 18.

Edge templates at air scene 3 in the method of Carol Martinez et al

Nine result frames at air scene 3 in our method

The processing time of each frame at air scene 3

Percentages of correct location at different location error thresholds
7. Conclusions
Detection and tracking strategies for aerial refuelling tasks based on monocular vision were proposed. According to prior knowledge of the drogue, multi-threshold segmentation and a shape-distinguishing algorithm were used to detect the drogue: multi-threshold segmentation decreased the rate of missed detections, and the shape-distinguishing algorithm pinpointed the drogue's position precisely. A new particle filter algorithm based on the drogue's shape was proposed to track the drogue. In the new algorithm, unique principles of state transition are defined to ensure tracking robustness even when the drogue's position or size changes significantly between two adjacent frames. A strategy of switching between detection and tracking was proposed, which enhanced the robustness of our method. In the method, the inner dark part of the drogue was segmented precisely during detection and tracking, and the segmented circular part can be used to measure the spatial position of the drogue. The proposed method is fast, is little affected by the size of the drogue in the image and is highly accurate. In the future, we will try to use the segmented circular part to measure the position of the drogue in Cartesian space based on monocular vision, by combining the 3D model of the drogue with the imaging principle.
8. Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grants 61227804 and 61105036.
