
Abstract

In this article, a tree search algorithm is proposed to find near-optimal conflict avoidance solutions for unmanned aerial vehicles (UAVs). In a dynamic environment, unmodeled elements such as wind make UAVs deviate from their nominal traces, which complicates conflict detection and resolution. Back propagation neural networks are used to approximate the unmodeled dynamics of the environment. To satisfy the online planning requirement, the search length of the tree search algorithm is limited, so the algorithm may not reach the goal states during the search process. A midterm reward function is therefore devised to assess each node; it considers two factors, namely the safe separation requirement and the mission of each UAV. Simulation examples and comparisons with previous approaches are provided to illustrate the smooth and convincing behaviours of the proposed algorithm.

Keywords

Conflict resolution, tree search, neural networks

Introduction

The applications of unmanned aerial vehicles (UAVs) in military and civilian fields have achieved great success in recent years. UAVs have advantages over manned planes, such as low manufacturing and operational costs, flexibility in accommodating different payloads, and reduced risk to human lives (no pilot or crew). These advantages and the accumulated experience have spurred a great number of communities and organizations to conduct research on UAVs. However, most UAVs are currently allowed to fly only in segregated airspace, which largely restricts their application because segregated airspace is often far away from populated areas. Therefore, the most fundamental problem in promoting UAV applications is airspace integration.¹ UAV air traffic management (ATM) is a necessity when UAVs are permitted to fly into nonsegregated airspace. To meet the “do not harm” criterion, the first and foremost duty of ATM is to guarantee airspace safety. Naturally, the airspace conflict detection and resolution (CDR) problem is a primary concern.²

CDR for manned planes pays little attention to optimizing the trajectories of airplanes. As a result of this inefficient mechanism, fuel is wasted unnecessarily and atmospheric pollution increases. Studies on improving the efficiency of manned CDR are already under way.^3–5 However, there are differences between UAVs and manned planes: there is no pilot in the cabin, and civil UAVs are much smaller. Furthermore, most UAVs cannot carry as many devices as manned planes, such as the Traffic Collision Avoidance System II.

The objective of conflict resolution is to guide conflict-related UAVs back to their predefined paths efficiently. Several kinds of conflict resolution methods have been proposed in the literature, such as trajectory planning methods and reactive methods. The safe separation distances between UAVs are anticipated to be smaller than those of manned planes because of their characteristics, such as platform size and maneuverability. As a result, the dynamic environment, such as wind, may have a more pronounced influence on the states of UAVs. Long-term trajectory planning methods are time-consuming and inefficient in dealing with environmental uncertainties. Reactive methods lack the ability to consider the objectives of CDR comprehensively, which are to keep the safe separation between UAVs and to minimize the deviation from nominal paths. In this article, we study a tree search-based CDR algorithm. We consider the circumstance in which the ATM system keeps communicating with UAVs and guides them to take cooperative conflict-free maneuvers when they are predicted to lose separation. There are three kinds of strategies to resolve airspace conflicts, namely heading control, speed control, and altitude control. When en route, UAVs usually stay on track laterally and maintain a particular airspeed or Mach number. This article investigates conflict resolution by heading control.

Tree search is widely used in the AI field. It is a technique for solving practical problems in combinatorial optimization. In the conflict resolution problem, a decision is very difficult to assess before a complete solution is available, so the process must be simulated to probe the quality of the decision. However, uncertain dynamics disturb the simulation. As it is difficult to construct accurate mathematical models for the uncertainty of the environment and the system, various techniques have been proposed to approximate the plants, such as fuzzy logic systems (FLS) and neural networks (NNs).^6,7 Fang Wang et al. propose to approximate the continuous uncertainty function by FLS.⁶ Yan-Jun Liu et al. study the optimization control problem for discrete-time systems.⁸ Meanwhile, NNs are also widely applied to approximate unmodeled dynamics. Shitie Zhao et al. propose to approximate the unknown nonlinear function by an affine-type NN.⁹ Radial basis function NNs are applied to approximate continuous functions.^7,10 Since wind varies in a dynamic environment, back propagation NNs (BPNNs) are utilized to approximate the wind effect in order to meet the online approximation requirement.¹¹ The weights of the NNs are updated according to changes in the environment.

To meet the online planning requirement and adapt to changes in the environment, we propose to search for conflict-free maneuvers in a limited time interval (look-ahead interval) P_N. Such a method requires that the optimization task be solved periodically to ensure that the moving look-ahead interval is always placed in the future. The decentralized conflict resolution method is efficient in searching for the near-optimal de-conflict policy. Because P_N is limited, the tree search algorithm may not be able to expand to the goal states in each online search process. A midterm reward function is therefore designed to evaluate the intermediate nodes of the tree. Several features are considered in the midterm reward function, namely the safe separation constraint and flight efficiency (priority, maneuver, and costs).¹²

The remainder of this article is organized as follows. Section “Related works” reviews related works; the tree search algorithm is presented in section “Tree search conflict resolution algorithm”. In section “Conflict detection and resolution analysis”, we study the conflict resolution problem and devise the midterm reward function. The multi-UAV conflict resolution problem is studied and the corresponding tree search midterm reward function is proposed in section “Multi-UAV conflict resolution”. The performance of our tree search algorithm is demonstrated by comparing it with existing algorithms in section “Simulation experiments”. Conclusions are presented in section “Conclusions”.

Related works

The literature that deals with the CDR problem is rapidly growing in volume.¹³ Various methods have been proposed. Trajectory optimization methods are centralized methods. They aim to find a series of maneuvers that guarantees conflict-free flight between UAVs with minimum overall consumption, such as the genetic algorithm-based method^14,15 and centralized mixed integer quadratic programming (MIQP).^16,17 However, these methods are time-consuming, even in their linearized forms.^18,19 Distributed algorithms have therefore been proposed. The fast reactive local motion planning method is applied in many scenarios. It is efficient in keeping aerial vehicles apart and is typically applied to respond to unexpected events or errors in the environment model.²⁰ However, it neglects minimizing fuel consumption and time delay when searching for conflict-free solutions. Some researchers incorporate the return to nominal paths into reactive methods, such as the navigation function-based method,²¹ the collision cone-based algorithm,²² the potential field theory-based algorithm,²³ and velocity obstacle collision avoidance algorithms.^24,25 Other distributed algorithms have also been proposed to deal with the airspace conflict resolution problem. Santosh Devasia et al. bring forth the concept of conflict-resolution procedures, which is a valuable idea in ATM.⁵ David Šišlák et al. propose a distributed agent-based cooperative collision avoidance algorithm.³ They suppose that each aircraft knows part of the future intentions of neighboring planes during the CDR process. Planes plan conflict-free trajectories cooperatively if a conflict is detected. In summary, planning algorithms and distributed algorithms have been researched for many years and perform well in many CDR problems.

To improve airspace efficiency, the safe separation distance between UAVs should be smaller. In addition, dynamic factors such as wind easily affect UAV states because UAVs are much lighter. The CDR problem becomes intractable when UAVs fly in a congested and dynamic environment. The development of model predictive control (MPC) has spurred its application to the multi-agent navigation problem. Martin Saska et al. propose to coordinate and control heterogeneous vehicles by means of MPC.²⁶ Safe separation is guaranteed in the trajectory following process. The MPC pattern is advantageous in dealing with navigation problems in dynamic environments.²⁷ Some MPC problems are solved by standard mathematical programming methods, while others are solved by tree search.^28,29 As the unmodeled dynamics are difficult to describe in an analytical form, we propose to approximate them by NNs.³⁰ In this article, the cooperative conflict resolution maneuvers are obtained by a heuristic tree search method.

Tree search conflict resolution algorithm

UAV model

The tree search algorithm is based on the model of UAVs. The point mass aircraft model described in the study by Menon et al.³¹ is commonly accepted to represent dynamical effects in civil aviation. However, because of its complexity, searching for safe separation maneuvers of several interfering aircraft with such a model would require too much computational effort. When UAVs fly at a constant altitude, a good approximation of the point mass model dynamics is the one used in the study by Bicchi and Pallotino.³² The state of A_i at time t is represented by the vector

S_{i} (t) = (x_{i} (t), y_{i} (t), z_{i} (t), θ_{i} (t), ϕ_{i} (t), v_{i} (t), w_{i} (t), μ_{i} (t))

where $θ_{i} (t)$ is the horizontal heading angle, $ϕ_{i} (t)$ is the inclination angle, and $v_{i} (t)$ is the horizontal velocity. The dynamics of the system, for $i \in N, t \in [0,T]$, are

\begin{array}{l} {\dot{s}}_{i} (t) = {[\begin{matrix} v_{i} (t) \cos θ_{i} (t) \cos ϕ_{i} (t) \\ v_{i} (t) \sin θ_{i} (t) \cos ϕ_{i} (t) \\ v_{i} (t) \sin ϕ_{i} (t) \end{matrix}]}^{T} \\ {\dot{θ}}_{i} (t) = w_{i} (t) \\ {\dot{ϕ}}_{i} (t) = μ_{i} (t) \end{array}

where the functions $t \mapsto w_{i} (t)$, $t \mapsto v_{i} (t)$, and $t \mapsto μ_{i} (t)$ are the control variables of the dynamic system. In this article, we assume that $v_{i} (t)$ stays constant during the maneuver process and that A_i flies at constant altitude while it performs its task ($ϕ_{i} (t) \approx 0$). Therefore, conflicts are resolved by heading control. $w_{i} (t)$ has the dimension of an angular speed and is called the yaw rate; it is bounded by

| w_{i} (t) | \leq \frac{v_{i}}{R_{min, i}}

where $R_{min, i}$ is the minimum turn radius of A_i. The minimum turn radius is

R_{min, i} = \frac{v_{i}^{2} \cos^{2} γ_{max}}{n g \sin | ϕ_{max} |}

where ϕ_max is the maximum roll angle, γ_max is the maximum pitch angle, and n and g are constants. $w_{i} (t)$ is held fixed in $[0, Δ T]$. The flight path of A_i in the time period $[0, Δ T]$ is therefore approximated as a circular arc segment when it takes a heading maneuver, as shown in Figure 1.

Figure 1.

UAV motion modification. UAV: unmanned aerial vehicle.
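As a rough numerical illustration of the yaw-rate bound, the sketch below evaluates the minimum turn radius and the resulting limit on $w_{i}$. The function names and the sample values (n = 1, g = 9.81 m/s², a 30° maximum roll angle, and zero maximum pitch angle) are assumptions made for illustration only; the article does not state them.

```python
import math

G_ACCEL = 9.81  # assumed value of the constant g, in m/s^2


def min_turn_radius(v, gamma_max, phi_max, n=1.0, g=G_ACCEL):
    """R_min = v^2 cos^2(gamma_max) / (n g sin|phi_max|), angles in radians."""
    return (v ** 2) * math.cos(gamma_max) ** 2 / (n * g * math.sin(abs(phi_max)))


def max_yaw_rate(v, r_min):
    """Upper bound on |w| implied by the minimum turn radius."""
    return v / r_min


def clamp_yaw_rate(w, v, r_min):
    """Project a requested yaw rate onto the admissible interval."""
    w_max = max_yaw_rate(v, r_min)
    return max(-w_max, min(w_max, w))


# Hypothetical UAV: 20 m/s, level flight, 30 degree maximum roll angle.
r_min = min_turn_radius(v=20.0, gamma_max=0.0, phi_max=math.radians(30.0))
w_max = max_yaw_rate(20.0, r_min)
```

With these assumed numbers the turn radius is roughly 82 m and the yaw-rate limit roughly 0.25 rad/s, so any preset heading option must stay within that limit over one interval ΔT.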

We study the CDR problem in two-dimensional (2-D) planar space. The safe region is a circle with safe radius r_i in 2-D space, and the state of A_i in 2-D space is $s_{i} = (x_{i}, y_{i}, v_{i}, θ_{i}, w_{i})$.

Approximate the effect of wind by NNs

The UAV model in the “UAV model” section is a simplified linear model. There are bounded high-order nonlinear effects in the UAV system. In addition, there are external disturbances from the environment, such as wind. The mismatch between the navigation design model and the actual plant inevitably leads to unmodeled dynamics. We denote the unmodeled dynamics by $h (x [n], u [n])$. There are two kinds of methods to approximate the unmodeled dynamics, namely the parametric form and the nonparametric form.³³ As the structure of the unmodeled dynamics in the airspace is not transparent, a parametric model is inappropriate. In this research, a nonparametric BPNN is adopted to approximate the unmodeled dynamics of the UAV model.

The BPNN is trained by learning the effect of unmodeled dynamics on UAVs. In this article, this effect is expressed by the error between the real state and the predicted state

e_{k +1}^{i} = s_{k + 1}^{i} - {\hat{s}}_{k + 1}^{i}

In the system, the main factor in the unmodeled dynamics is the wind effect. According to the dynamics of UAVs, the effect of wind on each UAV depends on the variables $[v_{i}, φ_{i}, θ_{i}, ϕ_{i}, p_{i}, q_{i}, r_{i}]$ of A_i, which are the speed, the attitude angles, and the attitude rates of A_i. In the NN, the training input data is³⁴

X_{i} = [v_{i}, φ_{i}, θ_{i}, ϕ_{i}, p_{i}, q_{i}, r_{i}, u_{i}]

where u_i is the heading maneuver. The state prediction error $e_{k + 1}^{i}$ is the training output of the designed BPNN. The unmodeled dynamics is approximated as below

{\tilde{e}}_{k +1}^{i} = f_{b} (X_{i})

Details of the BPNN can be found in the study by Haykin.³⁵ The real state of UAV A_i is estimated by the equation

{\tilde{s}}_{k + 1}^{i} = {\hat{s}}_{k + 1}^{i} + {\tilde{e}}_{k + 1}^{i}
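The correction step above can be sketched with a very small back propagation network. The code below is only an illustration under stated assumptions: the class `BPNN`, the helper `corrected_state`, the two-feature input, and the synthetic `wind_error` function are all hypothetical stand-ins, whereas the article uses the eight-component input $X_{i}$ and real prediction errors.

```python
import math
import random


class BPNN:
    """One-hidden-layer network trained by plain backpropagation (SGD)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target):
        h, y = self.forward(x)
        dy = [yi - ti for yi, ti in zip(y, target)]  # gradient of 0.5*||y - t||^2
        # Hidden-layer deltas are computed before W2 is modified.
        dh = [(1.0 - h[j] ** 2) * sum(dy[k] * self.W2[k][j] for k in range(len(dy)))
              for j in range(len(h))]
        for k, d in enumerate(dy):
            for j in range(len(h)):
                self.W2[k][j] -= self.lr * d * h[j]
            self.b2[k] -= self.lr * d
        for j, d in enumerate(dh):
            for i in range(len(x)):
                self.W1[j][i] -= self.lr * d * x[i]
            self.b1[j] -= self.lr * d


def corrected_state(s_hat, x, net):
    """Estimated real state: model prediction plus the learned error."""
    return [s + e for s, e in zip(s_hat, net.predict(x))]


def wind_error(x):
    """Synthetic prediction error (made up) for a normalized heading and maneuver."""
    heading, u = x
    return [0.3 * math.sin(math.pi * heading), 0.3 * math.cos(math.pi * heading) + 0.1 * u]


rng = random.Random(1)
data = [[rng.uniform(-1.0, 1.0), rng.choice([-1.0, 0.0, 1.0])] for _ in range(200)]
net = BPNN(n_in=2, n_hidden=8, n_out=2)


def mse(net, data):
    return sum(sum((p - t) ** 2 for p, t in zip(net.predict(x), wind_error(x)))
               for x in data) / len(data)


mse_before = mse(net, data)
for _ in range(100):
    for x in data:
        net.train_step(x, wind_error(x))
mse_after = mse(net, data)
```

In an online setting the same `train_step` would be called with each newly observed error, which is one way to read the statement that the weights are updated as the environment changes.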

Model-based tree search in conflict resolution

Primary concept

The tree search conflict resolution algorithm computes a motion series, which is a continuous function from the time variable into the state space, $[0, t_{max}] \to S_{i}$. The algorithm respects several constraints, namely the safe separation constraint and the dynamic constraints of UAVs.

The cooperative tree search motion planning algorithm plans tracks for multiple vehicles. For N vehicles, the composite configuration space is $S = S_{1} \times … \times S_{N}$. The conflict resolution task for N UAVs is formalized as follows: compute a continuous trajectory $[0, t_{max}] \to S = S_{1} \times … \times S_{N}$ which:

starts at the initial states of the vehicles,

avoids loss of separation among the N cooperative UAVs,

minimizes additional fuel consumption and reduces deviations from nominal paths,

respects the kinematic and dynamic constraints of the UAVs.

Several problems should be discussed when applying the tree search method in UAV CDR.

A. Action

In the tree search algorithm, actions are candidate safe separation maneuvers. In this article, conflicts are resolved by heading control. The control variable of A_i is the angular velocity w_i. The action in multi-UAV conflict resolution is $u = (w_{1}, w_{2}, …, w_{N})$, where N is the number of UAVs involved in the local conflict. To reduce the computation load, the action space is discretized: each UAV has several preset options for the heading maneuver, e.g. (−10°, 0°, 10°).
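A minimal sketch of how such a discrete joint action space could be enumerated is given below. It assumes that each preset option is a heading change applied over one interval ΔT and converts it to a yaw rate; that conversion, the function name `joint_actions`, and the 5 s interval are assumptions for illustration.

```python
import itertools
import math

HEADING_OPTIONS_DEG = (-10.0, 0.0, 10.0)  # preset heading options from the text


def joint_actions(n_uavs, dt, options_deg=HEADING_OPTIONS_DEG):
    """Enumerate joint actions u = (w_1, ..., w_N) as yaw rates in rad/s."""
    rates = [math.radians(d) / dt for d in options_deg]
    return list(itertools.product(rates, repeat=n_uavs))


actions = joint_actions(n_uavs=2, dt=5.0)
```

The branching factor grows as 3^N for three options per UAV (9 joint actions for two UAVs, 27 for three), which is one reason the expansion budget discussed next matters.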

B. Budget of expansion

The tree search algorithm should be given a budget of expansion for the following reasons. First, computation time and memory space are limited, so it is impractical to expand the tree all the way to the goal state. Second, the NN cannot give a perfectly precise estimate of the effect of unmodeled dynamics, so the prediction errors accumulate along the expanded nodes.

C. Midterm rewards

A reward function maps perceived states (or state–action pairs) of the environment to a single dimensionless number. A reward value indicates the intrinsic desirability of the state. The sole objective of the tree search algorithm is to maximize the total reward that it receives in the long run.

As the budget of expansion is limited, the tree search algorithm may not be able to expand to the goal states in the search process. Therefore, this article proposes to evaluate nodes by midterm rewards, and the algorithm expands nodes based on these midterm rewards.

The reward function $v (\cdot; ϑ) : T_{node} \to [0, 1]$ is defined to map the state of each leaf node to a scalar value in the range $[0, 1]$, where $ϑ \in R^{d}$ is a parameter vector. The parameters encode important aspects, such as returning to the nominal paths and safe separation. Following the studies by Omer et al. and Jung et al.,^19,36 we take the parameterized node scoring function to be a weighted sum of features extracted from the information encoded in the path along the current nodes and the nodes in question. We consider two features. The first feature corresponds to the relative relationship between UAVs. The second feature reflects the distance to the defined goal states in the nominal paths. The heuristic midterm reward function is discussed in the following part.

Tree search algorithm

The state of UAV A_i is estimated by equation (7). As discussed above, the midterm reward of each node is obtained by the reward function. Consider a terminal node $n_{l, h} \in T_{leaf}$, where h denotes the layer of this node counted from the root node and l denotes the index of this node within layer h. We define the value of node $n_{l, h}$ as $V (n_{l, h})$, which is the sum of two parts. The first part is the discounted sum of the midterm rewards obtained along the path from the root node to $n_{l, h}$. The second part is an upper bound on the accumulated rewards not yet observed³⁶

\forall n_{l, h} \in T_{node} : V (n_{l, h}) := \sum_{t = 1}^{h} v (n_{i p, t}) γ^{t - 1} + \sum_{t = h + 1}^{\infty} \bar{v} γ^{t - 1} = R_{l, h} + \frac{γ^{h} \bar{v}}{1 - γ}

where $\bar{v}$ is the upper bound of the reward of each unexpanded node. Since the reward value of each node is defined in the range [0,1], $\bar{v} = 1$. After a node $n_{l^{'}, k}$ is expanded, the values of the nodes in the tree are updated by backing up, using max-backups as in value iteration. For each expanded (nonterminal) node $n_{l, h}$ with a descendant leaf $n_{l^{'}, k} \in T_{leaf}$, we define the value of $n_{l, h}$ as

V (n_{l, h}) = \sum_{t = 1}^{h} v (n_{i p, t}) γ^{t - 1} + \sum_{t = h + 1}^{k} \max v (n_{i p, t}) γ^{t - 1} + \frac{γ^{k} \bar{v}}{1 - γ}

Equation (9) states that $V (n_{l, h})$ is the sum of three parts. The first part is the discounted sum of midterm rewards from the root node to $n_{l, h}$. The second part is the maximum discounted sum of midterm rewards over the expanded nodes from $n_{l, h}$ to $n_{l^{'}, k}$. The third part is the upper bound on the accumulated rewards not yet observed.

We then consider how to expand the tree so that a near-optimal decision is reached within the allowed computational budget. We apply the best-first tree search method, which develops trees non-uniformly: it expands nodes that look ‘promising’ to a greater depth. To do this, the algorithm chooses to expand the node with the highest value $V (n_{l, h})$. The algorithm records the estimated values of all expanded nodes. After the update in step k is completed, the algorithm chooses the node with the maximum value among the unexpanded nodes and begins search step k + 1. The search process stops when the budget of expansion is reached. The maneuver series that leads to the highest value is chosen as the conflict resolution maneuvers.
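The expansion rule can be sketched as a best-first search over a priority queue. The code below is a simplified illustration, not the article's implementation: it scores each leaf by its discounted reward sum plus the optimistic tail $γ^{h} \bar{v} / (1 - γ)$, uses a hypothetical one-dimensional toy problem in place of the UAV dynamics, and the names `best_first_plan`, `step`, and `reward` are assumptions.

```python
import heapq
import itertools


def best_first_plan(root, actions, step, reward, gamma=0.9, budget=30, v_bar=1.0):
    """Expand the leaf with the highest optimistic value until the budget is used.

    Returns the action sequence of the best remaining leaf and its value.
    """
    tie = itertools.count()  # tie-breaker so states are never compared
    # Heap entries: (-V, tie, R, depth, state, action_sequence)
    heap = [(-(v_bar / (1.0 - gamma)), next(tie), 0.0, 0, root, ())]
    for _ in range(budget):
        _, _, acc, depth, state, seq = heapq.heappop(heap)
        for a in actions:
            nxt = step(state, a)
            # A child at depth h = depth + 1 is discounted by gamma^(h - 1).
            acc_child = acc + (gamma ** depth) * reward(nxt, a)
            v_child = acc_child + (gamma ** (depth + 1)) * v_bar / (1.0 - gamma)
            heapq.heappush(heap, (-v_child, next(tie), acc_child, depth + 1, nxt, seq + (a,)))
    best = heap[0]
    return best[5], -best[0]


# Toy problem (assumed): move an integer toward 3; the reward lies in [0, 1].
def step(state, action):
    return state + action


def reward(state, action):
    return max(0.0, 1.0 - abs(state - 3) / 10.0)


plan, value = best_first_plan(root=0, actions=(-1, 0, 1), step=step, reward=reward)
```

In the toy run the search moves straight to the rewarding state and then holds it, so almost the whole budget is spent on a single deep branch rather than on a uniform tree.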

Fiorini et al. show theoretically that non-uniform trees developed by the score will never perform worse than uniform trees for the same budget of expansion.³⁷ The performance of non-uniform tree search method is problem specific. In conflict resolution problem, the midterm reward value denotes safe separation relationship between UAVs and the effect of maneuvers on returning to goal states. The efficiency of the tree search algorithm is largely determined by the structure of the midterm reward function.

CDR analysis

UAV CDR problem analysis

The safe region of UAV A_i is defined as $D_{i} (r)$

D_{i} (r) = \{ p \mid \| (p_{x} - x_{i}, p_{y} - y_{i}) \| < r_{i}, | p_{z} - z_{i} | < s_{i}^{z} \}

where $(x_{i}, y_{i}, z_{i})$ is the position of A_i, r_i is the horizontal safe radius, and $s_{i}^{z}$ is the minimum altitude separation distance. $r_{i}$ and $s_{i}^{z}$ are determined by the dynamic characteristics of A_i, such as the platform size, maneuverability, and interoperability level.

In the future, UAVs would be connected with ATM by communication devices, such as the Automatic Dependent Surveillance-Broadcast system. The local ATM system is able to obtain the real-time states of UAVs. We define the alert distance as ${\hat{r}}_{a}$. The ATM system keeps UAVs under strict surveillance if the distance between them is less than ${\hat{r}}_{a}$. The tree search algorithm is applied to find conflict resolution maneuvers when UAVs are predicted to lose separation. In this article, we study resolving airspace conflicts by horizontal heading maneuvers. Supposing that the angular speed of A_i is $w_{i}$ in $[t_{k}, t_{k + 1}]$ and A_i flies in the 2-D planar space, the motion differential is

\begin{array}{l} Δ x_{i} = \cos (θ_{i} - w_{i} τ) v_{i} d τ \\ Δ y_{i} = \sin (θ_{i} - w_{i} τ) v_{i} d τ \end{array}

Supposing that the conflict resolution process lasts M periods, the transition points are obtained by integrating the motion differential over one period, which gives the following equations, $k \in [0, M - 1]$.

x_{i} (k + 1) = \begin{cases} x_{i} (k) - \frac{v_{i}}{w_{i} (k + 1)} (\sin (θ_{i} (k) - w_{i} (k + 1) \cdot Δ T) - \sin (θ_{i} (k))) & \text{if } w_{i} (k + 1) \neq 0 \\ x_{i} (k) + v_{i} \cos (θ_{i} (k)) \cdot Δ T & \text{if } w_{i} (k + 1) = 0 \end{cases}

y_{i} (k + 1) = \begin{cases} y_{i} (k) + \frac{v_{i}}{w_{i} (k + 1)} (\cos (θ_{i} (k) - w_{i} (k + 1) \cdot Δ T) - \cos (θ_{i} (k))) & \text{if } w_{i} (k + 1) \neq 0 \\ y_{i} (k) + v_{i} \sin (θ_{i} (k)) \cdot Δ T & \text{if } w_{i} (k + 1) = 0 \end{cases}

θ_{i} (k + 1) = θ_{i} (k) - w_{i} (k + 1) \cdot Δ T, k \in \{0, …, M - 1\}, i \in \{1, …, N\}
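A hedged sketch of one transition step is shown below. It assumes that the heading evolves as θ − wτ inside the interval, which is the sign used in the motion differential and in the position updates; the function name `step_state` and the numerical threshold for treating w as zero are illustrative choices.

```python
import math


def step_state(x, y, theta, v, w, dt):
    """Advance one UAV by one interval dt with constant speed v and yaw rate w.

    The heading is assumed to evolve as theta - w * tau, matching the
    sign convention of the motion differential.
    """
    if abs(w) > 1e-12:
        # Circular arc of radius v / |w|.
        x_new = x - (v / w) * (math.sin(theta - w * dt) - math.sin(theta))
        y_new = y + (v / w) * (math.cos(theta - w * dt) - math.cos(theta))
    else:
        # Straight segment.
        x_new = x + v * math.cos(theta) * dt
        y_new = y + v * math.sin(theta) * dt
    return x_new, y_new, theta - w * dt


straight = step_state(0.0, 0.0, 0.0, 10.0, 0.0, 2.0)
turning = step_state(0.0, 0.0, 0.0, 10.0, 0.1, 2.0)
```

A quick sanity check on the arc case: the chord between the start and end points should equal 2 (v/|w|) sin(|w| ΔT / 2), which is slightly shorter than the straight-line distance v ΔT.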

CDR midterm reward

Horizontal maneuver discussion

The relationship between A_i and A_j is studied in the local frame. We define $P_{i j}$ as

P_{i j} = (x_{i} (t) - x_{j} (t), y_{i} (t) - y_{j} (t))

There is potential conflict hazard between UAVs if the following condition is satisfied

\| (P_{i j}^{x} (t), P_{i j}^{y} (t)) \|_{2} < \tilde{r}_{i j}, t \in (0, τ Δ T) (\forall i, j \in \{1, …, N\} : i \neq j)

where $\tilde{r}_{i j} = \max \{r_{i}, r_{j}\}$ and τΔT is the look-ahead time interval. The management system devises conflict resolution maneuvers for UAVs if they are predicted to lose separation. In the local frame, the velocity of A_j relative to A_i is

\vec{v}_{j i} = \vec{v}_{j} - \vec{v}_{i} = (\dot{x}_{j} - \dot{x}_{i}, \dot{y}_{j} - \dot{y}_{i})

The differential unit position of A_j relative to A_i is

Δ \tilde{p} = ((\cos (θ_{j} - w_{j} τ) v_{j} - \cos (θ_{i} - w_{i} τ) v_{i}) d τ, (\sin (θ_{j} - w_{j} τ) v_{j} - \sin (θ_{i} - w_{i} τ) v_{i}) d τ)

The displacement of A_j relative to A_i in $[t_{k}, t_{k + 1}]$, obtained by integrating this differential over the interval (that is, the displacement of A_j minus the displacement of A_i), is

\tilde{x} = \begin{cases} - \frac{v_{j}}{w_{j}} (\sin (θ_{j} (k) - w_{j} (k + 1) \cdot Δ T) - \sin (θ_{j} (k))) + \frac{v_{i}}{w_{i}} (\sin (θ_{i} (k) - w_{i} (k + 1) \cdot Δ T) - \sin (θ_{i} (k))) & \text{if } w_{i} (k + 1) \neq 0, w_{j} (k + 1) \neq 0 \\ v_{j} \cos (θ_{j} (k)) \cdot Δ T + \frac{v_{i}}{w_{i}} (\sin (θ_{i} (k) - w_{i} (k + 1) \cdot Δ T) - \sin (θ_{i} (k))) & \text{if } w_{i} (k + 1) \neq 0, w_{j} (k + 1) = 0 \\ - \frac{v_{j}}{w_{j}} (\sin (θ_{j} (k) - w_{j} (k + 1) \cdot Δ T) - \sin (θ_{j} (k))) - v_{i} \cos (θ_{i} (k)) \cdot Δ T & \text{if } w_{i} (k + 1) = 0, w_{j} (k + 1) \neq 0 \\ v_{j} \cos (θ_{j} (k)) \cdot Δ T - v_{i} \cos (θ_{i} (k)) \cdot Δ T & \text{if } w_{i} (k + 1) = 0, w_{j} (k + 1) = 0 \end{cases}

\tilde{y} = \begin{cases} \frac{v_{j}}{w_{j}} (\cos (θ_{j} (k) - w_{j} (k + 1) \cdot Δ T) - \cos (θ_{j} (k))) - \frac{v_{i}}{w_{i}} (\cos (θ_{i} (k) - w_{i} (k + 1) \cdot Δ T) - \cos (θ_{i} (k))) & \text{if } w_{i} (k + 1) \neq 0, w_{j} (k + 1) \neq 0 \\ v_{j} \sin (θ_{j} (k)) \cdot Δ T - \frac{v_{i}}{w_{i}} (\cos (θ_{i} (k) - w_{i} (k + 1) \cdot Δ T) - \cos (θ_{i} (k))) & \text{if } w_{i} (k + 1) \neq 0, w_{j} (k + 1) = 0 \\ \frac{v_{j}}{w_{j}} (\cos (θ_{j} (k) - w_{j} (k + 1) \cdot Δ T) - \cos (θ_{j} (k))) - v_{i} \sin (θ_{i} (k)) \cdot Δ T & \text{if } w_{i} (k + 1) = 0, w_{j} (k + 1) \neq 0 \\ v_{j} \sin (θ_{j} (k)) \cdot Δ T - v_{i} \sin (θ_{i} (k)) \cdot Δ T & \text{if } w_{i} (k + 1) = 0, w_{j} (k + 1) = 0 \end{cases}

Conflict resolution midterm reward

One of the most important issues in the tree search algorithm is to design a reasonable midterm reward function. In the tree search algorithm, airspace conflicts are expected to be resolved by sequential (multistep) maneuvers, which reduces the deviations of UAVs from nominal paths.

To avoid the potential conflict between A_i and A_j, the cooperative conflict avoidance maneuver policy $(w_{i}, w_{j})$ should satisfy the following conditions:

(a) UAVs should keep separation from each other during each maneuver interval

t \in (t_{k}, t_{k} + Δ T), k \in [0, M - 1], i, j \in N

d_{t} (A_{i} A_{j}) = \sqrt{(\tilde{x} (t) - x_{i} (t))^{2} + (\tilde{y} (t) - y_{i} (t))^{2}} > \tilde{r}_{i j}

(b) UAVs should keep the safe separation after the heading maneuver.

To minimize the fuel cost, the distance between two UAVs may be kept close to ${\tilde{r}}_{i j}$. As the flight trajectories are not straight lines during the maneuver interval, two UAVs may lose separation during this period, as shown in Figure 2.

Figure 2.

The separation distance might be respected at t_k and t_{k+1} while not being respected inside the interval [t_k, t_{k+1}].

Constraint (a) means that two UAVs are required to keep the safe separation during the maneuver time period. It is assumed that the distance between UAVs at t_k is larger than the relative safe distance. To guarantee the safe separation of two UAVs in $(t_{k}, t_{k + 1})$, condition (18) should be satisfied. To reduce the computation cost of the safe separation check, the problem is recast as determining whether $G (t)$ has a root in $(t_{k}, t_{k + 1})$

G (t) = (\tilde{x} (t) - x_{i} (t))^{2} + (\tilde{y} (t) - y_{i} (t))^{2} - \tilde{r}_{i j}^{2}

$(w_{i}, w_{j})$ guarantees the safe separation between A_i and A_j in $(t_{k}, t_{k + 1})$ if $G (t)$ has no root in $(t_{k}, t_{k + 1})$. The Newton downhill method is applied to check whether there is a $t \in (t_{k}, t_{k + 1})$ that satisfies $G (t) = 0$. The algorithm is described in Figure 3.

Figure 3.

Newton downhill-based loss separation checking algorithm.

In Figure 3, ε is a sufficiently small positive constant and $L_{s}^{i j}$ is a Boolean variable that denotes whether two UAVs would lose separation. As $Δ T$ is limited, the function $G (t)$ is convex in $(t_{k}, t_{k + 1})$, so the Newton downhill method is efficient in checking whether $G (t)$ has a root in the time period $(t_{k}, t_{k + 1})$.
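Since the listing of Figure 3 is not reproduced in the text, the following is only one possible reading of a Newton downhill check, written under stated assumptions: $G$ is convex on the interval, $G (t_{k}) > 0$, the derivative is taken by central differences, and the names `loses_separation` and `make_G` as well as the quadratic test function are hypothetical.

```python
def loses_separation(G, t_k, t_k1, eps=1e-6, max_iter=50):
    """Damped Newton ("downhill") search for a root of G starting at t_k.

    Returns True (L_s = 1) if the iteration converges to a root inside
    [t_k, t_k1], and False otherwise.
    """
    h = 1e-6 * (t_k1 - t_k)  # step of the numerical derivative (assumption)
    t, g = t_k, G(t_k)
    if g <= 0.0:  # separation is already lost at t_k
        return True
    for _ in range(max_iter):
        if abs(g) < eps:
            return t_k <= t <= t_k1
        dg = (G(t + h) - G(t - h)) / (2.0 * h)
        if dg == 0.0:
            return False
        lam = 1.0
        while lam > 1e-6:
            t_new = t - lam * g / dg
            g_new = G(t_new)
            if abs(g_new) < abs(g):  # downhill condition
                break
            lam *= 0.5  # halve the step until |G| decreases
        else:
            return False  # no acceptable step: G stays away from zero
        t, g = t_new, g_new
    return False


def make_G(d0, closing_speed, lateral_offset, r_safe):
    """Toy G(t): two UAVs closing along one axis with a fixed lateral offset."""
    return lambda t: (d0 - closing_speed * t) ** 2 + lateral_offset ** 2 - r_safe ** 2


head_on = make_G(100.0, 20.0, 0.0, 30.0)      # G reaches zero at t = 3.5
offset_pass = make_G(100.0, 20.0, 50.0, 30.0)  # G never reaches zero
```

For the head-on toy case, the check reports a loss of separation for the interval (0, 5) but not for (0, 3), because the first root lies at t = 3.5. The laterally offset case reports no loss at all.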

Constraint (b) combines the objectives of safe separation and returning to nominal paths. The reward of each maneuver pair is approximated from the states of UAVs. For example, suppose A_i and A_j take the heading maneuvers $(w_{i} (k + 1) Δ T / 2, w_{j} (k + 1) Δ T / 2)$ in the time period $(t_{k}, t_{k + 1})$, respectively, as shown in Figure 4. A_j would reach $E (x^{'}, y^{'})$ at $t_{k + 1}$. The new heading of A_j in the local frame is:

θ = \arctan ({\tilde{v}}_{j y} (k + 1) / {\tilde{v}}_{j x} (k + 1))

Figure 4.

Action prediction and value evaluation.

Let $d_{min}^{i j}$ denote the minimum distance between A_i and A_j, and let t_r denote the estimated time until the two UAVs would lose separation, as shown in Figure 4.

t_{r} = {‖ E (x', y') - P_{i} ‖}_{2} / {‖ {\tilde{v}}_{j} ‖}_{2}

From the perspective of safety, larger $d_{min}^{i j}$ and $t_{r}$ guarantee a safer situation. From the perspective of efficiency, however, larger deviations from the nominal paths lead to more fuel consumption and may have a more serious influence on other air traffic. As the ultimate objective of conflict resolution is to return to the nominal paths and to minimize the influence on air traffic (impacts on other aerial vehicles), we propose to find a balance between safety and efficiency. To this end, we design the midterm reward function $v_{i j} (x (k), u (k))$.

We propose that $v_{i j} (x (k), u (k))$ is composed of the punishment for conflict danger $v_{conflict}^{i j} (x (k), u (k))$ and the reward for returning to temporary goals $v_{Δ g}^{h} (x (k), u (k))$. We first express the safe separation requirement in $v_{i j} (x (k), u (k))$. We extract several variables from the states of the two UAVs. We define θ′ to be the degree of space conflict between two UAVs, as shown in Figure 4. t_t is the time required to escape the danger of losing separation,³⁶ $t_{t} = θ' / (2 \times w_{max})$. We define a maneuvering room variable t_c

t_{c} = \frac{t_{r}}{t_{t}}

The value of t_c is high if $d_{min}^{i j}$ is large and θ′ is small. The value of t_c decreases if two UAVs are approaching each other and θ′ is becoming larger. We define the conflict avoidance punishment as a function of t_c. Since the safe separation punishment should decrease as the maneuvering room t_c grows, we propose to combine an exponential function and an inverse proportional function. This guarantees that the punishment is high enough to avoid loss of separation when the maneuvering room is limited, and that the punishment does not decrease rapidly when the distance between UAVs becomes slightly larger.^38,39 The function is defined as

v_{conflict}^{i j} (x (k), u (k)) = \begin{cases} 0 & \text{if } d_{min}^{i j} \geq \tilde{r}_{i j} \\ e^{- t_{c} / n_{c}} + ς_{r} / (t_{c} / t_{σ} + 1) & \text{else} \end{cases}

where n_c, ς_r, and t_σ are constant coefficients.
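The shape of this punishment can be explored with a short sketch. The coefficient values below (n_c = 1, ς_r = 0.5, t_σ = 1) are placeholders chosen for illustration, since the text does not give them, and the function name is hypothetical.

```python
import math


def conflict_punishment(t_c, d_min, r_ij, n_c=1.0, sigma_r=0.5, t_sigma=1.0):
    """Punishment for conflict danger as a function of the maneuvering room t_c.

    n_c, sigma_r and t_sigma are assumed placeholder coefficients.
    """
    if d_min >= r_ij:
        return 0.0  # predicted minimum distance already respects separation
    return math.exp(-t_c / n_c) + sigma_r / (t_c / t_sigma + 1.0)


tight = conflict_punishment(t_c=0.5, d_min=10.0, r_ij=50.0)
loose = conflict_punishment(t_c=2.0, d_min=10.0, r_ij=50.0)
```

The exponential term dominates when t_c is small, while the inverse proportional term keeps a slowly decaying tail. With these placeholder coefficients the value at t_c = 0 is 1.5, so in practice the coefficients would have to be chosen so that the combined midterm reward stays in [0, 1].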

We then evaluate the effect of the heading maneuvers $w_{i} (k + 1)$ and $w_{j} (k + 1)$ on returning to the goal states of A_i and A_j, respectively. Supposing that the position of A_i at time point t_k is $P_{i}^{k}$, the next position is predicted as ${\tilde{P}}_{i, w (k)}^{k + 1}$ when its angular rate is $w (k)$ in the time period $(t_{k}, t_{k + 1})$. $T_{i}^{g}$ is the rough estimated time to reach the goal position $P_{i}^{goal}$ from $P_{i}^{k}$

T_{i, t_{k}}^{g} = \frac{‖ P_{i}^{k} - P_{i}^{goal} ‖}{‖ v_{i} ‖}

$Δ t_{g}^{i}$ is defined as

Δ t_{g}^{i} = T_{i, t_{k + 1}}^{g} - T_{i, t_{k}}^{g}

Function $v_{goal}^{i} (x_{i} (k), u_{i} (k))$ is defined to evaluate the effect of the angular speed $w_{i} (k)$ on returning to the goal position

v_{goal}^{i} (x_{i} (k), u_{i} (k)) = ξ (e^{| Δ t_{g}^{i} |} / α)^{σ}

where α and σ are constant coefficients and ξ is an indicator function

ξ = \begin{cases} 1 & \text{if } Δ t_{g}^{i} > 0 \\ - 1 & \text{if } Δ t_{g}^{i} < 0 \end{cases}

Function $v_{Δ g}^{h} (x (k), u (k))$ is defined to evaluate the effect of maneuver action u on reaching the goals of the two conflict-related UAVs

v_{Δ g}^{h} (x (k), u (k)) = (v_{goal}^{i} ρ_{i} + v_{goal}^{j} ρ_{j}) / (2 (ρ_{i} + ρ_{j}) v_{goal}^{max}) + 0.5

where $v_{goal}^{max} = ξ (e^{Δ T} / α)^{σ}$. As different UAVs have different priorities, ρ_i is defined as the priority weight of A_i. The value of $v_{Δ g}^{h}$ is guaranteed to lie in the range [0,1].

Therefore, the midterm reward function $v_{ij}(x(k), u(k))$ is

$$v_{ij}(x(k), u(k)) = \begin{cases} 0, & \text{if } L_{s}^{ij} = 1 \\ \dfrac{v_{\Delta g}^{h}(x(k), u(k)) - v_{conflict}^{ij}(x(k), u(k)) + 1}{2}, & \text{else} \end{cases}$$

$L_{s}^{ij} = 1$ means that $(w_{i}(k+1), w_{j}(k+1))$ would lead to loss of separation during the maneuver time period $(t_{k}, t_{k} + \Delta T)$. To prevent the tree search algorithm from expanding nodes that result in conflict, we set $v_{ij}(x(k), u(k)) = 0$ when $L_{s}^{ij} = 1$. $L_{s}^{ij} = 0$ when the two UAVs are predicted to keep the safe separation in $(t_{k}, t_{k+1})$. As stated above, the value of $v_{ij}(x(k), u(k))$ lies in the range [0,1].
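A minimal sketch of the pair-wise midterm reward follows. The function and argument names are hypothetical, and the result stays within [0,1] only if both input terms already lie in [0,1].

```python
def pair_midterm_reward(v_delta_g, v_conflict, loss_of_separation):
    """Midterm reward v_ij for one UAV pair.

    v_delta_g          : goal term of the pair
    v_conflict         : safe-separation punishment of the pair
    loss_of_separation : True when L_s^{ij} = 1 for the candidate maneuver
    """
    if loss_of_separation:
        # Nodes that lose separation get zero reward and are not expanded.
        return 0.0
    # Reward goal progress, penalize conflict risk, rescale by 1/2.
    return (v_delta_g - v_conflict + 1.0) / 2.0
```

The best case (goal term 1, punishment 0) gives 1, and the worst non-violating case (goal term 0, punishment 1) gives 0.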

Multi-UAV conflict resolution

We have discussed the dynamics of UAVs and proposed the midterm reward function for the pair-wise conflict resolution problem. In a multi-UAV environment, more than two UAVs may be involved in a local conflict.

The multi-UAV conflict is regarded as a series of coupled pair-wise conflicts, and the tree search algorithm considers all of these relevant pair-wise conflicts. In this article, we propose to analyze the multi-UAV conflict problem with a graph. As shown in Figure 5, the conflict relations of conflict-relevant UAVs are expressed by the constraint graph $G(t) = (V, E(t))$, where $V = \{1, \ldots, N\}$ is the vertex set and $E(t) = \{(j, i) \mid c_{ij} = 1\}$ is the corresponding edge set. In our research, the structure of the constraint graph is state dependent. The multi-UAV conflict matrix (CM) is derived from the adjacency matrix of G(t): $cm(i, j) = 1$ if $e_{ij}(t) \in E(t)$ and $cm(i, j) = 0$ otherwise, for $i, j \in V$.

Figure 5.

(a) A configuration with four robots. (b) The constraint graph generated based on the states.

UAVs that are involved in one specific conflict form a connected graph. Therefore, UAVs in the local airspace can be grouped into mutually disconnected sub clusters.
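The grouping into sub clusters can be sketched as a connected-components search over the conflict matrix. In this illustration the pair-wise conflict flag is replaced by a simple stand-in, namely that two UAVs are within an alert distance of each other; this criterion and all function names are assumptions, not the conflict test used in the article.

```python
import math
from itertools import combinations

def conflict_matrix(positions, alert_distance):
    """Symmetric 0/1 matrix; entry (i, j) is 1 when UAVs i and j are flagged.

    The distance test is a stand-in for the article's c_ij (assumption).
    """
    n = len(positions)
    cm = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(positions[i], positions[j]) <= alert_distance:
            cm[i][j] = cm[j][i] = 1
    return cm

def sub_clusters(cm):
    """Connected components of the constraint graph, via depth-first search."""
    n = len(cm)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in range(n):
                if cm[node][neighbor] and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters
```

For example, four UAVs at x = 0, 1000, 5000, and 5500 m with a 1500 m alert distance fall into two clusters, each of which could then be resolved by its own tree search.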

To deal with the conflict resolution problem for the UAVs in each sub cluster, we define the multi-UAV conflict resolution midterm reward. When the algorithm expands a leaf node $n_{ip,t}$ by action u, the midterm reward is the weighted summation of the midterm rewards of each pair-wise conflict. The midterm reward function is defined as:

$$v(n_{ip,t}) = \begin{cases} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} v_{ij}, & \text{if } L_{s} = 0 \\ 0, & \text{else} \end{cases}$$

$L_{s} = 1$ if the action set u leads to loss of separation between any pair of UAVs. The tree search algorithm does not expand a node $n_{ip,t}$ if $v(n_{ip,t}) = 0$.
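The cluster-level reward can be sketched as follows. The sketch assumes that the pair-wise rewards are stored in a symmetric matrix and that the diagonal terms, which the double sum formally includes but which are not defined for a single UAV, are skipped; the function name and data layout are hypothetical.

```python
def cluster_midterm_reward(pair_rewards, loss_flags):
    """Midterm reward of a tree node for one sub cluster.

    pair_rewards[i][j] : pair-wise reward v_ij (symmetric)
    loss_flags[i][j]   : truthy when the action set makes pair (i, j) lose separation
    """
    n = len(pair_rewards)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    if any(loss_flags[i][j] for i, j in pairs):
        # L_s = 1: zero reward, so the node is not expanded further.
        return 0.0
    # Each unordered pair appears twice in the double sum; 1/2 undoes that.
    return 0.5 * sum(pair_rewards[i][j] for i, j in pairs)
```

With three UAVs and pair-wise rewards 0.8, 0.6, and 0.4, the node reward is 1.8 as long as no pair loses separation.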

Simulation experiments

To demonstrate the tree search conflict resolution algorithm, we compare it with other algorithms. The experiments are based on the Matlab/Simulink platform. We suppose that Pioneer UAVs fly on their predefined courses. The simulated air traffic controller sends navigation commands to UAVs that are predicted to encounter a conflict. A navigation command includes the three-dimensional velocity and future positions. The autopilot on each UAV controls four channels based on the navigation commands, namely pitch, roll, yaw, and throttle, to steer the UAV platform.

Algorithm efficiency analysis

In the first simulation experiment, we devise a four-UAV airspace conflict scenario. The UAVs fly level at an altitude of 60 m with a speed of 60 m/s. We set the safe radius of the UAVs to 250 m for the following reasons: Pioneers are substantially smaller and lighter than commercial planes; the wake vortex effects of Pioneers are weaker than those of commercial planes; and Pioneers are permitted to take large-amplitude maneuvers to keep the safe separation because there are no human beings on board. To guarantee the safety of the UAV platforms, the maximum angular velocity is restricted to 0.2 rad/s. The alert distance ${\hat{r}}_{a}$ is 1500 m. The priorities of the UAVs are set to be equal.
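As a rough worked example of what these parameters imply, the tightest turn radius follows from the speed and the maximum angular velocity, assuming a level turn at constant speed (an assumption made here for illustration). The second expression estimates the time between reaching the alert distance and reaching a 250 m separation in a head-on encounter, treating 250 m as the pair-wise separation threshold; both the head-on geometry and that threshold are illustrative assumptions.

```latex
r_{turn}^{min} = \frac{\| v \|}{w_{max}} = \frac{60\ \text{m/s}}{0.2\ \text{rad/s}} = 300\ \text{m},
\qquad
t_{head\text{-}on} = \frac{\hat{r}_{a} - 250\ \text{m}}{2 \| v \|}
                   = \frac{1500\ \text{m} - 250\ \text{m}}{120\ \text{m/s}} \approx 10.4\ \text{s}
```

Under these assumptions the turn radius is slightly larger than the safe radius and only about ten seconds are available in the worst case, which is consistent with the need for a limited search length in online planning.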

In the experiment, wind is regarded as a dynamic disturbance. The wind causes UAVs to deviate from their predefined waypoints, which may create conflict dangers.

As shown in Figure 6, four UAVs are planned to pass the dark zone P in the sequence {U₃, U₄, U₁, U₂}. The minimum distance between UAVs is 252 m if they fly in accordance with their plans. However, the transverse wind changes the ground speed of the UAVs. As a result, the four UAVs would arrive at zone P almost simultaneously, which leads to a danger of loss of separation.
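To illustrate how wind can turn a safe plan into a conflict, the sketch below adds a wind vector to the air velocity and then computes the closest point of approach of two straight, constant-velocity tracks. This is a standard geometric calculation used here for illustration only; it is not the detection method of the article, and the function names and 2-D setting are assumptions.

```python
import math

def ground_velocity(airspeed, heading, wind):
    """Ground velocity = air velocity (from airspeed and heading in rad) + wind."""
    return (airspeed * math.cos(heading) + wind[0],
            airspeed * math.sin(heading) + wind[1])

def closest_approach(p_i, v_i, p_j, v_j):
    """Time and distance of closest approach for two constant-velocity tracks."""
    rx, ry = p_j[0] - p_i[0], p_j[1] - p_i[1]   # relative position
    vx, vy = v_j[0] - v_i[0], v_j[1] - v_i[1]   # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t_star = 0.0                             # same velocity: distance is constant
    else:
        # Minimize |r + v t| over t >= 0.
        t_star = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t_star, math.hypot(rx + vx * t_star, ry + vy * t_star)
```

For two UAVs flying toward each other at 60 m/s on parallel tracks 300 m apart and 3000 m from each other along track, the closest approach occurs after 25 s at a distance of 300 m. Re-running the calculation with wind-perturbed ground velocities shows how the predicted minimum distance changes.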

Figure 6.

Predefined flight plan of the four UAVs. UAV: unmanned aerial vehicle.

In this simulation experiment, we compare the tree search CDR algorithm with other algorithms. Because of the time constraint for online conflict resolution, and because the effect of unmodeled dynamics accumulates step by step, we do not compare our algorithm with trajectory planning algorithms. Instead, we compare it with reactive algorithms. Roelofsen et al. propose to avoid conflicts between UAVs with a navigation function-based method.²¹ In the rule-based algorithm, UAVs abide by the same collision avoidance rules to keep the safe separation.⁴⁰ In the tree search algorithm, the BPNN online learning method is applied to estimate the effect of unmodeled dynamics.

The conflict resolution results are shown in Figure 7. Figure 7(a) and (b) depict the conflict-free traces produced by the rule-based algorithm, Figure 7(c) and (d) those produced by the navigation function-based algorithm, and Figure 7(e) and (f) those produced by the cooperative tree search algorithm. In the rule-based algorithm, each UAV takes half of the responsibility in each pair-wise conflict, which leads to additional fuel consumption. In the navigation function-based algorithm, UAVs take large detours to avoid other UAVs. In our algorithm, conflicts are solved by the cooperative tree search method, and each UAV takes a different maneuver to avoid the conflict. Therefore, the UAVs do not incur unnecessary fuel consumption. The deviations from the nominal paths are shown in Figure 8. The navigation function-based method causes larger deviations from the nominal paths, the rule-based algorithm has a minor impact on the flight paths, and the tree search algorithm has the least influence on the flight paths.

Figure 7.

Conflict-free traces. (a), (c), and (e) depict 3-D traces; (b), (d), and (f) depict 2-D traces.

Figure 8.

Comparison of conflict resolution consumption. The deviation from nominal paths.

The distances between each pair of UAVs during the flight time from 40 s to 120 s are shown in Figure 9. The orange line is the minimum safe separation distance between UAVs. The result shows that the tree search algorithm maintains the safe separation between UAVs in this scenario.

Figure 9.

Distance between UAVs in CDR process. UAV: unmanned aerial vehicle; CDR: conflict detection and resolution.

CDR in complex scenario

In the second scenario, we demonstrate how our algorithm copes with conflict resolution in a realistic environment. A disaster has struck a city region, and the ground communication devices have been destroyed. To search for and rescue survivors, four communication-relay UAVs are assigned to fly above this region to restore communication, and one scout UAV is dispatched to search important spots. The velocities of these five UAVs are 56 m/s, 54 m/s, 48 m/s, 66 m/s, and 30 m/s. In this scenario, we set the safe radius of each UAV according to its speed: the safe radius of UAV A_i is $6 v_{i}$, where $v_{i}$ is the velocity of A_i as defined in equation (1). The flight altitude of these UAVs is about 60 m. Because there are tall buildings in this city region, the UAVs should keep a safe distance from these buildings to guarantee flight safety. The buildings are regarded as circular obstacles with a radius of 250 m. In this scenario, there are two obstacles, located at (2050 m, 6000 m) and (4100 m, 6500 m). The predefined traces of these UAVs and the obstacles are shown in Figure 10.
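The scenario parameters can be collected in a short sketch that computes the speed-dependent safe radii and checks a position against the two circular obstacles. The assignment of the listed speeds to UAVs 1 to 5 in order (with UAV 5 as the scout), the helper name, and the optional `margin` parameter are assumptions made for illustration.

```python
import math

# Speeds in m/s, assumed to be listed in the order UAV 1..5 (UAV 5 = scout).
speeds = {1: 56.0, 2: 54.0, 3: 48.0, 4: 66.0, 5: 30.0}

# Safe radius of each UAV: 6 * v_i, in meters.
safe_radius = {uav: 6.0 * v for uav, v in speeds.items()}

# Circular obstacles: (center in meters, radius in meters).
OBSTACLES = [((2050.0, 6000.0), 250.0), ((4100.0, 6500.0), 250.0)]

def clear_of_obstacles(position, margin=0.0):
    """True when the position lies outside every obstacle circle plus a margin."""
    return all(math.dist(position, center) > radius + margin
               for center, radius in OBSTACLES)
```

Under this ordering the safe radii are 336 m, 324 m, 288 m, 396 m, and 180 m, so the fastest relay UAV needs more than twice the clearance of the scout.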

Figure 10.

Predefined traces of UAVs in search and rescue scenario. UAV: unmanned aerial vehicle.

These UAVs may face loss of separation dangers because of unexpected disturbances. The communication-relay UAVs are permitted to take large detours during the flight, while the scout UAV should keep flying close to its preplanned trace. In the experiment, we set the priority of the scout UAV to 9 and the priorities of the communication-relay UAVs to 1. The algorithm takes the obstacles into consideration when devising the conflict-free policy. The conflict-free trajectories of these UAVs are shown in Figure 11.
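Assuming that these priorities enter the pair-wise goal term through the weights ρ of $v_{\Delta g}^{h}$ defined earlier, a conflict between the scout UAV and one relay UAV can be written out as follows. The superscripts s (scout) and r (relay) are labels introduced here for illustration.

```latex
v_{\Delta g}^{h}
  = \frac{9\, v_{goal}^{s} + 1 \cdot v_{goal}^{r}}{2\, (9 + 1)\, v_{goal}^{max}} + 0.5
  = \frac{0.9\, v_{goal}^{s} + 0.1\, v_{goal}^{r}}{2\, v_{goal}^{max}} + 0.5
```

Under this reading, the scout's goal term carries 90% of the weight in the pair-wise reward, which would explain why the search favors maneuvers that leave the scout close to its preplanned trace.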

Figure 11.

Conflict-free traces for search and rescue scenario.

As Figure 11 shows, these UAVs avoid collision with the obstacles. The trace of UAV 5 has not been largely modified during the conflict avoidance maneuver process. The communication-relay UAVs have taken most of the workload to avoid the conflict dangers. The distances between UAVs are shown in Figure 12.

Figure 12.

The distances between UAV pairs. UAV: unmanned aerial vehicle.

As the velocities of these UAVs are different, the sizes of their safe regions are not equal. As shown in Figure 12(a) to (d), the distance between each pair of UAVs is larger than ${\tilde{r}}_{ij}$.

Conclusions

In this article, we study the UAV conflict resolution problem in dynamic environments. First, we propose the tree search method to deal with the UAV conflict resolution problem, with BPNNs applied to approximate the unmodeled dynamics. Second, the midterm reward for a pair-wise conflict between UAVs is defined as the weighted sum of two factors, namely, the loss of separation punishment and the goal attraction. We then discuss the multi-UAV conflict resolution problem. Finally, we demonstrate the proposed algorithm through simulation experiments.

Because the communication conditions may be imperfect in some circumstances and the number of UAVs will increase, a distributed cooperative CDR method is required in these situations. We will study distributed cooperative CDR in our future work.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by Natural Science Foundation of China (number: 61403410), China Postdoctoral Science Foundation (number: 2014M552687) and Graduate Student Innovation Project of Hunan province (number: CX2014B013).

References

1. Consiglio, Chamberlain, Muñoz. Concept of integration for UAS Operations in the NAS. In: 28th International Congress of the Aeronautical Sciences, Brisbane, Australia, 23–28 September 2012, pp. 1–13.

2. Dalamagkidis, Valavanis, Piegl. On unmanned aircraft systems issues, challenges and operational restrictions preventing integration into the National Airspace System. Prog Aerosp Sci 2008; 44(7): 503–519.

3. Šišlák, Volf, Pechoucek. Agent-based cooperative decentralized airplane-collision avoidance. IEEE Trans Intell Transp Syst 2011; 41: 365–375.

4. Alonso-Ayuso. Conflict avoidance: 0-1 linear models for conflict detection & resolution. Top 2013; 21: 485–504.

5. Devasia, Iamratanakul, Chatterji. Decoupled conflict-resolution procedures for decentralized air traffic control. IEEE Trans Intell Transp Syst 2011; 12: 422–437.

6. Wang, Liu, Zhang. Adaptive fuzzy control for a class of stochastic pure-feedback nonlinear systems with unknown hysteresis. IEEE Trans Fuzzy Syst 2016; 24: 140–152.

7. El-Sousy FFM. Adaptive dynamic sliding-mode control system using recurrent RBFN for high-performance induction motor servo drive. IEEE Trans Ind Inform 2013; 9(4): 1922–1936.

8. Liu Y-J, Tang, Tong. Adaptive NN controller design for a class of nonlinear MIMO discrete-time systems. IEEE Trans Neural Netw Learn Syst 2015; 26: 1007–1018.

9. Zhao, Gao. Robust adaptive control for a class of uncertain non-affine nonlinear systems using affine-type neural networks. Int J Syst Sci 2016; 47(11): 2691–2699.

10. Liu Y-J, Tang, Tong. Adaptive NN controller design for a class of nonlinear MIMO discrete-time systems. IEEE Trans Neural Netw Learn Syst 2015; 26: 1007–1018.

11. Tang, Ang, Ariffin MKABM. Predicting the motion of a robot manipulator with unknown trajectories based on an artificial neural network. Int J Adv Robot Syst 2014; 11: 176–185.

12. Gillham, Howells. A dynamic localized adjustable force field method for real-time assistive non-holonomic mobile robotics. Int J Adv Robot Syst 2015; 12: 147–168.

13. Emami, Derakhshan. An overview on conflict detection and resolution methods in air traffic management using multi agent systems. In: Artificial Intelligence and Signal Processing, 2012 16th CSI International Symposium, Shiraz, Iran, 2–3 May 2012, pp. 293–298. New York: IEEE.

14. Conde, Alejo, Cobano. Conflict detection and resolution method for cooperating unmanned aerial vehicles. J Intell Robot Syst 2012; 65(1): 495–505.

15. Persiani, Bagassi. Route planner for unmanned aerial system insertion in civil non-segregated airspace. Proc I Mech E Part G: J Aerosp Eng 2012; 227: 687–702.

16. Mellinger, Kushleyev, Kumar. Mixed-integer quadratic program trajectory generation for heterogeneous quadrotor teams. In: 2012 IEEE International Conference on Robotics and Automation, St. Paul, Minnesota, 14–18 May 2012, pp. 477–483. New York: IEEE.

17. Xueqiang, Jing. Collision-free multiple unmanned combat aerial vehicles cooperative trajectory planning for time-critical missions using differential flatness approach. Defense Sci J 2014; 64(1): 13–20.

18. Omer. A space-discretized mixed-integer linear model for air-conflict resolution with speed and heading maneuvers. Comput Oper Res 2015; 58: 75–86.

19. Omer, Farges. Hybridization of nonlinear and mixed-integer linear programming for aircraft separation with trajectory recovery. IEEE Trans Intell Transp Syst 2013; 14(3): 1218–1230.

20. Yang, Alvarez, Bruggemann. A 3D collision avoidance strategy for UAVs in a non-cooperative environment. J Intell Robot Syst 2013; 70(1–4): 315–327.

21. Roelofsen, Martinoli, Gillet. Distributed deconfliction algorithm for unmanned aerial vehicles with limited range and field of view sensors. In: 2015 American Control Conference, Chicago, IL, 1–3 July 2015, pp. 4356–4361. New York: IEEE.

22. Chakravarthy, Ghose. Obstacle avoidance in a dynamic environment: a collision cone approach. IEEE Trans Syst Man Cybern A Syst Humans 1998; 28(1): 562–574.

23. Masoud. Decentralized self-organizing potential field-based control for individually motivated mobile agents in a cluttered environment: a vector-harmonic potential field approach. IEEE Trans Syst Man Cybern A Syst Humans 2007; 37(3): 372–390.

24. Fiorini, Shiller. Motion planning in dynamic environments using velocity obstacles. Int J Robot Res 1998; 17(7): 760–772.

25. Van Den Berg, Guy, Lin. Reciprocal n-body collision avoidance. In: Robotics research, vol. 70. Berlin, Heidelberg: Springer, 2011, pp. 3–19.

26. Saska, Vonásek, Krajník. Coordination and navigation of heterogeneous MAV–UGV formations localized by a ‘hawk-eye’-like approach under a model predictive control scheme. Int J Robot Res 2014; 33(10): 1393–1412.

27. Fukushima, Kon, Matsuno. Model predictive formation control using branch-and-bound compatible with collision avoidance problems. IEEE Trans Robot 2013; 29(5): 1308–1317.

28. Frese, Beyerer. Planning cooperative motions of cognitive automobiles using tree search algorithms. Lecture Notes in Computer Science 2010; 6359(6): 91–98.

29. Frese, Beyerer. A comparison of motion planning algorithms for cooperative collision avoidance of multiple cognitive automobiles. In: 2011 IEEE Intelligent Vehicles Symposium, Baden-Baden, Germany, 5–9 June 2011, pp. 1156–1162. New York: IEEE.

30. Pan, Wang. Model predictive control of unknown nonlinear dynamical systems based on recurrent neural networks. IEEE Trans Ind Electron 2012; 59(8): 3089–3101.

31. Menon, Sweriduk, Sridhar. Optimal strategies for free-flight air traffic conflict resolution. J Guid Control Dyn 1999; 22(2): 202–211.

32. Bicchi, Pallotino. Optimal cooperative conflict resolution for air traffic management systems. IEEE Trans Intell Transp Syst 2000; 1(4): 221–231.

33. Aswani, Bouffard, Tomlin. Extensions of learning-based model predictive control for real-time application to a quadrotor helicopter. In: American Control Conference, Montréal, Canada, 2012, pp. 4661–4666. New York: IEEE.

34. Liaw, Shirinzadeh, Smith. Robust neural network motion tracking control of piezoelectric actuation systems for micro/nano manipulation. IEEE Trans Neural Netw 2009; 20(2): 356–367.

35. Haykin. Neural networks and learning machines. 3rd ed. Upper Saddle River: Prentice Hall, 2009.

36. Jung, Wehenkel, Ernst. Optimized look-ahead tree policies: a bridge between look-ahead tree policies and direct policy search. Int J Adapt Control 2014; 28(3–5): 255–289.

37. Hren J-F, Munos. Optimistic planning of deterministic systems. In: Proceedings of European Workshop on Reinforcement Learning, Villeneuve d’Ascq, France, 30 June–3 July 2008, pp. 151–164. New York: Springer.

38. Stipanovic, Hokayem, Spong. Cooperative avoidance control for multi-agent systems. J Dynam Syst Meas Control 2007; 129(5): 699–707.

39. Lin W-S, Zheng C-H. Constrained adaptive optimal control using a reinforcement learning agent. Automatica 2012; 48: 2614–2619.

40. Alejo, Cobano, Heredia. Optimal reciprocal collision avoidance with mobile and static obstacles for multi-UAV systems. In: 2014 International Conference on Unmanned Aircraft Systems, Orlando, FL, 27–30 May 2014, pp. 1259–1266.

Decentralized cooperative unmanned aerial vehicles conflict resolution by neural network-based tree search method
