Abstract

Online path planning (OPP) for unmanned aerial vehicles (UAVs) is a basic issue of intelligent flight and is in essence a dynamic multi-objective optimization problem (DMOP). In this paper, an OPP framework is proposed in the sense of model predictive control (MPC) to continuously update the environmental information for the planner. To solve the DMOP involved in the MPC, we propose a dynamic multi-objective evolutionary algorithm based on linkage and prediction (LP-DMOEA). Within this algorithm, the historical Pareto sets are collected and analysed to enhance the performance. To intelligently select the best path from the output of the OPP, a Bayesian network and fuzzy logic are used to quantify the bias towards each optimization objective. The LP-DMOEA is validated on three benchmark problems characterized by different types of change in the decision and objective spaces. Moreover, the simulation results show that the LP-DMOEA outperforms the restart method for OPP. The decision-making method for solution selection can assess the situation in an adversarial environment and adapt the path planner accordingly.

Keywords

UAV, online path planning, dynamic multi-objective evolutionary algorithm, Bayesian networks, fuzzy logic

1. Introduction

The application of unmanned aerial vehicles (UAVs) for various military missions has received growing attention in the last decade. Apart from the obvious advantage of not placing human life at risk, the lack of a human pilot enables significant weight savings and lower costs. UAVs also provide an opportunity for new operational paradigms. To realize these advantages, UAVs must have a high level of autonomy and preferably work cooperatively in groups. In addition to some key elements involved in traditional aerial vehicles, such as control algorithms, localization and navigation, more emphasis has been put on the autonomic and intelligent abilities in uncertain or even adversarial environments.

Intelligent flight is the basic issue for UAVs to carry out any complicated mission. When the environment is static and known beforehand the flight path can be well designed offline [1, 2]. However, when the environment is changeable or there is no exact knowledge in advance about the environment, a UAV needs to intelligently plan its path online.

To this end, in this paper, we intend to solve the intelligent online path planning (OPP) problem for UAVs.

In general, path planning involves multiple optimization objectives, the two most common being maximal safety and minimal energy cost. In some of the literature, multi-objective optimization problems (MOPs) are transformed into single-objective ones by weighting each objective and summing them up [3, 4]. This requires knowing the decision-maker's bias towards each objective in advance, which is often difficult. Therefore, we treat the problem as a genuine MOP and optimize all the objectives simultaneously. Moreover, since no exact information about the environment is known beforehand and the environment may change during the mission, the objectives involved in path planning are time-variant. Accordingly, we need a dynamic multi-objective optimization method, that is, a method for dynamic MOPs (DMOPs), to deal with the problem at hand.

Evolutionary algorithms (EAs) are now an established research field at the intersection of artificial intelligence, computer science and operations research. Their problem-independent nature and global search capability have established their position in the optimization domain. However, most research in EAs has focused on static optimization problems. The main difficulty in using traditional EAs for dynamic optimization problems (DOPs) is that they eventually converge to an optimum and thereby lose the population diversity needed to explore the search space efficiently, which deprives them of the ability to adapt to changes in the environment. Over the past two decades, researchers have developed many methods that maintain diversity so that traditional EAs can continuously adapt to a changing environment. Most of these methods fall into the following four categories: (1) increasing the diversity after a change, such as the hyper-mutation method [5]; (2) maintaining the diversity throughout the run, such as the random immigrants scheme [6], sharing or crowding mechanisms [7], and the thermodynamical genetic algorithm [8]; (3) memory-based approaches [9-11]; and (4) multi-population approaches [12-15]. Comprehensive surveys on EAs applied to dynamic environments can be found in [16-20].

Recently, several works [21-24] have accelerated convergence by predicting the characteristics of future changes when the behaviour of the problem follows a certain trend. Inspired by these ideas, and partly based on our previous work in [25], we propose a multi-objective EA (MOEA) to solve the OPP problem, called a dynamic MOEA with Pareto set linking and prediction and denoted LP-DMOEA, which stores and analyses historical information to enhance its performance on DMOPs. Within LP-DMOEA, the historical Pareto solutions are first linked to construct several time series, and then a prediction method is employed to anticipate the Pareto set of the next problem. Benefiting from this anticipation, the initial population for the new problem can be heuristically generated to accelerate convergence in the new environment. It is noteworthy that obtaining the Pareto optimal solutions is not the end of solving an MOP: one solution must be selected from the Pareto set by the decision-maker (DM). For UAVs, there is no human DM to make this decision, so the UAV must make it intelligently without interacting with human beings. To this end, we employ a Bayesian network (BN) to model the inference process of a pilot assessing the danger level of an environment. In addition, fuzzy logic is used to quantify the decision bias according to the inference results.

The remaining part of this paper is organized as follows. Section 2 presents the formulation of the OPP and the details of LP-DMOEA; Section 3 presents the intelligent decision-making methods to choose a Pareto solution. Section 4 presents the experimental results and analysis. Conclusions are drawn in Section 5.

2. Online Path Planning in Terms of Dynamic Multi-Objective Optimization

From a practical point of view, offline global path planning is likely to be invalid when the environment is uncertain and time-variant. A UAV has to plan its path online and independently, using the local information detected by its onboard sensors. Pongpunwattana has proposed an OPP scheme in the sense of model predictive control (MPC) in [26]. As shown in Fig. 1, suppose the UAV has planned a path starting from point A at $t_{p}$ . Instead of executing the whole planning result, only a partial path (e.g. the path between A and B) is executed. While flying from A to B, the UAV plans a new path that starts from B. When it arrives at B at time $t_{p + 1}$ , this new path is used and, again, only part of it is executed. Hence, the OPP can be achieved by iterating the steps above, and the environmental information is continuously updated to adapt the planner to a changing environment. The path between A and B is the executing horizon, and the full paths planned at A or B are the planning horizon. The optimization problem in each time window can be time-variant. Hence, the OPP problem is indeed a DOP.

Figure 1.

Illustration of online path planning
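The receding-horizon scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the planner used in this paper: `plan_path` is a hypothetical placeholder that simply heads straight for the goal, and the names `receding_horizon_flight`, `executed_steps` and `tolerance` are introduced here for illustration only. In each time window the whole planning horizon is planned, but only the executing horizon is flown before replanning.

```python
import math

def plan_path(position, goal, horizon, step_length):
    """Hypothetical placeholder planner: `horizon` waypoints straight toward the goal.

    A real planner would optimize the objectives of the current time window.
    """
    x, y = position
    path = []
    for _ in range(horizon):
        remaining = math.hypot(goal[0] - x, goal[1] - y)
        if remaining > 0.0:
            step = min(step_length, remaining)
            x += step * (goal[0] - x) / remaining
            y += step * (goal[1] - y) / remaining
        path.append((x, y))
    return path

def receding_horizon_flight(start, goal, horizon=20, executed_steps=1,
                            step_length=0.2, tolerance=1.0, max_windows=1000):
    """Plan over the planning horizon, execute only the executing horizon, repeat."""
    position = start
    trajectory = [start]
    for _ in range(max_windows):
        if math.dist(position, goal) <= tolerance:
            break  # close enough to the goal: stop planning
        planned = plan_path(position, goal, horizon, step_length)
        # Only the first part of the plan is flown; the rest is discarded.
        for waypoint in planned[:executed_steps]:
            position = waypoint
            trajectory.append(position)
    return trajectory

trajectory = receding_horizon_flight((0.0, 0.0), (10.0, 0.0))
```

Replacing `plan_path` by a call to a dynamic multi-objective optimizer followed by solution selection would give the structure of the OPP loop discussed in this section.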

2.1. Formulation for the OPP Problem

In a 2-D case, we formulate the OPP problem as follows:

\begin{cases} \min f = \{f_{1} (u, t), f_{2} (u, t)\} \\ f_{1} (u, t) = \prod_{i = 1}^{n} p_{kill} \left( x (t) + \sum_{j = 1}^{i} g (u_{j}) \right) \\ f_{2} (u, t) = {‖ x (t) + \sum_{i = 1}^{n} g (u_{i}) - T ‖}_{2} \\ u_{i} \in u \quad (i = 1, \dots, n) \end{cases}

(1)

where $u$ is the sequence of control inputs of the UAV from $t$ to $t + Δ T_{s}$ $(Δ T_{s} = n \times Δ t)$ , $g (u_{i})$ denotes the displacement of the UAV caused by the control input $u_{i}$ , and $x (t)$ and $T$ are the position vectors of the UAV and the destination, respectively. $p_{kill} (x (t))$ is the probability of being destroyed at position $x (t)$ . The first optimization objective is therefore to minimize the probability of destruction when the UAV flies along the planned path. The second objective minimizes the distance between the UAV (at the end of a path segment) and its goal. For the convenience of simulation, we use a probabilistic threat exposure map [27] to model the battlefield. The probability of becoming disabled by the i-th threat is characterized by the multi-dimensional Gaussian law:

p_{k i l l}^{i} (t) = \frac{1}{2 π \sqrt{\det (K_{i})}} \exp [- \frac{1}{2} {(x (t) - μ_{i})}^{T} K_{i}^{- 1} (x (t) - μ_{i})]

(2)

where $μ_{i} = [μ_{x, i}, μ_{y, i}]$ and $K_{i} = \begin{bmatrix} σ_{x, i}^{2} & 0 \\ 0 & σ_{y, i}^{2} \end{bmatrix}$ . Here, $μ_{i}$ denotes the position of a threat and $K_{i}$ determines its acting range.
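As a hedged illustration, equation (2) with the diagonal covariance matrix $K_{i}$ given above can be evaluated as follows. The function name `kill_probability` and its argument layout are assumptions made for this sketch; the example threat uses the parameters of threat No. 1 in Table 2.

```python
import math

def kill_probability(position, mu, sigma):
    """Equation (2) for a diagonal covariance K = diag(sigma_x^2, sigma_y^2)."""
    dx = position[0] - mu[0]
    dy = position[1] - mu[1]
    det_k = (sigma[0] ** 2) * (sigma[1] ** 2)
    # Quadratic form (x - mu)^T K^{-1} (x - mu) for a diagonal K.
    quadratic = dx ** 2 / sigma[0] ** 2 + dy ** 2 / sigma[1] ** 2
    return math.exp(-0.5 * quadratic) / (2.0 * math.pi * math.sqrt(det_k))

# Threat No. 1 of Table 2: centre (5, 5) km, sigma_x = 1.4, sigma_y = 1.
at_centre = kill_probability((5.0, 5.0), (5.0, 5.0), (1.4, 1.0))
far_away = kill_probability((10.0, 5.0), (5.0, 5.0), (1.4, 1.0))
```

The value is largest at the threat centre and decays with distance, with $σ_{x}$ and $σ_{y}$ controlling how quickly it decays along each axis.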

2.2. Problem Solving Approach: The LP-DMOEA

To deal with the DMOP at hand, we propose the LP-DMOEA. The main idea of LP-DMOEA lies in heuristically generating the initial population for a new problem by making predictions from the historical information. Historical Pareto solutions are linked to construct several time series and then a prediction method is employed to anticipate the Pareto set of the next problem. Lastly, the initial population for the new problem can be heuristically generated to accelerate the convergence speed for the new problem.

As shown in Fig. 2, suppose the environment changes (i.e. a new problem arrives) at $t$ . The Pareto set $S_{p} (t)$ in the offspring $O (t)$ is then used to generate the next population $P (t + 1)$ . Three major steps are involved in this process. Firstly, LP-DMOEA selects some representative Pareto solutions ${S^{'}}_{p} (t)$ from $S_{p} (t)$ , considering both the computational complexity of prediction and the diversity of the Pareto front. Secondly, ${S^{'}}_{p} (t)$ and its historical counterparts are used to construct several time series $T S$ . Lastly, a prediction method is applied to $T S$ to predict the location of the new Pareto set, and the initial population is generated based on the prediction results.

Figure 2.

Flow chart of the LP-DMOEA

2.2.1. Selecting Pareto Solutions for Linking

As for selecting Pareto solutions for linking, there are two major ideas in the literature. The first one accomplishes this by considering the feature of each candidate in the objective space. In the work of Hatzakis and Wallace [21], the anchor points and closest-to-ideal point of a Pareto front in the objective space are chosen as the key feature points, and the corresponding solutions in the decision space are used to make the anticipation. This approach uses two or three key points to characterize the Pareto front. However, this approach will be invalid when the front is concave or very complex. As for the other idea, the candidates are selected directly in the decision space under a special principle. In the work of Zhou et al. [23], all the Pareto solutions obtained prior to an environmental change are used to anticipate. This approach is more direct because the factors involved in the time series construction are only in the context of the decision space. However, each Pareto solution will be linked to a time series, which may lead to a large number of time series and the computational complexity could be enormous.

In this paper, we follow the second idea. The difference from [23] is that we use a hyper-box-based selection (HBS) to construct time series from the Pareto sets. As shown in Fig. 3, suppose there are two decision variables and the range of each variable is divided into many sections by a preset parameter $ε_{1}$ or $ε_{2}$ . Thus, the whole decision space is divided into many hyper-boxes. Instead of the whole Pareto set, only part of the solutions are selected for the time series construction. The HBS allows a hyper-box to be occupied by only one Pareto solution of the Pareto set. If there are two or more solutions in a hyper-box, only the one closest to the bottom-left corner of the hyper-box is the winner. For example, $P_{1}$ and $P_{2}$ are in the same hyper-box; since $P_{2}$ is closer to the bottom-left corner of the hyper-box, it is the winner and is selected to construct the time series.

Figure 3.

Diagram of the HBS
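The HBS described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `hyper_box_selection`, the `lower_bounds` argument (the lower limit of each decision variable, from which the boxes are counted) and the example points are introduced here for illustration and do not come from the experiments.

```python
import math

def hyper_box_selection(pareto_set, lower_bounds, epsilons):
    """Keep, for each occupied hyper-box, the solution closest to its bottom-left corner."""
    winners = {}
    for solution in pareto_set:
        # Index of the hyper-box containing the solution along each decision variable.
        box = tuple(math.floor((x - lo) / eps)
                    for x, lo, eps in zip(solution, lower_bounds, epsilons))
        corner = tuple(lo + b * eps
                       for b, lo, eps in zip(box, lower_bounds, epsilons))
        distance = math.dist(solution, corner)
        if box not in winners or distance < winners[box][0]:
            winners[box] = (distance, solution)
    return [solution for _, solution in winners.values()]

# Illustrative points: P1 and P2 share a box, P3 is alone in another box.
P1, P2, P3 = (0.18, 0.17), (0.12, 0.11), (0.35, 0.05)
selected = hyper_box_selection([P1, P2, P3], lower_bounds=(0.0, 0.0), epsilons=(0.1, 0.1))
```

Larger values of $ε_{1}$ and $ε_{2}$ yield fewer hyper-boxes and hence fewer time series, which trades prediction detail for lower computational cost.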

2.2.2. Construction of Time Series

As for constructing time series, the key issue is how to identify the relationship between the solutions selected by HBS at consecutive times. In this paper, we use the minimal distance principle to identify this relationship. Each Pareto solution $x_{S} (i, t)$ in ${S^{'}}_{p} (t)$ is added to the end of a time series, so the number of time series in $T S$ equals the number of solutions $x_{S} (i, t)$ . Suppose $T S (i) \in T S$ is a previously constructed time series and $x_{T} (i)$ is its last element. If the Euclidean distance from $x_{S} (i, t)$ to $x_{T} (i)$ is the shortest, then $x_{S} (i, t)$ is appended to $T S (i)$ , i.e.

T S (i) = \arg \min_{x_{s} \in {S^{'}}_{p} (t), x_{T} \in T S (i)} {‖ x_{s} - x_{T} ‖}_{2}

(3)

Considering the limitation of memory and computation resource, we use a preset parameter K to control the maximal order number of a time series. This means there are at most K elements in a time series. If the length of a time series is shorter than K, then $x_{S} (t)$ will be added to the end of the time series directly. Otherwise, the elements in the time series will follow the first-in-first-out principle.
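A possible implementation of the linking step is sketched below. Some details are assumptions of this sketch rather than statements of the paper: each existing time series receives at most one new element per change, the match is made greedily in the order of the selected solutions, and a solution that finds no free series starts a new one. The first-in-first-out behaviour with maximal order K is obtained with `collections.deque(maxlen=K)`.

```python
import math
from collections import deque

def link_solutions(time_series, selected, max_order):
    """Append each selected solution to the free series whose last element is nearest."""
    free = set(range(len(time_series)))
    for solution in selected:
        if free:
            nearest = min(free, key=lambda i: math.dist(time_series[i][-1], solution))
            free.discard(nearest)
            # A full deque drops its oldest element automatically (FIFO).
            time_series[nearest].append(solution)
        else:
            time_series.append(deque([solution], maxlen=max_order))
    return time_series

K = 2  # illustrative maximal order
series = [deque([(0.0, 0.0)], maxlen=K), deque([(1.0, 1.0)], maxlen=K)]
link_solutions(series, [(0.9, 1.1), (0.1, 0.0)], K)
link_solutions(series, [(0.2, 0.0)], K)
```

After the second call the first series holds only its two most recent elements, because the oldest one has been pushed out.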

2.2.3. Prediction and Generation of the Initial Population

Many methods could be used to analyse the time series constructed above. In this paper, the following simple linear model is adopted:

{\tilde{x}}_{t + 1} = x_{t} + (x_{t} - x_{t - 1})

(4)

Considering the forecasting error caused by the inaccuracy of the forecasting model and the search algorithm, the prediction results should not be used directly to initialize the whole new population; diversity should be maintained to some degree. Here we maintain diversity in two ways:

1) Only partial initial population is generated based on the prediction results: A preset parameter α is used to control the rate of the individuals which will be initialized referring to the prediction and the rest of the individuals will be randomly generated.

2) Variation with noise: Similar to [23], we add a Gaussian noise λ to improve the chance of the initial population covering the true Pareto set. This noise is added to the predicted value of each decision variable. The standard deviation δ of the noise is estimated from the changes that occurred before:

δ^{2} = \frac{1}{4 n} {‖ x_{t} - x_{t - 1} ‖}_{2}^{2}

(5)
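Equations (4) and (5), together with the two diversity measures, can be combined into an initialization routine along the following lines. This is a sketch under assumptions: $n$ in equation (5) is taken to be the number of decision variables, predicted individuals are clipped to the variable bounds, the time series are used in round-robin order, and the names `predict_next`, `noise_std` and `initial_population` are hypothetical.

```python
import random

def predict_next(series):
    """Equation (4); falls back to the last element if only one is stored."""
    if len(series) < 2:
        return list(series[-1])
    return [2.0 * a - b for a, b in zip(series[-1], series[-2])]

def noise_std(series):
    """Square root of equation (5), with n assumed to be the number of variables."""
    if len(series) < 2:
        return 0.0
    n = len(series[-1])
    squared_norm = sum((a - b) ** 2 for a, b in zip(series[-1], series[-2]))
    return (squared_norm / (4.0 * n)) ** 0.5

def initial_population(time_series, pop_size, alpha, bounds, rng):
    """A fraction alpha comes from noisy predictions, the rest is random."""
    n_predicted = int(round(alpha * pop_size)) if time_series else 0
    population = []
    for k in range(n_predicted):
        series = time_series[k % len(time_series)]
        std = noise_std(series)
        population.append([min(max(rng.gauss(centre, std), lo), hi)
                           for centre, (lo, hi) in zip(predict_next(series), bounds)])
    while len(population) < pop_size:
        population.append([rng.uniform(lo, hi) for lo, hi in bounds])
    return population

history = [[(0.0, 0.0), (0.1, 0.2)]]
population = initial_population(history, pop_size=10, alpha=0.5,
                                bounds=[(-1.0, 1.0), (-1.0, 1.0)],
                                rng=random.Random(0))
```

With $α = 1.0$ every individual is seeded from a prediction, whereas $α = 0.5$ keeps half of the population random as a safeguard against poor forecasts.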

2.3. Chromosome Representation

The chromosome is the bridge between the optimization problem and the search space. For a path section, it is straightforward to code it as a series of consecutive line segments. However, this coding is ambiguous for the control system, which cannot obtain an explicit input signal from such a chromosome. In this study, we code a chromosome as a series of consecutive yaw-angle changes. In each time step (i.e. $Δ t$ ), the UAV flies at the corresponding yaw angle, from which the resulting path is obtained. Suppose the UAV cruises at a constant velocity and only the 2-D case is considered. As shown in Fig. 4, the UAV changes its yaw angle by $u_{1}$ at time $t$ , and the line segment from $x (t)$ to $x (t + Δ t)$ can be calculated geometrically using the kinematics of the UAV.

Figure 4.

Diagram of the chromosome representation
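Decoding such a chromosome into waypoints can be sketched as follows. The kinematic model is an assumption of this sketch: the yaw change is applied instantaneously at the start of each time step and the UAV then flies straight at constant speed for $Δ t$ . The function name `decode_chromosome` is hypothetical.

```python
import math

def decode_chromosome(position, heading, yaw_changes, speed, dt):
    """Turn a sequence of yaw-angle changes (radians) into a list of waypoints."""
    x, y = position
    path = []
    for change in yaw_changes:
        heading += change                      # apply the control input u_i
        x += speed * dt * math.cos(heading)    # straight segment of length V * dt
        y += speed * dt * math.sin(heading)
        path.append((x, y))
    return path

# 20 steps of 1 s at 200 m/s without turning cover 4000 m along the x axis.
straight = decode_chromosome((0.0, 0.0), 0.0, [0.0] * 20, speed=200.0, dt=1.0)
# A single 90 degree left turn followed by straight flight.
turned = decode_chromosome((0.0, 0.0), 0.0, [math.pi / 2, 0.0], speed=200.0, dt=1.0)
```

The displacement produced by each step plays the role of $g (u_{i})$ in equation (1), so the waypoints are the partial sums that enter both objectives.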

3. Decision-Making on the Selection of Executive Solution

Although a set of Pareto solutions can be dynamically obtained using the LP-DMOEA, the OPP problem is not solved until one feasible solution is selected for execution. In this section, we focus on how to select a solution according to the bias of the DM and how to make this decision intelligently.

3.1. Methodology Used to Select Solutions from the Pareto Set

In this paper we use the Weighted Stress Function Method (WSFM) proposed by Ferreira [28] to integrate the DM preferences after the search process has been completed. This is therefore an a posteriori method in which the search and the decision process are sequential. The main principle of WSFM is that the relative importance attributed to each objective induces a “stress” to search for solutions that maximize that objective. The best solution is thus the one for which the differences between the stresses associated with each objective are minimal. For a problem with M optimization objectives, the WSFM reduces the selection to a single-objective optimization problem:

x = \arg \min_{x \in S} (\sum_{1 \leq i < j \leq M} | γ_{i} (f_{i} (x)) - γ_{j} (f_{j} (x)) |)

(6)

where $x$ and $S$ denote the decision vector and the decision space, respectively, and $γ_{i} (f_{i} (x))$ denotes the stress associated with the i-th objective:

γ_{i} (f_{i}) = \begin{cases} \frac{w_{i}}{2} \tan \left( - \frac{π}{ψ (w_{i})} (f_{i} - w_{i}) \right) + ξ (w_{i}), & f_{i} \leq w_{i} \\ - \frac{ξ (w_{i})}{\tan \left( \frac{π}{φ (w_{i})} (w_{i} - 1) \right)} \tan \left( - \frac{π}{φ (w_{i})} (f_{i} - w_{i}) \right) + ξ (w_{i}), & f_{i} > w_{i} \end{cases}

(7)

where $w_{i}$ $(\sum_{i = 1}^{M} w_{i} = 1)$ is the weight assigned to the i-th objective by the DM, and the auxiliary functions $φ$ , $ψ$ and $ξ$ are defined as:

φ (w_{i}) = \frac{3}{4} {(1 - w_{i})}^{2} + 2 (1 - w_{i}) + δ_{1}

(8)

ψ (w_{i}) = φ (w_{i}) + 4 w_{i} - 2

(9)

ξ (w_{i}) = - \frac{1}{\tan (- \frac{π}{2 + 2 δ_{2}})} \tan (\frac{π}{1 + δ_{2}} (w_{i} - \frac{1}{2})) + 1

(10)

The parameters $δ_{1}$ and $δ_{2}$ are suggested in [28] to be set to 0.002 and 0.008, respectively.
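For illustration, equations (6) to (10) can be transcribed into Python as shown below. This is a direct transcription of the formulas as printed here, assuming normalized objectives $f_{i} \in [0, 1]$ and weights $w_{i} \in (0, 1)$ ; the function names and the candidate objective vectors are hypothetical.

```python
import math
from itertools import combinations

DELTA_1 = 0.002
DELTA_2 = 0.008

def phi(w):
    return 0.75 * (1.0 - w) ** 2 + 2.0 * (1.0 - w) + DELTA_1        # equation (8)

def psi(w):
    return phi(w) + 4.0 * w - 2.0                                   # equation (9)

def xi(w):
    scale = -1.0 / math.tan(-math.pi / (2.0 + 2.0 * DELTA_2))
    return scale * math.tan(math.pi / (1.0 + DELTA_2) * (w - 0.5)) + 1.0  # equation (10)

def stress(f, w):
    """Equation (7): stress of a normalized objective value f under weight w."""
    if f <= w:
        return 0.5 * w * math.tan(-math.pi / psi(w) * (f - w)) + xi(w)
    scale = -xi(w) / math.tan(math.pi / phi(w) * (w - 1.0))
    return scale * math.tan(-math.pi / phi(w) * (f - w)) + xi(w)

def select_solution(objective_vectors, weights):
    """Equation (6): index of the vector with the most balanced stresses."""
    def imbalance(f):
        stresses = [stress(fi, wi) for fi, wi in zip(f, weights)]
        return sum(abs(a - b) for a, b in combinations(stresses, 2))
    return min(range(len(objective_vectors)),
               key=lambda k: imbalance(objective_vectors[k]))

# Hypothetical normalized objective vectors (f1, f2) of three Pareto solutions.
candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
best = select_solution(candidates, weights=(0.5, 0.5))
```

With these formulas the stress equals $ξ (w_{i})$ at $f_{i} = w_{i}$ and vanishes at $f_{i} = 1$ , so an objective that is fully attained exerts no stress.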

Since the WSFM above is formulated for maximization problems and requires each objective to be normalized, we rewrite the original objective functions (1) as follows:

\begin{cases} \max f = \{f_{1} (u, t), f_{2} (u, t)\} \\ f_{1} (u, t) = 1 - \prod_{i = 1}^{n} p_{kill} \left( x (t) + \sum_{j = 1}^{i} g (u_{j}) \right) \\ f_{2} (u, t) = 1 - \frac{{‖ x (t) + \sum_{i = 1}^{n} g (u_{i}) - T ‖}_{2} - ({‖ x (t) - T ‖}_{2} - n V Δ T)}{2 n V Δ T} \\ u_{i} \in u \quad (i = 1, \dots, n) \end{cases}

(11)

3.2. Intelligent Situation Assessment via Bayesian Network (BN)

Now the problem turns to how to set $w_{i}$ intelligently. The weights $w_{1}$ and $w_{2}$ reflect the DM's bias towards safety (i.e. $f_{1}$ ) and path length (i.e. $f_{2}$ ), respectively. In this study, the BN is employed to assess the danger level of the battlefield.

3.2.1. Concept of the BN

The BN is a graphical representation of a joint probability distribution, representing dependence and conditional independence relationships. To our knowledge, it was first introduced by Kim and Pearl [29] and can be defined as follows.

Definition 1. $β = (G, θ)$ is a Bayesian network if $G = (X, E)$ is a directed acyclic graph whose set of nodes represents a set of random variables $X = \{X_{1}, \dots, X_{n}\}$ , and if $θ_{i} = [P (X_{i} | X_{P a (X_{i})})]$ is the matrix containing the conditional probabilities of node i given the state of its parents $P a (X_{i})$ .

The joint probability distribution can be written as:

P (X_{1}, X_{2}, \dots X_{n}) = \prod_{i = 1}^{n} P (X_{i} | X_{P a (X_{i})})

(12)
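The factorization in equation (12) can be illustrated with a toy network. Everything in the example is hypothetical: the two-node structure TS → RWR and all probability values are made up for this sketch and are neither the structure of Fig. 5 nor the CPTs of the Appendix.

```python
def joint_probability(assignment, parents, cpts):
    """Equation (12): product of P(X_i | Pa(X_i)) over all nodes."""
    probability = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[p] for p in parents[node])
        probability *= cpts[node][parent_values][value]
    return probability

# Hypothetical toy network: threat state TS -> radar warning receiver RWR.
parents = {"TS": (), "RWR": ("TS",)}
cpts = {
    "TS": {(): {"Trace": 0.2, "IA": 0.8}},                  # made-up prior
    "RWR": {("Trace",): {"A": 0.9, "IA": 0.1},              # made-up CPT
            ("IA",): {"A": 0.1, "IA": 0.9}},
}

# Joint probability of one full assignment.
p_trace_and_alarm = joint_probability({"TS": "Trace", "RWR": "A"}, parents, cpts)
# Posterior P(TS = Trace | RWR = A) by enumerating the hidden variable.
p_alarm = sum(joint_probability({"TS": ts, "RWR": "A"}, parents, cpts)
              for ts in ("Trace", "IA"))
posterior_trace = p_trace_and_alarm / p_alarm
```

Inference by enumeration, as in the last lines, is only practical for small networks, but it shows how sensor evidence changes the belief about a hidden state.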

3.2.2. A BN-Based Assessment Model of Environment

The construction of a BN, including its structure and parameters, is in effect the integration of the DM's knowledge. The resulting BN then makes intelligent inferences in place of the DM. In this work, the enemy air defence (i.e. the threats for UAVs) consists of two types of anti-air weapons: anti-air guns (AAGuns) and surface-to-air missiles (SAMs). The threat type is written as TT for short. A threat may work in one of the following states: no targets are found and the system is inactive (IA); the surveillance radar detects the targets (Surv); targets have been intercepted by radars (Intercept); targets are being traced by radars (Trace); and the weapon opens fire (Fire). There are five environmental danger levels (EDLs): very dangerous (VD), dangerous (D), medium (M), safe (S), and very safe (VS). If the working states of the threats (TS) were known, the EDL could easily be inferred.

However, the TS is difficult to know directly and a UAV has to infer such information referring to the local information collected by its onboard sensors. Suppose there are two major onboard sensors assembled on a UAV: the missile launching detector (MLD) and the radar warning receiver (RWR). For these sensors, there are two working states: active (A) and inactive (IA). When the RWR works in state A, the UAV has been detected or traced by enemy radars and the anti-air weapons will be launched imminently. The matter is worse when the MLD is working in state A which means the UAV is under attack.

In addition to the sensors' information, the distance between a threat and the UAV is also a key factor that impacts the EDL. Suppose there are five range (R) scales: R1 (0∼1 km), R2 (1∼2 km), R3 (2∼4 km), R4 (4∼6 km), and R5 (larger than 6 km). The BN structure is shown in Fig. 5 and the corresponding conditional probability tables (CPTs) are given in the Appendix.

Figure 5.

The BN structure of the environmental assessment model

3.3. Quantification of Environmental Assessment Results

Since the BN is a qualitative inference tool, we need to quantify the inference results (i.e. the EDLs) to obtain the weight values associated to each optimization objective. In this work, we use fuzzy logic [30] to set the value of $w_{1}$ and the fuzzy rules are given in Fig. 6. The triangular-shaped membership function (as shown in Fig. 7) and the centre of gravity defuzzification are adopted. After the quantification of $w_{1}$ , $w_{2}$ can easily be calculated since $w_{1} + w_{2} = 1$ .

Figure 6.

Fuzzy rules between EDL and $w_{1}$

Figure 7.

Triangular-shaped membership function. Suppose the value range of $w_{1}$ is [0.1, 1].
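A minimal sketch of the quantification step is given below. Since the fuzzy rules of Fig. 6 and the exact membership functions of Fig. 7 are not reproduced in the text, the sketch makes its own assumptions: each EDL is mapped to one triangular output set with evenly spaced peaks on [0.1, 1], each rule fires with strength $P (E D L)$ , the sets are clipped (min) and aggregated (max), and the centre of gravity is approximated on a uniform grid. All names and numbers are hypothetical.

```python
# Assumed peaks of the output fuzzy sets of w1, one per EDL (hypothetical).
EDL_PEAKS = {"VS": 0.1, "S": 0.325, "M": 0.55, "D": 0.775, "VD": 1.0}
HALF_WIDTH = 0.225  # assumed half-width of each triangle

def triangle(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def defuzzify_w1(edl_probabilities, samples=901):
    """Centre-of-gravity defuzzification of w1 on the range [0.1, 1]."""
    numerator = 0.0
    denominator = 0.0
    for k in range(samples):
        w = 0.1 + 0.9 * k / (samples - 1)
        # Clip each output set by its firing strength, then aggregate with max.
        membership = max(
            min(edl_probabilities[level],
                triangle(w, peak - HALF_WIDTH, peak, peak + HALF_WIDTH))
            for level, peak in EDL_PEAKS.items()
        )
        numerator += w * membership
        denominator += membership
    return numerator / denominator if denominator > 0.0 else 0.55

w1_uniform = defuzzify_w1({"VS": 0.2, "S": 0.2, "M": 0.2, "D": 0.2, "VD": 0.2})
w1_danger = defuzzify_w1({"VS": 0.0, "S": 0.0, "M": 0.0, "D": 0.0, "VD": 1.0})
w2_danger = 1.0 - w1_danger
```

Under these assumptions a more dangerous assessment shifts the centre of gravity towards 1, which raises the weight on safety and lowers the weight on path length.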

4. Experimental Results and Analysis

In this section, NSGA2 [31] is used as the basic MOEA. We apply the LP-DMOEA and random restart methods to NSGA2, and the resulting dynamic MOEAs are denoted LP-DNSGA2 and R-DNSGA2, respectively. We first validate the LP-DMOEA by comparing LP-DNSGA2 with R-DNSGA2 on benchmark problems. We then compare two OPP algorithms, OPP-A and OPP-B, which use LP-DNSGA2 and R-DNSGA2, respectively. Lastly, the intelligent OPP using LP-DNSGA2 and the intelligent decision-making methods is validated.

4.1. Performance Comparison on Benchmark Problems

In the following experiments and analysis the general algorithmic settings are given as follows: population size is set to 100, probability of simulated binary crossover is set to 0.9 and polynomial mutation rate is $1 / n$ (n is the length of a chromosome). Additionally, the maximal order of a time series is set to 10 (i.e. $K = 10$ ), $ε_{i}$ is set to 0.1 and all of the initial individuals are heuristically generated (i.e. $α = 1.0$ ).

4.1.1. Benchmark Problems

Three benchmark problems are used here. FDA1 and FDA3 were proposed by Farina and Deb [23], and ZJZ was proposed by Zhou, Jin and Zhang [16]. Their definitions are given below, where τ is the generation counter, $τ_{t}$ is the number of generations in each time window and $n_{t}$ controls the distance between two consecutive Pareto sets. In effect, $τ_{t}$ and $n_{t}$ determine the frequency of change and the severity of change, respectively.

FDA1: \begin{cases} f_{1} (x_{I}) = x_{1}, \quad f_{2} = g \cdot h, \quad h (f_{1}, g) = 1 - \sqrt{\frac{f_{1}}{g}} \\ g (x_{II}) = 1 + \sum_{x_{i} \in x_{II}} {(x_{i} - G (t))}^{2} \\ G (t) = \sin (0.5 π t), \quad t = \frac{1}{n_{t}} ⌊ \frac{τ}{τ_{t}} ⌋ \\ x_{I} = (x_{1}) \in [0, 1], \quad x_{II} = (x_{2}, \dots, x_{20}) \in [- 1, 1] \end{cases}

(13)

ZJZ: \begin{cases} f_{1} (x_{I}) = x_{1} \\ f_{2} (x_{II}) = 1 - {(f_{1} / g (x_{II}))}^{H (t)} \\ g (x_{II}) = 1 + \sum_{x_{i} \in x_{II}} {(x_{i} + G (t) - x_{1}^{H (t)})}^{2} \\ H (t) = 1.5 + G (t) \\ G (t) = \sin (0.5 π t), \quad t = \frac{1}{n_{t}} ⌊ \frac{τ}{τ_{t}} ⌋ \\ x_{I} = (x_{1}) \in [0, 1], \quad x_{II} = (x_{2}, \dots, x_{n}) \in [- 1, 2] \end{cases}

(14)

FDA3: \begin{cases} f_{1} (x_{I}) = \sum_{x_{i} \in x_{I}} x_{i}^{F (t)}, \quad f_{2} = g \cdot h \\ g (x_{II}) = 1 + G (t) + \sum_{x_{i} \in x_{II}} {(x_{i} - G (t))}^{2} \\ h (f_{1}, g) = 1 - \sqrt{\frac{f_{1}}{g}}, \quad G (t) = | \sin (0.5 π t) | \\ F (t) = 10^{2 \sin (0.5 π t)}, \quad t = \frac{1}{n_{t}} ⌊ \frac{τ}{τ_{t}} ⌋ \\ x_{I} = (x_{1}, \dots, x_{5}) \in [0, 1] \\ x_{II} = (x_{6}, \dots, x_{30}) \in [- 1, 1] \end{cases}

(15)
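As an example of how these definitions are evaluated, FDA1 in equation (13) can be coded as follows. The function name `fda1` and the parameter values in the example are chosen for illustration only; the example checks that a point with $x_{i} = G (t)$ for all $x_{i} \in x_{II}$ lies on the front $f_{2} = 1 - \sqrt{f_{1}}$ .

```python
import math

def fda1(x, tau, tau_t, n_t):
    """Evaluate FDA1 at generation tau; x[0] is x_I and x[1:] is x_II."""
    t = (tau // tau_t) / n_t            # time index t of equation (13)
    G = math.sin(0.5 * math.pi * t)
    f1 = x[0]
    g = 1.0 + sum((xi - G) ** 2 for xi in x[1:])
    h = 1.0 - math.sqrt(f1 / g)
    return f1, g * h

# Illustrative setting: tau_t = 10 generations per window, n_t = 10.
tau, tau_t, n_t = 30, 10, 10
G_now = math.sin(0.5 * math.pi * 0.3)   # t = floor(30 / 10) / 10 = 0.3
optimal_point = [0.25] + [G_now] * 19   # x_i = G(t) for every x_i in x_II
f1, f2 = fda1(optimal_point, tau, tau_t, n_t)
```

Because $G (t)$ only changes when $⌊ τ / τ_{t} ⌋$ changes, the problem stays fixed for $τ_{t}$ generations and then jumps to a new Pareto set.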

4.1.2. Performance Indicators

The generational distance (GD) performance indicator has been widely used to measure the convergence of a multi-objective evolutionary algorithm.

In this paper, the decision space GD (VD) is used and its definition is given below:

V D (P) = \frac{1}{| P^{*} |} \sum_{x \in P^{*}} {‖ x - y (x) ‖}_{2}

(16)

where $P^{*}$ is a reference Pareto front, P is an obtained Pareto set and $y (x) = \arg \min_{y \in P} {‖ x - y ‖}_{2}$ .

To fairly measure an algorithm, each algorithm is run $N_{E}$ times and we calculate the average VD metric of the population in each generation as below:

E_{f} (τ) = \frac{1}{N_{E}} \sum_{i = 1}^{N_{E}} V D (P (τ))

(17)
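Equations (16) and (17) translate into a few lines of Python. The function names `vd` and `mean_vd` and the small point sets are hypothetical; `mean_vd` averages the indicator over the populations obtained at the same generation in $N_{E}$ independent runs.

```python
import math

def vd(reference_set, obtained_set):
    """Equation (16): mean distance from each reference point to its nearest obtained point."""
    total = sum(min(math.dist(x, y) for y in obtained_set) for x in reference_set)
    return total / len(reference_set)

def mean_vd(reference_set, populations_per_run):
    """Equation (17): average of VD over independent runs at one generation."""
    return sum(vd(reference_set, p) for p in populations_per_run) / len(populations_per_run)

# Hypothetical sets in a two-dimensional decision space.
reference = [(0.0, 0.0), (1.0, 0.0)]
run_a = [(0.0, 0.0), (1.0, 1.0)]
run_b = [(0.0, 0.0), (1.0, 0.0)]
vd_a = vd(reference, run_a)
average = mean_vd(reference, [run_a, run_b])
```

A value of zero means that every reference point is matched exactly, so smaller values indicate better convergence and coverage in the decision space.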

4.1.3. Experimental Results on Benchmark Problems

The Pareto sets and Pareto fronts of the benchmark problems can be derived analytically and are given in Table 1, together with their characteristics.

The performance of LP-DNSGA2 and R-DNSGA2 over 50 independent runs is compared in Fig. 8. LP-DNSGA2 outperforms R-DNSGA2 on all benchmark problems: its population converges better and the obtained Pareto fronts are closer to the reference Pareto fronts. This means that our LP-DMOEA works effectively on dynamic multi-objective optimization problems.

Table 1.

The global Pareto solution sets of FDA1, ZJZ and FDA3 and their Pareto fronts

Problem	Pareto set	Change of Pareto set	Pareto front	Change and shape of Pareto front
FDA1	$\{x_{i} (t) = G (t), x_{i} (t) \in x_{II}\}$	Change	$f_{2} = 1 - \sqrt{f_{1}}$	No change; convex
ZJZ	$\{x_{1} \in [0, 1], x_{i} (t) \in x_{II} \mid x_{i} (t) = x_{1}^{H (t)} - G (t)\}$	Change	$f_{2} = 1 - f_{1}^{H (t)}$	Change; convex to non-convex
FDA3	$\{x_{1} \in [0, 1], x_{i} (t) = G (t), x_{i} (t) \in x_{II}\}$	No change	$f_{2} = (1 + G (t)) \times (1 - \sqrt{\frac{f_{1}}{1 + G (t)}})$	Change; convex

Figure 8.

Comparison of the mean and standard deviation of $E_{f}$ between LP-DNSGA2 and R-DNSGA2

Figure 9.

A set of simulation snapshots of online path planning for a UAV. There are two moving threats (e.g. enemy aircraft) with $σ_{x} = σ_{y} = 2$ flying at a constant speed of 150 m/s. The speed of the UAV is 200 m/s, $w_{1} = 0.7$ and $w_{2} = 0.3$ .

4.2. Results and Analysis on the Intelligent OPP Algorithm

The algorithmic parameters involved in the following experiments are given as follows:

Parameters involved in LP-DNSGA2: rate of the heuristically generated individuals is set to 50% (i.e. $α = 0.5$ ), the maximal order number of a time series is set to 5 (i.e. $K = 5$ ), both $ε_{1}$ and $ε_{2}$ are set to 0.1.

Parameters involved in OPP: the time step $Δ t$ is set to 1 s and the executing horizon is set to one time step (i.e. 1 s). Since the typical effective range of short-range anti-air guns is about 4 km (e.g. the Samavat 35 mm towed anti-aircraft twin-cannon system of Iran [32]), the number of time steps (planning horizon) of a sequential control input is set to 20 (i.e. the chromosome length $n = 20$ ): at the UAV speed of 200 m/s, 20 steps of 1 s cover 4 km.

4.2.1. Validation of LP-DMOEA-Based OPP Algorithm

To test the validity of the LP-DMOEA-based OPP algorithm (i.e. OPP-A), we consider a moving threat case. In this case, two enemy moving threats patrol in the mission field. A UAV should dynamically plan its flying path to avoid being detected. Fig. 9 shows the simulation snapshots. It can be seen that the UAV can successfully keep its stealth. In other words, in the 215 simulations, the UAV successfully keeps its survival probability equal to 1.0.

4.2.2. Comparison of Two OPP Algorithms

We compare OPP-A with OPP-B to show the advantage of the proposed dynamic MOEA over the random restart method. In addition, we would like to test the validity of WSFM. Therefore, in the following experiments, the intelligent decision-making method is not used. The two OPP algorithms are tested in an unknown environment (no information about the four threats is known in advance) with fixed bias weight values ( $w_{1} = 0.7$ and $w_{2} = 0.3$ , or $w_{1} = 0.3$ and $w_{2} = 0.7$ ).

There are four threats in the simulated battlefield whose parameters are given in Table 2. The simulation terminates when the distance between the goal and the UAV is within 1 km.

Table 2.

Parameters of the threats

No.	μ_x (km)	μ_y (km)	σ _x	σ _y	Type
1	5	5	1.4	1	SAM
2	5	−5	2	2	SAM
3	15	−11	1.4	1.4	AAGun
4	13	2	1	1	SAM

As shown in Fig. 10, the paths obtained by OPP-A (solid line) are more reasonable and smoother than those of OPP-B (dotted line). This can be explained by Fig. 11, where the yaw-angle curves are compared. The yaw-angle curve calculated by OPP-B fluctuates more violently, which is harmful for the control mechanism of an actual UAV.

Figure 10.

Comparison between OPP-A (solid line) and OPP-B (dashed line)

Figure 11.

The curves of yaw angle of OPP-A and OPP-B

The results for the survival probability and flight time are given in Table 3. OPP-A again outperforms OPP-B, because LP-DNSGA2 employed by OPP-A improves the dynamic optimization performance compared with R-DNSGA2. In addition, the paths obtained with the DM bias of $w_{1} = 0.7$ and $w_{2} = 0.3$ are more likely to keep away from the threats than those obtained with the bias of $w_{1} = 0.3$ and $w_{2} = 0.7$ . This shows that the WSFM can effectively integrate the DM bias into the automatic planner.

Table 3.

Simulation results

Algorithm and weights	Survival probability	Flight time (s)
OPP-A, w₁=0.7, w₂=0.3	0.987	72
OPP-B, w₁=0.7, w₂=0.3	0.979	77
OPP-A, w₁=0.3, w₂=0.7	0.821	69
OPP-B, w₁=0.3, w₂=0.7	0.808	72

4.2.3. Validation of Intelligent Behaviour Against A Pop-up Threat

In the following experiments, we will show the validity of the intelligent OPP which is the combination of OPP-A and the intelligent environment assessment. Here, we assume that Threat-4 does not appear or work until the UAV flies for 50 seconds. In such a case, the UAV should assess the environment and react to the pop-up threat intelligently.

As shown in Fig. 12, in the first 50 seconds the UAV successfully evades the threats and would fly in a straight trajectory toward its goal if the hostile environment did not change. Then a pop-up threat (No. 4) suddenly appears at 50 s, and the UAV reacts to this change by flying away from the pop-up threat as quickly as possible. At that moment, the intelligent environmental assessment increases the probability $P (E D L = V D)$ accordingly, and the weight $w_{1}$ associated with safety is set to a higher value. As seen from Fig. 13, the probability $P (E D L = V D)$ rises at 50 s and goes down again (at about 58 s) when the UAV has flown away from the pop-up threat. Accordingly, as shown in Fig. 14, the value of $w_{1}$ is set to 0.9 from 50 s to 58 s and goes down when the environment seems safe again.

Figure 12.

Path planned by intelligent OPP algorithm proposed in this paper

Figure 13.

The probabilities of each EDL versus time

Figure 14.

The value of $w_{1}$ versus time

5. Conclusion

Usually the mission environment for a UAV is unknown and may change arbitrarily. The intelligent flight is a key technology for a UAV to react to the changing environment. The major contribution of this work is to solve the intelligent OPP problem, which is a basic issue for intelligent flight, by integrating dynamic MOEA, BN and fuzzy logic.

Because a UAV sometimes has to collect information through its onboard sensors, an MPC-like OPP method is employed to continuously update the local environmental information for the planner; the resulting planning problem is a DMOP. To solve it, we have proposed the LP-DMOEA, whose main idea is to use historical information to enhance performance. The historical Pareto sets are collected to construct several time series, and the prediction of those time series guides the search for the new problem. The WSFM has been introduced to select the best solution according to the bias of the DM. To make use of such an a posteriori method, the BN models the environmental assessment that a pilot would perform, and fuzzy logic quantifies the assessment results to obtain the weight associated with each optimization objective. We have used the well-known NSGA-II as the basic MOEA and simulated a simple military scenario. The experimental results show that the LP-DMOEA works more effectively for the OPP than the restart method, owing to the heuristic initialization of the population for the new problem. In addition, the intelligent method for solution selection can automatically assess the changing environment and adapt the path planner.
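
The idea of seeding the new problem from predicted Pareto solutions can be sketched as below. This is a minimal illustration under stated assumptions, not the LP-DMOEA itself: the linking step is taken as already done (each inner list is one linked time series of decision vectors), linear extrapolation stands in for the predictor, and Gaussian perturbation stands in for whatever diversity mechanism the algorithm uses. The names `predict_next`, `initial_population` and `sigma` are hypothetical.

```python
import random
from typing import List, Optional

Vector = List[float]


def predict_next(series: List[Vector]) -> Vector:
    """Extrapolate one linked time series of Pareto solutions one step ahead.

    Linear extrapolation is an assumed stand-in for the paper's predictor.
    """
    if len(series) == 1:
        return list(series[-1])  # no trend available yet
    last, prev = series[-1], series[-2]
    return [2.0 * a - b for a, b in zip(last, prev)]


def initial_population(all_series: List[List[Vector]], size: int,
                       sigma: float = 0.05,
                       rng: Optional[random.Random] = None) -> List[Vector]:
    """Seed the new problem with noisy copies of the predicted solutions."""
    rng = rng or random.Random()
    predictions = [predict_next(s) for s in all_series]
    population = []
    for i in range(size):
        # Cycle through the predictions so each one seeds some individuals.
        centre = predictions[i % len(predictions)]
        population.append([x + rng.gauss(0.0, sigma) for x in centre])
    return population


# Two hypothetical time series, each holding two historical solutions.
history = [
    [[0.0, 1.0], [0.1, 1.2]],
    [[1.0, 0.0], [1.2, 0.1]],
]
seeds = initial_population(history, size=6, rng=random.Random(0))
```

A restart method would instead draw the whole population at random after each change, discarding the trend that the historical solutions carry.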

7. Acknowledgments

This work was partly supported by the National Natural Science Foundation of China under grant 61105068 and the China Postdoctoral Science Foundation under grant 2011M501475.

Appendix

Tables 4 to 7 show the CPTs of the BN for the environmental assessment model in Section 3.

