Abstract
Due to its complexity and non-deterministic polynomial-time hard nature, the multirobot task allocation problem remains a challenging issue in the field of cooperative robotics. Thanks to its easy implementation and promising convergence speed, the particle swarm optimization method has recently attracted increasing research interest in the area of multirobot task allocation. However, the efficiency of the standard particle swarm optimization is hindered by several deficiencies, such as its limited ability to balance exploration and exploitation and its high likelihood of plunging into stagnation. Aiming at enhancing the performance of particle swarm optimization by remedying these two drawbacks, this paper proposes an improved particle swarm optimization method, which integrates standard particle swarm optimization 2011 with evolutionary game theory. To prevent particles from being locked into stagnation, particles in the proposed method first adopt the updating rules of standard particle swarm optimization 2011 to undertake their movements. Subsequently, to properly trade off the exploration and exploitation capabilities of particles, a novel self-adaptive strategy, determined by the evolutionary stable strategies of evolutionary game theory and the iteration number of particle swarm optimization, is presented to adaptively adjust the main control parameters of particles in the proposed method. Since convergence remains paramount and dramatically affects the performance of particle swarm optimization, this paper also analytically investigates the convergence of the proposed method and provides a convergence-guaranteed parameter selection principle for it.
Finally, leveraging the proposed particle swarm optimization, this paper completes the design of a new particle swarm optimization–based multirobot task allocation method. The performance of the new method is tested through three different allocation cases against four well-known evolutionary methods. Experimental results confirm that the proposed method generally outperforms its contenders in terms of solution quality. Moreover, the proposed method performs slightly better than the majority of its peers as far as computation time is concerned.
Keywords
Introduction
Since the cooperation of multiple robots can enhance the overall performance of a multirobot system, the application of teams of robots to accomplish important unmanned tasks has become increasingly common over the past few decades. 1,2 A crucial and fundamental issue of cooperative robotics is the task allocation problem (TAP), which aims to determine robot-to-task assignments while optimizing a set of predefined performance criteria. 3 By virtue of its importance in multirobot systems, the TAP has persistently aroused great research interest in recent years. 3,4 However, the TAP has been proven to be a non-deterministic polynomial-time hard (NP-hard) problem, and its NP-hard nature makes it difficult to solve. 3,4 Therefore, more effective optimization methods are in constant demand in the field of the TAP.
To the best of the authors' knowledge, market-based allocation approaches constitute one of the main streams in solving the TAP. The most representative market-based allocation approach may be the contract network protocol (CNP) method, which was proposed by Smith in 1980. 5 Since the initial development of CNP, many researchers have committed themselves to developing other market-based allocation methods to solve the TAP more efficiently. A market-based allocation method for the energy-efficient execution problem has been presented by Haghighi in the study by Mo. 6 The authors in the study by Gautham et al. 7 have developed a distributed task allocation method based on auction and consensus principles for multirobot systems in healthcare facilities. A distributed market-based method in which each robot bids for each task has been established in the study by Sahar et al. 8 Thanks to their formidable reliability and scalability, market-based allocation methods are particularly suitable for finding optimal solutions for distributed multirobot systems. 4 However, in order to determine optimal robot-to-task assignments, robots in market-based allocation approaches often need to cooperate with each other through explicit communication. Once the communication network of the multirobot system is interrupted or encounters other potential issues, the performance of the market-based allocation approach degrades significantly, which may constrain the implementation of market-based methods on large-scale allocation problems. 4
Probably because of their population-based nature and promising parallel search power for finding satisfactory solutions to complex NP-hard problems within tractable time, many different evolutionary algorithms (EAs), such as modified differential evolution (mDE), 3 genetic algorithm (GA), 9 harmony search algorithm (HSA), 10 and hybrid ant colony optimization (HACO), 11 have been proposed for solving the TAP within the last decade. Among the currently existing EAs, the particle swarm optimization (PSO) algorithm has been widely used to solve the TAP in recent years, owing to its simplicity and fast convergence speed. A modified binary PSO (mbPSO) task allocation method has been proposed to handle real-time task allocation in the study by Prescilla et al. 12 Since the cognitive and social acceleration parameters are fixed to 2 in mbPSO, this method may have difficulties in adaptively adjusting the global and local search abilities of particles. The authors in the literature 13 have developed a motivated guaranteed convergence PSO (M-GCPSO) to handle the TAP under communication constraints. Because different models of robot motivations are incorporated into M-GCPSO, this method may be problem-dependent, which may restrict its implementation on other allocation problems. An intrinsically motivated PSO (IMPSO) method, in which an intrinsic motivation for the fitness function is integrated with PSO, has been presented to solve the TAP in the study by Klyne and Merrick. 14 As the inertia weight of particles is set to either 0.729 or 0 depending on the noisiness of the fitness function in IMPSO, this method may be unable to adequately strengthen the exploration capabilities of particles in the early stages of the evolution.
As is well known, the optimization efficiency of the standard PSO is restricted by some typical drawbacks, such as its inefficient ability to balance exploration and exploitation as well as its high possibility of trapping into stagnation. 15,16 In order to enhance the performance of PSO, these two typical flaws of the standard PSO need to be remedied or overcome. For this purpose, since the initial development of the standard PSO, 17 a considerable amount of research work has been devoted to developing different PSO methods. Aiming at preventing particles from being locked into stagnation, Clerc 18 has developed a modified PSO, called standard PSO 2011 (SPSO 2011). In order to guarantee that particles can search the solution space with a non-zero velocity, an additional random point is added to the velocity of each particle in SPSO 2011. Yet, since the three control parameters of particles are fixed, SPSO 2011 may still suffer from difficulties in trading off exploration and exploitation. 19
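To make the SPSO 2011 moving rule described above concrete, the following Python sketch implements one particle update. It follows the commonly published SPSO 2011 construction (a center of gravity of the current position and two attractor-shifted points, with a random point sampled uniformly inside a hypersphere around that center); the parameter names `w` and `c` and the exact sampling details are assumptions, since the paper's equations (6) to (8) are not reproduced here.

```python
import numpy as np

def spso2011_step(x, v, pbest, gbest, w=0.7213, c=1.1931, rng=None):
    """One SPSO 2011-style move for a single particle (illustrative sketch).

    A random point sampled inside a hypersphere around the center of
    gravity of three guide points is added to the inertial velocity, so
    particles can keep moving even when close to their attractors.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = x + c * (pbest - x)            # point shifted toward the personal best
    l = x + c * (gbest - x)            # point shifted toward the global best
    g = (x + p + l) / 3.0              # center of gravity of the three points
    radius = np.linalg.norm(g - x)     # hypersphere radius
    direction = rng.normal(size=x.shape)
    direction /= np.linalg.norm(direction) + 1e-12
    # uniform sampling inside the ball: scale by U^(1/d)
    x_prime = g + radius * rng.random() ** (1.0 / len(x)) * direction
    v_new = w * v + (x_prime - x)      # inertia plus the random displacement
    return x + v_new, v_new
```

Because the displacement is drawn from a ball of non-zero radius whenever the particle has not collapsed onto its attractors, the update keeps injecting randomness into the search, which is the non-stagnation idea the text describes.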
Because the exploration and exploitation capabilities of PSO heavily rely on its three control parameters, namely the inertia weight and the cognitive and social acceleration parameters, developing PSO through different parameter settings has been extensively studied in recent years. In an attempt to properly balance exploration and exploitation, Akbari and Ziarati 20 have proposed a rank-based PSO (RBPSO), in which the
Aiming at remedying the two aforementioned shortcomings of the standard PSO to enhance the performance of PSO, this paper proposes a new enhanced PSO method, called self-adaptive evolutionary game particle swarm optimization (SAEGPSO), which integrates SPSO 2011 with evolutionary game theory (EGT). In order to drag particles in SAEGPSO out of stagnation, SAEGPSO first borrows the moving rules defined in SPSO 2011 to update the velocities and positions of particles. Then, in an attempt to adaptively adjust the exploration and exploitation powers of SAEGPSO, a novel self-adaptive strategy determined by the evolutionary stable strategies of EGT and the iteration number of SAEGPSO is presented to fine-tune the three main control parameters of particles. Moreover, since the three control parameters have profound impacts on the convergence of PSO, this paper also analytically investigates the convergence of SAEGPSO and provides a convergence-guaranteed parameter selection principle for the proposed method.
Leveraging the development of SAEGPSO, this paper completes the design of a SAEGPSO-based task allocation method for the multirobot task allocation problem (MRTAP). In order to reduce the optimization burden and add more diversification to solutions, the feasibility-based rule presented by Kalyanmoy et al. 25 is adopted to handle the constraints of the MRTAP in the SAEGPSO-based allocation method. Finally, the developed SAEGPSO-based task allocation method is verified through three different task allocation cases against mDE, 3 GA, 9 SPSO 2011, 18 and improved linearly decreasing weight particle swarm optimization (ILDWPSO). 26 The simulation results indicate that the proposed method is superior to its competitors as far as solution quality is concerned. Moreover, the computation time of the proposed method is comparable with that of SPSO 2011 and slightly better than those of the other compared methods.
The rest of this paper is organized as follows. The second section states and formulates the studied MRTAP. After briefly reviewing SPSO 2011 and EGT, SAEGPSO is presented in the third section. The fourth section analytically investigates the convergence of SAEGPSO. The fifth section describes the SAEGPSO-based allocation method for the MRTAP. Simulations and comparisons are conducted in the sixth section. The seventh section ends this study by drawing conclusions and showing some potential future work.
Formulation of MRTAP
This study concentrates on the multitask single-robot (MT-SR) allocation instance, 27 in which each robot can execute multiple tasks and each task only needs a single robot to execute it. Given a set of
In this paper, the global objective function of the multirobot system is the summation of the objective functions of each robot. Similar to the study by Shin and Zhang, 28 three objective functions that are the traveling distance, denoted as
In real-world applications, when each robot can successfully execute some tasks, the multirobot system can obtain a reward. Therefore, in the studied MRTAP, the gained benefit
Similar to the study by Shin and Zhang, 28 this study also focuses on the numerical value of each objective function. Using a binary decision variable
where
Clearly,
where
The proposed SAEGPSO method
Since SPSO 2011 and EGT are incorporated into the proposed SAEGPSO method, both these methods are first reviewed in this section. Afterwards, the proposed SAEGPSO is presented at the end of this section.
Review of SPSO 2011
Assume that the position and velocity of the
where
In SPSO 2011, a random point, denoted as
where
Also, it was suggested in the literature 18 to fix the three control parameters of particles in SPSO 2011 as follows
As given in equation (7), because
Review of EGT
Suppose a population has a certain number of players who are playing a game in competition with one another. The purpose of each player in the competition game is to maximize its fitness value by selecting different strategies, based both on its own experience and on those of its competitors. The basic idea of playing this kind of game is that the fitter a strategy is at the current moment, the more likely it is to be employed by more players in the future. 32 In order to answer how the strategies adopted by players change over time in this kind of game, EGT was originally proposed in the context of biology by modeling different types of animal conflicts. 33 The requirement of modeling evolutionary phenomena leads to the application of the mathematical theory of games to interpret aspects of the evolutionary process. 34
Although EGT was originally developed in the context of the biological sciences, it has increasingly aroused research interest from other fields, such as economics, sociology, and philosophy, over the last few decades. The wide application of EGT across different fields may be explained by the following facts. Firstly, the notion of evolution needs to be understood as the change of beliefs and norms over time. 34 Secondly, the modeling of the variations of strategies offers a social aspect which exactly matches social system interactions. 34 Lastly, EGT can dynamically model the social interactions of different players within a population, which is one of the missing elements of classical game theory. 34 In EGT, the final output of a game is determined by the equilibrium point, called the evolutionary stable strategy (ESS). The principle of EGT is based both on the performance gained by a single strategy and on the presence of other strategies. In addition to the ESS, the replicator dynamic equation (RDE) is another key concept in EGT, which is used to describe the adaptation of the population over time. Considering their importance in EGT, both the ESS and the RDE are described in the following subsections.
Description of ESS
In EGT, a strategy is considered an ESS if, when all players in a population play this strategy, no mutant strategy can invade the population under the influence of natural selection. 32
Suppose a population has some players following the strategy
where
Description of RDE
In order to discuss the stability of EGT, it is essential to define a dynamic for the EGT 32 to describe the impacts of the variations of strategies on the population. For this purpose, Taylor and Jonker 32 have developed the replicator dynamic equation to model the adaptation of strategies in a population. From the work of Taylor and Jonker, 32 it can be found that the matrix game is the most commonly used way to depict the strategy interactions among players, which can be described using the following notations: (1)
The RDE defined by equation (11) is one of the most widely accepted evolutionary dynamics used to describe the evolution of the strategy frequency
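As a concrete illustration of how an RDE drives strategy frequencies toward an ESS, the sketch below integrates the replicator dynamic for a classic two-strategy Hawk-Dove matrix game. The payoff values V = 2 and C = 4 are assumptions chosen for illustration; they are not taken from this paper.

```python
import numpy as np

# Hypothetical Hawk-Dove payoff matrix with V = 2, C = 4 (illustrative
# values); the mixed ESS plays "Hawk" with probability V/C = 0.5.
A = np.array([[-1.0, 2.0],
              [0.0, 1.0]])

def replicator_step(x, A, dt=0.01):
    """One Euler step of the RDE: dx_i/dt = x_i * ((A x)_i - x.(A x))."""
    fitness = A @ x            # expected payoff of each pure strategy
    average = x @ fitness      # population-average payoff
    return x + dt * x * (fitness - average)

x = np.array([0.9, 0.1])       # start with 90% Hawks
for _ in range(20000):
    x = replicator_step(x, A)
print(x)                       # approaches [0.5, 0.5], the ESS frequencies
```

Strategies whose payoff exceeds the population average grow in frequency, which is exactly the "fitter strategies spread" intuition stated earlier; the trajectory settles at the ESS, where no strategy can improve on the average.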
Modeling of SAEGPSO
Universally, striking a good balance between the exploration and exploitation capabilities of PSO is of great importance in the design of a PSO-based optimizer. Ideally, on the one hand, the exploration ability of PSO must be strengthened in the early stages of the evolution in order to encourage particles to search through the whole search space, rather than converging toward the current population-best solution. 15,20,22,38 On the other hand, the exploitation capability of PSO needs to be promoted in the later phases of the evolution, so that particles are more likely to search carefully in a local region, increasing the likelihood of finding the global optimal solution. 15,20,22,38 As is well known, these two abilities of PSO are dramatically affected by its three control parameters. The basic principles regarding how the three control parameters of PSO influence these two capabilities can be summarized as follows: (1) the exploration power of PSO benefits from a large inertia weight, while the exploitation power benefits from a small inertia weight 15,20,22,38 ; (2) compared with the social component, a large cognitive acceleration parameter steers particles toward global search and, consequently, promotes the exploration ability of PSO 15,38,39 ; (3) compared with the cognitive component, a large social acceleration parameter steers particles toward local search and thus facilitates the exploitation ability of PSO. 15,38,39
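These principles are commonly realized with time-varying parameter schedules. The sketch below is a generic linear schedule illustrating the idea, not the SAEGPSO rule itself (which is defined later by equations (16) to (22)); the bound values `w_hi`, `w_lo`, `c_hi`, and `c_lo` are illustrative assumptions.

```python
def pso_control_params(t, t_max, w_hi=0.9, w_lo=0.4, c_hi=2.5, c_lo=0.5):
    """Illustrative time-varying PSO control parameters.

    Following the principles above: the inertia weight w and the
    cognitive parameter c1 decrease over the run (exploration early),
    while the social parameter c2 increases (exploitation late).
    """
    frac = t / t_max
    w = w_hi - (w_hi - w_lo) * frac    # large -> small inertia weight
    c1 = c_hi - (c_hi - c_lo) * frac   # large -> small cognitive parameter
    c2 = c_lo + (c_hi - c_lo) * frac   # small -> large social parameter
    return w, c1, c2
```

At iteration 0 the swarm is biased toward exploration (large `w` and `c1`); by `t_max` the bias has shifted to exploitation (small `w`, large `c2`).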
Based on the basic principles stated above, despite achieving the non-stagnation property, SPSO 2011 may still suffer from potential difficulties in properly trading off the exploration and exploitation abilities, since the three control parameters of particles in SPSO 2011 are fixed and there does not exist any distinction between the cognitive and social acceleration parameters. This paper concentrates on improving PSO through remedying the two aforementioned shortcomings of the standard PSO and proposes an improved method, called SAEGPSO, which combines SPSO 2011 with EGT. The motivation for integrating SPSO 2011 with EGT is that, since both approaches are population-based, combining them can naturally leverage their advantages to improve the performance of the enhanced method. Motivated by this consideration, in order to prevent particles from falling into stagnation, the moving rules defined by equations (6) to (8) in SPSO 2011 are applied to guide the movements of particles in SAEGPSO. In order to properly trade off the exploration and exploitation powers of SAEGPSO, a newly developed self-adaptive strategy, determined by the evolutionary stable strategies of EGT and the iteration number of SAEGPSO, is proposed to fine-tune the three control parameters of particles in SAEGPSO.
Prior to presenting how the EGT principle is combined with the iteration number of SAEGPSO in the developed self-adaptive strategy, it is necessary to recall some important notions of EGT, such as "player", "strategy," and "payoff matrix," since these notions will be used analogously in the newly developed self-adaptive strategy of SAEGPSO. Players in EGT denote candidate solutions to an optimization problem, which push the evolution process of EGT forward. Strategies in EGT denote the candidate principles that each player can select to maximize its payoff, that is, its fitness value. The payoff matrix is constructed from the payoffs, that is, the fitness values of all players in the population. After recalling these notions, we present the analogy between SAEGPSO and EGT as follows: (1) each player in EGT corresponds to a particle in SAEGPSO; (2) each particle in SAEGPSO has three candidate strategies, namely moving only based on its inertia weight, moving only based on its personal best solution, and moving only based on the global best solution of the swarm; (3) the average performances gained by particles in SAEGPSO playing each of the three candidate strategies constitute the payoff matrix of EGT.
Let
where
At each iteration
subject to (s.t.)
The value of
where
Once the payoff
where
where the subscripts “s” and “f” in
The implementation of the three ratios
Parametric analysis for SAEGPSO
In order to intuitively explain the rationale of the developed self-adaptive strategy in SAEGPSO, this subsection conducts a parametric analysis for SAEGPSO. It is clear from equations (16) to (18) that the inertia weight
In addition to the iteration number
To summarize, through implementing the self-adaptive strategy defined by equations (16) to (22), the three main control parameters of particles in SAEGPSO are adaptively adjusted in compliance with the basic principles of PSO development. Thus, particles may be promoted to search for high-quality solutions and the performance of SAEGPSO can be enhanced. The variation trends of these three parameters of particles in SAEGPSO under different values of

Parameter changes with respect to different
Theoretical investigations of SAEGPSO
As stated in the third section, in order to enhance the performance of PSO via remedying the two typical flaws of the standard PSO, this paper proposes an enhanced PSO, called SAEGPSO. Aiming at dragging particles in SAEGPSO out of stagnation, the updating rules of SPSO 2011 defined by equations (7) and (8) are borrowed by SAEGPSO to update the velocities and positions of particles. On the other hand, focusing on well balancing the exploration and exploitation capabilities of SAEGPSO, a newly developed self-adaptive strategy determined by EGT and the iteration number of SAEGPSO is presented to tune the three control parameters of particles in SAEGPSO. Since one main modification of SAEGPSO lies in proposing a parameter setting principle to update the three control parameters of particles, and since these parameters influence the convergence of PSO, it is necessary to analytically investigate the convergence of SAEGPSO with respect to different values of the three control parameters. Therefore, we carry out a theoretical investigation of SAEGPSO, including the convergence analysis, the equilibrium point, and the convergence-guaranteed parameter selection principle, in the remainder of this section.
Convergence analysis for SAEGPSO
The convergence analysis of PSO is used to determine the boundaries of the three control parameters which guarantee the convergence of PSO. In this subsection, we focus on investigating the convergence of SAEGPSO based on the deterministic model convergence analysis, 30 that is, in the case where
where
The characteristic equation of system (23) is easily obtained as follows
Assume two roots to equation (26) are denoted as
Based on dynamic system theory, the system (23) converges iff the magnitudes of the two characteristic roots are less than 1 ("iff" means "if and only if" in this paper). 40 In other words, system (23) converges iff
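This root-magnitude condition can be checked numerically. Since equations (23) to (28) are not reproduced here, the sketch below uses the standard deterministic PSO recurrence, whose characteristic polynomial is λ² − (1 + w − φ)λ + w, as an illustrative stand-in for equation (26); `w` denotes the inertia weight and `phi` the combined acceleration coefficient.

```python
import numpy as np

def converges(w, phi):
    """True iff both roots of lambda^2 - (1 + w - phi)*lambda + w = 0
    lie strictly inside the unit circle (deterministic-model stability)."""
    roots = np.roots([1.0, -(1.0 + w - phi), w])
    return bool(np.all(np.abs(roots) < 1.0))

# For this model the classical stability region is |w| < 1 and
# 0 < phi < 2 * (1 + w).
print(converges(0.7213, 1.1931))  # True: inside the region
print(converges(1.1, 1.0))        # False: |w| >= 1
```

Scanning `converges` over a grid of `(w, phi)` values reproduces the familiar triangular stability region that convergence-domain figures of this kind (such as Figures 2 to 4) visualize.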
From equation (27), since
Lemma 1
For system (23),
Proof
It is trivial from equation (26) that
The proof of lemma 1 is completed by solving equation (30) with classical mathematical methods.
Then, we need to find the conditions on
Lemma 2
For
Proof
Notice that the magnitude of any complex number
Therefore
In the case where
This completes the proof of lemma 2.
Figure 2 displays the three-dimensional convergence domain of SAEGPSO and the corresponding projections of the convergence domain on different parameter planes in the case where

The convergence domain of SAEGPSO for
Lemma 3
For equation (23),
Proof
It is clear from equation (26) that
Solving equation (36) using classical mathematical methods completes the proof of lemma 3.
Then, let us derive the conditions on
Lemma 4
System (23) converges in the case where
Proof
It is trivial from equations (27) and (28) that
Namely, iff
As
Simplifying the right-hand inequalities in equation (40) using classical mathematical methods generates
Based on lemma 3, equation (35) must hold when
For

The convergence domain of SAEGPSO in the case where
Combining the conclusions drawn in lemmas 3 and 4, we can conclude that the system (23), that is, SAEGPSO, converges iff
Figure 4 shows the three-dimensional convergence domain of SAEGPSO and the corresponding projections of the convergence domain on different parameter planes. The convergence of SAEGPSO can be guaranteed as long as any parameter choice of

The convergence domain of SAEGPSO. (a) 3-D presentation of the convergence domain. (b) Projection of convergence domain on plane (
The equilibrium point of SAEGPSO
The equilibrium point of PSO is a stable point toward which particles converge at the end of the evolution. After studying the convergence of SAEGPSO in the 'Convergence analysis for SAEGPSO' section, we still need to find the equilibrium point of this method. Calculating limits on both sides of equation (23) yields
Obviously,
where
Convergence-guaranteed parameter selection principle for SAEGPSO
In ‘Convergence analysis for SAEGPSO’ section, it has been proven that SAEGPSO converges iff the condition given by equation (43) is satisfied. Here, the remaining task is to answer how to set the initial and final values of the three control parameters of particles in the newly developed self-adaptive strategy to satisfy the condition given by equation (43), so that the convergence of SAEGPSO can be sufficiently guaranteed. Thus, this subsection provides a convergence-guaranteed parameter selection principle for SAEGPSO.
Lemma 5
SAEGPSO converges, only if the initial and final values of the three control parameters meet the following conditions
Proof
Substituting
If
Because the right-hand side of equation (49) is the necessary and sufficient condition for the convergence of SAEGPSO, the proof of lemma 5 is completed based on the relationship given by equation (49).
It is important to note that values of

Convergent trajectory of the particle in SAEGPSO. (a) Convergent position trajectory. (b) Convergent velocity trajectory. SAEGPSO: self-adaptive evolutionary game particle swarm optimization.
The SAEGPSO-based allocation method for the MRTAP
When using the proposed SAEGPSO method to develop a SAEGPSO-based allocation method for the MRTAP, two issues need to be addressed first. The first is how to encode particles, that is, how to build a mapping between the positions of particles and solutions to the MRTAP. The second is how to handle the constraints of the MRTAP in the SAEGPSO-based allocation method. The main mission of this section is to answer these two questions.
Encoding particles
As stated in the second section, the purpose of the studied MRTAP is to find a task execution sequence for each robot. To achieve this goal, as illustrated in Figure 6, the position of the particle in SAEGPSO is encoded as a

Representation of the particle in SAEGPSO. SAEGPSO: self-adaptive evolutionary game particle swarm optimization.
To concretely explain how to encode and decode particles based on the way described above, an illustrative MRTAP example in which three robots are assigned to execute seven tasks is given here. Assume the position vector of a particle is randomly given as [3.5211, 1.312, 2.23416, 1.45016, 1.01001, 2.36143, 3.25]. According to the way of decoding particles depicted above, the task execution sequences for robots 1, 2, and 3 are:
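A plausible decoder consistent with this random-key style of encoding is sketched below. The paper's exact rule is defined by Figure 6, so the interpretation used here, where the integer part of each key names the robot and the fractional parts order the tasks on that robot, is an assumption for illustration.

```python
import math
from collections import defaultdict

def decode(position):
    """Hypothetical random-key decoder: floor(key) names the robot for
    each task, and tasks on the same robot are executed in increasing
    order of the fractional parts of their keys."""
    assignments = defaultdict(list)
    for task, key in enumerate(position, start=1):
        robot = int(math.floor(key))
        assignments[robot].append((key - math.floor(key), task))
    # sort each robot's tasks by fractional part, then drop the keys
    return {robot: [task for _, task in sorted(pairs)]
            for robot, pairs in sorted(assignments.items())}

pos = [3.5211, 1.312, 2.23416, 1.45016, 1.01001, 2.36143, 3.25]
print(decode(pos))  # {1: [5, 2, 4], 2: [3, 6], 3: [7, 1]}
```

Under this interpretation, every continuous position vector with keys in the valid range maps to exactly one complete assignment, which is why constraints such as "each task is assigned once" can be satisfied by construction.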
Handling constraints of the MRTAP
Because the MRTAP studied in this paper is a constrained optimization problem (COP), the question of how to handle the constraints of the problem must be addressed in order to solve it efficiently. Among the currently existing constraint handling techniques, the penalty function approach 41 could be one of the most popular. The penalty method transforms a COP into an unconstrained optimization problem by introducing several penalty terms into the objective function of the COP. However, properly setting the penalty factors of the penalty function method is difficult and problem-dependent, which is the main drawback of this method. 42
In our SAEGPSO-based task allocation method, we adopt the feasibility-based rule 25 to tackle the constraints of the MRTAP in order to relieve the burden of the optimizer, as well as to increase the diversification of solutions. As explained in the 'Encoding particles' section, the constraints given by equations (3) and (5) of the MRTAP are automatically satisfied by the particle encoding presented in this paper. Therefore, only the constraint given by equation (4) of the MRTAP needs to be tackled. In order to handle the constraint given by equation (4), the constraint violation degree of each particle is first calculated as follows
where
After calculating constraint violation degrees of particles based on equation (50), the feasibility-based rule presented by Kalyanmoy et al. 25 is then used to choose the dominated solution between any two candidate solutions in the SAEGPSO-based allocation method. For a minimization optimization problem, the feasibility-based rule is depicted as follows: (1) for any two candidate solutions having the same constraint violation degree, the solution with smaller fitness value dominates the solution with larger fitness value; (2) for any two candidate solutions having different constraint violation degrees, the solution with smaller constraint violation degree is preferred over the solution with larger constraint violation degree.
Because the objective function and the constraint violation degree of each particle are considered and compared separately in the feasibility-based rule, no additional penalty factor is required when implementing this rule to handle constraints. Therefore, this rule can reduce the burden of the optimizer. 25 In addition, although the non-feasible solutions violate some constraints, they may also contain some useful information for finding high-quality solutions. 42 Consequently, when these non-feasible solutions are considered in the feasibility-based rule, the diversification of solutions and the likelihood of finding better solutions may be increased.
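The two-part rule above translates directly into a pairwise comparator. The sketch below implements it for a minimization problem, as stated in the text; the function and variable names are ours, not the paper's.

```python
def dominates(fitness_a, violation_a, fitness_b, violation_b):
    """Feasibility-based rule of Kalyanmoy et al. for minimization:
    (1) equal constraint violation degrees -> the smaller fitness wins;
    (2) different violation degrees -> the smaller violation wins."""
    if violation_a == violation_b:
        return fitness_a < fitness_b
    return violation_a < violation_b

# Example: a feasible solution is preferred over an infeasible one,
# regardless of fitness.
print(dominates(5.0, 0.0, 1.0, 0.7))  # True
```

Note that fitness and violation are never mixed into a single number, which is why no penalty factor is needed, and that infeasible solutions still compete among themselves (by violation degree), preserving the diversity benefit described above.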
In the proposed SAEGPSO-based task allocation method, in order to ensure that each task can be assigned to a robot,
where
Application of SAEGPSO-based allocation method on the MRTAP
The pseudocode of SAEGPSO for solving the MRTAP is shown in Table 1, where
The SAEGPSO-based framework for MRTAP.
It is noticeable from Table 1 that, similar to
Simulations, comparisons, and analysis
In this subsection, SAEGPSO is evaluated through three different-sized task allocation cases against mDE, 3 GA, 9 SPSO 2011, 18 and ILDWPSO. 26
For each allocation case, after each method conducts a Monte Carlo experiment with 30 runs, this study reports and compares the statistical results with respect to the solution optimality and computation time of each method. In every single run of the Monte Carlo test, the best solution of each method is output after 40 particles evolve for 2000 iterations. The simulation parameters for SAEGPSO are set as follows:
The simulation parameters for different compared methods.
Case study 1
This subsection shows the simulation results of all methods in solving the first allocation instance, which assigns three robots to 12 tasks. The properties of robots and tasks are given in Tables 3 and 4, respectively. After conducting the Monte Carlo experiment, Tables 5 and 6 report the statistical results of the solution and the computation time of each method, respectively. In these two tables, the best mean results with respect to the solution and computation time are highlighted in boldface. To concretely show the allocation results and to better understand the evolutionary processes of different methods, the allocation schemes and fitness curves of the best solutions obtained by different methods in the Monte Carlo test are exhibited in Table 7 and Figure 7, respectively. The robots' trajectories of the best solutions of SAEGPSO and GA are illustrated in Figure 8. Note that, because the positions of robots and tasks are given in Tables 3 and 4, and the allocation schemes of the best solutions of all methods are shown in Table 7, the robots' trajectories of the best solutions of the other methods (except SAEGPSO and GA) are not displayed in order to limit the length of the paper.
Properties of robots for the first studied case.
Properties of tasks for the first studied case.
vio-execution probability: violated execution probability of the task to robots.
Simulation results of the allocated solutions obtained by each method for the first studied case.
Std.: standard deviation. The best mean results with respect to the solution and the computation time among the four methods are highlighted in boldface.
Simulation results of the computation time (s) of each method for the first studied case.
Std.: standard deviation. The best mean results with respect to the solution and the computation time among the five methods are highlighted in boldface.
Allocation schemes of the best solutions searched by different methods for the first studied case.a
aHere,

The fitness curves of the best runs of different methods for the first studied case.

Robots’ trajectories of the best solutions searched by SAEGPSO and GA for the first studied case. (a) Robots’ trajectories of the best solution searched by SAEGPSO. (b) Robots’ trajectories of the best solution searched by GA. SAEGPSO: self-adaptive evolutionary game particle swarm optimization; GA: genetic algorithm.
From Table 5, the proposed SAEGPSO method is followed by ILDWPSO, mDE, GA, and SPSO 2011 in terms of the average solution optimality, which implies that SAEGPSO generally outperforms its contenders in solving the first test case. Compared with ILDWPSO, mDE, GA, and SPSO 2011, SAEGPSO improves the average solution optimality by 21.98%, 37.37%, 48.17%, and 59.63%, respectively. From Table 6, one can easily observe that SPSO 2011 is the most efficient method, while SAEGPSO is ranked second in terms of the average computation time. This observation is reasonable: because the three control parameters of SPSO 2011 remain constant, no additional computational resources are needed to update them, so the computation time of SPSO 2011 is naturally lower than that of SAEGPSO. However, it is important to note from Tables 5 and 6 that although SPSO 2011 is the most efficient method in computation time, it obtains the worst performance in solution quality, whereas SAEGPSO, the second most efficient method in computation time, achieves the best solution quality. Since the studied MRTAP is solved off-line, the computation time may weigh less than the solution optimality in evaluating a task allocation optimizer. Considering all of these points, we can conclude that, compared with the other four methods, SAEGPSO is highly competitive in solving the first test case. Moreover, from Table 7, it can be easily observed that the best solutions obtained by all methods are feasible, since they all satisfy the constraints of the first studied allocation case, which, to some extent, reveals the feasibility of these methods in solving the small-scale MRTAP.
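The improvement percentages above follow the standard relative-improvement formula for a minimization objective. The sketch below uses placeholder mean values for illustration only, not the actual Table 5 entries:

```python
def improvement_pct(ours: float, other: float) -> float:
    """Relative improvement of `ours` over `other` (both mean
    objective values to be minimized), in percent."""
    return (other - ours) / other * 100.0

# Placeholder means for illustration only (not the Table 5 values):
print(round(improvement_pct(40.0, 50.0), 2))  # a 20% improvement
```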
Case study 2
The second task allocation instance is a medium-scale TAP, in which five robots are assigned to complete 22 tasks. The properties of robots and tasks are given in Tables 8 and 9, respectively. The statistical results with respect to the solution and computation time of the different methods for this allocation case are summarized in Tables 10 and 11, respectively. For the same reasons explained in the ‘Case study 1’ section, the allocation schemes and fitness curves of the best solutions searched by all considered methods are displayed in Table 12 and Figure 9, respectively.
Properties of robots for the second studied case.
Properties of tasks for the second studied case.
vio-execution probability: violated execution probability of the task to robots.
Simulation results of the allocated solutions obtained by each method for the second studied case.
The best mean results with respect to the solution and the computation time among the five methods are highlighted in boldface.
Simulation results of the computation time (s) of each method for the second studied case.
The best mean results with respect to the solution and the computation time among the five methods are highlighted in boldface.
Allocation schemes of the best solutions searched by different methods for the second studied case.

The fitness curves of the best runs of different methods for the second studied case.
From Table 10, it is clear that SAEGPSO and ILDWPSO are the most and second most efficient approaches in terms of the solution quality, while SPSO 2011 performs worst for the second simulation scenario. Based on the results shown in Table 10, compared with ILDWPSO, mDE, GA, and SPSO 2011, SAEGPSO enhances the average solution optimality by 6.07%, 9.28%, 11.83%, and 13.65%, respectively. From Table 11, SPSO 2011 and SAEGPSO are ranked first and second in terms of the computation time, respectively. It is notable from Tables 10 and 11 that, despite having the best computation time, SPSO 2011 is the least efficient method in solution optimality, whereas SAEGPSO, ranked second in computation time, is the most efficient of the five methods in solution quality. Hence, we can still conclude that SAEGPSO is highly competitive in solving the second allocation instance. Moreover, it can be easily seen from Table 12 that the best solutions found by the five methods are feasible, because they satisfy all the constraints of the second allocation case, which, to a certain degree, indicates the feasibility of these methods in the medium-scale MRTAP.
From Figure 9, it is interesting to observe that the best fitness curves of some methods, such as GA, first rise in the early stages of the evolution and then drop or remain stable in the later stages. This is because we use the feasibility-based rule 25 to choose the dominant solution between any two candidate solutions in the SAEGPSO-based task allocation method. Because the initial population of each method is randomly produced for any simulation scenario, it may happen that all the initial individuals of a method violate some constraints of that scenario, meaning that all individuals of the method are initially infeasible. In this case, the initial global best solution of the method is also infeasible. As the evolutionary process carries on, if any particle finds a solution with a smaller constraint violation degree but a larger fitness value than the initial infeasible global best solution, the newly produced solution replaces the initial infeasible global best solution according to the feasibility-based rule. Therefore, in such a case, the fitness curves of the method first rise in the early stages of the evolutionary process. However, this phenomenon vanishes in the later stages, that is, the fitness curves either keep dropping or remain stable. This is because, as the evolution continues, some individuals of the method find feasible solutions in the later stages. Once these newly found feasible solutions replace the historically infeasible global best solution, the current global best solution becomes feasible and, under the feasibility-based rule, can only be replaced by feasible solutions with smaller fitness values. Therefore, as shown in Figure 9, the fitness curves of the five methods keep dropping or remain stable in the later phases of the evolution.
Note that, since the first test case studied in the ‘Case study 1’ section is a small-scale TAP involving only 12 tasks, there is a high chance that each method obtains a feasible initial global best solution after its swarm is randomly initialized. Because all the initial global best solutions of all methods are feasible in this first test case, they can only be replaced by newly produced feasible solutions with smaller fitness values according to the feasibility-based rule. Therefore, as shown in Figure 7, the best fitness curves of the five methods first keep going down in the early stages of the evolution and then remain unchanged in the later stages.
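The feasibility-based rule discussed above can be sketched as a pairwise comparator. This is a generic sketch of the well-known rule (a feasible solution beats an infeasible one; among feasible solutions, smaller fitness wins; among infeasible solutions, smaller violation wins); the `Candidate` container and its field names are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    fitness: float    # objective value (to be minimized)
    violation: float  # total constraint violation; 0.0 means feasible

def dominates(a: Candidate, b: Candidate) -> bool:
    """Feasibility-based rule:
    1) a feasible solution beats an infeasible one;
    2) between two feasible solutions, the smaller fitness wins;
    3) between two infeasible solutions, the smaller violation wins."""
    a_feasible = a.violation == 0.0
    b_feasible = b.violation == 0.0
    if a_feasible != b_feasible:
        return a_feasible
    if a_feasible:
        return a.fitness < b.fitness
    return a.violation < b.violation
```

Note that rule 3 lets a solution with a larger fitness but a smaller violation replace an infeasible global best, which is exactly why a fitness curve can rise while the whole swarm is still infeasible.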
Case study 3
The third test case is a relatively large-scale allocation instance, in which six robots are assigned to 40 tasks. The properties of robots and tasks are shown in Tables 13 and 14, respectively. Tables 15 and 16 present the statistical results of the allocation solution and the computation time of the different methods for this case, respectively. The allocation results and fitness curves of the best runs of the different methods in the Monte Carlo experiments conducted for this scenario are given in Table 17 and Figure 10, respectively.
Properties of robots for the third studied case.
Properties of tasks for the third studied case.
vio-execution probability: violated execution probability of the task to robots.
Simulation results of the allocated solutions obtained by each method for the third studied case.
The best mean results with respect to the solution and the computation time among the five methods are highlighted in boldface.
Simulation results of the computation time (s) of each method for the third studied case.
The best mean results with respect to the solution and the computation time among the five methods are highlighted in boldface.
Allocation schemes of the best solutions searched by different methods for the third studied case.

The fitness curves of the best runs of different methods for the third studied case.
From Tables 15 and 16, it is apparent that the proposed SAEGPSO is followed by ILDWPSO, mDE, GA, and SPSO 2011 in terms of the average solution quality, and that SPSO 2011 is followed by SAEGPSO, GA, mDE, and ILDWPSO in terms of the average computation time. Considering both the statistical results of the solution quality and the computation time shown in Tables 15 and 16, we can conclude that SAEGPSO is highly powerful in solving the large-scale MRTAP. From Table 17, it can be easily noted that the best solutions obtained by all methods are feasible, since they all meet the constraints of the third task allocation case, which, to some extent, reflects the feasibility of each considered method in solving the large-scale MRTAP. For the same reasons explained in the ‘Case study 2’ section, as shown in Figure 10, the best fitness curves of mDE and SPSO 2011 first rise in the early stages of the evolution and then keep going down in the later stages.
Conclusion and future work
Focusing on enhancing the performance of PSO by overcoming two typical drawbacks of the standard PSO, this study proposes a novel enhanced PSO method, SAEGPSO. To this end, we first let particles update their movements based on the moving rules defined in SPSO 2011 to avoid stagnation in SAEGPSO. Afterwards, a self-adaptive strategy, determined by the evolutionary stable strategies of EGT and the iteration number of SAEGPSO, is proposed for fine-tuning the control parameters of particles, in order to properly trade off the exploration and exploitation capabilities of SAEGPSO. To properly set the self-adaptive strategy, this paper also theoretically investigates the convergence of SAEGPSO with respect to the values of the three control parameters and provides a convergence-guaranteed parameter selection principle for SAEGPSO.
Leveraging the development of SAEGPSO, this paper also develops an SAEGPSO-based allocation method for solving the MRTAP. To efficiently solve the MRTAP and reduce the optimization difficulties, the feasibility-based rule is adopted in the developed SAEGPSO-based allocation method to handle the constraints of the MRTAP. In addition, the performance of the proposed method is verified on three task allocation cases of different scales against four well-established evolutionary methods. The simulation results confirm that SAEGPSO outperforms its contenders in terms of the solution quality. Moreover, SAEGPSO is also highly competitive in terms of the computation time. Therefore, we can conclude that SAEGPSO is highly promising in solving the MRTAP and can be considered a viable alternative in the field of MRTAP.
There are a few potential issues that deserve future study. First, the performance of SAEGPSO could be further enhanced by improving the EGT component. Second, since the three control parameters affect not only the convergence of PSO but also its convergence speed, we will study the convergence speed of SAEGPSO under different control parameter settings. Moreover, although this study establishes the convergence of SAEGPSO to an equilibrium point, the local or global optimality of that equilibrium point has not been investigated, which will be addressed in the near future. Third, to further evaluate SAEGPSO on the MRTAP, we are also considering comparing SAEGPSO with some non-evolutionary methods, such as auction-based methods. Last but not least, we are considering using the proposed PSO method to solve some other COPs, such as the environmental/economic dispatch problem 43 and the TAP 44.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflict of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
