Abstract
This article proposes a novel active relative localization mechanism for multi-agent simultaneous localization and mapping (SLAM), in which an agent to be observed is treated as a task, and the agents willing to assist it perform that task through relative observation. A task allocation algorithm based on deep reinforcement learning is proposed for this mechanism. Each agent can choose, on its own initiative, whether to localize other agents or to continue its own independent SLAM. In this way, the SLAM process of each agent is shaped by the collaboration. First, a unique observation function that models the whole multi-agent system is constructed based on ORBSLAM. Second, a novel type of Deep Q Network called multi-agent system Deep Q Network (MAS-DQN) is deployed to learn the correspondence between Q values and state–action pairs; an abstract representation of the agents in the multi-agent system is learned in the process of collaboration among agents. Finally, each agent acts with a certain degree of freedom according to MAS-DQN. The results of comparative simulation experiments show that this mechanism improves the efficiency of cooperation in multi-agent SLAM.
Keywords
Introduction
A critical task of a mobile robot is to determine its pose (position and orientation) in a given environment map, which is also the basis of other types of tasks. However, the environment map does not exist at the beginning. When mobile robots enter an unknown part of the environment, they need to construct a map (3-D point cloud map, topological map, or 2-D map) through their own sensors to support their tasks, and at the same time determine their own location in that map; this is called the simultaneous localization and mapping (SLAM) problem. Mapping and localization are mutually complementary. To build accurate maps, the robot must have an accurate estimate of its pose; conversely, localization requires an established high-quality map. This interdependence is the core difficulty of SLAM. Visual SLAM refers to SLAM in which a camera acts as the only sensor. Compared with traditional laser sensors, visual sensors obtain richer information and can be deployed on mobile robots at very low cost.
A multi-agent system (MAS) is a system in which agents collaborate and negotiate with one another to accomplish goals that cannot be achieved by a single agent, or to increase the efficiency of task execution. Each agent's abilities, attributes, and structure differ, which provides great scope for collaboration. MAS has attracted many researchers to investigate its applicability in areas closely related to human livelihood and industry, such as security systems, 1 exploration and rescue, 2 surveillance, 3 humanitarian assistance, 4 environmental protection, 5,6 and health care. 7,8 Traditional SLAM has been considered an independent behavior. With the continuous application of MAS, multi-agent SLAM has become a focus of research.
Multi-agent SLAM can be divided into two types: off-line mode and online mode. In off-line mode, each agent's SLAM process is independent, and the results are fused after all agents stop. In online mode, agents cooperate during the SLAM process, for example through cooperative loop closure detection, cooperative localization, and cooperative map fusion. Relative orientations in a MAS are important because they are the basis for combining maps built from the data of two agents. The difficulty of online multi-agent SLAM is how to cooperate in the best way when each agent has different attributes and different states. This issue is particularly prominent in relative localization between robots. For instance, the localization ability of a wheeled robot is stronger than that of an unmanned aerial vehicle (UAV) because of its odometer, but the pose space of a UAV is richer, which gives it strong mapping ability. If both kinds of robots are present in the system at the same time, an appropriate collaboration model will make full use of the advantages of both. 9 This type of relative localization is controlled manually. If a MAS is to optimize the efficiency of automatic cooperative positioning between agents, each agent needs to perceive all SLAM-related states and attributes to determine whether it is in the best condition to observe the agent in need of assistance. In most cases, however, the task-related attributes and states of robots are not directly quantifiable, which makes collaboration difficult.
To solve the above problem of SLAM for agents in a MAS, this article proposes a cooperative mechanism for multi-agent SLAM that learns the main attributes of each agent and extracts the state feature vectors of all agents in the MAS based on ORBSLAM; 10 each agent then executes the specific behavior output by MAS-DQN. In the “Experiment” section, we use models of isomorphic robots with different attributes to simulate various robots, and the validity of our algorithm is demonstrated.
The article is organized as follows: the second section discusses related work on coalition formation for MAS and multi-agent SLAM. The third section introduces DQN and “Rainbow,” which we apply to drive MAS-DQN. The fourth section describes the structure of MAS-DQN, including the novel observation function and reward function. The fifth section introduces relative observation between agents under the proposed mechanism. The sixth section discusses the effectiveness of the algorithm, and the seventh section presents the complete algorithm.
Related work
Relative localization among agents in a MAS refers to locating an agent whose pose is unknown using the limited knowledge and abilities of the other agents. This process consists of two main steps. First, select the agents that can provide location-based services. These agents can be determined by a general control center or through free negotiation among the agents according to their own demands, interests, and attributes; this step mainly involves multi-robot task allocation (MRTA). Second, the selected agent uses its own pose and different types of maps, such as graph-based maps, 11,12 landmarks with covariance matrices, 13 and grid-based maps, 14,15 to measure or observe the pose of the agent to be localized. Next, we introduce the related work on these two points separately.
MRTA has always been a hot topic in MAS research. It can be formulated as an optimal assignment problem whose goal is to assign a set of agents to a set of tasks so as to optimize overall system performance while satisfying a set of constraints. According to the team organizational paradigm, MRTA approaches fall into two categories: centralized and distributed.
In a centralized MAS, a central agent monitors the others and assigns tasks to them. Agents that are assigned tasks must send all the information they obtain to the central agent, and all agents must maintain communication at all times. Center-based algorithms are very general for solving the MRTA problem because the monitor can directly control each agent. 16 Coltin and Veloso 17 proposed centralized approaches for mobile robot task allocation in hybrid wireless sensor networks. In the literature, 18 a centralized multi-robot task allocation for industrial plant inspection using A* and a genetic algorithm is introduced. Higuera and Dudek 19 proposed a fair-division-based MRTA approach, another centralized algorithm, to handle the problem of allocating a single global task among heterogeneous robots. To address the curse of dimensionality in multi-agent reinforcement learning, Wadhwania et al. drew inspiration from distillation and value-matching and proposed a new actor-critic algorithm and a method for combining knowledge from agents with the same structure. 20
In distributed approaches, agents do not report to a central agent and need not be online at all times. They can communicate freely with other agents and assign tasks through negotiation according to their different demands. This type of algorithm is also widely used to solve the MRTA problem. Farinelli et al. proposed a distributed online dynamic task assignment for multi-robot patrolling. 21 Luo et al. 22 proposed a distributed algorithm for constrained multi-robot task assignment for grouped tasks. Various ambient assisted living technologies have been proposed for MRTA in healthcare facilities. 23 Emotional robot models use artificial emotions to endow a robot with emotional factors, making it more intelligent and allowing it to adjust its behavior choices through an emotional mechanism; the introduction of emotional and personal factors improves the diversity and autonomy of robots. 24 Jonathan How's team focused on consensus-based auction algorithms for decentralized task assignment 25,26 and used game theory to prove that the consensus-based bundle algorithm converges to a pure-strategy Nash equilibrium. 27 Johnson et al. used implicit coordination to satisfy allocation constraints. 28
Some relative localization algorithms for MAS use prior information such as the initial positions of the agents and the exact positions of landmarks. 29 Shames et al. 30 presented the theoretical basis of relative localization between agents using prior knowledge in maps. Zhou and Roumeliotis 31 proposed a mechanism for relative localization in which agents try to observe each other in time, when they have a chance, under unknown initial correspondence. Zkucur and Levent 32 established the relationship between relative observation and map fusion. In most cases, however, agents cannot obtain enough prior knowledge for relative localization. Statistical estimation algorithms, such as Kalman filtering, 33 particle filtering, 34 and Markov localization, 35 perform better in this case.
Most relative observation approaches serve map fusion. In the mechanism proposed in this article, the data from relative observations between agents are also used to optimize the pose graphs of the observed agents, which improves the efficiency of data utilization.
DQN and Rainbow
Q-learning is an off-policy learning method in reinforcement learning. It uses a Q-table to store the cumulative value of each state–action pair. According to the Bellman equation, when the strategy of maximizing the Q value is adopted at every step, the Q value is updated as

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]

where α is the learning rate and γ is the discount factor.
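The tabular update above can be sketched in a few lines of Python (a minimal illustration with our own variable names; states and actions are integer indices into a NumPy table):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the Bellman target."""
    target = r + gamma * np.max(Q[s_next])   # bootstrapped Bellman target
    Q[s, a] += alpha * (target - Q[s, a])    # temporal-difference update
    return Q

# toy usage: 3 states, 2 actions
Q = np.zeros((3, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```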
The training process of DQN.
where x is the raw state of the system before being processed by the observation function ϕ, and M and t respectively denote the number of iterations and the number of training samples. However, the combination of deep learning (DL) and reinforcement learning (RL) inevitably leads to several problems: DL needs a large number of labeled samples for supervised learning, while RL only has the reward of state–action pairs as a return value; DL samples are independent, while RL states are correlated; and the distribution of DL targets is fixed, while the distribution of RL targets is always changing, that is, it is difficult to reproduce a situation that was trained before. Previous studies have also shown that using nonlinear networks to represent value functions is unstable.
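Experience replay is DQN's standard remedy for the sample-correlation problem described above: transitions are stored and later sampled uniformly, so consecutive training samples are decorrelated. A minimal sketch (our own variable names; uniform rather than prioritized sampling):

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-size experience replay: uniform sampling breaks temporal correlation."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def push(self, obs, action, reward, next_obs, done):
        self.buf.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        batch = random.sample(self.buf, batch_size)
        obs, act, rew, nxt, done = map(np.array, zip(*batch))
        return obs, act, rew, nxt, done

def td_targets(rewards, next_q, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a'), zeroed at episode end."""
    return rewards + gamma * next_q.max(axis=1) * (1.0 - dones)
```

In a full DQN loop, `next_q` would come from the frozen target network, which addresses the moving-target problem mentioned above.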
This article uses “Rainbow” 37 to build the training model of MAS-DQN. Rainbow is an integrated DQN that combines the following algorithms to address the problems above. Double Q-learning constructs two neural networks with the same structure but different parameters, a behavior network and a target network; when updating the model, the best action is selected by the behavior network while its value is estimated by the target network, which reduces overestimation. The prioritized replay buffer makes the model preferentially sample the transitions that improve it most, using the replay memory more efficiently. Dueling DQN decomposes the value function into two parts, making the model easier to train and able to express more valuable information. Distributional DQN changes the model output from a value expectation to a value distribution, so that the model can learn more valuable information. Noisy DQN adds exploratory ability by injecting noise into the parameters, which is more controllable. To overcome the slow learning speed of Q-learning in its early stage, multistep learning uses the rewards of several steps, so that the target value can be estimated more accurately early in training, which speeds up training. The Rainbow model integrates the advantages of the above models, demonstrating that these features can be combined successfully.
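The multistep-learning component is the simplest to make concrete. Its n-step target accumulates n discounted rewards before bootstrapping, which can be sketched as (an illustrative helper, not the paper's implementation):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step target used by Rainbow-style multistep learning:
    G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * bootstrap_value.
    Computed backward so each reward is discounted exactly once per step."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```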
The network for the proposed mechanism
Observation function for MAS-DQN based on ORBSLAM
In deep RL, the local information obtained by an agent is called an observation. Unlike previous deep RL algorithms, the observation function of the proposed mechanism models all agents and determines the behavior of every agent at the same time, rather than that of a single agent. In this section, the ORBSLAM feature vector and the output of the observation function are introduced.
ORBSLAM feature vector
ORBSLAM 10 is one of the most commonly used systems among modern visual SLAM frameworks. It runs in three threads: the tracking thread tracks the motion of the platform using tracked feature points; the local mapping thread constructs the global map and optimizes the local map; and the loop closure thread is responsible for detecting loops when an agent revisits an already mapped area, which prevents the growth of accumulated error.
ORBSLAM is deployed independently on each agent to construct its map. In our system, the state of the environment consists of the pose estimation states of all agents, which are mainly determined by three items: map points, key frames, and loop closure detection. Variables related to these items compose the state vector of the agent, called the ORBSLAM feature vector, whose members are as follows. The first component is related to map points: a map point is constructed from matched point features, and we include the number of map points observed from the current frame. The second component is related to key frames, which are frames selected according to strict requirements to ensure that they contain sufficient map information; key frames later found to contain insufficiently accurate map information are gradually removed. To estimate the amount of map information at nearby locations, which plays a decisive role in pose estimation, we include the numbers of newly generated key frames and of eliminated old key frames since the last sampling time. The third component is related to the result of loop closure. Loop closure detection is the problem of a mobile agent determining whether it has returned to a previously visited location; a detected loop imposes a strong constraint on global optimization. We include the time interval since the last detected loop closure to measure the impact of loop closure on the current frame. Another critical factor is the collection of distances from the agent to the other agents: the longer the distance from the agent to a target agent, the harder it is for it to assist that target. The distance is calculated by the D* algorithm.
The characteristic of the D* algorithm is that when the terrain between the target point and the source point is fully known, it yields the shortest path and the shortest distance; when the terrain is not fully known, it outputs the shortest path through the known terrain and an estimated distance between the target point and the source point. An ORBSLAM feature vector contains m − 1 such distances (m is the number of agents), representing the distances between the agent and the other m − 1 agents.
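On fully known terrain, D* returns the same shortest distance as a plain shortest-path search, so a Dijkstra search over an occupancy grid can serve as a simplified stand-in for the distance term (an illustrative sketch; the paper's implementation uses D* proper, which additionally supports incremental replanning on partially known maps):

```python
import heapq

def grid_distance(grid, start, goal):
    """Shortest 4-connected path length on a known occupancy grid
    (0 = free cell, 1 = blocked). Returns inf when no path exists."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist.get((r, c), float('inf')):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float('inf')):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return float('inf')
```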
An ordered arrangement of all the above variables forms an ORBSLAM feature vector. In the next subsection, the observation function based on the ORBSLAM feature vector is introduced.
The observation function
The output of the observation function represents the perception of the overall SLAM state of the whole MAS. The ORBSLAM feature vectors of all agents at the same time form an observation frame. Considering the continuity of the SLAM process, previous states should be taken into account when the system makes a macro-decision for each agent, so the output of the observation function includes the current frame and a fixed number of preceding frames. These frames constitute an observation frame set, which is finally flattened into a vector as the output of the observation function. The specific steps of the observation function are shown in Algorithm 2.
Observation function of the proposed algorithm.
Here, the parameter n is the number of observation frames, m is the total number of agents in the system, and
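The frame-stacking step of the observation function can be sketched as follows (our own variable names; d is the per-agent feature vector length, and zero-padding at the start of a run is our assumption for when fewer than n frames exist yet):

```python
import numpy as np

def observation(frame_history, n, m, d):
    """Flatten the n most recent observation frames, each an (m x d) array of
    ORBSLAM feature vectors, into one vector; zero-pad a short history."""
    frames = list(frame_history)[-n:]
    while len(frames) < n:
        frames.insert(0, np.zeros((m, d)))   # pad oldest slots early in a run
    return np.concatenate([f.reshape(-1) for f in frames])
```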
Reward function
The reward function generally reflects the immediate benefit of an agent reaching a new state through an action. In our proposed algorithm, the transition error is used to measure the reward, as shown in equation (3)
To embody the collaboration between agents, the reward function also adds the influence of an agent's target on its decision-making. The final reward function of an agent is given in equation (4)
where
The output format of the proposed DQN
There are

Structure of MAS-DQN. MAS: multi-agent system; DQN: Deep Q Network.
Relative observation between agents in the proposed mechanism
In the proposed mechanism, agents execute the orders given by MAS-DQN through mutual observation with each other. As shown in Figure 2, red agents are assigned to assist blue agents. First, they must reach the predetermined observation position; if none of the predetermined positions can be reached, the observation is judged a failure. Otherwise, the agent obtains the pose of the target through relative observation of the target agent. The obtained pose provides a strong constraint for the target agent to optimize its pose graph; in other words, it is used to improve the accuracy of pose correction. This observation has the same effect as loop closure detection. The way of calculating the pose of the target agent is introduced next.

Assistance in MAS based on relative observation. MAS: multi-agent system.
Assume that agent i observes a feature point p, whose coordinates in agent i's camera frame are p_c^i. Its world coordinates p_w are

p_w = R_i p_c^i + t_i

where R_i refers to agent i's rotation matrix, which converts camera coordinates into world coordinates, and t_i refers to the translation vector; R and t together constitute the pose of the robot camera. The world coordinates of a feature point are consistent across agents, so the relationship between the world coordinates of point p and its coordinates in the camera coordinate system of agent j is shown in equation (7)

p_c^j = R_j^{−1} (p_w − t_j)
where R_j and t_j constitute the pose of agent j,
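The two transforms above compose into a single point transfer between camera frames, which can be sketched numerically (an illustrative helper with our own names; for a rotation matrix, the transpose equals the inverse):

```python
import numpy as np

def transfer_point(p_ci, R_i, t_i, R_j, t_j):
    """Map a point seen in agent i's camera frame into agent j's camera frame
    via the shared world frame: p_w = R_i @ p_ci + t_i; p_cj = R_j^T (p_w - t_j)."""
    p_w = R_i @ p_ci + t_i
    return R_j.T @ (p_w - t_j)
```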
where
The derivative of the error term with respect to the pose can be obtained using the Lie algebra perturbation model, and the value of
Effectiveness proof of the proposed algorithm
In this part, we prove the effectiveness of our algorithm, that is, we demonstrate that after experience replay, the proposed DQN can make macro-decisions for each agent that improve the efficiency of the whole MAS. We use utility theory to model the performance of the whole MAS. As shown in the previous section, the SLAM performance of a single agent x can be regarded as the immediate utility value of x, which is associated with
The expectation of the immediate utility value of a non-cooperative MAS is as follows
The calculation for a collaborative MAS is relatively complex, since the effect of collaboration between agents on the overall system utility must be considered. We use a graph to represent the cooperation of the whole system: an agent is regarded as a node, and a directed edge from the agent to its target is established, as shown in Figure 3. If the collaboration has not yet been executed, the edge is a solid line; if the collaboration has just been completed, the edge is a double line; if the collaboration fails, the edge is a dashed line; and if there is no collaboration object, the arrow points to the agent itself. The whole MAS can be divided into several maximally connected subgraphs, as shown in Figure 3, where each subgraph is a group representing the collaboration relationships among its agents. The total utility value of a group
where

Graph of coordination.
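The grouping described above amounts to finding the connected components of the assistance graph, which can be sketched with a small union-find (an illustrative helper under the assumption that `assist[i]` records the agent that i currently targets, with `assist[i] == i` when i has no collaboration object):

```python
def groups(assist):
    """Partition agents into collaboration groups: the connected components
    of the directed assistance graph, ignoring edge direction."""
    n = len(assist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in enumerate(assist):
        parent[find(i)] = find(j)           # union the two endpoints

    comp = {}
    for i in range(n):
        comp.setdefault(find(i), []).append(i)
    return sorted(comp.values())
```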
In summary, calculating the cumulative reward of all agents amounts to calculating the total cumulative utility value. MAS-DQN will gradually learn the cumulative utility function
The last layer of MAS-DQN is the decision-making layer; all layers before it are called the feature extraction layer. The feature extraction layer is in effect a high-level abstraction of the ORBSLAM feature vector. The weights connected to the decision-making layer determine the discrepancies among the agents' decisions: faced with the same extracted high-level features, different actions produce different cumulative rewards. This situation is caused by the different attributes of the agents, which the DQN learns through their cooperation.
Sequences of the proposed mechanism
The implementation steps of the proposed overall cooperation mechanism for multi-agent SLAM are shown in Algorithm 3.
Complete algorithm.
The system consists of mobile agents and an organizer on which MAS-DQN is deployed. The algorithm proceeds as follows. Initialize the whole algorithm and all variables (01). The organizer broadcasts a request to synchronize agent information and waits for all agents to reply (03–04). Each agent responds to the organizer with its world coordinates, built local map, ORBSLAM feature vector, and assistance status (05–07). The local maps are not the final form of the maps; they are only 2-D maps that provide information for global navigation. After the organizer obtains the information returned by all agents (08), it first merges the local maps of all agents and feeds the global map back to them (09–10). It then computes the output of the observation function from the set of ORBSLAM feature vectors of all agents (11). Next, the immediate reward of each agent is calculated (12). Finally, the output is fed to MAS-DQN to obtain the action of each agent, which is then broadcast (13–14).
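One decision round of the organizer can be sketched as follows. All interfaces here (`report`, `receive_map`, `execute`, `act`, and the map-fusion, observation, and reward callables) are hypothetical names standing in for the steps numbered above, not the paper's actual API:

```python
def organizer_step(agents, mas_dqn, fuse_maps, observation_fn, reward_fn):
    """One round: gather agent state (05-07), fuse and redistribute maps (09-10),
    build the observation (11), compute rewards (12), broadcast actions (13-14)."""
    reports = [a.report() for a in agents]                 # pose, map, feature vector
    global_map = fuse_maps([r.local_map for r in reports])
    for a in agents:
        a.receive_map(global_map)
    obs = observation_fn([r.feature_vector for r in reports])
    rewards = [reward_fn(r) for r in reports]
    actions = mas_dqn.act(obs)                             # one macro-action per agent
    for a, act in zip(agents, actions):
        a.execute(act)
    return rewards, actions
```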
Experiment
The simulation experiments introduced in this section evaluate the performance of the proposed mechanism and of MAS-DQN with different parameters. The experiments were implemented on the Robot Operating System (ROS) platform, which can simulate the real physical environment. We build each simulated mobile agent from the “telobot” model, equipped with a “Kinect” as the vision sensor. The initial pose of each robot is known, and ORBSLAM is deployed independently on each robot. The experimental world is shown in Figure 4.

Experimental world.
The main indicators used to measure the effect of SLAM in a MAS are the translation root mean square error (RMSE) and the orientation RMSE of the agents' tracks. Although the agents in the experimental environment share the same model, we embody the differences in their attributes by changing their ability values related to motion and vision sampling. Their maximum angular acceleration, linear acceleration, maximum angular velocity, and maximum linear velocity obey normal distributions
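Drawing heterogeneous agent attributes from normal distributions can be sketched as follows (an illustrative helper; the attribute names, means, and the shared standard deviation are placeholders, since the paper's distribution parameters are not reproduced here):

```python
import numpy as np

def sample_attributes(m, means, sigma, rng=None):
    """Draw per-agent motion ability values from normal distributions,
    one dict per agent; sigma controls how different the agents are."""
    rng = rng or np.random.default_rng(0)
    names = ["max_ang_acc", "max_lin_acc", "max_ang_vel", "max_lin_vel"]
    return [{k: float(rng.normal(means[k], sigma)) for k in names}
            for _ in range(m)]
```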
We set up five experiments to verify the effectiveness of our algorithm. The following are the details of all experiments and the corresponding experimental results. The experiments have been done with
Experiment 1: This experiment shows the role of collaboration in multi-agent SLAM. The results are shown in Figure 5, in which the two algorithms compared are as follows:
Case1: The proposed collaboration mechanism based on MAS-DQN.
Case 2: Each agent performs SLAM independently without any collaboration.

Result of experiment 1.
The proposed algorithm reduces translation RMSE (m) by 35.68% and orientation RMSE (°) by 32.02% compared with the mechanism without coordination.
We use these two different mechanisms to generate octomaps 38 of the same environment, which are shown in Figures 6 and 7.

The octomap built by the proposed mechanism.

The octomap built by the agents in MAS without any coordination. MAS: multi-agent system.
Experiment 2: The main purpose of this experiment is to compare against a mechanism in which agents collaborate without considering attributes and states. The experimental results are shown in Figure 8, which illustrates that the proposed algorithm reduces translation RMSE by 28.33% and orientation RMSE by 31.68%.
Case 1: The proposed collaboration mechanism based on MAS-DQN.
Case 2: Relative localization based on cluster matching algorithm without distance IR in the environment. 39

Result of experiment 2.
Experiment 3: This experiment compares the proposed algorithm with other feature-based MRTA algorithms that treat agent attributes as constants. The experimental results are shown in Figure 9.
Case 1: The proposed collaboration mechanism based on MAS-DQN.
Case 2: The proposed collaboration mechanism based on emotional recruitment model.
Case 3: The proposed collaboration mechanism based on auction model.

Result of experiment 3.
The proposed algorithm reduces translation RMSE (m) by 13.00% and orientation RMSE (°) by 23.01% compared with the mechanism based on the emotional recruitment model. Compared with the auction model, MAS-DQN reduces translation RMSE by 16.03% and orientation RMSE by 28.32%.
Experiment 4: To test the effectiveness of our algorithm under different degrees of difference in agent attributes, experiment 4 was performed with variance
Translation RMSE (m) with different variance.
Orientation RMSE with different variance.
Experiment 5: This experiment compares our algorithm with the multi-agent SLAM algorithm MR-vSLAM, which only uses the prior information of the map to collaborate on localization. 40 The experimental results are shown in Figure 10, which illustrates that the proposed algorithm reduces translation RMSE by 7.81% and orientation RMSE by 13.80%.

Result of experiment 5.
Table 3 compares the computational costs of several algorithms mentioned in this article, where n is the number of agents, k is the number of natural landmarks, L is the number of hidden layers of MAS-DQN, and Cl is the number of neurons in the lth hidden layer.
Comparison of computational cost of the algorithms.
Conclusion
This article proposes a novel centralized multi-agent cooperative SLAM mechanism that considers the status and attributes of agents. The algorithm applies a new observation function based on ORBSLAM to perceive the SLAM state of the whole MAS, and a Rainbow-based deep RL framework called MAS-DQN is designed to learn the overall utility function
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a special fund project of Harbin Science and Technology Innovation Talents Research (grant no. RC2013XK010002) and the National Natural Science Foundation of China (grant no. 61375081).
