
Abstract

Efficient exploration in unknown environments is a critical challenge for search and rescue (SAR) robots. Existing methods neglect the balance between movement cost and rescue priority, rely on prior maps, and suffer from redundant paths and high communication overhead. This paper proposes an information-driven multi-robot formation framework based on a partially observable Markov decision process (POMDP) for map-free autonomous SAR. We design a POMDP model with environmental semantics and human experience for dynamic cost-priority balance, a grid centroid-based topological path method for full coverage, and a distributed node fusion communication strategy for low-overhead collaboration. Simulation and real-world experiments show that, compared with state-of-the-art methods, our method reduces SAR completion time by up to 22.1%, cuts the time needed to rescue 90% of victims by 36.2%, and lowers path repetition by over 30%. It significantly improves the exploration and collaborative efficiency of multi-robot SAR systems, providing a robust solution for time-critical SAR missions.

Keywords

Mobility and navigation, robotic exploration, scheduling and planning, multi-robot system

Introduction

Autonomous exploration is a fundamental research direction in mobile robotics, which enables robots to navigate efficiently in unknown and complex environments by integrating simultaneous localization and mapping (SLAM)^1–4 and path planning algorithms. This technology is of great practical value for search and rescue (SAR) missions, where robots can replace human rescuers to execute tasks in hazardous scenarios such as toxic gas leaks, building fires, and earthquake-induced collapses. It not only reduces the operational risks of rescue workers but also effectively improves the survival probability of trapped persons in disaster areas.⁵ However, in debris-filled unstructured environments, traditional SAR robots are prone to falling into local optima and cannot make optimal autonomous decisions, so manual intervention is often required. This results in low SAR operational efficiency and severely hinders the realization of fully autonomous SAR systems.^6,7 SAR missions are typically modeled as coverage path planning (CPP) problems,⁸ whose core goal is to design an efficient exploration strategy for robots to traverse all key areas in the target environment and complete the rescue task within a limited time. Although existing research has made considerable progress in CPP, exploration strategies, learning-based decision-making, and multi-robot collaboration, there are still prominent limitations that cannot meet the requirements of real-time and efficient autonomous SAR. Most traditional CPP methods only focus on the priority of the current grid in path planning and fail to establish a quantitative mechanism to balance short-term movement costs and long-term rescue benefits,^9–11 leading to frequent path retracing and low rescue efficiency.
Frontier-based exploration methods generate random sampling points to select the next exploration target^12–14; although increasing sampling points can expand the exploration range, most points are irrelevant to SAR mission objectives, which consumes substantial computational resources and hinders real-time decision-making of robots. Learning-based methods exhibit strong performance in handling complex nonlinear relationships in SAR environments,^15–17 but they rely on massive labeled training data and high computational costs, and their generalization ability across diverse unknown disaster scenarios cannot be guaranteed. In addition, traditional path planning algorithms struggle to balance the dual goals of full environmental coverage and minimal redundant motion,^18,19 and the traditional point-to-point communication mode of multi-robot systems leads to high communication overhead and easy congestion, which further reduces the collaborative efficiency of SAR missions (Figure 1). In summary, three core research gaps exist in the current autonomous SAR robotic systems: (1) the functional significance of different environmental areas is neglected in the decision-making process, and there is no effective way to dynamically balance movement costs and rescue priorities in unknown environments; (2) most exploration and decision-making methods rely on random sampling or prior environmental maps, which leads to high computational overhead or poor adaptability to completely unknown SAR scenarios; (3) multi-robot collaboration lacks an efficient information-sharing strategy, resulting in redundant exploration and high communication overhead. To fill these gaps, this article proposes a novel multi-source information-driven framework for multi-robot formation SAR, which enables robots to achieve autonomous navigation and optimal decision-making without prior environmental maps. 
Inspired by the human SAR strategy of using zoning maps and human activity distribution to plan search routes, we integrate environmental semantic information and human rescue experience into the robot decision-making model, and design a dedicated topological path planning method and a distributed multi-robot collaboration strategy to improve the overall SAR efficiency. Specifically, the proposed framework consists of three key components: first, a POMDP decision-making model incorporating environmental semantic information and human experience is established, which uses a reward mechanism based on human activity density to guide robots to high-priority rescue areas and adjusts robot actions dynamically according to the matching relationship between environmental objects and functional areas. Second, a grid centroid-based topological path generation method is developed, which constructs a crisscross passable topological path by taking grid cell centers as fixed sampling points and discretizing the action space into eight directions, and determines the next target point through the breadth-first search (BFS) algorithm to ensure full environmental coverage and minimal path retracing. Third, a distributed multi-robot formation strategy based on node fusion communication is designed, where each robot maintains its own local topological path and uploads exploration information to a public fusion node; the fusion node integrates local data to form a global topological path and redistributes it to all robots, avoiding redundant searching and the inefficiency of traditional point-to-point communication. The main contributions of this work are summarized as follows:

An information-driven POMDP decision model is proposed, which fuses SAR priorities and environmental semantic information to realize the dynamic balance of movement costs and rescue priorities, and improves the autonomous decision-making efficiency of robots in unknown SAR environments.

A grid-based topological path planning algorithm for SAR tasks is developed, which adopts deterministic sampling of grid centroids to construct a full-coverage crisscross topological path, and combines the BFS algorithm to determine the next exploration target, effectively reducing path repetition and computational overhead.

A distributed multi-robot formation method based on node fusion communication is designed, which realizes the sharing of local topology and exploration information among robots without direct inter-robot communication, minimizing redundant exploration and communication overhead, and improving the collaborative efficiency of multi-robot SAR systems.

The rest of this article is organized as follows: Section “Related Work” reviews the related work in SAR robotic systems and analyzes the limitations of existing methods in detail. Section “Methods” elaborates on the specific design and implementation of the proposed information-driven POMDP model, topological path planning algorithm, and multi-robot formation strategy. Section “Experiments & results” presents the simulation and real-world experiments to verify the effectiveness and superiority of the proposed method. Finally, Section “Conclusion” draws the conclusions and outlines the future research directions.

Figure 1.

The search and rescue (SAR) task environment is divided into a grid map with priorities. Each robot makes optimal decisions based on SAR priority and a semantic information-driven POMDP model, and maintains the global topological path through distributed node communication.

Related work

In SAR scenarios, efficient and comprehensive exploration of unknown environments is critical for robotic systems. This task requires robots to possess efficient autonomous decision-making capabilities while navigating in complex terrains and prioritizing areas based on urgency and potential victim presence. Previous works have explored various approaches to address these challenges, but limitations remain in terms of scalability, real-time efficiency, and integration of multi-source information. Early efforts in SAR robotics focused on CPP, which aims to efficiently traverse all specified locations within an environment. Gajjar et al.²⁰ proposed a cost function to guide robots toward the minimum cost grid, enabling complete traversal. Wang et al.²¹ introduced a comprehensive CPP strategy based on the artificial bee colony algorithm, optimizing traditional neural network methods. Zhu et al.²² applied the Glasius Bio-inspired Neural Network (GBNN) algorithm for multi-autonomous underwater vehicle cooperative CPP. Chen et al.²³ proposed an improved Bio-inspired Neural Network model to reduce the interference of complex obstacle environments on robot decision-making, and introduced a Dual Improved Bio-inspired Neural Network to replace the GBNN. However, these methods often fail to account for the varying importance of different waypoints, treating all areas with equal priority. In SAR scenarios, this limitation can lead to inefficient allocation of rescue efforts and reduced effectiveness of the mission. To address the limitations of traditional CPP methods, recent works have incorporated human experience and environmental semantics into SAR missions. Shih et al.⁹ constructed a grid probability map, assigning values to grids based on the likelihood of finding trapped individuals, and used a greedy search algorithm to guide robot navigation. 
Song et al.¹¹ and San Juan et al.²⁴ abstracted real-life SAR scenarios into grid maps with priority levels based on environmental characteristics. These methods have limited dynamic decision-making capabilities and ignore long-term benefits and the value of exploring non-urgent areas, resulting in poor SAR effectiveness. In contrast, we feed the experience-based SAR priority information to SAR robots as a grid area function map, which the robots adopt as long-term rewards in the dynamic decision-making process to enhance SAR efficiency. Meanwhile, the semantic perception module enables the robot to locally fine-tune SAR decisions by matching the correlation between environmental objects and functional areas.

POMDPs have been widely applied to robotic navigation to enable intelligent decision-making in dynamic and partially observable environments. Wu et al.²⁵ proposed a POMDP framework for robot decision-making, which selects actions to maximize long-term rewards. Aydemir et al.²⁶ and Niroui et al.²⁵ incorporated semantic information into POMDP for object search and exploration in unknown environments. However, these methods often rely heavily on prior environmental maps, limiting their applicability in real-world SAR scenarios.

Exploration strategies can be used to address these problems, achieving robot navigation in unknown environments without prior maps. Umari and Mukhopadhyay²⁷ presented an exploration strategy based on multiple rapidly-exploring random trees (RRT), guiding robots toward unexplored areas by detecting frontiers. Lindqvist et al.²⁸ proposed an Exploration-RRT algorithm for 3D missions, considering potential information gain and travel costs. Chi et al.²⁹ introduced a generalized Voronoi diagram-based heuristic path planning method to improve RRT efficiency. Liu et al.³⁰ proposed a heuristics-biased sampling-based robot exploration strategy; they utilized the semantic information of the environment as the heuristics and eliminated the disadvantages of ignoring the geometric continuity of the environment in traditional RRT. However, these random sampling methods often generate numerous irrelevant nodes during the exploration process, which leads to significant computational overhead and inefficient path planning. This inefficiency is further exacerbated by the need for extensive backtracking and repeated sampling, ultimately resulting in time-consuming exploration processes. We design a path coverage method suitable for SAR missions. By employing a fixed sampling method based on grid centers to generate a crisscross topological path, we ensure high environmental coverage while improving sampling efficiency and reducing path retracing.

In addition, multi-robot systems have been explored to enhance SAR efficiency through collaborative efforts. Murtaza et al.³¹ created a priority grid map for unmanned aerial vehicles (UAVs), applying POMDP algorithms to optimize SAR missions. This method requires direct inter-robot communication, leading to high communication overhead and reduced real-time efficiency. We maintain global topological paths through distributed node communication, avoiding the inefficiencies of point-to-point direct communication.

Recently, artificial intelligence-based optimal control laws for uncertain autonomous systems have shed new light on SAR robot decision-making and exploration. For instance, bioinspired machine learning-enabled optimal path planning for faulty UAVs achieves trajectory optimization under hardware fault uncertainty,³² and the exploration–exploitation adaptive law boosts the adaptability of model-free control for autonomous systems under environmental and model uncertainty.³³ These methods focus on optimal control under single or specific uncertainty via data-driven or bioinspired learning. In contrast, our POMDP framework integrates environmental semantic information and human experience, balancing movement costs and rescue priorities through belief state update, and is thus more suitable for comprehensive exploration and multi-objective decision-making in unknown complex SAR environments. Meanwhile, our grid-based topological path planning and distributed node fusion communication effectively compensate for the limitations of the above methods in large-scale environmental coverage and multi-robot collaborative exploration.

Methods

The overall block diagram of the system is shown in Figure 2. The proposed multi-robot SAR framework hierarchically integrates autonomous exploration, CPP, and POMDP-based decision-making into a closed-loop perception–decision–execution pipeline, with each module providing exclusive and essential information for the upper layer to ensure efficient exploration in unknown environments. Autonomous exploration is the environmental modeling foundation, which fuses LiDAR and SLAM to generate real-time grid occupancy maps and detect inter-grid passability, answering where the robot can move and providing basic environmental data for subsequent modules. CPP acts as the action set construction bridge, which builds a crisscross topological path via grid centroid sampling and screens reachable unsearched grids as the POMDP action set. It addresses how to traverse the environment with minimal retracing and reduces the computational redundancy of POMDP decision-making. POMDP-based decision-making is the top-layer optimal decision core, which fuses environmental semantics, SAR priorities, and the CPP-generated action set to balance movement costs and rescue urgency, select the optimal next grid, and answer where the robot should move first for the SAR mission. The three modules are tightly coupled: autonomous exploration updates the environmental model for CPP, CPP generates feasible actions for POMDP, and POMDP’s optimal decision drives robotic movement to initiate a new exploration cycle. All modules are integrated into the distributed node-based multi-robot formation strategy, ensuring consistent environmental information and collaborative decision-making for the entire formation.

Figure 2.

During each planning phase, each robot identifies all next movable grids based on the global topological path and its current position, forming an action set. It then employs the POMDP solver to determine the optimal next grid with the highest expected rewards. After executing the move, the robot acquires a new observation and updates the confidence probabilities using the observation function.

Specifically, the SAR environment is divided into a grid map with prior information, and each grid is assigned priority based on functional characteristics and environmental semantic features. During the exploration process, each robot maintains a crisscross pattern topological path, and these local topological paths are then integrated into a global topological path, which provides all the robots with the next reachable grids set. At each moment, robots calculate the reachable grids set and the most beneficial grid based on the POMDP framework. When the robot reaches the center of the grid, it will obtain the observation data and update the status, representing the completion of the search for the grid.

POMDP formulation

During human-led SAR operations, the urgency and significance of rescue efforts vary greatly depending on the location, and rescuers typically prioritize regions with prior information. Inspired by human SAR strategies, robots are also designed to categorize the environment into different priorities based on prior information, such as the extent of building structural damage and the functional characteristics of each area. In this work, we propose a POMDP framework to improve SAR efficiency by combining SAR priorities and semantic information. The POMDP framework can be described as a seven-component tuple ( $S, A, Ω, T, R, O, γ$ ), with details as follows:

$S$ : a finite set of states representing the different priority areas in the SAR environment; the size of this set corresponds to the number of environmental priorities.

$A$ : a finite set of actions, which represents the grids the robot can move to in the next step, following the crisscross pattern topological path proposed by our path planning method.

$Ω$ : a finite set of observations, representing the observation of trapped persons. It is assumed that areas with higher priority exhibit higher levels of human activity. The robot automatically obtains the observation and updates the belief $b(s)$ when it arrives at a grid with humans.

$T(s, a, s^{'}) = T(s^{'} | s, a)$ : the probability of transitioning from $s$ to $s^{'}$ with action $a$ .

$R(s, a) = [P(s) / max(P(s))] - [C(a) / max(C(a))]$ : the reward for executing action $a$ in state $s$ , where:

$C(a) = α \cdot d(a) + β \cdot O(a)$ is the movement cost, and $d(a)$ is the grid movement distance of action $a$ in the original topology path (adjacent = 1, diagonal = 2).

$P(s) = δ \cdot P(o_{1} | s)$ is the rescue priority, and $P(o_{1} | s)$ represents the probability of observing high human activity $o_{1}$ in the state $s$ .

The reward function $R(s, a)$ is the core of the dynamic balance between movement costs $C(a)$ and rescue priorities $P(s)$ . By maximizing $R(s, a)$ , the robot autonomously selects actions with high rescue priority and low movement cost. The normalization with $max(P(s))$ and $max(C(a))$ unifies the metric dimensions, ensuring the balanced tradeoff is integrated into the POMDP belief update $b(s)$ and optimal policy $π^{*}$ of the original model.

$O(o, s^{'}, a) = P_{r}(o | s^{'}, a)$ : the probability of observing $o$ if state $s^{'}$ is reached after executing action $a$ on the previous time step.

$γ$ $(0 \leq γ \leq 1)$ : the discount factor for all future rewards.

$b(s)$ : the confidence probability for each possible state. In the POMDP model, since the environment is partially observable, the robot cannot directly determine its exact current state. Instead, it maintains a probability distribution representing its belief $b(s)$ over each possible state. After taking action $a$ and receiving observation $o$ , the robot updates its belief from $b(s)$ to $b^{'}(s^{'})$ using the Bayesian belief update in (1).

$$
\begin{aligned}
b^{'}(s^{'}) &= \frac{O(o, s^{'}, a) \sum_{s} T(s, a, s^{'}) \, b(s)}{P_{r}(o \mid b, a)} \\
P_{r}(o \mid b, a) &= \sum_{s^{'}} O(o, s^{'}, a) \sum_{s} T(s, a, s^{'}) \, b(s)
\end{aligned}
\tag{1}
$$
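The belief update in (1) can be illustrated with a minimal Python sketch. The function name `belief_update`, the list-based layout of $T$ and $O$, and the identity transition used in the example are assumptions made for illustration; the observation probabilities are the $P(o_{1} | s)$ values reported for the simulation setup.

```python
def belief_update(b, T_a, O_a, o):
    """Bayesian belief update for one fixed action a.

    b    : list, b[s] is the current belief over states
    T_a  : T_a[s][s2] = T(s, a, s2)
    O_a  : O_a[s2][o] = O(o, s2, a)
    o    : index of the received observation
    Returns (new belief, P_r(o | b, a)).
    """
    n = len(b)
    # Prediction step: sum_s T(s, a, s') b(s)
    predicted = [sum(T_a[s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Correction step: weight by the observation likelihood O(o, s', a)
    unnormalized = [O_a[s2][o] * predicted[s2] for s2 in range(n)]
    p_o = sum(unnormalized)  # normalizer P_r(o | b, a)
    if p_o == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / p_o for u in unnormalized], p_o


# Illustrative example: four priority states, an identity transition
# (an assumption, not stated in the text), and P(o1 | s) per priority level.
P_O1 = [0.6, 0.55, 0.5, 0.35]
T_a = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
O_a = [[p, 1.0 - p] for p in P_O1]  # columns: o1, o2
b0 = [0.25, 0.25, 0.25, 0.25]

b1, p_o1 = belief_update(b0, T_a, O_a, o=0)  # robot observes o1
```

Starting from a uniform belief, observing $o_{1}$ shifts probability mass toward the higher-priority states, which is the behavior the reward mechanism relies on.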

In a POMDP, the value function $V_{n}^{*}(b)$ is a fundamental concept that quantifies the expected total discounted reward when the agent executes the optimal policy $π^{*}$ , as shown in (2).

$$
V_{n}^{*}(b) = \max_{a \in A} \Big[ ρ(b, a) + γ \sum_{o \in Ω} P_{r}(o \mid b, a) \, V_{n-1}^{*}(b^{'}) \Big]
\tag{2}
$$

Specifically, $V_{n}^{*}(b)$ represents the maximum expected cumulative reward that can be achieved over $n$ time steps, considering the uncertainty in the state and the partial observability of the environment. The optimal policy $π^{*}$ is determined by selecting the action that maximizes the total discounted reward at each belief state $b$ . This decision-making process is guided by the Bellman principle of optimality, which states that an optimal policy has the property that, regardless of the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. Applying the Bellman principle to the value function $V_{n}^{*}(b)$ , we can decompose it into the sum of the immediate reward and the expected future total discounted reward for the remaining $n - 1$ time steps. The immediate reward $ρ$ in (2) is defined as the expected reward when the agent takes action $a$ in belief state $b$ :

$$
ρ(b, a) = \sum_{s} b(s) \, R(s, a)
\tag{3}
$$
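As a worked illustration of the reward $R(s, a)$ and the expected immediate reward in (3), the following Python sketch may help. The helper names, the choice $δ = 1$ , and the use of the grid distance $d(a)$ alone as the movement cost are simplifying assumptions for this example, not values given in the text.

```python
def reward_table(P, C):
    """R[s][a] = P(s) / max(P) - C(a) / max(C), the normalized reward."""
    p_max, c_max = max(P), max(C)
    return [[p / p_max - c / c_max for c in C] for p in P]


def expected_reward(b, R, a):
    """rho(b, a) = sum_s b(s) R(s, a), as in (3)."""
    return sum(b[s] * R[s][a] for s in range(len(b)))


# Rescue priority P(s) = delta * P(o1 | s), with delta = 1 assumed here.
P = [0.6, 0.55, 0.5, 0.35]
# Movement cost C(a): only the distance term is used in this example
# (adjacent move = 1, diagonal move = 2); the second cost term is omitted.
C = [1.0, 2.0]

R = reward_table(P, C)
b = [0.25, 0.25, 0.25, 0.25]  # uniform belief over the four priorities
rho_adjacent = expected_reward(b, R, a=0)
rho_diagonal = expected_reward(b, R, a=1)
```

Under a uniform belief the cheaper adjacent move has the higher expected reward; the ranking can change once the belief concentrates on a high-priority state that is only reachable by a costlier move.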

Furthermore, the horizon $h \in \mathbb{N}$ denotes the number of steps the agent considers in the planning process. While infinite-horizon POMDPs can be solved by using methods such as PBVI and SARSOP, these approaches are computationally intensive due to the complexity of optimizing over large belief spaces. Given the time-sensitive nature of SAR missions, where rapid decision-making is crucial, the infinite horizon approach is not suitable. Instead, we adopt a finite horizon approach to facilitate efficient decision-making. The solution formula for the finite horizon value function $V_{n}^{*}(b)$ is presented as follows:

$$
\begin{aligned}
V_{n}^{*}(b) &= \max_{a \in A} \Big[ ρ(b, a) + γ \sum_{o \in Ω} P_{r}(o \mid b, a) \, V_{n-1}^{*}(b^{'}) \Big] \\
V_{n-1}^{*}(b) &= \max_{a \in A} \Big[ ρ(b, a) + γ \sum_{o \in Ω} P_{r}(o \mid b, a) \, V_{n-2}^{*}(b^{'}) \Big] \\
&\;\;\vdots \\
V_{0}^{*}(b) &= \max_{a \in A} \big[ ρ(b, a) \big] = \max_{a \in A} \Big[ \sum_{s} b(s) \, R(s, a) \Big]
\end{aligned}
\tag{4}
$$

This method recursively decomposes the value function into smaller subproblems, allowing for efficient computation of the optimal policy over a finite number of steps. By using this finite horizon approach, we can balance the tradeoff between computational complexity and the quality of the solution, making it suitable for real-time SAR missions.
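The recursion in (4) can be sketched directly in Python. This is a minimal illustration under stated assumptions, not the solver used in the experiments: the function name `finite_horizon_value`, the nested-list layout of $T$ , $O$ , and $R$ , and the two-state toy problem are all hypothetical.

```python
def finite_horizon_value(b, n, R, T, O, gamma):
    """Return (V_n^*(b), best action) by direct recursion over beliefs.

    R[s][a]     : reward R(s, a)
    T[a][s][s2] : transition probability T(s, a, s2)
    O[a][s2][o] : observation probability O(o, s2, a)
    """
    n_states, n_actions, n_obs = len(b), len(R[0]), len(O[0][0])
    best_value, best_action = float("-inf"), None
    for a in range(n_actions):
        # Immediate expected reward rho(b, a)
        q = sum(b[s] * R[s][a] for s in range(n_states))
        if n > 0:
            predicted = [sum(T[a][s][s2] * b[s] for s in range(n_states))
                         for s2 in range(n_states)]
            for o in range(n_obs):
                unnorm = [O[a][s2][o] * predicted[s2] for s2 in range(n_states)]
                p_o = sum(unnorm)  # P_r(o | b, a)
                if p_o > 0.0:
                    b_next = [u / p_o for u in unnorm]
                    v_next, _ = finite_horizon_value(b_next, n - 1, R, T, O, gamma)
                    q += gamma * p_o * v_next
        if q > best_value:
            best_value, best_action = q, a
    return best_value, best_action


# Toy problem (illustrative only): two states, two actions, static state,
# and an uninformative observation model so the belief never changes.
R = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
T = [identity, identity]
O = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
b = [0.7, 0.3]

v0, a0 = finite_horizon_value(b, 0, R, T, O, gamma=0.9)
v1, a1 = finite_horizon_value(b, 1, R, T, O, gamma=0.9)
v2, a2 = finite_horizon_value(b, 2, R, T, O, gamma=0.9)
```

The number of belief nodes visited by this direct recursion grows as $(|A| \cdot |Ω|)^{n}$ , so the horizon $n$ has to stay small for real-time use, which matches the motivation for the finite horizon approach.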

Grid-based topological path planning for SAR

Contrary to the stochastic sampling strategy employed by RRTs, this work proposes a deterministic sampling approach to improve the efficiency of the path planning. Specifically, we divide the map into grids and select the geometric centers of each grid as fixed sampling points. As the robot explores, the real-time SLAM-generated occupancy grid map is updated and utilized to detect obstacles between each grid and its eight neighbors. The relationship between two neighboring grids can be divided into three categories, as shown in Figure 3: 0 signifies no obstacle, 1 denotes an obstacle, and $- 1$ indicates an unknown status. As shown in Figure 4, we connect all mutually passable fixed sampling points with yellow lines, forming a passable topological path that represents the accessible area. Compared with the RRTs, our approach eliminates the time-consuming sampling of numerous unnecessary points, resulting in a more uniform and extensive topological path.

Figure 3.

The white area denotes the known region, whereas the gray area signifies the unknown region. The H grid and A grid have a relationship value of 0, indicating they are passable. Similarly, adjacent grid pairs such as G–F, F–I, I–E, and E–D also have a relationship value of 0. Conversely, grids A, B, C, and D are labeled with $- 1$ , indicating they are not passable due to unknown obstacles between them.

Figure 4.

The green and yellow colors represent the RRT and our topological path, respectively. Our method achieves higher environmental coverage with fewer sampling points, indicating higher SAR efficiency. RRT: rapidly-exploring random trees; SAR: search and rescue.

Based on the grid topological path, the robot employs a BFS algorithm to identify the set of nearby unsearched grids, referred to as set $A$ in POMDP. Specifically, the BFS algorithm starts from the grid where the robot is currently located and expands outward layer by layer, prioritizing grids that are closer in distance.

The algorithm examines the eight neighboring grids of the current grid. If a neighboring grid has not been searched and there are no obstacles between it and the current grid, it is added to set $A$ . The algorithm then continues to examine the neighbors of these newly added grids, repeating the process until all reachable unsearched grids are included in set $A$ . This process avoids repeated searches of grids that have already been explored, thereby improving search efficiency.
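The BFS step described above can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that the search may pass through already searched grids to reach unsearched ones; `relation` stands in for the grid relationship value (0 = no obstacle, 1 = obstacle, $-1$ = unknown).

```python
from collections import deque

# Offsets of the eight neighboring grids
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def reachable_unsearched(start, rows, cols, searched, relation):
    """Breadth-first search from the robot's grid.

    Returns the reachable unsearched grids, ordered by BFS layer
    (closer grids first). An edge is passable only if relation(...) == 0.
    """
    queue = deque([start])
    visited = {start}
    action_set = []
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue  # outside the grid map
            if nxt in visited or relation((r, c), nxt) != 0:
                continue  # already expanded, blocked, or unknown
            visited.add(nxt)
            queue.append(nxt)
            if nxt not in searched:
                action_set.append(nxt)
    return action_set


# Example: 3 x 3 grid map, every edge touching grid (1, 1) is blocked.
def example_relation(g1, g2):
    return 1 if (1, 1) in (g1, g2) else 0


A = reachable_unsearched(
    start=(0, 0), rows=3, cols=3,
    searched={(0, 0), (0, 1)},
    relation=example_relation,
)
```

In this example the blocked grid (1, 1) and the two searched grids are excluded, and the remaining six grids are returned with the adjacent grid (1, 0) first.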

Finally, the robot calculates the benefit of each element in set $A$ and selects the element with the maximum benefit as the next destination. The topological path on which this selection operates is built with the edge cost function in (5), which quantifies the passability and coverage redundancy cost of topological path edges between grid centroids $g_{i}$ and $g_{j}$ . Only the edges with $C(g_{i}, g_{j}) < 1$ are constructed as valid topological paths, which screens out unpassable and redundant grid connections in advance. It ensures the topological path has full environmental coverage with minimal path retracing, and provides an optimal path basis for the subsequent BFS algorithm to generate the robot’s action set $A$ . This method not only enhances search efficiency and avoids path retracing due to random sampling, but also ensures that the robot can make autonomous decisions and navigate in unknown environments to complete the SAR mission as quickly as possible.

$$
C(g_{i}, g_{j}) = λ \cdot | R(g_{i}, g_{j}) | + (1 - λ) \cdot U(g_{i}, g_{j}), \quad λ \in [0.6, 0.8]
\tag{5}
$$

where $g_{i}$ and $g_{j}$ are fixed sampling points of grid centroids, $R(g_{i}, g_{j})$ is the original grid obstacle relationship value (0 = no obstacle, 1 = obstacle, $-1$ = unknown), and $U(g_{i}, g_{j})$ is the original grid exploration status (0 = not explored, 1 = explored).

The multi-robot formation strategy

In multi-robot formation tasks, the traditional point-to-point communication method requires each robot to send information to all other robots in the formation simultaneously. This mode is not only inefficient in large-scale formations but also prone to communication congestion. Moreover, the method of generating topological paths based on a global map is susceptible to errors during the fusion of sub-maps, which in turn leads to path planning errors and severely affects the efficiency of SAR tasks.

To address these issues, we propose a distributed-node-based multi-robot formation method. As shown in Figure 5, multiple robots are deployed within the environment, each functioning as an independent agent that autonomously performs the SAR mission. The information fusion node is tasked with aggregating and storing the search data and global topological paths from all individual robots. Rather than directly merging occupancy maps, we merge the topological paths of the multiple robots. During task execution, each robot maintains a local grid topological path and publishes its topological path to the fusion node. Furthermore, once a robot finishes searching a grid, it shares the search results and observations of that grid with the fusion node. When the fusion node receives messages from a robot, it processes and merges them to update the search information and global topological paths. The fusion node then distributes these updates to all robots, ensuring each robot has access to the collective field of view. Specifically, we implement this flexible multi-robot formation strategy through Robot Operating System (ROS) topic communication, which allows robots to be added or removed easily.

Figure 5.

Method diagram of multi-robot formation.
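The merging role of the fusion node can be sketched in plain Python as below. This is an illustrative sketch only: the class and method names are hypothetical, the ROS topic transport is omitted, and keeping the first reported observation for a grid is an assumption rather than a rule stated in the text.

```python
class FusionNode:
    """Merges local topological paths and search results from all robots."""

    def __init__(self):
        self.edges = set()   # global topological path as undirected edges
        self.searched = {}   # grid -> observation recorded for that grid

    def receive(self, edges, results):
        """Merge one robot's message into the global state."""
        for g1, g2 in edges:
            # frozenset makes (g1, g2) and (g2, g1) the same edge
            self.edges.add(frozenset((g1, g2)))
        for grid, observation in results.items():
            # Assumption: the first report for a grid is kept
            self.searched.setdefault(grid, observation)

    def snapshot(self):
        """Global state to be redistributed to every robot."""
        return set(self.edges), dict(self.searched)


node = FusionNode()
# Robot 1 reports two edges and one searched grid
node.receive(edges=[((0, 0), (0, 1)), ((0, 1), (0, 2))],
             results={(0, 0): "o2"})
# Robot 2 reports an overlapping edge (reversed) and a new edge
node.receive(edges=[((0, 1), (0, 0)), ((1, 0), (0, 0))],
             results={(1, 0): "o1", (0, 0): "o1"})

global_edges, global_searched = node.snapshot()
```

Because edges are stored as unordered pairs, the overlapping edge reported by both robots appears only once in the global topological path, so no robot needs to compare its path with every other robot directly.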

We model the multi-robot SAR system as a nonlinear discrete system with input constraints and external disturbances to characterize its dynamic and coupling characteristics:

$$
x_{k+1} = f(x_{k}, u_{k}) + w_{k}, \quad u_{min} \leq u_{k} \leq u_{max}
\tag{6}
$$

where $x_{k}$ denotes the robot’s grid position and heading (matching the topological path grid state), $f(x_{k}, u_{k})$ is the nonlinear state transition function for differential drive robots, $u_{k}$ is the constrained control input (linear/angular velocity), and $w_{k}$ is the external disturbance (e.g., ground jitter in SAR environments). The global optimization problem for the SAR system is defined as:

$$
\min J = \sum_{k} C(a_{k}), \quad \max R_{total} = \sum_{k} R(s_{k}, a_{k})
\tag{7}
$$

$$
\bigcup_{g \in V} S(g) = S_{total}
\tag{8}
$$

where $C(a_{k})$ and $R(s_{k}, a_{k})$ are the movement cost and reward function in the original POMDP model, and $\bigcup_{g \in V} S(g) = S_{total}$ is the full coverage constraint of the topological path. The optimization goal is to minimize the total movement cost and maximize the rescue reward under the constraints of system dynamics, input limits, and full environmental coverage. For unknown sudden uncertainties in real-time SAR applications, the proposed framework adapts via real-time update of the topological path, distributed information redundancy, and dynamic adjustment of the POMDP belief state $b(s)$ .

Uncertainties in the multi-robot SAR system are divided into internal (parametric/non-parametric) and external (modeled/unmodeled) types, with distinct impacts on SAR missions, and all are addressed by the original framework: (1) Internal parametric uncertainty (from hardware parameter deviation causing minor state tracking errors) is compensated by the POMDP belief state $b(s)$ update via real-time observation $o \in Ω$ to correct state estimation errors. (2) Internal non-parametric uncertainty (from unmodeled dynamics like robot slip causing path errors) is suppressed by the grid-based topological path planning and BFS algorithm, where deterministic grid sampling and passable connection $(0 / 1 / -1)$ limit robot motion to feasible ranges, reducing path planning errors. (3) External modeled uncertainty (from predictable disturbances such as slight airflow) is filtered by the real-time SLAM occupancy grid map via dynamic environmental update to eliminate disturbance interference. (4) External unmodeled uncertainty (from unpredictable environmental changes/hardware faults, the main factor reducing SAR efficiency) is overcome by the original distributed node fusion communication and global topological path update mechanism. In real-time applications, the magnitude and structure of unmodeled uncertainties are fully unknown; the fusion node can update the global topological path in real time with local robot information and dynamic SLAM maps, enabling the system to adapt to unknown environmental changes autonomously. Meanwhile, each robot’s independent local topological path maintenance ensures that single-robot faults do not affect the overall mission, and the fusion node can directly allocate the faulty robot’s task to others based on the existing global topological path.

Finally, this distributed-node approach enhances the communication efficiency of multi-robot formations and improves the accuracy of the global topological path by sharing topological paths instead of maps, thereby avoiding the global topological path inaccuracies caused by map drift errors during the fusion process, and consequently increasing the efficiency of SAR tasks.
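The idea of sharing topological paths rather than maps can be sketched as a merge of per-robot edge tables at the fusion node. The Python example below is an illustrative sketch only: the function names, the representation of an edge as a pair of grid serial numbers, and the conflict rule (a known state overrides unknown, and an obstacle report overrides a passable report) are assumptions of this example rather than details taken from the framework.

```python
# Edge states, following the (0 / 1 / -1) convention used for grid relations.
PASSABLE, OBSTACLE, UNKNOWN = 0, 1, -1


def merge_edge(a, b):
    """Combine two reports for the same edge (conservative rule, assumed here)."""
    if a == UNKNOWN:
        return b
    if b == UNKNOWN:
        return a
    return max(a, b)  # OBSTACLE (1) wins over PASSABLE (0)


def fuse_local_paths(local_paths):
    """Merge per-robot edge tables into one global topological path."""
    global_path = {}
    for edges in local_paths:
        for edge, state in edges.items():
            previous = global_path.get(edge, UNKNOWN)
            global_path[edge] = merge_edge(previous, state)
    return global_path


# Hypothetical reports from three robots; keys are (i, j) grid serial numbers.
r1 = {(0, 1): PASSABLE, (1, 2): UNKNOWN}
r2 = {(1, 2): PASSABLE, (2, 3): OBSTACLE}
r3 = {(2, 3): PASSABLE, (3, 4): UNKNOWN}
fused = fuse_local_paths([r1, r2, r3])
print(fused)
```

Because only small edge tables are exchanged, the message size in this sketch grows with the number of topological edges rather than with the resolution of an occupancy map.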

Experiments & results

Robot exploration in simulations

We use a laptop running Ubuntu 20.04 as the simulation platform, equipped with a 13th Gen Intel Core i7-13700K CPU and an NVIDIA GeForce RTX 2060 Super GPU. We use Gazebo to set up the simulation scenarios that verify the efficiency of our SAR approach, as shown in Figure 6. Each scene measures 20 m $\times$ 20 m, and different numbers of obstacles are used to vary the complexity of the environment. The prior map for each scene is a 10 $\times$ 10 grid map with the same division of SAR priorities.

Figure 6.

The environment is divided into red, orange, yellow, and white areas according to the density of people, from high to low.

The priority categories are assigned per grid, and the number of trapped people in a grid is determined by its category. There are four priority levels, so the set of states is defined as $S = \{s_{1}, s_{2}, s_{3}, s_{4}\}$ and the set of observations is $Ω = \{o_{1}, o_{2}\}$, where $o_{1}$ indicates a high level of human activity in the grid and $o_{2}$ indicates a low level. We assign the following probabilities of observing $o_{1}$ for the four priorities: 0.6, 0.55, 0.5, and 0.35. Correspondingly, the probabilities of observing $o_{2}$ are 0.4, 0.45, 0.5, and 0.65, respectively. When the robot moves to the center of a new grid, it acquires an observation $o \in \{o_{1}, o_{2}\}$ drawn according to these probabilities, which reflects the inferred level of human activity within that grid.
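The effect of one observation on the belief state can be illustrated with a short Bayesian update that uses the observation probabilities listed above. This is a minimal sketch, assuming that the priority of a grid does not change when the robot observes it, so the transition term of the general POMDP update drops out and $b^{'} (s) \propto P (o \mid s) b (s)$. The names `P_O1`, `likelihood`, and `update_belief` are hypothetical.

```python
# P(o1 | s_i) for the four priority levels s1..s4, as given in the text.
P_O1 = [0.6, 0.55, 0.5, 0.35]


def likelihood(obs, state):
    """Return P(obs | state); o2 is the complement of o1."""
    p = P_O1[state]
    return p if obs == "o1" else 1.0 - p


def update_belief(belief, obs):
    """One Bayes step, assuming a static state (identity transition)."""
    unnormalised = [likelihood(obs, s) * b for s, b in enumerate(belief)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Start from a uniform belief over the four priority levels (assumption).
belief = [0.25, 0.25, 0.25, 0.25]
after_o1 = update_belief(belief, "o1")
after_o2 = update_belief(belief, "o2")
print([round(b, 3) for b in after_o1])
print([round(b, 3) for b in after_o2])
```

Starting from a uniform belief, observing $o_{1}$ shifts probability mass toward the higher-priority states, and observing $o_{2}$ shifts it toward the lower-priority states. Because the four likelihoods are close to one another, a single observation moves the belief only moderately.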

We conduct simulation experiments using a two-wheeled differential-drive robot equipped with a lidar. In each scene, we use one robot for independent SAR and three robots for collaborative SAR. We compare our method with those of San Juan et al.²⁴ and Umari and Mukhopadhyay²⁷ to demonstrate its advantages. Following San Juan et al.²⁴ and Umari and Mukhopadhyay,²⁷ we directly assign values to each grid to represent priority, and the robot then selects the next grid based on both priority and distance.

We carry out both single-robot and multi-robot experiments in the simulated scenarios. The evolution of the passable topological paths over time during a single-robot experiment is illustrated in Figure 6. At every moment, the topological path dynamically fills the entire known space, connecting all the grids in the known space and establishing the connectivity between them. The global topological path gradually becomes complete over time. Our method achieves efficient coverage of the SAR environment using only a few sampling points.

The evolution of the passable topological paths during the multi-robot experiment is depicted in Figure 7. The solid lines of different colors represent the local topological paths of different robots, and the three robots are positioned in the upper right, lower left, and lower right corners, respectively. Each robot makes decisions and explores independently while sharing information to merge passable topological paths and search results. As a result, the passable topological paths expand simultaneously from three directions. Compared with the single-robot framework, the proposed multi-robot formation strategy completes the SAR of the entire environment in a shorter time, which significantly improves SAR efficiency. To rule out chance results, we conduct 10 repeated trials of our method and of each comparison method in every scene, for both the single-robot and the multi-robot simulations. The curves of explored grids and found victims during the single-robot search are shown in Figure 8, and the quantitative results are summarized in Table 1. The red curve shows the performance of our method, while the orange and purple curves show the results of San Juan et al.²⁴ and Umari and Mukhopadhyay,²⁷ respectively. The rate of change is initially rapid and gradually diminishes over time. Although Umari and Mukhopadhyay²⁷ achieve relatively high exploration efficiency at the beginning, their subsequent growth is very slow. For most of the run, our method explores more grids and detects more trapped persons than the comparison methods.

Figure 7.

The passable topological paths of three robots in three scenes.

Figure 8.

The explored grids and searched persons during the experiment for a single robot.

Table 1.

The result of the time consumed by single-robot experiments.

Scene	Occupied grids	Completion time (s)			90% grids time (s)			90% persons time (s)
		Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷
Scene 1 (S1)	70	873 $\pm$ 26.3	924 $\pm$ 31.5	Fail	771 $\pm$ 21.7	858 $\pm$ 28.9	958 $\pm$ 35.2	731 $\pm$ 19.8	851 $\pm$ 27.6	Fail
Scene 2 (S2)	102	996 $\pm$ 30.5	1290 $\pm$ 42.8	Fail	844 $\pm$ 25.6	1084 $\pm$ 36.7	1075 $\pm$ 34.9	838 $\pm$ 24.5	1068 $\pm$ 33.8	1278 $\pm$ 41.6
Scene 3 (S3)	138	1407 $\pm$ 38.9	1542 $\pm$ 45.2	Fail	1140 $\pm$ 32.8	1378 $\pm$ 40.5	Fail	990 $\pm$ 29.7	1366 $\pm$ 39.4	1552 $\pm$ 46.8

Note. All results are the mean $\pm$ standard deviation of 10 repeated experiments.

The SAR mission is finished when all the grids have been explored. The robot moves at a rapid speed in the initial stages and progressively decelerates as it advances toward task completion. We report the average over 10 experiments. Our method needs 873, 996, and 1407 s to complete the SAR mission in scenes S1, S2, and S3, respectively. By comparison, the method of San Juan et al.²⁴ requires 924, 1290, and 1542 s, respectively, while the method of Umari and Mukhopadhyay²⁷ tends to fall into local extrema in the later stage and fails to complete the SAR mission. Because of the deceleration in the final stage of the task, we use 90% completion as the benchmark for comparison. The average time taken by our method to explore 90% of the grids is 771, 844, and 1140 s for the three scenes, respectively, which is a reduction of up to 22.1% and 21.5% compared with San Juan et al.²⁴ and Umari and Mukhopadhyay,²⁷ respectively. The small standard deviations of the proposed method across all time indicators ( $\leq$ 38.9 s) also reflect the high stability of the algorithm in single-robot SAR tasks. They are consistently lower than those of San Juan et al.²⁴ and Umari and Mukhopadhyay,²⁷ which means our method fluctuates less across repeated experiments and is more robust to environmental uncertainty.
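The "up to" reductions quoted for the 90% grids time can be reproduced from the mean values in Table 1 as the relative difference (baseline minus ours, divided by baseline), taking the largest value across scenes. The short Python check below is a sketch of that arithmetic; the function and variable names are hypothetical, and scenes where a baseline is reported as "Fail" are excluded from the comparison.

```python
def reduction_pct(ours, baseline):
    """Relative time reduction of our method versus a baseline, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Mean 90% grids time (s) from Table 1.
ours = {"S1": 771, "S2": 844, "S3": 1140}
san_juan = {"S1": 858, "S2": 1084, "S3": 1378}
umari = {"S1": 958, "S2": 1075}  # S3 omitted: reported as "Fail"


def best_reduction(ours, baseline):
    """Largest per-scene reduction, rounded to one decimal place."""
    return max(round(reduction_pct(ours[k], baseline[k]), 1) for k in baseline)


vs_san_juan = best_reduction(ours, san_juan)
vs_umari = best_reduction(ours, umari)
print(vs_san_juan, vs_umari)
```

Both maxima occur in S2, so the headline figures describe the best case; in S1, for example, the reduction relative to San Juan et al.²⁴ is about 10%.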

Moreover, the average time required to find 90% of the victims using our approach is 731, 838, and 990 s, respectively. This is a reduction of up to 27.5% and 36.2% (both in S3) compared with San Juan et al.²⁴ and Umari and Mukhopadhyay,²⁷ respectively. The results show that the advantage of our method grows as the obstacle density rises. The curves of explored grids and found victims during the multi-robot search are shown in Figure 9. Table 2 shows the average number of explored grids obtained from multiple experiments, and Table 3 shows the average time. For the number of explored grids in the multi-robot experiments (Table 2), the proposed method has a small standard deviation for both the per-robot and the total exploration amounts. This indicates that the distributed node communication strategy achieves balanced task allocation among the robots and avoids the large fluctuations in exploration efficiency caused by uneven task distribution in San Juan et al.²⁴ and Umari and Mukhopadhyay.²⁷

Figure 9.

The explored grids and searched persons during the multi-robot search process. r1, r2, and r3 represent three robots, respectively.

Table 2.

The number of explored grids in the multi-robot experiment.

Scene	Occupied grids	Robot 1			Robot 2			Robot 3			Total
		Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷
Scene 1 (S1)	70	39.4 $\pm$ 4.2	42.4 $\pm$ 5.6	36.1 $\pm$ 6.8	41.4 $\pm$ 4.5	48.9 $\pm$ 6.2	38.5 $\pm$ 5.9	41.8 $\pm$ 4.7	42.7 $\pm$ 5.8	36.1 $\pm$ 6.5	122.6 $\pm$ 8.3	134 $\pm$ 10.5	95.8 $\pm$ 12.1
Scene 2 (S2)	102	40.1 $\pm$ 4.6	47.2 $\pm$ 6.1	34.9 $\pm$ 7.2	42.5 $\pm$ 4.8	43.1 $\pm$ 5.7	35.7 $\pm$ 6.3	41.4 $\pm$ 4.9	47.6 $\pm$ 6.5	36.2 $\pm$ 6.9	124 $\pm$ 8.7	137.9 $\pm$ 11.2	84.7 $\pm$ 13.5
Scene 3 (S3)	138	44 $\pm$ 5.1	42.9 $\pm$ 5.8	33.8 $\pm$ 7.5	43.1 $\pm$ 5.2	41 $\pm$ 5.4	37.3 $\pm$ 6.6	42.6 $\pm$ 5.3	39.5 $\pm$ 6.1	37.1 $\pm$ 7.1	129.7 $\pm$ 9.2	123.4 $\pm$ 10.8	90.1 $\pm$ 12.8

Note. All results are the mean $\pm$ standard deviation of 10 repeated experiments.

Table 3.

The result of the time consumed by multi-robot experiments.

Scene	Occupied grids	Completion time (s)			90% grids time (s)			90% persons time (s)
		Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷	Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷
Scene 1 (S1)	70	779 $\pm$ 22.5	851 $\pm$ 28.7	788 $\pm$ 30.2	541 $\pm$ 18.6	598 $\pm$ 23.5	528 $\pm$ 25.8	488 $\pm$ 16.9	583 $\pm$ 24.7	577 $\pm$ 26.3
Scene 2 (S2)	102	818 $\pm$ 25.7	1001 $\pm$ 35.6	905 $\pm$ 32.9	597 $\pm$ 21.8	668 $\pm$ 27.4	967 $\pm$ 38.5	581 $\pm$ 19.7	647 $\pm$ 25.9	785 $\pm$ 31.6
Scene 3 (S3)	138	979 $\pm$ 31.8	1180 $\pm$ 40.3	1011 $\pm$ 36.7	651 $\pm$ 24.9	792 $\pm$ 32.6	1008 $\pm$ 39.8	527 $\pm$ 18.5	754 $\pm$ 30.5	779 $\pm$ 32.4

Note. All results are the mean $\pm$ standard deviation of 10 repeated experiments.

Umari and Mukhopadhyay²⁷ explore more grids in the early stage and thus find more people waiting for rescue. However, because its tree structure grows randomly, this method is prone to falling into a local extremum in the unstructured affected area. In the later stage of the experiment, its curve changes little and its efficiency is relatively low. For the three indicators of completion time, time to explore 90% of the grids, and time to rescue 90% of the personnel, our method reduces the time by up to 9.6%, 38.3%, and 32.3%, respectively, compared with Umari and Mukhopadhyay,²⁷ and by up to 18.3%, 17.8%, and 30.1%, respectively, compared with San Juan et al.²⁴ This indicates that our multi-robot method clearly improves the execution efficiency of SAR tasks. The multi-robot time consumption results (Table 3) further verify the stability of the proposed method: the standard deviation of all time indicators is $\leq$ 31.8 s, which is lower than that of San Juan et al.²⁴ ( $\leq$ 40.3 s) and Umari and Mukhopadhyay²⁷ ( $\leq$ 39.8 s). The low standard deviation means that the global topological path fusion based on node communication effectively reduces the randomness of multi-robot collaborative exploration, making the SAR task completion time more stable in repeated experiments.

Real-world experiments

To further validate the efficiency of our SAR method, we also carried out real-world experiments. The experimental environments consist of a 4 m $\times$ 4 m scene R1, which is divided into a 5 $\times$ 5 grid map, and a 6 m $\times$ 3 m scene R2, which is divided into a 6 $\times$ 3 grid map, as depicted in Figure 10(a) and (c), respectively. The prior grid map has two priority levels, and more trapped individuals are placed in the higher-priority grids to simulate a more urgent search scenario (red points mark trapped individuals, and the orange line marks the high-priority area). In the real-world experiments, we use the TurtleBot3-Burger robot, as shown in Figure 10(a). We also conduct comparative experiments with San Juan et al.²⁴ and Umari and Mukhopadhyay²⁷ to highlight the performance differences.

Figure 10.

Platform, pre-setting, and result of real-world experiments. (a) Real scene R1. (b) Curve of R1. (c) Real scene R2. (d) Curve of R2.

Figure 10(b) and (d) show the curves of rescued personnel in the two experiments, and Table 4 shows the time required to rescue all personnel under the different methods. Our method tends to identify the trapped people earliest, and its SAR efficiency is relatively high. In R1, compared with San Juan et al.²⁴ and Umari and Mukhopadhyay,²⁷ the time required by our method is reduced by 9.4% and 31.6%, respectively, and in R2 it is reduced by 30.1% and 38.4%, respectively. The small standard deviations of the proposed method (12.4 s in R1 and 5.8 s in R2) demonstrate its high stability in real-world SAR scenarios, and they are consistently lower than those of San Juan et al.²⁴ and Umari and Mukhopadhyay.²⁷ This indicates that the integrated framework of topological path planning and POMDP decision-making effectively reduces the impact of real environmental noise and sensor errors on exploration efficiency, leading to more consistent and reliable rescue performance in practical applications. Figure 11 shows the trajectories of the robot in the two real-world experiments. The robot gives priority to exploring areas with a higher density of people. This indicates that our method is efficient for SAR missions: it rescues trapped individuals in a timely manner and shortens the overall SAR time.

Figure 11.

The upper row shows the trajectory in R1, and the lower row shows the trajectory in R2 (time progresses from left to right).

Table 4.

The result of the time consumed by real-world experiments.

Scene	Occupied grids	100% persons time (s)
		Ours	San Juan et al.²⁴	Umari and Mukhopadhyay²⁷
Scene 1 (R1)	25	365 $\pm$ 12.4	403 $\pm$ 15.7	534 $\pm$ 21.6
Scene 2 (R2)	18	130 $\pm$ 5.8	186 $\pm$ 9.2	211 $\pm$ 10.5

Note. All results are the mean $\pm$ standard deviation of 10 repeated experiments.

Table 5.

Key symbols and definitions in the proposed framework.

Module	Symbol	Definition
POMDP model	$S$	Finite state set, representing search and rescue (SAR) priority areas of the environment
	$A$	Finite action set, reachable grids for robot via topological path
	$Ω$	Finite observation set ( $o_{1}$ : high human activity; $o_{2}$ : low human activity)
	$T (s, a, s^{'})$	State transition probability from $s$ to $s^{'}$ by executing action $a$
	$R (s, a)$	Immediate reward function for action $a$ in state $s$
	$O (o, s^{'}, a)$	Observation probability of $o$ after transitioning to $s^{'}$ via $a$
	$γ$	Discount factor for future rewards ( $0 \leq γ \leq 1$ )
	$b (s)$	Belief state, confidence probability of robot in state $s$
	$b^{'} (s^{'})$	Updated belief state after observation and action execution
	$V_{n}^{*} (b)$	Finite-horizon optimal value function over $n$ time steps
	$ρ (b, a)$	Expected immediate reward in belief state $b$ via action $a$
	$π^{*}$	Optimal decision policy of POMDP
Cost & Reward	$C (a)$	Movement cost of robot executing action $a$
	$d (a)$	Grid movement distance of action $a$ (adjacent = 1, diagonal = $\sqrt{2}$ )
	$O (a)$	Obstacle cost factor of action $a$ (0 = passable, 1 = obstacle)
	$α, β$	Weight coefficients for movement cost ( $α + β = 1$ )
	$P (s)$	Rescue priority of state $s$
	$P (o_{1} \mid s)$	Probability of observing $o_{1}$ in state $s$
	$δ$	Weight coefficient for rescue priority ( $δ = 1$ )
Topological path planning	$g_{i}, g_{j}$	Fixed sampling points of grid centroids ( $i, j$ : grid serial numbers)
	$C (g_{i}, g_{j})$	Cost function of topological edge between $g_{i}$ and $g_{j}$
	$R (g_{i}, g_{j})$	Grid obstacle relationship (0 = passable, 1 = obstacle, $- 1$ = unknown)
	$U (g_{i}, g_{j})$	Grid exploration state (0 = unexplored, 1 = explored)
	$λ$	Weight coefficient for topological edge cost ( $0.6 \leq λ \leq 0.8$ )
	$V$	Vertex set of topological path (all grid centroids)
	$E$	Edge set of topological path (passable $g_{i}$ - $g_{j}$ connections)
	$S (g), S_{total}$	Coverage area of single grid/global SAR environment
System modeling	$x_{k}$	Robot state at time $k$ (grid position + heading)
	$u_{k}$	Robot constrained control input (linear/angular velocity)
	$u_{min}, u_{max}$	Lower/upper limits of control input
	$f (x_{k}, u_{k})$	Nonlinear state transition function for differential drive robot
	$w_{k}$	External disturbance in SAR environment (ground jitter, airflow, etc.)

Conclusion

In this article, we have proposed a robot exploration strategy for unknown, complex, obstacle-filled SAR environments. Unlike traditional map-building exploration, our strategy aims for a comprehensive environmental search, which requires the robot to traverse the entire environment. The strategy features a POMDP model that incorporates environmental semantics and human experience to balance movement costs and rescue priorities. It also integrates autonomous exploration with CPP, using grid centroids as fixed sampling points to construct a topological path for complete environmental coverage. In addition, a distributed multi-robot formation strategy based on node fusion communication allows information sharing without direct inter-robot communication, which minimizes redundant effort. Simulation and real-world experiments have demonstrated that our method significantly improves exploration efficiency and reduces path repetition compared with state-of-the-art methods. Overall, our method provides a robust solution for decision-making in time-critical SAR scenarios, improving the success rate and efficiency of SAR missions. In future work, we will add a visual recognition module and further refine the resolution of the prior grid map to improve the generalization ability and flexibility of our method.

Footnotes

ORCID iD

Shaolong Chang

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by State Grid JiangXi Information & Telecommunication Branch Project Funding (Contract No.: 52183524000P).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Chen, Chi, et al. A survey of autonomous robots and multi-robot navigation: Perception, planning and collaboration. Biomim Intell Robot 2025; 5: 100203.
2. Zhang, et al. Anti-degeneracy scheme for lidar SLAM based on particle filter in geometry feature-less environments. IEEE Robot Autom Lett 2025; 10: 6784–6791.
3. Zhang, et al. An adaptive compensation strategy for sensors based on the degree of degradation. Biomim Intell Robot 2025; 5: 100235.
4. Xiao, et al. DOA: A degeneracy optimization agent with adaptive pose compensation capability based on deep reinforcement learning. arXiv preprint arXiv:2507.19742, 2025.
5. Yuan, Liu, Chi, et al. A Gaussian mixture model based fast motion planning method through online environmental feature learning. IEEE Trans Ind Electron 2023; 70: 3955–3965.
6. Liu, Nejat and Doroodgar. Learning based semi-autonomous control for robots in urban search and rescue. In: 2012 IEEE international symposium on safety, security, and rescue robotics, College Station, TX, USA, pp.1–6. IEEE.
7. Doroodgar, Liu and Nejat. A learning-based semi-autonomous controller for robotic exploration of unknown disaster scenes while searching for victims. IEEE Trans Cybern 2017; 44: 2719–2732.
8. Galceran and Carreras. A survey on coverage path planning for robotics. Rob Auton Syst 2013; 61: 1258–1276.
9. Shih, Tsai and Lin. A speed up approach for search and rescue. In: 2018 IEEE international conference on systems, man, and cybernetics, Miyazaki, Japan, pp.4178–4183. IEEE.
10. Waharte and Trigoni. Supporting search and rescue operations with UAVs. In: 2010 international conference on emerging security technologies, Canterbury, UK, pp.142–147. IEEE.
11. Song, Yu, Qiu, et al. Multi-UAV disaster environment coverage planning with limited-endurance. In: 2022 international conference on robotics and automation, Philadelphia, PA, USA, pp.10760–10766. IEEE.
12. Uslu, Çakmak, Balcılar, et al. Implementation of frontier-based exploration algorithm for an autonomous robot. In: 2015 international symposium on innovations in intelligent systems and applications, Madrid, Spain, pp.1–7. IEEE.
13. Brugali and Yujra. Simultaneous frontier-based exploration and topological mapping. In: 2023 seventh IEEE international conference on robotic computing (IRC), Laguna Hills, CA, USA, pp.219–222. IEEE.
14. Fang, Ding and Wang. Autonomous robotic exploration based on frontier point optimization and multistep path planning. IEEE Access 2019; 7: 46104–46113.
15. Wang, Chi, et al. Neural RRT*: Learning-based optimal path planning. IEEE Trans Autom Sci Eng 2020; 17: 1748–1758.
16. Wang, Liu, Chen, et al. Robot path planning via neural-network-driven prediction. IEEE Trans Artif Intell 2022; 3: 451–460.
17. Wang, Shen, Nan, et al. An end-to-end path planner combining potential field method with deep reinforcement learning. IEEE Sens J 2024; 24: 26584–26591.
18. Almasri, Alajlan and Elleithy. Trajectory planning and collision avoidance algorithm for mobile robotics system. IEEE Sens J 2016; 16: 5021–5028.
19. Zhang, et al. High-traversability and precise navigation for mobile robots in constrained environments. IEEE Sens J 2025; 25: 22815–22826.
20. Gajjar, Bhadani, Dutta, et al. Complete coverage path planning algorithm for known 2D environment. In: 2017 2nd IEEE international conference on recent trends in electronics, information & communication technology, pp.963–967. IEEE.
21. Yang, Xing, Li, et al. Research on artificial bee colony method based complete coverage path planning algorithm for search and rescue robot. In: 2022 5th international symposium on autonomous systems, Hangzhou, China, pp.1–6. IEEE.
22. Zhu, Tian, Jiang, et al. Multi-AUVs cooperative complete coverage path planning based on GBNN algorithm. In: 2017 29th Chinese control and decision conference, Chongqing, China, pp.6761–6766. IEEE.
23. Wang, Cheng, Wang, et al. Efficient object search with belief road map using mobile robot. IEEE Robot Autom Lett 2018; 3: 3081–3088.
24. San Juan, Santos and Andújar. Intelligent UAV map generation and discrete path planning for search and rescue operations. Complexity 2018; 2018: 6879419.
25. Niroui, Sprenger and Nejat. Robot exploration in unknown cluttered environments when dealing with uncertainty. In: 2017 IEEE international symposium on robotics and intelligent sensors, Ottawa, ON, Canada, pp.224–229. IEEE.
26. Aydemir, Pronobis, Göbelbecker, et al. Active visual object search in unknown environments using uncertain semantics. IEEE Trans Robot 2013; 29: 986–1002.
27. Umari and Mukhopadhyay. Autonomous robotic exploration based on multiple rapidly-exploring randomized trees. In: 2017 IEEE/RSJ international conference on intelligent robots and systems, Vancouver, BC, Canada, pp.1396–1402. IEEE.
28. Lindqvist, Agha-Mohammadi and Nikolakopoulos. Exploration-RRT: A multi-objective path planning and exploration framework for unknown and unstructured environments. arXiv preprint arXiv:2104.03724, 2021.
29. Chi, Ding, Wang, et al. A generalized Voronoi diagram-based efficient heuristic path planning method for RRTs in mobile robots. IEEE Trans Ind Electron 2022; 69: 4926–4937.
30. Liu, Yuan, et al. An efficient robot exploration method based on heuristics biased sampling. IEEE Trans Ind Electron 2023; 70: 7102–7112.
31. Murtaza, Kanhere and Jha. Priority-based coverage path planning for aerial wireless sensor networks. In: 2013 IEEE eighth international conference on intelligent sensors, sensor networks and information processing, Melbourne, VIC, Australia, pp.219–224. IEEE.
32. Tutsoy, Asadi, Ahmadi, et al. Minimum distance and minimum time optimal path planning with bioinspired machine learning algorithms for faulty unmanned air vehicles. IEEE Trans Intell Transp Syst 2024; 25: 9069–9077.
33. Tutsoy, Barkana and Balikci. A novel exploration–exploitation-based adaptive law for intelligent model-free control approaches. IEEE Trans Cybern 2021; 53: 329–337.

An information-driven POMDP framework for multi-robot formation in autonomous search and rescue
