Abstract
Efficient exploration in unknown environments is a critical challenge for search and rescue (SAR) robots. Existing methods neglect the balance of movement cost and rescue priority, rely on prior maps, and suffer from redundant paths and high communication overhead. This paper proposes an information-driven POMDP-based multi-robot formation framework for map-free autonomous SAR. We design a POMDP model with environmental semantics and human experience for dynamic cost-priority balance, a grid centroid-based topological path method for full coverage, and a distributed node fusion communication strategy for low-overhead collaboration. Simulation and real-world experiments show that our method reduces SAR completion time by up to 22.1%, cuts 90% of victim rescue time by 36.2%, and lowers path repetition by over 30% compared with state-of-the-art methods. It significantly improves the exploration and collaborative efficiency of multi-robot SAR systems, providing a robust solution for time-critical SAR missions.
Introduction
Autonomous exploration is a fundamental research direction in mobile robotics, which enables robots to navigate efficiently in unknown and complex environments by integrating simultaneous localization and mapping (SLAM)1–4 and path planning algorithms. This technology is of great practical value for search and rescue (SAR) missions, where robots can replace human rescuers to execute tasks in hazardous scenarios such as toxic gas leaks, building fires, and earthquake-induced collapses. It not only reduces the operational risks of rescue workers but also effectively improves the survival probability of trapped persons in disaster areas.
5
However, in debris-filled unstructured environments, traditional SAR robots are prone to falling into local optima and cannot make optimal autonomous decision-making, which often requires manual intervention. This results in low SAR operational efficiency and severely hinders the realization of fully autonomous SAR systems.6,7 SAR missions are typically modeled as coverage path planning (CPP) problems,
8
whose core goal is to design an efficient exploration strategy for robots to traverse all key areas in the target environment and complete the rescue task within a limited time. Although existing research has made considerable progress in CPP, exploration strategies, learning-based decision-making, and multi-robot collaboration, there are still prominent limitations that cannot meet the requirements of real-time and efficient autonomous SAR. Most traditional CPP methods only focus on the priority of the current grid in path planning and fail to establish a quantitative mechanism to balance short-term movement costs and long-term rescue benefits,9–11 leading to frequent path retracing and low rescue efficiency. Frontier-based exploration methods generate random sampling points to select the next exploration target12–14; although increasing sampling points can expand the exploration range, most points are irrelevant to SAR mission objectives, which consumes substantial computational resources and hinders real-time decision-making of robots. Learning-based methods exhibit strong performance in handling complex nonlinear relationships in SAR environments,15–17 but they rely on massive labeled training data and high computational costs, and their generalization ability across diverse unknown disaster scenarios cannot be guaranteed. In addition, traditional path planning algorithms struggle to balance the dual goals of full environmental coverage and minimal redundant motion,18,19 and the traditional point-to-point communication mode of multi-robot systems leads to high communication overhead and easy congestion, which further reduces the collaborative efficiency of SAR missions (Figure 1). In summary, three core research gaps exist in the current autonomous SAR robotic systems: (1) the functional significance of different environmental areas is neglected in the decision-making process, and there is no effective way to dynamically balance movement costs and rescue priorities in unknown environments; (2) most exploration and decision-making methods rely on random sampling or prior environmental maps, which leads to high computational overhead or poor adaptability to completely unknown SAR scenarios; (3) multi-robot collaboration lacks an efficient information-sharing strategy, resulting in redundant exploration and high communication overhead. To fill these gaps, this article proposes a novel multi-source information-driven framework for multi-robot formation SAR, which enables robots to achieve autonomous navigation and optimal decision-making without prior environmental maps. Inspired by the human SAR strategy of using zoning maps and human activity distribution to plan search routes, we integrate environmental semantic information and human rescue experience into the robot decision-making model, and design a dedicated topological path planning method and a distributed multi-robot collaboration strategy to improve the overall SAR efficiency. Specifically, the proposed framework consists of three key components: first, a POMDP decision-making model incorporating environmental semantic information and human experience is established, which uses a reward mechanism based on human activity density to guide robots to high-priority rescue areas and adjusts robot actions dynamically according to the matching relationship between environmental objects and functional areas. Second, a grid centroid-based topological path generation method is developed, which constructs a crisscross passable topological path by taking grid cell centers as fixed sampling points and discretizing the action space into eight directions, and determines the next target point through the breadth-first search (BFS) algorithm to ensure full environmental coverage and minimal path retracing. Third, a distributed multi-robot formation strategy based on node fusion communication is designed, where each robot maintains its own local topological path and uploads exploration information to a public fusion node; the fusion node integrates local data to form a global topological path and redistributes it to all robots, avoiding redundant searching and the inefficiency of traditional point-to-point communication. The main contributions of this work are summarized as follows:
An information-driven POMDP decision model is proposed, which fuses SAR priorities and environmental semantic information to realize the dynamic balance of movement costs and rescue priorities, and improves the autonomous decision-making efficiency of robots in unknown SAR environments. A grid-based topological path planning algorithm for SAR tasks is developed, which adopts deterministic sampling of grid centroids to construct a full-coverage crisscross topological path, and combines the BFS algorithm to determine the next exploration target, effectively reducing path repetition and computational overhead. A distributed multi-robot formation method based on node fusion communication is designed, which realizes the sharing of local topology and exploration information among robots without direct inter-robot communication, minimizing redundant exploration and communication overhead, and improving the collaborative efficiency of multi-robot SAR systems.
The rest of this article is organized as follows: Section “Related Work” reviews the related work in SAR robotic systems and analyzes the limitations of existing methods in detail. Section “Methods” elaborates on the specific design and implementation of the proposed information-driven POMDP model, topological path planning algorithm, and multi-robot formation strategy. Section “Experiments & results” presents the simulation and real-world experiments to verify the effectiveness and superiority of the proposed method. Finally, Section “Conclusion” draws the conclusions and outlines the future research directions.

The search and rescue (SAR) task environment is divided into a grid map with priorities. Each robot makes optimal decisions based on SAR priority and a semantic information-driven POMDP model, and maintains the global topological path through distributed node communication.
Related work
In SAR scenarios, efficient and comprehensive exploration of unknown environments is critical for robotic systems. This task requires robots to possess efficient autonomous decision-making capabilities while navigating in complex terrains and prioritizing areas based on urgency and potential victim presence. Previous works have explored various approaches to address these challenges, but limitations remain in terms of scalability, real-time efficiency, and integration of multi-source information. Early efforts in SAR robotics focused on CPP, which aims to efficiently traverse all specified locations within an environment. Gajjar et al. 20 proposed a cost function to guide robots toward the minimum cost grid, enabling complete traversal. Wang et al. 21 introduced a comprehensive CPP strategy based on the artificial bee colony algorithm, optimizing traditional neural network methods. Zhu et al. 22 applied the Glasius Bio-inspired Neural Network (GBNN) algorithm for multi-autonomous underwater vehicle cooperative CPP. Chen et al. 23 proposed an improved Bio-inspired Neural Network model to reduce the interference of complex obstacle environments on robot decision-making, and introduced a Dual Improved Bio-inspired Neural Network to replace the GBNN. However, these methods often fail to account for the varying importance of different waypoints, treating all areas with equal priority. In SAR scenarios, this limitation can lead to inefficient allocation of rescue efforts and reduced effectiveness of the mission. To address the limitations of traditional CPP methods, recent works have incorporated human experience and environmental semantics into SAR missions. Shih et al. 9 constructed a grid probability map, assigning values to grids based on the likelihood of finding trapped individuals, and used a greedy search algorithm to guide robot navigation. Song et al. 11 and Juan et al. 24 abstracted real-life SAR scenarios into grid maps with priority levels based on environmental characteristics. These methods have limited dynamic decision-making capabilities and ignore long-term benefits and the value of exploring non-urgent areas, resulting in poor SAR effectiveness. We feed the experience-based SAR priority information to SAR robots as a grid area function map, which the robots adopt as long-term rewards in the dynamic decision-making process to enhance SAR efficiency. Meanwhile, the semantic perception module enables the robot to locally fine-tune SAR decisions by matching the correlation between environmental objects and functional areas.
POMDPs have been widely applied to robotic navigation to enable intelligent decision-making in dynamic and partially observable environments. Wu et al. 25 proposed a POMDP framework for robot decision-making, which selects actions to maximize long-term rewards. Aydemir et al. 26 and Niroui et al. 25 incorporated semantic information into POMDP for object search and exploration in unknown environments. However, these methods often rely heavily on prior environmental maps, limiting their applicability in real-world SAR scenarios.
Exploration strategies can be used to address these problems, achieving robot navigation in unknown environments without prior maps. Umari and Mukhopadhyay 27 presented an exploration strategy based on multiple rapidly-exploring random trees (RRT), guiding robots toward unexplored areas by detecting frontiers. Lindqvist et al. 28 proposed an Exploration-RRT algorithm for 3D missions, considering potential information gain and travel costs. Chi et al. 29 introduced a generalized Voronoi diagram-based heuristic path planning method to improve RRT efficiency. Liu et al. 30 proposed a heuristics-biased sampling-based robot exploration strategy; they utilized the semantic information of the environment as the heuristics and eliminated the disadvantages of ignoring the geometric continuity of the environment in traditional RRT. However, these random sampling methods often generate numerous irrelevant nodes during the exploration process, which leads to significant computational overhead and inefficient path planning. This inefficiency is further exacerbated by the need for extensive backtracking and repeated sampling, ultimately resulting in time-consuming exploration processes. We design a path coverage method suitable for SAR missions. By employing a fixed sampling method based on grid centers to generate a crisscross topological path, we ensure high environmental coverage while improving sampling efficiency and reducing path retracing.
In addition, multi-robot systems have been explored to enhance SAR efficiency through collaborative efforts. Murtaza et al. 31 created a priority grid map for unmanned aerial vehicles (UAVs), applying POMDP algorithms to optimize SAR missions. This method requires direct inter-robot communication, leading to high communication overhead and reduced real-time efficiency. We maintain global topological paths through distributed node communication, avoiding the inefficiencies of point-to-point direct communication.
Recently, artificial intelligence-based optimal control laws for uncertain autonomous systems have shed new light on SAR robot decision-making and exploration. For instance, bioinspired machine learning-enabled optimal path planning for faulty UAVs achieves trajectory optimization under hardware fault uncertainty, 32 and the exploration–exploitation adaptive law boosts the adaptability of model-free control for autonomous systems under environmental and model uncertainty. 33 These methods focus on optimal control under single or specific uncertainty via data-driven or bioinspired learning. In contrast, our POMDP framework integrates environmental semantic information and human experience, balancing movement costs and rescue priorities through belief state update, and is thus more suitable for comprehensive exploration and multi-objective decision-making in unknown complex SAR environments. Meanwhile, our grid-based topological path planning and distributed node fusion communication effectively compensate for the limitations of the above methods in large-scale environmental coverage and multi-robot collaborative exploration.
Methods
The overall block diagram of the system is shown in Figure 2. The proposed multi-robot SAR framework hierarchically integrates autonomous exploration, CPP, and POMDP-based decision-making into a closed-loop perception–decision–execution pipeline, with each module providing exclusive and essential information for the upper layer to ensure efficient exploration in unknown environments. Autonomous exploration is the environmental modeling foundation, which fuses LiDAR and SLAM to generate real-time grid occupancy maps and detect inter-grid passability, answering where the robot can move and providing basic environmental data for subsequent modules. CPP acts as the action set construction bridge, which builds a crisscross topological path via grid centroid sampling and screens reachable unsearched grids as the POMDP action set. It addresses how to traverse the environment with minimal retracing and reduces the computational redundancy of POMDP decision-making. POMDP-based decision-making is the top-layer optimal decision core, which fuses environmental semantics, SAR priorities, and the CPP-generated action set to balance movement costs and rescue urgency, select the optimal next grid, and answer where the robot should move first for the SAR mission. The three modules are tightly coupled: autonomous exploration updates the environmental model for CPP, CPP generates feasible actions for POMDP, and POMDP’s optimal decision drives robotic movement to initiate a new exploration cycle. All modules are integrated into the distributed node-based multi-robot formation strategy, ensuring consistent environmental information and collaborative decision-making for the entire formation.

During each planning phase, each robot identifies all next movable grids based on the global topological path and its current position, forming an action set. It then employs the POMDP solver to determine the optimal next grid with the highest expected rewards. After executing the move, the robot acquires a new observation and updates the confidence probabilities using the observation function.
Specifically, the SAR environment is divided into a grid map with prior information, and each grid is assigned priority based on functional characteristics and environmental semantic features. During the exploration process, each robot maintains a crisscross pattern topological path, and these local topological paths are then integrated into a global topological path, which provides all the robots with the next reachable grids set. At each moment, robots calculate the reachable grids set and the most beneficial grid based on the POMDP framework. When the robot reaches the center of the grid, it will obtain the observation data and update the status, representing the completion of the search for the grid.
POMDP formulation
During human-led SAR operations, the urgency and significance of rescue efforts vary greatly depending on the location, and rescuers typically prioritize regions with prior information. Inspired by human SAR strategies, robots are also designed to categorize the environment into different priorities based on prior information, such as the extent of building structural damage and the functional characteristics of each area. In this work, we propose a POMDP framework to improve SAR efficiency by combining SAR priorities and semantic information. The POMDP framework can be described as a seven-component tuple (
The reward function
In the process of POMDP, the value function
Grid-based topological path planning for SAR
Contrary to the stochastic sampling strategy employed by RRTs, this work proposes a deterministic sampling approach to improve the efficiency of the path planning. Specifically, we divide the map into grids and select the geometric centers of each grid as fixed sampling points. As the robot explores, the real-time SLAM-generated occupancy grid map is updated and utilized to detect obstacles between each grid and its eight neighbors. The relationship between two neighboring grids can be divided into three categories, as shown in Figure 3: 0 signifies no obstacle, 1 denotes an obstacle, and

The white area denotes the known region, whereas the gray area signifies the unknown region. The H grid and A grid have a relationship value of 0, indicating they are passable. Similarly, adjacent grid pairs such as G–F, F–I, I–E, and E–D also have a relationship value of 0. Conversely, grids A, B, C, and D are labeled with

The green and yellow colors represent the RRT and our topological path, respectively. Our method achieves higher environmental coverage with fewer sampling points, indicating higher SAR efficiency. RRT: rapidly-exploring random trees; SAR: search and rescue.
Based on the grid topological path, the robot employs a BFS algorithm to identify the set of nearby unsearched grids, referred to as set
The algorithm examines the eight neighboring grids of the current grid. If a neighboring grid has not been searched and there are no obstacles between it and the current grid, it is added to set
Finally, as shown in (5), the robot calculates the benefit of each element in set
The multi-robot formation strategy
In multi-robot formation tasks, the traditional point-to-point communication method requires each robot to send information to all other robots in the formation simultaneously. This mode is not only inefficient in large-scale formations but also prone to communication congestion. Moreover, the method of generating topological paths based on a global map is susceptible to errors during the fusion of sub-maps, which in turn leads to path planning errors and severely affects the efficiency of SAR tasks.
To address these issues, we propose a distributed-node-based multi-robot formation method. As shown in Figure 5, multiple robots are deployed within the environment, each functioning as an independent agent that autonomously performs the SAR mission. The information fusion node is tasked with aggregating and storing the search data and global topological paths from all individual robots. Rather than directly merging occupancy maps, we merge the topological paths of the multiple robots. During task execution, each robot maintains a local grid topological path and publishes its topological path to the fusion node. Furthermore, once a robot finishes searching a grid, it shares the search results and observations of that grid with the fusion node. When the fusion node receives messages from a robot, it processes and merges them to update the search information and global topological paths. The fusion node then distributes these updates to all robots, ensuring each robot has access to the collective field of view. Specifically, we implement this flexible multi-robot formation strategy through topic communication of Robot Operating System, allowing for the easy addition or removal of robots.

Method diagram of multi-robot formation.
We model the multi-robot SAR system as a nonlinear discrete system with input constraints and external disturbances to characterize its dynamic and coupling characteristics:
Uncertainties in the multi-robot SAR system are divided into internal (parametric/non-parametric) and external (modeled/unmodeled) types, with distinct impacts on SAR missions, and all are addressed by the original framework: (1) Internal parametric uncertainty (from hardware parameter deviation causing minor state tracking errors) is compensated by the POMDP belief state
Finally, this distributed-node approach enhances the communication efficiency of multi-robot formations and improves the accuracy of the global topological path by sharing topological paths instead of maps, thereby avoiding the global topological path inaccuracies caused by map drift errors during the fusion process, and consequently increasing the efficiency of SAR tasks.
Experiments & results
Robot exploration in simulations
We use a laptop with Ubuntu 20.04 as the simulation platform, equipped with a 13th Gen Intel Core i7-13700K CPU and an NVIDIA GeForce RTX 2060 Super GPU. We use gazebo to set up the simulation scenarios to verify the efficiency of our SAR approach, as shown in Figure 6. The size of each scene is 20 m

The environment is divided into red, orange, yellow, and white areas according to the intensity of people from high to low.
The priority categories are assigned in grids, with the number of trapped people determined based on these categories. Specifically, there are four priority levels, with their respective distributions, so the set of states is defined as
We conduct simulation experiments by using a two-wheeled differential robot equipped with a lidar. In each scene, we use one robot for independent SAR and three robots for collaborative SAR. In addition, we compare our method with San Juan et al. 24 and Umari and Mukhopadhyay 27 to demonstrate the advantages. Following San Juan et al. 24 and Umari and Mukhopadhyay, 27 we directly assign values to each grid to represent priority, and the robot then selects the next grid based on both priority and distance.
We carry out single-robot experiments and multi-robot experiments in the simulated scenarios, respectively. The evolution of the passable topological paths over time during a single robot experiment is illustrated in Figure 6. At every moment, the topological path dynamically fills the entire known space, connecting all the grids in the known space and establishing the connectivity between them. It can be seen that the global topological path becomes gradually complete over time. Our method can achieve efficient coverage of the SAR environment by using only a few sampling points.
The evolution of the passable topological paths during the multi-robot experiment is depicted in Figure 7. Specifically, the solid lines of different colors represent the local topological paths of different robots, and three robots are positioned in the upper right, lower left, and lower right corners, respectively. Each robot independently makes decisions and explores while sharing information to merge passable topological paths and search results. As a result, the passable topological paths expand simultaneously from three directions. Compared with the single-robot framework, the proposed multi-robot formation strategy can complete the SAR of the entire environment in a shorter time, which significantly improves the SAR efficiency. To eliminate any potential contingencies, we conduct 10 experimental and comparative tests in each scene for both single-robot simulation and multi-robot simulation. The changing curves of explored grids and searched victims during the single robot search process are shown in Figure 8, and the quantitative metric results are summarized in Table 1. The red curve illustrates the performance of our method, while the orange curve and purple curve represent the result of San Juan et al. 24 and Umari and Mukhopadhyay, 27 respectively. We can find that the rate of change is initially rapid but gradually diminishes over time. Although Umari and Mukhopadhyay 27 have a relatively high exploration efficiency at the beginning, the subsequent growth trend is very slow. Our method consistently explores more grids and detects more trapped persons than the comparison test at most times.

The passable topological paths of three robots in three scenes.

The explored grids and searched persons during the experiment for a single robot.
The result of the time consumed by single-robot experiments.
Note. All results are the mean
The SAR mission is finished when all the grids have been explored. The robot moves at a rapid speed in the initial stages and progressively decelerates as it advances toward task completion. Finally, we take the average of the results from 10 experiments. Experimental results reveal that our methods needs 873, 996, and 1407 s to complete the SAR mission in scenes S1, S2, and S3, respectively. By comparison, the method proposed by San Juan et al.
24
requires corresponding 924, 1290, and 1542 s, respectively. And the method proposed by Umari and Mukhopadhyay
27
is prone to fall into local extreme values in the later stage, resulting in the problem of SAR failure. Due to the deceleration in the final stage of the task, we use 90% completion as the benchmark for comparison. The average time taken by our method to complete 90% of the grids is 771, 844, and 1140 s for these three scenes, respectively. This indicates that the time required by our method is reduced by up to 22.1% and 21.5%, respectively, compared with San Juan et al.
24
and Umari and Mukhopadhyay.
27
The small standard deviation values of the proposed method in all time indicators (
Moreover, the average time required to search for 90% of the victims using our approach is 731, 838, and 990 s, respectively. This indicates that the time required by our method is reduced by up to 27.5% in S3 and 36.2% in S3, respectively, compared with San Juan et al. 24 and Umari and Mukhopadhyay. 27 The results highlight that our method becomes increasingly advantageous as the obstacle density rises, resulting in a more pronounced performance improvement. The changing curves of explored grids and searched victims during the multi-robot search process are shown in Figure 9. Table 2 shows the average number of explored grids obtained from multiple experiments, and Table 3 shows the average time. For the number of explored grids in multi-robot experiments (Table 3), the proposed method has a small standard deviation for the single robot exploration amount and the total exploration amount, which indicates that the distributed node communication strategy achieves balanced task allocation among multiple robots, and avoids the large fluctuation of exploration efficiency caused by uneven task distribution in San Juan et al. 24 and Umari and Mukhopadhyay. 27

The explored grids and searched persons during the multi-robot search process. r1, r2, and r3 represent three robots, respectively.
The number of explored grids in the multi-robot experiment.
Note. All results are the mean
The result of the time consumed by multi-robot experiments.
Note. All results are the mean
It can be seen that Umari and Mukhopadhyay
27
explored more grids in the early stage to find more people waiting for rescue. However, due to the randomness of the growth of its tree structure, it is prone to fall into a local extremum in the unstructured affected area. In the later stage of the experiment, the changing trend of its curve is not obvious, and the efficiency is relatively low. In terms of the three indicators of completion time, the time to explore 90% of the grids, and the time to rescue 90% of the personnel, our method reduces the time by up to 9.6%, 38.3%, and 32.3%, respectively, compared with Umari and Mukhopadhyay,
27
and reduces the time by up to 18.3%, 17.8%, and 30.1%, respectively, compared with San Juan et al.
24
This indicates that our multi-robot experimental method has a more obvious improvement in the execution efficiency of SAR tasks. The multi-robot time consumption results (Table 3) further verify the stability of the proposed method: the standard deviation of all time indicators is <31.8 s, which is lower than that of San Juan et al.
24
(
Real-world experiments
To further validate the efficiency of our SAR method, we also carried out real-world experimental studies. Experimental environments consists of a

Platform, pre-setting, and result of real-world experiments. (a) Real scene R1. (b) Curve of R1. (c) Real scene R2. (d) Curve of R2.
Figure 10(b) and (d), respectively, shows the change curves of the rescued personnel in these two experiments, and Table 4 shows the time required for the rescue of all personnel under different methods. It can be seen that our method always tends to identify the trapped people at the first moment, and the efficiency of SAR is relatively high. In R1, compared with San Juan et al.
24
and Umari and Mukhopadhyay,
27
the SAR required time of our method was reduced by 9.4% and 31.6%, respectively, and the required time in R2 was reduced by 30.1% and 38.4%, respectively. The small standard deviation values of the proposed method (

The upper column is trajectory of R1, and the lower column is trajectory of R2 (positive direction is from left to right).
The result of the time consumed by multi-robot experiments.
Note. All results are the mean
Key symbols and definitions in the proposed framework.
Conclusion
In this article, we have proposed a robot exploration strategy for unknown, complex, obstacle-filled SAR environments. Unlike traditional map-building exploration, our strategy aims for a comprehensive environmental search, requiring the robot to traverse the entire environment. This article has featured a POMDP model incorporating environmental semantics and human experience to balance movement costs and rescue priorities. It also integrated autonomous exploration with CPP, using grid centroids as fixed sampling points to construct a topological path for complete environmental coverage. Additionally, a distributed multi-robot formation strategy based on node fusion communication allowed information sharing without direct inter-robot communication, minimizing redundant efforts. Simulation and real-world experiments have demonstrated that our methods significantly improved exploration efficiency and reduced path repetition compared to state-of-the-art methods. Overall, our method has provided a robust solution for decision-making in time-critical SAR scenarios, improving the success rates and efficiency of SAR missions. In future work, we will add the visual recognition module and further refine the resolution of the prior grid map to improve the generalization ability and flexibility of our method.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by State Grid JiangXi Information & Telecommunication Branch Project Funding (Contract No.: 52183524000P).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
