Abstract
Mobile robots have been used to explore novel environments and build useful maps for navigation. Although sensor-based random tree techniques have been used extensively for exploration, they are not efficient for time-critical applications since the robot may visit the same place more than once during backtracking. In this paper, a novel, simple yet effective heuristic backtracking algorithm is proposed to reduce the exploration time and distance travelled. The new algorithm is based on the selection of the most informative node to approach during backtracking. A new environmental complexity metric is developed to evaluate the exploration complexity of different structured environments and thus enable a fair comparison between exploration techniques. An evaluation index is also developed to encapsulate the total performance of an exploration technique in a single number for the comparison of techniques. The developed backtracking algorithm is tested through computer simulations for several structured environments to verify its effectiveness using the developed complexity metric and the evaluation index. The results confirmed significant performance improvement using the proposed algorithm. The new evaluation index is also shown to be representative of the performance and to facilitate comparisons.
Keywords
1. Introduction
There is increasing need for autonomous robots in hazardous environments, such as disaster sites and nuclear plants, as well as in inaccessible areas such as volcanoes and for space missions. Exploration is essential for robots that are required to move autonomously in novel environments. Therefore, developing efficient strategies for exploration is both interesting and important. Map information is important for path planning and task execution since the availability of a map increases the speed with which the robot can reach areas of interest in the environment. It is important to define the goal and evaluation criteria to judge the exploration performance. Intuitively, the objective of exploration is to gain the maximum amount of accurate information about the environment - represented by the explored space completeness - in the shortest time and over the minimum distance travelled.
1.1 Robot exploration algorithms and backtracking techniques
Exploration is usually made using a greedy strategy that plans one step ahead by determining the next-best locations - called ‘frontiers' - which maximize the acquired information. One of the most popular frontier-based exploration techniques was developed in [1], in which the frontiers are defined as the boundaries between free and unexplored areas. Approaching those frontiers enables the acquisition of more information about the unknown environment. Frontier-based exploration [1] is common to almost all exploration techniques and, depending upon the frontier selection mechanism, the existing techniques can be broadly classified into three categories, namely: optimal-frontier, behaviour-based and randomized motion techniques.
In optimal-frontier exploration, the next frontier is selected based upon a cost function. In [1], this function was selected to be the shortest distance required to reach each of them. The cost function may take other forms. For example, in [2], two criteria were considered in the evaluation, namely: the travelling cost required to reach a frontier and the expected information gain when performing a sensing action at that frontier. The cost function was controlled by three parameters: the distance cost, the expected utility and localizability (the latter of which is defined by the suitability of a frontier to enhance the robot localization when reaching that frontier, as described in [3]).
Optimal frontier strategies have two common problems. First, due to continuous map updates, the currently approached region might become fully explored during navigation before reaching the destination frontier. In this case, the robot will start to explore the next unknown region. This problem occurs when using sensors with a wide perceptual range. Second, the optimal frontier can lie outside the room being explored. This situation may cause the robot to explore the same room twice, with unnecessary long distances. Repetitive re-checking of the frontier during navigation and segmentation of the partially built map were suggested in [4] to solve the first and the second problem respectively. The segmentation process separates rooms from each other causing the robot to favour visiting frontiers lying inside the currently explored room. However, although such a solution works well in office-like environments, open environments cannot be reasonably segmented, which may impair performance.
The exploration process could be decomposed into smaller, simple, reactive tasks, which leads to the use of behaviour-based exploration approaches [5–11]. The exploration task is divided into simple simultaneous actions. Such actions or behaviours involve repulsive and attractive forces, such as avoiding obstacles and reaching targets, respectively. For instance, in [6] a behaviour-based approach combined three reactive behaviours for exploration, namely: “reach frontiers”, “avoid obstacles” and “avoid other robots”. Another example in which a simple wall-following behaviour as a reactive model led exploration was proposed in [7]. In [8], repulsive behaviour from previously visited areas was used for exploration. In [9, 10], a complex behaviour architecture was proposed in which a combination of several weighted behaviours were fused together for efficient exploration. These behaviours can be generally formulated as: “go to frontier”, “go to unexplored areas”, “avoid other robots” and “avoid obstacles”. However, reactive systems do not perform well in large and complex environments [11]. In such environments, the forces combining those behaviours could compensate each other in a certain region, causing local minima that trap the robot at a certain point. The local minima problem is common in behaviour-based techniques, not only for exploration but also for normal navigation. This requires such systems to have a local minima detection and recovery mechanism to avoid this problem.
In randomized motion planning [12–14], robots are directed to acquire more information through random steps. Randomized increments of a data structure called ‘sensor-based random trees' (SRTs) were generated in [13]. This tree represents the roadmap of the explored area with an associated safe region that describes obstacle-free regions around the robot depending upon its sensor aspects. The nodes of this tree are the explored locations. This basic SRT strategy was later modified to enhance the process. For instance, in frontier-based SRT (FB-SRT) [14], the random selection of target points was biased towards local frontier arcs in the current safe region. This improves efficiency in terms of shorter travelled paths and greater area coverage. Furthermore, in [15] and [16] a sensor-based exploration tree (SET) was constructed for depth-first search exploration. The target configurations were selected to maximize the estimated information utility in forward mode only by calculating the expected utility along the local frontier boundary. This helps the robot to filter out any useless actions that might be executed. In these strategies (SRT, FB-SRT and SET), if there are no more areas to explore, the robot goes back through the previous nodes to find new regions and explore again. This backtrack strategy causes long exploration distances and times, especially in places with wide open spaces.
Several approaches were proposed to solve this backtracking problem. For instance, bridges were added in [17] to the exploration tree so that the robot could plan quick paths without looping all the previous nodes. A bridge was added between any two adjacent nodes with a common safe region between them and separated by a distance greater than double the sensor range. The drawback of this improvement is the possibility that other nodes might be worth reaching despite having no shared safe region with the currently visited node. Ignoring such a node will reduce the area covered by the robot. In addition, in [18] a short path was planned between the current node without additional information and the initial node assuming that no information will be gathered from the nodes between them. This hypothesis works well in corridor environments rather than wide spaces and office-like environments.
1.2 Performance metrics for robotic exploration
Although robots are designed bearing in mind their working environment, there is no efficient way to compare robot performances in different places. One way to do this is to run a series of simulations of robots working in different structures. However, this is time-consuming and requires much effort. Several complexity metrics (CMs) had been introduced [19–22], namely: a space syntax, entropy, and obstacle distribution and compression. Those metrics are suitable for path planning rather than exploration. The space syntax method [19], which is concerned with the connectivity of environmental features rather than distances, uses a labour-intensive axial map to measure environmental complexity. Thus, it is not suitable for exploration. Entropy is regarded as a measure of uncertainty in the working environment [21, 22]. The more free spaces there are, the more decisions the robot should make to reach its target and the more time will be taken. Hence, entropy is a measure of the environmental complexity if the robot wishes to reach a particular goal. For exploration, there is no inaccurate decision, since any given decision will have benefits for exploration. Additionally, the sensor details are not considered in the entropy computations. Therefore, entropy is not an adequate measure of environmental complexity from the exploration point of view. Since the distribution of obstacles in an environment affects the navigation task's complexity, the environment is identified by a unique factor called ‘the compression factor’ [22]. This factor measures the repeatability of patterns of obstacles in a certain environment, and hence the environment's complexity. In other words, if the obstacles have repeated patterns and - in turn - can be easily described, then this environment has a low CM, and vice versa. Even though it considers the sensor properties, it is computationally expensive and not suitable for measuring the environmental complexity for the exploring robot. Therefore, there is no complexity measure in the published literature suitable from the exploration point of view, to the best of our knowledge.
Although sensor-based exploration enjoys simplicity and completeness, it can take long time and involve a greater distance, mainly due to its backtracking strategy [18]. Suppose that there are no frontier areas during exploration, then the robot must go back to previous locations until there exists a frontier area. In some situations, the robot can travel a significant distance until it reaches a position which has frontier regions. In this paper, which extends our preliminary work in [30], a novel, simple yet effective backtracking technique based on a heuristic algorithm is proposed. This new algorithm is based on the selection of the most informative node to approach directly rather than travelling across all previous nodes in order. The most informative node is determined using the ray-casting algorithm. The proposed technique can be regarded as a combination of the optimal frontier and randomized motion planning strategies. Random exploration planning is applied in forward mode, while the optimal node is approached in backward mode. The new technique is suitable for time-critical applications, such as rescue applications. Another contribution of this paper is in devising a new evaluation index EI for exploration that is useful for comparing different techniques. The EI encapsulates the exploration efforts, distance and time, the percentage of the area covered, completeness and the number of nodes in a single number to avoid any trade-off among those metrics. An efficient environmental CM is also proposed to evaluate the degree of complexity of an environment and to compare techniques working in different structured environments.
This paper is organized as follows: In Section 2, the basics of the basic SRT exploration strategy are briefly outlined. In Section 3, the new exploration technique using the proposed heuristic backtracking algorithm is described. The proposed environmental CM and EI are introduced in Section 4. Simulations running on different exploration scenarios and a comparison with the basic sensor-based technique are presented in Section 5. Finally, conclusions are drawn in Section 6.
2. Sensor-based random tree exploration
The SRT exploration technique [13] is based on a random selection of robot configurations q=[x y θ]T inside the local safe region (LSR), where x and y represent the position of the robot and θ represents the robot orientation with respect to the local coordinate frame. The LSR represents the free space around the robot in the current configuration qcurr, where its shape depends upon the sensor characteristics as described in [23]. A road map of the visited configurations - with the associated LSR - is represented by an incremental data structure called ‘SRT’. Each node in the tree T represents the explored location. Pseudo-code of the SRT technique is shown in Algorithm 1. This algorithm is repeated Kmax times, which is the maximum number of nodes assumed to be found in the environment. The Algorithm starts at qcurr acquiring the sensor measurements through the

Validation of different candidate configurations in SRT: qcand3 is accepted, while qcand1 and qcand2 are not
Algorithm 1: A pseud,-code for the basic SRT algorithm
3. A new heuristic backtracking algorithm
The following are the explicit assumptions for the new exploration approach:
Robot localization is provided by a separate module.
The exploration environment is planar, i.e., R 2, due to the nature of the planar range sensor used.
The exploration environment is static, i.e., it consists of unchanging surroundings in which the robot explores.
The robot is holonomic, i.e., it can turn in any direction.
In SRT, when there are no more valid configurations to reach, the robot traverses the parent node of qcurr, searching for new candidate locations to explore. This backtrack step takes time, which is not justified. In the proposed approach, forward mode exploration is performed in a similar fashion to the basic SRT method as shown in Algorithm 2. While in backtracking mode, when there are no valid configurations to reach, the tree data structure that has been built is tested in reverse order; starting from the current node, and the parent nodes, qtest, are checked for whether they can provide more information. A gain G(qtest) is calculated by the function
The basic idea for enhancing backtracking in SRT is shown schematically in Fig. 2. In the basic SRT strategy, as shown in Fig. 2(a), when there are no new areas the robot backtracks to all the previous nodes until exiting the currently explored room. While using the proposed approach, the robot searches for the most informative node, as shown in Fig. 2(b), to which the starting node is assumed as a new starting node. A shortest path - with the help of the built LSRs - is then planned to approach this node of interest, saving both distance and time.

A sketch showing a robot exploring a room using: a) the basic SRT strategy, and b) the proposed approach, where a shortest path, in green, is planned to the most informative node
3.1 Map building
A spatial representation for the unknown environment is required for two tasks: environment monitoring (such as in surveillance applications) and helping the robot to navigate the environment in backtracking mode. In this paper, the occupancy grid-based map is used for this representation. The environment is divided into small grids, each containing a value representing the probability of being occupied by obstacles. It is necessary to know for each cell whether it is unknown, free or an obstacle. Initially, the map is assumed to be unknown. Given a range scan and the robot pose, the occupancy grids within the sensor range are updated as follows. Firstly, scan readings are converted into Cartesian coordinates of the occupancy grid map, creating a polygon of points. Secondly, this polygon is identified as the LSR and filled using the flood-fill algorithm. Thirdly, obstacle cells are identified by the sensor readings that are lower than the maximum sensor range Rmax. The process is shown in Fig.3.

Building an occupancy grid map using scanner range data
Algorithm 2: A pseudo-code for the proposed heuristic backtracking algorithm.
3.2 The most informative node
In the proposed approach, when there is no valid configuration to reach, the robot selects the most informative node among the previous nodes. This informative node is expected to have information gain (in terms of free cells) greater than the threshold Gthresh, which depends upon the maximum range of the sensor. In other words, a large sensor range means more regions to explore and, hence, a large threshold is required. This threshold is selected heuristically to achieve a compromise between exploration completeness and the total distance travelled. The higher the threshold, the shorter the exploration time and the lower the level of exploration completeness. In fact, the estimation of the information gain that could be obtained at any point is difficult. The actual gain is hard to predict, as it varies according to the structure of the corresponding region. In [11], this gain was calculated by counting the number of unknown cells lying in a particular region surrounded by the maximum sensor range. This method did not guarantee a correct estimation, since some unreachable and unknown regions could be counted. In [4], the information gain was approximated as the relative difference between the current map entropy and the expected entropy after the simulated robot step at the candidate location. This approach requires scanning all the cells in the global map. In our algorithm, a simple heuristic ray-casting [24] method is applied to estimate how valuable a certain node qtest will be. During the ray-casting process, as illustrated in Algorithm 3, the number of configurations traversed by the rays which can contribute to the exploration process is recorded, and the sum of all the valid configurations traversed by the rays is used as a measure of how much information gain can theoretically be obtained from a particular node. This suits the laser scanner sensor used, where the number of scan rays and the angle between them depend upon the characteristics of the actual sensor used. Note, that sensor noise is not considered when estimating the information gain of each node, since the noise can affect the estimation and should be modelled. Valid configurations are tested through the

Estimation of information gain at different configurations by the use of a ray-casting technique
Algorithm 3: Ray-Casting algorithm to estimate the most informative node.
After identifying the most informative node, a shortest path is planned using the A* algorithm [25] to reach it. This saves both exploration distance and time as compared to visiting all the previous nodes. The dimensions of the robot are taken into account while planning the path by eroding the partially built map with a disk structure element. Unknown and obstacle cells are avoided during robot navigation.
4. The proposed complexity metric and evaluation index
4.1 Environmental complexity metric
Robots are tested in different environments with different degrees of complexity, which is problematic for comparing the performance of different exploration strategies. Therefore, it is useful to develop a metric that can quantify environmental complexity. This will help to reduce the effect of the environmental complexity on the performance comparison of the different exploration strategies. The available complexity measures, such as space syntax, entropy and obstacle compression, are measures related to path planning rather than robot exploration. From the path-planning perspective, free areas mean more choices for the robot to reach its target, and hence a more complex environment. On the other hand, from the exploration perspective, free areas mean more information to be acquired about the unknown environment, and hence imply a less complex environment. As such, in this paper a novel environmental CM for the exploration process is proposed, as follows.
Intuitively, obstacles in the environment could help or hinder the exploration process. The exploration process will be efficient if the sensor range spans the entire environment. The effect of the obstacle density and distribution is illustrated in Fig. 5, where two environments - namely, a mazy environment and an open space environment - have the same number of obstacle cells but a different distribution. From the exploration point of view, the mazy environment is more complex than the open space environment. The complexity measurement could be simplified by calculating the difference between the actual number of nodes required to cover a certain environment, Nact, and the estimated number of nodes, Nest, required to cover the abstract free area without considering the obstacle distribution, as follows:

Effect of obstacle distribution over complexity spaces - two environments with the same number of obstacle cells with different distributions. The first mazy environment is more complex than the open space environment
where CM ∈ [0,1] is the CM, ranging from zero to one, whereby zero means an absence of scattered obstacles and that greater benefits will be acquired from the sensor range, while higher values mean that the sensor range is not used effectively to cover the environment, such as with the maze-like environment. The estimated number of nodes, Nest, can be calculated as the number of free cells in the environment divided by the area covered by the inner rectangle of the sensor field of view (FOV), as follows:
where Rmax is the sensor's perceptual range.
On the other hand, calculating the actual number of nodes required to fully cover the structured environment while considering the obstacles' density and distribution, Nact, which was addressed in [26], is not easy. Here, a modification of the art gallery algorithm [27], a well-known algorithm for visibility in computational geometry, will be used. The reason for this is that the basic algorithm does not consider the sensor details, such as the maximum range and the FOV. Furthermore, it assumes a line of sight sensor, which is not practical (besides being computationally expensive). Thus, only an approximate solution that depends upon the density and distribution of obstacles in the environment can be obtained. Therefore, it is proposed to find the minimum actual number of nodes required to cover the entire space based on the basic art gallery problem. Basically, it randomly samples the environment to construct a relatively large set Ssam of covering nodes. Sensor aspects, such as the maximum range and the FOV, are considered by using the ray-casting
Algorithm 4: Greedy Randomized Art-Gallery Algorithm.
After acquiring the 2D model of the environment, the algorithm computes the estimated number of nodes, Nest, required to cover the entire environment using (2). Afterwards, n samples will be generated randomly to be distributed over the environment. To eliminate the randomness effect over the output of the algorithm, the number of random samples is selected to be large, for example, 100 samples for the Rmax =2m. The gain G(si) of each sample, si, will be computed by the use of the ray-casting algorithm, where sensor aspects will be considered. The sample si with maximum gain will be selected, and the area covered by this sample will be plotted over the map by the

The CM values, obtained over a single run, for different environments with n=100 random samples, FOV=360°, a coverage threshold. Uthresh=99%. and with different sensor ranges, Top:=2m. Down: Rmax=4m.
4.2 Performance evaluation index
The performance of an exploration strategy is usually measured by four metrics, namely: the distance travelled, D, the exploration time, T, the number of nodes created in the tree, Nnodes, and the completeness, C. The distance travelled is defined as the total distance travelled by a robot after returning back to the home position, while the exploration time is defined as the time taken by a robot to complete the exploration process. Completeness is the percentage of total area covered after the homing step. However, there is always a trade-off among these metrics by which the performance of an exploration strategy can be evaluated. In order to avoid this trade-off, a single EI is proposed here. This EI encapsulates all the metrics in just one number. Intuitively, the proposed index can be formulated based on its relationship with the mentioned metrics. EI is proposed to be directly proportional to the completeness, C, and inversely proportional to the normalized exploration time,
where wc, wt, wd and wn are the proportional weights which are added to measure the importance of each normalized metric for the EI. In the proposed approach, each of those weights is set to unity since the four metrics are equally important. The completeness is defined as:
The normalized number of nodes is made here with respect to the near-optimal number of actual nodes, Nact, calculated from the greedy randomized art-gallery algorithm, and so
The distance travelled is normalized to the total distance required for a robot to explore the entire environment, which can be estimated using the actual number of nodes calculated. The distance between two nodes in the coverage problem based on graph theory [28] is equal to double the maximum sensor range, and so the total distance, Dtotal, required can be approximated to the following:
As such, the normalized travel distance is given as:
The normalized exploration time is given as:
where v is the average speed of the robot, which is a fixed value during the exploration process. Note that the exploration time measured here is the total simulation time, including the computation time of the algorithm. This is why the normalized value of the distance travelled differs from the normalized exploration time.
5. Simulation results
Several simulation scenarios have been implemented to validate the proposed exploration approach. Each scenario has been identified by its environmental CM and the exploration EI. The 3D mobile robot simulator Webots [29] was used in all our simulations. A three-wheel omnidirectional mobile robot was used. The robot has a diameter of 0.2 m and carries a 360° laser range finder with one-degree angular resolution. In the simulations, the parameters for the SRT algorithm were selected as follows: α =0.9, dmin =0.7m, Gthresh =100 cells and Rmax =2m. The performance of the developed approach was compared with the basic SRT approach through several metrics, representing the effort paid (the distance travelled, D, the exploration time, T), the coverage gained (the completeness, C) and the number of nodes created in the tree, Nnodes, as well as the exploration EI. Two environmental scenarios are presented here, namely an office-like environment and a maze-like environment.
5.1 The office-like environment
The proposed exploration approach with heuristic back-tracking and the basic SRT strategy are implemented in the office-like environment shown in Fig. 7. The environmental CM for this environment is calculated using (1) as 0.31. Detailed simulation steps at different simulation times with the associated roadmap are shown in Fig. 8. The nodes created and the paths travelled are shown in green and blue, respectively, while the shortest paths planned by the heuristic approach are shown in red. In a basic SRT strategy, red edges mean that the robot has backtracked along them to return to its home position, which does not appear in the proposed backtracking approach as the robot is not required to go back over all previous nodes (it is just required to plan a short path to the most informative nodes). A comparison between the proposed approach and the basic SRT approach is given in Table 1 in terms of the mentioned metrics at different sensor ranges, namely 1 m and 2 m. Due to the random behaviour of SRT, values are averaged over five simulation runs with different initial robot positions. The standard deviations are also shown for the four metrics.
Simulation results of the office-like environment

Office-like environment

Simulation steps at different simulation times T with the associated roadmap. Left: the basic SRT approach. Right: the proposed approach.
It can be noted from Table 1 that there is a significant decrease in the total path length and the total exploration time for the proposed approach compared to the basic SRT approach. Furthermore, it can be observed that a significant reduction in the exploration distance appears clearly in the scenario with a smaller sensor range, which can be attributed to the high number of exploration edges required to fill the entire open space. Additionally, the proposed approach provides nearly complete coverage, which is comparable to that of the original SRT approach, while exerting less exploration effort. Furthermore, the standard deviations of the four metrics are shown to be small values, which proves the high reliability of the results. The new EI is also shown to be representative of the exploration performance avoiding trade-off among the metrics. The proposed approach showed higher evaluation indices compared to those of the basic SRT at different perceptual ranges. The average speed of the robot is set to 10m/s and each of the proportional weights measuring the contribution of each metric to the index is taken to be one. In addition, it is worth noting that the complexity of exploration process is reduced through the decreased number of tree nodes.
Here, it is imperative to discuss the advantages of the proposed backtracking approach over the approaches presented in [17] and [18]. The backtracking approach in [17] is based on constructing a bridge between two adjacent nodes with a common safe region between them. This approach is helpful in corridor-like environments, which have a low probability of having in-between nodes with unexplored regions. Moreover, in [18] the backtracking approach is based on constructing a short cut between the initial and the most recently visited nodes without taking into account the possibility of there being a valuable node between them. This demonstrates the advantage of the proposed backtracking approach over these two approaches, since the proposed heuristic backtracking is based on approaching the most informative node which has a high probability of having unexplored regions in office-like or even in open-space environments, as illustrated in the results.
5.2 The maze-like environment
Similarly, simulations of the proposed exploration approach with heuristic backtracking and the basic SRT strategy were conducted in a maze-like environment, as shown in Fig. 9. The environmental CM for this environment is calculated using (1) as 0.42. The simulation results of the basic SRT are presented in Fig. 10(a) while those of the proposed SRT are shown in Fig. 10(b), in which the red lines represent the shortest paths followed whenever there are no valuable nodes to travel to. Table 2 summarizes the results obtained with a 2 m perceptual range averaged over five simulation runs with different initial robot positions.
Simulation results of the maze-like scenario

Maze-like environment

Simulation results for the maze-like scenario: (a) the basic SRT approach, (b) the proposed approach
Similarly, the significant benefit in the total path length and the total exploration time are shown for the proposed approach compared with those of the basic SRT approach. In addition, the proposed approach showed a higher EI compared to that of the basic SRT.
6. Conclusions
A novel heuristic backtracking algorithm has been developed for sensor-based random tree exploration to reduce the exploration time and the distance travelled so as to cope with time-critical applications. The new approach is based on the selection of the most informative node to approach rather than backtracking across all unnecessary explored areas. The enhancement of SRT exploration using the developed backtracking algorithm has been confirmed by conducting several simulations in different exploration scenarios. A new evaluation index has been devised to encapsulate the exploration metrics, namely the exploration time, the distance travelled, the coverage, and the number of nodes in a single number, avoiding trade-off among these metrics. This index has been shown to be representative of the exploration performance to be used for exploration comparison, especially in SRT-like algorithms. A step towards finding a unique complexity metric has also been developed, representing the environment's structural complexity from the exploration point of view. This metric reflects the difference between the ideal number of visits required to fully cover the environment and the actual number of visits required. It has been shown that this complexity metric is dependent upon the disparity among obstacles in the environment. Although the estimated number of nodes ultimately depends upon the sensor properties, such as range, field of view and angular resolution, the complexity metric is nearly independent of these properties; since these properties will be reflected in both the actual and the estimated number of nodes required to cover the environment, the complexity metric will therefore not change. In the future, we will explore the potential of the evaluation index in diverse scenarios as a step towards generalizing it for the measurement of exploration performance and comparing between techniques.
