Abstract
Gas source localization is one of the most common applications of a gas-sensitive mobile robot. However, most of the existing work focuses on rule-based algorithms for wheeled robots, which are difficult to apply in complex terrain with obstacles. In this article, we propose an olfactory quadruped robot to perform the gas source localization task using the Dueling Deep Q-Network (Dueling DQN) algorithm. For training, we designed a set of environments and imported gas dispersion data from computational fluid dynamics (CFD) software to construct a simulator. The olfactory quadruped robot was trained in this simulator using the Dueling DQN algorithm to learn how to find the gas source. The trained neural network was then deployed on the olfactory quadruped robot. Our method has been tested in both simulation and real environments. The olfactory quadruped robot can traverse rugged terrain in real experiments and efficiently find gas sources, demonstrating that our method is highly robust and has important practical value.
Introduction
Gas source localization (GSL) is the task of finding a chemical gas source in a specific environment. Traditional GSL methods can be mainly divided into bio-detection 1 and networked electronic noses (e-noses). 2 Bio-detection involves searching for the gas source on the spot with portable handheld sensors or trained animals. The networked e-noses method entails pre-positioning a large number of olfactory sensors, collecting data, and determining the location of the gas source. These two methods have many limitations and are challenging to apply to sudden hazardous gas leaks. With the development of mobile robots, olfactory robot-based GSL has been developed. 3 –5 The olfactory robot is able to locate gas sources autonomously through the e-noses and some additional sensors such as anemometers, 6 which has the advantages of low cost and no biological involvement.
Affected by airflow, gas molecules spread in space in plumes. Therefore, researchers typically decouple the GSL task into three subtasks: plume search, plume tracking, and source declaration, 7 and then accomplish the GSL task by implementing these subtasks. Over the past decades, researchers have proposed a variety of GSL algorithms for specific environments. Ferri et al. use a bioinspired GSL algorithm in low airflow. 8 Li et al. propose a real-time odor source localization method for robots using a particle filter algorithm during exploration. 9 Neumann et al. design a gas-sensitive micro-drone with a bioinspired gas tracking algorithm. 10 Hai-Feng et al. develop a chemical plume tracking method for autonomous underwater vehicles (AUVs) using partially observable Markov decision process (POMDP) and artificial potential field (APF) algorithms, showing improved stability and efficiency. 11 Monroy et al. estimate gas concentration distribution to localize gas sources. 12,13 Saska et al. propose a cooperative gas plume tracking method for micro-drone swarms using particle swarm optimization. 14 Monroy et al. combine visual and chemical sensors to narrow the search area based on semantic relationships between objects and gases. 15 These traditional algorithms rely on specific airflow conditions and environmental settings, lacking sufficient generality. This limitation poses significant inconvenience and difficulty for their widespread practical application.
Deep reinforcement learning (DRL) is an important branch of deep learning that has developed rapidly in recent years and has outperformed humans in several tasks. 16,17 DRL differs from classical reinforcement learning (RL) in how the policy or value function is represented: in DRL it is usually a deep neural network. By training the neural network in a large number of different environments, we can obtain a robust policy. Deep Q-Network (DQN) 18,19 is a classical off-policy DRL algorithm. Dueling DQN 20 improves upon the DQN network structure, enabling faster learning and better convergence. The objective of Dueling DQN is to optimize the action-value function through trial-and-error interaction between the agent and the environment, aiming to maximize the cumulative reward.
DRL has a wide range of application areas, but research on DRL-based GSL is still at an early stage. 21 –23 Nevertheless, these early studies have demonstrated the advantages of DRL-based GSL. With a learning approach, the GSL problem no longer has to be solved step by step. The end-to-end formulation makes deployment easier because there is no need to tune a large number of parameters. DRL-based GSL can be applied to a wide variety of environments, including but not limited to outdoor, obstacle-laden, and three-dimensional environments. Combined with our olfactory quadruped robot platform, the GSL task can be deployed in a much wider range of scenarios, which helps overcome the limitations of traditional approaches and moves GSL closer to real-world use.
Since hazardous chemical gas leaks often occur in rugged and complex environments, and the distribution of gas molecules is highly unstable, the GSL task in real-world environments is particularly challenging. Most existing work on mobile robot-based GSL is difficult to apply to realistic scenarios. This is not only because of the poor robustness of the existing algorithms but also because of the limitations of the robot platforms, which make it difficult to fully exploit the potential performance of the algorithms.
To enhance the application potential of GSL, we have utilized a quadruped robot equipped with olfactory capabilities as our platform. While most research in this area has focused on wheeled robots, which are limited to flat indoor environments due to their mechanical structure, quadruped robots offer distinct advantages. Quadruped robots can navigate unstructured and harsh environments, thanks to their walking capabilities. Additionally, they exhibit a wide range of velocities through various gait patterns, providing greater flexibility in motion control. Importantly, quadruped robots have minimal environmental impact compared to wheeled or tracked robots, which can damage surfaces and cause extensive environmental harm. This practicality adds significant value to our work. Nowadays, quadruped robots are widely used in complex tasks such as inspection of substations 24 or cable tunnels. 25 Considering the complexity of the GSL task, we believe that a quadruped robot is a better choice.
The purpose of our study is to reduce the GSL task’s dependence on specific scenarios. The main contribution of our study is the proposal of a GSL algorithm based on Dueling DQN. We deploy this algorithm on an olfactory quadruped robot equipped with e-noses and an anemometer. The Dueling DQN-based GSL algorithm enhances the olfactory quadruped robot’s environmental adaptability, and the robot platform further optimizes the algorithm’s performance. Together, they enable the deployment of the GSL task in typical real-world environments.
The rest of this article is organized as follows. The section “Dueling DQN-based GSL” details how the Dueling DQN algorithm is applied to the GSL task, including the design of the neural network, training, and simulation tests. The sensors and controllers designed for the olfactory quadruped robot are presented in the section “Olfactory quadruped robot.” We then deploy the trained neural network on the olfactory quadruped robot and test GSL in a real experimental environment; the experimental results are presented in the section “Experiment.”
Dueling DQN-based GSL
The GSL algorithms proposed in earlier studies were usually based on bionic optimization algorithms. 26 These algorithms search for the gas source by following the concentration gradient but can produce very steep trajectories and have low success rates. Anemotaxis-based algorithms such as Surge-Cast 27 are very efficient when the robot is downwind of the gas source, but they perform poorly under other conditions. Moreover, neither class of algorithms can be applied to scenarios with obstacles. To take advantage of our olfactory quadruped robot, we propose the Dueling DQN-based GSL algorithm, which is capable of locating gas sources in environments with complex obstacles.
The main concept of the Dueling DQN-based GSL algorithm proposed in this article is to transform the GSL task into a learning problem and solve it with DRL. In addition, we center the map on the robot to mitigate overfitting, so our algorithm does not rely on a specific environment and also performs well in unfamiliar environments.
The DQN algorithm was first applied to playing Atari games and obtained scores that surpass those of human experts in several games. 18,19 Dueling DQN is an improved version of the DQN algorithm. 20 It helps the agent understand which states are more valuable, which allows the training process to converge faster.
To apply Dueling DQN to the GSL task, we approximate the GSL task as a Markov decision process (MDP), as shown in Figure 1. The olfactory quadruped robot samples the environment $E$ to obtain the current state $S_t$. In the GSL task, $S_t$ includes information about the trajectory, obstacles, concentration, and airflow. The action $A_t$ is selected by a convolutional neural network (CNN). Thanks to the omnidirectional movement capability of our olfactory quadruped robot, the direction of movement can be set to any direction around the robot. The robot executes $A_t$ in the environment and reaches the next state $S_{t+1}$, receiving a reward that includes a penalty term, where $P_t$ represents the number of times the robot has come to its current position. The purpose of this punishment is to encourage the robot to explore more regions to avoid getting trapped in local optima.

Markov decision process approximated from GSL. GSL: gas source localization.
Dueling DQN algorithm
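Dueling DQN is a DRL algorithm. Its purpose is to estimate the Q value, which scores each action available in a given state. Dueling DQN splits the Q value into a value function $V(s)$ and an advantage function $A(s, a)$, where lowercase $s$ and $a$ denote a state and an action. 20
The purpose of Dueling DQN is to continuously update an action-value function, which in the standard formulation 20 reads
$$Q(s, a; \theta) = V(s; \theta) + \left( A(s, a; \theta) - \frac{1}{N_a} \sum_{a'} A(s, a'; \theta) \right)$$
where $\theta$ denotes the network parameters and $N_a$ is the number of available actions.
The optimal action-value function $Q^*(s, a)$ obeys the Bellman equation
$$Q^*(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^*(s', a') \mid s, a \right]$$
The basic idea of many RL algorithms is to perform iterative updates using the Bellman equation, but this approach is difficult to implement in a high-dimensional state space. So we use a function approximator $Q(s, a; \theta) \approx Q^*(s, a)$, which in the standard DQN formulation 18,19 is trained by minimizing the loss
$$L(\theta) = \mathbb{E}_{s, a, r, s'}\left[ \left( y - Q(s, a; \theta) \right)^2 \right]$$
where $y = r + \gamma \max_{a'} Q(s', a'; \theta^-)$ is the target value, $r$ is the reward, $\gamma$ is the discount factor, and $\theta^-$ denotes the parameters of the target network.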
For the GSL task, we modified part of the Dueling DQN algorithm, and the modified pseudocode is shown in Algorithm 1.
Dueling DQN for GSL.
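As a rough illustration of the value update that DQN-style algorithms perform for each sampled transition, the sketch below computes a one-step temporal-difference target and the corresponding squared error. It is not a reproduction of Algorithm 1; the function names and the example numbers are assumptions chosen only for illustration.

```python
def td_target(reward, next_q_values, gamma, done):
    """One-step TD target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value, target):
    """Squared difference between the current estimate Q(s, a) and the target."""
    return (target - q_value) ** 2


# Hypothetical transition: reward 1.0, discount 0.9, three candidate next actions.
y = td_target(reward=1.0, next_q_values=[0.5, 2.0, 1.0], gamma=0.9, done=False)
loss = squared_td_error(q_value=2.0, target=y)
```

In a full training loop this error would be averaged over a minibatch drawn from the replay buffer and minimized by gradient descent.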
Each episode terminates when any of the following conditions is met: (1) the robot reaches the source or a grid cell adjacent to the source; (2) the robot is about to hit a border or an obstacle; (3) the number of movement steps exceeds the preset value, which is set to 450 during training.
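The three termination conditions can be sketched as a single check on the cell the robot is about to enter. This is a minimal sketch under stated assumptions: cells are `(row, col)` tuples, "adjacent" is taken to mean the eight surrounding cells, and the function and label names are hypothetical.

```python
MAX_STEPS = 450  # step limit used during training, as stated in the text


def episode_done(next_cell, source, obstacles, grid_size, step_count):
    """Return (done, reason) for the cell the robot is about to enter."""
    row, col = next_cell
    # Border or obstacle: the move would leave the map or enter an occupied cell.
    if not (0 <= row < grid_size and 0 <= col < grid_size):
        return True, "collision"
    if next_cell in obstacles:
        return True, "collision"
    # Success: on the source cell or one of its 8 neighbours (assumed adjacency).
    if abs(row - source[0]) <= 1 and abs(col - source[1]) <= 1:
        return True, "source_found"
    # Timeout: the step budget is exhausted.
    if step_count > MAX_STEPS:
        return True, "timeout"
    return False, "running"
```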
Designing environments for RL is a tedious process. To make the policy robust with a limited number of scenarios, a random transformation (transpose, flip, or rotation) is applied to the environment at the beginning of each episode, and the spawn location of the robot is then chosen at random. Random changes in the environment lengthen convergence but yield a more robust policy.
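The per-episode randomization described above might look like the following sketch, which uses plain Python lists to stay dependency-free. The helper names and the 50% probabilities are assumptions, not values taken from the article.

```python
import random


def transpose(grid):
    """Swap rows and columns."""
    return [list(row) for row in zip(*grid)]


def flip(grid):
    """Mirror the grid left to right."""
    return [list(reversed(row)) for row in grid]


def rotate90(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def random_transform(grid, rng):
    """Apply a random combination of transpose, flip, and rotation."""
    if rng.random() < 0.5:
        grid = transpose(grid)
    if rng.random() < 0.5:
        grid = flip(grid)
    for _ in range(rng.randrange(4)):
        grid = rotate90(grid)
    return grid


def random_spawn(obstacles, rng):
    """Pick a random free cell (value 0) as the robot's start position."""
    free = [(r, c) for r, row in enumerate(obstacles)
            for c, value in enumerate(row) if value == 0]
    return rng.choice(free)
```

The sketch handles a single scalar layer. In practice the same transform would have to be applied to every layer of an environment, and the airflow vector components would also need to be swapped or negated to match.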
In some similar work, 21 applying a policy to different scenarios often requires a certain amount of transfer learning. In this article, only one policy is trained and applied to multiple scenarios. Our approach is to decouple the policy from the environment by limiting the observation area. The input of our policy consists of historical data in a fixed-size area centered on the robot.
To perform the convolution operation on the raw data, we define the input layer as a stack of equally sized matrices covering the trajectory, obstacles, gas concentration, and airflow.

The olfactory quadruped robot is located in the center of all matrices. (a) Trajectory is the first matrix, in which the value of each cell is the number of times the robot passing the corresponding position. (b) The obstacles matrix is cropped from the map, the obstacle regions marked with 1, other regions are 0. (c)–(e) indicate the concentration and airflow data measured by the robot. The latest measurements will overwrite the old data.
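A robot-centered observation of this kind could be assembled roughly as follows. The window half-width, the fill value for cells outside the map, and the function names are assumptions; the article does not specify them in the surviving text.

```python
def crop_centered(layer, center, half, fill=0.0):
    """Return a (2*half+1) x (2*half+1) window of `layer` centred on `center`.

    Cells that fall outside the map are filled with `fill`.
    """
    rows, cols = len(layer), len(layer[0])
    r0, c0 = center
    window = []
    for r in range(r0 - half, r0 + half + 1):
        line = []
        for c in range(c0 - half, c0 + half + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line.append(layer[r][c] if inside else fill)
        window.append(line)
    return window


def build_observation(layers, center, half):
    """Stack robot-centred windows of all layers into one multi-channel input."""
    return [crop_centered(layer, center, half) for layer in layers]
```

Because every channel is cropped around the robot, the network never sees absolute map coordinates, which is what lets one policy be reused across maps of different size and shape.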
A CNN is used as the backbone of the dueling neural network because of its powerful ability to extract features. The structure of the dueling neural network is shown in Figure 3. The features extracted by the convolutional network are then fed into the fully connected networks that form the value and advantage streams, whose outputs are combined into the Q values.

Dueling Deep Q-Network designed for GSL task. GSL: gas source localization.
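The way a dueling head combines its two streams can be illustrated in a few lines. This sketch shows only the standard mean-subtracted aggregation and a greedy action choice; the numbers are made up and the layer sizes of the actual network are not reproduced here.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar V(s) with per-action advantages A(s, a) into Q(s, a).

    The mean advantage is subtracted so that V and A remain identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def greedy_action(q_values):
    """Index of the action with the largest Q value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical stream outputs for a state with three actions.
q = dueling_q_values(state_value=2.0, advantages=[1.0, 3.0, 2.0])
best = greedy_action(q)
```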
Training
To train the neural network, we designed a set of environments and presimulated the gas dispersion with CFD software. Figure 4 shows our training environments. The algorithm proposed in this article is not coupled to the size and shape of the environment, but for convenience, each training environment is set to 15 m × 15 m and gridded into 900 cells of 0.5 m × 0.5 m.
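The gridding arithmetic (15 m per side at 0.5 m per cell gives 30 × 30 = 900 cells) can be checked with a short sketch. The coordinate convention, with the origin at a map corner and rows along y, is an assumption.

```python
CELL_SIZE = 0.5   # metres per cell, from the text
MAP_SIZE = 15.0   # metres per side, from the text
GRID_N = int(MAP_SIZE / CELL_SIZE)  # 30 cells per side, 900 cells in total


def world_to_cell(x, y):
    """Map metric coordinates (origin at a map corner, assumed) to (row, col)."""
    col = min(int(x // CELL_SIZE), GRID_N - 1)
    row = min(int(y // CELL_SIZE), GRID_N - 1)
    return row, col


def cell_center(row, col):
    """Metric (x, y) coordinates of the centre of a cell."""
    return ((col + 0.5) * CELL_SIZE, (row + 0.5) * CELL_SIZE)
```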

Training environments: (a)–(d) show the molar concentration of each environment; (e)–(h) show the airflow of each environment.
We used PyTorch, which can automatically calculate gradients, as the training framework and Adam as the optimizer. To speed up the training process, we use the Tianshou framework to sample the environment in parallel. The main hyperparameters used during training are listed in Table 1. To stabilize learning and emphasize long-term returns, we set a very small learning rate $\alpha$ and a large discount factor $\gamma$.
Hyperparameters in the training process.
The curves of reward and step length during the training process are shown in Figure 5. Step length is the number of steps taken in an episode, regardless of whether the gas source was found. At the beginning of training, the robot quickly hits a wall, which terminates the episode and yields a small step length. After 10,000 steps of training, the robot has learned to avoid obstacles but tends to get trapped and struggles to find the gas source, which leads to a very large step length. After 400,000 steps, the step length gradually decreases and converges, which means that the robot is able to find the gas source efficiently.

Training curve of the reward and step length.
GADEN test
GADEN 28 is a gas dispersion simulator widely used in robot olfaction (RO). We have developed a simulation platform for olfactory quadruped robots based on the GADEN project and implemented basic motion and perception functions in the simulator. In the simulator, the olfactory quadruped robot measures the gas concentration with a virtual metal oxide (MOX) sensor or photoionization detector (PID) and the airflow vector with a virtual ultrasonic anemometer.
To demonstrate the robustness and versatility of our algorithm, we conducted simulation experiments in GADEN. We constructed a set of environments, including environments both inside and outside the training set, and tested the GSL task in each. We also compared our algorithm with some commonly used GSL algorithms in the GADEN simulator. It should be noted that the compared algorithms cannot be used in complex environments with obstacles, so we only compare success rate and efficiency in a simple empty scenario.
To perform the GSL simulation test, we model the scenarios by CAD software and import the inner mesh into CFD software to obtain airflow data. For the next step, the airflow data are fed into GADEN filament simulator to run the gas dispersal simulation. Once the simulation is done, the GSL task is ready to be tested on our simulation platform for olfactory quadruped robots.
The GSL tests were conducted in different environments with different settings (gas source location, airflow speed, and filaments release speed). The final results are shown in Figure 6. It is worth pointing out that for unfamiliar environments that the olfactory quadruped robot has never seen before (environments that are not included in the training environments), the algorithm proposed in this article is still able to find the gas source, which demonstrates the adaptability of our algorithm to different environments. Based on the GADEN test result, practical deployment of our algorithm is possible without transfer learning.

(a)–(d) are the environments used for training. (e) and (f) are test environments that the robot has never seen before. The green point cloud is the plume of ethanol gas, and the black line is a sampling of the odometry data. The red flower marks the location of the gas source. The yellow triangle marks the initial position.
Most classical algorithms cannot be applied in environments with obstacles. To compare the proposed algorithm with some classical algorithms, we set up a 15 m × 15 m empty scenario, which is shown in Figure 7. The algorithms used for comparison are gradient climbing 5 from chemotaxis and Surge-Cast 27 from anemotaxis. The gradient climbing algorithm finds the gas source by continuously moving toward higher concentration. Surge-Cast is a GSL algorithm that moves the robot straight upwind while it is in the plume; when the robot loses the plume, it moves crosswind to find the plume again.

Empty space for algorithm comparison. The blue wall is the wind inlet, and the red wall is the wind outlet. The red flower marks the location of the gas source.
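To make the gradient climbing baseline concrete, the sketch below moves greedily to the neighbouring cell with the highest concentration. It is a simplified grid version written for illustration, assuming 4-connected moves and a known concentration grid, and is not the implementation used in the comparison.

```python
def gradient_climb_step(concentration, cell):
    """Move to the 4-connected neighbour with the highest concentration.

    Stays in place when no neighbour is strictly higher, which is how the
    method gets stuck at a local maximum.
    """
    rows, cols = len(concentration), len(concentration[0])
    r, c = cell
    best, best_value = cell, concentration[r][c]
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and concentration[nr][nc] > best_value:
            best, best_value = (nr, nc), concentration[nr][nc]
    return best


def gradient_climb(concentration, start, max_steps=100):
    """Repeat steps until the position stops changing; return the visited path."""
    path = [start]
    for _ in range(max_steps):
        nxt = gradient_climb_step(concentration, path[-1])
        if nxt == path[-1]:
            break
        path.append(nxt)
    return path
```

On a smooth field the path ends at the global maximum, while a field with an isolated bump leaves the robot stranded on the bump, which matches the failure mode discussed for this baseline.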
All three algorithms were tested 900 times in the simulator, and the results are shown in Table 2. The Dueling DQN-based GSL algorithm proposed in this article achieved the highest success rate, which means that it can find the gas source from almost any starting location. The success rate of Surge-Cast is second only to that of our algorithm. Most failures of the Surge-Cast algorithm occur when the robot starts upwind of the source, where our algorithm still performs well. The gradient climbing algorithm has the lowest success rate because the gas concentration field is often not smooth, so the algorithm easily falls into a local optimum and fails.
Success rate of the GSL algorithms.
GSL: gas source localization.
We also compared the overhead rate of these three algorithms, defined as the distance traveled from start-up until the gas source is found divided by the shortest distance. As shown in Figure 8, the Dueling DQN algorithm has the largest overhead rate, which means that it searches more regions autonomously; this contributes to its higher success rate. The gradient climbing algorithm has a very steep trajectory because it needs to move along the concentration gradient, which leads to both a high overhead rate and a low success rate. The Surge-Cast algorithm does not reach the highest success rate, but its overhead rate is very low, which means this algorithm is very efficient in some specific environments.

Overhead rate of different algorithms, excluding failed instances.
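The overhead rate defined above is straightforward to compute from a recorded trajectory. In this sketch the trajectory is assumed to be a list of `(x, y)` positions in metres, and the shortest distance is supplied by the caller.

```python
import math


def path_length(points):
    """Total Euclidean length of a polyline given as (x, y) points in metres."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def overhead_rate(trajectory, shortest_distance):
    """Distance actually travelled divided by the shortest possible distance."""
    if shortest_distance <= 0:
        raise ValueError("shortest_distance must be positive")
    return path_length(trajectory) / shortest_distance
```

A value of 1 means the robot followed the shortest route; larger values indicate additional exploration.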
Deployment
Scenes for GSL tasks often have complex structures and different sizes. To deploy the trained neural network on real robots, some modifications to the computational process are required. During the training process, we define the observation as a dense matrix of fixed size for computational efficiency. When deployed, since the sampled data from the sensors are sparse and the map size is indeterminate, we first store the data in a continuous memory space and construct the observation matrix for every inference process later.
Neural networks are typically deployed on edge computing devices, such as the Movidius (Intel) and Jetson (NVIDIA) series. Considering the level of integration, the device chosen for this study is the Jetson Nano. We export the weights and structure of the neural network to the Open Neural Network Exchange (ONNX) format, which makes deployment on different devices possible in further studies. We also package all the programs into Robot Operating System (ROS) packages, which facilitates software distribution and containerized deployment.
Olfactory quadruped robot
Nowadays, quadruped robot technology is well developed. However, there is still no research that combines RO technology with quadruped robots. Since gas leak sources are often located in narrow and winding environments, an olfactory robot needs to be small and to move flexibly. More importantly, the ground in a GSL task is not always smooth, so the robot must be able to adapt to uneven surfaces. Traditional wheeled robots often have difficulty collecting the data needed by GSL algorithms because they cannot move over uneven ground. This can result in the loss of key data, further increasing the difficulty of locating the gas source. In addition, many sensors, such as anemometers, cannot be tilted during measurement, which places a demand on the robot’s ability to adjust its body posture.
The quadruped robot is able to move over uneven ground while adjusting its body posture to ensure the sensors work properly. All of these functions are difficult to achieve for wheeled robots. To overcome the drawbacks of the wheeled robots, an olfactory quadruped robot is proposed as the executor for the GSL task, and its appearance is shown in Figure 9.

Olfactory quadruped robot designed for GSL tasks. GSL: gas source localization.
Hardware design
There is already a lot of research on quadruped robots, and many commercial quadruped robots have emerged. Existing quadruped robots often rely on advanced actuators and sensors, which makes them expensive and difficult to maintain. The olfactory quadruped robot proposed in this article is simplified in its mechanical and circuit design to make it easier to reproduce. Its modular design also makes it easier to adapt to other application scenarios.
The olfactory quadruped robot uses the STM32F407 (STMicroelectronics) as its microcontroller, which runs our motion algorithm to control 12 high-precision digital servos and collects data from the e-noses and the anemometer. The servos are driven by a servo control board with 18 output channels, a quarter-microsecond resolution, and a frequency of up to 333 Hz. To meet the requirements of neural network inference, we use a Jetson Nano as the upper-layer computing platform and implement a high-speed data link through the USB virtual COM port (VCP) provided by the microcontroller. We also modified the rosserial project to map microcontroller resources to the ROS layer via the VCP. This allows us to read sensor data or control the robot as if we were accessing a local process. The hardware schematic is shown in Figure 10.

Hardware connection diagram of the olfactory quadruped robot.
The olfactory quadruped robot has a total of 12 degrees of freedom, which means it not only has the capability of omnidirectional movement but can also maintain balance by adjusting its body pose. Most of the components of the olfactory quadruped robot are manufactured with three-dimensional (3D) printing. This allows us to easily adjust some parameters of the mechanical structure. Some key parameters of the robot are shown in Table 3.
The key parameters of the olfactory quadruped robot.
Sensors
The olfactory quadruped robot, as a mobile robot, is equipped with basic motion sensors including an attitude and heading reference system (AHRS), visual odometry, and light detection and ranging (LiDAR). These sensors provide the robot with basic environmental awareness and allow easy access to the ROS navigation stack. Combined with popular ROS packages such as move_base and cartographer, the olfactory quadruped robot can implement mapping and navigation functions. In addition, the plugin-based architecture of the ROS navigation stack allows us to swap sensors or perform sensor fusion.
The perception of e-noses is based on their inside olfactory sensors. The sensor device detects the physical and/or chemical changes incurred by gas molecules, and these changes are measured as an electrical signal. 29 Commonly used gas sensors on mobile robots include MOX, semiconductor, and infrared. 30 These sensors are sensitive to different gases and have different performances. Usually, to manufacture a general-purpose e-nose, an array of different sensors is required. 31
The olfactory quadruped robot detects gas concentration with its e-noses. In practice, the e-nose can be exchanged for one that targets a specific gas, which allows different gases to be traced. In this article, harmless CO2 is used as the gas source for experimental safety. The COZIR (Gas Sensing Solutions Ltd) infrared CO2 sensor is designed to monitor indoor carbon dioxide levels. It offers a high measurement frequency, and its measurement noise is less than 10 ppm. Considering the requirements on robustness and response time, we selected COZIR CO2 gas sensors as the e-noses.
The olfactory quadruped robot is also equipped with a small ultrasonic wind speed and direction sensor (VEMSEE: PR-3003-FSXCS-N01/4G-*) as its anemometer. Since the speed of sound transmission in the air is superimposed on the speed of airflow, ultrasonic sensors can measure wind speed and direction by the time difference method. The ultrasonic wind speed and direction sensor is lighter and more sensitive than the traditional mechanical sensor. It can convert wind speed and direction into analog output, which means we can easily read the data using the ADC chip (ADS1115) with the microcontroller.
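The time difference principle mentioned above can be written out explicitly. For a transducer pair separated by a path length $L$, the transit times with and against the airflow are $t_1 = L / (c + v)$ and $t_2 = L / (c - v)$, so $v = \frac{L}{2} \left( \frac{1}{t_1} - \frac{1}{t_2} \right)$ independently of the speed of sound $c$. The sketch below applies this generic relation; it does not describe the internal processing of the specific sensor, and the function names are assumptions.

```python
import math


def axis_wind_speed(path_length, t_downwind, t_upwind):
    """Airflow component along one transducer axis from two transit times."""
    return 0.5 * path_length * (1.0 / t_downwind - 1.0 / t_upwind)


def wind_vector(vx, vy):
    """Speed and direction (degrees, atan2 convention) from two orthogonal components."""
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))
```

Two orthogonal transducer pairs give the two components needed for a planar wind vector.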
Motion control
The controller design of the quadruped robot is inspired by the MIT Cheetah. 32,33 The controller of the quadruped robot is divided into three layers, including the body controller, leg controller, and joint controller. The block diagram of the system architecture is shown in Figure 11. The olfactory quadruped robot translates desired posture and velocity into foot positions via body controller and leg controller. The state estimator gives an estimate of the body pose from the feedback of sensors, thus forming a closed loop allowing the robot to keep its balance during motion.

System architecture block diagram. The upper level control commands include velocity and posture targets (green). These targets are translated into joint angles by the controllers (red) and sent to robot. The body posture is estimated by reading the data from sensors (blue), thus forming a closed loop to keep the body balance.
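The motion control of the robot is mainly implemented by the leg controller, which generates the foot trajectories with Bézier curves to reduce impact and energy consumption. 34
The foot motion of the quadruped robot can be divided into the swing phase and the stance phase. The trajectory of the swing phase is generated by a 2D Bézier curve with 12 control points, which has the general form
$$\mathbf{p}(s) = \sum_{k=0}^{11} \binom{11}{k} (1 - s)^{11 - k} s^{k} \, \mathbf{c}_k$$
where $\mathbf{c}_k$ are the control points and $s \in [0, 1]$ is the normalized swing-phase variable.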
The trajectory of the stance phase is shown by the following equation
where
The control of forward velocity is implemented according to the Raibert heuristic equation 35
where vd is the desired forward speed, v is the forward speed feedback, and Kv is the feedback gain.
To help the quadruped robot walk faster and more stably, the trajectory-related parameters need to be tuned iteratively. The details of the parameter settings are not the focus of this article.
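The swing-phase curve described above (a Bézier curve in standard terminology) can be evaluated directly from its Bernstein-polynomial definition. The sketch below works for any number of control points; the three points used in the example are placeholders, not the 12 control points of the robot.

```python
from math import comb


def bezier_point(control_points, s):
    """Evaluate a 2D Bezier curve at s in [0, 1] using Bernstein polynomials."""
    n = len(control_points) - 1
    x = y = 0.0
    for k, (cx, cy) in enumerate(control_points):
        weight = comb(n, k) * (1.0 - s) ** (n - k) * s ** k
        x += weight * cx
        y += weight * cy
    return x, y


def swing_trajectory(control_points, samples):
    """Sample the swing-phase foot path at evenly spaced phase values."""
    return [bezier_point(control_points, i / (samples - 1)) for i in range(samples)]


# Placeholder control points: lift the foot while moving it forward.
example_points = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
midpoint = bezier_point(example_points, 0.5)
```

The curve starts at the first control point and ends at the last, so the lift-off and touchdown positions are set directly by those two points.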
Most of our sensors are fixed on the body of the olfactory quadruped robot, and some of them, such as the ultrasonic wind direction sensor, need to be kept level during measurement. The body of the robot must therefore stay balanced during sensor sampling; otherwise the sampled data will be disturbed. Body posture can be controlled through the positions of the foot reference points, that is, the positions of the feet when the robot is standing still. A proportional-integral-derivative (PID) controller performs the balancing control: it takes the roll and pitch measured by the inertial measurement unit (IMU) fixed on the robot body, calculates the error, and obtains the foot reference positions through inverse kinematics.
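The balancing loop can be sketched with a textbook PID controller acting on roll and pitch. The gains are hypothetical, and the final step that converts the attitude correction into foot reference positions through inverse kinematics is omitted.

```python
class PID:
    """Textbook proportional-integral-derivative controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error and time step."""
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def balance_correction(roll_pid, pitch_pid, roll, pitch, dt):
    """Attitude correction that drives the measured roll and pitch toward zero."""
    return roll_pid.update(0.0 - roll, dt), pitch_pid.update(0.0 - pitch, dt)
```

Calling `balance_correction` once per control cycle with the latest IMU reading yields the roll and pitch corrections that keep the body level.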
Finally, we conducted a simple test of the motion function of the olfactory quadruped robot. We moved the robot manually with the remote controller, as shown in Figure 12, while measuring the change in the roll and pitch angles of its body link with the attached IMU. The body controller performed well during the motion test. As shown in Figure 13, the effect of the body controller is not particularly obvious when the olfactory quadruped robot moves forward. However, when the robot moves laterally or rotates, it walks more smoothly and shows less roll and pitch variation with the body controller enabled. This is important for walking on uneven ground and for ensuring the precision of the sensor measurements, which in turn allows more accurate data to be fed into the neural network.

Motion testing of the olfactory quadruped robot.

The motion test result of the olfactory quadruped robot. Two groups are included: those with the body controller enabled and those without. The roll and pitch change curves of the robot body were recorded by attached IMU. (a) Move forward, (b) move backward, (c) move left, (d) move right, (e) counterclockwise rotation, and (f) clockwise rotation.
Experiment
To demonstrate the suitability of our approach under real-world conditions, we conducted a real experiment in an indoor environment with obstacles and uneven ground and analyzed the performance and trajectory of the olfactory quadruped robot. The experimental obstacles are wooden cubic blocks anchored to the ground, and the uneven ground is a terrain with several prominences. Fans generate artificial airflow in the experimental environment, and a CO2 gas source is placed inside. Commonly used CO2 gas sources are liquefied CO2 gas canisters or dry ice. Because gas canisters are inconvenient and risky to use, we produce CO2 gas by dry ice sublimation.
To test the performance of our GSL algorithm in scenarios with obstacles and to verify the ability of the olfactory quadruped robot to traverse rugged terrain, we designed the experimental environment as shown in Figure 14. The field for the experiment was a

The global view of the experimental environment. The initial position is marked with magenta circles. Blue arrows indicate the direction of airflow.
Experimental setup
Before performing the GSL task, the olfactory quadruped robot needs to obtain the occupancy map of the ambient environment by running a simultaneous localization and mapping (SLAM) 36,37 algorithm. With an occupancy map, the robot can obtain its current location using a localization algorithm such as adaptive Monte Carlo localization. 38 These algorithms are not the focus of this article, so we assume that the occupancy map is already known. Next, we dilate the occupancy map to prevent the robot from scraping against obstacles or walls and finally transform it into a grid that serves as the obstacles layer of the neural network input.
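The dilation step can be sketched as a simple morphological operation on a binary occupancy grid. The square neighbourhood and the radius parameter are assumptions; the article does not state the kernel it uses.

```python
def dilate(occupancy, radius):
    """Mark every cell within `radius` cells (Chebyshev distance) of an obstacle."""
    rows, cols = len(occupancy), len(occupancy[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if occupancy[r][c] != 1:
                continue
            # Paint the square neighbourhood around each occupied cell.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = 1
    return out
```

Choosing the radius to cover at least half the robot's footprint keeps the planned cells clear of walls and obstacles.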
Once the map is constructed, we place the gas source and the olfactory quadruped robot at the preset locations and start releasing CO2 gas. After 30 s of gas dispersion, the indoor CO2 distribution is relatively stable. At this point, we launch the olfactory quadruped robot and execute the GSL task.
In the experiment, we fix the position of the gas source and let the olfactory quadruped robot execute the GSL task from different initial positions. The initial positions are set up by considering the location of the gas source, the direction of airflow, and the distribution of obstacles. The details of the initial position are set as follows. Position 1 is a corner at the downwind direction with a low initial gas concentration and requires crossing all the complex terrain to reach the vicinity of the gas source. Position 2 is also downwind of the gas source, but the airflow is parallel to the line of position 2 and the gas source, which means a higher initial concentration and easier tracking of the plume. Position 3 is located between the obstacle and the map boundary with a low initial gas concentration, and this position places high demands on the capability of obstacle avoidance. Position 4 is in the upwind direction, which means that the gas concentration is approximately zero during most of the search, and the only data the algorithm can rely on is airflow.
To test the performance of our algorithm, four experiments were conducted, with Exp. 1–4 corresponding to the initial positions 1–4. The procedure of the experiments is approximately the same. During the experiment, we record the trajectory of the robot and the gas concentration data measured by the e-nose for further analysis.
Results and discussion
The olfactory quadruped robot finds the gas source autonomously from different initial positions. We use RViz to record and visualize its trajectory, and the results are shown in Figures 15–18. We also calculated the distance between the olfactory quadruped robot and the gas source over the course of each experiment, which is shown in Figure 19. A distance of 0 indicates that the olfactory quadruped robot has reached the location of the gas source. The curves show that our GSL process is quite smooth and efficient. Since we divide the environment into grids, and the GSL task is considered successful only when the robot reaches a grid neighboring the gas source, the accuracy of our GSL algorithm is within 1.0 m.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 1. (a) Panorama and (b) trajectory.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 2. (a) Panorama and (b) trajectory.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 3. (a) Panorama and (b) trajectory.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 4. (a) Panorama and (b) trajectory.

The curve of the distance between the olfactory quadruped robot and the gas source during the experiment.
Strictly speaking, the quadruped robot can always find the gas source if it is given an unlimited amount of time, but such a search would be very inefficient. We therefore propose the overhead rate to describe the efficiency of GSL. The overhead rate is the distance traveled from start-up until the gas source is found, divided by the shortest distance. The shortest distance is given by the A* algorithm. 39 The overhead rate results are shown in Table 4.
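As a rough illustration of how the overhead rate could be computed on a grid map, the sketch below runs A* with a Manhattan heuristic and divides the number of steps actually taken by the shortest path length. Several details are assumptions rather than facts from this article: moves are taken to be 4-connected with unit cost, distances are counted in grid cells (the cell size cancels in the ratio), and the names `astar_length` and `overhead_rate` as well as the toy map and trajectory are hypothetical.

```python
import heapq


def astar_length(grid, start, goal):
    """Shortest 4-connected path length in cells from start to goal,
    or None if the goal is unreachable. Cells with value 1 are blocked."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g
        if g > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None


def overhead_rate(trajectory, grid, goal):
    """Steps actually traveled divided by the A* shortest path length."""
    shortest = astar_length(grid, trajectory[0], goal)
    if not shortest:
        raise ValueError("goal unreachable or identical to start")
    return (len(trajectory) - 1) / shortest


# Toy map: a two-cell obstacle between the start (1, 0) and the source (1, 3).
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical trajectory that backtracks once before reaching the source.
trajectory = [(1, 0), (2, 0), (2, 1), (2, 2), (2, 1), (2, 2), (2, 3), (1, 3)]
rate = overhead_rate(trajectory, grid, goal=(1, 3))
```

With this definition, a rate of 1 corresponds to following a shortest path exactly, and larger values indicate more wasted motion.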
Experimental results under different settings.
The experiments demonstrate the capability of our method to find gas sources in complex environments with obstacles. Larger obstacles are added to the occupancy map by the olfactory quadruped robot for localization and obstacle avoidance. During our four experiments, the olfactory quadruped robot was able to step over the uneven ground and maintain balance. Figure 20 shows the entire process of the olfactory quadruped robot crossing the uneven ground region. Common wheeled robots cannot pass through this region, and this kind of obstacle is usually difficult to detect with LiDAR, so traditional wheeled robots struggle to handle it. Once a wheeled robot is accidentally caught in such a region, it is very difficult for it to break out. This region has less impact on a quadruped robot because it walks on legs. In practice, the ground where GSL is applied, such as in factories and forests, is not necessarily level. Common wheeled robots have difficulty performing GSL in such places even with state-of-the-art algorithms. The ability to perform GSL tasks in these environments is precisely the innovation of the olfactory quadruped robot proposed in this article.

Olfactory quadruped robot crosses uneven ground and maintains balance.
Our experimental environment is not included in the training environments. The olfactory quadruped robot can nevertheless find the gas source correctly in it, which means that the proposed Dueling DQN-based GSL adapts well to an unfamiliar environment and suggests that our model has not overfitted to the training environments. This demonstrates that we are able to apply the olfactory quadruped robot to new scenarios to perform GSL tasks without retraining the model. If the model had to be retrained every time the olfactory quadruped robot is deployed in a new scenario, the workload and cost would increase greatly. Our algorithm is therefore well suited to deploying the olfactory quadruped robot in real-world application scenarios.
To demonstrate the performance of our algorithm in different situations, we set up four initial positions in the experiment, three of which were located downwind and one upwind. From positions 1 and 2, the olfactory quadruped robot can quickly find the plume and follow it to the gas source. From position 3, the olfactory quadruped robot has to go around obstacles to find the plume, which is a rigorous test of the obstacle avoidance performance of our algorithm. Position 4 is located upwind. GSL from an upwind position is extremely challenging for existing GSL algorithms, and finding the gas source in this situation is precisely where our algorithm has an advantage.
At the beginning of Exp. 1, the olfactory quadruped robot approached the gas source very smoothly and quickly. However, it then became trapped in the region of a local maximum. As shown in Figure 21, the gas concentration in this region is higher than in the surrounding area. After about seven wasted steps, the robot left that region and found the real source. This experiment demonstrates that our algorithm can find the global optimum (the gas source location). Even if there is a local optimum, after several steps the olfactory quadruped robot can detect the anomaly and search other regions. This is an intelligent behavior that some traditional gradient-based algorithms cannot achieve.
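To make the local-maximum problem concrete, the following toy sketch shows how a purely greedy concentration-following rule stalls at a local peak. It is not the algorithm proposed in this article, nor a specific baseline evaluated here: the function `greedy_chemotaxis`, the 4-connected move set, and the concentration field are all made up for illustration.

```python
def greedy_chemotaxis(conc, start, max_steps=100):
    """Repeatedly move to the 4-connected neighbor with the highest
    concentration; stop when no neighbor exceeds the current cell."""
    rows, cols = len(conc), len(conc[0])
    path = [start]
    r, c = start
    for _ in range(max_steps):
        neighbors = [(r + dr, c + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
        best = max(neighbors, key=lambda p: conc[p[0]][p[1]])
        if conc[best[0]][best[1]] <= conc[r][c]:
            break  # local maximum: nothing nearby is higher
        r, c = best
        path.append(best)
    return path


# Made-up field: a local peak (5) at (1, 1) and the true source (9) at (1, 5),
# separated by a column of zero concentration.
field = [
    [1, 2, 1, 0, 1, 3, 1],
    [2, 5, 2, 0, 3, 9, 3],
    [1, 2, 1, 0, 1, 3, 1],
]
path = greedy_chemotaxis(field, start=(2, 0))
```

Starting from the lower-left corner, the greedy rule climbs to the local peak and stops there, never crossing the low-concentration gap to the true source. Escaping such a region requires some memory of the visited cells or other information beyond the local gradient.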

Gas concentration curves measured by the olfactory quadruped robot from different initial locations.
Exp. 2 performed very well and obtained a lower overhead rate because the plume could be tracked from the very beginning. In Exp. 2, the olfactory quadruped robot wandered only once, near the very first obstacle. This is probably because the robot was still far from the gas source, where the airflow is more turbulent. During the rest of the experiment, the olfactory quadruped robot did not waste any extra steps and quickly found the gas source.
In Exp. 3, the olfactory quadruped robot first bypassed the nearby obstacle and then found the plume. It took about five steps to bypass the obstacle. After bypassing the obstacle and finding the plume, the olfactory quadruped robot was very efficient in finding the gas source. This kind of intelligent behavior is difficult to achieve with traditional algorithms. A gradient-based algorithm would easily hit the obstacle or be trapped in this region because it has difficulty avoiding obstacles. For the Surge-Cast algorithm, the behavior of the robot depends on the choice of threshold: a small threshold will cause the robot to keep moving forward, while a large threshold will cause the robot to crash into the obstacle. Exp. 3 strongly supports the superiority of our algorithm in scenarios with obstacles.
Exp. 4 was the most difficult of our experiments. Locating gas sources from an upwind direction is often a difficult task for chemotaxis and anemotaxis algorithms. 40,41 As shown in Figure 21, the robot measured very low concentrations in the upwind position and relied entirely on the perception of airflow and obstacles. Locating the gas source under such conditions is a nearly impossible task for conventional algorithms. However, the olfactory quadruped robot was still able to avoid obstacles and find the gas source in Exp. 4, albeit with a larger overhead rate. The results illustrate that our algorithm can work properly when the target gas is sparse. This experiment can provide a theoretical basis and experimental rationale for future large-scale gas source localization tasks.
Conclusions
This article proposes an olfactory quadruped robot that can be applied to complex terrain for GSL tasks. The olfactory quadruped robot is equipped with e-noses and an ultrasonic anemometer, which can sense the gas concentration and airflow. To execute GSL tasks in diverse environments, a Dueling DQN-based GSL algorithm is proposed. The algorithm inputs trajectory, obstacles, concentration, and airflow data to output the next movement direction of the olfactory quadruped robot.
The adaptability of our algorithm has been evaluated in a set of simulation environments. These environments have different obstacles, airflow, and gas source locations. The olfactory quadruped robot starts from random positions and can find the gas source correctly and efficiently. Next, we conducted real-world experiments in a complex environment with obstacles. The olfactory quadruped robot was able to find the gas source from both upwind and downwind positions. This demonstrates the robustness of our approach.
Future work will focus on generalizing our Dueling DQN-based GSL algorithm to more scenarios (e.g., high-dimensional, dynamic obstacles) and redesigning our olfactory quadruped robot to have better motion capabilities. Although obstacles were set to different heights in our experiments, strictly speaking the environment is still 2D. Recent studies have shown that quadrupedal robots can move in 3D environments by jumping or climbing. If this ability can be applied to GSL tasks, it will greatly expand the scenarios in which GSL can be used, which also means more challenges and opportunities.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China [grant numbers 61873197, 62073250, 62003249], National Student Innovation and Entrepreneurship Training Program (202110488007).
