Sage Journals: Discover world-class research

Abstract

Gas source localization is one of the most common applications of a gas-sensitive mobile robot. However, most of the existing work focuses on rule-based algorithms for wheeled robots, which are difficult to apply in complex terrain with obstacles. In this article, we propose an olfactory quadruped robot to perform the gas source localization task using the Dueling Deep Q-Network (Dueling DQN) algorithm. For training, we designed a set of environments and imported gas dispersion data from computational fluid dynamics (CFD) software to construct a simulator. The olfactory quadruped robot was trained in this simulator using the Dueling DQN algorithm to learn how to find the gas source. The trained neural network was then deployed on the olfactory quadruped robot. Our method has been tested in both simulation and real environments. The olfactory quadruped robot can traverse rugged terrain in real experiments and efficiently find gas sources, demonstrating that our method is highly robust and has important practical value.

Keywords

Robot olfaction, gas source localization, reinforcement learning, Dueling DQN, quadruped robot

Introduction

Gas source localization (GSL) is the task of finding a chemical gas source in a specific environment. Traditional GSL methods can be mainly divided into bio-detection¹ and networked electronic noses (e-noses).² Bio-detection involves searching for the gas source on the spot with portable handheld sensors or trained animals. The networked e-noses method entails pre-positioning a large number of olfactory sensors, collecting data, and determining the location of the gas source. These two methods have many limitations and are challenging to apply to sudden hazardous gas leaks. With the development of mobile robots, olfactory robot-based GSL has been developed.^3–5 The olfactory robot is able to locate gas sources autonomously through the e-noses and some additional sensors such as anemometers,⁶ which has the advantages of low cost and no biological involvement.

Affected by airflow, gas molecules spread in space in plumes. Therefore, researchers typically decouple the GSL task into three subtasks: plume search, plume tracking, and source declaration,⁷ and then accomplish the GSL task by implementing these subtasks. Over the past decades, researchers have proposed a variety of GSL algorithms for specific environments. Ferri et al. use a bioinspired GSL algorithm in low airflow.⁸ Li et al. propose a real-time odor source localization method for robots using a particle filter algorithm during exploration.⁹ Neumann et al. design a gas-sensitive micro-drone with a bioinspired gas tracking algorithm.¹⁰ Hai-Feng et al. develop a chemical plume tracking method for autonomous underwater vehicles (AUVs) using partially observable Markov decision process (POMDP) and artificial potential field (APF) algorithms, showing improved stability and efficiency.¹¹ Monroy et al. estimate gas concentration distribution to localize gas sources.^12,13 Saska et al. propose a cooperative gas plume tracking method for micro-drone swarms using particle swarm optimization.¹⁴ Monroy et al. combine visual and chemical sensors to narrow the search area based on semantic relationships between objects and gases.¹⁵ These traditional algorithms rely on specific airflow conditions and environmental settings, lacking sufficient generality. This limitation poses significant inconvenience and difficulty for their widespread practical application.

Deep reinforcement learning (DRL) is a crucial branch of deep learning that has seen rapid development in recent years, often outperforming humans in various fields.^16,17 The difference between DRL and reinforcement learning (RL) lies in the policy-value function, which in DRL is usually a deep neural network. By setting up a large number of different environments for training the neural network, we can obtain a robust policy. Deep Q-Network (DQN)^18,19 is a classical off-policy DRL algorithm. Dueling DQN²⁰ improves upon the DQN network structure, enabling faster learning and better convergence. The objective of Dueling DQN is to optimize the policy-value function through trial-and-error interaction between the agent and the environment, without labeled data, aiming to maximize the reward.

DRL has a wide range of application areas, but the related research on DRL-based GSL is still in the primary stage.^21–23 However, related studies have initially demonstrated the superiority of DRL-based GSL. Using a learning approach to the GSL problem will no longer require a step-by-step solution. The end-to-end feature makes it easier to deploy without the need to tune a large number of parameters. DRL-based GSL can be applied to a wide variety of environments, including but not limited to outdoor, obstacle-laden, and stereo environments. Combined with our olfactory quadruped robot platform, the GSL task can be deployed in almost any scenario, which will break the limitations of traditional approaches and push GSL to real-world usage.

Since hazardous chemical gas leaks often occur in rugged and complex environments, and the distribution of gas molecules is highly unstable, the GSL task in real-world environments is particularly challenging. Most existing work on mobile robot-based GSL is difficult to apply to realistic scenarios. This is not only because of the poor robustness of the existing algorithms but also because of the limitations of the robot platform, which make it difficult to fully exploit the potential performance of the algorithms.

To enhance the application potential of GSL, we have utilized a quadruped robot equipped with olfactory capabilities as our platform. While most research in this area has focused on wheeled robots, which are limited to flat indoor environments due to their mechanical structure, quadruped robots offer distinct advantages. Quadruped robots can navigate unstructured and harsh environments, thanks to their walking capabilities. Additionally, they exhibit a wide range of velocities through various gait patterns, providing greater flexibility in motion control. Importantly, quadruped robots have minimal environmental impact compared to wheeled or tracked robots, which can damage surfaces and cause extensive environmental harm. This practicality adds significant value to our work. Nowadays, quadruped robots are widely used in complex tasks such as inspection of substations²⁴ or cable tunnels.²⁵ Considering the complexity of the GSL task, we believe that a quadruped robot is a better choice.

The purpose of our study is to reduce the GSL task’s dependence on specific scenarios. The main contribution of our study is the proposal of a GSL algorithm based on Dueling DQN. We deploy this algorithm on an olfactory quadruped robot equipped with e-noses and an anemometer. The Dueling DQN-based GSL algorithm enhances the olfactory quadruped robot’s environmental adaptability, and the robot platform further optimizes the algorithm’s performance. Together, they enable the deployment of the GSL task in typical real-world environments.

The rest of this article is organized as follows: Section “Dueling DQN-based GSL” details how the Dueling DQN algorithm is applied to the GSL task, including the design of the neural network, training, and simulation test. The sensors and controllers designed for the olfactory quadruped robot are presented in the section “Olfactory quadruped robot.” Then, we deploy the trained neural network on our olfactory quadruped robot and test GSL in a real experimental environment. The final experimental results are presented in the section “Experiment.”

Dueling DQN-based GSL

The GSL algorithms proposed in earlier studies were usually based on bionic optimization algorithms.²⁶ These algorithms search for the gas source by following the concentration gradient but can produce very steep trajectories and have low success rates. Anemotaxis-based algorithms such as Surge-Cast²⁷ are very efficient when the robot is downwind of the gas source. However, their performance under other conditions degrades severely. Besides, neither of these algorithms can be applied to scenarios with obstacles. To take advantage of our olfactory quadruped robot, we propose the Dueling DQN-based GSL algorithm, which is capable of locating gas sources in environments with complex obstacles.

The main concept of the Dueling DQN-based GSL algorithm proposed in this article is to transform the GSL task into a learning problem and solve it with DRL. In addition, we center the map around the robot to mitigate overfitting, so our algorithm does not rely on a specific environment and also performs well in unfamiliar environments.

The DQN algorithm was first applied to playing Atari games and obtained scores surpassing those of human experts in several games.^18,19 Dueling DQN is an improved version of the DQN algorithm.²⁰ Dueling DQN can help the agent understand which states are more valuable, which allows the training process to converge faster.

To apply Dueling DQN to the GSL task, we approximate the GSL task as a Markov decision process (MDP), as shown in Figure 1. The olfactory quadruped robot samples the environment E to obtain the current state S_t . In the GSL task, S_t includes information about the trajectory, obstacles, concentration, and airflow. The action A_t is selected by a convolutional neural network (CNN). Thanks to the omnidirectional movement capability of our olfactory quadruped robot, the direction of movement can be set to any direction around the robot. The robot executes A_t in the environment, reaches the next state $S_{t + 1}$ , and receives the reward R_t . The reward value at each time step t is defined as follows

R_{t} = \begin{cases} 20 & \text{if the robot arrives at the gas source} \\ - 10 & \text{if the robot hits an obstacle} \\ - 0.1 P_{t} & \text{otherwise} \end{cases}

where P_t represents the number of times the robot has visited its current position. The purpose of this penalty is to encourage the robot to explore more regions and avoid getting trapped in local optima.

Figure 1.

Markov decision process approximated from GSL. GSL: gas source localization.
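The piecewise reward can be illustrated with a minimal Python sketch. The function name and arguments are hypothetical, and the order in which the two terminal conditions are checked is an assumption, since the text does not state a priority between them.

```python
def step_reward(reached_source: bool, hit_obstacle: bool, visit_count: int) -> float:
    """Reward R_t for one time step, following the piecewise definition of R_t.

    visit_count plays the role of P_t: the number of times the robot
    has been at its current position so far.
    """
    if reached_source:      # assumed to take priority over the other cases
        return 20.0
    if hit_obstacle:
        return -10.0
    # Revisit penalty: grows linearly with the number of visits to this cell.
    return -0.1 * visit_count
```

With this shaping, a cell visited three times costs three times as much as a fresh cell, which is what pushes the agent away from loops.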

Dueling DQN algorithm

Dueling DQN is a DRL algorithm. Its purpose is to estimate the Q value, that is, the expected cumulative reward of taking a given action in a given state. Dueling DQN splits the Q value into a value function $V (s; θ, β)$ and an advantage function $A (s, a; θ, α)$ , where $θ$ is the parameter of the shared convolutional layer, and α and $β$ are the parameters of the advantage function and the value function, respectively. The Q value is defined as follows

Q (s, a; θ, α, β) = V (s; θ, β) + \left( A (s, a; θ, α) - \frac{1}{| A |} \sum_{a^{'}} A (s, a^{'}; θ, α) \right)
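As a small numerical illustration of this aggregation, the sketch below combines a scalar state value with a list of per-action advantages. The function name is hypothetical, and the inputs stand in for the outputs of the two network streams.

```python
from typing import List


def dueling_q(value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) by subtracting the mean advantage."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + (a - mean_advantage) for a in advantages]


# Example: V(s) = 1.0 and three actions with advantages 1, 2, 3.
print(dueling_q(1.0, [1.0, 2.0, 3.0]))
```

Subtracting the mean advantage makes the decomposition identifiable: the mean of the resulting Q values equals V(s).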

The purpose of Dueling DQN is to continuously update an action-value function $Q (s, a)$ through the interaction of the agent with its environment. Once we have an accurate $Q (s, a)$ , we can select the most valuable action each time to maximize future rewards G_t . Future reward G_t is defined as follows

G_{t} = R_{t + 1} + γ R_{t + 2} + γ^{2} R_{t + 3} + \dots = \sum_{k = 0}^{T - t - 1} γ^{k} R_{t + k + 1}

where $γ \in (0, 1)$ is the discount rate, which indicates how much importance the strategy places on long-term benefits.
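The discounted sum for G_t can be evaluated with a short backward recursion. The sketch below is illustrative only; the reward sequence in the example is made up.

```python
def discounted_return(rewards, gamma):
    """Return G_t for rewards = [R_{t+1}, R_{t+2}, ..., R_T].

    Uses the recursion G = R + gamma * G_next, evaluated from the
    last reward backwards.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Two exploration steps with a small penalty, then the source is reached.
print(discounted_return([-0.1, -0.1, 20.0], gamma=0.95))
```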

The optimal action-value function $Q^{*} (s, a)$ is the maximum expectation of reward that can be achieved by any subsequent step after observing state $S_{t} = s$ and taking action $A_{t} = a$ .

Q^{*} (s, a) = max_{π} E [G_{t} | S_{t} = s, A_{t} = a, π]

$Q^{*} (s, a)$ satisfies the Bellman equation

Q^{*} (s, a) = max_{π} E [\sum_{k = 0}^{T - t - 1} γ^{k} r_{t + k + 1} | s, a, π]

The basic idea of many RL algorithms is to perform iterative updates using the Bellman equation, but this approach is difficult to implement in high-dimensional state space. So we use a function approximator $Q (s, a; θ, α, β) \approx Q^{*} (s, a)$ to estimate the optimal action-value function, where $θ$ is the parameter of the CNN, α and $β$ are the parameters of advantage function and value function. The loss function $L (θ_{i})$ is defined as follows

L (θ_{i}) = \left( y_{i} - Q (s, a; θ_{i}, α_{i}, β_{i}) \right)^{2}

where $y_{i} = E [r + γ {max}_{a^{'}} Q (s^{'}, a^{'}; θ_{i - 1}, α_{i - 1}, β_{i - 1}) | s, a]$ is the target for iteration i. The Q-network is trained by minimizing the loss function $L (θ_{i})$ at each iteration i.
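For a single sampled transition, the target and loss reduce to a few lines. The sketch below uses hypothetical function names and plain lists in place of network outputs; the terminal case (target equal to the reward alone) follows the usual DQN convention.

```python
def td_target(reward, next_q_values, gamma, terminal):
    """Target y for one transition; next_q_values are Q(s', a') for all a'."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(target, q_value):
    """Squared error between the target y and the current estimate Q(s, a)."""
    return (target - q_value) ** 2


y = td_target(reward=-0.1, next_q_values=[1.0, 2.0], gamma=0.95, terminal=False)
print(y, squared_td_loss(y, q_value=1.3))
```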

For the GSL task, we modified part of the Dueling DQN algorithm, and the modified pseudocode is shown in Algorithm 1.

Algorithm 1.

Dueling DQN for GSL.

Initialize replay memory D to capacity N
Initialize action-value function Q with random weights θ, α, β
Initialize target action-value function \hat{Q} with the same weights as Q
for episode = 1, M
    Select a random environment E and apply a random transform to E
    Randomly spawn the robot in E and observe ϕ_{1}
    for t = 1, T
        With probability ε select a random movement a_t
        otherwise select a_{t} = {argmax}_{a} Q (ϕ (s_{t}), a; θ, α, β)
        Execute movement a_t in E and observe reward r_t and state ϕ_{t + 1}
        Store transition (ϕ_{t}, a_{t}, r_{t}, ϕ_{t + 1}) in D
        Sample random minibatch of transitions (ϕ_{j}, a_{j}, r_{j}, ϕ_{j + 1}) from D
        if ϕ_{j + 1} is not terminal then
            Set y_{j} = r_{j} + γ {max}_{a^{'}} \hat{Q} (ϕ_{j + 1}, a^{'}; θ, α, β)
        else
            Set y_{j} = r_{j}
        end if
        Perform an optimization step on {(y_{j} - Q (ϕ_{j}, a_{j}; θ, α, β))}^{2}
    end for
end for
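Two building blocks of Algorithm 1, ε-greedy movement selection and the replay memory D, can be sketched in a few lines of standard-library Python. The class and function names are hypothetical, and a real training setup (for example with Tianshou) would use the framework's own buffer and policy classes instead.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayMemory:
    """Fixed-capacity memory D; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, phi, action, reward, phi_next, terminal):
        self.buffer.append((phi, action, reward, phi_next, terminal))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Storing the terminal flag alongside each transition is an addition for convenience, so that the target can be chosen without re-querying the environment.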

The termination conditions for each episode are as follows:

The robot reaches the source or a grid adjacent to the source.

The robot is about to hit a border or an obstacle.

The number of movement steps exceeds a preset value (set to 450 during the training process).

Designing environments for RL is a very tedious process. To achieve more robustness with a limited number of scenarios, a random transformation (transpose, flip, or rotate) is applied to the environment at the beginning of each episode, and the spawn location of the robot is then chosen randomly. Random changes in the environment lead to longer convergence times but yield a more robust policy.
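A minimal sketch of such a random transformation on a single scalar grid (for example the obstacle or concentration layer) is given below. All names are hypothetical and the 50% probabilities are an assumption. Note that a vector field such as the airflow would additionally need its x and y components swapped or negated consistently with the grid transform, which this sketch does not cover.

```python
import random


def transpose(grid):
    return [list(row) for row in zip(*grid)]


def flip(grid):
    """Mirror the grid left to right."""
    return [list(reversed(row)) for row in grid]


def rotate90(grid):
    """Rotate the grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def random_transform(grid, rng=random):
    """Apply a random combination of transpose, flip, and rotation."""
    out = [list(row) for row in grid]
    if rng.random() < 0.5:
        out = transpose(out)
    if rng.random() < 0.5:
        out = flip(out)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate90(out)
    return out
```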

In some similar work,²¹ applying a policy to different scenarios often requires a certain amount of transfer learning. In this article, only one policy is trained and applied to multiple scenarios. Our approach is to decouple the policy from the environment by limiting the observation area. The input of our policy consists of historical data in an $11 \times 11$ grid centered on the olfactory quadruped robot itself, and the output is the direction of movement. This approach worked very well in a series of tests in both GADEN and real environments.

To perform the convolution operation on the raw data, we define the input layer as a $5 \times 11 \times 11$ array that contains the history of trajectory, obstacles, concentration, and airflow around the robot so far (the airflow is decomposed into x and y directions, so it occupies two matrices). Since different units of gas concentration can lead to differences of scale, we apply min-max normalization to the concentration matrix. The details of the input layers are shown in Figure 2. Note that regions outside the map range require padding: the obstacles layer is padded with 1 and the other layers are padded with 0.

Figure 2.

The olfactory quadruped robot is located in the center of all matrices. (a) Trajectory is the first matrix, in which the value of each cell is the number of times the robot has passed the corresponding position. (b) The obstacles matrix is cropped from the map; obstacle regions are marked with 1 and other regions with 0. (c)–(e) indicate the concentration and airflow data measured by the robot. The latest measurements overwrite the old data.
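The construction of the robot-centered input can be sketched as follows, using nested lists for clarity. Function names are hypothetical, and normalizing the concentration over the cropped window (rather than over the whole map) is an assumption, since the text does not specify where the normalization is applied.

```python
WINDOW = 11
HALF = WINDOW // 2


def crop(layer, row, col, pad):
    """Cut an 11 x 11 window centered on (row, col); pad outside the map."""
    h, w = len(layer), len(layer[0])
    return [
        [layer[r][c] if 0 <= r < h and 0 <= c < w else pad
         for c in range(col - HALF, col + HALF + 1)]
        for r in range(row - HALF, row + HALF + 1)
    ]


def min_max(layer):
    """Scale values to [0, 1]; a constant layer maps to all zeros."""
    flat = [v for line in layer for v in line]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in line] for line in layer]
    return [[(v - lo) / (hi - lo) for v in line] for line in layer]


def build_observation(trajectory, obstacles, concentration, wind_x, wind_y, row, col):
    """Stack the five layers into a 5 x 11 x 11 observation."""
    return [
        crop(trajectory, row, col, pad=0),
        crop(obstacles, row, col, pad=1),   # outside the map counts as obstacle
        min_max(crop(concentration, row, col, pad=0)),
        crop(wind_x, row, col, pad=0),
        crop(wind_y, row, col, pad=0),
    ]
```

Padding the obstacle layer with 1 means the map border looks like a wall to the network, so no separate border handling is needed.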

CNN is used as the backbone of the dueling neural network because of its powerful ability to extract features. The structure of the dueling neural network is shown in Figure 3. The features extracted by the convolutional network are then fed into the fully connected neural networks $V (s; θ, β)$ and $A (s, a; θ, α)$ to obtain the state-value and advantages. The output layer of the neural network joins the state-value and advantages through equation (2). Each value of the output layer denotes the expected cumulative reward of the corresponding movement direction. The robot chooses its movement direction by the $ε$ -greedy method and then reaches the next state.

Figure 3.

Dueling Deep Q-Network designed for GSL task. GSL: gas source localization.

Training

To train the neural network, we designed a set of environments and presimulated the gas dispersion with CFD software. Figure 4 shows our training environments. The algorithm proposed in this article is not coupled with the size and shape of the environment, but for convenience, the size of each training environment is set to 15 m × 15 m and then gridded into 900 cells of size 0.5 m × 0.5 m.

Figure 4.

Training environments: (a)–(d) show the molar concentration of each environment; (e)–(h) show the airflow of each environment.
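The gridding of a 15 m × 15 m environment into 0.5 m cells can be checked with a short sketch. The helper name is hypothetical, and placing the origin at a corner of the map with a (row, column) index order is an assumption.

```python
ENV_SIZE = 15.0   # side length of a training environment in meters
CELL_SIZE = 0.5   # side length of one grid cell in meters

cells_per_side = int(ENV_SIZE / CELL_SIZE)


def world_to_cell(x, y, cell_size=CELL_SIZE):
    """Map a position in meters to a (row, col) grid index."""
    return int(y // cell_size), int(x // cell_size)


print(cells_per_side, cells_per_side ** 2)
print(world_to_cell(0.74, 14.9))
```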

We used PyTorch, which computes gradients automatically, as the training framework and Adam as the optimizer. To speed up the training process, we use the Tianshou framework to sample the environment in parallel. The hyperparameters used during training are listed in Table 1. To keep learning stable and emphasize long-term reward, we set a very small learning rate α (not to be confused with the advantage-function parameters) and a large discount rate $γ$ . The initial value of $ε$ is set to 1.0 and decreases linearly to 0.05 during the training process, which encourages the robot to explore more regions in the early stages of training. ( $ε$ is set to 0 when the training is completed.)

Table 1.

Hyperparameters in the training process.

Learning rate α	Discount rate $γ$	Memory size	Batch size
0.0001	0.95	20,000	64
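The linear decay of $ε$ from 1.0 to 0.05 can be written as a one-line schedule. In the sketch below, `decay_steps` is a hypothetical parameter: the number of training steps over which the decay is spread is not stated in the text.

```python
def linear_epsilon(step, decay_steps, eps_start=1.0, eps_end=0.05):
    """Linearly interpolate epsilon from eps_start to eps_end, then hold."""
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


for step in (0, 500, 1000, 5000):
    print(step, linear_epsilon(step, decay_steps=1000))
```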

The curves of reward and step length during the training process are shown in Figure 5. Step length represents the number of steps taken in an episode, regardless of whether the gas source was found. At the beginning of training, the robot quickly hit a wall, which terminated the episode and resulted in a small step length. After 10,000 training steps, the robot had learned to avoid obstacles but tended to become trapped and had difficulty finding the gas source, which led to a very large step length. After 400,000 steps, the step length gradually decreased and converged, which means that the robot was able to find the gas source efficiently by this point.

Figure 5.

Training curve of the reward and step length.

GADEN test

GADEN²⁸ is a gas dispersion simulator widely used in robot olfaction (RO). We have developed a simulation platform for olfactory quadruped robots based on the GADEN project and implemented basic motion and perception functions in the simulator. In the simulator, the olfactory quadruped robot measures the gas concentration with a virtual metal oxide (MOX) sensor or photoionization detector (PID) and the airflow vector with a virtual ultrasonic anemometer.

To demonstrate the robustness and versatility of our algorithm, we conducted GADEN simulation experiments. We constructed a set of environments, including those in and out of the training set, and tested the GSL task in each of them. We also compared our algorithm with some commonly used GSL algorithms in the GADEN simulator. It should be noted that these algorithms cannot be used in complex environments with obstacles, so we only compare their success rate and efficiency in a simple empty scenario.

To perform the GSL simulation test, we model the scenarios by CAD software and import the inner mesh into CFD software to obtain airflow data. For the next step, the airflow data are fed into GADEN filament simulator to run the gas dispersal simulation. Once the simulation is done, the GSL task is ready to be tested on our simulation platform for olfactory quadruped robots.

The GSL tests were conducted in different environments with different settings (gas source location, airflow speed, and filaments release speed). The final results are shown in Figure 6. It is worth pointing out that for unfamiliar environments that the olfactory quadruped robot has never seen before (environments that are not included in the training environments), the algorithm proposed in this article is still able to find the gas source, which demonstrates the adaptability of our algorithm to different environments. Based on the GADEN test result, practical deployment of our algorithm is possible without transfer learning.

Figure 6.

(a)–(d) are the environments used for training. (e) and (f) are test environments that the robot has never seen before. The green point cloud is the plume of ethanol gas, and the black line is a sampling of the odometer data. The red flower marks the location of the gas source. The yellow triangle marks the initial position.

Most classical algorithms cannot be applied in the presence of obstacles. To compare the proposed algorithm with some classical algorithms, we set up a 15 m × 15 m empty scenario, which is shown in Figure 7. The algorithms used for comparison are gradient climbing⁵ from chemotaxis and Surge-Cast²⁷ from anemotaxis. The gradient climbing algorithm finds the gas source by continuously moving toward a higher concentration. Surge-Cast is a GSL algorithm that moves the robot straight upwind while it is in the plume and, once the plume is lost, moves it crosswind to reacquire the plume.

Figure 7.

Empty space for algorithm comparison. The blue wall is the wind inlet, and the red wall is the wind outlet. The red flower marks the location of the gas source.

All three algorithms were tested 900 times in the simulator, and the results are shown in Table 2. The Dueling DQN-based GSL algorithm proposed in this article achieved the highest success rate, which means that our algorithm can find the gas source from almost any location. The success rate of Surge-Cast is second only to that of our algorithm. Most of the failed instances of the Surge-Cast algorithm occur upwind, whereas our algorithm still performs well upwind. The gradient climbing algorithm has the lowest success rate because the gas distribution is often irregular, so the algorithm can easily fall into a local optimum and fail.

Table 2.

Success rate of the GSL algorithms.

Gradient climbing	Surge-Cast	Dueling DQN
32.11%	72.56%	98.67%

GSL: gas source localization.

We also compared the overhead rate of these three algorithms, defined as the distance traveled from the start position until the gas source is found divided by the shortest distance. As shown in Figure 8, the Dueling DQN algorithm has the largest overhead rate, which means that it autonomously searches more regions. This brings a higher success rate. The gradient climbing algorithm has a very steep trajectory because it needs to move along the concentration gradient, which also leads to its high overhead rate and low success rate. The success rate of the Surge-Cast algorithm is not high, but its overhead rate is very low, which means this algorithm is very efficient in some specific environments.

Figure 8.

Overhead rate of different algorithms, excluding failed instances.
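The overhead rate can be computed from a logged trajectory as sketched below. The function names are hypothetical, and the shortest distance is assumed to be the straight line from the start position to the source, which holds only in an obstacle-free scenario such as the empty comparison space.

```python
import math


def path_length(points):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def overhead_rate(trajectory, source):
    """Distance traveled divided by the straight-line start-to-source distance."""
    shortest = math.dist(trajectory[0], source)
    return path_length(trajectory) / shortest


# A detour of 3 m + 4 m to reach a source that is 5 m away.
print(overhead_rate([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)], source=(3.0, 4.0)))
```

A value of 1 corresponds to a perfectly direct path; larger values indicate more exploration.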

Deployment

Scenes for GSL tasks often have complex structures and different sizes. To deploy the trained neural network on real robots, some modifications to the computational process are required. During the training process, we define the observation as a dense matrix of fixed size for computational efficiency. When deployed, since the sampled data from the sensors are sparse and the map size is indeterminate, we first store the data in a continuous memory space and construct the observation matrix for every inference process later.
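One possible way to realize this is an append-only log of sparse samples from which the fixed-size window is rebuilt before each inference. The sketch below is an assumption about the data layout, not the implementation used on the robot; the class and method names are hypothetical, and only the three measured layers (concentration and the two airflow components) are shown.

```python
class MeasurementLog:
    """Append-only store of sparse sensor samples on an unbounded grid."""

    def __init__(self):
        self.samples = []  # tuples of (row, col, concentration, wind_x, wind_y)

    def add(self, row, col, concentration, wind_x, wind_y):
        self.samples.append((row, col, concentration, wind_x, wind_y))

    def window(self, row, col, size=11):
        """Build three dense size x size layers centered on (row, col)."""
        half = size // 2
        layers = [[[0.0] * size for _ in range(size)] for _ in range(3)]
        # Samples are replayed in order, so newer values overwrite older ones.
        for r, c, *values in self.samples:
            i, j = r - row + half, c - col + half
            if 0 <= i < size and 0 <= j < size:
                for k, v in enumerate(values):
                    layers[k][i][j] = v
        return layers
```

Because the window is indexed relative to the robot, the map itself never needs a fixed size.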

We typically deploy neural networks on edge computing devices, such as the Movidius (Intel) and Jetson series (NVIDIA). Considering the level of integration, the device chosen for this study is the Jetson Nano. We export the weights and structure of the neural network to the Open Neural Network Exchange (ONNX) format, which makes deployment on different devices possible in further studies. We also package all the programs into Robot Operating System (ROS) packages, which facilitates software distribution and containerized deployment.

Olfactory quadruped robot

Quadruped robot technology is now well developed. However, there is still no research that combines RO technology with quadruped robots. Since gas leak sources are often located in narrow and winding environments, the olfactory robot needs to be small and able to move flexibly. More importantly, the ground in GSL scenarios is not always smooth, which means the robot must be able to adapt to uneven surfaces. Traditional wheeled robots often have difficulty collecting the data needed by the GSL algorithm because they cannot move over uneven ground. This can result in the loss of key data, further increasing the difficulty of locating the gas source. In addition, many sensors, such as anemometers, cannot be tilted during measurement, which places a demand on the robot’s ability to adjust its body posture.

The quadruped robot is able to move over uneven ground while adjusting its body posture to ensure the sensors work properly. All of these functions are difficult to achieve for wheeled robots. To overcome the drawbacks of the wheeled robots, an olfactory quadruped robot is proposed as the executor for the GSL task, and its appearance is shown in Figure 9.

Figure 9.

Olfactory quadruped robot designed for GSL tasks. GSL: gas source localization.

Hardware design

There is already a lot of research on quadruped robots, and many commercial quadruped robots have emerged. Existing quadruped robots often rely on advanced actuators and sensors, resulting in unaffordable prices and maintenance difficulties. The olfactory quadruped robot proposed in this article is simplified in its mechanical and circuit design to make it easier to reproduce. Meanwhile, our modular design also opens up possibilities for other application scenarios.

The olfactory quadruped robot uses the STM32F407 (STMicroelectronics) as its microcontroller, which runs our motion algorithm to control 12 high-precision digital servos and collects data from the e-noses and anemometers. The servos are driven by a servo control board with 18 output channels, which has a quarter-microsecond resolution and a frequency of up to 333 Hz. To meet the requirements of neural network inference, we used the Jetson Nano as the upper-layer computing platform and implemented a high-speed data link through the USB Virtual Com Port (VCP) provided by the microcontroller. We also modified the rosserial project to map microcontroller resources to the ROS layer via VCP. This allows us to read sensor data or control the robot as if we were accessing a local process. The hardware schematic is shown in Figure 10.

Figure 10.

Hardware connection diagram of the olfactory quadruped robot.

The olfactory quadruped robot has a total of 12 degrees of freedom, which means it not only has the capability of omnidirectional movement but can also maintain balance by adjusting its body pose. Most of the components of the olfactory quadruped robot are manufactured with three-dimensional (3D) printing. This allows us to easily adjust some parameters of the mechanical structure. Some key parameters of the robot are shown in Table 3.

Table 3.

The key parameters of the olfactory quadruped robot.

Height	Upper leg	Lower leg	Mass	Max velocity
347.8 mm	64.2 mm	95.4 mm	3.3 kg	0.4 m/s

Sensors

The olfactory quadruped robot, as a mobile robot, is equipped with basic motion sensors including an attitude and heading reference system (AHRS), a visual odometer, and light detection and ranging (LiDAR). These sensors provide the robot with basic environmental awareness and allow easy access to the ROS navigation stack. Combined with some popular ROS packages, such as move_base and cartographer, the olfactory quadruped robot can implement mapping and navigation functions. In addition, the plugin-based architecture of the ROS navigation stack allows us to swap sensors or perform sensor fusion.

The perception of e-noses is based on their internal olfactory sensors. The sensor device detects the physical and/or chemical changes incurred by gas molecules, and these changes are measured as an electrical signal.²⁹ Commonly used gas sensors on mobile robots include MOX, semiconductor, and infrared.³⁰ These sensors are sensitive to different gases and have different performances. Usually, to manufacture a general-purpose e-nose, an array of different sensors is required.³¹

The olfactory quadruped robot detects gas concentration using its e-noses. In practice, the e-nose for a specific gas can be changed according to demand, thus enabling the tracing of different gases. In this article, harmless CO₂ gas is used as the gas source for experimental safety. The COZIR (Gas Sensing Solutions Ltd) infrared CO₂ sensor is designed to monitor carbon dioxide levels indoors. It has a high measurement frequency, and its measurement noise is less than 10 ppm. Considering the requirements on robustness and response time, this article selects COZIR CO₂ gas sensors as the e-noses.

The olfactory quadruped robot is also equipped with a small ultrasonic wind speed and direction sensor (VEMSEE: PR-3003-FSXCS-N01/4G-*) as its anemometer. Since the speed of sound in air is superimposed on the speed of the airflow, ultrasonic sensors can measure wind speed and direction by the time difference method. The ultrasonic wind speed and direction sensor is lighter and more sensitive than a traditional mechanical sensor. It converts wind speed and direction into an analog output, which means the microcontroller can easily read the data through an ADC chip (ADS1115).
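The time difference method can be illustrated with the textbook relation for a pair of transducers a distance L apart: with transit times t_down = L / (c + v) and t_up = L / (c − v), the wind component along the path is v = (L / 2)(1 / t_down − 1 / t_up), independent of the speed of sound c. The sketch below shows this general principle only; it is not the internal algorithm of the sensor used here, and the path length and function names are hypothetical.

```python
import math


def wind_component(path_length, t_downwind, t_upwind):
    """Wind speed along one transducer path from the two transit times."""
    return 0.5 * path_length * (1.0 / t_downwind - 1.0 / t_upwind)


def wind_vector(path_length, tx_down, tx_up, ty_down, ty_up):
    """Speed and direction (degrees, sensor frame) from two orthogonal paths."""
    vx = wind_component(path_length, tx_down, tx_up)
    vy = wind_component(path_length, ty_down, ty_up)
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))


# Synthetic check: 0.1 m path, sound speed 343 m/s, wind 2 m/s along the path.
L, c, v = 0.1, 343.0, 2.0
print(wind_component(L, L / (c + v), L / (c - v)))
```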

Motion control

The controller design of the quadruped robot is inspired by the MIT Cheetah.^32,33 The controller of the quadruped robot is divided into three layers, including the body controller, leg controller, and joint controller. The block diagram of the system architecture is shown in Figure 11. The olfactory quadruped robot translates desired posture and velocity into foot positions via body controller and leg controller. The state estimator gives an estimate of the body pose from the feedback of sensors, thus forming a closed loop allowing the robot to keep its balance during motion.

Figure 11.

System architecture block diagram. The upper level control commands include velocity and posture targets (green). These targets are translated into joint angles by the controllers (red) and sent to robot. The body posture is estimated by reading the data from sensors (blue), thus forming a closed loop to keep the body balance.

The motion control of the robot is mainly implemented by the leg controller, which generates the foot trajectories with Bézier curves to reduce impact and energy consumption.³⁴

The foot motion of the quadruped robot can be divided into the swing phase and the stance phase. The trajectory of the swing phase is generated by a 2D Bézier curve with 12 control points.

p_{i}^{sw} (t) = \sum_{k = 0}^{n} c_{k} B_{k}^{n} (S_{i}^{sw} (t))

where $S_{i}^{sw} \in [0, 1]$ is the swing phase signal, $B_{k}^{n} (S_{i}^{sw} (t))$ is the Bernstein polynomial of degree n, $(n + 1)$ is the number of control points, and c_k is the kth control point.
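A Bézier curve of this form can be evaluated directly from the Bernstein polynomials, as in the sketch below. The three control points in the example are placeholders for illustration; the 12 control points used on the robot are not given in the text.

```python
from math import comb


def bernstein(n, k, s):
    """Bernstein basis polynomial B_k^n(s) for s in [0, 1]."""
    return comb(n, k) * s ** k * (1 - s) ** (n - k)


def bezier_point(control_points, s):
    """Evaluate a 2D Bezier curve with n + 1 control points at phase s."""
    n = len(control_points) - 1
    x = sum(bernstein(n, k, s) * cx for k, (cx, _) in enumerate(control_points))
    y = sum(bernstein(n, k, s) * cy for k, (_, cy) in enumerate(control_points))
    return x, y


# Placeholder control points: lift the foot, move it forward, put it down.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
for s in (0.0, 0.5, 1.0):
    print(s, bezier_point(points, s))
```

The curve starts at the first control point and ends at the last one, so the swing trajectory connects lift-off and touchdown positions exactly.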

The trajectory of the stance phase is shown by the following equation

\begin{cases} p_{i, x}^{st} (t) = L_{span} (1 - 2 S_{i}^{st} (t)) + P_{0, x} \\ p_{i, y}^{st} (t) = δ \cos \left( \frac{π}{2 L_{span}} p_{i, x}^{st} (t) \right) + P_{0, y} \end{cases}

where $S_{i}^{st} \in [0, 1]$ is the stance phase signal, $\delta$ is half of the stroke length, and $L_{span}$ and $P_{0}$ are set to the same values as in the swing phase trajectory.
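As a small numerical illustration of the stance phase equation, the sketch below evaluates it for a given phase value. The function name and the example values are assumptions, and the sign of the vertical term depends on the axis convention of the leg frame.

```python
import math


def stance_point(s_st, l_span, delta, p0):
    """Evaluate the stance trajectory for a phase signal s_st in [0, 1].

    l_span and delta correspond to L_span and the cosine amplitude (delta)
    in the equation; p0 = (P0_x, P0_y) is the reference point.
    """
    p0_x, p0_y = p0
    x = l_span * (1.0 - 2.0 * s_st) + p0_x
    y = delta * math.cos(math.pi / (2.0 * l_span) * x) + p0_y
    return x, y


# Hypothetical values: 6 cm half stride, 1 cm amplitude, origin reference.
start = stance_point(0.0, 0.06, 0.01, (0.0, 0.0))
middle = stance_point(0.5, 0.06, 0.01, (0.0, 0.0))
end = stance_point(1.0, 0.06, 0.01, (0.0, 0.0))
```

With the reference point at the origin, the foot sweeps linearly from $+L_{span}$ to $-L_{span}$ along x, and the cosine term vanishes at both ends and peaks in the middle of the stance.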

The forward velocity is controlled according to the Raibert heuristic³⁵

L_{span} = \frac{v T_{st}}{2} + K_{v} (v - v_{d})

where $v_{d}$ is the desired forward speed, $v$ is the forward speed feedback, $T_{st}$ is the duration of the stance phase, and $K_{v}$ is the feedback gain.
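A minimal sketch of this heuristic follows. The function name and the numerical values are illustrative assumptions, not the parameters tuned on the robot.

```python
def half_stride(v, v_d, t_st, k_v):
    """Raibert-style heuristic: L_span = v * T_st / 2 + K_v * (v - v_d)."""
    return 0.5 * v * t_st + k_v * (v - v_d)


# Hypothetical values: 0.3 s stance duration, feedback gain 0.05 s.
on_target = half_stride(v=0.40, v_d=0.40, t_st=0.3, k_v=0.05)
too_fast = half_stride(v=0.50, v_d=0.40, t_st=0.3, k_v=0.05)
```

When the measured speed equals the desired speed, only the feedforward term $v T_{st} / 2$ remains, which is the distance the body travels during half a stance phase. When the robot is faster than desired, the feedback term lengthens $L_{span}$, placing the foot further ahead so that the next stance decelerates the body.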

To help the quadruped robot walk faster and more stably, the trajectory-related parameters need to be tuned iteratively. The details of the parameter settings are not the focus of this article.

Most of our sensors are fixed to the body of the olfactory quadruped robot, and some of them, such as the ultrasonic wind direction sensor, need to be kept level during measurement. We therefore need to keep the body of the robot balanced during sensor sampling; otherwise, the sampled data will be disturbed. The body posture can be controlled through the positions of the foot reference points, that is, the positions of the feet when the robot is standing still. A PID controller was implemented to balance the robot. It takes the roll and pitch angles from the IMU fixed to the robot body, calculates the error, and finally obtains the foot reference positions through inverse kinematics.
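The balancing loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the gains, the foot layout, and the sign conventions are hypothetical, and the final step uses a flat geometric mapping from angle corrections to foot height offsets in place of the full inverse kinematics used on the robot.

```python
import math
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook PID controller with internal state."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def foot_height_offsets(roll_cmd, pitch_cmd, half_length, half_width):
    """Map roll/pitch corrections (rad) to a height offset for each foot.

    Feet are placed at (+/-half_length, +/-half_width) in the body frame
    (x forward, y left). Sign conventions are assumptions of this sketch.
    """
    feet = {
        "front_left": (half_length, half_width),
        "front_right": (half_length, -half_width),
        "rear_left": (-half_length, half_width),
        "rear_right": (-half_length, -half_width),
    }
    return {
        name: -x * math.tan(pitch_cmd) + y * math.tan(roll_cmd)
        for name, (x, y) in feet.items()
    }


# One control step: the target is a level body (roll = pitch = 0).
roll_pid = PID(kp=0.8, ki=0.1, kd=0.02)
pitch_pid = PID(kp=0.8, ki=0.1, kd=0.02)
roll_meas, pitch_meas, dt = 0.05, -0.02, 0.01  # hypothetical IMU readings
roll_cmd = roll_pid.update(0.0 - roll_meas, dt)
pitch_cmd = pitch_pid.update(0.0 - pitch_meas, dt)
offsets = foot_height_offsets(roll_cmd, pitch_cmd, half_length=0.2, half_width=0.1)
```

Diagonal feet receive opposite offsets, so the correction tilts the body without changing its mean height.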

Finally, we conducted a simple test of the motion function of the olfactory quadruped robot. We manually moved the robot with the remote controller, as shown in Figure 12, while measuring the change in the roll and pitch angles of its body link with the attached IMU. The body controller performed well during the motion test. As shown in Figure 13, the effect of the body controller is not particularly obvious when the olfactory quadruped robot moves forward. However, when the robot moves laterally or rotates, it walks more smoothly and shows less roll and pitch variation with the body controller enabled. This is of great significance for walking on uneven ground and for ensuring the precision of the sensor measurements, which in turn allows more accurate data to be fed into the neural network.

Figure 12.

Motion testing of the olfactory quadruped robot.

Figure 13.

The motion test results of the olfactory quadruped robot. Two groups are included: with the body controller enabled and without. The roll and pitch curves of the robot body were recorded by the attached IMU. (a) Move forward, (b) move backward, (c) move left, (d) move right, (e) counterclockwise rotation, and (f) clockwise rotation.

Experiment

To demonstrate the suitability of our approach under real-world conditions, we conducted a real experiment in an indoor environment with obstacles and uneven ground and analyzed the performance and trajectory of the olfactory quadruped robot. The obstacles are wooden cubic blocks anchored to the ground, and the uneven ground is a terrain with several prominences. Fans generate artificial airflow throughout the experimental environment, and a CO₂ gas source is placed inside it. Commonly used CO₂ gas sources are liquefied CO₂ gas canisters and dry ice. Because gas canisters are inconvenient and risky to use, we produce CO₂ gas by dry ice sublimation.

To test the performance of our GSL algorithm in scenarios with obstacles and to verify the ability of the olfactory quadruped robot to traverse rugged terrain, we designed the experimental environment shown in Figure 14. The field for the experiment was a $4.48\,\mathrm{m} \times 8.08\,\mathrm{m}$ area enclosed by several wooden planks. During the experiment, the olfactory quadruped robot needs to find the gas source among the densely placed obstacles in this field.

Figure 14.

The global view of the experimental environment. The initial position is marked with magenta circles. Blue arrows indicate the direction of airflow.

Experimental setup

Before performing the GSL task, the olfactory quadruped robot needs to obtain the occupancy map of the surrounding environment by running a simultaneous localization and mapping (SLAM)^36,37 algorithm. With an occupancy map, the robot can obtain its current location with a localization algorithm such as adaptive Monte Carlo localization.³⁸ These algorithms are not the focus of this article, so we assume that the occupancy map is already known. Next, we dilate the occupancy map to prevent the robot from scraping against obstacles or walls and finally transform it into a grid that serves as the obstacle layer of the neural network input.
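The map preparation step can be sketched in plain Python. This is a minimal illustration assuming a binary occupancy map (1 = occupied), a square dilation neighborhood, and a coarse grid in which a cell is blocked if any fine cell inside it is occupied. The radius and cell size are hypothetical and would depend on the robot footprint and the grid resolution.

```python
def dilate(occupancy, radius):
    """Grow every occupied cell by `radius` cells in all directions."""
    rows, cols = len(occupancy), len(occupancy[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not occupancy[r][c]:
                continue
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = 1
    return out


def to_obstacle_layer(occupancy, cell):
    """Downsample to a coarse grid: blocked if any fine cell is occupied."""
    rows, cols = len(occupancy), len(occupancy[0])
    layer = []
    for r0 in range(0, rows, cell):
        row = []
        for c0 in range(0, cols, cell):
            block = [
                occupancy[r][c]
                for r in range(r0, min(rows, r0 + cell))
                for c in range(c0, min(cols, c0 + cell))
            ]
            row.append(1 if any(block) else 0)
        layer.append(row)
    return layer


# A 6 x 6 map with a single occupied cell, dilated by one cell.
fine_map = [[0] * 6 for _ in range(6)]
fine_map[2][2] = 1
obstacle_layer = to_obstacle_layer(dilate(fine_map, radius=1), cell=2)
```

Dilating before downsampling keeps the safety margin even when an obstacle lies close to the border of a coarse cell.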

Once the map is constructed, we place the gas source and the olfactory quadruped robot at their preset locations and start releasing CO₂ gas. After 30 s of gas dispersion, the indoor CO₂ distribution is relatively stable. At this point, we launch the olfactory quadruped robot and execute the GSL task.

In the experiment, we fix the position of the gas source and let the olfactory quadruped robot execute the GSL task from different initial positions. The initial positions are chosen by considering the location of the gas source, the direction of airflow, and the distribution of obstacles. They are set as follows.

Position 1 is a corner in the downwind direction with a low initial gas concentration. Starting from it, the robot must cross all the complex terrain to reach the vicinity of the gas source.

Position 2 is also downwind of the gas source, but the airflow is parallel to the line connecting position 2 and the gas source, which means a higher initial concentration and easier tracking of the plume.

Position 3 is located between the obstacle and the map boundary with a low initial gas concentration, and this position places high demands on the capability of obstacle avoidance.

Position 4 is in the upwind direction, which means that the gas concentration is approximately zero during most of the search, and the only data the algorithm can rely on is airflow.

To test the performance of our algorithm, four experiments were conducted, with Exp. 1–4 corresponding to the initial positions 1–4. The procedure of the experiments is approximately the same. During the experiment, we record the trajectory of the robot and the gas concentration data measured by the e-nose for further analysis.

Results and discussion

The olfactory quadruped robot finds the gas source autonomously from different initial positions. We use RViz to record and visualize its trajectory, and the results are shown in Figures 15–18. We also calculated the curve of the distance between the olfactory quadruped robot and the gas source, which is shown in Figure 19. A distance of 0 indicates that the olfactory quadruped robot has reached the location of the gas source. The curves show that our GSL process is quite smooth and efficient. Since we divide the environment into grids and consider the GSL task successful only when the robot reaches a grid cell neighboring the gas source, the accuracy of our GSL algorithm is within 1.0 m.

Figure 15.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 1. (a) Panorama and (b) trajectory.

Figure 16.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 2. (a) Panorama and (b) trajectory.

Figure 17.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 3. (a) Panorama and (b) trajectory.

Figure 18.

Trajectory of the olfactory quadruped robot recorded with RViz in Exp. 4. (a) Panorama and (b) trajectory.

Figure 19.

The curve of the distance between the olfactory quadruped robot and the gas source during the experiment.

Strictly speaking, the quadruped robot can always find the gas source if it is given unlimited time, but such a search would be very inefficient. We therefore propose the overhead rate to describe the efficiency of GSL. The overhead rate is the distance traveled from start-up until the gas source is found, divided by the shortest distance from the initial position to the gas source. The shortest distance is given by the A* algorithm.³⁹ The overhead rate results are shown in Table 4.

Table 4.

Experimental results under different settings.

No.	Initial location	Total steps	Overhead rate
1	(0.5, 7)	26	1.86
2	(2, 7)	19	1.36
3	(4, 4)	14	1.17
4	(4, 0.5)	13	2.17
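The overhead rate can be computed with a short script. The sketch below is an illustration under stated assumptions: it measures both distances in grid steps, allows only four-connected moves, and uses a made-up obstacle grid rather than the experimental map.

```python
import heapq


def astar_length(grid, start, goal):
    """Length (in steps) of the shortest 4-connected path, or None."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best = {start: 0}
    open_heap = [(heuristic(start), 0, start)]
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            return cost
        if cost > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc]:
                continue
            new_cost = cost + 1
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def overhead_rate(traveled_steps, grid, start, goal):
    """Traveled distance divided by the A* shortest distance."""
    shortest = astar_length(grid, start, goal)
    if not shortest:
        raise ValueError("goal is unreachable or identical to the start")
    return traveled_steps / shortest


# Hypothetical 3 x 3 grid with a wall (1 = obstacle).
GRID = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
rate = overhead_rate(9, GRID, start=(0, 0), goal=(0, 2))
```

An overhead rate of 1 means the robot followed a shortest path, and larger values indicate detours. If the rates in Table 4 are computed in steps, the 26 steps and overhead rate of 1.86 in Exp. 1 correspond to a shortest path of about 14 steps.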

The experiments demonstrate the capability of our method to find gas sources in complex environments with obstacles. Larger obstacles are added to the occupancy map by the olfactory quadruped robot for localization and obstacle avoidance. During our four experiments, the olfactory quadruped robot was able to step over the uneven ground and maintain balance. Figure 20 shows the entire process of the olfactory quadruped robot crossing the uneven ground region. Common wheeled robots cannot pass through this region, and this kind of obstacle is usually difficult to detect with LiDAR, so traditional wheeled robots can hardly handle it. Once a wheeled robot is accidentally caught in such a region, it will be very difficult for it to break out. The region has less impact on a quadruped robot because it walks on legs. In practice, the ground where GSL is applied, such as in factories and forests, is not necessarily level. Common wheeled robots have difficulty performing GSL in such places even with state-of-the-art algorithms. The ability to perform GSL tasks in these environments is precisely the innovation of the olfactory quadruped robot proposed in this article.

Figure 20.

Olfactory quadruped robot crosses uneven ground and maintains balance.

Our experimental environment is not included in the training environments. Even so, the olfactory quadruped robot correctly finds the gas source, which means that the proposed Dueling DQN-based GSL adapts well to unfamiliar environments and suggests that our model does not suffer from overfitting. This demonstrates that the olfactory quadruped robot can be applied to new scenarios to perform GSL tasks without retraining the model. If the model had to be retrained every time the robot is deployed in a new scenario, the workload and cost would increase greatly. Our algorithm therefore offers great convenience when the olfactory quadruped robot is deployed in real-world application scenarios.

To demonstrate the performance of our algorithm in different situations, we set up four initial positions in the experiment, three of which were located downwind and one upwind. For positions 1 and 2, the olfactory quadruped robot can quickly find the plume and follow it to the gas source. In position 3, the olfactory quadruped robot will have to go around obstacles to find the plume, which is a rigorous test of the obstacle avoidance performance of our algorithm. Position 4 is located upwind, and GSL from an upwind position is extremely challenging for the existing GSL algorithms, but finding the gas source in this situation is exactly the advantage of our algorithm.

At the beginning of Exp. 1, the olfactory quadruped robot approached the gas source smoothly and quickly. However, it then became trapped in the region of a local maximum. As shown in Figure 21, the gas concentration in this region is higher than in the surrounding area. After about seven wasted steps, the robot left that region and found the real source. This experiment demonstrates that our algorithm can find the global optimum (the gas source location) even when a local optimum exists: after several steps, the olfactory quadruped robot detects the anomaly and searches other regions. This is an intelligent behavior that some traditional gradient-based algorithms cannot achieve.

Figure 21.

Gas concentration curves measured by the olfactory quadruped robot from different initial locations.

Exp. 2 performed very well and obtained a low overhead rate because the plume could be tracked from the very beginning. The olfactory quadruped robot wandered only once, near the very first obstacle, probably because it was still far from the source and the airflow there was more turbulent. During the rest of the experiment, the robot did not waste any steps and quickly found the gas source.

In Exp. 3, the olfactory quadruped robot first bypassed the nearby obstacle and then found the plume. It took about five steps to bypass the obstacle. After that, the robot was very efficient in finding the gas source. This kind of intelligent behavior is difficult to achieve with traditional algorithms. A gradient-based algorithm would easily hit the obstacle or be trapped in this region because it has difficulty avoiding obstacles. For the Surge-Cast algorithm, the behavior of the robot depends on the choice of threshold: a small threshold causes the robot to keep moving forward, while a large threshold causes the robot to crash into the obstacle. Exp. 3 strongly supports the superiority of our algorithm in scenarios with obstacles.

Exp. 4 was the most difficult of our experiments. Locating gas sources from an upwind direction is often a difficult task for chemotaxis and anemotaxis algorithms.^40,41 As shown in Figure 21, the robot measured very low concentrations in the upwind position and relied entirely on the perception of airflow and obstacles. Locating the gas source under such conditions is nearly impossible for conventional algorithms. However, the olfactory quadruped robot was still able to avoid obstacles and find the gas source in Exp. 4, albeit with a larger overhead rate. The results illustrate that our algorithm can work properly when the target gas is sparse. This experiment can provide a theoretical basis and experimental rationale for future large-scale gas source localization tasks.

Conclusions

This article proposes an olfactory quadruped robot that can be applied to complex terrain for GSL tasks. The olfactory quadruped robot is equipped with e-noses and an ultrasonic anemometer, which can sense the gas concentration and airflow. To execute GSL tasks in diverse environments, a Dueling DQN-based GSL algorithm is proposed. The algorithm inputs trajectory, obstacles, concentration, and airflow data to output the next movement direction of the olfactory quadruped robot.

The adaptability of our algorithm has been evaluated in a set of simulation environments. These environments have different obstacles, airflow, and locations of gas sources. The olfactory quadruped robot starts from random positions and can find the gas source correctly and efficiently. Next, we conducted a real-world experiment in a complex environment with obstacles. The olfactory quadruped robot was able to find the gas source both from upwind and downwind. This demonstrates the robustness of our approach.

Future work will focus on generalizing our Dueling DQN-based GSL algorithm to more scenarios (e.g., higher-dimensional environments and dynamic obstacles) and redesigning our olfactory quadruped robot to improve its motion capabilities. Although the obstacles in our experiments had different heights, the environment is, strictly speaking, still 2D. Recent studies have proposed that quadruped robots can move in 3D environments by jumping or climbing. Applying these capabilities to GSL tasks would greatly expand the scenarios in which GSL can be used, which also brings more challenges and opportunities.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the National Natural Science Foundation of China [grant numbers 61873197, 62073250, 62003249], National Student Innovation and Entrepreneurship Training Program (202110488007).

ORCID iDs

Yu He

Lei Cheng

References

1. Wilson, Hoyt, Janata, et al. Chemical sensors for portable, handheld field instruments. IEEE Sens J 2001; 1(4): 256–274.

2. Matthes, Groll, Keller. Optimal weighting of networked electronic noses for the source localization. In: 2005 Systems communications (ICW’05, ICHSN’05, ICMCS’05, SENET’05), Montreal, QC, Canada, 14–17 August 2005, pp. 455–460. Montreal, QC, Canada: IEEE.

3. Widyantara, Rivai, Purwanto. Gas source localization using an olfactory mobile robot equipped with wind direction sensor. In: 2018 International conference on computer engineering, network and intelligent multimedia (CENIM), Surabaya, Indonesia, 26–27 November 2018, pp. 66–70. Surabaya, Indonesia: IEEE.

4. Cheng, Wang, et al. Research on bionic rotorcraft robot based olfactory detection. In: 2017 Second international conference on advanced robotics and mechatronics (ICARM), Hefei and Tai’an, China, 27–31 August 2017, pp. 289–293. Hefei and Tai’an, China: IEEE.

5. Rozas, Morales, Vega. Artificial smell detection for robotic navigation. In: Fifth international conference on advanced robotics’ robots in unstructured environments, Pisa, Italy, 19–22 June 1991, pp. 1730–1733. Pisa, Italy: IEEE.

6. Rahardi, Rivai, Purwanto. Implementation of hot-wire anemometer on olfactory mobile robot to localize gas source. In: 2018 International conference on information and communications technology (ICOIACT), Yogyakarta, Indonesia, 6–7 March 2018, pp. 412–417. Yogyakarta, Indonesia: IEEE.

7. Chen, Huang. Odor source localization algorithms on mobile robots: a review and future outlook. Robot Auton Syst 2019; 112: 123–136.

8. Ferri, Caselli, Mattoli, et al. SPIRAL: a novel biologically-inspired algorithm for gas/odor source localization in an indoor environment with no strong airflow. Robot Auton Syst 2009; 57(4): 393–402.

9. Meng, Wang, et al. Odor source localization using a mobile robot in outdoor airflow environments with a particle filter algorithm. Auton Robots 2011; 30(3): 281–292.

10. Neumann, Hernandez Bennetts, Lilienthal, et al. Gas source localization with a micro-drone using bio-inspired and particle filter-based algorithms. Adv Robot 2013; 27(9): 725–738.

11. Hai-Feng, Wei, et al. Underwater chemical plume tracing based on partially observable Markov decision process. Int J Adv Robot Syst 2019; 16(2): 1729881419831874.

12. Monroy, Blanco, Jiménez. Time-variant gas distribution mapping with obstacle information. Auton Robots 2016; 40(1): 1–16.

13. Monroy, Gonzalez-Jimenez, Sanchez-Garrido. Monitoring household garbage odors in urban areas through distribution maps. In: SENSORS, 2014 IEEE, Valencia, Spain, 2–5 November 2014, pp. 1364–1367. Valencia, Spain: IEEE.

14. Saska, Langr, Přeučil. Plume tracking by a self-stabilized group of micro aerial vehicles. In: Hodicky (ed.) International workshop on modelling and simulation for autonomous systems, pp. 44–55. Rome, Italy: Springer, Cham.

15. Monroy, Ruiz-Sarmiento, Moreno, et al. A semantic-based gas source localization with a mobile robot combining vision and chemical sensing. Sensors 2018; 18(12): 4174.

16. Silver, Huang, Maddison, et al. Mastering the game of go with deep neural networks and tree search. Nature 2016; 529(7587): 484–489.

17. Berner, Brockman, Chan, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.

18. Mnih, Kavukcuoglu, Silver, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

19. Mnih, Kavukcuoglu, Silver, et al. Human-level control through deep reinforcement learning. Nature 2015; 518(7540): 529–533.

20. Wang, Schaul, Hessel, et al. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd international conference on machine learning, PMLR, New York, USA, 20–22 June 2016, Vol. 48, pp. 1995–2003.

21. Chen, Huang. A Deep Q-network for robotic odor/gas source localization: modeling, measurement and comparative study. Measurement 2021; 183: 109725.

22. Song, Chen. Plume tracing via model-free reinforcement learning method. IEEE Trans Neural Netw Learn Syst 2019; 30(8): 2515–2527.

23. Hayes, Martinoli, Goodman. Swarm robotic odor localization: off-line optimization and validation with real robots. Robotica 2003; 21(4): 427–441.

24. Xiao, et al. Design of a quadruped inspection robot used in substation. In: 2021 IEEE fourth advanced information management, communicates, electronic and automation control conference (IMCEC), Chongqing, China, 18–20 June 2021, pp. 766–769. Chongqing, China: IEEE.

25. Zhou, Wang, Chen, et al. Automatic inspection method of cable tunnel in complex environment based on quadruped robot. In: 2021 IEEE third international conference on frontiers technology of information and computer (ICFTIC), Greenville, SC, USA, 12–14 November 2021, pp. 599–603. Greenville, SC, USA: IEEE.

26. Shi, Sun. Robots active olfaction based on improved genetic algorithm. In: 2013 Fifth international conference and computational intelligence and communication networks, Mathura, India, 27–29 September 2013, pp. 622–625. Mathura, India: IEEE.

27. Waphare, Gharpure, Shaligram, et al. Implementation of 3-nose strategy in odor plume-tracking algorithm. In: 2010 International conference on signal acquisition and processing, Bangalore, India, 9–10 February 2010, pp. 337–341. Bangalore, India: IEEE.

28. Monroy, Hernandez-Bennetts, Fan, et al. GADEN: a 3d gas dispersion simulator for mobile robot olfaction in realistic environments. Sensors 2017; 17(7): 1479.

29. Arshak, Moore, Lyons, et al. A review of gas sensors employed in electronic nose applications. Sens Rev 2004; 24(2): 181–198.

30. Dinh, Choi, Son, et al. A review on non-dispersive infrared gas sensors: improvement of sensor detection limit and interference correction. Sens Actuators B Chem 2016; 231: 529–538.

31. Cheng, Meng, Lilienthal, et al. Development of compact electronic noses: a review. Meas Sci Technol 2021; 32(6): 062002.

32. Lee. Hierarchical controller for highly dynamic locomotion utilizing pattern modulation and impedance control: implementation on the MIT Cheetah robot. PhD Thesis, Massachusetts Institute of Technology, 2013.

33. Bledt, Powell, Katz, et al. MIT cheetah 3: design and control of a robust, dynamic quadruped robot. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), Madrid, Spain, 1–5 October 2018, pp. 2245–2252. Madrid, Spain: IEEE.

34. Sun, Hua, et al. Modeling and analysis on low energy consumption foot trajectory for hydraulic actuated quadruped robot. Int J Adv Robot Syst 2021; 18(6): 17298814211062006.

35. Schwind, Koditschek. Control of forward velocity for a simplified planar hopping robot. In: Proceedings of 1995 IEEE international conference on robotics and automation, Nagoya, Japan, 21–27 May 1995, pp. 691–696. Nagoya, Japan: IEEE.

36. Filipenko, Afanasyev. Comparison of various SLAM systems for mobile robot in an indoor environment. In: 2018 International conference on intelligent systems (IS), Funchal, Portugal, 25–27 September 2018, pp. 400–407. Funchal, Portugal: IEEE.

37. Quan, Zhang. A novel mobile robot navigation method based on deep reinforcement learning. Int J Adv Robot Syst 2020; 17(3): 1729881420921672.

38. Peng, Zheng, et al. An improved AMCL algorithm based on laser scanning match in a complex and unstructured environment. Complexity 2018; 2018(5): 1–11.

39. Costa, Silva. A survey on path planning algorithms for mobile robots. In: 2019 IEEE international conference on autonomous robot systems and competitions (ICARSC), Porto, Portugal, 24–26 April 2019, pp. 1–7. Porto, Portugal: IEEE.

40. Gongora, Monroy, Gonzalez-Jimenez. Gas source localization strategies for teleoperated mobile robots: an experimental analysis. In: 2017 European conference on mobile robots (ECMR), Paris, France, 6–8 September 2017, pp. 1–6. Paris, France: IEEE.

41. Ojeda, Monroy, Gonzalez-Jimenez. An evaluation of gas source localization algorithms for mobile robots. In: Proceedings of the third international conference on applications of intelligent systems, Las Palmas de Gran Canaria, Spain, 7–9 January 2020, pp. 1–6. New York, NY, USA: ACM.

Gas source localization using Dueling Deep Q-Network with an olfactory quadruped robot
