Underwater chemical plume tracing based on partially observable Markov decision process

Abstract

Chemical plume tracing with an autonomous underwater vehicle uses chemical signals as guidance for navigating and searching unknown environments. To address the key problem of tracing and locating the source, this article proposes a path-planning strategy that combines a partially observable Markov decision process algorithm with an artificial potential field algorithm. The partially observable Markov decision process algorithm constructs a source likelihood map and updates it in real time with environmental information from the sensors on the autonomous underwater vehicle in the search area. The artificial potential field algorithm uses the source likelihood map to plan the tracing path and guides the autonomous underwater vehicle along that path until the source is detected. Simulation experiments are carried out on the proposed algorithm. The results show that the algorithm performs well and is suitable for chemical plume tracing with an autonomous underwater vehicle, and that it achieves a higher success rate and better stability than the bionic method.

Keywords

Chemical plume tracing, POMDP, artificial potential field, source likelihood map

Introduction

Olfaction is a long-distance sense that animals widely use to search for food and find mates,^1–3 for example Pacific salmon returning to their habitat,⁴ Antarctic seabirds foraging,⁵ and some insects mating and feeding.^6–8 In nature, olfaction plays a vital role for most animals. When searching for targets such as food and nests, olfactory cues are more effective for animals than visual and auditory cues.⁹ Although detecting smell is simpler than hearing and vision, using olfactory cues to search for a specific target is still a difficult task.¹⁰ So far, few studies have addressed this powerful and effective basic sensory system in robots.

With the rapid development of robotics, sensor technology, and artificial intelligence, active olfaction technology has attracted the attention of more and more scholars, and it has broad application prospects in civil, military, and anti-terrorism fields. Robots equipped with olfactory technology can detect leaks of hazardous chemicals, locate unexploded bombs and naval mines, search for and rescue trapped people at disaster sites, detect the source of a fire,¹¹ and so on. Olfactory robots can even perform some complex and difficult tasks that currently require the cooperation of humans and animals with a highly developed sense of smell (such as dogs).

Hayes et al.¹² described active olfaction as the task of using mobile robots to search for an odor in a two-dimensional environment and divided it into three behaviors: plume finding, plume traversal, and odor source declaration. Later, Li et al.¹³ divided the entire search task into four behavior patterns: plume finding, plume tracing, plume reacquiring, and odor source declaration. Plume finding is the behavior used at the beginning of a search task, which lets the robot find the plume as soon as possible. Because the flow field in the environment changes constantly, the robot may lose the plume and be unable to keep tracking it, so it must use known plume information to find and track the plume again; this process is the plume reacquiring behavior. Odor source declaration is the behavior in which the robot determines whether it has reached the odor source and confirms the source location.

Current research mainly focuses on active olfactory tracing methods based on bionic principles. Olfactory tracing and locating methods have been developed by imitating the strategies that some living creatures with a developed sense of smell use to search for odors. Some organisms (such as slime mold) trace chemicals along the concentration gradient of the chemical information they obtain; this approach, called chemotaxis, is the most commonly used bionic method. Most animals have two odor sensors, such as nostrils or tentacles. They compare the odor concentrations detected by the two sensors and then turn toward the stronger signal. Based on this phenomenon, Berg and Brown first proposed applying this method to robot olfaction research: the robot moves toward higher concentration as detected by two chemical sensors at different positions.¹⁴ Grasso and Belanger studied the movement behavior of lobsters and moths, and the results showed that these organisms have similar movement behavior. Based on this behavior, a z-type movement in the wind can be constructed on a simulation platform, which can be used to test, optimize, and verify the active detection method designed in the initial phase. This form of movement keeps the robot inside the plume while it gradually approaches the plume source.^15–19 By studying how moths track odors, Farrell and Pang developed a plume-search task planner for the autonomous underwater vehicle (AUV). Several chemical plume tracing experiments showed that the method of Farrell and Pang is effective for tracking and locating a plume.^20,21

At present, the successful detection algorithms are based on bionics, and few scholars have developed an engineering-based tracking and positioning method that can be used in practice. The computer system of the AUV has powerful storage and analysis capabilities that organisms do not have. For example, it can record the flow speed and direction, measure the chemical concentration, determine the real-time position of the AUV, and store the flow speed and chemical concentration data in memory. Organisms cannot remember precisely where they have been, where they detected an odor, or the recent history of wind speed and wind direction. In this article, a tracking and positioning method based on engineering theory is adopted so that the AUV olfactory search makes efficient use of storage and computational analysis abilities far beyond those of natural organisms. Compared with a pure bionic algorithm, the proposed method makes full use of the storage capacity and computing resources of the AUV to achieve more stable performance.

This article presents a path-planning method for AUV olfactory search and localization that uses a partially observable Markov decision process (POMDP) to build the probability distribution map of the chemical source and the artificial potential field (APF) method to plan the path. Specific details of the APF olfactory path-planning method are available in the literature.²² The main work of this article is as follows. First, the POMDP method is used to estimate the probability distribution of the source of the chemical plume, which gives the source likelihood map. Second, according to the characteristics of the probability distribution map, the chemical plume is tracked with the APF method. Finally, the method is verified by experiments. The experimental results show that the method solves the AUV olfactory tracking and locating problem effectively and stably.

Construction of chemical source likelihood map

Framework of partial observation Markov decision process theory

The traditional partially observable Markov decision process can be described as a six-tuple, $M = \{S, A, T, R, Ω, O\}$. In this article, the POMDP method is used to construct the source likelihood map. Because the method is based on updating the belief state, the belief b is added and the reward function R is removed. The modified partially observable Markov decision process is therefore described by a new six-tuple: $M = \{S, A, T, Ω, O, b\}$.

State S: The system is in one state at each decision moment. S is the set of all possible states, called the state space. The state is divided into an explicit state and a hidden state. The explicit state can be observed directly; the hidden state cannot, so its probability must be estimated from observations.

Decision A: A is the set of available actions a. In this article, the decision is the heading angle command for the AUV.

State transfer function T : P_ij (a) indicates the transition probability from the state i to the next state j at the t time after taking the decision a .

Observation Ω : Ω represents a finite set of observation values.

In this article, the observation values come from the AUV's chemical sensor: $d$ means that the AUV detects the chemical plume, and $\bar{d}$ means that the AUV detects nothing, so $Z = \{d, \bar{d}\}$.

This binary observation avoids relying on the chemical concentration value itself, which fluctuates and is ambiguous in a time-varying environment.

Observation function O: $S \times A \times Ω \to [0, 1]$ and $O (a, s^{'}, z^{'}) = P (z^{'} | a, s^{'})$ represent the probability of the new observation z′ which appears when the hidden state reaches s′ after the AUV takes decision a . The observation function is established by the observation model, and the observation model can be calculated by the plume movement model.

Initial belief state b: b represents the estimated probability of the hidden state in S, that is, the probability distribution map of the chemical plume source over the states. Here $b = P(s \mid h)$ is the estimate of the state s after a series of observations, where h is the history of observation values. $b(s)$ is the belief for a particular state s, and the beliefs over all states in S sum to 1, that is, $\sum_{s \in S} b(s) = 1$. The initial belief $b_{0} = P(s_{0})$ is the estimated probability of the hidden state at the initial time.

After the AUV takes decision a, it receives a new observation z′ and the hidden state becomes s′. Decision a and observation z′ are appended to the history, giving $h' = \{a, z', h\}$. The new hidden state s′ can then be estimated from h′, with estimated probability $b_{t+1}(s') = P(s' \mid h') = P(s' \mid a, z', h)$. Because the belief $b_{t}(s)$ summarizes the history h, only $b_{t}$ needs to be considered when estimating the new state, so this expression becomes

\begin{aligned} b_{t+1}(s') &= P(s' \mid z', a, b_{t}) = \frac{P(s', z', a, b_{t})}{P(z', a, b_{t})} \\ &= \frac{P(z' \mid s', a, b_{t})\, P(s' \mid a, b_{t})\, P(a, b_{t})}{P(z' \mid a, b_{t})\, P(a, b_{t})} \\ &= \frac{O(a, s', z') \sum_{s \in S} P(s' \mid a, b_{t}, s)\, P(s \mid a, b_{t})}{P(z' \mid a, b_{t})} \\ &= \frac{O(a, s', z') \sum_{s \in S} T(s, a, s')\, b_{t}(s)}{P(z' \mid a, b_{t})} \end{aligned}

Using the law of total probability, the denominator is

\begin{aligned} P(z' \mid a, b_{t}) &= \sum_{s' \in S} P(z' \mid s', a, b_{t}) \cdot P(s' \mid a, b_{t}) \\ &= \sum_{s' \in S} P(z' \mid s', a, b_{t}) \cdot \sum_{s \in S} P(s' \mid a, b_{t}, s)\, P(s \mid a, b_{t}) \\ &= \sum_{s' \in S} P(z' \mid s', a) \cdot \sum_{s \in S} P(s' \mid a, s)\, P(s \mid a, b_{t}) \\ &= \sum_{s' \in S} O(a, s', z') \cdot \sum_{s \in S} T(s, a, s')\, b_{t}(s) \end{aligned}

Combining equations (1) and (2) gives the update equation for $b(s)$. Because the observation z′, given the state, does not depend on the belief, we have $P(z' \mid a, b_{t}, s) = P(z' \mid a, s)$. For the same reason, the transition to state s′ does not depend on the estimated probability of the state s, so $P(s' \mid a, b_{t}, s) = P(s' \mid a, s)$.
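The belief update of equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration for a finite state space: the function name `belief_update`, the list-of-lists layout of the transition matrix, and the numbers in the example are assumptions made here, not part of the article.

```python
def belief_update(belief, transition, obs_likelihood):
    """One belief update step for a finite POMDP.

    belief         : list, b_t(s) for each state s (sums to 1)
    transition     : list of lists, transition[s][s2] = T(s, a, s2) for the chosen decision a
    obs_likelihood : list, O(a, s2, z') for the observation z' actually received
    """
    n = len(belief)
    # Prediction: sum over s of T(s, a, s') * b_t(s)
    predicted = [sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Numerator of equation (1)
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    # Denominator, P(z' | a, b_t), from equation (2)
    evidence = sum(unnormalized)
    if evidence <= 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / evidence for u in unnormalized]


# Illustrative numbers only: three states, a static hidden state (identity transition).
b0 = [1.0 / 3.0] * 3
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b1 = belief_update(b0, identity, [0.9, 0.5, 0.1])
```

With an identity transition matrix the prediction step leaves the belief unchanged, so the update reduces to reweighting by the observation likelihood and renormalizing.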

Establishment of probability distribution map model of chemical source location

Tracing and locating a chemical plume fits the characteristics of the POMDP method: the chemical source position is the hidden state, which cannot be observed, while the AUV position is the explicit state, which can be observed by localization sensors. While the AUV is tracing the plume, the flow field changes all the time, and the APF algorithm makes decisions (path planning) for the AUV based only on what the AUV learns about the changes in its surroundings. A decision may be right or wrong. A correct decision brings the AUV closer to completing the task, and a single wrong decision does not cause the task to fail. A series of correct decisions makes the task succeed, and a series of wrong decisions makes it fail, which matches the characteristics of a decision-making process. The POMDP method has a wide range of applications in AUV planning, and in this section it is used to build the chemical plume source likelihood map.

The establishment of the state space

To simplify the construction of the source likelihood map, the plume search area is divided into rectangular cells; the coordinate system and the cell division are shown in Figure 1. The cell vector $C = [C_{1}, \dots, C_{M}]$ represents the cells in the search area, where $M = m \times n$ is the total number of cells. The index $f \in [1, m]$ counts cells along the x direction, and $g \in [1, n]$ counts cells along the y direction. The position of the AUV determines the cell (f, g) in which the AUV is located. When f and g are known, the cell sequence number i is

i = f + (g - 1) m

Figure 1.

Cell partitioning sketch of the search area.

Conversely, when the cell sequence number i is known, (f, g) follows from equations (4) and (5)

f (i) = rem (i - 1, m) + 1

g (i) = int (i - 1, m) + 1

Here int(c1, d1) is the largest integer less than or equal to c1/d1, and rem(c1, d1) is the remainder of c1/d1. The two cell representations, $C_{f, g}$ and $C_{i}$, are therefore equivalent.
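The conversion between the two cell representations can be checked with a short sketch. It assumes row-major numbering, $i = f + (g - 1) m$ with 1-based indices, which is the form that makes the mapping invertible; the function names are hypothetical.

```python
def cell_index(f, g, m):
    """Sequence number i of cell (f, g), with m cells along x (1-based indices)."""
    return f + (g - 1) * m


def cell_coords(i, m):
    """Inverse mapping: (f, g) of cell number i, using rem and int as in the text."""
    f = (i - 1) % m + 1    # rem(i - 1, m) + 1
    g = (i - 1) // m + 1   # int(i - 1, m) + 1
    return f, g


# Round trip over an illustrative 4 x 3 grid.
m, n = 4, 3
round_trip_ok = all(
    cell_coords(cell_index(f, g, m), m) == (f, g)
    for f in range(1, m + 1)
    for g in range(1, n + 1)
)
```

Every pair (f, g) on the grid maps to a distinct i between 1 and M and back again, which is the equivalence claimed above.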

The cell vector $C = [C_{1}, \dots, C_{M}]$ has an associated vector of source confirmation information $c = (c_{1}, \dots, c_{M})$, $c_{i} \in \{0, 1\}$, $i = 1, 2, \dots, M$, where $c_{i} = 1$ means that the source is in cell $C_{i}$ and $c_{i} = 0$ means that there is no source in cell $C_{i}$. Because the source location cannot be observed directly while searching for the plume, the M states in vector c are unknown. They can only be estimated from the information obtained by the sensor, from which the source location is then estimated, so the M states in vector c are hidden states.

The AUV location $L_{v}(t_{i}) = (x_{v}(t_{i}), y_{v}(t_{i}))$ is the explicit state, which can be measured by the positioning sensor. The cells $C = [C_{1}, \dots, C_{M}]$ are defined as the states of the AUV, so the AUV has M states $\{C_{i} \mid i = 1, 2, \dots, M\}$ in the entire search area.

$U(L, t_{i}) = (u_{x}(L, t_{i}), u_{y}(L, t_{i}))$ is the flow speed, where $L = (x, y)$ is a point in the search area and $t_{i}$ is an arbitrary time. The flow speed sensor measures the flow speed only at the current position of the AUV, $L_{v}(t_{i}) = (x_{v}(t_{i}), y_{v}(t_{i}))$, not over the entire search area. We therefore assume that spatial changes in the flow speed are relatively small and take the flow speed at the AUV location at time $t_{i}$ as the flow speed at time $t_{i}$ over the entire search area. Under this assumption, the flow speed $U(L, t_{i})$ is treated as an explicit state that the sensor can measure for the entire search area. Because the flow speed cannot be stored for all time, only the flow speed records of the last l time steps are retained during the calculation. If fewer than l records are available, all of them are kept and used.

The AUV state vector $S(t_{i})$ at time $t_{i}$ consists of $(c, C)$. Here c is the hidden state, which cannot be obtained by a sensor and must be estimated from other information, and C is the explicit state, which can be obtained by a sensor. Only the hidden state c therefore has a belief state b, where b gives the probability that the source is in each cell, and traversing all cells gives the plume source likelihood map for the entire search area.

Determination of decision

The decision is the heading angle command for the AUV performing the search and localization task; the decision space is the one-dimensional continuous space $A = ϕ_{cmd} : [- π, π]$. Suppose the AUV is heading to the target cell $C_{g}$. Once the target cell is determined, the heading angle command for the AUV is

ϕ_{cmd} = arctan (\frac{y_{g} - y_{v}}{x_{g} - x_{v}})

Here $(x_{g}, y_{g})$ is the center coordinate of the target cell $C_{g}$, $(x_{v}, y_{v})$ is the center coordinate of the cell $C_{v}$ in which the AUV is located, and $ϕ_{cmd}$ is the heading angle command at the current time. We then let $C_{g}$ represent the decision a, $a = C_{g}$.
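A practical note on the heading command: a plain arctangent of the ratio only returns angles in $(-π/2, π/2)$ and is undefined when $x_{g} = x_{v}$, whereas the decision space is $[-π, π]$. The sketch below assumes that the four-quadrant arctangent is what is intended; the function name is hypothetical.

```python
import math


def heading_command(x_v, y_v, x_g, y_g):
    """Heading angle (radians, in [-pi, pi]) from the AUV cell centre to the target cell centre.

    Uses the four-quadrant arctangent so that targets behind the AUV
    (x_g < x_v) and directly above or below it (x_g == x_v) are handled.
    """
    return math.atan2(y_g - y_v, x_g - x_v)


# Illustrative targets around an AUV at the origin.
east = heading_command(0.0, 0.0, 1.0, 0.0)
north = heading_command(0.0, 0.0, 0.0, 1.0)
west = heading_command(0.0, 0.0, -1.0, 0.0)
```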

State transfer function

The chemical source location in the search area is fixed, so the state c does not change and its transition is deterministic: the state stays the same with probability 1 and changes with probability 0

P (c_{i}' \mid c_{i}, a) = \begin{cases} 1, & c_{i}' = c_{i} \\ 0, & c_{i}' \neq c_{i} \end{cases}

$P ({c_{i}}^{'} | c_{i}, a)$ represents the transition probability from state $c_{i}$ to state ${c_{i}}^{'}$ caused by taking decision a .

Observation, observation model, and observation function

Observation is the process in which the AUV senses its surroundings with its onboard sensors; the detection information obtained is the observation result. In this article, a chemical sensor is used to search for the chemical plume and to locate it based on changes in chemical concentration.

The chemical sensor lets the AUV determine whether a chemical plume is present in a cell. Before the search and localization task starts, a chemical concentration threshold is set. If the concentration measured by the sensor is higher than the threshold, the plume is considered detected, $z = d$; if it is lower, the plume is considered not detected, $z = \bar{d}$. The observation set $Z = \{d, \bar{d}\}$ therefore has two outcomes: detected ($d$) and not detected ($\bar{d}$).

Setting a threshold can avoid chemical concentration and gradient information failure in complex flow field, which results in achieving greater simplicity and higher reliability. The value of threshold in this article is obtained according to the observation results of flow field, normally taking 10% above the maximum concentration value in all observation results as the threshold value.

The observation model can be calculated from the chemical plume motion model. Many scholars combine two kinds of motion to study the instantaneous concentration distribution of a chemical plume: molecular Brownian motion and fluid flow. Based on this theory, the location of the chemical plume satisfies

\dot{L} (t) = U (L, t) + N (t)

where $L = (x, y)$ is the chemical plume location, $U = (u_{x}, u_{y})$ is the flow speed, and $N = (n_{x}, n_{y})$ is a Gaussian random variable with zero mean and variance $(σ_{x}^{2}, σ_{y}^{2})$.

In the literature,²³ the plume diffusion distribution model is studied on the basis of equation (8). This section is only a brief introduction. Pang divided the plume diffusion model into single plume diffusion distribution model and continuous plume diffusion distribution model.

(1) Single releasing chemical plume model: Equation (8) can be transformed into

L_{s} (t_{l}, t_{k}) = L_{j} - V (t_{l}, t_{k}) - W (t_{l}, t_{k})

where $L_{j}$ is determined by the location of the AUV and $V(t_{l}, t_{k})$ is calculated from the flow speed history, so the value of $L_{j} - V(t_{l}, t_{k})$ can be computed. $W(t_{l}, t_{k})$ is a zero-mean Gaussian random variable. $L_{s}(t_{l}, t_{k})$ is therefore also a Gaussian random variable, with mean $L_{j} - V(t_{l}, t_{k})$ and variance $(t_{k} - t_{l}) σ^{2}$. The probability of the source location for a plume at $L_{j}$ depends on the probability distribution of $W(t_{l}, t_{k})$, and the probability distribution of the source can be obtained by solving equation (9). The probability density functions of the components of $W(t_{l}, t_{k})$ are

f (w_{x} (t_{l}, t_{k})) = \frac{1}{\sqrt{2 π (t_{k} - t_{l}) σ_{x}^{2}}} \exp \left( - \frac{w_{x}^{2}}{2 (t_{k} - t_{l}) σ_{x}^{2}} \right)

f (w_{y} (t_{l}, t_{k})) = \frac{1}{\sqrt{2 π (t_{k} - t_{l}) σ_{y}^{2}}} \exp \left( - \frac{w_{y}^{2}}{2 (t_{k} - t_{l}) σ_{y}^{2}} \right)

Suppose a plume is released at time $t_{l}$ from the source in cell $C_{i}$. Define $S_{i j}(t_{l}, t_{k})$ as the probability that this plume is in cell $C_{j}$ at time $t_{k}$ ($t_{k} > t_{l}$) after being transported for a while. Because the x and y directions are orthogonal, the probabilities in the two directions are independent of each other, and after some simplification

\begin{aligned} S_{i j}(t_{l}, t_{k}) = {} & \frac{1}{2 π (t_{k} - t_{l}) σ_{x} σ_{y}} \int_{- L_{x} / 2}^{L_{x} / 2} \exp \left( - \frac{(x_{j} - x_{i} - V_{x}(t_{l}, t_{k}) - x)^{2}}{2 (t_{k} - t_{l}) σ_{x}^{2}} \right) d x \\ & \times \int_{- L_{y} / 2}^{L_{y} / 2} \exp \left( - \frac{(y_{j} - y_{i} - V_{y}(t_{l}, t_{k}) - y)^{2}}{2 (t_{k} - t_{l}) σ_{y}^{2}} \right) d y \end{aligned}

$L_{x}$ and $L_{y}$ are the side lengths of a cell. Traversing all the cells gives the location probability of the source for every cell $C_{i}$, which is the source likelihood map.
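Because the integrand in $S_{i j}$ is a normalized Gaussian in each direction, each integral has a closed form in terms of the error function. The sketch below evaluates $S_{i j}$ that way. The function names and the numbers in the example are assumptions for illustration, and the advection term $V(t_{l}, t_{k})$ is simply passed in rather than computed from a flow history.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def axis_probability(offset, side, sigma, dt):
    """Mass of a zero-mean Gaussian with variance dt * sigma**2
    inside an interval of length `side` centred at `offset`."""
    spread = sigma * math.sqrt(dt)
    upper = (offset + side / 2.0) / spread
    lower = (offset - side / 2.0) / spread
    return normal_cdf(upper) - normal_cdf(lower)


def single_release_probability(source_xy, observer_xy, advection_xy, cell_sides, sigmas, dt):
    """S_ij(t_l, t_k): probability that a plume released dt = t_k - t_l ago from the
    cell centred at source_xy is now inside the cell centred at observer_xy."""
    off_x = observer_xy[0] - source_xy[0] - advection_xy[0]
    off_y = observer_xy[1] - source_xy[1] - advection_xy[1]
    return (axis_probability(off_x, cell_sides[0], sigmas[0], dt)
            * axis_probability(off_y, cell_sides[1], sigmas[1], dt))


# Illustrative values: 2 m x 2 m cells, unit variance growth, 1 s after release, no flow.
p_same = single_release_probability((0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (2.0, 2.0), (1.0, 1.0), 1.0)
p_far = single_release_probability((0.0, 0.0), (6.0, 0.0), (0.0, 0.0), (2.0, 2.0), (1.0, 1.0), 1.0)
```

In the example the cell half-width equals one standard deviation, so each axis contributes the familiar one-sigma mass of about 0.683, and a cell three cell-widths away receives far less probability.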

Define μ as the probability that the chemical sensor detects the plume when the plume is present in the cell, that is, the reliability of the chemical sensor. In this article, the reliability is set to $μ = 0.9$. When a single plume is released at time $t_{l}$ from cell $C_{i}$, the probability that it is detected by the chemical sensor on the AUV at time $t_{k}$ is $μ S_{i j}(t_{l}, t_{k})$, and the probability that it is not detected is $1 - μ S_{i j}(t_{l}, t_{k})$.

(2) Continuous releasing chemical plume model: The previous discussion considered the release of a single plume. Normally, the source releases plume continuously. For this case, $β_{i j}(t_{0}, t_{k})$ is calculated as

β_{i j}(t_{0}, t_{k}) = \sum_{l = 0}^{k - 1} \frac{1}{k} \cdot S_{i j}(t_{l}, t_{k}) = \frac{1}{k} \sum_{l = 0}^{k - 1} S_{i j}(t_{l}, t_{k})

The vector $[β_{i j}(t_{0}, t_{k})]_{i = 1}^{M}$ represents the source location probability distribution map obtained when the AUV detects the plume under continuous release.

The previous discussion considered the case in which the plume is detected. If the AUV in cell $C_{j}$ at time $t_{k}$ cannot detect the plume released at time $t_{l}$ from the source cell $C_{i}$, the AUV is considered to have lost the plume. $γ_{i j}(t_{0}, t_{k})$ is the probability of losing the plume in this way. When the chemical source in cell $C_{i}$ keeps releasing plume during the interval $[t_{0}, t_{k})$, $γ_{i j}(t_{0}, t_{k})$ is calculated as

γ_{i j}(t_{0}, t_{k}) = \prod_{l = 0}^{k - 1} (1 - μ S_{i j}(t_{l}, t_{k}))

where the vector $[γ_{i j}(t_{0}, t_{k})]_{i = 1}^{M}$ is the source likelihood map for a lost plume under continuous release.
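Given the single-release probabilities $S_{i j}(t_{l}, t_{k})$ for the retained release times, the two continuous-release quantities are a mean and a product. The following sketch uses hypothetical function names and made-up $S_{i j}$ values, with the sensor reliability $μ = 0.9$ taken from the text.

```python
def beta_ij(single_release_probs):
    """beta_ij(t_0, t_k): mean of S_ij(t_l, t_k) over the k retained release times."""
    return sum(single_release_probs) / len(single_release_probs)


def gamma_ij(single_release_probs, mu=0.9):
    """gamma_ij(t_0, t_k): probability that none of the k releases is detected,
    i.e. the product of (1 - mu * S_ij(t_l, t_k)) over the release times."""
    prob_missed = 1.0
    for s in single_release_probs:
        prob_missed *= 1.0 - mu * s
    return prob_missed


# Illustrative S_ij values for two release times.
s_values = [0.5, 0.2]
beta_example = beta_ij(s_values)    # (0.5 + 0.2) / 2
gamma_example = gamma_ij(s_values)  # (1 - 0.45) * (1 - 0.18)
```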

It is hard to determine the moment at which the source started releasing plume: the source may have been releasing for a long time, and plume released early has already moved outside the search area, so it no longer helps searching and locating. If the AUV detects plume, that plume must have been released from the source not long ago. The AUV cannot keep all the observation information, only the information from the last l time steps. So $t_{0}$ is set as the initial moment, and plume released earlier is considered to have already moved outside the search area.

Observation function: Earlier, the observation function was defined as $O(a, s', z') = P(z' \mid a, s')$. The decision space is represented by cells, and the real state s is constant. The observation function can therefore represent the observation probability in cell $C_{j}$: $O(a, s', z') = O(C_{j}, c', z') = P(z' \mid C_{j}, c')$.

Based on the model $Z = \{d, \bar{d}\}$, for $z' = \bar{d}$ the observation function is

O (C_{j}, c_{i}', \bar{d}) = P (\bar{d} \mid C_{j}, c_{i}') = γ_{i j}(t_{0}, t_{k})

For $z' = d$, the observation function is

O (C_{j}, c_{i}', d) = P (d \mid C_{j}, c_{i}') = 1 - γ_{i j}(t_{0}, t_{k})

Because $\sum_{z' \in Z} O(C_{j}, c', z') = 1$, combining equations (15) and (16) gives the observation function

O (C_{j}, c_{i}', z') = \begin{cases} γ_{i j}(t_{0}, t_{k}), & z' = \bar{d} \\ 1 - γ_{i j}(t_{0}, t_{k}), & z' = d \end{cases}

Based on the model $Z = \{d, \bar{d}\}$, the observation function becomes

O (C_{j}, c_{i}', z') = \begin{cases} (1 - b(c_{j}))\, γ_{i j}(t_{0}, t_{k}), & z' = \bar{d} \\ (1 - b(c_{j}))\, (1 - γ_{i j}(t_{0}, t_{k})), & z' = d \end{cases}

The reason for using $γ_{i j}$ is that it needs no additional assumption. Using $β_{i j}$ would require assuming that plume release is uniformly distributed over the time interval $[t_{0}, t_{k})$.

Online mapping of chemical source likelihood map

Earlier, the update formula for the belief state was derived as

b_{t + 1} (s') = \frac{O (a, s', z') \sum_{s \in S} T (s, a, s') b_{t} (s)}{P (z' \mid a, b_{t})}

and

P (z^{'} | a, b_{t}) = \sum_{s^{'} \in S} {O (a, s^{'}, z^{'}) \cdot \sum_{s \in S} T (s, a, s^{'}) b_{t} (s)}

Hidden state s is unchanged for searching and locating task, $s^{'} = s$ , $T (s, a, s^{'}) = 1$ , and equation (19) becomes

b_{t + 1} (s^{'}) = \frac{O (a, s^{'}, z^{'}) b_{t} (s^{'})}{\sum_{s^{'} \in S} {O (a, s^{'}, z^{'}) b_{t} (s^{'})}}

Decision a is replaced by the cell $C_{j}$ , and hidden state s′ is replaced by the cell $c_{i}$

b_{t + 1} (c_{i}) = \frac{O (C_{j}, c_{i}, z^{'}) b_{t} (c_{i})}{\sum_{i = 1}^{M} {O (C_{j}, c_{i}, z^{'}) b_{t} (c_{i})}}

Therefore, the belief state is updated, which means the chemical source likelihood map is updated for the model of $Z = {d, \bar{d}}$ .

When $z' = d$

b_{t + 1} (c_{i}) = \frac{(1 - γ_{i j}(t_{0}, t_{k}))\, b_{t} (c_{i})}{\sum_{i = 1}^{M} (1 - γ_{i j}(t_{0}, t_{k}))\, b_{t} (c_{i})}

When $z^{'} = \bar{d}$

b_{t + 1} (c_{i}) = \frac{γ_{i j}(t_{0}, t_{k})\, b_{t} (c_{i})}{\sum_{i = 1}^{M} γ_{i j}(t_{0}, t_{k})\, b_{t} (c_{i})}
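The two update cases can be written as one small routine over the cell grid. The sketch below takes the observation likelihood as $1 - γ_{i j}$ for a detection and $γ_{i j}$ for a non-detection, without the $(1 - b(c_{j}))$ factor, and renormalizes. The function name, the four-cell example, and the $γ$ values are illustrative assumptions.

```python
def update_likelihood_map(belief, gamma_j, detected):
    """Update the source likelihood map after one observation made in cell C_j.

    belief   : list, b_t(c_i) for each candidate source cell i
    gamma_j  : list, gamma_ij(t_0, t_k) for each candidate source cell i
    detected : True for z' = d, False for z' = d-bar
    """
    if detected:
        likelihood = [1.0 - g for g in gamma_j]
    else:
        likelihood = list(gamma_j)
    weighted = [l * b for l, b in zip(likelihood, belief)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [w / total for w in weighted]


# Illustrative four-cell map with a uniform initial belief 1 / M.
b0 = [0.25, 0.25, 0.25, 0.25]
gammas = [0.2, 0.6, 0.9, 1.0]
after_detection = update_likelihood_map(b0, gammas, detected=True)
after_miss = update_likelihood_map(b0, gammas, detected=False)
```

A detection shifts probability toward cells with small $γ_{i j}$ (cells from which a plume would probably have reached the AUV), and a non-detection shifts it toward cells with large $γ_{i j}$.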

The model is established on the premise that there is only a single chemical source in the entire search area. Because of this, a detected plume must have been released from some cell in the search area, so summing the observation function over all cells gives $\sum_{i = 1}^{M} O (C_{j}, c_{i}, d) = \sum_{i = 1}^{M} (1 - γ_{i j}(t_{0}, t_{k})) = 1$. At the initial moment, the source is equally likely to be in every cell, so the initial belief state $b_{0}$ is $1 / M$ for each cell.

The belief state update equation based on the POMDP method proposed in this section is also the method for constructing the chemical source likelihood map. The update of the belief state $b_{t + 1} (c_{i})$ in each cell depends on the position of the observation cell $C_{j}$, so the update formula can be generalized: whenever the AUV obtains an observation at any position, equations (23) and (24) can be used to update the chemical source likelihood map. In actual operation, however, a decision takes some time to execute rather than being completed in an instant. While a decision is being executed, the AUV keeps observing, and the belief state is updated at the same time. Observations and belief state updates therefore occur at the same frequency: every observation triggers an update of the belief state and hence of the chemical source likelihood map. The decision, in contrast, is only weakly coupled to observation and belief updating. Under strong coupling, the belief state would be updated as soon as an observation occurs and the result of each observation would be applied to the decision immediately, which may disrupt the decision that the AUV made at the previous moment. This would be unreasonable because it would not make full use of the chemical source likelihood map. Therefore, we propose that the belief state be updated whenever an observation appears, and that the next decision be made only after the previous decision has been completed.

Artificial potential field-based online planning

The APF method was first proposed by Khatib in 1986 for motion planning of robotic arms, where it performed well.^24–26 The APF method has a simple model, a small computational load, and good real-time performance; it is widely used for path planning and is a relatively stable and efficient method among current path-planning methods.

In this article, the APF method is used for tracing the plume, and the robot is considered to move under a force field in a virtual space. This force field is the APF, which includes a gravitational (attractive) field and a repulsive field. The repulsive field is generated by obstacles, and the gravitational field is generated by the plume information. We assume that there is no obstacle in the plume search and localization area, so the traditional APF method is modified: the repulsive field is removed and the gravitational field is retained. In the traditional APF method, the AUV cannot reach the target when the target is near an obstacle; the modification proposed above avoids this situation. The AUV is then subject only to the gravitational field and moves toward the source. Because the location of the plume source is uncertain, the source likelihood map must be calculated to generate the gravitational field that attracts the AUV. The source likelihood map gives the probability that the chemical source is in each cell: the greater the probability value, the more likely the chemical source is in that cell. As the AUV moves, it obtains a series of detected ($d$) and not detected ($\bar{d}$) events at different locations. This series of $d$ and $\bar{d}$ events updates the source likelihood map online using the source probability estimation method. Using the information it has obtained, the AUV plans a search trajectory toward the source location based on the APF method.

When the AUV moves in the flow field, a $w \times w$ cell area accompanies it, with the AUV always at the center of this area. The motion direction of the AUV is obtained with the modified APF method. Figure 2 is a schematic diagram of the APF method: the circle represents the location of the AUV, and the arrow shows the direction of the joint force F calculated by the APF method.

Figure 2.

Method diagram of potential field.

The $w \times w$ cell area is called the “accompanying window,” and the AUV is always at the center of the window. The source probability value of each cell generates a virtual gravitational force. This force is proportional to the probability value of the source in the cell and inversely proportional to the distance between the AUV and the cell.

The virtual force $F_{i}$ generated by the cell $C_{i}$ for the AUV can be expressed as

$$F_{i} = \frac{P V_{i}}{d s} \left( \frac{L_{v} - L_{i}}{d s} \right)$$

$$d s = \| L_{v} - L_{i} \|$$

where $P V_{i}$ is the probability value of the source in the cell $C_{i}$ and $d s$ is the distance between the AUV location $L_{v}$ and the location $L_{i}$ of the cell $C_{i}$. The virtual gravitational joint force F is the sum of all virtual gravitational forces generated by the cells in the “accompanying window.”

$$F = \sum_{i \in w} F_{i}$$

The direction of the joint force F is the command heading angle of the AUV at the next moment. The AUV traces the plume along this command heading until it reaches the source location.
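The joint-force calculation above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code. The unit vector is taken from the AUV toward the cell ($L_{i} - L_{v}$) so that the force is attractive, which is one reading of the force equation. The AUV's own cell is skipped because $d s = 0$ there, the window is clipped at the map border, and the heading is measured counterclockwise from the positive x-axis. The function name, the row/column indexing, and the defaults `w=11` and `cell_size=2.0` (taken from the simulation settings) are all assumptions.

```python
import math
from typing import List, Tuple


def apf_heading(
    likelihood: List[List[float]],   # source probability per cell, indexed [row][col]
    auv_rc: Tuple[int, int],         # (row, col) of the cell containing the AUV
    w: int = 11,                     # accompanying window width in cells
    cell_size: float = 2.0,          # cell edge length in metres
) -> Tuple[float, float, float]:
    """Return (Fx, Fy, heading) of the joint attractive force on the AUV."""
    half = w // 2
    r0, c0 = auv_rc
    n_rows, n_cols = len(likelihood), len(likelihood[0])
    fx = fy = 0.0
    for r in range(max(0, r0 - half), min(n_rows, r0 + half + 1)):
        for c in range(max(0, c0 - half), min(n_cols, c0 + half + 1)):
            if r == r0 and c == c0:
                continue  # assumption: the AUV's own cell (ds = 0) is ignored
            dx = (c - c0) * cell_size      # x component of L_i - L_v
            dy = (r - r0) * cell_size      # y component of L_i - L_v
            ds = math.hypot(dx, dy)
            # F_i = (PV_i / ds) * unit vector; the unit vector is (dx, dy) / ds
            fx += likelihood[r][c] * dx / ds ** 2
            fy += likelihood[r][c] * dy / ds ** 2
    return fx, fy, math.atan2(fy, fx)
```

For a map whose only nonzero cell has probability 1 and lies five cells (10 m) along the x-axis from the AUV, the force has magnitude 1/10 and points along the positive x-axis, so the command heading is 0 rad.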

The flow of the chemical source search method is shown in Figure 3. First, the AUV starts a wide-area z-type search. When the detected chemical concentration exceeds the set threshold, the AUV updates the source likelihood map, and the APF algorithm drives the AUV to trace the plume toward the source position. During tracing, if the AUV has not detected the plume in the last 30 s, it switches back to the z-type search. If the AUV has detected the plume within the last 30 s and the distance between multiple plume detections is less than 5 m, the source position is considered confirmed and the plume tracing task is considered successful.
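The behavior switching described above can be written as a small state machine. The sketch below is illustrative only: the mode names, the function signature, and the `detection_spread` argument (the distance between recent detections, or `None` when too few are available) are hypothetical. The default values are the concentration threshold, loss period, and confirmation distance quoted in the text and the simulation settings.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    Z_SEARCH = "z_search"          # wide-area z-type search
    APF_TRACE = "apf_trace"        # plume tracing driven by the APF planner
    SOURCE_FOUND = "source_found"  # source position confirmed


def next_mode(
    mode: Mode,
    concentration: float,
    time_since_detection: float,        # seconds since the last plume detection
    detection_spread: Optional[float],  # metres between recent detections
    threshold: float = 0.2,
    loss_period: float = 30.0,
    confirm_distance: float = 5.0,
) -> Mode:
    """Return the behavior to run at the next step."""
    if mode is Mode.SOURCE_FOUND:
        return mode  # terminal state
    if mode is Mode.Z_SEARCH:
        # Switch to tracing once the concentration exceeds the threshold.
        return Mode.APF_TRACE if concentration > threshold else Mode.Z_SEARCH
    # mode is APF_TRACE
    if time_since_detection >= loss_period:
        return Mode.Z_SEARCH  # plume lost: fall back to the z-type search
    if detection_spread is not None and detection_spread < confirm_distance:
        return Mode.SOURCE_FOUND
    return Mode.APF_TRACE
```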

Figure 3.

Behavior switching diagram of chemical source search.

Simulation experiment

The simulation experiment in this article is carried out in the underwater semi-physical simulation system. The structure of the simulation system is shown in Figure 4.

Figure 4.

The structure of underwater semi-physical simulation systems.

The simulation system consists of three parts: an AUV motion monitoring system, a chemical plume environment simulation system, and a PC104 embedded computer. The PC104 is an embedded computer used on real AUVs to run the chemical plume tracking path planning. The sensor information detected by the AUV in the semi-physical simulation, such as concentration and flow rate/flow direction, is calculated by the chemical plume environment simulation system. The motion monitoring system displays the current motion state of the AUV, transmits this information to the chemical plume environment simulation system over Ethernet, and uses the obtained motion information to determine whether the AUV tracking process is accurate, as shown in Figure 5.

Figure 5.

Underwater semi-physical simulation operation pictures.

This section presents the underwater simulation experiment for the method proposed in this article. The search area is set to 100 × 100 m². As shown in Figure 6, the search area is divided into $50 \times 50$ cells, and the source is located at (70 m, 50 m). Each cell is 2 × 2 m², the same size as the AUV. During the experiment, the flow rate is a random value between 0 and 1 m/s, and the flow direction ranges from −45° to 45° (the negative direction of the x-axis is taken as the positive direction of flow). The sampling period of the AUV is 0.5 s, the planning duration is 5 s, the loss period is 30 s, and 40 flow rate/flow direction measurements are retained. The accompanying window is 11 × 11 cells, the concentration threshold is set to 0.2, and the AUV moves at 1.5 m/s.
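The relationships between these settings can be checked with a short script. The constants below are the values stated above; the mapping from a position to a cell index (origin at a corner of the search area, zero-based indices, rows along y and columns along x) is an assumption made for illustration, as are all names.

```python
from typing import Tuple

AREA_SIZE_M = 100.0        # side length of the search area
CELL_SIZE_M = 2.0          # side length of one cell
SAMPLING_PERIOD_S = 0.5
PLANNING_DURATION_S = 5.0
WINDOW_CELLS = 11          # accompanying window is 11 x 11 cells
AUV_SPEED_MPS = 1.5

# Quantities derived from the settings above.
CELLS_PER_SIDE = int(AREA_SIZE_M / CELL_SIZE_M)
SAMPLES_PER_DECISION = round(PLANNING_DURATION_S / SAMPLING_PERIOD_S)
WINDOW_SPAN_M = WINDOW_CELLS * CELL_SIZE_M
TRAVEL_PER_DECISION_M = AUV_SPEED_MPS * PLANNING_DURATION_S


def position_to_cell(x_m: float, y_m: float) -> Tuple[int, int]:
    """Map a position in metres to a zero-based (row, col) cell index.

    Assumes the origin is at a corner of the search area; points on the
    far boundary are assigned to the last row or column.
    """
    if not (0.0 <= x_m <= AREA_SIZE_M and 0.0 <= y_m <= AREA_SIZE_M):
        raise ValueError("position outside the search area")
    col = min(int(x_m // CELL_SIZE_M), CELLS_PER_SIDE - 1)
    row = min(int(y_m // CELL_SIZE_M), CELLS_PER_SIDE - 1)
    return row, col
```

Under these assumptions the AUV takes 10 samples per planning interval and travels 7.5 m in that time, which stays well inside the 22 m span of the accompanying window.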

Figure 6.

Sketch of cell partitioning in simulation.

Figure 7(a)–(d) shows the source likelihood map obtained by the POMDP method at four different moments in the AUV tracking process.

Figure 7.

The four different moments in the AUV tracking process. (a) AUV extensive tracing diagram, (b) AUV detected plume for the first time, (c) process of AUV tracing, and (d) process of AUV confirmed source.

In Figure 7(a), the AUV starts from the initial position (0 m, 25 m) and performs z-type motion. The colored region represents the source likelihood map: the higher the probability value, the darker the color. At the initial moment, the AUV has obtained very little plume information, and the probability of source existence is equal in most cells. The probability values decrease in the area that the AUV has passed through.

In Figure 7(b), the AUV detected plume information for the first time at t = 42 s and immediately switched to the APF tracking behavior. After detecting the plume information, the AUV updated the source likelihood map. The near-uniform distribution, in which the source could exist in most cells, disappeared, and a dark area appeared in the upstream direction of the AUV. This area reflects the source likelihood map calculated at the current moment.

In Figure 7(c), as the AUV acquires more information, the source likelihood map has converged to a small area by t = 83 s. Figure 7(d) shows that the AUV confirms the source at t = 145 s. The red point marks the cell with the highest probability of source existence. At this moment, the source likelihood map has converged to a very small area, which makes it easy for the AUV to localize the source.

The source localization method used in this article compares the distances between recent plume detections. When this distance is less than a threshold (5 m), the location is taken to be the location of the source. The simulation results show that the proposed method enables the AUV to accurately confirm the location of the chemical plume source.
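One possible reading of this confirmation rule is sketched below. The text does not state how many detections are compared or how their distance is measured, so both choices here are assumptions: the sketch takes the last `n_recent` detection positions (a hypothetical parameter) and requires every pairwise distance between them to be below the 5 m threshold.

```python
import math
from itertools import combinations
from typing import List, Tuple


def source_confirmed(
    detections: List[Tuple[float, float]],  # (x, y) positions of plume detections, in metres
    n_recent: int = 3,                      # hypothetical: number of detections compared
    threshold_m: float = 5.0,               # confirmation distance from the text
) -> bool:
    """Return True when the most recent detections all lie within the threshold."""
    recent = detections[-n_recent:]
    if len(recent) < n_recent:
        return False  # not enough detections to compare yet
    return all(math.dist(p, q) < threshold_m for p, q in combinations(recent, 2))
```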

To verify the stability and reliability of the proposed tracing method, we compare it with the mature bionics tracing method. The bionics tracing method has the advantages of simple calculation and a high success rate, and it is a widely used chemical plume tracing method.

Figure 8 shows the bionics tracing process. The bionics method mainly imitates the way animals trace food. A detailed description of the bionics tracing method can be found in the study by Farrell et al.²⁷

Figure 8.

Process of plume tracing based on bionics.

This article compares the bionics tracing method with the proposed tracing method in two respects: (1) The tracing success rate is the probability of successfully confirming the source within a specified time, which is set to 250 s in this article. (2) The time spent in the tracking process is the time from the AUV's first detection of chemical plume concentration information until the source is confirmed.

Through 60 simulation experiments, the success rate comparison diagrams of the two tracing methods are obtained. Figure 9(a) is the success rate of the tracing method proposed in this article and Figure 9(b) is the success rate of bionics tracing method.

Figure 9.

Success rate comparison of the two tracing methods. (a) The success rate of the POMDP and APF method and (b) the success rate of the bionics tracking method. POMDP: partially observable Markov decision process; APF: artificial potential field.

Of the 60 simulation experiments with the tracing method proposed in this article, 58 succeeded and 2 failed, a success rate of about 96%. Of the two failures, one occurred because the AUV never detected the plume and kept searching over a wide range; the other occurred because the AUV was never able to confirm the source position.

Of the 60 simulation experiments with the bionics-based tracing method, 55 succeeded and 5 failed, a success rate of about 92%. Of the five failures, two occurred because the AUV lost the plume during tracing and did not resume tracing it; the other three occurred because the AUV never detected plume information.

In 58 successful experiments based on the tracking method presented in this article, the tracking process took an average of 114 s. In 55 successful experiments based on bionics tracking method, the tracking process took an average of 139 s.

Comparing the performance indicators of the two tracking methods, we draw the following conclusions:

The method proposed in this article tracks well. Once plume information was detected, the plume was not lost. With the bionics method, the plume was lost in two experiments and was not traced again.

The method presented in this article has a higher success rate (96%) than the bionics method, although the bionics method also has a high success rate (92%).

The method presented in this article tracks faster than the bionics method. Its lower average time (114 s versus 139 s) saves about 20% of the tracking time.

Through comparison experiments, it can be concluded that the tracking method combining source likelihood map and APF planning method has better reliability and stability and is a more efficient tracking method.
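The figures behind this comparison can be reproduced with simple arithmetic. The short script below uses only the counts and average times reported above; the function names are illustrative. Note that 58/60 and 55/60 evaluate to roughly 96.7% and 91.7%, and that the time saving measured relative to the bionics average is roughly 18%, which is consistent with the rounded values quoted in the text.

```python
def success_rate(successes: int, trials: int) -> float:
    """Fraction of trials in which the source was confirmed in time."""
    return successes / trials


def relative_time_saving(t_proposed: float, t_baseline: float) -> float:
    """Time saved by the proposed method as a fraction of the baseline time."""
    return (t_baseline - t_proposed) / t_baseline


proposed_rate = success_rate(58, 60)             # POMDP + APF method
bionics_rate = success_rate(55, 60)              # bionics method
saving = relative_time_saving(114.0, 139.0)      # average tracking times in seconds

print(f"proposed: {proposed_rate:.1%}, bionics: {bionics_rate:.1%}, saving: {saving:.1%}")
```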

The proposed method makes full use of the chemical plume concentration and flow rate/flow direction information to construct a source likelihood map, which guides the AUV in purposeful tracking and greatly improves the tracking efficiency. This is an important reason why the tracking method proposed in this article performs better. The bionics method has lower computational complexity, and no stalling caused by excessive computation occurs while it is running. For an AUV with a low-performance processor, the bionics method is therefore also a good choice for plume tracing.

Conclusion

In this article, a method based on POMDP for constructing a source likelihood map is proposed for AUV chemical plume tracing. The semi-physical simulation system is used to verify the proposed tracking method, and the experimental results are compared with those of the bionic tracking method. The experiments show that the POMDP source likelihood map method combined with the APF planning method can guide the AUV to the source position, with a higher success rate and better stability than the bionic tracing method. The success of the algorithm suggests that olfactory tracking methods based on engineering theory are applicable in real environments. Because engineering-based methods make better use of sensors and computing resources, they can be expected to achieve better results than the bionic tracing method.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by Harbin University of Commerce Funding (No.167274).

References

1. Dusenbery. Sensory ecology: how organisms acquire and respond to information. New York: WH Freeman and Co, 1992.

2. Vickers. Mechanisms of animal navigation in odor plumes. Biol Bull 2000; 198(2): 203–212.

3. Zimmer, Butman. Chemical signaling processes in the marine environment. Biol Bull 2000; 198(2): 168–187.

4. Hassler, Scholz. Olfactory imprinting and homing in salmon. New York: Springer-Verlag, 1983.

5. Nevitt GA. Olfactory foraging by Antarctic procellariiform seabirds: life at high Reynolds numbers. Biol Bull 2000; 198(2): 245–253.

6. Cardé, Mafra-Neto. Effect of pheromone plume structure on moth orientation to pheromone. In: Cardé, Minks (eds) Perspectives on Insect Pheromones. New York: Chapman and Hall, 1996, pp. 275–290.

7. Cardé. Odour plumes and odour-mediated flight in insects. In: Olfaction in mosquito-host interactions. CIBA foundation symposium, London, UK, 29–31 October 1996, pp. 54–70. John Wiley & Sons.

8. Mafra-Neto, Cardé. Fine-scale structure of pheromone plumes modulates upwind orientation of flying moths. Nature 1994; 369(6476): 142–144.

9. Bell, Tobin. Chemo-orientation. Biol Rev 1982; 57: 219–260.

10. Lytridis, Kader, Virk. A systematic approach to the problem of odour source localization. Auton Robot 2006; 20(3): 261–276.

11. King, Horine, Daly. Explosives detection with hard-wired moths. IEEE Trans Instrum Meas 2004; 53(4): 1113–1118.

12. Hayes, Martinoli, Goodman. Distributed odor source localization. IEEE Sens 2002; 2(3): 260–271.

13. Farrell, Pang. Moth-inspired chemical plume tracing on an autonomous underwater vehicle. IEEE Trans Robot 2006; 22(2): 292–307.

14. Berg, Brown. Chemotaxis in Escherichia coli analysed by three dimensional tracking. Nature 1972; 239(5374): 500–504.

15. Belanger, Willis MA. Adaptive control of odor-guided locomotion: behavioral flexibility as an antidote to environmental unpredictability. Adapt Behav 1998; 4(3): 217–253.

16. Belanger, Willis. Biologically-inspired search algorithms for locating unseen odor sources. In: Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC) held jointly with IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA) Intell, Gaithersburg, USA, 17 September 1998, pp. 265–270. USA: IEEE.

17. Grasso, Consi, Mountain. Locating odor sources in turbulence with a lobster inspired robot. In: From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (eds Maes, Mataric, Meyer, Pollack, Wilson), Cambridge, MA, 8 January 1997, pp. 104–112. Cambridge: MIT Press.

18. Grasso, Consi, Mountain. Biomimetic robot lobster performs chemo-orientation in turbulence using a pair of spatially separated sensors: progress and challenges. Robot Auton Syst 2000; 30(1): 115–131.

19. Grasso. Invertebrate-inspired sensory-motor systems and autonomous, olfactory-guided exploration. Biol Bull 2001; 200(2): 160–168.

20. Farrell, Pang. Moth behavior based subsumption architecture for chemical plume tracing on a REMUS autonomous underwater vehicle. IEEE Trans Robot Autom 2006; 22(2): 292–307.

21. Farrell, Cardé. Tracking of fluid-advected odor plumes: strategies inspired by insect orientation to pheromone. Adapt Behav 2001; 9(3–4): 143–170.

22. Pang, Zhu. Reactive planning for olfactory-based mobile robot. In: Proceedings of the 2009 IEEE/RSJ international conference on intelligent robots and systems, St. Louis, 10–15 October 2009, pp. 4375–4380. IEEE: Piscataway.

23. Pang, Farrell. Chemical plume source localization. IEEE Trans Syst Man Cybern B Cybern 2006; 36(5): 1068–1080.

24. Khatib. Real-time obstacle avoidance for robot manipulator and mobile robots. Int J Robot Res 1986; 5(1): 90–98.

25. Cui. New potential functions for mobile robot path planning. IEEE Trans Robot Autom 2000; 16(5): 615–620.

26. Barraquand, Langlois, Latombe. Numerical potential field techniques for robot path planning. IEEE Trans Syst Man Cybern 1992; 22(2): 224–241.

27. Farrell, Pang. Chemical plume tracing via an autonomous underwater vehicle. IEEE J Ocean Eng 2005; 30(2): 428–442.