Abstract
This article proposes a self-learning method for robotic experience that builds an episodic cognitive map using biologically inspired episodic memory. The episodic cognitive map is used for robot navigation under uncertainty. Two main challenges, high computational complexity and perceptual aliasing, are addressed. An episodic memory-driven Markov decision process is proposed to simulate the organization of episodic memory by introducing a neuron activation and stimulation mechanism. An episodic memory self-learning model and algorithm are presented for building the episodic cognitive map based on the episodic memory-driven Markov decision process. Uncertain information is considered to improve mapping performance. The presented method realizes real-time storage, incremental accumulation, integration and updating of robotic memory. Based on the episodic cognitive map, the predicted episodic trajectory can be computed simply by activation spreading of state neurons. Experimental results for a mobile robot indicate that the method efficiently performs learning, localization, mapping and navigation in real-life office environments.
Introduction
Building an integrated mobile robot that can navigate freely and deliberately in an indoor environment is a challenging task. 1 The robot needs to integrate several functional mechanisms, such as scene understanding, mapping, localization and path planning. Spatial cognition is the basic ability that mammals use to perform navigation tasks, and researchers have been investigating how animals perceive space and navigate in an environment. The cognitive map, 2–3 a map-like representation of the spatial relationships among salient landmarks in an environment, was developed to solve navigation problems. Inspired by the navigation ability of humans and animals, studies 4–6 on how mammals perform mapping, localization and navigation have gained considerable interest from the robotics community.
Current systems for robotic behavioural organization typically abstract from the low-level sensory-motor embodiment of the robot, leading to a gap between the level at which a sequence of actions is planned and the levels of perception and motor control. This gap is a major bottleneck for the autonomy of systems in complex and dynamic environments. 7 Humans adapt well to complex environments and tasks: they accumulate knowledge, recall previous experience to complete tasks and produce new knowledge and skills. Such critical behaviours are driven by learned experience. Similarly, for robots, the premise here is that experience matters as well. 8 We argue that learned experience can be integrated with the cognitive map to enable a robot to perceive and navigate freely in an environment. It is thus natural to seek inspiration in human experience in order to achieve similar adaptivity and cognition.
This article presents a novel framework called the episodic memory-driven Markov decision process (EM-MDP) to build a cognitive map of a real environment using episodic memory. The biological inspiration for episodic memory comes from the mammalian hippocampus. 9 Episodic memory is the collection of past experiences that occurred at particular times and places. Previous work on memory implementation approached it from an engineering viewpoint, learning a specific action for a given sense. 10–12 An episodic robot memory was proposed to improve action planning based on past experiences using image appearance and behaviour; it can provide one-shot learning capabilities to a robot. 13 Episodic memories retaining a sequence of experienced observations, behaviours and rewards were also studied, 8 and a navigation task was simulated using distance observations. The event modelling in this article is partly inspired by that sequence structure. Stachowicz and Kruijff 14 adopted an indexed data structure storing episodic memory to provide knowledge for a robotic cognitive model and carried out several long-duration simulation experiments. Kelley 15 implemented a memory store allowing a robot to retain knowledge from previous experiences, constructing events from images. Park et al. 16 proposed an integrated adaptive resonance theory neural model combining episodic memory with task memory; the model was used for a robot performing cereal-serving tasks. Most episodic memory models focus on episodic-like memory, a memory structure rather than episodic memory in the cognitive-neuroscience sense; they do not consider uncertainty processing and are applied only to simple robotic tasks or simulated environments. The episodic memory model proposed in this article is inspired by the biological basis of hippocampal neurons. Using this model, the episodic cognitive map can efficiently organize robotic experience, including the robot's internal state and the external environment, based on the EM-MDP.
For robotic planning under uncertainty, the partially observable Markov decision process (POMDP) is a powerful framework with a solid mathematical foundation and wide applicability. The uncertainties include imperfect robot control, sensor noise and unexpected environment changes. A POMDP models the robot taking a sequence of actions under uncertainty to achieve a goal. 17 However, the use of POMDPs for robot planning under uncertainty is not widespread. The curse of dimensionality is one major obstacle. Despite the impressive progress of point-based POMDP algorithms, 18–19 solving POMDPs with a large number of states remains a challenge. One way to overcome the difficulty is to build approximations of the belief. 20–21 Such algorithms have been successfully applied to a variety of robotic tasks, including navigation, 21 autonomous driving 18 and robot motion planning. 22 Another way to accomplish such a reduction is to represent the state space hierarchically. 23–24 The other obstacle is that robotic systems often suffer from the pervasive problem of perceptual aliasing: the current motor output depends solely on the current perception, and past inputs are treated as irrelevant for determining the current behaviour. The effects of aliasing can be reduced by incorporating additional sensory information that suffices to disambiguate between any two given situations. 25–26 However, these methods still rely on current perceptions and are thus unable to overcome aliasing in general environments. Therefore, by introducing the activation and stimulation mechanism of state neurons, we propose the EM-MDP to build an episodic cognitive map for robot navigation under uncertainty. It scales up POMDP algorithms for realistic robotic navigation tasks under uncertainty and addresses the problems of high computational complexity and perceptual aliasing.
The proposed self-learning method can autonomously learn environmental experience in real time to construct episodic memory based on the EM-MDP. Unlike situation sequences generated by a specific grammar or rule during learning, the structure of event sequences in episodic memory has no inherent law; it is organized only by the robot's experience. The sequences are flexible, and their complexity varies with the variation of scene appearance. Also, unlike learning methods applied to the classification of visual objects and scenes, which store knowledge as weight parameters, the proposed learning method models the environment using episodic memory. It realizes the accumulation, integration and updating of experience. The contribution of this article is the use of biology-based episodic memory to build a cognitive map for robot navigation under uncertainty within the EM-MDP framework. We employ state neurons to derive a compact low-dimensional representation of belief spaces, propose the activation and stimulation mechanism of state neurons to localize events and organize episodic memory, and implement and validate the self-learning approach for building the episodic cognitive map. Using the episodic cognitive map, the robot can predict an optimal behavioural strategy, simply by activation spreading of state neurons, to adapt to the uncertain environment and the navigation task.
EM-MDP framework
The EM-MDP framework utilizes episodic memory in the cognitive domain to represent the components of a robot's experience. Neuroscience research 27 shows that episode stimulation contributes to changes in the firing patterns of hippocampal CA1 neurons. This change is related to the characteristics of episodic events and the occurrence of the environment. Statistical methods indicate that neuron activation in low-dimensional space forms an obvious cluster coding mode. By imitating this mechanism, we attempt to establish a mathematical model of a robotic episode. An episode includes a temporal sequence of m events
Event

Driven pattern of the activation sequence of the state neurons.
The EM-MDP framework is shown in Figure 2. For current input observation oc, the state neuron is activated when the similarity measure between oc and the observation in episodic memory is greater than a desired activation threshold θo, that is,

The EM-MDP framework shows the episode organization by state neuron activation and stimulation. The yellow circle represents the currently activated state neuron, while green circles represent its context state neurons. EM-MDP: episodic memory-driven Markov decision process.
where decay weight
The properties of the EM-MDP framework benefit the computation in two ways. First, it assumes that the transition probability between any two events can be approximated by Hebbian learning based on the activation of the corresponding state neurons, reducing the computational time required to estimate the current state. Second, when assessing the belief of each event, transitions between two state neurons that do not belong to a common episodic trajectory can be ignored, reducing the computational steps.
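The Hebbian approximation of transition probabilities described above can be sketched as follows. This is a minimal illustration: the function name, the weight-matrix layout and the saturating update rule are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def hebbian_transition_update(W, prev_idx, curr_idx, eta=0.1):
    """Strengthen the transition weight from the previously activated
    state neuron to the currently activated one (illustrative Hebbian
    rule with a saturating update toward 1)."""
    W[prev_idx, curr_idx] += eta * (1.0 - W[prev_idx, curr_idx])
    return W

# usage: 3 state neurons; neuron 0 was active, then neuron 1 fires
W = np.zeros((3, 3))
W = hebbian_transition_update(W, 0, 1)
```

Repeated co-activation of the same neuron pair drives the weight toward 1, which is how the transition statistics of an episode would accumulate without explicitly estimating a full transition model.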
Robotic episodic cognitive map building
To address the issues of real-time storage, incremental accumulation and integration of robot experience, a self-learning model and algorithm based on the EM-MDP framework are proposed to simulate the creation of episodic memory for building the episodic cognitive map. A robot without prior experience performs tasks by learning environmental knowledge. In this process, the robot autonomously learns scenario experience to build the episodic cognitive map. We assume that the episodic cognitive map is represented by a discrete finite event space
Episodic memory self-learning model
The episodic memory self-learning model is formulated using the Hebbian rule and is inspired by adaptive resonance theory and sparsely distributed memory (SDM), a generalized random-access memory. 29
The self-learning model consists of an observation input layer (O layer), an observation similarity measure layer (U layer), a state neurons layer (S layer) and an episodic memory output layer (E layer), as shown in Figure 3. Based on the model, the robot records activated events, including observations and state neurons, and changes transition weights

Structure of episodic learning model.
The structure of the U layer is shown in Figure 4(a). The number of nodes in the U layer is n, which equals the dimension of the input observation oc. The output

Structure of U layer and S layer. (a) U layer. (b) S layer.
The structure of the S layer is shown in Figure 4(b). This layer has m state neurons; their number can increase dynamically by generating new state neurons. The weight vector from the U layer to state neuron si in the S layer is represented by the mapped similar observation
Control signal C1: If the outputs of all state neurons in the S layer are equal to 0 and the input observation is not 0, then C1 = 1; otherwise, C1 = 0. This indicates that the model obtains u = oc at the beginning of learning, that is, C1 = 1. Then C1 = 0 and the output
Control signal C2: If the input observation is equal to 0, that is, oc = 0, then C2 = 0; otherwise, C2 = 1. It is used to detect whether there is an input observation.
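The logic of the two control signals can be written directly from the definitions above; this short sketch (the function name is ours) treats observations and S-layer outputs as plain sequences of numbers:

```python
def control_signals(s_outputs, o_c):
    """Compute control signals C1 and C2 as defined in the text:
    C1 = 1 when all S-layer outputs are 0 but an input observation is
    present (start of learning); C2 = 1 whenever an input observation
    exists (o_c is non-zero)."""
    has_input = any(x != 0 for x in o_c)
    all_s_zero = all(s == 0 for s in s_outputs)
    c1 = 1 if (all_s_zero and has_input) else 0
    c2 = 1 if has_input else 0
    return c1, c2
```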
Episodic memory self-learning algorithm
The episodic self-learning model receives an input observation from the robot's surrounding environment, determines whether the input observation already exists in the robot's episodic memory by computing similarity, and then chooses the storage mode for the current event. The episodic memory self-learning algorithm is shown in Algorithm I. Assume that there are m(t) occupied state neurons, i = 1∼m(t). For the current input observation oc, the similarity measure of the mapped observations oi in the S layer to oc is computed by
The number of similar observations (Ho) and the adjacent set Hs of state neurons are then obtained (shaded ovals in Figure 5). The similarity measure directly translates into the location's degree of similarity, which is continuous in the [0, 1] interval. The predicted value for the current input observation oc is then computed as
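A minimal sketch of how the adjacent set Hs might be gathered is shown below. The exact similarity measure is not reproduced here; cosine similarity is used as an illustrative stand-in, since it is also bounded in [0, 1] for non-negative feature histograms, and the function name and threshold default are our assumptions.

```python
import numpy as np

def similar_set(o_c, stored_obs, theta_o=0.93):
    """Compute a similarity measure (cosine similarity, as an
    illustrative choice) between the current observation o_c and each
    observation mapped to a state neuron, and return the indices of
    neurons whose similarity exceeds the activation threshold theta_o."""
    sims = []
    for o_i in stored_obs:
        s = float(np.dot(o_c, o_i) /
                  (np.linalg.norm(o_c) * np.linalg.norm(o_i)))
        sims.append(s)
    H_s = [i for i, s in enumerate(sims) if s > theta_o]
    return sims, H_s
```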

Each episodic trajectory Gi has an event space
Upon receiving observation oc, all similar observations in episodic memory are updated using the standard gradient descent algorithm for linear function approximation
where α is a learning rate. The system selects si* as the currently prevailing state neuron by computing
Then the transition weights between its context state neurons
where η is a learning rate coefficient.
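The gradient-descent update of the stored similar observations toward the current observation, as described above, can be sketched like this (the helper name and list-based representation are illustrative; only the update direction and the role of α follow the text):

```python
def update_observations(stored_obs, o_c, H_s, alpha=0.1):
    """Nudge each similar stored observation toward the current
    observation o_c — the standard gradient-descent step for linear
    function approximation, with learning rate alpha."""
    for i in H_s:
        stored_obs[i] = [oi + alpha * (oc - oi)
                         for oi, oc in zip(stored_obs[i], o_c)]
    return stored_obs
```

With α = 0.1, each matched observation moves one-tenth of the way toward oc per update, which gradually integrates repeated visits into a single stable observation.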
If no state neuron is activated, that is,
Thus, the episodic cognitive map, which comprises incremental episodic memory with a state neuron network, is built.
Uncertain information processing
For robust similarity matching between the current observation and the observations in episodic memory, we use biology-inspired attention to obtain salient landmarks 30 of input scenes. An input image is subsampled into a dyadic Gaussian pyramid with nine scales in three channels (intensity (I); orientation (O) for 0°, 45°, 90° and 135°; colour (C) in red/green and blue/yellow). Feature intensity is computed by centre-surround mechanisms. The contributions of the sub-features are summed and normalized once more to yield conspicuity maps, and all conspicuity maps are combined into one saliency map. The extraction of salient landmarks is shown in Figure 6(a); the bright pixels are salient with respect to their backgrounds. Detection and matching of salient landmarks are shown in Figure 6(b). The salient landmarks are stable natural landmarks since only a few regions are usually selected. This reduces the number of features to be stored and matched and provides robustness against dynamic objects and local environmental changes. The system then calculates a local binary pattern descriptor for each landmark and obtains a feature histogram as the observation in the event. The tested accuracy of event localization is over 94% in a dynamic environment.
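The local binary pattern (LBP) histogram used as the per-landmark descriptor can be sketched as follows. This is the basic 8-neighbour LBP, a simplified stand-in; the paper does not specify which LBP variant or radius is used, so treat the details as assumptions.

```python
import numpy as np

def lbp_histogram(patch):
    """Basic 8-neighbour local binary pattern histogram of a grayscale
    patch: each interior pixel is encoded by thresholding its eight
    neighbours against the centre, and the 256-bin code histogram is
    normalized to form the feature histogram."""
    h, w = patch.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = np.zeros(256)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = patch[y, x]
            code = 0
            for k, (dy, dx) in enumerate(offsets):
                if patch[y + dy, x + dx] >= c:
                    code |= (1 << k)
            hist[code] += 1
    return hist / hist.sum()  # normalized feature histogram
```

Concatenating such histograms over the salient landmarks would yield a compact, illumination-tolerant observation vector of the kind the event representation requires.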

(a) Bottom-up attention to obtain salient landmarks of input scenes. (b) Detection and matching of salient landmarks in two cases. Left: current detected landmarks. Right: matching landmarks in episodic memory.
In the learning process, we aim to obtain an orderly, unidirectional linear transition of situations to represent the environmental model. Most of the time, the proposed learning method achieves accurate localization for scenes belonging to the same category. However, relying only on observation matching may lead to mislocalization, for example under observation changes, imperfect robot control or many dynamic targets. In order to reduce the influence of uncertain information and eliminate perceptual aliasing, we use the activation latency character of the state neuron. Only when
Robot navigation using episodic cognitive map
The EM-MDP framework enables us to treat a high-dimensional belief space as a low-dimensional set of state neurons. We can utilize the characteristics of episodic memory and state neurons to compute a global planning policy for robot navigation. The concept of neuron synaptic potential is utilized to localize events efficiently. An episode Ei can be considered the equivalent of a state space in a POMDP. However, in an episode, events are organized in a unidirectional temporal linear trajectory, so the planner does not need to consider the full configuration of the state when determining the feasibility of a robot path. Based on the linear transfer characteristics of episodic events and the activation characteristics of state neurons, the EM-MDP framework can overcome the curse of dimensionality and make the event matching process more efficient (from O(n²) to O(n)). It can also avoid perceptual aliasing by predicting robotic behaviour based on the current event as well as its context event sequence.
The schematic of global planning for robot navigation is shown in Figure 7. Based on the learned episodic cognitive map, episodic recollection is used to deal with the environment in which the robot is currently located. The relevant episodes are recollected based on the similarity measure μi with respect to the goal

Schematic of global planning for robot navigation. Three episodes are retrieved based on episodic recollection according to the robot's task. E 1, E 2 and E 3 are the episodic trajectories. The system selects an optimal episodic trajectory based on activation spreading.
The rows and columns of the matrix represent the serial numbers of events. Based on this matrix, the maximum transition weight for each row vector is computed by
If a row contains more than one maximum transition weight, we define the event for that row as a crossover event, similar to the intersection of multiple routes. This means that there are multiple route choices from the start event to the target event. The episodic trajectories that have the minimum summation of reward R from the crossover event to the target event are then predicted as the optimal episodic trajectories.
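The crossover-event test above reduces to a scan over the rows of the transition-weight matrix; a minimal sketch (the function name and tolerance handling are ours) is:

```python
import numpy as np

def crossover_events(W, tol=1e-9):
    """Return the indices of crossover events: rows of the
    transition-weight matrix whose maximum weight is attained by more
    than one column, i.e. events from which several episodic
    trajectories can be followed."""
    crossovers = []
    for i, row in enumerate(W):
        m = row.max()
        if m > 0 and np.sum(np.isclose(row, m, atol=tol)) > 1:
            crossovers.append(i)
    return crossovers
```

For example, an event with two equally strong outgoing transitions (weights 0.5 and 0.5) is flagged as a crossover, while an event with a single dominant successor is not.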
After the optimal episodic trajectories are selected, the system decides whether the current event is localized at some event existing in those episodic trajectories. If a state neuron si, i = 1…n, is activated, the current observation is mapped and localized at state neuron si. However, in the EM-MDP framework, the current event
This process is a condition for event localization and can effectively avoid perceptual aliasing. The system then computes the belief of being at a given event by introducing the neuron synaptic potential. The effect of the sequence of past activated state neurons on the currently activated neuron si is expressed by the synaptic potential
where Bji is a bias factor expressing the importance of state neuron sj in the presynaptic set of neuron si; its value can be set in advance,
Thus, the event whose activated state neuron has the maximum synaptic potential, with that potential exceeding a threshold θv, is localized. The localized event can be considered the best representation of the robot's current event. The behaviour b is then obtained for robot navigation.
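The final localization rule, selecting the activated neuron with the maximum synaptic potential above θv, can be sketched as follows (the function name and the convention of returning None when no neuron qualifies are our assumptions):

```python
def localize_event(V, active, theta_v=0.6):
    """Pick the activated state neuron with the maximum synaptic
    potential, provided that potential exceeds the threshold theta_v;
    return None when no activated neuron qualifies (no event localized).
    V: synaptic potential of each neuron; active: activated indices."""
    best, best_v = None, theta_v
    for i in active:
        if V[i] > best_v:
            best, best_v = i, V[i]
    return best
```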
Experimental results
The proposed method is implemented to perform learning, localization, mapping and navigation in real-life situations which include dynamic pedestrians passing through and local environmental changes. We use a P3-DX Pioneer robot, which is equipped with a web camera, as well as encoders and sonars to calculate odometry and distances, respectively. Image size is 320 × 240. The robot obtains a 259-dimensional feature vector as observation. The desired matching bound θo is 0.93 and the synaptic potential threshold θv is 0.6. The learning rate is 0.1. We consider nine past events as the episodic memory depth. The driving speed of the robot is 100–400 mm/s and the turning speed is 10°–15°/s according to the environment type. The operating frequency is around 4 Hz on a 2.93 GHz CPU. This can meet the real-time learning and navigation requirement of our robot system.
Episodic cognitive map building
Figure 8 shows the results of the robot system creating episodic memory for learning experience based on the EM-MDP. The robot autonomously wanders in an office environment that includes pedestrians passing through and local environmental changes. The relevant information is recorded as events and transition weights. In the learning process, where the event sequence forms episodic memory, subsequent state neurons are activated in succession after the activation of the current state neuron; the activated state neurons become the context of the current state neuron. The robot first learns 300 samplings and then learns another 200 samplings to test the performance of the proposed method. Initially, everything in the surroundings is new to the robot, so everything is saved in the first learning run. In the second run, when the robot revisits the route, it typically needs to record less and less each time. From Figure 8(c), we can see that state neurons are reactivated rarely during the first 300 samplings but increasingly often during the last 200 samplings. The system continuously learns new event sequences rather than repeating training: it records state neurons only when the robot drives through a new environment and updates episodic memory after each sampling. Approximately 40% of the samplings are familiar in the second learning run. At the end of learning, the system has recorded 321 state neurons and 352 events. Based on the EM-MDP, the system generates an event chain without redundancy and then forms the episodic cognitive map, stored as episodic memory.

Episodic memory learning process for building the episodic cognitive map in real-life office site. (a) Robot wandering routes. (b) A part of saliency landmarks of input scenes. (c) The record process of the state neurons during robot wandering.
For the overlapping path in Figure 8(a), the proposed learning method employs event sequences over time to represent the robot path, rather than forming an optimal trajectory as other learning methods do. The robot system therefore dynamically optimizes and integrates the observations and state neurons of the overlapping path, but the overlapping segment in different paths is represented by different events rather than by merging events. This representation benefits robotic planning because of the well-organized event sequences.
Figure 9 shows the activation update process of state neurons during the learning process of Figure 8. We show only the changes of the state neurons with mark numbers 6 and 149. State neuron 6 is activated again after m equals 263, which indicates that the robot recalls a previous experience (the same observation) in episodic memory. State neuron 149 is activated four times, which indicates that the robot travels the same place four times. The transition weights connecting state neurons are shown in Figure 10. They are constantly updated in the learning process and undergo attenuation, and the structure of the transition weights can also be adjusted. The proposed EM-MDP framework can thus simulate the organization and decay processes of robotic episodic memory while avoiding redundancy. The episodic memory is dynamic and accumulates incrementally.

Activation changing of the state neurons in episodic memory learning process.

Creation of the transition weights during robot wandering.
Robot navigation using the episodic cognitive map
Figure 11 illustrates the experimental result of robotic planning control in a corridor. The robot travels the corridor only once to learn the episodic cognitive map, since there are few features in the corridor. Given an arbitrary route, the system learns 116 state neurons from a sequence of 147 observations. When we put the robot back into the environment, it can follow the planned route even though there are mislocalized events due to uncertainty. Figure 12 illustrates the results of event localization and observation similarity at each visit step. The first large event localization failures occur around visit steps 3–8 and 14–16 because pedestrians pass through for the first time, so the route contains situations the robot had not encountered during learning. Our system nevertheless completes the travel successfully based on past episodic memory according to equation (11). The full traverse has only five localization failures. Note that the robot records five new events in this travel and updates its episodic memory to a total of 121 events, which benefits robot planning next time.

Results of robot’s planning control in corridor.

Results of localized event and observation similarity in each visit step.
Based on the created episodic memory shown in Figure 8, a common task in which the robot helps a user pour a cup of water is designed to verify the navigation performance. The task requires the mobile robot to move from its current location A to the kettle location B and then to the user location C. From the results in Figure 13, the robot can move directly from location A to the kettle location B since this path was experienced during learning. When the robot moves from the kettle location B to the user location C, simply following the learned route would result in an unnecessarily long path. In fact, there are many crossover events, and the episodic trajectory that leads to the shortest route is predicted. Thus, the proposed episodic cognitive map can predict a suitable path and avoid selecting a time-consuming route. The results of robot navigation performance under uncertainty are shown in Table 1. Average similarity and event localization degree refer, respectively, to the average similarity measure between the current observation and the observations in episodic memory and to the accuracy of event localization in the episodes. Average error refers to the navigational error, that is, the deviation from the centre of the learned route. The experimental results show that the robot predicts different routes for different tasks to adapt to navigation under uncertainty.

Results of robot navigation in office.
Results of robot navigation performance under uncertainty.
Discussion
We verify the robustness against interference from uncertain information in robotic episodic memory learning. Through the proposed learning method, the system constructs well-organized event sequences for environmental cognition. The comparison of event sequences over time, before and after processing uncertainty, is shown in Figure 14(a). Relying only on observation matching results in mislocalization (blue line): in Figure 14(a), time steps 3, 49 and 62 are localized to previously experienced events (red ellipses). We find that the illumination condition is the main cause of the mislocalization. Using our method for processing uncertainty, however, the system obtains a unidirectional linear event chain (green line) corresponding to the robot's experience. The effective rate (event continuity over time) of the event sequences reaches 100%, showing strong robustness.

(a) Comparison of events sequence over time before and after processing uncertainty. (b) Comparison of events usage number over time.
The comparison of event usage numbers over time when SDM is exploited (limited sampling) and not exploited 31 (full sampling) is shown in Figure 14(b). Using the episode learning method, the number of events is reduced from 53 to 37, indicating that the system is more reliable in handling uncertainty and more efficient. The episodic cognitive map can be used to repeat previously visited routes, similar to a topological map; however, it often predicts multiple feasible paths for a task based on the current environment cognition, so the robot has a wider region of operations than a single-path teach-and-repeat system. A key property of our system is that episodic memory learning and navigation can run in parallel, which enables real-time updating of robotic experience.
Unlike simultaneous localization and mapping methods, the proposed episodic cognitive map is built for robot navigation under uncertainty based on the EM-MDP. The structure of event sequences in episodic memory has no inherent law. It can efficiently represent environmental experience using an edge-weighted episodic trajectory graph with a discrete event space set and a transition set, and it realizes autonomous, real-time accumulation, integration and updating of environmental experience. The planning algorithm based on the episodic cognitive map can therefore directly exploit the characteristics of state neurons. The robot can predict an optimal episodic trajectory (possibly an unfamiliar one that the robot did not traverse during episodic learning) to adapt to the uncertain environment and navigation task according to episodic memory.
Because the events in episodic memory are formed as several unidirectional linear trajectories, the global planning process can be computed by activation spreading of state neurons without iteration. The computational time required to estimate the current state is also reduced by introducing one-dimensional state neuron activation and synaptic potential. Thus, robot navigation under uncertainty can be computed efficiently based on the EM-MDP (O(n)) compared to a POMDP (O(n²)). We also compare the accuracy of event localization between the EM-MDP and a POMDP, as shown in Table 2. The table shows that the EM-MDP framework effectively deals with the perceptual aliasing problem, with a correct rate over 95% in most cases. Note that the five mislocalized samplings in the EM-MDP framework are caused by failure to localize any event due to execution error rather than by perceptual aliasing; our system records these five new events in real time and updates episodic memory, realizing self-learning during robot navigation. The proposed algorithm does not require a cost map as in ref. 17, and it differs from many other navigation modules in that the robot does not need to follow a global path. Instead, the robot follows the predicted episodic trajectory, including a sequence of behaviours. This is similar to how a human travels to a target location by following guidance.
Comparison of accuracy for event localization.
POMDP: partially observable Markov decision process; EM-MDP: episodic memory-driven Markov decision process.
Conclusion
In this article, we present a cognitive map building method for robot navigation under uncertainty using biologically inspired episodic memory. It enables the robot to evaluate past events, predict the current state and plan a desired behaviour sequence. The proposed EM-MDP framework avoids the curse of dimensionality and the perceptual aliasing problem found in POMDPs. Considering the biological basis of hippocampal neurons, a mapping from multidimensional observations to one-dimensional state neurons is built, and the activation and stimulation pattern of state neurons is introduced into the EM-MDP. The learning method has a bionic self-organizing ability, changing the structure of the state neuron network, and realizes real-time storage, incremental accumulation, integration and updating of robotic memory. The behaviour sequence for robot navigation is predicted using activation spreading of the state neurons over the episodic cognitive map. Several real-life tasks are carried out to demonstrate the applicability and usefulness of the developed approach. Experimental results show that the robot predicts different routes for different tasks under uncertainty. Future work will focus on the application of path integration in the navigation system.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The project received support from the National Natural Science Foundation of China (61503057).
