Abstract
This article proposes a self-learning method for robotic experience that builds an episodic cognitive map using biologically inspired episodic memory. The episodic cognitive map is used for robot navigation under uncertainty. Two main challenges, high computational complexity and perceptual aliasing, are addressed. An episodic memory-driven Markov decision process is proposed to simulate the organization of episodic memory by introducing a neuron activation and stimulation mechanism. An episodic memory self-learning model and algorithm are presented for building the episodic cognitive map based on the episodic memory-driven Markov decision process. Uncertain information is considered to improve mapping performance. The presented method realizes real-time storage, incremental accumulation, integration and updating of robotic memory. Based on the episodic cognitive map, the predicted episodic trajectory can be computed simply by activation spreading of state neurons. Experimental results for a mobile robot indicate that the method efficiently performs learning, localization, mapping and navigation in real-life office environments.
Introduction
Building an integrated mobile robot that can navigate freely and deliberately in an indoor environment is a challenging task. 1 The robot needs to integrate several functional mechanisms, such as scene understanding, mapping, localization and path planning. Spatial cognition is the basic ability that mammals use to perform navigation tasks, and researchers have been investigating how animals perceive space and navigate in an environment. The cognitive map, 2–3 a map-like representation of the spatial relationships among salient landmarks in an environment, was developed to solve navigation problems. Inspired by the navigation ability of humans and animals, studies 4–6 on how mammals perform mapping, localization and navigation have gained considerable interest from the robotics community.
Current systems for robotic behavioural organization typically abstract from the low-level sensory-motor embodiment of the robot, leading to a gap between the level at which a sequence of actions is planned and the levels of perception and motor control. This gap is a major bottleneck for the autonomy of systems in complex and dynamic environments. 7 Humans adapt well to complex environments and tasks: they accumulate knowledge, recall previous experience to complete tasks and produce new knowledge and skills. Such critical behaviours are driven by learned experience. Similarly, for robots, the premise here is that experience matters as well. 8 We argue that learned experience can be integrated with the cognitive map to enable a robot to perceive and navigate freely in an environment. It is thus natural to seek inspiration in human experience in order to achieve similar adaptivity and cognition.
This article presents a novel framework called the episodic memory-driven Markov decision process (EM-MDP) to build a cognitive map of a real environment using episodic memory. The biological inspiration for episodic memory comes from the mammalian hippocampus. 9 Episodic memory is the collection of past experiences that occurred at particular times and places. Previous work on memory implementation approached it from an engineering viewpoint, learning a specific action for a given sense. 10–12 An episodic robot memory was proposed to improve action planning based on past experiences using image appearance and behaviour; it can provide one-shot learning capabilities to a robot. 13 Episodic memories retaining a sequence of experienced observations, behaviours and rewards were also studied, 8 and a navigation task was simulated using distance observations. The event modelling in this article is partly inspired by that sequence structure. Stachowicz and Kruijff 14 adopted an indexed data structure storing episodic memory to provide knowledge for a robotic cognitive model and carried out several long-duration simulation experiments. Kelley 15 implemented a memory store allowing a robot to retain knowledge from previous experiences, constructing events from images. Park et al. 16 proposed an integrated adaptive resonance theory neural model combining episodic memory with task memory; the model was used for a robot performing cereal-serving tasks. Most episodic memory models focus on episodic-like memory, a memory structure rather than episodic memory in the cognitive-neuroscience sense; they do not consider uncertainty processing and are applied only to simple robotic tasks or simulated environments. The episodic memory model proposed in this article is inspired by the biological basis of hippocampal neurons. Using this model, the episodic cognitive map can efficiently organize robotic experience, including the robot's internal state and the external environment, based on the EM-MDP.
For robotic planning under uncertainty, the partially observable Markov decision process (POMDP) is a powerful framework with a solid mathematical foundation and wide applicability. The uncertainties include imperfect robot control, sensor noise and unexpected environment changes. A POMDP models the robot taking a sequence of actions under uncertainty to achieve a goal. 17 However, the use of POMDPs for robot planning under uncertainty is not widespread. The curse of dimensionality is one major obstacle. Despite the impressive progress of point-based POMDP algorithms, 18–19 solving POMDPs with a large number of states remains a challenge. One way to overcome the difficulty is to build approximations of the belief. 20–21 Such algorithms have been successfully applied to a variety of robotic tasks, including navigation, 21 autonomous driving 18 and robot motion planning. 22 Another way to accomplish such a reduction is to represent the state space hierarchically. 23–24 The other obstacle is that robotic systems often suffer from the pervasive problem of perceptual aliasing: the current motor output depends solely on the current perception, and past inputs are treated as irrelevant for determining the current behaviour. The effects of aliasing can be reduced by incorporating additional sensory information that suffices to disambiguate between any two given situations. 25–26 However, these methods still rely on current perceptions and are thus unable to overcome aliasing in general environments. Therefore, by introducing the activation and stimulation mechanism of state neurons, we propose the EM-MDP to build an episodic cognitive map for robot navigation under uncertainty. It scales up POMDP algorithms for realistic robotic navigation tasks under uncertainty and addresses the problems of high computational complexity and perceptual aliasing.
The proposed self-learning method can autonomously learn environmental experience in real time to construct episodic memory based on the EM-MDP. Unlike situation sequences generated by a specific grammar or rule during learning, the structure of event sequences in episodic memory has no inherent law; it is organized only by the robot's experience. The sequences are flexible, and their complexity varies with the variation of scene appearance. Also, unlike learning methods applied to the classification of visual objects and scenes, which store knowledge as weight parameters, the proposed learning method models the environment using episodic memory. It realizes the accumulation, integration and updating of experience. The contribution of this article is the use of biology-based episodic memory to build a cognitive map for robot navigation under uncertainty within the EM-MDP framework. We employ state neurons to derive a compact low-dimensional representation of belief spaces, propose the activation and stimulation mechanism of state neurons to localize events and organize episodic memory, and implement and validate the self-learning approach for building the episodic cognitive map. Using the episodic cognitive map, the robot can predict an optimal behavioural strategy, simply by activation spreading of state neurons, to adapt to the uncertain environment and the navigation task.
EM-MDP framework
The EM-MDP framework utilizes episodic memory in the cognitive domain to represent the components of a robot's experience. Neuroscience research 27 shows that episode stimulation contributes to changes in the firing patterns of hippocampal CA1 neurons. This change is related to the characteristics of episodic events and the occurrence of the environment. Statistical methods indicate that neuron activation in low-dimensional space forms an obvious cluster coding mode. By imitating this mechanism, we attempt to establish a mathematical model of a robotic episode. An episode includes a temporal sequence of m events
Event

Driven pattern of the activation sequence of the state neurons.
The EM-MDP framework is shown in Figure 2. For current input observation oc, the state neuron is activated when the similarity measure between oc and the observation in episodic memory is greater than a desired activation threshold θo, that is,

The EM-MDP framework shows the episode organization by state neuron activation and stimulation. The yellow circle represents the currently activated state neuron, while green circles represent its context state neurons. EM-MDP: episodic memory-driven Markov decision process.
where decay weight
The properties of the EM-MDP framework benefit the computation in two ways. First, it assumes that the transition probability between any two events can be approximated by Hebbian learning based on the activation of the corresponding state neurons, reducing the computational time required to estimate the current state. Second, when assessing the belief of each event, transitions between two state neurons that do not belong to a common episodic trajectory can be ignored, reducing the computational steps.
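The Hebbian approximation of transition probabilities described above can be sketched as follows. This is a minimal illustration: the function name, the weight-matrix layout and the saturating update rule are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def hebbian_transition_update(W, prev_idx, curr_idx, eta=0.1):
    """Strengthen the transition weight from the previously activated
    state neuron to the currently activated one (illustrative Hebbian
    rule with a saturating update toward 1)."""
    W[prev_idx, curr_idx] += eta * (1.0 - W[prev_idx, curr_idx])
    return W

# usage: 3 state neurons; neuron 0 was active, then neuron 1 fires
W = np.zeros((3, 3))
W = hebbian_transition_update(W, 0, 1)
```

Repeated co-activation of the same neuron pair drives the weight toward 1, which is how the transition statistics of an episode would accumulate without explicitly estimating a full transition model.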
Robotic episodic cognitive map building
To address the issues of real-time storage, incremental accumulation and integration of robot experience, a self-learning model and algorithm based on the EM-MDP framework are proposed to simulate the creation of episodic memory for building the episodic cognitive map. A robot without prior experience performs tasks by learning environmental knowledge. In this process, the robot autonomously learns scenario experience to build the episodic cognitive map. We assume that the episodic cognitive map is represented by a discrete finite event space
Episodic memory self-learning model
The episodic memory self-learning model is formulated using the Hebbian rule and is inspired by adaptive resonance theory and sparsely distributed memory (SDM), a generalized random-access memory. 29
The self-learning model consists of an observation input layer (O layer), an observation similarity measure layer (U layer), a state neurons layer (S layer) and an episodic memory output layer (E layer), as shown in Figure 3. Based on the model, the robot records activated events, including observations and state neurons, and changes transition weights

Structure of episodic learning model.
The structure of the U layer is shown in Figure 4(a). The number of nodes in the U layer is n, which equals the dimension of the input observation oc. The output

Structure of U layer and S layer. (a) U layer. (b) S layer.
The structure of the S layer is shown in Figure 4(b). This layer has m state neurons; their number can increase dynamically by generating new state neurons. The weight vector from the U layer to state neuron si in the S layer is represented by the mapped similar observation
Control signal C1: If the outputs of all state neurons in the S layer are equal to 0 and the input observation is not 0, then C1 = 1; otherwise, C1 = 0. This indicates that the model obtains u = oc at the beginning of learning, that is, C1 = 1. Then C1 = 0 and the output
Control signal C2: If the input observation is equal to 0, that is, oc = 0, then C2 = 0; otherwise, C2 = 1. It is used to detect whether there is an input observation.
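The logic of the two control signals can be written directly from the definitions above; this short sketch (the function name is ours) treats observations and S-layer outputs as plain sequences of numbers:

```python
def control_signals(s_outputs, o_c):
    """Compute control signals C1 and C2 as defined in the text:
    C1 = 1 when all S-layer outputs are 0 but an input observation is
    present (start of learning); C2 = 1 whenever an input observation
    exists (o_c is non-zero)."""
    has_input = any(x != 0 for x in o_c)
    all_s_zero = all(s == 0 for s in s_outputs)
    c1 = 1 if (all_s_zero and has_input) else 0
    c2 = 1 if has_input else 0
    return c1, c2
```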
Episodic memory self-learning algorithm
The episodic self-learning model receives an input observation from the robot's surrounding environment, determines whether the input observation already exists in the robot's episodic memory by computing similarity, and then chooses the storage mode for the current event. The episodic memory self-learning algorithm is shown in Algorithm I. Assume that there are m(t) occupied state neurons, i = 1∼m(t). For the current input observation oc, the similarity measure of the mapped observations oi in the S layer to oc is computed by
The number of similar observations (Ho) and the adjacent set Hs of state neurons are then obtained (shaded ovals in Figure 5). The similarity measure directly translates into the location's degree of similarity, which is continuous in the [0, 1] interval. The predicted value for the current input observation oc is then computed as
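A minimal sketch of how the adjacent set Hs might be gathered is shown below. The exact similarity measure is not reproduced here; cosine similarity is used as an illustrative stand-in, since it is also bounded in [0, 1] for non-negative feature histograms, and the function name and threshold default are our assumptions.

```python
import numpy as np

def similar_set(o_c, stored_obs, theta_o=0.93):
    """Compute a similarity measure (cosine similarity, as an
    illustrative choice) between the current observation o_c and each
    observation mapped to a state neuron, and return the indices of
    neurons whose similarity exceeds the activation threshold theta_o."""
    sims = []
    for o_i in stored_obs:
        s = float(np.dot(o_c, o_i) /
                  (np.linalg.norm(o_c) * np.linalg.norm(o_i)))
        sims.append(s)
    H_s = [i for i, s in enumerate(sims) if s > theta_o]
    return sims, H_s
```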

Each episodic trajectory Gi has an event space
Upon receiving observation oc, all similar observations in episodic memory are updated using the standard gradient descent algorithm for linear function approximation
where α is a learning rate. The system selects si* as the currently prevailing state neuron by computing
Then the transition weights between its context state neurons
where η is a learning rate coefficient.
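The gradient-descent update of the stored similar observations toward the current observation, as described above, can be sketched like this (the helper name and list-based representation are illustrative; only the update direction and the role of α follow the text):

```python
def update_observations(stored_obs, o_c, H_s, alpha=0.1):
    """Nudge each similar stored observation toward the current
    observation o_c — the standard gradient-descent step for linear
    function approximation, with learning rate alpha."""
    for i in H_s:
        stored_obs[i] = [oi + alpha * (oc - oi)
                         for oi, oc in zip(stored_obs[i], o_c)]
    return stored_obs
```

With α = 0.1, each matched observation moves one-tenth of the way toward oc per update, which gradually integrates repeated visits into a single stable observation.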
If no state neuron is activated, that is,
Thus, the episodic cognitive map, which comprises incremental episodic memory with a state neuron network, is built.
Uncertain information processing
For robust similarity matching between the current observation and the observations in episodic memory, we use biology-inspired attention to obtain salient landmarks 30 of input scenes. An input image is subsampled into a dyadic Gaussian pyramid with nine scales in three channels (intensity (I); orientation (O) for 0°, 45°, 90° and 135°; colour (C) in red/green and blue/yellow). Feature intensity is computed by centre-surround mechanisms. The contributions of the sub-features are summed and normalized once more to yield conspicuity maps, and all conspicuity maps are combined into one saliency map. The extraction of salient landmarks is shown in Figure 6(a); the bright pixels are salient with respect to their backgrounds. Detection and matching of salient landmarks are shown in Figure 6(b). The salient landmarks are stable natural landmarks since only a few regions are usually selected. This reduces the number of features to be stored and matched and provides robustness against dynamic objects and local environmental changes. The system then calculates a local binary pattern descriptor for each landmark and obtains a feature histogram as the observation in the event. The tested accuracy of event localization is over 94% in a dynamic environment.
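The local binary pattern (LBP) histogram used as the per-landmark descriptor can be sketched as follows. This is the basic 8-neighbour LBP, a simplified stand-in; the paper does not specify which LBP variant or radius is used, so treat the details as assumptions.

```python
import numpy as np

def lbp_histogram(patch):
    """Basic 8-neighbour local binary pattern histogram of a grayscale
    patch: each interior pixel is encoded by thresholding its eight
    neighbours against the centre, and the 256-bin code histogram is
    normalized to form the feature histogram."""
    h, w = patch.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = np.zeros(256)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = patch[y, x]
            code = 0
            for k, (dy, dx) in enumerate(offsets):
                if patch[y + dy, x + dx] >= c:
                    code |= (1 << k)
            hist[code] += 1
    return hist / hist.sum()  # normalized feature histogram
```

Concatenating such histograms over the salient landmarks would yield a compact, illumination-tolerant observation vector of the kind the event representation requires.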

(a) Bottom-up attention to obtain salient landmarks of input scenes. (b) Detection and matching of salient landmarks in two cases. Left: current detected landmarks. Right: matching landmarks in episodic memory.
In the learning process, we aim to obtain an orderly, unidirectional linear transition of situations to represent the environmental model. Most of the time, the proposed learning method achieves accurate localization for scenes belonging to the same category. However, relying only on observation matching may lead to mislocalization, for example under observation changes, imperfect robot control or many dynamic targets. In order to reduce the influence of uncertain information and eliminate perceptual aliasing, we use the activation latency character of the state neuron. Only when
Robot navigation using episodic cognitive map
The EM-MDP framework enables us to treat a high-dimensional belief space as a low-dimensional set of state neurons. We can utilize the characteristics of episodic memory and state neurons to compute a global planning policy for robot navigation. The concept of neuron synaptic potential is utilized to localize events efficiently. An episode Ei can be considered the equivalent of a state space in a POMDP. However, in an episode, events are organized in a unidirectional temporal linear trajectory, so the planner does not need to consider the full configuration of the state when determining the feasibility of a robot path. Based on the linear transfer characteristics of episodic events and the activation characteristics of state neurons, the EM-MDP framework can overcome the curse of dimensionality and make the event matching process more efficient (from O(n²) to O(n)). It can also avoid perceptual aliasing by predicting robotic behaviour based on the current event as well as its context event sequence.
The schematic of global planning for robot navigation is shown in Figure 7. Based on the learned episodic cognitive map, episodic recollection is used to deal with the environment in which the robot is currently located. The relevant episodes are recollected based on the similarity measure μi with respect to the goal

Schematic of global planning for robot navigation. Three episodes are retrieved based on episodic recollection according to the robot's task. E 1, E 2 and E 3 are the episodic trajectories. The system selects an optimal episodic trajectory based on activation spreading.
The rows and columns of the matrix represent the serial numbers of events. Based on this matrix, the maximum transition weight for each row vector is computed by
If a row contains more than one maximum transition weight, we define the event for that row as a crossover event, similar to the intersection of multiple routes. This means that there are multiple route choices from the start event to the target event. The episodic trajectories that have the minimum summation of reward R from the crossover event to the target event are then predicted as the optimal episodic trajectories.
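The crossover-event test above reduces to a scan over the rows of the transition-weight matrix; a minimal sketch (the function name and tolerance handling are ours) is:

```python
import numpy as np

def crossover_events(W, tol=1e-9):
    """Return the indices of crossover events: rows of the
    transition-weight matrix whose maximum weight is attained by more
    than one column, i.e. events from which several episodic
    trajectories can be followed."""
    crossovers = []
    for i, row in enumerate(W):
        m = row.max()
        if m > 0 and np.sum(np.isclose(row, m, atol=tol)) > 1:
            crossovers.append(i)
    return crossovers
```

For example, an event with two equally strong outgoing transitions (weights 0.5 and 0.5) is flagged as a crossover, while an event with a single dominant successor is not.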
After the optimal episodic trajectories are selected, the system decides whether the current event is localized at some event existing in those episodic trajectories. If a state neuron si, i = 1…n, is activated, the current observation is mapped and localized at state neuron si. However, in the EM-MDP framework, the current event
This process is a condition for event localization and can effectively avoid perceptual aliasing. The system then computes the belief of being at a given event by introducing the neuron synaptic potential. The effect of the sequence of past activated state neurons on the currently activated neuron si is expressed by the synaptic potential
where Bji is a bias factor expressing the importance of state neuron sj in the presynaptic set of neuron si; its value can be set in advance,
Thus, the event whose activated state neuron has the maximum synaptic potential, with that potential exceeding a threshold θv, is localized. The localized event can be considered the best representation of the robot's current event. The behaviour b is then obtained for robot navigation.
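The final localization rule, selecting the activated neuron with the maximum synaptic potential above θv, can be sketched as follows (the function name and the convention of returning None when no neuron qualifies are our assumptions):

```python
def localize_event(V, active, theta_v=0.6):
    """Pick the activated state neuron with the maximum synaptic
    potential, provided that potential exceeds the threshold theta_v;
    return None when no activated neuron qualifies (no event localized).
    V: synaptic potential of each neuron; active: activated indices."""
    best, best_v = None, theta_v
    for i in active:
        if V[i] > best_v:
            best, best_v = i, V[i]
    return best
```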
Experimental results
The proposed method is implemented to perform learning, localization, mapping and navigation in real-life situations which include dynamic pedestrians passing through and local environmental changes. We use a P3-DX Pioneer robot, which is equipped with a web camera, as well as encoders and sonars to calculate odometry and distances, respectively. Image size is 320 × 240. The robot obtains a 259-dimensional feature vector as observation. The desired matching bound θo is 0.93 and the synaptic potential threshold θv is 0.6. The learning rate is 0.1. We consider nine past events as the episodic memory depth. The driving speed of the robot is 100–400 mm/s and the turning speed is 10°–15°/s according to the environment type. The operating frequency is around 4 Hz on a 2.93 GHz CPU. This can meet the real-time learning and navigation requirement of our robot system.
Episodic cognitive map building
Figure 8 shows the results of the robot system creating episodic memory for learning experience based on the EM-MDP. The robot autonomously wanders in an office environment that includes pedestrians passing through and local environmental changes. The relevant information is recorded as events and transition weights. In the learning process, where the event sequence forms episodic memory, subsequent state neurons are activated in succession after the activation of the current state neuron; the activated state neurons become the context of the current state neuron. The robot first learns 300 samplings and then learns another 200 samplings to test the performance of the proposed method. Initially, everything in the surroundings is new to the robot, so everything is saved in the first learning run. In the second run, when the robot revisits the route, it typically needs to record less and less each time. From Figure 8(c), we can see that state neurons are reactivated rarely during the first 300 samplings but increasingly often during the last 200 samplings. The system continuously learns new event sequences rather than repeating training: it records state neurons only when the robot drives through a new environment and updates episodic memory after each sampling. Approximately 40% of the samplings are familiar in the second learning run. At the end of learning, the system has recorded 321 state neurons and 352 events. Based on the EM-MDP, the system generates an event chain without redundancy and then forms the episodic cognitive map, stored as episodic memory.

Episodic memory learning process for building the episodic cognitive map in real-life office site. (a) Robot wandering routes. (b) A part of saliency landmarks of input scenes. (c) The record process of the state neurons during robot wandering.
For the overlapping path in Figure 8(a), the proposed learning method employs event sequences over time to represent the robot path, rather than forming an optimal trajectory as other learning methods do. The robot system therefore dynamically optimizes and integrates the observations and state neurons of the overlapping path, but the overlapping segment in different paths is represented by different events rather than by merging events. This representation benefits robotic planning because of the well-organized event sequences.
Figure 9 shows the activation update process of state neurons during the learning process of Figure 8. We show only the changes of the state neurons with mark numbers 6 and 149. State neuron 6 is activated again after m equals 263, which indicates that the robot recalls a previous experience (the same observation) in episodic memory. State neuron 149 is activated four times, which indicates that the robot travels the same place four times. The transition weights connecting state neurons are shown in Figure 10. They are constantly updated in the learning process and undergo attenuation, and the structure of the transition weights can also be adjusted. The proposed EM-MDP framework can thus simulate the organization and decay processes of robotic episodic memory while avoiding redundancy. The episodic memory is dynamic and accumulates incrementally.

Activation changing of the state neurons in episodic memory learning process.

Creation of the transition weights during robot wandering.
Robot navigation using the episodic cognitive map
Figure 11 illustrates the experimental result of robotic planning control in a corridor. The robot travels the corridor only once to learn the episodic cognitive map, since there are few features in the corridor. Given an arbitrary route, the system learns 116 state neurons from a sequence of 147 observations. When we put the robot back into the environment, it can follow the planned route even though there are mislocalized events due to uncertainty. Figure 12 illustrates the results of event localization and observation similarity at each visit step. The first large event localization failures occur around visit steps 3–8 and 14–16 because pedestrians pass through for the first time, so the route contains situations the robot had not encountered during learning. Our system nevertheless completes the travel successfully based on past episodic memory according to equation (11). The full traverse has only five localization failures. Note that the robot records five new events in this travel and updates its episodic memory to a total of 121 events, which benefits robot planning next time.

Results of robot’s planning control in corridor.

Results of localized event and observation similarity in each visit step.
Based on the created episodic memory shown in Figure 8, a common task in which the robot helps a user pour a cup of water is designed to verify the navigation performance. The task requires the mobile robot to move from its current location A to the kettle location B and then to the user location C. From the results in Figure 13, the robot can move directly from location A to the kettle location B since this path was experienced during learning. When the robot moves from the kettle location B to the user location C, simply following the learned route would result in an unnecessarily long path. In fact, there are many crossover events, and the episodic trajectory that leads to the shortest route is predicted. Thus, the proposed episodic cognitive map can predict a suitable path and avoid selecting a time-consuming route. The results of robot navigation performance under uncertainty are shown in Table 1. Average similarity and event localization degree refer, respectively, to the average similarity measure between the current observation and the observations in episodic memory and to the accuracy of event localization in the episodes. Average error refers to the navigational error, that is, the deviation from the centre of the learned route. The experimental results show that the robot predicts different routes for different tasks to adapt to navigation under uncertainty.

Results of robot navigation in office.
Results of robot navigation performance under uncertainty.
Discussion
We verify the robustness against interference from uncertain information in robotic episodic memory learning. Through the proposed learning method, the system constructs well-organized event sequences for environmental cognition. The comparison of event sequences over time, before and after processing uncertainty, is shown in Figure 14(a). Relying only on observation matching results in mislocalization (blue line): in Figure 14(a), time steps 3, 49 and 62 are localized to previously experienced events (red ellipses). We find that the illumination condition is the main cause of the mislocalization. Using our method for processing uncertainty, however, the system obtains a unidirectional linear event chain (green line) corresponding to the robot's experience. The effective rate (event continuity over time) of the event sequences reaches 100%, showing strong robustness.

(a) Comparison of events sequence over time before and after processing uncertainty. (b) Comparison of events usage number over time.
The comparison of event usage numbers over time when SDM is exploited (limited sampling) and not exploited 31 (full sampling) is shown in Figure 14(b). Using the episode learning method, the number of events is reduced from 53 to 37, indicating that the system is more reliable in handling uncertainty and more efficient. The episodic cognitive map can be used to repeat previously visited routes, similar to a topological map; however, it often predicts multiple feasible paths for a task based on the current environment cognition, so the robot has a wider region of operations than a single-path teach-and-repeat system. A key property of our system is that episodic memory learning and navigation can run in parallel, which enables real-time updating of robotic experience.
Unlike simultaneous localization and mapping methods, the proposed episodic cognitive map is built for robot navigation under uncertainty based on the EM-MDP. The structure of event sequences in episodic memory has no inherent law. It can efficiently represent environmental experience using an edge-weighted episodic trajectory graph with a discrete event space set and a transition set, and it realizes autonomous, real-time accumulation, integration and updating of environmental experience. The planning algorithm based on the episodic cognitive map can therefore directly exploit the characteristics of state neurons. The robot can predict an optimal episodic trajectory (possibly an unfamiliar one that the robot did not traverse during episodic learning) to adapt to the uncertain environment and navigation task according to episodic memory.
Because the events in episodic memory are formed as several unidirectional linear trajectories, the global planning process can be computed by activation spreading of state neurons without iteration. The computational time required to estimate the current state is also reduced by introducing one-dimensional state neuron activation and synaptic potential. Thus, robot navigation under uncertainty can be computed efficiently based on the EM-MDP (O(n)) compared to a POMDP (O(n²)). We also compare the accuracy of event localization between the EM-MDP and a POMDP, as shown in Table 2. The table shows that the EM-MDP framework effectively deals with the perceptual aliasing problem, with a correct rate over 95% in most cases. Note that the five mislocalized samplings in the EM-MDP framework are caused by failure to localize any event due to execution error rather than by perceptual aliasing; our system records these five new events in real time and updates episodic memory, realizing self-learning during robot navigation. The proposed algorithm does not require a cost map as in ref. 17, and it differs from many other navigation modules in that the robot does not need to follow a global path. Instead, the robot follows the predicted episodic trajectory, including a sequence of behaviours. This is similar to how a human travels to a target location by following guidance.
Comparison of accuracy for event localization.
POMDP: partially observable Markov decision process; EM-MDP: episodic memory-driven Markov decision process.
Conclusion
In this article, we present a cognitive map building method for robot navigation under uncertainty using biologically inspired episodic memory. It enables the robot to evaluate past events, predict the current state and plan a desired behaviour sequence. The proposed EM-MDP framework avoids the curse of dimensionality and the perceptual aliasing problem found in POMDPs. Considering the biological basis of hippocampal neurons, a mapping from multidimensional observations to one-dimensional state neurons is built, and the activation and stimulation pattern of state neurons is introduced into the EM-MDP. The learning method has a bionic self-organizing ability, changing the structure of the state neuron network, and realizes real-time storage, incremental accumulation, integration and updating of robotic memory. The behaviour sequence for robot navigation is predicted using activation spreading of the state neurons over the episodic cognitive map. Several real-life tasks are carried out to demonstrate the applicability and usefulness of the developed approach. Experimental results show that the robot predicts different routes for different tasks under uncertainty. Future work will focus on the application of path integration in the navigation system.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The project received support from the National Natural Science Foundation of China (61503057).
