Sage Journals: Discover world-class research

Abstract

To improve the natural human-avoidance skills of service robots, a human motion predictive navigation method named PN-POMDP is proposed. A human-robot motion co-occurrence estimation algorithm is proposed that incorporates both long-term and short-term human motion prediction. To improve the reliability of probabilistic and predictive navigation, a POMDP model is used to generate navigation control policies through theoretically optimal decisions. A layered motion control structure is proposed that combines global path planning and reactive avoidance. Multiple comity policies are integrated with a decision-making module that generates efficient and human-compliant navigational behaviours for robots. Experimental results illustrate the effectiveness and reliability of the predictive navigation method.

Keywords

Predictive Navigation; Motion Estimation; Uncertainty; POMDPs

1. Introduction

As service robots have been designed to provide interactive tasks in domestic and office environments, they must reliably navigate around a populated room. When robots and people encounter each other, human-aware motion planners^[1] help robots treat people as social entities and aim to endow robots with safe and human-friendly navigational behaviours^[2][3].

Predicting the motion of moving people is an effective way to achieve compliant robot navigation in dynamic environments^[4]. Many researchers^[5][6][7][8] have developed efficient replanning algorithms that cope with environmental dynamics and satisfy real-time requirements by feeding updated information into a grid map and optimizing the robot's path to minimize the expected time to its destination. Although reactive motion planners^[9][10][11][12] can rapidly query the next appropriate action, they are prone to getting robots blocked in complex environments because of their greedy nature and the uncertainty of human motion. Research on modelling and understanding human indoor motion shows that a person's daily motions in a specific room present certain long-term patterns. Nevertheless, uncertainties are pervasive in the velocity and heading direction of people's movement. Several studies have exploited the spatial-temporal nature of human motion by using a chain of Gaussian distributions^[13], clustering trajectories with K-means^[14], and learning human motion patterns from tracking data with an EM algorithm^[6]. However, most past research does not combine the prediction of motion patterns with the prediction of motion uncertainty.

Another key factor in predictive navigation is the inherent uncertainty^[15][16], which typically requires the robot to react intelligently to unknown moving people. Because of the non-linear nature of human and robot motion, as well as sensor noise, the popular robot localization and people-tracking algorithms based on Bayes filtering can only estimate position distributions. Probabilistic human motion prediction algorithms are also likely to produce larger errors when predicting motion.

Many researchers have pointed out that probabilistic representation and reasoning are appropriate and very effective for navigating in noisy real-world environments. For probabilistic decision-making, Partially Observable Markov Decision Processes (POMDPs)^[17][18] have been widely used in robot navigation and in interaction with people. Robots such as Flo^[19] and Pearl^[20] use POMDPs at all levels of decision-making, not only in low-level navigation routines. However, since finding optimal control strategies for POMDPs is computationally intractable because of the continuous, high-dimensional belief space, POMDPs have usually been applied to topological navigation. For example, Foka^[16] proposes a method that combines the prediction of a moving obstacle's destination with its one-step-ahead positions. However, that method relies on a complex hierarchical decomposition of the environment and has only been implemented and tested in simulation.

In this paper, a novel approach called POMDPs for Predictive Navigation (PN-POMDP) is proposed. The idea of predictive navigation is largely inspired by Sisbot's^[1] human-aware planner, which focuses on providing human-friendly robot behaviours that imitate human motion habits. However, the human-aware planner does not take uncertainties into account when robots work in unstructured environments. This paper addresses the robustness and reliability requirements of a navigation system using probabilistic reasoning (the POMDP method). We compute the uncertainty of human motion in two parts: the ambiguity in path selection and the motion uncertainty along each path. Then, the human-robot co-occurrence probability is estimated by analysing two situations: conflict and obstruction. Our major contribution is the PN-POMDP framework, which coordinates the global path planner, the motion reactor and the speed controller through probabilistic decision-making under multiple uncertainties. The control framework combines two objectives: reaching the goal and reducing the probability of human-robot conflict. More importantly, by considering high perceptual aliasing and other uncertainty factors, it integrates probabilistic robot localization, people-tracking and human motion prediction into a natural probabilistic decision-making framework that generates robot control policies resulting in efficient and polite navigation behaviours.

This paper is organized as follows. After an overview of the navigation system framework in Section 2, Section 3 describes the human motion prediction method and Section 4 introduces the human-robot co-occurrence estimation in the spatial-temporal aspect. Section 5 describes the decision-making mechanism of predictive navigation and the POMDP. Finally, some experimental results are reported in Section 6, followed by a final discussion and conclusion that summarizes the paper.

2. Navigation System Architecture

2.1 Uncertainties in predictive navigation

Firstly, the trajectories of human motion are uncertain: when people are engaged in specific motion patterns, their velocity and direction usually vary within a range. Secondly, localization errors are pervasive. A Simultaneous robot Localization And People-tracking (SLAP) system using global cameras and an onboard laser range finder was developed in our previous work^[21]. It jointly estimates the robot's pose r_t and people's ground-plane positions h_t = (x_t, y_t, θ_t) in the global coordinate frame using two sets of particles, as shown in Figure 1. However, in cluttered environments with table and chair legs, localization errors tend to degrade the human motion prediction. Thirdly, control uncertainty^[14] caused by wheel slip, time delay and other unexpected factors is commonly reported in the robot navigation domain.

Figure 1.

Uncertainty of robot localization and people-tracking

2.2 PN-POMDP control framework

The navigation system uses the robot's abstracted positional relation with the human as the states of the decision-making subsystem. The PN-POMDP system is modelled by a sextuple, < S, A, T, R, O, B >. More specifically, S = {s₁,…,s_n} is the agent state set and A = {a_1,…,a_L} is the agent action set.

T: S × A × S → [0, 1] is the state transition model, in which T(s′, a, s) specifies the conditional probability of transiting from state s to s′ by executing action a. R(s, a) is the reward function that determines the immediate utility of executing action a in state s. O: S × A × O → [0, 1] is the observation model, in which O(s′, a, o) gives the probability of obtaining observation o in state s′ after executing action a. b(s) is the probability that the agent is in state s, with s₀ as the initial state. Since a belief is a probability distribution over all states, the agent's belief space B is |S|-dimensional. The PN-POMDP solution seeks an optimal navigational policy π that specifies the robot's action for each possible belief, i.e., π^POMDP: b → a.

In the Control module, the POMDP-based decision-making sub-module generates a suitable predictive navigational (PN) policy that minimizes the risk of conflict with humans. We have designed four types of action: detour, slow-down, speed-up, and halt, which will be explained in Section 5. To ensure goal-directed and predictive navigation performance, the motion controller is constructed with a two-layered architecture augmented by the policies generated by the POMDP controller. The wavefront-based global path planner calculates the optimal path, and the reference points along it, from the mapped obstacles. The local reactive obstacle-avoidance controller, based on the Nearness Diagram^[12], computes the actual translational velocity v and rotational velocity ω from the reference points and real-time sensory data.
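The wavefront planner mentioned above is not specified further in the text, so the following is only a textbook sketch under our own assumptions: a binary occupancy grid (0 = free, 1 = occupied), 4-connected cells with unit step cost, and a breadth-first wave expanded from the goal. The names `wavefront` and `extract_path` are hypothetical, and the selection of reference points along the path is not shown.

```python
from collections import deque

import numpy as np

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected grid (assumption)


def wavefront(grid, goal):
    """Breadth-first wave from the goal over free cells (grid == 0).

    Returns an integer distance map; obstacles and unreachable cells are -1."""
    dist = np.full(grid.shape, -1, dtype=int)
    dist[goal] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]
                    and grid[nr, nc] == 0 and dist[nr, nc] < 0):
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return dist


def extract_path(dist, start):
    """Descend the distance field from start to the goal (distance 0)."""
    if dist[start] < 0:
        return None  # start is an obstacle or cannot reach the goal
    path = [start]
    r, c = start
    while dist[r, c] > 0:
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1]
                    and dist[nr, nc] == dist[r, c] - 1):
                r, c = nr, nc
                break
        path.append((r, c))
    return path


grid = np.zeros((5, 5), dtype=int)
grid[1:4, 2] = 1  # a wall segment the robot must go around
dist = wavefront(grid, (2, 4))
path = extract_path(dist, (2, 0))
```

Descending the distance field always reaches the goal, because every reachable cell at distance d > 0 has a neighbour at distance d − 1.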

Finding optimal policies for POMDPs is computationally intractable because the belief space is continuous and high-dimensional. The solution adopted in this work is a hybrid control structure that combines reactive motion control with probabilistic strategy selection to generate optimal navigational behaviours, as shown in Figure 2. In the system, sensory data obtained from the laser, global cameras and other sensors are processed by the SLAP module and fed to the Perception module. Human motion patterns learned by the Modelling module are also input to the Perception module, in which the future motion tendencies of people are predicted in both the long and the short term. Then, the human and robot motion states are abstracted into three types of observation, namely People's Action Observation (PAO), People-robot Relation Observation (PRO) and Robot State Observation (RSO). These abstracted observations are input to the Control module.

Figure 2.

PN-POMDP system framework

A more detailed illustration of the POMDP controller structure is depicted in the right part of Figure 2. The POMDP controller contains a state estimator (SE) and a policy generator. The state estimator updates the belief b_t from the previous belief b_{t-1}, the last action a and the new observation o. Meanwhile, the policy generator maps the belief onto an optimal robot behaviour, i.e., a = π(b_t).
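The state estimator corresponds to the standard POMDP belief update, b′(s′) = η O(s′, a, o) Σ_s T(s′, a, s) b(s). The sketch below assumes the models are stored as dense NumPy arrays indexed `T[a, s, s2]` and `O[a, s2, o]`; this layout and the helper name `update_belief` are our own and are not specified in the paper.

```python
import numpy as np


def update_belief(b, a, o, T, O):
    """Bayes-filter belief update b' = SE(b, a, o).

    T[a, s, s2] = p(s2 | s, a)  (transition model)
    O[a, s2, o] = p(o | s2, a)  (observation model)
    """
    predicted = b @ T[a]                  # sum_s p(s2 | s, a) b(s)
    unnormalized = O[a, :, o] * predicted
    total = unnormalized.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total           # eta normalizes over s2


# Two states, one action that leaves the state unchanged, a noisy sensor.
T = np.array([np.eye(2)])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
b = update_belief(np.array([0.5, 0.5]), 0, 0, T, O)
```

Starting from a uniform belief, observing o = 0 shifts the belief towards the first state in proportion to the sensor likelihoods 0.8 and 0.3.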

3. Human motion prediction

3.1 Long-term modelling of motion pattern

Based on the collected tracking trajectories, a set of people's motion patterns is clustered hierarchically using a fuzzy K-means algorithm based on spatial and temporal information. This algorithm results in a set Ψ of M different motion patterns, Ψ = {Ψ₁,…,Ψ_M}. Each motion pattern Ψ_m, with 1 ≤ m ≤ M, is approximated by a mixture of K Gaussian distributions Ψ¹_m,…, Ψ^K_m. The spatial probability of the person being located at h_t, given step k of motion pattern Ψ_m, is computed from the Gaussian distribution with mean μ^k_m and standard deviation Σ^k_m and is denoted p(h_t | Ψ^k_m):

p (h_{t} | Ψ_{m}^{k}) = \frac{1}{\sqrt{2 π} Σ_{m}^{k}} \cdot \exp (- {‖ h_{t} - μ_{m}^{k} ‖}^{2} / 2 {(Σ_{m}^{k})}^{2}) .

(1)

Without considering the uncertainty of velocity and heading orientation during the person's motion, the robot's belief that the person is engaged in motion pattern Ψ_m is computed from the person's position h_t at time t, given the history of his/her motion z_1:t:

p_{pattern}(h_{t} | Ψ_{m}, z_{1:t}) = \sum_{k=1}^{K} \sum_{k'=k}^{K} p(h_{t} | Ψ_{m}, k, k', z_{1:t})\, p(Ψ_{m}, k, k' | z_{1:t}) .

(2)

The probability p(h_t | Ψ_m, k, k', z_1:t) is the probability that the person covers the point h_t at time t, given the sequence of observations z_1:t and given that z_1:t starts at Ψ^k_m and ends at Ψ^k'_m. The probability p(Ψ_m, k, k' | z_1:t) can be decomposed according to Bayes' rule:

p(Ψ_{m}, k, k' | z_{1:t}) = η\, p(z_{1:t} | Ψ_{m}, k, k')\, p(Ψ_{m})\, p(k, k' | Ψ_{m}),

(3)

where η is a normalizer, p(z_1:t | Ψ_m, k, k') is the observation likelihood of z_1:t, and p(Ψ_m) and p(k, k' | Ψ_m) are two prior probability distributions.
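Equations 1 and 3 can be sketched as follows. The text does not say how the observation likelihood p(z_1:t | Ψ_m, k, k') is evaluated, so this sketch assumes (hypothetically) that the t observations are aligned to steps spread linearly from k to k', that each observation contributes the Gaussian score of Eq. 1, and that p(k, k' | Ψ_m) is uniform over pairs k ≤ k'. Step indices are 0-based here, and all function names are hypothetical. For long histories the product should be accumulated in log space to avoid underflow.

```python
import numpy as np


def step_likelihood(h, mu, sigma):
    """Eq. (1): isotropic Gaussian score of position h for one pattern step."""
    d2 = np.sum((np.asarray(h, dtype=float) - mu) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)


def pattern_posterior(z, patterns, prior):
    """Eq. (3): posterior p(Psi_m, k, k' | z_1:t), normalized by eta.

    patterns: list of (mu, sigma), mu of shape (K, 2) and sigma of shape (K,).
    ASSUMPTION: observation j is aligned to the step obtained by linear
    interpolation between k and k'; p(k, k' | Psi_m) is uniform.
    """
    z = np.asarray(z, dtype=float)
    t = len(z)
    frac = np.linspace(0.0, 1.0, t) if t > 1 else np.zeros(1)
    scores = {}
    for m, (mu, sigma) in enumerate(patterns):
        K = len(mu)
        pairs = [(k, k2) for k in range(K) for k2 in range(k, K)]
        for k, k2 in pairs:
            steps = np.rint(k + (k2 - k) * frac).astype(int)
            like = np.prod([step_likelihood(z[j], mu[s], sigma[s])
                            for j, s in enumerate(steps)])
            scores[(m, k, k2)] = like * prior[m] / len(pairs)
    eta = 1.0 / sum(scores.values())
    return {key: eta * value for key, value in scores.items()}


def pattern_marginal(posterior, M):
    """Sum the (m, k, k') posterior over k and k' to get p(Psi_m | z_1:t)."""
    out = np.zeros(M)
    for (m, _, _), p in posterior.items():
        out[m] += p
    return out


# Two hypothetical patterns: one along the x-axis, one along the y-axis.
mu0 = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)
mu1 = np.array([[0, 0], [0, 1], [0, 2], [0, 3]], dtype=float)
sig = np.full(4, 0.3)
post = pattern_posterior([[0, 0], [1, 0.05], [2, -0.05]],
                         [(mu0, sig), (mu1, sig)], [0.5, 0.5])
marg = pattern_marginal(post, 2)
```

Observations that move along the x-axis put almost all of the posterior mass on the first pattern.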

Examples of learned human motion patterns are shown in Figure 3, which indicates that people typically move between places with important objects to manipulate, such as a fridge, a printer or a washing machine.

Figure 3.

Human motion patterns learned from tracking data

3.2 Short-term motion prediction

To account for the short-term uncertainty of the movement along the path of Ψ_m, the variations in the person's velocity and heading orientation are modelled.

We make the following assumptions about the movement of a person^[7]: ΔT is the time step during which a person keeps a certain velocity and heading direction; the possible ranges of his/her velocity and orientation are [v_min, v_max] and [θ_min, θ_max], respectively; and he/she changes velocity and orientation only once every time step ΔT. In this sense, the velocity (orientation) within a time step is constant and is selected randomly and independently within the above range. Under these assumptions, the sequence of velocities (orientations) over time consists of independently distributed random variables. This assumption describes a common indoor motion style in which people move smoothly between two places.
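The motion assumptions above can be checked by simulation. This sketch (the helper `sample_positions` is hypothetical) draws a fresh speed and heading for every step of length ΔT and holds them constant within the step; treating headings as absolute angles in radians is our assumption. With the heading fixed at θ₀ = 0 and ΔT = 1, the sample mean and variance of x_t should approach t·v̄ and t·(v_max − v_min)²/12, i.e. Eqs. 12 and 14 with σ₀ = 0.

```python
import numpy as np


def sample_positions(h0, v_range, theta_range, dT, t, n, rng):
    """Draw n person positions after t steps of the piecewise-constant model.

    Each step of length dT draws an independent speed v_i ~ U(v_range) and an
    independent heading theta_i ~ U(theta_range) and holds both for the step."""
    v = rng.uniform(*v_range, size=(n, t))
    theta = rng.uniform(*theta_range, size=(n, t))
    x = h0[0] + np.sum(v * dT * np.cos(theta), axis=1)
    y = h0[1] + np.sum(v * dT * np.sin(theta), axis=1)
    return np.stack([x, y], axis=1)


rng = np.random.default_rng(0)
pts = sample_positions((0.0, 0.0), (0.5, 1.5), (0.0, 0.0), 1.0, 5, 200_000, rng)
print(pts[:, 0].mean(), pts[:, 0].var())  # approximately 5.0 and 5/12
```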

Firstly, the orientation variance is modelled by a fan-shaped area called the field of view, as shown in Figure 4. The field of view defines a coordinate system originated at the current position of the person, h₀ = (x₀,y₀, θ₀), and takes the Person's Instantaneous Orientation (PIO) as the symmetry axis. The maximum angular distance from the PIO is Λ = θ_max and is defined by the size of the field of view. Within the field of view area, a point h = (r, α) has a probability of

Figure 4.

Instantaneous heading direction of movement uncertainty estimation

p_{orien} (h | h_{0}) = \exp (- α^{2}),

(4)

to be the goal of movement in the next time step. Eq. 4 indicates that the larger the angular deviation |α| from the PIO, the less likely it is that the person will head in that direction.
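A direct transcription of Eq. 4 for a Cartesian target point is sketched below. Wrapping α to [−π, π), measuring angles in radians, and returning 0 outside the field of view (|α| > Λ) are our assumptions; the text only defines the probability inside the fan-shaped area. `p_orien` mirrors the paper's symbol but is otherwise a hypothetical helper.

```python
import math


def p_orien(h, h0, fov_half_angle):
    """Eq. (4): exp(-alpha^2) for a point h = (x, y) seen from h0 = (x0, y0, theta0).

    alpha is the signed angle between the person's instantaneous orientation
    (PIO) and the direction to h, wrapped to [-pi, pi). Points outside the
    field of view (|alpha| > fov_half_angle, i.e. Lambda) get 0 (assumption)."""
    x0, y0, theta0 = h0
    bearing = math.atan2(h[1] - y0, h[0] - x0)
    alpha = (bearing - theta0 + math.pi) % (2 * math.pi) - math.pi
    if abs(alpha) > fov_half_angle:
        return 0.0
    return math.exp(-alpha ** 2)
```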

Secondly, the velocity variance is modelled by a distribution p_vel(h_t; t) that calculates the probability of reaching a point h_t along a straight-line path, as demonstrated in Figure 5. Let θ₀ and σ²₀ be the current heading direction and the positional variance of the person, respectively. Then the position h_t = (x_t, y_t) after t time steps is given by

Figure 5.

Motion velocity uncertainty estimation

x_{t} = x_{0} + \sum_{i = 1}^{t} v_{i} Δ T \cos θ_{0},

(5)

y_{t} = y_{0} + \sum_{i = 1}^{t} v_{i} Δ T \sin θ_{0} .

(6)

Since the v_i (i = 1,…,t) are independent and follow the same uniform distribution, v_i ∼ U(v_min, v_max), the variance σ²_step of the movement added by one time step (i.e., the velocity variance) is calculated as:

\begin{array}{l} σ_{step}^{2} = \int_{-\infty}^{\infty} (v_{i} - E v_{i})^{2} f(v_{i})\, d v_{i} \\ \quad = \frac{1}{v_{\max} - v_{\min}} \int_{v_{\min}}^{v_{\max}} \left(v_{i} - \frac{v_{\max} + v_{\min}}{2}\right)^{2} d v_{i} \\ \quad = \frac{1}{12} (v_{\max} - v_{\min})^{2} . \end{array}

(7)

Let Ev_i = μ_i and Dv_i = σ²_i be the mathematical expectation and variance of v_i, respectively. According to the central limit theorem, $\sum_{i = 1}^{t} v_{i}$ approximately follows $N(\sum_{i = 1}^{t} μ_{i}, \sum_{i = 1}^{t} σ_{i}^{2})$, so x_t is also approximately normally distributed. Thus the probability p_vel(h_t; t) is computed as:

p_{vel}(h_{t}; t) = \frac{1}{2 π σ_{x,t} σ_{y,t}} \exp\left[-\frac{1}{2}\left(\frac{(x_{t} - \bar{x}_{t})^{2}}{σ_{x,t}^{2}} + \frac{(y_{t} - \bar{y}_{t})^{2}}{σ_{y,t}^{2}}\right)\right],

(8)

where

{\bar{x}}_{t} = x_{0} + (\sum_{i = 1}^{t} μ_{i}) Δ T \cos θ_{0},

(9)

{\bar{y}}_{t} = y_{0} + (\sum_{i = 1}^{t} μ_{i}) Δ T \sin θ_{0},

(10)

σ_{x, t}^{2} = σ_{y, t}^{2} = σ_{0}^{2} + t σ_{step}^{2} .

(11)

Moreover, since the v_i (i = 1,…,t) are independent and follow the same uniform distribution, the variables v₁,…,v_t share the same mean and variance, i.e., Ev_i = μ_i = v̄ = (v_max + v_min)/2, Dv_i = σ²_i = σ²_step and $\sum_{i = 1}^{t} μ_{i} = t \bar{v}$. As a result, the above equations can be rewritten as:

{\bar{x}}_{t} = x_{0} + t \bar{v} Δ T \cos θ_{0},

(12)

{\bar{y}}_{t} = y_{0} + t \bar{v} Δ T \sin θ_{0},

(13)

σ_{x,t}^{2} = σ_{y,t}^{2} = σ_{0}^{2} + t σ_{step}^{2}, \quad \bar{v} = \frac{v_{\max} + v_{\min}}{2} .

(14)
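Eqs. 7, 8 and 12 to 14 combine into a short function. It is transcribed as printed: in particular, the per-step variance σ²_step is added t times to σ₀², exactly as in Eq. 14 (this coincides with the variance of Σ v_i ΔT only when ΔT = 1). Because σ_x,t = σ_y,t, the normalizer of Eq. 8 reduces to 1/(2π σ²_x,t). The function names are hypothetical.

```python
import math


def sigma_step_sq(v_min, v_max):
    """Eq. (7): variance of a speed drawn from U(v_min, v_max)."""
    return (v_max - v_min) ** 2 / 12.0


def p_vel(h, t, h0, theta0, sigma0_sq, v_min, v_max, dT):
    """Eq. (8) with the mean of Eqs. (12)-(13) and the variance of Eq. (14)."""
    v_bar = 0.5 * (v_max + v_min)
    x_bar = h0[0] + t * v_bar * dT * math.cos(theta0)
    y_bar = h0[1] + t * v_bar * dT * math.sin(theta0)
    var = sigma0_sq + t * sigma_step_sq(v_min, v_max)  # sigma_x,t^2 = sigma_y,t^2
    d2 = (h[0] - x_bar) ** 2 + (h[1] - y_bar) ** 2
    return math.exp(-0.5 * d2 / var) / (2 * math.pi * var)


# Density at the predicted mean position after 4 steps of 0.5 s.
peak = p_vel((2.0, 0.0), 4, (0.0, 0.0), 0.0, 0.01, 0.5, 1.5, 0.5)
```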

To combine the long-term and short-term predictions, the heading-orientation probability p_orien(h | h₀) is used as an exponent on the velocity probability, and the probability of the motion pattern that the person is engaged in at the current position h₀ is normalized over all M motion patterns by a normalization factor η. Finally, the probability of reaching h_t at time t is computed as:

p_{predict}(h_{t}; t) = η\, p(h_{t} | Ψ_{m})\, p_{vel}(h_{t}; t)^{γ}, \quad γ = p_{orien}(h_{t} | h_{0})

(15)
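Eq. 15 can be evaluated once p(h_t | Ψ_m) for all M patterns and p_vel(h_t; t) are known. In this sketch, η is taken to be the reciprocal of the sum of the M pattern terms, which is our reading of the normalization over all motion patterns; `p_predict` is a hypothetical helper, and α is the angle used in Eq. 4.

```python
import math


def p_predict(pattern_likelihoods, m, p_vel_value, alpha):
    """Eq. (15) for pattern m: eta * p(h_t | Psi_m) * p_vel(h_t; t) ** gamma.

    pattern_likelihoods: p(h_t | Psi_m) for all M patterns; eta normalizes them.
    gamma = p_orien = exp(-alpha**2) (Eq. 4) is used as the exponent."""
    eta = 1.0 / sum(pattern_likelihoods)
    gamma = math.exp(-alpha ** 2)
    return eta * pattern_likelihoods[m] * p_vel_value ** gamma
```

When the target lies on the person's instantaneous orientation (α = 0), γ = 1 and the velocity term enters Eq. 15 unchanged.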

4. Human-robot Co-occurrence Estimation

The probability of human and robot co-occurrence is estimated in the spatial-temporal aspect according to the robot's travelling route obtained from the global path planner and the human motion prediction. In the PN-POMDP system, situations of human-robot co-occurrence are classified into two types.

The first type is human-robot motion conflict, which is characterized by the angle θ = |θ_h − θ_r|, where θ_r and θ_h are the orientations of the robot and the human, respectively. In wide, open indoor areas, a situation that satisfies θ_th1 < θ < π indicates that the robot's movement along its planned path and the person's future movement are likely to meet in both space and time. Let L_safe be a safe distance between person and robot, and let P_c denote the intersection point of the robot's path and the person's predicted motion trajectory. At the current time t₀, a predicted human-robot motion conflict at a future time requires that the person is now at a position within $[P_{c} - D_{safe}^{in}, P_{c} + D_{safe}^{out}]$ (as shown in Figure 6), where

Figure 6.

Human-robot motion conflict

D_{safe}^{in} = \frac{L_{safe}}{\sin θ} \left(\sqrt{\frac{v_{r}^{2} + v_{h}^{2} - v_{r} v_{h} \cos θ}{v_{r}^{2}}} + \frac{v_{h}}{v_{r}}\right),

(16)

D_{safe}^{out} = \frac{L_{safe}}{\sin θ} \left(\sqrt{\frac{v_{r}^{2} + v_{h}^{2} - v_{r} v_{h} \cos θ}{v_{r}^{2}}} - \frac{v_{h}}{v_{r}}\right) .

(17)

This is because, if the person and the robot move along their respective paths at constant speeds v_h and v_r, the robot will arrive at P_c at time t = t₀ + Dⁱⁿ_safe/v_h, at which point its distance to the person will be less than L_safe. Taking the motion uncertainty of the person into account, the probability that he/she will arrive at P_c at a future time t is computed by Eq. 18:

p_{conflict} = p_{predict} (P_{c}; t),

(18)

where t = (L_safe/sin θ)/v_r. Moreover, in the special situation θ = π, which indicates that the robot is moving directly towards the person, P_c, Dⁱⁿ_safe and D^out_safe cannot be calculated. In this case, the situation is treated as a face-to-face human-robot motion conflict.
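The conflict test of Eqs. 16 to 18 can be sketched as follows. The formulas are transcribed as printed. The signed arc-length coordinate `s_person` (negative before P_c, measured along the person's predicted path) is our own convention for expressing the window [P_c − D^in_safe, P_c + D^out_safe], and all function names are hypothetical. The guard on sin θ reflects the special case discussed above, in which P_c cannot be computed.

```python
import math


def conflict_window(L_safe, theta, v_r, v_h):
    """Eqs. (16)-(17) as printed: returns (D_in, D_out) around P_c.

    theta = |theta_h - theta_r| in radians; v_r, v_h are constant speeds."""
    s = math.sin(theta)
    if abs(s) < 1e-9:
        raise ValueError("paths are (anti)parallel: P_c is undefined")
    root = math.sqrt((v_r ** 2 + v_h ** 2 - v_r * v_h * math.cos(theta)) / v_r ** 2)
    return L_safe / s * (root + v_h / v_r), L_safe / s * (root - v_h / v_r)


def in_conflict_window(s_person, L_safe, theta, v_r, v_h):
    """True if the person's signed position relative to P_c lies in the window."""
    d_in, d_out = conflict_window(L_safe, theta, v_r, v_h)
    return -d_in <= s_person <= d_out


def conflict_time(L_safe, theta, v_r):
    """Look-ahead time used in Eq. (18): t = (L_safe / sin(theta)) / v_r."""
    return (L_safe / math.sin(theta)) / v_r


d_in, d_out = conflict_window(0.5, math.pi / 2, 1.0, 1.0)
```

The returned time would then be passed to p_predict(P_c; t) to obtain p_conflict.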

The second situation is human-robot motion obstruction (as shown in Figure 7). This represents a situation in which the robot's path would block a human's intended trajectory, typically one that traverses a narrow passage. In this type of situation, a detected θ < θ_th2 event indicates that the paths of robot and human are roughly parallel and in opposite directions. To calculate possible spatial-temporal obstruction, the first and the last points along the robot's planned path that satisfy the conditions of Eq. 19 and Eq. 20 are computed:

Figure 7.

Human-robot motion obstruction

p_{pattern} (s | Ψ_{m}) \leq Σ_{m}^{k}

(19)

k = \underset{k = 1, ..., K}{\arg \min} {p_{pattern} (s | θ_{m}^{k}, π_{m}^{k})}

(20)

These two points are denoted as the InPoint sⁱⁿ and the OutPoint s^out of the path intersection, respectively. Suppose that the human and the robot are moving along their corresponding paths at constant speeds v_h and v_r, respectively. The person will then arrive at the points sⁱⁿ and s^out at future moments tⁱⁿ_h and t^out_h, respectively. Similarly, the robot will arrive at the points sⁱⁿ and s^out at future moments tⁱⁿ_r and t^out_r, respectively.

If |tⁱⁿ_h − t^out_h| < θ_th3, then, taking the motion uncertainty of the person into account, the probability of human-robot motion obstruction at point sⁱⁿ is computed as:

\begin{array}{l} p_{obstruct} = η p_{pattern} (s^{in} | Ψ_{m}) p_{vel} {(s^{in}; t_{r}^{in})}^{γ_{1}} p_{vel} (s^{in}; t_{h}^{out})^{γ_{2}} \\ γ_{1} = p_{orien} (s^{in} | r_{t - 1}), γ_{2} = p_{orien} (s^{in} | h_{t - 1}) \end{array}

(21)

Similarly, if tⁱⁿ_h − t^out_h < θ_th4, then, taking the motion uncertainty of the person into account, the probability of human-robot motion obstruction at point s^out is computed as:

\begin{array}{l} p_{obstruct} = η\, p_{pattern}(s^{out} | Ψ_{m})\, p_{vel}(s^{out}; t_{h}^{in})^{γ_{1}}\, p_{vel}(s^{out}; t_{r}^{out})^{γ_{2}}, \\ γ_{1} = p_{orien}(s^{out} | h_{t-1}), \quad γ_{2} = p_{orien}(s^{out} | r_{t-1}) . \end{array}

(22)

5. POMDP-based decision-making for polite avoidance

5.1 The Elements: States, Actions and Observations

To automatically compile the POMDP model < S, A, T, R, O, B > , it is necessary to define some action and observation uncertainties.

In this case, the states of the POMDP model are defined based on the abstraction of a set of variables.

Human motion state, which takes values of move(PM) and stay(PS);

Human motion tendency, which takes values of move(GM) and stay(GS);

Human-robot distance, which is obtained by discretization of metric distance between human and robot, taking values of safe(DS), minor danger(DB), major danger(DJ) and high danger(DE);

Robot motion state, which takes values of normal path following(RN), accelerating along path(RF), decelerating along path(RS), and dynamic replanning(RR).

Combining these variables yields 2 × 2 × 4 × 4 = 64 candidate states s₁, …, s₆₄. Our analysis showed that 32 of them seldom occur, so the remaining 32 are selected as the system states of the POMDP model.
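The size of the composite state space follows directly from the four variables listed above (2 × 2 × 4 × 4 = 64). A minimal enumeration is shown below; the value labels are those of the text, while the constant names are ours. Which 32 combinations were discarded is not reported, so no filtering is attempted.

```python
from itertools import product

# Values of the four abstracted state variables, as labelled in the text.
HUMAN_MOTION = ("PM", "PS")                 # move, stay
HUMAN_TENDENCY = ("GM", "GS")               # move, stay
DISTANCE = ("DS", "DB", "DJ", "DE")         # safe, minor, major, high danger
ROBOT_MOTION = ("RN", "RF", "RS", "RR")     # follow, accelerate, decelerate, replan

# Every combination of the four variables is one candidate state.
STATES = list(product(HUMAN_MOTION, HUMAN_TENDENCY, DISTANCE, ROBOT_MOTION))
STATE_INDEX = {state: i for i, state in enumerate(STATES)}
```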

Observations (O) in our model are the abstractions of the movements and positional relation between human and robot. In each state, the robot makes three kinds of observation:

People's Action Observation (PAO), which can be abstracted as {Move,Stay}.

People-robot Relation Observation (PRO), which can take eight possible values {PRO₁, …, PRO₈} according to the abstraction of the relative position between human and robot.

Robot State Observation (RSO), which can be abstracted as {Path-following, Accelerating, Decelerating, Replanning}.

Actions (A) are human-compliant avoidance behaviours that the robot can execute to give way to humans in a polite manner:

Normal path following a_n;

Accelerating along path a_u;

Decelerating along path a_s;

Dynamic replanning for detour a_r.

The first three actions indicate that the robot follows the planned path, changing at most its velocity. These actions are usually the more efficient way to avoid conflicting with or obstructing humans. The fourth action indicates that the robot replans a new path on the updated environmental map, obtained by incorporating the probability distribution function (PDF) p(h_t; t) into the occupancy grid map.
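The text does not specify how the PDF p(h_t; t) is incorporated into the occupancy grid before replanning. One simple possibility, sketched here purely as an assumption, is to convert the prediction into a per-cell probability (density times cell area, clipped to [0, 1]), fuse it with the static occupancy under an independence assumption, p = 1 − (1 − p_map)(1 − p_human), and threshold the result into a binary grid for the global planner. `fuse_prediction` and the 0.5 threshold are hypothetical.

```python
import numpy as np


def fuse_prediction(p_map, p_human, threshold=0.5):
    """Combine static occupancy with predicted human occupancy, cell by cell.

    ASSUMPTIONS: p_human is already a per-cell probability in [0, 1], and the
    two sources are independent, so p = 1 - (1 - p_map) * (1 - p_human).
    Returns the fused probabilities and a binary grid for the planner."""
    p_map = np.asarray(p_map, dtype=float)
    p_human = np.asarray(p_human, dtype=float)
    if p_map.shape != p_human.shape:
        raise ValueError("grids must have the same shape")
    fused = 1.0 - (1.0 - p_map) * (1.0 - p_human)
    return fused, (fused >= threshold).astype(int)


fused, occupied = fuse_prediction([[0.0, 0.9]], [[0.6, 0.0]])
```

A cell that is free on the static map but likely to be crossed by a person is thus treated as an obstacle during the detour replanning.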

The reward (ℛ) defines the reward function that determines the immediate utility of executing action a at state s. In our system, the reward matrix is manually specified based on a criterion that behaviours ensuring more safety and politeness receive higher rewards. Nevertheless, the optimum choice of settable parameters can be adjusted through a user-supervised learning system when a robot is installed in a new environment and performs a daily room exploration task, as suggested by Lopez ^[22].

5.2 POMDP Compilation

The transition model T(s',a,s) specifies the conditional probability distribution of transiting from state s to s' by executing action a. O(s',a,o) is the observation model that computes the probability of obtaining observation o in state s' when executing action a. Usually, the transition model and observation model can be rewritten as T(s',a,s) = p(s' | s,a), O(s',a,o) = P(o | s',a) where $o \in O$ , $s \in S$ , $a \in A$ .

To learn the transition model and the observation model automatically, initial values are given (examples are shown in Figure 8) and the EM algorithm is employed to learn from collected data until the output parameters converge. The Randomized Point-based Value Iteration algorithm^[18] is used to solve the POMDP model defined above. In our system, offline model compilation requires 32 iterations and 63.578 seconds. Figure 9 shows the error during the iterations: the error between two successive iterations is plotted on the y-axis, with higher values indicating a faster convergence rate.
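For readers unfamiliar with randomized point-based value iteration, the following is a compact Perseus-style sketch, assuming dense NumPy arrays `T[a, s, s2]`, `O[a, s2, o]` and `R[s, a]` and a fixed, pre-sampled belief set. Because the PN-POMDP transition, observation and reward matrices are not given, the usage example solves the classic two-state "tiger" benchmark instead (listen, open left, open right; 0.85 listening accuracy; rewards −1, −100 and +10). All function names are hypothetical, and the iteration count and discount γ = 0.95 are illustrative choices.

```python
import numpy as np


def backup(b, alphas, T, O, R, gamma):
    """Point-based Bellman backup at belief b; returns (alpha vector, action)."""
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(T.shape[0]):
        # g[o, i, s] = sum_s2 T[a, s, s2] * O[a, s2, o] * alphas[i, s2]
        g = np.einsum("st,to,it->ois", T[a], O[a], alphas)
        best_i = np.argmax(g @ b, axis=1)  # best projected vector per observation
        alpha = R[:, a] + gamma * g[np.arange(g.shape[0]), best_i].sum(axis=0)
        if b @ alpha > best_value:
            best_alpha, best_action, best_value = alpha, a, b @ alpha
    return best_alpha, best_action


def perseus(T, O, R, beliefs, gamma=0.95, n_iter=400, seed=0):
    """Randomized point-based value iteration over a fixed belief set."""
    rng = np.random.default_rng(seed)
    alphas = np.full((1, T.shape[1]), R.min() / (1 - gamma))  # lower bound
    actions = np.zeros(1, dtype=int)
    for _ in range(n_iter):
        old = (beliefs @ alphas.T).max(axis=1)
        todo = np.arange(len(beliefs))
        new_alphas, new_actions = [], []
        while todo.size:
            i = rng.choice(todo)  # back up a random not-yet-improved belief
            alpha, a = backup(beliefs[i], alphas, T, O, R, gamma)
            if beliefs[i] @ alpha < old[i]:  # no gain: keep the old best vector
                j = np.argmax(alphas @ beliefs[i])
                alpha, a = alphas[j], actions[j]
            new_alphas.append(alpha)
            new_actions.append(a)
            new_vals = (beliefs[todo] @ np.array(new_alphas).T).max(axis=1)
            todo = todo[new_vals < old[todo] - 1e-9]
        alphas, actions = np.array(new_alphas), np.array(new_actions)
    return alphas, actions


def greedy_action(b, alphas, T, O, R, gamma=0.95):
    """One-step lookahead on the value function: the action of the backup at b."""
    return backup(np.asarray(b, dtype=float), alphas, T, O, R, gamma)[1]


# Tiger problem: state 0 = tiger behind left door, state 1 = behind right door.
T = np.empty((3, 2, 2))
T[0] = np.eye(2)             # listen: the tiger stays where it is
T[1] = T[2] = 0.5            # opening a door resets the problem
O = np.empty((3, 2, 2))
O[0] = [[0.85, 0.15], [0.15, 0.85]]
O[1] = O[2] = 0.5
R = np.array([[-1.0, -100.0, 10.0],
              [-1.0, 10.0, -100.0]])
p = np.linspace(0.0, 1.0, 11)
beliefs = np.stack([p, 1.0 - p], axis=1)
alphas, actions = perseus(T, O, R, beliefs)
```

Each iteration backs up only as many belief points as are needed to improve the value of every point in the set, which is what makes the randomized variant cheaper than backing up all points.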

Figure 8.

Rules for constructing the POMDP model from empirical uncertainties

Figure 9.

Convergence of the learning error

6. Experimental Results

The proposed approach was validated in a real office environment of size 12 m × 7 m. An ActivMedia Peoplebot was used in the experiments. We assumed that participants in the experiment walked at a smooth speed and intended to follow certain motion patterns. The sensory system for robot localization and people-tracking consists of five stationary CCD cameras mounted on each side of the room above head level and the robot's onboard laser range finder. The environmental grid map was previously built by a SLAM algorithm with a resolution of 0.1 m. Based on the collection of tracking trajectories, typical indoor motion patterns of humans are learned, as presented in our previous work^[23][24].

6.1 Predictive Navigation

During online predictive navigation, the robot collected laser scan data with a time period of 200 ms, and updated the local map grid_local with a time period of 80 ms, according to the positions of detected human legs. In the PN-POMDP algorithm, θ_th1, θ_th2, θ_th3 and θ_th4 are the threshold values that can be set and adjusted in the experiment.

In the first three testing scenarios, the robot and the human were initially located in the same room area within a short distance of each other (less than 5 m). In the first case, where the robot was moving towards the human, the system predicted human motion 5 seconds ahead of time. Figure 10(a) shows that the robot began to avoid possible human-robot conflict using the detour policy when it was still about 3 m away from the human. In the second scenario (Figure 10(b)), when the robot had predicted that its path would intersect with the predicted human trajectory from one side, it selected the slow down action a_s. This policy was efficient because frequent replanning was avoided and the robot would continue to move along the path with normal speed when the human had passed the predicted intersection point. To test the reliability of the algorithm, in the third scenario (Figure 10(c)) the robot was following a person through a narrow corridor. In this case, the robot made the prediction that it would not interfere with the human's motion if it did not overtake him/her. Thus the robot followed its planned path with regular speed.

Figure 10.

Predictive navigation in three experimental scenarios

The fourth testing scenario involved predictive navigation. The robot and the human were initially positioned in two different rooms and global cameras were utilized for people-tracking. In the experiment, the robot was initially positioned at place A as in Figure 11(a), and it planned to navigate through a narrow passage (passage II) to place B. In the meantime, a person intended to walk through the same passage in the opposite direction.

Figure 11.

Comparison in predictive navigation scenario

Before the robot approached the passage entrance, the probability of human-robot motion conflict at the entrance of the passage (place C in Figure 11(b)) was estimated. More specifically, the likelihood that the person's current motion was engaged in the motion pattern ending at place C was estimated to be as high as 97.5%. However, since the predicted occupancy probability of the grid cells within the passage was volatile, the traditional replanning method caused the robot to switch between two candidate routes (plan1, a detour via passage I, and plan2, continuing via passage II). This method made the robot move back and forth unnecessarily at the passage entrance before it finally reached the goal, requiring as many as 633 time periods. In comparison, the PN-POMDP method, supporting multiple comity policies, generated highly efficient and human-friendly behaviour. When a human-robot conflict within passage II was predicted, the robot drove to a free space outside the entrance of the passage and waited. After the person had passed through the passage, the robot proceeded through the doorway and continued on its route. The resulting behaviour improved navigation efficiency (only 342 time periods to reach the goal) by avoiding unnecessary repeated zigzagging and wandering before entering the passage. The pose and translational velocity of the robot during the navigation test are shown in Figure 12. As shown in Figure 12(a), the replanning method caused the robot to switch between the two candidate paths during intervals (2) to (5). In contrast, Figure 12(b) shows that the PN-POMDP method ensures smooth and efficient robot navigation. More importantly, the polite navigation behaviour is comprehensible to humans and shows full respect for the person.

Figure 12.

Robot pose and velocity change during the navigation scenario

6.2 Predictive navigation in highly populated areas

Figure 13 shows the experimental result in a crowded environment with three participants walking around the robot. Since the PN-POMDP method supports multiple predictive navigation policies, the robot frequently adjusted its policy according to the predicted human-robot co-occurrence situations. In fact, after the reward of the deceleration action a_s was raised, the robot tended to slow down and let the human pass first. This indicates that the PN-POMDP method can feasibly be applied to service robots working in crowded environments such as exhibition halls and museums.

Figure 13.

Slow movement of robot in crowded environment

6.3 Trial study

A statistical trial study was also conducted to verify the success rate of the PN-POMDP method. We invited 12 participants (eight male and four female), ranging in age from 21 to 34. Of these, 33% came from non-technical fields, while 67% worked in technology-related areas. The trials involved the different types of situation described above. The following situations were treated as failures: (i) the robot blocked the human's intended route of movement (subjective scoring); (ii) the robot failed to reach the goal because it became trapped or its localization failed; (iii) the robot reached the goal but took four or more times as long as needed without humans moving around. Figure 14 shows that the PN-POMDP method achieved a higher success rate than the traditional real-time replanning method.

Figure 14.

Comparison of success rate

7. Conclusion

In this paper, we have presented a predictive navigation method for service robots in the POMDP framework. By learning human motion patterns and combining long-term and short-term human motion prediction, space-time estimation of human-robot co-occurrence is achieved. In order to execute tasks in typical partially observable environments, POMDP-based probabilistic decision-making is incorporated to generate a theoretically optimal policy that allows the robot to behave in an efficient and polite manner. Thus the risk of conflict with human motion is minimized. The feasibility of the proposed methodology is validated by navigation experiments as well as user trials, in which the robot's navigational behaviour is interpreted by humans as safe, comprehensible and polite.

Although the system uses external cameras for human tracking, the proposed framework does not rely on any specific means of acquiring human motion. When the robot is not close to people, we suggest using global cameras to provide seamless and reliable human tracking, which improves the performance of predictive navigation.


8. Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant nos. 61105094, 61075090, 61005092 and 60805032) and the open fund of the Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education (grant no. MCCSE2012B02).

References

1. Sisbot E.A., Marin-Urias L.F., Alami and Simeon, “A Human Aware Mobile Robot Motion Planner,” IEEE Transactions on Robotics, Vol. 23, pp. 874–883, 2007.

2. Althaus, Ishiguro, Kanda, Miyashita and Christensen H.I., “Navigation for human-robot interaction tasks,” Proceedings of IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, pp. 1894–1900, 2004.

3. Gockley, Forlizzi and Simmons, “Natural person-following behavior for social robots,” Proceedings of Human-Robot Interaction, pp. 17–24, 2007.

4. Chung S.Y. and Huang H.P., “Predictive Navigation by Understanding Human Motion Patterns,” International Journal of Advanced Robotic Systems, Vol. 8, pp. 52–64, 2011.

5. Bennewitz, Burgard, Cielniak and Thrun, “Learning motion patterns of people for compliant robot motion,” The International Journal of Robotics Research, Vol. 24, pp. 31–48, 2005.

6. Osentoski, Manfredi and Mahadevan, “Learning Hierarchical Models of Activity,” Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, pp. 891–896, 2004.

7. Miura and Shirai, “Modeling motion uncertainty of moving obstacles for robot motion planning,” Proceedings of the 2000 IEEE International Conference on Robotics and Automation, San Francisco, CA, USA, pp. 2258–2263, 2000.

8. Hoeller, Schulz, Moors and Schneider F.E., “Accompanying Persons with a Mobile Robot using Motion Prediction and Probabilistic Roadmaps,” Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, California, USA, pp. 1260–1265, 2007.

9. Djekoune A.O., Achour and Toum, “A sensor based navigation algorithm for a mobile robot using the DVFF approach,” International Journal of Advanced Robotic Systems, Vol. 6, pp. 97–108, 2009.

10. Koenig and Likhachev, “Fast replanning for navigation in unknown terrain,” IEEE Transactions on Robotics, Vol. 21, pp. 354–363, 2005.

11. Poncela, Urdiales, Perez E.J. and Sandoval, “A new efficiency-weighted strategy for continuous human/robot cooperation in navigation,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, Vol. 39, pp. 486–500, 2009.

12. Minguez and Montano, “Sensor-based robot motion generation in unknown, dynamic and troublesome scenarios,” Robotics and Autonomous Systems, Vol. 52, pp. 290–311, 2005.

13. Zhou and Tan, “Principal Axis-Based Correspondence between Multiple Cameras for People Tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, pp. 663–671, 2006.

14. Yen H.C., Huang H.P. and Chung S.Y., “Goal-Directed Pedestrian Model for Long-Term Motion Prediction with Application to Robot Motion Planning,” Proc. IEEE Int. Conf. on Advanced Robotics and its Social Impacts, Taipei, Taiwan, pp. 1–6, 2008.

15. Lopez M.E., Bergasa L.M., Barea and Escudero M.S., “A Navigation System for Assistant Robots Using Visually Augmented POMDPs,” Autonomous Robots, Vol. 19, pp. 67–87, 2005.

16. Foka and Trahanias, “Real-Time Hierarchical POMDPs for Robot Navigation,” Robotics and Autonomous Systems, Vol. 55, pp. 561–571, 2007.

17. Pineau, Gordon and Thrun, “Anytime point-based approximations for large POMDPs,” Journal of Artificial Intelligence Research, Vol. 27, pp. 335–380, 2006.

18. Spaan and Vlassis, “Perseus: Randomized Point-based Value Iteration for POMDPs,” Journal of Artificial Intelligence Research, Vol. 24, pp. 195–220, 2005.

19. Roy, Baltus, Fox, Gemperle, Goetz, Hirsch, Margaritis, Montemerlo, Pineau, Schulte and Thrun, “Towards personal service robots for the elderly,” Proc. of the Workshop on Interactive Robotics and Entertainment (WIRE), Pittsburgh, PA, 2000.

20. Montemerlo, Pineau, Roy, Thrun and Verma, “Experiences with a mobile robotic guide for the elderly,” Proc. of the AAAI National Conference on Artificial Intelligence, Edmonton, Canada, pp. 587–592, 2002.

21. Qian X.D. and Dai X.Z., “Simultaneous Robot Localization and Person Tracking Using Rao-Blackwellised Particle Filters With Multi-modal Sensors,” Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, pp. 3452–3457, 2008.

22. Lopez M.E., Barea, Bergasa L.M. and Escudero M.S., “A Human-Robot Cooperative Learning System for Easy Installation of Assistant Robots in New Working Environments,” Journal of Intelligent and Robotic Systems, Vol. 40, pp. 233–265, 2004.

23. Qian X.D., Dai X.Z. and Fang, “Socially Acceptable Pre-collision Safety Strategies for Human-Compliant Navigation of Service Robots,” Advanced Robotics, Vol. 24, pp. 1813–1840, 2010.

24. Qian X.D., Dai X.Z. and Fang, “Robotic Etiquette: Socially Acceptable Navigation of Service Robots with Human Motion Pattern Learning and Prediction,” Journal of Bionic Engineering, Vol. 7, pp. 150–160, 2010.

Decision-Theoretical Navigation of Service Robots Using POMDPs with Human-Robot Co-Occurrence Prediction
