A Hybrid System of Hierarchical Planning of Behaviour Selection Networks for Mobile Robot Control

Abstract

An office delivery robot receives a large amount of sensory data and there is uncertainty in its action outcomes. The robot should not only accomplish its goals using environmental information, but also consider various exceptions simultaneously. In this paper, we propose a hybrid system using hierarchical planning of modular behaviour selection networks to generate autonomous behaviour in the office delivery robot. Behaviour selection networks, one of the well-known behaviour-based methods suitable for goal-oriented tasks, are made up of several smaller behaviour modules. Planning is attached to the construct and adjust sequences of the modules by considering the sub-goals, the priority in each task and the user feedback. This helps the robot to quickly react in dynamic situations as well as achieve global goals efficiently. The proposed system is verified with both the Webot simulator and a Khepera II robot that runs in a real office environment carrying out delivery tasks. Experimental results have shown that a robot can achieve goals and generate module sequences successfully even in unpredictable situations. Additionally, the proposed planning method reduced the elapsed time during tasks by 17.5% since it adjusts the behaviour module sequences more effectively.

Keywords

Office Delivery Robot Hybrid Robot Control Behaviour Selection Networks

1. Introduction

Advances in robot technology have resulted in service robots that support people in their daily activities [1]. A mobile robot, which runs in an office environment, can help users by performing routine tasks. However, under the conditions of insufficient environmental information and uncertainties in the real situations of an office or a house, high-level functions including perception or planning are required to generate autonomous behaviour [2]. Several researchers have tried to devise appropriate control structures for office delivery robots using various approaches [3 –9].

In order to generate behaviours in mobile robots, conventional planning techniques have been applied to controlling them in well-designed environments. These approaches can generate behaviour sequences optimized in predefined environments, but unfortunately they have their own weakness of low flexibility in complex environments. Contrary to conventional planning-based systems, reactive systems can generate behaviours quickly according to environmental conditions [10]. Even though the systems work well in various and complex domains, it is difficult to generate behaviours robustly in situations of consistency or stability. These limitations encourage the necessity of hybrid behaviour generation approaches that combine the deliberative and reactive systems, which is the main motivation of this paper. Many researchers have studied hybrid architectures to control mobile service robots. Most of them used simple rule or description logic to generate a robot's behaviours in the reactive layer. However, these methods cannot guarantee sufficient decisions because the sensory information is too uncertain and the computational resources of mobile robots are limited. A behaviour selection network can select appropriate behaviours for unpredictable and changing situations by considering both goal and sensory information [11].

In this paper, we propose a hybrid system based on behaviour selection networks [11], which is a behaviour -based method. It is known for its robust performance even in insufficient environments [12]. This method can generate behaviours autonomously to achieve goals, but it is hard to perform well in the complex problems that some plans are required to solve since it was designed to run reactively. The contribution of the proposed method lies in that it overcomes this limitation by combining the behaviour network with a planning mechanism that plans the sequences of modules and adjusts the plans according to exceptions.

The rest of the paper is organized as follows. Section 2 presents background and related works for hybrid systems and applications of office delivery robots. Section 3 describes in detail the hybrid behaviour network module proposed. Section 4 reports the experiments conducted to show the usefulness of the proposed method. The experimental environments were designed in both a simulator and the real world. The final section presents a summary and future works.

2. Related Works

2.1 Hybrid Control Systems

Traditional AI techniques have the limitation that they usually work only in given environments and problems, and a lot of pre-processing steps are required. On the other hand, behaviour-based AI techniques mainly based on reactive behaviour selection are not easy to apply to complex problems; a hybrid technique of them is required to generate flexible behaviour [13].

Hybrid means a combined structure of two or more things, which have different functions. In the domain of a mobile robot's behaviour generation, the hybrid systems have two levels, the reactive and deliberative levels. The reactive level contains sensors and actuators that perceive environments and perform the chosen behaviours, respectively. The deliberative level contains information about maps of environments and the modules, which generate plans to control the reactive level [14, 15].

Hybrid systems have the merit that they take only the strengths of components and complement the weaknesses by exploiting each other. Recently, the hybrid systems have been applied to various problems with diverse approaches. Bayesian networks and behaviour networks are combined to generate robot's behaviours [16] and a planning function was attached to the reactive system for the same purpose [17].

One of the important issues for the hybrid system is the way it combines the deliberative and the reactive systems. The deliberative approach to generating the paths optimized for a specific map is useful for localization problems, but it is hard to respond to the environmental changes and requires plenty of information to plan in the domain of performing tasks [18].

Nevertheless, for a service robot, it is better to use the behaviour-based method mainly since it is more important to achieve the robot's goal and maintain autonomy. For this reason, we propose to mainly use the behaviour networks, one of the behaviour-based approaches, for autonomous behaviours of an office delivery robot, since it is known for its usefulness in goal-oriented problems [19 –22]. Since we use modularized behaviour network modules in the proposed system, the planning method is attached hierarchically in order to generate and adjust sequences of them according to situations. In the proposed system, the behaviour network and the planning are regarded as the reactive and deliberative levels, respectively.

2.2 Mobile Service Robots

Since sensors on the mobile robot cannot recognize environments perfectly, different algorithms are needed according to the domain to which the robot belongs. Especially in real-world environments, including an office, delivery robots interact with environments and there are chances to face various new circumstances during their tasks. To deal with these points, many researchers have tried to devise structures of office delivery robots with several different approaches, as shown in Table 1.

Table 1.
Previous studies on mobile service robots

Authors Methods Type of systems

Simmons et al. [3] Hierarchical planning Deliberative

Beetz et al. [4] Plan-based logic Deliberative

Breuer et al. [23] Hierarchical Task Network Deliberative

Liu et al. [24] High level planning Deliberative

Gardiol and Kaelbling [5] POMDP Reactive

Chung and Williams [6] Dividing the original problem Reactive

Theocharous and Kaelbling [7] Grid-based reinforcement learning Reactive

Ramachandran and Gupta [8] POMDP based reinforcement learning Reactive

Milford and Wyeth [9] Obstacle and experience maps Hybrid

Ferasoli-Fihlho et al. [25] Control architecture with three layers Hybrid

Lyons et al. [26] Cognitive robotics architecture Hybrid

Yun et al. [27] Control architecture with three layers Hybrid

Mendonca et al. [28] Fuzzy cognitive map based reinforcement learning Hybrid

Sgorbissa and Zaccaria [29] Roaming trail method Hybrid

Palomeras et al. [30] Reinforcement learning and Petri nets Hybrid

Wang et al. [31] Upper planning and motion control Hybrid

Nurmaini [32] Self organizing map and fuzzy logic Hybrid

Arkin and Mackenzie [33] Finite state automata Hybrid

Nurmaini [34] Fuzzy logic Hybrid

Authors	Methods	Type of systems
Simmons et al. [3]	Hierarchical planning	Deliberative
Beetz et al. [4]	Plan-based logic	Deliberative
Breuer et al. [23]	Hierarchical Task Network	Deliberative
Liu et al. [24]	High level planning	Deliberative
Gardiol and Kaelbling [5]	POMDP	Reactive
Chung and Williams [6]	Dividing the original problem	Reactive
Theocharous and Kaelbling [7]	Grid-based reinforcement learning	Reactive
Ramachandran and Gupta [8]	POMDP based reinforcement learning	Reactive
Milford and Wyeth [9]	Obstacle and experience maps	Hybrid
Ferasoli-Fihlho et al. [25]	Control architecture with three layers	Hybrid
Lyons et al. [26]	Cognitive robotics architecture	Hybrid
Yun et al. [27]	Control architecture with three layers	Hybrid
Mendonca et al. [28]	Fuzzy cognitive map based reinforcement learning	Hybrid
Sgorbissa and Zaccaria [29]	Roaming trail method	Hybrid
Palomeras et al. [30]	Reinforcement learning and Petri nets	Hybrid
Wang et al. [31]	Upper planning and motion control	Hybrid
Nurmaini [32]	Self organizing map and fuzzy logic	Hybrid
Arkin and Mackenzie [33]	Finite state automata	Hybrid
Nurmaini [34]	Fuzzy logic	Hybrid

Simmons et al. proposed a hierarchical architecture, which consists of four steps: avoiding obstacles, navigating, planning routes, planning tasks and dealing with uncertain sensory inputs and environmental information [3]. Beetz et al. designed a plan-based logic system that regards an exception as one of the adjustable plans [4]. However, these planning-based methods have their own limitations, such as that they must control lots of information about the environments and cannot respond flexibly to dynamically changing situations and exceptions. Breuer et al. developed an autonomous service robot named Johnny [23]. The control approach of the robot is based on hybrid deliberative layer (HDL), which extends the planning component with a description logic reasoning system. They also provide multimodal human machine interactions like gesture, speech and emotion detection. Liu et al. proposed a comprehensive urban search and rescue robotic system and intelligent control architecture [24]. The intelligent control architecture provided the robot system with the ability to make decisions.

On the other hand, Gardiol and Kaelbling used the partially observable Markov decision process (POMDP) to design the reactive architecture [5]. Additionally, Chung and Williams divided the original problem into several sub-problems to generate plans by reducing the complexity of the problem [6] and Theocharous and Kaelbling represented the environment in grids and used reinforcement learning to perform temporal plans [7]. Ramachandran and Gupta proposed POMDP-based reinforcement learning for a delivery robot [8].

Reactive methods adopt similar approaches to our method, which enable the robot to deal with changes robustly without environmental information. These are highly likely to show poor performance in the domain of performing tasks, since they usually only consider achieving local goals and reacting to current exceptions without maintaining global goals.

To prevent this, some hybrid architectures have been proposed [35]. Milford and Wyeth used different obstacle and experience maps for local and global navigation, respectively [9]. Ferasoli-Filho et al. developed a mobile robot, which can be controlled by myoelectric and movement sensors [25]. It can provide disabled children with a certain degree of independency in the exploration of the environment. The robot has control architecture with three layers, which are deliberative, intermediate and reactive layers. The intermediate layer accepts commands from the deliberative layer and it activates or inhibits the basic behaviours of the reactive layer.

Lyons et al. presented an autonomous platform that searches or traverses a building to observe unstable masonry or rubble [26]. In these situations, a reactive approach may produce behaviour that works against the robots' long-term goals, such as taking the quickest or safest route to a disaster victim. They used a hybrid reactive-deliberative approach to overcome the limitations of an entirely reactive approach. Yun et al. presented a robot platform named CIROS for proactive human searches in an undiscovered indoor environment [27]. They classified the control architecture with three layers including a deliberative layer using a task manager, a sequencing layer based on a configuration manager and a reactive layer. It provides human registration and identification in different places and it also provides efficient autonomous exploration and indoor modelling.

Mendoca et al. developed an autonomous navigation system using fuzzy cognitive maps (FCM) [28]. They used the model to represent the robot's dynamic behaviour in both reactive and deliberative layers. Sgorbissa and Zaccaria developed a two-level navigation architecture with deliberative and reactive agents for the navigation of a mobile robot [29]. Palomeras et al. presented a control architecture for an autonomous underwater vehicle [24]. They implemented a component-oriented, layer-based control architecture structured in three layers: reactive, execution and mission layers. They used a discrete event system based on Petri nets to implement an execution layer. Wang et al. used a deliberation-reaction hybrid control architecture to deal with motion control and mobile robot path planning [31].

The method used low-level controls for reactive actions that were managed by high-level controls. As an extension of the hybrid approaches, the proposed method is based on reactive approaches since it mainly uses behaviour networks but the planning is externally placed at a higher level to control them dynamically and consider the global goals in order to overcome the limitations of the conventional reactive methods.

3. Hybrid Behaviour Network System

The proposed system to generate behaviours in an autonomous office delivery robot consists of two levels. The low level contains behaviour selection networks, which can reflect temporary environmental changes and the high level controls the goals and plans flexibly according to the situation. Figure 1 shows the proposed system. The low level includes the specific and common behaviour selection network modules, which are selected by deliberative control plans. To generate plans flexibly according to the situations around the robot, the queue for module sequences and the database for basic behaviour sequences are used. Feedback from the users can be used to determine priorities in the queue at the deliberative control.

Figure 1.

Architecture of the proposed hybrid behaviour network system

3.1 Behaviour Network Modules

Contrary to conventional reactive systems, the behaviour network not only generates behaviours instantly but also has goals so as to solve some simple planning problems. However, as the problem gets more complex, it is difficult to select behaviours accurately with only a flat network [36, 37]. In order to overcome this limitation, the behaviour selection network is divided into several behaviour network modules.

The purposes of the usage of the modularized behaviour networks are as follows.

The modular behaviour network is more easily designed and reused than a flat network [19].

Confusions, which can occur when selecting behaviours in a large flat network, can be reduced by giving only one goal to each smaller network module [37].

Each behaviour network module in the proposed method includes a behaviour network oriented to a single corresponding goal. The behaviour network is used as the method for selecting the most natural and suitable behaviours for the situations. The behaviour networks are a model that consists of relationships between behaviours, goals and external environments, and selects the most suitable behaviour for the current situation.

In the behaviour network, behaviours, external environments and internal goals are connected with each other through links. Each behaviour contains preconditions, an add list, a delete list and activation. The preconditions are a set of conditions that must be true in order to execute behaviours. The add list is a set of conditions that are highly likely to be true when behaviours are executed. The delete list is a set of conditions that are likely to be false when the behavioural entities are executed. The activation represents to what extent the behavioural entity is activated.

The activation energies of behaviours are first induced from external environments and the goal. The activation of the ith behaviour A_i can be presented as follows:

A_{i} = A_{i} + \sum_{n} w_{e} E_{i, n} + \sum_{m} w_{g} G_{i, m} (E_{i, n}, G_{i, m} = 0, 1)

(1)

where w_e and w_g are the weights needed to induce activation energies from environments and goals respectively. E_i,n and G_i,m represent whether the nth environment element and the mth goal are connected with the ith behaviour or not, respectively.

After the first induction, behaviours exchange their activation energies with other behaviours considering the types of their links. The behaviour exchange can be presented as follows:

\begin{array}{l} A_{i} = A_{i} + \sum_{n} (w_{p} P_{i, j} + w_{s} S_{i, j} - w_{c} C_{i, j}) \\ (i \neq j, P_{i, j}, S_{i, j}, C_{i, j} = 0, 1) \end{array}

(2)

where w_p, w_s and w_c are the weights needed to exchange activation energies through predecessor, successor and conflictor links, respectively, and P_i,j, S_i,j and C_i,j represent whether the ith and jth behaviours are connected by each type of link, respectively.

The behaviour networks have a threshold to decide which behaviours are executable. Using this, the behaviour networks select the behaviour where all the preconditions are true and the activation energy is larger than the threshold. Unless any behaviour is selected, the behaviour selection system constantly reduces the threshold until the behaviour is selected. Figure 2 shows how the behaviour network generates behaviour.

Figure 2.

The process of the behaviour network

A behaviour network module consists of one goal, external environments and behaviour nodes. Each module is mapped to a sub-goal from the planning system. If the planning system chooses a single sub-goal to achieve, the corresponding behaviour network module is activated and generates behaviour sequences.

In this paper, we have designed two behaviour network modules – go to a room and find objects and two common modules – navigate and avoid obstacles. Figure 3 shows the designed behaviour network modules.

Figure 3.

Designed behaviour networks (a: go to a room, b: find objects, c: navigate, d: avoid obstacles)

3.2 Planning of Goal Sequences

The deliberative control system does not plan the sequences of all primitive behaviours or trajectories, but plans sequences of sub-goals to control the behaviour network modules. Since we designed several small separate behaviour modules with sub-goals, they should be controlled explicitly to achieve the global goal. To plan goal sequences, the deliberative module and the behaviour network-based modules are connected as shown in Figure 4.

Figure 4.

Architecture of the goal sequences planning for behaviour modules

Since the behaviour modules do not have any information about the map of the environment, it is difficult to perform plans correctly in complex environments. To deal with this, the deliberative module checks the accomplishments of the sub-goals and controls plans when situations are changed and the plan in each behaviour network module controls only partial behaviour sequences to achieve the sub-goal of the corresponding module.

The deliberative control module makes plan by deciding priorities of goal sequences to achieve the global goal and adjusting priorities when exceptions or feedbacks occur. The module uses the basic behaviour library that includes the basic sequences of behaviours required to perform when tasks are given. The library is defined before the usage and can be modified by the feedbacks of the user. When the user gives tasks the sequences are planned using the library and inserted into the queue. At the ‘Check event’ stage, the robot checks for changes of situations and adjusts the sequences.

3.2.1 Priority-based Sequence Planning

To plan and adjust the sequences, the priorities of tasks are used. In this paper, the priority is defined as the deadline of the delivery required by the user. For this process, we define several parameters as follows.

C = {c_i}: a command set

D = {{d_i}: {d_i} ∊ C}: a decomposed command set

Q = {q_i: q_i = d₁, …, d_k, i < max_queue}: the command queue

X = {Wait, Critical, Minor}: a user feedback set

Firstly, priorities are determined according to the requested deadline and the order of tasks as shown below:

P_{F i x}^{i} = \frac{(t_{M a x} - t_{i})}{t_{M a x}} \times 10 + (O_{M a x} - O_{i})

(3)

where t_i and O_i indicate the remaining time and the order of the ith task, respectively. Max means the possible maximum value of the corresponding variable.

Secondly, priorities are adjusted by additionally considering the position of the robot as follows:

P_{D y n a m i c}^{} (q_{i}) = {\begin{matrix} P_{F i x} (q_{i}), if F r o m (i) = S or t_{i} < θ \\ P_{F i x} (q_{j}), if F r o m (i) \neq S and \exists j \cdot ϶ \cdot F r o m (j) = S \\ f (S), if S ∊ X \end{matrix}

(4)

where S is the current state of the robot, From(i) indicates the starting point of the ith task and f(S) is the priority decided by the feedback.

3.2.2 Sequence Queue and User Feedback

The sequence queue contains feedbacks from the user. Each of them consists of an index of the user, a type of command, a deadline, a point of departure and a destination. When the feedback is given, the robot seeks sequences for the corresponding command and puts the sequences into the queue. If there is no relevant sequence in the library, the robot requests feedback to the user.

The priorities of behaviour modules in the sequence are computed with the order of the task and the deadline by using Eq (3) and (4) in Section 3.2.1. Each module is sorted by priority in the sequence queue. For this job, the queue contains information as shown in Figure 5. The front four fields are input by the user and the next five fields are used to manage the plan flexibly.

Figure 5.

Information for each item in the sequence queue

Each task has a segmented sequence with subtasks. For example, a single delivery task is split into the subtask to bring the object from the point of departure and another subtask to move the object to the destination. Each task has a checkpoint that indicates which subtask is performed last. The checkpoint enables the plan to be adjusted flexibly according to the change of situation. The subtask has a sequence of several behaviour modules. Figure 6 shows examples of the sequence queue, which manages the tasks and sequences of behaviour modules activated by the task.

Figure 6.

Examples of the sequence queue and sequences of behaviour modules

Task adjustments are preceded according to the position of the robot as follows:

S e q (q_{i}) = {\begin{matrix} q_{k}, if \exists {Pos}_{k} \cdot ϶ \cdot P o s_{k} = S and t_{i} > θ \\ q_{j} \to d_{l}, if \exists {Pos}_{j} \cdot ϶ \cdot P o s_{j} = S and (q_{j} \to d_{l} = T a k e or q_{j} \to d_{l} = G i v e and \exists o b j) \\ q_{i}, otherwise \end{matrix}

(5)

where Seq(q_i) indicates the target command to be placed instead of q_i, Pos_i is a set of positions that q_i contains and q_i→d_l is the lth behaviour in q_i. For example, the robot may pass another room not required for the task during its movement from the starting point to the destination. In this case, it searches for the task, which the robot should fulfil at its current location. If the deadline of the task in progress is greater than the threshold, it changes the plan to execute the task found in advance. Otherwise, it ignores the task found and continues its previous job.

4. Experiments

In order to show the feasibility and the usefulness of the proposed system, we conducted experiments for the office delivery tasks of the mobile robot.

4.1 Experimental Setup

The hybrid behaviour generation system was applied to the mobile robot, Khepera II, which has a wireless camera sensor, eight infrared sensors, eight light sensors, one gripper and two motors, as shown in Figure 7. Each proximity sensor value is between 0 and 1023. The experiments were performed in both the Webot simulation environment and a real-world environment.

Figure 7.

Khepera II mobile robot. (a: a snapshot with one gripper and camera, b: the infra-red proximity sensors in Khepera II).

For the office delivery tasks, we designed an office environment that includes four rooms and one aisle. The colours of each pair of a door and a room were coloured identically and the robot could recognize each room by referring to the colour of the corresponding door. If some doors were closed, we changed colour of them to black. Since the robot does not have any information about the environment, it should navigate using only recognized colours of rooms. Figure 8(a) and (b) show the experimental environment constructed in the simulator and real world, respectively.

Figure 8.

The experimental environment with four rooms and a corridor. (a: the environment implemented in a simulator, b: the environment built in real-world.)

In order to analyse the results, the elapsed time measured by running steps, trajectories, sequences of modules and the behaviour of the robot were obtained. Infrared camera-based tracking was used to obtain the trajectories and a light was placed where the object was so that the robot could find the object by following the light.

4.2 Qualitative Analysis

In this section, we analysed the planned goal sequences from various tasks. We obtained the rates of success and failure after performing all the tasks and analysed changing the sequences according to errors and feedback from the user.

The task of delivering the object from a specific room (A) to another room (B) was given in the experiments. First of all, we obtained the trajectories of the robot during the task. Figure 9(a) and (b) show the results from the simulation environment. They are the trajectories for the delivering task from Room 2 to Room 1 and the task from Room 4 to Room 3, respectively. If the robot was located in the room or in the corridor, it started the behaviour module for searching the destination and used the camera for sensing since it did not have map information of the environment. When the robot reached the destination room, it followed the light to find the object.

Figure 9.

Trajectories of the robot (a: delivering the object from Room 2 to Room 1, b: delivering the object from Room 4 to Room 3, the red circle and rectangle indicate a point of departure and a destination, respectively).

Additionally, we obtained the results of behaviour selection for the task as shown in Figure 10. Figure 10(a) and (b) show the sequence of chosen behaviour modules and behaviours, respectively.

Figure 10.

Sequences of chosen behaviour modules (a) and chosen behaviours (b) during the task of delivering the object from Room 2 to Room 1.

Additionally, in order to verify the usefulness of the sequence adjusting process (see Section 3.2.2), we designed seven delivery tasks, shown in Table 2. Experiments were conducted both with and without sequence adjustments for the tasks. The sequences of the chosen modules and the robot's location were obtained as shown in Figure 11.

Figure 11.

Sequences of chosen behaviour modules (a) and robot's location (b) during given seven tasks (a: without sequence adjustments, b: with sequence adjustments)

Table 2.

Seven delivery tasks given

Task	Deadline	Point of departure	Destination
1	1	Room 1	Room 2
2	1	Room 3	Room 1
3	2	Room 2	Room 3
4	3	Room 3	Room 1
5	1	Room 1	Room 4
6	2	Room 3	Room 2
7	1	Room 4	Room 1

Using sequence adjustment processes, the robot modified its behaviour sequence according to its location. If the robot had achieved its goal in a certain room, it sought the task that could be started in that room. As a result, it reduced the steps wasted in the corridor. The robot finished all the tasks within 3956 steps without sequence adjustments, whereas it completed them within 3264 steps, a 17.4% reduction, with adjustment processes.

Additionally, we compared the proposed method with a reactive approach. The reactive approach only has rules without a deliberative method. The rule-based reactive system can generate behaviours quickly according to environmental conditions, but the reactive behaviour selections are not easy to apply to a changing environment. Figure 12 (a), (b), (c) and (d) show the trajectories of the reactive method and the proposed method in a changing environment, which is different from previous experiment. The size and location of the doors in rooms were changed. In Figure 12 (a) and (b), the robots are in Room 1 and the door is closed. The reactive method tries to find the exit continuously, but the proposed method waits in front of the door of Room 1. Figure 12 (c) and (d) show the trajectories from Room 2 to Room 4. Because the door of Room 2 is opened after a fixed time, the robot with only a reactive approach conducts useless trials for the closed time, but the robot with the proposed method waits in front of the door and goes to Room 4 immediately after the door to Room 2 is opened. This reduces the battery usage and the delivery time. The experiments show that the proposed method is suitable for controlling a mobile service robot for effective services.

Figure 12.

Comparison between reactive and proposed method in changing environments

4.3 Quantitative Analysis

For quantitative analysis, we obtained the elapsed time during tasks. We initially located the robot randomly and made it repeat random delivery tasks 30 times. Table 3 shows the minimum, average, and maximum number of steps after 30 tasks.

Table 3.
Minimum, average, and maximum steps after 30 tasks

Minimum Average Maximum

804 1930 5370

Minimum	Average	Maximum
804	1930	5370

Figure 13(a) and (b) show the trajectories obtained from results with the maximum and minimum steps, respectively. The task from Room 4 to Room 2 took the smallest number of steps. The maximum number of steps were taken in the case where the robot was initially located in the corridor because it took a long time to find the target room according to the state of the sensors. Even though the robot started the task in the corridor, differences between results were shown in accordance with the distance between the room and the sensory states.

Figure 13.

Trajectories from results with (a) maximum and (b) minimum steps

5. Concluding Remarks

We have proposed a hybrid behaviour system for an autonomous mobile robot for office delivery tasks. The system is oriented to behaviour network modules, which is useful for performing tasks in real-world environments. Moreover, a method for planning is attached to supplement them. The planning system generates and manages the overall sequences of behaviour modules and the behaviour modules achieve several sub-goals and generate autonomous behaviours quickly. This approach enables independent environmental modules to be designed and solves ambiguities among behaviour selection in one complex behaviour network. Additionally, controlling behaviour modules via planning accomplished maintenance of the global goal even though each module only concerns its sub-goal.

Experiments were conducted to verify the feasibility and the usefulness of the proposed method. We implemented a simple office environment in both a simulation and the real world with the Khepera II mobile robot and designed several delivery tasks. As the result, it is confirmed that the robot can achieve the goal even though there are temporary exceptions and it changes its plan when adjustments are required to complete tasks more efficiently.

Since we focused only on modelling the hybrid architecture and implementing a workable system, there are still many experimental and analytical issues to be considered in future work, such as validating the suitability of the proposed method and comparing the proposed method with conventional approaches. Though the behaviour network modules can be applied independently to various environments, it should be designed by experts in a manual way. For future work the method for learning structures of networks and controlling them automatically should be investigated. Moreover, the proposed method should be tested on more complex environments.

Footnotes

6.

This research was supported by the Original Technology Research Program for Brain Science through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010–0018948). We express our personal appreciation of the valuable assistance given by Hyeon-Jeong Min and Jong-Won Yoon who were members of Soft Computing Laboratory, Yonsei University.

References

Huttenrauch

Green

Norman

Oestreicher

Eklundh

K. S.

(2004) Involving users in the design of a mobile office robot. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34(2): 113–124.

Beetz

(2002) Plan-Based Control of Robotic Agents: Improving the Capabilities of Autonomous Robots, Springer.

Simmons

Goodwin

Haigh

Koenig

O'Sullivan

Veloso

(1997) Xaiver: Experience with a layered robot architecture. ACM SIGART Bulletin, 8(1–4): 22–33.

Beetz

Arbuckle

Belker

Cremers

A. B.

Schulz

Bennewitz

Burgard

Hahnel

Fox

Grosskreutz

(2001) Integrated, plan-based control of autonomous robots in human environments. IEEE Intelligent Systems, 15(5): 56–65.

Gardiol

N. H.

Kaelbling

L. P.

(2003) Envelope-based planning in relational MDPs. Advances in Neural Information Processing Systems, 16: 1040–1046.

Chung

S. H.

(2003) A Decomposed Symbolic Approach to Reactive Planning, Master's Thesis, MIT.

Theocharous

Kaelbling

P. L.

(2004) Approximate planning in POMDPS with macro-actions. Advances in Neural Information Processing Systems, 16: 775–782.

Mirford

Wyeth

(2010) Hybrid robot control and SLAM for persistent navigation and mapping. Robotics and Autonomous Systems, 58(9): 1096–1104.

Ramachandran

Gupta

(2009) Smoothed SarSa: Reinforcement learning for robot delivery tasks, In Proceedings of International Conference on Robotics and Automation (ICRA), Kobe, Japan, 2125–2132.

10.

Mataric

M. J.

(1998) Using communication to reduce locality in distributed multi-agent learning. Journal of Experimental and Theoretical Artificial Intelligence, 10(3): 357–369.

11.

Maes

(1989) How to do the right thing. Connection Science, 1(3): 291–323.

12.

Tyrell

(1994) An evaluation of Maes's bottom-up mechanism for behavior selection. Adaptive Behavior, 2(4): 307–348.

13.

Arkin

R. C.

Mackenzie

D. C.

(1994) Planning architecture for mobile manipulation. In Proceedings of 5th International Symposium on Robotics and Manufacturing, Maui, Hawaii, 5–12.

14.

Aaron

Admoni

(2009) A framework for dynamical intention in hybrid navigating agents. Lecture Notes in Computer Science, 5572: 18–25.

15.

Chernova

Arkin

R. C.

(2007) From deliberative to routine behaviors: A cognitively inspired action-selection mechanism for routine behavior capture. Adaptive Behavior, 15(2): 199–216.

16.

Min

H. J.

Cho

S.-B.

(2010) Adaptive behaviors of reactive mobile robot with Bayesian inference in nonstationary environments. Applied Intelligence, 33(3): 265–288.

17.

Tan

A.-H.

Ong

Y.-S.

Tapanuj

(2011) A hybrid agent architecture integrating desire, intention, and reinforcement learning. Expert Systems with Applications, 38(7): 8477–8487.

18.

Thrun

Beetz

Bennewitz

Burgard

Cremers

A. B.

Dellaert

Fox

Hahnel

Rosenberg

Roy

Schulte

Schulz

(2000) Probabilistic algorithms and the interactive museum tour-guide robot Minerva. Journal of Robotics Research, 19(11): 972–999.

19.

Nicolescu

M. N.

Mataric

M. J.

(2002) A hierarchical architecture for behavior-based robots. In Proceedings of First International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Bologna, Italy, 227–233.

20.

Weigel

Gutmann

J. S.

Dietl

Kleiner

Nebel

(2002) CS Freiburg: Coordinating robots for successful soccer playing. IEEE Trans. on Robotics and Automation, 19(5): 685–699.

21.

Yoon

J. W.

Cho

S. B.

(2010) A mobile intelligent synthetic character with natural behavior generation. In Proceedings of International Conference on Agents and Artificial Intelligence (ICAART), Valencia, Spain, 315–318.

22.

Lim

S. S.

Yoon

J. W.

K. H.

Cho

S. B.

(2009) Gesture based dialogue management using behavior network for flexibility of human robot interaction. IEEE International Symposium on Computational. Intelligence in Robotics and Automation (CIRA), Daejeon, Korea, 592–597.

23.

Breuer

Macedo

G. R. G.

Hartanto

Hochgeschwender

Holz

Hegger

Jin

Müller

Paulus

Reckhaus

Ruiz

J. A. Á.

Plöger

P. G.

Kraetzschmar

G. K.

(2012) Johnny: An autonomous service robot for domestic environments. Journal of Intelligent & Robotic Systems, 66(1–2): 245–272.

24.

Liu

Tong

Zhang

(2012) An intelligent control architecture for search robot based on orthogonal perception information. In 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012), Sichuan, China, 2348–2352.

25.

Ferasoli-Filho

Caldeira

M.A.C.

Pegoraro

Franco dos Reis Alves

Valadao

Bastos-Filho

T.F.

(2012), Use of myoelectric signals to command mobile entertainment robot by disabled children: Design and control architecture. In ISSNIP Biosignals and Biorobotics Conference (ISSNIP 2012), Bauru, Brazil, 1–5.

26.

Lyons

D. M.

Paramesh

Benjamin

D. P.

(2012) Navigation of uncertain terrain by fusion of information from real and synthetic imagery. In SPIE Proceedings, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications, Baltimore, Maryland, 8407: 84070D–1–8.

27.

Yun

Jun

Kim

Lee

Choi

M. T.

Kim

Park

J. T.

Song

J. B.

(2012) Proactive human search for the designated person with prior context information in an undiscovered environments. Journal of Intelligent & Robotic Systems, 67(2): 79–99.

28.

Mendonça

de Arruda

L. V. R.

Neves

Jr. (2012) Autonomous navigation system using event driven-fuzzy cognitive maps. Applied Intelligence, 37(2): 175–188.

29.

Sgorbissa

Zaccaria

(2012) Planning and obstacle avoidance in mobile robotics. Robotics and Autonomous Systems, 60(4): 628–638.

30.

Palomeras

El-Fakdi

Carreras

Ridao

(2012) COLA2: A control architecture for AUVs. IEEE Journal of Oceanic Engineering, 37(4): 695–716.

31.

Wang

Liu

Gao

Sun

Xiao

(2012) Information fusion-based mobile robot path control. In 24th Chinese Control and Decision Conference (CCDC), Shijiazhuang, China, 212–217.

32.

Nurmaini

(2012) Memory-based reasoning algorithm based on fuzzy-Kohonen self organizing map for embedded mobile robot navigation. International Journal of Control and Automation, 5(3): 47–64.

33.

Arkin

R. C.

MacKenzie

D. C.

(1994) Planning to behave: A hybrid deliberative/reactive robot control architecture for mobile manipulation. In International Symposium on Robotics and Manufacturing.

34.

Nurmaini

(2012) Development mobile robot control architecture with integrated planning and control on low cost microcontroller. Journal of Theoretical and Applied Information Technology, 35(1): 100–111.

35.

Xie

Yang

Zhang

(2012) Research on classification of intelligent robotic architecture. Journal of Computers, 7(2):450–457.

36.

Decuqis

Ferber

(1998) An extension of Maes' action selection mechanism for animats. In Proceedings of Fifth International Conference on Simulation of Adaptive Behavior, 153–158.

37.

Tyrell

(1993) Computational Mechanisms for Action Selection, PhD Thesis, University of Edinburgh.

38.

Tyrell

(1993) The use of hierarchies for action selection. Adaptive Behavior, 1(4): 387–420.

39.

Lee-Johnson

C. P.

Carnegie

D. A.

(2010) Mobile robot navigation modulated by artificial emotions. IEEE Trans. on Systems, Man, and Cybernetics - Part B: Cybernetics, 40(2): 469–480.

40.

Goller

Kerscher

Zollner

J. M.

Dillmann

(2008) Behavior network control for a holonomic mobile robot in realistic environments. In Proceedings of Eleventh International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines (CLAWAR), Coimbra, Portugal, 98–105.

A Hybrid System of Hierarchical Planning of Behaviour Selection Networks for Mobile Robot Control

Abstract

Keywords

1. Introduction

2. Related Works

2.1 Hybrid Control Systems

2.2 Mobile Service Robots

4.1 Experimental Setup

Table 3. Minimum, average, and maximum steps after 30 tasks Minimum Average Maximum 804 1930 5370

Footnotes

6.

References

Table 3.
Minimum, average, and maximum steps after 30 tasks

Minimum Average Maximum

804 1930 5370