Abstract
It is still a challenge for robots to interact with complex environments in a smooth and natural manner. The robot should be aware of its surroundings and its inner status in order to make appropriate decisions. Contextual information benefits the interaction considerably, for example by avoiding frequent interruptions (e.g., explicit input requests), and is thus essential for interaction. Other challenges, such as shifting the attentional focus to a more important stimulus, are also crucial in interaction control. This paper presents a hybrid automatic control approach for interaction, together with its integration with these important factors, aiming at natural, human-like interaction in robots. In particular, a novel approach to architectural attentional control based on affect is presented, which attempts to shift the attentional focus in a natural manner. Context-aware computing is combined with interaction to endow the robot with proactive abilities. Approaches to long-term interaction control are described. Emotion and personality are introduced into the interaction and their influence mechanism on interaction is explored. We implemented the proposal in an interactive head robot (IHR) and the experimental results indicate its effectiveness.
1. Introduction
In recent years, human-robot interaction (HRI) has drawn more and more attention from researchers in multiple disciplines. HRI requires the robot not only to receive information passively from the environment, but also to make decisions appropriately and change the environment actively and thus be more autonomous and intelligent[1]. The robot should be aware of its surroundings, including people, location, time, events and so on, in order to generate proper interactive behaviours accordingly.
Moreover, frequently arising requests in the interaction, such as an explicit request to input the user's identification, interrupt the interaction process and should therefore be minimized or avoided. To make the interaction more natural and smoother, sufficient contextual information should be obtained unobtrusively and integrated tightly with the interaction decision. Applying context-aware computing [2] in HRI can improve interaction performance significantly. Context-aware technology is widely used in many domains [3-5], while research that integrates context-aware computing into HRI remains lacking. We believe this integration (i.e., a contextual interaction) is of great importance for performing well in interactions.
What is the start of a contextual interaction? In its interaction with the environment, the robot should first “notice” an object; only then can the subsequent cognitive activities (such as the interaction activities) concerning this “focus” start. The shift from one attentional “focus” to another runs through the whole interaction process and also concerns the management of attentional resources. However, the environment that the robot interacts with is often dynamic and complex and there are often multiple simultaneous interactors [6]. How to allocate and manage the attentional resource properly and how to shift the attentional focus at the right time to the most important and urgent stimulus is currently a significant and difficult problem in HRI and is crucial for the robot to perform a successful contextual interaction.
A large part of the research effort in attentional control has been limited to visual attention, aiming at selecting a proper visual focus in the visual field and then moving it along a route according to the requirements of the task [7-9]. At the architectural level [10, 11], however, research in attentional control remains lacking; it concentrates on developing mechanisms or approaches for agents to automatically ignore undesired or unrelated external events, to divert the attentional focus to more urgent and important stimuli and to manage the attentional resource effectively.
Most of the research mentioned previously is based on the work from a pure cognitive perspective. There is still a lack of investigations into attentional control from the point of view of affect (such as emotion) and its interaction with cognition, even though emotion plays an important role in cognition and interacts with cognition deeply [12-14]. Therefore, we are inspired to propose an approach of architectural attentional control based on emotion and personality, aiming to solve the problem mentioned above. Meanwhile, the influences of emotion and personality on the attention system and their integration with the interaction process are carefully considered.
Hence, in this paper we present a hybrid automatic control approach for interaction, which contributes in the following ways: 1) to endow the robot with the ability to perform personalized and proactive interactions, context-aware computing is combined with the interaction, 2) aiming at the challenge to control the robot's attention in a natural and proper manner, an architectural attentional control based on emotion and personality is proposed and 3) the integration of interaction control with other essential functions to perform a natural interaction is discussed in detail.
2. Contextual interaction integrated with attentional control and context-awareness
2.1. Context generation and computing
One goal of context-aware computing is to obtain contextual information and then utilize it to generate interactive behaviours proactively and intelligently that correspond to particular people, locations, times, events, etc., in the process of HRI. However, conventional HRI systems seldom automatically refer to context data, which is indispensable for proactive and intelligent interactions.
Taking the characteristics of HRI into account, we roughly group the contexts of HRI into four categories: 1) environmental context, such as time, temperature, location, people nearby, etc., 2) device context, such as the state of battery power, angle of the camera, etc., 3) user-related context, such as the user's preference, habit and other personal information and 4) interaction history, which can be arranged by the user or by the robot.
Some of this context data, such as time and temperature, can be obtained directly from perceptual data. However, due to the diversity of sensors, the sensed data varies in format and content. Some of it is too primitive to be used directly, so primitive contextual data must be converted into inferred contexts. Context reasoning can be divided into three categories: backward reasoning, forward reasoning and mixed reasoning. During the interaction, some hypothesized contexts must be verified by taking them as targets to be inferred (backward reasoning). In contrast, primitive contexts and specific perceptual data are reasoned over to obtain more meaningful inferred contexts (forward reasoning). Both backward and forward reasoning are used in our work.
The generated primitive and inferred contexts will be delivered to high-level contextual applications for further use. This delivery process can be implemented in a passive manner or in an active manner. In a passive manner, the contexts will be delivered whenever they are requested. In an active manner, contexts will be first checked if they can meet the constraints of the precedence of behavioural triggering rules. Only the ones that have passed this filtering will be delivered to the high-level contextual application to trigger or promote its running.
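The two delivery modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used on the robot: the names `Context`, `ContextBroker`, `publish` and `request` are hypothetical, and a behavioural triggering rule is assumed to be a simple predicate over a single context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Context:
    name: str            # e.g. "user_distance" (hypothetical key)
    value: object        # primitive or inferred value
    inferred: bool = False


# Assumption: a behavioural triggering rule is a predicate over one context.
Rule = Callable[[Context], bool]


class ContextBroker:
    """Stores contexts and delivers them passively (pull) or actively (push)."""

    def __init__(self, rules: List[Rule]) -> None:
        self._store: Dict[str, Context] = {}
        self._rules = rules
        self._subscribers: List[Callable[[Context], None]] = []

    def subscribe(self, callback: Callable[[Context], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, ctx: Context) -> None:
        self._store[ctx.name] = ctx
        # Active delivery: push only contexts that satisfy some rule precondition.
        if any(rule(ctx) for rule in self._rules):
            for callback in self._subscribers:
                callback(ctx)

    def request(self, name: str) -> Optional[Context]:
        # Passive delivery: hand over a context only when it is requested.
        return self._store.get(name)
```

In this sketch a high-level application either calls `request` when it needs a context, or registers a callback with `subscribe` and is notified only about contexts that pass the rule filter.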
2.2. Attentional control of interaction
2.2.1. Attentional definition and management
The parameter dynamic_attrib describes the dynamic attributes of an attentional object (AO) and varies over time. Herein, dynamic_attrib = <p, age, k, r, state>, where p is the attentional intensity of the AO, age is the time for which it has existed, k is the decay rate of the attentional intensity, r is the attentional resource allocated to this AO and state describes the current state of the AO, state ∈ {created, active, inhibited, destroyed}. The created state represents that the AO has been created but has not yet been sent to the attentional pool. The active state means that the AO is active, while the inhibited state indicates that the AO is inhibited by the attention system. The destroyed state represents that the AO will be destroyed and its allocated attentional resource will be reclaimed.
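As a rough illustration, the tuple dynamic_attrib can be written as a small record type. The field names follow the tuple above; the class names `AOState` and `DynamicAttrib` are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class AOState(Enum):
    CREATED = "created"      # created, not yet sent to the attentional pool
    ACTIVE = "active"
    INHIBITED = "inhibited"  # inhibited by the attention system
    DESTROYED = "destroyed"  # to be destroyed, resource to be reclaimed


@dataclass
class DynamicAttrib:
    p: float      # attentional intensity
    age: float    # time for which the AO has existed
    k: float      # decay rate of the attentional intensity
    r: float      # attentional resource allocated to the AO
    state: AOState = AOState.CREATED
```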
2.2.2. Architectural attention shift
The management of AOs includes the adding, deleting and refreshing operations carried out on them and the monitoring of the changes of AOs. All these operations are based on a set of AOs, which is called an AO pool. When adding a new AO into the pool, a judgment will be made whether the corresponding AO has already been in the pool. If so, the AO should be refreshed rather than a new one created. The refresh algorithm concerns the attentional intensity and the allocated attentional resource.
where α+β=1, ci is the initial attention intensity. Refreshing the allocated attentional resource follows:
Herein, θ is a proportional coefficient and rtotal is the total amount of the spare attentional resource.
The function to delete an AO occurs when one of the following conditions is satisfied: 1) its attentional intensity decreases below a threshold, 2) its state becomes destroyed or 3) the attentional resource is exhausted and the AO has the smallest interest degree. When an AO is deleted, its attentional resource will be reclaimed.
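The pool operations described above can be sketched as follows. The deletion conditions follow the text, but the refresh and allocation rules are only assumed forms: the sketch blends the old intensity with the new initial intensity using α and β, and grants a fraction θ of the spare resource. It also treats the attentional intensity as the interest degree. All class, method and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AO:
    name: str
    p: float               # attentional intensity
    r: float = 0.0         # allocated attentional resource
    state: str = "active"


class AOPool:
    def __init__(self, total_resource: float = 1.0, alpha: float = 0.5,
                 theta: float = 0.5, p_min: float = 0.05,
                 min_spare: float = 0.01) -> None:
        self.aos: Dict[str, AO] = {}
        self.spare = total_resource                 # spare attentional resource
        self.alpha, self.beta = alpha, 1.0 - alpha  # alpha + beta = 1
        self.theta = theta
        self.p_min = p_min                          # deletion threshold on intensity
        self.min_spare = min_spare                  # below this, resource counts as exhausted

    def add(self, name: str, c_i: float) -> AO:
        ao = self.aos.get(name)
        if ao is None:
            ao = AO(name, p=c_i)
            self.aos[name] = ao
        else:
            # Assumed refresh rule: blend the old intensity with the new one.
            ao.p = self.alpha * ao.p + self.beta * c_i
        # Assumed allocation rule: grant a fraction theta of the spare resource.
        grant = self.theta * self.spare
        ao.r += grant
        self.spare -= grant
        return ao

    def delete(self, name: str) -> None:
        ao = self.aos.pop(name)
        self.spare += ao.r                          # reclaim the resource

    def sweep(self) -> List[str]:
        # Conditions 1 and 2: intensity below threshold, or state destroyed.
        doomed = [n for n, ao in self.aos.items()
                  if ao.p < self.p_min or ao.state == "destroyed"]
        # Condition 3: resource exhausted, drop the least interesting AO.
        if not doomed and self.aos and self.spare < self.min_spare:
            doomed = [min(self.aos.values(), key=lambda ao: ao.p).name]
        for name in doomed:
            self.delete(name)
        return doomed
```

Adding an AO that is already in the pool refreshes it instead of creating a duplicate, and every deletion returns the allocated resource to the spare amount.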
According to the attention theory in psychology, the attentional intensity of each AO decays over time, as described in the following equation.
where c is the initial attentional intensity, t is the time step, tb is the birth time and k is the decay rate of the attentional intensity.
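One common form that is consistent with the symbols c, t, tb and k is exponential decay. The sketch below assumes that form purely for illustration; the function name is hypothetical and the exact decay law used in this work may differ.

```python
import math


def attentional_intensity(c: float, k: float, t: float, t_b: float) -> float:
    """Assumed exponential decay: p(t) = c * exp(-k * (t - t_b))."""
    return c * math.exp(-k * (t - t_b))
```

Under this assumption the intensity equals c at the birth time tb and falls faster for larger k.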
In general, the AO with the greatest attentional intensity will be selected as the active AO of the interaction if its state is not inhibited. To maintain the continuity of an interaction process, this selection algorithm is not always carried out. As an alternative, the changes in attentional intensities of AOs are monitored all the time. The supervision of the status of each AO in the AO pool is performed by the attention monitor. If this module observes a large variation of attentional intensity, which is greater than the threshold, it will start an evaluation of attention shift. After the AO has passed the evaluation, an attention shift signal will be sent to the interaction module to perform an attention shift and then the AO becomes the active AO.
It should be noted that both increases and decreases in the attentional intensity are monitored. For example, when the robot's emotion changes to sadness, which suppresses the active AO, the drop in the attentional intensity of the active AO is noticed by the monitor and an evaluation of the attention shift is then conducted. After passing the evaluation, the new AO wins the competition to be the active AO and draws the robot's attention.
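The monitoring scheme can be sketched as follows. The evaluation step is not specified in detail here, so this sketch assumes that it simply re-runs the greatest-intensity selection over the non-inhibited AOs. The names `AttentionMonitor`, `observe` and `select_active` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional


def select_active(intensities: Dict[str, float],
                  inhibited: FrozenSet[str]) -> Optional[str]:
    """Pick the non-inhibited AO with the greatest attentional intensity."""
    candidates = {n: p for n, p in intensities.items() if n not in inhibited}
    return max(candidates, key=candidates.get) if candidates else None


class AttentionMonitor:
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.previous: Dict[str, float] = {}
        self.active: Optional[str] = None

    def observe(self, intensities: Dict[str, float],
                inhibited: FrozenSet[str] = frozenset()) -> Optional[str]:
        """Return the new active AO if an attention shift occurs, else None."""
        # Both increases and decreases count, hence the absolute difference.
        changed = any(abs(p - self.previous.get(n, 0.0)) > self.threshold
                      for n, p in intensities.items())
        self.previous = dict(intensities)
        if not changed:
            return None          # keep the interaction continuous
        # Assumed evaluation: re-run the greatest-intensity selection.
        winner = select_active(intensities, inhibited)
        if winner is not None and winner != self.active:
            self.active = winner
            return winner        # attention shift signal
        return None
```

Small fluctuations leave the active AO untouched, while a large drop in the active AO's intensity lets a competitor take over.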
2.3. Automatic control of contextual interaction
2.3.1. Interaction decision
Interaction decision aims at promoting and controlling the whole interaction process with the support of context-aware computing, attentional control, detection of user intention, etc. It processes the perceptual information, contexts, users' intention and active AO and then generates the behavioural instructions accordingly and sends them to the behaviour system.
We group the users' intentions into two categories: explicit and implicit intentions. The user's commands and requests and some well-defined information from the user are considered explicit intentions, while the intentions obtained by analysing the user's interactive behaviours are called implicit intentions. For the sake of simplicity, only a limited number of frequently used intentions are considered. We implemented the following intentions: 1) command, an order requiring the robot to complete a specific task, usually given by voice, 2) trying to start an interaction, for example, a user approaches the robot and the distance is less than a certain threshold and 3) trying to terminate the interaction, for instance, the user leaves the robot and the distance is greater than the threshold.
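The three implemented intentions can be sketched as a simple classifier. This is an illustration only: the function and parameter names are hypothetical, and approaching or leaving is assumed to be judged by comparing two consecutive distance readings.

```python
from enum import Enum
from typing import Optional


class Intention(Enum):
    COMMAND = "command"
    START_INTERACTION = "trying to start an interaction"
    TERMINATE_INTERACTION = "trying to terminate the interaction"


def detect_intention(distance: float, previous_distance: float,
                     threshold: float,
                     command: Optional[str] = None) -> Optional[Intention]:
    # Explicit intention: a recognised command takes precedence.
    if command:
        return Intention.COMMAND
    # Implicit intentions, inferred from the user's movement (assumption:
    # two consecutive distance readings are enough to tell the direction).
    approaching = distance < previous_distance
    leaving = distance > previous_distance
    if approaching and distance < threshold:
        return Intention.START_INTERACTION
    if leaving and distance > threshold:
        return Intention.TERMINATE_INTERACTION
    return None
```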
As the interaction process may last for a long time, the process state of the interaction is maintained to control the course of the interaction, as discussed below. An interaction decision first judges the current status of the interaction. If the robot is not interacting with an interactor, it generates behavioural instructions, such as sleep or explore, and sends them to the behaviour system to perform. If several AOs exist in this case, corresponding actions will be generated to express the robot's concern about them.
When an interaction begins, several factors affect the interaction decisions. The intention of a user is used to regulate the process state of the interaction or to generate behaviours directly. Other factors, such as perceptual information, can also change the process state. Conversely, the process state helps the interaction decision to generate appropriate behavioural instructions. For example, the process state Being Alone may cause the interaction decision to send the instruction “Sleep” to the behaviour system. It should be noted that some of the user intentions (e.g., a user's command or request) are mapped directly into behaviours.
The perceptual information and contexts are primarily used to generate the detailed interactive behaviours. Once the active AO has been determined, the generated behaviours are directed at this AO, even if the interaction has not started. If there is no active AO in the attention pool, the perceptual information and contexts are also used to generate behaviours such as “Explore” in the environment and to feed the attention system so that it can generate a new active AO in preparation for interaction. Hence, taking the active AO and the current process state into account, the perceptual information and contexts are first analysed and inferred to determine the behavioural categories they belong to and are then mapped to the corresponding behaviour modules in the behaviour system to generate the detailed actions.
The active AO denotes the current interaction focus when the robot is interacting. All the consequent interactive behaviours are generated and are concentrated on this focus during the interaction. Therefore, the arising of a new active AO, if there is any, often means a series of changes in the interaction context. The interaction decision module will then rearrange its resources and operations on this new focus and the former one will be suppressed.
The classification of interactive behaviours may benefit the understanding of the interaction processes. Inspired by the behaviour classification of ethological models, we primarily grouped the robot's interactive behaviours into five categories: fundamental behaviour, contextual command, contextual reactive behaviour, contextual deliberative behaviour and contextual social behaviour. From left to right, their levels of autonomy increase. In [15] we report findings from another set of interaction studies that explored the developmental learning skills of the robot, in both online and offline manners, from its interactions.
2.3.2. Control of long time-span interaction process
A method based on process states is presented to control the long time-span interaction process. The transitions between process states are illustrated in Figure 1. The robot has five states: Being Alone, AO Detected, Focus Confirmed, Interaction Ready and Active Interaction. Transitions between these states, or remaining in one state, run through the whole interaction process.
Being Alone is the initial state of an interaction, in which the robot detects no person or interesting object in the environment.
AO Detected is used to describe a situation where the robot has detected at least one attractive object in the surroundings, but the focus of attention has not been confirmed yet.
Focus Confirmed describes the situation in which the robot has confirmed its attentional focus. This state may result in a series of actions, such as turning the cameras towards the object. This process state can be triggered by various events, such as the detection of an approaching user (while the distance is still above the threshold) or someone speaking loudly.
Interaction Ready is the state in which the requirements to start an interaction have been satisfied and both the robot and the interactor are ready to interact. This process state can be influenced by the robot's personality. We define different personalities for the robot; with certain personalities, such as a “warm” robot (agreeableness = warmth), the robot will actively interact with the active AO even if the AO has shown no willingness to interact with the robot. In this case, the process state changes from Focus Confirmed to Interaction Ready directly and moves to the next interaction state after the robot sends out a greeting message.

Figure 1. Transitions between process states
Active Interaction describes the situation in which the robot begins interacting with the focused interactor. In this state the attention of the robot and the subsequently generated behaviours are concentrated on the interactor until a state transition event or an interruption of the current interaction is detected. The reasons for moving from this state are described as follows.
By influencing the interaction decision, each process state has corresponding generated behaviours, which are summarized in Table 1.
Table 1. Summary of the process states and corresponding behaviours
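The process states and the transitions mentioned in this section and in the reception scenario of Section 4.1 can be sketched as a small transition table. The event names are hypothetical labels introduced for this sketch, and the table covers only the transitions stated in the text, so it is likely to be a subset of those shown in Figure 1.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class ProcessState(Enum):
    BEING_ALONE = "Being Alone"
    AO_DETECTED = "AO Detected"
    FOCUS_CONFIRMED = "Focus Confirmed"
    INTERACTION_READY = "Interaction Ready"
    ACTIVE_INTERACTION = "Active Interaction"


S = ProcessState

# (current state, event) -> next state; event names are illustrative only.
TRANSITIONS: Dict[Tuple[ProcessState, str], ProcessState] = {
    (S.BEING_ALONE, "ao_detected"): S.AO_DETECTED,
    (S.AO_DETECTED, "focus_confirmed"): S.FOCUS_CONFIRMED,
    (S.AO_DETECTED, "room_empty"): S.BEING_ALONE,
    (S.FOCUS_CONFIRMED, "start_intention"): S.INTERACTION_READY,
    # A "warm" robot may move on without waiting for the user's intention.
    (S.FOCUS_CONFIRMED, "warm_personality"): S.INTERACTION_READY,
    (S.INTERACTION_READY, "interaction_started"): S.ACTIVE_INTERACTION,
    (S.ACTIVE_INTERACTION, "user_left"): S.AO_DETECTED,
}


def step(state: ProcessState, event: str) -> ProcessState:
    """Return the next state; events with no matching rule keep the state."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str],
        state: ProcessState = S.BEING_ALONE) -> ProcessState:
    for event in events:
        state = step(state, event)
    return state
```

Keeping the state when no rule matches reflects the idea that remaining in one state is itself part of the long time-span interaction process.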
2.3.3. Interaction interruption
We propose a method based on priority and personality to handle interaction interruptions. If the priorities are available, they are computed and used to decide whether to interrupt the current interaction; otherwise, the decision is made based on the robot's personality.
Several factors contribute to the computation of priority, such as the personal information of the interactor, the familiarity with the interactor and the importance of the corresponding activity. The interactor's identity and status are selected as the personal information to compute the priority. The importance of an activity depends on its classification and how desirable it is to the robot under the current interaction context. The computation of familiarity is based on previous interaction records. The total time (in hours) and the total days that the person spends interacting with the robot are used to calculate the familiarity rating as follows.
All these factors are calculated to get a weighted average priority. If one of them is not available, then the computation of this factor will be cancelled.
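The weighted average with missing factors can be sketched as follows. The factor names, the weights and the choice to renormalize the weights over the available factors are assumptions made for this sketch.

```python
from typing import Dict, Optional


def weighted_priority(factors: Dict[str, Optional[float]],
                      weights: Dict[str, float]) -> Optional[float]:
    """Weighted average over the factors that are available (not None)."""
    available = {name: value for name, value in factors.items()
                 if value is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return None   # no factor available: priority cannot be computed
    return sum(weights[name] * value
               for name, value in available.items()) / total_weight
```

When the function returns `None`, no priority is available and the interruption decision would have to fall back on the robot's personality.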

Figure 2. Interaction interruption processing based on priority and personality
We give the robot one of two typical personalities: dominance or submissiveness. A robot with the dominance personality is not easily interrupted by other interaction intentions, whereas a robot with the submissiveness personality may change its interaction focus easily. Figure 2 illustrates the interruption processing procedure based on personality and priority. The advantage of this method for processing interaction interruptions is that it is more natural and more consistent with the robot's own personality.
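The decision procedure can be sketched as follows. This is a simplification under stated assumptions: with both priorities available, the new interaction wins only if its priority is strictly higher; without priorities, a dominant robot is assumed never to yield and a submissive robot always to yield. The function and argument names are hypothetical.

```python
from typing import Optional


def should_interrupt(current_priority: Optional[float],
                     new_priority: Optional[float],
                     personality: str) -> bool:
    """Decide whether a new interaction may interrupt the current one."""
    if personality not in ("dominant", "submissive"):
        raise ValueError(f"unknown personality: {personality!r}")
    # Priorities take precedence whenever both are available.
    if current_priority is not None and new_priority is not None:
        return new_priority > current_priority
    # Otherwise fall back on personality (simplified to a hard rule here).
    return personality == "submissive"
```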

Figure 3. An integrated interaction architecture
Table 2. Description of the interaction process
2.4. An integrated architecture for hybrid interaction
We present an integrated architecture for contextual interaction, as illustrated in Figure 3, which aims to integrate the interaction control mechanism with other essential functions tightly, to perform effective interactions. It consists of two layers, namely an affective layer and a cognitive layer. The cognitive layer is primarily composed of a perceptual system, context-aware computing, contextual interaction, an attention system and a behaviour system and aims to implement the main function of the interaction model. The affective layer consists of emotion and personality components, which mainly contribute to interactive focus shifting and attentional control, influencing the interaction process and generating emotional responses.
Tight and close relations exist between intention detection, allocation and control of attention, interaction control, and context-aware computing, and all these elements cooperate together to perform a successful interaction.
3. Integrating interaction with emotion and personality
3.1. Introducing emotion into interaction
We implemented a categorical model of emotions [16, 17]. Several basic emotions suggested by [16] include sadness, joy, anger and fear and only some of them are implemented in this work. Since this model is intended for social interactive robots, emotions are triggered primarily by the events or objects of the interaction. Hence, we established several emotional triggers for the robot to evaluate events, objects, etc. and emotions occur when the stimuli are intense enough to pass the evaluation.
All the triggering thresholds of emotions form a vector T = [δ1, δ2,…, δn]. Each triggered emotion has an associated intensity level, which is represented as a real number ranging from 0 to 2 (high threshold), as well as a valence rating (positive or negative). In contrast with the long-term affect phenomena such as moods, emotions are short-lived and their intensity decays over time after the stimuli causing these emotions disappear. When this intensity is below a certain threshold, the corresponding emotion will be set to be inactivated.
As the intensity of an emotion Si may be above the high threshold due to accumulation or below the low threshold due to decay, we employ a function f to modulate it.
where qi is the intensity of emotion Si and δl and δh are the low threshold and high threshold of emotion intensity respectively. The decay of emotion intensity differs according to the emotion type and the personality of the robot and can be described simply by a line whose slope differs for each emotion. We define the emotional decay vector as a static n-dimensional vector: D = [γ1, γ2, …, γn], where n is the total number of emotions.
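A plausible form of the modulation function f and of the linear decay is sketched below. Both are assumptions made for illustration: intensities above the high threshold are capped at it, intensities below the low threshold are set to zero (the emotion becomes inactive), and the decay subtracts a fixed slope per time step. The function names are hypothetical.

```python
def modulate(q: float, low: float, high: float) -> float:
    """Assumed form of f: inactive below `low`, capped at `high`."""
    if q < low:
        return 0.0
    return min(q, high)


def decay_step(q: float, gamma: float, dt: float = 1.0) -> float:
    """Linear decay with slope gamma, never dropping below zero."""
    return max(q - gamma * dt, 0.0)
```

In this sketch an emotion with intensity q and decay slope γ would be updated as `modulate(decay_step(q, gamma), low, high)` once the triggering stimulus has disappeared.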
The activated emotions can influence specific AOs, such as the ones triggering these emotions. The emotions that the robot has are limited and can be described by a set: S={e1, e2, e3, …, en}. Due to the capacity theory of attention [18], the capacity of the attention pool is limited, that is, the maximum number of AOs in the attention pool can be determined and we set it as m. A relationship between an emotion and a certain AO exists. For example, a triggered emotion of fear is related closely with the AO causing this emotion (e.g., an enemy) and one aim is to make the robot avoid such threats. Thus we use a weight ω to describe such a relationship and we get an n×m matrix [19].
Herein W = [ωij] (i = 1, …, n; j = 1, …, m) is the weight matrix and ωij ∈ [0, 1].
For each emotion ei, its influence on AOs can be considered as an influence vector Ii=[ωi1, ωi2, …, ωim]. If it has no influence on an AO, the corresponding weight ω equals 0. Here is an example of an influence vector: Ifear=[0.05, 0.3, 0.5, 0.1, 0.05], which denotes that the emotion fear influences AO3 and AO2 most and the influence on other AOs can be ignored.
In order to distinguish whether an emotion is active or not, we adopt a dependency vector T=[l1, …, ln], li ∈ {0, 1}, where li is the dependency element for emotion ei and the value 1 indicates that the emotion ei is active. If the emotion ei is not triggered or its intensity is below the low threshold δl, then li equals 0.
Therefore, after taking the influence of emotions into account, the intensity of attentional object AOj can now be expressed by the following formula.
where pt,j is the attentional intensity of AOj before considering the influence of emotions.
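How the weight matrix, the dependency vector and the emotion intensities might combine is sketched below. The multiplicative form used here is an assumption chosen for illustration, not necessarily the formula used in this work: each AO's prior intensity is scaled by the summed, intensity-weighted influence of the active emotions, and intensities are left unchanged when no emotion is active. The function and argument names are hypothetical.

```python
from typing import List, Sequence


def apply_emotions(p: Sequence[float],
                   W: Sequence[Sequence[float]],
                   q: Sequence[float],
                   active: Sequence[int]) -> List[float]:
    """Scale AO intensities by the influence of the active emotions.

    p[j]      attentional intensity of AO j before emotions
    W[i][j]   weight between emotion i and AO j, in [0, 1]
    q[i]      intensity of emotion i
    active[i] dependency element l_i, 1 if emotion i is active else 0
    """
    if not any(active):
        return list(p)       # no active emotion: intensities unchanged
    result = []
    for j, p_j in enumerate(p):
        gain = sum(l_i * q_i * row[j]
                   for l_i, q_i, row in zip(active, q, W))
        result.append(p_j * gain)
    return result
```

With the example vector Ifear above and equal prior intensities, this form makes AO3 the strongest candidate, and an emotion with uniformly small weights suppresses every AO.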
3.2. The role of personality in interaction
One of the most widely accepted models of personality is the Five Factor Model (FFM) [20]. We chose to implement two typical dimensions of the FFM for the robot: extraversion and agreeableness. Agreeableness is used to determine whether the robot is willing to interact with others actively and ranges from warmth (willing to interact) to hostility (hates to interact). A warm robot (i.e., agreeable) tends to interact with others actively and is more conversational and polite. A hostile robot tends to avoid such behaviours.
The robot's personality influences the triggering thresholds (i.e., δ) of emotions and can also influence the allocation of attentional resources and the assignment of attentional parameters. We chose the personality of extraversion to determine the modes of allocation of attentional resources and parameter assignment: 1) dominant kind. A robot with this personality will pay prolonged attention to an AO and does not easily lose its interest in the current focus or be interrupted by other stimuli and 2) submissive kind. A robot with this personality tends to lose interest in the current focus easily and thus shifts its attentional focus frequently. Several preset allocation policies of attentional resources have been implemented according to personality.
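Preset policies of this kind can be sketched as a lookup table. Only the small decay rate for the dominant personality is stated in this paper (Section 4.2); the shift threshold field and all numeric values below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionPolicy:
    decay_rate: float        # k assigned to new AOs
    shift_threshold: float   # intensity change that triggers a shift evaluation


# Hypothetical values: a dominant robot holds its focus, a submissive one drifts.
POLICIES = {
    "dominant": AttentionPolicy(decay_rate=0.01, shift_threshold=0.5),
    "submissive": AttentionPolicy(decay_rate=0.10, shift_threshold=0.1),
}


def policy_for(personality: str) -> AttentionPolicy:
    return POLICIES[personality]
```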
4. Experiments
4.1. Control of interaction in typical interaction scenario
The experiments were implemented and tested on the interactive head robot (IHR) [15], which has two cameras and can rotate like a human neck. We implemented a reception service as a typical application for the IHR.
Here is a scenario of the reception process (see Figure 4). The lab room was empty and the robot was initially in the state Being Alone. When a user came into the room, he was first detected by the perceptual system and his distance from the robot was assessed. This processed perceptual information was sent to the attention system, which generated a corresponding AO and added it to the attention pool after assigning appropriate values to its attentional parameters. The process state then changed from Being Alone to AO Detected. At this point, the focus of attention had not yet been confirmed, since this was a temporary phase. The behavioural instruction based on the current process state was sent by the interaction decision to the behaviour system to generate the corresponding behaviour, i.e., the explore behaviour (see Table 1).
Obviously, this AO became the active AO and the process state of the robot changed to Focus Confirmed. This means that the robot confirmed its current attentional focus and much of its attentional resource would be concentrated on the AO. Under the instruction of the interaction decision, this process state resulted in a series of actions, such as turning the cameras towards the user, etc.
The person continued approaching the robot until the distance was within the threshold. The intention detection module then interpreted this behaviour as the intention “trying to start an interaction” and sent it to the interaction decision, and the process state of the robot changed to Interaction Ready. The robot began to acquire contextual information about the user, such as his personal information and hobbies, all of which helps to generate contextual behaviours. Both the interaction information received from the user and the activity performed by the robot caused the process state to change to Active Interaction, at which point the interaction between the robot and the user started.
During the interaction, the user could communicate with the robot via speech or GUI and many kinds of behaviours were generated and combined to provide services. When the user intended to end the interaction and left the robot, the process state moved to AO Detected. When the user walked out of the room, the process state returned to the initial state, Being Alone. The interaction process of the reception is illustrated in Figure 4 and the detailed information can be seen in Table 2.
4.2. Interaction integrated with emotion and personality
We implemented a subset of the basic emotions: S={happiness, solitude, disgust, fear, anger} and this list can be extended if required. The corresponding emotional trigger for each emotion is implemented accordingly.
We now introduce emotions and personality into the interaction process. Consider the following interaction scenario. There were a user (named AO1), a blue balloon (named AO2) and a green one (named AO3) in the room and the robot was willing to interact with them. The associated weight between the emotion happiness and the user (i.e., AO1) was much greater than the weights between this emotion and other AOs and the same with fear and the enemy (i.e., its related AOs if there are any), solitude and the blue balloon (AO2) and disgust and the green balloon (AO3). In contrast, the associated weights between anger and all the AOs were small and nearly equal (which means a suppression of all the AOs).

Figure 4. Illustration of an interaction process
At first, the robot's active emotion was happiness. After a period of time, a bigger red balloon (named AO4) appeared in the room, which resulted in the active emotions changing to fear and disgust. We then removed the red balloon and gave punishment information to the robot later in the interaction. This punishment made the robot feel anger towards all the AOs. This interaction process was accompanied by shifts in attentional focus caused by the changing emotions, and the user's interaction might be ignored if the attentional focus moved away from him. The user terminated the interaction and left the room after a while and the robot's emotion changed to solitude.
Figure 5 shows the shift of attentional focus and changes in active emotions of the robot. For simplicity, we denote “happiness” by “H”, “fear” by “F” and “disgust” by “D”. The personality of the robot was set as dominant in the experiment, which was characterized by setting a small decay rate for AOs so that the robot could keep its attention on them as long as possible.
In the first period of time (t=[1, 25]), the robot interacted with the user (AO1) and the active emotion was happiness. During this interval, the attentional focus of the robot was on the user and the user could interact with the robot. At the time of t=26s the enemy (AO4) appeared and the robot's active emotions changed to fear and disgust. Such active emotions affected the attentional intensities of AOs by the associated weights and an attention shift occurred. The robot lost interest in continuing to interact with the user. During this period of time, the interaction information from the user, such as the contextual command, conversation information, etc., was ignored by the robot. At the time of t=51s, the enemy left.
At the time t=67s, the punishment information was given to the robot, which triggered the anger emotion. In our design, the triggered anger emotion suppresses all the AOs by way of small associated weights. The drop in attentional intensity was greatest for the user (AO1) and exceeded the threshold, so an attention shift occurred. Because of the influence of anger, the attentional intensities of all the AOs were small and the robot's attention system could not select an appropriate attentional focus. So in this period of time, the robot lost its attentional focus and made no response to the interaction information from the surroundings.

Figure 5. Attention shift and variation of emotions (dominant personality)

Figure 6. Variation of attentional intensity (dominant personality)

Figure 7. Illustration of an affective interaction process

Figure 8. Attention shift and variation of emotions (submissive personality)
At the time of t=82s, the user terminated the interaction and left the room and this resulted in the solitude emotion. At the time of t=100s, the whole interaction was terminated. The changes in attentional intensity during this interaction are illustrated in Figure 6 and the whole interaction process is illustrated in Figure 7.
We changed the personality to submissive and tested the interaction in the same situation. The attention shift sequence is illustrated in Figure 8. We found that the robot easily lost attentional focus on the current AO and that the emotions strongly influenced the decision making, which resulted in frequent shifts of attentional focus during the interaction process. In short, even in the same situation, a different personality made the robot exhibit different characteristics, which is consistent with our original intention of endowing the robot with personality. The robot's behaviour also exhibited different personality characteristics during the interactions.
5. Summary and conclusions
In our work, to endow the robot with the ability to be aware of its surroundings and to perform personalized and proactive interactions, context-aware computing is utilized and combined with the interaction decision and the behaviour system. Aiming at the challenges of controlling attention when interacting with the environment, an architectural attentional control based on emotion and personality has been presented, with a focus on shifting attentional focus at the right time, to the right thing and in an appropriate manner.
Emotions and personality are adopted to realize certain affective characteristics for the robot and their influencing mechanism on attention system is explored. To control the whole interaction process and perform well in an interaction, interaction control has been carefully designed. Specifically, a process state-based method is presented to control the interaction process, especially the long time-span interaction. For solving interaction interruptions, a method based on personality and priority is proposed. The integration of an interaction decision with other essential functions has been discussed in detail.
We have presented our implementation of the proposal on the interactive head robot (IHR), along with findings that give insight into how interactions progress and how the main components of the approach interplay during an interaction. In future work, we hope to extend the testing of our proposal to other robots and to more applications, which will benefit its potential use.
6. Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61142012, 61171141) and the Science and Technology Planning Project of Guangdong Province, China (No. 2012B010600014, 2012B010500025).
