Abstract
This paper presents an innovative motion system that is used to control the motions and animations of a social robot. The social robot Probo is used to study Human-Robot Interactions (HRI), with a special focus on Robot Assisted Therapy (RAT). When used for therapy it is important that a social robot is able to create an “illusion of life” so as to become a believable character that can communicate with humans. The design of the motion system in this paper is based on insights from the animation industry. It combines operator-controlled animations with low-level autonomous reactions such as attention and emotional state. The motion system has a Combination Engine, which combines motion commands that are triggered by a human operator with motions that originate from different units of the cognitive control architecture of the robot. This results in an interactive robot that seems alive and has a certain degree of “likeability”. The Godspeed Questionnaire Series is used to evaluate the animacy and likeability of the robot in China, Romania and Belgium.
1. Introduction
The social robot Probo [1] (Figure 1) is one of a range of social robots used for Robot Assisted Therapy (RAT), with a special focus on children. Social robots that serve a similar purpose include Paro [2], iCat [3], Keepon [4], Kaspar [5], The Huggable [6] and Nao [7]. The behaviour of these robots manifests itself through moving body parts, and these movements are a crucial aspect of the social interactions necessary for RAT. Traditionally, feedback loops control the movement of the body parts. For instance, object-tracking behaviour is created by a feedback loop between the object position estimated from a camera image and the servos of the robot's head. This results in machine-like behaviour that, in contrast to life-like behaviour, cannot be interpreted naturally. Therefore, several authors argue that the principles of classic animation can be used to create an illusion of life with robotic characters [8] [9] [10]. By this “illusion of life” we mean the viewer's perception of a character that seems alive because it exhibits life-like behaviour, in contrast with the machine-like behaviour of classic functional robots (e.g., industrial robots).

Outer (Left) and inner (Right) appearance of the social robot, Probo
Shen et al. [11] suggest that a humanoid robot that is perceived overall as a “social entity”, creating an “illusion of life”, may facilitate “engaging” interactions with humans.
Other studies have also shown the importance of life-like behaviour for robots used in social therapy with children. Yoshikawa et al. [12] suggest that cues in the context of the interaction (e.g., blinking of the eyes) would improve the perceived responsiveness of a robot. A recent study by Cabibihan et al. [13] states that imitation, eye contact, turn-taking and self-initiation are important target behaviours for autism therapy. The study also states that a social robot can serve as an actor, enacting suitable behaviour in specific social situations to give the child opportunities to learn. This is in line with the first results of interaction studies performed with Probo in autism therapy [14] [15].
The motion system of the huggable robot Probo is responsible for creating the smooth and natural motions needed to obtain more “engaging” interactions with humans. It controls all the motions of the robot and transfers them to the motor controllers of each of the robot's Degrees Of Freedom (DOF). The robot has a fully actuated head with 20 DOF, capable of communicating emotions and attention via its facial expressions and gaze. Previous studies on the recognition of emotions in Probo's facial expressions showed a recognition rate of 84% [16], suggesting that the robot is suitable for social interactions.
2. Software Architecture
The software architecture of the robot is a modular structure that groups all the control systems together in a robotic control centre (Figure 2). The architecture is implemented in the robot Probo, but it could be implemented in any virtual agent that benefits from the functionalities it provides. Its modularity allows different ‘building blocks’ or systems to be used. The architecture consists of four main categories: the Perceptual-System, the Cognitive Control-System, the Expressional-System and the Motor-System. The first two systems process the perceptual stimuli (audio, vision and touch) to provide the robot with attention (by gaze) and facial expressions. These systems work semi-autonomously, using the perceptual stimuli as input. For example, the Attention System can be set to track a certain coloured object, a certain face or a directional voice. The Emotional System (a subsystem of the Cognitive Control-System) simulates an emotional state for the robot, based on internal needs that are influenced by detected actions. For example, petting the robot will influence its need for affection, which will increase the valence dimension of its emotional state and change its facial expression (via the Expressional-System) towards happy. Since the robot will be used for therapy, it must be able to engage in meaningful social interactions. We therefore implemented shared control between the robot's autonomous systems and a human operator, so that the operator can control the robot's behaviour and use the robot as an interface. To do so, the operator needs a motion system that lets them act as a kind of puppet master.
As research and development in robotic systems evolve, the level of autonomy can be increased gradually, allowing the operator to control the robot's actions and behaviour at a higher level. This makes the robot easier to control, so that more complex interactions will become possible in the future. Each block has its own functionalities and can be reused in other robots or agents.

An overview of the software architecture of the social robot, Probo
Only the Motor Control block is specific to Probo's hardware: it translates the motions on a set of DOF into control commands for the maxon motors. The same output from the Motion Mixer is also connected to a virtual model of Probo that is rendered in a 3D environment. In this way, the architecture can also control virtual agents, as was done in the Probogotchi game developed at the VUB robotics lab.
Other software, such as Aldebaran's Choregraphe [17] or OpenHRP3 [18], provides similar functionalities. However, these tools focus mainly on kinematics and locomotion, whereas our architecture allows control to be shared with an operator who focuses on real-time role-playing, using gaze and facial expressions to convey attention and emotions. This makes it more suitable for human-robot interaction in which the robot needs to act as a social character in face-to-face communication.
To evaluate this motion system, we used the Godspeed Questionnaire developed by Bartneck et al. [19]. We agree with their statement that a standardized measurement tool for HRI studies is necessary to make progress in this field and to compare the results of different studies. An advantage is that the reliability and validity of this questionnaire have already been confirmed. In this paper we focus mainly on animacy and likeability, since these relate to the performance of the motion system.
3. Animating Robots
Believable imaginary creatures with human-like characteristics and smooth, natural motion have been created successfully for many years, starting with the first feature-length hand-drawn animation films in the late 1930s and continuing with the more recent successes of 3D animation. The principles employed to create a convincing “illusion of life” have been used in Walt Disney's famous cartoon characters for many decades [20]. Such principles can inspire similar strategies for designing social robots that create an illusion of life and intelligence. However, designing a functional social robot is a far more complex problem than creating a cartoon character: behind each cartoon character is a puppet master [21], whereas a robot must react immediately to events happening in real time in its environment. For this reason, we argue that a good robotic control system for RAT should consist of shared control between an operator and (low-level) autonomous software systems.
Some social robots can only show a discrete set of facial expressions (e.g., [4] [6] [7] [22]) or move abruptly and unnaturally (e.g., [5] [23]), in contrast to the smooth, elegant motion displayed by humans and animals. An additional technique for creating this “illusion of life” is to introduce some unpredictability into the robot's motion and behaviour so that it appears more “natural”. Several authors, such as Scheeff [24] and Takayama [10], have discussed how techniques from traditional animation can be used in social robot design. This view is shared by Van Breemen [8], who argues that: “in order to bring robots to life - such that they show behavior that can be naturally understood and anticipated - principles known from the field of character animation should be applied”. Van Breemen [8] and Ribeiro [9] propose applying the 12 principles of traditional animation (adapted from Thomas [20]) to make a robot's behaviour more understandable. We share this point of view and developed a Sequence Editor, inspired by the principles of character animation, to create motion sequences. Other software tools, such as the Animation Triggers and the Combination Engine, were developed around this concept to trigger and combine these sequences into believable animations.
3.1 Techniques of Computer Animation
In computer animation, animators typically manipulate animation variables (avars) that control the position of a part of an animated object (e.g., a character). Rather than setting the variables for every frame, they usually set them at strategic points (frames) in time and let the computer interpolate or “tween” between them, a process called keyframing. Keyframing puts control in the hands of the animator and leaves the computer to render the smooth transitions between the control points.
A newer method, motion capture (or performance-driven animation), makes use of live action. When computer animation is driven by motion capture, a real performer acts out the scene as if they were the character to be animated. Video cameras and markers are used to record the motions, and the recorded performance is then applied to the animated character. This method has also been tested on androids by Ishiguro [25]. Each method has its advantages: while motion capture can reproduce the subtle expressions of a particular actor, keyframing can produce motions that would be difficult or impossible to act out. Another drawback of motion capture is the equipment and setup needed each time new motions are captured. A third technique is rule-based animation. This technique is attractive because it can create more autonomous systems, but it requires complex models that can simulate life-like behaviour. Cassell et al. have tested this approach to create a conversational agent [26]. Our motion system is part of the larger modular software architecture presented in Section 2 and receives its inputs from different Expressional Systems. We chose the keyframing technique because it is easy for different operators to use and because keyframed motions are easy to combine with the underlying autonomous expressive systems.
3.2 A Novel motion system
The motion system (Figure 3) is composed of several modules. First, the Sequence Editor is used to create motion sequences that can be triggered and combined by the Animation Triggers, which include the Animation Player and the Animation Keys. The Combination Engine combines the outputs of these systems with the outputs of the direct controllers (Joystick and Slider) and the autonomous controllers (Facial Expressions and Gazing). This creates shared control between the operator and the autonomous systems, which react to the stimuli perceived via the robot's sensors. The Motion Mixer gives the operator additional control over the outputs of all the motion systems, allowing them to be tested and mixed individually without the Combination Engine. The output of the Motion Mixer holds the normalized DOF positions that are sent to the motors. The Motor Thread first smooths these DOF positions using filters and transforms them into motor positions. Next, it uses the tools provided by the Actuation unit to send the motor commands over the CAN bus to the EPOS motor controllers.

The architecture of the motion system
3.2.1 The Sequence Editor
The Sequence Editor (see Figure 4) is developed using the same principles as computer animation software (e.g., Adobe Flash and Autodesk 3ds Max). It has a timeline composed of a sequence of frames. As in a video, each frame can be seen as a still picture and, after pressing play, the frames are shown at a certain frame rate. Linear interpolation is used to fill in all the frames between two keyframes to achieve a smooth transition.

The GUI of the Sequence Editor
Figure 5 depicts the linear interpolation used to calculate the position P(t) at time t between two keyframes p1(t1) and p2(t2) on a single DOF track.

The interpolation that determines a frame's position at a certain time between two keyframes
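Written out, the interpolation of Figure 5 presumably corresponds to the standard linear interpolation formula below. This is a reconstruction from the symbols named in the text (P(t), p1, t1, p2, t2), since the figure itself is not reproduced here, and not an equation quoted from the original.

```latex
P(t) = p_1 + \frac{p_2 - p_1}{t_2 - t_1}\,(t - t_1), \qquad t_1 \le t \le t_2
```

At t = t1 the formula returns p1 and at t = t2 it returns p2; in between, the editor samples it once per frame at the timeline's frame rate.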
The editor has a separate timeline (or track) for each DOF. If certain DOFs are not used in a sequence, they can be set to inactive so that they are not taken into account when the sequence is later combined with other motions. A loop can be defined in every sequence by setting a start and an end frame; when the loop is active, only the frames between the start and end frame are played. Clicking on a frame in a DOF track creates a keyframe, by default a Normal Keyframe. The editor supports four types of keyframes: Normal, Input, Superposition and Random.
Each keyframe can be dragged and dropped or copied and pasted. To create a smooth looping motion sequence, the following steps should be considered. First, Input Keyframes are placed before the start frame of the loop to guide any previous motion smoothly to the Normal Keyframes (the start frames of the loop). Normal Keyframes inside the loop override any underlying motion on the same DOF track. Finally, it is best to place Input Keyframes again at the end of the sequence (outside the loop) to return smoothly to any underlying motion generated by other sequences or systems. This is illustrated by the example of a “Yawn” sequence in Figure 4. The Superposition Keyframe is useful for adding a motion sequence on top of another motion. For example, if the gaze is directed towards a face, a sequence for nodding “yes” or “no” has to be added to the underlying head motion to maintain the gaze direction. If keyframe types other than Normal are used, p1 and/or p2 are replaced (according to the type) with: the current motion value (Input), the keyframe value added to the current motion value (Superposition) or a random value (Random).
Because different motions can be triggered on top of each other, the keyframe types provide a way of controlling how they are combined. The first example (Figure 6) uses an Input Keyframe to ensure a smooth transition when a (higher-priority) motion is activated. The figure shows a transition from a triangle wave to a constant value (e.g., the head pan going from nodding ‘no’ to looking left).
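The substitution rule for the keyframe types can be made concrete with a small Python sketch that evaluates one DOF track. All names (`KeyType`, `Keyframe`, `instantiate`, `endpoint`, `evaluate`, `next_frame`) are hypothetical, and several details are assumptions rather than the Sequence Editor's documented behaviour: DOF values are normalized to [-1, 1], “the current motion value” is read as the underlying value at the frame being evaluated, Random keyframes are drawn once when a sequence is triggered, and the underlying motion passes through unchanged outside the keyframed range.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import List


class KeyType(Enum):
    NORMAL = "normal"                # absolute value, overrides the underlying motion
    INPUT = "input"                  # takes the current underlying motion value
    SUPERPOSITION = "superposition"  # keyframe value added to the underlying motion
    RANDOM = "random"                # random value


@dataclass(frozen=True)
class Keyframe:
    frame: int
    value: float = 0.0
    kind: KeyType = KeyType.NORMAL


def instantiate(track: List[Keyframe], rng: random.Random) -> List[Keyframe]:
    """Draw each Random keyframe once when the sequence is triggered (assumption),
    so its value does not jump from frame to frame. Assumes a [-1, 1] DOF range."""
    return [Keyframe(k.frame, rng.uniform(-1.0, 1.0)) if k.kind is KeyType.RANDOM else k
            for k in track]


def endpoint(key: Keyframe, current: float) -> float:
    """Effective value p of a keyframe, given the underlying value at this frame."""
    if key.kind is KeyType.INPUT:
        return current
    if key.kind is KeyType.SUPERPOSITION:
        return key.value + current
    return key.value  # NORMAL (RANDOM is expected to be resolved by instantiate)


def evaluate(track: List[Keyframe], frame: float, current: float) -> float:
    """Value of one DOF track (keyframes sorted by frame) at `frame`.
    Outside the keyframed range the underlying motion passes through (assumption)."""
    if not track or frame < track[0].frame or frame > track[-1].frame:
        return current
    for k1, k2 in zip(track, track[1:]):
        if k1.frame <= frame <= k2.frame:
            p1, p2 = endpoint(k1, current), endpoint(k2, current)
            if k2.frame == k1.frame:
                return p2
            s = (frame - k1.frame) / (k2.frame - k1.frame)
            return p1 + (p2 - p1) * s  # linear interpolation between the endpoints
    return endpoint(track[-1], current)  # single-keyframe track


def next_frame(frame: int, loop_start: int, loop_end: int, looping: bool) -> int:
    """Advance the playhead; while the loop is active, jump back after loop_end."""
    if looping and frame >= loop_end:
        return loop_start
    return frame + 1
```

With these definitions, an Input Keyframe at frame 0 followed by a Normal Keyframe at frame 10 blends an underlying triangle wave into a constant value, in the spirit of Figure 6, while a track made only of Superposition Keyframes adds a nod on top of whatever gaze value lies underneath, in the spirit of Figure 7.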

The influence of the input keyframe on underlying motions
The second example (Figure 7) uses superposition keyframes to superimpose a new motion on top of an underlying motion (e.g., nodding ‘yes’ when looking at someone). After using these keyframes to create smooth motion sequences, each sequence needs to be saved for use in the Animation Triggers unit.

The influence of the superposition keyframe on underlying motions
3.2.2 The Animation Triggers
The Animation Triggers unit is used to control the sequences created with the Sequence Editor. The operator is presented with the GUI of the Animation Triggers unit, depicted in Figure 8. This GUI gives the operator control over the components contained in this unit, including the Animation Player and the Animation Keys.

The GUI of the Animation Triggers
3.2.3 The Combination Engine
Van Breemen states that a mixture of pre-programmed motions and feedback loops is required [8]. The pre-programmed motions (e.g., nodding “yes” or “no”) are designed to make the robot act more communicatively, whereas feedback loops let the robot react to stimuli from the environment. To comply with the design specifications concerning autonomy, we developed the Combination Engine to allow for control to be shared between the operator and the reactive systems of the robot.
The dataflow of the Combination Engine is depicted in Figure 9. The DOF tracks generated by the facial expressions (eyebrows, eyelids, ears and mouth), the gaze (eyes and neck) and the joystick control (trunk, mouth opening and different animations and emotions) are joined into one list of temporary DOFs. In the next stage, the eyelid DOFs are corrected so that they remain relative to the position of the eyes. The sequences played by the Animation Player are then combined according to their priority level and keyframe types, and the resulting motion is combined with the temporary DOF list. If one key is triggered on the keyboard, the corresponding sequence is combined with the temporary DOF list. If several keys are triggered, their sequences are first combined with each other, following a left-to-right priority order, before being combined with the temporary DOF list. Finally, at random time intervals, a sequence of eye blinks and ear flaps is combined with the DOF list. These random sequences receive the lowest priority: for example, when a “surprising animation” is triggered from the Animation Keys, the eyes will not close randomly while the animation is playing. The resulting DOF list is the output of the Combination Engine and is fed into the Motion Mixer.

The dataflow of the Combination Engine
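The priority rules of this dataflow can be sketched in Python as follows. This is a hypothetical simplification, not Probo's implementation: each triggered sequence is modelled as a function (`Layer`) that receives the DOF list produced so far and returns new values for the DOFs it drives, so Normal keyframes can override and Superposition keyframes can add. The names `apply_layers` and `combination_engine` are illustrative, and three points are assumptions: the leftmost key has the highest priority, Animation Keys outrank Animation Player sequences, and the eyelid correction step is omitted because its formula is not given.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

DofList = Dict[str, float]
# Hypothetical layer type: receives the DOF list produced so far and returns
# new values for the DOFs it drives (Normal keys override, Superposition adds).
Layer = Callable[[DofList], DofList]


def apply_layers(base: DofList, layers: List[Layer]) -> Tuple[DofList, Set[str]]:
    """Apply layers listed from highest to lowest priority.

    The lowest-priority layer is applied first, so each higher-priority layer
    sees the result below it and can override or add to it."""
    out = dict(base)
    claimed: Set[str] = set()
    for layer in reversed(layers):
        update = layer(out)
        out.update(update)
        claimed.update(update)  # remember which DOFs a triggered sequence drives
    return out, claimed


def combination_engine(expressions: DofList, gaze: DofList, joystick: DofList,
                       player: List[Layer], keys: List[Layer],
                       blink: Optional[DofList] = None) -> DofList:
    # 1. Join the reactive and direct sources into one temporary DOF list
    #    (assumes they drive disjoint DOFs; eyelid correction omitted).
    temp = {**expressions, **gaze, **joystick}
    # 2. Triggered sequences: keys (leftmost first) are assumed to outrank
    #    the Animation Player sequences, which are pre-sorted by priority.
    out, claimed = apply_layers(temp, keys + player)
    # 3. Random blinks/ear flaps have the lowest priority: they only reach
    #    DOFs that no triggered sequence is currently driving.
    for dof, value in (blink or {}).items():
        if dof not in claimed:
            out[dof] = value
    return out
```

For example, if a key-triggered sequence sets both eyelid DOFs, a random blink arriving in the same cycle only affects the ears, which mirrors the “surprising animation” example in the text.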
3.2.4 The Motion Mixer
The Motion Mixer is based on the principles of an audio/video mixer: different motion inputs are mixed together into one output, and the gain of each input determines its influence on the combined output signal. The Motion Mixer dynamically creates a motion channel for each of its inputs, and each input has a slider to control its gain. For each DOF, the values of the active tracks of all the inputs are weighted by their gains and added together, and the mean value is set as the output of the mixer (see Equation 1). The GUI of the Motion Mixer is depicted in Figure 10. The output DOFs of the Motion Mixer are updated continuously.

The GUI of the Motion Mixer
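Equation 1 is not reproduced in this text, so the following Python sketch of the mixer rests on an assumed reading of the description: for each DOF d, the output is the gain-weighted mean over the n_d inputs on which that DOF is active, out_d = (1/n_d) * sum_i(g_i * x_i,d). The function name `motion_mixer` and the dictionary representation of DOF tracks are hypothetical. Dividing by the number of active inputs rather than by the sum of the gains is itself an assumption; it has the effect that lowering a slider attenuates that channel, as on an audio mixer.

```python
from typing import Dict, List

DofList = Dict[str, float]  # active DOF tracks of one input; absent DOF = inactive


def motion_mixer(inputs: List[DofList], gains: List[float]) -> DofList:
    """Gain-weighted mean per DOF over the inputs on which that DOF is active.

    Dividing by the number of active inputs (not by the sum of the gains) is
    an assumption: it lets a gain slider attenuate a channel, as on an audio mixer."""
    if len(inputs) != len(gains):
        raise ValueError("one gain per input is required")
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for dofs, gain in zip(inputs, gains):
        for name, value in dofs.items():
            sums[name] = sums.get(name, 0.0) + gain * value
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}
```

If the original Equation 1 normalizes by the sum of the gains instead, only the final division needs to change.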
3.2.5 The Motor Control
The Motor Thread unit is responsible for transforming the normalized DOF values into motor positions. First, the DOFProboFilter is applied to ensure smooth motion. All the (normalized) DOFs are then converted into motor positions (Mapping) and put on the PositionQueue, the buffer serving the MotorWorkerThread. The requested motor positions are compared with the actual motor positions; if there is a difference, the velocity required to reach the requested position is calculated. Finally, the new target motor positions are sent to the EPOS motor controllers. The DOFProboFilter contains the filters for each of the DOFs. All the filters are low-pass filters tuned to provide smooth, natural motion; each consists of a cascade of first-order software low-pass filters, as depicted in Figure 11. Different α values are used, depending on the motor transmission and the body part to be actuated.

A cascade of first order software low-pass filters
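Because Figure 11 is not reproduced here, the Python sketch below assumes that each stage is the common discrete first-order low-pass filter (exponential smoothing), y[n] = y[n-1] + α·(x[n] - y[n-1]) with 0 < α ≤ 1, so a smaller α gives a slower, smoother response. The number of stages, the α values, the linear [-1, 1] to motor-range mapping and the velocity helper are illustrative assumptions; `LowPassCascade`, `to_motor_position` and `required_velocity` are hypothetical names, and the CAN/EPOS communication is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LowPassCascade:
    """Cascade of first-order discrete low-pass filters; each stage computes
    y[n] = y[n-1] + alpha * (x[n] - y[n-1]) and feeds the next stage."""
    alpha: float
    stages: int = 2
    state: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")

    def step(self, x: float) -> float:
        if not self.state:
            self.state = [x] * self.stages  # start at rest on the first sample
        for i in range(self.stages):
            self.state[i] += self.alpha * (x - self.state[i])
            x = self.state[i]  # output of this stage is input of the next
        return x


def to_motor_position(dof: float, lo: int, hi: int) -> int:
    """Map a normalized DOF value in [-1, 1] linearly onto a hypothetical
    motor position range [lo, hi], clamping out-of-range values."""
    dof = max(-1.0, min(1.0, dof))
    return round(lo + (dof + 1.0) / 2.0 * (hi - lo))


def required_velocity(target: int, actual: int, dt: float) -> float:
    """Speed needed to move from `actual` to `target` within one control period dt."""
    return abs(target - actual) / dt
```

Cascading stages, rather than using one stage with a very small α, attenuates high-frequency jitter more steeply while keeping the response reasonably fast; choosing α per DOF is where the motor transmission and the actuated body part would come in.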
By using compliant actuators and soft, flexible materials, Probo becomes more huggable and softer in its behaviour. Traditional actuators, such as electrical drives with gearboxes, are unsuitable for Probo because they are stiff, which results in unsafe behaviour and an unnatural, hard touch. Two special actuation systems are introduced in the actuation layer to comply with the hardware design specifications [27]. In both actuators, the flexible element plays an essential role, since it decouples the inertia of the colliding link from the rest of the robot, reducing the potential damage during impact.
4. Evaluating Animated Robots
Different studies have been performed to explore users' perceptions of robots. Castellano et al. [28] used the detection of visual cues (looking at the robot and smiling) to measure the user's engagement. Riek et al. [29] used the Interactant Satisfaction Survey (adapted from Kang SH, 2008) in combination with users' gesture analysis and in-depth interviews. Saygin et al. [30] took a more objective approach, interpreting the neurologically active areas in subjects' brain scans to explain the concept of the uncanny valley. Delaunay et al. [31] assessed users' ability to read gaze direction, providing interesting results for joint attention research in HRI studies. For RAT, it is very important that the robot is a believable character in its role in social storytelling. However, a robot's “aliveness” or “likeability” is very difficult to measure. A standardized test, the Godspeed Questionnaire Series [19], has been developed to assess these parameters, and we chose it to evaluate users' perception of our robot's animation abilities. Several previous studies have already used the Godspeed Questionnaire Series to evaluate the perception of social robots [32] [33] [34].
4.1 Evaluation with Godspeed Questionnaire Series
Our studies were performed in three different countries: China (CH; see Figure 12), Romania (RO) and Belgium (BE). The advantage of performing such experiments outside the lab is that the subjects do not necessarily have an interest in robotics, cover a wide range of ages and include both genders. Children younger than 16 were excluded from the test because we considered the terminology too difficult for them to understand. The experimental setup was based on the habituation phase that is frequently used in RAT [15]: before the intervention starts, the patient undergoes a habituation phase to become familiar with the robot and the accompanying therapist. A similar Wizard of Oz interaction, including basic social interactions, was presented to the public. The robot says hello, reacts to questions from the public by nodding yes or no, reacts with winks of the eye, and shows emotions and other animations depending on the context, while internal processes make the robot look at the audience's faces, blink its eyes and flap its ears. The robot performs its motions under control shared with an operator. The Random Generator was used to make the eyes blink and the ears flap at random but realistic intervals. The Gazing Unit controlled the position of the eyes and neck, placing the point of attention on faces detected in the camera image or on sound sources localized by a microphone array. Using the Joystick Control, the human operator was able to select the facial expressions (using the emotional vector [16]), move the trunk and trigger animations (i.e., winks of the eye, falling asleep, nodding yes/no or looking pathetic). All these possibilities have overlapping degrees of freedom that are combined in the Combination Engine. The human operator was able to react to events in the audience using an Xbox 360™ controller.
This allowed the robot, for example, to smile and move the trunk if someone waved to the robot.

Probo at the Belgian Pavilion at the World Expo in Shanghai, China
Random visitors to the stand were given a paper version of the Godspeed Questionnaire and asked to fill it in. Because the questionnaire takes little time to complete, very few people refused to participate. The survey was conducted in Chinese in China, in English in Romania and in Dutch in Belgium. Translations in English, Japanese, Chinese, Spanish and Dutch can be found on a website maintained by Bartneck. Animacy (Godspeed II) is measured on five-point scales with the following anchors: dead - alive, stagnant - lively, mechanical - organic, artificial - lifelike, inert - interactive, apathetic - responsive. For likeability (Godspeed III), the anchors are: dislike - like, unfriendly - friendly, unkind - kind, unpleasant - pleasant, awful - nice. We performed the survey with Probo in three different contexts.
Table 1 lists the number of participants and their ages. We tested the internal consistency of the questionnaire using the total Cronbach's alpha (Godspeed I-V). The alpha values are well above the commonly used threshold of 0.7, indicating sufficient internal consistency.
Data about the participants
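For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) · (1 - Σ item variances / variance of total scores), where k is the number of items. The function name is illustrative, it is not the analysis script used for this study, and the toy matrices in any usage are made up rather than taken from the survey data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix:
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))."""
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(responses[0])
    items = list(zip(*responses))  # one column of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var / total_var)
```

Items that always move together give an alpha of 1, while uncorrelated items push it towards 0; values above about 0.7 are conventionally read as acceptable consistency.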
4.2 The Results
The general results of the questionnaire are presented in Table 2. The animacy results (total mean = 3.6/5), depicted as a boxplot in Figure 13, show that most subjects perceived the robot as “alive”. In all three countries, the lowest animacy score was given on the mechanical - organic scale, although even this score still favoured the perception of organic motion.

Total values for animacy
Average values of the Godspeed Questionnaire
Probo attained higher scores for likeability (total mean = 4.3/5), as can be seen in Figure 14. In the design phase, we decided not to build a highly human-like android, since such androids would not be liked as much as more machine-like robots, as shown in a study by Bartneck et al. [35]. Probo's appearance is that of an imaginary creature based on the ancient mammoth. According to the classification suggested in [36] and [37], Probo's morphology can be defined as caricatured-zoomorphic. The decision not to build a human-like robot was based on the assumption that users would have no preconceived expectations about the robot's abilities, so that a unique character identity could be created more easily. One interpretation of the high scores is that there is no contradiction between the users' expectations and the expressions shown by the robot.

Total values for likeability
5. Conclusions and Future Work
Tools inspired by computer animation techniques for creating life-like motion in social robots have been implemented and evaluated with the robot Probo. A human operator can use this toolkit to create interactive stories that are especially useful for RAT and HRI studies. The toolset makes it possible to create keyframed motion sequences that serve as the building blocks of larger interactive animations. A GUI is provided to manage these motions, which the Combination Engine combines with the motions originating from the autonomous systems. The motion system thus enables the robot to take part in social interactions. However, all the animations must be pre-programmed and triggered through the GUI by an operator during the session.
In the context of RAT, most tasks are very repetitive and have a closed format, but in unforeseen situations this system reaches its limits. A performance-based system using motion capture, such as that used by Ishiguro [25], would be very interesting: the operator could become an actor, playing the role of the virtual character instead of controlling it as an animator, which would allow faster reactions to unforeseen situations. The immediate benefit of such a Wizard of Oz setup is that it allows researchers to focus on (social) interactions without having to implement sophisticated behaviours in the robot. In the long term, however, there is a need for therapeutic robots that only need to be controlled in a minimal, high-level way and do not need an operator to act out the actions. More substantial levels of autonomy would allow the robot to adapt to the individual needs of children over longer periods of time (while remaining under the ultimate supervision of a therapist) [38]. The system presented in this paper supports this vision: the motions of different subsystems can be mixed into lifelike motion, and the modular architecture allows the control systems to become gradually more autonomous.
The Godspeed Questionnaire Series offers a good solution for the standardized testing of cross-cultural users' perception of robots. Nonetheless, there are still few studies in the literature that use these questionnaires in a way that allows different robots to be compared. Another good alternative, suggested by Ho and MacDorman [39], draws on Mori's concept of the Uncanny Valley [40] to propose new indices as an alternative to the Godspeed indices. We strongly encourage the development and validation of tools that try to measure user experiences. We acknowledge the limitations of the Godspeed Questionnaire Series and suggest that in-depth interviews and co-creation with therapists are also important methods for designing social robots for RAT. The presented motion system will be tested further in RAT sessions. After these sessions, the engineers, designers and psychologists will participate in brainstorming and co-creation sessions to redesign some of the modules and improve their usability. After this iteration, new validations can be performed during new trials of RAT sessions. Because these tests can be classified as real-world HRI, the validation will take into account previous studies by Sabanovic [41] and Burke [42], both of which provide interesting guidelines on HRI evaluation in the real world.
Studies using the Godspeed Questionnaire to assess the animacy and likeability of the robot showed very positive results. They indicate that sharing control between motions from the robot's cognitive units (face detection, sound localization, eye blinking) and motions triggered by the operator gave subjects the impression that Probo was “alive” (high animacy) and left them with a positive impression (high likeability). No significant differences were found between subjects in China, Romania and Belgium. In the future, it would be interesting to incorporate the Negative Attitudes Toward Robots Scale (NARS) developed by Nomura et al. [43], because recent work by Syrdal et al. [44] suggests that people's prior attitudes towards robots, as assessed by this scale, could influence how they evaluate a robot.
