Abstract
The purpose of this research is to develop multi-talented humanoid robots that perform onstage, built on technologies with high computing and control capabilities. Applying robot technologies to theatrical performance has been a worldwide trend over the last decade. The more closely robot performers resemble human beings, the more easily audiences bond emotionally with robotic performances. Although any programmable robot can serve as a theatrical performer, humanoid robots can play a wider range of characters because they resemble human beings. Developing theatrical humanoid robots is therefore becoming increasingly important in the field of robot theatre. However, theatrical humanoid robots need versatile abilities comparable to those of their human counterparts, rather than merely posing or demonstrating motions onstage; otherwise, audiences quickly become bored. The four theatrical robots developed for this research successfully performed in a public performance, appearing in five programs, and all of them were well received by most audience members.
1. Introduction
As robots are developed with increasingly integrated and complex features, their applications have become very diverse, extending from industrial uses to many areas of daily life. Performance robots have emerged as a trend in the last decade. Robots performing onstage in place of human performers to entertain audiences have gradually become a new application, and onstage robot performances soon became a new form of communication and interaction between humans and robots. Important examples of onstage performance robots from the last decade include the following. Sony demonstrated QRIO [1] at the end of 2003; this robot can nimbly perform a group Japanese fan dance [2]. By the end of 2004, Toyota had launched Partner [3] and had a group of Partner robots play drums and trumpets [4] in a band. Earlier that year, HRP-2 [5], a robot jointly developed by the University of Tokyo and AIST, publicly performed the Aizu-bandaisan dance with human dancers [6]. In January 2007, the BBC hosted the

Robot Theater from Taiwan, featuring Thomas and Janet, or humanoid robots in
In February 2009, a musical [18] jointly performed by human performers, EveR-3 [19] and SeRoPi-1 [20], was hosted at the National Theater of Korea, Seoul, Korea. In May of that year, SeRoPi-2 [21] from KITECH performed with human performers at the same location [22]. In the same month,
This study focuses solely on developing humanoid robots for theatrical entertainment. To perform planned programs, theatrical humanoid robots must be designed with the performance abilities needed to please audiences. Robot performances can be diverse and entertaining only if the robot performers have multiple performance talents, so that audiences are less likely to become bored. The objective of this study is to develop such multi-talented theatrical humanoid robot performers, so that the choice of performance venues is less limited and programs can be designed flexibly.
2. Stage program plan
The specifications of the humanoid robot must be defined in the initial phase of development, and they are derived from the programs that the robots may perform onstage. Developing programs that truly entertain audiences requires collaboration between experts in both the performing arts and robotics, so several theatrical performance directors from Taipei National University of the Arts were recruited for this study. To present high-tech yet entertaining performances, only important and representative technologies, such as image recognition, speech synthesis, real-time motion control and kinematics, were selected from a list of highly integrated robot technologies as the technical basis for choosing programs. After careful selection by research team members with backgrounds in theatrical performing arts and robotics, the candidate programs were listed in Table 1. The first six items in Table 1 were publicly performed in 2008, with the third and fourth items combined into a single performance. The robot functions and body parts designed to meet the program requirements are detailed in the following sections.
Program plan for NTUST theatric robots
3. Head, facial movement and drama/opera
In all kinds of performances, and in singing, opera and cross talk in particular, the appearance and shape of a theatrical humanoid robot's head strongly influence how emotionally audiences bond with the robot. Through facial expressions, a bionic robot head and face can better convey subtle emotional changes to audiences, and life-like humanoid robots can play more human characters in theatre, drama and opera.
Thomas is used here to illustrate the head design of a theatrical humanoid robot; its architecture is shown in Fig. 2. Two CCD cameras installed inside the skull capture images of the surroundings. A microphone and a speaker allow the robot to recognize sounds and to output synthesized voices.

System configuration of the Thomas robot
A total of 23 servo motors drive the mechanisms that create the motions and facial expressions of the head, face, eyes and neck. The servo motors are connected to the 32 PWM output channels of an SSC-32 servo controller, which is in turn connected via an RS-232 interface to a mini-computer inside the robot's upper chest. Images captured by the cameras are sent to the mini-computer through its USB interface and processed there. The system is built on a PC-based platform because it offers strong computing capability and fast image processing; the learning barrier for developing such a system is also relatively low, which shortens development time.
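The paper does not list the commands sent from the mini-computer to the SSC-32. As a minimal sketch, assuming the controller's standard ASCII protocol (`#<channel>P<pulse width in µs>` for each servo, an optional `T<time in ms>` for a timed group move, terminated by a carriage return), a command that moves several facial servos together could be built as follows. The channel numbers and pulse widths are hypothetical.

```python
from __future__ import annotations


def ssc32_group_move(targets: dict[int, int], time_ms: int | None = None) -> str:
    """Build one SSC-32 group-move command string.

    targets maps a servo channel (0-31) to a pulse width in microseconds.
    All listed servos start together and, if time_ms is given, arrive together.
    """
    parts = []
    for channel, pulse_us in sorted(targets.items()):
        if not 0 <= channel <= 31:
            raise ValueError(f"channel {channel} is outside 0-31")
        if not 500 <= pulse_us <= 2500:
            raise ValueError(f"pulse width {pulse_us} us is outside 500-2500")
        parts.append(f"#{channel}P{pulse_us}")
    if time_ms is not None:
        parts.append(f"T{time_ms}")
    return "".join(parts) + "\r"


# Hypothetical example: two facial servos reach their targets in 300 ms
command = ssc32_group_move({5: 1600, 0: 1500}, time_ms=300)
print(repr(command))  # '#0P1500#5P1600T300\r'
```

The string would then be written to the RS-232 port, for example with pyserial's `serial.Serial(port, baudrate).write(command.encode("ascii"))`; the port name and baud rate depend on the installation.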
The head and neck are covered by artificial skin made of polyurethane gel with uniform material properties. The head is shaped by manual moulding, based on the head of a real person. It is made by hand, rather than with reverse-engineering scanning techniques, to preserve subtle imperfections on the skin's surface and to reduce production costs. The 3D model of the skull is constructed by scanning the reproduced mould and is then modified in a CAD environment, and the skull is finally produced with rapid prototyping techniques. This approach ensures that the hand-reproduced facial skin is well aligned with the internal mechanism designed in the CAD application and prevents shape distortion caused by assembly deviations.
The head of the theatrical humanoid robot can generate at least three kinds of facial expressions, as shown in Fig. 3. Each facial expression is produced by deforming the facial skin through pulling on facial control points, a method first used in the expression mechanism of a face robot developed by Hara and Kobayashi [45] in 1996. Following facial anatomy, the locations of the control points are determined by the positions of the facial expression muscles and their directions of motion; the control point locations and the endoskeleton are shown in Fig. 4.
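Because each expression is simply a set of pull positions on the control points, intermediate or partial expressions can be produced by interpolating between the neutral pose and a target pose. The sketch below assumes that each control point is driven by one servo and that expressions are stored as servo pulse widths; the control-point names and numbers are hypothetical, not taken from the Thomas robot.

```python
from __future__ import annotations

# Hypothetical servo targets (pulse widths in microseconds) for a few facial
# control points; the real control points and values are not given in the paper.
NEUTRAL = {"brow_l": 1500, "brow_r": 1500, "mouth_corner_l": 1500, "mouth_corner_r": 1500}
EXPRESSIONS = {
    "happy": {"brow_l": 1450, "brow_r": 1550, "mouth_corner_l": 1800, "mouth_corner_r": 1200},
    "sad": {"brow_l": 1650, "brow_r": 1350, "mouth_corner_l": 1300, "mouth_corner_r": 1700},
}


def blend_expression(name: str, intensity: float) -> dict[str, int]:
    """Move every control point linearly from neutral toward the named expression.

    intensity 0.0 gives the neutral face, 1.0 the full expression.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    target = EXPRESSIONS[name]
    return {
        point: round(NEUTRAL[point] + intensity * (target[point] - NEUTRAL[point]))
        for point in NEUTRAL
    }


print(blend_expression("happy", 0.5))
```

Stepping the intensity from 0 to 1 over several frames would give a gradual transition, and each frame's pulse widths could be sent to the servo controller as one timed group move.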

Three typical facial expressions on the face of the Thomas robot (without the toupee)

Mechanism of facial expression generation
With this approach, robot heads with different faces can be created for the characters in the designed programs. Robot heads with different faces can share the same mechatronic system architecture and, except for child characters, can be mounted on robots that share the same torso. This improves the stability of the mechatronic system and reduces assembly complexity when characters are changed. The research team has developed several robot heads for different characters in the play, as shown in Fig. 5.

Different heads of the humanoid for various characters
4. Vision and musical notation reading and singing
To enhance interaction between the onstage robots and the audience, and to heighten the entertainment value, on-the-spot notation reading and singing by the robots were added to the program list. In this program, audience members freely compose a 16-section score in simplified notation, together with lyrics, and both are printed on pre-formatted paper. Here, simplified means that the musical notation is written as numerical symbols, and lyrics written in Chinese are entered in Pinyin. One of the cameras in the robot's eyes captures an image of the score; once the notation in the image has been recognized, the robot immediately sings the score aloud with a voice produced by voice synthesis. Logitech QuickCam MP cameras are used, and their stereopsis can also be used to compute the distance from the robot to objects in its surroundings for related applications.
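For a rectified stereo pair, the distance to a point follows from the horizontal disparity d between its images in the two cameras: Z = fB/d, with focal length f in pixels and baseline B. The paper does not give the camera calibration, so the focal length and baseline below are hypothetical, and the sketch assumes the two images have already been rectified.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         focal_px: float, baseline_mm: float) -> float:
    """Distance (mm) to a point seen at column x_left / x_right in a rectified stereo pair."""
    disparity = x_left - x_right  # pixels; positive for points in front of the cameras
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity


# Hypothetical calibration: 800 px focal length, 60 mm between the eye cameras
print(depth_from_disparity(420.0, 380.0, focal_px=800.0, baseline_mm=60.0))  # 1200.0
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters much more for distant objects than for nearby ones.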
4.1 Simplified musical notation reading
In addition to image preprocessing, the score recognition process includes position registration, boundary recognition, image distortion correction and background removal. The RGB images captured in front of the robot are converted into HSV images. Because the score format, marked with specific colours and frames, is fixed (Fig. 6), and because hue and saturation are less sensitive to lighting than raw RGB values, the recognition application can easily locate and cut out the blocks containing notation highlighted with the predefined colours and saturations. Edge detection [46] is then applied to the cut-out images to confirm whether the boundary frames exist, to locate them, and to find the blocks that contain numerical symbols.
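The paper does not specify the frame colours or the thresholds. As a minimal sketch, assuming the score blocks are framed in red and the image is a plain list of RGB tuples, colour-based localization can be done by thresholding hue, saturation and value and then taking the bounding box of the matching pixels. A real system would more likely use a vectorized image library; this pure-Python version only shows the logic.

```python
import colorsys


def colour_mask(image, hue_range_deg, min_sat, min_val):
    """Binary mask of pixels whose HSV values fall inside the given thresholds.

    image is a list of rows of (r, g, b) tuples with 0-255 channels.
    hue_range_deg is (low, high) in degrees; low > high wraps through 0 (e.g. red).
    """
    low, high = hue_range_deg
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue = h * 360
            if low <= high:
                hue_ok = low <= hue <= high
            else:
                hue_ok = hue >= low or hue <= high
            mask_row.append(1 if hue_ok and s >= min_sat and v >= min_val else 0)
        mask.append(mask_row)
    return mask


def bounding_box(mask):
    """(top, left, bottom, right) of the marked pixels, or None if none are marked."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, m in enumerate(row) if m]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


WHITE, RED = (255, 255, 255), (230, 20, 20)
page = [[WHITE] * 5 for _ in range(4)]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    page[y][x] = RED
print(bounding_box(colour_mask(page, (340, 20), min_sat=0.5, min_val=0.3)))  # (1, 1, 2, 2)
```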

Synthesized voice in Mandarin Chinese
These blocks are usually tilted relative to the original score. By computing the warping parameters of these tilted quadrilateral images [47], the distorted blocks can be corrected. The blocks containing numerical symbols are converted to greyscale so that the symbols can be separated from the background [48] and the influence of lighting and noise is reduced. The height, width and pixel density of each symbol are checked to determine whether it is a score number or a Pinyin character. Finally, the captured image blocks are checked against music-theory rules, and the score symbol blocks and Pinyin blocks are passed back to the recognition process to build the corresponding coding database, which stores sample images of notation and Pinyin. This approach achieves a recognition rate of at least 96%, allowing theatrical humanoid robots to interpret a score quickly.
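The paper cites [47] for the warping parameters without giving the formulation. One standard way to correct a tilted quadrilateral is a perspective transform (homography) estimated from its four corners, sketched below; the corner coordinates are hypothetical, and the corners are assumed to have already been found from the detected boundary frame.

```python
def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 perspective transform H (with h33 = 1) mapping four src corners to four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h, x, y):
    """Apply H to one point, including the perspective division."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical tilted block corners (pixels), ordered like the upright target
tilted = [(12, 20), (212, 35), (205, 95), (8, 80)]
upright = [(0, 0), (200, 0), (200, 60), (0, 60)]
H = homography(tilted, upright)
print([tuple(round(c, 3) for c in warp_point(H, x, y)) for x, y in tilted])
```

To rectify a whole block, one would normally compute H from the upright rectangle to the tilted quadrilateral and, for every output pixel, sample the source image at the warped location (inverse mapping), so that the corrected block has no holes.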
4.2 Singing voice synthesis
Many techniques for synthesizing music signals are known; this study uses the Harmonic plus Noise Model (HNM) specifically to synthesize Mandarin singing voices. HNM was first proposed by Stylianou [49, 50] and was further adapted in this study. It offers better signal clarity and more natural-sounding voices. More importantly, each of the 408 Mandarin Chinese syllables needs to be recorded only once. Voice synthesis follows notation and lyric recognition. A desired syllable is first divided into a series of durations for its constituent phones, and a piecewise linear time-mapping function for each segment is constructed from the original recording and the phone durations of the desired syllable. By mapping control points on the timeline of the desired syllable to the recorded syllable, two sound frames are located, and time interpolation between them yields the HNM parameters. To keep the musical qualities consistent, the syllable's pitch contour is interpolated to obtain the HNM parameters at the control points. Once the HNM parameters of all control points of the desired syllable are determined, the HNM synthesis equation is reformulated accordingly. Finally, the syllable is synthesized at other pitches by modifying the height of its pitch contour. The advantage of this approach is that each syllable needs to be recorded only once to synthesize multiple prosodic features, and signal degradation in the singing voice is hardly noticeable to audiences.
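The time-mapping and interpolation steps can be sketched as below. This covers only the bookkeeping around HNM, not HNM analysis or synthesis itself; the frame parameter vectors stand in for whatever HNM parameters (for example harmonic amplitudes) are stored per analysis frame, and all durations and values are hypothetical. Pitch shifting by equal-tempered semitones is an assumption the paper does not state.

```python
from bisect import bisect_right


def cumulative(durations):
    """Phone boundaries [0, d1, d1 + d2, ...] for a list of phone durations (s)."""
    bounds = [0.0]
    for d in durations:
        if d <= 0:
            raise ValueError("phone durations must be positive")
        bounds.append(bounds[-1] + d)
    return bounds


def map_time(t, desired_durations, recorded_durations):
    """Piecewise linear map from a time in the desired syllable to the recording.

    Each phone of the desired syllable is stretched or compressed linearly onto
    the same phone of the recorded syllable.
    """
    tb = cumulative(desired_durations)
    rb = cumulative(recorded_durations)
    if not 0.0 <= t <= tb[-1]:
        raise ValueError("t lies outside the desired syllable")
    i = min(bisect_right(tb, t) - 1, len(desired_durations) - 1)
    fraction = (t - tb[i]) / (tb[i + 1] - tb[i])
    return rb[i] + fraction * (rb[i + 1] - rb[i])


def interpolate_frames(t, frame_times, frames):
    """Linear interpolation between the two analysis frames surrounding time t."""
    j = max(0, min(bisect_right(frame_times, t) - 1, len(frame_times) - 2))
    t0, t1 = frame_times[j], frame_times[j + 1]
    w = (t - t0) / (t1 - t0)
    return [(1 - w) * p + w * q for p, q in zip(frames[j], frames[j + 1])]


def shift_pitch(f0_contour, semitones):
    """Raise or lower a pitch contour (Hz) by a number of equal-tempered semitones."""
    ratio = 2 ** (semitones / 12)
    return [f * ratio for f in f0_contour]


# Hypothetical syllable: a 0.1 s initial and a vowel stretched to 0.4 s for a long
# note, recorded as 0.08 s and 0.2 s.
source_time = map_time(0.3, [0.1, 0.4], [0.08, 0.2])
print(round(source_time, 6))  # 0.18
```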
5. Arm and human face portrait
For theatrical humanoid robots, dexterous arms are indispensable for enhancing performances. Since 2008, a number of 7-degree-of-freedom (7-DOF) robot arms and 8-DOF robotic hands have therefore been developed at the size of a human arm to overcome the shortcomings of traditional robot arms. The new arms have a 2-DOF wrist, a 2-DOF elbow and a 3-DOF shoulder, so their seven joints let them move like a human arm. In the new hand, each finger can bend independently; apart from the thumb, each finger is bent by its own actuator driving a linkage, which produces the gripping motion. Two actuators bend the thumb and fold it toward the palm, giving the hand a firm grasp. Fig. 7 shows the robot arm in different poses and its ability to hold objects of different shapes.

New type of robot arm and its grasping ability
5.1 Human portrait generation system
To diversify the robots' onstage performances, a human portrait generation system has been developed and integrated with the robot arms so that the robots can sketch on the spot. The system captures a face image through the cameras in the robot head and converts it into a wire-frame-style picture for a robot arm to draw. The simplified portrait shortens the drawing process, which suits stage performance under real-time constraints. In the robot drawing system, the pixels of the portrait are automatically converted into relative coordinates, and the actuators of the robot arm are controlled by PI controllers to draw facial portraits and signs in real time. Only one arm is used for drawing. The drawing area measures 450 mm (x) × 600 mm (y), with a displacement resolution of 1 mm along each axis, which allows the robot to produce accurate, high-quality portraits. Illumination quality is also very important for the drawing system's vision: after repeated illuminance measurements, images captured at 1–10 lux were found to be ideal for the robot arm to draw facial portraits.
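The paper states only that portrait pixels are converted into relative coordinates within a 450 mm × 600 mm drawing area at 1 mm resolution. A minimal sketch of such a conversion, assuming uniform scaling that preserves the portrait's aspect ratio, centring on the page, and image and page axes pointing the same way, is:

```python
def pixels_to_workspace(points, image_size, workspace_mm=(450, 600)):
    """Map portrait pixel coordinates to drawing coordinates on a 1 mm grid.

    The image is scaled uniformly (aspect ratio preserved) to fit the workspace,
    centred, and every coordinate is rounded to the nearest millimetre.
    """
    img_w, img_h = image_size
    ws_x, ws_y = workspace_mm
    scale = min(ws_x / img_w, ws_y / img_h)  # mm per pixel
    offset_x = (ws_x - img_w * scale) / 2
    offset_y = (ws_y - img_h * scale) / 2
    return [(round(offset_x + px * scale), round(offset_y + py * scale))
            for px, py in points]


print(pixels_to_workspace([(0, 0), (100, 200), (300, 400)], image_size=(300, 400)))
# [(0, 0), (150, 300), (450, 600)]
```

Rounding to whole millimetres matches the stated 1 mm resolution; consecutive points that round to the same cell could be dropped to avoid redundant arm motions.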
Fig. 8 illustrates the process and techniques used in the portrait generation system. First, the location of the human face is detected by the face detection system [51], and the captured colour image is converted into a greyscale image. The Centre-Off algorithm [52, 53] is then applied piece by piece over small image patches to extract the facial profile and hairline. Noise and unnecessary points are removed with the Median method and a Noise-Removal technique. This yields a clean face-profile image, which improves the overall drawing quality and makes the image easier for the robot arm to draw. The facial details of this profile image are then processed further.
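The paper names the Median method without giving the window size. A minimal sketch of median filtering on a binary edge image, assuming a 3 × 3 window (radius 1) and windows clipped at the image border, shows how isolated noise pixels are removed while solid strokes survive:

```python
def median_filter(image, radius=1):
    """Median-filter a 2D list of pixel values with a (2r+1) x (2r+1) window.

    Near the border the window is clipped to the image; for even-sized windows
    the upper median is taken.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = sorted(
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            )
            row.append(window[len(window) // 2])
        result.append(row)
    return result


edges = [[0] * 7 for _ in range(5)]
edges[0][6] = 1                 # isolated noise pixel
for row in edges[1:4]:          # a three-pixel-wide vertical stroke
    row[1:4] = [1, 1, 1]
cleaned = median_filter(edges)
print(cleaned[0][6], cleaned[2][2])  # 0 1
```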

Flowchart of the image processing algorithm
Using facial geometric characteristics, the system locates the facial features from their proportions on the face, and facial feature image processing [54] is applied to the eyebrows, eyes, nose and mouth. The hair region is detected by scanning the original colour image with an image binarization technique, and the main facial profile and outer hair contours are refined and simplified into a small set of points to shorten the drawing time. All coordinate data from the image are then passed to the arm motion control system to draw the portrait.
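The paper does not name the method used to simplify the contours into point data. As an illustration only, the sketch below pairs a fixed-threshold binarization (the threshold is hypothetical; dark hair pixels become 1) with Ramer–Douglas–Peucker simplification, a standard technique swapped in here, which keeps only the points that deviate from a straight chord by more than a tolerance.

```python
import math


def binarize(grey, threshold=100):
    """1 for dark pixels (e.g. hair), 0 otherwise; grey is a 2D list of 0-255 values."""
    return [[1 if value < threshold else 0 for value in row] for row in grey]


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker: drop points closer than epsilon to the chord between kept points."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    chord = math.hypot(x2 - x1, y2 - y1)

    def distance(p):
        px, py = p
        if chord == 0:
            return math.hypot(px - x1, py - y1)
        return abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / chord

    index, farthest = max(
        ((i, distance(p)) for i, p in enumerate(points[1:-1], start=1)),
        key=lambda item: item[1],
    )
    if farthest <= epsilon:
        return [points[0], points[-1]]
    left = simplify(points[: index + 1], epsilon)
    right = simplify(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves; keep it once


contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7.1), (6, 8)]
print(simplify(contour, epsilon=0.5))
```

A larger epsilon removes more points and shortens drawing time at the cost of coarser contours.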
Image processing yields 2D coordinate data for the captured portrait, from which the 3D coordinates and drawing paths for the robot arm are calculated. The drawing result is shown in Fig. 9, which compares the output of each stage: the image captured by the cameras, the wire frame generated by the portrait generation system and the actual drawing by the robot arm. The results show that this real-time portrait generation system produces simplified contour lines that resemble the actual faces, and that the robot arm can successfully draw recognizable portraits.
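The paper does not describe how the 2D contour data become 3D arm targets. One simple possibility, sketched below, is to treat the paper as a horizontal plane at height `paper_z` in the arm's base frame, draw each contour as a pen-down stroke, and lift the pen by a hypothetical clearance between strokes. Converting these Cartesian waypoints into the seven joint angles (inverse kinematics) is a separate step not shown here.

```python
def strokes_to_waypoints(strokes, paper_z=0.0, lift_mm=15.0):
    """Convert 2D strokes (lists of (x, y) in mm) into 3D pen waypoints.

    Each stroke is approached from above, drawn at paper height, and left by
    lifting the pen, so the arm never drags the pen between strokes.
    """
    hover_z = paper_z + lift_mm
    waypoints = []
    for stroke in strokes:
        if not stroke:
            continue
        start_x, start_y = stroke[0]
        end_x, end_y = stroke[-1]
        waypoints.append((start_x, start_y, hover_z))          # move above the start
        waypoints.extend((x, y, paper_z) for x, y in stroke)   # pen down and draw
        waypoints.append((end_x, end_y, hover_z))              # pen up
    return waypoints


for point in strokes_to_waypoints([[(10, 10), (40, 10)], [(25, 30)]]):
    print(point)
```

Ordering the strokes so that each starts near the end of the previous one (for example, greedily by nearest start point) would shorten pen-up travel.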

Drawing of human portrait by the theatric humanoid robot
6. Body, dancing and Marionette operating
The theatrical humanoid robot is 160 cm tall and weighs 72 kg. Apart from the head and the hands, the other body parts and their joints must also be capable of motion. The motion DOFs of each body part of the humanoid robots that performed onstage in 2008 are listed in Table 2. The most distinctive mechanical features of these robots are as follows: first, the waist DOF allows the upper body to roll from left to right; second, the 3-DOF hip joints make the robots look realistic when imitating human walking; third, the robots can walk a short distance. This torso design has also been applied to the newly developed theatrical panda robot, shown in Fig. 10.
DOFs for the theatric humanoid robot

Theatric robots, Thomas, Janet and Panda
6.1 Control system of the theatrical humanoid
The mechatronic system of the theatrical humanoid robots is built on Digital Signal Processors (DSPs). As shown in Fig. 11, its architecture has three layers. The bottom layer (Level 3) is the joint-motor motion control module, which performs PID control, generates PWM signals and drives the joint motors through full-bridge switching amplifier circuits to control the speed and position of the robot's joints. A TMS320F2812 DSP serves as the joint-motor motion control module for the joints of both legs, and each module can control three sets of joint motors simultaneously: its three PWM signals, after amplification by the full-bridge switching circuits, drive three sets of leg motors.
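The paper gives no details of the PID implementation on the TMS320F2812. The sketch below, written in Python for readability although DSP firmware would normally be written in C, shows one common structure: a discrete PID position loop with a simple anti-windup rule, whose signed output is split into a duty-cycle magnitude and a direction bit for the full bridge. The gains, control period and duty limit are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class JointPid:
    """Discrete PID position loop producing a PWM command for a full-bridge driver.

    update() returns (duty, forward): duty is the PWM duty cycle in [0, duty_limit]
    and forward selects which diagonal pair of bridge switches conducts.
    """
    kp: float
    ki: float
    kd: float
    dt: float                 # control period (s)
    duty_limit: float = 0.95  # keep some margin below 100 % duty
    integral: float = 0.0
    prev_error: float | None = None

    def update(self, target: float, measured: float) -> tuple[float, bool]:
        error = target - measured
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        trial_integral = self.integral + error * self.dt
        u = self.kp * error + self.ki * trial_integral + self.kd * derivative
        if abs(u) > self.duty_limit:
            u = math.copysign(self.duty_limit, u)  # saturate and freeze the integral (anti-windup)
        else:
            self.integral = trial_integral
        return abs(u), u >= 0


pid = JointPid(kp=0.01, ki=0.0, kd=0.0, dt=0.001)
print(pid.update(target=100, measured=50))   # (0.5, True)
print(pid.update(target=0, measured=200))    # (0.95, False)
```

On the DSP, the duty value would be written to the PWM hardware once per control period.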

The system architecture diagram of the theatrical humanoid robot
6.2 Robotic motion mapping system
For theatrical performances, humanoid robots need a system for rapidly compiling their onstage motions, and this compiling process should also apply to other robots with similar architectures so that unnecessary steps in controlling them are eliminated. A motion recording mechanism was therefore developed. With this mechanism, human motions captured by motion-capture applications are mapped onto the corresponding robot joints to generate motion commands for the robots. Humanoid Animation (H-Anim) [55] is adopted as the file format for constructing the 3D human body model.
H-Anim is used so that the motion animation of 3D humanoid characters can be as complex and flexible as human motion. H-Anim is an ISO standard: a virtual human skeletal frame recorded in an H-Anim-compatible format can be used in other virtual humanoid model editing applications and in other virtual humanoid models that comply with the standard. Because the humanoid robot needs enough joints and DOFs to match the human skeletal frame and pose in sophisticated ways, and because the placement of its joints must be taken into account, Level of Articulation 2 (LOA2) of H-Anim is adopted in this study to record human skeletal frame information, as shown in Fig. 12. We further added input fields to record H-Anim nodes and defined Humanoid-Robot-Root and Humanoid-Robot-Joint nodes to store data relevant to generating robot motions.
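The extra fields in the Humanoid-Robot-Joint nodes are not listed in the paper. A minimal sketch of what such a record might hold, assuming each robot motor follows one rotation component of one H-Anim joint with a mounting sign, an offset and joint limits, is shown below. The joint names `l_shoulder`, `l_elbow` and `l_knee` come from the H-Anim standard; every other field, channel and number is hypothetical, and zones whose joint counts differ between human and robot need the Structural Mapping Zone methods rather than this one-motor-per-joint table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RobotJoint:
    """Hypothetical robot-side record attached to one H-Anim joint."""
    hanim_name: str    # H-Anim joint name, e.g. "l_elbow"
    channel: int       # motor channel on the robot (hypothetical)
    axis: int          # rotation component the motor follows: 0=x, 1=y, 2=z
    sign: float        # +1 or -1 depending on how the motor is mounted
    offset_deg: float
    min_deg: float
    max_deg: float

    def command(self, rotation_deg: tuple[float, float, float]) -> float:
        """Robot joint angle for one H-Anim joint rotation, clamped to the joint limits."""
        angle = self.sign * rotation_deg[self.axis] + self.offset_deg
        return max(self.min_deg, min(self.max_deg, angle))


JOINT_TABLE = [
    RobotJoint("l_shoulder", 0, 0, 1.0, 0.0, -90.0, 180.0),
    RobotJoint("l_elbow", 1, 1, -1.0, 0.0, 0.0, 135.0),
    RobotJoint("l_knee", 2, 0, 1.0, 0.0, 0.0, 120.0),
]


def frame_to_commands(frame: dict[str, tuple[float, float, float]]) -> dict[int, float]:
    """Map one captured frame (H-Anim joint name -> rotation in degrees) to motor commands."""
    return {j.channel: j.command(frame[j.hanim_name])
            for j in JOINT_TABLE if j.hanim_name in frame}


print(frame_to_commands({"l_shoulder": (30.0, 0.0, 0.0), "l_elbow": (0.0, -200.0, 0.0)}))
```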

Structural mapping zone
To ensure that this motion conversion mechanism can be applied to all kinds of bipedal robots, and that the same motion database can still be used when limb length, joint placement and DOFs differ, the Structural Mapping Zone technique was proposed to match the same motion configuration to robots with different virtual skeletal frames. In this technique, all joints of a virtual skeletal frame are grouped into several joint mapping zones. A mapping zone belonging to the virtual human skeletal frame is called a Source Mapping Zone (SMZ), and one belonging to the virtual humanoid robot skeletal frame is called a Target Mapping Zone (TMZ). Each mapping zone contains one or more joints of the virtual human skeletal frame or of the virtual humanoid robot skeletal frame, and corresponding mapping zones produce the same or similar overall joint motion. Four mapping methods (1–1, 1–n, m–1 and m–n) can be derived for the mapping zones to generate the required motion data.
The 1–1 mapping method applies when the SMZ and the TMZ contain the same number of joints and each joint in the SMZ has a counterpart with the same DOFs in the TMZ, as shown in Fig. 13. If the terminal joint position is used as the reference for motion similarity, forward kinematics is used to calculate the terminal joint positions. Because, under the 1–1 method, limb length is essentially the only parameter that differs between the two mapping zones, the joint angles of all joints in the TMZ are first set equal to those in the SMZ and are then fine-tuned until the terminal joint position of each TMZ is close to that of the corresponding SMZ.
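A minimal sketch of the 1–1 method for a planar chain is shown below: forward kinematics gives the terminal position, the SMZ angles are copied to the TMZ, and the TMZ angles are then fine-tuned. The paper does not say how the fine-tuning is done; a greedy coordinate search with a shrinking step is used here as a stand-in, and the planar (2D) chain and limb lengths are simplifying assumptions. The search matches the SMZ terminal position directly; if the robot limbs were much shorter, that position could be out of reach and the search would only get as close as possible.

```python
import math


def forward_kinematics(lengths, angles):
    """Terminal position of a planar serial chain; angles are relative joint angles (rad)."""
    x = y = heading = 0.0
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


def one_to_one_mapping(src_lengths, src_angles, tgt_lengths,
                       step=0.05, tol=1e-4, max_iter=2000):
    """Copy SMZ joint angles to the TMZ, then fine-tune them so the TMZ terminal
    position approaches the SMZ terminal position (greedy coordinate search)."""
    goal = forward_kinematics(src_lengths, src_angles)
    angles = list(src_angles)  # step 1: same joint angles as the source zone

    def error(candidate):
        x, y = forward_kinematics(tgt_lengths, candidate)
        return math.hypot(x - goal[0], y - goal[1])

    best = error(angles)
    for _ in range(max_iter):
        if best < tol or step < 1e-6:
            break
        improved = False
        for i in range(len(angles)):
            for delta in (step, -step):
                trial = angles[:]
                trial[i] += delta
                e = error(trial)
                if e < best:
                    angles, best, improved = trial, e, True
        if not improved:
            step /= 2  # refine once no single-joint move helps
    return angles, best


# Hypothetical arm zone: human upper arm/forearm 300/250 mm, robot 290/245 mm
mapped, residual = one_to_one_mapping([300.0, 250.0], [0.3, 0.6], [290.0, 245.0])
print([round(a, 3) for a in mapped], round(residual, 4))
```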

One-to-one mapping method from the SMZ (A Block) to the TMZ (B Block)
Figs. 14 and 15 show the dance and marionette performances of the theatrical humanoid robots onstage in 2008. The dance was performed to music by both robots and human dancers, and the marionette show was performed by two humanoid robots. Robot arms were used to manipulate the marionettes as part of the show in

Performance of

Marionettes manipulated by theatrical humanoid robots
7. Performance evaluation
To assess whether the programs were appropriately designed, whether the performance of the theatrical humanoid robots was acceptable to audiences, and whether the robot technologies chosen for the performance were of a high level, a questionnaire was designed specifically for this study. All items were closed-ended, and the survey was conducted online. Respondents watched a short online version [15] of the humanoid robot performance instead of attending the actual performance. This approach makes the survey cost- and time-efficient and frees it from constraints of time, geographic location and national boundaries.
The questionnaire was posted on an online community website for one week, during which 246 questionnaires were collected. After 51 invalid questionnaires (those with incomplete respondent profiles, unanswered questions, or repetitive or contradictory answers) were eliminated, 195 valid questionnaires remained. The gender and age distribution of the respondents is listed in Table 3. The questions were divided into four categories: program appropriateness, representative capabilities of the performance robots, technical level required for the programs, and development potential of theatrical humanoid robots, as shown in Tables 4–7. Table 4 shows that over two-thirds of respondents answered each of the seven questions positively, indicating that the program plan co-designed by specialists in robotics and theatrical performing arts is reasonable, matches audience expectations, and is appropriate for humanoid performers. Table 5 lists the questions assessing how audiences view the representative abilities of the theatrical humanoid robots. The items marked in red are those that fewer than half of the respondents answered positively. These results indicate that the research team needs to improve the robots' mouth shapes when they speak or sing onstage, as well as the smoothness and speed of their body movements.
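The screening and threshold arithmetic behind these figures can be checked directly. The sketch assumes that all percentages and fractions are taken over the 195 valid questionnaires; under that assumption, "over two-thirds" means at least 131 respondents, "more than half" means at least 98, and the 6.67% negative rating reported in the discussion corresponds to 13 respondents (13/195 = 1/15), the only whole number of respondents that rounds to that percentage.

```python
import math
from fractions import Fraction

COLLECTED = 246
INVALID = 51
VALID = COLLECTED - INVALID  # questionnaires kept for analysis


def more_than(share: Fraction, n: int) -> int:
    """Smallest whole number of respondents that is strictly more than share * n."""
    return math.floor(share * n) + 1


def percent(count: int, n: int) -> float:
    """Percentage of n, rounded to two decimals as in the text."""
    return round(100 * count / n, 2)


print(VALID)                              # 195
print(more_than(Fraction(2, 3), VALID))   # 131
print(more_than(Fraction(1, 2), VALID))   # 98
print(percent(13, VALID))                 # 6.67
```

Using `Fraction` keeps the two-thirds threshold exact, so 2/3 of 195 is exactly 130 and "over two-thirds" starts at 131.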
Respondent gender/age distribution (only valid online-questionnaire samples)
Appropriateness of every program planned for the humanoid robot
Representative capability of the robot performance in every planned program
Technical level of the robot performance in every planned program
Expansibility of the theatric humanoid robot
Similar to the assessment of representative capability in Table 5, Table 6 shows that more than half of the audience were not satisfied with how closely the robots' facial and limb movements resembled those of human beings. When robots imitate human performance in motion, sound, facial expression and mouth shape, audiences judged the technology to be rather immature. Even though respondents in Table 5 rated some of the technologies used by the robots highly, they were influenced by the psychological factors described in
This also shows that audiences care a great deal about details beyond the robots' abilities, such as how closely their appearance, motions and sounds resemble those of humans. Overall, more than half of the respondents rated the robots' technologies positively, and only 6.67% rated them negatively. In other words, respondents approved of the research team's ability to develop technologies for theatrical humanoid robots and to achieve the study's objectives, although the shortcomings identified still need further improvement.
Finally, the results in Table 7 suggest several directions for future research on theatrical humanoid robots. Regarding appearance, about two-thirds of respondents believe that robots should still resemble human beings in appearance and motion. Regarding performances, more than two-thirds of respondents indicated that robot theatre should be performed by both robots and human performers to make it more entertaining for audiences. Regarding commercial development, nearly half of the respondents indicated that they might consider paying to watch robot performances, while about one-third were undecided. These undecided respondents may be potential customers, depending on the kind of entertainment that robot performances can offer. This result suggests that robot entertainment may become commercially viable and offers great potential for further development.
8. Conclusion
The theatrical humanoid robots developed in this study participated in a successful public performance in 2008. Most questionnaire respondents approved of the entertainment value of the robot performance and were positive about its future potential. An important purpose of developing theatrical humanoid robots is for them to integrate into society and even to entertain people, so their interaction with humans should be strengthened in future applications and program designs. Moreover, until the efficiency and talents of theatrical robots mature, human performers should take part in robot performances, since this combination entertains audiences more effectively. In appearance, facial expressions, motions and sounds, the resemblance of robots to human beings still requires further development to truly meet customer expectations.
9. Acknowledgments
This research was financially funded by the National Science Council of the Republic of China (Taiwan) under grant numbers: NSC 94-2212-E-011-032, NSC 94-2218-E-011-012, NSC 95-2218-E-011-009, and NSC 96-2218-E-011-002. Their support made this research and the outcome possible.
