Abstract
Frontline services involving high social-emotional complexity remain a challenge for artificial intelligence, which lacks the empathetic and communicative capabilities these contexts demand. Hence, this article investigates how frontline services can be redesigned to address high social-emotional complexity through Multi-Human/Multi-Avatar (MH/MA) systems. The article addresses the central research question: How can AI’s limitations in delivering frontline services with high socio-emotional complexity be overcome through the implementation of MH/MA systems? It aims to examine how avatar robots can enhance inclusivity and emotional quality in service delivery. Using a qualitative case study of the DAWN Avatar Robot Cafe in Tokyo, Japan, we employed semi-structured interviews with 7 robotics experts and 17 customers, supplemented by direct observation and organizational documents. Our findings reveal that MH/MA architectures, where avatar robots are remotely operated by individuals with physical disabilities, effectively improve inclusivity and emotional engagement in service delivery. The DAWN Cafe serves as a good example of a setting where socially excluded individuals contribute to frontline services by remotely operating OriHime and OriHime-D robots. This case validates a conceptual framework that extends human-robot interaction models by introducing MH/MA systems as an alternative to AI-driven robotics. The study offers original insights into how inclusive service models can leverage avatars to overcome the current limitations of AI in emotionally demanding contexts. Future research should explore MH/MA+AI hybrids, develop targeted training for operators and staff, and examine long-term impacts on both service quality and social inclusion.
Keywords
Introduction
In recent years, Artificial Intelligence (AI) has introduced significant innovation into the global service industry. AI systems excel in executing routine, rule-based, and analytical tasks (Plathottam et al., 2023); however, they remain fundamentally limited in contexts requiring high levels of social and emotional interaction (Huang & Rust, 2018). These limitations are particularly evident in frontline services, where human empathy, communication, and adaptability are essential to customer satisfaction and service quality. Despite advancements in “feeling AI” and empathetic machine learning models, current AI technologies are unable to adequately replicate the socio-emotional intelligence embedded in human-to-human service interactions (Bock et al., 2020; Chi et al., 2020; Reis et al., 2020; Rosete et al., 2020).
This gap has led to alternative innovations that bypass rather than replace human emotional input. One such innovation is the use of avatar robots, physical robotic interfaces remotely operated by humans, which enable individuals, including those with severe mobility impairments, to engage in meaningful social and professional roles in the service sector. These systems, referred to as Multi-Human/Multi-Avatar (MH/MA) architectures, offer a promising response to both the technological limitations of AI and the need for greater inclusivity in labor participation.
However, academic literature lacks comprehensive conceptual frameworks and empirical analyses that address MH/MA systems in service delivery. Most research has focused either on AI-human interaction or on robotic automation, overlooking hybrid models where human agency is mediated through technological embodiments. Furthermore, although countries like Japan have pioneered real-world implementations of avatar-mediated services, these cases have not been systematically examined within the broader discourse on AI limitations and inclusive service design. Hence, to address this gap in the literature, we focused on pioneering Japanese companies that have implemented Robot Avatars (Ishiguro, 2021). These avatars provide personalized and empathetic services that current AI cannot achieve.
To ground our investigation, we conducted a targeted bibliometric overview using Elsevier Scopus (June 2024), searching for documents with “Avatar” and “Robot*” in their title, abstract, or keywords. The results indicate a marked increase in publications since 2006, with conference papers constituting the majority at 56%, followed by journal articles at 30%. Geographically, Japan leads with 233 publications, followed by the United States with 206 and Germany with 105. These figures underline the growing research activity connecting avatars and robotics, particularly in these leading countries, and show the relevance of Japan as a context for examining the deployment of MH/MA systems in service environments.
This study addresses the following question: How can AI’s limitations in delivering frontline services with high socio-emotional complexity be overcome through the implementation of MH/MA systems?
Accordingly, this article aims to develop and empirically validate a conceptual framework that explains how avatar-mediated service models can expand the frontier of inclusive and emotionally intelligent frontline services. Using a qualitative case study of the DAWN Avatar Robot Cafe in Tokyo, we explore the architectural, operational, and social dimensions of MH/MA service systems, offering theoretical and practical insights into their potential and limitations. We conclude our work with suggestions for future research, including the analysis of customer experiences and satisfaction, the development of training programs for MH/MA systems, and the assessment of long-term impacts on various stakeholders.
This article follows a structure widely accepted in the academic community, organizing the content into the following sections: Introduction, Literature Review, Methodology, Results, and Conclusion. We begin by outlining the research objectives and the research question. Next, we conduct a thorough analysis of the existing literature, which forms the basis for our theoretical framework. The methodology section provides a detailed description of the methods used, ensuring clarity for the reader. We then present the research results, including the empirical validation of the conceptual framework through the analysis of a real-life case. Finally, in the conclusion section, we highlight the contributions of the study, examining its originality and novel additions to existing theory. Additionally, we discuss the practical implications, limitations, and potential avenues for future research.
Literature Review
As previously noted, despite the progress in AI, its application in frontline services remains limited due to its inability to replicate human empathy, emotional intelligence, and adaptability. These limitations are particularly relevant in service environments that require rich social and emotional interactions, where AI systems, despite recent advances in “feeling AI,” still fall short. Although the bibliometric analysis presented in the introduction highlights a sharp rise in academic interest, particularly in Japan, this growing attention has yet to produce clear answers to critical questions facing the service industry. This gap forms the basis for our study’s objective, which is to develop and empirically validate a conceptual framework that explains how MH/MA systems can address the socio-emotional limitations of AI while promoting inclusivity in frontline service environments. To that end, the following literature review lays the foundation for this investigation. Overall, our research explores how the implementation of MH/MA systems can overcome AI’s limitations in delivering frontline services that require high levels of emotional and social complexity.
While much of the literature has focused on AI-human interaction and robotic automation, there remains a lack of conceptual frameworks and empirical studies exploring hybrid service models in which human agency is mediated through avatars. In particular, Multi-Human/Multi-Avatar (MH/MA) systems, where remote human operators control avatar robots, have not been systematically examined as a viable alternative to AI in emotionally complex frontline services.
In the context of human-robot relationships, Reis (2024) introduces a conceptual framework for human-human, human-robot, and multi-robot relations within the service industry. This model details the dynamics of collaboration across these three relationship types. In this regard, Figure 1 emphasizes three primary levels of relationships. Firstly, Human-Human Relations (HHR) focuses on traditional dynamics where humans share task-related information through physical interaction. Secondly, Human-Robot Teams (HRT) encompass various forms of relationships, including Human-Robot Collaboration (HRC), Human-Robot Interaction (HRI), and Human-Robot Interaction and Collaboration (HRI-C). HRI includes cooperation, collaboration, and coexistence, with coexistence involving minimal interaction aimed at preventing impediments and collisions. HRC emphasizes collaborative robots (i.e., cobots) that work alongside humans, adjusting their actions in real time based on human movements. Lastly, Multi-Robot Systems (MRS) comprise Robot-Robot Interaction (RRI), Robot-Robot Collaboration (RRC), and Robot-Robot Interaction and Collaboration (RRI-C), where robots interact and collaborate to perform tasks more efficiently and reliably.

Figure 1. Conceptual framework for human-human, human-robot, and multi-robot relations in the service industry context.
The conceptual framework presents a multilevel structure to categorize existing relationships in the service industry. Level 1 involves strictly human relationships, Level 2 encompasses human-robot relationships, and Level 3 comprises robotic systems working jointly. The author examines the case of Henn na Cafe in Japan to illustrate the practical application of this theoretical framework. Known for its pioneering use of robotics, Henn na Cafe demonstrates how robotic systems can enhance operational efficiency and customer engagement. The findings indicate that current implementations at Henn na Cafe primarily employ Human-Robot Interaction and Collaboration (HRI-C). The study anticipates that service companies will further integrate AI into their operations, potentially adopting advanced Human-Robot Interaction and Collaboration (HRI-C), Robot-Robot Interaction and Collaboration (RRI-C), or Multi-human/Multi-Robot Systems to improve service delivery and efficiency.
Overall, Reis (2024) provides a three-level framework for analyzing human-human, human-robot, and robot-robot relationships. However, the model does not capture emergent service architectures where human empathy is preserved through avatar mediation rather than replaced by automation. This underscores the relevance of our study, which addresses this theoretical omission by extending the model to a fourth level, Multi-Human/Multi-Avatar (MH/MA) systems, where multiple remote human operators engage in emotionally complex service delivery through robotic avatars. Moreover, this configuration is not only conceptually distinct but has been realized in practice at the DAWN Avatar Robot Cafe in Tokyo. As such, our framework builds on Reis’ structure but introduces a new category that bridges the gap between full automation and human empathy. This extension is empirically validated in our findings and illustrated in the updated framework (Figure 3).
Huang and Rust (2018) propose a framework categorizing intelligence in the service industry into four types: mechanical, analytical, intuitive, and empathetic. They introduce a model illustrating how AI gradually replaces human workers across these intelligence types through five phases. This phased progression reflects the evolving capabilities of AI, beginning with mechanical tasks and potentially culminating in the replacement of empathetic tasks. The study underlines the shifting landscape of job requirements and emphasizes the necessity for workers to adapt by acquiring higher-level skills that are more challenging for AI to replicate. The ultimate theoretical endpoint is the complete replacement of all professional categories by AI.
This sequential replacement of jobs by AI starts with the most routine and mechanical tasks, advancing toward those necessitating higher levels of emotional intelligence. In this context, Reis et al. (2020) reveal that in environments with high customer interaction, service robots outperform humans in executing standardized tasks due to their mechanical and analytical capabilities. However, Reis et al. (2020) also acknowledge that, in some instances, service robots have not yet achieved the technological maturity required to replace humans efficiently. This indicates that while current technology is not fully capable, future advancements in AI-enabled robotic technologies will likely enable the replacement of tasks requiring empathetic intelligence.
A year later, building on the theoretical frameworks proposed by Huang and Rust (2018) and Huang et al. (2019), Huang and Rust (2021) introduced a conceptual model delineating three distinct types of AI (Figure 2): mechanical, thinking, and feeling AI. This tripartite classification suggests that different AIs are tailored to perform specialized tasks within customer engagement services. Specifically, mechanical AI is optimized for simple, standardized, repetitive, and routine operations. In contrast, thinking AI is designed to handle complex, systematic, rules-based, and well-defined tasks. Lastly, feeling AI is developed for tasks requiring social, emotional, communicative, and interactive competencies. According to Huang and Rust (2021), progression in AI development is cumulative. As AI technology advances to higher levels of sophistication, it inherently retains the capabilities of lower-level intelligence.

Figure 2. Three types of AI and their benefits to service.
Hence, Huang and Rust’s (2021) conceptual model offers a framework for analyzing the progression of AI developments, including advancements to higher levels of intelligence. This model (Figure 2) also paves the way for training AI robots to perform tasks requiring high socio-emotional complexity, which are typically better executed by humans today. Since it is not yet feasible to implement RRI-C systems at level 3 (Figure 1), we have identified an alternative to multi-human/multi-robot (MH/MR) systems. In this alternative, robots function as avatars rather than being AI-enabled. Thus, we refer to these as MH/MA (multi-human/multi-avatar) systems instead of the traditional MH/MR or MH/MR+AI systems (Figure 3). Our empirical observations at the DAWN Cafe confirm the absence of actual “feeling AI” in practice. Mechanical AI functions, such as repetitive motion and basic service tasks, are visible in the use of industrial robots like Nextage. Thinking AI capabilities, involving rule-based coordination and task management, remain partially supported by the robots’ operating systems but are limited in scope. However, tasks requiring empathy, adaptive social behavior, and nuanced emotional communication, those aligned with “feeling AI,” are not performed by any autonomous system at the DAWN Cafe. Instead, these tasks are carried out by remote human pilots operating OriHime and OriHime-D robots. In this sense, the MH/MA model represents a human-mediated workaround to the current technological gap in emotional intelligence, enabling high-quality service interactions without requiring emotionally capable AI. Thus, the DAWN Cafe reflects the limitations of Huang and Rust’s model in real-world service settings and illustrates an alternative trajectory where human empathy is scaled through avatar mediation rather than artificial replication.

Figure 3. Conceptual framework for frontline services: from single human to intelligent multi-robot systems (IMRS).
Takeuchi et al. (2020) provide a compelling analysis of MH/MA by examining “avatar work” in teleworking. This innovative approach enables individuals with severely reduced mobility to engage in complex physical tasks, such as customer service, which require high levels of socio-emotional interaction. Beyond its commercial applications, MH/MA fosters a more inclusive society. In avatar work, operators can remotely perform physical tasks by controlling an “OriHime-D” robot using either a mouse or gaze, tailored to their specific disabilities. To promote social integration, the research team conducted a 2-week trial of an avatar robot cafe, evaluating the remote employment of people with disabilities using the OriHime-D. Interviews with 10 participants revealed that avatar work contributes to mental fulfillment for individuals with disabilities and can be adapted to accommodate varying workloads, confirming its potential for enhancing inclusivity and well-being. In the same vein, Hatada et al. (2024) examined whether robotic avatars and telepresence technology enable individuals with disabilities to perform physical work. The researchers highlight that despite the growing interest in the metaverse, there is a scarcity of studies investigating the use of avatars and virtual environments by people with disabilities. In their study, seven participants with disabilities worked in a cafe where remote customer service was provided via robotic avatars. These participants were involved in developing and using personalized virtual avatars, which were displayed on large screens in combination with robots in physical spaces, thereby creating a hybrid cyber-physical environment. This innovative approach aimed to enhance the inclusivity and practicality of telepresence technologies for individuals with disabilities.
Dafarra et al. (2024) recently introduced an advanced avatar system designed to enhance the embodiment of humanoid robots by human operators. Their work offers two primary contributions. Firstly, the authors demonstrate the iCub3 humanoid robot, which incorporates significant advancements accumulated over approximately 15 years of development, positioning it as a state-of-the-art robotic avatar. Secondly, they present a highly adaptable avatar system enabling humans to embody humanoid robots effectively. This system supports a wide range of functionalities, including locomotion, manipulation, voice, and facial expressions, while providing comprehensive sensory feedback across visual, auditory, tactile, weight, and touch modalities.
The above examples illustrate that Level 4 MH/MA (Multi-Human/Multi-Avatar) systems (Figure 3) are currently under development and hold substantial promise, particularly in developed countries (e.g., Japan, USA, Germany), where there is a higher acceptance and integration of such technologies in service sectors. This ongoing development reflects a broader trend toward more sophisticated human-robot interactions and collaborations, emphasizing the potential for robots to significantly augment human capabilities in various service-oriented tasks. As these technologies evolve, they are likely to become increasingly integrated into everyday service environments, driving efficiency and enhancing user experiences.
Taken together, the frameworks by Huang and Rust (2018, 2021) and Reis (2024) illustrate two distinct yet incomplete approaches to AI and robotics in service design. Huang and Rust propose a technologically deterministic trajectory in which AI evolves through successive stages, from mechanical to thinking to feeling intelligence, implying that full automation of human emotional labor is both inevitable and desirable. By contrast, Reis offers a more interactional view, focusing on how humans and robots collaborate within different service configurations, without assuming that AI will acquire emotional competence. However, both models overlook a critical middle ground: systems where emotional labor is not simulated by AI but mediated by remote human operators through robotic avatars. This oversight reveals a conceptual and empirical gap in the literature. Huang and Rust focus on AI substitution and Reis on interaction structure; neither accounts for MH/MA systems that scale human empathy via technology. Our study addresses this omission by introducing and empirically validating a conceptual framework that integrates this missing category, an architecture capable of delivering emotionally intelligent service without relying on emotionally intelligent machines. These strands converge in the empirical case of the DAWN Cafe, which demonstrates the real-world relevance of MH/MA systems and offers a testing ground for the framework proposed here. In the following paragraphs, we advance this conceptual framework, extending and adapting Reis (2024), to incorporate MH/MA systems as a fourth service architecture. This addition is not only structural but also theoretical, as it repositions the debate from AI capacity to human inclusion and from automation to mediated empathy.
Therefore, the conceptual framework presented below (Figure 3) summarizes the preceding discussion and presents frontline services across five distinct levels. The figure is an extended adaptation of the Reis (2024) model.
Level 1 of Figure 3 exclusively represents service delivery by individual human personnel, who excel in tasks requiring high social-emotional intelligence (Tajeddini et al., 2020). However, humans are less efficient than machines at performing standardized and routine tasks (Nardo et al., 2020). To ensure effective teamwork and meet customer needs and expectations, it is essential to develop group work and leadership skills for proper team interaction (Moldoveanu & Narayandas, 2019). This level diverges from the original model (Reis, 2024) by incorporating service delivery by individual humans, rather than solely by teams.
At Level 2, service delivery is conducted by teams of humans and robots, allowing for the development of collaboration, interaction, or both. Human-Robot Interaction and Collaboration (HRI-C) encompasses cooperation, collaboration, and coexistence (Galin & Meshcheryakov, 2020; Ore et al., 2020). In this context, coexistence involves minimal interaction, such as measures to prevent collisions (Magrini et al., 2020). Cooperation primarily entails individuals working collectively to produce a final product, with each participant often completing their assigned tasks independently before integrating their results. Conversely, collaboration involves a more complex process, requiring the exchange of knowledge and continuous interaction among participants to achieve a common goal.
Level 3 involves Multi-Robot Systems (MRS), which are divided into three sublevels. To explain Robot-Robot Interaction (RRI), it is helpful to draw a parallel with human-human interaction and with human-robot teams, in which humans work as a team with robots and share tasks (Wolf & Stock-Homburg, 2023); in RRI, robots similarly share tasks with one another. Robot-Robot Collaboration (RRC) involves complex relations in which robots perform well-defined tasks to achieve successful operations. This form of collaboration is typically highly efficient, as it eliminates human fatigue. Robot-Robot Interaction and Collaboration (RRI-C) refers to collaborative efforts and interactions between two or more robots working together to achieve a common goal or successfully perform a specific operation. This form of collaborative interaction involves communication, coordination, and joint action between multiple robots. The collaborative approach in RRI-C leverages each robot’s strengths, resulting in greater efficiency across various work environments. Essentially, RRI-C shows the ability of robots to work together seamlessly, demonstrating accuracy and consistency, while utilizing their unique capabilities to perform complex tasks or operations as needed.
Level 4 is divided into two sublevels: multi-human/multi-avatar (MH/MA) and multi-human/multi-robot with artificial intelligence (MH/MR+AI). The MH/MA sublevel involves humans operating avatar robots, which are typically used by individuals with reduced mobility (Bremner et al., 2016). Because a human operator controls them, these avatars can draw on advanced social and emotional capabilities, surpassing the limited interaction capacities of traditional robots. A more detailed analysis of this case will be presented in the results section. In the MH/MR+AI sublevel, teams of humans and robots can be augmented by AI to facilitate large-scale parallel operations. These systems demonstrate significant potential as they enable effective human-robot teamwork. Various configurations of this teamwork exist, including single-human–multi-robot, multi-human–single-robot, or multi-human–multi-robot systems (Dahiya et al., 2023). Nevertheless, a significant challenge lies in the difficulty humans encounter when supervising and interacting with multi-robot systems. To address this, hierarchical functions can be implemented to distribute critical tasks within the system. With the incorporation of AI, machines attain advanced capabilities, enabling them to operate with higher intelligence and thereby better meet customer needs and expectations.
Level 5 exclusively encompasses Intelligent Robots operating independently (ISR; Lee et al., 2008) or within Multi-robot Systems (IMRS; Choi & Kim, 2021). These robots must possess a degree of intelligence enabled by AI due to their role in delivering customer service. AI enhances the robots’ ability to interact meaningfully with customers, ensuring effective and satisfactory service delivery.
Overall, the conceptual framework developed above organizes frontline service configurations into five progressive levels. Transitions across these levels are driven by three key mechanisms: (a) the degree of technological augmentation (from no-tech to full AI autonomy); (b) the shifting distribution of agency between humans and machines (from human-centric to robot-led systems); and (c) the level of socio-emotional complexity the system can manage. Each transition, from individual human service (Level 1) to human-robot teams (Level 2), to multi-robot systems (Level 3), to hybrid MH/MA and MH/MR+AI systems (Level 4), and finally to fully autonomous intelligent robot systems (Level 5), shows an increase in automation, coordination complexity, and emotional abstraction. Importantly, in Level 4, we introduce an alternative trajectory that leverages human empathy via avatars rather than attempting to replicate it through AI. In our view, this shows a relevant divergence in the evolution of service automation.
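For readers who prefer a compact representation, the five levels and their sublevels described above can be sketched as a simple lookup structure. This is only an illustrative sketch: the class name, the function name, and the short level labels are our own shorthand rather than part of the framework, and Level 1 is left without sublevels because the text does not name any.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    number: int        # level in the conceptual framework (1 to 5)
    label: str         # shorthand label (hypothetical wording)
    sublevels: tuple   # abbreviations as used in the text


# Levels and sublevels as described in the literature review.
FRAMEWORK = (
    ServiceLevel(1, "Human service delivery", ()),
    ServiceLevel(2, "Human-robot teams", ("HRI", "HRC", "HRI-C")),
    ServiceLevel(3, "Multi-robot systems", ("RRI", "RRC", "RRI-C")),
    ServiceLevel(4, "Hybrid multi-human systems", ("MH/MA", "MH/MR+AI")),
    ServiceLevel(5, "Intelligent robots", ("ISR", "IMRS")),
)


def level_of(sublevel: str) -> int:
    """Return the framework level that contains the given sublevel."""
    for level in FRAMEWORK:
        if sublevel in level.sublevels:
            return level.number
    raise KeyError(sublevel)
```

Under these assumptions, `level_of("MH/MA")` places the DAWN Cafe configuration at Level 4, alongside MH/MR+AI.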
Methodology
This study employs a qualitative case study research approach, guided by the methodological framework of Yin (2009). We selected this strategy to investigate and describe a real-life phenomenon that is still little explored. Our case study focuses on the DAWN (Diverse Avatar Working Network) cafe, which offers specific details and facilitates an in-depth exploration of the phenomenon, allowing for partial validation of our conceptual framework derived from the literature review.
Our study comprises three distinct phases: exploratory, analytical, and conclusive. The exploratory phase began with a comprehensive search for scientific literature, primarily focusing on articles indexed in Elsevier Scopus as of June 2024. We used keywords such as “Avatar” and “Robot*” in titles, abstracts, and keywords. Significant contributions to the literature include studies by Takeuchi et al. (2020) and Hatada et al. (2024), which specifically examine the Avatar Robot Cafe, aiming to elucidate the performance of avatars in service delivery. Our research distinguishes itself by presenting a generalizable conceptual framework and providing empirical validation at level 4 of this framework.
The exploratory phase also involved planning and preparing the empirical component of the study. This included developing interview protocols and securing informed consent from participants. For data collection, we conducted interviews with academics specializing in robotics, capable of analyzing the phenomenon from diverse perspectives, and with customers of the Avatar Robot Cafe to gather firsthand testimonies. We interviewed 7 experts and 17 customers, with customer interviews lasting approximately 20 min each and expert interviews ranging from 40 to 60 min. All interviews were transcribed, and informed consent was obtained from all participants through signed statements. To ensure data triangulation and validate the primary data collected, we analyzed the organization’s official documentation available on its website (https://dawn2021.orylab.com/en/). Additionally, field notes were taken during interviews, including observations of interviewees’ behavior and informal conversations.
Experts were selected through purposive sampling based on their publication record, their academic profile, and their professional engagement in human-robot interaction or accessibility-focused design. We prioritized individuals with demonstrated expertise in analyzing robotics from technical and social perspectives. Customers were recruited through convenience sampling at the DAWN cafe site in Tokyo and online outreach facilitated by the organization. Inclusion criteria required participants to have experienced service at the Avatar Robot Cafe and to be willing to share insights into their experience. To enhance diversity, we aimed for variation in gender, nationality, and familiarity with robotics. The customer group included 9 females and 8 males, aged between 22 and 67 years, with diverse nationalities, 10 international visitors (primarily from Europe and North America), and 7 Japanese residents. Most customers had at least some familiarity with robotics, either through education, work, or prior exposure to robot cafes. Among the experts, five were university-affiliated researchers in human-robot interaction, one was a senior engineer at a robotics firm, and one worked in accessibility-focused design.
It is also important to note that cultural factors, particularly Japan’s relatively high social acceptance of robots in daily life, may influence the implementation and reception of MH/MA systems. Japan’s unique sociotechnical environment, which is characterized by strong cultural narratives around coexisting with machines, may shape customer expectations, interaction norms, and inclusivity discourses. This context may also limit the immediate transferability of findings to countries with different cultural attitudes toward technology and disability. However, we believe that our study further shows Japan’s relevance as a pioneering site for studying the social and emotional dynamics of avatar-mediated service encounters.
In the analytical phase, we processed a substantial volume of collected data, totaling 642 pages. The data analysis followed an a posteriori, inductive thematic approach, guided by principles of grounded theory. No pre-established coding scheme was used; instead, codes and themes emerged from the data. We began by rereading all interview transcripts and field notes to ensure deep familiarity with the material. Using NVivo 12, a computer-based qualitative data analysis software, we then applied open coding to isolate the most salient phrases and to identify recurring patterns and significant narratives mentioned by participants. These initial codes were grouped into broader themes and subthemes (categories and subcategories) through axial coding, organizing the data into cohesive clusters and enabling the construction of an interpretive framework grounded in lived experiences. NVivo 12 supported the systematic organization, management, and interpretation of the extensive dataset, which improved precision, minimized discrepancies, and helped ensure rigor and consistency throughout the analysis.
The conclusion phase of our study centered on the presentation and discussion of the findings of the data analysis. In this phase, we synthesized the key insights, highlighted emergent themes, and explored the implications of our findings within the broader context of the research question and existing literature. Furthermore, we critically assessed the validity and reliability of our conclusions, acknowledging the limitations and proposing avenues for further research. Through this concluding phase, we intend to contribute to the understanding of the phenomenon investigated and advance scholarly discourse in the field.
In summary, to ensure transparency and replicability, we followed a structured, step-by-step approach organized in the phases described above. As noted previously, participants were selected through purposive (experts) and convenience (customers) sampling, with attention to diversity of training and familiarity with robotics. We then developed customized semi-structured interview protocols and obtained written informed consent from all participants. Interviews were audio-recorded, transcribed verbatim, and enriched with field notes. Data were analyzed using an a posteriori inductive thematic coding strategy in NVivo 12. Open and axial coding were applied to develop themes based on the data. A second researcher reviewed a subset of codes to ensure consistency, and triangulation between interviews, field notes, and organizational documents increased reliability. An audit trail documented all analytical decisions to support reproducibility.
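For readers who wish to replicate the consistency check between coders, the following is a minimal sketch of one common agreement statistic, Cohen's kappa. It is offered as an illustration only: the study does not report which agreement measure, if any, was computed, and the function name and the example code labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same text segments.

    Illustrative helper only; the study does not state that kappa was used.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both coders used a single identical code throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to four interview segments.
first_coder = ["empathy", "empathy", "inclusion", "inclusion"]
second_coder = ["empathy", "inclusion", "inclusion", "inclusion"]
kappa = cohens_kappa(first_coder, second_coder)
```

A value near 1 would indicate strong agreement beyond chance, whereas a value near 0 would suggest that the codebook needs refinement before the remaining material is coded.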
Findings
This section is divided into three subsections. The first subsection focuses on the DAWN Cafe case, drawing parallels with similar studies such as those on Henn na Hotel (Reis et al., 2020) and Henn na Café (Reis, 2024). The second subsection presents an analysis of the DAWN Cafe, situating it within the conceptual framework outlined in the literature review. The final subsection offers a concise discussion of potential future directions, which currently remain speculative but may soon become tangible realities.
The Case of DAWN Avatar Robot Cafe, Japan
The Avatar Robot Cafe DAWN (Diverse Avatar Working Network) is an initiative by OryLab Inc. (https://orylab.com/en/) in Tokyo, Japan (Dawn, 2021). This establishment functions as a permanent experimental platform that enables individuals with limited mobility or other constraints to operate avatar robots remotely. By using robotics, the DAWN Cafe provides an inclusive environment where technology facilitates new forms of social participation and professional engagement for people with disabilities.
Currently, the DAWN Cafe utilizes three types of robots (Dawn, 2021). The first, OriHime, is a small avatar robot developed by OryLab Inc. It is designed to serve as a surrogate presence for individuals who cannot be physically present in-store. Controlled via the Internet, OriHime allows remote users to see, hear, and interact with others, fostering new forms of social engagement and enabling participation in environments they might otherwise be unable to access. Building on these capabilities, OriHime-D is a larger, anthropomorphic humanoid robot, also developed by OryLab Inc. OriHime-D is more advanced and capable of performing complex tasks, including serving food and drinks at the cafe. This robot also creates job opportunities for people with physical limitations, allowing them to work remotely in the service industry by controlling the robot and interacting with customers and staff. In addition to OriHime and OriHime-D, the DAWN Cafe features Nextage, a sophisticated industrial robot developed by Kawada Robotics Corporation (http://www.kawadarobot.co.jp/en/) in Tokyo. Nextage has two arms and is designed for high precision and adaptability in various tasks. At the cafe, Nextage handles tasks that require precision and consistency, such as preparing and serving food, thereby supporting the human and avatar staff. Through the integration of these robots, the DAWN Cafe exemplifies a forward-thinking approach to inclusivity and technology. This MH/MA combination aims to redefine social participation and workplace inclusivity through innovative technological solutions (Figure 4).

Figure 4. Avatar Robot Café DAWN: (A) OriHime. (B) OriHime-D. (C) OriHime Nextage.
Regarding the operational structure and customer experience, multiple OriHime units are available throughout the cafe, allowing customers with reservations to interact with a dedicated pilot who explains the menu and takes meal orders. Those without reservations can engage with OriHime pilots for shorter and more flexible interactions. An OriHime-D robot delivers drinks to customers, showing the integration of robotics into the dining experience. Table 1 compares and summarizes the robot types used at the DAWN Cafe.
Table 1. Comparative Analysis of Robot Types Used at DAWN Café.
The DAWN Cafe is divided into four distinct areas, each designed to maximize the potential of robotic technology for inclusive social and professional participation (Figure 5). The first area is the dining area, where remotely operated OriHime robots serve customers at their tables. OriHime pilots, who control these robots, offer an exclusive menu and facilitate engaging conversations. The second area is the bar counter. Here, OriHime pilots prepare and serve a selection of drinks, demonstrating the robots’ versatility in handling different types of beverages and providing personalized service. The third area is the barista zone, where trained OriHime pilots use the “OriHime Nextage” robot to prepare and serve coffee. Mika, who worked as a barista at a coffee shop before being diagnosed with Amyotrophic Lateral Sclerosis (ALS), now operates Nextage. Despite this diagnosis, Mika has successfully transitioned to operating the “Tele-Barista” system, which integrates OriHime with the Nextage robot. This system enables her to prepare and serve coffee remotely, highlighting the craftsmanship involved in coffee preparation. This stands in contrast to the widespread use of fully automated coffee vending machines or robots like Sawyer, which operate in a human-robot interaction-controlled (HRI-C) environment (Reis, 2024). The fourth and final area is the customer interaction zone. In this section, customers have the opportunity to operate an OriHime robot themselves, under the guidance of OriHime pilots. This hands-on experience allows customers to engage directly with the technology. The DAWN Cafe exemplifies how innovative technology can create inclusive opportunities for individuals with disabilities to participate remotely in social and professional activities, showing a revolutionary approach to inclusivity and technological advancement.

Figure 5. In-store template layout.
Upon entering the cafe, customers are greeted by a robot that asks about their reservation status and explains the cafe’s rules. Customers without reservations can experience the service by purchasing an OriHime PASS, which requires the payment of an admission fee. As shown in Figure 5, upon entry, customers encounter the cashier directly in front of them. To the right, they will find the OriHime Diner, designated for customers with reservations, as well as the bar and the tele-barista operated by the Nextage robot.
The concept of providing employment opportunities for individuals with disabilities or conditions that prevent them from working in person is groundbreaking and has received positive attention from customers. Customers frequently described feelings of surprise, empathy, and inspiration when interacting with OriHime pilots. Several noted the emotional resonance of “speaking to someone who was not physically present, yet very human,” highlighting how the pilots’ voices, personalized responses, and visible effort to engage contributed to a meaningful service experience. Others mentioned that seeing individuals with disabilities in active service roles challenged their assumptions and led to moments of reflection and admiration. However, a few customers expressed initial discomfort or confusion during their first interaction with the avatars, pointing to the need for clearer onboarding or interface guidance. These diverse emotional responses illustrate how avatar-mediated service encounters extend beyond functional service delivery, becoming moments of human connection and social learning. Some customers also reported difficulties in placing their orders through the OriHime robots, which required assistance from in-person staff. To improve the cafe’s operations and inclusivity, several recommendations have been proposed. There is a consensus among customers and experts that the cafe should train staff to facilitate better communication with the remotely operated robots. Interestingly, some customers remarked on the absence of Japanese clients during their visits, with most of the clientele being foreigners. This observation suggests that the cafe’s unique business model attracts a global audience that is curious about its inclusive and futuristic approach. However, the limited diversity of the menu and the absence of advanced AI systems may deter local customers, who might perceive the robotic systems as overly simplistic. Enhancing the cafe with AI-driven features, moving toward a Multi-Human/Multi-Robot+AI (MH/MR+AI) architecture, could make the experience more sophisticated, engaging, and appealing to a broader audience, including locals.
Overall, the cafe’s pioneering concept of integrating people with disabilities into the workforce through remote-operated robotics is commendable, according to both customers and experts. With strategic enhancements in language support, menu diversity, and technological diversity, the cafe can further its mission of inclusivity while attracting a wider range of customers. We do not believe that Level 5 applies to this cafe concept, as it would undermine the fundamental purpose for which the cafe was developed. Hence, the concept is better aligned with a mix of the first sub-level (MH/MA) and the second sub-level (MH/MR+AI) of Level 4. We have not yet added a third sub-level, MH/MA+AI (Multi-Human/Multi-Avatar+AI), to Level 4 of the conceptual framework, but it remains an option.
The operational structure and spatial organization of the DAWN Cafe are not simply logistical choices; they represent a deliberate implementation of a Multi-Human/Multi-Avatar (MH/MA) system as theorized in Level 4 of our conceptual framework. Each area of the cafe (dining, bar, barista, interaction) is designed to operationalize remote human agency via avatars, distributing tasks based on technological capability and social interaction intensity. OriHime robots handle low-mobility, high-interaction functions such as order-taking and conversation, while OriHime-D and Nextage manage more physically demanding or precision-based tasks. This shows a socio-technical configuration where human pilots, robot capabilities, and customer expectations are interdependently aligned. Importantly, this spatial-functional division supports not only operational efficiency but also emotional inclusivity; each interaction zone offers customers a different mode of connecting with remote operators, highlighting the spectrum of mediated empathy enabled by MH/MA architectures. Rather than simply automating labor, the DAWN Cafe reconfigures it, preserving human emotional input through technologically extended presence. This shows a departure from traditional AI-driven models of automation and supports the rationale for introducing MH/MA systems as a new service design logic.
While the study did not set out to collect quantitative performance data, field observations revealed several basic operational features. The average service time from order to delivery ranged from 7 to 12 minutes, depending on the robot type and staff availability. Technical disruptions were minimal during the observation period, though OriHime-D experienced occasional delays in response time, which pilots and in-store staff typically resolved within 1 to 2 minutes. The cafe operated with 6 to 8 active avatar pilots per day, covering all four service zones. These observations suggest that the MH/MA system maintains an acceptable level of service stability and responsiveness in daily operations.
Frontline Services in Complex Social-Emotional Environments
Overall, the article explores the potential and challenges of integrating Multi-Human/Multi-Avatar (MH/MA) systems in frontline services, particularly within complex socio-emotional environments. Using the example of the DAWN Cafe, this study validates Level 4 of the conceptual framework (Figure 3) through the implementation of an MH/MA architecture. The discussion centered on three types of robots, with OriHime being a prominent example, facilitating a human-robot symbiosis (avatar). This concept not only enhances human-robot dynamics but also creates an inclusive space where technology compensates for the physical limitations of pilots and addresses the social-emotional limitations of robots/AI, as illustrated by Huang and Rust (2021) concerning feeling intelligence.
To empirically validate Level 4 of our conceptual framework, we analyzed how the DAWN Cafe’s MH/MA architecture fulfills the operational and emotional demands of frontline service. Interview data from both customers and experts reveal clear markers of successful MH/MA implementation. One customer noted: “I was surprised at how human the conversation felt, even though the pilot wasn’t physically here, it was like talking to someone sitting next to me.” This sense of interpersonal presence through an avatar points to the system’s capacity to simulate human empathy, not via AI, but through technologically mediated human agency. An expert emphasized this as well: “The value of DAWN is not just in using robots, it is in keeping the human emotional touch alive through telepresence. That’s what AI cannot yet do.” These responses indicate that the MH/MA system provides technical functionality and emotional resonance, validating its placement as a distinct architecture in Level 4.
Social-emotional complexity in this context refers to the need for nuanced, real-time communication, emotional responsiveness, and adaptive social interaction, elements central to inclusive customer service. We operationalized this concept by examining three dimensions observed during the study: (a) the degree of real-time interpersonal engagement; (b) the ability of pilots to read and respond to subtle customer cues (e.g., hesitations, facial expressions); and (c) the customer’s reported sense of connection and emotional impact. Across interviews, participants repeatedly described the service as “moving,” “thoughtful,” or “surprisingly human,” all of which suggest that socio-emotional complexity is actively managed within the MH/MA configuration. Importantly, these outcomes were not incidental but structurally embedded through the system design, where remote pilots are trained to use tone, timing, and personalization strategies to manage customer emotions. In this way, Level 4 is not just technologically defined but socially enacted.
From the outset, our article identifies a twofold opportunity. First, integrating avatars in customer service environments significantly improves the social participation of people with disabilities. Second, it enhances personalized and emotional service, addressing the technological limitations within frontline services. However, a key challenge pertains to the technological capacity of the robots. The benefits of MH/MR+AI integration also remain underexplored in the existing literature. Dahiya et al. (2023) analyzed multi-agent Human-Robot Interaction (HRI) systems, that is, systems involving multiple humans and/or robots. They identified three main aspects of “multi-agent” HRI systems that help in understanding how these systems differ from dyadic systems: team structure, interaction style between agents, and computational characteristics. They also highlighted five attributes of HRI systems: team size, team composition, interaction model, communication modalities, and robot control. Managing these attributes is inherently complex. Emphasizing AI architecture within these systems calls for greater exploration in the context of teamwork involving humans and pilots, which is crucial in high-interaction environments like the DAWN Cafe. In that regard, the interviewed experts stress that the rise of AI and its integration into MH/MA architectures can facilitate effective teamwork. This argument aligns well with that of Kamino and Sabanovic (2023), who highlight the interaction between the clients, the human employees, and the avatars. They noted that employees guide visitors to tables, involve remote pilots, and participate in conversations as needed. Thus, ongoing training and development for in-store personnel and robot operators are essential to ensure continuous interactions and customer satisfaction.
However, a significant challenge remains in (re)determining how interactions with multi-robot systems will be conducted in the future, and defining the specific role of AI is a concern shared by the experts we interviewed. Kamino and Sabanovic (2023) raise another issue: daily technical problems arising from robot use. They report that in-store employees and remote staff usually try to resolve these issues on-site whenever possible, with local engineers stepping in if needed. Considering this challenge, our experts believe that an MH/MA+AI architecture has the potential to transform service delivery and make it more efficient, given that AI can address complex problems for which humans do not have an obvious primary solution.
Currently, the DAWN Avatar Robot Cafe exemplifies how frontline services can be reimagined in complex socio-emotional environments through the innovative use of MH/MA technology. In the future, other alternatives may be explored, such as developing a new sub-level at Level 4 or fully automating frontline services (Level 5) once AI can operate at higher intelligence levels, such as feeling AI, with high efficiency.
Brief Discussion on the Next Steps
The DAWN Cafe case study demonstrates the feasibility and benefits of implementing MH/MA systems in emotionally complex service environments. However, several areas remain underexplored, both in practice and in theory, which offer clear directions for future development and research.
First, there is an opportunity to expand the MH/MA model into an MH/MA+AI hybrid system, where remote human operators could be supported, not replaced, by AI. For example, AI could assist pilots by handling routine tasks such as scheduling, menu navigation, or even language translation, freeing operators to focus on the emotional and interactive dimensions of service. This integration would create a new sub-level within Level 4 of the conceptual framework (Multi-Human/Multi-Avatar+AI), blending the strengths of human empathy with machine efficiency.
Second, pilot training and support structures represent an important frontier. Our interviews revealed that while many pilots develop their interaction styles and techniques, there is currently no standardized training that ensures consistency in emotional delivery or service quality. Establishing best practices for mediated empathy, tone modulation, conversation pacing, and adaptive responsiveness could raise the overall performance of MH/MA systems and improve customer experience. Additionally, providing emotional and technical support for pilots is essential, given the cognitive and affective demands of avatar-based service work.
Third, future deployments should experiment with expanding the scope of MH/MA systems beyond hospitality, such as in education, healthcare, retail, or administrative services. The fundamental principles demonstrated at DAWN, remote human agency, emotional resonance, and inclusive access, are transferable to multiple domains where AI currently underperforms due to its socio-emotional limitations.
Fourth, cross-cultural replication and comparative studies are essential to test the generalizability of MH/MA systems. Japan’s unique acceptance of robotics and high tolerance for technologically mediated social interaction may not be shared by other cultures. Understanding how users in different societies perceive, trust, and emotionally engage with avatar-mediated services is crucial for adapting these systems globally.
Fifth, data-driven assessment mechanisms should be developed to monitor performance over time. While this study focused on qualitative indicators such as emotional engagement and inclusivity, future research should incorporate operational metrics (e.g., service time, error rates, user satisfaction scores) and psychological metrics (e.g., customer trust, pilot well-being) to provide a more comprehensive evaluation of MH/MA systems.
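As a minimal sketch of what such an assessment mechanism might look like, the example below aggregates three of the operational metrics named above (service time, error rate, and satisfaction score) from a log of service encounters. The record structure, field names, and sample values are hypothetical and are not drawn from the DAWN Cafe data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ServiceEncounter:
    """One logged service encounter (hypothetical record structure)."""
    robot_type: str         # e.g., "OriHime" or "OriHime-D"
    service_minutes: float  # time from order to delivery
    order_error: bool       # True if the order needed correction by staff
    satisfaction: int       # customer rating on an assumed 1-5 scale


def summarize(encounters):
    """Aggregate basic operational metrics over a list of encounters."""
    if not encounters:
        raise ValueError("At least one encounter is required.")
    return {
        "mean_service_minutes": mean(e.service_minutes for e in encounters),
        "error_rate": sum(e.order_error for e in encounters) / len(encounters),
        "mean_satisfaction": mean(e.satisfaction for e in encounters),
    }


# Illustrative values only; these are not observations from the study.
sample_log = [
    ServiceEncounter("OriHime", 8.0, False, 5),
    ServiceEncounter("OriHime-D", 12.0, True, 4),
]
summary = summarize(sample_log)
```

Psychological metrics such as customer trust or pilot well-being would require validated survey instruments and could be added as further fields once those instruments are chosen.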
Lastly, the findings suggest a need for design innovation in robotic interfaces themselves. OriHime and OriHime-D have proven effective but remain limited in expressiveness and interaction range. Advancements in haptics, facial expressiveness, and nonverbal signaling could enhance avatar embodiment and deepen the emotional authenticity of remote interactions.
Taken together, these steps point toward a future in which MH/MA and MH/MA+AI systems coexist alongside AI-driven and robot-led configurations. Rather than viewing these as mutually exclusive paradigms, we propose a blended approach to service automation that centers on emotional intelligence, inclusion, and human presence, even in an increasingly autonomous world.
Concluding Remarks
This section emphasizes three key points we believe are the most pertinent. First, we address the theoretical contributions, highlighting the novel and original insights this article adds to the existing body of literature. Second, we briefly discuss the practical implications for management. Historically, practitioners have questioned the applicability of academic articles to their daily operations. In response, we provide several recommendations for managers and industrial engineers to enhance service delivery within their respective industries. Lastly, acknowledging that no research is without limitations, we outline only a few significant constraints to maintain the article’s brevity and focus. To mitigate these limitations, we propose several avenues for future research, offering directions for both junior and senior researchers interested in further exploring this topic.
Theoretical Contributions
This study makes three original theoretical contributions to the fields of service innovation, human-robot interaction (HRI), and inclusive design.
First, it validates Level 4 of a newly extended conceptual framework for frontline service delivery, Multi-Human/Multi-Avatar (MH/MA) systems, which was previously unrepresented in existing models such as Reis (2024) or Huang and Rust (2021). While past frameworks emphasize either AI substitution or robot-human collaboration, our model introduces a distinct category where human empathy is not simulated by AI but preserved and distributed through avatar robots. This marks a departure from the AI-centric evolution of service intelligence and positions MH/MA systems as a scalable, human-centric alternative.
Second, the study introduces the conceptual sublevel of MH/MA+AI, which blends human-operated avatar systems with supportive AI functions (e.g., task coordination or translation). This hybrid form has not been previously theorized in the literature and offers a path toward more efficient and emotionally responsive service models that retain human agency while leveraging the machine. By recognizing this emerging architecture, we open a new dimension in the automation-inclusion continuum and invite future empirical testing.
Third, we theorize and operationalize the socio-emotional compensatory function of avatars, that is, the idea that avatar robots serve not just as proxies for physical presence but as socio-technical mediators capable of carrying emotional intent, creating connection, and sustaining empathy at a distance. This function is not only symbolic but structurally embedded through training, interface design, and cultural context. It expands existing theories of mediated presence and remote embodiment by positioning emotional labor as the core competency of MH/MA systems.
Together, these contributions extend the boundaries of current service automation theories and provide a foundation for other conceptual and empirical models in complex, inclusive, and hybrid service environments.
Managerial Contributions
From a managerial perspective, the DAWN Avatar Robot Cafe offers several key insights for service industry leaders aiming to incorporate advanced robotics and AI into their operations. The cafe’s approach to employing individuals with disabilities through remote-operated robots provides a replicable model for creating inclusive work environments. This model not only addresses labor shortages in the service industry but also taps into a previously underutilized workforce, enhancing corporate social responsibility and brand reputation.
The operational structure of the DAWN Cafe, with its distinct areas for dining, bar service, barista operations, and customer interaction, shows the versatility of robotic systems in various service contexts. Managers can draw from this example to design service environments that maximize the potential of robotic technology while maintaining a high level of customer engagement and satisfaction.
Additionally, the cafe’s emphasis on providing a personalized and emotionally engaging customer experience through remote-operated robots offers relevant lessons for enhancing service quality. By investing in ongoing training and development for both robots and their operators, service industry leaders can ensure continuous interactions and maintain high standards of customer service.
Limitations and Future Research Directions
This article has several limitations that should be acknowledged. First, the cultural specificity of the DAWN Cafe context in Japan may limit the generalizability of our findings. Japan’s cultural familiarity with robotics and social tolerance for mediated human-machine interactions may not translate easily to other regions. Future research should test the MH/MA framework in diverse cultural contexts to understand how societal attitudes, labor norms, and disability inclusion policies affect adoption and perception. Second, the reliance on qualitative data, including interviews, observations, and document analysis, presents challenges in quantifying and comparing results. We understand that qualitative data collection was appropriate for understanding and describing the real-life phenomenon at DAWN Cafe, but future research could benefit from incorporating quantitative analysis to further validate and expand upon our findings.
Future research should focus on several key areas, including developing training programs for MH/MA systems and assessing the long-term impacts on various stakeholders, including employees, customers, and service providers. Expanding opportunities for research and implementing actionable activities in the public sector can also be highly beneficial, as suggested by Reis et al. (2019). Moreover, exploring the integration of AI-driven features into MH/MA architectures, and potentially creating a Multi-Human/Multi-Avatar+AI (MH/MA+AI) model, can enhance the sophistication and appeal of such service environments. This exploration can lead to a more comprehensive understanding of how AI can facilitate effective teamwork and personalized service in high-touch environments.
Footnotes
Acknowledgements
The author extends deep gratitude to all the experts and respondents whose participation was indispensable to this investigation.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data can be provided upon request to the author.
