Abstract
BACKGROUND:
Avatars in Virtual Reality (VR) can not only represent humans, but also embody intelligent software agents that communicate with humans, thus enabling a new paradigm of human-machine interaction.
OBJECTIVE:
The research agenda proposed in this paper by an interdisciplinary team is motivated by the premise that a conversation with a smart agent avatar in VR means more than giving a face and body to a chatbot. Using the concrete communication task of patient education, the agenda instead explores which patterns and practices must be constructed visually, verbally, para- and nonverbally between humans and embodied machines in a counselling context, so that humans can integrate counselling by an embodied VR smart agent into their thinking and acting in one way or another.
METHODS:
The scientific literature in different bibliographical databases was reviewed. A qualitative narrative approach was applied for analysis.
RESULTS:
A research agenda is proposed which investigates how recurring consultations of patients with healthcare professionals are currently conducted and how they could be conducted with an embodied smart agent in immersive VR.
CONCLUSIONS:
Interdisciplinary teams consisting of linguists, computer scientists, visual designers and healthcare professionals are required, and they need to go beyond a technology-centric solution design approach. Linguists’ insights from discourse analysis drive the explorative experiments to identify, test and discover what capabilities and attributes the smart agent in VR must have in order to communicate effectively with a human being.
Introduction
Chatbots and conversational intelligent software agents, so-called smart agents, implemented as text- or speech-based dialog systems are used in many domains and are becoming an increasingly common part of our everyday life. They substitute communication with a human being and act as a natural language interface to more or less intelligent machines. For purely functional dialogues, such as making an appointment or reserving a table in a restaurant, they are very efficient. But what about a complex communication task, such as advising a chronically ill patient who has to learn to live with diabetes?
In the current situation we have probably become particularly aware of how valuable and rich personal face-to-face conversations are. Social co-presence, the use of nonverbal and paraverbal communication, as well as the possibility to interpret and react to the counterpart’s state of mind make face-to-face communication highly effective and pleasant in many situations. Multimedia chats pick up on this by giving conversational smart agents a human appearance in the form of avatars. Immersive Virtual Reality (VR) goes one step further and allows us to interact with smart agents in a three-dimensional world that creates a feeling of social co-presence. However, such a realistic setting, close to real-life face-to-face communication, raises numerous design challenges. A smart agent who talks to a diabetes patient may be given a humanoid shape, facial expressions and gestures, but this does not mean that the smart agent can interpret the mental state, body language and linguistic nuances, such as irony or pauses, of its human counterpart and react adequately. How do we as human beings want to communicate with an artificially intelligent being in virtual space? To what extent is communication different when a person communicates with a smart agent in VR? Do we act according to the same patterns of language behavior with a smart agent as with a human counterpart? Do we expect a simulation of an interpersonal face-to-face communication, or will new practices for verbal and nonverbal communication with artificially intelligent “beings” emerge?
Relevance
To capture the full complexity of a conversation as a social interaction, research and design on embodied conversational smart agents have to be interdisciplinary and linked to a relevant and challenging communication task. The use case of patient education is predestined for investigating the effectiveness of communication. Healthcare professionals not only convey knowledge and skills in interventions, but also aim to initiate and recognize behavioral changes. One case where patient education is of great importance is diabetes. The management of patients with diabetes is one of the major challenges for healthcare worldwide. In 2000, the global estimate of diabetes prevalence was 151 million. According to the International Diabetes Federation, the estimates have since shown alarming increases, tripling to the 2019 estimate of 463 million [1]. The prevalence of diabetes is 6.9% in men and 4.4% in women [2]. Improved patient self-management support and increased access to expertise are considered highly effective interventions for patients suffering from a chronic disease like diabetes [3]. Researching and prototyping technology-enabled interventions based on immersive VR and smart agents is therefore in line with the efforts to search for new ways to prevent and treat diabetes more successfully [2]. If we succeed in developing and validating a design approach for this challenging communication task, the basis for further conversational applications based on embodied smart agents would be created.
State of research
Immersive VR and conversational smart agents embodied by avatars enable new possibilities for humans to communicate with intelligent machines. This research agenda deals with the design space of this novel form of human-machine interaction from four perspectives. This is done with the intention of supplementing the predominantly technologically oriented work with empirical, applied, interdisciplinary aspects:
Linguistics (social interaction and language); Design (visualization and perception); Technology (systems behavior); and Healthcare (the chosen use case: patient education for patients with diabetes).
In the following section the four views on this challenge and the respective state of research are described.
Linguistics: Social interaction and language
Diabetes consultations can be seen as a specific type of the interaction type “consultation” [4]. As a communicative practice with specific patterns, a very limited time frame and few participants, it requires actors who have been socialized into the specific cultural practice and language patterns for a smooth process [5]; in this case, a healthcare professional and a patient with diabetes are involved. A systematic examination of the communication between healthcare professionals and patients with diabetes has not yet taken place in linguistics. The most recent study was conducted by Bührig, Fienemann and Schlickau [6] and deals with the link between the negative quality of diabetes consultations as perceived by patients on the one hand and the lack of patient action following medical advice (called compliance, adherence or empowerment) on the other hand. They also tried to measure the communicative quality of the diabetes consultations. In particular, they pointed out that the patient’s integrity zone was often infringed by the utterances of the healthcare professionals and that the communication was affected by different stocks of knowledge [6]. The transfer of knowledge from healthcare professional to patient, which can be understood in a broader sense as patient education [7, 8], seems to make up a large part of the diabetes consultation. Like doctor-patient communication, diabetes consultations are therefore located somewhere between shared decision-making and informed choice [9, 10, 11], in which healthcare professionals act as consultants [12].
The choice of the use case of patient education for this proposed research agenda is motivated by the fact that the willingness of patients to change their behavior accordingly (therapy adherence) can be positively influenced by successful communication between medical consultants and patients, which is supported by various studies [13, 14, 15, 16].
So far, the relevant literature on diabetes counselling has only examined communication between people. In our proposed agenda, however, one of the conversation participants is an intelligent machine: a smart agent operating in virtual reality. Nevertheless, we can still call it an interaction. The human participants of the conversation perceive their counterpart. Through the avatar’s reactions, the human partners construct a conversation situation; they recognize, in a certain sense, that they themselves are perceived. To a certain degree, a conversation situation is thus constructed as an interaction [17, 18]. It is not surprising that this condition alters communication radically. (Effective) communication is made up of different units: verbal, para- and nonverbal behavior, such as facial expressions, gestures, speech expression or eye contact, are equally important and can even contribute to a better understanding of utterances [19]. The linguistic theory of practices and patterns likewise emphasizes that grammatical and lexical structures are significant and interact with voice, gaze, facial expressions, gestures, movements in space, the handling of objects [5] and far more. It is often presumed that the acceptance of technically simulated nonverbal behavior by viewers depends on the look and behavior of the avatar (e.g., gender, posture, figure, skin color, age and size, or degree of realism and anthropomorphism) [20, 21, 22]. Some studies have shown that the more human a smart agent looks and acts, the more human traits are attributed to it [23]. Realistic-looking avatars provide stronger social presence [24], and participants show similar language behavior as if they were talking to a human being [25]. So far, this humanoid appearance and behavior has been interpreted as having a positive effect on communication efficiency [26].
Previous linguistic research on human-machine communication, especially in the field of computational linguistics, has therefore focused primarily on the imitation of face-to-face communication between humans [27].
Discourse-analytic research, which focuses on verbal data, also concentrates on this optimization behavior on the human side and less on optimization principles for voice-controlled dialog systems [28]. In general, conversations between people and computers are additionally preceded by specific media and their affordances, which frame specific conversation situations and their requirements [29, 30]. These settings (e.g., a phone call) are decisive and limit the choice of communicative-linguistic possibilities and their stylistic realization. On the other hand, however, they also open up new possibilities for use. For example, in a conversation with an artificially generated interlocutor, metacommunicative information and units are used in a different strategic sense than in a situation in which metacommunicative units can be captured and processed.
Design: Visualization and perception of avatars
User acceptance of smart agents depends on behavioral realism as well as on visual and auditory realism. Research in this area dates back to the early days of VR [20], but results regarding user efficiency and acceptance as a function of visual realism are inconclusive [31, 32]. However, studies have shown that an avatar’s behavioral fidelity should be coherent with its visual fidelity as well as with the visual fidelity of its surroundings in order to avoid eerie and alienating effects [33]. Published studies in the field of VR are almost exclusively conducted in computer science, engineering and psychology and use the term “visual realism” to describe the fidelity of avatar appearance [34]. In this research agenda, a design perspective is applied; therefore, the term “visual abstraction” is used to emphasize the conscious approach of creating a coherent look and feel and of reducing human appearance to the level that is best accepted by patients and most likely to guarantee successful communication.
Technology: Implementation of affective smart agent avatars in VR
We are witnessing a continuously growing prevalence of Augmented and Virtual Reality (mixed reality) technologies in both private and business application domains. One factor fueling this development has been the gaming industry, which is producing ever cheaper and more powerful hardware as well as ever more advanced runtime environments and development tools [35]. These enable the design and development of novel mixed reality applications for purposes other than gaming. Meeting and holding a conversation in virtual space represents such a field of application. In contrast to conversations mediated by video or telephone, Virtual Reality creates a feeling of social presence and provides a digital space in which the participants can do more than just see and talk to each other [35, 36, 37]. Essential for social presence in VR are avatars, who can talk, move, manipulate objects and thus engage in more complex forms of interaction [38]. Avatars in VR can not only represent human participants but also embody intelligent software agents, and thereby enable a new paradigm of human-machine interaction. Embodied conversational agents could become a part of people’s everyday life, but to date they are not fully capable of understanding and realizing the seemingly “natural” and socialized aspects of human face-to-face communication [39]. One reason for this is that communication does not only consist in successfully conveying content verbally. Effective communication also includes non- and paraverbal signals expressed by means of body language, such as gestures, facial expressions and tone of voice. Avatars in Virtual Reality are able to convey non- and paraverbal signals and have the potential to raise the effectiveness of virtual communication [40]. However, due to technical limitations, the animated motions of current embodied conversational agents still show little variation and quickly become repetitive [41].
Furthermore, a smart agent also needs to receive and interpret the nonverbal and paraverbal signals of its human counterpart. But in today’s immersive Virtual Reality environments, the transmission of non- and paraverbal information of the human participant is only partially supported. One reason is that users need to wear a Head-Mounted Display (HMD), which covers part of the face. Motion capturing of body and hands is available in commercial products, but sensing and interpreting facial expressions still relies on experimental hardware solutions, such as electromyography (EMG) sensors attached to the HMD to track facial muscle movements [42]. The key message of the technology-related state of research in this field is that contributions from different research areas and the functionality of existing development tools (SDKs) can be combined, transferred, applied and tested to realize the novel idea of an experimental system for analyzing and designing dialogues with an avatar in immersive VR. Key building blocks include voice and tone analysis [43, 44, 45, 46, 47, 48, 49], sentiment analysis [50, 51, 52, 53], gesture recognition by vision and hand tracking [54, 55, 56, 57, 58, 59], and speech recognition and analysis [60, 61]. The overall envisioned architecture in the context of this research agenda is inspired by existing work on virtual human communication architectures and embodied conversational agents [62, 63, 64, 65, 66].
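The combination of building blocks described above can be illustrated as a minimal processing pipeline. The sketch below is purely illustrative: the function names (`analyze_sentiment`, `choose_nonverbal_response`) and the keyword-based sentiment stub are our own assumptions standing in for the cited analysis models, not components of the envisioned architecture.

```python
from dataclasses import dataclass

# Illustrative stand-in for a trained sentiment model:
# a tiny keyword lexicon mapping words to a coarse polarity score.
LEXICON = {"worried": -1, "afraid": -1, "tired": -1,
           "good": 1, "better": 1, "motivated": 1}

def analyze_sentiment(utterance: str) -> str:
    """Classify a patient utterance as 'negative', 'neutral' or 'positive'."""
    score = sum(LEXICON.get(word.strip(".,!?").lower(), 0)
                for word in utterance.split())
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"

@dataclass
class AvatarResponse:
    """Non- and paraverbal behavior the avatar should display."""
    facial_expression: str
    gesture: str

def choose_nonverbal_response(sentiment: str) -> AvatarResponse:
    """Map the detected sentiment to the avatar's nonverbal reaction."""
    mapping = {
        "negative": AvatarResponse("concerned", "lean_forward"),
        "positive": AvatarResponse("smile", "nod"),
        "neutral":  AvatarResponse("attentive", "idle"),
    }
    return mapping[sentiment]

if __name__ == "__main__":
    utterance = "I am worried about my blood sugar."
    sentiment = analyze_sentiment(utterance)
    response = choose_nonverbal_response(sentiment)
    print(sentiment, response.facial_expression, response.gesture)
```

In a real experimental system, the lexicon lookup would be replaced by the cited speech-recognition and sentiment-analysis components, and the `AvatarResponse` would drive the avatar’s animation layer; the point of the sketch is only the flow from utterance to sensed affect to nonverbal reaction.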
Health: Educational interventions for diabetes patients by healthcare professionals
Improved patient self-management support and increased access to expertise are considered highly effective interventions for patients suffering from a chronic condition like diabetes [3]. Self-management is understood as a collective term for patient-centered intervention strategies [68]. It comprises symptom management, the concrete embedding of this in the daily life of the patient and their family and, finally, the development of implementation strategies for successful symptom management in order to recognize disease crises in time and to get help [68]. The American Association of Diabetes Educators describes seven self-care behaviors in patients with diabetes [69]. At the core of these are lifestyle-influencing behaviors in the use of food and daily (sport) activities. Adherence to prescribed medication is just as central. Healthcare professionals support the patients and their families through appropriate self-management support. In doing so, the focus is not only on knowledge transfer [70]. Patient education is also about initiating behavioral change and positively influencing clinical and health outcomes [70]. For this purpose, a person-centred attitude and supportive activities are needed, among other things, to help the patient solve problems [71, 72, 73]. Antonovsky described such self-management support to increase the patient’s resilience with the concept of “sense of coherence” as early as 1979 [74, 75]. According to Antonovsky, supportive activities must be comprehensible to the patient, make sense in their everyday life and be applicable in their everyday practice. These requirements are considered critical constraints in the proposed research agenda.
Research goal and proposed research agenda
In essence, this research agenda focuses on the question of how humans speak to machines and, related to this, what the machine must be able to do to advise humans in repetitive contexts so that communication does not fail. For decades, natural language processing research has concentrated on anticipating and reproducing human speech as faithfully as possible [76]. Significant progress has been made in recent years through quantitative approaches (e.g., IBM Project Debater [77]), but these still only inadequately approximate the highly complex reality. Conversational agents that mimic human voices and paraverbal cues perfectly, such as Google Duplex, pass the Turing test for purely functional dialogues, such as making an appointment, without people realizing they are talking to a machine [78]. Immersive Virtual Reality now allows us to go one level further in this “imitation game”. It is obvious and tempting to perfect a conversation with an embodied smart agent in VR towards an imitation of a real face-to-face conversation. But on the one hand, there are still technical challenges in the way of such an endeavor [41], and on the other hand, we ask ourselves whether and to what extent imitation is necessary and desirable in order to make a machine avatar an effective counterpart in complex dialogues. We therefore pursue a different approach: the point for us is not that avatars of smart agents look, speak and behave “like humans”, but rather which patterns and practices must be constructed verbally, para- and nonverbally between humans and embodied machines in a counselling context, so that humans can integrate counselling by the machine into their thinking and acting in one way or another. This may not require “all” aspects of a social interaction to be fulfilled in communication with a smart agent in VR; at best, reduced patterns are sufficient to prevent failure.
Avoiding failure in consulting sequences can, depending on one’s perspective, also be regarded as success. However, the judgement about “success” and “failure” can only be made in the context of a specific communication task. Educational interventions for patients with diabetes mellitus type 2 are a highly complex communication task that nevertheless has some potential for standardization. Healthcare professionals apply communication strategies, such as motivational interviewing, to initiate behavioral changes in patients [79, 80]. They have frameworks to evaluate complex interventions [81] and are trained to recognize such changes based on the nonverbal and verbal utterances of the patient during a consultation. Furthermore, they can relate these observations to measurable health outcomes, such as blood sugar levels or the number of steps as an indicator of physical activity.
Figure 1 outlines the interplay of the four involved disciplines and their research contributions. The goal is to identify, create and validate the linguistic, visual and functional prerequisites for effective communication with smart agents in VR using the case of patient education for diabetes. To this end, the following research questions are to be addressed.
Figure 1: Scientific content and focus by discipline aligned to research questions (RQ).
RQ-1: Linguistics (social interaction and language)
Research questions
To what extent is communication different when a person communicates with a smart agent in VR? Do we exhibit the same language behavior interacting with a smart agent as with a human counterpart? How do we as human beings want to communicate with an artificially intelligent being in virtual space? Do we expect a simulation of an interpersonal face-to-face communication or will new practices for verbal and nonverbal communication with artificially intelligent beings emerge?
Corresponding approach
Linguists’ insights from ethnomethodologically oriented conversation analysis drive the explorative experiments by identifying the discourse structures, language patterns and social practices of co-present conversations between healthcare professionals and diabetes patients.
Expected generation of knowledge
Generation of a corpus for future conversation-analytical examinations of this conversation type.
Scientific impact
Testing the suitability of VR as an experimental environment for linguistic analyses.
RQ-2: Design (visualization and perception)
Research questions
How does the degree of visual abstraction influence user acceptance? How closely does nonverbal communication behavior (facial expressions, gestures, posture) have to match that of a real human in order to clearly express a smart agent’s simulated emotions?
Corresponding approach
A morphological analysis frames the experiments to explore, in a systematic manner, various possible constellations of the appearance and capabilities of the smart agent, its environment and its interaction with the patient [82]. A morphological approach supports the required multi-dimensional view of the problem and provides a structure to explore the non-quantifiable possible solutions [83].
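A morphological box of the kind referenced above can be explored systematically by enumerating the cross-product of design dimensions. The dimensions and values in the sketch below are hypothetical examples for illustration only; the actual variables would emerge from the experiments.

```python
from itertools import product

# Hypothetical design dimensions for the smart agent avatar
# (illustrative values, not the agenda's validated variable set).
morphological_box = {
    "visual_abstraction": ["photorealistic", "stylized", "cartoon"],
    "gesture_repertoire": ["none", "basic", "full_body"],
    "voice": ["synthetic", "recorded_human"],
}

def enumerate_constellations(box: dict) -> list:
    """Return every constellation: one value chosen per dimension."""
    dimensions = list(box)
    return [dict(zip(dimensions, values))
            for values in product(*(box[d] for d in dimensions))]

constellations = enumerate_constellations(morphological_box)
print(len(constellations))  # 3 * 3 * 2 = 18 constellations to explore
```

Each constellation then corresponds to one candidate configuration to be assessed in an explorative experiment, which is precisely the multi-dimensional, non-quantified solution space the morphological approach is meant to structure.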
Expected generation of knowledge
Validated set of variables for the visual design space of a conversational smart agent avatar.
Scientific impact
Parameterizable environment to test the impact of avatars with varying degrees of abstractions on users in any context.
RQ-3: Technology (systems behavior)
Research questions
What are the relevant nonverbal, verbal, affective and cognitive capabilities the smart agent must have in order to communicate with a human being in a way that makes the conversation effective and desirable, or at least prevents it from failing?
Corresponding approach
Starting from the premise that enabling a dialogue with a smart agent in VR does not mean imitating a human-to-human conversation, we cannot formulate relevant scenarios of a target state or requirements for a solution, because we can only speculate about what the solution could look like and how humans may behave when they communicate with increasingly intelligent embodied machines in virtual spaces. Therefore, “classical” deterministic requirements, tested as hypotheses against a prototype, are not useful. Explorative experiments, on the other hand, are more suitable to test and discover what capabilities and attributes the smart agent in VR must have in order to communicate with a human being in a way that the conversation does not fail. Explorative experiments are particularly helpful in the context of human-machine interaction whenever we introduce new technologies into society but cannot yet assess their impact as a socio-technical system [84, 85]. To ensure desirability, a participatory human-centered design approach framed by self-determination theory [86] is applied: What do the patient and also the healthcare professional gain and lose, in terms of autonomy, competence and relatedness, by transferring communication to VR and involving a smart agent? What motivates a human being to communicate with an intelligent machine?
Expected generation of knowledge
Validated functional capabilities for dialogues with affective smart agents in VR.
Scientific impact
Experimental system for dialogue analysis and design in immersive VR, usable and continuously extensible.
RQ-4: Health
Research question
What is the required interaction with a smart agent avatar in immersive VR during an initial consultation to promote physical activity in patients with non-insulin-dependent diabetes mellitus type 2 and grade 1 obesity? How is behavioral change talked about in consultations?
Corresponding approach
Evaluating the outcome of complex interventions [81]: instructing patients with type 2 diabetes on lifestyle changes regarding physical activity, based on the seven self-care behaviors of the American Association of Diabetes Educators (AADE) [69, 70] for assessing patients and outcomes.
Expected generation of knowledge
Generation of a corpus for future patient-centered self-management lifestyle instructions in diabetes chronic care delivered by a smart agent in VR; smart-agent VR instructions for patients with type 2 diabetes on lifestyle changes regarding physical activity and medication, based on the seven self-care behaviors of the AADE [69, 70]; smart-agent VR interventions taking into account four critical time points at which to assess, provide and adjust self-management support and education. Thus, patterns of complex interventions and corresponding patient responses in consultations with chronically ill patients can be developed.
Scientific impact
Assessing the value and the usability of immersive VR as an environment to evaluate complex interventions for behavioral change in diabetes chronic care management.
Conclusions
This research agenda investigates how recurring consultations of patients with healthcare professionals are currently conducted and how they could be conducted with an embodied smart agent in immersive VR. The interdisciplinary team, consisting of linguists, computer scientists, visual designers and healthcare professionals, will go beyond a technology-centric solution design approach. Linguists’ insights from discourse analysis drive the explorative experiments to identify, test and discover what capabilities and attributes the smart agent in VR must have in order to communicate effectively with a human being. This is done with the aim of developing a prototype for patient education in VR. The social added value and potential cost savings are evident given the current increase in diabetes. The need to standardize the process of managing the care of people with diabetes, while still allowing for individualized care, could not be timelier. Researching and prototyping technology-enabled educational interventions based on immersive VR and smart agents is, therefore, highly relevant and future-oriented.
Footnotes
Conflict of interest
None to report.
