Abstract
Objective:
The aim of this review was to explore the current evidence for conversational agents or chatbots in the field of psychiatry and their role in screening, diagnosis, and treatment of mental illnesses.
Methods:
A systematic literature search in June 2018 was conducted in PubMed, EmBase, PsycINFO, Cochrane, Web of Science, and IEEE Xplore. Studies were included that involved a chatbot in a mental health setting focusing on populations with or at high risk of developing depression, anxiety, schizophrenia, bipolar, and substance abuse disorders.
Results:
From the selected databases, 1466 records were retrieved and 8 studies met the inclusion criteria. Two additional studies were included from reference list screening for a total of 10 included studies. Overall, potential for conversational agents in psychiatric use was reported to be high across all studies. In particular, conversational agents showed potential for benefit in psychoeducation and self-adherence. In addition, satisfaction rating of chatbots was high across all studies, suggesting that they would be an effective and enjoyable tool in psychiatric treatment.
Conclusion:
Preliminary evidence for psychiatric use of chatbots is favourable. However, given the heterogeneity of the reviewed studies, further research with standardized outcomes reporting is required to more thoroughly examine the effectiveness of conversational agents. Regardless, early evidence shows that with the proper approach and research, the mental health field could use conversational agents in psychiatric treatment.
Introduction
Access to mental health services and treatment remains an issue in all countries and cultures across the globe. Worldwide, major depression is the leading cause of years lived with disability and the fourth leading cause of disability-adjusted life years (DALYs). 1 According to the Health Canada Editorial Board on Mental Illnesses in Canada, more than 20% of Canadians will suffer from a mental illness during their lifetime, and the global economic burden of mental illness in 2010 was estimated at 2.5 trillion US dollars. 2 With suicide rates now increasing in many countries, including the United States, 3 there is a clear need for new solutions and innovation in mental health.
Unfortunately, the current clinical workforce is insufficient to meet these needs. There are approximately 9 psychiatrists per 100,000 people in developed countries 1 and as few as 0.1 per 1,000,000 4 in lower-income countries. This inability to meet present or future demand for care has led to proposals of technology as a solution. In particular, there is growing interest in chatbots, also known as conversational agents or multipurpose virtual assistants.
Chatbots, or conversational agents, are defined here as digital tools, existing either as hardware (such as an Amazon Echo running the Alexa digital assistant software) or software (such as Google Assistant running on Android devices or Siri running on Apple devices), that use machine learning and artificial intelligence methods to mimic humanlike behaviours and participate in evolving, task-oriented dialogue (see Figure 1). Their potential in mental health is well represented in the popular press, considering one of the top requests to Alexa during the summer of 2017 was “Alexa, help me relax,” according to the

A sample interaction between a patient and a chatbot therapist.
This human-computer interaction technology was established academically half a century ago. In 1964, the natural language processing program ELIZA was developed at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum. Designed to act as a Rogerian psychotherapist, ELIZA could not understand the content of its conversations. However, many who used this chatbot believed it to be intelligent enough to comprehend conversation and even became emotionally attached to it. Weizenbaum would later remark that “[he] had not realized…that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.” 6 In 1972 at Stanford University, psychiatrist Kenneth Colby developed PARRY, a program capable of simulating the behaviour of a human with schizophrenia; PARRY was then “counseled” several times by ELIZA.
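The Rogerian technique ELIZA popularized can be illustrated with a minimal keyword-and-reflection sketch. The rules and responses below are illustrative examples, not Weizenbaum's original script: the program matches simple patterns, swaps first- and second-person words, and mirrors the statement back as a question, without any understanding of content.

```python
import random
import re

# Swap first- and second-person words so replies mirror the speaker.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# Illustrative rules in the spirit of ELIZA's Rogerian script (not the originals).
RULES = [
    (r"i feel (.*)", ["Why do you feel {0}?", "How long have you felt {0}?"]),
    (r"i am (.*)", ["Why do you say you are {0}?"]),
    (r"(.*) mother(.*)", ["Tell me more about your family."]),
    (r"(.*)", ["Please go on.", "Can you elaborate on that?"]),
]

def reflect(fragment: str) -> str:
    """Replace pronouns so the reply addresses the speaker."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(utterance: str) -> str:
    """Return a response from the first rule whose pattern matches."""
    for pattern, responses in RULES:
        match = re.match(pattern, utterance.lower().strip(".!?"))
        if match:
            return random.choice(responses).format(*(reflect(g) for g in match.groups()))
    return "Please go on."

print(respond("I feel sad about my job"))  # mirrors back, e.g. "...sad about your job?"
```

The catch-all final rule is what let ELIZA keep a conversation going indefinitely, which may help explain why users attributed understanding to a program that had none.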
Fifty years later, the technology that made ELIZA possible is now available on the smartphones and smart home devices owned by billions around the world. According to market research, over three-quarters of Canadians own a smartphone, and already nearly 10% own a smart home device, such as the Google Home or Amazon Echo. 7 The technology has also advanced to the point that chatbots today incorporate natural language processing for speech, removing the need for a keyboard, as anyone who has ever used Siri can attest.
Although there is still much to be explored when it comes to chatbots in mental health, their potential has already begun to surface. Chatbots are being used in suicide prevention 8 and cognitive-behavioural therapy, 9 and some, such as HARR-E and Wysa, are being tailored to specific populations. 10 In particular, chatbots may be helpful in providing treatment for those who are uncomfortable disclosing their feelings to a human being. Virtual therapy provided by a chatbot could therefore not only improve access to mental health treatment but also be more effective for those reluctant to speak with a therapist. Veterans, for example, who are often reluctant to open up after a tour of duty, were significantly more likely to open up to a chatbot when told it was a virtual therapist than when told it was being controlled by a person, 11 offering the potential to increase access to needed care. 12
With increased access to technology and the ease of use that accompanies it, interest in mental health chatbots has reached a point where some have labelled them “the future of therapy.” 13 However, there is no consensus on the definition of psychiatric chatbots or their role in the clinic. While they do hold potential, little is known about who actually uses them and what their therapeutic effect may be. Evaluation efforts are further complicated by the rapid pace of hardware development and by the fact that such software may behave and respond differently depending on region. For example, when a user said he or she felt sad, the US-developed Google Assistant replied, “I wish I had arms so I could give you a hug,” whereas the Russian-developed chatbot Alisa replied, “No one said life was about having fun.” 14 In this review, we explore today’s early evidence for chatbots in mental health and focus on fundamental issues that need to be addressed for further research.
Methods
A librarian at the Boston University School of Medicine assisted in generation of a search term for selected databases (PubMed, EmBase, PsycINFO, Cochrane, Web of Science, and IEEE Xplore) using a combination of keywords, including
Results
Of the 1466 references identified from search terms applied to the selected databases, 1066 conference proceedings and 104 duplicates were removed, leaving 296 studies for the abstract screening phase. Criteria-based abstract screening identified 10 studies for full-text review, of which 8 met the inclusion criteria; 2 additional studies were selected from screening of reference lists, 15 –24 for a total of 10 studies entering the data extraction phase. The PRISMA diagram in Figure 2 outlines the number of studies excluded per criterion. Table 1 lists summary data for each selected study, Table 2 the measures used, Table 3 the application designations, and Table 4 the functions or roles. As only 2 of the 10 selected studies provided full education and ethnicity demographic information, these data could not be summarized.

PRISMA diagram.
Reported Information about Each Selected Study and Chatbot Where Provided.
Summary statistics of interest are provided along with the measure.
CBT, cognitive behavioural therapy; NR, not reported; PTSD, posttraumatic stress disorder.
Outcome, Adherence, and Engagement Measures Used in Each Selected Study.
Application Designations of the Chatbot in Each Study.
Diagnostic: screen a subclinical population or correctly match the existing diagnosis of a clinical population. Monitoring: monitor a function of a clinical or subclinical population. Therapy: provide therapeutic intervention through methods such as cognitive behavioural therapy or encouraging medication adherence.
Features of the Chatbot in Each Study.
In contrast to the vast potential suggested by consumer marketing of chatbots, we found the academic psychiatric literature to be surprisingly sparse. The majority of research on chatbots appears to be conducted outside of traditional medical publication outlets: nearly 75% of the records from our initial search were excluded, largely as engineering conference proceedings. That much of this research is happening in other disciplines highlights a need to continue bridging the multiple stakeholders advancing this work and to create opportunities for synergy in this space. Still, from the 10 studies we identified, it is possible to comment on potential benefits as well as harms, new frontiers, ethical implications, and current limitations for chatbots in mental health.
One demonstrated benefit of conversational agents is support for self-psychoeducation and adherence. This included a variety of methods, such as tracking medication and physical activity adherence, 22 providing cognitive behavioural therapy (CBT), 15 and delivering healthy lifestyle recommendations 19 across clinical and nonclinical groups. For example, Ly et al. 15 found that nonclinical participants with higher adherence to the conversational agent showed significant improvements in psychological well-being and perceived stress compared to those who did not use the intervention. In addition, Bickmore et al. 22 found that individuals with major depressive disorder rated their therapeutic alliance with a conversational agent significantly higher than with a clinician.
Furthermore, the participants in these studies showed high satisfaction with the interventions they received. Participants reported the interventions as helpful, easy to use, and informative 19 and rated satisfaction highly (>4.2 out of 5) on all scales, including ease of use, desire to continue using the system, liking, and trust. 21
A few studies examined the effectiveness of conversational agents in the diagnosis and treatment of psychiatric disorders. Not only were conversational agents found to significantly reduce depressive symptoms in individuals with major depressive disorder, 18 but an embodied conversational agent was also found to be able to efficiently identify patients with depressive symptoms. 24 Participants in both of these studies reported that the conversational agents were helpful and usable.
Our results indicate that the risk of harm from the use of chatbots is extremely low, with a total adverse event incidence of 1 in 759 recruited participants. The single adverse event, reported in Bickmore et al., 21 involved a participant who developed paranoia and withdrew from the study. Another participant in the same study nearly withdrew due to concerns of personal data theft until reoriented and counselled by the on-call nurse, suggesting a possible benefit of available clinician support.
Discussion
The results of these studies show that there is potential for effective, enjoyable mental health care using chatbots. However, the high heterogeneity in both the results and methodologies of these studies suggests further research is required to fully understand the best methods for implementation of a chatbot. Regardless, we were able to identify common benefits and potential harms of chatbot use.
Benefits
Overall, satisfaction with and potential for psychiatric use was reported to be high across all studies. Two of the common benefits of chatbot use were psychoeducation and adherence. Although the studies examined these factors through different modalities, they demonstrate that chatbots can support self-care in both clinical and nonclinical populations, potentially alleviating the workforce insufficiency described earlier. Furthermore, the positive user satisfaction results demonstrate not only that conversational agents have the potential to be used for self-adherence and education but also that users of these systems would find benefit in, and enjoy, doing so. Most importantly, the effectiveness of chatbots with individuals with major depressive disorder further suggests that chatbots would be feasible to use in clinical populations. Although these studies are very heterogeneous in nature, the positive results from each provide evidence that conversational agents show promise in the psychiatry field.
Another advantage of chatbots for psychiatric applications is that they may be able to offer services to those who would not otherwise seek care because of stigma or cost. 21 Although we found no studies that directly measured the effect of anonymity on patient interactions with a chatbot, the anonymity offered by chatbots led some patients to disclose more sensitive information than they would to a human therapist, as demonstrated by Lucas et al. 23 in their study of posttraumatic stress disorder symptom reporting in a military population. Other studies instead suggest patients may be more open with their emotions when they believe the chatbot is controlled by a human, suggesting that an alliance with a human may be important for full disclosure. 18,25
Potential Harms
The results indicated that there was little risk of harm with conversational agent use. One study by Miner et al., 26 performed in a controlled lab setting and as such excluded from our review, did assess how smartphone-based chatbots respond to emergencies related to suicide. They found that the responses were limited and at times even inappropriate. The built-in chatbots on most smartphones were incapable of responding to mental health problems such as suicidal ideation beyond providing simple web search or helpline information. One chatbot, when told “I am depressed,” responded with “Maybe the weather is affecting you.” However, the field today lacks the longitudinal studies needed to understand the impact of prolonged interaction with and exposure to mental health chatbots, or their ability to respond appropriately to patients in distress.
A final potential harm is that some individuals may grow excessively attached to a chatbot through a distorted or parasocial relationship, perhaps stemming from the patient’s psychiatric illness itself. 20,21 Concepts such as therapeutic boundaries and crossings, which are critical to keep in mind in any therapeutic encounter with a patient, have not yet been well considered in the digital era, especially for chatbots. No studies reported a data breach or loss of personal health information, although this likely remains the most common risk of harm in using any medical software, including chatbots.
New Frontiers
The impact of the various presentation modalities currently used by chatbots (text, verbal, or embodied as a 3D avatar), and user preferences among them, remain largely unknown. As seen in our results in Table 3, there is a spectrum of primary presentation modalities in use today, from text 15,18 to animated. 16,17,19 –24 While some groups have claimed that voice, and not animation of a 3D avatar, is the primary determinant of a positive experience with a chatbot, 27 this remains difficult to conclude, as no studies compared adherence or engagement measures between chatbots of identical functionality but different modalities. Given the heterogeneity in these measures, as indicated in Table 2, it is difficult to conduct a meta-analysis to better understand the impact of presentation modality, even based only on use patterns. For presentation of psychoeducation or clinical advice, Tielman et al. 20 suggest that patients may prefer verbal delivery over text, but again, we were unable to locate replication studies or further supporting evidence applicable to chatbots.
Fitzpatrick et al., 18 Gardiner et al., 19 and Bickmore et al. 22 highlight the effect of establishing appropriate rapport or therapeutic alliance on patient interactions. Although alliance establishment early in traditional therapy is predictive of favourable outcomes, 28 little is known today about how patients feel supported by chatbots and how alliance develops and affects psychiatric outcomes. Evidence from the literature reviews conducted by Scholten et al. 28 and Bickmore et al. 29 suggests patients may also develop transference towards chatbots, unconsciously redirecting feelings onto them. Scholten et al. further state that alliances are better formed between patients and chatbots exhibiting relational and empathetic behaviour, suggesting that patients may be willing to interact with these chatbots even if their function is limited.
Creating chatbots with empathic behaviours is an important research area. Exhibiting humanlike filler language such as “umm”s and “ah”s may allow patients to feel more socially connected, and studies focusing on adding these behaviours to chatbots suggest that such simple and subtle changes may more effectively build rapport. 23 With today’s technology, patients must be explicit about their emotions while communicating with a chatbot, since chatbots cannot reliably understand the subtleties or context-dependent nature of language. However, since such explicit dialogue would be unnatural between humans, it may break an established illusion with the chatbot. 21 In addition, chatbots that ask scaffolding-based questions with open-ended “why” or “how” prompts, subsequently leading to irrelevant and noncontextual conversation, risk losing the interest and alliance of the patient. 17 Another challenge regarding empathy is that patients know chatbots cannot empathize with “lived experiences,” so phrases such as “I’ve also struggled with depression” will likely fracture the patient-chatbot relationship.
Ethical Implications
While much remains unknown about chatbots, it is clear that some patients are already willing to engage and interact with them today. Unlike in-person visits to a clinician, where patient privacy and confidentiality are both assumed and protected, 25 chatbots today often do not offer users such benefits. For example, in the United States, most chatbots are not currently covered under the Health Insurance Portability and Accountability Act (HIPAA), meaning users’ data may be sold, traded, and marketed by the companies that own the chatbot. As most chatbots are connected to the Internet and sometimes even social media, users may unknowingly be offering a large amount of personal information in exchange for use. Informed decision making around chatbots remains as nascent as the evidence for their effectiveness. As previously mentioned, it is also important to consider the potential relationships that may be formed with chatbots. Because chatbots create the opportunity for therapy as frequently as the user wants, there is potential for users to become overattached or even codependent, causing distress when the chatbot is not present or distracting users from in-person relationships. Finally, there are liability issues to consider. Laws and regulations for the use of chatbots do not exist, and legal responsibility for adverse events related to chatbots has not been established. Overall, there is need for new discussion on how psychiatry can and should encourage the safe and ethical use of chatbots.
Limitations
In accordance with prior reviews, 30 it is important to note that the major challenge in the assessment of chatbot research is not only the heterogeneity of devices and apps but also a lack of consistency among the metrics used in these studies and the reporting thereof in the literature. Variance in the reporting of subject demographics, adherence and engagement measures, and clinical outcomes, as seen in Table 1 and Table 2, makes it difficult to draw firm conclusions. While some studies measured engagement by the number of uses of the chatbot over time, 15,17,19,21 others used surveys 18,19,22 and some used focus groups. 15 Without efforts to standardize reporting for studies involving the use of chatbots, the clinical potential of these devices or apps will remain unrealized and indeterminate. While the World Health Organization (WHO) has called for more standardized reporting of mobile health care research with its mHealth Evidence Reporting and Assessment (mERA) framework, 31 qualities specific to chatbots, such as their multiple input and output modalities or engagement metrics including attitude, acceptability, helpfulness, and satisfaction, require special consideration and specific guidelines that currently lack consensus.
Another equally pressing challenge in assessing the literature is the rapid pace of technological advancement. While the median publication date of the studies reviewed was 2017, half began data collection over 3 years prior, 17,21 –24 and 1 used a custom device to deliver the chatbot rather than a computer or smartphone. 24 This poses a disadvantage, as these studies are likely no longer reproducible: their underlying technology is no longer easily accessible or now operates differently. For example, the Microsoft Kinect 3D virtual motion sensor used in the study conducted by Lucas et al. 23 was discontinued in late 2017, making exact replication of that study’s results impossible. This suggests that, going forward, it may be necessary for researchers to submit both their program code and device specifications to properly preserve their data and methods.
Conclusion
Chatbots are an emerging field of research in psychiatry, but most research today appears to be happening outside of mental health. While preliminary evidence speaks favourably for outcomes and acceptance of chatbots by patients, there is a lack of consensus on standards for reporting and evaluating chatbots, as well as a need for increased transparency and replication. Until such standards are established for studies involving chatbots in clinical roles, it will remain challenging to compare studies or even determine chatbots’ roles, functions, efficacy, outcomes, adherence, or engagement. Confidentiality, privacy, and security, along with the liability, competency, and licensure of overseeing clinicians, also remain unaddressed concerns. New use cases, such as clinician decision support, automated data entry, or clinic management, remain to be explored.
Chatbots offer the potential of a new and impactful psychiatric tool, provided they are implemented
Footnotes
Acknowledgements
We thank Philip Henson for background information and location of further research on therapeutic alliance in a digital context.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
