Abstract
Background
Digital mental health interventions (DMHIs) have shown efficacy in recent years; however, most are available only in English, leaving limited options for speakers of other languages such as Spanish. Research indicates that mental health services delivered in one's dominant language are associated with better outcomes. Conversational agents (CAs) offer promise in supporting mental health in non-English-speaking populations. This study compared the culturally adapted Spanish version of an artificial intelligence (AI)-led mental health app, Wysa, with its English version.
Objectives
To compare user engagement patterns on Wysa-Spanish and Wysa-English and to understand expressions of distress and preferred language in both versions of Wysa.
Methods
We adopted a cross-sectional retrospective exploratory design with mixed methods, analyzing users from 10 Spanish-speaking countries between 1 February and 1 August 2022. A quantitative sample A (n = 2767) was used for descriptive statistics, including user engagement metrics with a Wilcoxon test. A subset qualitative sample B (n = 338) was examined for word count differences based on valence, and a content analysis was conducted to examine idioms of distress.
Results
Compared to Wysa-English, Wysa-Spanish had more sessions (P < .001, d = 0.18) and a greater volume of disclosure of distress. In Wysa-Spanish, the average length of a conversation was significantly longer than in Wysa-English (P < .001, d = 0.44). Users preferred interventions with free text responses (“Thought recording”) in Spanish (P < .01, d = 0.41), and Spanish messages were significantly longer (P < .01, d = 0.24). Wysa-Spanish saw more frequent expressions of negative emotions and feelings of self-harm and suicide.
Conclusion
Given the high engagement within the Spanish version of Wysa, the findings demonstrate the need for culturally adapted DMHIs among non-English populations, emphasizing the importance of considering linguistic and cultural differences in the development of DMHIs to improve accessibility for diverse populations.
Introduction
Ensuring an adequate and diverse set of competencies in mental health care provision is crucial in effectively serving the needs of a multicultural and global population. 1 However, historically, the mental health system has not effectively addressed the needs of culturally diverse populations,2,3 resulting in significant racial and ethnic disparities in service utilization and outcomes. 4 For instance, individuals with limited English proficiency have experienced difficulties in identifying their mental health needs and face longer durations of untreated mental illness.5,6 Moreover, Latinos were found to underutilize mental health services and were less likely to receive equivalent care in the United States. 7 Common barriers to seeking mental health interventions, such as financial burdens, stigma, privacy concerns, and time constraints, persist across cultures. 8 However, strategies to overcome these barriers have been slower to implement in non-English-speaking populations. 9 This contributes to cultural disparities in mental health services.
Numerous studies on the design of culturally sensitive mental health interventions have demonstrated the importance of a language match.10,11 Sessions in a client's dominant language have more positive therapeutic outcomes10,11 and also keep clients in treatment longer.12,13 Such sessions also demonstrated efficacy that was two times higher than that of interventions conducted in any other language.14,15 Furthermore, culturally responsive care can strengthen the therapeutic alliance necessary to increase treatment adherence.16,17
In recent years, digital mental health interventions (DMHIs) have attempted to address these gaps in mental health care. 8 DMHIs can include text- and video/audio-based interventions provided digitally either synchronously or asynchronously. These DMHIs increase accessibility while also addressing user concerns about costs, stigma, and privacy. Furthermore, DMHIs can be more readily adapted to various cultural and language contexts to cater to a diverse population. 18
In the past, non-English populations have shown openness toward DMHIs. 19 Latino populations were 20% more likely to use health apps than their Caucasian counterparts. 20 In addition, the utilization of DMHIs alongside group cognitive behavioral therapy (CBT) sessions for depression was associated with increased engagement and improved outcomes among Spanish users. 21 However, despite the openness to DMHIs, it has been found that many digital health apps have poor retention rates.22,23 One possible explanation for this lower usage is the unavailability of DMHIs in preferred languages.24,25
An alternative method for addressing this challenge involves the utilization of an artificial intelligence (AI)-powered conversational agent (CA) that emulates human conversation with users. 26 Numerous interventions utilizing CAs have demonstrated significant advancements in addressing prevalent mental health issues, including depression, psychological distress, anxiety, and feelings of isolation.27,28 The literature exploring the usage of CAs and DMHIs in non-English-speaking countries is promising, highlighting usability and acceptability. High engagement and positive feedback were observed in Tess, a Spanish AI chatbot for anxiety and depression. 29 Additionally, Spanish mental health chatbots addressing gaps in healthcare during the COVID-19 pandemic showed the highest engagement compared to other chatbots. 30 ChatPal, a multilingual chatbot accessible in English, Scottish Gaelic, Swedish, and Finnish, demonstrated non-significant improvements in well-being scores among highly engaged users, who were notably younger. 31 While there are many DMHIs available, only a few have published research or evidence to prove efficacy.32,33 Additionally, existing studies in this area were mostly pilot studies or have predominantly focused on non-English users' interactions without comparing them to English versions of the same app. Comparing the differences in engagement, language, and disclosure in Spanish and English is one method to better understand cultural and language comfort in interacting with DMHIs.
This study employs a cross-sectional retrospective exploratory design with a mixed-methods approach to understand usage patterns and preferences between two versions of Wysa, that is, one in English and one in Spanish. To that end, the study aims to compare user engagement patterns and understand expressions of distress and preference of language in the Spanish and English versions of the Wysa app.
Methods
About the intervention
The Wysa app is supported by an AI-enabled CA that provides support in English based on the principles of cognitive behavioral therapy (CBT), dialectical behavioral therapy (DBT), and mindfulness-based techniques, among others. It is publicly available through the Google Play Store and Apple App Store for free. The app has been available globally in English since 2017. Users can talk freely with the CA, which is equipped with models developed using natural language processing (NLP) and natural language understanding (NLU) that traverse the rule-based conversation logic. The app also categorizes specific interventions into need-based categories, which the user can access as part of self-management or through the suggestions of the CA. The app has an in-built ability to signpost users who mention thoughts indicating risk to self or others. The focus of this paper is on the AI-enabled CA and its self-care interventions.
In 2022, a version of the app was released in Spanish. This app was developed with the support of clinicians from Latin American countries and volunteers whose dominant language was Spanish and who were trained in psychology. Although most Latin American countries speak Spanish, each country has its own idioms and uses its own terms for certain objects. Therefore, it was decided that “neutral Spanish,” a form of Spanish that tends to reduce idioms, be used along with terms that are understood in all Spanish-speaking countries. 34 A total of 19 interventions comprising a subset of all interventions available in English were translated from Wysa-English to Wysa-Spanish. Users could opt to use these 19 interventions in either language. See Appendix for the list of interventions and more details about how the Spanish version was built.
Study design
On 1 February 2022, the Spanish version was made available to existing Wysa users (Figure 1) in 10 countries across Latin America and Europe: Argentina, Costa Rica, Ecuador, Mexico, Puerto Rico, Uruguay, Peru, Colombia, Chile, and Spain. These users had been using Wysa in English prior to the release of the Spanish version. Users could choose to engage with Wysa in English or Spanish. Users were eligible for this study if they downloaded the app during the study period from 1 February 2022 to 1 August 2022 and resided in one of the 10 countries mentioned above (n = 2767).

Figure 1. Introducing users to Wysa in Spanish and example conversation flows. (A) Welcome screen to Wysa-Spanish. (B) Example of an ongoing Wysa-Spanish conversation. (C) Example of an ongoing Wysa-English conversation.
Application programming interface
The conversations allow users to type in their responses following a “free-text” approach to engagement. The free-text input enables flexibility and genuineness in the conversation and allows the user to speak their mind. The app uses natural language processing, natural language understanding, and machine learning to interpret the user's messages. It uses a rule-based system where every response is pre-written by clinicians. 35 Wysa focuses on comprehending user input through classifications, where content is structured to align with these classifications, influencing the conversational flow. In the case of Spanish interactions, content is translated at the backend. Figure 2 shows the development of Wysa in Spanish. To assess safety and performance, we employ a metric known as appropriateness to measure the percentage of user messages to which Wysa responded appropriately, empathetically, and safely. Both the English and Spanish versions of the app displayed similar appropriateness in responses with 90%+ in translation appropriateness and model appropriateness.

Figure 2. Outline of the development process (left) and the final processing flow (right). “User message sent” indicates the initiation of the conversational flow with the user sending a message. “Aptness” refers to appropriateness, which measures the percentage of user messages to which Wysa responded appropriately, empathetically, and safely.
Data collection and processing
All the data for this study were collected automatically via the app's usage log. Under the app's usage logs, anonymized data on user interactions with the CA, such as the start and end dates and times of a session, the number of sessions, emotions expressed during the chat, interventions utilized during the session, and limited conversation snippets were recorded. The quantitative data for this study were related to usage statistics with the CA and intervention utilization. Only interventions that were present in both versions were considered to avoid any discrepancies. App sessions were only counted toward engagement if a user completed a conversation with the CA or an intervention within the app. Passive events, such as opening the app or an element within the app, were not counted. A session is characterized as a singular interaction between a user and the CA, encompassing the period from when a user initiates a conversation with the CA to the point where they conclude the chat by closing the CA screen on the app.
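The session-counting rule described above can be sketched as a simple filter over usage-log events. This is a minimal illustration only: the event names and record layout below are hypothetical, since the structure of the app's usage log is not described here.

```python
from collections import Counter

# Hypothetical event types. The text only distinguishes completed
# conversations/interventions from passive events such as opening the app.
ACTIVE_EVENTS = {"conversation_completed", "intervention_completed"}


def count_sessions(events):
    """Return {user_id: number of sessions that count toward engagement}."""
    counts = Counter()
    for event in events:
        # Passive events (e.g. opening the app) are skipped entirely.
        if event["type"] in ACTIVE_EVENTS:
            counts[event["user_id"]] += 1
    return dict(counts)


# Illustrative, made-up log entries.
events = [
    {"user_id": "u1", "type": "app_opened"},
    {"user_id": "u1", "type": "conversation_completed"},
    {"user_id": "u1", "type": "intervention_completed"},
    {"user_id": "u2", "type": "app_opened"},
]
session_counts = count_sessions(events)
print(session_counts)
```

Under this rule, a user who only opens the app contributes no sessions and does not appear in the engagement counts.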
The qualitative data were retrieved in the form of conversational snippets based on predetermined required keywords (emotion words). Emotion words were captured through free text from conversations with the CA. Any personal identifying information (PII) was removed from these messages using Wysa's proprietary PII detection and redaction algorithm. Spanish messages and their English translations were included in the data.
Ethics
This study analyzed real-world data from an anonymous non-clinical population; hence, it was exempt from registration in a public trial registry. 36 The users voluntarily downloaded Wysa and consented to the app's terms of service and privacy policy.37,38 These policies ensure that only minimal, required, and aggregated data are used for research purposes. These data are anonymized using irreversible redaction, and no PII is collected during app use, and any inadvertently shared PII is redacted. For ethical and privacy reasons, the authors did not have access to all the user conversations with the CA. Only anonymized, minimal, and limited conversation data were used from specific AI endpoints. Informed user consent for research was taken when the user downloaded the app along with data storage and usage. To opt out of data usage for research purposes, one could write to the developers on a provided email address.
Measures
App sessions were only counted toward engagement if a user completed a conversation or exercise with the CA. Passive events, such as opening the app or an element within the app, were not counted.
For the comparison of engagement in objective 1, the engagement metrics are (a) the number of sessions a user completed; (b) the average length of a conversation with the CA (avg. number of messages exchanged per session); (c) the number of users who preferentially used Wysa-Spanish or Wysa-English; (d) interventions utilized; (e) emotions expressed by valence; (f) average word count per user; and (g) recurring engagement period (in days) for users using interventions in both Wysa-English and Wysa-Spanish. Emotion words were categorized as positive, negative, or neutral based on the valence of the emotion expressed by the user. For instance, the emotion word ‘happy’ is categorized as positive. Recurring engagement periods for interventions in both English and Spanish were calculated as the difference between the first and last days of use for users whose last interaction with the interventions occurred at least 1 month prior to the conclusion of the study period (i.e. before 1 July 2022). This approach was adopted to ensure that users who had their final session in the days preceding the end of the study period were not prematurely classified as drop-offs, as they may have continued using the app beyond that point. The recurring engagement period was additionally calculated for users who used the interventions more than once (repeat users). For objective 2, conversational snippets were analyzed. Spanish messages and their English translations were included in the data.
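As a rough illustration of the recurring engagement period defined above, the sketch below computes the number of days between a user's first and last interaction and excludes users whose last interaction falls on or after 1 July 2022. The function name and input format are assumptions made for this example, not part of the study's pipeline.

```python
from datetime import date

# From the text: only users whose last interaction was before 1 July 2022.
CUTOFF = date(2022, 7, 1)


def recurring_engagement_days(interaction_dates, cutoff=CUTOFF):
    """Days between first and last interaction.

    Returns None when there are no interactions or when the last
    interaction falls on or after the cutoff (user excluded).
    """
    if not interaction_dates:
        return None
    first, last = min(interaction_dates), max(interaction_dates)
    if last >= cutoff:
        return None
    return (last - first).days


# Illustrative, made-up usage dates.
included = recurring_engagement_days([date(2022, 2, 10), date(2022, 3, 1)])
excluded = recurring_engagement_days([date(2022, 2, 10), date(2022, 7, 15)])
single_day = recurring_engagement_days([date(2022, 4, 5)])
print(included, excluded, single_day)
```

A user with a single day of use has a period of 0 days, which is one reason a separate calculation restricted to repeat users can be informative.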
Sample size
The study was based on two datasets, that is, one quantitative and one qualitative, examining users that used a similar set of interventions (referred to as overlapping interventions) in both Wysa-English and Wysa-Spanish. The quantitative dataset included a total of 2767 users (NA = 2767) who had access to both languages in the app, forming Sample A. All 2767 users used the interventions in Wysa-Spanish (nA-es = 2767), and 2479 of these users used interventions in Wysa-English (nA-en = 2479) during the study period.
The qualitative dataset termed as sample B (nB = 338) was focused on users who used at least one emotion word in the selected conversations, all of whom used at least one emotion word in Wysa-Spanish (nB-es = 338), and 59 used at least one emotion word in Wysa-English (nB-en = 59).
Analysis
Objective 1: Comparison of engagement patterns between Wysa-English and Wysa-Spanish
To examine users’ engagement, Sample A of NA = 2767 users (nA-en = 2479; nA-es = 2767) was considered to analyze the following: i) the number of sessions a user completed; ii) the average length of a conversation with the CA (avg. number of messages exchanged per session); iii) interventions utilized; and iv) recurring engagement period (in days) for interventions in both Wysa-English and Wysa-Spanish.
To analyze emotion words, Sample B of nB = 338 (nB-en = 59; nB-es = 338) was used (Figure 3). To compare both versions, non-parametric tests were used as the data were not normally distributed. A Wilcoxon signed-rank test (V) was used to evaluate differences between the two versions in the number of sessions, recurring engagement period, and intervention usage. Cohen's d was used to determine effect sizes.
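To make the two quantities named above concrete, the following sketch computes a signed-rank statistic V and Cohen's d for paired per-user values using only the Python standard library. Two assumptions are made here that the text does not confirm: V is taken as the sum of ranks of the positive paired differences (the convention used in R's `wilcox.test` output), and d is computed with a pooled standard deviation. The sketch does not compute a P value; an established statistics package would normally be used for that.

```python
from math import sqrt
from statistics import mean, variance


def signed_rank_v(x, y):
    """Sum of ranks of positive paired differences (zero differences dropped)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


def cohens_d(x, y):
    """Cohen's d with a pooled SD (assumed variant; equal group sizes)."""
    pooled_sd = sqrt((variance(x) + variance(y)) / 2)
    return (mean(x) - mean(y)) / pooled_sd


# Illustrative, made-up per-user session counts (not study data).
spanish_sessions = [3, 5, 2, 7]
english_sessions = [1, 6, 2, 3]
v_stat = signed_rank_v(spanish_sessions, english_sessions)
effect = cohens_d(spanish_sessions, english_sessions)
print(v_stat, round(effect, 3))
```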
Overlapping users from Sample B (nB-en-es = 59) were used to analyze the word count. Statistical differences in word count were compared using Wilcoxon signed-rank tests with continuity corrections. Moreover, to account for any syntactical differences, the word counts of translated Spanish messages and English messages were also compared using a Wilcoxon signed-rank test. Lastly, Sample B was once again used to perform a two-proportion Z-test to examine the word count differences in negative valence expressions due to language.
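For readers unfamiliar with the two-proportion Z-test mentioned above, a minimal standard-library sketch follows. It uses the pooled-proportion form without a continuity correction, which is an assumption; the counts in the usage example are invented for illustration and are not study data.

```python
from math import erfc, sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and two-sided P value.

    x1/n1 and x2/n2 are the success counts and totals of the two groups.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Illustrative counts only: 60 of 100 vs 40 of 100.
z, p = two_proportion_z(60, 100, 40, 100)
print(round(z, 3), round(p, 4))
```

The square of this z statistic equals the chi-square statistic of the corresponding 2 × 2 test without continuity correction, so the two tests are interchangeable in that setting.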
Objective 2: Analyzing expressions of distress and language preference in Wysa-Spanish and Wysa-English
A content analysis was conducted to examine the expressions of distress and choice of language by Wysa Spanish and English users. An inductive approach was employed, where codes were identified based on conversational snippets from Sample B of 338 users (nB-en = 59; nB-es = 338). 39 The data were further categorized based on the emotion expressed. Phrases and idioms from the messages capturing distress were picked out by researchers proficient in the language they were analyzing. This method was used to evaluate if certain participants expressed more nuance in emotion when conversing in their dominant language. Only session messages that had a negative valence were selected for this content analysis. Messages referring to risk to self or others were signposted to helplines within the CA.
Results
Objective 1: Comparison of engagement patterns between Wysa-English and Wysa-Spanish
During the study period, users completed a significantly higher number of sessions on Wysa-Spanish than on Wysa-English according to the Wilcoxon test (Table 1), with 78.8% of all users completing an equal or higher number of Spanish sessions. The total number of messages exchanged in Wysa-Spanish was 65,780, while that in Wysa-English was 56,569 (Table 1).
Table 1. User engagement for overlapping interventions of Wysa-English and Wysa-Spanish.
Note. a: nA-en = 2479, nA-es = 2767; b: engagement = average sessions per user; c: average length of conversation = average number of messages exchanged per user per session; d: repeat users = users with more than 1 day on the app; V = Wilcoxon signed-rank test; W = Wilcoxon rank-sum test.
The most commonly used overlapping interventions are represented in Figure 4. Thought recording was the most utilized intervention between both versions and used significantly more in Wysa-Spanish (M = 1.17, SD = 0.40) than in Wysa-English (M = 1.02, SD = 0.13) (W = 3830, P = .007, d = 0.41).

Figure 3. Sample size allocation and associated tests for the study.

Figure 4. Utilization of common interventions in Wysa-English and Wysa-Spanish users.
Emotion words
There were more types of emotion words observed in Wysa-English than in Wysa-Spanish (80 emotion words were tagged in English, as opposed to 22 in Spanish). However, Wysa-Spanish saw a greater volume of emotion words and a larger number of negative emotion words (Table 2). Free-text words, such as “okay” and “bored,” were categorized as neutral; “hopeless” and “bad” were categorized as negative; and “good” and “calm” were categorized as positive emotion words.
Table 2. Emotion word frequency categorized by valence from user messages in a subset of Wysa-English and Wysa-Spanish users.
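The valence grouping of emotion words can be pictured as a lookup followed by a tally. The sketch below uses only the example words given in the text; the full lexicon used in the study is not listed here, so the dictionary is a partial, illustrative stand-in and the function name is hypothetical.

```python
from collections import Counter

# Only the example words mentioned in the text; not the study's full lexicon.
VALENCE = {
    "happy": "positive",
    "good": "positive",
    "calm": "positive",
    "okay": "neutral",
    "bored": "neutral",
    "hopeless": "negative",
    "bad": "negative",
}


def valence_counts(emotion_words):
    """Tally emotion words by valence, ignoring words not in the lexicon."""
    counts = Counter()
    for word in emotion_words:
        label = VALENCE.get(word.strip().lower())
        if label is not None:
            counts[label] += 1
    return counts


# Illustrative input; "tired" is not in the example lexicon and is skipped.
tally = valence_counts(["Happy", "bad", "hopeless", "okay", "tired"])
print(dict(tally))
```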
Word count
Results show a significantly higher word count on Wysa-Spanish than on Wysa-English (V = 460, P < .01, d = 0.24). On average, Wysa-Spanish users had a larger mean and median word count (Table 3). Furthermore, Spanish messages translated to English were significantly longer in word count than English messages (V = 415.5, P < .01, d = 0.25).
Table 3. Word count characteristics categorized by language from user messages in a subset of Wysa-English and Wysa-Spanish users.
The analysis of word count by language and emotion valence showed that users of the Spanish app had a significantly greater word count when expressing negative emotions (92%) than users of the English app (76%) (χ2 = 36.43, P < .001).
Objective 2: Understand expressions of distress and language preference in Wysa-English and Wysa-Spanish
Through the content analysis of negative messages, four key themes emerged from the data: 1) anxiety and stress; 2) hopelessness and sadness; 3) self-harm and suicide; and 4) anger and frustration. A certain amount of overlap between these themes was observed, indicating that they are interconnected. Table 4 lists the number of instances and common phrases expressed under each key theme in Wysa-Spanish and Wysa-English. See the Appendix for the most commonly used phrases in English and Spanish.
Table 4. Total number of messages that fall into each theme identified from a subset of Wysa-Spanish and Wysa-English users (nB-en-es = 59).
Language preference
Lastly, the qualitative data gave insight into users expressing their inability to effectively use the English version of Wysa. In the English version of Wysa, some users responded in Spanish, “No entiendo inglés” (I do not understand English), “No te entiendo…el idioma en Español se te entiende mejor” (I don't understand you, the Spanish language is better understood), and “No te entendí ni verga, no se ingles” (I don't understand you, I don't know English). Some users (n = 3) spoke to the English version of Wysa in Spanish, presumably because they preferred to communicate in Spanish. In the Spanish version of the app, these users were able to communicate clearly with the CA.
Discussion
The present study utilized a mixed-methods approach and a cross-sectional retrospective exploratory design to evaluate differences in engagement and expressions of distress in the English and Spanish versions of the Wysa app. The findings demonstrate that users exhibited a higher engagement and a preference to express distress in a dominant language. Our results align with previous literature, indicating a lower disclosure of psychological distress among non-English-speaking populations using mental health apps in English. 14 This could be attributed to the ability to express oneself better in one's dominant language, along with greater familiarity with expressing oneself with phrases and cultural descriptions of one's feelings and emotions. 40
The high engagement and continued usage of Wysa-Spanish after the first Spanish session could be attributed to the acceptability of the cultural and linguistic adaptation of the Spanish CA. A few studies have demonstrated Spanish-speaking populations’ openness to using mental health apps 20 and shown significant levels of engagement with CAs in non-English languages.29,30,41 Higher levels of personalization, including cultural and linguistic adaptations, are possible with AI-based chatbots,29,30,41 making them useful in mental health care services looking to provide targeted users with treatments that can be accessed from anywhere at any time. 42 Additionally, with appropriate crisis detection and escalation pathways, CAs can also be utilized as preventive resources for people without access to treatment or as an addition to conventional therapies. The comfort level that users experience with a CA can also be useful for disclosure, 43 especially when such disclosures are possible in one's dominant language.16,17
The thought recording intervention was utilized the most in the Spanish version of Wysa. This may be because it allowed for free-text input by users, and the preference for this intervention in Spanish could be due to the comfort of expressing oneself in one's dominant language.44–47 This may also explain the larger word count and length of conversation observed in Wysa-Spanish, with strong engagement results.29,30,41
Lastly, a few participants in this study also expressed difficulty in communicating their mental health issues in their non-dominant language. This is in line with previous literature, which indicates a higher ability to express one's distress and the intensity in one's dominant language.44–47 Our results further strengthen research studies that suggest that psychological interventions in one's dominant language could lead to a higher engagement, which may influence outcomes.16,17
Therefore, these studies indicate the usability and acceptability of CAs in digital mental health among non-English speaking populations.
Future directions and research implications
The higher engagement observed on the Spanish version of Wysa highlights the need for improving accessibility and engagement for diverse populations and potentially reaching out to underserved communities. The findings also underscore the importance of cultural sensitivity and tailoring interventions to specific populations to improve engagement. The results also highlight a need for DMHIs in one's dominant language, as observed with users who stated that they did not understand English, but proceeded to communicate their distress effectively in Spanish. Moreover, this will allow DMHIs to capture the depth of the emotion and distress expressed by first-language Spanish speakers, aiding in a stronger working alliance.
Despite continued usage of the Spanish version of the app, the data for the study are limited to a year. Future studies should examine real-world evidence longitudinally on culturally adapted apps. Currently, there are a limited number of apps that are translated or culturally adapted into non-English languages. 44 Despite increasing interest in culturally adapted apps to address mental health, the current marketplace is not meeting the demand. Therefore, additional efforts are warranted to provide appropriate translations and culturally adapted versions of these interventions and techniques. Exploring usage patterns across demographics, such as geographical region, age, or gender, can add additional nuance for creating customized interventions. Additionally, it would be beneficial to collect qualitative feedback from users to understand their experiences in any culturally adapted features of an app. Lastly, while some studies indicated better therapeutic outcomes in one's dominant language, 48 future research should also examine if a higher engagement can improve the therapeutic outcomes with culturally adapted mental health CAs.
Strengths and limitations
To our knowledge, this study is the first of its kind to show the benefits of using a chatbot in a non-English dominant language in comparison to English. It adds to the growing literature around creating culture- and language-specific DMHIs. Moreover, the mixed-methods approach of the study strengthened the results and aided their interpretation through triangulation of the data obtained through qualitative and quantitative approaches. Furthermore, by using qualitative data, we were able to assess user satisfaction and causes of dissatisfaction concerning app usage in both languages. The study, being based on populations across 10 Latino countries, lends further generalizability to the findings.
The study also has some limitations. Since the Spanish app was recently launched, only a limited number of interventions had been translated in comparison to the English version. To address this, we only analyzed interventions that were available in both English and Spanish. However, the results on usability and engagement may have been skewed as the impact of the non-overlapping interventions is not defined. Another possible limitation is that we only analyzed qualitative conversational snippets based on a keyword search, which may introduce a sampling bias. Relying on predetermined keywords may overlook important themes or topics that are not included in the search criteria. This may only provide a partial view of the overall interaction as important context and nuance may have been missed. Moreover, “neutral Spanish” was used as the primary language for the Spanish app, and although it had the advantage of being able to cover more users, it may also have limited the personalization of the language for the users. Additionally, due to the anonymity of data, insight into demographics and personalization is limited.
Some limitations of culturally adapting existing DMHIs include challenges related to language translation. The Spanish app included gender nuances in nouns and adjectives. The initial goal was neutrality, but achieving this balance between formal and personalized tones proved challenging. Subsequent refinements, including user gender preferences, aimed at enhancing personalization. Spanish interventions that were initially translated directly underwent multiple stages of revision, in which expressions were adapted for universal understanding. Training data organization involved categorizing diverse user inputs representing various Spanish-speaking countries.
Conclusion
Existing literature highlights the gaps in accessible mental health services and the need to incorporate culturally and linguistically conscious DMHIs. Our comparison of the English and Spanish versions of the app indicates that there is a preference for using DMHIs in the primary language by users from Spanish-speaking countries, illustrating significant differences in engagement and disclosure of psychological distress.
Footnotes
Acknowledgements
We would like to thank Sophie Mizrahi, María Jimena Campos, Meheli Saha, Tanya Malik, Siddharth Thakeria, and Anand Gupta for their assistance and guidance in this research.
Author Contributions
CS and DND designed the study. DND and NR researched literature, conceived the methodology of analyses, and performed data analysis. DND and NR wrote the various drafts of the manuscript. CS reviewed the data analysis and manuscript. CS finalized the manuscript. All authors reviewed and edited the manuscript and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: DND, NR, and CS are employees of Wysa Inc., and CS owns equity in the company. The authors obtained the requisite permission to include the brand name.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by Wysa Inc.
Guarantor
CS
