Abstract
Objective
This systematic review examines how artificial intelligence (AI), including machine learning (ML) models and AI-powered chatbots, contributes to the diagnosis, treatment and ethical governance of mental healthcare. It explores how AI-driven systems form interconnected healthcare networks that enhance accessibility, personalization and resilience of mental health services, aligning with the United Nations Sustainable Development Goal 3: Good Health and Well-Being.
Methods
A comprehensive search across PubMed, IEEE Xplore and Google Scholar (2017–2024) was conducted using Boolean combinations of “AI,” “machine learning,” “chatbots” and “mental health.” Screening followed Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 guidelines, yielding 37 high-quality studies for qualitative synthesis. Extracted data were categorized into three domains: (1) AI- and ML-based diagnostic models, (2) chatbot-enabled mental health support systems and (3) ethical and privacy considerations. Analytical dimensions included algorithmic performance, clinical outcomes, data governance and equity of access.
Results
AI-driven interventions improved accessibility, diagnostic accuracy and therapeutic personalization. Chatbots such as Woebot, Wysa and Tess effectively reduced symptoms of depression and anxiety, increased user engagement and provided scalable support, particularly during the COVID-19 pandemic. ML models, including MentalBERT, MentalRoBERTa and SR-BERT, achieved F1 scores of 68–93% in mental health classification tasks. However, limitations included dataset bias, lack of longitudinal evidence and limited cross-cultural generalizability. Ethical analyses revealed persistent challenges concerning privacy, informed consent, algorithmic bias and accountability.
Conclusion
AI technologies, when integrated with human oversight, offer transformative potential for global mental health systems by creating interconnected and adaptive care networks. These technologies can enhance efficiency, reduce barriers to care and support data-driven public health strategies. However, successful deployment depends on clear ethical frameworks that promote transparency, respect cultural contexts and preserve human oversight. Future research should prioritize longitudinal studies, inclusive datasets and ethical frameworks that maintain human-centered values in AI-enabled mental health systems.
Keywords
Introduction
Individual well-being is foundational to mental health and directly influences broader social health. As the complexity and urgency of mental health challenges grow, especially in the wake of rising global demand, it becomes crucial to rely on structured, evidence-based approaches to guide innovation. Systematic reviews serve this role by consolidating fragmented knowledge and providing a clear synthesis of trends, gaps and limitations in the literature. This is particularly important in rapidly evolving domains like artificial intelligence (AI)-driven mental healthcare, where the pace of technological advancement often outstrips regulatory and clinical integration. Through transparent and methodical analysis, systematic reviews offer valuable insights that support both clinical practice and policy development, ensuring that emerging AI applications align with ethical standards and patient-centered care.

Mental health disorders, meanwhile, continue to impair daily functioning, personal relationships and quality of life, placing immense strain on individuals and communities. Despite the existence of effective treatments, access to quality care remains limited due to stigma, financial barriers, resource shortages and a general reluctance to seek help. These systemic issues underscore the urgent need for scalable, innovative solutions, a gap that AI technologies are increasingly being positioned to fill.

These challenges call for connected approaches to mental healthcare that bridge geographic and social divides. Our review contributes to this global agenda, aligning with Sustainable Development Goal (SDG) 3 (Good Health and Well-Being) and its focus on improving mental health outcomes worldwide, particularly target 3.4, which aims to promote mental health and well-being. By exploring networked AI solutions, we address the critical need for innovative approaches to mental healthcare delivery that can scale across diverse populations and settings.
Recent studies show that mental health disorders affect a large proportion of individuals across diverse age groups and ethnic backgrounds. 1 These disorders not only impact individual well-being but also impose substantial societal and economic burdens. 2 Factors such as scarce resources, geographical obstacles and negative perceptions surrounding mental healthcare have historically limited access to treatment. Consequently, many individuals do not receive a timely diagnosis or appropriate care. To overcome these challenges and provide timely and effective responses, it is imperative to explore innovative approaches and tactics.3,4
Recent global evidence underscores the escalating burden of mental health disorders. According to the World Health Organization (WHO, 2025), more than one billion people worldwide are currently living with mental health conditions such as anxiety and depression, which now represent the second leading cause of long-term disability and impose an annual economic cost of nearly US$1 trillion. Similarly, findings from the Lancet Global Study 5 revealed that the COVID-19 pandemic led to a 27.6% increase in major depressive disorder and a 25.6% increase in anxiety disorders, with the greatest impacts observed among women and younger populations. Despite national progress in policy reform, mental health spending remains critically low, around 2% of total health budgets, and access to care is strikingly unequal. 6 This is particularly crucial given the persistent disparities in treatment coverage: while access in high-income regions may reach around one-third or more of affected individuals, it remains below 10% in low- and middle-income countries. According to a meta-regression by Moitra et al., 7 treatment coverage for depressive disorders ranged from ≈33% in high-income settings to ≈8% in lower-income ones, underscoring deep structural inequities. Consistent with these findings, WHO 6 reports that fewer than 10% of those needing mental health services in low-income settings receive care, whereas in high-income countries, the proportion can surpass 50%. These figures collectively emphasize the urgent need for equitable, scalable and AI-driven interventions to strengthen access, personalization and sustainability of global mental healthcare. 7
This review synthesizes evidence on how AI is currently used in mental healthcare, where it helps (diagnostics, triage, engagement) and where it falls short (bias, generalizability, long-term outcomes). Against a backdrop of growing demand and persistent barriers such as limited access, diagnostic delays and resource shortages, AI tools have improved screening accuracy and follow-up rates in small trials, although effects remain heterogeneous and context dependent. Through improved diagnostic accuracy, tailored therapeutic recommendations and data-informed resource optimization, AI holds the potential to augment clinical judgment, streamline mental health service delivery and diminish the stigma surrounding help-seeking behaviors in mental healthcare contexts.3,8 In times of crisis such as the COVID-19 pandemic, the role of AI becomes even more critical, presenting opportunities to shift from reactive emergency responses to building sustainable and resilient mental health systems.
To guide this exploration, Figure 1 presents the conceptual framework underpinning this systematic review. The framework reflects the integrated and interdisciplinary nature of AI applications in mental health, encompassing three core domains: (1) AI-powered chatbots that improve accessibility and patient engagement, (2) AI and machine learning (ML) models that assist in diagnosis and treatment planning and (3) ethical and privacy considerations that ensure responsible and equitable implementation. This structure mirrors the review's broader aim to assess the current landscape of AI interventions while identifying gaps, opportunities and ethical imperatives. The framework also provides a consistent lens through which the subsequent sections are organized and interpreted.

AI in mental health: ethical considerations, diagnostic models and support systems.
Prior reviews often isolate one strand (chatbots, diagnostic ML or ethics). We analyze these jointly to show where design, clinical validity and governance intersect, bringing together technological development, ethical analysis and system-level application. This review moves beyond merely cataloguing tools or outcomes; it critically evaluates how AI technologies interact with clinical, ethical and infrastructural dimensions of mental healthcare. In doing so, the paper provides a more robust foundation for both academic inquiry and policy-oriented discourse.
In the Methodology and Results sections, prior research is categorized and analyzed based on its focus, design and outcomes, highlighting recurring challenges such as data bias, limited scalability and ethical oversights. Furthermore, the Discussion section reflects on these insights to identify systemic gaps and propose forward-looking strategies for responsible AI integration. The Results section reports what studies found; the Discussion section interprets why findings converge or diverge and what that means for deployment.
The study follows a structured approach, starting with an introduction, outlining the research methodology, analyzing results, engaging in discussion and concluding with key findings and future directions. This exploration asks three questions: where AI adds value, where evidence is thin and what standards are needed for safe scale-up. The primary objectives of this review are as follows: (1) to analyze current AI applications in mental healthcare and evaluate their effectiveness; (2) to identify the major challenges and limitations of AI technologies in mental health; (3) to explore ethical considerations and propose practical solutions for mitigating associated risks; and (4) to provide recommendations for future research and the integration of AI in mental health services, examining how AI-based interventions function within broader healthcare networks to create more resilient and accessible systems of care. By addressing these objectives, this paper aims to enhance understanding of AI's evolving role in mental health and to guide practitioners, researchers and policymakers in responsibly and effectively harnessing AI technologies.
Background and overview
The state of global mental health: A WHO perspective

The role of AI in mental health: A journey from ELIZA to modern applications
The development of AI has profoundly influenced mental healthcare, evolving from early conversational programs such as ELIZA in the 1960s 17 to highly sophisticated systems integrating natural language processing (NLP) and deep learning. Although AI's conceptual foundations in psychology date back several decades, its tangible impact on mental health practice became prominent during the 2010s, driven by advances in ML, big data analytics and affective computing. Recent applications, including conversational agents such as Woebot, Wysa and Tess, have demonstrated measurable efficacy in reducing symptoms of depression and anxiety and in expanding access to psychological support during and after the COVID-19 pandemic.1,3,18,19 Furthermore, transformer-based models such as MentalBERT and MentalRoBERTa have enabled nuanced emotion recognition and contextual understanding, marking a new phase in AI-assisted diagnosis and treatment.20,21 These advancements illustrate a paradigm shift from rule-based simulations to adaptive, data-driven systems capable of supporting scalable, evidence-based mental health interventions.
AI-powered chatbots: A new frontier in mental healthcare
Chatbots and related AI tools are reshaping mental health support by improving access and continuity of care. These smart-device-based, AI-powered tools can provide 24/7 accessibility, allowing individuals to receive emotional support, guidance and even cognitive-behavioral therapy (CBT)-based exercises, regardless of their location or time of need. 4 By leveraging NLP and conversational AI, chatbots can engage in personalized interactions, potentially reaching individuals who might not have access to traditional mental health resources. 2 This review examines the current state of chatbots, highlighting their potential as crucial tools in the field of AI for mental health. The use of chatbots addresses the anxieties often associated with seeking professional help, providing a more approachable and accessible entry point for individuals experiencing mental health challenges. 16 These technologies offer a significant opportunity to bridge the gap in mental healthcare, particularly in areas with limited access to mental health professionals. 1
AI/ML for assessment and diagnosis: Current signals and limits
Recent studies suggest gains in symptom classification and risk flagging; however, external validity and longitudinal benefit remain uncertain. The prevalence of mental health disorders, which affect approximately 25% of the global population according to the WHO, underscores the urgency for improved assessment and treatment strategies. 22 AI and ML offer promising solutions by harnessing the ability to analyze vast datasets and identify patterns to enhance diagnostic accuracy and personalize treatment plans. 23 By integrating these technologies into mental health research, we can gain a deeper understanding of the underlying mechanisms of these disorders and develop more effective treatments, potentially improving the lives of millions facing mental health challenges. Researchers are actively exploring AI/ML models that analyze diverse data sources, including text, speech and physiological signals, to provide more comprehensive and accurate assessments; one review discusses various applications of ML in mental health, including the analysis of text and physiological data and predictive diagnostics. 24 These models can detect subtle indicators and trends often missed by human practitioners, leading to earlier and more precise diagnoses. Insel's paper outlines digital phenotyping's role in mental health, focusing on how AI can use behavioral and physiological data for precise assessment and early detection. 25 Additionally, AI/ML models can assist mental health professionals by identifying risk factors, predicting the onset of mental health issues and monitoring the progress of interventions. 26 Topol discusses the convergence of AI and medicine, including mental health, and describes how AI models enhance diagnostics, detect subtle trends and support clinical decision-making. 23
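Model performance in these classification tasks is typically summarized by the F1 score, the harmonic mean of precision and recall (the metric behind the 68–93% range reported in the Abstract for models such as MentalBERT). The sketch below uses hypothetical confusion-matrix counts, not data from any reviewed study:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of positive predictions that were correct
    recall = tp / (tp + fn)     # share of true cases the model detected
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for a binary depression classifier:
# 80 true positives, 10 false positives, 20 false negatives.
print(round(f1_score(tp=80, fp=10, fn=20), 3))  # 0.842
```

Because F1 ignores true negatives, it is preferred over raw accuracy for the imbalanced datasets common in mental health screening, where positive cases are rare.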
Balancing innovation and ethics: Navigating the ethical landscape of AI in mental health
AI technologies have clear potential to expand access to care, refine diagnosis and personalize treatment, yet their deployment in mental health demands careful attention to ethical and privacy concerns to protect individual rights. The collection and use of sensitive data, including mental health records and behavioral information, demand robust privacy and security frameworks to safeguard personal information and ensure data ownership and control are maintained by users.27,28 Furthermore, algorithmic bias presents a critical challenge, as biased AI models can yield discriminatory or inaccurate diagnostic outcomes, thereby affecting the quality of care delivered. 29 Additionally, the role of AI in therapeutic relationships raises concerns about maintaining the essential human connection and trust between patients and clinicians, as AI-driven tools become more prevalent in patient interactions. 30 To harness AI's full potential responsibly, it is essential to balance its benefits with ethical considerations, promoting fairness, transparency and trust in therapeutic contexts. 23 Recent evidence emphasizes that the ethical use of AI in psychiatry depends on transparent data governance, bias mitigation and continuous human oversight to preserve empathy and accountability in digital interventions. Researchers further argue that integrating explainable AI and clinician co-design can ensure safe, trustworthy and equitable AI applications in mental healthcare. 31 Recent scholarship highlights that the ethical deployment of AI in mental health extends beyond technical accuracy to encompass deeper regulatory and relational challenges. Moreover, bias may emerge at multiple stages of AI development, from data collection and model training to clinical implementation, potentially reinforcing existing social inequities in mental healthcare.
Furthermore, safeguarding user privacy and ensuring compliance with fiduciary duties within therapeutic relationships remain central to maintaining trust in AI-assisted interventions. Addressing these multidimensional ethical issues requires transparent governance mechanisms, inclusive regulatory oversight and the active involvement of clinicians in the co-design of AI-based mental health tools. 32
Methodology
This comprehensive review was conducted by systematically searching multiple academic databases, including PubMed, IEEE Xplore and Google Scholar. The review targeted research published between 2017 and 2024, focusing on the application of AI and ML in mental healthcare. This methodology aimed to ensure a robust synthesis of the latest advancements in AI-driven mental health interventions.
Search strategy and data sources
We employed specific keyword combinations such as “AI in mental health,” “machine learning in psychiatry,” “chatbots in therapy” and “AI ethical concerns in healthcare” to identify relevant literature. The inclusion of these diverse areas provides a comprehensive overview of the current landscape of AI applications in mental health, highlighting the advancements, challenges and future directions of this rapidly evolving field.
To enhance methodological transparency and reproducibility, Boolean search logic was applied across all databases. The search strategy employed combinations of keywords using the operators AND, OR and NOT in PubMed, IEEE Xplore and Google Scholar. Representative Boolean strings included the following: (“Artificial Intelligence” OR “Machine Learning”) AND (“Mental Health” OR “Psychiatry” OR “Psychological Support”) AND (“Chatbot” OR “Conversational Agent”) AND (“Ethics” OR “Privacy” OR “Bias”). This Boolean-based approach enabled the systematic identification of studies published between 2017 and 2024. All retrieved records were screened and filtered according to PRISMA 2020 guidelines to ensure methodological rigor, comprehensiveness and reproducibility of the review process.
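The AND/OR structure of the representative Boolean string above can also be expressed programmatically. The following sketch is purely illustrative (it is a plain-text matcher, not the actual PubMed or IEEE Xplore query syntax, and the sample record is hypothetical): a record matches when every concept group contributes at least one term (AND across groups), with synonyms interchangeable within a group (OR within a group).

```python
# Concept groups from the representative Boolean string:
# AND across the outer list, OR within each inner list.
QUERY_GROUPS = [
    ["artificial intelligence", "machine learning"],
    ["mental health", "psychiatry", "psychological support"],
    ["chatbot", "conversational agent"],
    ["ethics", "privacy", "bias"],
]

def matches_query(text: str) -> bool:
    """True if the text satisfies the AND/OR Boolean structure of the search."""
    lowered = text.lower()
    return all(any(term in lowered for term in group) for group in QUERY_GROUPS)

# Hypothetical record title used only to demonstrate the logic.
record = ("Artificial intelligence chatbot for mental health support: "
          "privacy and bias considerations")
print(matches_query(record))  # True: every concept group is represented
```

In practice each database applies this logic through its own field tags and controlled vocabulary, so the exact strings were adapted per platform; the sketch only mirrors the logical structure reported above.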
Data extraction and analysis
Data were systematically extracted from the selected articles using a structured template, designed to capture critical information relevant to AI applications in mental health. The extraction process involved key parameters, including AI model types, datasets utilized, clinical outcomes achieved and ethical and privacy considerations encountered during implementation.
Figure 3 illustrates the multifaceted role of AI in mental healthcare, categorized into four primary dimensions: AI models, dataset sources, clinical outcomes and ethical and privacy considerations. It details the reviewed studies that employed a range of AI architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), along with analytical tasks including sentiment analysis and emotion detection, to address diverse mental health objectives. The data sources leveraged by these AI models are diverse and include wearable devices, electronic health records (EHRs) and social media interactions, thereby facilitating comprehensive and individualized mental health support strategies. Clinical outcomes discussed in the figure underscore AI's potential to enhance the personalization of mental health treatments, decrease hospital readmissions and improve the responsiveness and effectiveness of crisis interventions. Ethical and privacy considerations also play a central role, particularly the need for transparent decision-making and strong anonymization protocols, proactive mitigation of algorithmic biases and safeguarding of user autonomy and control.

Additionally, our analysis explicitly examined the network effects resulting from AI implementation in mental health contexts. We explored how AI technologies interface with existing healthcare systems, foster new interactions between healthcare providers and patients and potentially restructure the landscape of mental health service delivery. By analyzing these network dynamics, our study contributes insights into the transformative impact of AI, identifying how such technologies can enhance connectivity, facilitate coordinated care and optimize mental health outcomes at both individual and systemic levels.

Framework for AI applications in mental health: models, data sources, clinical outcomes and ethical considerations. Source: Authors.
The conceptual framework presented in Figure 3 was developed a priori to guide the design and execution of this systematic review. It integrates methodological principles derived from established evidence synthesis standards, including PRISMA 2020, CONSORT-AI and SPIRIT-AI guidelines, as well as theoretical perspectives from human–AI interaction and digital mental health implementation models. 23 This framework served as an organizing scaffold for identifying, categorizing and synthesizing literature across three core domains: AI-driven chatbots, diagnostic and predictive AI models and ethical or privacy-related considerations before data extraction. Thus, it was not generated inductively from the included studies but employed as a pre-defined analytical structure to ensure transparency, replicability and coherence in the review process.
Figure 4(a) illustrates the distribution of publications on AI in mental health from 2017 to 2024. The largest segment, 47.2% (shown in coral), highlights significant research interest in AI-driven mental health interventions, particularly chatbots and conversational agents used for support and therapy. The second-largest category, 30.6% (light gray), focuses on AI and ML models for the evaluation and diagnosis of mental health conditions, underscoring the expanding role of predictive technologies in mental healthcare. The remaining 22.2% (gray) addresses ethical and privacy concerns, emphasizing the growing attention to the ethical challenges and privacy considerations in the application of AI in mental health.

(a) Distribution of publications on AI in mental health (2017–2024), (b) evolution of research focus on AI in mental health: 2017–2024. Source: Authors' synthesis based on screened literature corpus.
Figure 4(b) illustrates the number of publications on each topic from 2017 to 2024, highlighting trends in AI research in mental health. The graph shows a steady rise in studies on AI/ML models for diagnosis, emphasizing their growing role in mental healthcare. Research on ethical and privacy issues has also increased, underscoring the need for responsible AI use. Additionally, studies on chatbots in mental health show gradual growth, reflecting interest in conversational AI as a scalable tool for mental health support. The volume of research articles in the three areas of AI in mental health (chatbots and AI in mental health, ethical and privacy considerations and AI/ML for diagnosis) can be approximated from recent reviews and systematic analyses of the field. One scoping review identified 18 studies focused on AI chatbots for mental health, specifically addressing applications such as mental health support during the COVID-19 pandemic and interventions for depression and anxiety.
Figure 4(a) and (b) is presented as contextual visualizations summarizing the broader publication trends in AI and mental health research between 2017 and 2024. These figures are not limited to the studies included in the current systematic review but rather illustrate the overall research landscape and thematic evolution of the field. Figure 4(a) depicts the distribution of global publications by research domain (chatbots, diagnostic AI/ML models and ethical/privacy studies), while Figure 4(b) highlights the temporal growth of research outputs in each category.
While the present review synthesized 37 high-quality studies across three domains (AI chatbots, diagnostic ML models and ethical frameworks), a previous systematic review that focused exclusively on ethical aspects 33 included 51 studies. This difference reflects the narrower scope and search objective of that review, which targeted only ethical discussions across a wider range of health-related AI applications, whereas the present review adopts a cross-domain synthesis integrating ethical, technical and clinical dimensions within mental-health-specific contexts.
PRISMA flow diagram and systematic review process
Figure 5 presents the PRISMA flow diagram, outlining the systematic process for selecting relevant AI in mental health research articles from 2017 to 2024. An initial search identified 197 records, with duplicates and irrelevant studies removed during screening. After a detailed eligibility assessment, 37 articles met the inclusion criteria for the qualitative synthesis. This rigorous process, involving comprehensive database searches and full-text reviews, ensures the inclusion of only high-quality, relevant studies, enhancing the research's reliability and validity.

PRISMA diagram: search and screening process for relevant research on AI in mental health (2017–2024). Source: PRISMA 2020 flow diagram adapted by authors.
Titles and abstracts were independently screened by Z.R. and A.K. Full-text articles were subsequently reviewed by Z.R. and D.S. Any disagreements at either screening stage were resolved through discussion and consensus, with Y.M.B. acting as the adjudicator when necessary. This multistage screening process was conducted in accordance with PRISMA 2020 recommendations to enhance methodological transparency and rigor.
It is important to clarify that although 48 full-text articles initially met the inclusion criteria after title and abstract screening, 11 of these studies were later excluded during the methodological quality assessment stage. These studies received overall quality scores below the pre-specified threshold of 70%, based on AMSTAR-2, 34 CASP 35 or Cochrane RoB 36 evaluation frameworks. The excluded articles primarily lacked sufficient methodological transparency (e.g. missing sample size justification, unclear AI intervention description or absence of outcome validation). Consequently, 37 studies were retained for final qualitative synthesis and comparative analysis. This approach ensured that only methodologically robust and replicable evidence was included in the review.
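The quality-assessment step described above reduces to a simple threshold filter over per-study scores. The sketch below reproduces only the arithmetic (48 full-text articles assessed, 11 falling below the 70% cutoff, 37 retained); the individual scores shown are hypothetical placeholders, not the actual AMSTAR-2, CASP or Cochrane RoB ratings:

```python
THRESHOLD = 0.70  # pre-specified quality cutoff (70% of the maximum rating)

# Hypothetical per-study quality scores as fractions of the maximum rating.
# Chosen only so the counts match the review's reported flow: 48 assessed.
scores = {f"study_{i:02d}": s for i, s in enumerate(
    [0.9] * 20 + [0.8] * 17 + [0.6] * 11, start=1)}

included = [sid for sid, s in scores.items() if s >= THRESHOLD]
excluded = [sid for sid, s in scores.items() if s < THRESHOLD]
print(len(included), len(excluded))  # 37 11
```

Recording the per-study scores and the cutoff in this explicit form is what makes the exclusion step auditable and reproducible, in line with the PRISMA 2020 emphasis on transparent reporting.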
To facilitate a clearer delineation between empirical findings and interpretive analysis, this paper separates the Results and Discussion sections. The Results section presents the synthesized data from the 37 reviewed articles, categorized into AI-powered chatbots, ML-based diagnostic models and ethical considerations, without interpretive commentary. Following this, the Discussion section provides a critical evaluation of these findings, addressing implications for network-based AI systems in mental health, limitations and future directions, emphasizing the role of interconnected healthcare ecosystems in enhancing accessibility and resilience.
To enhance methodological transparency, the inclusion and exclusion criteria applied in the PRISMA process were explicitly defined. Studies were eligible if they (1) were published between January 2017 and December 2024; (2) were written in English; (3) explicitly addressed AI, ML or chatbot-based interventions in the context of mental health diagnosis, therapy or ethical analysis; and (4) provided empirical data, systematic analysis or a reproducible framework. Excluded studies comprised those (a) written in languages other than English, (b) focusing solely on general AI applications outside mental health, (c) lacking methodological transparency or (d) classified as low-quality based on PRISMA-adapted quality assessment criteria, including incomplete data reporting, absence of validation or reproducibility and failure to provide ethical or methodological details sufficient for independent verification.
Results
This systematic review synthesizes 37 scientific articles and one practical project that explore the application of AI/ML in mental healthcare. The key themes identified include the use of AI in chatbots for mental health support, AI/ML models for diagnosis and evaluation and ethical considerations surrounding their implementation.
Chatbots and AI in mental health support
The integration of AI-powered chatbots in mental health interventions has demonstrated significant promise, as evidenced by systematic reviews and meta-analyses that assess their efficacy, design and user engagement. For instance, key factors influencing patient engagement with mental health chatbots include design elements such as personalized interactions, the use of color, ambient sound and music, which have been shown to enhance user experience and promote patient-centered care. 37 Empirical studies further indicate that AI-driven chatbots can effectively support mental health by reducing symptoms of depression and anxiety while improving overall psychological well-being, with meta-analyses revealing pooled effect sizes that suggest small to moderate improvements in depressive symptoms (Hedges’ g = 0.39) and distress (Hedges’ g = 0.33), though evidence quality remains weak due to limited randomized trials. 1 This pattern highlights a dominant trend where chatbots excel in scalable, on-demand support for common mental health issues, but their long-term efficacy requires more robust longitudinal data to address potential relapse risks in networked care systems.
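Hedges' g, the effect-size metric reported above, is the standardized mean difference between two groups with a correction for small-sample bias. The computation can be sketched as follows; the group statistics are illustrative only and are not taken from the cited meta-analysis:

```python
import math

def hedges_g(m1: float, m2: float, sd1: float, sd2: float,
             n1: int, n2: int) -> float:
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sp            # Cohen's d
    j = 1 - 3 / (4 * df - 1)      # Hedges' bias-correction factor J
    return j * d

# Illustrative example: control-arm mean depression score 14.0 vs
# chatbot-arm mean 12.0, common SD 5.0, 50 participants per arm.
print(round(hedges_g(14.0, 12.0, 5.0, 5.0, 50, 50), 3))  # 0.397
```

By conventional benchmarks, values near 0.2 are small and near 0.5 moderate, which is why the pooled estimates of 0.39 and 0.33 are best described as small to moderate effects.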
Comprehensive scoping reviews and thematic analyses have provided further insights into how mental health chatbots are employed in therapeutic contexts, revealing a shift toward hybrid models that combine rule-based systems with emerging NLP techniques. 19 The deployment of mental health chatbots creates a unique technological network architecture that connects users, AI systems and occasionally human professionals in novel care delivery patterns, forming distributed systems capable of transcending geographical and temporal barriers. For instance, Abd-Alrazaq et al. reviewed 53 studies covering 41 distinct chatbots, noting their primary roles in treatment, education and screening, with the majority operating as rule-based systems featuring text-based interfaces or virtual avatars, predominantly targeting conditions such as depression and autism. 2 Their review also integrated patient perspectives from computer science and medical viewpoints, underscoring a pattern where rule-based chatbots dominate due to their reliability and ease of implementation, yet they often lack the nuanced empathy of advanced AI models. Such dominance implies opportunities for broader accessibility in underserved regions but raises concerns about generalizability across diverse mental health spectra, necessitating adaptive algorithms to enhance integration within global healthcare networks.
Another paper 19 specifically explored the impact of question types and presentation, such as using GIFs or friendly tones, on adolescents’ emotional responses and likelihood of engagement with a mental health chatbot. The study found that adolescents exhibited more positive emotional reactions to questions featuring GIFs and multiple-choice response formats, achieving higher engagement rates (up to 20% increase in response likelihood) compared to yes/no or open-ended questions, which elicited less favorable responses. Importantly, the presence of GIFs and a friendly tone played a role in boosting engagement, although some participants found overly friendly tones off-putting, suggesting a need for balanced personalization to avoid alienating users. This finding reveals a clear pattern: adolescents, as a high-engagement demographic, respond best to visually interactive and empathetic designs, implying that chatbot developers should prioritize multimedia elements to foster emotional connection and improve retention rates, particularly in youth-focused networked interventions where dropout is a common challenge.
The impact of the COVID-19 pandemic has accelerated the development and utilization of mental health chatbots, with patterns emerging around targeted support for vulnerable groups like older adults and healthcare workers.38,39 Researchers have reported the creation and testing of chatbots designed to mitigate psychological stress during the pandemic, such as user-friendly interfaces that reduced anxiety scores by an average of 15–20% in older adults through tailored cognitive exercises and emotional check-ins 38 and programs for healthcare workers that facilitated triage and access to behavioral health resources, demonstrating high usability (system usability scale scores > 80) amid heightened burnout. 39 Analysis of chatbot interaction logs has provided valuable insights into user behavior, identifying engagement patterns such as peak usage during evenings and commonly used features like mood tracking and coping modules.40,41 These logs indicate a broader trend where chatbots serve as resilient nodes in mental health networks during crises, with higher engagement among stressed professionals and isolated elderly, implying the need for multilingual and culturally adaptive designs to sustain long-term adherence in post-pandemic care ecosystems.
Various research methods, including randomized controlled trials (RCTs),40,42 usability studies38,39 and descriptive studies, 43 have been employed to assess the effectiveness of chatbots, revealing consistent patterns of efficacy in school-based and workplace settings. For instance, an RCT evaluated the efficacy of a chatbot designed to help young adults manage depressive symptoms during the COVID-19 pandemic, showing significant reductions in Patient Health Questionnaire (PHQ)-9 scores (from moderate to mild depression) when compared to telehealth-delivered CBT, with promising clinical outcomes like 30% symptom remission rates. 42 Other studies have focused on using chatbots to address the mental health needs of healthcare workers 39 and reduce psychological stress in older adults suffering from anxiety or depression, where pre–post assessments indicated improved well-being metrics (e.g. Generalized Anxiety Disorder (GAD)-7 reductions). 28 Furthermore, research into the usage patterns of chatbots like Tess has shed light on how users interact with different modules, with descriptive analyses showing that emotion-focused modules were most accessed (40% of interactions), leading to sustained engagement over weeks. 41 The chatbot “Tess” is a mental health chatbot that has been used on various platforms, including mobile devices and web-based applications, integrated into messaging systems like SMS and therapy platforms used by healthcare professionals, where RCTs demonstrated significant reductions in depression (d = 0.64) and anxiety (d = 0.52) symptoms through CBT and mindfulness interventions. 8 This pattern underscores Tess's role as a versatile tool in hybrid networks, implying that modular designs enhance user agency and therapeutic flexibility.
Investigations have also assessed the efficacy of specific chatbot applications for depression, 43 well-being and maternal mental health support, 20 highlighting a trend toward specialized interventions for niche populations. For example, multilingual chatbots like ChatPal improved well-being scores in pre–post studies across diverse linguistic groups, with effect sizes indicating small but significant gains in positive psychology metrics. 43 Similarly, Rosie, a health education chatbot for new mothers, enhanced knowledge and self-efficacy in randomized pilots, reducing postpartum anxiety by promoting behavioral changes like sleep hygiene. 44 A recent systematic review expanded on AI-based chatbots using advanced techniques such as NLP, ML, deep learning and hybrid methods, finding high efficacy in promoting healthy behaviors such as smoking cessation (up to 25% quit rates) and medication adherence. 45 This review confirms the effectiveness not only in mental health contexts but also in broader behavioral promotions, implying that hybrid NLP models represent an evolving dominant paradigm, offering implications for scalable, personalized interventions that integrate seamlessly into global mental health networks while addressing gaps in evidence for underrepresented demographics.
Another paper 46 conducted a scoping review exploring patient perceptions and opinions about mental health chatbots by synthesizing findings from multiple qualitative and quantitative studies. This review highlights user attitudes toward chatbot usability, acceptability and therapeutic engagement, though conclusions are limited by heterogeneity in study designs and outcome measures.
Table 1 presents the methodological quality appraisal of the key studies included in this review. Standardized evaluation frameworks were applied according to study design: AMSTAR-2, Cochrane Risk of Bias 2.0, CONSORT, CASP, STROBE, NIH Quality Assessment Tool and JBI Critical Appraisal Checklist. The overall quality of evidence ranged from moderate to high, with quantitative studies (RCTs and systematic reviews) generally showing low risk of bias and strong adherence to PRISMA or CONSORT reporting standards. Usability and descriptive studies demonstrated moderate quality, primarily due to smaller sample sizes, lack of control arms or reliance on self-reported data. Despite these limitations, the collective evidence supports the growing rigor and reproducibility of AI-driven mental health chatbot research, while highlighting the need for more longitudinal, cross-cultural and multicenter randomized trials to ensure external validity.
Quality assessment of studies.
Categorization of mental health chatbots
To classify mental health chatbots by function and therapeutic approach, Figure 6 illustrates five major categories of mental health chatbots: (1) CBT-based interventions, (2) emotional support systems, (3) professional connection platforms, (4) self-tracking and monitoring tools and (5) personal development applications.

Classification of mental health chatbots: categories supporting psychological well-being and therapeutic interaction.
Table 2 summarizes the principal AI-driven and chatbot-based mental health applications, organized according to their primary therapeutic functions. Category 1 encompasses CBT-oriented systems such as Woebot,45,46 Youper 47 and Joyable, 48 which implement structured CBT and mindfulness protocols to alleviate symptoms of depression and anxiety. Category 2 covers emotionally supportive and wellness-focused chatbots like Wysa, 50 Replika 51 and Sanvello, 53 which deliver real-time self-help, motivational and emotion-regulation interventions. Category 3 refers to professional-connection platforms, including Talkspace, 56 Ginger 57 and BetterHelp, 58 that facilitate access to licensed therapists within secure digital environments. Category 4 includes self-tracking and monitoring tools such as Moodpath 59 and 7 Cups, 60 which enhance emotional awareness and promote peer-based support networks. Finally, Category 5 features integrative and personal-development applications such as Tess 8 and Remente, 63 emphasizing AI-assisted psychological support and goal-oriented self-improvement. Collectively, these chatbot systems illustrate the expanding spectrum of AI applications in mental health—from clinically validated therapeutic agents to wellness-oriented self-management tools—each contributing uniquely to improving the accessibility, personalization and scalability of digital mental healthcare services.
Categories of mental health chatbots by primary function and therapeutic approach.
Exploring the impact and efficacy of mental health chatbots
The objective of presenting the studies in this section is to deliver a thorough synthesis of existing research and insights regarding mental health chatbots. This review highlights their potential advantages, including the reduction of anxiety and depression symptoms, increased accessibility to mental health resources and support for underserved communities. Furthermore, the curated references illuminate the inherent challenges and limitations associated with chatbot technologies, enabling a nuanced understanding of their evolving role within contemporary mental healthcare contexts. Different categories of mental health chatbots establish varied network structures, ranging from centralized architectures where a singular AI entity delivers uniform support to more sophisticated systems comprising multiple interconnected AI agents and human mental health professionals. The specific network topology of these chatbot systems notably influences their operational efficacy, scalability and overall impact on service delivery.

Abd-Alrazaq et al. conducted a systematic review and meta-analysis to evaluate the effectiveness and safety of using chatbots to improve mental health outcomes. 1 The study found that chatbots can significantly improve symptoms of depression, anxiety and stress and are generally safe to use. The review highlights the potential of chatbots as scalable and accessible interventions for mental health support. Understanding the functionalities that make chatbots effective is crucial. Abd-Alrazaq et al. provided an overview of the key features and functionalities of chatbots used in the mental health domain. 2 The review identified common features such as NLP, personalization and the ability to provide tailored support based on user needs. These features enhance user engagement and the therapeutic potential of chatbots.

Patient acceptance is a significant factor in the success of chatbot interventions. Abd-Alrazaq et al. explored patient perceptions and opinions about mental health chatbots. 46 The study found that patients generally have a positive attitude toward chatbots, perceiving them as convenient and accessible. However, concerns were raised about the limitations in providing personalized care and the potential for misunderstandings, highlighting areas for improvement.
Vaidyam et al. provided a comprehensive overview of the use of chatbots and conversational agents in the mental health field. 3 They discussed potential benefits like increased accessibility and scalability, as well as challenges such as privacy concerns and the limitations in delivering personalized care. This overview underscores the importance of balancing technological advancements with ethical considerations. Cho et al. offered an integrative survey combining computer science and medical perspectives on conversational agents in mental healthcare. 37 The study explored technical aspects like NLP and dialogue management, alongside clinical applications and challenges, bridging the gap between technology and healthcare.
Khosravi and Azar conducted a thematic analysis to identify key factors influencing patient engagement with mental health chatbots. 66 The analysis highlighted the importance of perceived usefulness, ease of use, trustworthiness and the ability to provide personalized and empathetic support. These factors are essential for designing chatbots that effectively engage users and support mental health interventions.
Mariamo et al. investigated the emotional reactions and likelihood of response to questions designed for a mental health chatbot among adolescents. 19 The findings suggest that chatbots can effectively engage adolescents in mental health discussions, providing a comfortable platform for emotional expression. This highlights the potential of chatbots in addressing mental health needs in younger populations.
Kosyluk et al. explored the relationship between mental distress, label avoidance and the use of a mental health chatbot. 67 The study found that individuals with higher levels of mental distress and label avoidance are more likely to use a mental health chatbot. This indicates that chatbots may reach underserved populations who might avoid traditional mental health services due to stigma.
The COVID-19 pandemic has heightened the need for accessible mental health support. He et al. conducted an RCT evaluating a mental health chatbot for young adults with depressive symptoms during the pandemic. 68 The chatbot intervention led to significant improvements in depressive symptoms, anxiety and overall well-being compared to the control group. Chou et al. focused on developing a user-friendly chatbot to mitigate psychological stress among older adults during the pandemic. 38 The chatbot was well received and effective in reducing stress and anxiety, demonstrating the versatility of chatbots across different age groups.
Jackson-Triche et al. developed and evaluated a chatbot designed to address the behavioral health needs of healthcare workers during the COVID-19 pandemic. 39 The chatbot provided effective support and resources, highlighting its potential in high-stress professions. Nguyen et al. conducted a randomized pilot study evaluating “Rosie,” a chatbot providing health education and support to new mothers. 44 The chatbot was effective in improving knowledge and self-efficacy, indicating its usefulness in maternal health contexts.
Understanding how users interact with chatbots can inform improvements. Dosovitsky et al. examined usage patterns and user feedback of an AI-powered chatbot for depression. 40 Insights into user characteristics and interactions can guide enhancements in chatbot design. Booth et al. analyzed user event logs of a mental health and well-being chatbot to gain insights into engagement patterns and impact. 41 This analysis helps in understanding user behavior and tailoring interventions accordingly.
Xu et al. evaluated the functionality and effectiveness of a digital mental health clinic with a chatbot component in secondary schools. 42 The digital clinic effectively improved mental health outcomes among students, showcasing the potential of chatbots in educational settings. Potts et al. conducted a pre–post multicenter intervention study on a multilingual digital mental health and well-being chatbot. 43 The chatbot improved mental health outcomes across different cultural and linguistic contexts, emphasizing the importance of accessible mental health interventions worldwide. Aggarwal et al. evaluated AI-based chatbots for promoting health behavior changes. 20 Their study found that chatbots effectively promoted healthy lifestyles, smoking cessation, treatment adherence and substance misuse reduction. They utilized behavior change theories and expert consultation to personalize services.
Chiauzzi et al. described a protocol for an RCT evaluating a relational agent intervention for adolescents seeking mental health treatment. 48 This study represents future directions in integrating chatbots into mental health interventions. Tsoi et al. reported on stakeholder interviews regarding a web-based, stratified, stepped-care mental health platform incorporating a chatbot. 69 Stakeholder insights are crucial for successful implementation. Torous et al. provided a comprehensive overview of digital psychiatry, including apps, social media, chatbots and virtual reality. 70 They discussed current evidence, benefits and future directions, highlighting the growing role of technology in mental healthcare.
Recent advances further demonstrate how transformer-based architectures outperform traditional NLP approaches in mental health contexts. For instance, one study 71 analyzed more than 400,000 tweets to detect autism spectrum disorder (ASD), comparing classical ML models (decision trees, extreme gradient boosting (XGBoost), K-nearest neighbor (KNN)) with deep learning models such as recurrent neural networks (RNNs), long short-term memory (LSTM) and BERT/BERTweet. Their results showed that transformer-based models achieved the highest diagnostic accuracy, nearly 88% in identifying textual markers associated with ASD, confirming the strength of contextualized language representations for social-media-based mental health detection. This evidence reinforces the shift toward transformer architectures as reliable instruments for early mental health screening on digital platforms.
Complementing this trend, one study 72 applied deep learning models enriched with anaphora-resolution techniques to analyze linguistic patterns related to depression on Twitter. By resolving contextual references across sentences, their model captured subtle self-referential expressions and emotional dependencies often missed by conventional sentiment analysis pipelines. The study demonstrated that incorporating advanced linguistic reasoning into neural architectures significantly improved detection precision and interpretability, offering a pathway for integrating discourse-level understanding into AI-driven mental health assessment. Together, these contributions highlight the growing scientific consensus that transformer-based and linguistically aware models enable more robust and context-sensitive analysis of social media data in mental health research.
This collection of studies demonstrates the significant potential of chatbots in enhancing mental healthcare. They offer scalable, accessible and effective interventions for a variety of populations and settings. However, challenges such as personalization, privacy and ethical considerations remain. Future research and development should focus on addressing these challenges to maximize the benefits of chatbots in mental health interventions.
Table 3 presents a summary of key aspects of various mental health chatbots examined in the reviewed articles, including algorithms used, datasets, countries of origin, clinical outcomes and unique features. This overview highlights the diversity and efficacy of different approaches in applying AI to mental healthcare.
Summary of mental health chatbots: key algorithms, datasets, clinical outcomes and unique features.
Table 3 provides an integrative overview of the empirical landscape surrounding mental health chatbots, capturing the progression from rule-based conversational systems to hybrid and AI-enhanced agents. The evidence synthesized in this table indicates that rule-based and menu-driven chatbots still dominate real-world deployments, primarily due to their predictability, auditability and lower risk in clinical and educational settings. In contrast, hybrid and NLP-driven chatbots, which combine ML-based intent detection with structured conversational flows, have achieved higher engagement and user satisfaction, particularly when designed with interactive and multimodal elements such as GIFs or empathetic response templates. The data also reveal consistent demographic patterns: adolescents, students and shift-workers exhibit the highest engagement and retention rates, while healthcare professionals and older adults benefit most from targeted stress reduction and self-management interventions.
Clinically, the studies summarized in Table 3 report significant short-term improvements in PHQ-9 and GAD-7 scores, confirming the efficacy of chatbot-based cognitive-behavioral interventions in mild to moderate cases of depression and anxiety. However, longitudinal validation and large-scale RCTs remain scarce. Multilingual systems such as ChatPal demonstrate the feasibility of culturally adaptive designs but show only modest effect sizes. Overall, the table reflects a field in transition from static, rule-based systems toward personalized, emotionally intelligent agents embedded within hybrid care networks. This synthesis underscores that the future of mental health chatbots depends not on greater automation alone, but on achieving a careful balance between personalization, safety and ethical transparency.
Using AI and ML models to evaluate and diagnose mental health
Recent studies, such as that of Rezaie and Banad, have highlighted the integration of diverse data sources, including neuroimaging and clinical records, with ML to enhance the precision of Alzheimer's disease diagnosis, which shares methodological overlaps with mental health assessments. 74 Additionally, Bain et al. provide a comprehensive tutorial on supervised ML variable selection methods, emphasizing their utility in classifying mental health conditions like depression and anxiety in social and health sciences. 75 Advancements in these technologies encompass a range of methodologies from wearable devices capturing physiological data to sophisticated algorithms analyzing behavioral patterns, all aimed at enhancing mental healthcare.1,4 These AI/ML models form the computational foundation of mental health support networks, with different architectures creating varying patterns of information flow between users and systems. The networked nature of these models allows for continuous learning and adaptation based on user interactions. These tools facilitate early detection, accurate diagnosis and personalized treatment plans, showcasing the transformative potential of AI in mental health evaluation.
NLP and specialized models
The application of NLP and large language models (LLMs) in mental health interventions is a burgeoning area of research, with significant advancements in recent studies.2,3,21 The development of infrastructure solutions, such as the MCP Bridge, a lightweight, LLM-agnostic RESTful proxy, enables efficient integration and deployment of these models in psychological applications by streamlining data processing and model interactions. 76 Specialized models like MentalBERT 45 and MentalLLaMA77,78 have been designed to enhance empathetic responses and deepen contextual understanding in mental health settings. By leveraging vast datasets of textual information, these models excel at detecting linguistic patterns and cues associated with mental health conditions, paving the way for more accurate and scalable interventions.
Frameworks such as NLPxMHI have been proposed to address persistent challenges in NLP, including linguistic diversity and inherent biases, and to guide future research. 21 Analytical methods, including logistic regression, random forest and sentiment analysis, are frequently employed to evaluate text-based conversations with chatbots, achieving accuracy rates of 70–80% in predicting symptoms of depression and anxiety. 4
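To make that analytical pipeline concrete, the sketch below reduces the logistic-regression approach mentioned above to a from-scratch bag-of-words classifier. The corpus, labels and behavior are entirely invented for illustration and bear no relation to the datasets or accuracy figures in the cited studies:

```python
# Toy sketch of logistic-regression text analysis of the kind described
# above. Corpus and labels are invented; labels mark "symptomatic-sounding
# language" in the toy data, not clinical diagnoses.
import math

docs = [
    ("i feel hopeless and tired all the time", 1),
    ("nothing brings me joy anymore", 1),
    ("i cannot stop worrying about everything", 1),
    ("had a great walk and feel relaxed today", 0),
    ("looking forward to seeing my friends", 0),
    ("work went well and i slept fine", 0),
]

vocab = sorted({w for text, _ in docs for w in text.split()})

def featurize(text):
    """Bag-of-words count vector over the toy vocabulary."""
    words = text.split()
    return [words.count(w) for w in vocab]

# Plain stochastic gradient descent on the logistic loss (no bias term).
X = [featurize(t) for t, _ in docs]
y = [label for _, label in docs]
w = [0.0] * len(vocab)
for _ in range(300):
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        w = [wj + 0.1 * (yi - p) * xj for wj, xj in zip(w, xi)]

def screen(text):
    """1 if a message resembles the symptomatic side of the toy corpus."""
    z = sum(wj * xj for wj, xj in zip(w, featurize(text)))
    return int(z > 0)
```

Production pipelines of the kind reviewed here add TF-IDF weighting, regularization, held-out validation and, increasingly, transformer encoders in place of bag-of-words features.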
RCTs have demonstrated the effectiveness of AI-driven therapies. For instance, chatbots like Woebot have shown notable decreases in depression levels among users. Another study examined an AI-powered chatbot integrated into CBT for addressing work-related stress. The chatbot assisted patients in completing CBT homework tasks, such as ABC notes, resulting in significant improvements in psychometric test scores. 47
Tess, an integrated psychological AI, delivered individualized support by implementing interventions rooted in various modalities such as CBT, mindfulness and acceptance and commitment therapy. This approach led to significant reductions in depression and anxiety ratings. 8
Technological tools and interventions for mental health
Various AI-driven tools, including chatbots, virtual assistants and smartphone applications, are being utilized in the treatment of mental health issues.8,16,47 These technologies are examined for their effectiveness in therapy and assessment. An RCT of the AI chatbot Tess illustrates the potential of AI in providing accessible mental health support, particularly for students, indicating a shift toward integrating technology into therapeutic practices.
Research methodologies in AI mental health studies
Diverse research methodologies are employed to investigate the role of AI and ML in mental health.4,20,69,73 These include systematic literature reviews, RCTs and qualitative interviews. Such methods provide a comprehensive understanding of user experiences, efficacy and practical implications of AI applications in mental healthcare. Evaluating user acceptance and effectiveness is crucial for the successful integration of these technologies into clinical practice.
Performance and effectiveness of AI models
The performance metrics of various AI models are critical in assessing their utility.45,73,77 Models like MentalBERT and SR-BERT have demonstrated strong classification performance in identifying mental health issues from social media data. For example, MentalBERT and MentalRoBERTa, pretrained on mental health-related content from Reddit, achieved F1 scores ranging from 68% to 93% across several mental health classification tasks.
Models such as MentalBERT and MentalRoBERTa, which have been trained on data derived from social media platforms like Reddit and Twitter, primarily capture the language, emotions and interactional behaviors of online users rather than those of clinical populations. Although achieving F1 scores in the range of 68% to 93% indicates high technical precision in text classification, this performance is meaningful only within the domain of “social language,” not in the context of “clinical diagnosis.” Social media discourse is saturated with metaphorical expressions, humor and colloquial idioms that are rarely present in therapeutic communication. Consequently, when these models are applied in clinical settings, they face the phenomenon of domain shift, a divergence between informal online language and formal clinical discourse, which leads to a marked decline in accuracy and reliability.
From a methodological standpoint, reliance on metrics such as F1-score or ROC-AUC reflects only the statistical and technical dimensions of model performance and does not convey therapeutic or clinical value. In mental health contexts, more critical indicators include the predictive reliability for clinical decision-making and the interpretability of model outputs by practitioners. A model trained on Reddit data may accurately recognize explicit terms like “depressed” or “kill myself,” yet fail to capture more nuanced linguistic markers of mental distress, such as defense mechanisms or cognitive avoidance patterns. Therefore, transferring such models to clinical environments requires cross-domain fine-tuning, clinical validation and the incorporation of diverse data sources such as structured interviews, medical records and demographically representative samples to ensure generalizability, accuracy and true clinical applicability.
The SR-BERT model, an extension of DialogBERT that incorporates psychological theory, attained an F2 score of 76.2% and an ROC-AUC of 92% for suicide risk prediction, significantly outperforming previous models. 79 This highlights the model's efficacy in identifying individuals at risk and its potential application in preventive interventions. To enhance empathetic responses, a novel approach combined the Chain-of-Empathy prompting strategy with GPT-3.5, utilizing concepts from psychotherapy models. This method achieved a balanced accuracy of 0.340, a modest result that nonetheless suggests the feasibility of generating empathetic response strategies. 80
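The choice of F2 over F1 in the SR-BERT evaluation reflects a screening priority: weighting recall (missed at-risk users) more heavily than precision (false alarms). The short sketch below, using invented counts rather than the study's own confusion matrix, shows how the two metrics diverge for a recall-favoring model:

```python
# Hedged illustration of how F1 versus F2 weight precision and recall.
# The counts below are invented; they are not SR-BERT's confusion matrix.
def f_beta(tp, fp, fn, beta):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A screening model tuned for recall: few missed cases, more false alarms.
tp, fp, fn = 80, 40, 10
f1 = f_beta(tp, fp, fn, beta=1)   # balances precision and recall equally
f2 = f_beta(tp, fp, fn, beta=2)   # weights recall four times as heavily
# f2 > f1 here because recall (0.89) exceeds precision (0.67).
```

For suicide-risk screening, where a missed case is far costlier than a spurious alert, an F-beta with beta > 1 is the more informative summary statistic.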
Datasets and geographic context in AI research
Datasets utilized in AI and ML studies often reflect a reliance on data from industrialized nations.4,47 The USA, UK and Canada are primary sources of text-based conversations for chatbot engagements, indicating a need for more representative data to enhance generalizability. An Italian study involving 21 participants aged 33 to 61 with mild to moderate stress, anxiety or depression underscores the importance of including diverse populations. 47 The SR-BERT model was trained on over 40,000 anonymous chat sessions from the Sahar organization in Israel. 39 MentalBERT and MentalRoBERTa employed text data from Reddit and Twitter, encompassing a global user base. 29 The EPITOME dataset, consisting of Reddit posts from the United States annotated for empathy tactics, was also utilized. 80 The MentalLLaMA project likely combined general language corpora with specialized mental health dialogues, demonstrating the varied data sources used to train these models.77,78
Future directions and challenges in AI for mental health
While AI and ML hold great promise for mental healthcare, several challenges persist. 21 Key issues include replicability of results, algorithmic biases and the need for comprehensive, diverse datasets. Biases in AI models can lead to unequal treatment outcomes across different demographic groups, making their mitigation essential to ensure that AI technologies contribute effectively and ethically to mental health interventions. Proposed frameworks and guidelines aim to help future research overcome these obstacles, with emphasis on developing transparent algorithms, ensuring data privacy and fostering collaboration between technologists and mental health professionals. Addressing these challenges will determine whether AI becomes a truly useful and equitable tool for mental healthcare worldwide.
Overview of AI/ML models in mental health chatbots.
Table 4 synthesizes recent advancements in AI and ML models applied to mental health diagnostics and support, offering insight into the technical diversity and practical limitations of current approaches. The comparative results show that transformer-based architectures, particularly MentalBERT and MentalRoBERTa, outperform traditional classifiers, attaining F1 scores between 68% and 93% on social-media corpora such as Reddit and Twitter. Yet, these seemingly strong results largely capture the nuances of social language, not clinical discourse, and therefore face a pronounced domain-shift problem when transferred to medical or therapeutic contexts. More specialized models, including SR-BERT, a psychologically informed extension of DialogBERT, and generative frameworks such as GPT-3.5 Chain-of-Empathy or MentalLLaMA, represent emerging attempts to incorporate empathic reasoning and context awareness into AI-mediated dialogue.
A deeper examination of the studies summarized in Table 4 highlights persistent methodological and ethical gaps. Fewer than half of the models report calibration metrics, interpretability analyses (e.g. Shapley additive explanations (SHAP) or local interpretable model-agnostic explanations (LIME)) or bias audits across demographic subgroups. Most datasets originate from Western or English-speaking populations, leaving cross-cultural generalizability uncertain. Consequently, these models should currently be viewed as decision-support and risk-screening tools rather than diagnostic systems. Table 4, therefore, maps a critical trajectory: while technical performance is advancing rapidly, meaningful clinical translation will require cross-domain fine-tuning, external validation, transparent reporting and human oversight. The synthesis ultimately portrays AI and ML in mental health as promising yet ethically contingent technologies capable of augmenting, but not replacing, human judgment in mental healthcare.
Investigating ethical and privacy considerations in AI-based mental health interventions
Because mental health data are among the most sensitive forms of personal information, ethical and privacy safeguards must be integral to every AI-based intervention. This section investigates the ethical implications of deploying AI technologies in mental healthcare, emphasizing the significance of safeguarding user privacy, ensuring data security and maintaining transparency in AI operations. Ethical concerns highlighted include algorithmic bias, informed consent, therapeutic alignment and the necessity for culturally sensitive and adaptable AI systems. Privacy and security concerns are particularly relevant in networked AI systems, where personal data may traverse multiple nodes and processing points. The complex network of stakeholders involved in AI mental healthcare, including developers, healthcare providers, patients and regulatory bodies, necessitates clear frameworks for responsibility and accountability. Effective management of these complex interactions requires comprehensive policies and regulatory guidelines that clearly define roles, responsibilities and liabilities to safeguard user rights and promote trust in AI-driven mental health solutions.
Through a comprehensive analysis of the literature in this field, several challenges have been identified. The reviewed articles propose specific and standardized solutions within the domains of healthcare and mental health, which are outlined as follows:
Data privacy and security
AI technologies in healthcare, especially those used for mental health support, often handle sensitive personal information. The widespread use of AI chatbots and other mental health tools raises significant concerns about data privacy and security. Risks include improper storage, unauthorized sharing or exposure of personal health data to cyber-attacks, leading to breaches of patient confidentiality. The lack of robust privacy policies and secure data storage practices can exacerbate these issues, potentially undermining trust in AI systems. These concerns are heightened in mental health, where data is highly personal and sensitive.
To address privacy and security concerns,83,87 AI systems must incorporate strong data governance protocols, including encryption and anonymization techniques, to protect user data. Organizations should comply with stringent data protection regulations such as the General Data Protection Regulation (GDPR). Additionally, user consent must be emphasized, ensuring that patients are fully informed about how their data is collected, stored and used. Regular security audits, breach detection systems and transparent communication with users about data usage are essential to ensure trust and compliance.
Moreover, new AI techniques such as federated learning have emerged as a promising privacy-preserving solution in healthcare that allows AI models to be trained across decentralized data sources without sharing raw data, thereby minimizing privacy risks and enhancing data security.88 This approach enables collaborative model training by aggregating locally computed updates while keeping sensitive data confined to its source, thus aligning with privacy requirements in sensitive domains such as mental healthcare.
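The federated-averaging idea behind this approach can be sketched in a few lines. The example below simulates three "hospitals" that each train a small logistic-regression model locally and share only weight updates with a central server; all names, data and hyperparameters are illustrative, not taken from any reviewed system.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: logistic-regression SGD on private data."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (preds - y) / len(y)      # gradient of the logistic loss
        w -= lr * grad
    return w

def federated_average(global_w, clients):
    """Server step: average client updates weighted by local sample count.
    Raw data never leaves the clients; only weight vectors are shared."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Three simulated hospitals, each holding its own private dataset.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    clients.append((X, y))

w = np.zeros(4)
for _ in range(20):                            # communication rounds
    w = federated_average(w, clients)
print(w.shape)  # (4,)
```

Real deployments add secure aggregation and differential privacy on top of this loop, since shared gradients can still leak information; the sketch shows only the core data-locality principle.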
A closer examination of the 37 reviewed studies reveals substantial variability in how ethical principles were operationalized, ranging from explicit adherence to ethical frameworks to complete omission of such considerations. Approximately one-third of the studies1,8,37 explicitly mentioned compliance with institutional ethical approvals, informed consent procedures or anonymization of user data, particularly in RCTs and usability studies involving clinical participants. In contrast, most AI- and NLP-focused works45,79,81 relied on publicly available datasets such as Reddit or Twitter without detailed discussion of consent, data provenance or user awareness, reflecting a growing “ethics gap” between clinical and computational research domains. Only a few studies, such as Luxton,85 engaged directly with issues of accountability, bias or algorithmic fairness. This disparity highlights an urgent need for standardized ethical reporting in AI mental health research, including clear documentation of data sources, anonymization strategies, fairness audits and human oversight mechanisms. Integrating these dimensions would not only strengthen reproducibility but also ensure that algorithmic innovation aligns with ethical transparency and patient-centered care principles.
Although systematic ethical reporting is not yet common across all studies in the field of AI and mental health, initiatives such as the Canada Protocol for Artificial Intelligence in Suicide Prevention and Mental Health, which outlines criteria for transparency, security, health risk assessment and bias mitigation, have been developed to encourage more rigorous ethical documentation.89 Furthermore, frameworks such as the IEACP (Identify–Examine–Act–Check–Plan) model, which integrates ethical decision-making stages with core moral principles, including beneficence, autonomy, justice, transparency and scientific integrity,90 represent emerging efforts toward the standardization of ethics in computational mental health research. Therefore, future studies should incorporate standardized ethical reporting that explicitly addresses data provenance, anonymization methods, bias auditing, accountability mechanisms and human feedback loops to ensure transparency, reproducibility and patient-centered responsibility in AI-driven mental health applications.
Data biases and inequities in AI-based mental health interventions
A critical limitation of AI-based mental health interventions is the presence of data biases in training datasets, which can lead to inequitable and biased outcomes. Many AI models reviewed in this study, including chatbots and diagnostic systems, are trained on datasets that often overrepresent populations from high-income or Western contexts, limiting their generalizability across diverse cultural, socioeconomic and geographic groups.84–86 Such biases can manifest as inaccurate diagnostic predictions or inappropriate treatment recommendations that fail to address the unique needs of marginalized populations, thereby exacerbating health disparities. For instance, chatbot systems designed for specific linguistic or cultural contexts may be less effective in multilingual or culturally diverse settings, where socio-economic, racial or cultural factors significantly influence mental health outcomes. This can result in inequitable treatment suggestions, such as interventions that are misaligned with the lived experiences of underserved communities. To mitigate these biases, AI systems must be trained on diverse and representative datasets that encompass varied socio-economic backgrounds, cultures and mental health conditions. Continuous monitoring and testing of algorithms for fairness, coupled with the implementation of transparency protocols and fairness metrics, are essential to identify and address biases early in the development process. Collaboration between AI developers and mental health professionals is crucial to ensure contextual accuracy and cultural sensitivity in model design, reducing the risk of biased outcomes. These efforts intersect with the broader ethical challenges of algorithmic fairness and accountability discussed in this section, underscoring the need for inclusive data practices to promote equitable delivery of AI-driven mental health services. 
Failure to address data biases risks perpetuating inequities, undermining the potential of AI to support accessible and effective mental healthcare.
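One lightweight form of the fairness testing recommended above is an equalized-odds audit, which compares error rates across demographic subgroups rather than reporting a single aggregate accuracy. The toy example below (hypothetical data, labels and groups) shows a classifier that looks reasonable overall yet detects the condition in one group and misses it entirely in another.

```python
import numpy as np

def subgroup_report(y_true, y_pred, group):
    """Per-subgroup true-positive and false-positive rates.
    Large gaps between groups signal an equalized-odds violation."""
    report = {}
    for g in np.unique(group):
        m = group == g
        tp = np.sum((y_pred == 1) & (y_true == 1) & m)
        fn = np.sum((y_pred == 0) & (y_true == 1) & m)
        fp = np.sum((y_pred == 1) & (y_true == 0) & m)
        tn = np.sum((y_pred == 0) & (y_true == 0) & m)
        report[g] = {
            "tpr": tp / (tp + fn) if tp + fn else float("nan"),
            "fpr": fp / (fp + tn) if fp + tn else float("nan"),
        }
    return report

# Hypothetical screener that under-detects the condition in group "B".
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 0, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

r = subgroup_report(y_true, y_pred, group)
print(r["A"]["tpr"], r["B"]["tpr"])  # 1.0 0.0
```

Overall accuracy here is 75%, which would look acceptable in a headline metric while the model fails every positive case in group "B"; this is precisely why subgroup audits belong in standard reporting.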
Short-term research gaps
A notable limitation in the current literature is the reliance on short-term studies to evaluate AI-based mental health interventions. Many studies reviewed, particularly those assessing chatbot efficacy or diagnostic models, focus on immediate outcomes, such as reductions in depressive or anxiety symptoms over weeks or months. However, the absence of longitudinal research hinders a comprehensive understanding of AI's sustained impact on mental health, including its role in preventing relapse or managing chronic conditions. This gap poses ethical challenges, as deploying AI interventions without evidence of long-term efficacy may lead to over-optimistic expectations or unintended consequences for patients. Future research must prioritize longitudinal studies to better inform the integration of AI into mental healthcare systems, ensuring alignment with the evidence-based approaches emphasized throughout this review.
Contexts of limited AI efficacy
AI-based mental health interventions may be less effective in certain contexts, particularly those requiring deep human empathy or complex clinical judgment. For example, severe mental health disorders, such as schizophrenia or bipolar disorder, often demand nuanced therapeutic interactions that AI systems, including chatbots and diagnostic models, are currently ill-equipped to provide. Similarly, in culturally diverse or low-resource settings, AI interventions may struggle to account for local cultural norms, linguistic variations or socioeconomic barriers, limiting their effectiveness. These limitations underscore the ethical imperative to deploy AI as a complementary tool rather than a replacement for human clinicians, as emphasized in discussions of human-AI collaboration within this review. Inappropriate application of AI in such contexts risks delivering suboptimal care, highlighting the need for careful consideration of environmental and clinical factors in AI implementation.
Accountability and liability
A major ethical issue surrounding AI in healthcare is the lack of clear accountability when errors occur. Since AI systems can make autonomous decisions or recommendations, it can be difficult to pinpoint responsibility when adverse outcomes arise, such as incorrect diagnoses or harmful advice. This diffusion of responsibility creates a challenge in assigning legal liability, especially when multiple stakeholders (developers, clinicians, organizations) are involved in deploying and using AI systems. To address accountability issues, a shared responsibility framework is essential. This framework should clearly delineate roles and responsibilities among AI developers, healthcare providers and regulatory bodies. AI systems should always operate under human supervision, ensuring that healthcare professionals retain the final decision-making authority. Clear regulatory standards must be established to hold developers accountable for the safety and reliability of their systems.10,87 Additionally, AI decisions should be transparent and traceable, allowing clinicians to understand the reasoning behind AI outputs and intervene when necessary.
Therapeutic misalignment
AI chatbots and mental health tools, particularly those using large language models (LLMs), may fail to align with therapeutic goals. These systems can offer advice that is inappropriate, culturally insensitive or misaligned85,86 with the emotional and psychological needs of the patient. This misalignment occurs due to the general-purpose nature of many AI systems, which may lack the contextual understanding necessary for personalized mental healthcare. Furthermore, AI may reinforce harmful behaviors or provide ineffective advice, especially in situations where human empathy is required. For AI systems to be therapeutically effective, they need to incorporate therapeutic alignment principles, ensuring that the AI's recommendations support healing and well-being. Developers should work closely with mental health professionals to design systems that embody core therapeutic values such as empathy, unconditional positive regard and patient-centered care. Cultural sensitivity and adaptability must also be embedded in AI models to provide contextually relevant and supportive recommendations. Regular feedback loops between users, clinicians and developers can help improve the therapeutic efficacy of AI-based mental health tools.
Trust and transparency
AI systems often operate as “black boxes,” meaning their decision-making processes are opaque to both users and healthcare professionals. This lack of transparency85,86 can undermine trust, especially when the AI provides mental health advice or recommendations that are unclear or difficult to justify. Users may struggle to understand why the AI made a particular recommendation, which can lead to skepticism or outright rejection of the technology. The absence of transparency also poses risks in situations where AI systems provide incorrect or potentially harmful suggestions. To build trust in AI systems, developers should prioritize explainable AI (XAI) approaches91 that allow users and clinicians to understand how decisions are made. This involves creating systems that can clearly articulate the rationale behind their recommendations, making it easier for users to evaluate the appropriateness of the advice. Transparency should also extend to the limitations of the system, with users being fully informed about where the AI might fall short. Incorporating user feedback into system design and regularly updating AI models based on real-world outcomes can further enhance transparency and trust.
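Model-agnostic explanation techniques of the kind XAI research proposes can be quite simple in principle. The sketch below implements permutation importance from scratch: shuffle one input feature at a time and measure how much accuracy drops. The "risk screener" and its data are purely hypothetical.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """How much does accuracy drop when one feature is shuffled?
    A larger drop means the model relies more on that feature."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X) == y)            # baseline accuracy
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])              # break the feature/label link
            drops[j] += base - np.mean(predict(Xp) == y)
    return drops / n_repeats

# Hypothetical "risk screener": a fixed rule that uses only the first feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
predict = lambda data: (data[:, 0] > 0).astype(int)

imp = permutation_importance(predict, X, y)
print(imp.round(2))  # feature 0 dominates; features 1 and 2 are ~0
```

An explanation like this ("the screening score depends almost entirely on feature 0") is exactly what a clinician needs in order to judge whether the model's reasoning is clinically plausible before acting on its output.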
Informed consent and autonomy
In mental health AI applications, there is a risk that users may not fully understand the extent to which AI is involved in their care. This lack of informed consent can undermine patient autonomy, as users may unknowingly rely on AI systems for mental health advice without fully understanding their capabilities and limitations. The complex nature of AI also raises concerns about whether patients can provide fully informed consent before engaging with AI tools, especially in vulnerable mental health situations.10,87 Ensuring informed consent requires transparent communication with users about how AI systems function, including their potential limitations and risks. Users should be given the choice to opt in to AI-driven mental health support and be informed about alternative options for care. Educational materials explaining AI processes in simple, accessible language should be made available to both clinicians and patients. By ensuring that users are fully aware of the AI's role in their care, healthcare providers can uphold patient autonomy while integrating AI into mental health treatment.
Cultural sensitivity and adaptability
AI systems, particularly those designed for global mental health applications, often reflect the cultural norms and biases of their developers or the predominant culture of the data used for training.85,86 This cultural misalignment can result in AI systems providing inappropriate or ineffective advice to users from diverse backgrounds. For example, users from non-Western cultures may find that AI recommendations do not align with their social norms, values or mental health practices, leading to dissatisfaction or mistrust. AI developers must prioritize the cultural adaptability of mental health tools, ensuring that they are trained on diverse datasets that reflect a wide range of cultural contexts. AI systems should be able to tailor their responses based on the cultural and linguistic background of the user. Engaging with local mental health experts from different regions during the development phase can also help create more culturally sensitive AI models. Continuous monitoring and updates should be implemented to ensure that AI systems remain relevant and respectful of diverse cultural needs in mental healthcare. The paper92 highlights the cultural biases embedded in AI systems, especially in explainable AI (XAI) models, where predominant cultural norms (often Western) influence how explanations are presented. This issue arises due to “WEIRD” sampling of Western, Educated, Industrialized, Rich and Democratic populations, which can limit the generalizability of AI systems globally. When AI systems are trained on data from WEIRD populations, they often reflect individualist cultural norms. This misalignment can result in AI providing advice or insights that may not resonate with users from collectivist or non-Western cultures, who may interpret behavior through external, contextual factors rather than internal mental states (e.g., beliefs or desires).
To mitigate this cultural misalignment, the authors suggest that developers should create adaptable AI systems that account for the diversity in users’ cultural backgrounds. This includes engaging with experts from various regions and continually updating models to align with different cultural explanatory needs. Furthermore, the article advocates for using diverse datasets and involving users from non-WEIRD populations to ensure AI responses are relevant and culturally respectful.
Discussion
Integrating AI and chatbot technologies into mental healthcare presents transformative opportunities for enhancing accessibility, personalization and scalability of psychological support systems. However, these innovations also introduce methodological, ethical and social challenges that require careful evaluation and systematic governance. Drawing upon the reviewed literature, this discussion synthesizes the major challenges and proposed directions for integrating AI-based systems, particularly chatbots and ML models, into sustainable, network-oriented mental health frameworks.
AI-powered chatbots have emerged as promising tools for scalable and accessible mental health interventions. By leveraging NLP algorithms, they can facilitate self-reflection, reduce stigma and promote help-seeking behaviors, particularly among individuals with limited access to traditional mental health services.3,40,43,45,73 Their deployment during the COVID-19 pandemic demonstrated their value in providing immediate psychological assistance to healthcare workers, students and older adults. Nevertheless, the efficacy of such systems remains constrained by methodological and contextual limitations. Studies often rely on self-reported, short-term data from small or demographically narrow samples, which limits external validity and generalizability.39,40 Furthermore, many evaluations lack longitudinal evidence to confirm sustained therapeutic benefits over time.40,48 Variations in user experience shaped by age, culture and digital literacy further complicate the optimization of chatbot design and implementation.48,67 Beyond technical factors, ethical and operational challenges such as data privacy, consent and over-reliance on automated care highlight the need for responsible, human-centered deployment.8,38,68,73
ML and AI models have similarly shown strong potential for improving diagnostic precision and early detection of mental health conditions such as depression and anxiety.4,44,74 Models like MentalBERT and MentalLLaMA, which integrate psychological theories into language understanding, have advanced the empathetic and interpretive capacities of computational systems.47,78 Yet, their current performance largely reflects technical proficiency rather than clinical maturity. The generalizability of AI models is hindered by the lack of diverse and representative datasets,35,48 while the interpretability of complex models remains limited, reducing transparency and trust among practitioners.46 Moreover, AI systems often fail to reproduce the nuanced, context-dependent nature of human empathy, resulting in interactions that are linguistically accurate but therapeutically shallow.79,82 These shortcomings underscore the importance of positioning AI as an assistive complement to clinicians rather than a substitute for human therapeutic relationships.20,70 Ensuring algorithmic explainability, fairness and interpretability will be essential for translating these models from laboratory settings to real-world clinical environments.46,82
Ethical and privacy considerations are central to the responsible integration of AI in mental healthcare. The sensitive nature of mental health data intensifies concerns regarding confidentiality, informed consent and algorithmic bias.10,16,83,86 The rapid evolution of AI technologies complicates regulatory efforts, often outpacing the establishment of legal and ethical frameworks.10 Furthermore, cross-border data sharing and the use of cloud-based systems raise complex issues surrounding data sovereignty and user autonomy.84,87 Without rigorous oversight, AI systems risk perpetuating or amplifying existing inequities in mental health access, particularly among marginalized or economically disadvantaged groups.85 To mitigate these challenges, researchers advocate for the development of robust ethical and regulatory frameworks that ensure transparency, user engagement and accountability.85–87 Continuous monitoring and longitudinal evaluation are equally vital to assess the sustained safety and efficacy of AI interventions.40,67
From a systems perspective, AI-based interventions should be conceptualized as components of dynamic socio-technical networks that connect patients, clinicians and digital infrastructures. Network analysis reveals properties such as small-world connectivity and scale-free distribution, where certain chatbot platforms or AI models function as influential hubs capable of amplifying both therapeutic benefits and systemic risks. The resilience of these distributed networks allows for continued service delivery during crises such as the COVID-19 pandemic but also demands safeguards against centralization, bias propagation and data vulnerabilities. Understanding the diffusion dynamics of AI technologies within healthcare networks can inform evidence-based strategies for ethical deployment and equitable adoption.
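The hub formation described here is the classic outcome of preferential attachment, which can be simulated directly: new nodes connect to existing nodes with probability proportional to their degree, so a few early platforms accumulate disproportionate connectivity. The sketch below is a generic illustration of that mechanism, not a model of any specific chatbot network.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=42):
    """Grow a network where each new node links to m existing nodes with
    probability proportional to their degree ("rich get richer")."""
    random.seed(seed)
    edges = [(0, 1)]            # seed network: a single edge
    stubs = [0, 1]              # each node appears once per incident edge
    for new in range(2, n_nodes):
        targets = set()
        while len(targets) < min(m, new):
            targets.add(random.choice(stubs))  # degree-proportional pick
        for t in targets:
            edges.append((new, t))
            stubs += [new, t]
    return edges

edges = preferential_attachment(500)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

top = max(degree.values())
median = sorted(degree.values())[len(degree) // 2]
print(top, median)  # a few hubs far exceed the typical node's degree
```

In a care network, such hubs amplify both benefits and risks: an update to a hub platform reaches many users at once, which is why the safeguards against centralization and bias propagation mentioned above matter.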
Looking forward, the sustainable advancement of AI in mental health depends on interdisciplinary collaboration that bridges computational innovation with clinical expertise and ethical oversight.38,67
Future work
Future research must prioritize large-scale, diverse clinical trials to establish empirical evidence for long-term efficacy and safety.40,41 Standardized evaluation frameworks, transparent reporting practices and culturally inclusive datasets are necessary to enhance reproducibility and global applicability.46,66 Technical progress, such as improving linguistic empathy, contextual adaptability and model interpretability, should be guided by humanistic values and legal safeguards.43,48,82 Ultimately, the integration of AI into mental healthcare should not seek to replace human compassion but to augment it, fostering hybrid care ecosystems that combine computational precision with ethical, patient-centered care.
Furthermore, future research and policy development should place greater emphasis on the operationalization of ethical and regulatory frameworks within real-world clinical research environments. One practical direction involves streamlining institutional ethics approval workflows for studies utilizing anonymized or de-identified clinical notes in hospital settings. Simplifying these approval pathways while maintaining rigorous oversight would accelerate translational research and enable the responsible use of sensitive health data without compromising patient confidentiality. Establishing federated learning infrastructures and secure data-sharing agreements across institutions can further enhance collaboration between researchers and clinicians while safeguarding privacy. In addition, integrating ethics-by-design principles into AI development pipelines where data governance, consent management and algorithmic fairness are addressed at every stage of model design can ensure that compliance and innovation progress in parallel. Collectively, these measures represent a proactive approach to aligning ethical integrity with scientific advancement, enabling AI-driven mental healthcare research to evolve within transparent, patient-centric and trustworthy ecosystems.
Conclusion
This systematic review of 37 studies on the application of AI in mental healthcare highlights notable advancements alongside significant challenges. AI-powered chatbots and ML models show promise in improving the accessibility, personalization and scalability of mental health services, particularly when integrated with human oversight. These technologies can facilitate early and accurate diagnosis, enhance support for underserved populations and alleviate workloads for mental health professionals, as evidenced by empirical studies exploring chatbot efficacy and diagnostic models. For instance, AI-driven chatbots offer continuous mental health assistance, potentially addressing barriers to traditional care in resource-constrained settings. However, the deployment of AI in mental healthcare raises critical ethical and practical concerns, including data privacy, algorithmic bias, accountability and transparency. Addressing these issues is essential to ensure responsible and equitable AI implementation in healthcare contexts. The lack of clear regulatory frameworks and the complexities of embedding ethical considerations into AI development pose substantial barriers, particularly in balancing technological innovation with patient safety and rights. Moreover, over-reliance on AI risks diminishing the vital human-to-human therapeutic relationships that are central to effective mental healthcare.
From a network science perspective, AI-based mental health interventions offer opportunities to create more resilient, accessible and adaptive care networks. By fostering new connections between users and care resources, these technologies can help address gaps in traditional mental health service delivery. Nonetheless, realizing this potential requires careful attention to network design principles to ensure AI complements, rather than replaces, human interactions in therapeutic processes. This research advances the understanding of technological networks in healthcare, illustrating how AI systems can integrate into existing infrastructures to form equitable and robust mental health support networks. These efforts align with the United Nations SDGs, particularly those related to Good Health and Well-being; Sustainable Cities and Communities; and Industry, Innovation and Infrastructure.
Future efforts should prioritize the development of robust ethical frameworks, transparency in AI operations and strengthened interdisciplinary collaboration among technologists, clinicians, ethicists and policymakers. While AI-based chatbots and ML models hold promise for enhancing mental healthcare, their current applications are limited by technical, ethical and regulatory challenges. These limitations underscore the need for further research and interdisciplinary advancements to optimize AI's efficacy and ensure its responsible integration into mental healthcare systems.
Footnotes
Acknowledgements
The authors thank the researchers and institutions whose work contributed to the studies analyzed in this systematic review. No external funding was received for this work.
Consent to participate
Not applicable. This study is a systematic review of previously published literature and did not involve direct participation of human subjects.
Consent to publish
Not applicable. This manuscript does not contain any individual person's data in any form.
Contributorship
Z.R. and A.K. conducted the literature search, screening and data extraction. D.S. contributed to full-text review and thematic synthesis, particularly in the areas of psychological and ethical analyses. Y.M.B. conceptualized the study, supervised the review process, resolved screening disagreements and led the overall synthesis and manuscript preparation. All authors contributed to writing, reviewing and approving the final manuscript.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Guarantor
Yaser Mike Banad is the guarantor of the study and accepts full responsibility for the integrity of the work, from study conception to publication.
Availability of data and materials
All data analyzed in this study are derived from previously published articles and publicly available sources. No new datasets were generated or analyzed.
