Abstract
Background
Artificial Intelligence (AI) has demonstrated significant potential in transforming psychiatric care by enhancing diagnostic accuracy and therapeutic interventions. Psychiatry faces challenges like overlapping symptoms, subjective diagnostic methods, and personalized treatment requirements. AI, with its advanced data-processing capabilities, offers innovative solutions to these complexities.
Aims
This study systematically reviewed and meta-analyzed the existing literature to evaluate AI's diagnostic accuracy and therapeutic efficacy in psychiatric care, focusing on various psychiatric disorders and AI technologies.
Methods
Adhering to PRISMA guidelines, the study included a comprehensive literature search across multiple databases. Empirical studies investigating AI applications in psychiatry, such as machine learning (ML), deep learning (DL), and hybrid models, were selected based on predefined inclusion criteria. The outcomes of interest were diagnostic accuracy and therapeutic efficacy. Statistical analysis employed fixed- and random-effects models, with subgroup and sensitivity analyses exploring the impact of AI methodologies and study designs.
Results
A total of 14 studies met the inclusion criteria, representing diverse AI applications in diagnosing and treating psychiatric disorders. The pooled diagnostic accuracy was 85% (95% CI: 80%–87%), with ML models achieving the highest accuracy, followed by hybrid and DL models. For therapeutic efficacy, the pooled effect size was 84% (95% CI: 82%–86%), with ML excelling in personalized treatment plans and symptom tracking. Moderate heterogeneity was observed, reflecting variability in study designs and populations. The risk of bias assessment indicated high methodological rigor in most studies, though challenges like algorithmic biases and data quality remain.
Conclusion
AI demonstrates robust diagnostic and therapeutic capabilities in psychiatry, offering a data-driven approach to personalized mental healthcare. Future research should address ethical concerns, standardize methodologies, and explore underrepresented populations to maximize AI's transformative potential in mental health.
Keywords
Highlights
AI achieved 85% pooled diagnostic accuracy, excelling in detecting complex psychiatric disorders.
Machine learning models demonstrated the highest diagnostic and therapeutic performance among AI methodologies.
Therapeutic efficacy of AI technologies reached 84%, enabling personalized treatment strategies in psychiatry.
Hybrid AI models effectively integrated diverse data sources for enhanced diagnostic and therapeutic outcomes.
Ethical challenges and methodological variability underline the need for standardized, inclusive AI applications in mental health care.
Introduction
Artificial Intelligence (AI) has emerged as a transformative force driving advancements across numerous fields, particularly healthcare.1 Its capacity to revolutionize diagnostics, treatment planning, and patient care has attracted substantial attention.2 Within psychiatry—a field dedicated to understanding the complexities of human emotions, cognition, and behavior—the integration of AI presents remarkable opportunities.3,4 This is especially critical given the inherent challenges in addressing mental health disorders,5 which often feature overlapping symptoms, subjective diagnostic methods, and personalized treatment requirements.6 AI provides a data-driven pathway to enhance diagnostic accuracy, improve therapeutic outcomes, and foster personalized mental healthcare.7,8
AI's strength lies in its ability to process and synthesize extensive, complex datasets, a capability highly relevant to psychiatry.9 Traditional diagnostic approaches typically depend on clinical assessments, interviews, and self-reports, which, while valuable, can lack consistency and fail to capture the intricate nature of mental health conditions.10,11 AI technologies, including machine learning (ML) and natural language processing, bring a fresh perspective to psychiatric care by integrating diverse data sources such as electronic health records (EHRs),12,13 neuroimaging scans, genetic information, and real-time behavioral data.14 By combining these inputs, AI can improve diagnostic precision, support early detection, and help clinicians develop tailored treatment strategies.15,16
Accurate diagnosis is fundamental to effective psychiatric care, but it remains a significant challenge due to overlapping symptoms across many mental health conditions.17 Disorders like schizophrenia, depression, bipolar disorder, and anxiety share common characteristics, complicating efforts to distinguish between them using traditional methods.18,19 AI offers potential solutions to these complexities.20 For example, ML models can analyze neuroimaging data to identify biomarkers unique to specific disorders, while natural language processing tools can evaluate speech patterns and text inputs to detect early mental health issues.21 These advanced capabilities provide clinicians with deeper insights, enabling more precise and timely interventions.22
Beyond diagnostics, AI is becoming an essential tool in therapeutic applications.23 Virtual therapists and AI-powered chatbots are increasingly popular for delivering psychological support, including cognitive-behavioral therapy and stress management techniques.24 These technologies offer scalable solutions, especially in underserved areas or among populations with limited access to trained professionals.25 Moreover, AI can help clinicians personalize treatment plans by analyzing patient histories, real-time information, and predictive patterns to recommend therapies most likely to succeed.26 This personalized approach is particularly crucial in psychiatry, where treatment effectiveness often varies significantly from person to person.27
Despite its promise, integrating AI into psychiatry comes with challenges that need to be carefully managed.28 Key ethical concerns include safeguarding patient privacy, ensuring data security, and addressing potential biases in AI systems.29 Additionally, rigorous validation and standardization of AI tools are essential to meet clinical and ethical benchmarks.30 As the field evolves, collaboration between psychiatrists, AI developers, and other stakeholders is vital in overcoming these challenges and maximizing AI's benefits in mental healthcare.31
The growing interest in AI's role in psychiatry reflects its transformative potential.32 However, the existing research exhibits considerable variation in methodology, scope, and quality, highlighting the necessity for systematic reviews and meta-analyses to synthesize findings, assess AI's effectiveness, and identify areas that require further investigation.33,34 By synthesizing the current evidence, a clearer picture can emerge of how AI can advance psychiatric care and improve outcomes for patients. This study systematically reviewed and meta-analyzed existing research to evaluate the diagnostic and therapeutic efficacy of AI in psychiatry.
Methodology
Conceptualization of the study
This study was structured as a systematic review and meta-analysis, employing a robust methodology to aggregate and analyze data from existing research on AI applications in psychiatry. The two primary goals were to assess the diagnostic precision of AI and evaluate its therapeutic effectiveness across various psychiatric disorders. The choice of this framework is driven by its unmatched capacity to thoroughly examine available data, enabling a quantitative synthesis that highlights both overarching patterns and specific results related to AI integration in psychiatric practice.
Guideline and registration
The research design adheres rigorously to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure detailed and transparent reporting (Figure 1).35 These guidelines provide the foundational structure for the review process, including the literature search, study selection, bias assessment, and data synthesis. Moreover, the study was registered with the Open Science Framework (OSF) (10.17605/OSF.IO/4PZXC), ensuring transparency and reproducibility of the research methodology.

The PRISMA flowchart.
Literature search protocol
The comprehensive literature search was conducted across four major databases: PubMed, Scopus, Web of Science, and PsycINFO, utilizing a broad range of keywords and MeSH terms such as “artificial intelligence,” “machine learning,” “deep learning (DL),” “hybrid models,” “neural networks,” and “psychiatry.” Boolean operators were applied to refine the search, ensuring coverage of both peer-reviewed and grey literature (Table 1). The search was unrestricted by language or publication status to capture all relevant research. Searches were conducted between November 1, 2024, and January 2, 2025: PubMed was searched from November 1 to December 15, 2024; Scopus from November 1 to December 10, 2024; Web of Science from November 1 to December 20, 2024; and PsycINFO from November 1 to December 30, 2024. This approach ensured that the most recent studies were included while also capturing foundational research and emerging AI methodologies in psychiatry.
Search strategy.
Inclusion and exclusion criteria
Studies selected for the review included randomized controlled trials, cohort studies, and other empirical research that investigated AI applications in psychiatric settings, involving participants diagnosed with psychiatric conditions across various demographics. The AI interventions analyzed encompassed technologies such as ML algorithms, DL models, hybrid models, and neural networks, with control groups not utilizing AI technologies serving as comparators. The primary outcomes assessed focused on diagnostic accuracy and therapeutic efficacy. Exclusion criteria were applied to non-empirical studies, research not directly related to AI applications in psychiatry, and studies compromised by incomplete data. The inclusion of DL and hybrid models highlighted their unique capabilities, such as handling unstructured data and integrating diverse data sources, further enriching the review's scope.
Study selection and data management
The selection process for the systematic review and meta-analysis began with meticulous screening of titles and abstracts. This initial phase was essential for identifying studies that potentially met the inclusion criteria focused on the application of AI in psychiatric settings. The title and abstract screening were performed by two independent reviewers, who worked together to ensure that only the most relevant studies advanced to the next stage of review. Any discrepancies were resolved through discussion or by consulting a third reviewer if necessary. Subsequently, a comprehensive full-text review was conducted for each shortlisted study. This in-depth review evaluated each study against predefined criteria, including relevance to the research questions, methodological rigor, and the specific AI technologies utilized. The full-text review was conducted by three authors to ensure accuracy and to reduce potential biases.
Data extraction and reliability testing
Once the studies were selected, the critical step of data extraction began in Phase 1. A standardized data extraction form was used to systematically gather key information from each study, including authorship, publication year, study design, sample size, types of AI technology utilized, primary outcomes observed, and key findings (Table 2). Two authors were responsible for the initial data extraction during this phase. In Phase 2, a pilot test of the data extraction form was conducted on a subset of studies to ensure the reliability of the process. This phase was essential for identifying and resolving any issues with the form, ensuring consistency in data collection across all included studies. Three authors participated in this phase, assessing the form's usability and consistency. In Phase 3, any discrepancies encountered during the data extraction process were addressed through collaborative discussions among the research team. Four authors participated in resolving these discrepancies to ensure accuracy. In the final Phase 4, consultations with two external experts were conducted to resolve complex issues, ensuring the accuracy and integrity of the data collected.
Study characteristics.
Quality and bias assessment
The quality and potential biases of the included studies were rigorously assessed using established tools such as the Cochrane Risk of Bias Tool (Figure 2)50 and the STROBE checklist51 (Table 3). Each study was meticulously evaluated and scored based on its methodological robustness and the presence of any biases that could influence the meta-analysis results. This quality assessment was pivotal, as it directly shaped the interpretative framework of the analysis, ensuring that the conclusions were both reliable and valid. By employing these rigorous methodologies, the review aimed to deliver a comprehensive and unbiased evaluation of the diagnostic and therapeutic effectiveness of AI technologies in psychiatry, providing significant insights into the field.

Quality assessment using Cochrane risk of bias tool.
Quality assessment using STROBE checklist.
Note. Other Information: Includes funding and other potential conflicts of interest disclosures.
Statistical analysis
The statistical analysis employed both fixed-effect and random-effects models, depending on the heterogeneity of the studies, assessed using the I² statistic and Q tests. The fixed-effect model was applied when heterogeneity was low (I² < 50%), while the random-effects model was used when heterogeneity was high (I² ≥ 50%). Effect size was primarily measured using area under the curve (AUC) for diagnostic accuracy, as it is a standard metric in diagnostic performance studies. However, when AUC was unavailable, standardized mean differences or risk ratios were used as alternative effect size measures to ensure comparability across studies. Statistical computations were performed using Python for data processing and Stata for meta-analysis calculations, as both are widely recognized for their versatility and ability to handle meta-analysis computations. This meticulous methodological approach ensured that the systematic review and meta-analysis produced robust, evidence-based conclusions about the effectiveness of AI technologies in diagnosing and treating psychiatric disorders. These findings enriched the academic and clinical understanding of the potential of AI in psychiatry, offering valuable insights into its integration into practice while highlighting opportunities for further research and innovation in psychiatric treatment approaches.
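The model-selection rule described above can be sketched in Python. The function below is a minimal illustration, not the authors' actual code: it uses inverse-variance (fixed-effect) pooling, Cochran's Q, the I² statistic, and the DerSimonian-Laird estimator of between-study variance for the random-effects model; the input effect sizes and variances are hypothetical.

```python
import numpy as np

def pool_effect_sizes(effects, variances):
    """Pool study effect sizes, choosing the model by heterogeneity.

    Computes the inverse-variance (fixed-effect) estimate, Cochran's Q,
    the I² statistic, and the DerSimonian-Laird between-study variance
    (tau²), then applies the rule from the text: random effects when
    I² >= 50%. Assumes at least two studies.
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    k = len(effects)

    # Fixed-effect pooling: weight each study by 1 / variance
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)

    # Cochran's Q and I² (share of variability beyond chance, in %)
    q = float(np.sum(w * (effects - fixed) ** 2))
    i2 = (max(0.0, (q - (k - 1)) / q) * 100) if q > 0 else 0.0

    # DerSimonian-Laird tau² and random-effects pooling
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = 1.0 / (variances + tau2)
    random_ = np.sum(w_re * effects) / np.sum(w_re)

    use_random = i2 >= 50
    pooled = random_ if use_random else fixed
    se = np.sqrt(1.0 / np.sum(w_re if use_random else w))
    return {"pooled": float(pooled),
            "ci": (float(pooled - 1.96 * se), float(pooled + 1.96 * se)),
            "Q": q, "I2": float(i2), "tau2": float(tau2)}
```

For example, `pool_effect_sizes([0.80, 0.85, 0.90], [0.01, 0.01, 0.01])` yields a pooled estimate of 0.85 with I² = 0, so the fixed-effect model applies; widely discrepant, highly precise studies drive I² above 50% and trigger the random-effects model.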
Results
Study selection
The systematic review identified 317 records from multiple databases using a comprehensive search strategy. After the removal of 52 duplicates and 15 ineligible records (10 due to insufficient data and 5 for other reasons), 250 studies were subjected to title and abstract screening. During this process, 200 studies were excluded based on irrelevance to the research question or non-empirical design. Subsequently, 50 full-text articles were assessed for retrieval, but 5 could not be accessed due to unavailable data or publication restrictions. Ultimately, 45 studies were assessed for eligibility, and 14 studies met the inclusion criteria.36–49 These studies were included in the meta-analysis, representing a robust dataset for evaluating the diagnostic and therapeutic efficacy of AI technologies in psychiatry (Figure 1).
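As a trivial consistency check (a sketch added here, not part of the original analysis), the PRISMA flow counts reported above reconcile arithmetically:

```python
# PRISMA flow counts as reported in the text
identified = 317
duplicates_removed = 52
ineligible_removed = 15              # 10 insufficient data + 5 other reasons
screened = identified - duplicates_removed - ineligible_removed
excluded_at_screening = 200          # irrelevant or non-empirical
sought_for_retrieval = screened - excluded_at_screening
not_retrieved = 5                    # unavailable data or publication restrictions
assessed_for_eligibility = sought_for_retrieval - not_retrieved
included = 14

print(screened, sought_for_retrieval, assessed_for_eligibility)  # 250 50 45
```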
Study characteristics
The 14 studies included in the analysis represented diverse methodologies, populations, and applications of AI in psychiatry (Table 2). Sample sizes ranged from 21 to 142,432 participants, with studies addressing psychiatric conditions such as depression, schizophrenia, and bipolar disorder. The AI techniques employed included ML algorithms (e.g., support vector machines, random forests), DL techniques (e.g., convolutional and recurrent neural networks), and hybrid models combining multiple AI approaches.36–49 Outcomes were classified into two primary categories: diagnostic accuracy and therapeutic efficacy, with most studies reporting data on both aspects. The studies were conducted across various clinical and geographical contexts, contributing to the generalizability of findings. Additionally, the timeline of the studies reflected an increasing trend in the adoption of advanced AI models over the years, particularly DL and hybrid approaches. This evolution aligns with technological advancements and growing interest in AI integration within psychiatry.
Quality assessment
Figure 2 highlights the distribution of low, unclear, and high risks across multiple domains, including random sequence generation, allocation concealment, and selective reporting. Most studies demonstrated low risk in random sequence generation, while high risk was predominantly observed in the domains of blinding and incomplete outcome data. The summary across 14 studies (Figure 6) indicates that 70% of the studies had low risk in key areas, supporting the reliability of their results.
Diagnostic accuracy
The forest plot of diagnostic accuracy (Figure 3) synthesizes effect sizes and 95% confidence intervals (CIs) across 14 studies that investigated the performance of AI models in diagnosing psychiatric disorders. The pooled effect size for diagnostic accuracy was 0.85 (95% CI: 0.80–0.87), indicating a high level of precision achieved by AI technologies. Individual studies demonstrated varying effect sizes, with ML models consistently achieving higher performance metrics. For instance, Morales et al. and Kalmady et al. reported effect sizes close to the upper limit of the pooled estimate, highlighting the effectiveness of ML approaches in extracting diagnostic insights from structured datasets such as clinical records and neuroimaging.37,48

Diagnostic accuracy—forest plot of effect sizes with 95% CIs.
Subgroup analysis revealed differences in diagnostic performance based on the type of AI model utilized. ML models achieved the highest pooled diagnostic accuracy (effect size = 0.85), followed by hybrid models (effect size = 0.84) and DL techniques (effect size = 0.82). These findings underscore the robust ability of ML algorithms to process structured psychiatric data and identify patterns indicative of psychiatric conditions. Hybrid models demonstrated comparable performance, excelling in integrating diverse data sources such as biomarkers and neuroimaging, while DL techniques, despite excelling with complex and unstructured data, showed slightly lower diagnostic accuracy in this analysis.
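A subgroup analysis of this kind can be organized as inverse-variance pooling within each model family. The sketch below is illustrative only; the study tuples are hypothetical values, not the reviewed studies:

```python
import numpy as np
from collections import defaultdict

def subgroup_pool(studies):
    """Inverse-variance pooled effect size per AI-model subgroup.

    `studies` is an iterable of (model_family, effect_size, variance)
    tuples; returns {model_family: pooled_effect}.
    """
    groups = defaultdict(list)
    for model, effect, variance in studies:
        groups[model].append((effect, variance))

    pooled = {}
    for model, rows in groups.items():
        effects = np.array([e for e, _ in rows])
        weights = 1.0 / np.array([v for _, v in rows])
        # Weighted mean: more precise studies (smaller variance) count for more
        pooled[model] = float(np.sum(weights * effects) / np.sum(weights))
    return pooled

# Hypothetical per-study effect sizes grouped by AI methodology
studies = [
    ("ML", 0.86, 0.01), ("ML", 0.84, 0.01),
    ("Hybrid", 0.84, 0.02),
    ("DL", 0.82, 0.02),
]
print(subgroup_pool(studies))
```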
Heterogeneity across the studies was moderate, as indicated by an I² value of 47% (Table 4). This suggests variability in study design, datasets, and populations, which could have influenced the diagnostic performance of the AI models. The Q-test results were statistically significant (p = 0.03), justifying the use of a random-effects model to account for this variability and provide a more robust pooled estimate.
Heterogeneity assessment I² statistic and Q-test.
Therapeutic efficacy
The forest plot of therapeutic efficacy (Figure 4) presents the synthesized effect sizes and 95% CIs for multiple studies evaluating the application of AI in psychiatric interventions. The pooled effect size was 0.84 (95% CI: 0.82–0.86), demonstrating that AI models have a robust impact on therapeutic outcomes. Individual studies varied in their results, with ML models consistently achieving higher efficacy. Notable studies, such as those by Yu et al. and Li, reported some of the highest effect sizes, highlighting the advanced capabilities of ML algorithms in personalizing treatment plans and predicting therapeutic outcomes.40,42

Therapeutic efficacy—forest plot of effect sizes with 95% CIs.
Subgroup analysis revealed distinct differences in performance among AI methodologies. ML models showed the highest pooled effect size of 0.85 (95% CI: 0.83–0.87), excelling at processing structured clinical data to optimize therapeutic interventions and achieving consistently high efficacy with minimal variability. Studies such as Fulmer et al. and Gomeni et al. demonstrated exceptional performance, positioning ML as the most effective methodology.38,46 Hybrid models, with a pooled effect size of 0.84 (95% CI: 0.81–0.86), combine the strengths of ML and DL. These models excel in integrating diverse data sources such as clinical records, neuroimaging, and biomarkers. While hybrid models performed slightly lower than ML, studies like Danieli et al. and Lacy et al. showcased their utility in therapeutic applications.41,47 DL models, with a pooled effect size of 0.82 (95% CI: 0.80–0.84), demonstrated solid performance, particularly with complex and unstructured data such as neuroimaging and genetic datasets. However, DL showed slightly lower efficacy and greater variability compared to ML and hybrid models. Studies such as Zhang et al. and Nemesure et al. highlight the potential of DL, though its overall effectiveness was lower in this meta-analysis.43,49
Moderate heterogeneity was observed across studies, as indicated by an I² statistic of 57%, reflecting variability in factors such as intervention designs, patient populations, and therapeutic outcomes. The Q-test results were statistically significant (p = 0.04), supporting the use of a random-effects model to account for this heterogeneity and provide robust pooled estimates. These findings underscore the potential of AI technologies in advancing therapeutic strategies in psychiatry by tailoring interventions, monitoring progress, and predicting outcomes. ML models demonstrated the highest efficacy, followed by hybrid models, while DL models showed slightly lower but still substantial performance. To further solidify AI's role in psychiatric care, standardization in study designs and reporting metrics is necessary to reduce variability and improve comparability across studies.
Publication bias
To evaluate publication bias, a funnel plot was generated (Figure 5), displaying the relationship between study precision (standard error) and effect sizes (log odds ratio). The plot includes individual study points, an overall effect line, and 95% confidence boundaries. While the plot appears generally symmetrical, there is some dispersion among smaller studies, suggesting a possible tendency toward selective reporting. Statistical tests, including Egger's regression and Begg's rank correlation, were conducted to further assess bias. The results showed p-values greater than .05, indicating no statistically significant asymmetry. This suggests that while minor bias may be present, it is unlikely to substantially affect the overall conclusions of the meta-analysis. The assessment reinforces the robustness of the findings but underscores the importance of cautious interpretation given the potential for selective publication.
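Egger's test works by regressing the standardized effect (effect / SE) on precision (1 / SE); an intercept significantly different from zero signals funnel-plot asymmetry. A minimal sketch of the standard procedure, assuming NumPy and SciPy are available (not the authors' code; the data below are hypothetical):

```python
import numpy as np
from scipy import stats

def eggers_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE);
    an intercept significantly different from zero suggests small-study
    effects such as publication bias. Returns (intercept, two-sided p).
    """
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    y = effects / se          # standardized effects
    x = 1.0 / se              # precision
    n = len(x)

    slope, intercept, _, _, _ = stats.linregress(x, y)

    # linregress reports a p-value for the slope, so derive the
    # intercept's standard error and t-test by hand.
    resid = y - (intercept + slope * x)
    s2 = np.sum(resid ** 2) / (n - 2)
    sxx = np.sum((x - np.mean(x)) ** 2)
    se_intercept = np.sqrt(s2 * (1.0 / n + np.mean(x) ** 2 / sxx))
    t = intercept / se_intercept
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return float(intercept), float(p)
```

A p-value above .05, as reported here, would indicate no statistically significant asymmetry.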

Funnel plot publication bias assessment.
Risk of bias summary
The aggregated risk of bias summary (Figure 6) highlighted that most studies adhered to rigorous methodological standards. However, four studies exhibited high risk in at least one domain, particularly in areas related to detection bias and handling incomplete outcome data. Despite these limitations, the majority of studies showed low risk in critical areas such as randomization (selection bias) and selective reporting (reporting bias), which bolsters the validity of the meta-analysis conclusions. This risk assessment ensures that the findings of the meta-analysis are grounded in high-quality evidence.

Risk of bias summary across 14 studies.
Comparative performance of AI techniques
A comparative evaluation of AI techniques (Table 5) revealed that ML models achieved the highest performance metrics, with diagnostic accuracy at 85% and therapeutic efficacy at 85%. This superior performance can be attributed to ML's strength in analyzing structured data, such as clinical records, neuroimaging, and patient demographics, which allows it to identify diagnostic patterns more effectively. ML models excel at processing large volumes of well-organized data, which enhances their ability to make precise predictions and optimize therapeutic interventions in psychiatric care.
Subgroup and sensitivity analyses by AI technology.
Hybrid models followed closely, demonstrating an 84% diagnostic accuracy and an 85% therapeutic efficacy. These models combine multiple AI approaches, such as ML with DL and natural language processing, enabling them to integrate diverse data sources, including unstructured data like clinical notes and neuroimaging. This flexibility allows hybrid models to capture a broader range of relevant information, improving their diagnostic and therapeutic effectiveness in varied clinical settings. Their performance indicates the value of combining different AI methodologies to address the complexity and diversity of psychiatric conditions.
DL models, while slightly less effective in diagnostic accuracy, still performed well, with diagnostic accuracy at 80% and therapeutic efficacy at 85%. DL's strength lies in its ability to handle large, complex, and unstructured datasets, such as neuroimaging, genetic data, and longitudinal health records. Despite performing slightly lower in diagnostic accuracy compared to ML and hybrid models, DL's ability to process unstructured data allows it to provide high-precision outcomes in therapeutic applications, particularly for symptom monitoring and relapse prediction.
These findings underscore the versatility and effectiveness of AI methodologies in psychiatric research and clinical applications. ML stands out for its superior diagnostic accuracy, reflecting its strength in structured data analysis, while DL excels in therapeutic efficacy, showcasing its capacity to manage and extract valuable insights from complex, unstructured datasets. The results highlight that while ML is most effective for diagnostic purposes, hybrid and DL models offer unique advantages for personalized treatment and therapeutic decision-making. A nuanced understanding of each model's strengths allows clinicians to select the most appropriate AI tools based on the nature of the data and the specific needs of the patient population.
Discussion
The findings of this systematic review and meta-analysis reveal the transformative potential of AI in psychiatry, particularly in its diagnostic accuracy and therapeutic efficacy. These results align with, and extend, a growing body of literature exploring the intersection of AI technologies and mental health care. Our findings underscore AI's capacity to improve diagnostic precision, enhance therapeutic outcomes, and personalize psychiatric care. This discussion contextualizes these results within the broader landscape of existing meta-analyses, emphasizing how this study uniquely contributes to the ongoing dialogue on AI's role in psychiatry.
Diagnostic accuracy
This study found a pooled effect size of 0.85 (95% CI: 0.80–0.87) for diagnostic accuracy, which underscores the significant advancements AI has made in identifying and classifying psychiatric disorders. The strength of ML models in processing structured datasets, such as clinical records and neuroimaging data, mirrors findings from prior meta-analyses, such as those by Zhong et al. and Li.52,53 Both studies reported high diagnostic accuracy for ML models, reinforcing our result that ML remains the most effective methodology for psychiatric diagnoses. However, unlike these prior analyses, our study highlights the broader impact of hybrid models, which combine ML with other approaches like DL and natural language processing.42,48
Hybrid models, which we found to have a pooled effect size of 0.84 (95% CI: 0.80–0.88), demonstrated comparable efficacy. This result extends beyond previous work such as Abd-Alrazaq et al. and He et al., who also identified the utility of hybrid models in psychiatric diagnoses, especially in the detection of schizophrenia and mood disorders.54,55 Our study's unique contribution lies in further emphasizing how the integration of diverse data sources—clinical, neuroimaging, and biomarker data—can enhance diagnostic accuracy, providing a more nuanced understanding of AI's diagnostic potential in psychiatry. Additionally, our analysis examined the performance of DL models, which showed a pooled effect size of 0.82 (95% CI: 0.79–0.85). While this result is consistent with findings from other studies, which found DL models perform exceptionally well with unstructured data such as neuroimaging and genetic data, our study adds a critical layer of insight by highlighting the relative underperformance of DL models compared to ML and hybrid models in diagnostic contexts.56,57 This difference underscores the need to further optimize DL methodologies for psychiatric applications.
Another key point highlighted by this study, which has been previously noted in the literature, is the variability in AI performance across different populations and settings. AI systems trained on Western populations often exhibit reduced accuracy when applied to more diverse groups, emphasizing the need for culturally sensitive and inclusive models.58 This consideration is relatively underexplored in prior meta-analyses, and our study contributes to this discussion by suggesting that the generalizability of AI models in psychiatry will require more inclusive datasets, representative of various demographic groups.
Therapeutic efficacy
AI's therapeutic applications in psychiatry showed robust performance, with a pooled effect size of 0.84 (95% CI: 0.82–0.86). This result echoes findings from studies such as Quaak et al. and Meinke, which also highlighted the efficacy of ML models in developing personalized treatment recommendations and symptom tracking systems for psychiatric conditions.59,60 Our study builds on this existing literature by showing that ML models excel in optimizing therapeutic outcomes by processing structured clinical data, offering actionable insights for treatment planning. In contrast to earlier reviews that have primarily focused on the performance of ML alone, our study introduces a comprehensive examination of hybrid models. These models, which combine ML with DL and natural language processing (NLP), demonstrated competitive performance, achieving effect sizes of up to 0.84 (95% CI: 0.81–0.86). This finding is consistent with Wang et al., who showed that hybrid models could enhance the precision of therapeutic insights derived from patient-reported outcomes and clinical notes.61 Our contribution lies in further emphasizing how hybrid models can effectively integrate diverse data sources, providing clinicians with more personalized therapeutic strategies.62
Our analysis also explored the contributions of DL algorithms in therapeutic contexts, particularly in symptom monitoring and relapse prediction. The study by Kaur et al. found that DL algorithms could accurately predict relapse risks in patients with bipolar disorder.63 While DL was highly effective in handling complex and unstructured datasets, such as neuroimaging and longitudinal health data, its overall performance was slightly lower (pooled effect size: 0.82, 95% CI: 0.80–0.84) compared to ML and hybrid models. These findings align with results from other meta-analyses, such as Villarreal-Zegarra et al., which found that DL methods are particularly dependent on data quality and sample size,64 but our study goes further by directly comparing DL with ML and hybrid models across therapeutic applications.
One of the more notable findings of this study is the consistency in therapeutic efficacy across various studies, which aligns with the results of Qiu et al., who reported pooled effect sizes for diagnostic accuracy ranging from 0.82 to 0.89.65 While this meta-analysis reinforces the reproducibility of AI-driven outcomes, our work also points out the limitations of ML in real-world applications.66 Specifically, Linardon et al. discussed how machine-learning models face challenges in scaling to real-world settings due to the high computational resources required.67 Our study builds on this by offering a more nuanced understanding of how ML can be optimized for broader clinical adoption.
Unique contributions of this study
This study makes a unique contribution by offering a comprehensive comparison of ML, DL, and hybrid models across both diagnostic and therapeutic applications. Previous meta-analyses, such as those by He and Abd-Alrazaq et al., have primarily focused on either diagnostic or therapeutic applications in isolation.55,56 Our work advances the field by presenting a comparative analysis of AI's diagnostic and therapeutic efficacy, thereby offering a holistic view of its potential in psychiatry. Additionally, by incorporating hybrid models and highlighting their potential in both diagnostic and therapeutic contexts, this study provides a new avenue for future research, encouraging the integration of various AI methodologies to improve clinical outcomes. The focus on the generalizability and inclusivity of AI models further sets this research apart, addressing a critical gap in the literature regarding the application of AI across diverse populations.68
However, concerns around data privacy and security are significant in the implementation of AI in psychiatry. As highlighted by Linardon et al., physicians often express hesitancy in adopting AI tools due to fears of breaching patient confidentiality, especially when handling sensitive mental health data.69 These concerns are particularly pressing in psychiatry, where patient data are inherently personal and vulnerable. Ensuring robust data protection measures and transparency in AI processes is essential to alleviate these concerns and build trust among clinicians and patients alike. Moreover, the integration of AI in psychiatric care faces unique challenges due to the subjectivity of psychiatric diagnoses and the complexity of individual treatment plans. Physicians are concerned about relying on AI systems that may not fully capture the nuanced understanding of patient needs, especially when clinical judgment is central to diagnosis and treatment.70 Therefore, future research should also focus on addressing these concerns by developing AI systems that complement, rather than replace, the expertise of mental health professionals, ensuring that AI tools support personalized, patient-centered care.
Validation of AI models in psychiatric applications
While the studies included in this review demonstrate promising results in terms of diagnostic and therapeutic efficacy, it is critical to highlight the validation methods employed in assessing the performance of AI models, particularly for DL, ML, and hybrid approaches. Effective validation is essential for ensuring the reliability and generalizability of AI models in diverse clinical settings. Several studies utilized cross-validation and external validation to assess the robustness of the AI models, while others compared the performance of AI models against established clinical benchmarks.59 However, the methodological variability in validation approaches across studies limits the ability to draw definitive conclusions about the broader applicability of these models.55 Future research should prioritize standardized validation methods, including multi-center trials and external validation using diverse patient populations, to confirm the generalizability of these AI-driven techniques in psychiatry.
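To make the cross-validation step concrete, the sketch below implements plain k-fold cross-validation from first principles. The toy labels and the majority-class baseline are hypothetical stand-ins for the real psychiatric datasets and classifiers evaluated in the included studies; any production pipeline would use an established library implementation instead.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, k, fit, predict):
    """Train on k-1 folds, score on the held-out fold; return per-fold accuracy."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return scores

# Hypothetical baseline: always predict the most common training label.
fit = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model

y = [0] * 60 + [1] * 40      # toy binary labels (60/40 split)
X = [[v] for v in y]         # features are unused by this trivial baseline
scores = cross_validate(X, y, k=5, fit=fit, predict=predict)
mean_accuracy = sum(scores) / len(scores)
```

Because every sample is scored exactly once on a fold its model never saw, the mean fold accuracy is a less optimistic estimate of performance than training-set accuracy, which is why several of the included studies report it.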
Practical implications
The implications of these findings are far-reaching. AI's ability to improve diagnostic accuracy has the potential to address longstanding challenges in psychiatry, such as misdiagnosis and delayed treatment initiation. By identifying patterns in complex datasets, AI systems can support clinicians in making more informed and timely decisions, ultimately improving patient outcomes.71 For instance, early detection of conditions like schizophrenia or bipolar disorder could enable earlier interventions, mitigating the progression of these disorders and reducing the associated societal and economic burdens.72 Therapeutically, AI-driven tools can complement traditional approaches by offering personalized, data-driven insights.73 For example, symptom monitoring applications can provide real-time feedback to both patients and clinicians, facilitating more adaptive and responsive care.74 Additionally, AI's capacity to predict treatment responses and relapse risks can enable a shift toward preventive psychiatry, focusing on maintaining mental health rather than merely addressing crises.75
Ethical and practical considerations
Despite its promise, the integration of AI into psychiatry raises several ethical and practical concerns. Issues such as data privacy, algorithmic bias, and the interpretability of AI models should be addressed to ensure equitable and ethical use.69,76 Research by Lee et al. highlighted how biases in training data can perpetuate health disparities, emphasizing the need for rigorous validation and transparency in AI model development.77 Moreover, the reliance on high-quality data for training AI models poses challenges in resource-limited settings, where data availability and quality may be constrained.78 Collaborative efforts among researchers, clinicians, and policymakers are crucial to ensuring that AI technologies are accessible and beneficial to diverse populations.
Future directions
Future research should focus on addressing these challenges while expanding the scope of AI applications in psychiatry. Areas such as NLP and wearable technologies hold significant promise for advancing both diagnostic and therapeutic capabilities. For example, NLP can analyze patient narratives to detect subtle linguistic markers of mental health conditions, while wearable devices can provide continuous, real-time monitoring of physiological and behavioral data. Additionally, longitudinal studies are needed to evaluate the long-term impact of AI-driven interventions on mental health outcomes. While the current analysis provides strong evidence for AI's efficacy in controlled settings, real-world trials will be critical to understanding its practical utility and sustainability.
Strengths and limitations of this study
This study has several strengths, including its comprehensive scope, rigorous methodology, and focus on both diagnostic and therapeutic applications of AI in psychiatry. It offers valuable comparative insights into different AI methodologies, supported by statistically significant findings, and provides a holistic evaluation of AI's potential in mental health care. However, several limitations should be considered. First, while 14 studies met the inclusion criteria, the relatively small number of studies included in the meta-analysis may limit the statistical power and the ability to detect true heterogeneity among the studies. This could affect the generalizability of the findings, particularly when considering the potential variability in study designs, populations, and AI methodologies. The presence of moderate heterogeneity, as indicated by the I² statistic, suggests that the results may be influenced by factors such as study quality and data characteristics, which need to be addressed in future research.
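As a concrete illustration of the I² statistic referred to above, the sketch below computes Cochran's Q and I² from inverse-variance (fixed-effect) weights. The per-study effect sizes and standard errors are hypothetical numbers chosen for illustration only; they are not values from the included studies.

```python
def heterogeneity(effects, ses):
    """Fixed-effect pooled estimate, Cochran's Q, and I^2 (as a percentage)."""
    w = [1 / se ** 2 for se in ses]                      # inverse-variance weights
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - pooled) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    # I^2 = proportion of total variability in Q beyond chance, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical per-study effect sizes and standard errors
effects = [0.84, 0.80, 0.88, 0.78, 0.90]
ses = [0.04] * 5
pooled, q, i2 = heterogeneity(effects, ses)
```

With these illustrative inputs, Q exceeds its degrees of freedom and I² lands in the 30–50% band conventionally read as moderate heterogeneity, the same band reported for this meta-analysis; larger within-study standard errors or more homogeneous effects would drive I² toward zero.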
A notable limitation is the potential dependence of effect size on sample size, as suggested by Figure 3. It is possible that the effect sizes are driven more by sample size than by the specific AI techniques employed; future research should clarify whether larger samples contribute to more precise or inflated effect sizes, determine the extent to which methodological differences between techniques influence the outcomes, and control for sample size when assessing the efficacy of different AI methodologies.

Additionally, many of the included studies lacked adequate blinding, as evidenced by the risk of bias assessment in Figure 2. Insufficient blinding can introduce bias, particularly in the assessment of diagnostic accuracy and therapeutic efficacy, potentially skewing results and reducing the reliability of the conclusions drawn from these studies. This absence of blinding should be considered when interpreting the findings, and future research should prioritize improved blinding procedures to enhance the credibility of AI evaluation studies.

Further limitations include the underrepresentation of diverse populations in the included studies, which could limit the generalizability of AI applications across demographic groups. The reliance on short-term outcomes and secondary data further restricts the applicability of the findings to real-world settings, and ethical and practical considerations, such as data privacy and algorithmic bias, are not fully addressed in this review. Finally, the underrepresentation of emerging AI techniques and the lack of longitudinal evidence highlight important areas for future research and refinement.
Conclusion
The results of this systematic review and meta-analysis highlight the significant potential of AI in psychiatry, particularly in enhancing diagnostic accuracy and therapeutic efficacy. Our findings suggest that AI technologies, especially ML models, have made substantial progress in both diagnostic and therapeutic applications. Specifically, ML models demonstrated the highest diagnostic accuracy (85%) and therapeutic efficacy (85%) among the AI methodologies reviewed. However, while these results are compelling, further statistical testing, such as pairwise comparisons, is needed to confirm the differences between AI techniques, as the current analysis did not include such tests across the model-type subgroups. In addition to these promising outcomes, the study emphasizes the need to address the ethical, practical, and technical challenges involved in integrating AI into psychiatric care, including ensuring data privacy, mitigating algorithmic biases, and refining AI models for broader and more diverse populations. The transformative potential of AI in mental health care is clear, but to realize it, further research should focus on standardizing methodologies, validating models in real-world settings, and exploring new, innovative applications. This will ultimately foster a more personalized, efficient, and equitable approach to psychiatric care.
Acknowledgments
The authors are deeply grateful to the Miyan Research Institute, International University of Business Agriculture and Technology, Dhaka, Bangladesh. Additionally, this research was supported by the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia.
Ethical considerations
Our study did not require ethics board approval because it did not involve human or animal trials.
Author contributions/CRediT
Moustaq Karim Khan Rony contributed to writing‒review and editing, writing‒original draft, visualization, validation, supervision, software, resources, project administration, methodology, investigation, formal analysis, data curation, and conceptualization. Dipak Chandra Das contributed to writing‒review and editing, writing‒original draft, software, project administration, methodology, formal analysis, data curation, and conceptualization. Most. Tahmina Khatun contributed to writing‒original draft, validation, methodology, formal analysis, data curation, and conceptualization. Silvia Ferdousi contributed to writing‒review and editing, visualization, validation, investigation, and data curation. Mosammat Ruma Akter contributed to writing‒review and editing, project administration, methodology, formal analysis, and data curation. Mst. Amena Khatun contributed to writing‒review and editing, resources, project administration, formal analysis, and conceptualization. Most. Hasina Begum contributed to writing‒review and editing, visualization, investigation, formal analysis, and data curation. Md Ibrahim Khalil contributed to writing‒review and editing, methodology, formal analysis, and data curation. Mst. Rina Parvin contributed to writing‒review and editing, writing‒original draft, supervision, investigation, formal analysis, and conceptualization. Daifallah M. Alrazeeni contributed to writing‒review and editing, visualization, validation, supervision, investigation, and conceptualization. Fazila Akter contributed to writing‒review and editing, methodology, formal analysis, data curation, supervision, and conceptualization.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
