Abstract
Background
Artificial Intelligence (AI) has demonstrated significant potential in transforming psychiatric care by enhancing diagnostic accuracy and therapeutic interventions. Psychiatry faces challenges like overlapping symptoms, subjective diagnostic methods, and personalized treatment requirements. AI, with its advanced data-processing capabilities, offers innovative solutions to these complexities.
Aims
This study systematically reviewed and meta-analyzed the existing literature to evaluate AI's diagnostic accuracy and therapeutic efficacy in psychiatric care, focusing on various psychiatric disorders and AI technologies.
Methods
Adhering to PRISMA guidelines, the study included a comprehensive literature search across multiple databases. Empirical studies investigating AI applications in psychiatry, such as machine learning (ML), deep learning (DL), and hybrid models, were selected based on predefined inclusion criteria. The outcomes of interest were diagnostic accuracy and therapeutic efficacy. Statistical analysis employed fixed- and random-effects models, with subgroup and sensitivity analyses exploring the impact of AI methodologies and study designs.
Results
A total of 14 studies met the inclusion criteria, representing diverse AI applications in diagnosing and treating psychiatric disorders. The pooled diagnostic accuracy was 85% (95% CI: 80%–87%), with ML models achieving the highest accuracy, followed by hybrid and DL models. For therapeutic efficacy, the pooled effect size was 84% (95% CI: 82%–86%), with ML excelling in personalized treatment plans and symptom tracking. Moderate heterogeneity was observed, reflecting variability in study designs and populations. The risk of bias assessment indicated high methodological rigor in most studies, though challenges like algorithmic biases and data quality remain.
Conclusion
AI demonstrates robust diagnostic and therapeutic capabilities in psychiatry, offering a data-driven approach to personalized mental healthcare. Future research should address ethical concerns, standardize methodologies, and explore underrepresented populations to maximize AI's transformative potential in mental health.
Keywords
Highlights
AI achieved 85% pooled diagnostic accuracy, excelling in detecting complex psychiatric disorders.
Machine learning models demonstrated the highest diagnostic and therapeutic performance among AI methodologies.
Therapeutic efficacy of AI technologies reached 84%, enabling personalized treatment strategies in psychiatry.
Hybrid AI models effectively integrated diverse data sources for enhanced diagnostic and therapeutic outcomes.
Ethical challenges and methodological variability underline the need for standardized, inclusive AI applications in mental health care.
Introduction
Artificial Intelligence (AI) has emerged as a transformative force driving advancements across numerous fields, particularly healthcare.1 Its capacity to revolutionize diagnostics, treatment planning, and patient care has attracted substantial attention.2 Within psychiatry—a field dedicated to understanding the complexities of human emotions, cognition, and behavior—the integration of AI presents remarkable opportunities.3,4 This is especially critical given the inherent challenges in addressing mental health disorders,5 which often feature overlapping symptoms, subjective diagnostic methods, and personalized treatment requirements.6 AI provides a data-driven pathway to enhance diagnostic accuracy, improve therapeutic outcomes, and foster personalized mental healthcare.7,8
AI's strength lies in its ability to process and synthesize extensive, complex datasets, a capability highly relevant to psychiatry.9 Traditional diagnostic approaches typically depend on clinical assessments, interviews, and self-reports, which, while valuable, can lack consistency and fail to capture the intricate nature of mental health conditions.10,11 AI technologies, including machine learning (ML) and natural language processing, bring a fresh perspective to psychiatric care by integrating diverse data sources such as electronic health records (EHRs),12,13 neuroimaging scans, genetic information, and real-time behavioral data.14 By combining these inputs, AI can improve diagnostic precision, support early detection, and help clinicians develop tailored treatment strategies.15,16
Accurate diagnosis is fundamental to effective psychiatric care, but it remains a significant challenge due to overlapping symptoms across many mental health conditions.17 Disorders like schizophrenia, depression, bipolar disorder, and anxiety share common characteristics, complicating efforts to distinguish between them using traditional methods.18,19 AI offers potential solutions to these complexities.20 For example, ML models can analyze neuroimaging data to identify biomarkers unique to specific disorders, while natural language processing tools can evaluate speech patterns and text inputs to detect early mental health issues.21 These advanced capabilities provide clinicians with deeper insights, enabling more precise and timely interventions.22
Beyond diagnostics, AI is becoming an essential tool in therapeutic applications.23 Virtual therapists and AI-powered chatbots are increasingly popular for delivering psychological support, including cognitive-behavioral therapy and stress management techniques.24 These technologies offer scalable solutions, especially in underserved areas or among populations with limited access to trained professionals.25 Moreover, AI can help clinicians personalize treatment plans by analyzing patient histories, real-time information, and predictive patterns to recommend therapies most likely to succeed.26 This personalized approach is particularly crucial in psychiatry, where treatment effectiveness often varies significantly from person to person.27
Despite its promise, integrating AI into psychiatry comes with challenges that need to be carefully managed.28 Key ethical concerns include safeguarding patient privacy, ensuring data security, and addressing potential biases in AI systems.29 Additionally, rigorous validation and standardization of AI tools are essential to meet clinical and ethical benchmarks.30 As the field evolves, collaboration between psychiatrists, AI developers, and other stakeholders is vital in overcoming these challenges and maximizing AI's benefits in mental healthcare.31
The growing interest in AI's role in psychiatry reflects its transformative potential.32 However, the existing research exhibits considerable variation in methodology, scope, and quality, highlighting the necessity for systematic reviews and meta-analyses to synthesize findings, assess AI's effectiveness, and identify areas that require further investigation.33,34 By synthesizing the current evidence, a clearer picture can emerge of how AI can advance psychiatric care and improve outcomes for patients. This study systematically reviewed and meta-analyzed existing research to evaluate the diagnostic and therapeutic efficacy of AI in psychiatry.
Methodology
Conceptualization of the study
This study was structured as a systematic review and meta-analysis, employing a robust methodology to aggregate and analyze data from existing research on AI applications in psychiatry. The two primary goals were to assess the diagnostic precision of AI and evaluate its therapeutic effectiveness across various psychiatric disorders. The choice of this framework is driven by its unmatched capacity to thoroughly examine available data, enabling a quantitative synthesis that highlights both overarching patterns and specific results related to AI integration in psychiatric practice.
Guideline and registration
The research design adheres rigorously to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure detailed and transparent reporting (Figure 1).35 These guidelines provide the foundational structure for the review process, including the literature search, study selection, bias assessment, and data synthesis. Moreover, the study was registered with the Open Science Framework (OSF) (10.17605/OSF.IO/4PZXC), ensuring transparency and reproducibility of the research methodology.

The PRISMA flowchart.
Literature search protocol
The comprehensive literature search was conducted across four major databases: PubMed, Scopus, Web of Science, and PsycINFO, utilizing a broad range of keywords and MeSH terms such as “artificial intelligence,” “machine learning,” “deep learning (DL),” “hybrid models,” “neural networks,” and “psychiatry.” Boolean operators were applied to refine the search, ensuring coverage of both peer-reviewed and grey literature (Table 1). The search was unrestricted by language or publication status to capture all relevant research. Searches were conducted between November 1, 2024, and January 2, 2025: PubMed was searched from November 1 to December 15, 2024; Scopus from November 1 to December 10, 2024; Web of Science from November 1 to December 20, 2024; and PsycINFO from November 1 to December 30, 2024. This approach ensured that the most recent studies were included while also capturing foundational research and emerging AI methodologies in psychiatry.
Search strategy.
Inclusion and exclusion criteria
Studies selected for the review included randomized controlled trials, cohort studies, and other empirical research that investigated AI applications in psychiatric settings, involving participants diagnosed with psychiatric conditions across various demographics. The AI interventions analyzed encompassed technologies such as ML algorithms, DL models, hybrid models, and neural networks, with control groups not utilizing AI technologies serving as comparators. The primary outcomes assessed focused on diagnostic accuracy and therapeutic efficacy. Exclusion criteria were applied to non-empirical studies, research not directly related to AI applications in psychiatry, and studies compromised by incomplete data. The inclusion of DL and hybrid models highlighted their unique capabilities, such as handling unstructured data and integrating diverse data sources, further enriching the review's scope.
Study selection and data management
The selection process for the systematic review and meta-analysis began with meticulous screening of titles and abstracts. This initial phase was essential for identifying studies that potentially met the inclusion criteria focused on the application of AI in psychiatric settings. The title and abstract screening were performed by two independent reviewers, who worked together to ensure that only the most relevant studies advanced to the next stage of review. Any discrepancies were resolved through discussion or by consulting a third reviewer if necessary. Subsequently, a comprehensive full-text review was conducted for each shortlisted study. This in-depth review evaluated each study against predefined criteria, including relevance to the research questions, methodological rigor, and the specific AI technologies utilized. The full-text review was conducted by three authors to ensure accuracy and to reduce potential biases.
Data extraction and reliability testing
Once the studies were selected, the critical step of data extraction began in Phase 1. A standardized data extraction form was used to systematically gather key information from each study, including authorship, publication year, study design, sample size, types of AI technology utilized, primary outcomes observed, and key findings (Table 2). Two authors were responsible for the initial data extraction during this phase. In Phase 2, a pilot test of the data extraction form was conducted on a subset of studies to ensure the reliability of the process. This phase was essential for identifying and resolving any issues with the form, ensuring consistency in data collection across all included studies. Three authors participated in this phase, assessing the form's usability and consistency. In Phase 3, any discrepancies encountered during the data extraction process were addressed through collaborative discussions among the research team. Four authors participated in resolving these discrepancies to ensure accuracy. In the final Phase 4, consultations with two external experts were conducted to resolve complex issues, ensuring the accuracy and integrity of the data collected.
Study characteristics.
Quality and bias assessment
The quality and potential biases of the included studies were rigorously assessed using established tools such as the Cochrane Risk of Bias Tool (Figure 2)50 and the STROBE checklist51 (Table 3). Each study was meticulously evaluated and scored based on its methodological robustness and the presence of any biases that could influence the meta-analysis results. This quality assessment was pivotal, as it directly shaped the interpretative framework of the analysis, ensuring that the conclusions were both reliable and valid. By employing these rigorous methodologies, the review aimed to deliver a comprehensive and unbiased evaluation of the diagnostic and therapeutic effectiveness of AI technologies in psychiatry, providing significant insights into the field.

Quality assessment using Cochrane risk of bias tool.
Quality assessment using STROBE checklist.
Note. Other Information: Includes funding and other potential conflicts of interest disclosures.
Statistical analysis
The statistical analysis employed both fixed-effect and random-effects models, depending on the heterogeneity of the studies, assessed using the I² statistic and Q tests. The fixed-effect model was applied when heterogeneity was low (I² < 50%), while the random-effects model was used when heterogeneity was high (I² ≥ 50%). Effect size was primarily measured using area under the curve (AUC) for diagnostic accuracy, as it is a standard metric in diagnostic performance studies. However, when AUC was unavailable, standardized mean differences or risk ratios were used as alternative effect size measures to ensure comparability across studies. Statistical computations were performed using Python for data processing and Stata for meta-analysis calculations, as both are widely recognized for their versatility and ability to handle meta-analysis computations. This meticulous methodological approach ensured that the systematic review and meta-analysis produced robust, evidence-based conclusions about the effectiveness of AI technologies in diagnosing and treating psychiatric disorders. These findings enriched the academic and clinical understanding of the potential of AI in psychiatry, offering valuable insights into its integration into practice while highlighting opportunities for further research and innovation in psychiatric treatment approaches.
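The model-selection rule described above can be sketched in Python. The function below is a minimal illustration, not the authors' actual code: it uses inverse-variance (fixed-effect) pooling, Cochran's Q, the I² statistic, and the DerSimonian-Laird estimator of between-study variance for the random-effects model; the input effect sizes and variances are hypothetical.

```python
import numpy as np

def pool_effect_sizes(effects, variances):
    """Pool study effect sizes, choosing the model by heterogeneity.

    Computes the inverse-variance (fixed-effect) estimate, Cochran's Q,
    the I² statistic, and the DerSimonian-Laird between-study variance
    (tau²), then applies the rule from the text: random effects when
    I² >= 50%. Assumes at least two studies.
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    k = len(effects)

    # Fixed-effect pooling: weight each study by 1 / variance
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)

    # Cochran's Q and I² (share of variability beyond chance, in %)
    q = float(np.sum(w * (effects - fixed) ** 2))
    i2 = (max(0.0, (q - (k - 1)) / q) * 100) if q > 0 else 0.0

    # DerSimonian-Laird tau² and random-effects pooling
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = 1.0 / (variances + tau2)
    random_ = np.sum(w_re * effects) / np.sum(w_re)

    use_random = i2 >= 50
    pooled = random_ if use_random else fixed
    se = np.sqrt(1.0 / np.sum(w_re if use_random else w))
    return {"pooled": float(pooled),
            "ci": (float(pooled - 1.96 * se), float(pooled + 1.96 * se)),
            "Q": q, "I2": float(i2), "tau2": float(tau2)}
```

For example, `pool_effect_sizes([0.80, 0.85, 0.90], [0.01, 0.01, 0.01])` yields a pooled estimate of 0.85 with I² = 0, so the fixed-effect model applies; widely discrepant, highly precise studies drive I² above 50% and trigger the random-effects model.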
Results
Study selection
The systematic review identified 317 records from multiple databases using a comprehensive search strategy. After the removal of 52 duplicates and 15 ineligible records (10 due to insufficient data and 5 for other reasons), 250 studies were subjected to title and abstract screening. During this process, 200 studies were excluded based on irrelevance to the research question or non-empirical design. Subsequently, 50 full-text articles were assessed for retrieval, but 5 could not be accessed due to unavailable data or publication restrictions. Ultimately, 45 studies were assessed for eligibility, and 14 studies met the inclusion criteria.36–49 These studies were included in the meta-analysis, representing a robust dataset for evaluating the diagnostic and therapeutic efficacy of AI technologies in psychiatry (Figure 1).
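As a trivial consistency check (a sketch added here, not part of the original analysis), the PRISMA flow counts reported above reconcile arithmetically:

```python
# PRISMA flow counts as reported in the text
identified = 317
duplicates_removed = 52
ineligible_removed = 15              # 10 insufficient data + 5 other reasons
screened = identified - duplicates_removed - ineligible_removed
excluded_at_screening = 200          # irrelevant or non-empirical
sought_for_retrieval = screened - excluded_at_screening
not_retrieved = 5                    # unavailable data or publication restrictions
assessed_for_eligibility = sought_for_retrieval - not_retrieved
included = 14

print(screened, sought_for_retrieval, assessed_for_eligibility)  # 250 50 45
```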
Study characteristics
The 14 studies included in the analysis represented diverse methodologies, populations, and applications of AI in psychiatry (Table 2). Sample sizes ranged from 21 to 142,432 participants, with studies addressing psychiatric conditions such as depression, schizophrenia, and bipolar disorder. The AI techniques employed included ML algorithms (e.g., support vector machines, random forests), DL techniques (e.g., convolutional and recurrent neural networks), and hybrid models combining multiple AI approaches.36–49 Outcomes were classified into two primary categories: diagnostic accuracy and therapeutic efficacy, with most studies reporting data on both aspects. The studies were conducted across various clinical and geographical contexts, contributing to the generalizability of findings. Additionally, the timeline of the studies reflected an increasing trend in the adoption of advanced AI models over the years, particularly DL and hybrid approaches. This evolution aligns with technological advancements and growing interest in AI integration within psychiatry.
Quality assessment
Figure 2 highlights the distribution of low, unclear, and high risks across multiple domains, including random sequence generation, allocation concealment, and selective reporting. Most studies demonstrated low risk in random sequence generation, while high risk was predominantly observed in the domains of blinding and incomplete outcome data. The summary across 14 studies (Figure 6) indicates that 70% of the studies had low risk in key areas, supporting the reliability of their results.
Diagnostic accuracy
The forest plot of diagnostic accuracy (Figure 3) synthesizes effect sizes and 95% confidence intervals (CIs) across 14 studies that investigated the performance of AI models in diagnosing psychiatric disorders. The pooled effect size for diagnostic accuracy was 0.85 (95% CI: 0.80–0.87), indicating a high level of precision achieved by AI technologies. Individual studies demonstrated varying effect sizes, with ML models consistently achieving higher performance metrics. For instance, Morales et al. and Kalmady et al. reported effect sizes close to the upper limit of the pooled estimate, highlighting the effectiveness of ML approaches in extracting diagnostic insights from structured datasets such as clinical records and neuroimaging.37,48

Diagnostic accuracy—forest plot of effect sizes with 95% CIs.
Subgroup analysis revealed differences in diagnostic performance based on the type of AI model utilized. ML models achieved the highest pooled diagnostic accuracy (effect size = 0.85), followed by hybrid models (effect size = 0.84) and DL techniques (effect size = 0.82). These findings underscore the robust ability of ML algorithms to process structured psychiatric data and identify patterns indicative of psychiatric conditions. Hybrid models demonstrated comparable performance, excelling in integrating diverse data sources such as biomarkers and neuroimaging, while DL techniques, despite excelling with complex and unstructured data, showed slightly lower diagnostic accuracy in this analysis.
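A subgroup analysis of this kind can be organized as inverse-variance pooling within each model family. The sketch below is illustrative only; the study tuples are hypothetical values, not the reviewed studies:

```python
import numpy as np
from collections import defaultdict

def subgroup_pool(studies):
    """Inverse-variance pooled effect size per AI-model subgroup.

    `studies` is an iterable of (model_family, effect_size, variance)
    tuples; returns {model_family: pooled_effect}.
    """
    groups = defaultdict(list)
    for model, effect, variance in studies:
        groups[model].append((effect, variance))

    pooled = {}
    for model, rows in groups.items():
        effects = np.array([e for e, _ in rows])
        weights = 1.0 / np.array([v for _, v in rows])
        # Weighted mean: more precise studies (smaller variance) count for more
        pooled[model] = float(np.sum(weights * effects) / np.sum(weights))
    return pooled

# Hypothetical per-study effect sizes grouped by AI methodology
studies = [
    ("ML", 0.86, 0.01), ("ML", 0.84, 0.01),
    ("Hybrid", 0.84, 0.02),
    ("DL", 0.82, 0.02),
]
print(subgroup_pool(studies))
```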
Heterogeneity across the studies was moderate, as indicated by an I² value of 47% (Table 4). This suggests variability in study design, datasets, and populations, which could have influenced the diagnostic performance of the AI models. The Q-test results were statistically significant (p = 0.03), justifying the use of a random-effects model to account for this variability and provide a more robust pooled estimate.
Heterogeneity assessment I² statistic and Q-test.
Therapeutic efficacy
The forest plot of therapeutic efficacy (Figure 4) presents the synthesized effect sizes and 95% CIs for multiple studies evaluating the application of AI in psychiatric interventions. The pooled effect size was 0.84 (95% CI: 0.82–0.86), demonstrating that AI models have a robust impact on therapeutic outcomes. Individual studies varied in their results, with ML models consistently achieving higher efficacy. Notable studies, such as those by Yu et al. and Li, reported some of the highest effect sizes, highlighting the advanced capabilities of ML algorithms in personalizing treatment plans and predicting therapeutic outcomes.40,42

Therapeutic efficacy—forest plot of effect sizes with 95% CIs.
Subgroup analysis revealed distinct differences in performance among AI methodologies. ML models showed the highest pooled effect size of 0.85 (95% CI: 0.83–0.87), excelling at processing structured clinical data to optimize therapeutic interventions and achieving consistently high efficacy with minimal variability. Studies such as Fulmer et al. and Gomeni et al. demonstrated exceptional performance, positioning ML as the most effective methodology.38,46 Hybrid models, with a pooled effect size of 0.84 (95% CI: 0.81–0.86), combine the strengths of ML and DL. These models excel in integrating diverse data sources such as clinical records, neuroimaging, and biomarkers. While hybrid models performed slightly lower than ML, studies like Danieli et al. and Lacy et al. showcased their utility in therapeutic applications.41,47 DL models, with a pooled effect size of 0.82 (95% CI: 0.80–0.84), demonstrated solid performance, particularly with complex and unstructured data such as neuroimaging and genetic datasets. However, DL showed slightly lower efficacy and greater variability compared to ML and hybrid models. Studies such as Zhang et al. and Nemesure et al. highlight the potential of DL, though its overall effectiveness was lower in this meta-analysis.43,49
Moderate heterogeneity was observed across studies, as indicated by an I² statistic of 57%, reflecting variability in factors such as intervention designs, patient populations, and therapeutic outcomes. The Q-test results were statistically significant (p = 0.04), supporting the use of a random-effects model to account for this heterogeneity and provide robust pooled estimates. These findings underscore the potential of AI technologies in advancing therapeutic strategies in psychiatry by tailoring interventions, monitoring progress, and predicting outcomes. ML models demonstrated the highest efficacy, followed by hybrid models, while DL models showed slightly lower but still substantial performance. To further solidify AI's role in psychiatric care, standardization in study designs and reporting metrics is necessary to reduce variability and improve comparability across studies.
Publication bias
To evaluate publication bias, a funnel plot was generated (Figure 5), displaying the relationship between study precision (standard error) and effect sizes (log odds ratio). The plot includes individual study points, an overall effect line, and 95% confidence boundaries. While the plot appears generally symmetrical, there is some dispersion among smaller studies, suggesting a possible tendency toward selective reporting. Statistical tests, including Egger's regression and Begg's rank correlation, were conducted to further assess bias. The results showed p-values greater than .05, indicating no statistically significant asymmetry. This suggests that while minor bias may be present, it is unlikely to substantially affect the overall conclusions of the meta-analysis. The assessment reinforces the robustness of the findings but underscores the importance of cautious interpretation given the potential for selective publication.
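Egger's test works by regressing the standardized effect (effect / SE) on precision (1 / SE); an intercept significantly different from zero signals funnel-plot asymmetry. A minimal sketch of the standard procedure, assuming NumPy and SciPy are available (not the authors' code; the data below are hypothetical):

```python
import numpy as np
from scipy import stats

def eggers_test(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE);
    an intercept significantly different from zero suggests small-study
    effects such as publication bias. Returns (intercept, two-sided p).
    """
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    y = effects / se          # standardized effects
    x = 1.0 / se              # precision
    n = len(x)

    slope, intercept, _, _, _ = stats.linregress(x, y)

    # linregress reports a p-value for the slope, so derive the
    # intercept's standard error and t-test by hand.
    resid = y - (intercept + slope * x)
    s2 = np.sum(resid ** 2) / (n - 2)
    sxx = np.sum((x - np.mean(x)) ** 2)
    se_intercept = np.sqrt(s2 * (1.0 / n + np.mean(x) ** 2 / sxx))
    t = intercept / se_intercept
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return float(intercept), float(p)
```

A p-value above .05, as reported here, would indicate no statistically significant asymmetry.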

Funnel plot publication bias assessment.
Risk of bias summary
The aggregated risk of bias summary (Figure 6) highlighted that most studies adhered to rigorous methodological standards. However, four studies exhibited high risk in at least one domain, particularly in areas related to detection bias and handling incomplete outcome data. Despite these limitations, the majority of studies showed low risk in critical areas such as randomization (selection bias) and selective reporting (reporting bias), which bolsters the validity of the meta-analysis conclusions. This risk assessment ensures that the findings of the meta-analysis are grounded in high-quality evidence.

Risk of bias summary across 14 studies.
Comparative performance of AI techniques
A comparative evaluation of AI techniques (Table 5) revealed that ML models achieved the highest performance metrics, with diagnostic accuracy at 85% and therapeutic efficacy at 85%. This superior performance can be attributed to ML's strength in analyzing structured data, such as clinical records, neuroimaging, and patient demographics, which allows it to identify diagnostic patterns more effectively. ML models excel at processing large volumes of well-organized data, which enhances their ability to make precise predictions and optimize therapeutic interventions in psychiatric care.
Subgroup and sensitivity analyses by AI technology.
Hybrid models followed closely, demonstrating an 84% diagnostic accuracy and an 85% therapeutic efficacy. These models combine multiple AI approaches, such as ML with DL and natural language processing, enabling them to integrate diverse data sources, including unstructured data like clinical notes and neuroimaging. This flexibility allows hybrid models to capture a broader range of relevant information, improving their diagnostic and therapeutic effectiveness in varied clinical settings. Their performance indicates the value of combining different AI methodologies to address the complexity and diversity of psychiatric conditions.
DL models, while slightly less effective in diagnostic accuracy, still performed well, with diagnostic accuracy at 80% and therapeutic efficacy at 85%. DL's strength lies in its ability to handle large, complex, and unstructured datasets, such as neuroimaging, genetic data, and longitudinal health records. Despite performing slightly lower in diagnostic accuracy compared to ML and hybrid models, DL's ability to process unstructured data allows it to provide high-precision outcomes in therapeutic applications, particularly for symptom monitoring and relapse prediction.
These findings underscore the versatility and effectiveness of AI methodologies in psychiatric research and clinical applications. ML stands out for its superior diagnostic accuracy, reflecting its strength in structured data analysis, while DL excels in therapeutic efficacy, showcasing its capacity to manage and extract valuable insights from complex, unstructured datasets. The results highlight that while ML is most effective for diagnostic purposes, hybrid and DL models offer unique advantages for personalized treatment and therapeutic decision-making. A nuanced understanding of each model's strengths allows clinicians to select the most appropriate AI tools based on the nature of the data and the specific needs of the patient population.
Discussion
The findings of this systematic review and meta-analysis reveal the transformative potential of AI in psychiatry, particularly in its diagnostic accuracy and therapeutic efficacy. These results align with, and extend, a growing body of literature exploring the intersection of AI technologies and mental health care. Our findings underscore AI's capacity to improve diagnostic precision, enhance therapeutic outcomes, and personalize psychiatric care. This discussion contextualizes these results within the broader landscape of existing meta-analyses, emphasizing how this study uniquely contributes to the ongoing dialogue on AI's role in psychiatry.
Diagnostic accuracy
This study found a pooled effect size of 0.85 (95% CI: 0.80–0.87) for diagnostic accuracy, which underscores the significant advancements AI has made in identifying and classifying psychiatric disorders. The strength of ML models in processing structured datasets, such as clinical records and neuroimaging data, mirrors findings from prior meta-analyses, such as those by Zhong et al. and Li.52,53 Both studies reported high diagnostic accuracy for ML models, reinforcing our result that ML remains the most effective methodology for psychiatric diagnoses. However, unlike these prior analyses, our study highlights the broader impact of hybrid models, which combine ML with other approaches like DL and natural language processing.42,48
Hybrid models, which we found to have a pooled effect size of 0.84 (95% CI: 0.80–0.88), demonstrated comparable efficacy. This result extends beyond previous work such as Abd-Alrazaq et al. and He et al., who also identified the utility of hybrid models in psychiatric diagnoses, especially in the detection of schizophrenia and mood disorders.54,55 Our study's unique contribution lies in further emphasizing how the integration of diverse data sources—clinical, neuroimaging, and biomarker data—can enhance diagnostic accuracy, providing a more nuanced understanding of AI's diagnostic potential in psychiatry. Additionally, our analysis examined the performance of DL models, which showed a pooled effect size of 0.82 (95% CI: 0.79–0.85). While this result is consistent with findings from other studies, which found DL models perform exceptionally well with unstructured data such as neuroimaging and genetic data, our study adds a critical layer of insight by highlighting the relative underperformance of DL models compared to ML and hybrid models in diagnostic contexts.56,57 This difference underscores the need to further optimize DL methodologies for psychiatric applications.
Another key point highlighted by this study, which has been previously noted in the literature, is the variability in AI performance across different populations and settings. AI systems trained on Western populations often exhibit reduced accuracy when applied to more diverse groups, emphasizing the need for culturally sensitive and inclusive models.58 This consideration is relatively underexplored in prior meta-analyses, and our study contributes to this discussion by suggesting that the generalizability of AI models in psychiatry will require more inclusive datasets, representative of various demographic groups.
Therapeutic efficacy
AI's therapeutic applications in psychiatry showed robust performance, with a pooled effect size of 0.84 (95% CI: 0.82–0.86). This result echoes findings from studies such as Quaak et al. and Meinke, which also highlighted the efficacy of ML models in developing personalized treatment recommendations and symptom tracking systems for psychiatric conditions.59,60 Our study builds on this existing literature by showing that ML models excel in optimizing therapeutic outcomes by processing structured clinical data, offering actionable insights for treatment planning. In contrast to earlier reviews that have primarily focused on the performance of ML alone, our study introduces a comprehensive examination of hybrid models. These models, which combine ML with DL and natural language processing (NLP), demonstrated competitive performance, achieving effect sizes of up to 0.84 (95% CI: 0.81–0.86). This finding is consistent with Wang et al., who showed that hybrid models could enhance the precision of therapeutic insights derived from patient-reported outcomes and clinical notes.61 Our contribution lies in further emphasizing how hybrid models can effectively integrate diverse data sources, providing clinicians with more personalized therapeutic strategies.62
Our analysis also explored the contributions of DL algorithms in therapeutic contexts, particularly in symptom monitoring and relapse prediction. The study by Kaur et al. found that DL algorithms could accurately predict relapse risks in patients with bipolar disorder.63 While DL was highly effective in handling complex and unstructured datasets, such as neuroimaging and longitudinal health data, its overall performance was slightly lower (pooled effect size: 0.82, 95% CI: 0.80–0.84) compared to ML and hybrid models. These findings align with results from other meta-analyses, such as Villarreal-Zegarra et al., which found that DL methods are particularly dependent on data quality and sample size,64 but our study goes further by directly comparing DL with ML and hybrid models across therapeutic applications.
One of the more notable findings of this study is the consistency in therapeutic efficacy across various studies, which aligns with the results of Qiu et al., who reported pooled effect sizes for diagnostic accuracy ranging from 0.82 to 0.89.65 While this meta-analysis reinforces the reproducibility of AI-driven outcomes, our work also points out the limitations of ML in real-world applications.66 Specifically, Linardon et al. discussed how machine-learning models face challenges in scaling to real-world settings due to the high computational resources required.67 Our study builds on this by offering a more nuanced understanding of how ML can be optimized for broader clinical adoption.
Unique contributions of this study
This study makes a unique contribution by offering a comprehensive comparison of ML, DL, and hybrid models across both diagnostic and therapeutic applications. Previous meta-analyses, such as those by He and Abd-Alrazaq et al., have primarily focused on either diagnostic or therapeutic applications in isolation.55,56 Our work advances the field by presenting a comparative analysis of AI's diagnostic and therapeutic efficacy, thereby offering a holistic view of its potential in psychiatry. Additionally, by incorporating hybrid models and highlighting their potential in both diagnostic and therapeutic contexts, this study provides a new avenue for future research, encouraging the integration of various AI methodologies to improve clinical outcomes. The focus on the generalizability and inclusivity of AI models further sets this research apart, addressing a critical gap in the literature regarding the application of AI across diverse populations.68
However, concerns around data privacy and security are significant in the implementation of AI in psychiatry. As highlighted by Linardon et al., physicians often express hesitancy in adopting AI tools due to fears of breaching patient confidentiality, especially when handling sensitive mental health data.69 These concerns are particularly pressing in psychiatry, where patient data are inherently personal and vulnerable. Ensuring robust data protection measures and transparency in AI processes is essential to alleviate these concerns and build trust among clinicians and patients alike. Moreover, the integration of AI in psychiatric care faces unique challenges due to the subjectivity of psychiatric diagnoses and the complexity of individual treatment plans. Physicians are concerned about relying on AI systems that may not fully capture the nuanced understanding of patient needs, especially when clinical judgment is central to diagnosis and treatment.70 Therefore, future research should also focus on addressing these concerns by developing AI systems that complement, rather than replace, the expertise of mental health professionals, ensuring that AI tools support personalized, patient-centered care.
Validation of AI models in psychiatric applications
While the studies included in this review demonstrate promising results in terms of diagnostic and therapeutic efficacy, it is critical to highlight the validation methods employed in assessing the performance of AI models, particularly for DL, ML, and hybrid approaches. Effective validation is essential for ensuring the reliability and generalizability of AI models in diverse clinical settings. Several studies utilized cross-validation and external validation to assess the robustness of the AI models, while others compared the performance of AI models against established clinical benchmarks.59 However, the methodological variability in validation approaches across studies limits the ability to draw definitive conclusions about the broader applicability of these models.55 Future research should prioritize standardized validation methods, including multi-center trials and external validation using diverse patient populations, to confirm the generalizability of these AI-driven techniques in psychiatry.
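To make the cross-validation step concrete, the sketch below implements plain k-fold cross-validation from first principles. The toy labels and the majority-class baseline are hypothetical stand-ins for the real psychiatric datasets and classifiers evaluated in the included studies; any production pipeline would use an established library implementation instead.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, k, fit, predict):
    """Train on k-1 folds, score on the held-out fold; return per-fold accuracy."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(hits / len(test_idx))
    return scores

# Hypothetical baseline: always predict the most common training label.
fit = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model

y = [0] * 60 + [1] * 40      # toy binary labels (60/40 split)
X = [[v] for v in y]         # features are unused by this trivial baseline
scores = cross_validate(X, y, k=5, fit=fit, predict=predict)
mean_accuracy = sum(scores) / len(scores)
```

Because every sample is scored exactly once on a fold its model never saw, the mean fold accuracy is a less optimistic estimate of performance than training-set accuracy, which is why several of the included studies report it.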
Practical implications
The implications of these findings are far-reaching. AI's ability to improve diagnostic accuracy has the potential to address longstanding challenges in psychiatry, such as misdiagnosis and delayed treatment initiation. By identifying patterns in complex datasets, AI systems can support clinicians in making more informed and timely decisions, ultimately improving patient outcomes.71 For instance, early detection of conditions like schizophrenia or bipolar disorder could enable earlier interventions, mitigating the progression of these disorders and reducing the associated societal and economic burdens.72 Therapeutically, AI-driven tools can complement traditional approaches by offering personalized, data-driven insights.73 For example, symptom monitoring applications can provide real-time feedback to both patients and clinicians, facilitating more adaptive and responsive care.74 Additionally, AI's capacity to predict treatment responses and relapse risks can enable a shift toward preventive psychiatry, focusing on maintaining mental health rather than merely addressing crises.75
Ethical and practical considerations
Despite its promise, the integration of AI into psychiatry raises several ethical and practical concerns. Issues such as data privacy, algorithmic bias, and the interpretability of AI models should be addressed to ensure equitable and ethical use.69,76 Research by Lee et al. highlighted how biases in training data can perpetuate health disparities, emphasizing the need for rigorous validation and transparency in AI model development.77 Moreover, the reliance on high-quality data for training AI models poses challenges in resource-limited settings, where data availability and quality may be constrained.78 Collaborative efforts among researchers, clinicians, and policymakers are crucial to ensuring that AI technologies are accessible and beneficial to diverse populations.
Future directions
Future research should focus on addressing these challenges while expanding the scope of AI applications in psychiatry. Areas such as NLP and wearable technologies hold significant promise for advancing both diagnostic and therapeutic capabilities. For example, NLP can analyze patient narratives to detect subtle linguistic markers of mental health conditions, while wearable devices can provide continuous, real-time monitoring of physiological and behavioral data. Additionally, longitudinal studies are needed to evaluate the long-term impact of AI-driven interventions on mental health outcomes. While the current analysis provides strong evidence for AI's efficacy in controlled settings, real-world trials will be critical to understanding its practical utility and sustainability.
Strengths and limitations of this study
This study has several strengths, including its comprehensive scope, rigorous methodology, and focus on both diagnostic and therapeutic applications of AI in psychiatry. It offers valuable comparative insights into different AI methodologies, supported by statistically significant findings, and provides a holistic evaluation of AI's potential in mental health care. However, several limitations should be considered. First, while 14 studies met the inclusion criteria, the relatively small number of studies included in the meta-analysis may limit the statistical power and the ability to detect true heterogeneity among the studies. This could affect the generalizability of the findings, particularly when considering the potential variability in study designs, populations, and AI methodologies. The presence of moderate heterogeneity, as indicated by the I² statistic, suggests that the results may be influenced by factors such as study quality and data characteristics, which need to be addressed in future research.
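As a concrete illustration of the I² statistic referred to above, the sketch below computes Cochran's Q and I² from inverse-variance (fixed-effect) weights. The per-study effect sizes and standard errors are hypothetical numbers chosen for illustration only; they are not values from the included studies.

```python
def heterogeneity(effects, ses):
    """Fixed-effect pooled estimate, Cochran's Q, and I^2 (as a percentage)."""
    w = [1 / se ** 2 for se in ses]                      # inverse-variance weights
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - pooled) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    # I^2 = proportion of total variability in Q beyond chance, floored at 0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical per-study effect sizes and standard errors
effects = [0.84, 0.80, 0.88, 0.78, 0.90]
ses = [0.04] * 5
pooled, q, i2 = heterogeneity(effects, ses)
```

With these illustrative inputs, Q exceeds its degrees of freedom and I² lands in the 30–50% band conventionally read as moderate heterogeneity, the same band reported for this meta-analysis; larger within-study standard errors or more homogeneous effects would drive I² toward zero.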
A notable limitation is the potential dependence of effect size on sample size, as suggested by Figure 3. It is possible that the effect sizes are driven more by sample size than by the specific AI techniques employed; future research should clarify whether larger samples contribute to more precise or inflated effect sizes, determine the extent to which methodological differences between techniques influence the outcomes, and control for sample size when assessing the efficacy of different AI methodologies.

Additionally, many of the included studies lacked adequate blinding, as evidenced by the risk of bias assessment in Figure 2. Insufficient blinding can introduce bias, particularly in the assessment of diagnostic accuracy and therapeutic efficacy, potentially skewing results and reducing the reliability of the conclusions drawn from these studies. This absence of blinding should be considered when interpreting the findings, and future research should prioritize improved blinding procedures to enhance the credibility of AI evaluation studies.

Further limitations include the underrepresentation of diverse populations in the included studies, which could limit the generalizability of AI applications across demographic groups. The reliance on short-term outcomes and secondary data further restricts the applicability of the findings to real-world settings, and ethical and practical considerations, such as data privacy and algorithmic bias, are not fully addressed in this review. Finally, the underrepresentation of emerging AI techniques and the lack of longitudinal evidence highlight important areas for future research and refinement.
Conclusion
The results of this systematic review and meta-analysis highlight the significant potential of AI in psychiatry, particularly in enhancing diagnostic accuracy and therapeutic efficacy. Our findings suggest that AI technologies, especially ML models, have made substantial progress in both diagnostic and therapeutic applications. Specifically, ML models demonstrated the highest diagnostic accuracy (85%) and therapeutic efficacy (85%) among the AI methodologies reviewed. However, while these results are compelling, further statistical testing, such as pairwise comparisons, is needed to confirm the differences between AI techniques, as the current analysis did not include such tests across the model-type subgroups. In addition to these promising outcomes, the study emphasizes the need to address the ethical, practical, and technical challenges involved in integrating AI into psychiatric care, including ensuring data privacy, mitigating algorithmic biases, and refining AI models for broader and more diverse populations. The transformative potential of AI in mental health care is clear, but to realize it, further research should focus on standardizing methodologies, validating models in real-world settings, and exploring new, innovative applications. This will ultimately foster a more personalized, efficient, and equitable approach to psychiatric care.
Acknowledgments
The authors are deeply grateful to the Miyan Research Institute, International University of Business Agriculture and Technology, Dhaka, Bangladesh. Additionally, this research was supported by the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia.
Ethical considerations
Our study did not require ethics board approval because it did not involve human or animal trials.
Author contributions/CRediT
Moustaq Karim Khan Rony contributed to writing‒review and editing, writing‒original draft, visualization, validation, supervision, software, resources, project administration, methodology, investigation, formal analysis, data curation, and conceptualization. Dipak Chandra Das contributed to writing‒review and editing, writing‒original draft, software, project administration, methodology, formal analysis, data curation, and conceptualization. Most. Tahmina Khatun contributed to writing‒original draft, validation, methodology, formal analysis, data curation, and conceptualization. Silvia Ferdousi contributed to writing‒review and editing, visualization, validation, investigation, and data curation. Mosammat Ruma Akter contributed to writing‒review and editing, project administration, methodology, formal analysis, and data curation. Mst. Amena Khatun contributed to writing‒review and editing, resources, project administration, formal analysis, and conceptualization. Most. Hasina Begum contributed to writing‒review and editing, visualization, investigation, formal analysis, and data curation. Md Ibrahim Khalil contributed to writing‒review and editing, methodology, formal analysis, and data curation. Mst. Rina Parvin contributed to writing‒review and editing, writing‒original draft, supervision, investigation, formal analysis, and conceptualization. Daifallah M. Alrazeeni contributed to writing‒review and editing, visualization, validation, supervision, investigation, and conceptualization. Fazila Akter contributed to writing‒review and editing, methodology, formal analysis, data curation, supervision, and conceptualization.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
