Abstract

Objective

Recent advances in generative artificial intelligence (AI) and dimensional approaches in psychiatry offer scalable scoring of psychopathology, yet biological validation remains challenging. This study aimed to compare AI and human performances in scoring dimensional psychopathology and its relationship with inflammatory brain markers in psychotic disorders.

Methods

In a cross-sectional, real-world, prospective study, we generated research domain criteria (RDoC) profiles using a large-language model and human ratings from admission notes of 127 consecutively selected patients with psychotic disorders. Magnetic resonance imaging (MRI) diffusion-based restricted fraction (RF) values were extracted from the amygdala, hippocampus, and neocortex as a proxy of inflammation. We assessed the agreement between AI- and human-derived scores and their predictive value for regional RF.

Results

AI and human RDoC ratings showed moderate-to-high agreement (intraclass correlation coefficients: 0.65–0.81). AI-derived, but not human-derived, negative and positive valence RDoC scores predicted amygdala and neocortical inflammation, while social and regulatory/arousal scores predicted hippocampal RF. A significant association was found between neocortical RF and regulatory/arousal scores in the AI assessment. Both AI- and human-derived cognitive scores predicted cortical RF. When the regression analyses were corrected for multiple comparisons, only the AI-derived associations remained significant: the amygdala for negative valence and the cortex for regulatory/arousal scores.

Conclusions

These results suggest a significant correspondence between AI and human RDoC ratings. AI-based dimensional phenotyping may reflect underlying neuroinflammatory processes, offering a biologically anchored tool for precision psychiatry.

Keywords

RDoC artificial intelligence magnetic resonance imaging inflammation psychotic disorders

Introduction

The past decade has seen a paradigm shift in the description and classification of mental disorders from syndromic labels toward dimensional constructs that may map more directly onto brain and behavior.¹ The National Institute of Mental Health's research domain criteria (RDoC) framework reflects this trend, organizing psychopathology into six domains that transcend categorical diagnoses and align with identifiable neuronal circuits and molecules: negative and positive affective valence, cognitive, social, arousal/regulatory, and sensorimotor systems.^2–5 Despite its conceptual strength, using RDoC in routine clinical and translational research is challenging because reliable and scalable scoring remains elusive. Traditional manual ratings are time-consuming and subjective. However, natural-language pipelines that counted RDoC-related tokens in electronic health records showed promising predictive validity for clinical outcomes, such as length of hospital stay and readmission, and corresponded with the genetic correlates of psychopathological dimensions.^6–9

Generative artificial intelligence (AI) with large language models (LLMs), trained on billions of biomedical and general language tokens, offers new possibilities.¹⁰ Using GPT-4, McCoy and Perlis (2024) demonstrated that an LLM can infer all six RDoC domains directly from the admission and discharge notes of psychiatric inpatients: the scores converged with earlier token methods and predicted service utilization.^11,12 However, a critical question remains unanswered: can AI-derived psychopathological dimensions accurately track underlying neurobiology? A direct comparison between AI and humans in the context of neuroscientific measures is also lacking.

MRI has begun to quantify the putative markers of subtle neuroinflammation, which is one of the most widespread transdiagnostic pathophysiological mechanisms in mental disorders.^13–16 Elevated diffusion-based restricted fraction (RF) reflects microglial activation and increased cellularity, whereas other MRI diffusion parameters are markers of altered myelin structure and increased free water, linking inflammation to the neural substrates of emotion regulation and learning in limbic and cortical structures.^17–20 Specifically, RF quantifies the proportion of water molecules whose movement is highly restricted within tissue. Water gets “trapped” when the microenvironment becomes crowded, structured, or swollen. Thus, RF reflects biological processes that change cellular density or intracellular volume.²⁰

Recent work has shown that patients with major depressive disorder display higher RF in the amygdala, hippocampus, and neocortex.²¹ Likewise, algorithm-guided modular psychotherapy that targets RDoC constructs reduced both anxiety and amygdala RF in generalized anxiety disorder, indicating that dimensional interventions can ameliorate inflammatory brain signatures.²² Recently, a widespread elevation of RF and extra-axonal water was found in the white matter of patients with schizophrenia, indicating diffuse neuroinflammation and vasogenic oedema, whereas no similar alterations were found in bipolar disorder.²³

Building on these advances in AI and neuroimaging, the present study aims to integrate RDoC phenotyping with biomarkers (an MRI proxy of neuroinflammation) within the same cohort. Specifically, we generated RDoC dimensional profiles for our participants using AI and ratings from trained human experts who were blinded to the AI output. We also extracted MRI RF values from the amygdala, hippocampus, and neocortex, regions implicated in affect regulation, contextual memory, and the higher-order integration of emotional experience and cognitive control.²⁴ Finally, we tested brain–behavior correspondence by correlating AI-derived and human-derived RDoC scores with RF values in each brain region. We evaluated whether the RDoC dimensions inferred by AI and human raters from medical records capture similar neuroinflammatory variance.

This multimodal validation addresses several gaps in the literature. First, it evaluates whether AI can serve as a trustworthy “digital rater” whose scores correlate with biological measures. Second, it leverages the continuous nature of RDoC and RF metrics to move beyond categorical case–control differences and to delineate brain–behavior mapping in a naturalistic clinical sample (i.e. the relationship between RDoC and RF that transcends diagnostic categories). We hypothesized that neuroinflammatory processes in the amygdala and hippocampus are related to negative valence, arousal dysregulation, and social threat processing. By adding the neocortex, we investigated whether cortical inflammation is related to cognitive dysfunctions captured by AI and clinicians.

Methods

Participants

In a cross-sectional, real-world, prospective study, we consecutively enrolled 127 patients with psychotic disorders (schizophrenia: n = 84; schizoaffective disorder: n = 23; bipolar disorder with psychotic features: n = 20) from five Hungarian psychiatric hospitals at the University of Szeged and National Psychiatric Center, Budapest, Hungary. The study was conducted between April 2020 and February 2025 (recruitment and collection of medical data: April 2020 to December 2023, data analysis: January 2024 to February 2025). We obtained the sociodemographic and medical characteristics (age, sex, education, and a detailed medical history) as well as the narrative admission notes from each patient. All patients received antipsychotic medications and exhibited a stabilized clinical state. We used the Brief Symptom Inventory 18 (BSI-18) to measure overall psychological distress, including anxiety, depression, somatization, and paranoia (Global Severity Index [GSI] of BSI-18, range of scores: 0–72 from 18 items; permission was granted from copyright holders)²⁵ (Table 1). Data on potential confounding factors related to inflammation, including nicotine, caffeine, and alcohol intake, contraception use, body mass index, and chronic diseases (e.g. cardiovascular and metabolic diseases) were also collected using a structured questionnaire.²⁶ The reporting of this study conforms to STROBE guidelines.²⁷

Table 1.

Demographic and clinical characteristics of the participants with psychotic disorders (N = 127).

	Mean (SD)
Age (years)	41.4 (9.5)
Sex (male/female)	75/52
Education (years)	11.7 (3.1)
Duration of illness	15.1 (6.4)
Body mass index	26.5 (7.8)
Global severity index	35.6 (10.7)
Chlorpromazine-equivalent antipsychotic dose (mg/day)	357.9 (179.4)

RDoC scoring

The AI-derived scores on the RDoC domains were generated using a private cloud-based instance (GPT o3 – IBM Watsonx/Granite link, OpenAI-compatible proxy in front of Watsonx, featuring zero data retention). The model was re-initialized for each clinical note, and details that enabled personal identification were removed. Following the method of McCoy and Perlis (2024),¹¹ the model was prompted as follows: “You are a skilled psychiatrist scoring an emergency room clinical note in terms of how the patient symptoms over the past 24 h reflect the 6 NIMH RDoC: Negative Valence Systems, Positive Valence Systems, Cognitive Systems, Social Processes, Arousal and Regulatory Systems, and Sensorimotor Systems. Remember that substance use can be reflected as Positive Valence symptom. Notes are scored on a 0–10 scale to capture the magnitude of documented symptoms relevant in a given domain. Score 0 if no symptoms are present and functioning is normal, 1–3 mild symptoms, 4–6 moderate, 7–9 severe, 10 extremely severe.” The AI prompt was not modified, piloted, or calibrated before use. We used the prompts and rating procedures exactly as described by McCoy and Perlis (2024)¹¹ to enhance comparability across studies.
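The severity anchors stated in the prompt can be written down as a small lookup, which is a convenient way to sanity-check model output before analysis. The sketch below is illustrative only: the function names, the requirement that scores are integers, and the label "none" for a score of 0 are assumptions, not details reported in the study.

```python
# Illustrative sketch; names and the integer-only assumption are hypothetical.
RDOC_DOMAINS = (
    "Negative Valence Systems",
    "Positive Valence Systems",
    "Cognitive Systems",
    "Social Processes",
    "Arousal and Regulatory Systems",
    "Sensorimotor Systems",
)


def severity_band(score: int) -> str:
    """Map a 0-10 domain score to the severity anchor named in the prompt."""
    if not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("score must be an integer from 0 to 10")
    if score == 0:
        return "none"  # prompt: no symptoms present, functioning normal
    if score <= 3:
        return "mild"
    if score <= 6:
        return "moderate"
    if score <= 9:
        return "severe"
    return "extremely severe"


def label_profile(profile: dict) -> dict:
    """Check that all six domains are scored and attach a severity label to each."""
    missing = [d for d in RDOC_DOMAINS if d not in profile]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    return {d: severity_band(profile[d]) for d in RDOC_DOMAINS}
```

A profile with a missing domain or an out-of-range score raises an error rather than being silently passed to the statistical models.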

The human-derived RDoC scores on a 0–10 scale were based on the consensus of two formally trained clinicians with extensive expertise in RDoC.^22,28 The raters were instructed to consider the symptoms present at the time of admission as described in the medical record. The clinical scheme for RDoC assessment, inter-rater reliability, and test-retest reliability are presented in the Supplementary Material.

Data requirements and preprocessing of admission notes

We used the following inclusion criteria for medical notes: clinician-authored admission notes, with a minimum length of 500 words; covering all standard clinical sections (presenting problem, history of present illness, course, mental status examination, general clinical impression, ICD-10 diagnosis, treatment plan, and disposition); machine-readable text. The preprocessing and normalization of the text included the following steps: removal of headers, footers, boilerplate, and duplicated sentences; conversion to UTF-8; whitespace normalization; expansion of standard clinical abbreviations; sentence segmentation; section detection with rule-based headers (e.g. history of present illness, mental state examination, and treatment plan) to quantify coverage and to ensure that the LLM prompt is applied to the same scope of text across notes. The AI and human raters used the same information.
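Some of the preprocessing steps listed above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the header patterns, function names, and the naive punctuation-based sentence splitter are hypothetical, and abbreviation expansion and boilerplate removal are omitted because they depend on site-specific dictionaries.

```python
import re
import unicodedata

# Hypothetical header patterns; real notes would need site-specific rules.
SECTION_PATTERNS = {
    "history of present illness": re.compile(
        r"^\s*history of present illness\b", re.I | re.M),
    "mental status examination": re.compile(
        r"^\s*mental stat(?:e|us) examination\b", re.I | re.M),
    "treatment plan": re.compile(r"^\s*treatment plan\b", re.I | re.M),
}


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs, trim each line, and drop empty lines."""
    text = unicodedata.normalize("NFC", text)
    lines = [re.sub(r"[ \t\u00a0]+", " ", line).strip()
             for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def split_sentences(text: str) -> list:
    """Naive segmentation on sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def drop_duplicate_sentences(sentences: list) -> list:
    """Keep the first occurrence of each sentence, ignoring case."""
    seen, kept = set(), []
    for sentence in sentences:
        key = sentence.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


def section_coverage(text: str) -> dict:
    """Report which rule-based section headers are present in a note."""
    return {name: bool(p.search(text)) for name, p in SECTION_PATTERNS.items()}


def has_minimum_length(text: str, min_words: int = 500) -> bool:
    """Apply the 500-word inclusion criterion using whitespace tokens."""
    return len(text.split()) >= min_words
```

Section coverage is computed on the line-structured text, before sentence segmentation, so that header rules anchored at the start of a line still apply.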

MRI

Diffusion-weighted imaging (DWI) and T1-weighted structural scans were acquired using the protocol of the United Kingdom (UK) Biobank within 1 week after admission.^17,22,29–31 FreeSurfer v7.4.1 was used for image processing.³² The technical equipment and parameters were as follows: Philips Achieva 3 T scanner, MPRAGE (magnetization-prepared rapid acquisition gradient echo), 3D sagittal acquisition, FOV (square field of view) = 5256 mm, 1 × 1 × 1 mm³, TI = 5900 ms, TE (shortest) = 3.16, flip angle: 9 degrees, no fat suppression, full k space, no averages, acquisition time: 6 min and 50 s, acceleration factor: 2. For DWI, we used a multi-shell approach (b1 = 1000 s/mm², b2 = 2000 s/mm², 2 × 2 × 2 mm³, 50 diffusion encoding directions for each shell). We applied eddy current and head motion corrections, outlier slice correction, and gradient distortion correction during DWI preprocessing.^17,29 Putative neuroinflammatory changes were quantified by RF from DWI data (diffusion-basis spectral imaging-based restricted fraction, DBSI-RF).^18,19 We investigated the hippocampus, amygdala, and neocortex by measuring DBSI-RF in FreeSurfer regions of interest (ROIs).^17,33,34 The left and right hemisphere values were averaged because they showed high correlations (left-right correlations: rs > .8).¹⁷

Data analysis

We used JASP and R for data analysis.³⁵ Following descriptive statistics and data characteristics assessment (Kolmogorov-Smirnov and Levene's tests), we calculated two-way random effects intraclass correlation coefficients (ICC(2,1)) and Spearman's correlation coefficients (R) between AI- and human-derived RDoC scores to test actual score match (total observed variance) and monotonic associations (rank-order) between AI and human assessors, respectively. The predictive value of RDoC scores on RF values in the amygdala, hippocampus, and neocortex was assessed using multiple regression analysis, which included potential confounders (age, sex, education, BMI, and GSI scores). To control for Type I error due to multiple comparisons, we used Benjamini–Hochberg false discovery rate (FDR) corrections at q = 0.05 for 18 domain-by-brain region tests (6 RDoC domains × 3 brain regions).
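For readers unfamiliar with ICC(2,1), the coefficient follows from the mean squares of a two-way layout with n rated notes (rows) and k raters (columns): ICC(2,1) = (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), where MSR, MSC, and MSE are the between-subject, between-rater, and residual mean squares. The sketch below computes it in plain Python; it illustrates the standard definition and is not the JASP or R code used in the study.

```python
def icc_2_1(ratings):
    """Two-way random effects, absolute agreement, single-rater ICC.

    `ratings` is a list of rows, one per rated subject, each holding
    one score per rater (here k = 2: AI and human consensus).
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subject mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the between-rater mean square enters the denominator, a systematic offset between the AI and human scores lowers ICC(2,1) even when the rank order is preserved, which is why the rank-based Spearman coefficient is reported alongside it.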

The diagnostic groups and RDoC scores were compared with one-way analyses of variance (ANOVAs). The level of statistical significance was set at α = 0.05 for non-effect-size and non-Bayesian comparisons. For Bayesian analysis, we computed Vovk–Sellke maximum p-ratios (VS-MPRs), which transform p-values into the largest Bayes factor favoring the alternative hypothesis by maximizing the likelihood ratio over all simple Beta(α,1) priors. Jeffreys’ scale defines the VS-MPR evidence levels against the null hypothesis: 1–3: anecdotal; > 3–10: substantial; > 10–30: strong; > 30–100: very strong; > 100: decisive.
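The VS-MPR has a closed form: for p < 1/e it equals 1 / (−e · p · ln p), and it is 1 otherwise. The sketch below implements this formula together with the evidence labels listed above; the function names are illustrative.

```python
import math


def vs_mpr(p: float) -> float:
    """Vovk-Sellke maximum p-ratio: upper bound on the odds for H1 over H0."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1 / math.e:
        return 1.0  # the bound is uninformative for large p-values
    return 1.0 / (-math.e * p * math.log(p))


def jeffreys_label(ratio: float) -> str:
    """Evidence category for a VS-MPR value, following the scale in the text."""
    if ratio > 100:
        return "decisive"
    if ratio > 30:
        return "very strong"
    if ratio > 10:
        return "strong"
    if ratio > 3:
        return "substantial"
    return "anecdotal"
```

For example, p = 0.05 corresponds to a VS-MPR of roughly 2.46, which falls in the anecdotal range even though it meets the conventional significance threshold.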

We conducted an analytic power analysis using a partial correlation approach. Sample sizes for 80% power were obtained and evaluated under Benjamini–Hochberg FDR across the 18 domain-by-region tests. Calculations were implemented in R and interpreted in the context of the observed AI–human reliability.
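One common analytic route for a partial correlation is the Fisher z approximation, n ≈ ((z₁₋α/₂ + z_power) / atanh(r))² + 3 + c, where c is the number of covariates partialled out. The sketch below implements that approximation in Python. The exact procedure, effect sizes, and thresholds used in the study are not described here, so the function, its defaults, and the example effect size are assumptions for illustration and are not intended to reproduce the sample sizes reported in the Results.

```python
import math
from statistics import NormalDist


def n_for_partial_correlation(r, alpha=0.05, power=0.80, n_covariates=0):
    """Approximate sample size to detect a partial correlation r (two-sided)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3 + n_covariates
    return math.ceil(n)


# Hypothetical effect size; five covariates as in the regression models.
n_unadjusted = n_for_partial_correlation(0.3, n_covariates=5)
# A conservative stand-in for multiplicity: the strictest BH threshold q/m.
n_strict = n_for_partial_correlation(0.3, alpha=0.05 / 18, n_covariates=5)
```

Using q/m as the per-test threshold is a worst-case simplification: the Benjamini–Hochberg procedure is less strict whenever several hypotheses are rejected together, so the true requirement lies between the two values.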

Results

Comparisons of the RDoC scores of AI- and human raters

First, we calculated ICCs to assess how closely the AI and human raters’ scores matched (absolute agreement). We also calculated Spearman's correlation coefficients to compare the rank ordering (monotonic association) of the AI and human raters’ RDoC scores. There were no significant differences between the AI and human raters’ RDoC scores (Table 2). The ICCs ranged between 0.65 (positive valence) and 0.81 (negative valence) (Table 3). According to the Koo and Li (2016) classification system,³⁶ the ICCs were good (adequate for research comparisons) for negative valence, cognitive systems, and sensorimotor systems, and moderate/acceptable (some measurement error, adequate for exploratory work and early development) for positive valence, social processes, and arousal/regulatory systems.
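The Koo and Li bands (below 0.50 poor, 0.50–0.75 moderate, 0.75–0.90 good, above 0.90 excellent) can be applied to the ICC point estimates in Table 3 as follows. Treating a value of exactly 0.75 as good is an assumption made here for the boundary case; Koo and Li also recommend judging reliability from the confidence interval rather than the point estimate alone.

```python
# ICC(2,1) point estimates copied from Table 3.
ICC_TABLE_3 = {
    "Negative valence": 0.81,
    "Positive valence": 0.65,
    "Cognitive systems": 0.77,
    "Social processes": 0.71,
    "Sensorimotor systems": 0.75,
    "Arousal/regulatory systems": 0.66,
}


def koo_li_band(icc: float) -> str:
    """Reliability band for an ICC value (boundary handling is an assumption)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


bands = {domain: koo_li_band(value) for domain, value in ICC_TABLE_3.items()}
```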

Table 2.

RDoC domain scores provided by the AI and human raters.

		95% confidence interval
RDoC domains	Mean	Upper	Lower	SD
AI rater
Negative valence	5.9	6.4	5.4	2.8
Positive valence	6.7	7.3	6.2	3.0
Cognitive systems	6.3	6.9	5.7	3.3
Social processes	6.2	6.8	5.7	3.3
Sensorimotor systems	4.8	5.3	4.3	2.8
Arousal/regulatory systems	5.6	6.2	5.1	3.2
Human rater
Negative valence	5.8	6.3	5.3	2.8
Positive valence	6.4	6.9	5.9	3.0
Cognitive systems	6.3	6.8	5.7	3.2
Social processes	5.9	6.4	5.3	3.2
Sensorimotor systems	4.5	5.0	4.0	2.8
Arousal/regulatory systems	5.2	5.8	4.7	3.3

Note. Mean scores are from a 0–10 severity scale for each domain. RDoC = research domain criteria.

Table 3.

ICC and Spearman's correlation coefficients (R) between the AI and human raters.

	ICC(2,1) (95% CI, lower-upper)	R (95% CI, lower-upper)
Negative valence	0.81 (0.79–0.83)	0.80 (0.73–0.86)
Positive valence	0.65 (0.61–0.69)	0.62 (0.50–0.72)
Cognitive systems	0.77 (0.74–0.80)	0.76 (0.68–0.83)
Social processes	0.71 (0.67–0.74)	0.69 (0.58–0.77)
Sensorimotor systems	0.75 (0.72–0.78)	0.75 (0.67–0.82)
Arousal/regulatory systems	0.66 (0.62–0.69)	0.66 (0.55–0.75)

Note. ps < .001 for Rs. ICC = intraclass correlation coefficients; CI = confidence interval.

The Spearman's Rs ranged between 0.62 (positive valence) and 0.80 (negative valence) (ps < 0.001) (Table 3). These correlations are considered moderate (useful agreement with common rank swaps) for positive valence, social processes, and arousal/regulatory systems, and high (substantial alignment with occasional rank swaps) for negative valence, cognitive systems, and sensorimotor systems.³⁷
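Spearman's coefficient is the Pearson correlation of the rank-transformed scores, with tied scores sharing their average rank. Ties are frequent on a 0–10 integer scale, so the tie handling matters. The plain-Python sketch below shows the computation; it is an illustration, not the software routine used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation with average ranks for ties."""
    return pearson(average_ranks(x), average_ranks(y))
```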

Prediction of brain inflammatory markers by AI- and human-derived RDoC scores

First, we report the uncorrected exploratory regression analysis, followed by the FDR-corrected results. The correlation matrix between RDoC scores and RF values across different brain regions showed weak-to-moderately strong correlations (Rs < 0.8, except for the correlation between AI- and human-derived negative valence scores; Supplementary Tables 1 and 2). The variance inflation factors (VIFs) did not indicate multicollinearity (VIFs < 3).

The amygdala RF values were predicted by the AI-derived negative valence scores (β = 0.53, t = 4.04, p < .001, VS-MPR = 416.06) but not by the human-derived negative valence scores (β = 0.01, p = .97, VS-MPR = 1.0) when both scores were included in the same model, corrected for age, sex, education, BMI, and GSI scores. In a similar model, the amygdala RF values were also predicted by the AI-derived positive valence scores (β = 0.26, t = 2.88, p < 0.05, VS-MPR = 4.12) but not by the human-derived positive valence scores (β = 0.14, p = .21, VS-MPR = 1.12). In the case of cognitive, sensorimotor, and arousal/regulatory systems, the predictive value did not reach statistical significance (ps > 0.05).

The hippocampal RF values were predicted by the AI-derived social functions scores (β = 0.26, t = 2.21, p < .05, VS-MPR = 2.76) but not by the human-derived social functions scores (β = 0.05, p = .67, VS-MPR = 1.0). The other significant predictor of hippocampal RF values was the arousal/regulatory system scores from the AI rater (β = 0.26, t = 2.23, p < .05, VS-MPR = 3.73) but not from the human rater (β = ‒0.04, p = 0.72, VS-MPR = 1.0). The remaining RDoC scores were not significant (ps > .05).

Finally, the cortical RF values were predicted by the AI-derived positive valence scores (β = 0.30, t = 2.26, p < .05, VS-MPR = 8.14) but not by the human-derived positive valence scores (β = 0.02, p = 0.88, VS-MPR = 1.0). Similar effects were found for the arousal/regulatory scores (AI: β = 0.46, t = 4.18, p < .001, VS-MPR = 665.66; human: β = ‒0.10, p = .38, VS-MPR = 1.0). The predictive value of both AI- and human-derived cognitive scores on cortical RF was significant (AI: β = 0.28, t = 2.26, p < .05, VS-MPR = 3.91; human: β = 0.28, t = 2.34, p < .05, VS-MPR = 4.41) (for raw correlations between RDoC scores and RF values, see Supplementary Figures 1 and 2).

When FDR correction was applied to the regression analyses, two domain-by-region associations remained significant for the AI rater: negative valence and amygdala (q < 0.001), and arousal/regulatory scores and neocortex (q < 0.001). However, for human raters, no associations remained significant after FDR correction (qs > 0.1).
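The Benjamini–Hochberg step-up procedure sorts the m p-values, finds the largest rank k with p₍ₖ₎ ≤ (k/m)·q, and rejects the k smallest. A minimal plain-Python sketch follows; in the study m = 18 and q = 0.05, while the function names here are illustrative.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff = rank  # keep the largest rank that meets its threshold
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


def bh_adjusted(pvalues):
    """BH-adjusted p-values (often reported as q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets its threshold further down the sorted list.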

To replicate the two main effects (negative valence - amygdala; arousal - neocortex) with family-wise error rate control across four confirmatory tests and 90% power, 115 participants are sufficient. However, for a comprehensive domain-by-region map with FDR control across all 18 tests, 205 patients are needed for 80% power.

Comparisons across diagnostic categories

One-way ANOVAs revealed no significant differences between patients with schizophrenia, schizoaffective disorder, and bipolar disorder in AI-derived and human-derived RDoC scores (ps > .5) (Supplementary Figures 3 and 4). There were no between-group differences in RF values in the amygdala, hippocampus, and cortex (ps > .05) (Supplementary Figure 5).

We also calculated ICCs and Rs in the three diagnostic groups separately. In the schizophrenia group, the ICCs ranged between 0.63 (positive valence) and 0.81 (negative valence). The R-values ranged between 0.60 (positive valence) and 0.81 (negative valence) (Supplementary Table 3). In the schizoaffective group, the ICCs ranged between 0.62 (sensorimotor systems) and 0.85 (negative valence). The R-values ranged between 0.66 (social processes) and 0.86 (cognitive systems) (Supplementary Table 4). Finally, in patients with bipolar disorder, the ICCs ranged between 0.63 (positive valence) and 0.85 (sensorimotor systems). The R-values ranged between 0.56 (positive valence) and 0.87 (cognitive systems and social processes) (Supplementary Table 5).

Discussion

In this study, we provide the first multimodal validation of AI-derived RDoC scores against human raters and in vivo neuroinflammatory markers in a clinical cohort of psychotic disorders. We demonstrated high-to-moderate agreement between AI and human assessors in scoring the RDoC dimensions using narrative medical records. Additionally, we found that AI-derived negative and positive valence scores predicted amygdala and cortical RF values, a proxy of neuroinflammation. In contrast, consensus human ratings did not contribute significant variance in the same models. Moreover, AI-derived social and arousal/regulatory scores were associated with hippocampal RF, and both AI- and human-derived cognitive scores predicted cortical inflammation. Notably, after correcting for multiple comparisons, only two associations remained significant for the AI rater: amygdala and negative valence, and neocortex and regulatory/arousal scores. These results underscore that AI-inferred dimensional psychopathology captures biologically meaningful signals beyond what is accessible through traditional manual scoring.

Our findings align with the growing literature on LLM phenotyping in psychiatry. Early work demonstrated that token-counting pipelines could predict service utilization and genetic liability by mapping clinical text to RDoC domains.^6–8 More recent studies using LLMs have shown improved sensitivity in detecting psychopathological features in clinical notes.^10–12 Our results extend these findings by linking AI-derived scores to neuroimaging markers, suggesting that AI-driven phenotyping may serve as a “digital rater” whose outputs are grounded in underlying neurobiology.

This work also adds to the understanding of the RDoC framework. The RDoC initiative advocates dimensional constructs that map onto neural circuits, yet its routine implementation has been hindered by the lack of scalable and reliable scoring methods.^38–40 Our AI-LLM pipeline offers a high-throughput alternative that maintains consistency with expert consensus while capturing brain–behavior associations. Furthermore, contemporary clinical and research approaches emphasize individual-level precision and transdiagnostic approaches, highlighting the potential of continuous metrics over categorical contrasts.^41–43

The associations between RDoC valence domains and limbic RF align with models of inflammation-driven emotional dysregulation. Neuroinflammation, reflected by elevated RF in the amygdala and hippocampus, is implicated in depression, anxiety, and psychosis, mediating stress reactivity and cognitive control deficits.^14,17,21,22 Our results show that AI-derived negative valence scores are the strongest predictor of amygdala RF, which is consistent with the assumption that inflammatory processes amplify threat sensitivity and affective disturbance.^44–46 The unique prediction of hippocampal RF by social and arousal scores suggests that neuroimmune signaling may impact contextual memory and regulatory circuits underlying social cognition.^47,48 In the neocortex, the joint predictive significance of AI- and human-derived cognitive scores implies that cortical inflammation disrupts executive functions and top-down control in ways that can be meaningfully detected by both AI and human raters.^49,50

Several limitations warrant consideration when interpreting our results. First, the sample size was too small to achieve optimal power for multiple regression analyses. Treatment duration and MRI time lag were not included in the analysis because all assessments were conducted within one week after admission; the sample showed minimal variance in this respect. The results should therefore be interpreted cautiously and framed as exploratory rather than confirmatory. We cannot establish the directionality or causality of the observed associations between RDoC domains and neuroinflammation. Future research could employ longitudinal frameworks in larger samples to examine whether diagnostic category or illness duration exerts indirect effects on neuroinflammation via specific RDoC dimensions, or conversely, whether inflammatory activity mediates the link between dimensional psychopathology and clinical outcomes. Structural equation modeling or Bayesian network approaches integrating diagnosis, AI- and human-derived RDoC scores, and RF indices could test these assumptions. In this respect, sensitivity and specificity analyses are critical. In the present work, we focused on a parsimonious, a priori model because the study was cross-sectional with a moderate sample size and already involved 18 domain-by-region tests corrected for multiplicity. Expanding the analysis to alternative models (e.g. covariate specifications and correction strategies) would have increased the number of parameters and comparisons, thereby reducing power and increasing the risk of unstable estimates.

Second, our LLM instantiation may differ from open-source models. Human ratings were consensus-based rather than independent, which may have inflated interrater agreement. Third, although RF is a validated proxy for neuroinflammation,^17–19 multimodal markers (e.g. positron emission tomography and peripheral cytokines) could provide convergent neuroimmune validation. The resolution of RF measurements did not allow a more sophisticated anatomical parcellation (e.g. hippocampal and amygdalar subregions). Finally, our sample, drawn from Hungarian psychiatric units, may limit generalizability to other cultural and clinical contexts. The schizoaffective and bipolar groups were too small to obtain conclusive group-specific effects. Moreover, reliance on narrative notes introduces variability in content and style. However, even with heterogeneity in narrative style, AI–human agreement was in the moderate-to-good range, and mean domain scores were closely aligned across raters, indicating that the signal captured from routine notes is valid and reproducible. Multicenter projects should compare narrative-only pipelines with structured cover sheets (e.g. checkboxes for acute symptoms and mental status examination) to enhance reproducibility.

Although human and animal studies revealed that increased RF is associated with neuroinflammation, its sensitivity and specificity in humans should be validated.²⁰ Future work should pursue longitudinal, multimodal phenotyping to track how AI-derived RDoC trajectories correspond to changes in neuroinflammation and clinical outcomes. The inclusion of healthy controls would anchor RF values and RDoC scores to a normative baseline and test whether the RDoC–RF associations are specific to psychotic-spectrum disorders rather than a more general dimensional relationship. Furthermore, incorporating ecological data streams (wearables and smartphone metrics) may enrich AI inputs for digital phenotyping. Integrating AI-phenotyping into clinical trials could test whether digital scoring, targeted at psychopharmacological and psychosocial treatments, serves the purpose of precision psychiatry.

Conclusions

In conclusion, our findings suggest that AI-derived RDoC scores align with human expert consensus and track neuroinflammatory processes in psychotic disorders. This biologically anchored digital phenotyping may represent a transformative step toward scalable and precision assessment tools in psychiatry that bridge narrative clinical records, neurobiology, and dimensional psychopathology.

Supplemental Material

Supplemental material, sj-docx-1-sci-10.1177_00368504261417875 for AI-derived research domain criteria scores from medical records predict brain inflammatory markers in psychotic disorders: A cross-sectional, real-world study by Szabolcs Kéri, Balázs Barko and Oguz Kelemen in Science Progress

Footnotes

Acknowledgements

The authors thank Katalin Kaza, Péter Nagy, and Csilla Szabó for their assistance in patient recruitment and clinical administration. Grammarly AI was used for language improvement.

ORCID iD

Szabolcs Kéri

Ethics approval and consent to participate

The study was conducted in accordance with the Helsinki Declaration of 1975 as revised in 2024. Ethical approval was provided by the National Medical Research Council (Egészségügyi Tudományos Tanács, Tudományos és Kutatásetikai Bizottság, ETT-TUKEB, 18814-2/2020/EKU; March 30, 2020). We have de-identified all patient details. Participants had the decisional capacity to provide the written informed consent, which was obtained from each of them.

Authors contributions

SK: conceptualization, formal analysis, methodology, supervision, and writing–original draft preparation; BB: data curation, investigation, resources, software, project administration, and writing–review and editing; OK: conceptualization, methodology, validation, and writing–review and editing.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

Data and materials are available on request from the corresponding author.

Supplemental material

Supplemental material for this article is available online.



AI-derived research domain criteria scores from medical records predict brain inflammatory markers in psychotic disorders: A cross-sectional, real-world study
