Abstract
Objective
Recent advances in generative artificial intelligence (AI) and dimensional approaches in psychiatry offer scalable scoring of psychopathology, yet biological validation remains challenging. This study aimed to compare AI and human performances in scoring dimensional psychopathology and its relationship with inflammatory brain markers in psychotic disorders.
Methods
In a cross-sectional, real-world, prospective study, we generated research domain criteria (RDoC) profiles using a large-language model and human ratings from admission notes of 127 consecutively selected patients with psychotic disorders. Magnetic resonance imaging (MRI) diffusion-based restricted fraction (RF) values were extracted from the amygdala, hippocampus, and neocortex as a proxy of inflammation. We assessed the agreement between AI- and human-derived scores and their predictive value for regional RF.
Results
AI and human RDoC ratings showed moderate-to-high agreement (intraclass correlation coefficients: 0.65–0.81). AI-derived, but not human-derived, negative and positive valence RDoC scores predicted amygdala and neocortical inflammation, while social and regulatory/arousal scores predicted hippocampal RF. A significant association was found between neocortical RF and regulatory/arousal scores in the AI assessment. Both AI- and human-derived cognitive scores predicted cortical RF. When the regression analyses were corrected for multiple comparisons, only the AI-derived associations remained significant: the amygdala for negative valence and the cortex for regulatory/arousal scores.
Conclusions
These results suggest a significant correspondence between AI and human RDoC ratings. AI-based dimensional phenotyping may reflect underlying neuroinflammatory processes, offering a biologically anchored tool for precision psychiatry.
Introduction
The past decade has seen a paradigm shift in the description and classification of mental disorders from syndromic labels toward dimensional constructs that may map more directly onto brain and behavior. 1 The National Institute of Mental Health's research domain criteria (RDoC) framework reflects this trend, organizing psychopathology into six domains that transcend categorical diagnoses and align with identifiable neuronal circuits and molecules: negative and positive affective valence, cognitive, social, arousal/regulatory, and sensorimotor systems.2–5 Despite its conceptual strength, using RDoC in routine clinical and translational research is challenging because reliable and scalable scoring remains elusive. Traditional manual ratings are time-consuming and subjective. However, natural-language pipelines that counted RDoC-related tokens in electronic health records showed promising predictive validity for clinical outcomes, such as length of hospital stay and readmission, and corresponded with the genetic correlates of psychopathological dimensions.6–9
Generative artificial intelligence (AI) with large language models (LLMs), trained on billions of biomedical and general language tokens, offers new possibilities. 10 Using GPT-4, McCoy and Perlis (2024) demonstrated that an LLM can infer all six RDoC domains directly from the admission and discharge notes of psychiatric inpatients: the scores converged with earlier token methods and predicted service utilization.11,12 However, a critical question remains unanswered: can AI-derived psychopathological dimensions accurately track underlying neurobiology? A direct comparison between AI and humans in the context of neuroscientific measures is also lacking.
MRI has begun to quantify the putative markers of subtle neuroinflammation, which is one of the most widespread transdiagnostic pathophysiological mechanisms in mental disorders.13–16 Elevated diffusion-based restricted fraction (RF) reflects microglial activation and increased cellularity, whereas other MRI diffusion parameters are markers of altered myelin structure and increased free water, linking inflammation to the neural substrates of emotion regulation and learning in limbic and cortical structures.17–20 Specifically, RF quantifies the proportion of water molecules whose movement is highly restricted within tissue. Water gets “trapped” when the microenvironment becomes crowded, structured, or swollen. Thus, RF reflects biological processes that change cellular density or intracellular volume. 20
Recent work has shown that patients with major depressive disorder display higher RF in the amygdala, hippocampus, and neocortex. 21 Likewise, algorithm-guided modular psychotherapy that targets RDoC constructs reduced both anxiety and amygdala RF in generalized anxiety disorder, indicating that dimensional interventions can ameliorate inflammatory brain signatures. 22 Recently, a widespread elevation of RF and extra-axonal water was found in the white matter of patients with schizophrenia, indicating diffuse neuroinflammation and vasogenic oedema, whereas no similar alterations were found in bipolar disorder. 23
Building on these advances in AI and neuroimaging, the present study aims to integrate RDoC phenotyping with biomarkers (an MRI proxy of neuroinflammation) within the same cohort. Specifically, we generated RDoC dimensional profiles for our participants using AI and ratings from trained human experts who were blinded to the AI output. We also extracted MRI RF values from the amygdala, hippocampus, and neocortex, regions implicated in affect regulation, contextual memory, and the higher-order integration of emotional experience and cognitive control. 24 Finally, we tested brain–behavior correspondence by correlating AI-derived and human-derived RDoC scores with RF values in each brain region. We evaluated whether the RDoC dimensions inferred by AI and human raters from medical records capture similar neuroinflammatory variance.
This multimodal validation addresses several gaps in the literature. First, it evaluates whether AI can serve as a trustworthy “digital rater” whose scores correlate with biological measures. Second, it leverages the continuous nature of RDoC and RF metrics to move beyond categorical case–control differences and to delineate brain–behavior mapping in a naturalistic clinical sample (i.e. the relationship between RDoC and RF that transcends diagnostic categories). We hypothesized that neuroinflammatory processes in the amygdala and hippocampus are related to negative valence, arousal dysregulation, and social threat processing. By adding the neocortex, we investigated whether cortical inflammation is related to cognitive dysfunctions captured by AI and clinicians.
Methods
Participants
In a cross-sectional, real-world, prospective study, we consecutively enrolled 127 patients with psychotic disorders (schizophrenia: n = 84; schizoaffective disorder: n = 23; bipolar disorder with psychotic features: n = 20) from five Hungarian psychiatric hospitals at the University of Szeged and National Psychiatric Center, Budapest, Hungary. The study was conducted between April 2020 and February 2025 (recruitment and collection of medical data: April 2020 to December 2023, data analysis: January 2024 to February 2025). We obtained the sociodemographic and medical characteristics (age, sex, education, and a detailed medical history) as well as the narrative admission notes from each patient. All patients received antipsychotic medications and exhibited a stabilized clinical state. We used the Brief Symptom Inventory 18 (BSI-18) to measure overall psychological distress, including anxiety, depression, somatization, and paranoia (Global Severity Index [GSI] of BSI-18, range of scores: 0–72 from 18 items; permission was granted from copyright holders) 25 (Table 1). Data on potential confounding factors related to inflammation, including nicotine, caffeine, and alcohol intake, contraception use, body mass index, and chronic diseases (e.g. cardiovascular and metabolic diseases) were also collected using a structured questionnaire. 26 The reporting of this study conforms to STROBE guidelines. 27
Table 1. Demographic and clinical characteristics of the participants with psychotic disorders (N = 127).
RDoC scoring
The AI-derived scores on the RDoC domains were generated using a private cloud-based instance (GPT o3 – IBM Watsonx/Granite link, OpenAI-compatible proxy in front of Watsonx, featuring zero data retention). The model was re-initialized for each clinical note, and details that enabled personal identification were removed. Following the method of McCoy and Perlis (2024), 11 the model was prompted as follows: “You are a skilled psychiatrist scoring an emergency room clinical note in terms of how the patient symptoms over the past 24 h reflect the 6 NIMH RDoC: Negative Valence Systems, Positive Valence Systems, Cognitive Systems, Social Processes, Arousal and Regulatory Systems, and Sensorimotor Systems. Remember that substance use can be reflected as Positive Valence symptom. Notes are scored on a 0–10 scale to capture the magnitude of documented symptoms relevant in a given domain. Score 0 if no symptoms are present and functioning is normal, 1–3 mild symptoms, 4–6 moderate, 7–9 severe, 10 extremely severe.” The AI prompt was not modified, piloted, or calibrated before use. We used the prompts and rating procedures exactly as described by McCoy and Perlis (2024) 11 to enhance comparability across studies.
The human-derived RDoC scores on a 0–10 scale were based on the consensus of two formally trained clinicians with extensive expertise in RDoC.22,28 The raters were instructed to consider the symptoms present at the time of admission as described in the medical record. The clinical scheme for RDoC assessment, inter-rater reliability, and test-retest reliability are presented in the Supplementary Material.
Data requirements and preprocessing of admission notes
We used the following inclusion criteria for medical notes: clinician-authored admission notes, with a minimum length of 500 words; covering all standard clinical sections (presenting problem, history of present illness, course, mental status examination, general clinical impression, ICD-10 diagnosis, treatment plan, and disposition); machine-readable text. The preprocessing and normalization of the text included the following steps: removal of headers, footers, boilerplate, and duplicated sentences; conversion to UTF-8; whitespace normalization; expansion of standard clinical abbreviations; sentence segmentation; section detection with rule-based headers (e.g. history of present illness, mental state examination, and treatment plan) to quantify coverage and to ensure that the LLM prompt is applied to the same scope of text across notes. The AI and human raters used the same information.
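As an illustration only, the screening and normalization steps described above might be sketched as follows in Python. The function names, the naive sentence splitter, and the short header list are assumptions made for this sketch; the study's actual pipeline, abbreviation dictionary, and section rules are not reproduced here.

```python
import re

# Hypothetical subset of section headers; the real rule set is not given in the text.
SECTION_HEADERS = (
    "history of present illness",
    "mental status examination",
    "treatment plan",
)


def normalize_note(text):
    """Collapse whitespace and drop verbatim duplicated sentences."""
    text = re.sub(r"\s+", " ", text).strip()
    # Naive segmentation on sentence-final punctuation; abbreviations such as
    # "e.g." would be split incorrectly, so a real pipeline needs a proper segmenter.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    seen = set()
    kept = []
    for sentence in sentences:
        key = sentence.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


def section_coverage(text, headers=SECTION_HEADERS):
    """Report which expected section headers occur in the note."""
    lowered = text.lower()
    return {header: header in lowered for header in headers}


def meets_length_criterion(text, min_words=500):
    """Apply the 500-word minimum length used as an inclusion criterion."""
    return len(text.split()) >= min_words
```

A note would pass screening in this sketch only if `meets_length_criterion` returns `True` and every value returned by `section_coverage` is `True`.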
MRI
Diffusion-weighted imaging (DWI) and T1-weighted structural scans were acquired using the UK Biobank protocol within 1 week after admission.17,22,29–31 FreeSurfer v7.4.1 was used for image processing. 32 The technical equipment and parameters were as follows: Philips Achieva 3 T scanner, MPRAGE (magnetization-prepared rapid acquisition gradient echo), 3D sagittal acquisition, FOV (square field of view) = 5256 mm, 1 × 1 × 1 mm³, TI = 5900 ms, TE (shortest) = 3.16 ms, flip angle: 9 degrees, no fat suppression, full k space, no averages, acquisition time: 6 min and 50 s, acceleration factor: 2. For DWI, we used a multi-shell approach (b1 = 1000 s/mm², b2 = 2000 s/mm², 2 × 2 × 2 mm³, 50 diffusion encoding directions for each shell). DWI preprocessing included eddy-current and head-motion correction, outlier slice correction, and gradient distortion correction.17,29 Putative neuroinflammatory changes were quantified by RF from DWI data (diffusion-basis spectral imaging-based restricted fraction, DBSI-RF).18,19 We measured DBSI-RF in the hippocampus, amygdala, and neocortex using FreeSurfer regions of interest (ROIs).17,33,34 The left and right hemisphere values were averaged because they were highly correlated (left-right correlations: rs > .8). 17
Data analysis
We used JASP and R for data analysis. 35 Following descriptive statistics and data characteristics assessment (Kolmogorov-Smirnov and Levene's tests), we calculated two-way random effects intraclass correlation coefficients (ICC(2,1)) and Spearman's correlation coefficients (R) between AI- and human-derived RDoC scores to test actual score match (total observed variance) and monotonic associations (rank-order) between AI and human assessors, respectively. The predictive value of RDoC scores on RF values in the amygdala, hippocampus, and neocortex was assessed using multiple regression analysis, which included potential confounders (age, sex, education, BMI, and GSI scores). To control for Type I error due to multiple comparisons, we used Benjamini–Hochberg false discovery rate (FDR) corrections at q = 0.05 for 18 domain-by-brain region tests (6 RDoC domains × 3 brain regions).
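For readers unfamiliar with ICC(2,1), the following minimal Python sketch shows how the two-way random effects, absolute-agreement, single-rater coefficient is obtained from the ANOVA mean squares (Shrout and Fleiss formulation). It is not the JASP or R code used in the study; the function name and the input layout (one row per patient, one column per rater) are assumptions for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per subject, each with one score per rater.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    # Rater variance (msc) stays in the denominator, so a systematic offset
    # between the AI and the human rater lowers the coefficient.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the between-rater mean square enters the denominator, ICC(2,1) penalizes systematic score offsets, whereas Spearman's rank correlation reflects only the ordering of patients.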
The diagnostic groups and RDoC scores were compared with one-way analyses of variance (ANOVAs). The level of statistical significance was set at α = 0.05 for non-effect-size and non-Bayesian comparisons. For Bayesian analysis, we computed Vovk–Sellke maximum p-ratios (VS-MPRs), which transform p-values into the largest Bayes factor favoring the alternative hypothesis by maximizing the likelihood ratio over all simple Beta(α,1) priors. Jeffreys’ scale defines the VS-MPR evidence levels against the null hypothesis: 1–3: anecdotal; > 3–10: substantial; > 10–30: strong; > 30–100: very strong; > 100: decisive.
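The VS-MPR has a closed form: it equals 1/(−e · p · ln p) for p < 1/e and 1 otherwise. The short Python sketch below applies this formula and the evidence labels listed above; the function names are illustrative and are not taken from JASP.

```python
import math


def vs_mpr(p):
    """Vovk-Sellke maximum p-ratio for a p-value in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0  # the bound is capped at 1 for p >= 1/e
    return 1.0 / (-math.e * p * math.log(p))


def evidence_label(ratio):
    """Map a VS-MPR to the evidence categories listed in the text."""
    if ratio > 100:
        return "decisive"
    if ratio > 30:
        return "very strong"
    if ratio > 10:
        return "strong"
    if ratio > 3:
        return "substantial"
    return "anecdotal"
```

For example, p = .21 gives a VS-MPR of about 1.12, and any p-value at or above 1/e (about .37) gives exactly 1.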
We conducted an analytic power analysis using a partial correlation approach. Sample sizes for 80% power were obtained and evaluated under Benjamini–Hochberg FDR across the 18 domain-by-region tests. Calculations were implemented in R and interpreted in the context of the observed AI–human reliability.
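One common analytic approach to such a calculation uses the Fisher z approximation for a partial correlation. The Python sketch below is offered as an illustration under stated assumptions and is not the authors' R code: the effect size, the number of covariates, and the per-test alpha are hypothetical inputs, and the step-up nature of the FDR procedure is not modeled (a stricter per-test alpha can only approximate it).

```python
import math
from statistics import NormalDist


def n_for_partial_correlation(r, n_covariates, alpha=0.05, power=0.80):
    """Approximate sample size to detect a partial correlation r (two-sided).

    Uses the Fisher z transform, whose standard error is
    1 / sqrt(n - 3 - n_covariates).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    n = ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3 + n_covariates
    return math.ceil(n)
```

With a hypothetical partial correlation of 0.30 and five covariates, the function returns 90 at alpha = 0.05 and 80% power; lowering the per-test alpha to approximate multiplicity control increases the required sample size.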
Results
Comparisons of the RDoC scores of AI- and human raters
First, we calculated ICCs to assess how closely the AI and human raters actually matched (absolute agreement or consistency). We also calculated Spearman's correlation coefficients to compare the ordering (monotonic association) of AI and human raters’ RDoC scores. There were no significant differences between the AI and human raters’ RDoC scores (Table 2). The ICCs ranged between 0.65 (positive valence) and 0.81 (negative valence) (Table 3). According to the Koo and Li (2016) classification system, the ICCs were good (adequate for research comparisons) for negative valence, cognitive systems, and sensorimotor systems, and moderate/acceptable (some measurement error, adequate for exploratory work and early development) for positive valence, social processes, and arousal/regulatory systems. 36
Table 2. RDoC domain scores provided by the AI and human raters.
Note. Mean scores are from a 0–10 severity scale for each domain. RDoC = research domain criteria.
Table 3. ICC and Spearman's correlation coefficients (R) between the AI and human raters.
Note. ps < .001 for Rs. ICC = intraclass correlation coefficient; CI = confidence interval.
The Spearman's Rs ranged between 0.62 (positive valence) and 0.80 (negative valence) (ps < 0.001) (Table 3). These correlations are considered moderate (useful agreement with common rank swaps) for positive valence, social processes, and arousal/regulatory systems, and high (substantial alignment with occasional rank swaps) for negative valence, cognitive systems, and sensorimotor systems. 37
Prediction of brain inflammatory markers by AI- and human-derived RDoC scores
First, we report the uncorrected exploratory regression analysis, followed by the FDR-corrected results. The correlation matrix between RDoC scores and RF values across different brain regions showed weak-to-moderately strong correlations (Rs < 0.8, except for the correlation between AI- and human-derived negative valence scores; Supplementary Tables 1 and 2). The variance inflation factors (VIFs) did not indicate multicollinearity (VIFs < 3).
The amygdala RF values were predicted by the AI-derived negative valence scores (β = 0.53, t = 4.04, p < .001, VS-MPR = 416.06) but not by the human-derived negative valence scores (β = 0.01, p = .97, VS-MPR = 1.0) when both scores were included in the same model, corrected for age, sex, education, BMI, and GSI scores. In a similar model, the amygdala RF values were also predicted by the AI-derived positive valence scores (β = 0.26, t = 2.88, p < 0.05, VS-MPR = 4.12) but not by the human-derived positive valence scores (β = 0.14, p = .21, VS-MPR = 1.12). In the case of cognitive, sensorimotor, and arousal/regulatory systems, the predictive value did not reach statistical significance (ps > 0.05).
The hippocampal RF values were predicted by the AI-derived social functions scores (β = 0.26, t = 2.21, p < .05, VS-MPR = 2.76) but not by the human-derived social functions scores (β = 0.05, p = .67, VS-MPR = 1.0). The other significant predictor of hippocampal RF values was the arousal/regulatory system scores from the AI rater (β = 0.26, t = 2.23, p < .05, VS-MPR = 3.73) but not from the human rater (β = ‒0.04, p = 0.72, VS-MPR = 1.0). The remaining RDoC scores were not significant (ps > .05).
Finally, the cortical RF values were predicted by the AI-derived positive valence scores (β = 0.30, t = 2.26, p < .05, VS-MPR = 8.14) but not by the human-derived positive valence scores (β = 0.02, p = 0.88, VS-MPR = 1.0). Similar effects were found for the arousal/regulatory scores (AI: β = 0.46, t = 4.18, p < .001, VS-MPR = 665.66; human: β = ‒0.10, p = .38, VS-MPR = 1.0). The predictive value of both AI- and human-derived cognitive scores on cortical RF was significant (AI: β = 0.28, t = 2.26, p < .05, VS-MPR = 3.91; human: β = 0.28, t = 2.34, p < .05, VS-MPR = 4.41) (for raw correlations between RDoC scores and RF values, see Supplementary Figures 1 and 2).
When FDR correction was applied to the regression analyses, two domain-by-region associations remained significant for the AI rater: negative valence with the amygdala (q < 0.001) and arousal/regulatory scores with the neocortex (q < 0.001). For the human raters, no associations remained significant after FDR correction (qs > 0.1).
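The Benjamini–Hochberg step-up adjustment behind such q-values can be sketched in a few lines of Python. This is a generic illustration of the procedure, not the study's analysis code; in the study it would be applied to the 18 domain-by-region p-values, which are not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (adjusted p-values, rejection flags) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    rejected = [value <= q for value in adjusted]
    return adjusted, rejected
```

An association is declared significant when its adjusted value does not exceed the chosen q (0.05 in the present analysis).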
To replicate the two main effects (negative valence - amygdala; arousal - neocortex) with family-wise error rate control across four confirmatory tests and 90% power, 115 participants are sufficient. However, for a comprehensive domain-by-region map with FDR control across all 18 tests, 205 patients are needed for 80% power.
Comparisons across diagnostic categories
One-way ANOVAs revealed no significant differences between patients with schizophrenia, schizoaffective disorder, and bipolar disorder in AI-derived and human-derived RDoC scores (ps > .5) (Supplementary Figures 3 and 4). There were no between-group differences in RF values in the amygdala, hippocampus, and cortex (ps > .05) (Supplementary Figure 5).
We also calculated ICCs and Rs in the three diagnostic groups separately. In the schizophrenia group, the ICCs ranged between 0.63 (positive valence) and 0.81 (negative valence). The R-values ranged between 0.60 (positive valence) and 0.81 (negative valence) (Supplementary Table 3). In the schizoaffective group, the ICCs ranged between 0.62 (sensorimotor systems) and 0.85 (negative valence). The R-values ranged between 0.66 (social processes) and 0.86 (cognitive systems) (Supplementary Table 4). Finally, in patients with bipolar disorder, the ICCs ranged between 0.63 (positive valence) and 0.85 (sensorimotor systems). The R-values ranged between 0.56 (positive valence) and 0.87 (cognitive systems and social processes) (Supplementary Table 5).
Discussion
In this study, we provide the first multimodal validation of AI-derived RDoC scores against human raters and in vivo neuroinflammatory markers in a clinical cohort of psychotic disorders. We demonstrated high-to-moderate agreement between AI and human assessors in scoring the RDoC dimensions using narrative medical records. Additionally, we found that AI-derived negative and positive valence scores predicted amygdala and cortical RF values, a proxy of neuroinflammation. In contrast, consensus human ratings did not contribute significant variance in the same models. Moreover, AI-derived social and arousal/regulatory scores were associated with hippocampal RF, and both AI- and human-derived cognitive scores predicted cortical inflammation. Notably, after correcting for multiple comparisons, only two associations remained significant for the AI rater: amygdala and negative valence, and neocortex and regulatory/arousal scores. These results underscore that AI-inferred dimensional psychopathology captures biologically meaningful signals beyond what is accessible through traditional manual scoring.
Our findings align with the growing literature on LLM phenotyping in psychiatry. Early work demonstrated that token-counting pipelines could predict service utilization and genetic liability by mapping clinical text to RDoC domains.6–8 More recent studies using LLMs have shown improved sensitivity in detecting psychopathological features in clinical notes.10–12 Our results extend these findings by linking AI-derived scores to neuroimaging markers, suggesting that AI-driven phenotyping may serve as a “digital rater” whose outputs are grounded in underlying neurobiology.
This work also adds to the understanding of the RDoC framework. The RDoC initiative advocates dimensional constructs that map onto neural circuits, yet its routine implementation has been hindered by the lack of scalable and reliable scoring methods.38–40 Our AI-LLM pipeline offers a high-throughput alternative that maintains consistency with expert consensus while capturing brain–behavior associations. Furthermore, contemporary clinical and research approaches emphasize individual-level precision and transdiagnostic approaches, highlighting the potential of continuous metrics over categorical contrasts.41–43
The associations between RDoC valence domains and limbic RF align with models of inflammation-driven emotional dysregulation. Neuroinflammation, reflected by elevated RF in the amygdala and hippocampus, is implicated in depression, anxiety, and psychosis, mediating stress reactivity and cognitive control deficits.14,17,21,22 Our results show that AI-derived negative valence scores are the strongest predictor of amygdala RF, which confirms the assumption that inflammatory processes amplify threat sensitivity and affective disturbance.44–46 The unique prediction of hippocampal RF by social and arousal scores suggests that neuroimmune signaling may impact contextual memory and regulatory circuits underlying social cognition.47,48 In the neocortex, the joint predictive significance of AI- and human-derived cognitive scores implies that cortical inflammation disrupts executive functions and top-down control that can be meaningfully detected by both AI and human raters.49,50
Several limitations warrant consideration when interpreting our results. First, the sample size was too small to achieve optimal power for multiple regression analyses. Treatment duration and MRI time lag were not included in the analysis because all assessments were conducted within one week after admission; the sample showed minimal variance in this respect. The results should therefore be interpreted cautiously and framed as exploratory rather than confirmatory. We cannot establish the directionality or causality of the observed associations between RDoC domains and neuroinflammation. Future research could employ longitudinal frameworks in larger samples to examine whether diagnostic category or illness duration exerts indirect effects on neuroinflammation via specific RDoC dimensions, or conversely, whether inflammatory activity mediates the link between dimensional psychopathology and clinical outcomes. Structural equation modeling or Bayesian network approaches integrating diagnosis, AI- and human-derived RDoC scores, and RF indices could test these assumptions. In this respect, sensitivity and specificity analyses are critical. In the present work, we focused on a parsimonious, a priori model because the study was cross-sectional with a moderate sample size and already involved 18 domain-by-region tests corrected for multiplicity. Expanding the analysis to alternative models (e.g. covariate specifications and correction strategies) would have increased the number of parameters and comparisons, thereby reducing power and increasing the risk of unstable estimates.
Second, our LLM instantiation may differ from open-source models. Human ratings were consensus-based rather than independent, which may have inflated interrater agreement. Third, although RF is a validated proxy for neuroinflammation,17–19 multimodal markers (e.g. positron emission tomography and peripheral cytokines) could provide convergent neuroimmune validation. The resolution of RF measurements did not allow a more sophisticated anatomical parcellation (e.g. hippocampal and amygdalar subregions). Finally, our sample, drawn from Hungarian psychiatric units, may limit generalizability to other cultural and clinical contexts. The schizoaffective and bipolar groups were too small to obtain conclusive group-specific effects. Moreover, reliance on narrative notes introduces variability in content and style. However, even with heterogeneity in narrative style, AI–human agreement was in the moderate-to-good range, and mean domain scores were closely aligned across raters, indicating that the signal captured from routine notes is valid and reproducible. Multicenter projects should compare narrative-only pipelines with structured cover sheets (e.g. checkboxes for acute symptoms and mental status examination) to enhance reproducibility.
Although human and animal studies revealed that increased RF is associated with neuroinflammation, its sensitivity and specificity in humans should be validated. 20 Future work should pursue longitudinal, multimodal phenotyping to track how AI-derived RDoC trajectories correspond to changes in neuroinflammation and clinical outcomes. The inclusion of healthy controls would anchor RF values and RDoC scores to a normative baseline and test whether the RDoC–RF associations are specific to psychotic-spectrum disorders rather than a more general dimensional relationship. Furthermore, incorporating ecological data streams (wearables and smartphone metrics) may enrich AI inputs for digital phenotyping. Integrating AI-phenotyping into clinical trials could test whether digital scoring, targeted at psychopharmacological and psychosocial treatments, serves the purpose of precision psychiatry.
Conclusions
In conclusion, our findings suggest that AI-derived RDoC scores align with human expert consensus and track neuroinflammatory processes in psychotic disorders. This biologically anchored digital phenotyping may represent a transformative step toward scalable and precision assessment tools in psychiatry that bridge narrative clinical records, neurobiology, and dimensional psychopathology.
Supplemental Material
Supplemental material, sj-docx-1-sci-10.1177_00368504261417875 for AI-derived research domain criteria scores from medical records predict brain inflammatory markers in psychotic disorders: A cross-sectional, real-world study by Szabolcs Kéri, Balázs Barko and Oguz Kelemen in Science Progress
Footnotes
Acknowledgements
The authors thank Katalin Kaza, Péter Nagy, and Csilla Szabó for their assistance in patient recruitment and clinical administration. Grammarly AI was used for language improvement.
Ethics approval and consent to participate
The study was conducted in accordance with the Helsinki Declaration of 1975 as revised in 2024. Ethical approval was provided by the National Medical Research Council (Egészségügyi Tudományos Tanács, Tudományos és Kutatásetikai Bizottság, ETT-TUKEB, 18814-2/2020/EKU; March 30, 2020). We have de-identified all patient details. All participants had the decisional capacity to provide written informed consent, which was obtained from each of them.
Authors' contributions
SK: conceptualization, formal analysis, methodology, supervision, and writing–original draft preparation; BB: data curation, investigation, resources, software, project administration, and writing–review and editing; OK: conceptualization, methodology, validation, and writing–review and editing.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Data and materials are available on request from the corresponding author.
Supplemental material
Supplemental material for this article is available online.
