Abstract
Background
A simplified magnetic resonance enterography (MRe) index (sMARIA) for Crohn’s disease (CD) was recently developed and validated.
Objective
Our aims were (a) to assess sMARIA’s accuracy in a sample other than the validation cohort; (b) to evaluate its correlation with a simpler endoscopy index (SES-CD) and fecal calprotectin (FC); and (c) to assess whether an expert radiologist is needed to use sMARIA reliably.
Methods
Patients with CD who underwent MRe, ileocolonoscopy and FC within 2–4 weeks had their MRe retrospectively reviewed by two blinded raters. Disease activity was evaluated through sMARIA, SES-CD and FC. sMARIA’s accuracy, indices correlation, and interrater reliability were assessed.
Results
In total, 84 patients were included, comprising 420 intestinal segment evaluations. sMARIA ≥1 accurately identified segments with active disease (90% sensitivity, 98% specificity; area under the curve 0.94, 95% confidence interval (CI) 0.91–0.97; p < 0.01). sMARIA correlated with endoscopy, both for ileal and colonic segments (R = 0.94 and R = 0.82; p < 0.01). Per patient, there was a strong correlation between sMARIA and both endoscopy (R = 0.95; p < 0.01) and FC (R = 0.91; p < 0.01). Interrater agreement was excellent (intraclass correlation coefficient 0.95; 95% CI 0.94–0.96; p < 0.01).
Conclusion
sMARIA accurately measured CD activity using SES-CD as standard of reference, and exhibited high correlation with a simple endoscopic index and a biomarker. The interrater reliability between a radiology resident and an expert was excellent.
Introduction
Magnetic resonance enterography (MRe) has an established role in the assessment of Crohn’s disease (CD), from diagnosis to treatment monitoring, detection of disease complications, and uncovering of postoperative recurrence.1–3 Several indices for grading disease using MRe have been developed, the Magnetic Resonance Index of Activity for CD (MARIA) being the best-characterised.4,5 As this index has some limitations, a simplified version of the score (sMARIA) was recently developed and validated.6 sMARIA is a categorical four-variable score that was shown to accurately measure CD activity, severity and response to therapy. The same group further explored sMARIA, demonstrating a reproducible performance, even without gadolinium-enhanced sequences.7 These studies used the Crohn’s Disease Endoscopic Index of Severity (CDEIS) to assess sMARIA’s accuracy and correlation with endoscopy.6,7 Although CDEIS has proven to be reliable and reproducible,8,9 it is time-consuming to calculate, making it unsuitable for everyday practice. Hence, a complementary analysis using a simpler endoscopic index is desirable. Furthermore, to establish sMARIA’s role as a monitoring tool, a biomarker correlation analysis would be appropriate. In addition, an essential aspect of any index is reliability.10 In the original sMARIA publication, one of the limitations pointed out by Ordás et al.6 was that only experienced radiologists were involved in sMARIA development.
Our study aimed to assess sMARIA diagnostic accuracy, in a sample other than the validation cohort, using the Simple Endoscopic Score for CD (SES-CD) as the reference standard. We also evaluated sMARIA’s correlation and agreement with everyday practice instruments such as a simpler endoscopy index and fecal calprotectin (FC) biomarker. To evaluate the required training for an acceptable use of sMARIA, we calculated the interrater reliability between a radiology resident and a senior abdominal radiologist.
Material and methods
Study design
This was a retrospective, 5-year cohort study (March 2015–March 2020). All MRe studies performed since March 2015 were retrieved from the Radiology department registries. Subsequently, MRe studies performed for CD evaluation were identified, and these patients’ endoscopic data, clinical data, and biochemistry assays were reviewed. Adult patients diagnosed with ileal, colonic or ileocolonic CD,11 followed at our tertiary hospital, who underwent MRe, ileocolonoscopy and FC within a 2–4 week timeframe were eligible. Ileocolonoscopy and FC were considered only if performed within 2 weeks before or 2 weeks after MRe. Patients treated with over 15 mg of prednisolone daily, those whose pharmacological therapy was modified within the examination timeframe, and patients whose MRe studies were considered of low quality were excluded.
Eligible MRe studies were retrospectively and independently evaluated by an expert radiologist with 13 years of experience in abdominal radiology (JB), and by a final-year radiology resident (ARV) subspecialising in abdominal radiology who had previously completed MRe training.
Patient characteristics
Demographics (age, gender) and clinical variables (Montreal classification for CD, time of diagnosis, surgical history, ongoing therapy) were collected.
Procedure and operative details
Disease activity was evaluated through sMARIA, SES-CD and FC.
MRe analysis: sMARIA
All MRe examinations were performed in the supine position in a 1.5 T magnet (Magnetom Symphony Tim System, Siemens, Erlangen, Germany) equipped with two six-element phased-array coils, according to the institution’s standard acquisition protocol (Supplementary data file). All patients received 1000–1500 ml of iso-osmotic polyethylene glycol solution orally 1 h prior to MRe.
Image analysis was independently performed by two raters – one experienced radiologist and one radiology resident – blinded to clinical and endoscopic aspects. To allow comparison with the endoscopic score, five segments were considered: terminal ileum; right colon; transverse colon; left colon; and rectum. As defined by Ordás et al.,6 the sMARIA in each segment was calculated by the following formula: sMARIA = (1 × thickening >3 mm) + (1 × edema) + (1 × fat stranding) + (2 × ulcers), with each of these four variables categorised as absent (0) or present (1) (Figure 1). Hence, sMARIA ranges from 0 to 5 per segment and global sMARIA ranges from 0 to 25. Image interpretation focused on T2-weighted sequences to identify all sMARIA variables (thickening, mural edema, fat stranding and ulcers).
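The scoring rule above can be illustrated with a minimal Python sketch. The class and function names (`SegmentFindings`, `smaria_segment`, `smaria_global`) are hypothetical and were chosen for illustration only; the weights and the five segments are those stated in the text, and the example patient is invented.

```python
from dataclasses import dataclass

# The five segments considered in this study.
SEGMENTS = ("terminal ileum", "right colon", "transverse colon", "left colon", "rectum")


@dataclass(frozen=True)
class SegmentFindings:
    """Binary MRe findings for one bowel segment (hypothetical container)."""
    thickening: bool      # wall thickening > 3 mm
    edema: bool           # mural edema
    fat_stranding: bool   # perienteric fat stranding
    ulcers: bool          # mural ulcers


def smaria_segment(f: SegmentFindings) -> int:
    """Segment sMARIA = 1*thickening + 1*edema + 1*fat stranding + 2*ulcers (0 to 5)."""
    return int(f.thickening) + int(f.edema) + int(f.fat_stranding) + 2 * int(f.ulcers)


def smaria_global(findings: dict) -> int:
    """Global sMARIA: sum of the five segment scores (0 to 25)."""
    missing = [s for s in SEGMENTS if s not in findings]
    if missing:
        raise ValueError(f"missing segments: {missing}")
    return sum(smaria_segment(findings[s]) for s in SEGMENTS)


# Invented example: inflamed terminal ileum with ulcers, all other segments normal.
normal = SegmentFindings(False, False, False, False)
inflamed_ileum = SegmentFindings(thickening=True, edema=True, fat_stranding=False, ulcers=True)
patient = {s: normal for s in SEGMENTS}
patient["terminal ileum"] = inflamed_ileum

print(smaria_segment(inflamed_ileum), smaria_global(patient))  # prints "4 4"
```

In this sketch a segment with thickening, edema and ulcers scores 1 + 1 + 2 = 4, and the maximum per segment is 5, consistent with the ranges given above.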

(1) Wall thickening – coronal T2-weighted (a) and axial T2-weighted fat suppressed (b) images show the terminal ileum with a thickened wall (arrow); coronal T2-weighted images depict wall thickening of the transverse colon (arrows in image (c)) and a slighter thickening of the ascending colon (arrow in image (d)). Note the intramural high signal intensity (star in image (b)) corresponding to inflammatory edema.
Ileocolonoscopy: SES-CD
All ileocolonoscopies were performed under deep or superficial sedation by a single experienced inflammatory bowel disease endoscopist (HTS), following the standard protocol, which includes reporting CD lesions according to the SES-CD. According to SES-CD, five bowel segments are considered (terminal ileum, right colon, transverse colon, left colon, and rectum), and for each segment, four variables are scored from 0 to 3, as described elsewhere.12 SES-CD per patient results from adding the five segments’ individual scores. This allows ileal and colonic SES-CD to be calculated as two separate scores.13 Thus, SES-CD ranges from 0 to 12 per segment, and global SES-CD ranges from 0 to 60.12 An SES-CD between 0 and 2 is considered inactive disease, 3–6 mildly active, 7–15 moderately active, and ≥16 severely active disease.14,15 In this study, patients were considered as having severe disease if global SES-CD was ≥16. In addition, a classification of severity on a segment basis was performed by considering the presence of severe lesions (ulcers).6
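As an illustration of the activity categories described above, the following Python sketch sums per-segment SES-CD values and maps the global score to a category. The function names are hypothetical; the thresholds (0–2, 3–6, 7–15, ≥16) and ranges (0 to 12 per segment, 0 to 60 globally) are taken from the text, and the example scores are invented.

```python
def ses_cd_global(segment_scores):
    """Sum the five per-segment SES-CD scores (each 0 to 12, as stated in the text)."""
    if len(segment_scores) != 5:
        raise ValueError("SES-CD considers exactly five bowel segments")
    if any(not 0 <= s <= 12 for s in segment_scores):
        raise ValueError("each segment score must lie between 0 and 12")
    return sum(segment_scores)


def classify_ses_cd(global_score):
    """Map a global SES-CD score to the activity category used in this study."""
    if not 0 <= global_score <= 60:
        raise ValueError("global SES-CD must lie between 0 and 60")
    if global_score <= 2:
        return "inactive"
    if global_score <= 6:
        return "mildly active"
    if global_score <= 15:
        return "moderately active"
    return "severely active"


# Invented example: ileum 7, right colon 3, other segments 0.
total = ses_cd_global([7, 3, 0, 0, 0])
print(total, classify_ses_cd(total))  # prints "10 moderately active"
```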
Concentration of FC
Sample extraction was performed at our hospital laboratory, and FC concentration (µg/g) was then measured using a commercially available fluoroenzyme immunoassay (EliA Calprotectin). Peyrin-Biroulet et al.16 underlined that multiple FC cut-off values have been described, but that determining thresholds for fecal biomarkers to differentiate between disease severities can be challenging. In our study, to prevent selection bias and systematic error, FC was used as a continuous variable to describe disease activity.
Statistical considerations
All statistical analyses were performed using the Statistical Package for the Social Sciences, version 25.0 (IBM Corp., Armonk, New York, USA), and the level of significance was established at 5%. Normality of numerical variables was assessed informally by inspecting histograms, which showed a right-skewed distribution for all variables. Continuous variables were summarised as median and interquartile range, and categorical variables as absolute (n) and relative frequencies (%). Two decimals were used throughout the manuscript and three decimals were used to express p-values. The Kruskal–Wallis test was used to compare different groups of subjects.
The senior abdominal radiologist reader set was used to evaluate sMARIA’s diagnostic accuracy for the prediction of active disease and severe disease per segment, using the receiver operating characteristic (ROC) area under the curve (AUC), based on the endoscopic categorisation of active disease and ulcer observation. Analysis using the radiology resident reader set was also performed for comparison purposes.
Correlations between the sMARIA and SES-CD (per segment and per patient), and global sMARIA and FC were measured by the Spearman coefficient test. Kappa statistics test was applied for the evaluation of agreement between the sMARIA and endoscopy for binary classification of active and severe disease.
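For readers unfamiliar with the Spearman coefficient, the sketch below shows one way to compute it in plain Python: values are converted to average ranks (so ties share a rank) and the Pearson correlation of the ranks is taken. This is an illustrative re-implementation, not the SPSS procedure used in the study; the function names and the toy data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Invented toy data: global sMARIA and global SES-CD for five patients.
smaria = [0, 1, 3, 5, 8]
ses_cd = [0, 2, 5, 9, 20]
print(spearman_rho(smaria, ses_cd))
```

Because the coefficient depends only on ranks, it suits the right-skewed score distributions described above.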
For the interrater reliability assessment, the intraclass correlation coefficient (ICC) was calculated to assess agreement between the two raters on the total sMARIA score per segment. Sub-analysis was performed for each intestinal segment and for each sMARIA item using the Kappa statistic for each paired evaluation by the two raters.
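The Kappa statistic used for paired binary ratings can be sketched as follows. This is a plain-Python illustration of Cohen's kappa (observed agreement corrected for chance agreement), assuming that is the variant applied; the function name and the ten paired ratings are invented for the example and are not study data.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented example: presence (1) or absence (0) of ulcers in ten segments.
expert = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
resident = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(round(cohen_kappa(expert, resident), 2))  # prints "0.78"
```

Here the raters agree on 9 of 10 segments (observed agreement 0.90), chance agreement is 0.4 × 0.3 + 0.6 × 0.7 = 0.54, and kappa is therefore 0.36 / 0.46, roughly 0.78.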
Ethical considerations
The study protocol conforms to the ethical guidelines of the Declaration of Helsinki, and received favourable opinion from the Algarve University Hospital Center Ethic Committee (30/11/2019). The authors retrospectively analysed data from March 2015 to March 2020. Thus, waiver of consent for this study was approved. All efforts were made to ensure confidentiality of the data.
Results
Patient characteristics
In total, 87 patients were eligible. Three MRe studies were excluded because of low quality (movement artifacts [n = 2]; suboptimal bowel distension [n = 1]). Finally, 84 patients were included, comprising a total of 420 intestinal segments explored by MRe and endoscopy. Four patients had a history of ileocecal surgical resection prior to MRe. One patient had two ileal stricturoplasties that did not compromise segment evaluation. Table 1 presents patients’ demographic and clinical characteristics.
Patient demographics and clinical characteristics.
IQR: interquartile range; CD: Crohn’s disease.
Disease activity assessment
Sixty patients (71.43%) had active disease on endoscopy (SES-CD score ≥3), for a total of 85 segments with disease activity. As for severe disease, 12 patients (14.29%) had severe disease (SES-CD score ≥16), and severe lesions (ulcers) were identified in 28 segments. Endoscopic disease activity and severity were documented both for ileal (49 and 18 segments, respectively) and colonic segments (36 and 10 segments, respectively). Table 2 presents descriptive statistics on the sMARIA and FC assessments between patients with inactive, active or severe disease on endoscopy.
MRe and FC assessments according to disease activity on endoscopy.
p-values based on Kruskal–Wallis test.
IQR: interquartile range; sMARIA: simplified Magnetic Resonance Index of Activity for Crohn’s Disease; MRe: magnetic resonance enterography; FC: fecal calprotectin.
Diagnostic accuracy of the sMARIA
To maximise the accuracy captured by the ROC curve, all points of the sMARIA score domain were dichotomously tested to identify the ideal cut-off point for the identification of intestinal segments with active and severe disease on endoscopy. sMARIA ≥1 accurately identified segments with active disease with 90% sensitivity (77 segments with sMARIA ≥1, out of 85 segments with SES-CD ≥3) and 98% specificity (AUC 0.94; 95% confidence interval [CI] 0.91–0.97; p < 0.001). As for severe disease, sMARIA ≥3 accurately identified segments with severe lesions (ulcers) with 93% sensitivity (26 segments with sMARIA ≥3 out of 28 segments with ulcers) and 92% specificity (AUC 0.93; 95% CI 0.89–0.97; p < 0.001) (Figure 2). A separate sub-analysis of sMARIA’s diagnostic accuracy for each intestinal segment is presented in Table 3. MRe overestimated severity in the rectum, and accuracy was not significant for the identification of severe disease in this segment. Still, the one patient with severe rectal lesions on endoscopy was adequately captured by a sMARIA ≥3. Also, endoscopic remission (SES-CD 0 to 2 per segment) was correctly identified by a sMARIA score <1 in 98.8% of cases (332 out of 336 segments with inactive disease).
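The cut-off search and the reported proportions can be illustrated with a short Python sketch. Two assumptions are made here that are not stated in the text: the optimal cut-off is selected by Youden's J (sensitivity + specificity − 1), and the ten scored segments are invented toy data. The function names are hypothetical; only the counts 77/85, 26/28 and 332/336 come from the results above.

```python
def sensitivity_specificity(scores, diseased, cutoff):
    """Sensitivity and specificity of the rule 'score >= cutoff' against a binary reference."""
    tp = sum(1 for s, d in zip(scores, diseased) if s >= cutoff and d)
    fn = sum(1 for s, d in zip(scores, diseased) if s < cutoff and d)
    tn = sum(1 for s, d in zip(scores, diseased) if s < cutoff and not d)
    fp = sum(1 for s, d in zip(scores, diseased) if s >= cutoff and not d)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference standard must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, diseased, candidates=range(1, 6)):
    """Pick the cut-off maximising Youden's J (an assumed criterion, not the paper's stated one)."""
    return max(candidates, key=lambda c: sum(sensitivity_specificity(scores, diseased, c)) - 1)


# Invented toy data: segment sMARIA scores and endoscopic activity (SES-CD >= 3).
scores = [0, 0, 0, 1, 2, 3, 4, 0, 0, 5]
active = [False, False, False, True, True, True, True, False, False, True]
print(best_cutoff(scores, active))  # prints "1"

# Proportions implied by the segment counts reported in the text.
print(round(77 / 85, 3), round(26 / 28, 3), round(332 / 336, 3))  # prints "0.906 0.929 0.988"
```

Testing each integer cut-off from 1 to 5 mirrors the dichotomous testing of all points of the sMARIA domain described above.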

(a) ROC curve for the prediction of active disease; (b) ROC curve for the prediction of severe disease.
sMARIA’s diagnostic accuracy for the identification of active and severe disease on endoscopy, per intestinal segment.
p-values based on the receiver operating characteristic (ROC) area under the curve (AUC) analysis.
CI: Confidence Interval; sMARIA: simplified Magnetic Resonance Index of Activity for Crohn’s Disease.
This analysis was additionally performed for the radiology resident reader set, and accuracy was comparable between raters. sMARIA ≥1 and ≥3, rated by the radiology resident, accurately identified segments with active disease with 87% sensitivity and 98% specificity (AUC 0.92; 95% CI 0.88–0.97; p < 0.001) and severe disease with 89% sensitivity and 92% specificity (AUC 0.91; 95% CI 0.84–0.97; p < 0.001).
Correlation and agreement between scores
At the segment level, a significant correlation between sMARIA and SES-CD was observed for both ileal (R = 0.94; p < 0.001) and colonic segments (R = 0.82; p < 0.001). Also, there was an excellent correlation between global sMARIA and SES-CD (R = 0.95; p < 0.001) and between sMARIA and FC (R = 0.91; p < 0.001) per patient. The correlation between sMARIA and FC remained equally strong when a separate analysis was performed for ileal, colonic and ileocolonic CD (R = 0.88, R = 0.92, R = 0.88, respectively; p < 0.001).
Per segment agreement between the sMARIA and endoscopy for binary classification of active disease was strong (K = 0.87; p < 0.001). Agreement for the identification of severe disease was moderate (K = 0.60; p < 0.001).
Interrater reliability assessment
Overall, interrater agreement between the radiology expert and the resident was excellent (ICC 0.95; 95% CI 0.94–0.96; p < 0.001). Sub-analysis per sMARIA variable and per intestinal segment is detailed in Table 4. All sMARIA variables were found to be highly reliable between raters, and the per-segment analysis showed interrater agreement proportions over 92%. Agreement between the two readers was consistently highest for the terminal ileum (K = 0.88–1.0), and lower overall, although still moderate, for the transverse colon and rectum (K = 0.52–0.59).
Interrater agreement analysis between the two raters.
p-values based on Kappa statistics test.
sMARIA: simplified Magnetic Resonance Index of Activity for Crohn’s Disease.
Discussion
In this study, we conclusively showed sMARIA’s high accuracy in predicting CD activity and severity using SES-CD as the reference standard. In addition, we showed sMARIA’s strong correlation with everyday practice tools such as a simple endoscopic score and a biomarker. Lastly, we demonstrated adequate sMARIA interrater reliability, overall, per item and per segment, between an experienced radiologist and a resident.
This is the largest assessment of sMARIA’s performance since its development study.6 More recently, the same group of authors addressed the score’s performance without gadolinium-enhanced sequences in a 50-patient sample, reinforcing the instrument’s value.7 In addition, only a comment by Williet et al.17 has been published on this subject, confirming sMARIA’s correlation with the original MARIA and the Clermont index. Williet et al. advocated that a sMARIA <1 could predict ‘deep remission’. However, ‘deep remission’ was defined with no endoscopic evaluation. Thus, we believe that an independent external validation of the sMARIA was lacking.
To date, endoscopy and endoscopic scores remain the gold standard for the evaluation of CD. 18 Although CDEIS is considered the standard score, its calculation is complex and unsuitable for clinical practice. 13 SES-CD is simpler, significantly correlates with CDEIS,12,19 and with MRe through the original MARIA score.20–22 In our study, more than 400 intestinal segments were evaluated by ileocolonoscopy, with endoscopic assessment based on SES-CD. Our results conclusively demonstrated that sMARIA’s diagnostic accuracy to identify disease remission, active disease and severe disease was excellent, using SES-CD as the reference index. Ordás et al. 6 showed that sMARIA ≥1 and ≥2 were the best cut-off values to identify active and severe disease, respectively, using CDEIS. Similarly, we demonstrated a sMARIA ≥1 and ≥3 as the optimal cut-off values to identify active and severe disease, using SES-CD. This difference in the cut-off value for identification of severe inflammation on MRe may be explained by a low number of segments with severe lesions (n = 28) in our study population.
The major driver for the sMARIA development was the demand for a quick and easy imaging index that could be generalised for clinical practice, without compromising its original accuracy.6 Indeed, there is a growing need to replace invasive tools with non-invasive alternatives,1 which should correlate with the endoscopic gold standard, correlate with one another, and complement one another. The endoscopic skipping phenomenon, which requires a beyond-the-mucosa assessment, and FC elevation are examples of this concept. FC appears to be the most sensitive biomarker of intestinal inflammation in inflammatory bowel disease, but it provides no information on disease phenotype or complications.1 In these scenarios, an MRe evaluation would be useful and complementary. Correlation between the original MARIA and FC was demonstrated for colonic CD.23 To establish sMARIA’s role as a clinical practice monitoring tool, a biomarker correlation was lacking. Our data demonstrated an excellent correlation between sMARIA and FC for ileal, colonic and ileocolonic CD.
Lastly, we explored sMARIA’s interrater reliability between an experienced abdominal radiologist and a radiology resident in her last year of training. The original authors pointed out that sMARIA could require much training, and that this could limit the score’s availability.6 The same authors subsequently addressed this in part, showing a moderate to excellent agreement between a less-experienced radiologist and an expert.7 Going further, we demonstrated moderate to perfect interrater agreement, per variable and per segment, between an experienced radiologist and a resident. The agreement was lower in distal colonic segments, probably due to suboptimal distension compared with the more proximal segments, as previously reported.24 This may also explain sMARIA’s lower diagnostic accuracy in the rectum.
Some limitations of this study should be noted. Its retrospective design did not allow control of assessment timepoints, which may limit the precision of the association between examinations. To overcome this and prevent significant information bias, paired MRe and endoscopy and paired MRe and FC were considered only if performed within 2 weeks of each other, as previously reported,23 and patients with therapeutic interventions during this timeframe were excluded. Ideally, readings from two senior radiologists should have been included in the diagnostic accuracy analysis. Finally, radiologists were not blinded to gadolinium-enhanced sequences, as this was a retrospective study and the use of gadolinium is currently recommended in guidelines.1
However, our study has a few strengths that should be highlighted. It encompasses a robust investigation, comprising more than 400 intestinal segments evaluated by MRe and ileocolonoscopy. It represents an independent sample and author contribution, as required for an unbiased external validation.25,26 In addition, to the authors’ knowledge, this is the first study validating sMARIA using everyday tools such as a simple endoscopic index and a biomarker. Also, we further analysed the score’s reliability, evaluating interrater agreement between an expert radiologist and a resident. Finally, unlike the original authors,6,7 luminal colonic contrast administration by a rectal catheter was not performed. As seldom reported,22,27,28 this study shows an accurate colonic assessment may be performed using only oral luminal contrast.
In conclusion, our study demonstrated sMARIA to accurately predict CD activity using SES-CD as a reference and to correlate with a simple endoscopic index and a biomarker. Moreover, the interrater reliability between a radiology resident and an expert was excellent, supporting sMARIA as a suitable clinical practice instrument.
Supplemental Material
Supplemental material, sj-pdf-1-ueg-10.1177_2050640620943089 for The new simplified MARIA score applies beyond clinical trials: A suitable clinical practice tool for Crohn’s disease that parallels a simple endoscopic index and fecal calprotectin by Joana Roseira, Ana Rita Ventosa, Helena Tavares de Sousa and Jorge Brito in United European Gastroenterology Journal
Footnotes
Ethics approval
The study protocol conforms to the ethical guidelines of the Declaration of Helsinki, and received favourable opinion from the Algarve University Hospital Center Ethic Committee (30/11/2019).
Informed Consent
The authors retrospectively analysed data from March 2015 to March 2020. Thus, waiver of consent for this study was approved. All efforts were made to ensure confidentiality of the data.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
