Correlation Is Not Prediction: Reassessing Predictive MRI Evidence in Guidelines for Persons With Relapsing-Remitting Multiple Sclerosis

Abstract

Background

Effective treatment monitoring and treatment decisions in relapsing-remitting multiple sclerosis (RRMS) require accurate and individualized prediction of future disease courses. Guidelines from the Magnetic Resonance Imaging in Multiple Sclerosis (MAGNIMS) group and the Canadian Multiple Sclerosis Working Group (CMSWG) frequently cite MRI outcomes as predictive, but the methodological quality of this evidence is uncertain.

Objectives

This study aims to critically assess the methodological standards underlying predictive claims about MRI outcomes in four major relevant MS guidelines.

Design

We conducted a content review of citations in the MAGNIMS 2015 and 2021 and the CMSWG 2013 and 2020 guideline publications.

Methods

Each source was evaluated for whether it reported quantitative predictive evidence: predictive values with confidence intervals, Kaplan–Meier–based risk estimates, or externally validated models that provide accurate risk estimates (good calibration) and correctly separate high- from low-risk patients (good discrimination). We also checked whether measures such as correlations, odds ratios, hazard ratios, Prentice criteria, or likelihood ratio tests were used.

Results

Across all four guidelines, most predictive statements relied on secondary citations and association-based measures. Odds ratios, hazard ratios, correlations, or Prentice criteria were commonly reported. Some studies reported predictive values, but confidence intervals were frequently not provided. Only isolated examples of properly validated prediction models were cited, and only one had undergone full external validation. Advanced methods, such as the likelihood reduction factor, were absent.

Conclusion

Current guideline statements on MRI prediction in RRMS often rely on associations rather than validated individualized predictions. They do not quantify individual risk or provide evidence for accuracy, calibration, discrimination, or robustness (reliability of predictions across different patients and settings). To ensure trustworthy and actionable evidence, future guidelines should require prospective risk estimates with confidence intervals, externally validated models with calibration and discrimination, predefined thresholds for predictive usefulness, and evaluation of clinical utility (e.g., decision curve analysis).

Plain Language Summary

Why was the study done? To treat relapsing-remitting multiple sclerosis (RRMS) effectively, reliable individualized prediction of disease worsening based on MRI findings is needed. Four widely used guidelines cite sources that provide such predictive information. However, it is unclear whether the presented evidence supports good individual predictions. This study therefore assesses the quality of the predictive information in the literature that the guidelines cite on MRI-based prediction in RRMS.

What did the researchers do? The sources claiming predictive ability of MRI in two guidelines from the Magnetic Resonance Imaging in Multiple Sclerosis (MAGNIMS) group and two guidelines from the Canadian Multiple Sclerosis Working Group (CMSWG) were extracted. The objectives, methodology, and content of the sources were analyzed. The methodologies were then grouped into ten statistical categories, and each category was assessed for its quality in individual prediction.

What did the researchers find? In total, 75 sources were identified that were directly or indirectly cited in the guidelines to show the predictive quality of MRI information. About 80% of sources used association measures to show individual prediction. The cited evidence was mostly insufficient to enable clinically relevant individual predictions. Neither group reports the quality of the evidence used in its guidelines, and neither reports measures of the uncertainty of estimates (e.g., confidence intervals). However, one study included an independently tested model, and one other study used a statistically sound prediction model.

What do these findings mean? Current guideline statements on MRI prediction in RRMS often rely on associations and rarely employ well-validated methods. The multiple sclerosis scientific community needs to set minimum standards for the evidence accepted to support individualized prediction and to rank and assess the contribution of each type of evidence.

Keywords

relapsing-remitting multiple sclerosis, prediction, disease-modifying therapy, personalized MS-treatment, treatment monitoring

Background

Personalized (or precision) medicine aims to tailor prevention, diagnosis, and treatment strategies to individual patients rather than applying “one-size-fits-all” approaches. This requires understanding which patients are likely to benefit or be harmed by a particular treatment, based on their molecular, imaging, or clinical profiles. Prediction models estimate the probability of a future clinical outcome (e.g., treatment response, relapse, survival) based on patient-specific features. They can be prognostic (predict individual disease course regardless of treatment) or predictive (estimate differential treatment benefit or harm). Prediction models make personalized medicine operational, but only through shared decision-making (SDM) do individualized predictions translate into care choices aligned with patient values.

There is a triangular relationship between 1) personalized medicine (“why?” or “under what circumstances?”), 2) prediction models (“how?” or “how are decisions made?”), and 3) shared decision-making (“who?” or “who participated in the decision?”), as shown in Figure 1. The quality of prediction models is the keystone of this triangle: only when models are well-validated and clinically useful can personalization be trustworthy and decisions be genuinely shared.

Figure 1.

The triangular relationship between “Personalized Medicine”, “Prediction Model” and “Shared Decision-Making”.

If prediction models are valid and robust, the flow from data → personalization → shared decisions is coherent. When methodologically inappropriate models provide individual prediction, both personalization and shared decision-making are compromised: the triangle collapses into opinion-based medicine rather than evidence-based personalization.

The treatment of multiple sclerosis (MS) involves a multifaceted approach to personalized medicine and encompasses various types of immunotherapy with different mechanisms of action, routes of administration, and risk profiles.¹ New or enlarging T2 lesions despite immunotherapy indicate treatment failure. If breakthrough disease cannot be ruled out and the patient is adhering to treatment, the option is to escalate therapy. Timely optimization of treatment can prevent further relapses and delay disability.^2-6

Treatment response monitoring uses prognostic models for the future disease course based on valid response biomarkers. Early treatment failure detection and consequent treatment switches or escalation enable optimal individual management in relapsing-remitting multiple sclerosis (RRMS). The MAGNIMS (Magnetic Resonance Imaging in MS) group and CMSWG (Canadian Multiple Sclerosis Working Group) guidelines have made important contributions to the treatment and management of MS based on MRI biomarkers.

The MAGNIMS guidelines^5,6 provide standardized MRI protocols and criteria for diagnosis, prognosis, and monitoring disease activity, which are integral for SDM. For example, a patient on a disease-modifying treatment (DMT) who shows new T2 lesions despite no clinical relapse may be identified as having radiological disease activity, prompting a switch to a more effective treatment option.

CMSWG also focuses on optimizing the clinical utility of MRI in MS.^2,3 Its guidance supports the consistent use of MRI for the timely detection of disease progression, so that any necessary changes in treatment strategy can be implemented on time. It also encourages the use of imaging evidence to facilitate SDM.

A survey among US neurologists revealed that 97% of physicians adhere to the recommendation of monitoring relapsing-remitting MS (RRMS) within 12 months by MRI. Furthermore, when two or more new T2 lesions occur, 67% of patients undergo a treatment switch.⁷ While neither MAGNIMS nor CMSWG issues direct treatment algorithms, they underpin the SDM process in RRMS by defining how MRI is used.

MAGNIMS and CMSWG define practices that lead physicians to recommend treatment changes to their patients following an unfavorable MRI assessment.^2,5,6 However, systematic reviews on prediction models in MS^8,9 concluded that most prediction models are inadequately validated and not yet recommended for clinical use, highlighting the need for the MS field to develop a more nuanced understanding of the elements that constitute a high-quality individual prediction.

Both guidelines enhanced interpretability by standardizing definitions and acquisition protocols. They improved the clarity and consistency of MRI findings, thereby making it easier for clinicians and algorithms to make accurate predictions. Predictions can be made sooner, which improves opportunities for early intervention. Timeliness is enhanced by ensuring early and regular imaging, while relevance is augmented by directly linking MRI findings to treatment and prognosis in RRMS. This ensures that MRI-based predictions are clinically meaningful.

However, in addition to its practical utility, a high-quality prediction must also be statistically sound. The present paper focuses on the evidence for prediction quality in the MAGNIMS and CMSWG guidelines. To this end, we perform a qualitative content analysis in which we collect, critically assess, and classify the methodologies that both guidelines and their updates rely on when presenting evidence for the predictive potential of MRI-based measures. Our goal is to contribute to a methodological roadmap for future guideline development.

Despite the argument that guidelines are not designed as methodological gold standards, but as pragmatic tools, we examine the MAGNIMS and CMSWG guidelines with a methodological and biostatistical focus. Medical guidelines sit in a tension between methodological rigor and clinical practicality: if they privilege one side, the “personalized medicine–prediction model–shared decision-making” triangle tilts, but when both are balanced, guidelines become the bridge that makes robust methodology usable in everyday patient care. Balanced guidelines should explicitly state model quality criteria (validation, calibration, clinical utility) and provide tools for communication in practice (risk calculators, patient decision aids). The following review elucidates the context of prediction in both guidelines. Only when guidelines combine methodological rigor with clinical practicality do prediction models serve as a stable foundation for personalized medicine and shared decision-making in the treatment of RRMS patients.

Methods

A qualitative content analysis (QCA)¹⁰ was performed on all articles to which both guidelines and their extensions refer, as well as the papers in the generations before (chains of citation). It is a structured yet interpretive method of analyzing textual material, aiming to uncover meanings, patterns, and themes. In this specific case, it critically assesses the methodology used to claim evidence for prognostic statements. For more details, see Supplementary Text 1. The methodological strategies were then classified based on four criteria (accuracy, discrimination, calibration, robustness) outlined in Table 1.

Table 1.

Evidence for Predictive Validity Provided by Specific Biostatistical Methods

Method	What it provides	Accuracy (A)	Calibration (C)	Discrimination (D)	Robustness (R)	Comment
Group comparisons (responders vs. non-responders)	Retrospective contrasts of MRI outcomes	✗	✗	✗	✗	No individual risk quantification
Sensitivity & Specificity	Conditional probabilities of MRI outcome given disease outcome	✗	✗	✗	✗	Do not translate into individual risk
Odds Ratios/Hazard Ratios	Relative measures of association	✗	✗	✗	✗	Require baseline risk to inform prediction
Prentice criteria/PTE	Association & mediation at population level	✗	✗	✗	✗	Insensitive to individual variability
Likelihood Ratio Test (LRT)	Compares model fit with/without MRI	✗	✗	✗	✗	Significance depends on sample size
Predictive Values (PPV, NPV)	Risk within MRI-defined groups	✓	✓	✓	✓	Valid if CIs reported
Kaplan–Meier curves	Time-dependent risk estimates	✓	✓	✓	✓	CIs quantify precision of estimates
Correlation/r²	Shared variance between MRI & outcome	✓	✗	✓	✓	Thresholds needed to define good prediction
LRF/Mutual Information	Variance reduction in complex models	✓	✗	✓	✓	Not used in guidelines; thresholds required
Prediction models (validated)	Externally validated AUC + calibration	✓	✓	✓	✓	Gold standard

✗ = No, ✓ = Yes

The literature was screened by two authors (DM and UM), and the results were discussed and harmonized by three authors (DM, SB, and UM). Medical issues were discussed with our clinical partner (JH). For each reference in the guidelines, a search was conducted for the paper containing the original results, and the chain of cross-references was recorded. A detailed content analysis of the predictive statements made is provided in the supplement.

The QCA defines a prediction rule as “high-quality” if it is accurate, reliable, and actionable. Accuracy refers to the closeness of the prediction to the true occurrence of the future event. It is assessed by the degree to which predicted probabilities correspond to observed frequencies (calibration) and by the ability of the model to distinguish between different outcomes (discrimination).¹¹ Reliability reflects the robustness and generalizability of the model, i.e., its ability to perform consistently across different settings, including an external validation on independent data.¹² Actionability emphasizes that predictions must support practical SDM. This requires that predictions are interpretable, fostering user trust and understanding of the rationale behind the model outputs.¹³ Predictions must also be timely and relevant, meaning that they are delivered early enough to inform decisions and address the clinical question within its proper context.¹⁴

The following paragraphs explain how we judge the specific methodology that we found in the material studied:

A significant difference in MRI outcomes between response groups is sometimes interpreted as predictive. However, such retrospective contrasts cannot quantify the predictive value of MRI and do not establish individual risk or provide evidence for accuracy, calibration, discrimination, or robustness (A, C, D, R). For example, Tomassini and colleagues,¹⁵ cited in three guidelines, compared baseline MRI between responders and suboptimal responders after six years.

Sensitivity (probability of an unfavorable MRI outcome among patients with unfavorable clinical outcomes) and specificity (probability of a favorable MRI outcome among patients with favorable outcomes) are also cited as predictive, but they likewise fail to establish individual risk or provide evidence for A, C, D, R. Durelli et al.,¹⁶ cited in MAGNIMS 2015 and 2020, reported sensitivity and specificity for MRI activity and neutralizing antibody (NAb) positivity at six months in relation to worsening at 18 months.
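
Why sensitivity and specificity alone do not give an individual risk can be illustrated with Bayes' theorem: the same pair of values yields very different risks after an unfavorable MRI, depending on how common the unfavorable clinical outcome is. The following is a minimal sketch under stated assumptions. The sensitivity, specificity, and prevalence values are hypothetical and are not taken from Durelli et al. or any other cited study, and the function name is illustrative.

```python
def ppv_from_sens_spec(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Risk of an unfavorable clinical outcome given an unfavorable MRI (Bayes' theorem)."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


# Hypothetical test characteristics (assumption, for illustration only)
SENSITIVITY = 0.80
SPECIFICITY = 0.80

# Same sensitivity and specificity, three hypothetical outcome prevalences
risks = {
    prevalence: ppv_from_sens_spec(SENSITIVITY, SPECIFICITY, prevalence)
    for prevalence in (0.10, 0.30, 0.50)
}

for prevalence, risk in risks.items():
    print(f"outcome prevalence {prevalence:.0%}: risk after unfavorable MRI = {risk:.1%}")
```

Under these assumed values, the risk attached to the same MRI finding varies widely with prevalence, which is why a prospective risk estimate (with its confidence interval) is needed in addition to sensitivity and specificity.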

Odds ratios (OR) and hazard ratios (HR) are relative measures. Without the baseline risk (the probability of an event when all risk factors take their predefined baseline values), they cannot quantify individual prediction or provide evidence for A, C, D, R. Río et al.,¹⁷ cited in three guidelines, reported a high OR (8.3, 95% CI 3.1–21.9) for MRI at 12 months predicting disability progression at two years.
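
The role of the baseline risk can be sketched numerically. The example below converts the odds ratio of 8.3 quoted above into absolute risks for three baseline risks. The baseline risks are hypothetical assumptions, since the quoted result does not supply them, and the function name is illustrative.

```python
def risk_from_odds_ratio(odds_ratio: float, baseline_risk: float) -> float:
    """Absolute risk in the exposed group, given an odds ratio and the baseline risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1.0 + exposed_odds)


ODDS_RATIO = 8.3  # point estimate quoted in the text

# Hypothetical baseline risks of progression without MRI activity (assumption)
absolute_risks = {
    baseline: risk_from_odds_ratio(ODDS_RATIO, baseline)
    for baseline in (0.05, 0.20, 0.40)
}

for baseline, risk in absolute_risks.items():
    print(f"baseline risk {baseline:.0%} -> risk with active MRI {risk:.1%}")
```

The same odds ratio is compatible with an absolute risk of roughly one third or of more than four fifths, depending on the assumed baseline. This is the sense in which a relative measure alone does not quantify individual risk.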

The Prentice criteria test whether treatment affects both clinical and MRI outcomes, whether MRI is associated with clinical outcome, and whether MRI mediates a treatment effect (Prentice, 1989).¹⁸ From these, the proportion of treatment effect explained (PTE) can be derived using estimates from appropriate regression models.¹⁹ However, these regression-based estimates summarize effects for the entire population and do not tell us how well the treatment or MRI predicts outcomes for an individual patient. Thus, they provide no evidence for A, C, D, R. An example is Sormani et al.,²⁰ cited in both MAGNIMS guidelines, which linked T2 lesion number and relapse rate using PTE. Supplementary Text 2 illustrates how PTE remains equal to 1 regardless of individual variability, again failing to support individual prediction.

The likelihood ratio test (LRT) compares model fit with and without MRI outcomes. A significant result only shows that including MRI improves model fit, not that it improves individual prediction, especially in large samples. Thus, LRT does not provide evidence for A, C, D, R. Prosperini et al.,²¹ cited in all four guidelines, used LRT and Cox models to identify predictors of EDSS worsening.
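
The sample-size dependence of the LRT can be made explicit in a simple special case. Assuming a Gaussian linear model in which adding the MRI covariate explains a fixed share r² of the remaining variance, the LRT statistic equals -n · ln(1 - r²). The sketch below evaluates this for an assumed r² of 0.01 (a hypothetical, deliberately weak predictor) at several hypothetical sample sizes.

```python
import math

CHI2_CRITICAL_1DF = 3.841  # 95th percentile of the chi-square distribution, 1 degree of freedom


def lrt_statistic(n: int, r_squared: float) -> float:
    """LRT statistic for adding one covariate to a Gaussian linear model.

    r_squared is the share of residual variance explained by the added covariate.
    """
    return -n * math.log(1.0 - r_squared)


R_SQUARED = 0.01  # hypothetical: MRI explains only 1% of the variance

lrt_by_n = {n: lrt_statistic(n, R_SQUARED) for n in (100, 400, 2000)}

for n, statistic in lrt_by_n.items():
    verdict = "significant" if statistic > CHI2_CRITICAL_1DF else "not significant"
    print(f"n = {n:5d}: LRT = {statistic:6.2f} ({verdict} at the 5% level)")
```

In this sketch the predictive contribution is identical in all three scenarios, yet the test changes from non-significant to highly significant as n grows.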

Predictive values (positive predictive value, PPV; negative predictive value, NPV) quantify risks within MRI-defined groups and, when accompanied by confidence intervals, provide evidence for A, C, D, R. Prosperini et al.²² reported PPVs and NPVs with CIs to define lesion thresholds predictive of worsening during four years.
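
As a minimal sketch of how predictive values with confidence intervals can be obtained from a 2×2 table, the example below computes PPV and NPV with Wilson score intervals. The counts are hypothetical and are not taken from Prosperini et al.; the Wilson interval is one common choice among several, not necessarily the method used in the cited study.

```python
import math
from typing import Tuple


def wilson_ci(events: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = events / total
    denominator = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical 2x2 table (assumption, for illustration only)
mri_unfavorable_total = 50
mri_unfavorable_worsened = 36   # worsened despite treatment
mri_favorable_total = 150
mri_favorable_stable = 120      # remained stable

ppv = mri_unfavorable_worsened / mri_unfavorable_total
npv = mri_favorable_stable / mri_favorable_total
ppv_ci = wilson_ci(mri_unfavorable_worsened, mri_unfavorable_total)
npv_ci = wilson_ci(mri_favorable_stable, mri_favorable_total)

print(f"PPV = {ppv:.1%} (95% CI {ppv_ci[0]:.1%} to {ppv_ci[1]:.1%})")
print(f"NPV = {npv:.1%} (95% CI {npv_ci[0]:.1%} to {npv_ci[1]:.1%})")
```

The width of the interval shows directly how precisely the risk in each MRI-defined group is known, which a point estimate alone cannot convey.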

Kaplan–Meier curves estimate time-dependent risks within MRI-defined groups. When risk estimates and CIs are reported, they provide evidence for A, C, D, R. For example, Sormani et al.²³ compared disability progression over time between baseline MRI groups, though without CIs and using the same data as for model development (no external or internal validation).
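
A horizon-specific risk with its confidence interval can be read off a Kaplan–Meier estimate as sketched below. The follow-up data are hypothetical, the interval is the plain Greenwood interval (other constructions, such as log-log intervals, are often preferred in small samples), and the function name is illustrative.

```python
import math
from typing import List, Tuple


def km_risk_at(times: List[float], events: List[int], horizon: float,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Kaplan-Meier event risk by `horizon` with a plain Greenwood confidence interval.

    times  : follow-up time per patient
    events : 1 = event observed at that time, 0 = censored
    """
    survival = 1.0
    greenwood_sum = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - n_events / at_risk
        if at_risk > n_events:  # Greenwood term is undefined once everyone at risk has the event
            greenwood_sum += n_events / (at_risk * (at_risk - n_events))
    standard_error = survival * math.sqrt(greenwood_sum)
    risk = 1.0 - survival
    return risk, max(0.0, risk - z * standard_error), min(1.0, risk + z * standard_error)


# Hypothetical follow-up (months) in one MRI-defined group; 1 = progression, 0 = censored
follow_up = [3, 6, 6, 9, 12, 15, 18, 24, 24, 24]
progressed = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

risk_24, lower_24, upper_24 = km_risk_at(follow_up, progressed, horizon=24)
print(f"24-month risk = {risk_24:.1%} (95% CI {lower_24:.1%} to {upper_24:.1%})")
```

With only ten hypothetical patients the interval is very wide, which illustrates why a Kaplan–Meier curve without confidence limits says little about the precision of the risk estimate.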

Correlation coefficients (r) and coefficients of determination (r²) are frequently reported. A significant correlation alone does not imply predictive utility. Thresholds for r² and lower confidence bounds above those thresholds are needed to claim precise prediction. This provides evidence for A, D, R, but not calibration. Rudick et al.,²⁴ cited in CMSWG 2020, reported only Pearson correlations between T2 lesions and disease severity. The figure in Supplementary Text 2 provides insight into how correlation relates to the precision of prediction.
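
One way to see why a significant correlation need not imply precise prediction: in simple linear regression with constant residual variance, the standard deviation of the outcome around the predicted value is the unconditional standard deviation multiplied by √(1 − r²). The sketch below tabulates this factor for a few illustrative values of r. It rests on that simple-regression assumption and is not a reproduction of the figure in Supplementary Text 2.

```python
import math


def residual_sd_fraction(r: float) -> float:
    """Residual SD as a fraction of the outcome SD in simple linear regression."""
    return math.sqrt(1.0 - r ** 2)


# Illustrative correlation values (assumption)
fractions = {r: residual_sd_fraction(r) for r in (0.3, 0.5, 0.7, 0.9)}

for r, fraction in fractions.items():
    print(f"r = {r:.1f}: r^2 = {r ** 2:.2f}, "
          f"prediction scatter is {fraction:.0%} of the scatter without MRI")
```

Even a correlation of 0.5 narrows the scatter of individual predictions only modestly under this assumption, which motivates the call for explicit thresholds on r².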

The likelihood reduction factor (LRF) and mutual information (MI) extend r² to more complex models. Both quantify reduction in variability, with higher values indicating stronger prediction. Evidence for A, D, R is possible if thresholds are defined, but no guideline cites this method.

Finally, prediction models are methodologically sound only if discrimination (by the area under the curve - AUC) and calibration (intercept and slope) are externally validated. Internal AUCs alone are insufficient, because a model can seem accurate on the original data yet provide systematically incorrect risk estimates for new patients, resulting in miscalibration and poor generalizability. Properly validated models provide evidence for A, C, D, R. For example, Rudick et al.²⁵ selected models based on AUC.
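
The difference between discrimination and calibration can be sketched with two sets of hypothetical predictions for the same 20 hypothetical patients. Both rank patients identically and therefore share the same AUC, but only one is well calibrated. Here the intercept and slope are estimated jointly in a logistic recalibration model; estimating the intercept with the slope fixed at 1 (calibration-in-the-large) is a common alternative. All names and data are illustrative assumptions.

```python
import math
from typing import List, Tuple


def auc(predicted: List[float], outcomes: List[int]) -> float:
    """Probability that a random patient with the event has a higher prediction (ties count 0.5)."""
    with_event = [p for p, y in zip(predicted, outcomes) if y == 1]
    without_event = [p for p, y in zip(predicted, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in with_event for q in without_event)
    return wins / (len(with_event) * len(without_event))


def calibration_intercept_slope(predicted: List[float], outcomes: List[int],
                                iterations: int = 25) -> Tuple[float, float]:
    """Fit logit(P(y = 1)) = a + b * logit(predicted) by Newton-Raphson."""
    x = [math.log(p / (1.0 - p)) for p in predicted]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        mu = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g_a = sum(y - m for y, m in zip(outcomes, mu))
        g_b = sum((y - m) * xi for y, m, xi in zip(outcomes, mu, x))
        # Information matrix (negative Hessian)
        w = [m * (1.0 - m) for m in mu]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab ** 2
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical validation data: 10 low-risk patients (2 events), 10 high-risk patients (8 events)
outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
calibrated_pred = [0.2] * 10 + [0.8] * 10      # matches the observed frequencies
overconfident_pred = [0.1] * 10 + [0.9] * 10   # too extreme in both groups

for label, pred in (("calibrated", calibrated_pred), ("overconfident", overconfident_pred)):
    intercept, slope = calibration_intercept_slope(pred, outcomes)
    print(f"{label}: AUC = {auc(pred, outcomes):.2f}, "
          f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

A slope clearly below 1 signals predictions that are too extreme, even though the AUC is unchanged. This is why reporting an AUC alone, and especially an internal one, is not sufficient.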

Results

MAGNIMS

In MAGNIMS 2015,⁵ we identified four predictive statements on the role of MRI outcomes for future disease course (Supplementary Figure 1). Most references were secondary citations rather than original studies. Supplementary Tables 1–2 summarize methodological details. Among cited references, five reported predictive values, one reported a prediction model with missing external validation, seven provided Prentice criteria or PTE values, seven used odds or hazard ratios without full model specification, and four cited correlations without thresholds for clinically relevant precision.

In MAGNIMS 2021,⁶ five predictive statements were identified (Supplementary Figure 2; Tables 3–4). Again, most references cited secondary sources. Six reported predictive values, two Prentice criteria or PTE, six odds or hazard ratios without full model specification, and one correlation without thresholds.

CMSWG

The 2013 edition² contained six predictive statements (Supplementary Figure 3; Tables 5–6). Sources again relied on secondary citations. Two papers presented prediction models with no external validation, four reported predictive values, seven cited odds or hazard ratios without reporting the full models or baseline risks, and two provided correlations without thresholds.

In the 2020 update,³ four predictive statements were identified (Supplementary Figure 4; Tables 7–8). Eleven sources reported predictive values, one Prentice criteria/PTE, eleven odds or hazard ratios without models or baseline risks, and two correlations without thresholds.

In summary, across all four guidelines, predictive statements on MRI outcomes were frequent but largely based on secondary citations. Most sources reported relative measures such as odds ratios or hazard ratios. Some sources reported correlations or Prentice criteria (PTE). Only a minority presented predictive values, and a few described prediction models, generally without external validation or calibration. None of the guidelines cited advanced measures such as the LRF. An overview is provided in Table 2.

Table 2.

Synopsis of Methodology Across the Guideline Documents

Methodology	MAGNIMS 2015	MAGNIMS 2021	CMSWG 2013	CMSWG 2020
Group comparison	M: 6 S: 0	M: 6 S: 0	M: 7 S: 0	M: 7 S: 0
Sensitivity, Specificity	M: 2 S: 0	M: 4 S: 0	M: 3 S: 0	M: 4 S: 0
OR, HR	M: 7 S: 0	M: 6 S: 0	M: 7 S: 0	M: 11 S: 0
Prentice criteria, PTE	M: 7	M: 2	M: 0	M: 1
Likelihood Ratio Test (LRT)	M: 1	M: 2	M: 1	M: 1
Predictive values	S: 3	S: 4	S: 3	S: 6
Subgroup-specific Kaplan-Meier curves	S: 2	S: 2	S: 1	S: 5
Correlation	M: 4 S: 0	M: 1 S: 0	M: 2 S: 0	M: 2 S: 0
Likelihood reduction factor (LRF)	M: 0 S: 0	M: 0 S: 0	M: 0 S: 0	M: 0 S: 0
Prediction models	M: 1 S: 0	M: 0 S: 0	M: 2 S: 0	M: 0 S: 0
Percentage of sound methods	M: 28 S: 5 15.2%	M: 21 S: 6 22.2%	M: 22 S: 4 15.4%	M: 26 S: 11 29.7%

S: Number of sources using a methodology that can provide reliable information on prediction quality (sound).

M: Number of sources using a methodology that cannot provide reliable information on prediction quality (misleading).

Most of the information provided does not allow patient-related risks to be quantified.
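
The totals and percentages in the last row of Table 2 follow directly from the per-method counts. The short sketch below recomputes them from the counts listed in the table; the variable names are illustrative.

```python
GUIDELINES = ["MAGNIMS 2015", "MAGNIMS 2021", "CMSWG 2013", "CMSWG 2020"]

# Counts from Table 2, one entry per guideline in the order above
MISLEADING = {
    "Group comparison": [6, 6, 7, 7],
    "Sensitivity, specificity": [2, 4, 3, 4],
    "OR, HR": [7, 6, 7, 11],
    "Prentice criteria, PTE": [7, 2, 0, 1],
    "Likelihood ratio test": [1, 2, 1, 1],
    "Correlation": [4, 1, 2, 2],
    "Likelihood reduction factor": [0, 0, 0, 0],
    "Prediction models": [1, 0, 2, 0],
}
SOUND = {
    "Predictive values": [3, 4, 3, 6],
    "Subgroup-specific Kaplan-Meier curves": [2, 2, 1, 5],
}

misleading_totals = [sum(counts[i] for counts in MISLEADING.values()) for i in range(4)]
sound_totals = [sum(counts[i] for counts in SOUND.values()) for i in range(4)]
sound_percentages = [
    round(100.0 * s / (s + m), 1) for s, m in zip(sound_totals, misleading_totals)
]

for name, m, s, pct in zip(GUIDELINES, misleading_totals, sound_totals, sound_percentages):
    print(f"{name}: M = {m}, S = {s}, sound = {pct}%")
```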

Discussion

Clinically actionable prediction in RRMS requires transparent risk statements with uncertainty and prediction horizon, not association metrics. A statement such as “At 12 months after treatment initiation, two new T2 lesions are observed; the two-year risk of disease activity or EDSS progression is 72% (95% CI 67–77%)” meets this bar when the risk estimate is based on a calibrated, discriminating, and decision-ready model: it pairs a point estimate with quantified uncertainty and a clear prediction horizon.

Such statements can be obtained via: (i) predictive values when both MRI and outcome are dichotomized, with 95% CIs; (ii) Kaplan–Meier analyses that estimate time-dependent risks within MRI-defined strata, allowing horizon-specific predictive values and CIs; and (iii) well-calibrated, externally validated prediction models that report individual risks with calibration (intercept, slope) and discrimination (AUC).^11,26

Rudick et al.²⁵ provided a seminal paper that explored definitions of treatment response to interferon-β in MS by classifying patients according to relapse counts and on-treatment MRI activity. Their analysis showed that subgroups with high numbers of new MRI lesions during therapy exhibited greater clinical and structural progression, whereas relapse-based definitions lacked specificity. Importantly, baseline characteristics could not reliably predict response status, highlighting the limitations of pre-treatment prognostication. While this work illustrates the potential of dynamic, on-treatment markers to refine response classification, it remains essentially a subgrouping exercise rather than a validated individual-level prediction model.

Efthimiou et al.²⁷ present a guidance paper on the appropriate methodology for developing and validating predictions of RRMS disease progression. The paper provides a 13-step guide. The authors develop a Bayesian logistic mixed-effects prediction model and test its calibration via a calibration curve and its discrimination via the area under the curve (AUC). Furthermore, the clinical utility of the model is assessed using decision curve analysis, and suitable thresholds are provided.

Frequently, studies assess predictive potential rather than deliver individualized risks, using R² (coefficient of determination) or its generalization, the likelihood reduction factor (LRF).^28,29 Both quantify the reduction in uncertainty achieved by a regression model. The LRF is also suitable for logistic, Poisson, or survival models. For example, consider an RRMS logistic model including new T2 lesions, treatment, age, and EDSS that yields LRF = 0.27 (95% CI 0.23–0.32). This number needs a qualitative interpretation: it can be seen as a modest information gain relative to an uninformative model that does not consider any patient-related characteristics.

To judge clinical relevance, thresholds for R²/LRF are needed. They can help to decide whether it is worth developing a full predictive model. RRMS studies rarely prespecify them. As a cross-domain reference, IQWiG³⁰ considers LRF < 0.50 clinically irrelevant in oncology. Establishing RRMS-specific thresholds is a priority for future research.

Our review also highlights the frequent reliance on methods that do not support individualized prediction, such as the Prentice criteria or PTE. Future guidelines should prioritize evidence that meets the standards of accuracy, calibration, discrimination, and robustness, drawing on recent best-practice papers such as that by Efthimiou et al.²⁷ Figure 2 shows the evidence hierarchy.

Figure 2.

The hierarchy of statistical evidence for individualized prediction in relapsing-remitting multiple sclerosis.

Guideline development should pivot from association-centric evidence towards risk-centric, validated predictions. Guidelines should report PPV/NPV (with CIs) or horizon-specific KM-based risks, and adopt externally validated models with calibration metrics. Based on our discussions, we propose a methodological roadmap for future guideline development, presented in Supplementary Text 3.

To close the loop to practice, guidelines should also incorporate decision curve analysis to quantify the net benefit of treatment for individual patients across relevant disease-progression risk-thresholds, demonstrating whether using a prediction model actually improves treatment decisions compared with treating all or none.^31-33 Recent methodological primers²⁷ offer a template for this shift.
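
The net benefit used in decision curve analysis can be sketched as follows. At a risk threshold p_t, the net benefit of a strategy is the proportion of true positives minus the proportion of false positives weighted by p_t / (1 − p_t). The example compares a hypothetical model with the "treat all" and "treat none" strategies. Patients, predictions, and thresholds are assumptions for illustration, and "treat" stands for any threshold-based action such as treatment escalation.

```python
from typing import List


def net_benefit_model(predicted: List[float], outcomes: List[int], threshold: float) -> float:
    """Net benefit of acting on every patient whose predicted risk reaches the threshold."""
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: List[int], threshold: float) -> float:
    """Net benefit of acting on every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


NET_BENEFIT_TREAT_NONE = 0.0  # by definition

# Hypothetical cohort: 10 low-risk patients (2 events), 10 high-risk patients (8 events)
outcomes = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]
predicted = [0.2] * 10 + [0.8] * 10

decision_curve = {
    threshold: (
        net_benefit_model(predicted, outcomes, threshold),
        net_benefit_treat_all(outcomes, threshold),
    )
    for threshold in (0.3, 0.5)
}

for threshold, (model_nb, all_nb) in decision_curve.items():
    print(f"threshold {threshold:.0%}: model = {model_nb:.3f}, "
          f"treat all = {all_nb:.3f}, treat none = {NET_BENEFIT_TREAT_NONE:.3f}")
```

A model is clinically useful at a given threshold only if its net benefit exceeds that of both default strategies, which is the comparison a decision curve displays across the range of relevant thresholds.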

Finally, our focus on four guidelines and on statistical criteria is a limitation; nonetheless, the pattern is consistent: association is abundant; individualized, calibrated risk is scarce. Redirecting evidence standards toward A, C, D, R (see Table 1) will make MRI-based prediction trustworthy and actionable for SDM.

Conclusion

Our review shows that guideline statements from MAGNIMS and CMSWG regarding the predictive value of MRI in RRMS frequently depend on association-based methods, such as correlations, odds ratios, hazard ratios, or Prentice criteria, which are unable to quantify the quality of individual predictions. In contrast, approaches such as predictive values with confidence intervals, subgroup-specific Kaplan–Meier curves, the mutual information-based LRF or externally validated models with calibration and discrimination are the exception rather than the rule.

This imbalance is important because prediction forms the basis of individualized, evidence-based shared decision-making (SDM) in MS care. MRI remains clinically valuable, but the way predictive evidence is currently assessed and reported should be adapted in the future to meet the requirements of modern predictive research.

Future guidelines should therefore set minimum requirements for predictive claims:

(1) Report prospective risk estimates (e.g., predictive values or survival-based estimates) alongside confidence intervals.

(2) Ensure that prediction models are externally validated with calibration and discrimination metrics.

(3) Define thresholds for predictive usefulness (e.g., minimal R² or LRF values).

(4) Evaluate clinical utility with decision curve analysis.

Adopting such standards, similar to initiatives like TRIPOD,³⁴ would greatly improve the interpretability, timeliness and relevance of MRI-based predictions in RRMS and provide a stronger basis for clinical guidelines.

Although MRI remains central to RRMS care, only risk estimates with confidence intervals, validated models, and demonstrated clinical utility can provide trustworthy predictions for individualized treatment decisions.

Supplemental Material

Supplemental Material for Correlation Is Not Prediction: Reassessing Predictive MRI Evidence in Guidelines for Persons With Relapsing-Remitting Multiple Sclerosis by Dulat Minas, Stefan Buchka, Joachim Havla and Ulrich Mansmann in Journal of Central Nervous System Disease

Footnotes

ORCID iD

Dulat Minas

Consent for Publication

No personal data is used, and no consent is necessary as the work relies only on published papers.

Author Contributions

DM screened the literature, performed the text analysis and excerpting, organized the material and the supplemental material, and wrote the manuscript. SB helped to settle conflicts during the text analysis, contributed biostatistical input, and contributed to the manuscript. JH contributed clinical aspects, advised the team as clinical expert, and contributed to the manuscript. UM conceived the paper and supervised the project. He contributed to the text analysis, gave statistical counselling, and wrote part of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dulat Minas is funded by the German Academic Exchange Service DAAD (grant number: 91899630); Stefan Buchka is funded via DIFUTURE (grant number: BMBF 01ZZ1804C), and Joachim Havla by DIFUTURE (grant number: BMBF 01ZZ1804B).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The author Joachim Havla reports a grant for OCT research from the Friedrich-Baur-Stiftung, Amgen/Horizon, Sanofi, Roche, and Merck; personal fees and nonfinancial support from Merck, Alexion, Novartis, Roche, Celgene, Biogen, Bayer, Neuraxpharm, and Horizon/Amgen; and nonfinancial support from the Sumaira-Foundation and Guthy-Jackson Charitable Foundation. All this support is outside the submitted work. The remaining three authors declare no conflict of interest.

Data Availability Statement

All data are presented in the electronic supplement.

Supplemental Material

Supplemental Material for this article is available online.
