Abstract
Background:
Multiple sclerosis (MS) is a chronic neuroinflammatory disease affecting about 2.8 million people worldwide. Disease course after the most common diagnoses of relapsing-remitting multiple sclerosis (RRMS) and clinically isolated syndrome (CIS) is highly variable and cannot be reliably predicted. This impairs early personalized treatment decisions.
Objectives:
The main objective of this study was to algorithmically support clinical decision-making regarding the options of early platform medication or no immediate treatment of patients with early RRMS and CIS.
Design:
Retrospective monocentric cohort study within the Data Integration for Future Medicine (DIFUTURE) Consortium.
Methods:
Multiple data sources of routine clinical, imaging and laboratory data derived from a large and deeply characterized cohort of patients with MS were integrated to conduct a retrospective study to create and internally validate a treatment decision score [Multiple Sclerosis Treatment Decision Score (MS-TDS)] through model-based random forests (RFs). The MS-TDS predicts the probability of no new or enlarging lesions in cerebral magnetic resonance images (cMRIs) between 6 and 24 months after the first cMRI.
Results:
Data from 65 predictors collected for 475 patients between 2008 and 2017 were included. No medication and platform medication were administered to 277 (58.3%) and 198 (41.7%) patients. The MS-TDS predicted individual outcomes with a cross-validated area under the receiver operating characteristics curve (AUROC) of 0.624. The respective RF prediction model provides patient-specific MS-TDS and probabilities of treatment success. The latter may increase by 5–20% for half of the patients if the treatment considered superior by the MS-TDS is used.
Conclusion:
Routine clinical data from multiple sources can be successfully integrated to build prediction models to support treatment decision-making. In this study, the resulting MS-TDS estimates individualized treatment success probabilities that can identify patients who benefit from early platform medication. External validation of the MS-TDS is required, and a prospective study is currently being conducted. In addition, the clinical relevance of the MS-TDS needs to be established.
Introduction
Multiple sclerosis (MS) is a chronic neuroinflammatory disease affecting more than 200,000 people in Germany and 2.8 million people worldwide.1,2 At the time the disease becomes symptomatic, it is classified as clinically isolated syndrome (CIS), relapsing-remitting multiple sclerosis (RRMS), or primary progressive multiple sclerosis (PPMS). CIS is a patient’s first clinical event that does not meet the criteria of dissemination in both time and space. 3 Many patients with CIS are likely to convert to RRMS later on. A number of disease-modifying therapy (DMT) options have been approved and are available for treatment of patients with RRMS and CIS. Treatment with DMT is most efficacious during the early phase of the disease, and the efficacy of DMTs decreases over time, especially when the disease converts into secondary progressive MS.
Although many patients take advantage of early DMT, long-term studies have demonstrated that a proportion of patients with CIS and MS who are not treated with DMT do not acquire significant disability even decades after diagnosis. 4 Given the increase in prevalence over the last decades 5 and the observation of a much better prognosis of recently diagnosed patients, which cannot be fully explained by the availability of DMT, 6 it is conceivable that a subset of patients with MS or CIS may not require long-term DMT treatment. Identifying these patients may not only protect them from possible side effects of DMTs, which often go along with impaired quality of life, but may also avoid significant costs for the health care system. Thus, algorithms that stratify patients with respect to prognosis and treatment response are warranted.
The course of the disease is difficult to predict from the onset and varies greatly among patients. Therefore, various data sources have been used to identify prognostic factors and build multivariable predictive models for disease progression through statistical modelling and machine learning. The results of several recent systematic reviews show that there is a broad awareness of the relevance of the research question, which is reflected in the extensive literature and the many proposed prognostic models.7–9 Their common conclusion, however, is that most of the reviewed studies and respective models are at high risk of bias and lack external validation. A few methodologically well-conducted examples with low risk of bias exist, but these models show only weak accuracy.10–12 A related and even more complex research question arises from the field of personalized medicine and concerns the modification of treatment effects by predictive factors. Appropriate predictive models should support individualized treatment recommendations based on a patient’s characteristics. As a practical consequence, patients requiring effective treatment at an early stage could be identified, as well as patients with an expected mild disease course who need not be exposed unnecessarily to the risk of adverse effects.
This retrospective monocentric cohort study (Retro-MS) was conducted to develop and internally validate a clinically relevant and individualized treatment decision score (Multiple Sclerosis Treatment Decision Score – MS-TDS) for newly diagnosed CIS and RRMS patients. The MS-TDS is intended to support the treating physician and the patient in making an informed decision between no medication and platform medication, based on the anticipated treatment success of each option. A further objective was to identify patient features from clinical, imaging and laboratory data that are predictive factors in this regard.
Methods
The Retro-MS cohort was formed from the routine care patients treated at the Department of Neurology at the Klinikum rechts der Isar of the Technical University of Munich (TUM) to create the MS-TDS by predictive modelling of existing multidimensional baseline data. The MS-TDS predicts the outcome of no new or enlarging T2-lesions13–16 in cerebral magnetic resonance images (cMRIs) of a newly diagnosed CIS or RRMS patient on platform or no medication between 6 and 24 months after their first cMRI using patient features collected at baseline. This outcome is considered to be a sensitive short-term surrogate of clinical disease activity observed in the long term. 17 Baseline was defined as the first date of cMRI available or DMT start date, whichever occurred first. Patients were followed up as long as they had eligible cMRI, which is defined as a cMRI acquired until 32 months after baseline or until the first cMRI acquired after 32 months.
Study population and sample
Patients treated at the TUM Department of Neurology at the Klinikum rechts der Isar during the years 2008–2017 were taken into account. They were diagnosed according to the 2005 and 2010 McDonald diagnostic criteria depending on the time point of diagnosis. They were all seen in the outpatient centre of the department at regular intervals, and clinical parameters related to disease activity and severity were recorded. To align the sample of Retro-MS with the study population of the ongoing prospective validation study ProVal-MS, 18 we applied the following selection criteria. Only patients diagnosed with CIS or RRMS no earlier than 3 years before and no later than 1 month after their first cMRI were included. They also had to be previously untreated, allowing for a DMT started no earlier than 6 months before their first cMRI. A period of 6 months was thereby considered as the run-in phase after which a medication becomes effective. 19 Patients with fewer than two cMRI images available or a difference of more than 32 months between their consecutive cMRI images were excluded. Patients whose data complied with the above rules were eligible for analysis. An illustration of these definitions is given in Figure 1 for an example patient.
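The selection criteria above can be read as a small set of date rules. The following is a minimal sketch of such a check, not the study's actual implementation: the `Patient` record, the function names and the conversion of months to days via an average month length are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumption: months are converted to days using the average month length.
DAYS_PER_MONTH = 30.4375


def months_between(start: date, end: date) -> float:
    """Signed difference end - start, expressed in average-length months."""
    return (end - start).days / DAYS_PER_MONTH


@dataclass
class Patient:
    """Hypothetical record layout, not the study's data model."""
    diagnosis_date: date
    cmri_dates: List[date]
    dmt_start: Optional[date] = None


def is_eligible(p: Patient) -> bool:
    scans = sorted(p.cmri_dates)
    # At least two cMRIs are required.
    if len(scans) < 2:
        return False
    first = scans[0]
    # Diagnosis no earlier than 36 months before and no later than
    # 1 month after the first cMRI.
    offset = months_between(first, p.diagnosis_date)
    if not (-36 <= offset <= 1):
        return False
    # Any DMT must have started no earlier than 6 months before the first cMRI.
    if p.dmt_start is not None and months_between(p.dmt_start, first) > 6:
        return False
    # No gap of more than 32 months between consecutive cMRIs.
    return all(months_between(a, b) <= 32 for a, b in zip(scans, scans[1:]))
```

Exclusions that depend on clinical review (for example pregnancy or technical problems in image analysis) are deliberately left out of this sketch.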

Figure 1. Patient-level data progression for an example patient.
Platform medication included the following DMTs: Glatiramer acetate, Interferon-beta 1a and 1b, Peg-Interferon, Dimethyl fumarate and Teriflunomide. A total of 12 patients, who received a more active DMT (e.g. Alemtuzumab, Cladribine, Natalizumab, Mitoxantrone, Rituximab) as first medication, were excluded from analysis. Each interval between two consecutive cMRI dates was assigned one of the two treatment regimens of no medication or platform medication, depending on which was used for the majority of the time within that interval.
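The majority-time rule for assigning a treatment regimen to each interval between consecutive cMRIs could be sketched as follows. The input format (a list of non-overlapping start and end dates of platform medication) and the handling of exact ties are assumptions; the source does not specify either.

```python
from datetime import date
from typing import List, Tuple

# (start, end) of a period on platform medication; hypothetical input format.
Period = Tuple[date, date]


def days_on_treatment(interval: Period, periods: List[Period]) -> int:
    """Days of the scan interval covered by non-overlapping treatment periods."""
    start, end = interval
    covered = 0
    for p_start, p_end in periods:
        overlap = (min(end, p_end) - max(start, p_start)).days
        covered += max(overlap, 0)
    return covered


def assign_regimen(interval: Period, periods: List[Period]) -> str:
    """Label the interval by the regimen used for the majority of its duration."""
    total = (interval[1] - interval[0]).days
    on = days_on_treatment(interval, periods)
    # Assumption: an exact 50/50 split is labelled "none".
    return "platform" if on > total / 2 else "none"
```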
Data
Because Retro-MS is based on routine clinical data, the timings of clinical assessments were not under the control of the investigators. Many feature values were collected at baseline, that is, within a 6-month time window around the first date of cMRI or DMT onset. If any data or measurements existed during the 3-month period prior to this date, then the latest of those measurements was considered the baseline value. Otherwise, the earliest measurement within the 3-month period after this date was considered the baseline value (Figure 1). Standardized cMRIs were available from the beginning of 2009 until the end of 2017. Baseline data were collected from 2008 onwards and the outcome assessment was limited to the period until the end of 2017. The timeframes and definitions of the features included in the analysis were agreed upon by the centres participating in the Retro-MS and ProVal-MS studies. Details are provided in Appendix 1.
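The rule for picking a baseline value from irregularly timed routine measurements can be illustrated with a short sketch. The 91-day approximation of 3 months, the treatment of a measurement taken exactly on the baseline date as "prior", and all names are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Assumption: 3 months are approximated as 91 days.
WINDOW = timedelta(days=91)

# (measurement date, value); hypothetical input format.
Measurement = Tuple[date, float]


def baseline_value(baseline: date, measurements: List[Measurement]) -> Optional[float]:
    """Latest value in the 3 months before baseline, else earliest in the 3 months after."""
    # Assumption: a measurement on the baseline date itself counts as "prior".
    before = [m for m in measurements if baseline - WINDOW <= m[0] <= baseline]
    if before:
        return max(before, key=lambda m: m[0])[1]
    after = [m for m in measurements if baseline < m[0] <= baseline + WINDOW]
    if after:
        return min(after, key=lambda m: m[0])[1]
    # No measurement inside the window: the feature stays missing.
    return None
```

Values returned as `None` correspond to the missing feature values that are imputed before analysis.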
Data were exported from different clinical information systems as specified below to a central staging area in the protected clinical network. Data were extracted from tabular form in the staging area and loaded into the Informatics for Integrating Biology and the Bedside (i2b2) and TranSMART data marts for data exploration. Final data integration, data cleaning and construction of patient histories were performed within the software R version 3.6.3 (The R Foundation for Statistical Computing, Vienna, Austria) by creating a data frame object that was used for analysis. All steps were performed according to German and European data protection regulations.
Clinical data
Clinical data were collected during outpatient visits of the patients and stored in the clinical information system. These included demographics, information on diagnosis and clinical presentation at onset, occurrence and clinical presentation of relapses, disease severity [Expanded Disability Status Scale (EDSS), multiple sclerosis functional composite (MSFC)], fatigue [Fatigue Scale for Motor and Cognitive Functions (FSMC)] and depression [Beck Depression Inventory (BDI)].
Imaging data
The cMRIs were acquired during routine clinical practice on one and the same 3 Tesla scanner (Achieva; Philips Healthcare, Best, the Netherlands) and stored in the radiology information system. The intervals between available consecutive images were of different length. The respective outcome assessment was performed retrospectively in a semi-automated manner based on a fluid-attenuated inversion recovery (FLAIR) sequence [voxel size = 1.5 × 1 × 1 mm; repetition time (TR) = 10,000 ms; time to echo (TE) = 140 ms; inversion time (TI) = 2750 ms] and a three-dimensional (3D) spoiled gradient echo T1-weighted sequence (1 mm isotropic; TR = 9 ms; TE = 4 ms). All images were converted from DICOM to NIfTI format using dcm2niix. First, lesions in baseline scans were automatically segmented by the lesion segmentation tool (LST, https://www.applied-statistics.de/lst.html) yielding binary lesion segmentations in native space. 20 Next, all images were rigidly co-registered to the T1-weighted image of the same time point using NiftyReg. Then, all images were rigidly brought to Montreal Neurological Institute (MNI) space, and skull-stripped using parameters derived from the T1-weighted image of the same time point (HD-BET, github.com/MIC-DKFZ/HD-BET). 21 Segmented lesions from baseline images were then labelled according to their location (periventricular, juxtacortical/cortical, infratentorial, subcortical/unspecific) using an atlas-based approach, in which the MNI tissue atlas was deformably registered onto the T1-weighted image using ANTs SyN. Segmented lesions were manually reviewed and corrected by one of four experienced neuroradiologists using ITK-SNAP. 22
Baseline FLAIR images were rigidly co-registered to follow-up FLAIR images using NiftyReg and, to ensure comparable image intensities, FLAIR baseline images were intensity-scaled according to FLAIR follow-up images by a histogram-matching algorithm (using the ‘match_histograms’ function of the Python package scikit-image); finally, subtraction images were rendered by a voxel-wise subtraction of the baseline FLAIR image from the follow-up FLAIR image. In these difference images, raters only segmented new or enlarging lesions. 23 New solitary lesions had to be at least 3 mm in diameter according to the current diagnostic criteria. 3 New lesions that showed any overlap (i.e. >0 voxels) with an existing lesion and that, hence, could be regarded as enlarged were counted if the new lesion area was of a shape that (virtually) could best be described by two (or even more) spheroids, as we then assumed that a new lesion had grown into an existing one. Again, only those lesions with an estimated diameter of at least 3 mm were counted. Lesions having enlarged along the whole of their circumference (towards brain parenchyma) were only counted if the enlargement was clear to the observers. Such ‘truly’ enlarged lesions were hardly ever observed. Both new and enlarging lesions were considered as disease progression. All image evaluations were finally reviewed by one senior neuroradiologist (J.S.K.). This assessment of lesions was blinded to the treatment, medication or future cMRI of a patient.
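The intensity scaling and subtraction step can be illustrated on flattened intensity lists. The sketch below is a simplified, pure-Python stand-in for histogram matching by empirical quantiles; it is not the scikit-image `match_histograms` implementation used in the study, and it assumes that both images are already co-registered so that list positions correspond to the same voxels.

```python
from bisect import bisect_right
from typing import List


def match_histogram(source: List[float], reference: List[float]) -> List[float]:
    """Map each source intensity to the reference intensity at the same empirical quantile."""
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Number of source voxels with intensity <= value (empirical CDF count).
        count = bisect_right(src_sorted, value)
        # Index of the smallest reference value whose empirical CDF reaches
        # count / n_src, computed in integer arithmetic (ceiling division).
        index = (count * n_ref + n_src - 1) // n_src - 1
        matched.append(ref_sorted[index])
    return matched


def subtraction_image(baseline: List[float], follow_up: List[float]) -> List[float]:
    """Voxel-wise follow-up minus intensity-scaled baseline."""
    scaled = match_histogram(baseline, follow_up)
    return [f - b for b, f in zip(scaled, follow_up)]
```

When the follow-up differs from the baseline only by a monotone rescaling of intensities, the subtraction image is zero everywhere, which is the reason for matching intensities before subtracting.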
Laboratory data
Routine laboratory data were generated by the central clinical laboratory and stored in the laboratory information system. Cerebrospinal fluid (CSF) data were generated by the CSF laboratory of the Department of Neurology. Data were transferred into the clinical information system.
Statistical analysis
In Retro-MS, the definition of baseline data and outcome assessment is mainly governed by the timing of cMRI. This timing, however, is not perfectly regular because patient visits in routine care can deviate from the preplanned schedule and observation patterns may differ between patients. This obstacle was overcome by conceptualizing predictive modelling in a time-to-event framework. The occurrence of the primary endpoint – that is, new or enlarged cMRI lesions between consecutive images – was considered as an interval-censored event. The conditional probability of observing an event time $T$ between 6 and 24 months for a patient with feature vector $x$, given that no event occurred before month 6, is $P(6 < T \le 24 \mid T > 6, x)$; its complement, $P(T > 24 \mid T > 6, x)$, is the probability of remaining event-free over that period.
A predictive RF model was implemented through transformation forests based on fully parameterized Cox proportional hazards models (using a smooth baseline hazard function) to deal with the interval-censored outcome and to finally provide the MS-TDS.24,25 The predictive RF had treatment (no medication versus platform medication) as a predictor variable in the underlying Cox models while other features were used as potential splitting variables to build the tree structure of the forest. With this approach, the interaction of the features with treatment is explicitly modelled. The optimized hyperparameters of the RF were the number of variables randomly sampled as candidates for splitting (usually termed ‘mtry’) and the minimum number of observations to be considered for splitting (‘minsplit’). The hyperparameter tuning was based on a prespecified set of potential values. Recommended values
A benchmark study was performed for hyperparameter tuning and to choose the best performing model as well as to obtain an unbiased estimate of its performance. The area under the receiver operating characteristics curve (AUROC) at 24 months served as the corresponding performance measure using the MS-TDS as predictor variable. 29 Models were compared by internal validation, that is, via nested threefold cross-validation. Thereby, a best performing model was determined in each inner cross-validation loop. These models were refit to the whole data of the respective inner loop and applied to the test data of the corresponding outer loop to obtain unbiased performance estimates. The average of these values provides an unbiased assessment of the overall performance of a best model. The best model itself was determined through the best performing model in the outer loop and refit to the whole data to produce the MS-TDS. Likelihood-based permutation variable importance measures (VIMPs) of this final model were used to identify informative predictor variables.30,31 Predictor variables with a VIMP lower than the VIMP of an additionally included random variable were excluded from VIMP display. 32 To further evaluate the counterfactual analysis, the MS-TDS was calculated assuming both treatment alternatives for each patient and compared between the actual medication groups.
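The structure of the nested cross-validation can be sketched independently of the model being tuned. In the sketch below, `fit_and_score` is a hypothetical callback that fits one hyperparameter candidate on the training indices and returns its performance on the held-out indices; the fold assignment by striding and the plain binary AUROC (rather than a time-dependent AUROC for interval-censored data) are simplifying assumptions.

```python
from typing import Callable, List, Sequence


def auroc(scores: Sequence[float], event_free: Sequence[bool]) -> float:
    """Probability that an event-free patient receives a higher score than one with an event."""
    pos = [s for s, y in zip(scores, event_free) if y]
    neg = [s for s, y in zip(scores, event_free) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def k_folds(indices: List[int], k: int) -> List[List[int]]:
    """Assumption: folds are formed by striding; the study's split is not specified here."""
    return [indices[i::k] for i in range(k)]


def nested_cv(indices: List[int], candidates: list,
              fit_and_score: Callable[[object, List[int], List[int]], float],
              k: int = 3) -> float:
    outer_scores = []
    for test in k_folds(indices, k):
        held_out = set(test)
        train = [i for i in indices if i not in held_out]

        def inner_mean(candidate) -> float:
            # Inner loop: mean validation score of one candidate on the outer training data.
            total = 0.0
            for val in k_folds(train, k):
                val_set = set(val)
                inner_train = [i for i in train if i not in val_set]
                total += fit_and_score(candidate, inner_train, val)
            return total / k

        best = max(candidates, key=inner_mean)
        # Refit the inner-loop winner on all outer training data, score on the outer test fold.
        outer_scores.append(fit_and_score(best, train, test))
    return sum(outer_scores) / k
```

Because the hyperparameters are chosen without access to the outer test fold, the averaged outer score is not inflated by the tuning step.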
In addition to the above-mentioned analyses, baseline data were described for the analysis cohort by medication group. Descriptive statistics used are absolute and relative frequencies for categorical variables and median and interquartile range (IQR) for numeric or ordinal variables. The interval-censored outcome was described by plotting Weibull estimates of event probabilities by medication group.
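To make the interval-censored view of the outcome concrete, the following sketch computes the likelihood contribution of one scan interval and the conditional event-free probability between month 6 and month 24 under a Weibull model. The Weibull form and the parameter values in the usage note are illustrative assumptions; the predictive model itself is described as using Cox models with a smooth baseline hazard within a transformation forest, which is not reproduced here.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """S(t) = P(T > t) for a Weibull event time, with t in months."""
    return math.exp(-((t / scale) ** shape))


def interval_likelihood(left: float, right: float, shape: float, scale: float) -> float:
    """P(left < T <= right): the event is only known to lie between two scans.

    right = math.inf encodes a patient with no event up to the last scan.
    """
    s_left = weibull_survival(left, shape, scale)
    s_right = 0.0 if math.isinf(right) else weibull_survival(right, shape, scale)
    return s_left - s_right


def event_free_6_to_24(shape: float, scale: float) -> float:
    """P(T > 24 | T > 6) = S(24) / S(6)."""
    return weibull_survival(24.0, shape, scale) / weibull_survival(6.0, shape, scale)
```

For example, with the hypothetical parameters `shape=1.0` and `scale=12.0`, `event_free_6_to_24` equals `exp(-1.5)`, because the exponential special case is memoryless and only the 18 months between the two time points matter.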
Before the analysis, missing values of patient features were imputed by an RF imputation model provided by the R package ‘missForest’. 33 The interval-censored outcome was omitted during imputation to prevent artificially creating relations between that and the patient features.
All analyses were performed with the software R 3.6.3 (The R Foundation for Statistical Computing). The session info including information about the used packages is provided in Appendix 2. A Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist for prediction model development and validation is provided in Appendix 4. 34
Results
A total of 2992 patients had a record of any neurological diagnosis in the clinical data of the neurology department during the years 2008–2017. Of these, 774 patients had a diagnosis of CIS or RRMS and at least two T2 FLAIR cMRI images. A subset of 509 patients further met the eligibility criteria based on diagnosis, imaging and treatment history. A further 34 patients had to be excluded because of technical problems in image analysis, pregnancy, unknown medication status or use of high efficacy medication from baseline. Finally, 475 patients contributed 1804 images to analysis. A detailed flow chart of patient selection is presented in Figure 2.

Figure 2. Flow chart of patient selection.
A summary of baseline characteristics by medication is given in Table 1. No medication and platform medication were administered to 277 (58.3%) and 198 (41.7%) patients at baseline, respectively. Patients with no medication at baseline had fewer lesions in their first cMRI (median 13.0 versus 17.5), and were more likely to be diagnosed with CIS rather than RRMS (54.2% versus 45.5%). The primary endpoint was met by 214 patients and 167 patients under no medication and platform medication, respectively. Some data were missing in 44 of 65 features (68%) with an average of 20.5% missing values per feature (median = 22.5%, IQR = 0.0–36.0%). A detailed presentation of the number of missing values per feature and medication group is given in Table 1.
Table 1. Baseline characteristics of all patients by medication at baseline.
AST, aspartate transaminase; ALAT, alanine transaminase; BDI, Beck Depression Inventory; BMI, body mass index; CIS, clinically isolated syndrome; cMRI, cerebral magnetic resonance images; CSF, Cerebrospinal fluid; EDSS, Expanded Disability Status Scale; FSMC, Fatigue Scale for Motor and Cognitive Functions; GOT, glutamic oxaloacetic transaminase; GPT, glutamic-pyruvic transaminase; IgA, Immunoglobulin A; IgG, Immunoglobulin G; IgM, Immunoglobulin M; IQR, interquartile range; MCH, mean corpuscular haemoglobin; MCHC, mean corpuscular haemoglobin concentration; MCV, mean corpuscular volume; MSFC, MS functional composite; Quo, quotient; TSH, thyroid-stimulating hormone.
The estimated probabilities of observing no event until time t are displayed in Figure 3. The estimated median time-to-event is 122.0 days (4.0 months) under platform medication and 136.9 days (4.5 months) under no medication. The estimated probability of being event-free at 6 months is 43.2% and 45.3% under platform medication and no medication, respectively. At 24 months, these probabilities are 18.6% and 20.5%, respectively.
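The group-level estimates reported above also imply a conditional event-free probability between month 6 and month 24, obtained by dividing the 24-month estimate by the 6-month estimate. The short calculation below only rearranges the reported numbers; the function name is an assumption, and the result is a marginal group-level quantity, not an individual MS-TDS.

```python
def conditional_event_free(p_event_free_6: float, p_event_free_24: float) -> float:
    """P(T > 24 | T > 6) = P(T > 24) / P(T > 6)."""
    return p_event_free_24 / p_event_free_6


# Reported Weibull estimates of being event-free at 6 and 24 months.
platform = conditional_event_free(0.432, 0.186)
no_medication = conditional_event_free(0.453, 0.205)
print(f"platform: {platform:.3f}, no medication: {no_medication:.3f}")
```

Among patients still event-free at month 6, roughly 43% (platform medication) and 45% (no medication) would therefore be expected to remain event-free until month 24 according to these unadjusted estimates.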

Figure 3. Weibull estimates (solid lines) and pointwise 95% confidence intervals (dashed lines) of observing no event until time t, that is, estimates of $P(T > t)$, by medication group.
The average performance of the best prediction model was AUROC = 0.624 (in-depth information about the results of the benchmark study is given in Appendix 3). The VIMPs of the most important features of the final model that exceeded the VIMP of a random noise variable are displayed in Figure 4. Medication is clearly the most important predictor variable. Demographics like age, weight, height, body mass index (BMI) and sex also play an important role in the prediction of the outcome and as predictive factors. Further relevant patient features include the lesion count at baseline, the presence of periventricular lesions, the number of relapses, the diagnosis at baseline (CIS or RRMS), the presence of CSF-specific oligoclonal bands and, interestingly, the Immunoglobulin A (IgA)-index (i.e. IgA Quotient/Albumin Quotient). It is important to note that causal effects or biological relevance cannot be inferred for the variables listed in Figure 4, as the prediction model can also benefit from spurious correlations with the outcome.

Figure 4. Predictor variables with VIMP exceeding the VIMP of a random noise variable in the final model.
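The idea of screening variables against a random noise variable can be sketched generically. In the sketch below, `loss` is a hypothetical callback that returns a model's loss (for example a negative log-likelihood) on a list of patient rows; the function and variable names are assumptions and do not correspond to the R packages used in the study.

```python
import random
from typing import Callable, Dict, List


def permutation_importance(rows: List[dict], column: str,
                           loss: Callable[[List[dict]], float],
                           rng: random.Random) -> float:
    """Increase in loss after shuffling one column across patients."""
    baseline = loss(rows)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [{**r, column: v} for r, v in zip(rows, shuffled)]
    return loss(permuted) - baseline


def informative_features(vimp: Dict[str, float], noise_name: str = "random_noise") -> List[str]:
    """Keep features whose importance exceeds that of the added noise variable."""
    threshold = vimp[noise_name]
    kept = [(name, v) for name, v in vimp.items() if name != noise_name and v > threshold]
    # Most important first, as in a variable importance plot.
    return [name for name, _ in sorted(kept, key=lambda kv: kv[1], reverse=True)]
```

A variable that the model does not use has an importance of exactly zero under this definition, so the noise variable provides a data-driven cut-off for importance values that arise by chance.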
Predicting the outcome
The final model provides the MS-TDS, which is the probability of observing no new or enlarged T2-lesion between 6 and 24 months under platform medication or no medication. Given a patient’s characteristics

Figure 5. Illustration of the individual probabilities of being event-free, given the event did not occur before month 6, that is, $P(T > t \mid T > 6, x)$.
According to the MS-TDS, about 61.4% of patients with no medication would have benefitted from platform medication with an expected median increase of 6.9% (IQR = 3.7–10.9%) in the probability of being event-free between month 6 and month 24. For patients with platform medication, it is expected that 45.5% benefitted from this treatment option by a corresponding 5.1% (IQR = 2.3–12.8%). These and additional numbers, as well as the distribution of expected differences in the probabilities of being event-free between month 6 and month 24 under either potential treatment option, are shown in Figure 6. In summary, the median and maximum values suggest that for half of the patients, the risk of an event is expected to be reduced by about 5–20% if the treatment recommended by the MS-TDS is given.

Figure 6. Increase in the probability of being event-free between month 6 and month 24 if the treatment option considered superior according to the MS-TDS rather than the inferior one would have been or was administered to a patient, stratified by actual treatment.
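The comparison of both treatment options per patient can be sketched as follows. The dictionary keys, the function names and the example probabilities in the usage note are hypothetical; the sketch only shows how two predicted event-free probabilities per patient lead to a recommended option and an expected gain.

```python
from statistics import median
from typing import Dict, List, Tuple


def recommend(p_none: float, p_platform: float) -> Tuple[str, float]:
    """Return the option with the higher event-free probability and the absolute gain."""
    if p_platform > p_none:
        return "platform", p_platform - p_none
    return "none", p_none - p_platform


def summarize(patients: List[Dict]) -> Dict[str, float]:
    """Share of patients whose actual treatment was the inferior option, and their median gain."""
    gains = []
    for p in patients:
        best, gain = recommend(p["p_none"], p["p_platform"])
        if best != p["actual"]:
            gains.append(gain)
    return {
        "share_would_benefit": len(gains) / len(patients),
        "median_gain": median(gains) if gains else 0.0,
    }
```

For a patient with hypothetical predictions of 0.40 without medication and 0.47 with platform medication, `recommend` returns platform medication with a gain of about 7 percentage points. Whether such a difference is clinically meaningful is a separate question that this calculation does not address.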
Discussion
Data collected in routine practice are becoming increasingly available for observational studies in MS research on predictive factors and related treatment decisions. 35 Similarly, advanced statistical and machine learning methods are continuously evolving within a theoretically sound framework for estimating average and individual treatment effects (ITEs).25,36–38 Building on these insights, we developed and internally validated the MS-TDS to predict the outcome of no new or enlarging cMRI lesions in a newly diagnosed CIS or RRMS patient on platform or no medication between 6 and 24 months after their first cMRI.
We are publishing the results from the MS-TDS development before performing an external validation study. This provides an objective statement on what will be evaluated in the planned external validation based on the ProVal-MS cohort (ProVal-MS study; German Clinical Trials Register study ID: DRKS00014034).
A predictive RF based on fully parameterized Cox proportional hazards models was fit to the prediction problem and internally validated in a benchmark study that included hyperparameter tuning. The resulting MS-TDS is informative in many ways, suggesting features relevant to the prediction problem by calculating variable importance measures, and predicting individual patient probabilities of being event-free, as a function of time or for the focused time frame of 6–24 months. An illustration of the latter showed that it is possible to identify newly diagnosed patients who would benefit from no medication or platform medication through computation of the MS-TDS based on individual characteristics.
This study identified a number of clinical, laboratory and imaging features as predictive factors of the investigated outcome. Among the strongest predictor variables were the total lesion count at baseline, the presence of periventricular lesions, the diagnosis at baseline (CIS versus RRMS), the number of relapses before baseline and two CSF parameters, namely the presence of CSF-specific oligoclonal bands and the IgA-index. Most of these have previously been identified as predictors of the disease course of MS.39–41 In contrast to previous studies, our model is based on an unbiased approach without preselecting supposedly informative variables and was internally validated. To our knowledge, none of the prediction models reported so far both had a low risk of bias in their model development and evaluation steps and achieved an area under the curve (AUC) higher than 0.7.10,11,12,42 Although the performance of the model is weak, the results of the prediction are promising and provide a basis for future developments.
The present Retro-MS study has several limitations. The analysis of observational data obtained from nonrandomized trials for treatment effect estimation and the exploration of predictive factors generally carries the risk of confounding and selection bias. The inferiority of platform medication illustrated in Figure 3 indicates that this may also apply to this study. A counterfactual framework involving the concept of potential outcomes, which are the expected outcomes of a patient under each treatment, has been suggested as a solution and has been applied in the present work. A commonly used approach is the application of weighted, conditional or stratified analyses to estimate average treatment effects (ATEs). An underlying assumption is the strongly ignorable treatment assignment (SITA), which states that the actual treatment assignment is conditionally independent of the potential outcomes given the observed covariables. Against that background, potential outcomes can be estimated from the observed data. 36 A known limitation of the counterfactual framework, which is shared with any other study trying to estimate ATEs or ITEs from observational data of nonrandomized trials, is that SITA might not hold. This might result in biased effect estimation and models. Such models, however, might still be useful for prediction purposes, which is a property that was internally validated in this study. Another source of potential selection bias is the fact that the data were collected at a specialized centre. External validation will be provided by the subsequent prospective and multicentric ProVal-MS study and may indicate such problems.
Lu et al. 36 suggest the estimation of ITE by RF under SITA. In a comparison of several RF implementations, they found that tuned RF, with a separate RF model fit to each treatment group, performed best. The strategy of fitting separate models to the treatment groups has been criticized though. Powers et al. 37 state, for the case of two treatments, that ‘it is to be expected that the selected basis be different between the 2 regression functions. This can cause differences between the conditional means attributable not to a heterogeneous treatment effect but rather to randomness in the basis selection’. In this study, we therefore fitted tuned RF to the whole data and estimated treatment effects within Cox models simultaneously fitted to both treatments.
Furthermore, the reality of routinely collected data necessitated defining the outcome as interval censored, dealing with missing values, and making consensus decisions regarding ambiguous data. Even at the level of feature definitions, strong assumptions had to be made to consider some features as ‘none’ when there was no entry in the source data. Such decisions were made in consensus meetings with the authors. The assignment of baseline measurements and of treatment groups to the intervals had to be operationalized. In combination with the methodologically challenging task of properly estimating treatment effects, these conditions narrowed the set of applicable statistical models and machine learning methods to a recently developed model-based RF, which was used in the present work. In addition, only internal validation using patients from the same clinic and period could be performed with the available data set. The results on predictive factors should be considered exploratory as they were only discovered to be relevant during the analysis, although the set of potential features was determined in preparatory consensus meetings of the investigators. The routine data also carry a potential risk of misclassification, for example, in the diagnosis of CIS, where oligoclonal band analysis was not recorded in 174/240 (72.5%) patients. The true misclassification rate, however, is likely to be lower because the diagnosis of CIS was based on further diagnostic criteria depending on the time of diagnosis. A similar problem is posed by unobserved confounding, which may be present and may have led to biased findings. Another source of potential bias is the selection of the study population with the imposed restrictions on data availability and timing of DMT onset, first cMRI and diagnosis. For these reasons, the aforementioned prospective ProVal-MS study was initiated simultaneously with the present Retro-MS study to allow an external and unbiased assessment of the MS-TDS.
With an AUROC of 0.624, the performance of the MS-TDS can be considered weak 43 but is comparable to the performance of other models from studies with low risk of bias.10,11,12,42 The robustness of the result has yet to be demonstrated in a prospective study currently being conducted for external validation. 18 Clinical relevance also remains to be proven. While the score can provide clear treatment recommendations, there are also patients for whom no clear decision is possible (see Figure 5(b)). Some variables that have been shown to be relevant in the present prediction model may be so because of spurious correlations, and may not be of direct biological relevance to the outcome. Further insights into more specific medication subgroups with differential efficacy – such as dimethyl fumarate, teriflunomide and injectable medications – could not be obtained due to the moderate overall sample size available for analysis in this study. Further targeted studies with increased sample sizes are needed to investigate such differences and provide more specific treatment recommendations.
Conclusion
Clinical routine data can be used to support treatment decision-making by statistical modelling and machine learning. The MS-TDS predicts the 24-month outcome of no new or enlarging cMRI lesions in newly diagnosed CIS and RRMS patients. It provides risk estimates that can be used to identify patients who are expected to benefit from no medication or platform medication. The overall performance of the prediction model is weak but comparable to similar models that have recently been suggested. A prospective study is currently being conducted to allow for external validation. The clinical relevance of MS-TDS has yet to be demonstrated. The task of developing models for supporting treatment decisions in early MS remains challenging, and the present work can serve as a methodological example for future studies.
Appendix
Categorical variables:
| Domain/test | Variable name | Type | Categories | Timespan |
|---|---|---|---|---|
| Demographics | Smoking | Ordered | Yes/former/no/unknown | Ever smoker (at base document or following visits) ELSE/ever ex-smoker (at base document or following visits) ELSE/ever nonsmoker (at base document or following visits) ELSE/no information at base document or following visits |
| First symptom a,b | Numbness | Logical | No/yes | Ever before baseline |
| First symptom a,b | Other cranial nerve symptom | Logical | No/yes | Ever before baseline |
| First symptom a,b | Paresis | Logical | No/yes | Ever before baseline |
| First symptom a,b | Optic neuritis | Logical | No/yes | Ever before baseline |
| First symptom a,b | Any other symptom | Logical | No/yes | Ever before baseline |
| Demographics | Sex | Factor | Male/female | N/A |
| Relapses a,b | Numbness | Logical | No/yes | During ±3 months from baseline |
| Relapses a,b | Other neurological symptom | Logical | No/yes | During ±3 months from baseline |
| Relapses a,b | Paresis | Logical | No/yes | During ±3 months from baseline |
| Relapses a,b | Optic neuritis | Logical | No/yes | During ±3 months from baseline |
| Relapses a,b | Any other symptom | Logical | No/yes | During ±3 months from baseline |
| cMRI | Periventricular lesions | Logical | No/yes | First image |
| cMRI | Subcortical/unspecific lesions | Logical | No/yes | First image |
| cMRI | Juxtacortical or cortical lesions | Logical | No/yes | First image |
| cMRI | Infratentorial lesions | Logical | No/yes | First image |
| Diagnosis | Diagnosis at baseline (CIS) | Factor | CIS/RRMS | Earliest of closest to baseline during −36/+1 months |
cMRI, cerebral magnetic resonance images; CIS, clinically isolated syndrome; RRMS, relapsing-remitting multiple sclerosis.
a: No if no information.
b: Include if only year is available in the same year as baseline.
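The coding rules in this table can be read as a small decision procedure: the smoking variable takes the first matching category in the order yes, former, no, unknown, and logical symptom variables default to 'no' when the source data contain no entry. The Python sketch below illustrates these two rules only. The raw status strings (`"smoker"`, `"ex-smoker"`, `"nonsmoker"`) and the function names are hypothetical and do not reflect the actual source systems or the study's code.

```python
from typing import Iterable, Optional


def smoking_category(entries: Iterable[Optional[str]]) -> str:
    """Collapse smoking entries from the base document and following visits.

    Follows the ELSE cascade of the table: any 'smoker' entry gives 'yes',
    otherwise any 'ex-smoker' entry gives 'former', otherwise any
    'nonsmoker' entry gives 'no', otherwise 'unknown'.
    The raw status strings are hypothetical placeholders.
    """
    seen = {e for e in entries if e is not None}
    cascade = (("smoker", "yes"), ("ex-smoker", "former"), ("nonsmoker", "no"))
    for raw_status, category in cascade:
        if raw_status in seen:
            return category
    return "unknown"


def symptom_flag(entry: Optional[bool]) -> bool:
    """Logical symptom variable: a missing entry is coded as 'no' (False)."""
    return bool(entry)


# Invented examples, not study data.
example_a = smoking_category(["nonsmoker", None, "ex-smoker"])
example_b = smoking_category([None, None])
example_c = symptom_flag(None)
```

Coding a missing entry as 'no' is the strong assumption named in the limitations: it cannot distinguish a symptom that was absent from one that was not documented.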
Appendix 2
Appendix 3
Appendix
TRIPOD checklist for prediction model development and validation.
| Section/topic | Item a | | Checklist item | Page |
|---|---|---|---|---|
| Title and abstract | | | | |
| Title | 1 | D;V | Identify the study as developing and validating a multivariable prediction model, the target population and the outcome to be predicted. | 1 |
| Abstract | 2 | D;V | Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results and conclusions. | 1 |
| Introduction | | | | |
| Background and objectives | 3a | D;V | Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models. | 2 |
| | 3b | D;V | Specify the objectives, including whether the study describes the development or validation of the model or both. | 2 |
| Methods | | | | |
| Source of data | 4a | D;V | Describe the study design or source of data (e.g. randomized trial, cohort or registry data), separately for the development and validation data sets, if applicable. | 3–4 |
| | 4b | D;V | Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up. | 3 |
| Participants | 5a | D;V | Specify key elements of the study setting (e.g. primary care, secondary care, general population) including number and location of centres. | 3 |
| | 5b | D;V | Describe eligibility criteria for participants. | 3 |
| | 5c | D;V | Give details of treatments received, if relevant. | 3 |
| Outcome | 6a | D;V | Clearly define the outcome that is predicted by the prediction model, including how and when assessed. | 4 |
| | 6b | D;V | Report any actions to blind assessment of the outcome to be predicted. | NA |
| Predictors | 7a | D;V | Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured. | 3–4, Table 1 |
| | 7b | D;V | Report any actions to blind assessment of predictors for the outcome and other predictors. | NA |
| Sample size | 8 | D;V | Explain how the study size was arrived at. | 3 |
| Missing data | 9 | D;V | Describe how missing data were handled (e.g. complete-case analysis, single imputation, multiple imputation) with details of any imputation method. | 5 |
| Statistical analysis methods | 10a | D | Describe how predictors were handled in the analyses. | 5 |
| | 10b | D | Specify type of model, all model-building procedures (including any predictor selection) and method for internal validation. | 5 |
| | 10c | V | For validation, describe how the predictions were calculated. | 5 |
| | 10d | D;V | Specify all measures used to assess model performance and, if relevant, to compare multiple models. | 5 |
| | 10e | V | Describe any model updating (e.g. recalibration) arising from the validation, if done. | 5 |
| Risk groups | 11 | D;V | Provide details on how risk groups were created, if done. | NA |
| Development versus validation | 12 | V | For validation, identify any differences from the development data in setting, eligibility criteria, outcome and predictors. | NA |
| Results | | | | |
| Participants | 13a | D;V | Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. | 5–6, Figure 2 |
| | 13b | D;V | Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome. | 6, Table 1 |
| | 13c | V | For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors and outcome). | NA |
| Model development | 14a | D | Specify the number of participants and outcome events in each analysis. | 6 |
| | 14b | D | If done, report the unadjusted association between each candidate predictor and outcome. | NA |
| Model specification | 15a | D | Present the full prediction model to allow predictions for individuals (i.e. all regression coefficients and model intercept or baseline survival at a given time point). | NA |
| | 15b | D | Explain how to use the prediction model. | 10–11 |
| Model performance | 16 | D;V | Report performance measures (with CIs) for the prediction model. | 6 |
| Model-updating | 17 | V | If done, report the results from any model updating (i.e. model specification, model performance). | 6 |
| Discussion | | | | |
| Limitations | 18 | D;V | Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data). | 13 |
| Interpretation | 19a | V | For validation, discuss the results with reference to performance in the development data, and any other validation data. | 11 |
| | 19b | D;V | Give an overall interpretation of the results, considering objectives, limitations, results from similar studies and other relevant evidence. | 13–14 |
| Implications | 20 | D;V | Discuss the potential clinical use of the model and implications for future research. | 14 |
| Other information | | | | |
| Supplementary information | 21 | D;V | Provide information about the availability of supplementary resources, such as study protocol, Web calculator and data sets. | 18–23 |
| Funding | 22 | D;V | Give the source of funding and the role of the funders for this study. | 15 |
TRIPOD, Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis; NA, not applicable.
a: Items relevant only to the development of a prediction model are denoted by D, items relating solely to a validation of a prediction model are denoted by V and items relating to both are denoted D;V.
Acknowledgements
The authors thank Nikolaus Will, Marie-Christin Metz, David Schinz, Dominik Heim, Philip Prucker, Benita Schnitz-Koep and Daria Filatova for the image segmentation.
