Abstract
Study design
Prospective observational multicenter cohort study.
Objectives
To determine Minimal Clinically Important Difference (MCID) of AO Spine PROST (Patient Reported Outcome Spine Trauma) and conducting a long-term prospective validation.
Methods
Data were collected from a prospective observational international multicenter cohort study. Adults (18-65) with acute thoracolumbar (TL) burst fractures without neurologic deficits were enrolled, and followed for up to 2 years. Patients completed the AO Spine PROST, Oswestry Disability Index (ODI), EQ-5D, and Pain NRS. Characteristics were analyzed using descriptive statistics, MCID for PROST with distribution-based approach using the standard deviation (SD) of change in scores. Floor and ceiling effects were also evaluated. Internal consistency (Cronbach’s alpha, item-total correlation coefficient (ITCC) and pairwise Spearman correlation), construct validity (Pearson correlations (rs) with ODI, EQ-5D, Pain NRS), and responsiveness (effect sizes (ES) and standardized response mean (SRM)) were assessed.
Results
Ninety-three patients were included. MCID for a moderate change (0.5*SD) in PROST score was 10.6. No floor or ceiling effects were observed. Internal consistency was high (Cronbach’s α = 0.9-1.0 and acceptable ITCC). PROST scores strongly correlated with ODI (rs = −0.67 to −0.89; P < .001), but correlations with EQ-5D were weak (rs = −0.29 to 0.05; P > .005), except at 1-year follow-up. No consistent pattern was found with Pain NRS. Responsiveness was very good (ES = 3.2, SRM = 3.1; P < .001).
Conclusions
The AO Spine PROST identified an MCID of 10.6 as indicative of a moderate clinically meaningful change. The instrument also showed strong internal consistency, construct validity, and excellent responsiveness in long-term follow-up.
Introduction
The AO Spine PROST (Patient Reported Outcome Spine Trauma) was developed through a series of preparatory studies aimed at creating the first comprehensive patient-centered outcome measure designed for spine trauma. 1 The development process followed the Core Set development methodology of the International Classification of Functioning, Disability and Health (ICF). 2 During the preparatory phase of the project, 4 distinct studies were conducted. Three of these focused on identifying relevant ICF categories for assessing outcomes of traumatic spinal column injuries, drawing on perspectives from research, clinical experts, and patients.3-5 The fourth study evaluated different response scales to determine their suitability for inclusion in the questionnaire. 6 The preparatory efforts were followed by a formal consensus process, that incorporated the earlier findings and expert feedback, ultimately resulting in the PROST—a 19-item scale based on 25 core ICF categories. 7 Initial validation efforts established the PROST’s reliability and validity in both English and Dutch-speaking populations.8,9 Since then, the instrument has been translated into 18 languages, expanding its global applicability and supporting its use in diverse clinical settings. 10 The PROST has also been increasingly adopted as an outcome measure in clinical research evaluating spine trauma interventions.
Despite these advancements, a critical psychometric property remains undefined: the Minimal Clinically Important Difference (MCID). The MCID represents the smallest change in score that patients perceive as beneficial, serving as a vital benchmark for evaluating treatment effects and guiding clinical decision-making. 11 Without an established MCID, it remains challenging to distinguish statistically significant changes from those that are clinically relevant, which potentially limits the interpretability and practical application of PROST scores in both research and practice.
To address this gap, the present study aimed to determine the MCID for the AO Spine PROST using a distribution-based method. Additionally, a long-term prospective validation of the instrument’s psychometric performance was performed in an international patient cohort.
Methods
Study Procedures and Patients
This study utilized the data from a prospective observational international multicenter cohort study.12,13 It investigated the management of thoracolumbar burst fractures in neurologically intact patients aged 18 to 65 years with an acute (<10 days from injury) traumatic fracture, with or without a suspected Posterior Ligamentous Complex (PLC) injury, between T10 and L2. To ensure a relatively homogeneous cohort among spine trauma, patients with pathological fractures such as osteoporotic or neoplastic, prisoners, prior spinal surgery, multi-trauma with injury severity scores (ISS) greater than 16, and unable to understand or report outcomes were excluded. Patients were recruited from several hospitals worldwide participating in the trial entitled ‘Thoracolumbar burst fractures (AO Spine A3, A4 fractures) in neurologically intact patients: An observational, multicenter cohort study comparing surgical vs non-surgical treatment. (Spine TL A3/4 Study, ClinicalTrials.gov: NCT02827214). 14 The 14 study sites represented North America (6 sites), Europe (5 sites), and 1 site each in India, Middle East and Australia. Each enrolling center obtained local approval from their local institutional review board (UBC CREB NUMBER: H16-02527). Treatment was not randomized but followed the standard clinical decision-making process in each institution and the judgment of the treating surgeon. Patients received either surgical stabilization or non-operative management, including orthosis, body cast, or no bracing. Written informed consent was obtained from all patients, and they were asked to complete the questionnaires at discharge (ie, baseline) and subsequently at 6 weeks, 3 months, 6 months, 1 year, and 2 years post-trauma.
Instruments
Next to the AO Spine PROST, various other questionnaires were administered to the patients for validity purposes: Oswestry Disability Index (ODI), Pain NRS (Numeric Rating Scale) and EuroQol-5D (EQ-5D).15,16 The PROST is the first condition-specific patient-reported outcome measure (PROM) for spine trauma. It consists of 19 items covering a wide range of functional domains such as work/study, travelling, pain, urinating, sexual function, and emotional function. 10 Each item is rated on a 0 to 100 numeric rating scale, with 0 indicating no functional at all and 100 the same level as pre-trauma, regardless of how well or poorly the patient priorly functioned. At the time of study initiation, only the Dutch and English versions of the PROST were available and validated, thus only participants fluent in English and Dutch were included.
The ODI is a widely used PROM designed to assess disability related to low back pain. 15 It comprises 10 items spanning activities of daily living, each scored on a 6-point scale ranging from 0 (no disability) to 5 (maximum disability). The total score is expressed as a percentage, ranging from 0% (no disability) to 100% (completely disabled). Pain NRS is a widely used, unidimensional tool for assessing pain intensity. Patients rate their pain on an 11-point scale from 0 (no pain) to 10 (worst imaginable pain). The EQ-5D is a widely used generic instrument for evaluating health-related quality of life (HRQoL), with scores ranging from 0 to 1, where 1 represents optimal health. 17 It comprises 2 components: the EQ-5D descriptive system and the EQ-5D visual analog scale (EQ VAS).
Statistical Analysis
Patient characteristics were analyzed using descriptive statistics and including sample size (n), mean, standard deviation (SD), median, lower and upper values of the inter-quartile range, and minimum and maximum values. Categorical variables were summarized using the frequency and percentage for each category.
The MCID for the PROST was assessed using a distribution-based approach, assessing change in PROST scores between baseline and 1-year follow-up. First the standard deviation (SD) of change in PROST score between the aforementioned timepoints was calculated. The SD of this change was used to determine effect sizes according to Cohen’s criteria, with a small effect size defined as 0.2 × SD and a moderate effect size as 0.5 × SD. The MCID was calculated for the total group and also stratified by treatment modality (surgical vs. non-surgical) and fracture type (A3 vs. A4). The threshold of 0.5 SD was chosen in accordance with prior literature indicating that a moderate effect size corresponds to a clinically meaningful change. 18
Floor and ceiling effects were also assessed, which would occur if >15% of the patients achieve the lowest or highest possible score, respectively.
Internal consistency was evaluated by Cronbach’s alpha, with a threshold of ≥0.7 considered acceptable. 19 Additional reliability metrics included item-total correlation coefficients and pairwise Spearman correlations to examine the relationship between individual items and the overall scale. Item-total correlation coefficients below 0.2 were considered indicative of poor alignment with the total score and potential candidates for removal. 20
Construct validity was assessed by examining both convergent and discriminant validity. Convergent validity refers to the extent to which the PROST correlates with other instruments as theoretically expected, while discriminant validity refers to a lack of correlation with constructs it is not expected to relate to. 21 Pearson correlation coefficients were calculated between PROST scores and those of the ODI, Pain NRS, and EQ-5D. Correlations were examined at baseline and all follow-up time points, as well as for changes in scores between baseline and 1-year follow-up. For the ODI specifically, comparisons were limited to baseline, 3 months, 1 year, and 2 years, as data from other time points were not collected.
Responsiveness was evaluated by the effect size (ES) and the standardized response mean (SRM). ES was calculated as the mean change in score from baseline divided by the SD at baseline, while SRM was calculated using the mean change in score divided by the SD of the change score. Paired t-tests were used to assess the significance of change at the 1-year follow-up. According to established guidelines, both for ES and SRM values of 0.2-0.5 were interpreted as small, 0.5-0.8 as moderate, and ≥0.8 as large change.19,21,22
Results
Patient Characteristics
Socio-Demographic and Clinical Characteristics of the Study Population (n = 93)
Abbreviations: PLC, posterior ligamentous complex; SD, standard deviation.
Details on the Nonsurgical and Surgical Treatment
Abbreviations: SD = standard deviation.
aMultiple answer options possible.
MCID
Distribution-Based MCID Assessment for AO Spine PROST (0.2 and 0.5 × SD of Change)
SDdiff = standard deviation of change in PROST score from baseline to 1 year.
Floor and Ceiling Effects
Across all follow-up time points, fewer than 15% of participants achieved either the lowest or highest possible PROST scores. This indicates that neither floor nor ceiling effects were present, respectively, suggesting that the instrument has an appropriate scoring range for this patient population.
Internal Consistency
Results for Internal Consistency (Cronbach’s α) of AO Spine PROST Across Different Timepoints
Cronbach α ≥ 0.7 indicate acceptable internal consistency, that is, the items are highly correlated and thus measure to high extent the same concept.
Construct Validity
Construct Validity AO Spine PROST: Convergent and Discriminant Validity Based on Correlations With ODI, Pain NRS and EQ-5d
Abbreviations: rs: Pearson correlation
Responsiveness
Responsiveness AO Spine PROST From Baseline Up to 2-Year Follow-Up With Effect Size and Standardized Response Mean
Abbreviations: SD, standard deviation; ES, effect size; SRM, standardized response mean.
*P-value <.001.
Discussion
This study investigated the Minimal Clinically Important Difference (MCID) of the AO Spine PROST (Patient Reported Outcome Spine Trauma). The MCID for a moderate change in the PROST score was 10.6, based on the distribution-based methodology used in the current study. Additionally, this study represents the first long-term, prospective validation of the PROST with follow-up extending up to 2 years post-trauma in an international cohort. The findings demonstrate that the PROST has strong psychometric properties and high responsiveness over time.
The concept of the MCID was introduced by Jaeschke et al in 1989 to assess whether differences in treatment effects are meaningful from the perspective of individuals living with a given condition. 23 They defined MCID as ‘the smallest difference in score, within the domain of interest, which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive costs, a change in the patient’s management'. In short, this definition centers the patient perspective and reflects the smallest difference that people living with a specific condition perceive as beneficial. Methodological approaches for determining the MCID are generally categorized into 2 main types: distribution-based and anchor-based methods.24,25 Distribution-based methods assess the meaningfulness of change by comparing observed score differences to statistical properties of the sample, such as the standard deviation or standard error of measurement. 18 In contrast, anchor-based approaches link changes in the outcome measure to an anchor that is defined as another measure of change. Most commonly, this comparison is done with an external question, defined on a Likert scale.24,26 The selection of an appropriate anchor is not solely statistical; it often reflects clinical judgment, patient input, or expert consensus regarding what constitutes a meaningful change. 27 In the current study, a distribution-based method was employed due to the lack of an appropriate anchor within the available dataset. This approach is widely accepted, and prior literature suggests that a moderate effect size (0.5 SD) is a reasonable approximation of the MCID. 18 Moreover, it is widely recommended that MCIDs derived through anchor-based methods be supported or validated using distribution-based analyses to enhance interpretability and robustness. 28 Based on this approach, the estimated MCID for the PROST score was 10.6 points. From a clinical perspective, a change of this magnitude can be interpreted as a meaningful change in patient-reported function, reflecting a noticeable improvement or deterioration in overall function as related to their spine injury, and considered clinically important by treating clinicians.
To the best of our knowledge, no MCID thresholds have been established in spine trauma populations for other instruments used in this study, that is, ODI and EQ-5D. This is not surprising as those measures where not developed and validated for spine trauma patients, rather for degenerative lumbar conditions and the generic population, respectively. 29 Consequently, the constructs assessed by these instruments only partially overlap with those captured by the PROST, which was specifically designed to assess functional outcomes following spine trauma. This difference in scope may explain the relatively weak correlations observed with the EQ-5D. Concerning the Pain NRS scale, expert consensus has proposed that a 30% reduction in self-reported pain represents a clinically meaningful improvement, particularly in chronic pain conditions, including those following spinal cord injury.30,31 However, pain intensity represents only one component of recovery after spine trauma, which may contribute to the relatively weak correlation between the PROST and Pain NRS. To date, the current study is also the first to prospectively evaluate the long-term validity of the PROST with structured follow-up up to 2 years. A previous study by Buijs et al did investigate the long-term validity of the PROST, with a median duration of follow-up being 94.5 months, however they did not perform a prospective follow-up, rather a cross-sectional long-term assessment of the PROST together with other questionnaires. 32 Nonetheless, they also found very good long-term validity results for the PROST. Similarly, earlier research demonstrated excellent responsiveness of the PROST in a 3-month follow-up, with effect size (ES) and standardized response mean (SRM) values of 1.81 and 2.03, respectively. In the present study, these values were even higher at 3 months (ES and SRM = 2.3) and continued to increase at 2 years (ES = 3.2, SRM = 3.1), indicating strong responsiveness of the instrument across time.
This study found good psychometric properties for the PROST. Interestingly, the item-total correlation analysis identified ‘Urinating’, ‘Bowel movement’ and ‘Sleep’ as having the weakest correlations with the total score. This is likely due to the inclusion of only neurologically intact patients in this study. Bladder and bowel dysfunctions may be major impairments in patients with severe neurological deficit. 33 Similar findings were reported in the initial PROST development phase and in the Dutch and English validation studies, which also focused on patients with no, transient or mild neurological deficit.8,9 Ongoing validation efforts now include individuals with motor-complete spinal cord injury, thereby broadening the instrument’s applicability across the full spectrum of spinal trauma.34,35 Beyond the aforementioned validation studies for the Dutch and English versions, the PROST has since been validated in several other languages, including German, Slovak, Nepalese, and most recently Finnish.36-39 Given the increasing adoption of the PROST as an outcome measure in clinical research, the establishment of its MCID represents a critical advancement in facilitating the meaningful interpretation and application of its scores.40-46 The identified MCID can help distinguish statistically significant changes from those that are meaningful to patients. In clinical research, this threshold may be used to determine the proportion of patients achieving a clinically meaningful improvement following different treatment strategies. In routine clinical practice, the MCID may also assist clinicians in monitoring patient recovery over time and evaluating whether observed changes in PROST scores reflect meaningful improvements in functional status.
We do recognize this study has several limitations. First, the patient sample was restricted to those with thoracolumbar burst fractures. However, this subgroup is highly relevant as being most controversial in terms of optimal management, thus making it particularly important to define an MCID for this specific population. Nevertheless, future studies should explore whether similar MCID thresholds apply to broader spine trauma populations as that the identified MCID may not be directly generalizable to patients with other spinal injury types. Second, only Dutch- and English-speaking participants were included as these were the only validated language versions available at the study’s initiation. Although this limitation may somewhat restrict the cultural and linguistic generalizability of the findings, additional translations of the PROST have since become available. Future studies can broaden its scope to include larger patient samples with more diverse linguistic populations and a wider range of fracture types. Finally, the study did not include test-retest reliability assessment, primarily to avoid overburdening participants with additional questionnaires. Nonetheless, previous studies have consistently demonstrated excellent test-retest reliability for the PROST.
In conclusion, this study established the Minimal Clinically Important Difference (MCID) of the AO Spine PROST (Patient Reported Outcome Spine Trauma) using a distribution-based approach, identifying a threshold of 10.6 points as indicative of a moderate clinically meaningful change. Additionally, the instrument demonstrated satisfactory psychometric performance in a prospective, long-term validation with follow-up extending to 2 years post-trauma. These findings further support the PROST as a valid and responsive outcome measure, enhancing its utility for both clinical research and routine clinical practice in spine trauma care.
Footnotes
Acknowledgment
The authors thank Dimitri Hauri, from AO, Innovation Translation Center, for the statistical analysis support.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was organized and funded by AO Spine through the AO Spine Knowledge Forum Trauma & Infection, a focused group of international spinal experts. AO Spine is a clinical specialty of the AO Foundation, which is an independent medically guided not-for-profit organization. Study support was provided directly through AO Innovation Translation Center, Network Clinical Research and Clinical Evidence.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Institutional Review Board Statement
Each enrolling center obtained local approval from their local institutional review board (UBC CREB NUMBER: H16-02527).
