Abstract
Background:
The minimal clinically important difference (MCID) for patient-reported outcome measures (PROMs) expresses both the extent of the improvement and the value that patients place on it. MCID use is becoming increasingly widespread to understand the clinical efficacy of a given treatment, define guidelines for clinical practice, and properly interpret trial results. However, there is still large heterogeneity in the different calculation methods.
Purpose:
To calculate and compare the MCID threshold values of a PROM by applying various methods and to analyze their effect on the interpretation of study results.
Study Design:
Cohort study (Diagnosis); Level of evidence, 3.
Methods:
The data set used to investigate the different MCID calculation approaches was based on a database of 312 patients affected by knee osteoarthritis and treated with intra-articular platelet-rich plasma. MCID values were calculated on the International Knee Documentation Committee (IKDC) subjective score at 6 months using 2 approaches: 9 methodologies referred to an anchor-based approach and 8 methodologies to a distribution-based approach. The obtained threshold values were applied to the same series of patients to understand the effect of using different MCID methods in evaluating patient response to treatment.
Results:
The different methods employed led to MCID values ranging from 1.8 to 25.9 points. The anchor-based methods ranged from 6.3 to 25.9, while the distribution-based ones were from 1.8 to 13.8 points, showing a 4.1× variation of the MCID values within the anchor-based methods and a 7.6× variation within the distribution-based methods. The percentage of patients who reached the MCID for the IKDC subjective score changed based on the specific calculation method used. Among the anchor-based methods, this value varied from 24.0% to 66.0%, while among the distribution-based methods, the percentage of patients reaching the MCID varied from 44.6% to 75.9%.
Conclusion:
This study proved that different MCID calculation methods lead to highly heterogeneous values, which significantly affect the percentage of patients achieving the MCID in a given population. The wide-ranging thresholds obtained with the different methodologies make it difficult to evaluate the real effectiveness of a given treatment, questioning the usefulness of the MCID, as currently available, in clinical research.
The minimal clinically important difference (MCID) for patient-reported outcome measures (PROMs) was originally introduced in 1989 by Jaeschke et al 11 to determine the clinical relevance of a specific treatment. The MCID was defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management.” 11 This psychometric parameter is a patient-centered measure that expresses both the extent of the improvement and the value that patients place on it.5,18,26 Recently, the use of MCID for specific PROMs has become increasingly widespread in clinical research to better understand the clinical efficacy of a given treatment, to define guidelines for clinical practice, and to properly interpret the results from trials that use PROMs.2,3,14,26,33 Even though the MCID is widely reported in the literature, there is still large heterogeneity in the different calculation methods proposed and applied over the years.25,26,29,33
The MCID of a given score can be calculated in several ways, which can be broadly classified into anchor-based methods or distribution-based methods.24,34 These methods present both advantages and limitations, and to date, no consensus has been reached on the most suitable methodology to calculate the MCID.24,34 Anchor-based methods are limited by the choice of an anchor question, which is a subjective assessment, and they can be susceptible to recall bias and influenced by the statistical distribution of scores within each category of the anchor question.18,24 Distribution-based methods rest on purely statistical reasoning and may therefore fail to identify what really matters to patients.18,24 Considering the absence of an ascertained superiority of one method over the others, the choice of the MCID calculation approach in clinical research is arbitrary.25,26,29,33 Moreover, the different MCID thresholds potentially calculable by using the various methods are rarely compared or discussed to quantify the effect of the MCID calculation approach chosen. Given the extensive use of the MCID in the scientific literature, there is a need to understand the effect of the different available approaches in terms of MCID values and how this can affect the study results.
The aim of this study was to calculate the MCID threshold values of a commonly used PROM (the International Knee Documentation Committee [IKDC] subjective score) administered to patients with knee osteoarthritis (OA) treated with platelet-rich plasma (PRP) injections by applying various published methods to compare the different MCID values obtained and their effect in the interpretation of the study clinical outcome.
Methods
Study Design and Patient Selection
The data set used to investigate the different MCID calculation approaches was based on prospectively collected data from a database of patients affected by knee OA and treated with intra-articular PRP injections between March 2009 and November 2020 (institutional review board approval Prot. n. 0015664) at the Rizzoli Orthopedic Institute (Bologna, Italy). Informed consent was obtained at the time of patients’ enrollment. PRP treatment was indicated in patients with unilateral symptomatic knee OA with a history of chronic pain (at least 6 months) or swelling, early OA findings at imaging evaluation with signs of cartilage degeneration (Kellgren-Lawrence [K-L] grade = 0, detected on magnetic resonance imaging) or OA (K-L grade = 1-4), and age between 18 and 80 years. Patients with major axial deviation (varus >5°, valgus >5° for mechanical alignment), focal chondral or osteochondral lesions, concomitant ligamentous or meniscal injury, hematologic or severe cardiovascular diseases, infections, or immunosuppression were excluded.
Patients were evaluated through the IKDC subjective score at baseline and at 6 months after the injective treatment. Moreover, at 6 months, patients were asked to express an overall opinion on the treatment received by answering an explicit anchor question, rating on a 6-point scale their clinical condition compared with the baseline: “Compared with before the injective treatment, how would you rate your knee now?” (1, total recovery; 2, much better; 3, a little better; 4, no change; 5, a little worse; 6, much worse). From a total of 408 patients available in the database at the time of the study analysis, 312 were included in this study based on the availability of the specific data requested for the calculation of MCID value for the IKDC subjective score using different methods. The study population consisted of 194 men and 118 women, with a mean age of 53.6 ± 11.4 years and a mean body mass index (BMI) of 26.7 ± 5.0. The affected knee was right in 176 patients and left in 136 patients. Using the collected clinical data of this series, the MCID was calculated for the IKDC subjective score through different previously published methods, either anchor or distribution based.2,7,26,30,33 The different MCID values obtained were then applied to the same series of 312 patients affected by knee OA, treated with PRP injections, and evaluated at 6 months of follow-up to understand the effect of using different MCID methods in evaluating patient response to treatment.
Anchor-Based Methods
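Method based on the receiver operating characteristic (ROC) curve (Figure 1) that derives the MCID with the Euclidean method, that is, from the point on the curve closest to the top-left corner (sensitivity = 1, specificity = 1).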
Method based on the ROC curve that derives the MCID from the value that maximizes the Youden Index, which is the value having the maximum of the sum of sensitivity and specificity.
Method based on the ROC curve that derives the MCID from the value that minimizes the difference between sensitivity and specificity (Farrar method).
The social comparison approach provides the MCID as the mean of 2 differences: the difference of the mean score between patients who rate themselves as “a little better” and patients who rate themselves “about the same” and the difference in mean score of patients who rate themselves “a little worse” and patients who rate themselves “about the same.”
The responsiveness statistic is the standardized response mean of stable patients: it is obtained by the ratio of the mean Δ score and the standard deviation of the Δ score of stable patients.
Between-patients score change is the mean Δ score of patients who improved minus the mean Δ score of patients who did not.
Within-patients score change MCID is the mean Δ score of patients who improved: the selection of cut-point used on the anchor is arbitrary and can correspond to small, moderate, or large changes according to the decision to include only the patients with small improvement (a little better), patients with small and substantial improvement (a little and much better), or all the improved patients (from a little better to complete recovery).
Upper 95% limit of agreement: the MCID is the mean Δ score + 1.96 standard error of the Δ score of stable patients (those who answered “about the same”).
Lower 95% limit of agreement: the MCID is the mean Δ score − 1.96 standard error of the Δ score of stable patients.

Figure 1. Receiver operating characteristic (ROC) curve based on the study population.
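As an illustration of the ROC-based and mean-change anchor methods listed above, the following Python sketch computes candidate thresholds from the change in score (Δ) and a dichotomized anchor. It is a minimal sketch under stated assumptions, not the analysis code of this study: the function names, the use of NumPy, the choice of counting anchor ratings 1 to 3 as “improved,” and the toy data are all hypothetical.

```python
import numpy as np


def roc_cutoffs(delta, improved):
    """Candidate MCID cut-offs from the ROC curve of score change vs. anchor."""
    delta = np.asarray(delta, dtype=float)
    improved = np.asarray(improved, dtype=bool)
    thresholds = np.unique(delta)  # every observed change is a candidate cut-off
    # A patient is classified as a responder when delta >= threshold.
    pred = delta[None, :] >= thresholds[:, None]
    sens = (pred & improved).sum(axis=1) / improved.sum()
    spec = (~pred & ~improved).sum(axis=1) / (~improved).sum()
    return {
        # Point closest to the top-left corner of the ROC curve.
        "euclidean": float(thresholds[np.argmin((1 - sens) ** 2 + (1 - spec) ** 2)]),
        # Maximum of sensitivity + specificity - 1 (Youden index).
        "youden": float(thresholds[np.argmax(sens + spec - 1)]),
        # Minimum absolute difference between sensitivity and specificity.
        "farrar": float(thresholds[np.argmin(np.abs(sens - spec))]),
    }


def mean_change_mcids(delta, improved):
    """Within- and between-patients score change for a dichotomized anchor."""
    delta = np.asarray(delta, dtype=float)
    improved = np.asarray(improved, dtype=bool)
    within = delta[improved].mean()
    between = within - delta[~improved].mean()
    return {"within_patients": float(within), "between_patients": float(between)}


# Toy data, hypothetical and not taken from the study.
delta = np.array([2, 4, 6, 8, 10, 12])  # change in score per patient
anchor = np.array([5, 4, 4, 3, 2, 1])   # answers on the 6-point anchor scale
improved = anchor <= 3                  # assumption: ratings 1-3 count as improved

print(roc_cutoffs(delta, improved))
print(mean_change_mcids(delta, improved))
```

With this perfectly separated toy sample, all three ROC criteria select the same cut-off. On real data they generally differ, and moving the anchor cut-point (for example, counting only “a little better” as improved) changes every value.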
Distribution-Based Methods
The MCID is the standard error of measurement (SEM) evaluated on the baseline value according to Rai et al. 23 The SEM is defined as the variation in PROMs attributed to instrument unreliability, in which a change smaller than the calculated SEM is likely due to measurement error rather than a true change. Thus, the SEM is considered a characteristic of the measure, not the sample.
The MCID is calculated based on the small effect size: 0.2 effect size, where the effect size is a standardized measure of change obtained by dividing the difference in scores from baseline to posttreatment by the standard deviation of baseline scores.
The MCID is 1.96 SEM, representing the 95% confidence interval margin of error.
The MCID is calculated based on the medium effect size: 0.3 effect size, where the effect size is a standardized measure of change obtained by dividing the difference in scores from baseline to posttreatment by the standard deviation of baseline scores.
The MCID is 0.5 SD of the Δ score.
The MCID is calculated based on the growth curve analysis: it is based on the least squares estimation of the slope of the curve between follow-up and basal values. The MCID is the ratio between the estimated slope and its standard error.
The MCID is calculated based on the standardized response mean, which is a standardized measure of change obtained by dividing the difference in scores from baseline to posttreatment by the standard deviation of the change. It is similar to the effect size, except the change in score is divided by the standard deviation of that change instead of the baseline.
The MCID is the Student
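Several of the distribution-based quantities described above can be sketched in a few lines of Python. This is a hedged illustration, not the analysis code of this study. It assumes the common formula SEM = SD of baseline × √(1 − reliability), with a reliability coefficient supplied by the user (the value 0.91 below is hypothetical). It also assumes that the “0.2 effect size” threshold corresponds to 0.2 × SD of baseline. The function name and toy data are likewise hypothetical.

```python
import numpy as np


def distribution_mcids(baseline, followup, reliability):
    """Distribution-based quantities computed from paired scores."""
    baseline = np.asarray(baseline, dtype=float)
    followup = np.asarray(followup, dtype=float)
    delta = followup - baseline
    sd_base = baseline.std(ddof=1)   # sample SD of baseline scores
    sd_delta = delta.std(ddof=1)     # sample SD of the change in score
    # Assumption: SEM from baseline SD and a test-retest reliability coefficient.
    sem = sd_base * np.sqrt(1.0 - reliability)
    return {
        "sem": float(sem),
        "1.96_sem": float(1.96 * sem),
        # Assumption: small-effect threshold read as 0.2 x SD of baseline.
        "0.2_sd_baseline": float(0.2 * sd_base),
        "0.5_sd_delta": float(0.5 * sd_delta),
        # Standardized measures of change as defined in the text.
        "effect_size": float(delta.mean() / sd_base),
        "standardized_response_mean": float(delta.mean() / sd_delta),
    }


# Toy data, hypothetical and not taken from the study.
baseline = [40, 50, 60]
followup = [50, 70, 60]
for name, value in distribution_mcids(baseline, followup, reliability=0.91).items():
    print(f"{name}: {value:.2f}")
```

None of these quantities uses the anchor question, which is why the text describes the distribution-based approach as purely statistical and sample specific.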
Results
Clinical Results
The IKDC subjective score improved from the basal value of 48.5 ± 16.7 to 62.3 ± 19.0 at 6 months of follow-up (Table 1).
Table 1. IKDC Subjective Score at Baseline and at 6 Months of Follow-up a
a IKDC, International Knee Documentation Committee subjective score.
MCID Values
Seventeen different methods were employed with a mean value of 10.5 (range, 1.8-25.9). Among these, 9 were classifiable as anchor-based methods and 8 as distribution-based methods (Table 2). The anchor-based methods gave a mean MCID value of 13.1 (range, 6.3-25.9), while the distribution-based ones gave a mean MCID value of 7.6 (range, 1.8-13.8), the latter showing greater relative variability. A 4.1× variation of the MCID values was found within the anchor-based methods, and a 7.6× variation of the MCID values was found within the distribution-based methods (Figure 2).
Table 2. MCID Values Obtained With the Different Methods a
a MCID, minimal clinically important difference; ROC, receiver operating characteristic.

Figure 2. Minimal clinically important difference (MCID) threshold value for the International Knee Documentation Committee (IKDC) subjective score obtained through different calculation methods. The threshold values are reported in the same order as they appear in Table 2.
The percentage of patients who reached the MCID for the IKDC subjective score depended on the specific calculation method used. Among the anchor-based methods, this value varied from 24.0% to 66.0%, while among the distribution-based methods, it went from 44.6% to 75.9%, as shown in Figure 3.

Figure 3. Minimal clinically important difference (MCID) achievement based on different calculation methods. The abscissa indicates the percentage of the whole study population. The ordinate indicates the improvement in the International Knee Documentation Committee subjective score at 6 months of follow-up. The shaded area under the curve represents the percentage of patients achieving the MCID. For both anchor-based and distribution-based methods, the minimum and maximum threshold values have been considered for this graph.
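The dependence of the responder percentage on the chosen threshold can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: a patient is counted as reaching the MCID when the change in score is greater than or equal to the threshold, the function name is hypothetical, and the score changes are toy values rather than study data. The four thresholds are the extremes reported above for the anchor-based and distribution-based methods.

```python
import numpy as np


def responder_rate(delta, mcid):
    """Percentage of patients whose score change is at least the MCID threshold."""
    delta = np.asarray(delta, dtype=float)
    # Assumption: "reaching the MCID" means delta >= threshold.
    return 100.0 * float(np.mean(delta >= mcid))


# Toy score changes, hypothetical and not taken from the study.
delta = [0, 2, 5, 10, 14, 20, 26, 30]

# Minimum and maximum thresholds reported for the two families of methods.
for mcid in (1.8, 6.3, 13.8, 25.9):
    print(f"MCID {mcid:>4}: {responder_rate(delta, mcid):.1f}% of patients")
```

On this toy sample, the same set of score changes yields 87.5% of patients reaching the lowest threshold and 25.0% reaching the highest. This is the same mechanism behind the spread observed in the study series.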
Discussion
The main finding of this study is that different MCID calculation methods lead to highly heterogeneous values, which significantly affect the percentages of patients achieving the MCID. These results challenge the current perception in musculoskeletal studies of MCID being able to reflect the treatment success based on the patient perception and on predefined thresholds, as these are highly dependent and variable based on the calculation method chosen.4,12,15,16,28
The study findings question the real usefulness and validity of this psychometric measure in clinical research. The large number of MCID methods documented in this study confirms the difficulty in choosing the approach to determine the most appropriate value. The use of various MCID calculation methods for the IKDC subjective score, with both anchor-based and distribution-based approaches, provided different values ranging from 1.8 to 25.9. This accounts for a 14× difference in threshold values. This remarkable variability could be due to the conceptual and methodologic differences of the different calculation methods, each with its own advantages and pitfalls.10,26 Unfortunately, no method emerged as superior to the others, making it unclear which is the most reliable threshold to be used in clinical practice.
The anchor-based approaches estimate the MCID in reference to an external subjective patient assessment used to evaluate the entity of the change in a PROM.8,24 The external criterion and the selection or grouping of the different scale levels are chosen arbitrarily, and this leads to different MCID values. Anchor-based methods have been criticized for the effect of recall bias on long-term responsiveness. Recall bias happens when a patient remembers best what has happened most recently and has a less clear memory of the more distant past.17,21,26 In addition, the patient report of change could be reflective of the patient’s current health status rather than the amount of change from baseline. 26
On the other hand, distribution-based methods are purely statistical approaches, do not employ clinical questionnaires, and are sample specific, being strictly related to baseline characteristics and results of the cohort of patients evaluated.6,32 One advantage of distribution-based methods is the ability to account for change beyond some level of random variation. 33 Conversely, a weakness of distribution-based methods is that there are no agreed-upon benchmarks for establishing clinically significant improvement.27,33 Perhaps more important, distribution-based methods do not address the patient’s perspective of clinically important change, which is entirely different from statistical significance. In this regard, in this study the distribution-based methods provided 3 very small MCIDs that are acceptable only if any improvement, however small, is considered relevant.
This myriad of results leads to problems of interpretation and a state of conflict when deciding which of the reported MCID values is most appropriate. Different MCID calculation methods result in different MCID values, which in turn lead to different interpretations of the treatment success in a given population.29,33 The researcher/clinician using 1.8 points as the MCID is going to demonstrate better results compared with the researcher/clinician using 25.9 points as the MCID. Depending on the MCID method applied, the series used in this study could be interpreted as having 76% of patients reaching the MCID, which could be a significant treatment success, or as having only 24%, which means that 3 of 4 patients fail to experience an improvement, thus showing a complete lack of treatment effectiveness. Playing with these thresholds can favor misinterpretation, if not manipulation, of study outcomes.
Another contributing factor to the variability in reported MCID scores is the study population. In particular, patients’ characteristics such as age, sex, BMI, evaluated disease, disease severity, type of treatment, and period of follow-up can significantly influence the MCID score.13,33 Therefore, MCID scores can be considered context specific, rather than an absolute value. For example, Wang et al 31 analyzed the correlation between MCID and these parameters in a cohort of patients affected by different knee impairments, finding that women with a high baseline functional status score and subacute symptoms required lower score differences to report a meaningful change.
The aforementioned aspects can lead to confusion in properly assessing clinical relevance in clinical practice as well as in the research setting. For example, in a recently published randomized controlled trial, Park et al 19 evaluated the efficacy of a single intra-articular PRP versus hyaluronic acid injection for patients with knee OA, analyzing the number of patients achieving the MCID threshold for the IKDC subjective score at 6 months. The authors applied an MCID value of 6.3 for the IKDC subjective score, 9 reporting an MCID achievement rate of 60% in the PRP group and 46% in the hyaluronic acid group at 6 months and supporting a higher clinical efficacy for PRP treatment. However, the results of that trial, like many others, should be considered as strongly influenced by the arbitrary choice of the MCID value, and the overall effectiveness for both groups could be overestimated or underestimated. While the difference between the treatment groups remains significant, being referenced to the same threshold, the generalizability of the findings in terms of treatment success percentage could be questioned. In fact, applying other calculation methods, such as those evaluated in the present study, the percentage of MCID achievement would differ. In that trial, the MCID applied had been calculated on a population of patients who underwent articular cartilage surgery for focal cartilage defect rather than injective treatment for knee OA, and thus non-context-specific values were applied. Boffa et al 1 established the MCID for the IKDC subjective score in a cohort of patients treated with PRP injections for knee OA and obtained a value of 8.6. This value differs from the one applied in that trial and would have led to different results in terms of percentage of patients achieving the MCID if applied, for example, in the study by Park et al. 19
The lack of context-specific values for many populations and treatments, as well as the great heterogeneity of MCID calculation methods, limits the validity of MCID use. The very wide range of values that can be obtained and used implies the risk of drawing different conclusions in different studies on the same topic. Therefore, it appears necessary to reach a consensus on which is the most suitable method to determine the MCID and to apply the same approach when discussing treatment results within and among different studies. It would be important to reach an agreement on which is the most suitable calculation approach or if a mix of different kinds of calculation methods could be applicable with more reliable results.
The present study has some limitations. First, many MCID calculation methods were included in the analysis, but others could be available. Still, the 17 methods applied represent well the 2 types of MCID calculation approaches, anchor- and distribution-based, and allowed us to clearly document the effect of different methodologies on the identified thresholds. Second, the effect on the study interpretation was shown in a single series of patients, and other series could lead to different findings. Third, a single PROM was evaluated in this series, although the MCID calculation approaches available are the same as for other PROMs, and they can be used equally for all available outcome measures. Fourth, patients affected by knee OA have specific pain trajectories during the natural evolution of this pathology, with unstable pain level in almost half of patients. 20 This aspect could affect the patient’s perception of their clinical status or clinical improvement after a specific treatment, altering the MCID evaluation. Moreover, it is important to consider the patient’s expectation after a treatment, especially for attractive therapies such as intra-articular orthobiologic injections, where the placebo effect plays an important role. 22 However, this study was not aimed at correctly defining a specific treatment efficacy and success, but rather at providing a proof of concept of the variability of results and therefore of the critical issues in MCID use.
This study confirms that caution is needed when reporting and interpreting the MCID of a given PROM as a measure of a treatment effectiveness both in the clinical setting and in research. A consensus of methodology experts on this issue would definitely be welcome in this field to offer clarity and guidance. Failure to acknowledge these limitations runs the risk of misclassifying patients below a preselected MCID as nonresponders when in fact they have improved. On the flip side, there is also risk of overestimating the number of responders in patient groups with more acute symptoms or disease severity. Moreover, the comparison of the percentage of patients reaching the MCID among different studies should be performed only when the same methodology is applied. Given the inherent limitations in the current MCID score methodologies and applications, MCID should not be considered a main study outcome but rather as one of the outcome measures within a more complete assessment to document treatment results.
Conclusion
This study proved that different MCID calculation methods lead to highly heterogeneous values, which significantly affect the percentage of patients achieving the MCID in a given population. The application of different MCID values, calculated by applying different anchor- and distribution-based methods, in a cohort of patients with knee OA treated with PRP led to an IKDC subjective score MCID variability from 1.8 to 25.9. This translates into a treatment success ranging from 24% to 76% in the same series. The wide-ranging thresholds obtained with the different methodologies make it difficult to evaluate the real effectiveness of a given treatment, questioning the usefulness of the MCID, as currently available, in clinical research.
Footnotes
Submitted August 16, 2022; accepted December 19, 2022.
One or more of the authors has declared the following potential conflict of interest or source of funding: S.Z. reports personal fees from I + SRL and grants from FidiaFarmaceutici SPA, Cartiheal Ltd, IGEA clinical biophysics, BIOMET, and Kensey Nash. These funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
