Abstract
Background:
Patient-reported outcomes (PROs) are considered the gold standard for evaluating value-based care in orthopaedics. However, there is little evidence to guide implementation of PROs for surgeon performance evaluation.
Purpose:
To develop a risk-adjusted surgeon performance measure using the Knee injury and Osteoarthritis Outcome Score (KOOS) for patients undergoing anterior cruciate ligament reconstruction (ACLR).
Study Design:
Cross-sectional study; Level of evidence, 3.
Methods:
Patients (N = 1248; 662 men; mean age, 30 ± 13 years) who underwent ACLR performed by 40 surgeons between 2010 and 2018 were identified from a large, nationally representative sports medicine clinical data registry. Linear regression was used to predict change scores for each KOOS subscale (Pain, Symptoms, Activities of Daily Living [ADL], Function in Sports and Recreation, and Knee-Related Quality of Life) while adjusting for patient baseline characteristics. A risk-adjusted performance measure was calculated for each KOOS subscale as the difference between the unadjusted and the risk-adjusted predicted change score across all patients treated by a single surgeon. Surgeon-relative quartile ranking was compared across outcome subscale scores.
Results:
One-third of the patients (34%) displayed acute cartilage damage, and 56% had a meniscal injury. In the risk adjustment models, older age, presence of diabetes, current smoking status, acute cartilage damage, concurrent cartilage treatment, lower baseline Veterans RAND 12-Item Health Survey mental and physical component scores, and lower baseline Marx and KOOS subscale values all had a significant negative influence on the predicted KOOS subscale change values (P < .05 for all). Surgeon performance, ranked in quartile groups, was the same for 10 surgeons but varied by 1 to 2 quartiles for the other 30 surgeons across the different KOOS subscales.
Conclusion:
These results showed that surgeon performance varies widely when evaluated using different KOOS subscales for patients undergoing ACLR. Based on the preliminary results and clinical perspective, the authors recommend the ADL and Symptoms subscales as the best options to differentiate surgeon performance for these patients. However, evaluation of surgeon performance may require consideration or use of a set of PROs or the development of a single index PRO that is sensitive to the range of outcome dimensions important to patients.
Value-based health care, where “value” is defined as the patient health outcomes achieved relative to the resources spent, remains at the forefront of the discussion surrounding health care system reform in the United States.13,18 Initial attempts in orthopaedics to assess quality include the Centers for Medicare and Medicaid Services (CMS) hospital-specific reports of 30-day and 90-day risk-standardized readmission rates after elective total hip and knee procedures.6,17,23 However, these efforts only reflect process-based measures, which are devoid of the treatment outcomes valued by orthopaedic patients.
More recent efforts to define and measure value-based care recognize the importance of including patient-reported outcomes (PROs) that capture and track patient expectations, preferences, and distinct dimensions of health status (pain, function, quality of life, etc) over time.4,11,14,20 The CMS Quality Payment Program (QPP; https://qpp.cms.gov) now includes PROs as high-priority measures that provide a better assessment of clinical performance, quality, and value in orthopaedic medicine compared with process measures alone. 9 The changes in PROs before and after a treatment are proposed as an approach to compare providers in the form of PRO–performance measures (PRO-PMs). 9 PRO-PMs can be used to score and compare providers’ relative performance. These scores can then be ranked, and cutoff points can be established to inform pay-for-performance compensation bonuses or penalties. 21 However, the operationalization of this approach is complicated by the multidimensional nature of the outcomes valued by patients.4,14
Recent work has shown that a risk-adjusted PRO-PM approach reveals wide variation in surgeon performance and enables performance comparisons that control for factors outside the surgeon’s control. 22 This approach is particularly suitable for the assessment of common outpatient surgeries such as anterior cruciate ligament (ACL) reconstruction (ACLR). There are several validated PRO options that could be used for ACLR, including the Knee injury and Osteoarthritis Outcome Score (KOOS), International Knee Documentation Committee (IKDC) subjective score, and Lysholm knee score. The KOOS, which is responsive to changes after ACLR, includes 5 subscales: Pain, Symptoms, Activities of Daily Living (ADL), Function in Sports and Recreation (Sports/Rec), and Knee-Related Quality of Life (QOL).3,8 Each of these subscales has been shown to measure distinct outcome dimensions that are valued by patients. However, no study has evaluated if the weighted subscales of pain or function influence surgeon performance ranking. Prior work has shown that the American Shoulder and Elbow Surgeons shoulder score differentiated surgeon performance for rotator cuff repair using risk adjustment of baseline scores and patient factors. 22 Thus, when applying this approach to evaluate surgeon performance for ACLR, it is unknown if performance will vary across surgeons or different KOOS subscale dimensions.
To develop risk-adjusted PRO-PMs from KOOS outcome dimensions, it must first be established if a similar risk adjustment approach differentiates surgeon performance and if surgeon performance varies across KOOS subscales. 2 This is fundamental if PRO-PMs are to be applied to determine surgeon performance and reimbursement in value-based health care models.9,12
The purpose of this study was to apply a risk-adjusted approach for comparing surgeon performance using the KOOS for patients undergoing ACLR and to evaluate whether different KOOS subscale values influence surgeon ranking. We hypothesized that surgeon performance would vary according to KOOS subscale values.
Methods
Data and Study Sample
This study received institutional review board approval and adhered to the Reporting of Studies Conducted Using Observational Routinely-Collected Data guidelines. Data from 2851 patients who underwent ACLR from 2010 to 2018 were extracted from the Surgical Outcomes System (SOS; Arthrex). The SOS database is a national orthopaedic and sports medicine–certified clinical data registry that was developed for surgeons to easily collect and analyze patient outcomes at baseline and after surgery. Patients were included in this study if they had undergone a primary, unilateral ACLR of a complete ACL tear, had complete patient demographic data, and had complete baseline and 6-month postoperative PRO data. The final analytical sample included data from 85 surgeons and 1248 patients. The full cohort inclusion process is depicted in Figure 1.

Flowchart of patient inclusion. ACL, anterior cruciate ligament; KOOS, Knee injury and Osteoarthritis Outcome Score; SOS, Surgical Outcomes System.
Outcome Measures
The KOOS was chosen as the primary outcome of interest because it has been widely used for this study population.3,10,15 Additionally, the outcome dimensions of the KOOS have been shown to be responsive to changes after surgical procedures, including ACLR.3,8 The KOOS comprises 5 separately scored subscales: Pain, Symptoms, ADL, Sports/Rec, and QOL. A normalized score (100 indicating no symptoms and 0 indicating extreme symptoms) is calculated for each subscale. A total score has not been validated and is not recommended. For this analysis, we compared the use of different subscale scores to assess how the use of different outcomes affected clinical performance scores and relative surgeon rankings. Postsurgical KOOS subscale values collected at 6 months were used. 2 The 6-month KOOS was selected because the CMS QPP currently requires annual reporting, and this outcome period would grant physicians a 6-month period to perform surgeries that would be used for annual performance evaluation. The use of this episode-based approach maximizes the treatment period and sample size for each physician under evaluation.
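As a point of reference, the normalization and change score described above can be sketched in Python. The sketch assumes the standard published KOOS scoring rule, in which each item is answered on a 0 to 4 scale and the subscale score equals 100 minus the mean item response rescaled to 0 to 100. The function names and item responses are hypothetical and are not drawn from the registry, and rules for handling missing items are omitted.

```python
def koos_subscale_score(item_responses):
    """Normalized KOOS subscale score: 100 = no symptoms, 0 = extreme symptoms.

    Assumes every item is answered on the 0-4 scale (0 = none, 4 = extreme).
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    if any(r not in range(5) for r in item_responses):
        raise ValueError("item responses must be integers from 0 to 4")
    mean_item = sum(item_responses) / len(item_responses)
    return 100 - mean_item * 100 / 4


def koos_change_score(baseline_items, followup_items):
    """Change score: 6-month subscale score minus baseline subscale score."""
    return koos_subscale_score(followup_items) - koos_subscale_score(baseline_items)


# Hypothetical patient: moderate symptoms at baseline, mild symptoms at 6 months.
baseline = [2, 2, 2, 2]   # mean item response 2.0 -> score 50.0
followup = [1, 2, 1, 0]   # mean item response 1.0 -> score 75.0
change = koos_change_score(baseline, followup)
print(change)  # 25.0
```

A positive change score indicates improvement, which is the quantity modeled for each subscale in the analysis.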
Analytical Approach
For this analysis, we applied a previously published risk-adjusted performance measure approach for rotator cuff repair. 22 A summary of each step in that process is presented below, and more details can be found in our previous publication and in the Appendix. 22
Step 1: Select a Disease-Specific PRO
The initial step in developing a performance measure is selecting a disease-specific PRO measure that is sensitive to changes after ACLR. Given that the purpose of this study was to evaluate if different KOOS subscale values influenced surgeon ranking, we chose to model each of the 5 KOOS subscale values.
Step 2: Baseline Risk Adjustment Factors
Patient demographic control measures included in all model specifications were age, sex, diabetes, and smoking status. Clinical control measures included concurrent diagnosis of a multiligament injury, acute cartilage damage (articular or other intra-articular), chronic cartilage damage (osteoarthritis), or meniscal injury. A multiple-ligament injury was defined as a grade 3 injury to the posterior cruciate ligament, medial collateral ligament, lateral collateral ligament, medial patellofemoral ligament, or anterolateral ligament. Concurrent knee procedures were included as control variables and included meniscal repair, meniscectomy, other ligament repair, or cartilage treatment. Baseline PROs were included in all model specifications and included measures of pain, Veterans RAND 12-Item Health Survey (VR-12) mental and physical component scores, Marx activity scores, and all KOOS subscale values. A P value <.05 was considered statistically significant.
Step 3: Model Selection and Assessment
For each KOOS subscale value, we modeled outcome change using a linear regression model estimated using ordinary least squares because the outcome variable was continuous with a large sample size. Although more complex functional forms are available, the linear specification provided the direct relationship between specific factors and subscale scores. The adjusted R2 and residual standard error were used to assess the explanatory power and accuracy of each model. Coefficients for each risk factor were examined for each factor’s impact on the KOOS subscale change value.
Multiple linear predictive models were examined for the effect of each factor on the KOOS subscale change value. Reference class categories were specified as follows: male sex, nonsmoker, no diabetes, no concurrent diagnosis of multiligament injury, no acute cartilage damage (articular or other intra-articular), no chronic cartilage damage (osteoarthritis), and no concurrent meniscal repair or meniscectomy surgical procedure performed.
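The modeling step can be illustrated with a minimal sketch of ordinary least squares and the adjusted R2 statistic. This is not the authors' SAS or R code: the two predictors (age and baseline subscale score), the data values, and the function names are hypothetical, and the normal equations are solved directly only to keep the example free of external libraries.

```python
def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, X):
    """Predicted change score for each row of predictors."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical patients: (age in years, baseline subscale score) -> change score.
X = [(20, 60), (30, 50), (40, 70), (25, 40), (50, 55), (35, 65)]
y = [26.5, 28.0, 17.5, 35.0, 21.5, 21.0]

beta = ols_fit(X, y)
y_hat = predict(beta, X)
adj_r2 = adjusted_r2(y, y_hat, n_predictors=2)
residuals = [a - b for a, b in zip(y, y_hat)]
```

The fitted values `y_hat` play the role of the risk-adjusted predicted change scores used in Step 4. In practice, a statistical package would also report standard errors and P values for each coefficient.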
Step 4: Surgeon Performance Scores
The predicted 6-month KOOS subscale change values were estimated for each patient using the risk-adjusted linear models outlined above. The predicted change scores and unadjusted change scores were averaged across all patients treated by a given surgeon. The performance scores for individual surgeons were then calculated as the difference between the unadjusted and the risk-adjusted predicted change scores (ie, the mean observed change minus the mean predicted change). The resulting “surgeon performance score” represents the number of KOOS points better or worse than expected that the surgeon’s panel of patients achieved.
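The calculation in this step can be sketched as follows. The sketch assumes the sign convention implied by "better or worse than expected," namely mean observed change minus mean predicted change, so that positive scores indicate better-than-expected outcomes. The surgeon labels and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def surgeon_performance_scores(records):
    """Return {surgeon_id: mean observed change - mean predicted change}.

    records: iterable of (surgeon_id, observed_change, predicted_change),
    one tuple per patient.
    """
    observed = defaultdict(list)
    predicted = defaultdict(list)
    for surgeon_id, obs, pred in records:
        observed[surgeon_id].append(obs)
        predicted[surgeon_id].append(pred)
    return {s: mean(observed[s]) - mean(predicted[s]) for s in observed}


# Hypothetical patients for two surgeons (KOOS points).
records = [
    ("A", 20.0, 15.0),
    ("A", 10.0, 11.0),
    ("B", 5.0, 12.0),
    ("B", 15.0, 14.0),
]
scores = surgeon_performance_scores(records)
print(scores)  # {'A': 2.0, 'B': -3.0}
```

Here the patients of surgeon A improved 2 points more than their risk profiles predicted, whereas the patients of surgeon B improved 3 points less.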
The Efron bootstrap method was used to estimate the distributional characteristics of each performance score and 95% confidence intervals surrounding each surgeon’s performance score. 5 Each surgeon’s performance score was estimated from a random sample drawn with replacement, meaning that patients could be redrawn multiple times, to create a new sample equal in size to the original sample. Surgeons with ≥5 complete patient observations were used to estimate performance scores (total of 1165 patients across 40 surgeons). This process was repeated 2000 times to generate a performance score distribution and median score per surgeon. Bootstrapping provides robust control for confounding due to missingness, as it extrapolates the available data to provide robust estimates of the prediction model. All values are presented as means with standard deviations. SAS (Version 9.4; SAS Institute) and R software (Version 1.2.1335; R Foundation for Statistical Computing) were used for building our analytical database and for statistical analysis.
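The resampling procedure can be sketched for a single surgeon as shown below. The text does not state how the 95% confidence interval was derived from the bootstrap distribution, so the percentile method used here is an assumption, as are the function name, the random seed, and the patient values.

```python
import random
from statistics import mean, median


def bootstrap_performance_score(observed, predicted, n_boot=2000, seed=0):
    """Bootstrap one surgeon's performance score.

    observed, predicted: per-patient observed and risk-adjusted predicted
    change scores, in the same patient order.
    Returns (median score, (lower, upper)) using a percentile 95% interval,
    which is an assumed choice of interval type.
    """
    rng = random.Random(seed)
    n = len(observed)
    scores = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping the original sample size.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(mean(observed[i] for i in idx)
                      - mean(predicted[i] for i in idx))
    scores.sort()
    lower = scores[int(0.025 * n_boot)]
    upper = scores[int(0.975 * n_boot) - 1]
    return median(scores), (lower, upper)


# Hypothetical surgeon with 6 patients (KOOS points).
observed = [12.0, 25.0, 8.0, 30.0, 18.0, 22.0]
predicted = [15.0, 20.0, 10.0, 24.0, 17.0, 19.0]
score, (ci_low, ci_high) = bootstrap_performance_score(observed, predicted)
```

Two surgeons would be considered distinguishable under this sketch when their intervals do not overlap. A small panel such as the one above yields a wide interval, which is consistent with requiring a minimum number of patients per surgeon.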
Results
The full study sample included 1248 patients (662 men; mean age, 30 ± 13 years) who underwent primary unilateral ACLR, with 1% having diabetes and 1% current smokers. One-third of patients (34%) reported having acute cartilage damage, with 25% receiving concurrent cartilage treatment. Similarly, 56% had a meniscal injury and 22% underwent a concurrent meniscal repair. The mean KOOS subscale change values were Pain (18 ± 19), Symptoms (15 ± 21), ADL (18 ± 19), Sports/Rec (31 ± 30), and QOL (26 ± 23). Table 1 shows the complete details of the study sample.
Patient Sample Characteristics (N = 1248) a
Data are presented as mean ± SD or n (%). ADL, Activities of Daily Living; KOOS, Knee injury and Osteoarthritis Outcome Score; PRO, patient-reported outcome; QOL, Knee-Related Quality of Life; Sports/Rec, Function in Sports and Recreation; VAS, visual analog scale; VR-12, Veterans RAND 12-Item Health Survey.
Articular or other intra-articular.
The KOOS ADL model explained the largest amount of the variance (R2 = 0.70), followed by the Pain (R2 = 0.55) and Sports/Rec (R2 = 0.53) subscales. Risk factor degree of influence and direction of effect were relatively consistent across all 5 models with some slight variations. Increasing patient age and smoking had a significant negative effect on KOOS change values across all subscales (Table 2). Patients undergoing cartilage treatment had significantly lower KOOS change values across all KOOS outcome dimension models. Patients with higher baseline VR-12 mental and physical component scores displayed greater 6-month KOOS change values across all KOOS subscale models except the KOOS ADL model.
Results of Adjusted Regression Analysis a
Reference class categories were as follows: male sex, nonsmoker, no diabetes, no concurrent diagnosis of multiligament injury, no acute cartilage damage (articular or other intra-articular), no chronic cartilage damage (osteoarthritis), and no concurrent meniscal repair or meniscectomy surgical procedure performed. ADL, Activities of Daily Living; KOOS, Knee injury and Osteoarthritis Outcome Score; PRO, patient-reported outcome; QOL, Knee-Related Quality of Life; Sports/Rec, Function in Sports and Recreation; VAS, visual analog scale; VR-12, Veterans RAND 12-Item Health Survey.
P < .05.
P < .01.
P < .001.
Clinical Performance Scores Across KOOS Subscale Dimensions
Risk-adjusted surgeon performance scores ranged widely across surgeons and models (range –25 to 23 points). The KOOS Sports/Rec model displayed the largest score range. However, the Sports/Rec scores were skewed, with the majority of surgeon scores <0. Conversely, the KOOS Pain, ADL, and Symptoms models had the majority of surgeons achieving scores higher than expected (Figure 2). Evaluation of the 95% confidence intervals indicated that the KOOS QOL model was only able to differentiate the performance scores of 8 of the 40 surgeons, suggesting that surgeon performance did not significantly vary across the QOL subscale. However, the KOOS Symptoms, Pain, Sports/Rec, and ADL models were able to significantly differentiate the performance of many more surgeons.

Anterior cruciate ligament reconstruction surgeon performance measure scores and rankings across Knee injury and Osteoarthritis Outcome Score (KOOS) subscales. Error bars represent 95% confidence intervals. ADL, Activities of Daily Living; QOL, Knee-Related Quality of Life; Sports/Rec, Function in Sports and Recreation.
Relative Surgeon Ranking Across KOOS Subscales
Rankings varied greatly for some surgeons and were more consistent for others when comparing individual and quartile group ranks across KOOS subscale dimensions (Figure 3). For example, surgeons 3, 13, 14, 20, 30, and 35 had performance rankings in the top quartile across all 5 KOOS subscale models. Similarly, surgeons 17, 24, and 25 consistently performed in the bottom quartile of surgeons across all 5 subscale models. For these surgeons, performance rankings did not significantly differ when different KOOS outcome dimensions were used.

Surgeon rank score and quartile group score across Knee injury and Osteoarthritis Outcome Score (KOOS) subscales. Red indicates surgeons who changed ≥2 quartile rankings across KOOS subscale dimensions (n = 13). Yellow indicates surgeons who changed 1 quartile ranking across dimensions (n = 17). Green indicates surgeons who did not change quartile rankings across dimensions (n = 10). ADL, Activities of Daily Living; QOL, Knee-Related Quality of Life; Sports/Rec, Function in Sports and Recreation.
However, other surgeons saw much more varied rankings across the KOOS dimensions. For example, surgeons 4, 5, 11, 26, and 28 had quartile rankings change by 2 or more quartiles across KOOS dimensions. In total, 10 of 40 surgeons were in the same performance quartile no matter which dimension was used (green in Figure 3), 17 of 40 surgeons changed 1 quartile group (yellow), and 13 of 40 surgeons changed at least 2 quartile groups (red). The majority of surgeons (30/40) experienced rank changes depending on which KOOS subscale dimension was used for outcome assessment.
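The quartile comparison summarized above can be sketched as follows. The sketch assigns quartile groups by rank within each subscale and then reports, for each surgeon, the largest difference in quartile group across subscales. The surgeon labels, scores, and tie handling (ties keep their input order) are hypothetical choices rather than details taken from the study.

```python
def quartile_groups(scores):
    """Map each surgeon to a quartile group (1 = highest-scoring 25%)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {s: (i * 4) // n + 1 for i, s in enumerate(ranked)}


def max_quartile_shift(scores_by_subscale):
    """Largest change in quartile group for each surgeon across subscales."""
    quartiles = [quartile_groups(sc) for sc in scores_by_subscale.values()]
    surgeons = next(iter(scores_by_subscale.values()))
    return {s: max(q[s] for q in quartiles) - min(q[s] for q in quartiles)
            for s in surgeons}


# Hypothetical performance scores for 4 surgeons on 2 subscales.
scores_by_subscale = {
    "ADL":      {"A": 5.0, "B": 2.0,  "C": -1.0, "D": -4.0},
    "Symptoms": {"A": 4.0, "B": -3.0, "C": 1.0,  "D": -2.0},
}
shifts = max_quartile_shift(scores_by_subscale)
print(shifts)  # {'A': 0, 'B': 2, 'C': 1, 'D': 1}
```

In this toy example, surgeon A would be shaded green (no change), surgeons C and D yellow (1 quartile), and surgeon B red (2 or more quartiles), mirroring the color coding described for Figure 3.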
Discussion
This is the first study to show that clinical performance measures vary widely across KOOS subscales, which are deemed important to patients undergoing ACLR. The variability in surgeon ranking appears to be influenced by the outcome subscale selected, which underscores that outcome dimensions must be chosen carefully when they are used for surgeon performance rankings. Our results demonstrate that risk adjustment changes surgeon ranking compared with raw KOOS subscale values. They also support recommendations that provider performance evaluations should report performance across the outcome dimensions valued by patients and that value-based payment models should incorporate patient values. Caution is warranted when using subscales in which improvement is not expected within the measurement time frame (eg, the KOOS Sports/Rec subscale).
While the KOOS subscale models were able to significantly differentiate surgeon rankings, in many cases surgeon rankings varied widely across subscales. In fact, the quartile group ranking of 75% of the surgeons (30/40) changed by ≥1 quartile depending on the KOOS model selected. For example, surgeon 11 was the fourth-highest ranked surgeon for the KOOS QOL model but was ranked 25th of the 40 surgeons for the ADL model. This is a critical factor to consider as efforts to harmonize outcomes and conversations surrounding PRO-PM development continue.2,9 Because our results demonstrated variation in surgeon performance when using different KOOS subscales, it is possible that other PROs in orthopaedics that combine subscale dimensions into a single score (eg, Western Ontario and McMaster Universities Osteoarthritis Index and IKDC score) would produce similar provider performance variation across outcome dimensions. To improve performance measurement in orthopaedics, the industry must move beyond the initial efforts to reach PRO consensus that have been strictly based on measurement properties (validity, reliability, and responsiveness).7,20 Rather, outcome measures that are aligned with or weighted to reflect individual patient values across treatment outcome domains should be selected.
Outcomes that are highly valued by patients undergoing ACLR and thought to be the most responsive to treatment choice should carry the most weight in surgeon performance evaluation. Given these considerations and our specific analysis, we would prioritize the ADL and Symptoms subscales to measure surgeon performance for ACLR at 6 months, with the understanding that these dimensions are the most likely to change in that time frame. Pain subscale change scores suggest that, on average, the vast majority of patients had resolved pain at 6 months, and pain is not typically considered a primary indication for ACLR. Similarly, the QOL subscale is not typically a differential subscale before ACLR, and it differentiated the fewest surgeons compared with the other subscale models. We do not recommend the Sports/Rec subscale, as best-practice recommendations suggest return to sport no earlier than 6 months and most often later than 7 months; thus, using this subscale as a measure would promote substandard clinical practice. In contrast, the ADL and Symptoms subscales reflect important dimensions that are associated with the best indications for ACLR and are expected to recover substantially within the first 6 months after ACLR. These 2 domains likely provide the best measures to differentiate surgeon performance at a 6-month postoperative time point after ACLR for the purpose of value-based payment. However, a longer outcome window of 1 or 2 years could be considered for ACLR and may better differentiate and reflect surgeon performance, keeping in mind that a longer follow-up period will be challenging for current value-based performance models, which use annual assessment periods. This distinction highlights an important consideration when applying these types of performance measures in value-based payment arrangements.
Limitations
Our results should be viewed within the limitations of our study sample and design. We utilized a large, nationally representative data registry to obtain information and were inherently limited by the extent of information available for each patient and physician. For example, specifics regarding the location, size, and morphology of acute cartilage injuries and meniscal tears as well as the extent of the concomitant cartilage/meniscal procedure performed were not readily available. We acknowledge that cartilage and meniscal injuries and concurrent procedures are important clinical variables and should be considered in future lower extremity sports medicine performance models. As a result, in this study, it is possible that a portion of the performance measurement variation we observe across outcome dimensions could stem from differences in the distributions of injury location, size, and morphology across the patients seen by each surgeon. However, it is unlikely that these factors varied consistently across surgeons. Thus, the risk-adjusted surgeon performance measurement approach previously applied for shoulder patients using a single outcome can be applied to patients undergoing ACLR across KOOS dimensions. Furthermore, we used outcomes at 6 months to generate the surgeon performance score. This time frame may be too short for some treatment outcome domains and may explain why some models failed to differentiate surgeon performance. The expected full recovery for ACLR is not until 9 to 12 months postoperatively. Therefore, outcomes at 1 year may provide the most meaningful differentiator, as this is the “ultimate outcome” of the treatment.1,16 However, an episode-based approach and a 6-month outcome period were selected to maximize the treatment period and sample size for each physician under evaluation. This decision was made to harmonize with the current CMS reporting period.
We found that physician scores were highly variable using 6-month outcomes and believe that the use of a longer outcome period, such as 12 months, would likely lead to the same conclusion. Therefore, we conclude that careful consideration should be given to the outcome measures selected for performance evaluation.
This approach allows for an iterative process of model refinement, where emerging risk factors can be added to the model and model adjustments can be made to improve overall performance. Finally, complete patient demographic and clinical variables needed for analysis were only available for 43.6% of the ACLR population in this registry, suggesting that implementation barriers to establishing a national path forward for clinical performance measurement should be considered. 20 If PRO-PMs are being considered as a method to determine surgeon reimbursement, there must be a widespread commitment to implementing robust data collection practices.
Conclusion
Our results showed that surgeon performance varies widely across the KOOS subscales for patients undergoing ACLR. This is the first study to show that the choice of KOOS subscale influences the relative surgeon ranking and thus potentially impacts reimbursement in value-based payment models. Based on our preliminary results and evaluation at 6 months, we recommend the KOOS ADL and Symptoms subscales as the best options to accurately differentiate surgeon performance for patients undergoing ACLR. Our results suggest that a patient-centered evaluation of surgeon performance may require consideration or use of a set of PROs per joint or the development of a single index PRO that is sensitive to the range of outcome dimensions important to patients. These limitations of current PRO-PMs should be carefully considered as value-based payment models are developed and deployed.
Footnotes
Appendix
To develop a risk-adjusted surgeon-level performance measure, we followed the subsequent steps as outlined in a previous publication. 22
The Efron bootstrap method was used to estimate performance scores and confidence intervals surrounding each surgeon’s performance score. Performance scores were estimated from a random sample with replacement drawn from each surgeon’s sample to create a new sample equal to the original sample size. This process was repeated 2000 times to generate a performance score distribution per surgeon. The median score with 95% confidence intervals from each surgeon-specific distribution was used as the estimate.
Final revision submitted March 27, 2024; accepted May 2, 2024.
One or more of the authors has declared the following potential conflict of interest or source of funding: B.A. has received grant support from Arthrex and DJO and education payments from Smith & Nephew and Peerless Surgical. M.K. has received education payments from Arthrex, consulting fees from Arthrex, nonconsulting fees from Arthrex, and hospitality payments from Exactech. C.A.T. has received consulting fees from Breg and has stock/stock options in Players Health and Trex. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval for this study was obtained from Prisma Health (Pro00102962).
