Abstract

Background: Machine learning (ML) has emerged as a method to determine patient-specific risk for prolonged postoperative opioid use after orthopedic procedures. Purpose: We sought to analyze the efficacy and validity of ML algorithms in identifying patients who are at high risk for prolonged opioid use following orthopedic procedures. Methods: PubMed, EMBASE, and Web of Science Core Collection databases were queried for articles published prior to August 2021 that applied ML to predict prolonged postoperative opioid use following orthopedic surgeries. Features pertaining to patient demographics, surgical procedures, and ML algorithm performance were analyzed. Results: Ten studies met inclusion criteria: 4 spine, 3 knee, and 3 hip. Studies reported postoperative opioid use over 30 to 365 days and varied in defining prolonged use. Prolonged postsurgical opioid use frequency ranged from 4.3% to 40.9%. C-statistics for spine studies ranged from 0.70 to 0.81; for knee studies, 0.75 to 0.77; and for hip studies, 0.71 to 0.77. Brier scores for spine studies ranged from 0.039 to 0.076; for knee, 0.01 to 0.124; and for hip, 0.052 to 0.21. Seven articles reported calibration intercept (range: –0.02 to 0.16) and calibration slope (range: 0.88 to 1.08). Nine articles included a decision curve analysis. No investigations performed external validation. Thematic predictors of prolonged postoperative opioid use were preoperative opioid, benzodiazepine, or antidepressant use and extremes of age depending on procedure population. Conclusions: This systematic review found that ML algorithms created to predict risk for prolonged postoperative opioid use in orthopedic surgery patients demonstrate good discriminatory performance. The frequency and predictive features of prolonged postoperative opioid use identified were consistent with existing literature, although algorithms remain limited by a lack of external validation and imperfect adherence to predictive modeling guidelines.

Keywords

opioids orthopedics artificial intelligence machine learning addiction pain management

Introduction

Opioid misuse is an increasingly deadly and costly crisis in the United States. In 2020, opioids were involved in 74.8% of all drug overdose deaths [7]. The number of opioid-related fatalities has surpassed 6 times that of the 1990s, when opioid prescriptions surged in response to a call for better treatment of pain [28,44]. Strikingly, a recent study found that orthopedic surgeons prescribe nearly 8% of all opioids in the United States [5]. This position endows orthopedic surgeons with both accountability and opportunity to be responsible stewards of opioid prescription practices.

The clinical use of opioid prescribing guidelines and opioid-sparing pain management protocols, including alternative postoperative analgesic regimens, is essential to protect orthopedic patients from the risks of prolonged postoperative opioid use and misuse. Studies analyzing large patient registries and insurance databases have attempted to retrospectively identify trends in postoperative opioid use and pinpoint risk factors for prolonged use after orthopedic surgery [4,24]. While some factors, such as preoperative opioid use, are widely reported as having important associations with prolonged use [24,35], there remain conflicting reports regarding other independent risk factors, such as age. Some studies have cited an increased risk at age over 50 years, while others cite an age less than 30 years [30,43]. Predictors of opioid use may vary between different orthopedic populations; thus, identification of specific risk factors is necessary to properly assess and educate patients on individualized risk. Furthermore, this information is essential when considering interventions to mitigate risk, such as opioid holidays prior to surgical intervention.

To identify predictive factors for prolonged postoperative opioid use, machine learning (ML) studies have emerged as a method to determine patient-specific risk. ML offers statistically driven predictive modeling wherein algorithm-based tools improve their predictive ability using new experience and data. ML models are also capable of modeling complex associations built upon large datasets, enabling the clinician to integrate a wide variety of patient-specific features to predict a personalized outcome. ML algorithms can be described as supervised, unsupervised, semi-supervised, or reinforcement based; supervised models are common in the medical literature as they are easily built upon preexisting patient databases in which input and output data have been collected [8]. Common supervised ML models used include random forest, neural networks, support vector machines, and naive Bayes, among many others [18]. ML models arrive at conclusions through differing pathways, each with its own flaws and strengths and none being grossly superior across all scenarios. Thus, in designing an ML model, investigators traditionally employ algorithmic methodologies, then select for the best-performing model. Complex modeling achieved by ML is more dynamic than traditional statistical modeling, which is inherently static and limited by collinearity [17]. Also, ML tools can be delivered via readily accessible online applications, which enable clinician and patient to calculate individualized risk in the clinical setting. Given the predictive potential of these tools in estimating patients’ risk for prolonged opioid use, a better understanding of the efficacy and validity of these models may aid in the prevention of long-term opioid use by elucidating who may be at higher risk, thus allowing for targeted interventions.

We set out to conduct a systematic review to analyze the efficacy and validity of ML algorithms in identifying factors that increase patients’ risk for prolonged opioid use after orthopedic surgery. Performance of both internal and external validation was examined. Specifically, definitions of prolonged opioid use, quantitative measures of opioid consumption, risk factors for prolonged opioid use, and successful clinical implementation of ML algorithms were investigated. We hypothesized that current ML algorithms would demonstrate good-to-excellent performance for predicting prolonged postoperative opioid use in patients after orthopedic surgery, but that there would be substantial variability in the parameters used to define prolonged use and adherence to algorithm reporting guidelines.

Methods

Study identification and selection for this systematic review were performed according to the 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Supplemental File 1) [33]. The review was registered with PROSPERO prior to commencement (ID: CRD42021259523). The following databases were searched for articles published prior to August 1, 2021: PubMed, EMBASE, and the Web of Science Core Collection. The search terminology used to query the databases is available in Supplemental File 2. All articles were reviewed with no additional restrictions.

Two independent reviewers (L.M.K. and K.J.) screened all abstracts of the identified articles for agreement with the following inclusion criteria: (1) available in English; (2) presenting original data; and (3) reporting on the use of ML in orthopedic literature to predict postoperative opioid use. The following exclusion criteria were applied to the queried articles: (1) basic science or biomechanics articles; (2) review articles; (3) case reports; (4) technical notes; (5) editorial notes; and (6) articles reporting on patient outcomes outside the context of postoperative opioid use. Full-length texts were reviewed when the article title and abstract were insufficient for screening purposes. The references of the included articles were also screened to ensure all relevant studies were included in this review. All queried articles were screened using Covidence, an online systematic review manager.

Two independent investigators (L.M.K. and K.J.) extracted the following from each study: surgical intervention, sample size, average patient age, number and type of study sites, level of preoperative opioid exposure, definition of prolonged postoperative opioid exposure, types of ML algorithms, and ML model performance metrics such as discrimination, Brier score, calibration, and decision curve analysis [41]. These performance metrics are summarized in Table 1; the pearls and pitfalls are detailed throughout the ML literature [3,10,42]. As available, feature selection, handling of missing data, predictive features of prolonged opioid use, and validation methods were also collected. Exact data and statistics were reported when provided. Disagreement in extracted content between investigators was settled by a third, independent reviewer (K.N.K.).

Table 1.

Machine learning performance metrics.

Aspect	Measure	Score range and interpretation	Aim
Discrimination	Concordance statistic	0.5 to 1; scores closer to 1 indicate the model is able to accurately identify the truest positive results with the least false negatives	Calculation of the area under the receiver operating characteristic curve to assess the discriminative ability of the models
Calibration	Intercept	Scores closer to 0 indicate the model infrequently over or underpredicts outcome prevalence	The tendency of a model to over or underestimate an observed outcome prevalence on average
Calibration	Slope	Scores closer to 1 indicate the model’s predictions closely approximate the observed outcomes	A quantitative assessment of whether a model’s predictions were precise or extreme
Overall performance	Brier score	0 to 1; 0 represents perfect accuracy, while 1 represents perfect inaccuracy	Explains how close predictions are to the actual outcome as a reflection of overall model performance
Clinical utility	Decision curve analysis	Curves nearing higher y-axis values with increasing risk profiles suggest the decision under investigation confers greater benefit than other options	Enables interpretation of the clinical utility of the model; compares net benefit at various risk cutoffs
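
To make the metrics in Table 1 concrete, the following is a minimal sketch in Python of how the concordance statistic, the Brier score, and the net benefit used in decision curve analysis can be computed from observed outcomes and predicted risks. The function names and the toy data are illustrative assumptions; they are not taken from any of the included studies, which do not publish their code in this review.

```python
def c_statistic(y_true, y_prob):
    """Concordance statistic (area under the ROC curve).

    Share of (event, non-event) pairs in which the event received the
    higher predicted risk; tied predictions count as half a pair.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    True positives are credited, false positives are penalised by the
    odds of the threshold, as in decision curve analysis.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Hypothetical toy cohort: 1 = prolonged opioid use, 0 = no prolonged use.
outcomes = [0, 0, 0, 1, 0, 1, 0, 1]
risks = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.15, 0.80]

print(round(c_statistic(outcomes, risks), 3))        # 0.933
print(round(brier_score(outcomes, risks), 3))        # 0.111
print(round(net_benefit(outcomes, risks, 0.25), 3))  # 0.333
```

A decision curve is obtained by evaluating the net benefit over a range of thresholds and comparing it with the "treat all" and "treat none" strategies.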

The Methodological Index for Non-Randomized Studies (MINORS) criteria were used to assess the methodologic quality of each study [39]. Noncomparative studies are assessed with a maximum score of 16 and comparative studies with a maximum score of 24. Higher MINORS scores are representative of greater methodological quality. Two independent reviewers scored each article; inter-rater reliability is represented by Cohen’s κ coefficient calculated in Microsoft Excel (Version 16.51).
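
As an illustration of the inter-rater statistic mentioned above, the sketch below computes Cohen's κ for two reviewers in Python. The review itself reports that κ was calculated in Microsoft Excel; the function name and the example item scores here are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical item scores (0, 1, or 2 per item) from two reviewers.
reviewer_1 = [2, 2, 1, 0, 2, 1, 2, 2]
reviewer_2 = [2, 2, 1, 0, 2, 2, 2, 2]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.742
```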

Reported adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement and Journal of Medical Internet Research (JMIR) Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research was also recorded [9,27]. Articles were further assessed for adherence to each criterion of the TRIPOD checklist of items recommended to be included in the development, reporting, and interpretation of predictive modeling such as ML to enhance the reproducibility and transparency of the models.

Results

Of 1480 studies reviewed, 10 studies that used ML to predict prolonged postoperative opioid use in patients after orthopedic surgery met inclusion criteria (Fig. 1). All 10 studies developed new ML models and were published between 2019 and 2021; no studies performed external validation of an existing ML model. Four (40%) studies reported prolonged opioid use following spine surgery, 3 (30%) following knee surgery, and 3 (30%) following hip surgery (Table 2). The search strategy did not identify studies that used ML to predict prolonged postoperative opioid use following hand or wrist, elbow, shoulder, foot or ankle, pediatric, or trauma surgery.

Fig. 1.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram.

Table 2.

Study features and demographics.

First author (year)	Patient age	Total number of patients	Study site	Orthopedic surgery	Preoperative opioid exposure	Reported outcome	Frequency of prolonged postoperative opioid use
Karhade et al (2019) [21]	51 (IQR: 44–59)	2737	Multicenter (2 academic centers, 3 community hospitals)	Anterior cervical discectomy and fusion for disk herniation or degeneration, stenosis, and/or other spondylotic condition	Continuous opioid use exceeding 180 days prior to surgery, less than 180 days prior to surgery, and opioid naive patients	Uninterrupted filling of prescription opioid extending at least 90–180 days after surgery	9.9%
Karhade et al (2019) [20]	46 (IQR: 37–58)	5413	Multicenter (5 sites)	Lumbar spine surgery for disk herniation	Continuous opioid use exceeding 180 days prior to surgery, less than 180 days prior to surgery, and opioid naive patients	Sustained opioid prescriptions filled after surgery to at least 90–180 days after the index procedure	7.7%
Karhade et al [19]	60 (IQR: 46–71)	8435	Multicenter (2 academic centers, 3 community hospitals)	Lumbar decompression for disk herniation, stenosis, and spondylolisthesis	Opioid naive	Sustained prescription opioid use at least 90–180 days after surgery	4.3%
Zhang et al [46]	Non-long-term users: 51 (IQR: 41–59) Long-term users: 52 (IQR: 43–59)	19,317	Multicenter—Database (MarketScan, Truven Health)	Thoracic or lumbosacral decompression with or without fusion	Opioid naive	Long-term opioid use: filling ≥ 180 days in 1 year after surgery	4.7%
Anderson et al [2]	27 (IQR: 23–33)	10,919	Multicenter—Database (Military Health System Data Repository)	ACL reconstruction	Opioid naive subjects and subjects with preoperative opioid use for > 30 days prior to surgery	Prescription filled more than 90 days after anterior cruciate reconstruction	12.6%
Katakam et al [23]	67 (IQR: 60–74)	12,542	Single health care system	TKA	Opioid naive subjects and patients with ongoing preoperative opioid use of undefined duration	Continuous opioid prescriptions in the 30–180 days after surgery	9.0%
Lu et al [26]	50.5 (IQR: 37–61)	381	Single Institution	Knee Arthroscopy	Opioid naive subjects (no opioid prescriptions within 12 months before surgery), and opioid users (1 or more opioid prescriptions filled in 12 months before surgery and those who endorsed opioid use at the time of surgery)	Extended postoperative opioid consumption at least 150 days following surgery	20.3%
Karhade et al (2019) [22]	66 (IQR: 57–74)	5507	Multicenter (2 academic centers, 3 community hospitals)	THA	Continuous opioid use exceeding 180 days prior to surgery, less than 180 days prior to surgery, and opioid naive patients	Continuous opioid prescriptions to at least 90 days after surgery	6.3%
Kunze et al [25]	34 (IQR: 23–44)	775	Single Surgeon	Hip Arthroscopy (FAIS)	Opioid Naive	One or more opioid prescription refills postoperatively with unspecified follow-up length	18.2%
Grazal et al [14]	31 (IQR: 13)	6760	Multicenter—Database (Military Data Repository)	Hip Arthroscopy	Opioid naive subjects (no opioid prescriptions within 12 months before surgery), and opioid users (1 or more opioid prescriptions filled in 12 months before surgery and those who endorsed opioid use at the time of surgery)	One or more opioid prescription refills 90 days after surgery	40.9%

IQR interquartile range, ACL anterior cruciate ligament, TKA total knee arthroplasty, THA total hip arthroplasty, FAIS femoroacetabular impingement syndrome.

The average MINORS criteria score was 11.3 ± 0.32 points, with almost perfect inter-rater reliability between reviewers (κ = 0.92) [29]. Eight (80%) studies reported adherence to both the TRIPOD and JMIR guidelines; 2 (20%) studies did not report adherence to either set of guidelines [23,46]. All 10 investigations met at least 17 of 20 TRIPOD checklist items (average 18.8 ± 0.79). Items 12 (validation) and 17 (model updating) were omitted from the 22-item checklist as is standard for non-updated, development studies [31]. Missing checklist items included (1) lack of confidence interval reporting [19,23], (2) absence of supplemental information such as study protocol development or access to a web-based application [2,23], and (3) failure to address missing data [46]. Only 1 investigation performed risk grouping [14]. Three (30%) studies compared ML models to logistic regression (LR).

Frequency of Prolonged Opioid Use

Reported frequency of prolonged postoperative opioid use ranged from 4.3% to 40.9% (Table 2), with the lowest rates of prolonged postoperative opioid use reported in 3 of the 4 articles pertaining to spine surgery (range, 4.3%–9.9%). Six (60%) articles used definitions that required sustained opioid use over a predetermined follow-up period; the remaining 4 (40%) articles defined prolonged use by the filling of an opioid prescription after a predetermined time point in the follow-up period [2,14,25,46]. Two (20%) articles [2,46], 1 from either method of defining prolonged use, reported opioid use according to standardized dosing as defined by the Centers for Disease Control and Prevention [2,11,12,46]. Nine (90%) articles used a benchmark of at least 90 days to define prolonged postoperative opioid use, while the remaining study defined prolonged opioid use at ≥ 30 days following the index surgery [23].

Thematic Factors Associated With Prolonged Opioid Use

Preoperative opioid, antidepressant, or benzodiazepine use and age were the most frequently identified risk factors for prolonged postoperative opioid use (Fig. 2). Three investigations defined preoperative opioid use as use for > 180 days prior to surgery [20–22]; 2 studies defined preoperative opioid use as use between 30 and 365 days prior [2,14]; and 2 studies used conditional or undefined timelines in defining preoperative opioid use [23,46]. Preoperative benzodiazepine and antidepressant use were reported as binary risk factors (use vs no use) without a specified timeline. Four studies identified older age as a positive predictor [14,22,23,25]; 2 identified younger age as a positive predictor [2,14]; and 1 study noted age as a positive predictor but did not specify younger or older [26]. A total of 9 out of 10 studies applied a lower limit of age 18 as an inclusion criterion, with no upper limit to age; only Kunze et al [25] applied no lower age limit.

Fig. 2.

Machine learning methods determine predictive features of prolonged postoperative opioid use following orthopedic surgery.

Spine Surgery

Four studies used ML to predict prolonged postoperative opioid use in patients undergoing spine surgery (Table 2). The average patient age was 52 years. All 4 studies excluded patients younger than 18 years old, with no upper age limit. Three studies investigated lumbar spine fusion and/or decompression, while 1 investigated anterior cervical discectomy and fusion (ACDF). All 4 studies developed novel ML algorithms and utilized a randomized 80:20 training:test population split, where the ML algorithm developed on the training set of patients was independently tested on the remaining 20% of patients not used for algorithm development. Three studies utilized 10-fold cross-validation to assess model performance; Zhang et al [46] did not report validation methodology. All 4 spine investigations converted the premier algorithm into an open-access web application capable of generating individualized predictions for risk of prolonged postsurgical opioid use.
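
The validation scheme described above (a random 80:20 training:test split, with 10-fold cross-validation inside the training set) can be sketched as follows. This is an illustrative outline only, written against patient indices rather than real data; the function names, the seed, and the cohort size of 100 are assumptions and do not reproduce any study's actual pipeline.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Randomly partition indices 0..n-1 into training and held-out test sets."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation.

    Each training index appears in exactly one validation fold.
    """
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validation


# Hypothetical cohort of 100 patients.
train, test = train_test_split_indices(100)
for fit, validation in k_fold_indices(train, k=10):
    # A model would be fitted on `fit` and scored on `validation` here;
    # `test` stays untouched until the final model has been chosen.
    pass
print(len(train), len(test))  # 80 20
```

Keeping the 20% test set out of every cross-validation fold is what allows it to serve as an independent check of the selected algorithm, although it remains internal rather than external validation.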

Karhade et al [21] utilized a stochastic gradient boosting algorithm to predict postoperative opioid use following ACDF in both opioid naive patients and opioid users (median age: 51, interquartile range (IQR): 44–59). Model performance was as follows: area under the curve (AUC): 0.81; Brier: 0.076; calibration intercept: −0.01; calibration slope: 1.05. Global explanations of this model highlighted 4 variables as predictors of prolonged opioid use following ACDF from the 12 features isolated by recursive feature selection with random forest algorithms: preoperative opioid use greater than 180 days, antidepressant use, tobacco use, and Medicaid insurance. Similar to their ACDF investigation, Karhade et al [20] identified preoperative opioid use >180 days as an essential predictor of prolonged opioid use following surgery for lumbar disk herniation, along with comorbid depression and instrumentation (median age: 46, IQR: 37–58). This investigation identified elastic net penalized logistic regression (ENPLR) as the optimal ML model (AUC: 0.81; Brier: 0.064; calibration intercept: 0.13; calibration slope: 1.02). In a separate study, this same group [19] applied ML to a study population of opioid naive patients, where they found that instrumentation, uninsured status, and preoperative use of benzodiazepines, antidepressants, or gabapentin were most predictive of prolonged postoperative opioid use (median age: 60, IQR: 46–71). Again, the ENPLR algorithm had the best relative performance among all ML algorithms developed in this study (AUC: 0.70; Brier: 0.039; calibration intercept: 0.06; calibration slope: 1.02).

Zhang et al [46] developed a least absolute shrinkage and selection operator (LASSO) regression model for feature selection and determined that documented preoperative opioid use conferred a 2.70 times greater odds of prolonged opioid use and was the maximally predictive feature. Median age of patients with prolonged opioid use was 52 years and of patients without prolonged opioid use was 51 years. In their comparison of 3 traditional LR models and 4 ML models, LR was shown to be superior to all ML models. Each model utilized a random 80:20 training:test split. Full LR (AUC: 0.847; Brier: 0.039; Sensitivity: 0.749) accurately predicted 80.2% of patients who demonstrated prolonged opioid use and was thus used to construct an online predictive tool. The best-performing ML model, a time-varying convolutional neural network (AUC: 0.800; Brier: 0.041; Sensitivity: 0.809), performed with greater sensitivity but underperformed on discrimination.

Knee Surgery

Three studies developed ML models to predict prolonged opioid use following knee surgeries including anterior cruciate ligament (ACL) reconstruction (median age: 27 years, IQR: 23–33) [2], total knee arthroplasty (TKA) (median age: 67 years, IQR: 60–74) [23], and knee arthroscopy (median age: 50.5 years, IQR: 37.3–60.7) [26] (Table 2).

Anderson et al [2] identified 4 positive predictive features (preoperative morphine equivalents, pharmacy location, shorter deployment time, and age ≤ 23 years) using the Boruta algorithm for elimination with random forest algorithms. Katakam et al [23] identified the following positive predictive features via recursive feature elimination with random forest algorithms: age > 68 years, marital status (unmarried), opioid use between days 30 and 365 preoperatively, diabetes, and preoperative medications (antidepressants, benzodiazepines, gabapentin, nonsteroidal anti-inflammatory drugs, and beta-2-agonists). Anderson et al [2] and Katakam et al [23] utilized a random 80:20 training:test split and cross-validation of the training set, but neither study converted their chosen algorithm (gradient boosting machine, AUC: 0.77, Brier: 0.010; stochastic gradient boosting, AUC: 0.76, Brier: 0.073; Calibration intercept: 0.16; Calibration slope: 1.08, respectively) into an open-access web application. Anderson et al also included a comparison of their ML algorithms to traditional LR, where LR performed similarly but inferiorly (AUC: 0.76; Brier: 0.10) to the gradient boosting machine and superiorly to the remaining ML models.

Lu et al [26] performed training and validation using bootstrapping [40]; their model was the only algorithm developed primarily with preoperative patient-reported outcomes as predictive features, determined by recursive feature elimination with random forest algorithms. They reported that the most important predictive features were the preoperative International Knee Documentation Committee (IKDC) score; the Knee Injury and Osteoarthritis Outcome Score (KOOS) pain, activities of daily living, and sports and activities subscales; the Veterans RAND 12 Mental Component Score (VR12 MCS); age (unspecified); duration of symptoms; perioperative oral morphine equivalents; previous injections or nerve blocks; and days of exercise per week. Reduced baseline patient-reported outcome metrics were associated with prolonged postoperative opioid use, although thresholds for identifying low preoperative scores were not defined. Their linear ensemble model demonstrated superior discrimination (AUC: 0.75; Brier: 0.124; Calibration intercept: 0.001; Calibration slope: 0.99) in comparison to LR when compared by decision curve analysis and was converted into a web application.

Hip Surgery

Three of the included studies developed ML models to predict prolonged opioid use following hip surgery (Table 2), including total hip arthroplasty (THA) (median patient age: 66 years, IQR: 57–74) [22] and hip arthroscopy for femoroacetabular impingement syndrome (median age: 34 years, IQR: 23–44 [25]; median age: 31 [14]). Karhade et al and Kunze et al tested 5 ML models (stochastic gradient boosting, random forest, support vector machine, neural network, and ENPLR); implemented recursive feature elimination with random forest algorithms for feature selection; utilized a random 80:20 training:test split; and performed model assessment via 10-fold cross-validation (Supplemental File 3). Karhade et al [22] identified the ENPLR (AUC: 0.77; Brier: 0.052; calibration intercept: 0.01; calibration slope: 0.97) as the best performing model for predicting opioid use after THA and identified the following features as predictive: age > 66 years, opioid use >180 days preoperatively, preoperative hemoglobin (anemia), and preoperative medications (antidepressants, benzodiazepines, nonsteroidal anti-inflammatory drugs, and beta-2-agonists). Kunze et al [25] selected a stochastic gradient boosting algorithm (AUC: 0.75; Brier: 0.13; calibration intercept: −0.02; calibration slope: 0.88) and identified the following predictive factors for prolonged opioid use following hip arthroscopy: preoperative modified Harris hip score (mHHS), age, body mass index, preoperative visual analog scale (VAS) for pain, and workers’ compensation status.

Grazal et al [14] also examined hip arthroscopy, testing 6 algorithms (naive Bayes, gradient boosting machine, extreme gradient boosting, random forest, elastic net regularization, and artificial neural network); employed the Boruta algorithm for feature selection; and used a randomized 80:20 training:test split. Grazal et al did not report training model assessment methodology, such as cross-validation or bootstrapping. The artificial neural network demonstrated the best performance (AUC: 0.71; Brier: 0.21), and 5 features were identified as maximally predictive of prolonged opioid use (age > 40 or ≤ 25, opioid use between 30 and 365 days prior to surgery, opioid filling between 14 and 90 days postoperatively, mental health comorbidity, and preoperative substance misuse diagnosis, excluding tobacco dependence). Notably, the discriminatory ability of Kunze et al’s algorithm for hip arthroscopy outperformed that of Grazal et al, although significance of this comparison cannot be assessed with the information provided. All 3 investigations converted their optimal model into an open-access online web application capable of generating individualized predictions for risk of opioid use.

Discussion

This systematic review identified 10 studies published between 2019 and 2021 out of 1480 queried articles, underscoring the push for individualized preoperative risk stratification to assist patient management and expectations. In the majority of studies, ML discriminatory performance was good-to-excellent with strong performance metrics, confirming the efficacy of current ML in internally validated populations. Preoperative opioid use, benzodiazepine use, antidepressant use, and several procedure-specific age ranges were identified as predictors of prolonged opioid use. Finally, analysis of the methodologic execution of studies and adherence to TRIPOD guidelines highlights areas for needed improvement.

There are several limitations to this review. First, we could not assess the quality of data upon which each ML model was built, as only 80% of studies reported methodology to account for missing data. It is imperative that studies report methodology for handling missing data, such as the use of multiple imputation, as the quality of the results is dependent on the quality of input data [34]. Second, calibration intercept and slope were reported in only 70% of studies; failure to report calibration can generate misleading conclusions, as poorly calibrated ML models are subject to overprediction and underprediction [45]. In addition, a high degree of variability in selection criteria (ie, surgical procedure and degree of preoperative opioid exposure) and lack of external validation may limit applicability of ML algorithms to a general population. Third, heterogeneity of individual study populations on which ML models were built results in the inability to quantitatively pool ML results. This also limits the ability to quantitatively compare ML metrics to standard predictive modeling methods such as LR. However, the ambiguous nature of ML algorithms means that they are not designed to provide quantitative data amenable to meta-analysis. Complexities inherent in designing ML tools prohibit the dissemination of complete models or the associated programming code for that tool, challenging the replication of research in the field of ML.
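
Because calibration reporting is singled out above, a brief sketch of one common way to estimate the calibration intercept and slope may be useful. It assumes logistic recalibration: the observed outcome is regressed on the log-odds of the predicted risk. Whether the included studies used exactly this procedure is not stated in this review, so the approach, the function names, and the toy data below are all assumptions. Predicted risks must lie strictly between 0 and 1.

```python
import math


def _logit(p):
    return math.log(p / (1 - p))


def _sigmoid(z):
    return 1 / (1 + math.exp(-z))


def recalibration_fit(y_true, y_prob, iterations=25):
    """Fit y ~ a + b * logit(p) by Newton-Raphson; b is the calibration slope."""
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iterations):
        mu = [_sigmoid(a + b * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(y_true, mu, x))
        # Entries of the (negated) Hessian, X^T W X.
        w = [m * (1 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def calibration_intercept(y_true, y_prob, iterations=25):
    """Fit y ~ a + offset(logit(p)) with the slope fixed at 1; return a."""
    x = [_logit(p) for p in y_prob]
    a = 0.0
    for _ in range(iterations):
        mu = [_sigmoid(a + xi) for xi in x]
        gradient = sum(y - m for y, m in zip(y_true, mu))
        curvature = sum(m * (1 - m) for m in mu)
        a += gradient / curvature
    return a


# Hypothetical outcomes (1 = prolonged use) and predicted risks.
outcomes = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.2, 0.9]
_, slope = recalibration_fit(outcomes, risks)
intercept = calibration_intercept(outcomes, risks)
print(slope, intercept)
```

An intercept near 0 and a slope near 1 correspond to the well-calibrated case described in Table 1; a slope below 1 suggests predictions that are too extreme, and a slope above 1 suggests predictions that are too conservative.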

Previous orthopedic literature has suggested that preoperative opioid use, chronic pain, and back pain are associated with prolonged postoperative opioid use [37]. In an opioid naive population, factors increasing risk of prolonged postoperative opioid use have been identified as age older than 50 years, male sex, and preoperative benzodiazepine or antidepressant use [43]. In the current review, predictive features identified by ML were preoperative opioid use, benzodiazepine use, and antidepressant use (Fig. 2). Age was also a frequently identified predictive factor, in 6 of the 10 articles. However, categorization of risk by age varied substantially in the studies, which may be attributable to heterogeneity between patient populations secondary to inherent variations in population age associated with specific pathologies. For example, the age range of patients with the highest incidence of hip arthroscopy is the fifth decade of life, which represents the lower end of the age range expected to undergo TKA or THA (the highest incidence is in the seventh decade of life) [13,38]. Notably, 9 of 10 studies excluded patients < 18 years of age; thus, the results of this review do not necessarily reflect the risk of prolonged postoperative opioid use in children following orthopedic surgery. Exploration of the utility of ML to this end is warranted. Therefore, the extremes of age implicated as predictive factors in the primary investigations in this review may, in fact, command less generalizable predictive value than do preoperative opioid, antidepressant, or benzodiazepine use, which were consistently identified as predictive of prolonged postoperative opioid use in orthopedics.

Rates of prolonged postoperative opioid use ranged widely from 4.3% to 40.9%, with articles pertaining to spine surgery reporting the majority of lower rates. While heterogeneity in definitions of prolonged opioid use contributes to the variability observed, these percentages are consistent with previously published literature. For instance, Karhade et al [20,21] found that 9.9% of patients undergoing ACDF met criteria for sustained opioid prescription, which was driven by several factors, including preoperative opioid prescription, antidepressant use, tobacco use, and Medicaid insurance status. This concurs with Harris et al [16], who used an insurance claims database to investigate over 28,000 patients undergoing ACDF and found that 17% of these patients met criteria for chronic postoperative opioid use. While Karhade et al’s utilization of institutional data allowed for broader consideration of Medicare, Medicaid, and uninsured patients, making the study results more applicable to these populations, the single-corporation nature of the study suggests that surgical, geographic, or patient-specific factors specific to their population may influence rates of prolonged opiate use. Performance of ML models should be confirmed in populations other than that in which the initial study was performed to assess external validity. Moreover, openly available web applications of ML models provide visualization tools and explanations of model predictions, overcoming the conventional drawbacks of traditional risk scores or nomograms. Clinically, this creates opportunities for patient-provider discussions, preoperative health modification, and subsequent improvements in probability of achieving clinically relevant outcomes.

Opioids may be necessary postsurgical analgesics in some settings, though recent literature suggests that it is possible to eliminate them altogether after some elective surgeries [32]. A randomized controlled trial by Hannon et al [15] found that prescribing fewer oxycodone immediate-release pills was associated with no differences in pain scores and a significant reduction in unused opioid pills in both hip and knee arthroplasty populations. Patients in this study stopped taking opioids at an average of 1 week after discharge, and about 30% of patients never took opioids after discharge. With increasing evidence that opioid use in arthroplasty patients can be reduced, the use of ML tools such as the one by Karhade et al to identify patients with refills at 2 weeks is clinically meaningful. Early identification may allow for direction of patients to multidisciplinary resources to reduce the potential for long-term opioid use in arthroplasty patients. Furthermore, the International Association for the Study of Pain defines chronic pain as pain that lasts beyond the normal healing time, which in their latest revision was reported as more than 3 months [36]. However, just as ML determines patient-specific risk from individual risk factors, definitions of persistent opioid use should be derived from evidence-based understandings of the natural course of postoperative pain after specific orthopedic surgeries. For example, in the Femoroacetabular Impingement RandomiSed controlled Trial (FIRST), Almasri et al [1] reported that the majority of patients undergoing primary hip arthroscopy for treatment of femoroacetabular impingement syndrome show stabilization in VAS pain scores 6 months after surgery. Variation in the timelines of pain resolution following specific procedures necessitates that definitions of prolonged opioid use, as an intervention for prolonged pain, be modified accordingly. Opioid use may be considered prolonged only when the need for it outlives the course of the illness it is intended to treat. As it becomes available, procedure-specific literature should be cited when defining prolonged opioid use. However, when defining prolonged use, 5 studies [2,19–22] referenced an investigation exploring persistent opioid use after major surgical procedures, such as cardiothoracic, gastric, and pelvic operations, rather than specific orthopedic procedures [6]. Heterogeneity in the definitions for refill frequency, time-to-refill, and dosing strategies limits the interpretation and external validity of current ML investigations. Authors should interpret results with caution when attempting to apply these data to their populations, as this is a potential source of confounding bias. Nonetheless, these studies identified at-risk patients, and useful prognostic data can be extracted from this review. Future ML studies should utilize the available evidence on average recovery periods for the surgical population of interest, in conjunction with society guidelines defining prolonged opioid use, when determining time point cutoffs for ML tools. Improving the diagnostic classification of prolonged postoperative opioid use is a step closer to managing the opioid crisis in a pragmatic way.

A main finding of this study is fair adherence to predictive modeling reporting guidelines and good performance of ML algorithms on metrics including (1) the Brier score, a mathematical function describing how close predictions are to the actual outcome; (2) the c-statistic, which calculates the area under the receiver operating characteristic curve to assess discriminative ability; and (3) calibration, which refers to the agreement between observed outcomes and model predictions [41]. Discrimination analysis demonstrated good model performance, with c-statistics ranging from 0.70 to 0.81 in spine surgery, 0.75 to 0.77 in knee, and 0.71 to 0.77 in hip. Brier scores ranged from 0.04 to 0.08 in spine surgery, 0.01 to 0.12 in knee, and 0.05 to 0.13 in hip, indicating excellent performance of ML predictions (Table 1). Each metric has inherent limitations, and thus utilizing multiple metrics provides a more accurate understanding of the prediction model’s performance [3,10]. In addition, all 10 studies used objective measures with weighted variables (Boruta algorithm, multivariate variable selection, recursive feature selection with random forest algorithms, or LASSO regression; Supplemental File 3) for feature selection, which likely improved algorithm performance. In all, these features highlight the quality and precision of the prediction models, further supporting their use in identifying patients at risk of prolonged opioid use. Clinical practice metrics such as decision curve analysis may also assist the practitioner in deriving the clinical utility of an ML tool. Nine (90%) articles in this review reported decision curves to assist in translation to the clinical setting [42].
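As a rough illustration of the metrics discussed in this paragraph, the following Python sketch computes a Brier score, a c-statistic, and the net benefit that underlies a decision curve. The function names and the toy outcomes and predicted risks are assumptions made up for illustration; they are not taken from any study in this review. Calibration intercept and slope are omitted because they require refitting a logistic model.

```python
# Illustrative sketch only: toy data and function names are hypothetical.

def brier_score(outcomes, risks):
    """Mean squared difference between predicted risk and observed outcome (0 is perfect)."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, risks)) / len(outcomes)


def c_statistic(outcomes, risks):
    """Share of event/non-event pairs in which the event case has the higher
    predicted risk; ties count as one half. Equals the area under the ROC curve."""
    events = [p for y, p in zip(outcomes, risks) if y == 1]
    non_events = [p for y, p in zip(outcomes, risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


def net_benefit(outcomes, risks, threshold):
    """Net benefit of treating everyone whose predicted risk is at or above
    `threshold`; plotting this over a range of thresholds gives a decision curve."""
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, risks) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, risks) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical cohort: 1 = prolonged opioid use, 0 = no prolonged use.
toy_outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
toy_risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print("Brier score:", round(brier_score(toy_outcomes, toy_risks), 3))
print("c-statistic:", c_statistic(toy_outcomes, toy_risks))
print("Net benefit at 0.5:", round(net_benefit(toy_outcomes, toy_risks, 0.5), 3))
```

Because the Brier score depends on the event rate while the c-statistic depends only on ranking, the two can disagree for the same model, which is one reason to report several metrics together.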

While model performance was adequate, current reporting methodology limits the utility of these models. Nine of 10 studies detailed their methodology for internal validation (Supplemental File 3, Training: Test Split), but no studies performed external validation. In the absence of external validation studies, adherence to predictive modeling guidelines such as the TRIPOD statement or JMIR guidelines acts as a surrogate for methodologic quality. Adherence to TRIPOD guidelines within this review was imperfect, despite 80% of studies reporting adherence. TRIPOD guidelines serve as a minimum standard for methodologic integrity; however, investigation-specific methodology must be optimized to enhance the clinical utility of study findings. Furthermore, methodologic assessment by MINORS score indicated limitations in study design. Lack of adherence to ML reporting guidelines must be addressed in future studies to support clinical implementation of predictive algorithms. Opioid-specific ML literature should report according to standardized dosing guidelines, use procedure-specific literature as a benchmark for defining prolonged use, and work to improve methodological transparency (such as by providing source code). Rigorous external validation and translation to the clinical setting are required before ML can become a ubiquitous tool for personalized patient care.

In conclusion, ML algorithms created to predict orthopedic surgery patients at risk for prolonged postoperative opioid use demonstrate good discriminatory performance. The frequency and predictive features of prolonged postoperative opioid use identified in this review are consistent with existing literature. However, algorithms remain limited by the absence of external validation efforts and imperfect adherence to predictive modeling guidelines.

Supplemental Material

sj-docx-1-hss-10.1177_15563316231164138 – Supplemental material for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review

Supplemental material, sj-docx-1-hss-10.1177_15563316231164138 for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review by Laura M. Krivicich, Kyleen Jan, Kyle N. Kunze, Morgan Rice and Shane J. Nho in HSS Journal®: The Musculoskeletal Journal of Hospital for Special Surgery

sj-docx-2-hss-10.1177_15563316231164138 – Supplemental material for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review

Supplemental material, sj-docx-2-hss-10.1177_15563316231164138 for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review by Laura M. Krivicich, Kyleen Jan, Kyle N. Kunze, Morgan Rice and Shane J. Nho in HSS Journal®: The Musculoskeletal Journal of Hospital for Special Surgery

sj-docx-3-hss-10.1177_15563316231164138 – Supplemental material for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review

Supplemental material, sj-docx-3-hss-10.1177_15563316231164138 for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review by Laura M. Krivicich, Kyleen Jan, Kyle N. Kunze, Morgan Rice and Shane J. Nho in HSS Journal®: The Musculoskeletal Journal of Hospital for Special Surgery

sj-pdf-4-hss-10.1177_15563316231164138 – Supplemental material for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review

Supplemental material, sj-pdf-4-hss-10.1177_15563316231164138 for Machine Learning Algorithms Can Be Reliably Leveraged to Identify Patients at High Risk of Prolonged Postoperative Opioid Use Following Orthopedic Surgery: A Systematic Review by Laura M. Krivicich, Kyleen Jan, Kyle N. Kunze, Morgan Rice and Shane J. Nho in HSS Journal®: The Musculoskeletal Journal of Hospital for Special Surgery

Footnotes

Correction (April 2023):

This article has been updated to correct the affiliations since its original publication.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Kyle N. Kunze, MD, reports a relationship with Arthroscopy. Shane J. Nho, MD, MS, reports relationships with Allosource, Arthrex, Inc, Athletico, DJ Orthopaedics, Linvatec, Miomed, Smith & Nephew, Ossur, Springer, Stryker, American Orthopaedic Association, American Orthopedic Society for Sports Medicine, Arthroscopy Association of North America. The other authors declare no potential conflicts of interest.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Human/Animal Rights

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2013.

Informed Consent

Informed consent was not required for this review article.

Level of Evidence

Level III, systematic review of level III studies.

ORCID iDs

Laura M. Krivicich

Kyle N. Kunze

Supplemental Material

Supplemental material for this article is available online.

References

1. Almasri, Simunovic, Heels-Ansdell, Ayeni, Investigators. Femoroacetabular impingement surgery leads to early pain relief but minimal functional gains past 6 months: experience from the FIRST trial. Knee Surg Sports Traumatol Arthrosc. 2021;29(5):1362–1369. https://doi.org/10.1007/s00167-020-06401-x.

2. Anderson, Grazal, Balazs, et al. Can predictive modeling tools identify patients at high risk of prolonged opioid use after ACL reconstruction? Clin Orthop Relat Res. 2020;478(7):0–1618. https://doi.org/10.1097/CORR.0000000000001251.

3. Assel, Sjoberg, Vickers. The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models. Diagn Progn Res. 2017;1:19. https://doi.org/10.1186/s41512-017-0020-3.

4. Beck, Nwachukwu, Jan, et al. The effect of postoperative opioid prescription refills on achieving meaningful clinical outcomes after hip arthroscopy for femoroacetabular impingement syndrome. Arthroscopy. 2020;36(6):1599–1607. https://doi.org/10.1016/j.arthro.2020.02.007.

5. Boddapati, Padaki, Lehman, et al. Opioid prescriptions by orthopaedic surgeons in a Medicare population: recent trends, potential complications, and characteristics of high prescribers. J Am Acad Orthop Surg. 2021;29(5):e232–e237. https://doi.org/10.5435/JAAOS-D-20-00612.

6. Brummett, Waljee, Goesling, et al. New persistent opioid use after minor and major surgical procedures in US adults. JAMA Surg. 2017;152(6):e170504. https://doi.org/10.1001/jamasurg.2017.0504.

7. Centers for Disease Control and Prevention. Drug overdose deaths. Available at: http://www.cdc.gov/drugoverdose/deaths/index.html. Published June 23, 2021. Accessed July 30, 2021.

8. Choi, Coyner, Kalpathy-Cramer, Chiang, Campbell. Introduction to machine learning, neural networks, and deep learning. Transl Vis Sci Technol. 2020;9(2):14. https://doi.org/10.1167/tvst.9.2.14.

9. Collins, Reitsma, Altman, Moons. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55–63. https://doi.org/10.7326/M14-0697.

10. Cook. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115(7):928–935. https://doi.org/10.1161/CIRCULATIONAHA.106.672402.

11. Dart, Surratt, Cicero, et al. Trends in opioid analgesic abuse and mortality in the United States. N Engl J Med. 2015;372(3):241–248. https://doi.org/10.1056/NEJMsa1406143.

12. Dowell, Haegerich, Cho. CDC guideline for prescribing opioids for chronic pain—United States, 2016. JAMA. 2016;315(15):1624–1645. https://doi.org/10.1001/jama.2016.1464.

13. Fang, Noiseux, Linson, Cram. The effect of advancing age on total joint replacement outcomes. Geriatr Orthop Surg Rehabil. 2015;6(3):173–179. https://doi.org/10.1177/2151458515583515.

14. Grazal, Anderson, Booth, et al. A machine-learning algorithm to predict the likelihood of prolonged opioid use following arthroscopic hip surgery. Arthroscopy. 2022;38:839–847.e2. https://doi.org/10.1016/j.arthro.2021.08.009.

15. Hannon, Calkins, et al. The James A. Rand young investigator’s award: large opioid prescriptions are unnecessary after total joint arthroplasty: a randomized controlled trial. J Arthroplasty. 2019;34(7S):S4–S10. https://doi.org/10.1016/j.arth.2019.01.065.

16. Harris, Marrache, Jami, et al. Chronic opioid use following anterior cervical discectomy and fusion surgery for degenerative cervical pathology. Spine J. 2020;20(1):78–86. https://doi.org/10.1016/j.spinee.2019.09.011.

17. Helm, Swiergosz, Haeberle, et al. Machine learning and artificial intelligence: definitions, applications, and future directions. Curr Rev Musculoskelet Med. 2020;13(1):69–76. https://doi.org/10.1007/s12178-020-09600-8.

18. Jiang, Gradus, Rosellini. Supervised machine learning: a brief primer. Behav Ther. 2020;51(5):675–687. https://doi.org/10.1016/j.beth.2020.05.002.

19. Karhade, Cha, Fogel, et al. Predicting prolonged opioid prescriptions in opioid-naive lumbar spine surgery patients. Spine J. 2020;20(6):888–895. https://doi.org/10.1016/j.spinee.2019.12.019.

20. Karhade, Ogink, Thio, et al. Development of machine learning algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation. Spine J. 2019;19(11):1764–1771. https://doi.org/10.1016/j.spinee.2019.06.002.

21. Karhade, Ogink, Thio, et al. Machine learning for prediction of sustained opioid prescription after anterior cervical discectomy and fusion. Spine J. 2019;19(6):976–983. https://doi.org/10.1016/j.spinee.2019.01.009.

22. Karhade, Schwab, Bedair. Development of machine learning algorithms for prediction of sustained postoperative opioid prescriptions after total hip arthroplasty. J Arthroplasty. 2019;34(10):2272–2277.e1. https://doi.org/10.1016/j.arth.2019.06.013.

23. Katakam, Karhade, Schwab, Chen, Bedair. Development and validation of machine learning algorithms for postoperative opioid prescriptions after TKA. J Orthop. 2020;22:95–99. https://doi.org/10.1016/j.jor.2020.03.052.

24. Khazi, Patel, et al. Risk factors for opioid use after total shoulder arthroplasty. J Shoulder Elbow Surg. 2020;29(2):235–243. https://doi.org/10.1016/j.jse.2019.06.020.

25. Kunze, Polce, Alter, Nho. Machine learning algorithms predict prolonged opioid use in opioid-naive primary hip arthroscopy patients. J Am Acad Orthop Surg Glob Res Rev. 2021;5(5):e21.00093–e21.00098. https://doi.org/10.5435/JAAOSGlobal-D-21-00093.

26. Forlenza, Wilbur, et al. Machine-learning model successfully predicts patients at risk for prolonged postoperative opioid use following elective knee arthroscopy. Knee Surg Sports Traumatol Arthrosc. 2022;30:762–772. https://doi.org/10.1007/s00167-020-06421-7.

27. Luo, Phung, Tran, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323. https://doi.org/10.2196/jmir.5870.

28. McGinty, Tormohlen, Barry, et al. Protocol: mixed-methods study of how implementation of US state medical cannabis laws affects treatment of chronic non-cancer pain and adverse opioid outcomes. Implement Sci. 2021;16(1):2. https://doi.org/10.1186/s13012-020-01071-2.

29. McHugh. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–282.

30. Mendoza-Elias, Dunbar, Ghogawala, Whitmore. Opioid use, risk factors, and outcome in lumbar fusion surgery. World Neurosurg. 2019;135:e580–e587. https://doi.org/10.1016/j.wneu.2019.12.073.

31. Moons, Altman, Reitsma, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–W73. https://doi.org/10.7326/M14-0698.

32. Moutzouros, Jildeh, Tramer, et al. Can we eliminate opioids after anterior cruciate ligament reconstruction? a prospective, randomized controlled trial. Am J Sports Med. 2021;49:3794–3801. https://doi.org/10.1177/03635465211045394.

33. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71.

34. Pedersen, Mikkelsen, Cronin-Fenton, et al. Missing data and multiple imputation in clinical epidemiological research. Clin Epidemiol. 2017;9:157–166. https://doi.org/10.2147/CLEP.S129785.

35. Rao, Chan, Prentice, et al. Risk factors for opioid use after anterior cruciate ligament reconstruction. Am J Sports Med. 2019;47(9):2130–2137. https://doi.org/10.1177/0363546519854754.

36. Schug, Lavand’homme, Barke, et al. The IASP classification of chronic pain for ICD-11: chronic postsurgical or posttraumatic pain. Pain. 2019;160(1):45–52. https://doi.org/10.1097/j.pain.0000000000001413.

37. Sheth, Pio, et al. Prolonged opioid use after primary total knee and total hip arthroplasty: prospective evaluation of risk factors and psychological profile for depression, pain catastrophizing, and aberrant drug-related behavior. J Arthroplasty. 2020;35(12):3535–3544. https://doi.org/10.1016/j.arth.2020.07.008.

38. Sing, Feeley, Tay, Vail, Zhang. Age-related trends in hip arthroscopy: a large cross-sectional analysis. Arthroscopy. 2015;31(12):2307–2313.e2. https://doi.org/10.1016/j.arthro.2015.06.008.

39. Slim, Nini, Forestier, et al. Methodological index for non-randomized studies (minors): development and validation of a new instrument. ANZ J Surg. 2003;73(9):712–716. https://doi.org/10.1046/j.1445-2197.2003.02748.x.

40. Steyerberg, Moons, van der Windt, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381. https://doi.org/10.1371/journal.pmed.1001381.

41. Steyerberg, Vickers, Cook, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–138. https://doi.org/10.1097/EDE.0b013e3181c30fb2.

42. Steyerberg, Vergouwe. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925–1931. https://doi.org/10.1093/eurheartj/ehu207.

43. Sun, Darnall, Baker, Mackey. Incidence of and risk factors for chronic opioid use among opioid-naive patients in the postoperative period. JAMA Intern Med. 2016;176(9):1286–1293. https://doi.org/10.1001/jamainternmed.2016.3298.

44. Trasolini, McKnight, Dorr. The opioid crisis and the orthopedic surgeon. J Arthroplasty. 2018;33(11):3379–3382.e1. https://doi.org/10.1016/j.arth.2018.07.002.

45. Van Calster, Vickers. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making. 2015;35(2):162–169. https://doi.org/10.1177/0272989X14547233.

46. Zhang, Fatemi, Medress, et al. A predictive-modeling based screening tool for prolonged opioid use after surgical management of low back and lower extremity pain. Spine J. 2020;20(8):1184–1195. https://doi.org/10.1016/j.spinee.2020.05.098.
