Abstract
Objectives
This systematic review sought to identify and evaluate all current research-based spine surgery applications of artificial intelligence and machine learning (AI/ML) in optimizing preoperative patient selection and in predicting and managing postoperative outcomes and complications.
Methods
A comprehensive search of publications was conducted through the EMBASE, Medline, and PubMed databases using relevant keywords to maximize the sensitivity of the search. No limits were placed on level of evidence or timing of the study. Findings were reported according to the PRISMA guidelines.
Results
After application of inclusion and exclusion criteria, 41 studies were included in this review. Bayesian networks had the highest average AUC (.80), and neural networks had the best accuracy (83.0%), sensitivity (81.5%), and specificity (71.8%). Preoperative planning/cost prediction models (.89, 82.2%) and discharge/length of stay models (.80, 78.0%) each reported significantly higher average AUC and accuracy compared to readmissions/reoperation prediction models (.67, 70.2%) (
Conclusions
Generally, authors of the reviewed studies concluded that AI/ML offers a potentially beneficial tool for providers to optimize patient care and improve cost-efficiency. More specifically, AI/ML models performed best, on average, when optimizing preoperative patient selection and planning and predicting costs, hospital discharge, and length of stay. However, models were not as accurate in predicting postoperative complications, adverse events, and readmissions and reoperations. An understanding of AI/ML-based applications is becoming increasingly important, particularly in spine surgery, as the volume of reported literature, technology accessibility, and clinical applications continue to rapidly expand.
Keywords
Introduction
Machine learning (ML) is increasingly reported in health care, including orthopedics, especially for its applications in predictive analytics. ML is a form of artificial intelligence (AI) that uses algorithms and mathematical models that can learn from data, identify patterns and complex relationships, and make automated decisions, often with minimal human intervention.1,2 Once trained, these algorithms can apply the patterns they have found to new data. Common algorithms include artificial neural networks (ANN), decision trees (DT), boosting/ensemble learning models (BEL), Bayesian networks (BN), logistic regression (LR), and support vector machines (SVM). Neural networks are loosely modeled on neurons in the brain and are used to untangle extremely complex relationships. Across various medical specialties, AI/ML has been shown to be beneficial in guiding clinical decision-making, and artificial neural networks used as outcome prediction models have been applied in diagnosing various medical conditions.1-4
Within orthopedics, demonstrated applications of AI/ML include surgical risk stratification and optimization,5 clinical outcome prediction and diagnostics,6 and cost-efficiency analyses; in the total joint arthroplasty literature, it has also been used to propose risk-adjusted insurance reimbursement models.7 Spine surgery, in particular, is a field that involves high-risk procedures and continually seeks to improve surgical planning and outcomes and to reduce complications. With its powerful predictive capabilities, AI/ML has the potential to support new applications that may improve the safety and outcomes of spine surgery.
The use of AI/ML is rapidly expanding in health care and has the potential to improve surgical care and reduce costs, especially for high-cost and complex spine surgery procedures. As such, it is important for spine surgeons to better understand the current applications of AI/ML, especially in light of the burgeoning literature regarding this topic in recent years. The purpose of this review is to identify and evaluate all current research-based spine surgery applications of AI/ML, namely, in optimizing preoperative patient selection, as well as predicting and managing postoperative outcomes and complications.
Materials and Methods
Search Strategy
A comprehensive search of publications, up to February 2020, was conducted using the EMBASE, Medline, and PubMed databases in accordance with PRISMA guidelines. Sample search query keywords and MeSH terms are provided in Supplementary Table 1. Screening of reference lists of retrieved articles also yielded additional studies.
Eligibility Criteria
Inclusion criteria consisted of original clinical studies, including studies which evaluate spine surgery applications of AI/ML in guiding clinical decision-making. Exclusion criteria consisted of studies that did not evaluate spine surgery applications of AI/ML, studies involving oncologic spine surgery or infectious etiologies, studies involving applications for design and development of hardware or implants, medical imaging analysis studies without explicit reference or application to spine surgery, studies with non-human subjects, non-English-language studies, inaccessible articles, conference abstracts, reviews, and editorials. No limits were placed on level of evidence or timing of the study since the majority of the reviewed studies were published within the last 10 years.
Study Selection
Article titles and abstracts were screened initially by two reviewers, and full-text articles were subsequently screened based on the selection criteria. The studies were rated by their level of evidence, based on the Oxford Centre for Evidence-based Medicine Levels of Evidence.8 Two authors reviewed each individual article that was included. Discrepancies regarding study inclusion were discussed and resolved by consensus.
Data Extraction and Categorization
A database was generated from all included studies, consisting of the journal of publication, publication year, country of origin, study design, level of evidence, study duration, blinding of the study, number of involved institutions, AI/ML methods and clinical applications, surgical domain, data sources, input and output variables, sample size, average patient age, percent female patients, and any additional pertinent findings from the study. The reviewed articles were sorted into different, non-mutually exclusive categories based on AI/ML clinical application. AI/ML clinical applications were divided into two major groups: (1) administrative and clinical decision support and (2) postoperative prediction and management of complications and outcomes. The former group contained the following prediction and optimization sub-categories: preoperative planning and cost prediction, hospital discharge and length of stay (LOS), readmissions, and reoperations. The latter group included postoperative cardiovascular complications, other complications, mortality, and functional and clinical outcomes.
Data Analysis
Descriptive statistics were employed to summarize important findings and results from the selected articles and to describe trends in AI/ML techniques, clinical applications, and relevant findings associated with their use. Summary data were presented using simple means, frequencies, standard deviations (for normally distributed data on a decimal scale), and proportions. AI/ML model performance within the reviewed studies was summarized using various metrics, including the area under the curve (AUC) of receiver operating characteristic (ROC) curves, accuracy (%), sensitivity (%), and specificity (%). AUC is a measure of an ML model’s discriminative ability (i.e., accurately predicting true positives and negatives while limiting false positive and false negative cases).9,10 AUC values range from .50 to 1, with a higher value signifying a better ability of the model to correctly place a patient into an outcome category. A model with an AUC of 1.0 is a perfect discriminator, .90 to .99 is considered excellent, .80 to .89 is good, .70 to .79 is fair, and .51 to .69 is considered poor.11
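As a rough illustration of the AUC definition and qualitative bands described above, the following minimal Python sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and then maps the value to a band. The function names `roc_auc` and `auc_band` and the toy labels and scores are hypothetical and are not drawn from any reviewed study; treating values at or below .50 as "poor" is also an assumption, since the text only names the .51 to .69 range.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_band(auc: float) -> str:
    """Map an AUC value to the qualitative bands quoted in the text."""
    if auc >= 1.0:
        return "perfect"
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    # Assumption: anything below .70, including chance level, is "poor".
    return "poor"


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    auc = roc_auc(toy_labels, toy_scores)
    print(auc, auc_band(auc))  # 0.75 fair
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a statistics library would normally be used for real data.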
Accuracy, by contrast, is simply the proportion of cases that a model classifies correctly (true positives and true negatives), without distinguishing between false positives and false negatives. Reported model performance metrics for each AI/ML algorithm type and for each clinical application category were aggregated across the reviewed studies. A formal bias assessment for each study was performed based on the Cochrane Handbook for Systematic Reviews methodology (Supplementary Table 1).12
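The relationship between accuracy, sensitivity, and specificity can be sketched from the four cells of a confusion matrix. The snippet below is a minimal illustration using standard definitions; the class name `ConfusionCounts`, the helper functions, and all counts are hypothetical and do not come from the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives (hypothetical container)."""
    tp: int
    fp: int
    tn: int
    fn: int


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true cases that the model flags (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-cases that the model clears (true negative rate)."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Invented counts: a rare complication (5 of 100 patients) that the
    # model never predicts. Accuracy looks high, yet sensitivity is zero.
    rare_event = ConfusionCounts(tp=0, fp=0, tn=95, fn=5)
    print(accuracy(rare_event))     # 0.95
    print(sensitivity(rare_event))  # 0.0
    print(specificity(rare_event))  # 1.0
```

The invented rare-event case shows why accuracy alone can be misleading when the outcome is uncommon, which is one reason sensitivity and specificity are reported alongside it.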
One-way ANOVA with post hoc Tukey tests were performed, with statistical significance set at
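For readers unfamiliar with one-way ANOVA, the following minimal Python sketch computes the F statistic that compares between-group and within-group variability. It is an illustration of the general technique only, not the authors' analysis code; the function name `one_way_anova_f` and the example values are hypothetical, and the post hoc Tukey comparisons and P value lookup (normally handled by a statistics package) are deliberately omitted.

```python
from statistics import fmean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the between-group mean square divided by the within-group
    mean square. Raises ZeroDivisionError if every group has zero
    internal variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Variability of group means around the grand mean, weighted by size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Invented per-model AUC values for three hypothetical categories.
    category_a = [0.90, 0.88, 0.89]
    category_b = [0.81, 0.79, 0.80]
    category_c = [0.66, 0.68, 0.67]
    f_stat, df_b, df_w = one_way_anova_f([category_a, category_b, category_c])
    print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F value suggests that at least one group mean differs from the others; identifying which pairs differ is the role of the post hoc test.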
Results
Search Results and Study Selection
Our pre-defined search terms yielded 335 articles, of which 67 duplicate articles were removed. The remaining 268 articles were screened by title and abstract according to inclusion and exclusion criteria. Ultimately, there were 44 articles included for full review, of which 41 met full inclusion and exclusion criteria (Figure 1). Over 83% of studies had level of evidence III, and the median number of patients in each study was 964 (mean 2784, standard deviation [SD] 3122). Although there were no limitations on publication dates in the selection process, the majority of studies (77.5%) were published during the last 2 years (2018–2020) (Figure 2). AUC was the most frequently reported performance metric, appearing in 37 out of the 41 total reviewed studies (90.2%). In comparison, accuracy was reported less frequently (16 studies, 39.0%), as were sensitivity and specificity (11 studies, or 26.8%).
Figure 1. PRISMA flowchart showing systematic review search strategy.
Figure 2. Trends in the annual number of AI/ML publications in spine surgery (2011 to 2020).

Administrative and Clinical Decision Support Applications
Reviewed Studies of Preoperative Patient Selection and Planning in Spine Surgery.
Abbreviations: ASD, adult spinal deformity; ACDF, anterior cervical discectomy and fusion; ANN, artificial neural network; SVM, support vector machine; LOS, length of stay; ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program; SPARCS, Statewide Planning and Research Cooperative System.
Statistical Comparisons of Reported Model Performance Metrics, by Administrative/Clinical Decision Support Application.
Abbreviations: AUC, area under the curve; SD, standard deviation; N, number of models; LOS, length of stay.
Prediction and Management of Postoperative Outcomes and Complications
Reviewed Studies of Postoperative Outcome Prediction in Spine Surgery.
Abbreviations: ANN, artificial neural network; SVM, support vector machine; LOS, length of stay; ASD, adult spinal deformity; ACDF, anterior cervical discectomy and fusion; CHF, congestive heart failure; VTE, venous thromboembolism; UTI, urinary tract infection; PRO, patient-reported outcomes; SF-36, short-form 36 questionnaire; MCS, mental health composite score; PCS, physical health composite score; ODI, Oswestry disability index; ACS-NSQIP, American College of Surgeons National Surgical Quality Improvement Program; CMS, Centers for Medicare and Medicaid Services; NCI SEER, National Cancer Institute Surveillance, Epidemiology, and End Results database; AOSpine CSM, AOSpine North America cervical spondylotic myelopathy study.
Statistical Comparisons of Reported Model Performance Metrics, by Postoperative Prediction/Management Application.
Abbreviations: AUC, area under the curve; SD, standard deviation; N, number of models; LOS, length of stay.
Comparison of AI/ML Algorithms
Statistical Comparisons of Reported Model Performance Metrics, by AI/ML Algorithm.
Abbreviations: AUC, area under the curve; SD, standard deviation; N, number of models; AI/ML, artificial intelligence and machine learning.
Discussion
This systematic review is the first to evaluate and summarize AI/ML applications in optimizing patient selection and predicting surgical outcomes and complications in spine surgery. Our review included 41 studies that tested AI/ML-based prediction and optimization models that may help guide clinical decision-making and surgical planning. Across all reviewed studies and AI/ML methods, models were fairly accurate, averaging 74.9% accuracy and an AUC of .75. In particular, AI/ML models performed best in optimizing preoperative patient selection and planning and in predicting costs, hospital discharge, and length of stay. Model performance was also good or fair (AUC between .70 and .89) in predicting postoperative mortality and functional and clinical outcomes. However, model performance was considered poor (AUC between .50 and .69) in predicting postoperative complications (including cardiovascular complications), adverse events, and readmissions and reoperations, which may reflect the difficulty of predicting seemingly random postoperative events that are outside the surgeon’s control. In addition, model performance metrics such as AUC must be carefully interpreted: AUC summarizes the trade-off between a model’s sensitivity and specificity (and the resulting false negatives and false positives), and in certain clinical applications, such as cancer screening or prediction of potentially fatal complications after spine surgery, providers may prefer a model with a lower AUC that minimizes false negatives.4,53
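A related and simpler point is that, even for a single fixed model, the decision threshold determines how errors are split between false negatives and false positives. The minimal Python sketch below counts both error types at two thresholds; the function name `errors_at_threshold`, the toy labels, and the toy risk scores are hypothetical and serve only to illustrate the trade-off.

```python
from typing import Sequence, Tuple


def errors_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int]:
    """Return (false_negatives, false_positives).

    A case is predicted positive when its score is at or above the threshold.
    """
    false_negatives = sum(
        1 for y, s in zip(labels, scores) if y == 1 and s < threshold
    )
    false_positives = sum(
        1 for y, s in zip(labels, scores) if y == 0 and s >= threshold
    )
    return false_negatives, false_positives


if __name__ == "__main__":
    # Invented data: 1 = complication occurred, 0 = no complication.
    toy_labels = [1, 1, 1, 0, 0, 0, 0]
    toy_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

    print(errors_at_threshold(toy_labels, toy_scores, 0.50))  # (1, 1)
    print(errors_at_threshold(toy_labels, toy_scores, 0.25))  # (0, 2)
```

In this invented example, lowering the threshold removes the one missed complication at the cost of an extra false alarm, which is the kind of trade a provider might accept when a missed case is potentially fatal.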
Although AI/ML models did not perform well in predicting postoperative complications, they offer a potentially beneficial tool for providers to optimize preoperative planning and improve cost-efficiency. For example, a practicing surgeon may use an electronic medical record system with an integrated AI/ML application that predicts which patients are likely to require inpatient rather than outpatient surgery, helping to ensure that high-risk patients have access to specialized care and supervision postoperatively. Such a tool could serve as an aid in patient selection and help ensure that patients are treated in the appropriate setting. In a systematic review of AI/ML applications in neurosurgery, Buchlak et al.54 reported model performance results for deep learning/ANN and logistic regression models similar to those in our study. However, they reported an average AUC of .80 and accuracy of 81.8% for SVM, which is higher than the SVM results from our study. SVM may be more accurate in certain non-spine neurosurgical applications, such as image classification,55-60 but is perhaps less accurate for guided decision-making in spine surgery.
AI/ML-based predictive modeling may be especially beneficial in spine surgery, which usually involves complex procedures with potentially high complication rates in an often highly comorbid patient population. Ames et al.26 showed that an AI/ML-based classification system for ASD surgical candidates optimizes personalized treatment plans based on patient-specific risk factors. This application may aid surgeons with preoperative decision-making by informing them about which treatment options may offer optimal clinical improvement and value with the lowest risk of adverse events. In our review, several of the AI/ML prediction and optimization models that were used to improve patient care and postoperative outcomes also showed the potential to reduce unnecessary healthcare expenditures and even provide risk-adjusted reimbursement models for providers and hospitals.13,14,22,26 The predictive capabilities of AI/ML models enable decision makers to forecast costs related to postoperative outcomes and complications, pain medication use, patient discharges and discharge placements, length of stay, unplanned readmissions, and other postoperative interventions. Authors of the reviewed studies highlighted the potential of AI/ML to improve clinical decision-making and patient care by predicting likely postoperative outcomes, which enables providers to optimize resource allocation for post-surgical monitoring and focused care of high-risk patients.16,17,20,22 Of particular relevance to curtailing rising inpatient costs, accurate forecasting of hospital length of stay has important implications for management of bed utilization and other hospital resources. Kalagara et al.13 analyzed hospital readmissions after laminectomy and used patient-specific variables to develop predictive models for identifying readmitted patients with over 95% accuracy.
AI/ML may also help surgeons and clinical decision makers plan for surgery more efficiently and select patients for the optimal surgical setting (for instance, outpatient vs inpatient) that will produce the best care outcomes while improving cost-efficiency. Several studies highlighted the potential value of predictive modeling during the preoperative period in helping surgeons optimize patient selection and surgical planning, which also allows providers to efficiently allocate needed hospital resources and plan for possible postoperative interventions to ensure the best possible outcomes.22,25-27 The recent shift toward value-based health care has likely also spurred the recent spike in research on AI/ML applications in optimizing cost-efficiency and resource allocation, especially because post-surgical inpatient care and other associated hospital costs are major drivers of US healthcare expenditures61 and spine surgery costs.62-67 In contrast, outpatient surgical procedures have been shown to be less costly than inpatient treatment, and treating suitable surgical candidates in the outpatient setting may offer significant cost savings.68-73 Development of well-defined and accurate patient selection criteria for outpatient surgery, along with optimized anesthesia and postoperative pain management protocols, is associated with reduced patient readmission risk and surgical costs.74-76 Predictive modeling of patient length of stay, based on medical comorbidities, demographic profile, and other variables, may aid surgeons in the selection of outpatient surgical candidates and has been shown to be effective in selecting patients for outpatient posterior spinal fusions.27 Through the use of patient-specific risk factors, AI/ML applications may also enable development of risk-adjusted insurance reimbursement models that compensate providers and hospitals commensurate with case complexity, patient complication risk, and comorbidities, providing a potential solution for unwillingness to treat medically complex patients. However, data privacy and security remain a major challenge that must be addressed when using AI/ML, as patients may feel uncomfortable with their personal health information being used on such a large scale.
Although there has been a recent significant increase in the number of AI/ML publications in spine surgery, there remains a general lack of large, adequately powered, and externally validated studies that would provide more information on their efficacy in spine surgery practice. In addition, it is important to note that although our review included studies only through February 2020, it still provides a detailed overview of the recent trends in the literature and the potential early applications of AI/ML. The reviewed studies involved different spine procedures that vary in complexity and risk, and their models varied in the quality and quantity of training and validation data. As such, any conclusions about the efficacy of AI/ML applications in spine surgery require further investigation. This study does not establish conclusive relationships between AI/ML and clinical efficacy, but instead presents statistical findings and trends from recent studies. Future research on AI/ML applications in spine surgery, and in health care more broadly, must focus on developing externally validated and commercially viable systems that can be easily implemented and integrated with existing hospital systems in a cost-efficient manner. In addition, future studies should evaluate optimal methods that aid in determining surgical candidates and that can use a wide range of preoperative data. An understanding of AI/ML-based applications is becoming increasingly important, particularly in spine surgery, as the volume of reported literature, technology accessibility, and clinical applications continues to rapidly expand.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
IRB statement
This study utilized national, de-identified data and is exempt from IRB review.
