
Abstract

Study Design

Literature review.

Objective

The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement was developed to improve the generalizability of predictive models. This study systematically evaluated the quality of predictive models related to spine procedures and assessed their compliance with the TRIPOD guidelines.

Methods

A systematic search was conducted on PubMed to identify original research articles published between January 1st, 2018, and February 1st, 2023 reporting prediction models in the top six spine journals ranked by Scimago Journal Ranking (SJR): Journal of Bone and Joint Surgery, Spine, Journal of Orthopaedic Trauma, Journal of Neurosurgery: Spine, Neurosurgery, and Neurosurgical Focus. We assessed article adherence to the TRIPOD criteria using a standardized checklist.

Results

72 articles were included and analyzed with the TRIPOD checklist. Median compliance with the TRIPOD criteria was 57.14% (IQR: 48.33-64.95%). Compliance varied significantly across journals (P < 0.05). Among the TRIPOD criteria, the lowest compliance was observed in blinding the assessment of predictors (n = 8, 16.00%), fully presenting the model for use (n = 12, 17.91%), and providing sufficient information to allow for the external validation of results (n = 13, 19.70%).

Conclusions

Published machine learning models predicting outcomes in spine surgery often do not meet the established guidelines for their development, validation, and reporting outlined by TRIPOD. This lack of compliance may suggest that these models have not been adequately validated externally or adopted into routine clinical practice in spine surgery.

Keywords

spine surgery, predictive models, machine learning, artificial intelligence, transparent reporting of a multivariable prediction model for individual prognosis or diagnosis

Introduction

The United States devotes significant healthcare expenditure toward managing spine pathologies.¹ As healthcare costs continue to rise, there is an increasing need to promote value-based care to improve healthcare resource utilization.^2-4 Artificial Intelligence (AI) has already been leveraged to improve the quality and value of care provided for adult spinal deformity. It holds promise as a tool to maximize the value of care provided across the field of spine surgery.^5,6 AI has demonstrated the ability to outperform conventional statistical techniques,⁷ leading to a subsequent rise in predictive models within spine surgery.^8-10

Despite the rapid growth of AI prediction models, recent literature has drawn attention to inadequacies in developing and validating these models.^11,12 The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) 22-item checklist was designed to increase the transparency of predictive models.¹³ To our knowledge, adherence to reporting guidelines in predictive models for spine surgery remains unexplored. This article critically appraised the development and validation of predictive models in spine surgery through the TRIPOD checklist.

Methods

Search Strategy

A PubMed search string was constructed to identify articles published in the top six orthopedic and neurosurgery journals based on their Scimago Journal Ranking (SJR): The Journal of Bone and Joint Surgery, Spine, Journal of Neurosurgery: Spine, Journal of Orthopaedic Trauma, Neurosurgery, and Neurosurgical Focus. Articles published between January 1st, 2018, and February 1st, 2023, that developed or validated a predictive model for spine surgery were included (Supplemental Table 1). Articles that did not report outcomes specific to spine surgery were excluded.

The present study was conducted in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) reporting guidelines and was exempt from institutional review board review due to the lack of human participants.

Appraisal of Included Publications, Data Synthesis, and Analysis

The Enhancing the Quality and Transparency of Health Research (EQUATOR) Network aims to promote research transparency by emphasizing reporting guidelines. The TRIPOD checklist comprises 22 items which, counting sub-items, yield thirty-seven criteria assessed across the manuscript domains that describe the development or validation of a predictive model. Articles included in this review were broadly categorized based on whether they performed development and internal validation (DIV), external validation (EV) of a pre-existing model, development and external validation (DEV), or incremental value (IV) assessment (updating of a pre-existing model).

The overall TRIPOD adherence score was calculated as the number of checklist items that were adhered to divided by the number of applicable checklist items. Items not applicable to a particular model were excluded from calculations.
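As a minimal sketch of the calculation described above (not the authors' code), the score for a single article can be computed by dropping non-applicable items from both the numerator and the denominator. The function name `tripod_adherence` and the use of `None` to mark a non-applicable item are assumptions made for illustration, as is the example article.

```python
from typing import Optional, Sequence


def tripod_adherence(items: Sequence[Optional[bool]]) -> float:
    """Return percent adherence for one article.

    Each entry is True (adhered), False (not adhered), or None
    (criterion not applicable to this model; hypothetical encoding).
    """
    # Non-applicable criteria are excluded from the calculation entirely.
    applicable = [item for item in items if item is not None]
    if not applicable:
        raise ValueError("no applicable checklist items")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(applicable) / len(applicable)


# Hypothetical article: 37 criteria, 8 not applicable, and 27 of the
# remaining 29 adhered to.
example = [True] * 27 + [False] * 2 + [None] * 8
print(f"{tripod_adherence(example):.2f}%")  # 93.10%
```

Under this definition, an article with 29 applicable criteria moves by roughly 3.4 percentage points for each criterion gained or lost, which is worth keeping in mind when comparing scores between articles.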

Two authors (J.R. and A.P.) assessed each article's adherence to the TRIPOD checklist. Conflicts were resolved by a third author (S.K.).

Results are reported as frequencies and percentages of the total. Student's t-tests and ANOVA were used to analyze continuous variables. Analyses were performed with R software version 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria).
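The analyses themselves were run in R. Purely to illustrate what a one-way ANOVA computes, the following standard-library Python sketch derives the F statistic from the between-group and within-group sums of squares. The function name `one_way_anova_f` and the group values are hypothetical. Obtaining a P value would additionally require the F distribution from a statistics package, which this sketch does not attempt.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")

    all_values: List[float] = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical adherence scores (percent) for three journals.
scores = [[52.0, 55.0, 58.0], [60.0, 63.0, 66.0], [70.0, 72.0, 74.0]]
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```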

Results

Our literature search identified 81 articles. After excluding articles that did not utilize machine learning models or were not specific to spine surgery, 72 articles met the criteria for inclusion (Figure 1). Of these 72 articles, 44.4% were published in Spine, and 88.7% performed development and internal validation of a predictive model (Table 1). Degenerative/deformity surgery was the most common subspecialty (n = 48). Figure 2 shows article compliance across the various TRIPOD domains as radar graphs. Figure 3 shows median TRIPOD criterion performance by year and neurosurgery subspecialty, and Table 2 summarizes performance on each criterion.

Figure 1.

Article selection paradigm.

Table 1.

Characteristics of Neurosurgical Prediction Models.

Characteristic	Number (%), n = 72
Journal
Journal of Bone and Joint Surgery	3 (4.17%)
Journal of Neurosurgery Spine	17 (23.61%)
Journal of Orthopaedic Trauma	1 (1.39%)
Neurosurgical Focus	9 (12.50%)
Neurosurgery	10 (13.90%)
Spine	32 (44.44%)
Publication year
2018	5 (7.04%)
2019	7 (9.86%)
2020	10 (14.08%)
2021	19 (26.76%)
2022	21 (29.58%)
2023	9 (12.68%)
Study type
Development and external validation	4 (5.63%)
Development and internal validation	64 (88.88%)
External validation	4 (5.63%)
Model type
Conventional learning	4 (5.63%)
Deep learning	20 (28.17%)
Machine learning	47 (66.20%)
Prediction type
Diagnostic	15 (21.13%)
Prognostic	56 (78.87%)
Neurosurgical subspecialty
Adult deformity/degenerative	48 (66.67%)
Spinal oncology	8 (11.11%)
Spinal trauma	8 (11.11%)
Pediatric	1 (1.39%)
Other	7 (9.72%)

Figure 2.

Radar diagrams of percent adherence to the TRIPOD criteria for a) results, b) methods, and c) all other metrics.

Figure 3.

Median TRIPOD criterion performance by year and neurosurgery subspecialty

Table 2.

Performance of 72 Studies in the Cohort on the TRIPOD Statement Criteria.

TRIPOD criterion^a	Number (%)
All criteria, median (IQR)	57.14% (48.33-64.95%)
Methods and results criteria, median (IQR)	33.33% (25.87-40.37%)
Informative title (1)	14 (19.72%)
Informative abstract (2)	19 (26.76%)
Informative background (3a)	67 (94.37%)
Clear objectives stated (3b)	49 (69.01%)
Source of data specified (4a)	63 (88.73%)
Key dates specified (4b)	52 (73.24%)
Setting for cohorts specified (5a)	43 (60.56%)
Eligibility criteria stated (5b)	57 (80.28%)
Details of treatment (5c)	33 (75.00%)
Outcome defined (6a)	59 (83.10%)
Blind assessment outcome (6b)	10 (20.00%)
Predictors defined (7a)	48 (67.61%)
Blind assessment of predictors (7b)	8 (16.00%)
Study size stated (8)	20 (28.17%)
Handling of missing data explained (9)	29 (41.43%)
Handling of predictors explained (10a)	21 (32.31%)
Model building steps described (10b)	52 (77.61%)
Validation predictions described (10c)	13 (52.00%)
Performance measures reported (10d)	29 (41.43%)
Methods of model updating reported, if applicable (10e)	2 (22.22%)
Details on creation of risk groups (11)	11 (68.75%)
Differences between development and validation cohorts shown (12)	4 (50.00%)
Participant flow shown (13a)	39 (56.52%)
Demographic and missing data in all cohorts are described (13b)^b	35 (49.30%)
Distribution of predictors and outcomes shown (13c)	3 (25.00%)
Model development described (14a)	29 (46.03%)
Unadjusted association between predictors and outcomes reported, if applicable (14b)	19 (59.38%)
Model specifications and parameters shown (15a)^b	13 (19.70%)
Model fully presented for use (15b)	12 (17.91%)
Model performance fully described (16)	16 (22.54%)
Results from model updating, if applicable (17)	0 (0.00%)
Limitations described (18)	70 (98.59%)
Validation performance interpreted (19a)	11 (57.89%)
Interpretation of results described (19b)	71 (100.00%)
Implications for clinical use and research (20)	70 (98.59%)
Supplemental information provided (21)	36 (52.94%)
Funding explicitly stated (22)	42 (58.33%)

Abbreviation: TRIPOD, transparent reporting of a multivariable prediction model for individual prognosis or diagnosis.

^aThe parenthetical numbers enumerate the TRIPOD criterion.

^bDescribing all model-building steps, reporting missing data and demographic details in all cohorts, and reporting all model parameters were among the criteria with the lowest compliance.
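Because each percentage in Table 2 is taken over the articles to which that criterion applied, the denominator differs from row to row. The sketch below, which is an illustration and not part of the original analysis, recovers the implied denominator from a reported "count (percent)" pair and checks that the pair is internally consistent. The helper names `implied_denominator` and `is_consistent` are assumptions.

```python
def implied_denominator(count: int, percent: float) -> int:
    """Infer the number of applicable articles from 'count (percent%)'."""
    if percent <= 0:
        raise ValueError("percent must be positive to infer a denominator")
    return round(count * 100.0 / percent)


def is_consistent(count: int, percent: float, tol: float = 0.005) -> bool:
    """Check that count / implied denominator reproduces the percentage.

    tol is half of the last reported decimal place (two decimals here).
    """
    denominator = implied_denominator(count, percent)
    return abs(100.0 * count / denominator - percent) <= tol + 1e-9


# Rows copied from Table 2: label -> (count, reported percent).
rows = {
    "Informative title (1)": (14, 19.72),
    "Blind assessment of predictors (7b)": (8, 16.00),
    "Model specifications and parameters shown (15a)": (13, 19.70),
    "Model fully presented for use (15b)": (12, 17.91),
}

for label, (count, percent) in rows.items():
    print(label, implied_denominator(count, percent), is_consistent(count, percent))
# Implied denominators for these rows: 71, 50, 66, 67
```

Reading the rows this way makes clear that, for example, blinded assessment of predictors was judged applicable in 50 articles rather than in all of them.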

Adherence Across Subspecialties

The median TRIPOD compliance across all predictive spine surgery models was 57.14% (IQR: 48.33-64.95%). Among published studies, the highest compliance was 93.10%, and the lowest was 34.48%. Compliance varied across subspecialties, with pediatric spine surgery exhibiting the highest median compliance (66.66%) and spinal trauma having the lowest compliance (54.25%). There was no significant difference in median compliance across subspecialties (P > 0.05).

Adherence Across Journals

Compliance varied significantly across journals, with the Journal of Orthopaedic Trauma exhibiting the highest median compliance (68.97%) and the Journal of Bone and Joint Surgery exhibiting the lowest median compliance (53.12%) (P < 0.05) (Table 3).

Table 3.

Median (IQR) TRIPOD Adherence for Different Journals Included in Our Study.

Journal	TRIPOD Adherence, Median (IQR)	P Value
Journal of Orthopaedic Trauma	68.97 (68.97-68.97)^a	0.036
Spine	57.41 (44.87-63.73)
Journal of Bone and Joint Surgery	53.12 (51.85-N/A)^b
Neurosurgery	66.68 (61.29-78.76)
Journal of Neurosurgery: Spine	54.83 (42.95-63.17)
Neurosurgical Focus	53.55 (41.79-66.76)

IQR = interquartile range.

^aOne article met inclusion criteria.

^bThree articles met inclusion criteria and the 75th percentile for the IQR could not be computed.
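The footnotes above show how fragile quartiles are for very small groups. As a hedged sketch, a median and IQR summary can be written so that it declines to report quartiles below a minimum group size. The function name `median_iqr`, the `min_n` threshold of 4, and the choice of the "inclusive" quartile method are all assumptions for illustration; they are not the rule that the authors' software applied, and other quartile conventions give slightly different values.

```python
from statistics import median, quantiles
from typing import Optional, Sequence, Tuple


def median_iqr(
    scores: Sequence[float], min_n: int = 4
) -> Tuple[float, Optional[float], Optional[float]]:
    """Return (median, Q1, Q3); Q1 and Q3 are None for small groups."""
    if not scores:
        raise ValueError("no scores supplied")
    med = median(scores)
    # With very few observations the quartiles are unstable, so they
    # are withheld rather than reported (illustrative threshold).
    if len(scores) < min_n:
        return med, None, None
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return med, q1, q3


# Hypothetical adherence scores (percent) for a journal with seven articles.
print(median_iqr([41.0, 45.0, 52.0, 57.0, 60.0, 63.0, 70.0]))
# A single-article group yields a median but no quartiles.
print(median_iqr([68.97]))
```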

Adherence to Individual TRIPOD Criterion

As graded according to the TRIPOD checklist, 80.28% of articles lacked an informative title, and 73.24% did not adequately summarize their study in the abstract. A rationale for the study was described in 94.37% of articles, and 69.01% stated whether their study performed development or validation of a predictive model. Most articles (>70%) reported adequate information on their study design and period. About 80% of articles lacked adequate information on the blinded assessment of predictors and outcomes used in their predictive model. Sample size calculation and handling of missing data were less frequently reported (28.2% and 41.4%, respectively). Low compliance was observed across TRIPOD items related to model parameters (19.7%) and performance measures (22.5%). None of the articles reported results from updating a pre-existing model (item 17). Over 98% of the articles reviewed adhered to reporting guidelines on the relevant limitations and clinical implications of their predictive model.

Information about the availability of supplementary resources and sources of funding was provided in 52.94% and 58.33% of studies, respectively. Compliance with reporting funding sources varied across journals (P < 0.001). Articles published in Spine exhibited the highest adherence (81.25%) to item 22 of the checklist.

Discussion

This study critically evaluated the quality of AI predictive models in spine surgery published in the top six neurosurgery and orthopedic journals between 2018 and 2023. Our findings demonstrate inadequate adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines. These shortcomings must be addressed before AI models are implemented to enhance healthcare outcomes.

Our median compliance of 57.14% is comparable to that of studies evaluating TRIPOD adherence of predictive models in general surgery and orthopedic surgery.^11,14 However, it is substantially lower than that of a recent study that appraised the quality of predictive models across five major neurosurgical journals.¹⁵ We also observed lower compliance across ML models than across conventional statistical models such as logistic regression, possibly because the TRIPOD statement’s terminology is oriented toward regression models. The EQUATOR network aims to develop guidelines specific to AI-predictive models to overcome this drawback.¹⁶

TRIPOD guidelines require the title to explicitly mention whether the study developed, validated, or added incremental value to a predictive model. Our results revealed suboptimal compliance in items that governed the reporting of a title and abstract, which could potentially compromise readership and retrieval of articles during database searches. About 80% of articles lacked adequate information on the blinded assessment of predictors and outcomes used in their predictive model. This allows for bias during the evaluation of characteristics that could vary subjectively, such as the grading of radiographic parameters.¹⁷

The median compliance with items in the methods and results criteria was 33.33%. This could impact the clinical utility of predictive models due to limited generalizability during the selection of participants. Reporting of sample size calculation and handling of missing data was observed in 28.17% and 41.43% of studies, respectively. Despite exhibiting adequate performance on the development dataset, models developed using smaller cohorts may lack generalizability.¹⁸ A lack of information about how missing data were handled raises the question of bias that may have been introduced during participant selection.¹⁹

Over half the predictive models in our review lacked adequate demographic information or information on the number of participants with missing data (item 13b). Results from models that fail to adhere to reporting guidelines for demographic characteristics must be interpreted and utilized cautiously. Race and sex have been identified as independent predictors of unfavorable outcomes such as increased length of stay, re-admissions, and post-operative complications after spine surgery.^20,21 Few models (19.7%) provided the model specifications and parameters needed for external validation, which compromises the ability to assess their performance in an independent cohort.

High rates of compliance (>94%) with reporting of items related to rationale, limitations, and clinical implications (items 3a, 18, and 20) suggest adequate literature review among spine surgery publications. This could result from dedicated outcomes research groups and mentorship within the field of spine surgery. We observed significant differences across journals in overall compliance with TRIPOD criteria and in reporting of sources of funding. This could be a result of varying journal specifications before manuscript submission. Based on previous research demonstrating increased compliance with reporting guidelines after a checklist became mandatory during manuscript submission,²² we hypothesize that a similar strategy would improve adherence to TRIPOD guidelines within spine surgery.

Clinical Implications and Future Directions

Low adherence to reporting guidelines for the development and validation of predictive models in spine surgery could impact their use in clinical practice. For instance, many predictive models lack adequate information on demographic data and statistical analyses, even though outcomes in spine surgery are affected by demographic characteristics.^23,24 Additionally, many models lacked external validation. Clinical prediction models require external validation to demonstrate their predictive ability across datasets and therefore their generalizability. Future research could encourage better reporting practices by requiring completed checklists at manuscript submission in order to promote transparency in clinical research.

Limitations

Our systematic review is limited by the inclusion of only articles published in the top six journals according to their SJR ranking. Thus, our results may not be generalizable to other journals. We also acknowledge that the TRIPOD criteria were not specifically developed for ML models, and we aim to assess the compliance of studies with the TRIPOD-AI checklist in a future study. Journals included in our study may have had varying specifications for reporting guidelines prior to the peer review process. We were also unable to determine whether low compliance was due to inadequate standards set by journals or to authors' lack of familiarity with reporting guidelines.

Conclusion

Despite an increase in the development of AI predictive models in spine surgery, inadequate adherence to reporting guidelines poses a drawback to utilization in clinical practice. Emphasizing the submission of a checklist based on TRIPOD guidelines could potentially improve the quality of predictive models.

Supplemental Material


Supplemental Material for An Appraisal of the Quality of Development and Reporting of Predictive Models in Spine Surgery by Syed I. Khalid, Joanna M. Roy, Elie Massaad, Kyle Thomson, Pranav Mirpuri, Aashka Patel, Ankit I. Mehta, Ali Kiapour and John H. Shin in Global Spine Journal.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Syed I. Khalid

Data Availability Statement

The datasets used in this study are available upon reasonable request from the corresponding author.

Supplemental Material

Supplemental material for this article is available online.

References

1. Davis, Onega, Weeks, Lurie. Where the United States spends its spine dollars: expenditures on different ambulatory services for the management of back and neck conditions. Spine. 2012;37(19):1693-1701. doi:10.1097/BRS.0b013e3182541f45

2. King, Abbed, Gould, Benzel, Ghogawala. Cervical spine reoperation rates and hospital resource utilization after initial surgery for degenerative cervical spine disease in 12,338 patients in Washington State. Neurosurgery. 2009;65(6):1011-1022, discussion 1022-1023. doi:10.1227/01.NEU.0000360347.10596.BD

3. Martin, Turner, Mirza, Lee, Comstock, Deyo. Trends in health care expenditures, utilization, and health status among US adults with spine problems, 1997-2006. Spine. 2009;34(19):2077-2084. doi:10.1097/BRS.0b013e3181b1fad1

4. Koltsov JCB, Sambare, Alamin, Wood, Cheng. Healthcare resource utilization and costs 2 years pre- and post-lumbar spine surgery for stenosis: a national claims cohort study of 22,182 cases. Spine J. 2022;22(6):965-974. doi:10.1016/j.spinee.2022.01.020

5. Hornung, Mallow, et al. Artificial intelligence in spine care: current applications and future utility. Eur Spine J. 2022;31(8):2057-2081. doi:10.1007/s00586-022-07176-0

6. Ames, Smith, Pellisé, et al. Artificial intelligence based hierarchical clustering of patient types and intervention categories in adult spinal deformity surgery: towards a new classification scheme that predicts quality and value. Spine. 2019;44(13):915-926. doi:10.1097/BRS.0000000000002974

7. Ngiam, Khor. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 2019;20(5):e262-e273. doi:10.1016/S1470-2045(19)30149-4

8. Wilson, Gaonkar, Yoo, et al. Predicting spinal surgery candidacy from imaging data using machine learning. Neurosurgery. 2021;89(1):116-121. doi:10.1093/neuros/nyab085

9. Elsamadicy, Koo, Reeves, et al. Utilization of machine learning to model important features of 30-day readmissions following surgery for metastatic spinal column tumors: the influence of frailty. Glob Spine J. 2022;14:21925682221138053. doi:10.1177/21925682221138053

10. Senders, Staples, Karhade, et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg. 2018;109:476-486.e1. doi:10.1016/j.wneu.2017.09.149

11. Marwaha, Chen, Habashy, Choi, Spain, Brat. Appraising the quality of development and reporting in surgical prediction models. JAMA Surg. 2023;158:214-216. doi:10.1001/jamasurg.2022.4488

12. Feridooni, Cuen-Ojeda, et al. Machine learning in vascular surgery: a systematic review and critical appraisal. npj Digit Med. 2022;5:7. doi:10.1038/s41746-021-00552-y

13. Collins, Reitsma, Altman, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. doi:10.1136/bmj.g7594

14. Groot, Ogink, Lans, et al. Machine learning prediction models in orthopedic surgery: a systematic review in transparent reporting. J Orthop Res. 2022;40(2):475-483. doi:10.1002/jor.25036

15. Warman, Kalluri, Azad. Machine learning predictive models in neurosurgery: an appraisal based on the TRIPOD guidelines. Systematic review. Neurosurg Focus. 2023;54(6):E8. doi:10.3171/2023.3.FOCUS2386

16. Collins, Dhiman, Andaur Navarro, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11(7):e048008. doi:10.1136/bmjopen-2020-048008

17. Won, Lee, Park. Spinal stenosis grading in magnetic resonance imaging using deep convolutional neural networks. Spine. 2020;45(12):804-812. doi:10.1097/BRS.0000000000003377

18. Steyerberg, Bleeker, Moll, Grobbee, Moons KGM. Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J Clin Epidemiol. 2003;56(5):441-447. doi:10.1016/s0895-4356(03)00047-7

19. Janssen KJM, Donders ART, Harrell, et al. Missing covariate data in medical research: to impute is better than to ignore. J Clin Epidemiol. 2010;63(7):721-727. doi:10.1016/j.jclinepi.2009.12.008

20. Elsamadicy, Reddy, Nayar, et al. Impact of gender disparities on short-term and long-term patient reported outcomes and satisfaction measures after elective lumbar spine surgery: a single institutional study of 384 patients. World Neurosurg. 2017;107:952-958. doi:10.1016/j.wneu.2017.07.082

21. Khan, Huang, Maeder-York, et al. Racial disparities in outcomes after spine surgery: a systematic review and meta-analysis. World Neurosurg. 2022;157:e232-e244. doi:10.1016/j.wneu.2021.09.140

22. Agha, Fowler, Limb, et al. Impact of the mandatory implementation of reporting guidelines on reporting quality in a surgical journal: a before and after study. Int J Surg. 2016;30:169-172. doi:10.1016/j.ijsu.2016.04.032

23. Elsamadicy, Sayeed, Sherman JJZ, et al. Racial/ethnic disparities among patients undergoing anterior cervical discectomy and fusion or posterior cervical decompression and fusion for cervical spondylotic myelopathy: a national administrative database analysis. World Neurosurg. 2024;183:e372-e385. doi:10.1016/j.wneu.2023.12.103

24. Issa, Lambrechts, Canseco, et al. Reporting demographics in randomized control trials in spine surgery - we must do better. Spine J. 2023;23(5):642-650. doi:10.1016/j.spinee.2022.11.011
