Abstract

Study Design

Systematic Review of Randomized Controlled Trials.

Objectives

To assess the statistical fragility of randomized controlled trials (RCTs) comparing lumbar fusion and lumbar disc arthroplasty (LDA).

Methods

Following PRISMA guidelines, PubMed, Embase, and Medline databases were searched for RCTs on lumbar fusion and LDA published between January 1, 2000 and August 1, 2023. Eligible studies reported dichotomous, categorical outcomes. A two-tailed Fisher’s exact test was used to confirm reported P-values (α = 0.05) for each outcome.

Results

18 RCTs met the inclusion criteria for analysis. Across the 18 studies, 146 dichotomous outcomes were identified. The median fragility index (FI) for all outcomes was 5 (IQR: 2.0-9.0), while the median fragility quotient (FQ) was 0.022 (IQR: 0.008-0.046). Subgroup analysis revealed that adjacent segment disease outcomes had a median FI of 3 (IQR: 2.75-3) and an FQ of 0.013 (IQR: 0.013-0.014). Industry-funded studies had a significantly higher FI (6 vs 4, P = 0.036) and lower FQ (0.019 vs 0.040, P = 0.023) compared to non-industry-funded studies.

Conclusions

This systematic review demonstrates that the statistically significant findings from RCTs comparing lumbar fusion (LF) to LDA are susceptible to small changes in event outcomes. Of note, industry-funded studies were significantly more statistically fragile than non-industry-funded studies once sample size was taken into account (lower FQ despite a higher FI).

Keywords

lumbar disc arthroplasty, LDA, lumbar disc replacement, lumbar fusion, degenerative disc disease, adjacent segment disease

Introduction

Lumbar fusion has long been the procedural paradigm for treating lumbar spine pathology and degenerative disc disease.^1,2 However, the introduction of lumbar disc arthroplasty (LDA) offered a novel approach aimed at preserving segmental mobility and restoring normal biomechanics, with the proposed benefit of reducing the risk of adjacent segment disease (ASD).^1-5 Some studies have shown lower incidences of ASD with LDA than with lumbar fusion, reporting rates of 2-2.8% and 7-18%, respectively; however, these studies may carry a high risk of bias due to financial conflicts of interest, publication bias, and study design manipulation.^1,6,7 Despite these proposed advantages, adoption of LDA has remained slow and has even decreased, with the majority of surgeons opting for fusion in most cases.^8,9 Over the same period, the proportional utilization of cervical disc replacement relative to anterior discectomy and fusion increased from 4.00% in 2010 to 14.47% in 2021.¹⁰ Thus, greater examination of the randomized controlled trials (RCTs) comparing LDA vs lumbar fusion is warranted.

The inconsistency between research supporting the use of LDA and its declining clinical usage calls into question the statistical analyses of the clinical trials. In particular, the P-value can lead to conflicting interpretations in the scientific literature because of its arbitrary and reductionist threshold of 0.05. Fragility is an important but less frequently used statistical measure that explores the robustness of statistically significant results.^11,12 The fragility index (FI) is the number of outcome reversals required to render a statistically significant finding nonsignificant. Conversely, the reverse fragility index (rFI) is the number of outcome reversals required to render a statistically nonsignificant finding significant. Lastly, the fragility quotient (FQ) adjusts for sample size, simplifying comparisons between different studies.

In the context of disc replacements, a fragility analysis comparing cervical disc arthroplasty and anterior cervical discectomy and fusion demonstrated that RCTs comparing the two procedures had moderate statistical robustness (median FI = 7, FQ = 0.043) and did not suffer from fragile results.¹¹ However, no such comparison has been performed for lumbar fusion and LDA. In this study, we aim to assess the statistical fragility of RCTs comparing lumbar fusion and LDA. As a secondary objective, we aim to compare statistical fragility between industry-funded and non-industry-funded studies, and between studies conducted in 2000-2010 and those conducted in 2011-2018. We hypothesized that statistical outcomes reported in the LDA vs fusion literature would be fragile, with only a few outcome reversals required to change statistical significance. Additionally, we hypothesized that industry-funded studies would show more fragile results due to the higher risk of bias, and that the fragility of LDA vs fusion trials would be similar for recent and older trials.

Methods

Inclusion Criteria

This systematic review was conducted in accordance with the guidelines of the preferred reporting items for systematic reviews and meta-analyses (PRISMA).¹³ The PubMed, Embase, and Medline databases were searched to identify RCTs published between January 1, 2000, and August 1, 2023 related to lumbar fusion and LDA (Figure 1). The search keywords used across all databases were “total disc replacement,” “intervertebral disc replacement,” “artificial disc replacement,” “fusion,” “lumbar degenerative diseases,” “lumbar degeneration,” “spondylolisthesis,” “lumbar disc herniation,” “lumbar disc protrusion,” “lumbar spinal stenosis,” (Posterior Lumbar Interbody Fusion), (Transforaminal Lumbar Interbody Fusion), (Anterior Lumbar Interbody Fusion), “PLIF,” “TLIF,” “ALIF.” Studies were included if they were randomized controlled trials that reported dichotomous, categorical outcomes and had LDA and fusion as the two treatment arms. The minimum follow-up period for included studies was 12 months postoperatively. Non-English language, biomechanical, cadaveric, animal, in vitro, and non-RCT studies were excluded. Studies were included only if the full text was available online for review. Two independent reviewers performed title/abstract screening and full-text review, and a third independent reviewer resolved conflicts. This systematic review analyzed statistical reporting and significance rather than direct outcomes and did not qualify for the International Prospective Register of Systematic Reviews. Because all data analyzed are publicly available, Institutional Review Board approval was not required.

Figure 1.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses Flow Diagram Showing Identification, Screening, and Inclusion of Eligible Articles From PubMed, Embase, and Cochrane. RCT, Randomized Controlled Trial

Risk of Bias Assessment

To ensure that RCT design bias did not confound fragility results, a risk of bias assessment using the Cochrane Risk of Bias 2 (RoB 2) tool was done.¹⁴ A secondary analysis was also done in which outcomes with losses to follow-up greater than their fragility indices were excluded. The included studies demonstrated an overall low risk of bias across the evaluated domains, including selection, performance, detection, and attrition biases (Appendix Table 1).
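The secondary analysis described above amounts to a simple filter on the extracted outcomes. A minimal sketch, assuming each outcome is stored as a record with a fragility index and a loss-to-follow-up count; the field names `index` and `lost_to_follow_up` and the example records are hypothetical and are not taken from the included trials:

```python
def exclude_attrition_sensitive(outcomes):
    """Keep outcomes whose loss to follow-up does not exceed their FI or rFI.

    Outcomes with more patients lost than their fragility index are dropped,
    because better follow-up alone could have changed their significance.
    """
    return [o for o in outcomes if o["lost_to_follow_up"] <= o["index"]]


# Hypothetical records for illustration only.
example_outcomes = [
    {"label": "outcome A", "index": 3, "lost_to_follow_up": 12},  # dropped
    {"label": "outcome B", "index": 9, "lost_to_follow_up": 4},   # kept
    {"label": "outcome C", "index": 5, "lost_to_follow_up": 5},   # kept (not greater)
]

kept = exclude_attrition_sensitive(example_outcomes)
print([o["label"] for o in kept])
```

An outcome whose loss to follow-up equals its fragility index is retained here, since the text excludes only losses greater than the index.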

Data Extraction

The first author, year of publication, and journal of publication were extracted for trial identification during the extraction process. Industry funding status was also extracted for each included study. Outcome events in each treatment arm and the number of patients lost to follow-up were recorded. After extraction, outcome categories were established by two reviewers for subgroup analysis based on clinical relevance and outcome sample size. Outcome categories included adjacent segment disease, anatomical change, composite endpoint, pain, patient satisfaction, return to functionality, and adverse event. Anatomical change included outcomes related to disc height success and fusion status, while adjacent segment disease included only outcomes on adjacent segment disease. Composite endpoints were the outcomes used to define the primary endpoints of the RCT, and included outcomes such as neurological status, patient-reported outcome measures (PROMs), and overall success of the procedure.

Fragility Analysis

A two-tailed Fisher’s exact test was used to confirm reported P-values for each outcome. The statistical significance threshold was set at a P-value <0.05. The FI was calculated for significant outcomes by manipulating outcome events until the P-value was reversed from <0.05 to ≥0.05, as demonstrated in Figure 2. The rFI was calculated for non-significant outcomes through iterative event reversals until the P-value switched from ≥0.05 to <0.05. The FQ was calculated by dividing the FI or rFI for each outcome by the study sample size, representing the proportion of patients that require an outcome event reversal for significance to be altered. All fragility statistics were reported as medians with corresponding interquartile ranges (IQRs).
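The iterative procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the text does not say which arm's events were manipulated, so the sketch flips outcomes in the arm with the lower event rate (adding events for the FI, removing events for the rFI), and the function names and the example table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility(events_a, n_a, events_b, n_b, alpha=0.05):
    """Return the FI (or rFI) and FQ for one dichotomous outcome, or None."""
    p = fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b)
    significant = p < alpha
    # Assumption: outcomes are flipped in the arm with the lower event rate.
    a_is_low = events_a / n_a <= events_b / n_b
    low_e, low_n = (events_a, n_a) if a_is_low else (events_b, n_b)
    high_e, high_n = (events_b, n_b) if a_is_low else (events_a, n_a)
    step = 1 if significant else -1  # add events for FI, remove them for rFI
    flips = 0
    while 0 <= low_e + step <= low_n:
        low_e += step
        flips += 1
        p = fisher_exact_two_sided(low_e, low_n - low_e, high_e, high_n - high_e)
        if (p < alpha) != significant:
            kind = "FI" if significant else "rFI"
            return {"kind": kind, "index": flips, "fq": flips / (n_a + n_b)}
    return None  # significance never changed within this arm


# Hypothetical table: 0/10 events in one arm vs 5/10 in the other.
print(fragility(0, 10, 5, 10))  # → {'kind': 'FI', 'index': 1, 'fq': 0.05}
```

In this hypothetical table the initial P-value is about 0.033; converting a single non-event to an event in the first arm raises it to about 0.14, so the FI is 1 and the FQ is 1/20 = 0.05.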

Figure 2.

Illustration of the Concept of Statistical Significance Reversal Using a 2 × 2 Contingency Table, Demonstrating How a Fragility Index (FI) of 1 is Calculated for an Outcome Reported by Skold and Colleagues. + Indicates Patients With the Outcome of Interest While – Indicates Patients Without the Outcome of Interest. The Contingency Table on the Left has 17 Patients With the Outcome of Interest and 80 Patients Without in the LDA Group. The P-value is Below 0.05 for that Table. The Contingency Table on the Right Shows an Event Reversal: now 18 Patients Have the Outcome of Interest and 79 Patients do not in the LDA Group. The P-value is now Above 0.05, Indicating that Only 1 Event Reversal is Needed for the Statistical Significance of that Outcome to Change

Results

From our queries, we identified 1945 RCTs related to the search terms. After duplicates were removed and studies were marked as ineligible, 692 studies underwent title and abstract screening and 39 studies continued to full-text review for inclusion. Ultimately, 18 studies were included for analysis (Figure 1). Follow-up times of the included studies ranged from 12 to 120 months, with a mean follow-up time of 48.5 months. The included studies were drawn from nine different journals, with the highest number of studies (4) published in JNS. The Spine Journal and Spine contributed 3 studies each, while Seminars in Spine Surgery and European Spine Journal each had 2 studies. The remaining journals (Journal of Bone and Joint Surgery, International Journal of Spine Surgery, Clinical Spine Surgery, and Global Spine Journal) each contributed 1 study (Table 1).

Table 1.

Characteristics of Included Studies: Year, Journal of Publication, Total Sample Size

Author	Year	Journal	Total sample size	Industry funded?
Berg	2009	The Spine Journal	152	No
Auerbach	2005	Seminars in Spine Surgery	14	No
Berg	2011	The Spine Journal	129	No
Gornet	2010	JNS	449	Yes
Gornet	2011	Spine	554	Yes
Delamarter	2011	Journal of Bone and Joint Surgery	215	Yes
Holt	2007	International Journal of Spine Surgery	304	Yes
Skold	2013	European Spine Journal	151	No
Berg	2009	European Spine Journal	149	No
Zigler	2012	JNS	186	Yes
Blumenthal	2005	Spine	227	Yes
Radcliff	2018	Clinical Spine Surgery	229	Yes
Zigler	2012	JNS	236	Yes
Tropp	2012	Global Spine Journal	151	No
Auerbach	2005	Seminars in Spine Surgery	15	Yes
Zigler	2007	Spine	236	Yes
Geisler	2004	JNS	258	Yes
Guyer	2009	The Spine Journal	133	Yes

FI = Fragility Index; FQ = Fragility Quotient; IQR = Interquartile Range.

Across the 18 included RCTs, we identified 146 dichotomous outcomes (Table 2). A total of 42 outcomes were statistically significant (P < 0.05) and 104 were nonsignificant (P ≥ 0.05). For the 146 total outcomes, the median fragility index (FI) was 5 (IQR: 2.0-9.0). The median fragility quotient (FQ) across the 146 total outcomes was 0.022 (IQR: 0.008-0.046), indicating that a reversal of only 2.2% of patients is required to alter study significance in the included RCTs. For the 42 statistically significant outcomes, the median FI was 4 (IQR: 1.0-19.0) with an associated FQ of 0.017 (IQR: 0.006-0.095). For the 104 nonsignificant outcomes, the median rFI was 6 (IQR: 2.5-9.0) with an associated FQ of 0.023 (IQR: 0.011-0.046).

Table 2.

Overall Fragility Data Based on Trial and Outcome Characteristics

	Number of outcomes	FI, median (IQR)	FQ, median (IQR)
All RCT outcomes	146	5 (2.0-9.0)	0.022 (0.008-0.046)
Significant outcomes (P < 0.05)	42	4 (1.0-19.0)	0.017 (0.006-0.095)
Nonsignificant outcomes (P ≥ 0.05)	104	6 (2.5-9.0)	0.023 (0.011-0.046)

FI = Fragility Index; FQ = Fragility Quotient; IQR = Interquartile Range.

On subgroup analysis, adjacent segment disease outcomes had a median FI of 3 (IQR: 2.75-3) and FQ of 0.013 (IQR: 0.013-0.014) (Table 3). Notably, the most fragile outcome category was anatomical change, with a median FI of 2.00 (IQR: 1.0-3.0). For anatomical change outcomes, we identified a median FQ of 0.006 (IQR: 0.006-0.008), suggesting that the reversal of only 0.6% of patients is required to alter statistical significance for these outcomes. The least fragile category was composite endpoint outcomes, with a median FI of 6 (IQR: 2.0-66.5) and FQ of 0.026 (IQR: 0.009-0.149). Return to functionality was the most frequently reported finding, with 52 recorded outcomes. The median FI for return to functionality was 5.00 (IQR: 2.0-9.3), with an associated FQ of 0.023 (IQR: 0.008-0.055). Across 35 composite endpoints, the median FI was 6.00 (IQR: 2.0-66.5) with an associated FQ of 0.026 (IQR: 0.009-0.149). There were 33 adverse event-related outcomes, with a median FI of 7.00 (IQR: 3.5-9.0) and an associated FQ of 0.026 (IQR: 0.018-0.038).

Table 3.

Subgroup Analysis Based on Outcome Category (Adjacent Segment Disease, Adverse Event, Anatomical Change, Composite Endpoint, Pain, Patient Satisfaction, Return to Functionality)

	Number of outcomes	FI, median (IQR)	FQ, median (IQR)
Adjacent segment disease	4	3 (2.75-3)	0.013 (0.013-0.014)
Adverse event	33	7 (5-10)	0.026 (0.011-0.040)
Anatomical change	9	2 (1-3)	0.006 (0.006-0.008)
Composite endpoint	35	6 (2.0-66.5)	0.026 (0.009-0.149)
Pain	3	3 (3.0-3.5)	0.020 (0.018-0.110)
Patient satisfaction	10	3.5 (1.0-4.8)	0.024 (0.007-0.058)
Return to functionality	52	5 (2.0-9.3)	0.023 (0.008-0.055)

FI = Fragility Index; FQ = Fragility Quotient; IQR = Interquartile Range.

Year of publication did not affect statistical fragility of the reported findings (Table 4). For studies published between 2004-2010, the median FI was 5.00 (IQR: 2.0-8.0), compared to 7.00 (IQR: 2.0-12.0) for studies published between 2011-2023.

Table 4.

Fragility Data Stratified by Year Published (2000-2010, 2011-2023)

	Number of outcomes	FI, median (IQR)	P-value	FQ, median (IQR)	P-value
Published 2004-2010	89	5 (2.0-8.0)	0.125	0.019 (0.008-0.043)	0.186
Published 2011-2023	57	7 (2.0-12.0)	0.125	0.026 (0.012-0.050)	0.186

FI = Fragility Index; FQ = Fragility Quotient; IQR = Interquartile Range

Industry-funded studies demonstrated a significantly higher FI than non-industry-funded studies (Table 5). Industry-funded studies had median FI of 6.00 (IQR: 2.0-11.0), compared to non-industry-funded studies, which had a median FI of 4.00 (IQR: 1.5-6.0) (P = 0.036). However, when accounting for sample size, industry-funded studies had a significantly lower FQ (median: 0.019, IQR: 0.008-0.042) compared to non-industry-funded studies (median: 0.040, IQR: 0.012-0.062) (P = 0.023).

Table 5.

Fragility Data Based on Industry Funding Status

	Number of outcomes	FI, median (IQR)	P-value	FQ, median (IQR)	P-value
Industry funded	115	6 (2.0-11.0)	0.036	0.019 (0.008-0.042)	0.023
Not industry funded	31	4 (1.5-6.0)		0.040 (0.012-0.062)

FI = Fragility Index; FQ = Fragility Quotient; IQR = Interquartile Range.

Discussion

The main finding of this study is that reversing outcomes in only ∼2% of patients would be enough to alter the statistical significance of findings in RCTs comparing LDA and fusion. The contrast between the declining clinical usage of LDA and the clinical trial results supporting its use motivated this study of the statistical fragility of those RCTs. The results demonstrate that RCTs comparing LDA and fusion report significant differences that are statistically fragile, with a median FI of 5 and FQ of 0.022 across 146 outcomes. The median FQ, which reflects the proportion of patients whose outcomes would need to change to alter the statistical significance of a study, ranged from 0.017 for significant outcomes to 0.023 for nonsignificant outcomes. We also found that industry-funded studies have a significantly lower FQ than non-industry-funded studies (industry-funded FQ: 0.019, non-industry-funded: 0.040, P = 0.023), consistent with our hypothesis. Interestingly, industry-funded studies had a higher FI but a lower FQ, indicating that the larger sample sizes in industry-funded studies did not increase statistical robustness. Further, the fragility of RCT outcomes did not differ between studies published in 2004-2010 and those published in 2011-2023, suggesting that RCTs are not improving in the robustness of their findings. The most statistically fragile outcomes were related to pain, patient satisfaction, anatomical changes, and ASD. In particular, for ASD, significance could be altered by converting outcomes in just 1.3 out of 100 patients, highlighting the statistical vulnerability of this outcome.

ASD is a key consideration when comparing lumbar fusion to LDA. Studies have consistently highlighted that a major disadvantage of lumbar fusion is the development of ASD, often leading to the need for further surgery.^6,15-19 The development of motion-preserving treatments such as LDA for symptomatic disc degeneration was largely driven by concerns that stabilizing one spinal segment, as in lumbar fusion, may inadvertently increase stress on adjacent levels, potentially accelerating degeneration in those areas.^7,20 LDA aims to minimize the iatrogenic acceleration of degenerative disease at segments adjacent to the operative levels. However, the literature on this topic is mixed. While some studies demonstrate significant reductions in ASD following LDA, others report comparable rates between the two procedures.^6,15,17,18,21,22 Even among systematic reviews and meta-analyses, there is disagreement on whether LDA truly reduces ASD rates.^15,17,21,22 Given this variability, the statistical robustness of ASD-related outcomes warrants careful consideration. In this study, outcomes related to ASD were particularly fragile, with only 1.3% of patients requiring an outcome change to alter the statistical significance of results. Thus, while studies have reported significantly lower rates of ASD with LDA compared to lumbar fusion, the fragility of these data underscores the importance of cautious interpretation. It is interesting to note that the declining rate of LDA procedures suggests that the broader surgical community has already recognized that RCTs supporting LDA are of lower quality; this paper quantifies that point with statistical fragility results.

Non-industry-funded research is generally regarded as less prone to bias than industry-funded studies. Systematic reviews and meta-analyses have consistently shown that industry-sponsored research is more likely to report favorable outcomes for the sponsor’s product. For instance, a systematic review and meta-analysis by Lundh et al found that industry-sponsored trials were significantly more likely to report positive efficacy results and favorable conclusions compared to non-industry-sponsored studies.^23,24 This suggests the presence of potential bias toward the sponsor’s product. Similarly, Jorgensen et al found that industry-supported meta-analyses exhibited lower methodological quality and transparency compared to those funded by non-profit sources or without external support, contributing to biased outcomes.²⁵ Additionally, Riaz et al²⁶ reported that industry-sponsored studies were almost four times more likely to yield positive outcomes than those funded by the NIH, further reinforcing the notion of bias in favor of the sponsor’s product in industry-funded research. In the context of this literature, this study found that industry-funded studies were more statistically fragile, with a significantly lower FQ than non-industry-funded ones, consistent with the increased potential for bias in industry-funded studies. Of note, industry-funded studies had a higher median FI than non-industry-funded studies (median FI: 6 vs 4), suggesting that more outcome reversals were required to alter statistical significance. However, when accounting for sample size using the FQ, industry-funded studies exhibited lower robustness (median FQ: 0.019 vs 0.040). This indicates that, despite a higher raw FI, the proportion of patients required to change outcomes to reverse significance was actually smaller in industry-funded studies. This is likely due to the larger sample sizes typical of industry-funded trials, which can inflate the FI but may not truly reflect greater statistical robustness. The FQ provides essential context, revealing that these studies may remain statistically fragile despite appearing robust based on FI alone.
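The interplay between FI and FQ described above can be illustrated with simple arithmetic. The sample sizes below are hypothetical round numbers chosen only for illustration; they are not the sizes of the included trials, and a median FQ cannot in general be recovered by dividing a median FI by a single sample size.

```python
def fragility_quotient(index, sample_size):
    """FQ: the FI (or rFI) divided by the total study sample size."""
    return index / sample_size


# Hypothetical trials: the larger one has the higher FI but the lower FQ.
smaller_trial_fq = fragility_quotient(4, 100)  # 4 reversals among 100 patients
larger_trial_fq = fragility_quotient(6, 300)   # 6 reversals among 300 patients

print(f"smaller trial: FI = 4, FQ = {smaller_trial_fq:.3f}")
print(f"larger trial:  FI = 6, FQ = {larger_trial_fq:.3f}")
```

In this sketch the larger trial needs more reversals in absolute terms (6 vs 4) but a smaller share of its patients (2% vs 4%), which is the pattern reported for the industry-funded subgroup.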

While our subgroup analysis accounted for differences in sample size through the use of the FQ, we acknowledge that other factors, such as differences in study design, methodological rigor, and population heterogeneity, may also influence statistical fragility. To mitigate this, we performed a formal risk of bias assessment using the Cochrane RoB 2 tool and found that the overall methodological quality was comparable between industry-funded and non-industry funded studies. However, due to variability in reporting in individual studies, we were unable to systematically control for differences in patient population heterogeneity across studies. This represents a limitation of our subgroup analysis and underscores the need for cautious interpretation when comparing fragility across funding sources.

The FQ reported in this study emphasizes the need for high-quality, controlled, and unbiased studies with sufficiently long follow-up periods. It is nonetheless in line with other studies in the orthopedic literature examining significance and fragility.^27-32 For example, Ortiz-Babilonia et al found a median FI of 7 and FQ of 0.043 in RCTs comparing cervical disc arthroplasty with anterior cervical discectomy and fusion, and Parisien et al found a median FI of 4 and FQ of 0.066 in comparative studies in the orthopaedic shoulder literature.^11,28 This suggests a broader need to improve study design throughout the orthopaedic literature.

Understanding the statistical fragility of study outcomes enables clinicians to make more informed decisions. For instance, when a study has a low FI, clinicians may exercise greater caution in altering clinical practice based on its findings.^33,34 Awareness of FI and FQ can also enhance future study design, as researchers can strive for higher FIs—indicating more robust results—by increasing sample sizes or implementing more rigorous follow-up protocols.^33,35 FI and FQ offer additional context to P-values, which alone may not fully reflect the reliability of study outcomes, reducing the over-reliance on P-values as the sole measure of statistical significance.^34,35 In summary, calculating FI and FQ provides a quantitative measure of a study’s robustness, helping to bridge research gaps, guide clinical decisions, improve study design, and supplement P-values. This ultimately fosters more reliable, evidence-based clinical practice. Strategies to increase FI and FQ of a study include increasing sample size, increasing study timeframe to ensure an adequate number of events, minimizing loss to follow-up, or using multiple outcomes to make a conclusion rather than a single outcome.

There are significant limitations to consider when interpreting the results of this study. While fragility measures are valuable for assessing the robustness of RCTs, they require the exclusion of non-RCTs, limiting their general applicability in clinical research. Small sample sizes and rare events can result in a low FI, potentially penalizing studies that are otherwise well-conducted and clinically significant.^36,37 However, this issue is more relevant in fields where small sample sizes are common.³⁷ Additionally, the FI can be misleading in the context of high loss to follow-up rates; if the number of patients lost exceeds the FI, the study’s findings could be overturned simply by better follow-up.³⁸ Standardized thresholds for the FI and FQ have not yet been established to evaluate the fragility of outcomes in comparative trials. A standardized definition of these thresholds would enhance the ability to assess the robustness of study findings. Lastly, when using statistical fragility in clinical decision-making, it is crucial to understand that a study’s fragility is not the sole determinant of its clinical utility. A low FI indicates that a study’s significant findings could be overturned by a small number of events or minor changes in data. This highlights the vulnerability of the statistical significance, not necessarily the clinical importance of the intervention. Therefore, while a low FI should prompt a closer look at the study’s design, sample size, event rates, and follow-up completeness, it should not automatically lead to the dismissal of potentially beneficial interventions. Instead, the FI should be considered as one piece of evidence among many, including the biological plausibility of the intervention, the magnitude of the observed effect, and its consistency with other research, to make informed and patient-centered decisions.

Conclusion

This systematic review demonstrates that the statistically significant findings from RCTs comparing lumbar fusion to LDA are statistically fragile and could support a different conclusion after only small changes in event outcomes. The reversal of events in as few as 2.2 out of 100 patients overall, and 1.3 out of 100 patients for ASD outcomes, may be enough to reverse the statistical significance of results from the RCTs included in this analysis. Given the low FI and FQ across many outcomes, the robustness of current evidence in the LDA vs fusion literature should be interpreted with caution. Further high-quality trials are needed to confirm and strengthen these findings.

Footnotes

ORCID iDs

Justin Tiao

Mateo Restrepo Mejia

Niklas H. Koehne

Saad B. Chaudhary

James C. Iatridis

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets generated and analyzed during this study are available from the corresponding author upon reasonable request. All data were extracted from publicly available randomized controlled trials (RCTs) identified through a systematic search of PubMed, Embase, and Medline. The search strategy and inclusion/exclusion criteria are detailed in the Methods section of this manuscript to ensure transparency and reproducibility.

IRB Statement

This research was not considered human subjects research given the availability of all data through the three databases mentioned in the manuscript.

Appendix

Table 1.

Risk of Bias Assessment using the Cochrane Risk of Bias 2.0 (RoB 2) Tool

First author	Domain 1: Risk of bias arising from randomization process	Domain 2: Risk of bias due to deviations from the intended interventions	Domain 3: Risk of bias due to missing outcome data	Domain 4: Risk of bias in measurement of the outcome	Domain 5: Risk of bias in selection of the reported result	Overall risk of bias
Auerbach	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Auerbach	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Berg	Low risk	Some risk	Low risk	Low risk	Low risk	Low risk
Berg	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Berg	Low risk	Some risk	Low risk	Low risk	Low risk	Low risk
Blumenthal	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Delamarter	Low risk	Some risk	Low risk	Some risk	Low risk	Some risk
Geisler	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Gornet	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Gornet	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Guyer	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Holt	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Radcliff	Low risk	Low risk	Some risk	Low risk	Low risk	Low risk
Skold	Low risk	Low risk	Some risk	Some risk	Low risk	Some risk
Tropp	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Zigler	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk
Zigler	Low risk	Low risk	Low risk	Low risk	Low risk	Low risk

References

van den Eerenbeemt

Ostelo

van Royen

Peul

van Tulder

. Total disc replacement surgery for symptomatic degenerative lumbar disc disease: a systematic review of the literature. Eur Spine J. 2010;19(8):1262-1280.

Upfill-Brown

Policht

Sperry

, et al. National trends in the utilization of lumbar disc replacement for lumbar degenerative disc disease over a 10-year period, 2010 to 2019. J Spine Surg. 2022;8(3):343-352.

Sandhu

Dowlati

Garica

. Lumbar arthroplasty: past, present, and future. Neurosurgery. 2020;86(2):155-169.

Shukla

Matur

, et al. Lumbar arthroplasty is associated with a lower incidence of adjacent segment disease compared with ALIF: a propensity-matched analysis. Spine. 2023;48(14):978-983.

Koutsogiannis

Khan

Phillips

, et al. A cross-sectional analysis of 284 complications for lumbar disc replacements from medical device reports maintained by the United States food and drug administration. Spine J. 2022;22(2):278-285.

Rainey

Blumenthal

Zigler

Guyer

Ohnmeiss

. Analysis of adjacent segment reoperation after lumbar total disc replacement. Int J Spine Surg. 2012;6:140-144.

Zigler

Gornet

Ferko

Cameron

Schranck

Patel

. Comparison of lumbar total disc replacement with surgical spinal fusion for the treatment of single-level degenerative disc disease: a meta-analysis of 5-Year outcomes from randomized controlled trials. Glob Spine J. 2018;8(4):413-423.

Eskandar

Ahmed

Pan

Agrawal

. The decline of lumbar artificial disc replacement. J Spine Res Surg. 2024;6(3):86-92.

Mills

Shelby

Bouz

Hah

Wang

Alluri

. A decreasing national trend in lumbar disc arthroplasty. Glob Spine J. 2023;13(8):2271-2277.

10.

Ratnasamy

Rudisill

Maloy

Grauer

. Cervical disc arthroplasty usage has leveled out from 2010 to 2021. Spine. 2023;48(20):E342-E348.

11.

Ortiz-Babilonia

Gupta

Cartagena-Reyes

, et al. The statistical fragility of trials comparing cervical disc arthroplasty and anterior cervical discectomy and fusion: a meta analysis. Spine. 2024;49(10):708-714.

12.

Walsh

Srinathan

McAuley

, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a fragility index. J Clin Epidemiol. 2014;67(6):622-628.

13.

Page

McKenzie

Bossuyt

, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

14.

Sterne

JAC

Savović

Page

, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

15.

Harrop

Youssef

Maltenfort

, et al. Lumbar adjacent segment degeneration and disease after arthrodesis and total disc arthroplasty. Spine. 2008;33(15):1701-1707. doi:10.1097/BRS.0b013e31817bb956

16.

Xia

Chen

Cheng

. Prevalence of adjacent segment degeneration after spine surgery: a systematic review and meta-analysis. Spine. 2013;38(7):597-608. doi:10.1097/BRS.0b013e318273a2ea

17.

Ren

Song

Liu

Xue

. Adjacent segment degeneration and disease after lumbar fusion compared with motion-preserving procedures: a meta-analysis. Eur J Orthop Surg Traumatol. 2014;24(Suppl 1):S245-S253.

18.

David

. Long-term results of one-level lumbar arthroplasty: minimum 10-year follow-up of the CHARITE artificial disc in 106 patients. Spine. 2007;32(6):661-666. doi:10.1097/01.brs.0000257554.67505.45

19. Cinotti, David, Postacchini. Results of disc prosthesis after a minimum follow-up period of 2 years. Spine. 1996;21(8):995-1000.

20. Zigler, Blumenthal, Guyer, Ohnmeiss, Patel. Progression of adjacent-level degeneration after lumbar total disc replacement: results of a post-hoc analysis of patients with available radiographs from a prospective study with 5-year follow-up. Spine. 2018;43(20):1395-1400.

21. Donnally 3rd, Patel, Canseco, et al. Current incidence of adjacent segment pathology following lumbar fusion versus motion-preserving procedures: a systematic review and meta-analysis of recent projections. Spine J. 2020;20(10):1554-1565.

22. Wang, Arnold, Hermsmeyer, Norvell. Do lumbar motion preserving devices reduce the risk of adjacent segment pathology compared with fusion surgery? A systematic review. Spine. 2012;37(22 Suppl):S133-S143.

23. Lundh, Lexchin, Mintzes, Schroll, Bero. Industry sponsorship and research outcome: systematic review with meta-analysis. Intensive Care Med. 2018;44(10):1603-1612.

24. Lundh, Lexchin, Mintzes, Schroll, Bero. Industry sponsorship and research outcome. Cochrane Database Syst Rev. 2017;2(2):MR000033.

25. Jørgensen, Maric, Tendal, Faurschou, Gøtzsche. Industry-supported meta-analyses compared with meta-analyses with non-profit or no support: differences in methodological quality and conclusions. BMC Med Res Methodol. 2008;8:60.

26. Riaz, Raza, Khan, Riaz, Krasuski. Impact of funding source on clinical trial results including cardiovascular outcome trials. Am J Cardiol. 2015;116(12):1944-1947.

27. Parisien, Dashe, Cronin, Bhandari, Tornetta 3rd. Statistical significance in trauma research: too unstable to trust? J Orthop Trauma. 2019;33(12):e466-e470.

28. Parisien, Trofa, Cronin, et al. Comparative studies in the shoulder literature lack statistical robustness: a fragility analysis. Arthrosc Sports Med Rehabil. 2021;3(6):e1899-e1904.

29. Yendluri, Chiang, Linden, et al. The fragility of statistical findings in the reverse total shoulder arthroplasty literature: a systematic review of randomized controlled trials. J Shoulder Elb Surg. 2024;33(7):1650-1658.

30. Brown, Yendluri, Lawrence, et al. The statistical fragility of tranexamic acid use in the orthopaedic surgery literature: a systematic review of randomized controlled trials. J Am Acad Orthop Surg. 2024;32(11):508-515.

31. Lawrence, Okewunmi, Chakrani, Cordero, Parisien. Randomized controlled trials comparing bone-patellar tendon-bone versus hamstring tendon autografts in anterior cruciate ligament reconstruction surgery are statistically fragile: a systematic review. Arthroscopy. 2024;40(3):998-1005.

32. Yendluri, Megafu, Wang, et al. The fragility of statistical findings in the femoral neck fracture literature: a systematic review of randomized controlled trials. J Orthop Trauma. 2024;38(6):e230-e237.

33. Tignanelli, Napolitano. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA Surg. 2019;154(1):74-79.

34. Rickard, Lorenzo, Hannick, Blais, Koyle, Bägli. Over-reliance on P values in urology: fragility of findings in the hydronephrosis literature calls for systematic reporting of robustness indicators. Urology. 2019;133:204-210.

35. Mian, Megafu, et al. The statistical fragility of the distal fibula fracture literature: a systematic review of randomized controlled trials. Injury. 2023. doi:10.1016/j.injury.2023.03.022

36. Andrade. The use and limitations of the fragility index in the interpretation of clinical trial findings. J Clin Psychiatry. 2020;81(2). doi:10.4088/JCP.20f13334

37. Schröder, Muensterer, Oetzmann von Sochaczewski. Paediatric surgical trials, their fragility index, and why to avoid using it to evaluate results. Pediatr Surg Int. 2022;38(7):1057-1066.

38. Proal, Moon, Kwon. The fragility index and reverse fragility index of FDA investigational device exemption trials in spinal fusion surgery: a systematic review. Eur Spine J. 2024;33(7):2594-2603.

The Statistical Fragility of Lumbar Disc Arthroplasty vs Lumbar Fusion: A Systematic Review of Randomized Controlled Trials
