Abstract
Study Design
Systematic Review.
Introduction
Randomized controlled trials (RCTs) on lumbar endoscopic decompression inform treatment decisions for disk disease, radiculopathy, and lumbar spinal stenosis. This study assessed the fragility of statistical outcomes in these RCTs.
Methods
PubMed, Embase, and MEDLINE were queried for RCTs reporting dichotomous outcomes with at least 1 endoscopic decompression arm. The fragility index (FI) and reverse FI (rFI) represented the number of event reversals needed to change significance for significant and nonsignificant outcomes, respectively. The fragility quotient (FQ) was calculated by dividing FI by sample size. Subgroup analysis was performed by outcome type.
Results
Thirty-seven RCTs met the inclusion criteria, and a total of 160 outcomes were analyzed. The median FI was 4 (IQR: 3-5) and the median FQ was 0.038 (IQR: 0.017-0.067). Significant outcomes (n = 23) had a median FI of 7 (IQR: 2-13) and FQ of 0.024 (IQR: 0.012-0.056); nonsignificant outcomes (n = 137) had a median FI of 4 (IQR: 3-5) and FQ of 0.041 (IQR: 0.020-0.068). Revisions/reoperations were most robust (FI: 5, FQ: 0.037); microscopic outcomes were most fragile (FI: 4, FQ: 0.022). Pain outcomes had an FI of 4 (FQ: 0.051); complications had an FI of 4 (FQ: 0.038). In 47.5% of outcomes, the number of patients lost to follow-up exceeded the FI.
Conclusions
Findings from RCTs on lumbar endoscopic decompression are vulnerable to small changes in outcome events. In nearly half of outcomes, patients lost to follow-up outnumbered the FI. Reporting FI and FQ with P-values may improve interpretation and reliability of trial results.
Introduction
Endoscopic lumbar decompression is a minimally invasive surgical technique used to alleviate pressure on spinal nerves caused by conditions like spinal stenosis, herniated discs, or degenerative changes. 1 This procedure enables the removal of a herniated disc, hypertrophic ligamentum flavum, and bony overgrowths through small incisions using an endoscope, minimizing tissue damage and preserving spinal stability. Depending on the condition and location of nerve compression, surgeons may utilize the transforaminal approach to access herniated discs or the interlaminar approach to address central or lateral recess stenosis. 2 Compared to traditional open surgery, endoscopic lumbar decompression offers numerous advantages, including reduced postoperative pain, shorter recovery time, and shorter hospital stays, by minimizing muscle dissection and tissue damage. 3 This technique has seen increasing adoption among surgeons for patients seeking effective relief from symptoms like back pain, leg numbness, and weakness while minimizing the risks associated with more invasive surgeries.1,4
Randomized controlled trials (RCTs) represent the highest level of evidence for evaluating clinical outcomes and guiding clinical decision-making in spine surgery due to their rigorous methodology and controlled study design. However, conclusions drawn from these trials rely heavily on P values, which have been criticized for overlooking important factors such as patient loss to follow-up and study design. 5 The concept of the fragility index (FI) was first introduced by Feinstein et al to complement the P value and address its limitations. The FI quantifies the minimum number of patients whose outcome status must change to convert a statistically significant result (P < 0.05) to non-significance (P ≥ 0.05). It offers insight into the trial’s susceptibility to minor changes in data, emphasizing the potential fragility of its conclusions. 6 The reverse fragility index (rFI) is the minimum number of outcome reversals required to convert a statistically non-significant result into a significant one, helping evaluate the stability of non-significant findings.7,8 The fragility quotient (FQ) is the ratio of the FI or rFI to the total sample size, providing a standardized measure of result fragility relative to study size.6,9,10 In conjunction with the P value, the FI and FQ offer a more comprehensive assessment of a trial’s fragility. Studies with low susceptibility to fragility yield stronger, more reliable conclusions than those with high susceptibility, enabling readers to critically evaluate the literature and enhance clinical decision-making based on evidence-based principles. 11
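The three metrics defined above can be stated compactly. The notation here is an assumption introduced for illustration and does not appear in the cited sources: T_k denotes the 2 × 2 outcome table after k outcome reversals (T_0 being the reported table), p(·) the P value of that table, α = 0.05 the significance threshold, and N the total sample size.

```latex
\begin{align*}
\mathrm{FI}  &= \min\{\, k \ge 1 : p(T_k) \ge \alpha \,\}, \quad \text{given } p(T_0) < \alpha,\\
\mathrm{rFI} &= \min\{\, k \ge 1 : p(T_k) < \alpha \,\}, \quad \text{given } p(T_0) \ge \alpha,\\
\mathrm{FQ}  &= \frac{\mathrm{FI}}{N} \quad \text{or} \quad \frac{\mathrm{rFI}}{N}.
\end{align*}
```

As a simple illustration, an FI of 4 in a trial of 100 patients corresponds to an FQ of 4/100 = 0.04.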
The purpose of this study was to determine the overall fragility of outcomes in randomized controlled trials evaluating lumbar endoscopic decompression techniques by utilizing the FI, rFI, and FQ metrics. Furthermore, we aimed to evaluate the statistical fragility of these RCT findings according to outcome type. We hypothesized that statistical outcomes reported in the lumbar endoscopic decompression literature would be fragile, with only a few outcome-event reversals altering significance. We further hypothesized that significant outcomes would be especially fragile and that statistical fragility would be observed across the various outcome types assessed.
Methods
Systematic Search Strategy
This study systematically searched PubMed, Embase, and MEDLINE databases to identify randomized controlled trials (RCTs) published between January 1, 2010, and July 16, 2024. The review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 12 The search strategy employed various Boolean combinations of keywords, synonyms, and term variations, including “endoscopic,” “spine surgery,” “lumbar,” “discectomy,” “laminectomy,” “laminotomy,” and “decompression.” The comprehensive search strings are detailed in the supplemental material.
Eligible studies included RCTs that reported dichotomous outcomes and featured at least 1 treatment arm involving endoscopic lumbar decompression. Exclusion criteria encompassed studies published in non-English languages or those utilizing cadaveric, biomechanical, animal, in vitro, or non-RCT designs. Additionally, studies without full-text availability were excluded. Titles and abstracts were screened by 2 independent reviewers, followed by a full-text review, with conflicts resolved by a third independent reviewer. Reasons for exclusion were documented, and the senior author confirmed the final study selection.
The revised Cochrane Risk of Bias tool was used to evaluate bias in the included RCTs. 13 This review focused on the statistical reporting and significance of outcomes rather than direct clinical outcomes. Thus, it did not meet the criteria for registration with the International Prospective Register of Systematic Reviews (PROSPERO). Only publicly accessible studies were analyzed, eliminating the need for institutional review board (IRB) approval.
Study Screening and Data Extraction
Key data extracted from the selected studies included the first author, year of publication, journal title, experimental and control group interventions, reported outcomes, results, number of patients lost to follow-up, and P-values where available.
The primary outcome was the fragility index and fragility quotient for outcomes in each included randomized controlled trial. The secondary outcome was the proportion of studies in which the number of patients lost to follow-up exceeded the calculated FI. Subgroup analyses were considered tertiary outcomes.
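The secondary outcome described above reduces to a simple proportion. The following is a minimal sketch, assuming each outcome is represented as a pair of its FI and the number of patients lost to follow-up in the corresponding trial; the function name and the example values are hypothetical and are not data from the included studies.

```python
def share_ltfu_exceeding_fi(outcomes):
    """Fraction of outcomes whose loss to follow-up exceeds the fragility index.

    `outcomes` is an iterable of (fragility_index, lost_to_follow_up) pairs.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one outcome is required")
    # Count only strict exceedance: a loss equal to the FI does not qualify.
    exceeding = sum(1 for fi, lost in outcomes if lost > fi)
    return exceeding / len(outcomes)


# Hypothetical (FI, lost to follow-up) pairs, for illustration only.
example = [(4, 6), (3, 0), (5, 5), (2, 9)]
print(share_ltfu_exceeding_fi(example))  # 0.5
```

Strict inequality is used here because the text refers to loss to follow-up that "exceeded" the FI; whether ties should count is a design choice that the source does not specify.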
Outcome measures were categorized into subgroups of complications/adverse events, revision/readmission rates, and patient-reported pain. Additional subgrouping was applied for studies assessing microscopic endoscopic approaches, which were reported as comparator subgroup outcomes. A separate subgroup was also created for studies specifically employing a biportal endoscopic approach, which were reported as a subanalysis within endoscopic techniques. Reviewers performed data extraction independently using standardized forms to ensure consistency. Figure 1 displays a PRISMA flow chart detailing the screening process and literature search outcomes.
Figure 1. PRISMA Flowchart of Study Selection Process
Fragility Analysis
Fragility analysis was conducted using a two-tailed Fisher’s exact test to evaluate the statistical significance of reported outcomes at a threshold of P < 0.05. The fragility index (FI) was calculated for significant outcomes by determining the minimum number of event reversals required for the P-value to rise above 0.05, rendering the results no longer statistically significant (Figure 2).
Figure 2. Demonstration of Statistical Significance Reversal Using a 2 × 2 Contingency Table With a Resulting Fragility Index (FI) = 3. P-Values Were Calculated Using a Two-Tailed Fisher Exact Test
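The FI calculation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the two-tailed Fisher exact test is written out with the standard library, reversals are applied by adding events to the arm with fewer events while holding arm sizes fixed (a common convention, assumed here), and the function names and example counts are hypothetical. The example counts are not those shown in Figure 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalized hypergeometric weight of every table sharing the margins.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    # Sum the tables that are no more likely than the observed one.
    extreme = sum(w for w in weights.values() if w <= weights[a])
    return extreme / comb(row1 + row2, col1)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Reversals needed to lose significance; None if not significant at baseline."""
    # Assumed convention: add events to the arm with fewer events, arm sizes fixed.
    (low_e, low_n), (high_e, high_n) = sorted([(events_a, n_a), (events_b, n_b)])

    def p_value(low_events):
        return fisher_exact_two_sided(
            low_events, low_n - low_events, high_e, high_n - high_e
        )

    if p_value(low_e) >= alpha:
        return None
    flips = 0
    while p_value(low_e) < alpha:
        if low_e == low_n:
            return None  # no non-events left to reverse in this arm
        low_e += 1
        flips += 1
    return flips


# Hypothetical counts: 0/10 events in one arm versus 8/10 in the other.
print(fragility_index(0, 10, 8, 10))  # 3
```

In this hypothetical example the P-value rises from roughly 0.0007 to roughly 0.07 after three reversals, so the FI is 3. Comparing integer weights rather than floating-point probabilities avoids tie-handling errors when two tables are equally likely.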
For non-significant outcomes, the reverse Fragility Index (rFI) was calculated by manipulating event outcomes until the P-value dropped below 0.05. The Fragility Quotient (FQ) was derived by dividing the FI or rFI by the study sample size, reflecting the proportion of patients needing an outcome reversal to alter statistical significance. Subgroup analyses were performed based on outcome type and statistical significance. Fragility analysis results were summarized as medians with interquartile ranges (IQRs).
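The rFI, FQ, and summary statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: reversals widen the gap between arms by removing events from the arm with fewer events and, once that arm has none left, by adding events to the other arm (an assumed convention); the IQR uses the inclusive quartile method; and all function names and example counts are hypothetical.

```python
from math import comb
from statistics import median, quantiles


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    extreme = sum(w for w in weights.values() if w <= weights[a])
    return extreme / comb(row1 + row2, col1)


def reverse_fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Reversals needed to reach significance; None if already significant."""
    (low_e, low_n), (high_e, high_n) = sorted([(events_a, n_a), (events_b, n_b)])

    def p_value(low_events, high_events):
        return fisher_exact_two_sided(
            low_events, low_n - low_events, high_events, high_n - high_events
        )

    if p_value(low_e, high_e) < alpha:
        return None
    flips = 0
    while p_value(low_e, high_e) >= alpha:
        if low_e > 0:
            low_e -= 1  # assumed convention: widen the gap from the low arm first
        elif high_e < high_n:
            high_e += 1
        else:
            return None  # significance cannot be reached with these arm sizes
        flips += 1
    return flips


def fragility_quotient(index, total_n):
    """FI or rFI divided by the total sample size."""
    return index / total_n


def median_iqr(values):
    """Median with first and third quartiles (inclusive method, an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical non-significant outcome: 3/10 versus 8/10 events.
rfi = reverse_fragility_index(3, 10, 8, 10)
print(rfi, fragility_quotient(rfi, 20))  # 1 0.05
```

Different quartile conventions give slightly different IQR bounds for small samples, so the method used should be reported alongside the summary values.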
Results
Search Results
Characteristics of Included Studies
Statistical Fragility of Overall Outcomes
Statistical Fragility of Subgroup Outcomes
A Cochrane risk of bias assessment indicated that 36 of the 37 evaluated RCTs were categorized as having an overall “low risk of bias.” The single study categorized as having “some concerns” exhibited potential bias related to inadequate concealment of the allocation sequence before participants were enrolled and assigned to their interventions (Table 4).
Fragility Quotient of Significant Outcomes With Respect to Year of Publication
Table 4. Bias Assessment for Included Studies Evaluated Using Revised Cochrane Risk-Of-Bias Tool for Randomized Trials
Discussion
This study analyzed the fragility of RCTs investigating endoscopic lumbar decompression. Subgroup analysis of each outcome type revealed variable robustness of study findings, with complications, microscopic technique, and revisions and reoperations identified as the most fragile. In contrast, self-reported pain and biportal technique demonstrated the greatest robustness. While progress has been made in examining the statistical fragility in other orthopedic subspecialties,10,14,15 the spine literature has not received comparable attention. Examining the statistical fragility of lumbar endoscopic decompression clinical trials provides insight into the robustness of the outcomes assessed in pertinent literature.
This study’s median FQ was 0.038 across all outcomes, indicating that in a sample of 100 patients, approximately 4 patient outcomes would need to be reversed to flip the statistical significance. This result suggests that RCTs on lumbar endoscopic decompression are statistically fragile. Although no established FQ threshold defines fragility, the literature suggests that our result falls within the range of fragile findings, with some studies deeming FQs as high as 0.080 (8.0%) fragile. 16 We observed similar results when examining other spine-related fragility studies. For example, in a comparative analysis of cervical disc arthroplasty and anterior cervical discectomy and fusion, Ortiz-Babilonia et al reported a median FQ of 0.043, slightly less fragile than the FQ reported in this study. 17 Additionally, Tiao et al examined lumbar disc arthroplasty vs fusion and reported a median FI of 5 with an FQ of 0.022, 18 while Yu et al analyzed vertebroplasty trials and found a median FI of 5 with an FQ of 0.053. 19 Yu et al also noted that nearly 80% of outcomes had more patients lost to follow-up than the FI. These findings are consistent with our results and reinforce that spine surgery RCT outcomes are statistically fragile, as even small numbers of unreported events could overturn significance and alter trial conclusions.
The median FQ for significant outcomes was 0.024, which is more fragile than the median FQ for all outcomes and for the nonsignificant outcomes. Significant outcomes tend to be more fragile, especially in RCTs with smaller sample sizes. 20 There are many risks associated with fragile significant outcomes, including false confidence in the effectiveness of treatment and an increased risk of falsely rejecting the null hypothesis. For instance, an RCT reviewed in this study compared the frequency of unintended durotomy between endoscopic and open discectomy procedures, reporting a P-value of 0.005. Although this may initially appear statistically significant and robust to a reader, the FQ for this specific primary outcome was found to be 0.017, suggesting fragility. 17 While the result may appear statistically convincing, its practical reliability is questionable. Physicians should interpret such findings with caution, considering the fragility of the evidence before adopting 1 approach over another based solely on presented results. Reporting fragility metrics alongside P-values may provide readers with a better sense of the robustness of an RCT outcome, which would help guide evidence-based surgical decision-making for lumbar decompression.
Revisions and reoperations provide valuable insight into the safety and success of the initial surgery, as success rates decline with each subsequent procedure.21,22 With revision surgeries occurring in over 13% of patients undergoing lumbar spine surgery within a 10-year follow-up, 23 the fragility of these outcomes raises concern over their validity. While existing literature often highlights lower complication rates for endoscopic procedures compared to open surgery, the observed statistical fragility suggests that the perceived safety advantages of endoscopic techniques could be overly optimistic. 21 Complications such as infections, nerve damage, and instability can dramatically influence a patient’s quality of life (QoL), ability to recover,24,25 and overall satisfaction with their surgical outcomes. 26 Although self-reported pain was the most robust outcome, the observed fragility suggests that even a small shift in patient outcomes, just 5.1% of the study sample, could reverse statistical significance. Pain relief is the primary goal of endoscopic lumbar decompression surgeries, as these procedures aim to alleviate nerve compression and improve the QoL for patients. 1 Additionally, pain reduction is associated with a variety of postoperative recovery factors, such as decreased morbidity, shorter recovery time, decreased length of opioid use, and lower health-care costs. 27 The fragility of revision and reoperation, complications, and pain-related outcomes raises concerns regarding the accurate assessment of these critical surgical measures.
Minimally invasive spine surgery (MISS) has undergone significant evolution over the past few decades. It is becoming more popular as it offers faster recovery times, reduced complications, and improved postoperative outcomes compared to open surgery. 28 Microscopic spine surgery, a common MISS procedure, utilizes a microscope to guide visualization through a single portal. 29 However, the microscopic technique subgroup was found to be the most fragile, raising concerns regarding its statistical validity given its widespread use in clinical practice. Notably, 76.54% of the studies assessing microscopic techniques reported a loss to follow-up that exceeded the FI, and the unreported outcomes of these patients could alter the significance of the findings. In contrast, the more recent biportal endoscopic technique is becoming an increasingly popular alternative. 30 It utilizes 2 portals, 1 for the endoscope and the other for the surgical tools, offering physicians a direct visualization of the anatomy as they perform the procedure. While slightly more robust, the fragility of the biportal technique raises similar questions about the consistency of its reported benefits. Given that these techniques have been compared in the literature, with findings showing no significant difference between the two techniques,31-33 ensuring statistically robust results is crucial to validate these outcomes and confirm the clinical viability of both techniques. This is particularly relevant given that complications related to biportal endoscopic spinal surgery may be as high as 8.1%. 34 However, the novelty of biportal techniques may account for these limitations, indicating a need for additional comparative trial literature to evaluate clinical significance further.
One concern in fragility analysis is patients who are lost to follow-up. These patients can substantially affect statistical significance, because their unreported outcomes could supply the number of outcome changes (the FI) needed to reverse the statistical significance of a study’s outcome. It is important to keep this in mind when interpreting the fragility statistics obtained from this study.
Fragility analysis is an increasingly used tool to measure the reliability of study outcomes. By incorporating fragility assessments early in the study design, researchers can proactively identify areas of improvement that can increase the robustness of the study. However, fragility studies over the years have consistently suggested that the robustness of study outcomes has not significantly improved.10,15,35,36 Key factors contributing to persistent fragility include small sample sizes and bias-influenced outcome measures. Underpowered trials are particularly vulnerable to fragile outcomes, as insufficient sample size reduces the likelihood of detecting true effects while also increasing the susceptibility of reported findings to reversal with only a few outcome changes. Thus, power analysis and fragility analysis should be viewed as complementary tools: power reflects the ability to detect an effect, whereas fragility reflects the stability of the reported result. Addressing these issues requires increasing sample sizes to enhance statistical power, adopting objective standardized outcome measures to reduce variability, and implementing stricter blinding protocols to minimize bias. Additionally, using objective criteria reduces variability and yields more reliable results, improving the quality of conclusions drawn about surgical outcomes such as pain, complications, and reoperation and ultimately benefiting patients.
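The relationship between sample size and fragility noted above can be illustrated by holding event proportions fixed and scaling the arm sizes. The sketch below is a hypothetical demonstration, not an analysis of the included trials: it assumes a two-tailed Fisher exact test, adds events to the arm with fewer events (an assumed convention), and uses invented counts of 10% versus 80% event rates.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    extreme = sum(w for w in weights.values() if w <= weights[a])
    return extreme / comb(row1 + row2, col1)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Reversals needed to lose significance; None if not significant at baseline."""
    (low_e, low_n), (high_e, high_n) = sorted([(events_a, n_a), (events_b, n_b)])

    def p_value(low_events):
        return fisher_exact_two_sided(
            low_events, low_n - low_events, high_e, high_n - high_e
        )

    if p_value(low_e) >= alpha:
        return None
    flips = 0
    while p_value(low_e) < alpha:
        if low_e == low_n:
            return None
        low_e += 1
        flips += 1
    return flips


# Same hypothetical event proportions (10% vs 80%) at increasing arm sizes.
for scale in (1, 2, 4):
    n_per_arm = 10 * scale
    fi = fragility_index(1 * scale, n_per_arm, 8 * scale, n_per_arm)
    print(f"n per arm = {n_per_arm}: FI = {fi}, FQ = {fi / (2 * n_per_arm):.3f}")
```

With 10 patients per arm the FI in this example is 2, and it increases as the arms grow while the proportions stay the same, which is consistent with the view that larger trials tolerate more outcome reversals before significance changes.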
Limitations
This study has limitations that should be taken into account. First, this fragility analysis is limited to dichotomous outcomes, reducing its applicability to studies with continuous measures. Alternative fragility metrics, such as the continuous fragility index, may be more appropriate. Additionally, while widely accepted methodologies and parameters are clearly defined for dichotomous fragility outcomes, there is no consensus on standardized thresholds for the FI and FQ measures. This lack of standardization introduces variability in the interpretation of these outcomes, which can then impact the comparability of our findings. Furthermore, our analysis focuses on outcome fragility and P-values, without accounting for study design limitations, sample size, inclusion and exclusion criteria, and blinding protocols. This indicates that our assessment of fragility is based solely on statistical measures without considering methodological factors that could impact the reliability of the studies included in our systematic review. As a result, the true robustness of the evidence may be underestimated, and potential biases introduced by study design limitations may be overlooked, affecting the overall validity of our conclusions.
Despite these limitations, our study highlights the importance of the fragility of outcomes associated with lumbar decompression. Clinicians and researchers should approach lumbar decompression outcomes cautiously, given the fragility of the outcomes discovered in this study. The fragility of the outcomes reveals a necessary shift to be made in RCT design, and future research on lumbar decompression should consider the potential influence of patient loss to follow-up and incorporate larger sample sizes.
Conclusion
This systematic review is the first to analyze the fragility of lumbar decompression outcomes in RCTs. The fragility of these outcomes necessitates caution when using them to guide clinical decision-making. Integrating fragility metrics into study design can help make results more robust, so that small changes in sample size and follow-up are less likely to reverse the significance of the findings. By highlighting the fragility of outcomes and emphasizing the importance of robust study designs, this approach may improve the reliability of research results and, in turn, patient outcomes. Such a shift could encourage more consistent evidence and foster greater confidence in clinical decision-making.
Supplemental Material
Supplemental Material for Statistical Fragility of Endoscopic Lumbar Decompression Outcomes: A Systematic Review of Randomized Controlled Trials by Kareem S. Mohamed, Alexander Yu, Yazan Alasadi, Prabhjot Singh, Luca Valdivia, Avanish Yendluri, Junho Song, Nikan K. Namiri, John Corvi, Samuel K. Cho in Global Spine Journal
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Disclosures
Samuel K. Cho, MD, FAAOS. AAOS: Board or committee member. American Orthopaedic Association: Board or committee member. AOSpine North America: Board or committee member. Cervical Spine Research Society: Board or committee member. Globus Medical: IP royalties and Fellowship support. North American Spine Society: Board or committee member. Scoliosis Research Society: Board or committee member. Stryker: Paid consultant. Cerapedics: Fellowship support.
Supplemental Material
Supplemental material for this article is available online.
References
