
Abstract

Study design

Systematic Review.

Objectives

Cervical total disc arthroplasty (CTDA) remains an alternative to anterior cervical discectomy and fusion (ACDF) in select patients with cervical radiculopathy or myelopathy secondary to degenerative disc disease. Studies comparing CTDA to ACDF often have conflicting conclusions and varying quality. The purpose of this study was to utilize the fragility index (FI) to assess the robustness of randomized controlled trials (RCTs) comparing CTDA to ACDF.

Methods

A systematic review was performed by searching PubMed, Ovid MEDLINE, Web of Science, and Embase for RCTs with 2 parallel study arms and 1:1 allocation of subjects investigating CTDA vs ACDF with at least 1 statistically significant, dichotomous outcome. The FI was calculated by individually shifting 1 patient from the event group to the non-event group with re-calculation of Fisher’s Exact test until the reported P value was no longer statistically significant (P > 0.05).

Results

The search identified 934 abstracts with 19 RCTs meeting inclusion criteria. The mean patient sample size was 276.4 (median 209, range 30-541). The number of patients lost to follow-up ranged from 0-229 (mean 69.7, median 45). The mean FI was 4.6 (range 0-30, median 2) with 2 (10.5%) of the studies having an associated FI of 0. Loss to follow-up exceeded the fragility index in all but 2 studies.

Conclusion

RCTs comparing ACDF to CTDA are often fragile: as few as 1 to 2 patients experiencing an alternative outcome, or being lost to follow-up, would be enough to change the statistical significance of the studied outcome.

Keywords

fragility index, anterior cervical discectomy and fusion, cervical total disc arthroplasty, randomized controlled trials

Introduction

Anterior cervical discectomy and fusion (ACDF) has long been the mainstay of treatment for cervical radiculopathy or myelopathy in the setting of degenerative disc disease.¹ More recently, cervical total disc arthroplasty (CTDA) has arisen as an alternative surgical option with the goal of preserving motion in the cervical spine and decreasing rates of adjacent segment disease (ASD). Although there is a substantial volume of literature comparing ACDF to CTDA, questions have arisen regarding the robustness of these studies, which often have small sample sizes and inherent difficulties with blinding and follow-up. Given that these randomized controlled trials (RCTs) guide surgical intervention, it is important to verify their robustness or fragility in order to determine the relative degree of influence they should exert over clinical decision making.

The Fragility Index (FI) is a metric which can be utilized to assess study robustness or fragility.² The FI is the minimum threshold for the number of subjects required to have an alternative result in order for a significant dichotomous result to lose statistical significance.^2-5 Maldonado et al present an illustration of this statistical test.⁶ FI can also be divided by sample size to derive the fragility quotient (FQ), which inversely correlates with endpoint robustness.⁴ FQ is thought to be more resilient to sample size bias.⁶

An FI of 1 means that a change of only 1 event will change the statistical significance of an outcome. As a result, low FIs correlate with poor study robustness and vice versa. The scale of what constitutes a robust FI does, however, vary throughout the medical literature, with high impact medical articles having a median FI of 8 compared to a median of 2 associated with most surgical specialties.⁴

The fragility index has been utilized across the orthopaedic and neurosurgical literature to probe RCT fragility or robustness. Comparisons have found that the spine literature is amongst the least robust pools of data, with 75% of RCTs having an FI less than 3, a percentage which is 3 times greater than that of high impact medicine articles.^2,4 Checketts et al reviewed studies classified as robust by the AAOS Clinical Practice Guidelines, finding a median FI of 2.⁷ Similarly, a median FI of 2 was reported by Muthu et al in an updated systematic review of spine literature in 2021.⁸ The purpose of this study was to investigate the FI of RCTs comparing ACDF vs CTDA in order to assess the robustness, or fragility, of these RCTs. It was hypothesized that the majority of studies would be on par with the FI of 2 demonstrated by both Evaniew et al and Muthu et al for previous spine literature, with loss to follow-up frequently exceeding the FI.^2,4,8

Methods

Study Selection

A systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines,⁹ with protocol registration through PROSPERO (ID: CRD42023464590). Search terms included “disc arthroplasty OR spinal arthroplasty OR spine arthroplasty OR cervical arthroplasty OR cervical total disc arthroplasty OR cervical total disc replacement OR cervical disc replacement AND randomized controlled trial OR randomized controlled trial OR randomized controlled trials.” The free-source artificial intelligence (AI) tool Rayyan (https://www.rayyan.ai/) was utilized to screen the PubMed, Ovid MEDLINE, Web of Science, and Embase articles for RCTs with 2 parallel study arms and 1:1 allocation of subjects to treatment or control groups investigating CTDA vs ACDF with at least 1 statistically significant, dichotomous outcome.

Inclusion criteria limited the query to original English-language RCTs with 1:1 allocation of human test subjects to treatment (CTDA) vs control (ACDF) with at least 1 statistically significant (P < 0.05) dichotomous outcome variable. Several of the included studies were iterations of each other with the same initial pool of patients but different follow-up points and associated outcome variables / loss to follow-up.⁸ Exclusion criteria included non-English language, non-randomized trials (eg, case reports, case series, cohorts, cross-sectional studies, observational studies, commentaries, editorials, review articles), and abstracts only. Two independent reviewers (Z.K.B. and K.P.) screened all abstracts for inclusion/exclusion criteria and a third reviewer (S.L.L.) served as the tie-breaker where needed.

Data Extraction

The following data were extracted from articles queried: authors, journal name and impact factor, publication year, funding source, randomization and allocation methods, application of blinding, use of a priori power analysis, total sample size, loss to follow-up, primary outcome, first statistically significant dichotomous outcome variable encountered with its respective P value, and the number of respective events for the treatment and control groups. The Cochrane Risk of Bias tool was utilized by 2 independent reviewers (S.L.L. and Z.K.B.) to evaluate for bias and quality. When necessary, a third reviewer served as a tiebreaker.

Fragility Index Calculation and Statistical Analysis

The FI was calculated per the methodology first described by Walsh et al using Fisher's exact test, moving 1 individual at a time from the event to the non-event category and recalculating until the P value became non-significant. The FI is the number of such shifts required for this to occur.⁵ The primary outcome or the first statistically significant dichotomous secondary outcome variable encountered was utilized per study. Calculations were performed utilizing the free-source calculator developed by Kane S.P. (https://clincalc.com/Stats/FragilityIndex.aspx). FIs were reported as whole numbers, with an FI of 1 indicating that a change of only 1 patient was required to convert a significant result to non-significant and FIs of 0 indicating that the original choice of statistical test was improper.¹⁰ The larger the FI, the stronger, or more robust, the result and vice versa. The fragility quotient was derived by dividing FI by sample size.
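The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ClinCalc calculator used in the study: the function names are hypothetical, the two-sided Fisher's exact P value is computed directly from the hypergeometric distribution, and the direction of the shift (converting non-events to events in the arm with fewer events) is an assumption based on the commonly described Walsh et al convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are study arms; columns are (events, non-events).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first arm.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Count single-patient shifts needed until P is no longer < alpha.

    Assumption: patients are shifted in the arm with fewer events,
    one non-event becoming an event per step.
    """
    a, b = events_1, n_1 - events_1
    c, d = events_2, n_2 - events_2
    shift_first_arm = events_1 <= events_2
    shifts = 0
    while fisher_exact_two_sided(a, b, c, d) < alpha:
        if shift_first_arm:
            if b == 0:
                break  # no patients left to shift in this arm
            a, b = a + 1, b - 1
        else:
            if d == 0:
                break
            c, d = c + 1, d - 1
        shifts += 1
    return shifts


# Hypothetical counts, not taken from any included trial.
print(fragility_index(4, 4, 0, 4))    # significant result, small trial
print(fragility_index(5, 50, 5, 50))  # not significant to begin with
```

A result that is already non-significant under Fisher's exact test returns 0 without any shift, which mirrors the situation described above where a reported P < 0.05 from a different test yields an FI of 0.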

Summary statistics were used to characterize the included studies' sample size, loss to follow-up, FI, and FQ. Correlations between FI and sample size, as well as between FI and journal impact factor, were calculated via the Spearman correlation coefficient with the level of significance set at P < 0.05. Subgroup analysis was performed using correlation analysis. For qualitative variables, the two categories were coded as −1 and 1 for analysis. All statistical analysis, beyond FI calculations, was performed within Microsoft Excel (Version 16.37, Microsoft, Redmond, WA, USA).
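For readers who want to reproduce a rank correlation outside of Excel, the following is a minimal sketch of a Spearman coefficient, computed as the Pearson correlation of average ranks so that ties (common among FI values) are handled. The helper names are hypothetical and this is not the authors' spreadsheet workflow.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustration: a perfectly monotone relationship.
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```

The same function can be applied to any two columns of the extracted data, for example FI against sample size, or FI against a binary variable coded as −1 and 1.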

Results

During screening, the search identified 928 abstracts. Of these abstracts, 19 (2.05%) were RCTs that met the inclusion criteria (Figure 1). These studies were published between 2008 and 2021. The most frequently represented journal was Spine with 6 articles (31.6%) followed by Journal of Neurosurgery: Spine with 3 articles (15.8%). The majority of articles (n = 16, 84.2%) compared CTDA to ACDF at a single level. The remaining 3 articles compared CTDA to ACDF at either 2 contiguous levels, 2 non-contiguous levels, or up to 3 contiguous levels (Table 1).

Figure 1.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram for systematic review of literature.

Table 1.

Characteristics of RCTs and Outcome Variables Included in Analysis.

Study	Journal	Study comparison	Primary outcome (Dichotomy)	Significant dichotomous outcome	No. of patients treated	No. of patients lost to follow-up	P value	FI	FQ	Impact factor	LTF > FI
Anderson et al, 2008²⁵	Spine	cTDA (BRYAN®) vs ACDF (Single level)	Adverse events (dichotomous)	General medical events unrelated to the operation	463	46	0.049	0	0.000	3.0	Y
Burkus et al, 2014²⁶	JNS Spine	cTDA (Prestige®) vs ACDF (Single level)	Overall success (dichotomous)	Overall success with functional spinal unit	541	146	0.01	6	0.011	4.1	Y
Cheng et al, 2011²⁰	Clinical orthopaedics and related research	cTDA (BRYAN®) vs ACDF (1, 2 and 3 level)	—	Fusion rate	83	2	<0.001	30	0.361	4.2	N
Coric et al, 2011²⁷	JNS Spine	cTDA (Kineflex®) vs ACDF (Single level)	Neck disability index (not dichotomous), visual analog scale (not dichotomous), overall clinical success (dichotomous)	Overall clinical success	269	35	0.05	7	0.026	4.1	Y
Coric et al, 2018¹⁹	JNS Spine	cTDA (Kineflex®) vs ACDF (Single level)	—	Overall success	269	93	<0.05	11	0.041	4.1	Y
Delamarter et al, 2010²⁸	SAS journal	cTDA (ProDisc-C®) vs ACDF (Single level)	—	Secondary surgical procedures	209	95	0.0292	1	0.005	3.0	Y
Delamarter et al, 2013²⁹	Spine	cTDA (ProDisc-C®) vs ACDF (Single level)	Reoperation rates (dichotomous)	Reoperation rates	209	76	0.0079	6	0.029	3.0	Y
Heller et al, 2009¹⁸	Spine	cTDA (BRYAN®) vs ACDF (Single level)	Overall Success (dichotomous)	Overall Success	463	39	0.01	4	0.009	3.0	Y
Hou et al, 2016³⁰	JBJS	cTDA (Mobi-C®) vs ACDF (Single level)	Japanese orthopaedic association score (not dichotomous), visual analog scale for pain (not dichotomous), incidence of further surgery (dichotomous)	Incidence of further surgery	108	8	0.049	1	0.009	5.3	Y
Janssen et al, 2015³¹	JBJS	cTDA (ProDisc-C®) vs ACDF (Single level)	Neck disability index (not dichotomous), neurologic success (dichotomous), secondary surgical procedures (dichotomous), and adverse events (dichotomous)	Secondary surgical procedures	165	57	0.0201	2	0.012	5.3	Y
Lavelle et al, 2019³²	Spine	cTDA (BRYAN®) vs ACDF (Single level)	Overall Success (dichotomous)	Overall Success	463	231	0.005	4	0.009	3.0	Y
Loidolt et al, 2021³³	Spine journal	cTDA (BRYAN®) vs ACDF (Single level)	Adverse events (dichotomous)	Adverse events resulting from trauma	463	229	0.04	2	0.004	4.2	Y
Murrey et al, 2009³⁴	Spine journal	cTDA (ProDisc-C®) vs ACDF (Single level)	—	Neurological success at 6 mo	209	7	0.046	1	0.005	4.2	Y
Phillips et al, 2015³⁵	Spine	cTDA (PCM®) vs ACDF (Single level)	—	NDI success	403	110	0.026	2	0.005	3.0	Y
Qizhi et al, 2016³⁶	Clinical Spine Surgery	cTDA (DISCOVER®) vs ACDF (2 non-contiguous levels)	—	Rate of adjacent Segment disease	30	0	0.04	0	0.000	1.9	N
Sasso et al, 2011³⁷	JBJS	cTDA (BRYAN®) vs ACDF (Single level)	Overall Success (dichotomous)	Overall success	463	144	0.004	6	0.013	5.3	Y
Sundseth et al, 2017²¹	European Spine journal	cTDA (DISCOVER®) vs ACDF (Single level)	—	Frequency of reoperation	136	23	0.029	1	0.007	3.2	Y
Yang et al, 2018³⁸	Orthopedics	cTDA (Mobi-C®) vs ACDF (2 contiguous levels)	—	Incidence of adjacent-segment degeneration	96	16	<0.05	3	0.031	5.2	Y
Zigler et al, 2013³⁹	Spine	cTDA (ProDisc-C®) vs ACDF (Single level)	—	Rate of Secondary Surgery	209	76	0.0292	1	0.005	3.0	Y

Of the included studies, the risk of bias calculation revealed 1 study to be of fair quality, with the remaining 18 studies being of poor quality (Table 2). Of the included studies, 14 reported a potential funding source conflict, and 11 reported some form of blinding, either with patients or assessors, with full blinding in only 3 studies (Tables 2 and 3). An a priori power analysis was performed in 9/19 (47.4%) studies.

Table 2.

Bias and Quality Assessment Using Cochrane Risk-Of-Bias Tool.

Study	Random Sequence Generation	Allocation Concealment	Blinding of Participants	Blinding of Outcome Assessment	Incomplete Outcome Data	Selective Reporting	Other Bias
Anderson et al. 2008²⁵	−	−	−	−	+	+	+
Burkus et al. 2014²⁶	−	−	−	−	+	+	−
Cheng et al. 2011²⁰	+	−	−	−	+	+	+
Coric et al. 2011²⁷	−	−	−	−	+	+	−
Coric et al. 2018¹⁹	−	−	−	−	+	+	−
Delamarter et al. 2010²⁸	?	−	?	−	+	+	−
Delamarter et al. 2013²⁹	?	−	?	−	+	+	−
Heller et al. 2009¹⁸	?	?	?	−	+	+	−
Hou et al. 2016³⁰	?	?	+	+	+	+	+
Janssen et al. 2015³¹	+	−	?	−	+	+	−
Lavelle et al. 2019³²	+	?	?	−	+	+	−
Loidolt et al. 2021³³	?	?	?	−	+	+	−
Murrey et al. 2009³⁴	?	−	?	−	+	+	−
Phillips et al. 2015³⁵	−	−	−	−	+	+	−
Qizhi et al. 2016³⁶	+	−	−	−	+	+	+
Sasso et al. 2011³⁷	−	−	−	−	+	+	−
Sundseth et al. 2017²¹	+	−	+	−	+	+	−
Yang et al. 2018³⁸	−	?	?	−	+	+	+
Zigler et al. 2013³⁹	−	−	?	−	+	+	−

“−” = high risk of bias; “?” = unclear risk of bias; “+” = low risk of bias.

Table 3.

Summary of Study Details for Included RCT.

Criterion
No. of patients treated
Mean	276.4
Median	209
Range	30–541
Loss to follow up
Mean	69.7
Median	45
Range	0-229
No. of studies with funding bias (%)	14 (73.7)
No. of studies with blinding
Assessors blinded (%)	1 (5.3)
Participants blinded (%)	2 (10.5)
P value
Mean	0.029
Median	0.029
Range	0.001–0.050
Fragility index
Mean	4.6
Median	2
Range	0-30
≤2 (%)	10 (52.6)
>2 (%)	9 (47.4)
Fragility quotient
Mean	0.031
Median	0.009
Range	0.000-0.361
A priori analysis (%)	9 (47.4)

The primary outcome was available in 10 of 19 studies (52.6%) and was subsequently utilized as the significant dichotomous outcome to calculate FI in 9 of these 10 studies (47.4% of all studies). The most commonly utilized outcomes were rate of reoperation/secondary surgery and overall success (31.6%). The remaining 10 studies used a secondary outcome for calculation of FI due to either lack of primary outcome or lack of significance or dichotomy in the primary outcome (Table 1). The reported P-value of the significant dichotomous outcomes used for FI calculation ranged from <0.001 to 0.049 (Table 3). FI was 0 in 2 (10.5%) studies and less than loss to follow-up in all but 2 studies (89.5%).

A combined 5251 total patients were included across all studies. The mean patient sample size was 276.4 (median 209, range 30-541). The number of patients lost to follow-up was 0 in 1 study with a mean of 69.7 (median 45, range 0-229). The mean calculated FI was 4.6 (median 2, range 0-30). The FI was less than or equal to 2 in 52.6% (10/19) of studies and greater than 2 in 47.4% (9/19) of studies. Two studies had an FI of zero. Loss to follow-up exceeded the FI in all but 2 studies (1 of which reported a loss to follow-up of 0 and an FI of 0). The mean FQ was 0.031 (median 0.009, range 0.000-0.361) (Table 3). There was no correlation between the FI and number of patients treated (Spearman coefficient = −0.141, P = 0.564). There was also no correlation between FI and impact factor (Spearman coefficient = 0.197, P = 0.420).
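As a worked check, the sample size, FI, and FQ summaries can be re-derived from the per-study values listed in Table 1. The sketch below transcribes three columns of that table (patients treated, patients lost to follow-up, FI); the variable names are illustrative only, and loss to follow-up is used here solely for the comparison against FI.

```python
from statistics import mean, median

# (study, patients treated, lost to follow-up, FI), transcribed from Table 1.
TABLE_1 = [
    ("Anderson 2008", 463, 46, 0),
    ("Burkus 2014", 541, 146, 6),
    ("Cheng 2011", 83, 2, 30),
    ("Coric 2011", 269, 35, 7),
    ("Coric 2018", 269, 93, 11),
    ("Delamarter 2010", 209, 95, 1),
    ("Delamarter 2013", 209, 76, 6),
    ("Heller 2009", 463, 39, 4),
    ("Hou 2016", 108, 8, 1),
    ("Janssen 2015", 165, 57, 2),
    ("Lavelle 2019", 463, 231, 4),
    ("Loidolt 2021", 463, 229, 2),
    ("Murrey 2009", 209, 7, 1),
    ("Phillips 2015", 403, 110, 2),
    ("Qizhi 2016", 30, 0, 0),
    ("Sasso 2011", 463, 144, 6),
    ("Sundseth 2017", 136, 23, 1),
    ("Yang 2018", 96, 16, 3),
    ("Zigler 2013", 209, 76, 1),
]

sample_sizes = [n for _, n, _, _ in TABLE_1]
fragility = [fi for _, _, _, fi in TABLE_1]
# Fragility quotient: FI divided by the number of patients treated.
quotients = [fi / n for _, n, _, fi in TABLE_1]

summary = {
    "total_patients": sum(sample_sizes),
    "mean_n": round(mean(sample_sizes), 1),
    "median_n": median(sample_sizes),
    "mean_fi": round(mean(fragility), 1),
    "median_fi": median(fragility),
    "fi_le_2": sum(fi <= 2 for fi in fragility),
    "fi_gt_2": sum(fi > 2 for fi in fragility),
    "fi_zero": sum(fi == 0 for fi in fragility),
    "mean_fq": round(mean(quotients), 3),
    "median_fq": round(median(quotients), 3),
    "ltf_exceeds_fi": sum(ltf > fi for _, _, ltf, fi in TABLE_1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With the Table 1 values as transcribed, this gives 5251 patients in total, a mean sample size of 276.4 (median 209), a mean FI of 4.6 (median 2), 10 studies with FI ≤ 2 and 9 with FI > 2, and loss to follow-up exceeding FI in 17 of 19 studies.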

Discussion

Study Characteristics

The current study demonstrates that RCTs comparing ACDF vs CTDA are quite fragile, with the results echoing those of other FI assessments within spine surgery.^2,4 The final number of RCTs included was 19, which is toward the lower end of similar Orthopaedic and Neurosurgical systematic reviews, which range from 5 to 104, with a median of 40.^{2,3,6,8,11-17}

The median study population was 209, which was substantially larger than similar reviews, which ranged from 47 to 165 with a median of 84.^{2,3,6,8,11-15} This could be reflective of a priori power analyses with smaller effect sizes requiring larger sample sizes or could be a byproduct of strong industry funding providing resources for larger study sizes. Indeed, a weak positive correlation (R = 0.196) was found between industry funding and FI. However, Checketts et al previously demonstrated there to be no correlation between study power and funding source.⁷ Additionally, the current review found that implementation of a priori analysis did not correlate with substantially greater FI (R = −0.075) but did correlate with industry funding (R = 0.567). Interestingly, journal impact factor correlated loosely (R = 0.193) with FI, suggesting that reputable journals have a tendency to publish more robust studies; however, the weakness of the correlation suggests that reputability does not directly equate to robustness.

Interestingly, Herndon et al and Muthu et al demonstrated positive correlations between FI and sample size (R = 0.14 and R = 0.431) in the orthopedic arthroplasty and spine literature, respectively.^8,11 In contrast, a weak negative correlation (−0.141) was found in the current study. In the review by Herndon et al, the median FI was greater at 6, compared to a median of 2.0 in this study, while the sample size was smaller with a median of 109.5 and mean of 193.4 (compared to 209 and 276.4, respectively).¹¹ Additionally, Muthu et al reported an FI of 2.0 with a smaller mean sample size of 133.⁸ These findings seemingly contradict the notion that increasing sample size will increase FI and underscore the complexity of factors that undermine study validity.

A complex set of relationships exists between FI, follow-up duration, and percent lost to follow-up. A relatively strong positive relationship exists (R = 0.482) between follow-up duration and loss to follow-up, which likely influences the lack of correlation between follow-up and FI (0.010). As expected, there is a weakly negative correlation between FI and percent lost to follow-up (−0.258), which is important as loss to follow-up exceeded FI in 17 out of 19 studies. These results imply that the statistical utility of FI may be limited to studies with relatively short-term follow-up windows.

Fragility Index: Comparison to Orthopaedic Literature

In terms of FI, this study found that only 47.4% of CTDA vs ACDF studies meet the threshold of >2 that Checketts et al demonstrated for literature designated “strong” by the AAOS Practice Guidelines.⁷ The large percentage of trials ≤2 (52.6%) is on par with that reported by Ruzbarsky et al (40% ≤ 2), Maldonado et al (50% ≤ 2), Ruzbarsky et al (73.3% ≤ 2) and Evaniew et al (75% ≤ 3).^2,6,15,16 The median FI of 2.0 is low, but on par with similar orthopaedic and neurosurgical literature, where the median FI ranges from 1 to 6, with a median of 2.5 and mean of 3.33.^{2,3,6,8,11-17} Compared to the spine literature, the FI of 2 is on par with that demonstrated by Evaniew et al and Muthu et al, which include many overlapping studies, such as those by Coric et al, Cheng et al, Heller et al, and Engquist et al.^8,18-22

As expected, median loss to follow-up was quite high at 45. Loss to follow-up exceeded FI in 17 (89.5%) studies. This is at or above the upper end of the orthopaedic and neurosurgical literature, where losses to follow-up exceed FI in 31.2% to 74% of studies.^{2,3,6,12,14,15} It is also high compared to the Checketts et al study of “strong” AAOS literature, where loss to follow-up was greater than FI in only 32% of studies.⁷ This would suggest that study validity is frequently threatened.^4,23

Fragility Index: Effect Size and Power

Compared to the totality of the medical literature canon, however, 2 is still rather low.⁴ This could indicate that there is only a very slight difference in most outcomes comparing ACDF and CTDA, leading outcomes to appear fragile. Indeed, the notion that ASD is higher in ACDF was refuted by a meta-analysis by Verma et al, which included many of the same trials as the current study.²⁴ A strong clinical argument to this point is that intervention is often strongly dictated by surgeon preference as it is universally agreed upon that patients do quite well with either intervention.

In contrast, larger differences may make studies appear more robust. Cheng et al serves as an outlier within the current study with an FI of 30 when comparing rates of fusion between ACDF vs CTDA.²⁰ Because ACDF is designed to stabilize the segment, one would expect fusion to occur at considerably higher rates in ACDF compared to CTDA. The very high associated FI of this study is likely a manifestation of effect size in relation to this stark contrast. As a result, one would expect to see a strong correlation between FI, effect size and, therefore, power. Of note, Checketts et al found there to be a strong correlation between FI and power.⁷ Many studies were missing power analysis, so the direct correlation between power and FI could not be properly assessed; however, no correlation was found between FI and studies which included a priori power analyses (R = −0.075, Tables 4 and 5).

Table 4.

Subgroup Analysis.

Correlations	R
FI vs impact factor	0.193
FI vs Sample Size	−0.141
FI vs percent lost to follow-up	−0.258
FI vs follow-up duration	0.010
FI vs a priori power analysis	−0.075
FI vs funding bias	0.196
Time vs % lost to follow-up	0.482

R = Pearson correlation coefficient.

Table 5.

Evaluation of Spine Literature.

• Sample size
• Length of follow up
• Industry funding?
• Author disclosures
• A Priori Power Analysis
• Fragility Index
• Fragility Quotient
• Does population lost to follow up exceed fragility index?

Bias

The current study found that 10 (52.6%) of studies reported losses greater than 20%, with a median of 27.0% and mean of 23.7%, indicating that more than half of the studies must be analyzed with some degree of apprehension. Dettori et al estimated that >20% loss to follow-up results in a serious threat to internal validity.²³ In this systematic review, only 3 studies (15.8%) met the Dettori et al threshold of <5% for little threat of bias.²³ These figures are on par with the findings of Checketts et al, who found that 48.6% of trials deemed strong by the AAOS were at high risk of bias with only 4.2% being categorized as low risk.⁷
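The loss to follow-up thresholds discussed above can be checked directly against Table 1. The following sketch transcribes the patients treated and patients lost to follow-up for each trial, in table order, and applies the >20% and <5% cut-offs attributed to Dettori et al; the variable names are illustrative only.

```python
from statistics import mean, median

# (patients treated, patients lost to follow-up) per trial, in Table 1 order.
FOLLOW_UP = [
    (463, 46), (541, 146), (83, 2), (269, 35), (269, 93),
    (209, 95), (209, 76), (463, 39), (108, 8), (165, 57),
    (463, 231), (463, 229), (209, 7), (403, 110), (30, 0),
    (463, 144), (136, 23), (96, 16), (209, 76),
]

# Percent of treated patients lost to follow-up in each trial.
loss_pct = [100 * lost / treated for treated, lost in FOLLOW_UP]

over_20 = sum(p > 20 for p in loss_pct)  # serious threat to validity
under_5 = sum(p < 5 for p in loss_pct)   # little threat of bias

print(f"> 20% lost: {over_20} of {len(loss_pct)} trials")
print(f"< 5% lost:  {under_5} of {len(loss_pct)} trials")
print(f"median loss: {median(loss_pct):.1f}%")
print(f"mean loss:   {mean(loss_pct):.1f}%")
```

With the Table 1 values as transcribed, 10 trials exceed 20% loss, 3 fall below 5%, and the median and mean losses are 27.0% and 23.7%.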

Due to ethical concerns, blinding is a very difficult source of bias to fully eradicate. The vast majority of studies were unblinded. Only 1 study kept patients blinded for the complete duration of follow-up.²¹ Several others kept concealment up until the time of surgery.^18,25 It is possible that known treatment could influence patient-oriented functional outcomes, such as the SF-36, which were often a component of assessing overall device-success. Additionally, lack of physician blinding could influence physical exam or the decision to re-operate. However, radiographic parameters, such as rates of pseudarthrosis or adjacent segment disease, should hypothetically be immune to lack of proper blinding or concealment.

Clinical Ramifications

The current study demonstrates the inherent fragility of the fragility index, which is influenced by many interrelated factors, such as sample size, loss to follow-up, funding source, and journal impact factor. Given the lability of FI and its negative relationship to long-term follow-up, which is crucial for many important clinical trials, the current study suggests that perhaps a new metric for study robustness is required for properly assessing studies. In this regard, the authors advise skepticism when reviewing studies with low FIs but do not believe that a low FI is in and of itself enough to invalidate a study. This holistic approach, outlined in Table 5, is in keeping with AAOS Research Designations, where problematic sources of bias have been found in upwards of 48.6% of influential studies, indicating that most studies are flawed and, therefore, 1 factor is not enough to invalidate results.⁷ Given that strong FIs are rare and difficult to obtain, the authors believe that a study with a strong FI should be interpreted as strong evidence; however, a weak FI should not automatically cause a study to be viewed as poor quality or prevent an otherwise compelling study from influencing standard of care.

Limitations

While FI provides a unique tool for assessing study robustness, it is not without its limitations. The first major limitation is the requirement of a dichotomous outcome. The majority of outcomes reported in spine literature are functional outcomes or measures of fusion, which are generally reported as discrete or continuous variables. This excludes the application of FI from a large number of outcomes. Additionally, variables must have met statistical significance, causing a large number of studies to be excluded.^2,4,5,11,23 Another limitation to FI calculation is the original statistical analysis of included studies. It is possible for an FI to confusingly be zero if a chi-square analysis was performed initially where a Fisher’s exact test may instead have been more appropriate. This is generally the case when expected cell counts are less than 5.^3,10

Conclusion

Randomized controlled trials comparing anterior cervical discectomy and fusion to cervical total disc arthroplasty are quite fragile with loss to follow-up frequently exceeding the fragility index. In many cases, 1 to 2 patients having an alternative outcome can change the statistically significant result assessed in these trials. Although the FI is unable to assess continuous variables, it offers an additional metric with which surgeons can analyze these trials prior to changing clinical practice.

Footnotes

Authors Note

Presented at AANS/CNS Section on Disorders of the Spine and Peripheral Nerves (Las Vegas, Nevada, February 2024).

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Austin H. Carroll

Kory B. Dylan Pasko

Addisu Mesfin

References

1. Smith, Robinson. The treatment of certain cervical-spine disorders by anterior removal of the intervertebral disc and interbody fusion. J Bone Joint Surg. 1958;40(3):607-624. doi:10.2106/00004623-195840030-00009

2. Evaniew, Files, Smith, et al. The fragility of statistically significant findings from randomized trials in spine surgery: a systematic survey. Spine J. 2015;15(10):2188-2197. doi:10.1016/j.spinee.2015.06.004

3. Carroll, Rigor, Wright, Murthi. Fragility of randomized controlled trials on treatment of proximal humeral fracture. J Shoulder Elb Surg. 2022;31(8):1610-1616. doi:10.1016/j.jse.2022.01.141

4. Dettori, Norvell. How fragile are the results of a trial? The fragility index. Glob Spine J. 2020;10(7):940-942. doi:10.1177/2192568220941684

5. Walsh, Srinathan, McAuley, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a fragility index. J Clin Epidemiol. 2014;67(6):622-628. doi:10.1016/j.jclinepi.2013.10.019

6. Maldonado, Huang, Domb. The fragility index of hip arthroscopy randomized controlled trials: a systematic survey. Arthroscopy. 2021;37(6):1983-1989. doi:10.1016/j.arthro.2021.01.049

7. Checketts, Scott, Meyer, Horn, Jones, Vassar. The robustness of trials that guide evidence-based orthopaedic surgery. J Bone Joint Surg Am. 2018;100(12):e85. doi:10.2106/JBJS.17.01039

8. Muthu, Ramakrishnan. Fragility analysis of statistically significant outcomes of randomized control trials in spine surgery: a systematic review. Spine. 2021;46(3):198-208. doi:10.1097/BRS.0000000000003645

9. Moher, Liberati, Tetzlaff, Altman; The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 2009;6(7):e1000097. doi:10.1371/journal.pmed.1000097

10. Kim. Statistical notes for clinical researchers: Chi-squared test and Fisher's exact test. Restor Dent Endod. 2017;42(2):152-155. doi:10.5395/rde.2017.42.2.152

11. Herndon, McCormick, Gazgalis, Bixby, Levitsky, Neuwirth. Fragility index as a measure of randomized clinical trial quality in adult reconstruction: a systematic review. Arthroplast Today. 2021;11:239-251. doi:10.1016/j.artd.2021.08.018

12. Volovici, Vogels, Dammers, Meling. Neurosurgical evidence and randomized trials: the fragility index. World Neurosurg. 2022;161:224-229.e14. doi:10.1016/j.wneu.2021.12.096

13. Khan, Evaniew, Gichuru, et al. The fragility of statistically significant findings from randomized trials in sports surgery: a systematic survey. Am J Sports Med. 2017;45(9):2164-2170. doi:10.1177/0363546516674469

14. McCormick, Tedesco, Swindell, Forrester, Jobin, Levine. Statistical fragility of randomized clinical trials in shoulder arthroplasty. J Shoulder Elb Surg. 2021;30(8):1787-1793. doi:10.1016/j.jse.2020.10.028

15. Ruzbarsky, Khormaee, Rauck, Warren. Fragility of randomized clinical trials of treatment of clavicular fractures. J Shoulder Elb Surg. 2019;28(3):415-422. doi:10.1016/j.jse.2018.11.039

16. Ruzbarsky, Khormaee, Daluiski. The fragility index in hand surgery randomized controlled trials. J Hand Surg. 2019;44(8):698.e1-698.e7. doi:10.1016/j.jhsa.2018.10.005

17. Parisien, Trofa, Dashe, et al. Statistical fragility and the role of P values in the sports medicine literature. J Am Acad Orthop Surg. 2019;27(7):e324-e329. doi:10.5435/JAAOS-D-17-00636

18. Heller, Sasso, Papadopoulos, et al. Comparison of BRYAN cervical disc arthroplasty with anterior cervical decompression and fusion: clinical and radiographic results of a randomized, controlled, clinical trial. Spine. 2009;34(2):101-107. doi:10.1097/BRS.0b013e31818ee263

19. Coric, Nunley, Guyer, et al. Prospective, randomized, multicenter study of cervical arthroplasty: 269 patients from the Kineflex|C artificial disc investigational device exemption study with a minimum 2-year follow-up: clinical article. J Neurosurg Spine. 2011;15(4):348-358. doi:10.3171/2011.5.SPINE10769

20. Cheng, Nie, Huo, Pan. Superiority of the Bryan® disc prosthesis for cervical myelopathy: a randomized study with 3-year followup. Clin Orthop. 2011;469(12):3408-3414. doi:10.1007/s11999-011-2039-z

21. Sundseth, Fredriksli, Kolstad, et al. The Norwegian cervical arthroplasty trial (NORCAT): 2-year clinical outcome after single-level cervical arthroplasty versus fusion-a prospective, single-blinded, randomized, controlled multicenter study. Eur Spine J. 2017;26(4):1225-1235. doi:10.1007/s00586-016-4922-5

22. Coric, Guyer, Nunley, et al. Prospective, randomized multicenter study of cervical arthroplasty versus anterior cervical discectomy and fusion: 5-year results with a metal-on-metal artificial disc. J Neurosurg Spine. 2018;28(3):252-261. doi:10.3171/2017.5.SPINE16824

23. Dettori. Loss to follow-up. Evid Base Spine Care J. 2011;2(1):7-10. doi:10.1055/s-0030-1267080

24. Verma, Gandhi, Maltenfort, et al. Rate of adjacent segment disease in cervical disc arthroplasty versus single-level fusion: meta-analysis of prospective studies. Spine. 2013;38(26):2253-2257. doi:10.1097/BRS.0000000000000052

25. Phillips, Geisler, Gilder, Reah, Howell, McAfee. Long-term outcomes of the US FDA IDE prospective, randomized controlled clinical trial comparing PCM cervical disc arthroplasty with anterior cervical discectomy and fusion. Spine. 2015;40(10):674-683. doi:10.1097/BRS.0000000000000869

Utilization of the Fragility Index to Assess Randomized Controlled Trials Comparing Cervical Total Disc Arthroplasty to Anterior Cervical Discectomy and Fusion
