
Abstract

Study Design

Prospective observational study with propensity score design.

Objectives

Randomization may lead to bias when the treatment is unblinded and there is a strong patient preference for treatment arms (such as in spinal device trials). This report describes the rationale and methods utilized to develop a propensity score (PS) model for an investigational device exemption (IDE) trial (NCT03115983) to evaluate decompression and stabilization with an investigational dynamic sagittal tether (DST) vs decompression and Transforaminal Lumbar Interbody Fusion (TLIF) for patients with symptomatic grade I lumbar degenerative spondylolisthesis with spinal stenosis.

Methods

Twenty-five baseline covariates were selected for their expected relationship to patient outcomes or enrollment bias. Subclassification by PS quintiles was used to design a sample of investigational DST patients and TLIF controls with excellent covariate balance in which to estimate causal treatment effects. Additionally, covariate balance in the PS design was compared with the balance in the matching covariates available from seven randomized spine IDE trials.

Results

The PS subclassification design resulted in excellent balance across baseline covariates, as evidenced by small standardized mean differences (SMDs) and no significant between-group differences after accounting for the PS design (all P ≥ 0.768). Absolute SMDs did not differ significantly among the randomized spine IDE trials (P = 0.396).

Conclusion

The PS subclassification design achieved excellent covariate balance between DST investigational and TLIF control participants. This PS-designed sample shows covariate balance similar to that observed in published studies in which patients were randomized to investigational or control arms.

Clinical trial registered with https://www.clinicaltrials.gov (NCT03115983).

Keywords

spine; lumbar fusion; spondylolisthesis; spinal stenosis; sagittal tether; decompression; investigational device; propensity score

Introduction

The randomized controlled trial (RCT) is considered the gold standard for the evaluation of investigational medical therapies. Randomly allocating participants to experimental and control groups is expected to balance known and unknown predictors of the outcome, producing an unbiased estimate of the treatment effect.¹ However, a randomized design may not sufficiently protect against biases such as selection and preference bias when blinding is impractical and participants have a strong aversion to their randomly assigned treatment (i.e., “resentful demoralization”).²

Investigational trials of surgical spine procedures are uniquely challenging to design and conduct, which is reflected in a paucity of surgical spine RCTs reported in the literature.^3-6 A recent review of the Web of Science Core Collection database by Muthu et al⁴ found just 263 spinal surgery RCTs from the United States published over the 30 years from 1990 to 2019, an average of about 9 RCTs per year. Although Muthu et al⁴ highlight that spinal surgery RCTs increased over the last decade of the study period, they and others^3,7,8 describe effective blinding and strong participant treatment preference as ongoing challenges.

The addition of an investigational device further increases the complexity of spinal surgery trial designs. Surgeons and clinical care teams are aware of the device implanted, and placebo-controlled procedures raise ethical concerns given the risks of surgery and the desire to treat a patient’s pain and dysfunction in a timely manner.^7,9 Patients may have strong preferences for a treatment arm and thus be more difficult to recruit to a randomized trial, posing a risk of selection bias and limited generalizability of results.^3,10,11 Additionally, outcomes for spinal device studies often include patient-reported outcome measures (PROMs) such as the Oswestry Disability Index.^12,13 In the presence of resentful demoralization, biased responses to PROMs are another potential threat to internal validity.^2,14 Similarly, ostensibly objective outcomes such as reoperations are ultimately influenced by patient reports of pain and dysfunction and thus may be biased by negative preoperative impressions. When an RCT is impractical, a rigorously designed prospective observational study using propensity score (PS) balancing¹⁵ may be considered. In recent years, PS modeling has become more common in spinal investigational device exemption (IDE) trials supporting approval by the US Food and Drug Administration (FDA).^16-18

As an alternative to randomization, well-designed PS matching studies can permit causal interpretations from comparisons of non-randomized device and control groups.¹⁹ Briefly, the PS is the probability of receiving one treatment vs the other conditional on a set of evidence-based baseline covariates known or suspected to be associated with the clinical outcome of interest (e.g., demographics and disease state).¹⁵ Confounding is reduced by matching treated and control subjects on their PS, using one of several approaches.²⁰ This matching often results in good balance in baseline covariates between treatment groups, similar to that produced by randomization.^21,22 As such, comparisons between treatment groups within the context of the PS design have valid causal interpretations.
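For illustration, the sketch below estimates propensity scores with a logistic regression of treatment on baseline covariates, fitted by plain gradient ascent using only the Python standard library. Logistic regression is a common choice for the PS model, but this report does not state the model form used in the DST IDE design; the two covariates, the simulated data, and the helper names (fit_propensity, propensity_scores) are hypothetical, and in practice a statistical package would be used.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, row):
    """beta[0] is the intercept; beta[1:] pair with the covariates in row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_propensity(X, treated, lr=0.5, epochs=500):
    """Hypothetical helper: logistic regression of treatment (0/1) on covariates,
    fitted by batch gradient ascent on the Bernoulli log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, treated):
            resid = t - sigmoid(linear_predictor(beta, row))  # observed minus fitted
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated P(treated = 1 | covariates) for each row."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]

# Hypothetical data: two standardized covariates (e.g., z-scored age and ODI)
random.seed(1)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
treated = [1 if random.random() < sigmoid(0.8 * a - 0.5 * b) else 0 for a, b in X]

beta = fit_propensity(X, treated)
ps = propensity_scores(X, beta)
print([round(b, 2) for b in beta])
```

Because the fitted model includes an intercept, the estimated scores average to the observed proportion treated at convergence, which is a quick sanity check on the fit.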

The purpose of this report is twofold. First, we describe the approach to leveraging a PS-designed comparison to address potential selection bias in a non-randomized IDE trial (NCT03115983²³) to evaluate decompression and stabilization with an investigational dynamic sagittal tether (DST) vs decompression with Transforaminal Lumbar Interbody Fusion (TLIF), a common treatment for symptomatic grade I degenerative lumbar spondylolisthesis with spinal stenosis. Second, we compare the covariate balance in the PS-designed sample with that observed in randomized IDE trials of lumbar spinal devices.

Methods

The study was approved by the institutional review board (IRB) of each of the 27 participating sites in accordance with that site’s institutional policies: a centralized IRB (WCG IRB) for 10 sites, a local IRB at 13 sites, and a hybrid approval by WCG IRB and a local IRB at four sites. Participation in the study was voluntary, and written informed consent was obtained. A total of 280 patients were enrolled from 15 control (n = 140) and 12 experimental (n = 140) clinical sites within the United States between July 2017 and September 2020. Patients with lumbar degenerative spondylolisthesis (Grade I per the Meyerding classification²⁴) at one level from L1 to S1, requiring stabilization at the level of spondylolisthesis and decompression at up to two levels, were eligible. Complete inclusion criteria are available at ClinicalTrials.gov.²³

Twenty-five baseline covariates encompassing demographics, medical history, disease state, and radiographic parameters were used to develop the PS design (see Table 1). Covariates were selected based on their expected relationship to patient outcomes, an approach shown to increase the precision of the estimated exposure effect on outcome while limiting bias.²⁵ In this case, covariates were selected in collaboration with the FDA to include preoperative demographic information, disease state characteristics including pain and disability related to the spinal condition, comorbidities, surgical history, and radiographic parameters associated with treatment outcomes for symptomatic degenerative spondylolisthesis (DS) with lumbar spinal stenosis (LSS).^26-28 Because the study involves a novel investigational procedure, covariates were not required to have a previously demonstrated association with clinical outcomes; a reasonable expectation of a possible relationship was sufficient, and possible redundancy among covariates was acknowledged.

Table 1.

Baseline Covariates Included in the PS Model of the Observational Study.

Category	Covariates
Demographics	Age; BMI; Height; Weight; Sex; Race
Medical history	Smoking history; Charlson comorbidity index; Work status; Osteoporosis screening tool (OST) score; Diabetes; Narcotics use; Prior lumbar surgery; Symptom duration
Spinal disease state	Oswestry disability index (ODI); Visual analog scale (VAS) back pain; VAS leg/hip pain (worst side); Neurologic status; Treated spinal level; 1- vs 2-level spinal decompression
Radiographic parameters	Angular motion; Translational motion; Disc angle; Spondylolisthesis; Disc height

Subclassification based on PS quintiles²⁹ was used to create a sample of DST investigational participants and TLIF controls well balanced with respect to the selected covariates within each of the five subclasses. The design was implemented using BSC-Design-PS™, which follows a published heuristic from Maislin and Rubin³⁰ that keeps all investigational participants and trims (excludes) the control patients whose PS values are least like those in the investigational group. Within each subclass, subjects in either treatment group have similar likelihoods of receiving the study treatment as a function of baseline covariates. Therefore, analyses proceed as if there had been stratified randomization. Group comparisons are made within each of these balanced subclasses and then statistically combined to obtain a valid estimate of the average treatment effect on the treated (ATT).²⁰ The effectiveness of the PS design in improving covariate balance is illustrated in a “Love Plot,” as described by Ahmed et al,³¹ generated with BSC-Visualize-PS™.
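A minimal sketch of quintile subclassification and the ATT combination follows. It assumes the quintile cut points are taken from the investigational (treated) group's PS distribution, one common choice for an ATT design, and it is not BSC-Design-PS™: the trimming step is a simple common-support rule (drop controls whose PS lies outside the treated range) swapped in for the Maislin and Rubin heuristic, whose details are not given here. Function names and the toy data are hypothetical.

```python
import statistics

def quintile_cuts(treated_ps):
    """Four cut points splitting the treated group's PS distribution into fifths."""
    return statistics.quantiles(treated_ps, n=5)

def subclass_of(p, cuts):
    """Subclass index 0-4: the number of cut points that p exceeds."""
    return sum(p > c for c in cuts)

def trim_controls(ps, treated):
    """Keep every treated unit; drop controls outside the treated PS range.
    Simple common-support rule, NOT the Maislin-Rubin heuristic."""
    t_ps = [p for p, t in zip(ps, treated) if t == 1]
    lo, hi = min(t_ps), max(t_ps)
    return [i for i, (p, t) in enumerate(zip(ps, treated)) if t == 1 or lo <= p <= hi]

def att_by_subclass(ps, treated, y):
    """ATT: within-subclass mean differences weighted by each subclass's share of treated units."""
    cuts = quintile_cuts([p for p, t in zip(ps, treated) if t == 1])
    n_treated = sum(treated)
    att = 0.0
    for k in range(5):
        yt = [yi for p, t, yi in zip(ps, treated, y) if t == 1 and subclass_of(p, cuts) == k]
        yc = [yi for p, t, yi in zip(ps, treated, y) if t == 0 and subclass_of(p, cuts) == k]
        if not yt:
            continue
        if not yc:
            raise ValueError(f"subclass {k} has treated units but no controls")
        att += len(yt) / n_treated * (statistics.mean(yt) - statistics.mean(yc))
    return att

# Hypothetical toy data: outcome = 10 * PS + 2 * treated, so the true effect is 2.
ps_t = [0.1, 0.3, 0.5, 0.7, 0.9] * 2
ps_c = [0.02] + [0.1] * 5 + [0.3] * 4 + [0.5] * 3 + [0.7] * 2 + [0.9]
ps = ps_t + ps_c
treated = [1] * len(ps_t) + [0] * len(ps_c)
y = [10 * p + 2 * t for p, t in zip(ps, treated)]

keep = trim_controls(ps, treated)
ps_k = [ps[i] for i in keep]
tr_k = [treated[i] for i in keep]
y_k = [y[i] for i in keep]

crude = (statistics.mean([v for v, t in zip(y_k, tr_k) if t == 1])
         - statistics.mean([v for v, t in zip(y_k, tr_k) if t == 0]))
att = att_by_subclass(ps_k, tr_k, y_k)
print(len(ps) - len(keep), round(crude, 2), round(att, 2))
```

In the toy data, one control with PS 0.02 falls outside the treated range and is trimmed. The crude difference in means (about 3.33) is inflated because controls are concentrated at low PS, whereas weighting the within-subclass differences by each subclass's share of treated units recovers the effect of 2.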

Additionally, the covariate balance achieved through the PS design was compared with the covariate balance reported in seven randomized spine IDE trials.^32-38 Covariate data from the randomized IDE trials were obtained from the published summary of safety and effectiveness data (SSED) for each device. The absolute value of the standardized mean difference (|SMD|) was used to quantify between-group covariate differences in the observational PS design and in the seven comparable randomized spinal IDE trials. The |SMD| values of covariates were compared among the RCTs and the PS-balanced trial using one-way ANOVA and Student’s t test.
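The report does not give its SMD formulas, so the sketch below assumes the definitions commonly used in PS balance diagnostics (e.g., as described by Austin²¹): the difference in means divided by the square root of the average of the two group variances, with Bernoulli variances p(1 − p) for binary covariates. The pooled SMD is taken as the unweighted mean of within-subclass SMDs, which is one reading of footnote a of Table 2. Function names and the within-subclass values are hypothetical.

```python
import math
import statistics

def smd_continuous(mean_t, sd_t, mean_c, sd_c):
    """SMD for a continuous covariate: difference in means over the pooled SD."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)

def smd_binary(p_t, p_c):
    """SMD for a binary covariate, using Bernoulli variances p(1 - p)."""
    return (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)

def pooled_smd(within_subclass_smds):
    """Unweighted mean of within-subclass SMDs (assumed reading of 'pooled SMD')."""
    return statistics.mean(within_subclass_smds)

# Crude SMDs (ignoring subclass) from the Table 2 summary statistics, DST vs TLIF
age_crude = smd_continuous(65.8, 7.7, 64.6, 8.9)
male_crude = smd_binary(0.421, 0.301)
print(f"|SMD| age = {abs(age_crude):.3f}, male = {abs(male_crude):.3f}")

# Hypothetical within-subclass SMDs for one covariate across the five subclasses
print(f"pooled SMD = {pooled_smd([0.05, -0.02, 0.03, 0.10, 0.04]):.3f}")
```

Applied to the Table 2 summary statistics without regard to subclass, these formulas give |SMD| of about 0.144 for age and 0.252 for male sex, larger than the pooled within-subclass values reported in Table 2 (0.039 and 0.104), which is consistent with the design balancing covariates within subclasses rather than across the sample as a whole.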

Results

The PS subclassification heuristic resulted in a final PS design including all 140 eligible DST participants and 123 of 140 (88%) eligible TLIF control patients. As illustrated in Figure 1 and detailed in Table 2, the PS design achieved excellent balance in the included baseline covariates. After controlling for PS subclass, there were no significant covariate differences between investigational and control subjects (all P ≥ 0.768), and pooled |SMD| values across subclasses were close to zero (all |SMD| ≤ 0.127), representing near-optimal balance as defined by Austin.²¹ The balance in baseline covariates achieved in the PS observational design and in seven lumbar spine IDE device trials is presented in Table 3 and summarized in Figure 2. The mean |SMD| across covariates in the PS-designed sample was significantly lower than the average across the recent RCTs (P < 0.01), with the PS design showing a similar or smaller mean |SMD| than each of the individual RCTs. The |SMD| did not differ significantly among the seven RCTs alone (P = 0.396).

Figure 1.

Love Plot of pooled standardized mean differences illustrating the improved balance in covariates achieved through the PS design.

Table 2.

Comparison of Investigational Dynamic Sagittal Tether (DST) and Control Transforaminal Lumbar Interbody Fusion (TLIF) Subjects Included in the Final PS Design.

Variable	DST Group	TLIF Group	SMD^a	P^†
N	140	123	–	–
Continuous covariates, mean ± SD
Age, years	65.8 ± 7.7	64.6 ± 8.9	0.039	0.891
BMI, kg/m²	28.1 ± 4.7	29.2 ± 5.2	0.038	0.815
Height, cm	169.1 ± 10.2	165.9 ± 9.7	0.047	0.957
Weight, kg^b	80.7 ± 17.1	80.6 ± 17.2	0.058	0.882
ODI	52.6 ± 11.9	51.6 ± 13.6	0.014	0.909
VAS back pain	67.4 ± 23.9	68.3 ± 22.9	−0.007	0.998
VAS worst leg/hip pain	79.5 ± 12.3	79.0 ± 15.0	−0.071	0.982
OST score	2.99 ± 3.97	3.21 ± 4.12	0.035	0.856
Angular motion, °	5.73 ± 4.53	5.58 ± 3.77	0.051	0.775
Translational motion, mm	1.29 ± 1.07	1.22 ± 0.94	0.040	0.844
Disc angle, °	7.96 ± 4.64	8.19 ± 4.76	−0.027	0.962
Spondylolisthesis, %	−14.5 ± 8.5	−14.4 ± 8.4	0.010	0.941
Average disc height, mm	6.87 ± 1.91	7.11 ± 1.99	0.024	0.964
Categorical covariates, %
White race	86.4%	91.9%	−0.001	0.794
Smoking status
Current	2.9%	3.3%	0.032	0.961
Former	28.6%	36.6%	0.089	0.850
Never	68.6%	60.2%	−0.093	0.840
CCI
0	72.9%	68.3%	−0.091	0.806
1	16.4%	19.5%	0.061	0.907
≥2	10.7%	12.2%	0.079	0.835
Diabetes	15.7%	16.3%	0.077	0.896
Work status
Working	50.0%	40.7%	0.127	0.869
Not working (back pain)	7.1%	9.8%	0.028	0.969
Not working (no back pain)	42.9%	49.6%	−0.117	0.851
Narcotics use	22.9%	33.3%	−0.069	0.925
Sensory abnormality	21.4%	26.0%	−0.007	0.989
Motor strength deficit	3.6%	5.7%	0.062	0.993
Prior lumbar surgery	0.7%	2.4%	−0.018	0.957
Symptom duration ≥12 mos	78.6%	79.7%	−0.033	0.768
Treated level L4/L5	85.0%	87.0%	0.030	0.971
Two-level decompression	19.3%	19.5%	0.092	0.891
Male	42.1%	30.1%	0.104	0.960

^†P-value comparing DST and TLIF subjects, adjusted for PS subclass.

^aMean of within-PS-subclass standardized effect sizes (i.e., pooled SMD).

^bWeight (kg) not included in PS modeling, given inclusion of BMI (kg/m²) and height (cm).

Table 3.

Absolute Value of Standardized Mean Differences (|SMD|) of Baseline Covariates Between Investigational and Control Groups of the DST IDE Study and Seven Randomized Lumbar Spine IDE Studies.

Baseline covariate	DST IDE (observational)	Coflex³⁶	TOPS³³	Superion³⁴	ActivL³⁵	Prodisc-L³⁷	Charite³⁸	Barricaid³²
Identifier	NCT03115983	P110008	P220002	P140004	P120024	P050010	P040006	P160050
Age, years	0.039	0.220	0.072	0.071	0.150	0.077	0.000	0.103
BMI, kg/m²	0.038	0.021	0.059	0.043	0.118	0.165	0.222	0.000
Height, cm	0.047	0.098	0.000	0.175	---	---	---	0.032
Weight, kg	0.058	0.071	0.036	0.166	---	---	---	0.007
ODI	0.014	0.009	0.040	0.064	0.107	0.011	---	0.061
VAS back pain	0.007	0.021	0.053	0.011	0.007	0.123	---	0.029
VAS worst leg/hip pain	0.071	0.118	0.213	---	0.141	---	---	0.000
OST score	0.035	---	---	---	---	---	---	---
Angular motion, °	0.051	0.111	0.180	0.055	0.000	---	---	---
Translational motion, mm	0.040	0.000	---	0.056	0.153	---	---	---
Disc angle, °	0.027	---	---	0.061	---	---	---	---
Spondylolisthesis, %	0.010	---	---	---	---	---	---	---
Average disc height, mm	0.024	---	---	---	---	---	---	---
White race, %	0.001	0.058	0.008	0.205	0.247	0.143	0.134	0.099
Current smoker	0.032	0.117	0.051	0.021	0.007	0.208	---	0.006
Former smoker	0.089	---	0.061	0.055	0.062	0.035	---	0.033
Never smoker	0.093	---	0.075	0.068	0.043	0.148	---	---
CCI (no age pts) = 0	0.091	---	---	---	---	---	---	---
CCI (no age pts) = 1	0.061	---	---	---	---	---	---	---
CCI (no age pts) ≥2	0.079	---	---	---	---	---	---	---
Diabetes, %	0.077	---	---	---	---	---	---	---
Working	0.127	---	---	---	---	---	---	---
Not working, due to back pain, %	0.028	---	---	---	---	---	---	---
Not working, not due to back pain, %	0.117	---	---	---	---	---	---	---
Narcotics use, %	0.069	---	---	---	0.070	---	---	---
Sensory abnormality, %	0.007	---	---	---	0.043	---	---	---
Motor strength deficit, %	0.062	---	---	---	0.146	---	---	---
Prior lumbar surgery, %	0.018	---	0.025	---	0.100	0.111	---	---
Symptom duration ≥12 months, %	0.033	0.188	---	---	0.035	---	---	---
Treated level L4/L5, %	0.030	---	0.105	0.066	0.081	0.011	0.043	0.190
Two-level decompression, %	0.092	0.012	---	0.054	---	---	---	---
Male, %	0.104	0.098	0.072	0.129	0.064	0.098	0.221	0.084
Summary Statistics
N covariates	32	14	15	16	18	11	5	12
Mean	0.052	0.082	0.070	0.081	0.087	0.103	0.124	0.054
SD	0.034	0.067	0.058	0.056	0.064	0.064	0.101	0.057
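As a check on the summary rows, the sketch below recomputes the mean and SD of |SMD| for two columns of Table 3 (DST IDE and Coflex) from the transcribed cell values, omitting "---" cells. Because the table entries are themselves rounded to three decimals, a recomputed statistic can differ from the printed one in the last digit (here the Coflex SD comes out near 0.068 vs 0.067 in the table).

```python
import statistics

# |SMD| columns transcribed from Table 3; "---" (not reported) cells are omitted.
dst_ide = [
    0.039, 0.038, 0.047, 0.058, 0.014, 0.007, 0.071, 0.035,  # age ... OST score
    0.051, 0.040, 0.027, 0.010, 0.024,                      # radiographic parameters
    0.001, 0.032, 0.089, 0.093,                             # race, smoking status
    0.091, 0.061, 0.079, 0.077,                             # CCI, diabetes
    0.127, 0.028, 0.117,                                    # work status
    0.069, 0.007, 0.062, 0.018, 0.033, 0.030, 0.092, 0.104, # narcotics ... male
]
coflex = [0.220, 0.021, 0.098, 0.071, 0.009, 0.021, 0.118,
          0.111, 0.000, 0.058, 0.117, 0.188, 0.012, 0.098]

for name, values in (("DST IDE", dst_ide), ("Coflex", coflex)):
    print(f"{name}: N = {len(values)}, mean = {statistics.mean(values):.3f}, "
          f"SD = {statistics.stdev(values):.3f}")
```

The same per-column lists are the natural inputs for the one-way ANOVA and t test described in Methods (for example, scipy.stats.f_oneway accepts one sequence per trial). The report does not specify exactly how the RCT values were grouped for the t test, so that step is not reproduced here.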

Figure 2.

Mean and 95% CI of |SMD| of available baseline covariates between investigational and control groups from the propensity score-balanced observational DST IDE study and seven randomized lumbar spine IDE device studies.

Discussion

Applying a propensity score subclassification design in this prospective, observational cohort achieved excellent covariate balance between patients receiving DST and TLIF, permitting causal interpretations of future outcome comparisons between the non-randomized groups. When compared with similar randomized IDE trials, the PS-designed sample shows similar or better balance based on the average |SMD| values among the selected baseline covariates.

While the RCT is often considered the gold-standard study design, there is a dearth of RCTs in the spinal surgery research literature, likely due to practical challenges with blinding, the ethics of placebo conditions, and participant treatment preference.^4,7 Spinal device studies, in particular, may not benefit from an RCT design given the inability to effectively double-blind and the potential for strong patient preference for a treatment condition, leading to selection bias or poor recruitment.³ When an RCT is not practical, an observational trial design incorporating propensity scores presents an alternative approach to creating the covariate balance between control and experimental groups required for causal inference.

The primary weakness of a PS design is the potential for unmeasured confounding.²⁵ A PS design creates excellent balance between non-randomized groups on measured covariates (i.e., those included in the PS model), and on unmeasured covariates to the extent that they are correlated with the measured ones, but imbalance may remain in covariates not included in the model. Randomization, on the other hand, is expected to balance both measured and unmeasured covariates across treatment and control groups.¹ To minimize the potential for such unmeasured confounding, and following evidence-based practice,²⁵ the PS design described here included a rich set of covariates expected to be associated with study outcomes. These covariates were identified a priori through a collaborative process of statistical and clinical evaluation of the current literature with subject matter experts and the FDA. The chosen covariates captured a wide array of information, including subject demographics and lifestyle factors, clinical and radiographic characteristics, and surgery-specific characteristics. As shown by Brookhart et al,²⁵ including covariates associated with the study outcome can improve precision without increasing bias, whereas including covariates related to exposure but unrelated to outcome will worsen precision without decreasing bias. To further assess robustness to unmeasured confounding, metrics such as the E-value³⁹ could be reported as part of future outcome analyses. In their seminal paper, VanderWeele and Ding³⁹ define the E-value as “the minimum strength of association […] that an unmeasured confounder would need to have with both the treatment and outcome to fully explain away a specific treatment-outcome association, conditional on the measured covariates.” Thus, the larger the E-value, the more robust the results of the PS design are to possible unmeasured confounding.
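For a risk ratio, VanderWeele and Ding give a closed form for the E-value, E = RR + √(RR × (RR − 1)), applied to 1/RR when RR < 1, and recommend also reporting it for the confidence limit closest to the null. The sketch below implements that formula; the estimate it is applied to is hypothetical, and outcomes analyzed on other scales (e.g., change in ODI) would first need one of the approximate conversions discussed in that paper.

```python
import math

def e_value_rr(rr):
    """E-value for an observed risk ratio (VanderWeele and Ding, 2017)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = 1 / rr if rr < 1 else rr  # protective effects: use the reciprocal
    return rr + math.sqrt(rr * (rr - 1))

def e_value_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1.0 if the CI contains 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value_rr(lower if lower > 1 else upper)

# Hypothetical estimate: RR = 0.5 with 95% CI 0.3 to 0.8
print(round(e_value_rr(0.5), 2), round(e_value_ci(0.3, 0.8), 2))
```

For this hypothetical estimate, the E-values are about 3.41 for the point estimate and 1.81 for the upper confidence limit.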

We acknowledge that alternative study designs exist that attempt to mitigate the risk of bias from participant treatment preference in unblinded RCTs. One example is the two-stage randomized preference trial, in which one cohort is assigned to their preferred treatment while another cohort is randomly allocated to treatment arms. Through this design, the effect of treatment preference on the relationship between exposure and outcome can be estimated.⁴⁰ An important drawback of the two-stage design is the larger sample size required, suggested to be double that of a traditional RCT.⁴¹ Additionally, few patients may be willing to be randomized, delaying recruitment of the RCT cohort. Our trial design incorporated open-label allocation, separate investigational and control sites, and a PS design to efficiently mitigate the effects of patient preference or other selection bias on the relationship between exposure and outcome. Participants at control sites were never offered the investigational treatment; therefore, there was no alternative study treatment for them to prefer. Those at investigational sites chose the IDE arm, reducing the risk that resentful demoralization would affect responses to subjective outcomes or trial retention. Through the PS design, we achieved balance across numerous baseline covariates between the control and investigational groups, limiting the risk of selection bias from inherent differences between patients who preferred the investigational treatment and those who received the control.

Finally, we note that while the PS design was shown to have similar or better covariate balance than that achieved across the seven RCTs available for comparison, the publicly available covariates were not the same across all studies, because these data were obtained from the Summary of Safety and Effectiveness Data documents published by the FDA. There may have been unreported or unmeasured covariates in the RCTs that demonstrated significantly better balance than those of our study (i.e., smaller |SMD|). However, the very small |SMD| achieved across numerous relevant covariates through the PS design makes it unlikely that the existence of such covariates would result in meaningfully better balance within the RCTs.

In conclusion, the PS design is efficient and allows for causal interpretations from treatment group comparisons, while potentially avoiding patient preference bias and “resentful demoralization” sometimes seen in control groups of randomized trials. Propensity score designs should be considered a rigorous study design alternative when randomization is impractical, unethical, or may introduce bias.

Footnotes

Author Contributions

Greg Maislin: Study design, analysis, writing, and editing. Brendan Keenan: Study design, analysis, writing, and editing. Todd F. Alamin: Study design and editing. Louie Fielding: Study design, analysis, writing, and editing. Ashley Scherman: Writing and editing. Robert Hachadoorian: Study design, analysis, and editing. Clifford Pierre: Editing. Rick C Sasso: Study design and editing. William F Lavelle: Study design and editing. Jens Chapman: Editing.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Author Jens Chapman is the editor-in-chief of the Global Spine Journal. Author Todd Alamin received consulting fees, and grant and research support from Empirical Spine, Inc. Todd Alamin also holds stock in Empirical Spine, Inc. Author Louie Fielding is employed by Empirical Spine, Inc. Author Ashley Scherman received consulting fees from Empirical Spine, Inc for medical writing. Authors Rick Sasso, William Lavelle, and Jens Chapman received grant and research support from Empirical Spine, Inc. Biomedical Statistical Consulting® LLC provided paid biostatistical consulting services for this project and authors Greg Maislin, Brendan T Keenan, and Robert Hachadoorian received salary from BSC® for this work. Author Clifford Pierre has no conflicts to disclose. Authors did not receive compensation for authorship.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in-full by Empirical Spine, Inc.

Ethical Statement

ORCID iDs

Louis C. Fielding

Ashley Scherman

Clifford Pierre

William F. Lavelle

References

1. Bothwell, Podolsky. The emergence of the randomized, controlled trial. N Engl J Med. 2016;375(6):501-504. doi:10.1056/NEJMp1604635

2. Preference Collaborative Review Group. Patients’ preferences within randomised trials: systematic review and patient level meta-analysis. BMJ. 2008;337:a1864. doi:10.1136/bmj.a1864

3. Hanson, Kopjar. Clinical studies in spinal surgery. Eur Spine J. 2005;14(8):721-725. doi:10.1007/s00586-005-0926-2

4. Muthu, Jeyaraman. Evolution of evidence in spinal surgery - past, present and future scientometric analysis of randomized controlled trials in spinal surgery. World J Orthop. 2022;13(9):853-869. doi:10.5312/wjo.v13.i9.853

5. Kirchner, Kim, Smith, et al. Few randomized controlled trials in spine surgery in the United States include sociodemographic patient data: a systematic review. J Am Acad Orthop Surg. 2023;31(8):421-427. doi:10.5435/jaaos-d-22-00838

6. Robinson, Fremes, Hameed, et al. Characteristics of randomized clinical trials in surgery from 2008 to 2020: a systematic review. JAMA Netw Open. 2021;4(6):e2114494. doi:10.1001/jamanetworkopen.2021.14494

7. Schoenfeld. Randomized controlled trials and high-intensity spine surgery. Spine J. 2020;20(10):1725-1727. doi:10.1016/j.spinee.2020.04.004

8. Mobbs, van Gelder, et al. Challenges of conducting a randomised placebo-controlled trial of spinal surgery: the success trial of lumbar spine decompression. Trials. 2023;24(1):794. doi:10.1186/s13063-023-07772-5

9. Wartolowska, Collins, Hopewell, et al. Feasibility of surgical randomised controlled trials with a placebo arm: a systematic review. BMJ Open. 2016;6(3):e010194. doi:10.1136/bmjopen-2015-010194

10. Hróbjartsson, Emanuelsson, Skou Thomsen, Hilden, Brorson. Bias due to lack of patient blinding in clinical trials. A systematic review of trials randomizing patients to blind and nonblind sub-studies. Int J Epidemiol. 2014;43(4):1272-1283. doi:10.1093/ije/dyu115

11. King, Nazareth, Lampe, et al. Impact of participant and physician intervention preferences on randomized trials: a systematic review. JAMA. 2005;293(9):1089-1099. doi:10.1001/jama.293.9.1089

12. Fairbank, Couper, Davies, O’Brien. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66(8):271-273.

13. Jamjoom, Gahtani, Alzahrani, Albeshri, Sharab. Review of the most cited patient-reported outcome measure (PROM) studies published in the neurospine surgical literature. Cureus. 2023;15(8):e44262. doi:10.7759/cureus.44262

14. Lanier, Lohse, Hooker, Francois, van Dillen. Treatment preference changes after exposure to treatment in adults with chronic low back pain. PM&R. 2023;15(7):817-827. doi:10.1002/pmrj.12897

15. Rosenbaum, Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41-55. doi:10.1093/biomet/70.1.41

16. Phillips, Coric, Sasso, et al. Prospective, multicenter clinical trial comparing M6-C compressible six degrees of freedom cervical disc with anterior cervical discectomy and fusion for the treatment of single-level degenerative cervical radiculopathy: 2-year results of an FDA investigational device exemption study. Spine J. 2021;21(2):239-252. doi:10.1016/j.spinee.2020.10.014

17. Coric, Guyer, Bae, et al. Prospective, multicenter study of 2-level cervical arthroplasty with a PEEK-on-ceramic artificial disc. J Neurosurg Spine. 2022;37(3):357-367. doi:10.3171/2022.1.SPINE211264

18. Guyer, Coric, Nunley, et al. Single-level cervical disc replacement using a PEEK-on-ceramic implant: results of a multicenter FDA IDE trial with 24-month follow-up. Int J Spine Surg. 2021;15(4):633-644. doi:10.14444/8084

19. Rubin. For objective causal inference, design trumps analysis. Ann Appl Stat. 2008;2(3):808-840. doi:10.1214/08-AOAS187

20. Stuart. Matching methods for causal inference: a review and a look forward. Stat Sci. 2010;25(1):1-21. doi:10.1214/09-STS313

21. Austin. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med. 2009;28(25):3083-3107. doi:10.1002/sim.3697

22. Kane, Fang, Galetta, et al. Propensity score matching: a statistical method. Clin Spine Surg. 2020;33(3):120-122. doi:10.1097/BSD.0000000000000932

23. Empirical Spine, Inc. LimiFlex clinical trial for the treatment of degenerative spondylolisthesis with spinal stenosis. ClinicalTrials.gov identifier: NCT03115983. Updated February 20, 2024. https://clinicaltrials.gov/study/NCT03115983?term=limiflex&rank=1. Accessed May 30, 2024.

24. Koslosky, Gendelberg. Classification in brief: the Meyerding classification system of spondylolisthesis. Clin Orthop Relat Res. 2020;478(5):1125-1130. doi:10.1097/CORR.0000000000001153

25. Brookhart, Schneeweiss, Rothman, Glynn, Avorn, Stürmer. Variable selection for propensity score models. Am J Epidemiol. 2006;163(12):1149-1156. doi:10.1093/aje/kwj149

26. Anwar, Roca, Hartman, et al. Worse pain and disability at presentation predicts greater improvement in pain, disability, and mental health in patients undergoing minimally invasive transforaminal lumbar interbody fusion for degenerative spondylolisthesis. Clin Spine Surg. 2024. doi:10.1097/BSD.0000000000001650

27. Blumenthal, Curran, Benzel, et al. Radiographic predictors of delayed instability following decompression without fusion for degenerative grade I lumbar spondylolisthesis. J Neurosurg Spine. 2013;18(4):340-346. doi:10.3171/2013.1.SPINE12537

28. Chan, Bisson, Bydon, et al. Women fare best following surgery for degenerative lumbar spondylolisthesis: a comparison of the most and least satisfied patients utilizing data from the quality outcomes database. Neurosurg Focus. 2018;44(1):E3. doi:10.3171/2017.10.FOCUS17553

29. Rosenbaum, Rubin. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984;79(387):516-524. doi:10.1080/01621459.1984.10478078

30. Maislin, Rubin. Design of non-randomized medical device trials based on sub-classification using propensity score quintiles. In: Proceedings of the Joint Statistical Meetings; Vancouver, British Columbia; July 31-August 5, 2010:2182-2196.

31. Ahmed, Husain, Love, et al. Heart failure, chronic diuretic use, and increase in mortality and hospitalization: an observational study using propensity score methods. Eur Heart J. 2006;27(12):1431-1439. doi:10.1093/eurheartj/ehi890

32. U.S. Food and Drug Administration. PMA P160050: FDA summary of safety and effectiveness data. 2019. https://www.accessdata.fda.gov/cdrh_docs/pdf16/P160050B.pdf. Accessed January 15, 2024.

33. U.S. Food and Drug Administration. PMA P220002: FDA summary of safety and effectiveness data. 2023. https://www.accessdata.fda.gov/cdrh_docs/pdf22/P220002B.pdf. Accessed January 15, 2024.

34. U.S. Food and Drug Administration. PMA P140004: FDA summary of safety and effectiveness data. 2015. https://www.accessdata.fda.gov/cdrh_docs/pdf14/P140004B.pdf. Accessed January 15, 2024.

35. U.S. Food and Drug Administration. PMA P120024: FDA summary of safety and effectiveness data. 2015. https://www.accessdata.fda.gov/cdrh_docs/pdf12/P120024b.pdf. Accessed January 15, 2024.

36. U.S. Food and Drug Administration. PMA P110008: FDA summary of safety and effectiveness data. 2012. https://www.accessdata.fda.gov/cdrh_docs/pdf11/p110008b.pdf. Accessed January 15, 2024.

37. U.S. Food and Drug Administration. PMA P050010/S020: FDA summary of safety and effectiveness data. 2020. https://www.accessdata.fda.gov/cdrh_docs/pdf5/P050010S020B.pdf. Accessed January 15, 2024.

38. U.S. Food and Drug Administration. PMA P040006: FDA summary of safety and effectiveness data. 2004. https://www.accessdata.fda.gov/cdrh_docs/pdf4/p040006b.pdf. Accessed January 15, 2024.

39. VanderWeele, Ding. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167(4):268-274. doi:10.7326/M16-2607

40. Rücker. A two-stage trial design for testing treatment, self-selection and treatment preference effects. Stat Med. 1989;8(4):477-485. doi:10.1002/sim.4780080411

41. Turner, Walter, Macaskill, McCaffery, Irwig. Sample size and power when designing a randomized trial for the estimation of treatment, selection, and preference effects. Med Decis Making. 2014;34(6):711-719. doi:10.1177/0272989X14525264

Are Randomized Trials Better? Comparison of Baseline Covariate Balance of a Propensity Score-Balanced Lumbar Spine IDE Trial and Comparable RCTs
