Abstract
Study Design
Systematic review.
Clinical Questions
(1) Have the proportion and number of randomized controlled trials (RCTs), as an indicator of quality of evidence regarding lumbar fusion, increased over the past 10 years? (2) Is there a difference in the proportion of RCTs among the four primary fusion diagnoses (degenerative disk disease, spondylolisthesis, deformity, and adjacent segment disease) over the past 10 years? (3) Is there a difference in the type and quality of clinical outcome measures reported among RCTs over time? (4) Is there a difference in the type and quality of adverse event measures reported among RCTs over time? (5) Have fusion surgical approaches and techniques changed by diagnosis over the past 10 years?
Methods
Electronic databases and reference lists of key articles were searched to identify lumbar fusion RCTs published from January 1, 2004, through December 31, 2013. Studies designed specifically to evaluate recombinant human bone morphogenetic protein-2 or other bone substitutes, revision surgery studies, nonrandomized comparison studies, case reports, case series, and cost-effectiveness studies were excluded.
Results
Forty-two RCTs published between January 1, 2004, and December 31, 2013, met the inclusion criteria and form the basis for this report. Thirty-five RCTs evaluated patients diagnosed with degenerative disk disease, 4 evaluated patients diagnosed with degenerative spondylolisthesis, and 3 evaluated patients with a combination of degenerative disk disease and degenerative spondylolisthesis. No RCTs were identified that evaluated patients with deformity or adjacent segment disease.
Conclusions
This structured review demonstrates an increase in the available clinical database of RCTs using patient-reported outcomes to evaluate the benefit of lumbar spinal fusion for degenerative disk disease and degenerative spondylolisthesis. Gaps remain in the standardization of adverse event reporting in such trials, as well as in the uniformity of the surgical approaches used. Finally, continued efforts to develop higher-quality data for other surgical indications for lumbar fusion, most notably adult spinal deformity and revision of prior surgical fusions, appear warranted.
Study Rationale and Context
Evidence-based medicine (EBM) emphasizes prioritizing information from well-designed trials in health care decision making. The term now describes the use of the best clinical evidence as the basis for guidelines for the medical and surgical management of problems at the population level. Well-designed randomized controlled trials (RCTs) are considered the highest level of evidence (level 1) regarding a treatment method. As such, clinicians and payers typically cite them to justify the performance and coverage of specific treatments.
Lumbar fusion surgery is performed for a variety of spinal pathologies and can be achieved via a variety of approaches, including isolated posterior fusion and interbody fusion from posterior, lateral, or anterior approaches. 47 More recently, minimally invasive methods of fusion using all of these approaches have also been devised. 7 , 31 Despite these improvements in surgical technique, some indications for lumbar fusion surgery, such as axial back pain from degenerative disk disease (DDD), remain controversial. 14 , 16 Other conditions such as instability, tumor, trauma, or spinal deformity are considered better-proven indications, although significant variability remains in fusion utilization and technique, both nationally and internationally. 1 , 14
Given a relative lack of RCT-quality data, analyses of billing databases have questioned the indication for and benefit of lumbar fusion. However, in many cases these evaluations fail to define the surgical indication and resort to a relatively nonspecific diagnosis such as “back pain,” which increases confusion for health care economists and hospital administrators, many of whom may lack a clinical understanding of surgical diagnoses. 15 Although many surgical patients’ complaints may include back pain, a large number do not undergo fusion exclusively for that symptom but rather for associated features such as spinal instability, deformity, or neurologic compression. Thus, large database analyses are not an adequate substitute for higher-quality RCT data.
With the introduction of the Affordable Care Act and increased emphasis on comparative effectiveness research, more attention has been focused on the costs associated with spine care in the United States. 39 Concomitantly, significant technological advances in spinal surgery have increased the associated costs. Among other issues, questions have emerged about the benefits of bone morphogenetic protein and the incomplete reporting of its complication profile. 10 It has also recently been shown that reporting of adverse events in cervical total disk replacement trials was inconsistent. 1 All of these features argue for improving the quality of clinical research on spine surgical outcomes, with respect both to study design and to the recording and reporting of clinical outcomes and adverse events.
In this analysis, we set out to determine whether the number and proportion of RCTs published in the past 10 years differ among the four most common indications for lumbar spine fusion: DDD, spondylolisthesis, spinal deformity, and adjacent segment disease. We also sought to ascertain whether the consistency of clinical outcomes measured among RCTs, and the quality of recording and reporting of adverse events, have improved over time. Finally, we evaluated whether there were consistent changes in the fusion surgical approaches reported over the same period.
Clinical Questions
Has the proportion of RCTs, as a surrogate for quality of evidence regarding lumbar fusion, increased over the past 10 years?
Is there a difference in the proportion of RCTs among the four primary fusion diagnoses (DDD, spondylolisthesis, deformity, and adjacent segment disease) over the past 10 years?
Is there a difference in the type and quality of clinical outcomes measured among RCTs over time?
Is there a difference in the type and quality of adverse events measured among RCTs over time?
Have fusion treatment approaches changed by diagnosis over the past 10 years?
Materials and Methods
Details about methods can be found in the online supplementary material.
Results
We identified 42 RCTs published between January 1, 2004, and December 31, 2013, that met the inclusion criteria and form the basis for this report (Fig. 1; see online supplementary material).
There were 35 RCTs identified evaluating patients diagnosed with DDD, 4 RCTs evaluating patients diagnosed with DS, and 3 RCTs evaluating patients with a combination of DDD and DS (Table 1). No RCTs were identified evaluating patients with deformity or adjacent segment disease.
Table 1 Demographics and characteristics of included studies
Abbreviations: AF, anterior fusion; CBO, clinician-based outcome; CF, circumferential fusion; CBT, cognitive behavior therapy; DDD, degenerative disk disease; DS, degenerative spondylolisthesis; EuroQoL, European quality of life; JOA, Japanese Orthopedic Association; NR, not reported; ODI, Oswestry Disability Index; PF, posterior fusion; PRO, patient-reported outcome; TF, transforaminal fusion; TDR, total disk replacement; SF-36, Short-Form 36; VAS, visual analog scale; +, fusion + hardware comparison.

Flowchart showing results of literature search.
Clinical Question 1: Has the Proportion of RCTs as a Surrogate for Quality of Evidence Regarding Lumbar Fusion Increased over the Past 10 Years?
The overall proportion of RCTs in the lumbar fusion literature over 10 years was 10.5%.
The largest proportion of RCTs was in 2004.
The smallest proportion of RCTs was in 2008.
The other 6 years within the past 10 ranged from 8.3% to 9.8%.

Proportion of randomized controlled trials (RCTs), as a surrogate for quality of evidence, in the lumbar fusion literature over the past 10 years.
Clinical Question 2: Is There a Difference in the Proportion of RCTs among the Four Primary Fusion Diagnoses (DDD, Spondylolisthesis, Adult Deformity, Adjacent Segment Disease) over the Past 10 Years?
The overall proportion of RCTs evaluating lumbar fusion in patients with DDD over 10 years was 13.4%.
The overall proportion of RCTs evaluating lumbar fusion in patients with DS over 10 years was 11.7%.
There were no RCTs in the lumbar fusion literature evaluating patients with adult spinal deformity or adjacent segment disease.
The greatest proportion of fusion RCTs evaluating patients with DDD occurred in 2004.
The smallest proportion of fusion RCTs evaluating patients with DDD occurred in the years 2006 (
The greatest proportion of fusion RCTs evaluating patients with DS occurred in 2009.
Five years (2004, 2005, 2008, 2010, and 2011) did not include any RCTs evaluating patients with DS.
The proportion of fusion RCTs evaluating patients with DS in the remaining 2 years was 16.7% in 2006 and 7.1% in 2007.

The difference in the proportion of RCTs among the four primary fusion diagnoses (DDD, DS, ASD, AD) over the past 10 years. Abbreviations: AD, adult deformity; ASD, adjacent segment disease; DDD, degenerative disk disease; DS, degenerative spondylolisthesis; RCT, randomized controlled trial. *No RCTs were found evaluating ASD or AD.
Clinical Question 3: Is There a Difference in Type and Quality of Clinical Outcomes Measured among RCTs over Time? (Fig. 4)

Percentage of included randomized controlled trials (RCTs) measuring Oswestry Disability Index (ODI), visual analog scale (VAS), and Short-Form 36 (SF-36).
Of the 42 included RCTs, 37 (88.1%) included patient-reported outcomes, 16 (38.1%) reported clinician-based outcomes, and 2 (4.8%) did not report the type of outcome measured.
Thirty-three studies (78.6%) administered the Oswestry Disability Index, 25 studies (59.5%) administered a pain visual analog scale, and 17 studies (40.5%) administered the Short-Form 36 (Fig. 4).
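Each percentage in this section is a simple proportion of the 42 included trials, rounded to one decimal place. As an illustrative check only (it is not part of the authors' methods), the short Python sketch below recomputes the reported figures from the stated counts; the names `TOTAL_RCTS`, `OUTCOME_COUNTS`, and `percent_of_trials` are hypothetical.

```python
# Illustrative check (hypothetical names): recompute reported percentages
# from the counts stated in the text. Denominator is all included RCTs.
TOTAL_RCTS = 42

OUTCOME_COUNTS = {
    "patient-reported outcomes": 37,
    "clinician-based outcomes": 16,
    "outcome type not reported": 2,
    "Oswestry Disability Index": 33,
    "pain visual analog scale": 25,
    "Short-Form 36": 17,
}


def percent_of_trials(count: int, total: int = TOTAL_RCTS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for measure, count in OUTCOME_COUNTS.items():
        print(f"{measure}: {count}/{TOTAL_RCTS} = {percent_of_trials(count)}%")
```

Because a single trial can report both patient-reported and clinician-based outcomes, and can administer more than one instrument, these categories overlap and their percentages are not expected to sum to 100%.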
There was no trend over time in the type or quality of outcome measures.
Clinical Question 4: Is There a Difference in Type and Quality of Adverse Events Measured among RCTs over Time?
Of the 42 included RCTs, 34 (81%) reported complications, 25 (59.5%) reported reoperations, and 5 (11.9%) did not report any adverse events.
The most common adverse events reported across the studies were reoperation (59.5%), dural sac tear (26.2%), and deep vein thrombosis (16.7%).
Five trials (11.9%) used a severity classification system for adverse events.
There was no trend over time in the use of such a system; these 5 trials were from 2004, 2008, 2009, 2010, and 2011.
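Some adverse event rates above are given only as percentages. Because the denominator (42 trials) is known, the number of trials behind a rounded percentage can be inferred. The Python sketch below uses a hypothetical helper, `implied_count`, that is not the authors' method: it rounds `percent * total / 100` to the nearest integer and then checks that this count reproduces the reported percentage within one-decimal rounding (plus or minus 0.05 percentage points).

```python
# Hypothetical helper: infer the trial count behind a rounded percentage.
TOTAL_RCTS = 42  # denominator: all included RCTs


def implied_count(percent: float, total: int = TOTAL_RCTS) -> int:
    """Return the whole count out of `total` closest to `percent`.

    Raises ValueError if no whole count reproduces `percent` to one decimal place.
    """
    count = round(percent * total / 100)
    # One-decimal rounding means the true value lies within +/- 0.05 of the
    # reported percentage; a small epsilon absorbs floating-point error.
    if abs(100 * count / total - percent) > 0.05 + 1e-9:
        raise ValueError(f"{percent}% does not match any whole count out of {total}")
    return count


if __name__ == "__main__":
    rates = [("reoperation", 59.5), ("dural sac tear", 26.2), ("deep vein thrombosis", 16.7)]
    for event, pct in rates:
        print(f"{event}: {pct}% of {TOTAL_RCTS} trials -> {implied_count(pct)} trials")
```

Under these assumptions, the reported rates of dural sac tear (26.2%) and deep vein thrombosis (16.7%) correspond to 11 and 7 of the 42 trials, respectively; these counts are inferences from rounded percentages rather than figures stated by the authors.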
Clinical Question 5: Have Fusion Treatment Approaches Changed by Diagnosis over the Past 10 Years?
Over the course of the 10-year period, anterior, posterior, circumferential, transforaminal, and a combination of these approaches have been used.
A posterior approach was used in 33.3% of trials, circumferential in 21.4%, anterior in 19.0%, transforaminal in 11.9%, and a combination of two or more approaches in 9.5%; one study (2.4%) did not report a specific approach.
There were no discernible changes in treatment approaches over time or by diagnosis in the past 10 years.
Discussion
This structured review was performed to assess whether the quality of clinical research on lumbar fusion has improved consistently over the past decade. Ultimately, we were unable to make clear statements regarding trends over this period. Nonetheless, several positive features of our results merit note.
Although there has not been an apparent shift toward a greater percentage of RCT designs among published studies, there has been a steady increase in the number of RCTs published with a focus on DDD and DS. Because these are the two most common surgical indications for fusion, this is an encouraging finding. Although deriving treatment guidelines is beyond the scope of this article, the numbers available suggest that a relatively high level of evidence has likely emerged on which to base such recommendations.
We are also encouraged by the relatively high percentage (88.1%) of RCTs using validated, patient-centered outcomes over the past decade. The most widely used questionnaire was the Oswestry Disability Index, administered in 78.6% of reviewed RCTs. Although debate over which outcome instruments are best designed or most responsive for patients undergoing lumbar fusion remains unsettled, the importance of using validated patient-reported outcomes rather than clinician-reported outcomes is well accepted. This approach appears to be used fairly consistently by authors of the highest level of medical evidence in the field of lumbar fusion.
Unfortunately, the same cannot be said of adverse event reporting in these same studies. Although 81% of RCTs included some discussion of adverse events, only 11.9% used a classification or severity scale for complications, which may in part reflect the lack of available clinical research tools that validly weight adverse events following lumbar fusion surgery. We hope that this review may illustrate the need to develop such tools.
The lack of a consistent approach to surgical fusion remains a barrier to developing a reliable body of high-quality clinical data on which to base treatment recommendations. Although the variety of available approaches reflects significant effort and investment in surgical innovation, it is unlikely that all of the approaches currently in use are equally safe or effective. Although the choice of approach is undoubtedly tailored in part to the needs of the individual patient, it is also likely driven at least in part by the training and experience of the surgeon performing the procedure. 27 This review highlights the need for higher-level comparisons of specific surgical approaches and techniques.
The lack of high-level data to assess fusion for patients with adult spinal deformity or adjacent segment disease remains an area of concern. The lack of published RCTs in these areas may reflect the even greater variations of clinical presentation and surgical approach among such patients. The comparatively smaller number of such patients also presents difficulty in obtaining patient cohorts of sufficient size to allow meaningful statistical comparisons. Despite such obstacles, however, patients and surgeons would undoubtedly benefit from efforts at improving the clinical data guiding treatment recommendations.
This review does not prove that the quality of the reported data has truly improved. A more detailed analysis of the content of the published studies would be required to understand their true level of quality. Nonetheless, this study provides at least a partial assessment of the current landscape of lumbar spine clinical research, and our results suggest an increasing adoption of an EBM-supported approach within lumbar spine surgery over the past decade.
Conclusion
This structured review demonstrates an increase in the available clinical database of RCTs using patient-reported outcomes to evaluate the benefit of lumbar spinal fusion for DDD and DS. Gaps remain in the standardization of adverse event reporting in such trials, as well as in the uniformity of the surgical approaches used. Finally, continued efforts to develop higher-quality data for other surgical indications for lumbar fusion, most notably adult spinal deformity and revision of prior surgical fusions, appear warranted.
Disclosures
Robert Hart, Board membership: CSRS, ISSLS, ISSGF; Consultant: DePuy Spine, Globus, Medtronic; Royalties: Seaspine, DePuy Synthes
Jeffrey T. Hermsmeyer, none
Rajiv K. Sethi, none
Daniel C. Norvell, none
The premise that prospective randomized clinical trials (PRCTs) represent the height of scientific evidence in surgical care has been increasingly challenged (see Editorial “Nothing Hurts Follow-Up like Follow-Up” on page 165 of this issue). A PRCT studies the “efficacy” of a procedure: it seeks to prove or disprove that a given intervention, compared with another treatment, produces a desired therapeutic effect under tightly controlled circumstances. The purported main benefit of this type of “explanatory” RCT is bias reduction. In light of the apparently increasing unwillingness of some populations to allow their care to be chosen by randomization, even under the premise of therapeutic equipoise, effectiveness trials, in which treatments are studied in real-life medical practice, have gained increasing consideration. It is not difficult to foresee large-scale “pragmatic trials” and registry-derived studies superseding surgical PRCTs as the most impactful studies on the evidence pyramid. Therefore, the authors’ premise of focusing on level 1 PRCTs as the pinnacle of scientific validity may not reflect the most meaningful form of research in the future.
The current study further underscores the ongoing categorical confusion of studies that use the clinical symptom of “low back pain” as their foundation. Indeed, many studies lump together entities such as “discogenic back pain,” “degenerative spondylolisthesis,” “(postdiskectomy) disk degeneration,” and “stable” (isthmic) spondylolisthesis based on their common generalized clinical presentation of “low back pain.” Part of this confusion arises from the lack of universally accepted operational definitions. Part of the problem also arises from the insufficient specificity of the International Classification of Diseases, 9th Revision system, with its overabundance of spine-related terms. The reviewers expressed the hope that the increasing prevalence of the International Classification of Diseases, 10th Revision and electronic medical records will foster improved specificity of medical terminology. Using undifferentiated terms such as “low back pain” as presenting symptomatology for inclusion in PRCTs, without further subclassification, will likely not be sustainable in the future.
One reviewer pointed out the common, ongoing disregard of nonorganic factors in studies of back pain. Clinical comorbidities such as anxiety, depression, fear avoidance, catastrophizing, pre-existing chronic pain, sleep deprivation, and many other psychosocial variables likely influence patient-reported outcomes more heavily than the treatment interventions themselves, which can lead to spurious results.
In conclusion, the reviewers welcomed the finding of an increasing number of PRCTs on lumbar fusion but warned against placing too much emphasis on PRCTs in generalized discussions of preferred treatments for “low back pain” without further differentiation and due consideration of “treatment effectiveness.” Finally, the reviewers shared the authors’ surprise that no high-level studies had addressed adult degenerative scoliosis or adjacent segment disease.
Footnotes
Acknowledgments
Analytic support for this work was provided by Spectrum Research, Inc. with funding from AOSpine.
