Abstract
Study Type
Systematic Scoping Review.
Introduction
Radiographic assessment is crucial for diagnosing symptomatic pseudarthrosis and evaluating spinal fusion outcomes, yet no consensus exists on defining successful lumbar fusion. This scoping review presents criteria for imaging-based assessment after posterolateral and interbody lumbar fusion, aiming to guide consistent evaluation methods.
Methods
Following PRISMA guidelines, a comprehensive search of Medline, Embase, and Scopus identified eligible randomized controlled trials and Food and Drug Administration (FDA) clinical trials involving lumbar fusion. Studies on revision surgeries, non-lumbar fusions, adult spinal deformity, traumatic fractures, tumors, infections, or lacking defined fusion assessment methods were excluded. Data extraction focused on classification and descriptive systems of fusion evaluation, analyzing parameters such as bony bridging, angular motion, translation, hardware failure, cage migration, radiolucency, and cleft within the fusion mass.
Results
A total of 142 articles (1995-2024) were reviewed. Computerized tomography was the most common imaging modality (102, 71.8%), followed by static (96, 67.6%) and dynamic radiographs (88, 62%). Descriptive criteria were used in 108 studies (76.1%) and classification systems in 47 (33.1%). Interbody fusion was assessed in 90 articles (63.4%) and posterolateral fusion in 68 (47.9%). Bony bridging continuity was the most reported descriptive criterion (105, 73.9%), followed by angular motion (72, 50.7%) and translation (43, 30.3%). Radiolucency was reported around the cage (31, 21.8%), pedicle screws (17, 11.9%), and within the fusion mass (36, 25.4%) to describe nonunion. Common classification systems included Bridwell (10 studies), Brantigan (9), and Lenke (9).
Conclusions
This scoping review highlights the variability in lumbar fusion assessment across RCTs and FDA trials. Over time, assessment methods have evolved from static radiographs to greater use of dynamic imaging and classification systems in the mid-2000s, with CT emerging as the dominant modality in the past decade. Despite these advancements, fusion assessment criteria remain inconsistent across studies.
Introduction
Lumbar fusion surgery has become an increasingly common procedure for the treatment of various spinal pathologies, including degenerative disc disease, spondylolisthesis, and spinal stenosis, among others.1-3 Its prevalence has risen significantly over the past three decades, driven by advancements in surgical techniques, instrumentation, imaging modalities, as well as implant and biologic materials.4,5 A primary goal of lumbar fusion is to achieve solid osseous union between vertebral segments, thereby stabilizing the spine and alleviating pain or neurological symptoms.2,6 However, despite its widespread use, challenges remain in consistently defining and assessing successful fusion outcomes, particularly through radiographic assessment modalities.
Common indicators of successful fusion on radiographic assessment include continuous bony bridging between the fused vertebral segments, which suggests solid bone healing and stability.7,8 Additionally, a lack of visible gaps at the fusion site and minimal or no movement between vertebrae during flexion-extension imaging are used to infer successful fusion.9 Advanced imaging modalities, such as computed tomography (CT), offer more detailed visualization of implant integration and bone growth, allowing a more precise evaluation of the fusion mass.7,10,11 Despite these tools, subjective interpretation and the absence of universally accepted assessment standards contribute to ongoing challenges in reliably defining fusion success.12-14
Variations in radiographic evaluation methods have resulted in significant methodological inconsistencies across studies, potentially limiting the comparability, relevance, and generalizability of their findings. Prior reviews have highlighted the heterogeneity in imaging techniques, criteria, and classifications used to define fusion. For example, a review of 374 studies identified over 250 combinations of criteria, emphasizing the absence of standardized definitions.14 Another review of 187 studies similarly found frequent reliance on bony bridging, motion assessment, and radiolucency but noted inconsistent use of classification systems such as Lenke and Christensen.13 These variations underscore the need for higher-quality evidence to establish consistent and reproducible definitions of fusion success.
The purpose of this study is to conduct a systematic scoping review of fusion assessment methods, focusing exclusively on higher-quality evidence, including randomized controlled trials (RCTs) and Food and Drug Administration (FDA)-regulated clinical trials. By analyzing how successful fusion is evaluated in high-level evidence, this review aims to provide insights into current practices and highlight methodological strengths and gaps. Ultimately, these findings seek to inform future efforts to establish standardized guidelines for postoperative lumbar fusion assessment, ensuring consistency and reliability in defining fusion success.
Methods
Search Strategy
On July 12, 2024, a comprehensive literature search of Medline, Embase, and Scopus for peer-reviewed journal articles was performed. Peer-reviewed original articles were queried with deduplication performed automatically. The search strategy was adapted from the systematic reviews by Duits et al. and Lehr et al. and included keywords, synonyms, and variations of the terms “spine”, “lumbar”, “fusion”, “arthrodesis”, “posterolateral”, and “interbody”.13,14
The following PICOT acronym was used:
P (Population): Adult patients receiving lumbar fusion for degenerative pathologies.
I (Intervention): Posterolateral lumbar fusion and/or interbody fusion surgery.
C (Comparison): Fusion vs non-fusion.
O (Outcome): Classification and descriptive systems of fusion evaluation, analyzing parameters such as bony bridging, angular motion, translation, hardware failure, cage migration, radiolucency, and cleft within the fusion mass.
T (Time): January 1995 to July 2024.
This literature review was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.15 The PRISMA screening process is outlined in Figure 1.
Figure 1. PRISMA Flow Chart of the Study Selection Process
Study Eligibility
Studies met inclusion criteria if they utilized lumbar fusion interventions such as posterolateral fusion and/or interbody fusion in adult populations with lumbar spine degenerative disease. RCTs, prospective randomized studies, and FDA clinical trials involving an Investigational Device Exemption (IDE) were selected for inclusion as these categories provide a superior quality of clinical evidence, providing a more accurate reflection of fusion assessment. Studies were included only if they were written in English and if full-text was available for review. Cases involving revision surgery were excluded, as were those focused on cervical or thoracic fusion. Studies were also ineligible if diagnoses involved scoliosis, traumatic fractures, or pathological conditions such as tumor growth or infection (e.g. tuberculosis). Studies with no definition of fusion assessment methods were also excluded.
Processing of Studies and Data Extraction
Two independent reviewers screened each study based on titles, abstracts, and full texts according to the inclusion and exclusion criteria. Any conflicting decisions between the two reviewers were resolved through discussion with a third reviewer. Reasons for ineligibility or exclusion of studies were documented. The senior author (S.K.C.) confirmed the included studies.
Criteria of Common Classification Systems
Criteria of Descriptive System Subcategories
Risk of Bias Assessment
The methodological quality of each included study was independently assessed by two reviewers using the Cochrane Risk of Bias Tool (RoB 2.0) for randomized controlled trials. For non-randomized FDA trials, the ROBINS-I tool was applied. Discrepancies were resolved by consensus or third-party adjudication. Each study was rated across domains including bias due to the randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of reported result. A summary judgment of “low,” “some concerns,” or “high” risk of bias was assigned to each study and can be found in the supplemental material. Importantly, our review focuses exclusively on the methodological quality of the studies and does not assess clinical or efficacy outcomes.
Results
Search Results
After removing duplicates, the initial database searches yielded 3963 studies. Following the title and abstract screening, 2125 studies were excluded. The remaining 597 full-text reports were assessed, and 142 studies were deemed suitable for inclusion in the final analysis. Figure 1 displays a PRISMA flowchart detailing the screening process and literature search outcomes.
Study Characteristics
Eight descriptive criteria (Figure 2) for assessing fusion were identified and are summarized in Table 4. Indicators of fusion quality such as continuity of bony bridging and radiolucency were typically qualitative. Continuity of bony bridging was the most frequently reported criterion, appearing in 105 articles (73.9%). Dynamic instability criteria, including angular motion and translation, were quantitative but showed variability in cutoff values. Angular motion thresholds ranged from <2° to <5° (72 articles, 50.7%). Translation cutoffs varied between 1 and 3 mm and were used in 43 articles (30.3%). The static instability criteria, consisting of hardware failure and subsidence, were reported less frequently, appearing in 17.6% (n = 25) and 20.4% (n = 29) of studies, respectively. Radiolucency around the cage, around pedicle screws, and within the fusion mass were used in 31 (21.8%), 17 (11.9%), and 36 (25.4%) studies, respectively.
Figure 2. Representative Radiographic Features Commonly Used to Assess Lumbar Spinal Fusion. (A) Continuity of Bony Bridging Between Vertebral Bodies, Indicating Osseous Union (B) Angular Motion and Translation on Dynamic Radiographs, Used to Evaluate Spinal Stability (C) Evidence of Cage or Graft Subsidence Into Adjacent Endplates (D) Hardware Failure, Such as Pedicle Screw Loosening or Breakage (E) Radiolucency Around Screws, Suggesting Potential Nonunion or Loosening
Table 4. Descriptive Criteria Utilized for Assessing Fusion
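The reported proportions follow directly from the study counts over the 142 included articles. As a minimal sketch in Python, the arithmetic for several of the criteria above can be reproduced as follows; the helper name `percent_of_studies` and the dictionary layout are illustrative assumptions and not part of the review's methods.

```python
# Counts are taken from the text above; the helper name and the dictionary
# layout are illustrative assumptions, not part of the review's methods.
TOTAL_STUDIES = 142

criterion_counts = {
    "continuity of bony bridging": 105,
    "angular motion": 72,
    "translation": 43,
    "hardware failure": 25,
    "subsidence": 29,
}


def percent_of_studies(count: int, total: int = TOTAL_STUDIES) -> float:
    """Return the share of studies as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


for criterion, count in criterion_counts.items():
    print(f"{criterion}: {count}/{TOTAL_STUDIES} = {percent_of_studies(count)}%")
```

Because a single study can report several criteria, these percentages are not expected to sum to 100.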
Classifications Utilized for Assessing Fusion
Temporal analysis revealed distinct shifts in fusion assessment practices over the three-decade study period. Use of descriptive criteria remained consistently high across all time intervals, with continuity of bony bridging being the most frequently reported criterion in each decade (22 studies from 1995-2004, 42 from 2005-2014, and 41 from 2015-2024). Meanwhile, indicators such as angular motion, translation, and subsidence saw modest increases over time, reflecting growing interest in dynamic and structural parameters. Notably, reliance on classification systems increased significantly in recent years. While classifications such as Brantigan and Bridwell were absent or rare in studies prior to 2005, their use rose sharply after 2014—Bridwell was cited in 9 studies and Brantigan in 9, all in the most recent decade. Similarly, Lenke and Glassman classifications showed increased use.
Discussion
Evaluating the status of bony fusion following spinal fusion surgery is essential. While open surgical exploration is the most definitive technique, noninvasive diagnostic imaging is widely used for routine assessment. However, there is no universally accepted definition of successful fusion in this context. The various imaging modalities come with distinct strengths and weaknesses, resulting in reported fusion rates that depend on varied criteria supported by limited clinical evidence. To explore the postoperative evaluation of lumbar spinal fusion, this study examined how the highest levels of evidence assessed successful postoperative fusion by conducting a scoping review of 142 RCTs and FDA trials published from 1995-2024.
The results demonstrate that interest in fusion has grown, with 45.1% of studies published between 2015 and 2024, consistent with the 95% rise in lumbar fusion procedures billed to Medicare between 2000 and 2019.20 CT was the most used imaging modality (71.8%) due to clinical guidelines and its high sensitivity,21 while static and dynamic radiographs (67.6% and 62.0%) may have been favored for their accessibility at follow-up.22,23 Descriptive criteria (76.1%) predominated, while classification systems like Bridwell, Brantigan, and Lenke appeared in only a third of the studies, reflecting a lack of standardization. Continuity of bony bridging was the most reported descriptive criterion (73.9%) to define union, followed by lack of angular motion and translation as key indicators of spinal stability. Radiolucency, a marker of nonunion, was inconsistently noted, with a 25.4% prevalence in the fusion mass (i.e., cleft in fusion mass). These findings confirm the hypothesis of significant variability in assessing fusion status in the literature.
Prior meta-analyses regarding postoperative fusion methods in retrospective studies reported similar variances in the criteria used. Duits et al. (2024) found that 32% of interbody fusion studies used classification systems, primarily Bridwell and Brantigan, while 65% relied on descriptive criteria, with bony bridging being the most common.14 Importantly, they revealed that descriptive criteria and classifications were applied in 256 unique combinations in the literature, with 45% of the combinations being used by only a single article. Similarly, Lehr et al. (2022) found that 47% of posterolateral lumbar fusion studies used classification systems, mainly Lenke and Christensen, while 63% used descriptive criteria, again favoring bony bridging. Imaging usage included static radiographs (72%), dynamic radiographs (51%), and CT (35%).13 As demonstrated in both the results and the prior literature, the variability in classification system usage and imaging modalities across studies highlights the need for more standardized criteria to improve comparability and reliability.
Our temporal analysis further revealed evolving patterns in both imaging modalities and fusion assessment strategies over the past three decades. Notably, the use of CT imaging increased substantially from 33.3% of studies in 1995-2004 to 84.4% in 2015-2024, reflecting growing reliance on higher-resolution, cross-sectional imaging to evaluate interbody fusion. In parallel, although descriptive criteria—particularly continuity of bony bridging—have remained the predominant method across all time periods, the use of formal classification systems rose sharply in the most recent decade, with Bridwell, Brantigan, and Lenke classifications increasingly adopted. This trend may reflect the maturing landscape of lumbar fusion research and a growing recognition of the need for reproducible grading tools amid technological advancements in imaging. However, despite these encouraging shifts, the continued dominance of non-standardized descriptive criteria underscores the field’s lack of consensus.
Clinically, it is crucial to establish a standardized methodology for defining spinal fusion as many studies investigate the impact of various factors on fusion rates.13,14,24-28 The lack of uniform criteria across the literature complicates comparisons between studies. While previous efforts to standardize the assessment of bony fusion have been made, they predominantly focus on imaging modalities.21,29-31 For instance, although CT scans are the most utilized imaging methodology for assessing fusion rates, variability in their accuracy persists.14 This variability stems from several factors, including differences in imaging protocols, interobserver variability in interpretation, and the influence of metal artifacts from spinal implants, which can obscure bony details and lead to inconsistent assessments of fusion status.32-34 Additionally, the sensitivity of CT scans in detecting pseudarthrosis may be limited in certain cases, particularly when subtle motion or radiolucency is present but not clearly visualized due to technical or anatomical constraints.31 Advances in CT technology, such as helical scanning, multiplanar reconstructions, higher resolution, and reduced artifacts from metal implants, may contribute to further inconsistencies in the literature.35-37 For example, newer techniques like dual-energy CT and iterative metal artifact reduction algorithms have shown promise in reducing implant-related artifacts, but their adoption and interpretation vary widely across institutions, leading to discrepancies in fusion assessment.38-40 Although MRI studies on spinal fusion remain limited, MRI has been explored as an alternative imaging modality. MRI does not involve radiation exposure and provides detailed visualization of neural and adjacent soft tissue structures. Some studies have examined its potential role in assessing fusion, with varying degrees of agreement compared to CT. Kitchen et al. reported moderate concordance between MRI and CT in evaluating fusion within interbody cages, particularly in coronal planes (κ = 0.58).41 Similarly, Kröner et al. found that coronal MRI could visualize bony bridging through cages with high interobserver agreement (88%).42 While MRI may address some limitations associated with CT, its use for fusion assessment is not yet widely established.
Fusion is a costly procedure, with the industry heavily invested in implants and biologics as surgeons pursue the “ideal fusion.”43-45 However, the absence of both consistency and consensus in evaluation methods complicates efforts to optimize outcomes and manage costs. These challenges underscore the need for standardized assessment criteria that integrate imaging modalities with clear guidelines for interpretation and reporting, ensuring consistency and reliability across studies and clinical practice.
The FDA has proposed its own criteria for defining fusion in its guidance document for manufacturers preparing an IDE for spinal systems.46 These criteria were first established in 1998, with subsequent updates in 2000 and 2004.47 According to the FDA, successful fusion is defined as: (1) evidence of bridging trabecular bone between the involved motion segments, (2) translational motion less than 3 mm, and (3) angular motion less than 5° demonstrated on X-ray (anterior-posterior, lateral, flexion, and extension views). For manufacturers utilizing an alternative radiographic modality, the FDA requires a demonstration of the validity and reliability of the chosen modality before its utilization as a primary study endpoint.48 However, the current study found that only 30% of studies utilized translational motion as one of their criteria for assessing fusion, indicating that many studies do not adhere to FDA-recommended imaging criteria. Additionally, more than 70% of studies utilized CT scans to assess fusion, further underscoring the deviation from FDA recommendations. This suggests that the FDA guidelines may no longer reflect current clinical practices regarding the optimal imaging modality, as our findings indicate a significant rise in CT utilization from 2005 to 2024, well after these guidelines were established. These findings emphasize the importance of developing and validating standardized criteria for fusion assessment to ensure consistency across studies and clinical practices, ultimately enhancing the reliability and generalizability of research findings. Until a universally accepted definition is established, wider adoption of the FDA’s recommendations with current imaging best practices could help standardize fusion assessment and enhance comparability across studies.
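The three-part FDA definition can be read as a simple conjunction of conditions. The Python sketch below is illustrative only: the names `FusionThresholds`, `is_fused`, and `STRICT_THRESHOLDS` are hypothetical, the example measurements are invented, and the stricter cutoffs are assumed from the lower ends of the ranges reported across the included studies (1 mm translation, <2° angular motion). It shows how the same segment can be classified as fused under the FDA cutoffs but not under stricter ones, which is one way threshold variability undermines comparability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionThresholds:
    """Motion cutoffs; a segment must fall strictly below both."""
    max_translation_mm: float
    max_angular_motion_deg: float


# Cutoffs quoted in the text for the FDA guidance: <3 mm and <5 degrees.
FDA_THRESHOLDS = FusionThresholds(max_translation_mm=3.0,
                                  max_angular_motion_deg=5.0)

# Hypothetical stricter set, assumed from the lower ends of the ranges
# reported across the included studies (1 mm, <2 degrees).
STRICT_THRESHOLDS = FusionThresholds(max_translation_mm=1.0,
                                     max_angular_motion_deg=2.0)


def is_fused(bridging_bone: bool,
             translation_mm: float,
             angular_motion_deg: float,
             thresholds: FusionThresholds = FDA_THRESHOLDS) -> bool:
    """All three conditions must hold for the segment to count as fused."""
    return (
        bridging_bone
        and translation_mm < thresholds.max_translation_mm
        and angular_motion_deg < thresholds.max_angular_motion_deg
    )


# Same invented measurements, evaluated under two sets of cutoffs.
print(is_fused(True, 0.8, 3.0))                     # FDA cutoffs
print(is_fused(True, 0.8, 3.0, STRICT_THRESHOLDS))  # stricter cutoffs
```

The sketch covers only the radiographic thresholds named in the text; it does not model imaging modality, reader variability, or any of the classification systems.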
Another factor contributing to the variability in assessing fusion is the wide range of classification systems utilized. The investigation identified 15 unique classification systems. This aligns with previous literature, which has also reported varied classification systems used to assess fusion.13,14 Among these, the Bridwell classification was the most commonly utilized, whereas other studies, such as Lehr et al. (2022), identified the Lenke classification to be the most prevalent.13 Despite minor differences, most classification systems share common elements, such as evaluating bony continuity and the presence or absence of motion. However, subtle differences between these systems make it challenging to compare studies utilizing different criteria. Moreover, even within the same classification system, discrepancies in the imaging modalities used (e.g., CT scan vs X-ray) further limit comparability.14
To achieve true generalizability in studies assessing spinal fusion, a standardized and validated set of criteria, including imaging modalities, must be established. With rapid advancements in technology, minimizing human error in fusion assessment is becoming increasingly feasible. Several approaches warrant investigation to develop a comprehensive and standardized assessment framework. Standardized imaging protocols can enhance consistency by ensuring uniform imaging techniques, patient positioning, and interpretation across studies. Additionally, advanced image processing and AI-driven analysis offer objective and reproducible measures of fusion, reducing observer bias and improving accuracy.49,50 Machine learning-based predictive modeling can further integrate diverse clinical and imaging factors to forecast fusion outcomes, facilitating a more standardized evaluation across different fusion types and patient populations. 51 Separately defining fusion subtypes, such as posterolateral or interbody fusion, is also essential, as it enables tailored assessment methods that improve specificity and generalizability. Furthermore, consolidating existing classification systems into a unified framework could enhance consistency and comparability in fusion research. Collectively, these strategies aim to refine and unify fusion assessment criteria, ultimately improving reliability and standardization in both clinical and research settings.
Limitations and Future Directions
This scoping review, while comprehensive, entails inherent limitations to be considered when interpreting the findings. First, the scope of this review is descriptive and classification-focused, rather than quantitative or accuracy-focused. This approach limits this study’s ability to directly compare the effectiveness or precision of different lumbar fusion assessment methodologies. As such, the conclusions drawn are primarily based on the presence and descriptions of methodologies rather than their quantitative validation against clinical outcomes.
Moreover, the review was restricted to articles published in English, introducing a potential language bias. Additionally, the inclusion of older studies introduces another layer of complexity given that imaging technologies have significantly evolved over the decades covered by this review (1995-2024). Earlier studies utilized imaging modalities that, by today’s standards, might be considered less precise. This technological discrepancy could impact the interpretation of certain fusion assessment methods, as older imaging techniques may not provide the necessary level of detail required by modern descriptive criteria.
Another challenge is that not all instances of delayed fusion or radiographic pseudarthrosis are clinically relevant.52 Some patients with radiographic evidence of pseudarthrosis remain asymptomatic, whereas others may require intervention. This highlights the need for an integrated approach to distinguish between unfused patients who require treatment and those with asymptomatic pseudarthrosis who may not. Additionally, differing techniques in the surgical approach to lumbar fusion (e.g., anterior, posterior, transforaminal) as well as variations in cage types were not considered in this study’s analysis.
Future research should prioritize several key areas. First, quantitative validation studies are essential to objectively compare the effectiveness and precision of different lumbar fusion assessment methodologies. This would help establish standardized criteria that can be universally adopted in clinical settings. Alongside this, there is a pressing need for developing or validating unified classification systems for lumbar fusion specific to the fusion type (interbody/posterolateral) and assessment method (radiography/CT), with clear measurement cut-offs to define instability. A standardized system would facilitate consistent assessments across studies and clinical practices, enhancing the reliability and comparability of research findings.
Another critical research direction is the integration of clinical outcome measures in fusion assessments. Beyond radiographic evaluation, studies should investigate how well different assessment methodologies correlate with functional outcomes, pain relief, and the need for reoperation. Prospective studies assessing clinical indicators of successful fusion—such as patient-reported outcomes, biomechanical assessments, and adjacent segment degeneration—could provide a more holistic understanding of fusion success beyond imaging findings alone.
Conclusion
This systematic scoping review highlights the considerable variability in assessing lumbar fusion success in RCTs and FDA-regulated trials. The evolution of assessment methods and imaging modalities since 1995 reflects technological advancements in spinal fusion research. Early studies primarily relied on static radiographs and descriptive criteria, often lacking standardized classification systems. In the mid-2000s, there was a notable increase in dynamic radiographic assessments, along with greater adoption of classification systems. The most recent decade saw the most significant shift, with CT becoming the predominant imaging modality. Despite these advancements, fusion assessment criteria remain inconsistent across studies (Supplemental Material).
Supplemental Material
Supplemental material - Radiographic Assessment of Successful Lumbar Spinal Fusion: A Systematic Review of Fusion Criteria in Randomized Trials
Supplemental material for Radiographic Assessment of Successful Lumbar Spinal Fusion: A Systematic Review of Fusion Criteria in Randomized Trials by Alexander Yu, BS; Justin Tiao, BS; Charlene W. Cai; Jonathan J. Huang, AB; Kareem Mohamed, BS; Ryan Hoang, BS; James Hong, BS; Daniel Berman, MD; Joshua Lee, MD; Luca Ambrosio, MD; Zorica Buser, PhD, MBA; Juan P. Cabrera, MD; Xiaolong Chen, MD, PhD; Chiara Cini, MD; Stipe Ćorluka, MD; Andreas K. Demetriades, MB.BChir; Ashish Diwan, PhD, FRACS; Amit Jain, MD, MBA; Jin-Sung Kim, MD, PhD; Xudong Li, MD, PhD; Sathish Muthu, MS, PhD; Javad Tavakoli, PhD; Gianluca Vadalà, MD, PhD; Patrick C. Hsieh, MD, MBA; Samuel K. Cho, MD; AO Spine Knowledge Forum Degenerative in Global Spine Journal
Footnotes
Acknowledgments
Jill K Gregory, MFA, CMI - Certified Medical Illustrator. Associate Director of Scholarly Publishing and Visualization. Gustave L. and Janet W. Levy Library.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Disclosures
Samuel K. Cho, MD, FAAOS
AAOS: Board or committee member
American Orthopaedic Association: Board or committee member
AOSpine North America: Board or committee member
Cervical Spine Research Society: Board or committee member
Globus Medical: IP royalties and Fellowship support
North American Spine Society: Board or committee member
Scoliosis Research Society: Board or committee member
Stryker: Paid consultant
Cerapedics: Fellowship support.
Supplemental Material
Supplemental material is available online.
