
Abstract

Study Design

Systematic Review.

Objective

To assess the current literature regarding the accuracy of different imaging modalities and criteria used to assess lumbar fusion, and their correlation with surgical direct observation as the current Gold Standard.

Methods

Following PRISMA guidelines, we conducted a comprehensive search of PubMed, Embase, Google Scholar, and Cochrane Library. Studies were included if they focused on patients with a prior history of lumbar interbody fusion and at least 1 year of radiographic follow-up. The review assessed the sensitivity, specificity, and accuracy of different imaging techniques, and their correlation with surgical findings.

Results

Thirteen studies (1989-2019) including 715 patients were reviewed. Common imaging modalities included plain radiographs (53.8%), computed tomography (CT) (69.2%), and dynamic radiographs (30.7%). CT became the most used modality after 2006. Diagnostic accuracy varied substantially, and CT in particular showed wide ranges of sensitivity and specificity. Descriptive criteria for fusion were widely used, but interobserver agreement was generally low.

Conclusion

The review highlights a lack of standardized criteria for assessing lumbar fusion. Despite advancements in imaging techniques, the variability in diagnostic parameters suggests a need for consensus and multicentric studies to establish reliable, universal criteria for evaluating fusion success.

Keywords

spine, lumbar vertebrae, arthrodesis, diagnostic techniques and procedures, pseudarthrosis

Introduction

Lumbar Disc Degenerative Disease (LDDD) is the leading cause of low back pain worldwide, and the most common cause of disability in the aging population. LDDD and facet joint disease may result in mechanical back pain, radicular symptoms, increased morbidity and poor quality of life. In this regard, Lumbar Interbody Fusion (LIF) surgery is the mainstay of treatment for a wide range of lumbar spine diseases, including degenerative, traumatic, infectious and tumoral pathologies. The concept of “Spinal Fusion” was first introduced by Albee and Hibbs as a successful surgical treatment for Pott’s disease.^1-5

Lumbar Interbody Fusion procedures require the placement of implants such as cages, spacers, or structural bone grafts within the intervertebral space after a discectomy and proper preparation of the adjacent endplates. In most cases, posterior instrumentation and/or different bone graft options are advocated to improve the fusion rate. Despite this, pseudarthrosis remains a common complication following LIF procedures.^1,2

The term “pseudarthrosis” (Greek pseudo = false, and arthrosis = joint) is used to describe a failure in spinal fusion, defined as the lack of osseous bridging at more than 1 year after surgery. The incidence of pseudarthrosis after LIF surgery may range from 5% to 35%, with a higher incidence in patients requiring fusion of 3 or more spinal levels. Patients who develop pseudarthrosis may complain of axial or radicular pain as their initial symptoms, while claudication and/or myelopathic symptoms may present later. However, in up to 50% of patients pseudarthrosis may be completely asymptomatic despite radiographic findings of nonunion. Even in patients with symptoms suggestive of pseudarthrosis, findings on physical examination are often nonspecific, so radiographic confirmation is needed.^2,6,7

To address this problem, routine follow-up of patients undergoing LIF often includes radiographic studies along with physical examination at set intervals after surgery. Over the past several decades, many imaging techniques have been proposed to evaluate fusion outcomes, including plain static or dynamic radiographs, computed tomography (CT), magnetic resonance imaging (MRI), bone scintigraphy, and radiostereometric analysis (RSA). However, at present there is no universal consensus on imaging criteria for assessing fusion.^7,8

Given the lack of consensus regarding radiographic techniques assessing fusion rate in patients undergoing LIF procedures, we performed a systematic review of the current literature to evaluate this topic.

Methods

This review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.⁹

Search Strategy

A systematic search of the electronic databases PubMed, Embase, Google Scholar, and Cochrane Library was conducted from their inception to August 2, 2024, to identify all publications in the English language. The search strategy comprised a combination of the keywords “pseudarthrosis” OR “pseudoarthrosis” AND “lumbar spine” OR “lumbar spine fusion assessment” OR “arthrodesis” OR “fusion” OR “lumbar arthrodesis” OR “lumbar spine fusion outcome” AND “plain radiograph” OR “dynamic radiograph” OR “flexion-extension radiograph” OR “flexion extension radiograph” OR “computed tomography” OR “CT” OR “magnetic resonance imaging” OR “MRI”.

Eligibility Criteria

Eligible articles and their reference lists were reviewed for additional relevant articles, which were added to the analysis. Inclusion criteria comprised patients aged at least 18 years with a previous history of a lumbar interbody fusion and/or lumbar posterolateral fusion procedure and a minimum of 1 year of postoperative radiographic follow-up. Regarding radiographic modalities, studies were further required to report the sensitivity and specificity of the imaging modality. Additionally, studies that correlated radiographic findings with surgical exploration were considered, as surgical exploration is currently the Gold Standard for lumbar fusion assessment.

Study Selection

All available results were exported to the Elsevier Reference Manager Mendeley (Elsevier; Amsterdam, the Netherlands) to exclude duplicates. Two authors (E.G.G. and R.N.R.) independently reviewed the titles and abstracts of the remaining articles. All articles with no relevance to radiological non-union assessment in lumbar spine pseudarthrosis were excluded.

Risk of Bias Assessment

Two reviewers (J.O.E. and H.G.I.) independently assessed the risk of bias of the included studies. Depending on the outcomes reported in each study, we employed the Quality Appraisal of Reliability Studies (QAREL) or the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) checklist. For studies assessing reliability, the risk of bias was considered high if fewer than 60% of the signaling questions for each domain of the QAREL checklist were answered “yes”. For studies evaluating accuracy, it was considered high if more than 2 of the QUADAS-2 signaling questions for each domain were answered either “no” or “unclear”.
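The two thresholds described above can be written as simple decision rules. The following is a minimal sketch only: the function names, the answer labels, and the choice to pass one domain's answers at a time are assumptions made for illustration, not part of the QAREL or QUADAS-2 instruments or of the authors' workflow.

```python
def qarel_high_risk(answers, min_yes_fraction=0.60):
    """Flag high risk of bias when fewer than 60% of answers are 'yes'.

    `answers` is a list of strings for one QAREL domain (hypothetical input
    format). The 60% threshold is the one stated in the text.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    yes_count = sum(1 for a in answers if a.strip().lower() == "yes")
    return yes_count / len(answers) < min_yes_fraction


def quadas2_high_risk(answers, max_flags=2):
    """Flag high risk of bias when more than 2 answers are 'no' or 'unclear'.

    `answers` is a list of strings for one QUADAS-2 domain (hypothetical
    input format). The cutoff of 2 is the one stated in the text.
    """
    flags = sum(1 for a in answers if a.strip().lower() in {"no", "unclear"})
    return flags > max_flags


# Invented answers, for illustration only.
print(qarel_high_risk(["yes"] * 5 + ["no"] * 6))            # 5/11 answered "yes"
print(quadas2_high_risk(["yes", "no", "unclear", "no"]))    # 3 flagged answers
```

Keeping both thresholds as parameters makes it easy to see how sensitive the high-risk label is to the chosen cutoffs.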

Data Extraction

The following information was extracted and summarized in tables: (1) authors and date of publication, (2) number of treated patients, (3) age and sex, (4) information regarding previous fusion procedures in each case, (5) follow-up time, (6) time to radiographic presentation of pseudarthrosis, (7) radiographic modality employed, (8) radiographic fusion and/or non-union criteria, (9) surgical outcome describing the rate of successful fusion and non-union cases, and (10) imaging correlation with surgical findings in each case. For studies describing diagnostic parameters for the imaging modality employed, information regarding sensitivity, specificity, and predictive values was extracted. Summarized data are presented descriptively.

Data Analysis

The initial search resulted in 865 articles (186 from PubMed, 475 from Google Scholar, 85 from Embase, and 119 from Cochrane Library). After duplicates were removed, 575 articles were selected for the first screening. The initial screening excluded 505 articles because the full text was unavailable, the study was performed on animal subjects, or the article was outside the scope of this study. Seventy full-text articles were then screened for eligibility, and 5 additional articles were excluded for not meeting the inclusion criteria. Finally, 13 articles were included in our final analysis. Details are provided in Figure 1.
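The early stages of the selection flow can be checked with simple arithmetic. This is a minimal sketch using only the counts stated above; the variable names are illustrative, and the number of duplicates is derived here rather than reported in the text.

```python
# Records identified per database, as reported in the text.
records_by_database = {
    "PubMed": 186,
    "Google Scholar": 475,
    "Embase": 85,
    "Cochrane Library": 119,
}

identified = sum(records_by_database.values())
after_deduplication = 575           # reported in the text
excluded_at_first_screening = 505   # reported in the text

# Derived quantities (not stated explicitly in the text).
duplicates_removed = identified - after_deduplication
full_text_assessed = after_deduplication - excluded_at_first_screening

print(identified, duplicates_removed, full_text_assessed)
```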

Figure 1.

Flow chart of the systematic review according to PRISMA guidelines.

Results

Systematic Literature Review and Population Demographics

Thirteen studies published between 1989 and 2019 were reviewed. Most studies were from North America (61.5%), followed by Europe (30.7%) and Asia (7.6%). Retrospective designs were the most frequent (53.8%), whereas 6 studies (46.1%) were prospective cohort studies, including 1 (7.6%) randomized controlled study. The studies reported on 715 patients with mean ages ranging from 42 to 78 years. Sex was reported for 615 patients: 309 (43.2%) were male and 306 (42.7%) were female; sex was not listed for the remaining 13.9%.

Surgical Approaches

Lumbar fusion procedures were mainly (93.1%) performed through an open posterior approach (both interbody fusion [IBF] and posterolateral fusion [PLF]), with only 1 study describing an anterior lumbar approach (6.8%). In all studies, both IBF and PLF were supplemented with a posterior fixation system. However, only 6 studies (46.1%) described the use of interbody cages and/or implants or grafts, whereas this was unclear in the remaining studies (N = 7, 53.8%). The mean follow-up after the first lumbar fusion procedure ranged from 6 months to 29 years (Table 1).

Table 1.

List of Studies Included in our Systematic Review.

Reference (Author/Year)	Study Design	Patients (Age / Sex)	Surgical Procedure	IBF Implant Type			Posterior / Posterolateral Fusion			Follow-Up (Months)	Outcome		Mean Time Interval to Surgical Exploration	Radiographic Fusion Criteria	Correlation with Surgical Exploration
Reference (Author/Year)	Study Design	Patients (Age / Sex)	Surgical Procedure	Cage	Graft	Other/Unclear	Yes	No	Other/Unclear	Follow-Up (Months)	Successful Fusion Rate (N/%)	Non-Union Rate (N/%)	Mean Time Interval to Surgical Exploration	Radiographic Fusion Criteria	Correlation with Surgical Exploration
Laasonen et al. (1989)¹⁰	Retrospective cohort study	18 M (37.5%) / 30 F (62.5%)	IBF ± PLF	0	6	0	42	0	0	Range: 6-48	28 (58.3%)	20 (42.6%)	≈33.7 months	Plain/Flexion-Extension Radiographs: N/A	N/A
Laasonen et al. (1989)¹⁰	Retrospective cohort study	18 M (37.5%) / 30 F (62.5%)	IBF ± PLF	0	6	0	42	0	0	Range: 6-48	28 (58.3%)	20 (42.6%)	≈33.7 months	CT: N/A	78%
Brodsky et al. (1991)¹¹	Retrospective cohort study	99 M (57%) 76 F (43%)	PLF	0	0	0	175	0	0	Range: 8 months-29 years	N/A	N/A	N/A	Plain/Flexion-Extension Radiographs: N/A.	64%
Brodsky et al. (1991)¹¹	Retrospective cohort study	99 M (57%) 76 F (43%)	PLF	0	0	0	175	0	0	Range: 8 months-29 years	N/A	N/A	N/A	CT: Evidence of bony continuity of the fusion mass.	57%
Kant et al. (1995)¹²	Retrospective study	75 patients	IBF ± PLF	0	0	37 (49.3%)	75 (100%)	0	0	12 months	51 (69%)	24 (31%)	≈12 months	Plain Radiographs: Evidence of solid bone from one transverse process to the other transverse process or when oblique views revealed obliteration and fusion in facet joints.	68%
Larsen et al. (1996)¹³	Prospective study	25 patients	PLF	0	0	0	25 (100%)	0	0	At least 12 months	16 (64%)	9 (36%)	N/A	Plain radiographs: Evidence of bridging bony trabeculae.	62%
														Flexion-Extension Radiographs: <3° of motion on flexion-extension.	N/A
														CT: Evidence of bridging bony trabeculae.	63%
														Bone Scintigraphy: Lack of increased uptake on bone.	60%
Jacobson et al. (1997)¹⁴	Prospective study	≈ 43.2 years/3 M (30%) 7 F (70%)	IBF ± PLF	0	0	1 (10%)	10 (100%)	0	0	At least 12 months	10 (100%)	0 (0.0%)	≈9 months	Ultrasonography: Presence of an echogenic and shadowing interface that bridged contiguous vertebral body levels.	80%
Albert et al. (1998)¹⁵	Prospective study	≈ 42.8 years/21 M (55.2%) 17 F (44.7%)	IBF ± PLF	N/A	N/A	N/A	35 (92.1%)	3 (7.8%)	0	≈ 23.9 months	24 (63.1%)	14 (36.8%)	≈23.9 months	SPECT: Absence of increased bone uptake beyond background signal on coronal, sagittal, or transverse images.	N/A
Bohnsack et al. (1999)¹⁶	Prospective study	≈ 42.0 years/21 M (50%) 21 F (50%)	IBF ± PLF	N/A	N/A	10 (24%)	32 (76%)	0	0	At least 12 months	38 (91%)	4 (9%)	≈27 months	Bone Scintigraphy: Lack of increased uptake on bone.	N/A
Kanayama et al. (2006)¹⁷	Prospective, Randomized and Controlled study	Group 1: 70.3 ± 8.0 years/5 M (55.5%) 4 F (44.4%)	PLF	0	0	0	9 (100%)	0	Autograft + Bone Ceramic Substitute (N = 9, 100%)	16.4 ± 3.8 months	9 (100%)	0 (0.0%)	≈15.3 months	Plain / Flexion-Extension Radiographs: Evidence of bridging bony trabeculae / <5° of angular motion on flexion-extension or <2 mm of translation.	78%
Kanayama et al. (2006)¹⁷	Prospective, Randomized and Controlled study	Group 2: 58.7 ± 9.0 years/6 M (60%) 4 F (40%)	PLF	0	0	0	10 (100%)	0	rhOP-1 (N = 10, 100%)	13.3 ± 1.4 months	7 (77.7%)	3 (33.3%)	≈15.3 months	CT: Evidence of bridging bony trabeculae.	57%
Carreon et al. (2007)¹⁸	Retrospective Cohort Study	≈ 57.0 years/42 M (45.1%) 51 F (54.8%)	PLF	0	0	0	93 (100%)	N/A	N/A	N/A	61 (66%)	32 (34%)	≈49 months	Fine-Cut CT: A facet fusion was defined as obliteration of the joint space between the superior and inferior articulating surfaces. Gutter fusion was defined as continuous trabeculated bone connecting the transverse processes.	84%
Carreon et al. (2008)¹⁹	Retrospective cohort study	≈ 43.0 years/26 M (53%) 23 F (46.9%)	ALIF	49 (100%)	N/A	N/A	28 (57.1%)	16 (32.6%)	5 (10.2%)	At least 12 months	43.5%	56.5%	≈22 months	Fine-Cut CT: Presence of trabecular bony bridging termed as a ‘‘sentinel sign’’ and a ‘‘posterior sentinel sign.’’	N/A
Fogel et al. (2008)²⁰	Retrospective cohort study	≈ 43.0 years/48 M (53.3%) 42 F (46.7%)	PLIF	90 (100%)	Iliac Bone Autograft (N = 90, 100%)	0	90 (100%)	0	Local Bone Autograft (N = 90, 100%)	Range: 24-68 months	87 (97%)	3 (3%)	≈38 months	Plain Radiographs: Interbody fusion was evaluated using the BSF Scale defining fusion as a BSF-3 grade, and Posterolateral fusion was evaluated using the Lenke's Classification, defining fusion as a Lenke-A grade.	N/A
Fogel et al. (2008)²⁰	Retrospective cohort study	≈ 43.0 years/48 M (53.3%) 42 F (46.7%)	PLIF	90 (100%)	Iliac Bone Autograft (N = 90, 100%)	0	90 (100%)	0	Local Bone Autograft (N = 90, 100%)	Range: 24-68 months	87 (97%)	3 (3%)	≈38 months	CT: Interbody fusion was evaluated using the BSF Scale defining fusion as a BSF-3 grade, and Posterolateral fusion was evaluated using the Lenke's Classification, defining fusion as a Lenke-A grade.	N/A
Damgaard et al. (2010)²¹	Retrospective cohort study	≈ 44 years/1 M (11.1%) 9 F (88.8%)	IBF ± PLF	3 (33.3%)	Local Bone Autograft (N=3, 33.3%)	0	9 (100%)	0	Local Bone Autograft (N = 9, 100%)	≈ 17 months	7 (77.7%)	2 (22.2%)	N/A	SPECT/CT: N/A.	N/A
Spirig et al. (2019)²²	Prospective study	67.3 ± 10.9 years/19 M (46.3%) 22 F (53.6%)	PLF	N/A	N/A	N/A	42 (100%)	N/A	N/A	3.05 ± 3.12 years	N/A	N/A	≈36 months	Plain Radiography: N/A.	N/A
														CT: Absence of peri-screw osteolysis or radiolucent zones.	N/A
														MRI: Absence of peri-screw edema.	N/A

M, Male; F, Female; IBF, Interbody Fusion; PLF, Posterolateral Fusion; N/A, Not Available; CT, Computed Tomography; SPECT, Single Photon Emission Computed Tomography; rhOP-1, Recombinant Human Osteogenic Protein-1; BSF Scale, Brantigan-Steffee-Fraser Scale; MRI, Magnetic Resonance Imaging.

Imaging Modality

The reviewed studies employed a range of imaging modalities: conventional plain radiographs (N = 7, 53.8%), dynamic flexion-extension radiographs (N = 4, 30.7%), computed tomography (N = 9, 69.2%), and single photon emission computed tomography (SPECT) and bone scintigraphy (N = 2, 15.3% each). Magnetic resonance imaging and ultrasonography were the least frequently employed for fusion assessment (N = 1, 7.6% each). The studies also reflected how the choice of imaging modality for fusion evaluation changed over time. Before 2006, computed tomography was reported in 57.4% (N = 4) of the studies. At that time it was a relatively new technique that was not yet in widespread use and was technically limited; for example, the average thickness of the axial cuts was initially 3-6 millimeters. After 2006, CT became the most used modality for lumbar fusion assessment (N = 6, 100%), while other techniques such as bone scintigraphy showed a downward trend (N = 1, 16.6%).

Radiographic Criteria of Lumbar Fusion

The reviewed studies used different criteria to define fusion, regardless of whether the procedure was IBF and/or PLF. However, they can all be grouped as describing positive signs of fusion and/or the absence of negative findings. Positive signs referred to the presence of an imaging feature such as bony bridging trabeculae between vertebral bodies and/or transverse processes, obliteration of the space between 2 adjacent articulating joints, and other signs of bone maturation. Negative signs were defined as the absence of dynamic measures of instability (such as angular motion or translation between 2 vertebral segments), static parameters of instability, radiolucency, and other parameters. Overall, more than two-thirds of the reviewed articles (N = 10, 76.9%) assessed lumbar fusion using descriptive criteria. Among them, 2 studies (20%) differentiated between interbody fusion and posterolateral fusion, and 1 study (10%) described a classification for each. In the remaining 23% of studies, the fusion criteria employed were not clearly described.

Only 15.3% of the reviewed studies described quantitative instability criteria for failed fusion in patients evaluated with flexion-extension radiographs. Their cutoffs differed, ranging from <3 to <5 degrees of angular motion at a given vertebral level, and <2 millimeters of anteroposterior linear motion. Notably, studies assessing instability parameters considered a “permissive” range of motion of 2-3°, which was present in some patients considered adequately fused on surgical exploration.
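Because the reported cutoffs differ between studies, the same measurement can be classified differently depending on which threshold is applied. The sketch below illustrates this with a hypothetical helper; the function name, its defaults, and the decision to require both limits when a translation value is supplied are assumptions, and the output is not a clinical determination of fusion.

```python
def within_motion_thresholds(angular_motion_deg, translation_mm=None,
                             max_angular_deg=5.0, max_translation_mm=2.0):
    """Return True when measured motion is below the chosen cutoffs.

    Defaults use the most permissive cutoffs reported in the reviewed
    studies (<5 degrees, <2 mm); pass max_angular_deg=3.0 for the stricter
    angular cutoff. Translation is only checked when a value is given.
    """
    if angular_motion_deg < 0 or (translation_mm is not None and translation_mm < 0):
        raise ValueError("motion measurements must be non-negative")
    angular_ok = angular_motion_deg < max_angular_deg
    translation_ok = translation_mm is None or translation_mm < max_translation_mm
    return angular_ok and translation_ok


# Invented measurement: 4 degrees of angular motion, no translation reported.
print(within_motion_thresholds(4.0))                       # below the <5 degree cutoff
print(within_motion_thresholds(4.0, max_angular_deg=3.0))  # above the <3 degree cutoff
```

The same 4° measurement passes one cutoff and fails the other, which is one reason results across studies are hard to compare.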

The classification systems employed were the anterior fusion scale developed by Brantigan, Steffee, and Fraser (BSF) and the 4-grade scale of Lenke for assessing posterior/posterolateral fusion. Both were based on CT images; however, the patients in that same study also had plain radiographs as part of their preoperative screening. These classifications relied on the presence of trabecular bony bridges occupying the intended site of fusion on both sides (Table 2).

Table 2.

Identified Radiographic Criteria for Assessing Lumbar Fusion.

Positive Signs	N Studies (%)
Presence of bony bridging trabeculae
Between vertebral bodies	6 (46.1%)
Between facet joints	5 (38.4%)
Between transverse processes	5 (38.4%)
Anterior sentinel sign	1 (7.8%)
Posterior sentinel sign	1 (7.8%)
Negative Signs
Signs of nonunion
Fragmentation of fusion mass	5 (38.4%)
Hairline pseudoarthrosis	1 (7.8%)
Pseudoarthrosis at the end of fusion	2 (15.3%)
Bone resorption	1 (7.8%)
Increased bone uptake (on Scintigraphy)	4 (30.7%)
Signs of Instability
<3 degrees of angular motion	2 (15.3%)
<5 degrees of angular motion	2 (15.3%)
<2 mm of linear translation	1 (7.8%)
Hardware Failure
Fracture	3 (23%)
Loosening	3 (23%)
Migration	3 (23%)
Radiolucency
Peri-screw radiolucency	2 (15.3%)
Peri-screw edema (on MRI)	1 (7.8%)
Classifications
Brantigan-Steffee-Fraser scale	1 (7.8%)
Lenke’s classification	1 (7.8%)

Diagnostic Accuracy and Reliability of Radiographic Modalities Assessing Fusion

The diagnostic parameters varied substantially among the imaging modalities used in the reviewed studies. Overall, ten of the reviewed studies (76.9%) reported the sensitivity and specificity of their imaging modalities. For plain radiographs, sensitivity ranged from 42% to 100% and specificity from 60% to 89%. CT followed, with a sensitivity range of 53%-100% and a specificity range of 78%-96.7%, and then bone scintigraphy, with a sensitivity range of 50%-83% and a specificity range of 25%-93%. Flexion-extension radiographs exhibited sensitivity ranging from 68.7% to 96% and specificity ranging from 0% to 71.4%, making them the modality with the highest observed variability. The least variability was observed for SPECT, which was employed in 2 studies and had a sensitivity of 50% and a specificity of 58%. Finally, MRI had a sensitivity of 43.9% and a specificity of 92.1%, and ultrasonography had 100% and 60%, respectively. It is important to note that these last 2 imaging modalities were each employed in only 1 study.
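Sensitivity and specificity are measured here against surgical exploration as the reference standard. As a minimal sketch of how these parameters, together with the predictive values, follow from a 2x2 table of imaging result versus surgical finding, consider the example below. The class and function names are hypothetical, which condition counts as "positive" is left to the reader, and the counts in the usage lines are invented for illustration rather than taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of imaging result versus surgical exploration (reference)."""
    tp: int  # imaging positive, surgery positive
    fp: int  # imaging positive, surgery negative
    fn: int  # imaging negative, surgery positive
    tn: int  # imaging negative, surgery negative


def _ratio(numerator, denominator):
    # Return None instead of dividing by zero when a margin is empty.
    return numerator / denominator if denominator else None


def diagnostic_metrics(table):
    """Compute sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    return {
        "sensitivity": _ratio(table.tp, table.tp + table.fn),
        "specificity": _ratio(table.tn, table.tn + table.fp),
        "ppv": _ratio(table.tp, table.tp + table.fp),
        "npv": _ratio(table.tn, table.tn + table.fn),
    }


# Invented counts, for illustration only.
example = TwoByTwo(tp=9, fp=4, fn=1, tn=6)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.3f}")
```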

Accuracy

Of the reviewed studies, 76.9% reported the accuracy of the imaging modalities they employed. For studies that did not formally report this parameter, it was calculated from the reported sensitivity, specificity, positive and negative predictive values, and successful fusion rate. Once again, substantial variability was observed. CT showed the highest variability, with accuracy ranging from 62.5% to 96%, followed by bone scintigraphy (60%-88%), plain radiographs (61.9%-88.9%), and flexion-extension radiographs (54.5%-73.1%), which had the lowest variability. SPECT, ultrasonography, and MRI showed no variability because they were the least employed modalities. The rest of the details are provided in Table 3 and Appendix A.
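The text does not spell out the formula used to derive accuracy when it was not reported. One standard identity expresses accuracy as a prevalence-weighted average of sensitivity and specificity. The sketch below assumes that identity; the function name is hypothetical, `prevalence` means the proportion of explored patients who have whichever condition the test treats as positive, and the input values are invented.

```python
def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy = sensitivity * prevalence + specificity * (1 - prevalence).

    All three arguments are proportions in [0, 1]. This is the standard
    identity, assumed here; it is not confirmed as the authors' method.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Invented values: 80% sensitivity, 90% specificity, 25% prevalence.
print(round(accuracy_from_rates(0.80, 0.90, 0.25), 3))
```

Because accuracy depends on prevalence, two studies with identical sensitivity and specificity can report different accuracies if their fusion rates differ.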

Table 3.

Diagnostic Parameters of the Studies Included in our Systematic Review.

Reference (Author/Year)	Radiographic Fusion Criteria	Radiographic Modality		Predictive Values		Positive Likelihood Ratio	Negative Likelihood Ratio	Correlation with Surgical Exploration	Accuracy	Inter Observer Agreement (Kappa Value /p value)
Reference (Author/Year)	Radiographic Fusion Criteria	Sensitivity	Specificity	Positive Predictive Value	Negative Predictive Value	Positive Likelihood Ratio	Negative Likelihood Ratio	Correlation with Surgical Exploration	Accuracy	Inter Observer Agreement (Kappa Value /p value)
Laasonen et al. (1989)¹⁰	Plain/Flexion-Extension Radiographs: N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Laasonen et al. (1989)¹⁰	CT: N/A	N/A	N/A	N/A	N/A	N/A	N/A	78%	N/A	N/A
Brodsky et al. (1991)¹¹	Plain Radiographs: N/A	89%	60%	76.4% (73.81-78.88)	78.3% (73.39-82.67)	2.22 (1.93-2.56)	0.19 (0.14-0.25)	64%	77% (73.88-79.99)	N/A
	Flexion-Extension Radiographs: N/A	96%	37%	70.5% (66.97-73.90)	86.1% (71.51-93.87)	1.52 (1.29-1.80)	0.10 (0.04-0.25)	N/A	73.1% (66.71-78.93)	N/A
	CT: Evidence of bony continuity of the fusion mass	63%	86%	78.7% (70.77-84.97)	73.2% (67.80-78.15)	4.16 (2.72-6.35)	0.41 (0.31-0.53)	57%	75.4% (69.47-80.73)	N/A
Kant et al. (1995)¹²	Plain Radiographs: Evidence of solid bone from one transverse process to the other transverse process or when oblique views revealed obliteration and fusion in facet joints	54%	76%	49.5% (37.59-61.59)	78.3% (70.54-84.55)	2.19 (1.34-3.57)	0.61 (0.41-0.93)	68%	68.7% (59.84-76.67)	N/A
Larsen et al. (1996)¹³	Plain Radiographs: Evidence of bridging bony trabeculae	42%	89%	83.3% (41.19-97.27)	53.3% (40.19-66.03)	3.75 (0.53-26.77)	0.66 (0.39-1.12)	62%	61.9% (38.44-81.89)	N/A
	Flexion-Extension Radiographs: <3° of motion on flexion-extension.	86%	0%	60% (52.57-66.99)	0%	0.86 (0.63-1.16)	0	N/A	54.5% (23.38-83.25)	N/A
	CT: Evidence of bridging bony trabeculae.	53%	78%	80% (51.89-93.69)	50% (34.44-65.56)	2.40 (0.65-8.90)	0.60 (0.32-1.14)	63%	62.5% (40.59-81.20)	N/A
	Bone Scintigraphy: Lack of increased uptake on bone	83%	25%	62.5% (50.94-72.79)	50% (14.88-85.12)	1.11 (0.69-1.78)	0.67 (0.12-3.81)	60%	60% (36.05-80.88)	N/A
Jacobson et al. (1997)¹⁴	Ultrasonography: Presence of an echogenic and shadowing interface that bridged contiguous vertebral body levels	100%	60%	71.4% (53.92-84.23)	100% (54.07-100.0)	2.50 (1.17-5.34)	0	80%	80% (56.34-94.27)	N/A
Albert et al. (1998)¹⁵	SPECT: Absence of increased bone uptake beyond background signal on coronal, sagittal, or transverse images	50%	58%	41.1% (25.68-58.65)	66.6% (51.74-78.86)	1.20 (0.59-2.43)	0.86 (0.46-1.60)	N/A	55.2% (38.30-71.38)	K=0.08 (p = 0.6)
Bohnsack et al. (1999)¹⁶	Bone Scintigraphy: N/A	50%	93%	38.5% (12.67-73.01)	94.9% (87.44-98.03)	6.33 (1.47-27.35)	0.54 (0.20-1.45)	N/A	88.3% (74.64-96.15)	N/A
Kanayama et al. (2006)¹⁷	Plain / Flexion-Extension Radiographs: Evidence of bridging bony trabeculae/<5° of angular motion on flexion-extension or <2 mm of translation. CT: Evidence of bridging bony trabeculae	68.70%	71.40%	84.6% (61.96-94.89)	50% (29.64-70.36)	2.41 (0.71-8.13)	0.44 (0.18-1.04)	68%	69.5% (47.08-86.79)	N/A
Carreon et al. (2007)¹⁸	Fine-Cut CT: A facet fusion was defined as obliteration of the joint space between the superior and inferior articulating surfaces. Gutter fusion was defined as continuous trabeculated bone connecting the transverse processes.	N/A	N/A	N/A	N/A	5.19	2.9	84%	96%	K = 0.42-0.62
Carreon et al. (2008)¹⁹	Fine-Cut CT: Presence of trabecular bony bridging termed as a ‘‘sentinel sign’’ and a ‘‘posterior sentinel sign’’	97%	85%	N/A	N/A	N/A	N/A	N/A	N/A	K = 0.25/p < 0.0001
Fogel et al. (2008)²⁰	Plain Radiographs: Interbody fusion was evaluated using the BSF Scale defining fusion as a BSF-3 grade, and Posterolateral fusion was evaluated using the Lenke's Classification, defining fusion as a Lenke-A grade	100%	89%	17.3% (12.11-24.33)	100% (97.55-100.00)	8.84 (5.79-13.50)	0	88%	88.9% (83.29-93.22)	X² = 0.5595 (p = 0.455) Fisher's Exact Test = 0.046 and McNemar's Test = 5.400 (p = 0.02)
Fogel et al. (2008)²⁰	CT: Interbody fusion was evaluated using the BSF Scale defining fusion as a BSF-3 grade, and Posterolateral fusion was evaluated using the Lenke's Classification, defining fusion as a Lenke-A grade	100%	86%	21% (14.30-29.88)	100% (95.98-100.00)	7.00 (4.38-11.18)	0	N/A	86.2% (78.32-92.09)	N/A
Damgaard et al. (2010)²¹	SPECT/CT: N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Spirig et al. (2019)²²	Plain Radiography: N/A	54.2%	83.5%	N/A	N/A	N/A	N/A	N/A	N/A	N/A
	CT: Absence of peri-screw osteolysis or radiolucent zones.	64.8%	96.7%	N/A	N/A	N/A	N/A	N/A	N/A	N/A
	MRI: Absence of peri-screw edema	43.9%	92.1%	N/A	N/A	N/A	N/A	N/A	N/A	N/A

CT, Computed Tomography; SPECT, Single Photon Emission Computed Tomography; BSF Scale, Brantigan-Steffee-Fraser Scale; MRI, Magnetic Resonance Imaging; X², Chi-square Value; N/A, Not Available.
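Table 3 lists positive and negative likelihood ratios, but the source does not state how they were computed. Under the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below assumes those definitions; the function name is hypothetical, and a ratio is returned as `None` when its denominator is zero.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity as proportions.

    A ratio is None when it is undefined: LR+ when specificity is 1,
    LR- when specificity is 0.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity) if specificity < 1.0 else None
    lr_negative = (1.0 - sensitivity) / specificity if specificity > 0.0 else None
    return lr_positive, lr_negative


# Using a sensitivity of 89% and a specificity of 60% as inputs.
lr_pos, lr_neg = likelihood_ratios(0.89, 0.60)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Values computed this way from rounded percentages may differ slightly from published ratios that were derived from raw counts.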

Interobserver Agreement

Only 4 studies (30.7%) assessed agreement between observers for their qualitative fusion criteria, and one of them did so for the classifications employed. Most studies described this parameter using Cohen’s kappa values, with the sole exception of the study by Fogel G, et al., which assessed interobserver agreement for X-ray interpretation, comparing interbody and posterolateral fusion, using the chi-square test as well as Fisher's and McNemar's tests. Notably, none of the statistics showed a significant difference. The rest of the details are provided in Table 3 and Appendix A.
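Cohen's kappa corrects the raw agreement between two observers for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration for two raters and categorical labels; the function name and the example readings are invented and do not come from any reviewed study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented readings of four cases by two observers.
reader_1 = ["fused", "fused", "nonunion", "nonunion"]
reader_2 = ["fused", "nonunion", "nonunion", "nonunion"]
print(cohen_kappa(reader_1, reader_2))
```

In this invented example the observers agree on 3 of 4 cases (75%), yet kappa is only 0.5 because half of that agreement would be expected by chance.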

Discussion

The current study provides a comprehensive review of the literature on the radiographic assessment of patients undergoing interbody and posterolateral lumbar fusion procedures. Despite great technological advances in neuroimaging techniques, this review has shown that there are still no universally accepted radiological criteria for assessing successful lumbar arthrodesis, even though such an assessment is paramount in the management of patients undergoing lumbar fusion surgery. As previously stated by Choudry et al., surgical exploration to identify a solid arthrodesis under direct observation remains a hypothetical gold standard; however, in most cases and medical settings it is impractical.^19,23

A previous systematic review by Duits A, et al. outlined the substantial variation in the fusion criteria and/or classification systems employed throughout the literature, which makes comparing fusion outcomes between studies difficult, if not almost impossible. An example is the studies by Kant A, et al. and Larsen J, et al. Both assessed the presence of bridging bone on plain radiographs as a lumbar fusion criterion, yet they reported successful fusion rates of 69% and 64%, respectively. Later, Fogel G, et al. obtained a fusion rate of 97%, also using descriptive criteria on plain radiography.^12,13,20,24

As we showed, descriptive criteria were the most frequently employed methods for assessing lumbar fusion. Among these, the presence of bony bridging trabeculae was the most common criterion, assessed primarily on plain radiographs and/or computed tomography (CT). Plain radiography remains widespread because of its relatively low cost, availability, and long history as a method of assessing fusion in spine surgical patients. However, plain radiographs are only two-dimensional projections, whereas pseudarthrosis must be considered a three-dimensional and dynamic entity. In addition, because many posterior fusion procedures are performed together with laminectomy and/or variable degrees of other osteotomies, the subsequent movement of the posterior elements cannot always be properly assessed. To solve this problem, flexion-extension radiographs are often advocated to detect motion within the operated segment as a sign of pseudarthrosis. In this context, Shen F, et al. reported that there are instances where signs of pseudarthrosis may not be evident on plain radiographs, CT, or MRI, whereas flexion-extension radiography can reveal both angular and linear translational motion between operated lumbar segments. Nevertheless, these findings should be interpreted carefully, because the simple absence of motion is not necessarily indicative of a successful arthrodesis and, conversely, the presence of motion is not a pathognomonic sign of pseudarthrosis. According to Larsen J, et al., in the absence of lumbar instrumentation, up to 2°-3° of motion on flexion-extension radiographs is thought to be normal and is attributed to “springiness” within the bone graft, while in other patients this permissive motion may have been prevented by transpedicular instrumentation. Because of this, they concluded that flexion-extension radiographs had poor or no value in assessing the functional integrity of lumbar fusion in the presence of a transpedicular instrumentation construct.^13,23-27

The observed correlation between plain/dynamic radiography and surgical exploration was also heterogeneous. Studies employing these imaging techniques reported different sensitivity and specificity values, and only one study provided information regarding their accuracy and interobserver reliability; however, none of these results was statistically significant.

Regarding the current role of computed tomography (CT) as a reliable method for assessing lumbar fusion, it is important to highlight the differences in sensitivity and specificity between the reviewed studies employing CT scans. Laasonen E, et al. and Brodsky et al. were the first to describe a correlation between CT findings and surgical exploration in patients with suspected pseudarthrosis. In both studies, the correlation of CT with open surgical direct observation ranged from 57% to 78%. Later, Carreon L, et al. reported a higher correlation for CT employing fine cuts, as well as a sensitivity of 97% and a specificity of 85%. This difference is explained by the fact that the early studies used axial sequences alone and thicker axial slices (i.e., 6 mm), whereas the later study employed fine axial cuts and multiplanar reconstructions.^10,11,23

Despite this, CT is currently one of the preferred methods for assessing both interbody and posterolateral fusion. It is a fast and cost-effective technique that provides high-quality, detailed images of bone tissue, which allows CT to identify subsidence and lucency around fusion hardware as possible signs of pseudarthrosis. On the other hand, CT is not free of disadvantages: it is well known that metallic hardware, such as interbody cages and lumbar instrumentation, may hinder evaluation because of metallic artifacts. To address this problem, Rothman S, et al. introduced the concept of curved coronal reconstructions from CT scans, obtaining better visualization of the lumbar spine. Later, Lang P, et al. described the use of multiplanar CT reconstructions in 30 patients with clinically suspected pseudarthrosis after spinal fusion. They found that sagittal, axial, and curved coronal two-dimensional reconstructions were more accurate in detecting bony nonunion than traditional axial CT images, and they advocated the use of multiplanar CT scans as an adjunctive imaging method in the evaluation of posterior lumbar fusion patients. Finally, Williams A, et al. developed and proposed, in conjunction with a group of spine surgeons, a CT scan protocol to periodically monitor the progress of interbody fusions. This protocol suggests performing CT scans at 3, 6, 12, and 24 months after a fusion procedure, or until a solid arthrodesis is ascertained.^25,28-32

Turning to studies employing Single Photon Emission Computed Tomography (SPECT) and bone scintigraphy, both modalities use technetium-99m-labeled (^99mTc) phosphonates as their radionuclide tracer. These ^99mTc-labeled phosphonates are adsorbed both onto and into the crystalline structure of hydroxyapatite, making them a convenient way to identify bone-remodeling zones. In our review, Larsen J, et al. described the use of bone scintigraphy in 25 patients and compared its accuracy with that of other imaging modalities such as plain radiography, flexion-extension radiography, and CT scans. In that series, the accuracy of bone scintigraphy was around 60%, with a sensitivity of 83% and a specificity of 25%. Later, Bohnsack M, et al. conducted a retrospective study of 42 patients with a prior history of lumbar fusion with transpedicular instrumentation who were candidates for revision surgery. They performed bone scintigraphy before surgical revision and reported an accuracy of 88%, with a very low sensitivity (50%) and a high specificity (93%). These findings are similar to those reported by Albert T, et al., who used SPECT to evaluate a series of 38 patients, obtaining a very low accuracy (55.2%), as well as a low sensitivity (50%) and specificity (58%). These findings suggest that these imaging modalities are not reliable enough to assess the functional status of lumbar arthrodesis.^13,15,16
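As a worked illustration of how the sensitivity, specificity, and accuracy figures quoted above relate to one another, the following minimal Python sketch computes them from a 2x2 table in which surgical exploration serves as the reference standard. The counts are hypothetical and are not taken from any of the reviewed studies; the names `DiagnosticMetrics` and `diagnostic_metrics`, and the choice of pseudarthrosis as the "positive" finding, are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    accuracy: float


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compare an imaging test with surgical exploration (reference standard).

    Assumption for this sketch: "positive" means pseudarthrosis.
      tp: imaging and surgery both show pseudarthrosis
      fp: imaging shows pseudarthrosis, surgery shows solid fusion
      fn: imaging shows solid fusion, surgery shows pseudarthrosis
      tn: imaging and surgery both show solid fusion
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pseudarthrosis and one fused case")
    total = tp + fp + fn + tn
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),    # detected nonunions / all nonunions
        specificity=tn / (tn + fp),    # confirmed fusions / all solid fusions
        accuracy=(tp + tn) / total,    # correct calls / all cases
    )


# Hypothetical counts, NOT data from any reviewed study.
metrics = diagnostic_metrics(tp=4, fp=2, fn=4, tn=30)
print(
    f"sensitivity={metrics.sensitivity:.2f} "
    f"specificity={metrics.specificity:.2f} "
    f"accuracy={metrics.accuracy:.2f}"
)
```

With these hypothetical counts the test misses half of the nonunions (sensitivity 0.50) yet still reaches an accuracy of 0.85, because most of the explored segments are solidly fused. This is one reason accuracy alone can give a misleading impression of a modality's ability to detect pseudarthrosis.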

Regarding other imaging techniques, our database search identified only one paper using MRI to assess non-union in lumbar fusion patients. In the study by Spirig J, et al., MRI showed a very low sensitivity (43.9%) and a very high specificity (92.1%), adding little information to the standard protocols. According to Kröner A, et al., MRI may demonstrate bridging bony trabeculae through the interbody cages on coronal plane images, as well as changes within the vertebral body’s bone marrow as a sign of functional instability related to failed fusion. However, metallic hardware also produces artifacts that may make reliable evaluation difficult. In this regard, Stradiotti P, et al. advocated the use of fast spin-echo sequences, as well as Short Tau Inversion Recovery (STIR) sequences, as options to minimize metallic-artifact interference in patients with prior lumbar spine constructs.^22,33,34

Regarding ultrasonography as an option to assess lumbar fusion status, our review found no other studies involving this modality. In the study by Jacobson J, et al., the results were obtained from a small group of patients, which makes it difficult to extrapolate its reliability to other settings. In our opinion, there is not enough evidence to recommend the use of ultrasonography for lumbar fusion assessment.^14

Finally, regarding the studies describing interobserver agreement, Carreon L, et al. reported a retrospective study of 93 patients with a prior history of instrumented posterolateral fusion who were evaluated with CT scans prior to surgical revision. Three spine surgeons evaluated the CT scans, and their opinions regarding the fusion status of each patient were compared. The authors found that interobserver agreement was higher for the assessment of posterolateral fusion (k = 0.62) than for facet fusion status (k = 0.42); however, neither result was statistically significant. In addition, the interobserver agreement results reported by other authors are applicable only to their own clinical settings.^19,23
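For readers less familiar with the kappa (k) statistic quoted above, the sketch below computes Cohen's kappa for two raters who classify the same set of scans as fused or not fused. It is an illustration only: the ratings are hypothetical, the function name `cohen_kappa` is an assumption, and a study with three raters, such as the one described above, may have used a pairwise or multi-rater variant that is not specified here.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of scans")
    n = len(rater_a)

    # Observed agreement: fraction of scans given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal
    # proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of 10 scans, NOT data from any reviewed study.
surgeon_1 = ["fused"] * 6 + ["nonunion"] * 4
surgeon_2 = ["fused"] * 5 + ["nonunion", "fused"] + ["nonunion"] * 3
kappa = cohen_kappa(surgeon_1, surgeon_2)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical example the two raters agree on 8 of 10 scans, but because agreement of 0.52 would be expected by chance alone, kappa is about 0.58, noticeably lower than the raw 80% agreement.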

Study Limitations

It is necessary to recognize certain limitations of the current study. Firstly, the results are difficult to extrapolate because of the absence of studies describing a multicenter population. Consequently, the observed results may not represent the entire population of patients undergoing lumbar fusion assessment. Another significant limitation is the lack of studies describing features such as interobserver agreement, which indirectly evaluates the accuracy of a diagnostic test in a specific population. Additionally, many of the reviewed studies had relatively small populations, although it is important to recognize that the evaluated patients belong to a specific subgroup suspected of having a failed fusion. This supports the use of revision surgery as the current benchmark. Furthermore, the use of different types of bone graft and/or hardware may increase the risk of bias when evaluating patients, making it challenging to establish more general criteria.

Conclusions

Lumbar interbody fusion surgery remains the mainstay of treatment for a wide range of lumbar spine diseases. Despite major advances in surgical approaches, hardware materials, and the development of bone graft substitutes offering better short-term surgical outcomes, there is still a lack of agreement between imaging studies assessing the status of fusion in this group of patients. In addition, the current literature on lumbar fusion assessment shows an important lack of consensus on what should be considered a “successful fusion”, and clinical settings employ different criteria. Multicenter studies evaluating the reliability of the different criteria would be desirable in order to propose more standardized criteria.

Supplemental Material

Supplemental Material - Radiological Assessment of Lumbar Fusion Status: Which Imaging Modality is Best Assessing Non-union in Lumbar Spine Pseudarthrosis?

Supplemental Material for Radiological Assessment of Lumbar Fusion Status: Which Imaging Modality is Best Assessing Non-union in Lumbar Spine Pseudarthrosis? by Enrique González-Gallardo, Justyna O. Ekert, Harshvardhan G. Iyer, Juan Pablo Navarro-García de Llano, Jesús E. Sánchez-Garavito, Michaelides Loizos, Jorge Ríos-Zermeño, Ian A. Buchanan, Kingsley O. Abode-Iyamah, Eric W. Nottmeier, Selby G. Chen, Stephen M. Pirris, Oluwaseun O. Akinduro, Alfredo Quiñones-Hinojosa, and Rodrigo Navarro-Ramírez in Global Spine Journal

Footnotes

Acknowledgment

The author gratefully acknowledges the support of the Mission:Brain Foundation and the Juan Beckmann Foundation for their generous scholarship, which made this work possible.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

1. Mobbs, Phan, Malham, Seex, Rao. Lumbar interbody fusion: techniques, indications and comparison of interbody fusion options including PLIF, TLIF, MI-TLIF, OLIF/ATP, LLIF and ALIF. J Spine Surg. 2015;1(1):2-18.

2. Gruskay, Webb, Grauer. Methods of evaluating lumbar and cervical fusion. Spine J. 2014;14(3):531-539.

3. Saleem, Aslam, Rehmani MAK, Raees, Alvi, Ashraf. Lumbar disc degenerative disease: disc degeneration symptoms and magnetic resonance image findings. Asian Spine J. 2013;7(4):322-334.

4. Albee. Transplantation of a portion of the tibia into the spine for Pott’s disease: a preliminary report 1911. Clin Orthop Relat Res. 2007;460:14-16.

5. Hibbs. An operation for progressive spinal deformities: a preliminary report of three cases from the service of the orthopaedic hospital. 1911. Clin Orthop Relat Res. 2007;460:17-20.

6. Raizman, O’Brien, Poehling-Monaghan. Pseudarthrosis of the spine. J Am Acad Orthop Surg. 2009;17(8):494-503.

7. Chun, Baker, Hsu. Lumbar pseudarthrosis: a review of current diagnosis and treatment. Neurosurg Focus. 2015;39(4):E10.

8. Sugiyama, Wullschleger, Wilson, Williams, Goss. Reliability of clinical measurement for assessing spinal fusion: an experimental sheep study. Spine (Phila Pa 1976) [Internet]. 2012;37(9):763-768.

9. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ [Internet]. 2021;372:n71.

10. Laasonen, Soini. Low-back pain after lumbar fusion. Surgical and computed tomographic analysis. Spine (Phila Pa 1976) [Internet]. 1989;14(2):210-213.

11. Brodsky, Kovalsky, Khalil. Correlation of radiologic assessment of lumbar spine fusions with surgical exploration. Spine (Phila Pa 1976) [Internet]. 1991;16(6 Suppl):S261-S265.

12. Kant, Daum, Dean, Uchida. Evaluation of lumbar spine fusion. Plain radiographs versus direct surgical exploration and observation. Spine (Phila Pa 1976) [Internet]. 1995;20(21):2313-2317.

13. Larsen, Rimoldi, Capen, Nelson, Nagelberg, Thomas. Assessment of pseudarthrosis in pedicle screw fusion: a prospective study comparing plain radiographs, flexion/extension radiographs, CT scanning, and bone scintigraphy with operative findings. J Spinal Disord. 1996;9(2):117-120.

14. Jacobson, Starok, Pathria, Garfin. Pseudarthrosis: US evaluation after posterolateral spinal fusion: work in progress. Radiology. 1997;204(3):853-858.

15. Albert, Pinto, Smith, Balderston, Cotler, Park. Accuracy of SPECT scanning in diagnosing pseudoarthrosis: a prospective study. J Spinal Disord. 1998;11(3):197-199.

16. Bohnsack, Gossé, Rühmann, Wenger. The value of scintigraphy in the diagnosis of pseudarthrosis after spinal fusion surgery. J Spinal Disord. 1999;12(6):482-484.

17. Kanayama, Hashimoto, Shigenobu, Yamane, Bauer, Togawa. A prospective randomized study of posterolateral lumbar fusion using osteogenic protein-1 (OP-1) versus local autograft with ceramic bone substitute: emphasis of surgical exploration and histologic assessment. Spine (Phila Pa 1976) [Internet]. 2006;31(10):1067-1074.

18. Carreon, Djurasovic, Glassman, Sailer. Diagnostic accuracy and reliability of fine-cut CT scans with reconstructions to determine the status of an instrumented posterolateral fusion with surgical exploration as reference standard. Spine (Phila Pa 1976) [Internet]. 2007;32(8):892-895.

19. Carreon, Glassman, Schwender, Subach, Gornet, Ohno. Reliability and accuracy of fine-cut computed tomography scans to determine the status of anterior interbody fusions with metallic cages. Spine J. 2008;8(6):998-1002.

20. Fogel, Toohey, Neidre, Brantigan. Fusion assessment of posterior lumbar interbody fusion using radiolucent cages: X-ray films and helical computed tomography scans compared with surgical exploration of fusion. Spine J. 2008;8(4):570-577.

21. Damgaard, Nimb, Madsen. The role of bone SPECT/CT in the evaluation of lumbar spinal fusion with metallic fixation devices. Clin Nucl Med. 2010;35(4):234-236.

22. Spirig, Sutter, Götschi, Farshad-Amacker, Farshad. Value of standard radiographs, computed tomography, and magnetic resonance imaging of the lumbar spine in detection of intraoperatively confirmed pedicle screw loosening-a prospective clinical trial. Spine J. 2019;19(3):461-468.

23. Choudhri, Mummaneni, Dhall, et al. Guideline update for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 4: radiographic assessment of fusion status. J Neurosurg Spine. 2014;21(1):23-30.

24. Duits AAA, van Urk, Lehr, et al. Radiologic assessment of interbody fusion: a systematic review on the use, reliability, and accuracy of current fusion criteria. JBJS Rev. 2024;12(1):1-9.

25. Williams, Gornet, Burkus. CT evaluation of lumbar interbody fusion: current concepts. AJNR Am J Neuroradiol. 2005;26(8):2057-2066.

26. Schuler, Subach, Branch, Foley, Burkus, Lumbar Spine Study Group. Segmental lumbar lordosis: manual versus computer-assisted measurement using seven different techniques. J Spinal Disord Tech. 2004;17(5):372-379.

27. Shen, Samartzis. Assessment of lumbar fusion: importance of dynamic plain standing x-rays. J Am Coll Surg. 2008;207(6):955-956.

28. Peters MJM, Bastiaenen CHG, Brans, Weijers, Willems. The diagnostic accuracy of imaging modalities to detect pseudarthrosis after spinal fusion-a systematic review and meta-analysis of the literature. Skelet Radiol. 2019;48(10):1499-1510.

29. Shah, Mohammed, Saifuddin, Taylor. Comparison of plain radiographs with CT scan to evaluate interbody fusion following the use of titanium interbody cages and transpedicular instrumentation. Eur Spine J. 2003;12(4):378-385.

30. Cook, Patron, Christakis, Bailey, Banta, Glazer. Comparison of methods for determining the presence and extent of anterior lumbar interbody fusion. Spine (Phila Pa 1976) [Internet]. 2004;29(10):1118-1123.

31. Lang, Genant, Chafetz, Steiger, Morris. Three-dimensional computed tomography and multiplanar reformations in the assessment of pseudarthrosis in posterior lumbar fusion patients. Spine (Phila Pa 1976) [Internet]. 1988;13(1):69-75.

32. Rothman, Dobben, Rhodes, Glenn, Azzawi. Computed tomography of the spine: curved coronal reformations from serial images. Radiology. 1984;150(1):185-190.

33. Kröner, Eyb, Lange, Lomoschitz, Mahdi, Engel. Magnetic resonance imaging evaluation of posterior lumbar interbody fusion. Spine (Phila Pa 1976) [Internet]. 2006;31(12):1365-1371.

34. Stradiotti, Curti, Castellazzi, Zerbi. Metal-related artifacts in instrumented spine. Techniques for reducing artifacts in CT and MRI: state of the art. Eur Spine J. 2009;18(Suppl 1):102-108.
