Abstract
Study Design
Systematic Review.
Objective
To assess the current literature regarding the accuracy of the different imaging modalities and criteria used to assess lumbar fusion, and their correlation with direct surgical observation as the current gold standard.
Methods
Following PRISMA guidelines, we conducted a comprehensive search of PubMed, Embase, Google Scholar, and Cochrane Library. Studies were included if they focused on patients with a prior history of lumbar interbody fusion and at least 1 year of radiographic follow-up. The review assessed the sensitivity, specificity, and accuracy of different imaging techniques, and their correlation with surgical findings.
Results
Thirteen studies (1989-2019) including 715 patients were reviewed. Common imaging modalities included plain radiographs (53.8%), computed tomography (CT) (69.2%), and dynamic radiographs (30.7%). CT emerged as the most utilized modality after 2006. There was substantial variability in diagnostic accuracy, with CT showing high variability in sensitivity and specificity. Descriptive criteria for fusion were widely used, but interobserver agreement was generally low.
Conclusion
The review highlights a lack of standardized criteria for assessing lumbar fusion. Despite advancements in imaging techniques, the variability in diagnostic parameters suggests a need for consensus and multicenter studies to establish reliable, universal criteria for evaluating fusion success.
Introduction
Lumbar Disc Degenerative Disease (LDDD) is the leading cause of low back pain worldwide, and the most common cause of disability in the aging population. LDDD and facet joint disease may result in mechanical back pain, radicular symptoms, increased morbidity and poor quality of life. In this regard, Lumbar Interbody Fusion (LIF) surgery is the mainstay of treatment for a wide range of lumbar spine diseases, including degenerative, traumatic, infectious and tumoral pathologies. The concept of “Spinal Fusion” was first introduced by Albee and Hibbs as a successful surgical treatment for Pott’s disease.1-5
Lumbar Interbody Fusion procedures require the placement of an implant such as a cage, spacer, or structural bone graft within the intervertebral space after a discectomy and proper preparation of the adjacent endplates. Most of the time, posterior instrumentation and/or different bone graft options are advocated to improve the fusion rate. Despite this, pseudarthrosis remains a common complication following LIF procedures.1,2
The term “pseudarthrosis” (Greek pseudo = false, and arthrosis = joint) describes a failure of spinal fusion, defined as the lack of osseous bridging more than 1 year after surgery. The incidence of pseudarthrosis after LIF surgery may range from 5% to 35%, with a higher incidence in patients requiring fusion of 3 or more spinal levels. Patients who develop pseudarthrosis may complain of axial or radicular pain as their initial symptoms, while claudication and/or myelopathic symptoms may present later. However, in up to 50% of patients pseudarthrosis may be completely asymptomatic despite radiographic findings of nonunion. Even in patients with symptoms suggestive of pseudarthrosis, findings on physical examination are often nonspecific and require further radiographic confirmation.2,6,7
To address this problem, routine follow-up of patients undergoing LIF often includes radiographic studies along with physical examination at set intervals after surgery. Over the past several decades, many imaging techniques have been proposed to evaluate fusion outcomes, including plain static or dynamic radiographs, computed tomography (CT), magnetic resonance imaging (MRI), bone scintigraphy, and radiostereometric analysis (RSA). However, there is currently no universal consensus on imaging criteria for assessing fusion.7,8
Given the lack of consensus regarding radiographic techniques assessing fusion rate in patients undergoing LIF procedures, we performed a systematic review of the current literature to evaluate this topic.
Methods
This review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.9
Search Strategy
A systematic search of electronic databases of PubMed, Embase, Google Scholar, and Cochrane Library was conducted from their inception to August 2nd, 2024, to identify all publications in the English language. The search strategy comprised a combination of the keywords “pseudarthrosis” OR “pseudoarthrosis” AND “lumbar spine” OR “lumbar spine fusion assessment” OR “arthrodesis” OR “fusion” OR “lumbar arthrodesis” OR “lumbar spine fusion outcome” AND “plain radiograph” OR “dynamic radiograph” OR “flexion-extension radiograph” OR “flexion extension radiograph” OR “computed tomography” OR “CT” OR “magnetic resonance imaging” OR “MRI”.
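The Boolean structure of this search can be made explicit with a short sketch. The grouping of the keywords into three concept blocks (condition, procedure, imaging) joined by AND is our reading of the keyword list above, not something the search strategy states, and the helper names `or_block` and `build_query` are hypothetical. Database-specific field tags are omitted.

```python
# Sketch: assemble the keyword list into one Boolean search string.
# ASSUMPTION: terms inside a concept block are OR-ed and the three blocks are AND-ed.

CONDITION = ["pseudarthrosis", "pseudoarthrosis"]
PROCEDURE = [
    "lumbar spine", "lumbar spine fusion assessment", "arthrodesis", "fusion",
    "lumbar arthrodesis", "lumbar spine fusion outcome",
]
IMAGING = [
    "plain radiograph", "dynamic radiograph", "flexion-extension radiograph",
    "flexion extension radiograph", "computed tomography", "CT",
    "magnetic resonance imaging", "MRI",
]


def or_block(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*blocks):
    """Join the parenthesized OR-blocks with AND."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(CONDITION, PROCEDURE, IMAGING)
print(query)
```

Writing the parentheses out removes the ambiguity of mixing AND and OR in a flat list, where different search engines may apply different precedence.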
Eligibility Criteria
Eligible articles and their reference lists were reviewed for additional relevant articles, which were added to the analysis. Inclusion criteria comprised patients with a minimum age of 18 years who had a previous history of a lumbar interbody fusion and/or lumbar posterolateral fusion procedure and a minimum of 1 year of postoperative radiographic follow-up. Regarding radiographic modalities, further inclusion criteria encompassed reporting of the sensitivity and specificity of the imaging modality. Additionally, studies that correlated radiographic findings with surgical exploration were considered, as surgical exploration is currently the gold standard for lumbar fusion assessment.
Study Selection
All available results were exported to the reference manager Mendeley (Elsevier; Amsterdam, the Netherlands) to exclude duplicates. Two authors (E.G.G. and R.N.R.) independently reviewed the titles and abstracts of the remaining articles. All articles with no relevance to radiological nonunion assessment in lumbar spine pseudarthrosis were excluded.
Risk of Bias Assessment
Two reviewers (J.O.E. and H.G.I.) independently assessed the risk of bias for the included studies. Depending on the outcomes reported in each study, we employed the Quality Appraisal of Reliability Studies (QAREL) and the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) checklists. The risk of bias was considered high for studies assessing reliability if fewer than 60% of the signaling questions for each domain of the QAREL checklist were answered “yes”, and for studies evaluating accuracy if more than 2 of the QUADAS-2 signaling questions for each domain were answered with either “no” or “unclear”.
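The two thresholds described above can be written as simple decision rules. The following is a minimal sketch, assuming each answer is recorded as one of the strings "yes", "no", or "unclear"; the function names and the example answers are hypothetical and do not come from the checklists or from any reviewed study.

```python
def qarel_high_risk(answers):
    """QAREL rule from the text: high risk if fewer than 60% of answers are 'yes'."""
    yes_count = sum(1 for answer in answers if answer == "yes")
    # Integer comparison avoids floating-point issues exactly at the 60% boundary.
    return yes_count * 100 < 60 * len(answers)


def quadas2_high_risk(answers):
    """QUADAS-2 rule from the text: high risk if more than 2 answers are 'no' or 'unclear'."""
    flagged = sum(1 for answer in answers if answer in ("no", "unclear"))
    return flagged > 2


# Hypothetical answer sheets, for illustration only.
reliability_answers = ["yes", "yes", "no", "unclear", "no"]
accuracy_answers = ["yes", "no", "unclear", "yes"]

print(qarel_high_risk(reliability_answers))   # 2 of 5 'yes' answers is below 60%
print(quadas2_high_risk(accuracy_answers))    # 2 flagged answers is not more than 2
```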
Data Extraction
The following information was extracted and summarized in tables: (1) authors and date of publication, (2) number of treated patients, (3) age and sex, (4) information regarding previous fusion procedures, where available, (5) follow-up time, (6) time to radiographic presentation of pseudarthrosis, (7) radiographic modality employed, (8) radiographic fusion and/or nonunion criteria, (9) surgical outcome describing the rate of successful fusion cases and nonunion cases, and (10) imaging correlation with surgical findings, where available. For studies describing diagnostic parameters of the imaging modality employed, information regarding sensitivity, specificity, and predictive values was extracted. Summarized data are presented descriptively.
Data Analysis
The initial search resulted in 865 articles (186 from PubMed, 475 from Google Scholar, 85 from Embase, and 119 from Cochrane Library). After duplicates were removed, 575 articles were selected for the first screening. The initial screening excluded 505 articles because of unavailable full text, animal subjects, or content outside the aim of the study. Seventy full-text articles were then screened for eligibility, excluding 5 additional articles that did not meet the inclusion criteria. Finally, 13 articles were included in our final analysis. Details are provided in Figure 1.
Figure 1. Flow chart of the systematic review according to PRISMA guidelines.
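The record counts reported for the identification and first screening steps can be cross-checked with a few lines of arithmetic. This is only a sketch; the variable names are ours, and the number of duplicates removed is derived here rather than reported in the text.

```python
# Records identified per database, as reported in the text.
records_per_database = {
    "PubMed": 186,
    "Google Scholar": 475,
    "Embase": 85,
    "Cochrane Library": 119,
}

identified = sum(records_per_database.values())
after_duplicates = 575          # reported
excluded_first_screening = 505  # reported

# Derived values (not stated explicitly in the text).
duplicates_removed = identified - after_duplicates
full_text_screened = after_duplicates - excluded_first_screening

print(identified, duplicates_removed, full_text_screened)
```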
Results
Systematic Literature Review and Population Demographics
Thirteen studies published between 1989 and 2019 were reviewed. Most studies were from North America (61.5%), followed by Europe (30.7%) and Asia (7.6%). The most frequent study design was retrospective (53.8%), whereas 6 studies (46.1%) were prospective cohort studies, including 1 (7.6%) randomized controlled study. We identified reports of 715 patients with mean ages ranging from 42 to 78 years. Sex was reported for 615 patients: 309 (43.2%) were male and 306 (42.7%) were female; for the remaining 13.9% it was not listed.
Surgical Approaches
List of Studies Included in our Systematic Review.
Imaging Modality
The reviewed studies employed different imaging modalities: conventional plain radiographs (N = 7, 53.8%), dynamic flexion-extension radiographs (N = 4, 30.7%), computed tomography (N = 9, 69.2%), and single photon emission computed tomography (SPECT) and bone scintigraphy (N = 2, 15.3% each), while magnetic resonance imaging and ultrasonography were the least frequently employed for fusion assessment (N = 1, 7.6% each). The studies also reflected how the imaging modalities employed for fusion evaluation changed over time. Before 2006, computed tomography was reported in 57.4% (N = 4) of the studies, reflecting its status as a relatively new imaging technique that was not yet in widespread use and that had technical limitations; for example, the average thickness of the axial cuts was initially 3-6 millimeters. After 2006, CT became the most used modality for lumbar fusion assessment (N = 6, 100%), while other techniques such as bone scintigraphy showed a downward trend (N = 1, 16.6%).
Radiographic Criteria of Lumbar Fusion
The reviewed studies used different criteria to define fusion, regardless of whether the procedure was interbody fusion (IBF) and/or posterolateral fusion (PLF). However, they can all be grouped as studies describing positive signs of fusion and/or the absence of negative findings. Positive signs referred to the presence of an imaging feature such as bony bridging trabeculae between vertebral bodies and/or transverse processes, space obliteration between 2 adjacent articulating joints, and other signs of bone maturation. Negative signs were defined as the absence of dynamic measures of instability such as angular motion or signs of translation between 2 vertebral segments, static parameters of instability, presence of radiolucency, and other parameters. Overall, more than two-thirds of the reviewed articles (N = 10, 76.9%) assessed lumbar fusion employing descriptive criteria; among them, 2 studies (20%) differentiated between interbody fusion and posterolateral fusion, and 1 study (10%) described a classification for each one. In the remaining 23% of studies, the fusion criteria employed were not clearly described.
Only 15.3% of the reviewed studies described quantitative instability criteria for failed fusion in patients evaluated with flexion-extension radiographs. However, their cutoffs differed, ranging from <3 to <5 degrees of angular motion at a given vertebral level, and <2 millimeters of anteroposterior linear motion. Notably, studies assessing instability parameters considered a “permissive” range of motion of 2°-3°, which was present in some of the patients considered adequately fused on surgical exploration.
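The practical consequence of differing cutoffs can be illustrated with a small sketch: the same measurement is classified differently depending on which angular threshold is applied. The class and function names are hypothetical, the measurement is invented for illustration, and treating motion at or above a cutoff as suggestive of nonunion is our assumption about how such a rule would be applied.

```python
from dataclasses import dataclass


@dataclass
class MotionCutoffs:
    """Upper limits below which a segment is considered stable."""
    max_angular_deg: float     # e.g. 3 or 5 degrees, depending on the study
    max_translation_mm: float  # e.g. 2 millimeters


def motion_suggests_nonunion(angular_deg, translation_mm, cutoffs):
    """Flag a segment when either measurement reaches its cutoff (assumed rule)."""
    return (
        angular_deg >= cutoffs.max_angular_deg
        or translation_mm >= cutoffs.max_translation_mm
    )


strict = MotionCutoffs(max_angular_deg=3.0, max_translation_mm=2.0)
lenient = MotionCutoffs(max_angular_deg=5.0, max_translation_mm=2.0)

# Hypothetical flexion-extension measurement: 4 degrees, 1 mm.
print(motion_suggests_nonunion(4.0, 1.0, strict))   # flagged under the 3-degree cutoff
print(motion_suggests_nonunion(4.0, 1.0, lenient))  # not flagged under the 5-degree cutoff
```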
Identified Radiographic Criteria for Assessing Lumbar Fusion.
Diagnostic Accuracy and Reliability of Radiographic Modalities Assessing Fusion
There was considerable variability in the diagnostic parameters among the different imaging modalities used in the reviewed studies. Overall, ten of the reviewed studies (76.9%) reported the sensitivity and specificity of the imaging modalities they employed. For plain radiographs, sensitivity ranged from 42% to 100% and specificity from 60% to 89%. CT followed, with a sensitivity range of 53%-100% and a specificity range of 78%-96.7%, and bone scintigraphy had a sensitivity range of 50%-83% and a specificity range of 25%-93%. Flexion-extension radiographs exhibited a sensitivity ranging from 68.7% to 96% and a specificity ranging from 0% to 71.4%, making them the modality with the highest observed variability. SPECT, which was employed in 2 studies, showed the least variability in these parameters, with a sensitivity of 50% and a specificity of 58%. Finally, MRI had a sensitivity of 43.9% and a specificity of 92.1%, and ultrasonography had 100% and 60%, respectively. It is important to note that the latter 2 imaging modalities were each employed in only 1 study.
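For reference, the parameters compared above are all derived from a 2 x 2 table of imaging results against surgical exploration. The sketch below shows the standard definitions. The function names and the counts are hypothetical and are not taken from any reviewed study; which finding counts as "positive" (fusion or nonunion) varies between studies and must be fixed before the table is filled in.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic parameters from true/false positives and negatives."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # positives detected among true positives
        "specificity": _ratio(tn, tn + fp),  # negatives detected among true negatives
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=8, fp=5, fn=2, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and specificity depend on the reference standard and on the definition of a positive finding, values from studies using different fusion criteria are not directly comparable even when the formulas are identical.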
Accuracy
Diagnostic Parameters of the Studies Included in our Systematic Review.
Interobserver Agreement
Only 4 studies (30.7%) assessed the agreement between different observers for their qualitative fusion criteria, and one of them for the employed classifications. Most studies described this parameter using Cohen’s Kappa values, with the sole exception of the study by Fogel G, et al., which described interobserver agreement for X-ray interpretation comparing interbody and posterolateral fusion employing the chi-square test, as well as Fisher's and McNemar's tests. Notably, none of the statistics showed a significant difference. The rest of the details are provided in Table 3 and Appendix A.
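Cohen's Kappa corrects the raw proportion of agreement between two observers for the agreement expected by chance. A minimal sketch of the calculation follows; the function name and the ratings are hypothetical and do not reproduce data from any reviewed study.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return math.nan  # undefined when both raters use one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical readings of 10 CT scans by two observers.
observer_1 = ["fused"] * 6 + ["nonunion"] * 4
observer_2 = ["fused"] * 5 + ["nonunion"] * 4 + ["fused"]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))
```

In this example the observers agree on 8 of 10 cases, but because both label 60% of cases as fused, about half of that agreement is expected by chance, which is why kappa is lower than the raw agreement.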
Discussion
The current study provides a comprehensive review of the literature regarding the radiographic assessment of patients undergoing both interbody and posterolateral lumbar fusion procedures. Despite the great technological advances in neuroimaging techniques, this review shows that there are still no universally accepted radiological criteria for assessing a successful lumbar arthrodesis, which is paramount in the management of patients undergoing lumbar fusion surgery. As previously stated by Choudry et al., surgical exploration to identify a solid arthrodesis under direct observation remains a hypothetical gold standard; however, in most cases and medical settings it is an impractical alternative.19,23
A previous systematic review by Duits A, et al. outlined the substantial variation in the fusion criteria and/or classification systems employed throughout the literature, which makes comparing the described fusion outcomes between studies difficult, if not almost impossible. An example is the studies by Kant A, et al. and Larsen J, et al. Both assessed the presence of bridging bone on plain radiographs as a lumbar fusion criterion; however, they reported successful fusion rates of 69% and 64%, respectively. Later, Fogel G, et al. obtained a fusion rate of 87% employing the same descriptive criteria on plain radiography.12,13,20,24
As we showed, descriptive criteria were the most frequently employed methods for assessing lumbar fusion. Among these, the presence of bony bridging trabeculae was identified as the most common criterion for evaluating lumbar fusion. This criterion was primarily assessed through plain radiographs and/or computed tomography (CT). Plain radiography is still widely used because of its relatively low cost, availability, and long history as a method of assessing fusion in spine surgical patients. However, plain radiographs are only two-dimensional projections, whereas pseudarthrosis must be considered a three-dimensional and dynamic entity. In addition, because many posterior fusion procedures are performed together with laminectomy and/or variable degrees of other bone osteotomies, the subsequent movement of the posterior elements cannot always be properly assessed. To solve this problem, flexion-extension radiographs are often advocated to detect motion within the operated segment as a sign of pseudarthrosis. In this context, Shen F, et al. reported that there are instances where signs of pseudarthrosis may not be evident on plain radiographs, CT, or MRI, whereas flexion-extension radiography can reveal signs of both angular and linear translational motion between operated lumbar segments. Nevertheless, these findings should be interpreted carefully, because the simple absence of motion is not necessarily indicative of a successful arthrodesis and, conversely, the presence of motion is not a pathognomonic sign of pseudarthrosis. According to Larsen J, et al., in the absence of lumbar instrumentation, up to 2°-3° of motion demonstrated by flexion-extension radiographs is thought to be normal and is attributed to “springiness” within the bone graft, while in other patients this permissive motion may have been prevented by transpedicular instrumentation. Because of this, they concluded that flexion-extension radiographs had poor or no value in assessing the functional integrity of lumbar fusion in the presence of a transpedicular instrumentation construct.13,23-27
The observed correlation between plain/dynamic radiography and surgical exploration was also heterogeneous. Studies employing these imaging techniques reported different sensitivity and specificity values, and only one study provided information regarding their accuracy and interobserver reliability; however, none of these results was statistically significant.
Regarding the current role of computed tomography (CT) as a reliable method for assessing lumbar fusion, it is important to highlight the differences in sensitivity and specificity among the reviewed studies employing CT scans. Laasonen E, et al. and Brodsky et al. were the first to describe a correlation between CT findings assessing lumbar fusion and surgical exploration in patients with suspected pseudarthrosis. In both studies, CT rendered a correlation ranging from 57% to 78% compared with direct open surgical observation. Later, Carreon L, et al. reported a higher correlation for CT employing fine cuts, with a sensitivity of 97% and a specificity of 85%. This difference is explained by the fact that the early studies used axial sequences alone and thicker axial slices (i.e., 6 mm), whereas the later study employed fine axial cuts and multiplanar reconstructions.10,11,23
Despite this, CT is currently one of the preferred methods for assessing both interbody and posterolateral fusion because it is a fast and cost-effective technique that provides high-quality, detailed images of bone tissue, which allows it to identify subsidence and lucency around fusion hardware as possible signs of pseudarthrosis. On the other hand, CT is not free of disadvantages; it is well known that metallic hardware such as interbody cages and lumbar instrumentation may be an obstacle because of metallic artifacts. To address this problem, Rothman S, et al. introduced the concept of curved coronal reconstructions using CT scans, obtaining a better visualization of the lumbar spine. Later, Lang P, et al. described the use of multiplanar CT reconstructions in 30 patients with clinically suspected spinal fusion pseudarthrosis. They found that sagittal, axial, and curved coronal two-dimensional reconstructions were more accurate in the detection of bony nonunion than traditional axial CT images, and they advocated the use of multiplanar CT scans as an adjunctive imaging method in the evaluation of posterior lumbar fusion patients. Finally, Williams A, et al. developed and proposed, in conjunction with a group of spine surgeons, a CT scan protocol to periodically monitor the progress of interbody fusions. This protocol suggests performing CT scans at 3, 6, 12, and 24 months after a fusion procedure, or until a solid arthrodesis is ascertained.25,28-32
Turning to studies employing single photon emission computed tomography (SPECT) scans and bone scintigraphy, both modalities use technetium 99m-labeled (99mTc) phosphonates as their radionuclide tracer. These 99mTc-labeled phosphonates are adsorbed both onto and into the crystalline structure of hydroxyapatite, making them a convenient way to identify bone-remodeling zones. In our review, Larsen J, et al. described the use of bone scintigraphy in 25 patients and compared its accuracy with other imaging modalities such as plain radiography, flexion-extension radiography, and CT scans. In that series, the accuracy of bone scintigraphy was around 60%, with a sensitivity of 83% and a specificity of 25%. Later, Bohnsack M, et al. conducted a retrospective study of 42 patients with a prior history of lumbar fusion with transpedicular instrumentation who were candidates for revision surgery. They performed bone scintigraphy before surgical revision, obtaining a reported accuracy of 88%, with a very low sensitivity (50%) and a high specificity (93%). These findings are similar to those reported by Albert T, et al., who used SPECT to evaluate a series of 38 patients, obtaining a very low accuracy (55.2%), as well as a low sensitivity (50%) and specificity (58%). These findings suggest that these imaging modalities are not reliable enough to assess the functional status of lumbar arthrodesis.13,15,16
Regarding other imaging techniques, our database search identified only one paper using MRI to assess nonunion in lumbar fusion patients. In the study by Spirig J, et al., MRI rendered a very low sensitivity (43.9%) and a very high specificity (92.1%), adding little information to the standard protocols. According to Kröner A, et al., MRI may demonstrate the presence of bone bridging trabeculae through the interbody cages on coronal plane images, and may also show changes within the vertebral body’s bone marrow as a sign of functional instability related to failed fusion. However, metallic hardware also produces artifacts that may make reliable evaluation difficult. In this regard, Stradiotti P, et al. advocated the use of fast spin-echo sequences, as well as Short Tau Inversion Recovery (STIR) sequences, as options to minimize metallic-artifact interference in patients with a prior history of lumbar spine constructs.22,33,34
Regarding ultrasonography as an option to assess lumbar fusion status, our review did not find other studies involving this modality. In the study by Jacobson J, et al., the results corresponded to a small group of patients, which makes it difficult to extrapolate its reliability to other settings. In our opinion, there is not enough evidence to recommend the use of ultrasonography for lumbar fusion assessment.14
Finally, among the studies describing interobserver agreement, Carreon L, et al. reported a retrospective study of 93 patients with a prior history of instrumented posterolateral fusion who were evaluated with CT scans prior to surgical revision. Three spine surgeons evaluated the CT scans, and their opinions regarding the status of fusion for every patient were compared. The authors found that interobserver agreement was higher for the assessment of posterolateral fusion (k = 0.62) than for facet fusion status (k = 0.42); however, none of these results was statistically significant. In addition, the results of other authors describing interobserver agreement are only applicable to their own clinical settings.19,23
Study Limitations
It is necessary to recognize certain limitations of the current study. First, the results are challenging to extrapolate because of the absence of studies describing a multicenter population. Consequently, the observed results may not represent the entire population of patients undergoing lumbar fusion assessment. Another significant limitation is the lack of studies describing features such as interobserver agreement, which indirectly evaluates the accuracy of a diagnostic test in a specific population. Additionally, many of the reviewed studies had relatively small populations, though it is important to recognize that the evaluated patients belong to a specific subgroup suspected of having a failed fusion. This supports the use of revision surgery as the current benchmark. Furthermore, the use of different types of bone graft and/or hardware may increase the risk of bias when evaluating patients, making it challenging to establish more general criteria.
Conclusions
Lumbar interbody fusion surgery remains the mainstay of treatment for a wide range of lumbar spine diseases. Despite major advances in surgical approaches, hardware materials, and the development of bone graft substitutes offering better short-term surgical outcomes, there is still a lack of agreement between imaging studies assessing the status of fusion in this group of patients. In addition, the current literature on lumbar fusion assessment shows an important lack of consensus regarding what should be considered a “successful fusion”, and many clinical settings employ different criteria. Multicenter studies evaluating the reliability of the different criteria would be desirable in order to propose more standardized criteria.
Supplemental Material
Supplemental Material for Radiological Assessment of Lumbar Fusion Status: Which Imaging Modality is Best Assessing Non-union in Lumbar Spine Pseudarthrosis? by Enrique González-Gallardo, Justyna O. Ekert, Harshvardhan G. Iyer, Juan Pablo Navarro-García de Llano, Jesús E. Sánchez-Garavito, Michaelides Loizos, Jorge Ríos-Zermeño, Ian A. Buchanan, Kingsley O. Abode-Iyamah, Eric W. Nottmeier, Selby G. Chen, Stephen M. Pirris, Oluwaseun O. Akinduro, Alfredo Quiñones-Hinojosa, and Rodrigo Navarro-Ramírez in Global Spine Journal
Footnotes
Acknowledgment
The author gratefully acknowledges the support of the Mission:Brain Foundation and the Juan Beckmann Foundation for their generous scholarship, which made this work possible.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
ORCID iDs
Supplemental Material
Supplemental material for this article is available online.
References
