Abstract
Study Design
Retrospective analysis of prospectively collected data.
Objectives
Our goal was to assess radiographic characteristics associated with agreement and disagreement in treatment recommendation in thoracolumbar (TL) burst fractures.
Methods
A panel of 22 AO Spine Knowledge Forum Trauma experts reviewed 183 cases and was asked to: (1) classify the fracture; (2) assess the degree of certainty of PLC disruption; (3) assess the degree of comminution; and (4) make a treatment recommendation. The equipoise threshold used was 77% (77:23 distribution of uncertainty, or 17 vs 5 experts). Two groups were created: agreement vs equipoise.
Results
Of the 183 cases reviewed, the experts reached full consensus in only 8 cases (4.4%). Eighty-one cases (44.3%) were included in the agreement group and 102 cases (55.7%) in the equipoise group. A3/A4 fractures were more common in the equipoise group (92.0% vs 83.7%, P < .001). The agreement group had a higher degree of certainty of PLC disruption [35.8% (SD 34.2) vs 27.6% (SD 27.3), P < .001] and more common use of the M1 modifier (44.3% vs 38.3%, P < .001). Overall, the degree of comminution was slightly higher in the equipoise group [47.8 (SD 20.5) vs 45.7 (SD 23.4), P < .001].
Conclusions
The agreement group had a higher degree of certainty of PLC injury and more common use of M1 modifier (more type B fractures). The equipoise group had more A3/A4 type fractures. Future studies are required to identify the role of comminution in decision making as degree of comminution was slightly higher in the equipoise group.
Introduction
Thoracolumbar (TL) ‘burst type’ fractures account for 45% of all major thoracolumbar injuries.1-4 Despite being a frequent pathology that is regularly seen and managed by spine surgeons, there is still no consensus on the indications for surgical treatment in cases without neurologic involvement. The current literature offers mixed and inconclusive results with various recommendations and multiple treatment algorithms.1,5-8 Thus, treatment of TL ‘burst type’ fractures is an example of ‘equipoise’, where there is uncertainty within the expert community on the optimal treatment approach. 9 Additionally, patients with thoracolumbar burst fractures are a heterogeneous population, which can hamper comparability among reported series. To find clarity in this critical clinical dilemma, an alternative method has been proposed: the equipoise methodology. The core principle of the equipoise methodology is that the inclusion of each patient is based on the presence of uncertainty as to the best management among expert reviewers.
Although the concept of ‘equipoise’ has been frequently utilized in other contexts, the threshold of disagreement that constitutes ‘equipoise’ has not been clearly defined. Medical ethics researchers have suggested that a trial is unethical when agreement among experts is above 70% or 80%.10,11 In addition to using equipoise methodology to determine inclusion for prospective clinical trials, another option is to use the methodology to better understand what leads surgeons to agree or disagree on optimal management of TL burst fractures without neurological deficits. To achieve this, we aimed to explore which fracture characteristics are associated with various thresholds of agreement and disagreement on treatment. In other words, we aimed to assess which features lead surgeons to achieve total agreement, partial agreement or total disagreement on the best management of a particular case. This will be key in understanding what drives surgeons’ decision-making and identifying the sources of disagreement in the controversial topic of TL burst fractures.
The goal of this study was to assess the radiographic characteristics associated with various thresholds of agreement in recommending either surgical or non-surgical treatment in TL burst fractures. We aimed to assess the association of agreement and equipoise with fracture classification, the degree of certainty of PLC (Posterior Ligamentous Complex) injury, the use of the AO Spine Thoracolumbar Injury Classification M1 modifier (used to designate fractures with an indeterminate injury to the tension band based on spinal imaging or clinical examination) and the degree of vertebral body comminution.
Methods
The detailed methodology is available in the article of Dandurand et al “Understanding Decision Making as it Influences Treatment in Thoracolumbar Burst Fractures Without Neurological Deficit: Conceptual Framework and Methodology” in this focus issue. The AO Spine Knowledge Forum Trauma completed consent and recruitment for a multicenter prospective observational study of TL fractures, the Spine A3/A4 study. 12 Each enrolling center obtained approval from its local institutional review board. The baseline CT scans and conventional radiographs of 183 patients were available for this study. All patients were neurologically intact and had injuries between T11 and L2.
The 22 spine trauma experts, all with extensive experience in the management of spinal trauma, were recruited from the AO Spine Knowledge Forum Trauma (KF Trauma). Each member of the expert panel independently reviewed the DICOM images of the 183 TL fracture cases and was asked to classify each injury based on the latest AO TL Injury Classification System and to assess the degree of certainty of PLC disruption and the degree of comminution. The reviewers were also asked to indicate whether or not they would use the M1 modifier, which designates a fracture with an indeterminate injury to the tension band based on spinal imaging. Finally, they were asked to recommend treatment (surgical or non-operative), to specify the type of treatment, and to state how confident they were in this recommendation. The experts were blinded to the actual treatment that each patient received within the Spine TL A3/A4 Study and to any results of that study.
Subgroups were created based on various agreement thresholds. The subgroup thresholds ranged from total consensus on proposed treatment to total disagreement. Total agreement means that all 22 experts recommended surgery or all 22 experts recommended non-surgical management. Total disagreement is defined as half of the experts recommending surgery and half recommending non-surgical management (11 vs 11 expert reviewers).
The cases were divided into 2 groups: (1) agreement group and (2) equipoise group. The agreement group was defined as 18 or more experts agreeing on treatment and the equipoise group was defined as 17 or fewer experts agreeing on treatment.
This corresponds to an equipoise level of 77% (77:23 distribution of uncertainty). This is in accordance with previous literature.10,11,13,14 A recent randomized clinical trial on surgical approach for cervical spondylotic myelopathy set the level of agreement to achieve equipoise at 80%. 13
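The grouping rule described above can be illustrated with a minimal sketch. It assumes each case is summarized by the number of experts (out of 22) who recommended surgery; the function names and the example vote counts are hypothetical and are not study data.

```python
from collections import Counter

N_EXPERTS = 22       # panel size reported in the text
AGREEMENT_MIN = 18   # 18 or more experts on one side -> agreement group


def majority_size(n_surgery: int, n_experts: int = N_EXPERTS) -> int:
    """Number of experts on the larger side of the surgery/non-surgery split."""
    if not 0 <= n_surgery <= n_experts:
        raise ValueError("n_surgery must lie between 0 and n_experts")
    return max(n_surgery, n_experts - n_surgery)


def classify_case(n_surgery: int, n_experts: int = N_EXPERTS,
                  agreement_min: int = AGREEMENT_MIN) -> str:
    """Return 'agreement' or 'equipoise' for a single case."""
    if majority_size(n_surgery, n_experts) >= agreement_min:
        return "agreement"
    return "equipoise"


# Hypothetical counts of experts recommending surgery (illustration only).
example_votes = [22, 0, 18, 17, 11, 4, 5]
groups = Counter(classify_case(v) for v in example_votes)
```

Under this rule a 17 vs 5 split (17/22, about 77%) is the most lopsided split that still counts as equipoise, and an 11 vs 11 split corresponds to total disagreement.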
Statistical Analysis
Frequency tables were produced for the distribution of each injury type for each member of the expert panel. Fleiss multi-rater kappa scores were produced to analyze the agreement of all expert panel raters for both injury classification and treatment. Kappa results were interpreted as follows: values ≤0 indicating no agreement, .01-.20 as none to slight, .21-.40 as fair, .41-.60 as moderate, .61-.80 as substantial, and .81-1.00 as strong. 15 Intraclass correlation coefficients (ICC) were produced as a measure of reliability whenever data were continuous or ordinal. The intraclass correlation was interpreted as follows: values less than .5 indicating poor reliability, .5-.75 as moderate, .75-.9 as good and .9-1.0 as excellent. 16
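As an illustration of these reliability measures, the following sketch computes Fleiss' kappa from a table of rating counts and maps kappa and ICC values to the interpretation bands listed above. It is a generic textbook implementation, not the authors' analysis code. The function names are hypothetical, and because the stated ICC bands overlap at .75 and .9, the sketch assumes those boundary values fall in the higher band.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters who assigned
    subject i to category j. Every subject needs the same number of raters.
    Undefined (division by zero) if all ratings fall in a single category."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject must have the same number of ratings")
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    # Overall proportion of assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement among rater pairs for each subject.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects        # mean observed agreement
    p_e = sum(p * p for p in p_j)        # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


def interpret_kappa(k):
    """Map a kappa value to the bands quoted in the text."""
    if k <= 0:
        return "no agreement"
    if k <= 0.20:
        return "none to slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "strong"


def interpret_icc(icc):
    """Map an ICC to the bands quoted in the text (boundaries go to the higher band)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"
```

For example, `interpret_kappa(0.441)` returns "moderate" and `interpret_icc(0.948)` returns "excellent".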
Associations between imaging characteristics (comminution, PLC status) and treatment recommendations were analyzed through multivariable regression analysis and the development of predictive modeling equations. Where necessary, we also fitted multivariable logistic regression models, in 2 forms (marginal model and mixed effect model), to build predictive models.
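For readers unfamiliar with the two model forms, a generic sketch is given below. The text does not specify the covariate set or the clustering structure, so the notation is illustrative only: the outcome indicator, the covariate vector and, in particular, the choice of a random intercept per expert are assumptions, not the authors' stated specification.

```latex
% Y_{ij} = 1 if expert j recommends surgery for case i, 0 otherwise (assumed coding)
% x_{ij}: imaging covariates, e.g. degree of comminution, certainty of PLC injury

% Marginal (population-averaged) model
\operatorname{logit}\,\Pr(Y_{ij} = 1) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij}

% Mixed effect model with a random intercept u_j per expert (assumed structure)
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

In the marginal form, correlation among ratings within a cluster is typically handled through a working correlation structure and the coefficients describe population-averaged effects. In the mixed effect form, the coefficients are conditional on the cluster-level random effect.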
Results
Of the 183 cases reviewed, the experts reached full consensus in only 8 cases (4.4%). In 6 cases (3.2%), all the experts recommended surgery and in 2 cases (1.1%), there was consensus for non-surgical management.
When applying the defined equipoise threshold of 77%, 81 cases (44.3%) were included in the agreement group and 102 cases (55.7%) were included in the equipoise group.
Fracture Classification
The overall interrater reliability for the classification of the fracture was moderate in the agreement group (k = .441, 95% CI .430-.451, P < .001) as well as the equipoise group (k = .413, 95% CI .403-.424, P < .001). The intraclass correlation was good in the equipoise group (ICC = .893, 95% CI .859-.922, P < .001) and excellent in the agreement group (ICC = .948, 95% CI .929-.963, P < .001).
Table 1. Fracture Type Distribution by Equipoise Thresholds.
The Posterior Ligamentous Complex
Table 2. Degree of Certainty of PLC Injury, Use of M1 Modifier and Degree of Comminution Distributions by Equipoise Thresholds.
The Use of M1 Modifier
The use of the M1 modifier showed slight reliability in the equipoise group (k = .091, 95% CI .078-.104, P < .001) and fair reliability in the agreement group (k = .229, 95% CI .214-.243, P < .001). The use of the M1 modifier was more common in the agreement group than in the equipoise group (44.3% vs 38.3%, P < .001). The distribution of M1 modifier use by equipoise thresholds is presented in Table 2. The M1 modifier was used in 51.7% of the consensus group and 27.3% of the disagreement group.
The Degree of Comminution
The degree of comminution showed excellent reliability for both the equipoise group (ICC = .947, 95% CI .930-.961) and the agreement group (ICC = .979, 95% CI .971-.985, P < .001). Overall, the degree of comminution was slightly higher in the equipoise group compared to the agreement group [47.8 (SD 20.5) vs 45.7 (SD 23.4), P < .001]. The distribution of the degree of comminution by equipoise thresholds is presented in Table 2. The degree of comminution showed very small variation, from 49.69% in the consensus group to 48.46% in the disagreement group.
Discussion
This part of the study explored the radiographic characteristics of TL burst fractures associated with various thresholds of agreement in treatment recommendations among expert reviewers. First, we found that the expert panel had moderate interrater reliability and at least good correlation when determining fracture classification, as well as excellent reliability in the probability of PLC injury and the degree of vertebral body comminution. However, the agreement on the use of the M1 modifier was only slight to fair. This suggests that the disagreement on best treatment was not caused by differences in the evaluation of PLC integrity or the degree of comminution. It is also less likely to be due to fracture classification, as the correlation was good, although this remains possible given that the agreement was only moderate.
For reasons explained in previous articles in this issue, we have included the cases with suspected or probable Type B injuries. The great majority of the cases were, however, A3 and A4 fractures. These types were also more common in the equipoise group than in the agreement group. In the agreement group, there was a higher degree of certainty of PLC injury, more common use of the M1 modifier and a slightly lower degree of comminution compared to the equipoise group. This suggests that possible non-A type injuries are more easily identified and lead to more agreement on the treatment choice, as illustrated by the agreement group having more type B injuries.
The vast majority of fractures in the disagreement subgroup were classified as A3/A4 type (95.3%). In comparison, B type fractures accounted for only 3.1% of the disagreement group. This illustrates the controversy in best treatment surrounding the typical A-type burst fracture. Interestingly, the reliability in classifying fractures as either A3 or A4 was comparable in the agreement and equipoise groups. This supports the idea that the disagreement in treatment recommendation is likely not related to disagreement on how to classify a particular fracture. When the expert surgical community collectively has equipoise around treatment decisions, surgeon training, local experience, and resource availability likely all factor into the decision to recommend a specific treatment much more than variables related to fracture morphology or classification. This will be further assessed in subsequent articles in this focus issue.
The biomechanical importance of the PLC is clear among the expert community. Our results illustrate that a higher degree of certainty in PLC injury was associated with a higher rate of agreement on treatment. Schroeder et al. previously showed that the overall interobserver reliability in determining the integrity of the PLC was slight (kappa = .11). 17 In their study, surgeon reviewers were asked: “Do you see any significant injury to the posterior ligamentous complex (PLC)?” The answer was binary (yes or no). However, evaluation of PLC integrity relies on the evaluation of many radiographic factors by surgeons, which creates a spectrum of certainty of PLC injury. To account for this, in our study, we asked surgeons: “Based on these CT images how confident are you that the posterior ligamentous complex is injured? (0%-100%, with 0% no PLC injury and 100% absolutely certain the PLC is disrupted)”. When treating PLC injury as a continuous variable on an uncertainty spectrum, the intraclass correlation showed good reliability in the equipoise group and excellent reliability in the agreement group (in which type-B injuries were more common). This can be interpreted as meaning that the direction of the surgeons’ evaluation of the PLC is very similar whether or not there is agreement on treatment. Our current study furthermore shows that more disagreement on treatment strategy occurred with lower certainty of PLC injury. It is noteworthy that for most fractures, upright radiographs and magnetic resonance imaging were not available. It is likely that in the absence of convincing radiographic evidence of instability on CT, other individual factors related to practice environment or training influenced the expert surgeons’ decision making during the review of images. This individual variation created a wide spectrum of opinions on proposed management.
In the latest AO Spine Thoracolumbar Injury classification, 18 a definite ligamentous injury is defined as type B and if uncertainty about ligamentous injury occurs based on available imaging, a patient-specific modifier (M1) is assigned. In our study, the use of the M1 was more common in the agreement group. This correlates with the higher degree of certainty of PLC injury in the agreement group. However, the values of uncertainty that should be awarded the M1 modifier have not been well defined. Evaluating the PLC as a continuous variable on a spectrum of certainty may help define the threshold of the M1 modifier. This will be explored in a subsequent article of this focus issue.
The degree of vertebral body comminution has been included in previous classifications to better define fractures as well as the treatment options. The Load Sharing Classification of Spine Fractures included the degree of vertebral body comminution in the score to evaluate anterior support and assist surgeons in choosing between short segment and long segment pedicle constructs. 19 However, the score did not account for ligamentous integrity, making it incomplete for influencing surgical decision making. Highly comminuted fractures can evolve towards local kyphosis as a result of losing anterior support, especially in the thoracic or thoracolumbar spine compared to the lumbar spine.20,21 The absence of comminution as a parameter has been identified as a limitation of the TLICS system, although the inclusion of A4 fractures in the updated AO Spine Thoracolumbar Injury classification may provide a surrogate measure for comminution.
For TL burst fractures without neurological deficits, the total TLICS score is usually 4 points with an uncertain PLC injury. Comminution has been recognized as a cause of difficult decision-making in those cases.20,22 Our results showed a slightly higher degree of comminution in the equipoise group. The variation in mean degree of comminution between the consensus subgroup and the disagreement subgroup was also rather small (1.2%). Compared to fracture type and PLC injury, comminution does not seem to differentiate well whether or not surgeons agree on surgical management. Future studies are required to interpret the meaning of comminution in surgeons’ decision making before incorporating this variable into future treatment algorithms, where its purpose would be to account for the likelihood of conservative treatment failing.
Considering that the experts agreed on most of the radiological parameters but disagreed on the management, this prospective multicenter cohort can be considered as ‘randomized’ in a natural fashion. These patients would receive different treatments in different centers even though the surgeons agree on what kind of injuries they have. The threshold set for equipoise was based on the current literature, which is limited. The appropriate threshold may vary depending on the treatment or pathology studied. It is possible that the panel of experts does not represent all clinical environments around the world. However, the wide geographic distribution of the AO Spine Knowledge Forum Trauma increases the overall generalizability of these results. It is possible that experts made treatment recommendations based on their locally available resources and their own relative expertise, and that their recommendations would differ if their available resources or capabilities to offer a certain treatment were different.
Conclusion
Our study showed that the best treatment for TL burst fractures (type A3/A4) remains a controversial topic in modern spine surgery. These fracture types were more common in the equipoise group than in the agreement group. The agreement group showed a higher degree of certainty of PLC injury and more common use of the M1 modifier. Future studies are required to identify the role of comminution in decision making, as the degree of comminution was, surprisingly, slightly higher in the equipoise group than in the agreement group.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was organized and funded by AO Spine through the AO Spine Knowledge Forum Trauma, a focused group of international Trauma experts. AO Spine is a clinical division of the AO Foundation, which is an independent medically-guided not-for-profit organization. Study support was provided directly through AO Network Clinical Research.
