Abstract

This summary article, which concludes our focus issue, highlights some of the findings of our equipoise evaluations and provides some perspectives on how these observations are relevant to the study of surgical techniques and, specifically, to the interpretation of the upcoming AO Spine Trauma Knowledge Forum (AOSTKF) study on A3 and A4 fractures (Thoracolumbar burst fractures, AO Spine A3, A4, in neurologically intact patients: An observational, multicentre cohort study comparing surgical vs non-surgical treatment. ClinicalTrials.gov Identifier: NCT02827214).
The first paper by Dandurand et al 1 (“Understanding Decision Making as it Influences Treatment in Thoracolumbar Burst Fractures Without Neurological Deficit: Conceptual Framework and Methodology”) outlines the methodology and conceptual framework that connects the elements of fracture morphology seen on a CT scan with the concepts of comminution and posterior ligamentous complex (PLC) injury, and then with inclusion in a classification from which an algorithm and scoring system can lead to treatment recommendations. The subsequent manuscripts dissect the elements of this conceptual framework to better understand the decision-making processes employed by surgeons when treating neurologically intact patients with thoracolumbar burst fractures.
In the reliability analysis presented by Canseco et al, 2 (“Interobserver Reliability in the Classification of Thoracolumbar Fractures using the AO Spine TL Injury Classification System among 22 Clinical Experts in Spine Trauma Care”), the inter-rater reliability of the AO Thoracolumbar Fracture classification, as measured by the Kappa statistic, was weak, although the trend towards higher and lower severities of injury produced a stronger correlation coefficient. Interestingly, Canseco et al confirmed the validity of the separation of A3 and A4 fractures by means of the increased comminution of the A4 injuries. In an intriguing observation, Canseco et al posited that the use of the M1 modifier was not objectively correlated with PLC injury; rather, it appeared to be more a function of an increasing degree of comminution that the expert reviewer perceived as putting the PLC at risk. There appeared to be a general correlation between anterior vertebral body comminution and PLC status, a finding which intuitively makes sense. In only 71 out of 183 cases did all the expert reviewers agree that the fracture should be classified as an A3 or A4 injury. Although on the surface this appears concerning, it may be more an issue of the methodology of presenting CT scan images alone, without all the ancillary investigations and clinical examination on which the experts would normally base their assessments and classification. Although in the real world all these patients were felt by the recruiting surgeons to have A3/A4 fractures and to be eligible for inclusion in the prospective study, in the sterile and artificial environment of a radiographic reliability study there can be some breadth of interpretation. Rather than calling into question the validity of the AO Spine TL fracture classification, this reveals the inherent limitations of radiographic reliability studies.
In the radiographic characteristics paper by Dandurand et al, 3 it was remarkable to see the high degree of equipoise among the expert panel, with only 8 cases (4.4%) in which there was agreement on treatment among all the reviewers. These authors separately analysed the agreement and the equipoise groups and demonstrated that even in the full ‘agreement on treatment’ group there was poor agreement on elements of classification, degree of comminution, PLC injury and use of the M1 modifier. This elegantly demonstrates that even when 22 experts agree on treatment, decision making is not a linear process of moving from morphologic analysis through classification to treatment. Dandurand et al demonstrated that as the injury reveals more features of B type injuries, surgery becomes a more likely recommendation. Comminution, however, did not distinguish the agreement group from the equipoise groups. Overall, the authors of these first 2 papers demonstrate that surgeons do tend to view images in a generally consistent fashion, but with little precision and accuracy in their agreement on specific descriptors of fracture morphology, and with even less agreement on individual classification categories. There is, however, a trend towards general agreement on the overall severity of injuries, as evidenced by the generally favourable trends in the correlation coefficients.
In the paper by Kweh et al 4 (“The AO Spine Thoracolumbar Injury Classification System and Treatment Algorithm in Decision Making for Thoracolumbar Burst Fractures without Neurologic Deficit”), the recent expansion of the TL fracture classification to distinguish between A4 fractures, which score 5 points on the AOSIS, and A3 fractures, which score 3 points, was highlighted. These 2 injury types are at the nexus of disagreement and are the cause of the substantial equipoise among the surgical community. When experts analyse A3 and A4 fractures together, they recommend surgery for half and non-surgical care for half. However, when A3 fractures are separated, the experts recommend non-surgical care for 70%, while the converse is true for A4 fractures, where the experts recommend surgery in 70%. These authors assist us in acknowledging that there is a valid distinction between the A3 and A4 fracture that should, theoretically, inform treatment recommendations.
Aly et al 5 (“The Influence of Comminution and Posterior Ligamentous Complex Integrity on Treatment Decision Making in Thoracolumbar Burst Fractures without Neurologic Deficit?”) focused on whether a more elemental view of the degree of vertebral body comminution and the certainty of PLC injury could lead to higher agreement and could more reliably predict treatment recommendations. They helpfully analysed categories of comminution and PLC injury that are correlated with the recommendation for non-surgical or surgical care. Specifically, when comminution of the vertebral body is over 45%, three quarters of cases have surgery recommended by the expert panel. Conversely, when comminution involves less than 25% of the vertebral body, a vast majority (86%) of the recommendations are for non-surgical treatment. Similarly, when the certainty of PLC disruption is over 55%, almost all cases have surgery recommended. Clearly, both PLC disruption and the degree of comminution of the vertebral body are foundational elements in guiding treatment recommendations. Whether comminution is better reflected in the distinction between A3 and A4, and to what degree either of these analyses is valid and reliable, remains a point to be debated.
The only hint at the results from the AO TL A3/A4 study comes in the paper by Camino-Willhuber et al 6 (“Expert Opinion, Real-World Classification, and Decision-Making in Thoracolumbar Burst Fractures without Neurologic Deficits?”), where they analyse the treatment that was performed in each of the 183 cases in the real world and compare it to the recommendations of the Expert Panel. Somewhat surprisingly, when the experts recommended non-surgical care, only 39% of those cases received non-surgical care in the real world, with 61% having undergone surgery. Although the experts distinguished A4 fractures with a higher tendency to recommend surgery, this was not the case in the real world, where A3 and A4 fractures were each operated on with the same frequency (61%). What will be most interesting is to compare the populations in which treatment recommendations differ between the real world and the expert panel against the post-treatment outcomes once the AO TL A3/A4 Study is published. For example, where a high proportion of the expert panel recommended surgery and in the real world the patient was treated non-operatively, what was the outcome in these patients?
The data collected in the analyses that fed the above studies have provided the opportunity to produce a predictive algorithm, as outlined in the paper by Dandurand et al 7 (“Predictive Algorithm for Surgery Recommendation in Thoracolumbar Burst Fractures without Neurological Deficits”). This type of analysis is extremely helpful to guide clinicians in making decisions. The degree of vertebral comminution combined with the experts’ assessment of PLC integrity was over 80% accurate in predicting treatment in this model. These authors also highlight the geographic variation in decision making. The addition of outcome data would strengthen this algorithm immeasurably. It is intriguing to consider whether the combination of machine learning algorithms and the computerized interpretation of radiographs might be an opportunity to inject a greater degree of consistency into surgeons’ decision making for these TL burst fractures.
In the final paper in this focus issue, Schnake et al 8 (“What Factors Influence Surgeons in Decision-Making in Thoracolumbar Burst Fractures? A Survey-Based Investigation of a Panel of Spine Surgery Experts”) report that when 27 experts were asked about their decision making related to TL trauma, most (81%) found distinguishing between A3 and A4 in neurologically intact patients to be relevant for decision-making, insofar as 59% would treat A3 fractures non-operatively, while only 30% would treat A4 fractures non-operatively. Most experts are concerned about long-term complications such as implant failure or future kyphosis. Radiological factors, such as local kyphosis, fracture comminution, overall sagittal balance, and spinal canal narrowing, strongly influence the treatment decision by the experts. Surgeons who treat higher volumes of patients may prefer surgical treatment.
Schnake et al identify that there are strongly influential variables outside the realm of fracture morphology and specific classification categories that overwhelmingly direct treatment decisions. They highlight some of the factors that have led to individual treatment preferences among surgeons and to what we view as ‘schools’ of surgical and non-surgical care, where most patients presenting to a specific centre or region receive either only surgery or only non-surgical care regardless of the imaging characteristics of the injury. It is important to realize that issues of compensation and third-party influence do not appear to influence treatment, whereas the geography and region of practice of the surgeon are by and large the greatest influence on the predisposition to surgical or non-surgical care.
Summary
It is in this kind of environment of equipoise that patients recruited to an observational study are effectively randomized by the site at which they are treated, with some patients almost exclusively receiving surgical care while others in another geography with similar injuries receive almost exclusively non-surgical care. We would posit that an observational study in this environment is more of a high-quality effectiveness trial, and more generalizable, than would be a highly controlled RCT, where there are substantial difficulties related to restrictive inclusion, poor patient recruitment, frequent withdrawal and complexities in standardizing techniques, surgeon experience and post-injury care. The analyses in this focus issue clearly demonstrate that morphologically similar injuries receive different treatment (surgical and non-surgical) fundamentally and primarily based upon the inherent preference of the treating physician at the site where they receive treatment, and not as strongly related to differences in the morphology of the fracture. Equipoise in the global surgeon community and strong regional treatment predispositions together create the ideal environment for the strongest possible effectiveness trial.
Likely one of the most significant conclusions from this sequence of papers is that, in the absence of objective outcome differences that can be assigned to specific treatment approaches or fracture categories, further refining and modifying classifications and scoring systems is unlikely to reduce the variability in treatment decisions. That is not to say that a precise, valid, and reproducible classification system is not necessary, but to reinforce that it must be linked to measurable outcomes for the various categories of injury and treatment approaches. Only then will surgeons be able to personalize their treatment of specific injury patterns to obtain the best outcomes with minimal cost and risk, and to do so with greater consistency across geographies and regions.
Footnotes
Acknowledgements
This study was organized and funded by AO Spine through the AO Spine Knowledge Forum Trauma, a focused group of international trauma experts. AO Spine is a clinical division of the AO Foundation, which is an independent, medically guided, not-for-profit organization. Study support was provided directly through the AO Network Clinical Research.
