Abstract
Study Design:
A multicenter pilot study for radiological assessment of thoracolumbar spine fractures was performed with the participation of 7 centers in Africa, Europe, Asia, and South America as part of the AO Foundation network.
Objectives:
To determine the interobserver variability for computed tomography (CT) scan–based evaluation of posterior ligament complex (PLC) injury in thoracolumbar fractures.
Methods:
Forty-two observers, including 1 principal investigator at each participating center, performed the variability assessment. Each center contributed toward a total of 91 patient images with A3 or A4 thoracolumbar burst fractures (T11-L2) with or without suspected PLC injury. Pathological fractures, multilevel injuries, obvious posterior bony element injury, and translation/dislocation injuries were excluded. Ten patients were randomly selected, and commonly reported CT parameters indicating PLC injury (superior inferior endplate angle, vertebral body height loss, local kyphotic deformity, interspinous distance, and interpedicular distance) were assessed for variability. Observer values were compared with those of a gold rater experienced in spinal trauma. Analysis of variability was performed for all observers, between the principal investigators, and between observers participating in each center.
Results:
The studied parameters showed considerable variability in measurements among all observers and among all participating centers. The variability between the principal investigators was lower, but still substantial. The deviation of observer measurements from the gold rater was also significant for all CT parameters.
Conclusions:
CT-based radiological parameters previously reported to be suggestive of PLC injury showed considerable variability, and magnetic resonance imaging verification of a PLC injury is suggested in all doubtful cases.
Introduction
Thoracolumbar (TL) fractures are among the most common spine injuries encountered in emergency trauma care 1 -3 with burst fractures accounting for 10% to 20% of all spinal fractures. These fractures usually result from high-energy trauma such as motor vehicle accidents or falling from heights. 4 -9
The treatment strategy for thoracolumbar fractures mainly depends on the stability of the fractured spine. Currently, there are various classification systems available to judge the stability of vertebral fractures and facilitate clinical decision making. While conservative treatments can be used for stable fractures, surgical treatments are typically suggested for unstable vertebral fractures. 10 -12
One of the most important stabilizing structures in the spine is the posterior ligament complex (PLC), which consists of the supraspinous ligament (SSL), interspinous ligament (ISL), ligamentum flavum (LF), and facet joint capsules. Flexion distraction injuries and Chance fractures typically lead to a failure of the anterior and middle column in compression and posterior column failure in distraction. These distraction forces lead to a disruption of the PLC resulting in an unstable spine injury. 10 -12 The PLC protects the spine against excessive flexion, extension, translation, and rotation. Injury to the PLC may lead to spinal instability, late spinal deformity, and persistent back pain and can jeopardize neural elements, resulting in neurological deficit. 10,11 Assessing the integrity of the PLC has a bearing on the definitive management of these injuries. 13,14 Usually, PLC injury requires surgical fixation as it may result in progressive kyphosis of the spine and poor functional outcome. 13 -15
Multidetector computed tomography scan (MDCT) is considered the method of choice to assess TL spine fractures and other visceral injuries in emergency trauma care. 16 However, PLC injury is difficult to assess on radiographs and CT scans, and magnetic resonance imaging (MRI) is considered to be the gold standard to identify PLC damage. 17 -21 MRI assessment of PLC injuries has the disadvantages of a considerably longer scan duration, lower availability, poor feasibility in polytraumatized patients, and higher cost when compared with MDCT scans. 16 -22 In contrast, CT is an integral part of trauma evaluation protocols and is quick and accurate in the diagnosis of vertebral fractures, particularly in the evaluation of polytrauma scenarios. 16,22,23 A number of authors have suggested identifying PLC injury on CT scans based on criteria such as facet diastasis, local kyphosis, and interspinous distance. 12,24 -28 The ability to accurately identify PLC injury on CT scans would be a great advantage, as it would expedite treatment of TL fractures without MRI, especially in the clinical scenario of polytrauma with an unstable patient. 23
Although reports have suggested that CT-based radiological parameters may reliably detect PLC injury, 15 -20 the variability and reliability of these parameters have not been studied to date. Therefore, the aim of this study was to determine the interobserver variability, and hence the feasibility, of these CT-based parameters to assess PLC injury compared with MRI when used in a multicenter setting.
Methods
A pilot study based on a retrospective multicenter case series was conducted at 7 tertiary referral centers (3 in Asia, 2 in Africa, 1 in Europe, and 1 in South America). The protocol was approved by the ethics committees or institutional review boards at each participating institution.
Each of the centers contributed multiple observers (N = 42) who evaluated the CTs. The observers were either spinal surgeons (N = 38) or radiologists (N = 4). In each of the centers, there was 1 principal investigator (PI) who attended 2 face-to-face training sessions on how to perform the measurements. These 2 training sessions were held by the gold rater 6 months apart. The gold rater was an experienced musculoskeletal radiologist (AM) with more than 10 years of experience in evaluating traumatic spine injury. Each training session was a day-long workshop in which the gold rater trained the PIs on the image assessment protocol. The training sessions included 10 patient image folders, which were used for trial measurements by the PIs. The PIs were responsible for training their own site personnel. Once the training sessions were completed, an instructional video was sent to all sites 6 weeks after the second face-to-face training session to serve as a reference tutorial during the assessments. Apart from the video, the investigation protocol document had a section dedicated to elaborating the steps in the measurement of the radiological parameters, which served as a written manual.
From the databases of each center, all adult patients (18-60 years old) with A3 or A4 TL burst fractures between T11 and L2, with or without suspected PLC injury, who had undergone complete CT and MRI diagnostics before treatment and had been admitted to the hospital were identified and entered into our database (REDCap, Vanderbilt University, Nashville, TN, USA). Patients were excluded if they had (a) pathological fractures, (b) multilevel contiguous or noncontiguous injuries, (c) fractures with obvious spinous process split indicating tension band failure, or (d) fractures with translation injuries or dislocations, which imply an obvious PLC injury.
Initially, sagittal and coronal reconstructed images of 91 patients, obtained from a multislice CT scanner with volume acquisition by axial sections and a slice thickness of at least 1.25 mm, were collected. Ten cases were randomly selected to evaluate the following parameters that could potentially serve to determine PLC integrity:
Superior inferior endplate angle (SIEA): The angle formed between the lines drawn along the superior and inferior endplates of the fractured vertebra (Figure 1).
Vertebral body height (BH): The vertebral body height was measured at 2 locations: first along the anterior vertebral body margin, between the anterosuperior corner and the anteroinferior corner of the fractured vertebral body, and second along the posterior vertebral body margin, between the posterosuperior corner and the posteroinferior corner of the fractured vertebral body. The loss in body height was assessed by comparing the fractured body height with the mean dimensions of the uninvolved superior and inferior vertebral bodies (Figure 2).
Local kyphotic deformity (LK): The angle formed between the lines drawn along the superior endplate of the cephalad and the inferior endplate of the caudal uninvolved vertebra (Figure 3).
Interspinous distance (ISD): The distance between the spinous process of the cephalad normal vertebra and that of the fractured vertebra was measured, and this distance was compared with that of the caudal uninvolved segment (Figure 4).
Interpedicular distance (IPD): This was measured as the distance between the medial borders of the 2 pedicles of the fractured vertebra and compared with the mean of the uninvolved adjacent segments cephalad and caudal to the fractured segment (Figure 5).

Figure 1. Measurement of superior inferior end plate angle (SIEA).

Figure 2. Measurement of anterior and posterior body heights (BH).

Figure 3. Measurement of local kyphotic angle (LK).

Figure 4. Measurement of interspinous distance (ISD).

Figure 5. Measurement of interpedicular distance (IPD).
Further details on how the respective parameters were measured are presented in Table 1.
Table 1. Parameter Measurement Protocol.
All CT measurements were made using Surgimap Spine (Nemaris, Inc, New York, NY, USA), a software program that has been validated for radiological parameter assessment. 29 The landmarks for measurement of the radiological parameters were placed manually by the observers. Automatic contour recognition was not used, as the fractured vertebra could not be accurately recreated with automated contour recognition. Analysis of variability was performed for all observers, between the 7 PIs, and between observers participating in each center.
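The parameter definitions above reduce to a few geometric operations on manually placed landmark points. The following is a minimal sketch of those operations, not the Surgimap implementation: the function names are hypothetical, landmarks are assumed to be (x, y) coordinates in millimeters on a calibrated image, and expressing height loss as a percentage of the adjacent-level mean is an assumption made for illustration.

```python
import math

def line_angle_deg(p1, p2, q1, q2):
    """Acute angle in degrees between the line p1-p2 and the line q1-q2.

    Usable for SIEA (both endplates of the fractured vertebra) and for LK
    (superior endplate of the cephalad and inferior endplate of the caudal
    uninvolved vertebra).
    """
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    # abs(dot) makes the result independent of the drawing direction of each line
    return math.degrees(math.atan2(abs(cross), abs(dot)))

def distance(p, q):
    """Euclidean distance between two landmarks (BH, ISD, IPD)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def height_loss_percent(fractured, superior, inferior):
    """Body height loss relative to the mean of the two uninvolved neighbors.

    Assumption: loss is expressed as a percentage of the reference height.
    """
    reference = (superior + inferior) / 2.0
    return 100.0 * (reference - fractured) / reference

def difference_from_reference(measured, *references):
    """Measured distance minus the mean of one or more reference levels.

    ISD uses one reference (caudal uninvolved segment); IPD uses two
    (cephalad and caudal uninvolved segments).
    """
    return measured - sum(references) / len(references)
```

For example, `line_angle_deg((0, 0), (1, 0), (0, 0), (1, 1))` describes two lines meeting at 45 degrees, and `height_loss_percent(15, 28, 32)` compares a 15 mm fractured height with a 30 mm reference. All values here are illustrative and are not taken from the study data.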
Statistical Methods
We conducted several analyses to assess the variability between the observers, between the PIs, and between the observers at the different participating centers. Under the assumption that the gold rater’s measurements would be the most accurate of all, we also compared the measurements between the gold rater and the remaining observers. This was done by determining the mean absolute deviation (MAD) from the gold rater. The MAD is the sum of the absolute differences between the measurement of each individual observer and the measurement of the gold rater, divided by the number of observers. Analysis of variability was performed for all observers, between the PIs, and between observers participating in each center, using the mean, standard deviation, and minimum and maximum values, and graphically using box plots. Intraclass correlation was tested with the Shrout-Fleiss reliability coefficient for all observers and for the PIs separately. 30
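The MAD definition and the descriptive statistics described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical, and the handling of missing measurements (skipping them and dividing by the number of observers who did report a value) is an assumption, since the text does not state how missing values were treated.

```python
import statistics

def mean_absolute_deviation(observer_values, gold_value):
    """MAD of the observers from the gold rater for one parameter in one case.

    Missing measurements are passed as None and skipped (assumption).
    Returns None when the gold rater or all observers have no value.
    """
    valid = [v for v in observer_values if v is not None]
    if gold_value is None or not valid:
        return None
    return sum(abs(v - gold_value) for v in valid) / len(valid)

def describe(values):
    """Mean, sample standard deviation, minimum and maximum of the valid values."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "mean": statistics.mean(valid),
        "sd": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        "min": min(valid),
        "max": max(valid),
    }

# Illustrative values only, not study data: three observers measured an
# angle, one could not, and the gold rater measured 20.0 degrees.
observers = [18.0, 22.0, 25.0, None]
mad = mean_absolute_deviation(observers, 20.0)  # (2 + 2 + 5) / 3
summary = describe(observers)
```

Averaging the per-case MAD values over all cases would then give a per-parameter figure of the kind reported in the Results.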
Results
There were 42 observers who performed measurements on the CT scans of 10 cases. In most cases, all observers were able to analyze all parameters for each case. However, in a few cases, the observers and the gold rater were unable to perform the measurements, either because of difficulties in assessing the landmarks outlined in the measurement protocol or because of technical issues where many raters were unable to calibrate the images in Surgimap (for example, case 5, which is marked with a superscript “a” in Tables 2 and 3). Table 2 shows the descriptive statistics for the measurements performed by the 42 observers for the 5 CT parameters.
Table 2. Descriptive Statistics of the Measurements Performed by 42 Observers for the 5 Computed Tomography (CT) Parameters.
Abbreviations: CT, computed tomography; LK, local kyphosis; ISD, interspinous distance; IPD, interpedicular distance.
a Observer was not able to assess measurements for all cases.
Throughout all CT parameters, we found a very high variability in the individual measurements. The variability was most pronounced when comparing the measurements among all observers as well as amongst all teams. The variability among the PIs was lower than among all observers, but still considerable (Figures 6 –8).

Figure 6. Box plot for the computed tomography parameters per case: all observers (n = 42).

Figure 7. Box plot for the computed tomography parameters per case: all principal investigators (n = 8).

Figure 8. Box plot for the computed tomography parameters per case per site.
The deviation of all observers’ measurements from the gold rater’s measurements was also substantial for all 5 CT parameters (Figure 9). The average MAD from the gold rater ranged from 1.19 for IPD to 11.22 for posterior BH. The average MAD for kyphosis measurements was 2.59 for LK and 6.72 for SIEA (Table 3).

Figure 9. Deviation from gold rater per case.
Table 3. Mean Absolute Deviation (MAD) Assessed by All Observers (n = 41) From the Gold Rater for the 5 CT Parameters.
Abbreviations: CT, computed tomography; LK, local kyphosis; ISD, interspinous distance; IPD, interpedicular distance.
a Gold rater was not able to assess IPD measurements for Cases 03, 05, and 08.
The intraclass correlation coefficients generated by the Shrout-Fleiss reliability coefficient analysis are reported for the selected CT parameters in Table 4. LK, IPD, and ISD showed good reliability, with values between .75 and .90 for both the PI group and the other observers.
Table 4. Intraclass Correlation Coefficients of CT Measurements.
Abbreviations: CT, computed tomography; LK, local kyphosis; ISD, interspinous distance; IPD, interpedicular distance; BH, body height.
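For readers who wish to reproduce this kind of reliability coefficient, the following is a minimal sketch of two single-rater Shrout-Fleiss forms computed from a complete cases-by-raters matrix. The text does not state which Shrout-Fleiss form was used, so the choice of ICC(2,1) and ICC(3,1) here is an assumption, and the function name is hypothetical. The ratings matrix is the small worked example (6 targets, 4 judges) commonly reproduced from Shrout and Fleiss, not data from this study.

```python
def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for a complete n-cases x k-raters matrix.

    ICC(2,1): two-way random effects, absolute agreement, single rater.
    ICC(3,1): two-way mixed effects, consistency, single rater.
    """
    n = len(ratings)       # number of cases (targets)
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    icc21 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc31 = (msr - mse) / (msr + (k - 1) * mse)
    return icc21, icc31

# Worked example matrix: rows are targets, columns are judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc21, icc31 = icc_two_way(example)
```

On this example the two coefficients come out near 0.29 and 0.71, which shows how strongly the result depends on whether systematic differences between raters are counted as disagreement. The sketch requires a complete matrix, so cases with missing measurements would have to be excluded or handled separately.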
Discussion
The management principles in TL fractures are based on the assessment of associated neurological injury, potential spinal instability, and the possibility of development of late-onset spinal deformity. 1,3,10,31 The management guidelines for TL fractures proposed by both the Spine Trauma Study Group (TLICS classification) and AOSpine (thoracolumbar spine fracture classification) are influenced by the presence or absence of PLC injury. 15,32 The PLC includes the supraspinous ligament (SSL), interspinous ligament (ISL), ligamentum flavum (LF), and facet joint capsules, and these structures are best assessed on MRI scans. 24,33 The ligamentous healing potential is considered poor; therefore, an accurate identification of PLC injury is necessary to allow optimal management of TL fractures. 10,11
TL fractures can be associated with other, more life-threatening visceral organ injuries in the thorax and abdomen. 20 These injuries can render the patient hemodynamically unstable, thus making an early MRI for assessment of the PLC challenging if not impossible. CT is preferred over MRI for evaluation of spinal trauma because it has a quick turnover time and can be performed even in an unstable clinical scenario. 16,22,28 The CT images can provide clear details to identify fracture morphology, spinal canal compromise, and unstable spine fractures. The ability to identify PLC injuries on CT scans would be of great clinical benefit, as it would likely reduce the evaluation time and cost and thus expedite treatment in TL spine fractures. Based on previously reported radiological criteria, our study attempted to assess the interrater variability of measurements based on CT scans.
Various authors have proposed radiological criteria to assess PLC injury based on radiographs and CT scans. 24,27 The drawback of these reports is that they are all single-center studies, and information on the reliability of the findings in a multicenter setting is lacking. Hiyama et al 24 assessed loss of vertebral body height, local kyphosis, vertebral body translation, canal compromise, interlaminar distance, supraspinous distance, and interspinous distance in 40 patients with TL fractures and concluded that a local kyphosis >20° and increased supraspinous distance were associated with PLC injury. Rajasekaran et al 25 analyzed 60 patients with possible PLC injury using STIR MRI sequences as a gold standard to identify PLC injury. The authors reported that local kyphosis of >20° and an interspinous distance increase of ≥2 mm compared with adjacent levels may serve as radiological criteria to predict PLC injury, especially in the scenario of emergency trauma where MRI is not feasible. 25 Barcelos et al 27 noted that measurements from CT scans could reliably predict PLC injury and that an increase in the interspinous distance could be used to differentiate AO type A from type B injuries. Yee et al 28 reported on a survey conducted among members of the Spine Trauma Study Group on radiological predictors of PLC injury. The survey concluded that obvious translation, as seen in AO type C injuries, and interspinous distance ≥7 mm were felt to be the most reliable indicators of PLC disruption. The members felt that diastasis of the facet on CT scans was the best indicator of PLC injury, especially when the plain radiographs appeared normal. However, there was no consensus on any particular radiological parameters to suggest PLC injury. 28
Our study was based on measurements of 42 observers from 7 different centers, which is a considerably higher number of observers than in previous reports on the predictive value of CT-based parameters for the assessment of PLC damage. Therefore, it offers a more robust assessment of the reliability of CT scan images to predict PLC injury. Our study also included both radiologists and spine surgeons as observers, which is closer to a real-life setting.
We saw a considerable variability in the reported measurements for all radiological parameters. The variability was present amongst all observers as well as among the different centers. Of note, the variability among the PIs alone was slightly lower, suggesting suboptimal transfer of knowledge to the remaining observers. However, the variability even amongst the PIs was high enough to preclude using the proposed CT-based parameters to accurately assess PLC injury.
A possible explanation for the significant variability was the difficulty encountered in identifying the landmarks for the measurements. In particular, identification of the anterior and posterior vertebral margin was challenging due to the presence of retropulsion at the posterosuperior corner and displacement of fragments at the anterosuperior corner.
The effect of the difficulty in accurately identifying the landmarks on the endplates is also illustrated by the different degrees of variability seen with the measurement of SIEA and LK. Both parameters are measures of kyphotic deformity. LK, which was measured between the superior and inferior uninvolved vertebral bodies, showed less variability than SIEA, which was measured directly on the fractured vertebra. This suggests that measurements taken from intact endplates, which require less interpretation from the observer, offer less potential for differing interpretations than measurements taken from endplates with retropulsed or otherwise displaced fragments. This probably resulted in the greater uniformity of results for LK compared to SIEA. The authors feel that, although findings such as increased interspinous distance and local kyphosis have been reported to be indicative of PLC disruption, an objective definition to quantify the change in radiological measurement is difficult to develop. There is considerable natural variation in the individual fracture patterns and spinous process anatomy, which may preclude development of an objective cutoff or measurement protocol and eventual prediction of PLC injury on CT scans.
The absolute values for parameters such as ISD and IPD were very small, and the MAD of the observers from the gold rater ranged between 1 and 3 mm. With such small values, minor deviations in setting the landmark points may lead to an extremely high variability of measurements, especially as the landmarks were placed manually. Additionally, for one case the images could not be calibrated by the majority of observers, including the gold rater. In an attempt to ensure uniform measurements, every observer had received detailed instructions on how to identify these landmarks, and personal hands-on training had been provided to the PIs of each site. However, the considerable variability of results suggests that either the instructions were insufficient, the Surgimap software may have been unsuitable to perform the measurements with sufficient accuracy, or the proposed predictive parameters could be unsuitable for the assessment of PLC integrity altogether.
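The sensitivity of small measurements to landmark placement described above can be illustrated with simple arithmetic. The sketch below uses hypothetical values only (a 1 mm placement error applied to a 5 mm distance, a 30 mm distance, and a 30 mm endplate line); the function names are illustrative and none of the numbers are taken from the study.

```python
import math

def relative_error_percent(true_value_mm, landmark_error_mm):
    """Landmark placement error as a percentage of the measured distance."""
    return 100.0 * landmark_error_mm / true_value_mm

def angle_error_deg(line_length_mm, landmark_error_mm):
    """Tilt of a line when one endpoint is displaced perpendicular to it."""
    return math.degrees(math.atan2(landmark_error_mm, line_length_mm))

# Hypothetical values: the same 1 mm placement error on different scales
small = relative_error_percent(5.0, 1.0)    # short distance
large = relative_error_percent(30.0, 1.0)   # longer distance
tilt = angle_error_deg(30.0, 1.0)           # effect on an endplate line
```

Under these assumptions the same 1 mm error corresponds to 20% of the short distance but only about 3% of the longer one, and it tilts a 30 mm endplate line by roughly 2 degrees, so errors at both ends of two lines can add up to several degrees in an angle measurement.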
Limitations
An important limitation of the study is that the reference measurements were not objective, because they were also derived from personal judgement. Therefore, human error could have also been present in the reference measurements. Another difficulty was that the experience of the observers with regard to CT evaluation was very heterogeneous. On the one hand, this may have contributed to the low reproducibility of results. On the other hand, this reflects exactly how evaluation of the proposed radiological parameters is done in clinical practice, which increases the generalizability of our results. There were 42 observers; however, the radiologist-to-surgeon ratio was 4:38, which could have been more balanced. The intraclass correlation coefficients for LK, IPD, and ISD showed good reliability. However, since the number of cases is small, the statistical power of the analysis is low, and the authors would advise caution in the interpretation of these values.
Conclusions
The CT parameters local kyphosis (LK), superior inferior end plate angle (SIEA), vertebral body height (BH), interspinous distance (ISD), and interpedicular distance (IPD) have been proposed for determining PLC integrity without the need for MRI. When the interrater variability of these parameters was assessed in a multicenter setting with 42 observers of heterogeneous experience from 7 different centers on 4 continents, a high variability of results was seen. Therefore, previous reports could not be validated. The study did not allow us to determine whether the proposed parameters are unsuitable for PLC evaluation per se or whether insufficient knowledge transfer and an unsuitable measurement method were responsible for the poor reproducibility. Modified instructions to identify the landmarks, as well as face-to-face training of each observer, may help improve measurement uniformity. We conclude that currently, MRI verification of a PLC injury should be done in all doubtful cases.
Footnotes
Acknowledgments
The authors thank AOCID, in particular Christian Knoll and Elke Rometsch, for support with statistics and editing of the manuscript. Furthermore, special thanks go to the study teams that participated in this study by providing and evaluating cases from the Assiut University Hospitals in Assiut, Egypt; the First Hospital of Zhejiang University in Hangzhou, China; the Ondo State Trauma and Surgical Centre in Ondo State, Nigeria; the Cajuru University Hospital in Curitiba, Brazil; the Uijeongbu St. Mary’s Hospital in Uijeongbu-si, South Korea; and the Royal Victoria Hospital in Belfast, United Kingdom.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the AO Foundation network via AOSpine.
