Abstract
Introduction
The spleen is the most commonly injured solid organ in blunt trauma. 1 The standard of care in hemodynamically stable patients with no other indication for trauma laparotomy is a trial of non-operative management. 2-4 Non-operative management has been shown to be safe, preserves subsequent splenic function, and avoids complications associated with splenectomy, including overwhelming post-splenectomy infection (OPSI), pancreatic leaks, incisional hernias, thromboembolic events, and other complications. 5-7 A key factor in clinical decision making for blunt splenic injury is the grade of injury, as increasing grade is predictive of splenectomy, non-operative management failure (NOMF), 8 and the decision to perform splenic artery embolization (SAE). 9 Standardized protocols, good clinical judgment, and accurate and timely interpretation of imaging studies have been shown to reduce the incidence of non-operative management failure in patients with splenic trauma. 10 Careful patient selection based on accurate injury grading is therefore crucial to successful non-operative management.
The Revised Organ Injury Scale (OIS) of the American Association for the Surgery of Trauma (AAST) is currently the most widely accepted scoring system for the classification of splenic trauma. The scale was first published in 1989 11 and, although originally intended to provide an anatomical description, is now widely used to guide treatment given its association with outcomes. 12 The current 2018 revision of the AAST scoring system classifies splenic trauma into 5 grades based on imaging, operative, and/or pathologic criteria if available, determined by the presence and extent of several types of injuries. 13 Grades I to III are considered low grade, and grades IV and V high grade. Intravenous contrast-enhanced computed tomography (CT) is currently the imaging modality of choice for diagnosing splenic injuries.
Despite its pivotal role in the management of splenic trauma, only a few studies have evaluated the inter-rater agreement of CT in grading splenic injuries. Leschied et al 14 found moderate inter-rater agreement (kappa .64) for absolute AAST injury grade and strong agreement for relative AAST injury grade in CT scoring of splenic injuries in pediatric patients. Prior to the latest 2018 AAST revision, reported inter-rater agreement of CT in grading splenic injuries in adult patients varied from moderate to strong (kappa = .80) 15 to minimal (kappa .32-.60). 16 There are no reported studies of the inter-rater reliability of splenic injury grades since the 2018 AAST scoring system update. The objective of this study was to evaluate the inter-rater agreement of CT in grading splenic injuries in adult patients using the 2018 AAST scoring system.
Methods
Overview and Setting
This single-centre retrospective study was approved by our hospital's Ethics Review Board with a waiver for informed consent. Our institutional radiology information system (Syngo, Siemens Medical Solutions USA, Inc., Malvern, PA) was searched using Nuance mPower (Nuance Communications, Burlington, MA) to identify emergency department CT scans of the chest, abdomen, and pelvis performed between January 1, 2005 and July 31, 2021 for adult (age ≥18 years) trauma patients with splenic injury. All patients with a CT reporting a blunt splenic injury were included. Only the presenting scan was included. Patients who went directly to laparotomy were not included. Images from the CT scans were downloaded in DICOM format from our institutional Picture Archiving and Communications System (Philips Vue PACS, Philips). The axial, coronal, and sagittal soft tissue window series were extracted using a custom Python script. The extracted images were de-identified using RSNA Anonymizer (Oak Brook, IL), an open source DICOM de-identification tool that complies with Health Insurance Portability and Accountability Act (HIPAA) guidelines. A custom Python script was then used to limit DICOM metadata attributes to a “white list”. The resulting images were manually reviewed to ensure all private health information had been removed.
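The white-list step described above can be illustrated with a minimal sketch. It is not the study's script: the attribute names in `WHITE_LIST` are hypothetical examples, and the header is modelled as a plain dictionary rather than a real DICOM dataset so that the example runs with the standard library alone.

```python
# Hypothetical white list; the study's actual list is not given in the text.
WHITE_LIST = {"Modality", "Rows", "Columns", "PixelSpacing", "SliceThickness"}


def apply_white_list(metadata, white_list):
    """Return a copy of the header that keeps only white-listed attributes."""
    return {name: value for name, value in metadata.items() if name in white_list}


# Toy header standing in for DICOM metadata (values are made up).
example_header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "CT",
    "Rows": 512,
    "Columns": 512,
    "SliceThickness": 3.0,
}

cleaned = apply_white_list(example_header, WHITE_LIST)
```

A white list is the conservative choice here: any attribute that is not explicitly allowed is dropped, so unexpected vendor-specific fields cannot carry identifying information through.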
CT Technique
All emergency department trauma CT examinations were performed on a 64-slice MDCT scanner (Revolution, LightSpeed 64, or Optima 64, General Electric Medical Systems, Milwaukee, WI). Our trauma CT protocol involves a split-bolus single-pass CT of the chest, abdomen, and pelvis from above the lung apices to the lesser trochanters, a commonly used whole body trauma protocol. 17-19 Intravenous contrast was administered as follows: 85 cc Visipaque 320 at 3 cc/s, 40 cc normal saline at 2 cc/s, injector delay of 25 seconds, 45 cc Visipaque 320 at 4 cc/s, and 20 cc of normal saline at 4 cc/s. Oral contrast is not routinely administered. Rectal contrast consisting of 13 cc Gastrografin in 500 cc water may be administered for stab and gunshot wounds at the discretion of the trauma team leader. The soft tissue window images were reconstructed at 3 × 3 mm for axial imaging and 5 × 5 mm for the coronal and sagittal series.
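As a rough check on the split-bolus injection described above, the volumes and durations can be totalled. This is only a sketch: it assumes the steps run back to back in the order listed, which the text does not state explicitly, and it says nothing about when scanning begins. The function name and step labels are illustrative.

```python
# Each step: (label, volume in cc, rate in cc/s). The delay has no flow.
PHASES = [
    ("contrast bolus 1", 85.0, 3.0),
    ("saline 1", 40.0, 2.0),
    ("delay", 0.0, None),
    ("contrast bolus 2", 45.0, 4.0),
    ("saline 2", 20.0, 4.0),
]
DELAY_S = 25.0  # injector delay from the protocol


def injection_summary(phases, delay_s):
    """Total contrast (cc), total saline (cc), and elapsed time (s).

    Assumes the steps are delivered sequentially with no gaps other than
    the stated injector delay.
    """
    contrast = saline = elapsed = 0.0
    for label, volume, rate in phases:
        if rate is None:  # the pause between the two boluses
            elapsed += delay_s
            continue
        elapsed += volume / rate
        if label.startswith("contrast"):
            contrast += volume
        else:
            saline += volume
    return contrast, saline, elapsed


total_contrast, total_saline, total_seconds = injection_summary(PHASES, DELAY_S)
```

Under these assumptions the protocol delivers 130 cc of contrast and 60 cc of saline over roughly 89.6 seconds.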
Image Analysis
Five board-certified fellowship trained abdominal radiologists independently graded each examination using the 2018 version of the AAST injury scoring scale for splenic injuries. The experience levels of the readers were 2 (MB), 3 (MW), 6 (MT), 10 (PV), and 13 years (EC). The experience of the expert panel was 13 (EC) and 6 (DG) years. Each reviewer was provided with a copy of the AAST splenic injury scale prior to scoring any CT examinations. Scoring of the CT examinations was performed using a commercial web-based annotation platform (MD.ai, New York, NY) which shares many features of a PACS workstation environment. Radiologists were blinded to the content of the formal radiology reports and provided with axial, coronal, and sagittal CT images. Each CT was evaluated independently as either negative for splenic trauma (assigned a score of 0), or positive using the AAST CT grading from I to V.
Statistical Analysis
Statistical analysis was performed using Stata 14.2 software (StataCorp 2017, College Station, TX) and R 4.2.2 software (The R Foundation for Statistical Computing 2004-2021, Vienna, Austria). The inter-rater absolute agreement for AAST CT injury grade and for low-grade (AAST scores I to III) vs high-grade (AAST scores IV and V) splenic injury amongst the 5 radiologists was assessed using the Fleiss kappa statistic. Relative inter-rater agreement was also assessed using the Kendall coefficient of concordance. Data were presented using frequency tables. A P-value <.05 was considered statistically significant. Kappa values were interpreted based on the ranges and descriptors provided by McHugh. 20
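For readers unfamiliar with the Fleiss kappa statistic, the following is a minimal pure-Python sketch of the standard formula together with the McHugh descriptors. It is not the Stata or R code used in the study; the function names and toy ratings are illustrative assumptions.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss kappa for a list of cases, each rated by the same number of raters.

    ratings: list of per-case lists of category labels.
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]
    # Observed agreement: proportion of agreeing rater pairs within each case.
    per_case = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases
    # Chance agreement from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_cases * n_raters) for k in categories]
    p_expected = sum(s ** 2 for s in shares)
    return (p_observed - p_expected) / (1 - p_expected)


def mchugh_label(kappa):
    """Descriptor for a kappa value using the McHugh ranges."""
    if kappa <= 0.20:
        return "none"
    if kappa < 0.40:
        return "minimal"
    if kappa < 0.60:
        return "weak"
    if kappa < 0.80:
        return "moderate"
    if kappa <= 0.90:
        return "strong"
    return "almost perfect"


# Toy data: 2 cases, 5 raters, one dissenting rater per case.
toy = [[1, 1, 1, 1, 2], [2, 2, 2, 2, 1]]
toy_kappa = fleiss_kappa(toy, categories=[1, 2])
```

The toy example gives a kappa of 0.2 even though four of five raters agree on each case, which illustrates how strongly the statistic penalizes agreement that could arise by chance when few categories are in use.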
Qualitative Review of Disagreement
Examinations in which at least 2 raters disagreed in key clinical scenarios were reviewed by a senior fellowship trained abdominal radiologist (EC) and trauma surgeon (DG) to identify possible underlying causes of disagreement. They reviewed cases from each scenario until it was felt that no new underlying causes were being identified, totaling 15 cases. The first scenario was disagreement between no injury (AAST grade 0) and injury (AAST grade ≥ I). This is important as patients with no splenic injury may be discharged depending on their other injuries. Furthermore, if there is free fluid in the abdomen but no solid organ injury is identified, the trauma team may pursue further investigation and management algorithms to identify a source (e.g. occult hollow viscus injury). The second scenario was disagreement between low grade (AAST grade I-III) and high grade (AAST grade IV-V) injuries, which may impact clinical decision making for splenectomy, patient monitoring, risk of non-operative management failure, and likelihood of splenic artery embolization.
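The case selection for the two scenarios can be sketched as follows. The sketch assumes that a disagreement counts only when at least 2 of the 5 raters fall on each side of the split, and that grade 0 sits on the low side of the low vs high comparison (as the "AAST grade ≤ III" wording in the Results suggests). Both readings are assumptions, and the function names are illustrative.

```python
def has_disagreement(grades, is_positive, minimum=2):
    """True if at least `minimum` raters fall on each side of the split."""
    positive = sum(1 for g in grades if is_positive(g))
    negative = len(grades) - positive
    return min(positive, negative) >= minimum


def injury_disagreement(grades):
    """Scenario 1: no injury (grade 0) vs any injury (grade >= 1)."""
    return has_disagreement(grades, lambda g: g >= 1)


def low_high_disagreement(grades):
    """Scenario 2: high grade (>= 4) vs everything below."""
    return has_disagreement(grades, lambda g: g >= 4)


# Grades are integers 0-5 standing for AAST 0 and I-V (made-up examples).
lone_dissenter = [1, 1, 1, 1, 0]   # one rater out of step: not flagged
split_on_injury = [0, 0, 4, 4, 4]  # flagged in both scenarios
split_on_grade = [3, 3, 4, 4, 3]   # flagged only for low vs high
```

Requiring two raters on each side filters out single-reader outliers, so the flagged cases reflect genuinely ambiguous examinations rather than one reader's slip.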
Results
A total of 610 (172 female, 438 male; age range 18-97 years; mean age 44.4 ± 19.9 years) associated reports were manually reviewed and classified as being positive or negative for a splenic injury. Of these, 9 (1.5%) were rated by the majority of raters as no injury (AAST grade 0). This reflects that the cases were selected from CT scans with reported positive splenic injuries. There were 132 (21.6%) AAST Grade I, 184 (30.2%) Grade II, 130 (21.3%) Grade III, 104 (17.0%) Grade IV, and 51 (8.4%) Grade V injuries. The smaller number of high grade injuries may reflect both a real decreased frequency of occurrence and that those with high grade injuries may be more likely to proceed directly to laparotomy and therefore were not included.
Inter-Rater Agreement for Splenic Injury Scoring
Inter-Rater Agreement of Splenic Injury by AAST CT Score.
Qualitative Review of Disagreement
We then examined two key scenarios of disagreement for review by the expert panel of a radiologist and surgeon. The first scenario was when a minimum of 2 raters disagreed about no injury vs any injury (AAST grade ≥ I). This occurred in 34 (5.6%) cases. The median rating of splenic injury of the raters who identified an injury was AAST grade I (28 cases), but there were 5 cases otherwise rated as grade II injuries. After review of illustrative cases by a fellowship trained abdominal radiologist and trauma surgeon, the most common reason for discrepancy was interpretation of injury vs a cleft (Figure 1A) or artifact (Figure 1B) which would otherwise be interpreted as a grade I injury. In another case, a subtle subcapsular hematoma was not readily identified but could be seen with adjustments of the window width and level (Figure 1C). Finally, there was also a single case where two raters identified no injury, and three identified a grade IV injury (Figure 1D) due to differences in interpretation of subtle hypoattenuation (>50% of splenic parenchyma) that was best appreciated with a coarse window width and level.
Figure 1. Examples of discrepancy between no-injury vs injury (AAST Grade ≥ I). Disagreement due to the interpretations of a (A) grade II injury vs congenital cleft, (B) grade II injury vs streak artifact from the left arm, (C) subcapsular hematoma vs perisplenic fluid, and (D) heterogeneous enhancement vs grade IV injury.
In the second scenario, disagreement between at least 2 raters about high grade (AAST grade ≥ IV) vs low grade (AAST grade ≤ III) was identified in 46 (7.5%) cases. In the majority of these (82%, n = 38) the disagreements were between a median rating of AAST grade III and IV. In 7 cases, the disagreement was between AAST grade II and IV. In one case (Figure 1D), there was disagreement between AAST grade 0 and grade IV. The disagreement in the majority of these 7 cases was based on identification and interpretation of hypervascular foci as vascular injuries or sites of intraparenchymal bleeding (Figure 2A-C).
Figure 2. Examples of disagreement between low grade (AAST Grade I-II) and high grade (AAST Grade IV) injuries. (A) and (B) A cluster of lacerations is present at the upper pole of the spleen (arrowhead). A small focus of hyperdensity (arrows) is closely related to the lacerations and was considered a splenic vascular injury by 2 of the 5 readers. (C) Two radiologists interpreted the punctate hyperdensity (arrow) adjacent to a solitary laceration as a vascular injury.
Discussion
In this study we collected a large number (n = 610) of CT scans for blunt trauma patients with splenic injury presenting to a level 1 trauma centre over a period of more than 16 years. Each scan was independently reviewed by 5 fellowship trained abdominal radiologists and assigned an AAST OIS splenic injury grade to assess the inter-rater agreement. The key finding of our study was that absolute inter-rater agreement between grades was minimal (Fleiss kappa statistic .38, P < .001), with the best agreement being moderate for grade V injuries, and otherwise being weak (grade IV) or minimal (grades I-III). The inter-rater agreement improved substantially when accounting for the degree of disagreement between raters (Kendall coefficient of concordance for inter-rater relative agreement of .80, P < .001), demonstrating that disagreements between close grades of splenic injury contributed significantly to the overall rate of disagreement. To our knowledge, this is the first study of inter-rater agreement for the AAST classification system since the 2018 update.
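The contrast between absolute and relative agreement can be illustrated with a small sketch of the Kendall coefficient of concordance (W). The sketch uses average ranks and the standard tie correction. Whether the study's software applied the same correction is not stated, so this is an assumption, and the ratings are toy values.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings_by_rater):
    """Kendall's W with tie correction.

    ratings_by_rater: one list of grades per rater, same case order in each.
    Undefined if every rater gives all cases the same grade.
    """
    m = len(ratings_by_rater)
    n = len(ratings_by_rater[0])
    rank_sums = [0.0] * n
    tie_term = 0.0
    for grades in ratings_by_rater:
        for i, rank in enumerate(average_ranks(grades)):
            rank_sums[i] += rank
        tie_term += sum(t ** 3 - t for t in Counter(grades).values())
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_term)


# Two raters who match exactly on only 1 of 5 cases but order them alike.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 3, 4, 5, 5]
w = kendalls_w([rater_a, rater_b])
exact_matches = sum(a == b for a, b in zip(rater_a, rater_b))
```

In this toy example the two raters assign the same grade in only one of five cases, yet W is about 0.99 because they rank the cases almost identically. The same pattern, exact-grade disagreement alongside preserved ordering, is consistent with a low kappa occurring together with a high coefficient of concordance.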
While differences in rating between grade I and II injuries, for example, may not have significant clinical impact, we performed a qualitative review of instances where disagreement is of high clinical importance. We found that in 5.6% of cases, at least two raters disagreed about whether a splenic injury had occurred. This could have significant clinical implications, such as for patient disposition or for determining the source of any abdominal free fluid. Similarly, in 7.5% of cases at least two raters disagreed about whether an injury was high grade (AAST grade IV-V) or low grade (AAST grade I-III). Again, this could impact the decision to manage a patient non-operatively, the estimated risk of non-operative management failure, or the decision to perform splenic artery embolization.
We therefore performed a qualitative review of these cases by an abdominal radiologist and trauma surgeon, and found that likely causes of disagreement included interpretation of clefts or artifact which could otherwise be interpreted as a splenic laceration. Similarly, there were several differences in interpretation of peri-splenic fluid vs subcapsular hematoma. Another source of disagreement was inconsistent application of the rule for upgrading multiple low grade injuries to a higher grade. Lastly, particularly in disagreement between low and high grade injuries, the identification and interpretation of subtle vascular injuries likely contributed to the disagreement.
There is limited research on the inter-rater reliability of the AAST scoring system for blunt splenic injuries relative to its clinical importance in managing these patients. Our findings are consistent with Clark et al, 16 who found minimal inter-rater agreement in 64 patients with blunt splenic injury, though this was performed prior to the 2018 update. Our study has almost 10 times the number of patients and provides similar findings. In contrast, Olthof et al 15 found strong agreement in AAST grading prior to the 2018 update in 83 scans with three raters. They found strong agreement between two of their experienced raters, which likely contributed to the overall strength of agreement in the study. There have been no other studies reporting on inter-rater agreement of the AAST classification since the 2018 update. One major contributor to disagreement may be knowledge translation surrounding the 2018 update of the scoring system. The scoring system consists of a single table which lacks CT-specific illustrative cases, leaving significant room for interpretation of the wording. Similarly, the increase in grading for multiple injuries, mentioned only as a footnote to the table, lacks details and examples, leading to potential differences in application. For example, if three lacerations were identified in a scan, each of which on its own would be rated as a grade I injury, some raters would provide the overall grade as II, while others may grade it as III given the three injuries. Providing educational resources with illustrative cases may help to reduce these problems. Finally, the current grading system attempts to encompass imaging, surgical, and pathological findings, which significantly limits the detail available to each physician team for grading injuries.
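The ambiguity in the multiple-injury footnote can be made concrete with a short sketch of two hypothetical readings. Neither function is taken from the AAST document. The cap at grade III and both function names are assumptions made for illustration.

```python
CAP = 3  # assumed ceiling for upgrades due to multiple injuries


def advance_once(base_grade, n_injuries):
    """Reading A: any number of multiple injuries adds a single grade."""
    if n_injuries > 1 and base_grade < CAP:
        return base_grade + 1
    return base_grade


def advance_per_injury(base_grade, n_injuries):
    """Reading B: each additional injury adds one grade, up to the cap."""
    if base_grade >= CAP:
        return base_grade
    return min(base_grade + (n_injuries - 1), CAP)


# Three lacerations that would each be grade I on their own.
reading_a = advance_once(1, 3)
reading_b = advance_per_injury(1, 3)
```

For the three-laceration example in the text, reading A yields grade II and reading B yields grade III, which matches the two outcomes described above. A worked example in the scoring system documentation would remove this kind of divergence.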
While non-operative management has become the standard of care for hemodynamically stable blunt injury patients and is successful in >85% of patients in most modern series, increased splenic injury grade has been found to predict non-operative management failure. 8 The splenic injury grade is also critical for splenic artery embolization (SAE), with the Society of Interventional Radiology stating that SAE should be considered for stable patients with grade IV-V injuries. 21 Routine prophylactic embolization of grade IV-V injuries has been shown to have similar splenic rescue rates compared to surveillance and selective embolization but reduced hospital length of stay. 9 Discrepancies in splenic injury grading could therefore impact management plans. Our results show that the highest levels of agreement do occur in grade IV and V injuries, though agreement remains only weak and moderate, respectively. In 7.5% of cases, there was disagreement between two or more raters about high grade vs low grade injury, often due to subtle differences in identification and interpretation of vascular injuries. Therefore, extra care should be taken by the radiology and trauma team to identify these injuries.
This study is the first to measure inter-rater reliability using the AAST OIS 2018 update, and included a large number of scans (n = 610) compared to previous similar studies. The qualitative review by a radiologist and trauma surgeon also adds to our understanding of possible sources of disagreement. The study does have limitations, mainly that it was performed at a single center and therefore the results may not be generalizable to other centers. An important difference between centers may be the protocol used to acquire the CT. At our center, we used a split-bolus whole body CT resulting in a single acquisition that combines arterial and portal venous phase imaging. An advantage of this approach is decreased radiation exposure in this often young patient population. It also results in fewer images to store and interpret. Some studies, however, have suggested that the arterial phase improves performance in detecting active extravasation or contained vascular injury. 22 In a 2021 study by Hemachandran et al, 23 the addition of an arterial phase resulted in increased detection of vascular injuries, which in turn increased splenic injury grades using the AAST OIS 2018 update. It is therefore possible that inter-rater agreement may be different using a dual phase protocol. Given that our qualitative review found that identification and interpretation of subtle vascular injuries contributed to disagreement, particularly between low and high grade injuries, our center is contemplating a trial of a dual phase protocol for trauma patients.
Conclusion
We found low absolute agreement in grading of splenic injuries using the existing AAST OIS for splenic injuries, including for two key clinical scenarios that could significantly impact patient management decisions. Extra attention is therefore needed by radiology and trauma teams for identification and interpretation of subtle vascular injuries which can significantly impact splenic injury grading, and therefore possible clinical management of these patients. Improving the reliability of splenic injury grading may also improve prediction of non-operative management failure for spleen injury patients. Development of new knowledge translation methods such as representative images on the AAST website or creation of online modules could potentially improve agreement. Finally, the impact of dual phase CT protocols on inter-rater reliability should be evaluated further.
Footnotes
Author Contributions
RCAM participated in literature search, data analysis and interpretation, writing, and critical revision. MT participated in literature search, study design, data collection, analysis, and interpretation, writing, and critical revision. PV, MW, MB, and PC contributed with data collection and critical revision. HL participated in literature search, data collection and analysis. DG and EC contributed to study design, data collection, analysis, and interpretation, writing, and critical revision.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: David Gomez received funding from the Division of General Surgery, St. Michael’s Hospital, Innovation Funds. Errol Colak received funding from the Odette Professorship in Artificial Intelligence for Medical Imaging, St. Michael’s Hospital, Unity Health Toronto.
