Abstract
Objective:
Appendicitis is a common surgical emergency, and sonography is widely used to assess its presence or absence. The objective of this study was to identify factors associated with the diagnostic accuracy of sonography in patients with a suspected case of acute appendicitis.
Methods:
A retrospective review was conducted in all patients who were assessed for acute appendicitis (with sonography) in two emergency rooms of a large hospital. The sonography result was compared with the pathological report in patients who underwent surgery under suspicion of an acute appendicitis.
Results:
A multivariate analysis revealed that operator (radiologist or resident), time of the sonogram, site (hospital), and body mass index were independent variables significantly influencing the sonographic result. A correct sonographic diagnosis of suspected appendicitis was 2.2 times more likely when the examination was performed by a radiologist rather than a resident.
Conclusions:
Sonography is widely used in diagnosing appendicitis in the emergency room. In this study, the probability of a correct sonographic diagnosis in patients with suspected appendicitis decreased during on-call hours or when the examination was conducted by a resident. Furthermore, increasing body mass index decreased the probability of a correct sonogram when it was conducted by a resident compared to a radiologist.
Acute appendicitis is one of the most common surgical emergencies, 1 and diagnostic medical sonography (DMS) is widely used to diagnose it. A DMS examination has certain advantages and disadvantages compared to magnetic resonance imaging (MRI) and computed tomography (CT). The advantages of DMS are a lack of radiation exposure and wide availability. It has therefore been recommended as the first choice in evaluating patients with abdominal pain. 2 The disadvantage of DMS has been its limited diagnostic accuracy. Published estimates of the diagnostic accuracy of DMS in detecting appendicitis vary widely.3–6 One study even reported a rate of 82% inconclusive diagnoses with DMS. 4 Two Dutch studies have been published involving a diagnostic pathway using DMS as the primary diagnostic resource and adding CT. The conclusion of those studies was that adding CT after an inconclusive sonographic result yielded higher positive and negative predictive values. Both articles, however, recommended DMS as the first step in diagnosing appendicitis.7,8 An MRI examination is indicated for children,9,10 pregnant women, and when the DMS result is inconclusive for suspected appendicitis. 11
Due to the limited diagnostic accuracy, the rate of inconclusive DMS results for appendicitis may be high. This may be attributed to patient-related factors such as a higher body mass index (BMI) for age percentile (FAP) class, 12 greater abdominal wall thickness,13,14 lower pain scores, 14 and a retrocecal appendix.13,15 Furthermore, the inter- and intraobserver variabilities for DMS in suspected appendicitis are very high, with κ values of 0.15 to 0.20, 16 an interobserver variability of 0.7, 17 and an intraobserver variability of 0.39 to 0.42. 16 A prospective study concluded that DMS performed by residents, without an attending radiologist’s supervision, resulted in a high number of missed emergency conditions. 18
In the Netherlands, teaching hospitals generally have residents and attending radiologists performing DMS for appendicitis as part of the acute radiology service. A Dutch radiology residency program takes five years to complete. During this time, residents work under the supervision of an attending radiologist with varying degrees of independence, based on their experience. Every resident has at least three months of training in both DMS and CT before being available for on-call service. In hospitals without radiology training programs, attending radiologists conduct the radiology service during both office hours and on-call hours. In one of the larger hospitals in the Netherlands, there are two sites where patients are assessed in the emergency room for acute abdominal conditions. The first site (site 1) does not have a radiology training program; therefore, only attending radiologists conduct the radiology service in its emergency room. The second site (site 2) has a training program for radiologists, and therefore both residents and attending radiologists provide the services in its emergency room. During office hours at site 2, either a resident or an attending radiologist performs DMS, but during on-call hours, a resident performs this service. At site 1, attending radiologists conduct DMS during both on-call hours and office hours.
DMS is a dynamic examination that requires experience to visualize the appendix and to interpret the imaging findings. At one of the sites, there was a concern that DMS had a lower chance of identifying patients with acute appendicitis. The hypothesis, based on the delivery of services at that site, was that patients assessed during on-call hours might have a less conclusive DMS result than those assessed during regular working hours. In this study, the researchers evaluated the differences in conclusive DMS results for appendicitis between residents and attending radiologists, between times of DMS performance (office hours and on-call hours), and between site 1 and site 2.
Materials and Methods
Records were retrospectively reviewed for patients who presented to the emergency room with suspected acute appendicitis during 2015 and 2016. Both children and adults were included if they had the diagnostic code for appendicitis. The pathology report was considered the gold standard for confirming the diagnosis of acute appendicitis. Based on the pathology report, only patients with a proven diagnosis of appendicitis were included in this review. Patients were excluded from the study if they had not undergone an appendectomy, given the lack of pathological confirmation. After the inclusion and exclusion criteria were met, 435 patients were included in this review. The hospitals’ joint medical ethical committee granted approval (17-N-46), and informed consent was waived.
The data collected were as follows: patient characteristics, DMS report, pathology report, and the site of presentation. The following patient characteristics were noted: age, sex, BMI (in kg/m²), suspicion of appendicitis (extracted from the DMS request form), duration of complaints (in three groups: <12 hours, 12–24 hours, and >24 hours), white blood cell count (mmol/L), and C-reactive protein (CRP) value (in mmol/L). In cases where DMS was performed, the result was extracted from the radiologist’s report. The primary outcome measurement for appendicitis on DMS imaging was an appendix diameter of more than 6 mm; suggestive secondary findings included the presence of an appendicolith, inflammatory fat changes, and free peritoneal fluid. 19 With these results, the resident or attending radiologist was able to conclude whether the DMS was positive, negative, or equivocal for appendicitis. These conclusions were noted and compared to the findings provided in the pathology report (appendicitis or no appendicitis).
Furthermore, we collected data regarding who performed the DMS (resident or attending) and the time (office hours or on-call hours). If a resident required help from an attending radiologist in performing the DMS, the attending radiologist was considered the operator. On-call hours of DMS were defined as follows: if the examination took place between 5:00
Statistical Analysis
All analyses were performed using IBM SPSS version 24 (SPSS, Inc., an IBM Company, Chicago, Illinois). Patient characteristics were described using mean and standard deviation (SD) or frequency and percentage. The DMS results were compared between operators (resident or attending radiologists). Differences in patient and procedural characteristics were tested between operators, between office hours and on-call hours, and between sites. Clinical suspicion, sex, duration of complaints, examiner, time of evaluation, site, and pathology examination were compared using Pearson’s χ² test. Age, BMI, and white blood cell count were compared using the independent t test. CRP was compared using the Mann-Whitney test due to the highly skewed distribution of CRP values. Potential confounding by relevant baseline characteristics was corrected for using multiple logistic regression. To assess the influence of BMI in a logistic regression, subjects were divided into three categories: an underweight group (BMI less than 20.00), an average-weight group (BMI from 20.00–24.99), and an overweight group (BMI 25.00 and larger). Furthermore, operator, time, and site were examined. Sensitivity was calculated by comparing the results of the DMS with the pathology results. Since only patients with a pathologically proven appendicitis were included, only sensitivity could be determined. This was examined by comparing the number of positive DMS results with the total number of cases of appendicitis proven by the pathology report.
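The BMI grouping and the sensitivity calculation described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study. The function names and the example counts are hypothetical, and the sketch assumes that any non-positive DMS result (negative or equivocal) in a pathology-proven case counts against sensitivity.

```python
from typing import Iterable


def bmi_category(bmi: float) -> str:
    """Map a BMI (kg/m^2) to the three groups described in the text."""
    if bmi < 20.0:
        return "underweight"   # BMI less than 20.00
    if bmi < 25.0:
        return "average"       # BMI 20.00 to 24.99
    return "overweight"        # BMI 25.00 and larger


def sensitivity(dms_positive: Iterable[bool]) -> float:
    """Share of pathology-proven cases that had a positive DMS result.

    Assumption: each element is True for a positive DMS report and
    False for a negative or equivocal one.
    """
    results = list(dms_positive)
    if not results:
        raise ValueError("no cases supplied")
    return sum(results) / len(results)


# Hypothetical example (not study data): 10 proven cases, 6 positive on DMS.
example = [True] * 6 + [False] * 4
print(bmi_category(24.99), sensitivity(example))
```

Because only pathology-proven cases are available, the denominator contains true positives and false negatives only, which is why specificity cannot be derived from the same data.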
The χ² test was used to test for differences in the proportion of false negatives of DMS between site 1 and site 2 during office hours, during on-call hours, and in total. The same was done to test for differences in the proportion of false negatives between residents and attending radiologists during office hours, during on-call hours, and in total, and between on-call hours and office hours in total.
A P value <.05 was considered statistically significant. As there were three groups for duration of complaints, the output was corrected for multiple testing using the Bonferroni correction. Hence, the α used for testing was reduced from 0.05 to 0.05/3 (approximately 0.017).
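As an illustration of the two steps above, the following Python sketch computes a Pearson χ² test for a 2×2 table and a Bonferroni-adjusted threshold. It is a sketch under stated assumptions rather than the SPSS routine used in the study: no continuity correction is applied, the P value uses the closed form for one degree of freedom, and the function names and counts are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its P value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts (not study data): rows are two groups,
# columns are correct / false-negative DMS results.
chi2, p = chi2_2x2(20, 10, 10, 20)
print(round(chi2, 3), round(p, 4), round(bonferroni_alpha(0.05, 3), 4))
```

With three duration groups, a comparison would be called significant only if its P value fell below the adjusted threshold of 0.05/3.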
Results
A total of 435 patients were included in this study. The baseline characteristics of the participants were reported, stratified by operator (resident vs. attending). There were no statistically significant differences in the baseline characteristics, stratified by operator (Table 1). Furthermore, there were no statistically significant differences in the percentage of patients with a pathologically proven appendicitis (96% vs. 95%, P = .52). A statistically significant difference was observed in the percentage of positive DMS results for appendicitis between operators (resident = 55%, attending = 66%, P = .032). There were no significant differences when comparing the baseline characteristics of DMS during on-call hours and office hours (Table 2). There was also no statistically significant difference in the percentage of positive DMS results for appendicitis between office hours or on-call hours (office hours = 64%, on call = 60%, P = .387) (Table 2).
Table 1. Differences in Baseline Demographics in Patients Undergoing DMS, Performed by a Resident or an Attending Radiologist.
Abbreviations: BMI, body mass index; CRP, C-reactive protein; DMS, diagnostic medical sonography.
P values were calculated to assess a significant difference in characteristics. Values are numbers (percentages) or mean (standard deviation).
P value ≤.05 is significant.
Table 2. Differences in Baseline Demographics in Patients Undergoing DMS During Office Hours or On-Call Hours.
Abbreviations: BMI, body mass index; CRP, C-reactive protein; DMS, diagnostic medical sonography.
P values were calculated to assess a significant difference in characteristics. Values are numbers (percentages) or mean (standard deviation).
P value ≤.05 is significant.
Analysis of the patients seen at the different sites indicated that there were no statistically significant differences in the baseline characteristics or percentage of positive DMS results (Table 3).
Table 3. Differences in Baseline Demographics in Patients Undergoing DMS in Site 1 or Site 2.
Abbreviations: BMI, body mass index; CRP, C-reactive protein; DMS, diagnostic medical sonography.
P values were calculated to assess a significant difference in characteristics. Values are numbers (percentages) or mean (standard deviation).
P value ≤.05 is significant.
Table 4 shows the comparison between correct and incorrect results after DMS. There was no statistically significant difference between resident and attending radiologists (P = .83). There was, however, a significant difference in results between operators when DMS was carried out during on-call hours, with an odds ratio indicating that DMS performed during on-call hours by an attending radiologist was more likely to be conclusive for appendicitis (odds ratio [OR], 2.2; 95% confidence interval [CI], 1.10–4.31; P = .02). The same pattern applied to the hospital’s two sites: there was no statistically significant difference between them during office hours (P = .63), but during on-call hours, DMS results for suspected appendicitis were less likely to be correct in site 2 compared to site 1 (OR, 0.32; 95% CI, 0.16–0.66; P = .002). No overall difference was seen between DMS being performed during on-call hours and during office hours (P = .99).
Table 4. Reporting χ² Values of the Different Groups, Comparing Correct vs. Incorrect Results.
Abbreviations: CI, confidence interval; OR, odds ratio.
Numbers are values.
P value ≤.05 is significant.
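Odds ratios with 95% confidence intervals of the kind reported for correct versus incorrect DMS results can be derived from a 2×2 table. The Python sketch below uses the standard log-odds (Woolf) interval. Whether SPSS output in the study was produced this way is an assumption, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio and Woolf (log-based) confidence interval.

    a, b: correct and incorrect results in the group of interest
    c, d: correct and incorrect results in the reference group
    z:    normal quantile (1.959964 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data).
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(round(or_value, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1, as in the on-call comparisons above, corresponds to a statistically significant difference at the 5% level.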
Multiple logistic regression analyses were conducted to predict a true-positive DMS result in patients with proven appendicitis (Table 5). The influence of the operator was assessed; the attending radiologist was used as the reference group. A difference was observed between attending radiologists and residents on the probability of a true-positive DMS result for pathologically proven appendicitis (P = .047). An association was noted between the resident and the odds of a true-positive result when compared to an attending radiologist (OR, 0.643; 95% CI, 0.415–0.995).
Table 5. Predicting Correct Results of Diagnostic Medical Sonography Using Simple and Multivariate Logistic Regression.
Abbreviations: AW, average weight; BMI, body mass index; CI, confidence interval; HW, heavy weight; LW, light weight.
Numbers are values. Model 1 consisted of operator, time, and site and the three mutual interactions. After backward logistic regression, only the interaction between site and time was statistically significant. Model 2 consisted of operator, time, site, and the interactions among operator, BMI (three groups), and site/location. Both interactions were statistically significant.
P value ≤.05 is significant.
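The relation between a logistic regression coefficient, its odds ratio, and the reported confidence interval can be checked with a short Python sketch. It takes the operator result reported above (OR, 0.643; 95% CI, 0.415–0.995) and assumes a Wald-type interval on the log-odds scale, which is a common default but is not confirmed by the source. The function name is hypothetical.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def wald_from_or(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover coefficient, standard error, z, and P from an OR and its CI.

    Assumption: the interval is exp(beta +/- Z95 * se), i.e. a Wald interval.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, se, z, p_value


# Values reported in the text for residents vs. attending radiologists.
beta, se, z, p_value = wald_from_or(0.643, 0.415, 0.995)
inverse_or = 1 / 0.643  # same effect with residents as the reference group
print(round(beta, 3), round(se, 3), round(p_value, 3), round(inverse_or, 2))
```

Under this assumption the recovered P value is close to the reported P = .047, and the reciprocal of the odds ratio expresses the same association from the attending radiologist's perspective.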
The influence of the site (attending radiologist and residents [site 2] vs. only attending radiologists [site 1]) was assessed, with site 1 as the baseline group. The site of DMS examination had a statistically significant influence on the DMS result (P = .028). An association was also noted between site 2 and the odds of a true-positive result compared to site 1 (OR, 0.61; 95% CI, 0.39–0.95). The influence of BMI was assessed, with the heavy-weight category (BMI ≥25) as the reference group. Compared to the reference, the average-weight group (BMI 20–24.99) had a statistically significant influence on the DMS result (P = .014), but the light-weight group (BMI <20.00) did not seem to differ from the reference group (P = .192). There was an association between patients of average weight and the odds of a true-positive result compared to the heavy-weight group (OR, 2.02; 95% CI, 1.15–3.55).
In a backward stepwise elimination test using operator, time, site, and mutual interactions as predictors, the interaction of time and site was statistically significant (P = .021). The baseline group was the interaction of site 2 and office hours. An association was observed between the DMS results during office hours in site 2 and the odds of a true-positive result (OR, 2.98; 95% CI, 1.18–7.51).
A backward test using operator, time, site, and BMI and the interactions of time and site, as well as the three BMI groups and operator as predictors, demonstrated that both interactions were statistically significant. The baseline groups were site 2 and office hours as well as resident and the heavy-weight group. The interaction of time and site was statistically significant (P = .0231). In this model, an association was observed between DMS results during office hours in site 2 and the odds of a true-positive result (OR, 4.06; 95% CI, 1.22–13.53). The interaction between operator and BMI was statistically significant. In this model, there was an association between DMS performed by residents in the average-weight group and the odds of a true-positive result (OR, 3.61; 95% CI, 1.07–12.11). There was also an association between DMS performed by residents in the light-weight group and the odds of a true-positive result (OR, 7.68; 95% CI, 1.34–44.02).
Discussion
The aim of this study was to identify factors associated with DMS accuracy in patients with appendicitis. The study hypothesis was that the operator, time, and site for DMS had a significant influence on the correct diagnosis of acute appendicitis. The results show significant differences in conclusive DMS results for appendicitis, with operator, time, and site as independent variables at this institution. BMI also had a significant influence on the DMS result. The study results suggest that operators have more difficulty examining patients with an increasing BMI. When comparing correct and incorrect results between different groups, site 1 had more correct results than site 2. When comparing examiner and site for DMS, results were only statistically significant if the DMS was performed during on-call hours. This might suggest that supervision by an attending radiologist is more readily available during office hours.
Operator, site, and BMI were all influential in obtaining the correct diagnosis of appendicitis. But when comparing operator, time, and site in a model, the interaction of time and site suggests that DMS in site 2, during normal office hours, was three times more likely to be diagnostically accurate for appendicitis. Taking into account that site 2 has a lower OR (0.325 in model 1, 0.611 as sole predictor) compared to site 1 for correctly diagnosing appendicitis, one could assume that DMS conducted in site 2 during office hours was significantly more accurate in diagnosing appendicitis than on-call DMS in site 2. When adding BMI to the model, the interaction of BMI and operator suggests that a radiology resident has more difficulty reaching a conclusive diagnosis with DMS in patients with an increasing BMI. There were no significant differences between the light-weight and average-weight groups. However, there was a statistically significant difference when operator and BMI were compared as an interaction. This suggests that residents may have difficulty performing DMS in patients with a higher BMI.
The strength of this study is that it evaluates the influence of operator and time on the diagnostic result. In this study, no distinction was made in level of experience of the attending radiologists, but a significant difference was observed between residents and attending radiologists. In many hospitals in the Netherlands, the on-call radiology service is conducted by residents; this study therefore adds valuable information when consulting the radiology service for cases of suspected appendicitis. Further research could be done to assess the different experience levels of the residents and their influence on the DMS result.
This study is not only medically relevant but also has economic implications. Appendicitis is a common abdominal emergency, 1 and knowing that in certain cases the accuracy of DMS in suspected appendicitis is low, a CT could be the primary investigation to be conducted. DMS could be more time efficient due to more unequivocal diagnoses when compared to other imaging modalities.16,20–22 CT is known to be more sensitive, resulting in a shorter time to diagnosis and therefore leading to a more efficient patient flow in the emergency room.
There are certain limitations to this study. Foremost, this was a retrospective study, and patients were not examined by both residents and attending radiologists. It would be interesting to assess true differences in DMS accuracy in both groups of operators. Second, the ultrasound equipment was not the same at both sites. Information regarding the specific equipment used for a patient was not recorded for this study. This may have a confounding effect on the results. Third, experience levels of the resident or attending radiologists performing DMS were unknown, possibly leading to a generalized view of all residents and attending radiologists. Finally, another potential disadvantage was that only patients with a pathologically proven appendicitis were included. This study parameter could have led to biased results, as patients with a clinical suspicion of appendicitis but lacking pathological evidence were excluded. The specificity of DMS could therefore not be calculated in this study.
There are several clinical implications. First, in cases in which an equivocal DMS result is expected, CT may be the preferred imaging modality to shorten the time to diagnosis in the emergency room. Second, attending radiologists could play a greater role in achieving a conclusive diagnosis by supervising radiology residents when needed.
Conclusion
DMS is widely used in diagnosing appendicitis in the emergency department. The probability of diagnostic accuracy for DMS decreased during on-call hours or when conducted by a radiology resident at this particular institution. Increasing BMI reduced the probability that DMS would yield the correct diagnosis when conducted by residents compared to attending radiologists.
This study examined the differences between residents and attending radiologists, as well as the time of DMS, for appendicitis with pathologic confirmation. It also explored the difficulty that residents have in accurately assessing patients by using DMS. 18 These results suggest that there may be a correlation between increasing BMI and the experience level of the operator, with an OR of 1.20 for residents compared to 1.06 for an increasing BMI.
A considerable body of evidence suggests that DMS performed during on-call hours or by a radiology resident is less likely to correctly diagnose appendicitis. This could lead to a longer time to diagnosis in the emergency department for those patients suspected of having acute appendicitis.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
