
Abstract

Objective

Artificial intelligence (AI) could help medical practitioners in analyzing radiological images to determine the presence and site of bowel obstruction. This retrospective diagnostic study proposed a series of deep learning (DL) models for diagnosing bowel obstruction on abdominal radiograph.

Methods

A total of 2082 upright plain abdominal radiographs were retrospectively collected from four hospitals. The images were labeled as normal, small bowel obstruction and large bowel obstruction by three senior radiologists based on comprehensive examinations and interventions within 48 hours after admission. Gradient-weighted class activation mapping was used to visualize the inferential explanation.

Results

In the validation set, the Xception-backboned model achieved the highest accuracy (0.863), surpassing the VGG16 (0.847) and ResNet50V2 (0.836) models. In the test set, the Xception model (accuracy: 0.807) outperformed the other models and a junior radiologist (0.780) but not a senior radiologist (0.840). In the AI-aided diagnostic framework, the junior and senior radiologists made their judgements while aware of the Xception model predictions. Their accuracy improved to 0.887 and 0.913, respectively.

Conclusions

We developed and validated DL-based computer vision models for diagnosing bowel obstruction on plain abdominal radiograph. DL-based computer-aided diagnostic systems could reduce medical practitioners’ workloads and improve diagnostic accuracy.

Keywords

Computer vision, deep learning, Xception model, gradient-weighted class activation mapping, bowel obstruction, plain abdominal radiograph, diagnostic accuracy, convolutional neural network

Introduction

Bowel obstruction is a common cause of acute abdominal pain and is frequently encountered in emergency departments. It accounts for approximately 15% of cases presenting with abdominal pain.¹ The blockage of the gastrointestinal tract due to abnormalities such as adhesions, hernias or tumors leads to the dilation of the bowel upstream.^2,3 Small bowel obstruction (SBO) is more common than large bowel obstruction (LBO), although symptoms of SBO and LBO often overlap.⁴ The early diagnosis of bowel obstruction is important to minimize morbidity and mortality. Practitioners must obtain a comprehensive medical history to identify important risk factors associated with bowel obstruction. To guide treatment decisions, medical imaging is used to determine the severity and cause of the obstruction.^5,6

Abdominal radiography is commonly used as the initial imaging modality for suspected SBO patients owing to its wide availability and cost-effectiveness. The accuracy of this technique ranges from 50% to 86%.^7,8 A recent study involving trainee, junior and senior radiologists found that the accuracy on supine and upright abdominal radiographs ranged from 69% to 93%.⁸ Correct diagnosis of bowel obstruction requires long-term training for radiologists.

The development of machine learning applications in medical imaging has gained considerable attention in both academia and industry.⁹ Deep learning (DL), a subfield of machine learning that uses deep convolutional neural networks, has gained popularity because of its exceptional performance in image analysis tasks.^10,11 Several previous studies have reported on the application of DL in the detection of pulmonary nodules, the diagnosis of gastrointestinal lesions and the progression of diabetic retinopathy.^12–14

In this multicenter study, we aimed to develop and validate a series of DL models for diagnosing and classifying bowel obstruction on upright plain abdominal radiograph, and to compare their performance with that of junior and senior radiologists.

Methods

This retrospective diagnostic study was approved by the ethics committee of The First Affiliated Hospital of Soochow University (approval number: 2022098). The ethics committee of The First Affiliated Hospital of Soochow University waived the requirement for written informed consent owing to the retrospective nature of the study and the use of deidentified data. The study was conducted in accordance with the Helsinki Declaration of 1975, as revised in 2013. The reporting of the study conformed to the Checklist for Artificial Intelligence In Medical Imaging (CLAIM).¹⁵

Datasets

Patients with suspected bowel obstruction who underwent upright plain abdominal radiography were recruited from four hospitals: #1, The First Affiliated Hospital of Soochow University; #2, Suzhou Hospital of Traditional Chinese Medicine; #3, Kowloon Hospital of Shanghai Jiao Tong University; and #4, Jintan Affiliated Hospital of Jiangsu University. Patients were randomly selected between 2015 and 2021. The abdominal radiographs were obtained within 12 hours after admission. Data from the first three hospitals were used as training and validation datasets; data from the fourth hospital served as an independent test dataset. Patients’ personal data, such as name and sex, were deidentified to prevent unauthorized use of the data.

Images were labeled based on comprehensive data from further examinations (e.g., computed tomography [CT], magnetic resonance imaging, colonoscopy and surgical operation) within 48 hours after admission. Using these data as the ground truth, three independent senior radiologists (each with more than 15 years of experience) labeled the images as normal, SBO or LBO.

Model training

After selection and labeling, a total of 1932 upright plain abdominal radiographs (Normal: 640; SBO: 654; LBO: 638) from three hospitals (#1, #2 and #3) were used to train the models. These radiographs were split into a training set (Normal: 512; SBO: 523; LBO: 510) and a validation set (Normal: 128; SBO: 131; LBO: 128) at an 8:2 ratio.
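The per-class counts above are consistent with a class-stratified 8:2 split. The following is a minimal sketch of such a split, assuming stratification by label and a fixed random seed; the function name `stratified_split` and the seed are illustrative and are not taken from the study's code.

```python
import random
from collections import Counter


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices into training and validation sets, class by class.

    Hypothetical helper: the study reports only the resulting counts,
    not how the split was implemented.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_fraction)  # floor of 80% of the class
        train.extend(indices[:cut])
        val.extend(indices[cut:])
    return train, val


# Class sizes reported for hospitals #1 to #3.
labels = ["Normal"] * 640 + ["SBO"] * 654 + ["LBO"] * 638
train_idx, val_idx = stratified_split(labels)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
```

With the reported class sizes, taking the floor of 80% per class reproduces the training and validation counts given above.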

All radiographs were saved in JPEG format. All images were rescaled to 331 × 331 pixels, and the pixel values were then normalized from 0–255 to 0–1. Three convolutional neural network (CNN) backbones, VGG16, ResNet50V2 and Xception, were selected. For transfer learning, three fully connected layers (ReLU activation) and one dense output layer (Softmax activation) were added on top of each backbone. The backbones had been pretrained on the ImageNet database (www.image-net.org), and the pretrained parameters were obtained from Keras (https://keras.io/2.15/api/applications/). The models were trained using Python (version 3.8.18; https://www.python.org) and TensorFlow (version 2.13.0; https://www.tensorflow.org). The models were compiled with the Adam optimizer and the categorical cross-entropy loss function, and were trained with a fixed learning rate of 0.0001 and a batch size of 32. The code for the training procedure is available at https://osf.io/4tdhu.
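The configuration described above can be sketched in Keras roughly as follows. This is an illustration under stated assumptions, not the authors' code (which is available at the OSF link): the hidden layer widths (512, 256, 128), the global average pooling and the decision to leave the backbone trainable are not given in the text and are chosen here only to make the example complete.

```python
import tensorflow as tf

IMAGE_SIZE = (331, 331)  # input size stated in the text
NUM_CLASSES = 3          # normal, SBO, LBO


def build_classifier(weights="imagenet", hidden_units=(512, 256, 128)):
    """Xception backbone plus three ReLU dense layers and a softmax output.

    hidden_units is an assumption; the paper does not report layer widths.
    """
    backbone = tf.keras.applications.Xception(
        include_top=False,
        weights=weights,              # ImageNet-pretrained parameters
        input_shape=IMAGE_SIZE + (3,),
        pooling="avg",                # assumption: global average pooling
    )
    inputs = tf.keras.Input(shape=IMAGE_SIZE + (3,))
    x = tf.keras.layers.Rescaling(1.0 / 255)(inputs)  # 0-255 -> 0-1
    x = backbone(x)
    for units in hidden_units:
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier()` downloads the ImageNet weights; training would then use `model.fit(..., batch_size=32)` with one-hot encoded labels. Swapping `tf.keras.applications.Xception` for `VGG16` or `ResNet50V2` gives the other two backbones.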

Comparison of model and radiologist performance

To further evaluate the performance of the models, abdominal radiographs from the test dataset were also read by two additional radiologists (a junior radiologist with 5 years of experience and a senior radiologist with 17 years of experience).

Visualization of the model

The visualization of the models was performed using gradient-weighted class activation mapping (Grad-CAM) to provide an inferential explanation.^12,16 Grad-CAM uses the class-specific gradient information in the final convolutional layers of the CNN backbones to map the key areas of images.
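As a brief sketch of the computation described above, the standard Grad-CAM formulation for a class c can be written as follows. The notation is an assumption introduced for illustration and does not appear in the original text: A^k is the k-th feature map of the final convolutional layer, y^c is the model score for class c before the softmax, and Z is the number of spatial positions in the feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left( \sum_{k} \alpha_k^c A^k \right)
```

The ReLU keeps only the regions that contribute positively to class c. The resulting map is upsampled to the input size and overlaid on the radiograph.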

Statistical analysis

Statistical analysis was conducted using R studio (version 4.2.2; www.r-project.org). True positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) were enumerated to assess the classifiers.

Accuracy is the proportion of samples that were classified correctly among all samples, as shown in Equation 1.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{1} \]

Recall is the proportion of actual positive samples that were correctly predicted as positive, as shown in Equation 2.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{2} \]

The Matthews correlation coefficient (MCC) measures the correlation between the actual and predicted classes, as shown in Equation 3. The MCC is often regarded as an informative single-value classification metric because it summarizes all four cells of the confusion matrix.

\[ \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{3} \]

Cohen’s kappa was used to measure the level of agreement between two raters who each classified items into mutually exclusive categories, as shown in Equation 4.

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \tag{4} \]

where p_o is the relative observed agreement between the raters and p_e is the hypothetical probability of chance agreement.
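The metrics in Equations 1 to 4 can all be derived from a confusion matrix. The sketch below is illustrative only. It assumes the multiclass generalization of the MCC (Equation 3 gives the binary form, and the text does not say which variant was used for the three-class task), and the example matrix is made up for demonstration rather than taken from the study.

```python
from math import sqrt


def summarize(cm):
    """Compute metrics from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes every class has at least one true sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    true_counts = [sum(cm[k]) for k in range(n)]                        # row sums
    pred_counts = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # column sums

    accuracy = correct / total                                # Equation 1
    recall = [cm[k][k] / true_counts[k] for k in range(n)]    # Equation 2, per class

    # Cohen's kappa (Equation 4): observed vs chance agreement.
    p_o = accuracy
    p_e = sum(t * p for t, p in zip(true_counts, pred_counts)) / total**2
    kappa = (p_o - p_e) / (1 - p_e)

    # Multiclass MCC; reduces to Equation 3 for a 2 x 2 matrix.
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = sqrt(
        (total**2 - sum(p * p for p in pred_counts))
        * (total**2 - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0

    return {"accuracy": accuracy, "recall": recall, "kappa": kappa, "mcc": mcc}


# Hypothetical 3-class matrix (rows: true Normal, SBO, LBO), not study data.
example = [
    [45, 3, 2],
    [4, 40, 6],
    [3, 5, 42],
]
metrics = summarize(example)
# Macro recall over SBO and LBO, i.e. the mean of the two per-class recalls.
macro_recall = sum(metrics["recall"][1:]) / 2
```

Here the model's predictions and the ground-truth labels play the roles of the two raters in Cohen's kappa.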

Results

The study flowchart is shown in Figure 1. The process of image collection is shown in Figure 2.

Figure 1.

Study flowchart. The flowchart of the study comprises three parts: (1) using a CNN model pretrained on ImageNet; (2) training and validating the CNN model based on 1932 abdominal radiographs; (3) testing the final model based on 150 external abdominal radiographs. SBO, small bowel obstruction; LBO, large bowel obstruction; CNN, convolutional neural network.

Figure 2.

Process of image collection. A total of 1932 radiographs (Normal: 640; SBO: 654; LBO: 638) were collected from three hospitals to train the CNN model. A total of 150 radiographs (Normal: 50; SBO: 50; LBO: 50) were collected to test the trained model. SBO, small bowel obstruction; LBO, large bowel obstruction; CNN, convolutional neural network.

Model performance in the validation set

The confusion matrices of the three models in the validation set are plotted in Figure 3(a). The Xception-backboned model achieved the highest accuracy of 0.863, surpassing the VGG16 (0.847) and ResNet50V2 (0.836) models. The Xception model also demonstrated the highest recall for SBO and LBO, reaching 0.815 and 0.893, respectively, with a macro recall of 0.854.

Figure 3.

Confusion matrices of the models and radiologists in the validation and test sets. (a) Confusion matrices of the models (VGG16, ResNet50V2, Xception) in the validation set and (b) confusion matrices of the models (VGG16, ResNet50V2, Xception) and radiologists (junior and senior) in the test set. SBO, small bowel obstruction; LBO, large bowel obstruction.

Model performance in the test set

The confusion matrices of the models in the test set are plotted in Figure 3(b). The Xception-backboned model outperformed the others, with an accuracy of 0.807, followed by VGG16 (0.780) and ResNet50V2 (0.773) (Table 1). Similarly, the Xception model showed superior recall for SBO and LBO (0.800 and 0.780, respectively) and a macro recall of 0.790. In terms of multiclass metrics, the Xception model remained the best performer, with the highest MCC and Cohen’s kappa values (0.710 and 0.710, respectively).

Table 1.

Performance of deep learning models and radiologists in the test dataset.

	Accuracy	Recall (SBO)	Recall (LBO)	Macro recall	Matthews correlation coefficient	Cohen’s kappa
VGG16	0.780	0.740	0.780	0.760	0.671	0.670 [0.570–0.777]
ResNet50V2	0.773	0.780	0.740	0.760	0.661	0.660 [0.560–0.760]
Xception	0.807	0.800	0.780	0.790	0.710	0.710 [0.620–0.800]
Junior radiologist	0.780	0.800	0.800	0.800	0.661	0.660 [0.560–0.760]
Senior radiologist	0.840	0.840	0.860	0.850	0.760	0.760 [0.670–0.850]
Xception + Junior radiologist	0.887	0.900	0.800	0.890	0.820	0.820 [0.740–0.900]
Xception + Senior radiologist	0.913	0.920	0.920	0.920	0.870	0.870 [0.800–0.940]

SBO, small bowel obstruction; LBO, large bowel obstruction.

Comparison of model and radiologist performance

To further investigate the models, their performance in the test set was compared with that of the junior and senior radiologists. The junior radiologist achieved an accuracy of 0.780, with SBO and LBO recall both 0.800, along with MCC and Cohen’s kappa values of 0.661 and 0.660, respectively (Table 1). The senior radiologist achieved an accuracy of 0.840, with SBO and LBO recall of 0.840 and 0.860, respectively, resulting in a macro recall of 0.850, and MCC and Cohen’s kappa values of 0.760 and 0.760 [0.670–0.850], respectively.

AI-aided performance

In the AI-aided diagnostic framework, the junior and senior radiologists made their judgements while aware of the Xception model predictions. Both showed an improvement in accuracy (junior radiologist: from 0.780 to 0.887; senior radiologist: from 0.840 to 0.913).

The Grad-CAM heatmap

Grad-CAM heatmaps were generated from the gradient information of the last convolutional layer of the Xception model and highlighted the lesions in the original images (Figure 4). The left column displays the original abdominal radiographs. The middle column shows the Grad-CAM heatmap computed from the output of the last convolutional layer. The right column shows the Grad-CAM heatmap overlaid on the original abdominal radiographs, highlighting the key areas for inferential explanation identified by the Xception-backboned model.

Figure 4.

Grad-CAM heatmap. The left column displays the original abdominal radiographs. The middle column illustrates the output of the last convolutional layer. The right column shows the Grad-CAM heatmaps overlaid on the original abdominal radiographs, which highlight the key areas for inferential explanation identified by the Xception-backboned model.

Discussion

This study explored a series of DL models for computer-aided diagnosis of bowel obstruction on plain abdominal radiograph. Three CNN backbones were chosen and developed into multiclass classification models. Among them, the Xception-backboned model performed the best and outperformed a junior radiologist. In the proposed AI-aided diagnostic framework, the performance of both the junior and senior radiologists improved.

Although CT scans are superior at identifying obstructions, determining obstruction sites and demonstrating early-stage complications, they are usually performed after an abnormal abdominal radiograph. Moreover, abdominal radiographs have advantages for identifying obstructions. First, this technique can be used in less-developed areas or community health examination centers.¹⁷ Second, it is commonly used for initial screening of abdominal pain and post-surgical follow-up owing to its widespread availability, low cost, low radiation exposure and continuous tracking abilities.¹⁸ Finally, one study indicated that CT scans should not be frequently used in the decision-making process except when clinical symptoms, physical examinations and radiograph data are not conclusive for bowel obstruction.¹⁹ However, accurately identifying obstruction sites on radiographs can be challenging, especially for non-radiologists or less experienced radiologists.²⁰ The presence of both normal bowel gas and dilated bowel often leads to ambiguity and to missed or misdiagnosed obstruction sites.²

To address this challenge, computer-assisted decision support systems could help medical practitioners to analyze radiological images and determine the presence and site of bowel obstructions. Vanderbecq et al.²¹ developed a DL model to locate transitional zones using 562 CT scans of adhesion-related SBO, achieving an area under the receiver operating characteristic curve of 0.934. Oh et al.²² used CT images to generate a prediction model for identifying high-risk acute SBO patients; the model showed superior performance (accuracy: 0.726). In 2018, Cheng et al.²³ proposed a DL model based on Inception V3 that identified high-grade SBO on abdominal radiography. One year later, Cheng et al. used a larger number of radiographs to assess the performance of a CNN in detecting high-grade SBO. They conducted two-stage training of the DL model Inception V3 based on databases of two sizes, to classify normal and high-grade SBO in supine plain films. The predictive area under the receiver operating characteristic curve of the introduced model increased with the addition of positive samples, ultimately reaching 0.971.²⁴ Kim et al.²⁵ proposed an ensemble model that combined various DL models, including VGG, DenseNet, NasNet, Inception V3 and Xception, to identify SBO on plain abdominal radiographs. However, these previous studies focused only on the presence or grade of obstruction. No studies have developed models that classify radiographs as SBO, LBO or normal. Moreover, the obstruction labels in previous studies were based on the consensus of senior radiologists rather than clinical outcomes.

In this multicenter retrospective study, transfer learning was used to compare three CNN-backboned DL models for diagnosing and classifying bowel obstruction on upright plain abdominal radiograph. In addition, the model performance was compared with the performance of junior and senior radiologists. The proposed Xception-backboned DL model performed better than the junior radiologist, and the AI-aided diagnostic framework helped the radiologists to improve their classification.

The interpretability of DL-based computer-aided diagnostic tools (e.g., the model inference evidence) is a major concern for medical practitioners, especially those in the field of computer vision. Therefore, we also used the Grad-CAM method to visualize the inferential explanation of the original abdominal radiographs. In the Grad-CAM heatmaps, the correct obstruction areas were identified for feature extraction and further classification.

To our knowledge, this is the first study to use DL to identify bowel obstruction location. We used upright abdominal radiographs rather than supine radiographs to achieve superior diagnostic performance. Moreover, subsequent examinations and surgical notes were used as the ground truth, minimizing the risk of misdiagnosis owing to human error. These findings may assist junior radiologists in diagnostic support and could inform strategies for the initial screening of abdominal pain in community health centers or remote medical service centers.

There were several study limitations. First, owing to the retrospective design and the absence of clinical details and patient information, we did not collect patient clinical information during image collection, including sex, age and other characteristics such as main indications (e.g., tube placement, pain/distention and postoperative examinations), treatment situations (e.g., emergency, inpatient, outpatient) and obstruction details (more precise sites and obstruction grade). Therefore, we were unable to compare these baseline characteristics between the training and testing datasets. Second, we focused on only one modality: abdominal radiograph. In future research, multimodal models based on medical history, examinations such as white blood cell count and pH value, and other data are needed to develop more complex classifiers for the management of bowel obstruction. Third, owing to a lack of repeated testing, we did not use statistical methods (e.g., calculation of p-values and confidence intervals) to compare the difference in performance between the models and the radiologists, and between independent and AI-aided radiologist performance. Furthermore, a variety of novel DL algorithms have now been developed to handle limited data, especially in medical datasets. Additional research is warranted to explore the use of such methods (e.g., few-shot learning and unsupervised learning) for classification modeling of bowel obstruction.

Conclusions

In this study, we developed a series of DL-based computer vision models for the multiclassification of bowel obstruction on upright plain abdominal radiograph. The proposed Xception-backboned DL model performed better than a junior radiologist, and the AI-aided diagnostic framework helped radiologists to improve their classification. Moreover, Grad-CAM was used to increase the interpretability of the DL modeling. The findings suggest that DL-based computer-aided diagnostic systems could reduce medical practitioner workload and improve diagnostic accuracy.

Footnotes

Author contributions

Yu Wang, Shiqi Zhu and Bowei Mao performed the acquisition, analysis and interpretation of data, and the statistical analysis. Yu Wang and Jinzhou Zhu developed the methodology and conducted the writing of the manuscript. Yao Li and Jielu Zhou were responsible for the description and visualization of data. Chenqi Gu contributed to the study design. Jinzhou Zhu and Chenqi Gu provided technical and material support. All authors have read and agreed to the published version of the manuscript.

Data availability statement

The code used in the training procedure is available at https://osf.io/4tdhu. The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available owing to privacy and ethical restrictions.

Declaration of conflicting interests

The authors declare that there is no conflict of interest.

Funding

This study was supported by the Medical Education Collaborative Innovation Fund of Jiangsu University (JDY2022018) and the Frontier Technologies of Science and Technology Projects of Changzhou Municipal Health Commission (QY202309).

ORCID iDs

Yu Wang

Jinzhou Zhu

References

1. Paulson, Thompson WM. Review of small-bowel obstruction: the diagnosis and when to worry. Radiology 2015; 275: 332–342. DOI: 10.1148/radiol.15131519.

2. Jaffe, Thompson WM. Large-bowel obstruction in the adult: classic radiographic and CT findings, etiology, and mimics. Radiology 2015; 275: 651–663. DOI: 10.1148/radiol.2015140916.

3. Pisano, Zorcolo, Merli, et al. 2017 WSES guidelines on colon and rectal cancer emergencies: obstruction and perforation. World J Emerg Surg 2018; 13: 36. 20180813. DOI: 10.1186/s13017-018-0192-3.

4. Aquina, Becerra, Probst, et al. Patients with adhesive small bowel obstruction should be primarily managed by a surgical team. Ann Surg 2016; 264: 437–447. DOI: 10.1097/SLA.0000000000001861.

5. Nurmsoo, Hajizadeh. 30-day readmission after operative management of adhesive small bowel obstruction. Ann Surg 2018; 267: e56–e57. DOI: 10.1097/SLA.0000000000002091.

6. Long, Robertson, Koyfman. Emergency medicine evaluation and management of small bowel obstruction: evidence-based recommendations. J Emerg Med 2019; 56: 166–176. 20181206. DOI: 10.1016/j.jemermed.2018.10.024.

7. Nicolaou, Kai, et al. Imaging of acute small-bowel obstruction. AJR Am J Roentgenol 2005; 185: 1036–1044. DOI: 10.2214/AJR.04.0815.

8. Thompson, Kilani, Smith, et al. Accuracy of abdominal radiography in acute small-bowel obstruction: does reviewer experience matter? AJR Am J Roentgenol 2007; 188: W233–238. DOI: 10.2214/AJR.06.0817.

9. Wang, et al. Scientific discovery in the age of artificial intelligence. Nature 2023; 620: 47–60. DOI: 10.1038/s41586-023-06221-2.

10. Wang, Hong, Wang, et al. Automated multimodal machine learning for esophageal variceal bleeding prediction based on endoscopy and structured data. J Digit Imaging 2023; 36: 326–338. 20221024. DOI: 10.1007/s10278-022-00724-6.

11. Yin, Lin, Wang, et al. Development and validation of a multimodal model in predicting severe acute pancreatitis based on radiomics and deep learning. Int J Med Inform 2024; 184: 105341. DOI: 10.1016/j.ijmedinf.2024.105341.

12. Yin, Zhang, Lin, et al. Identification of gastric signet ring cell carcinoma based on endoscopic images using few-shot learning. Dig Liver Dis 2023; 55: 1725–1734. DOI: 10.1016/j.dld.2023.07.005.

13. Mellor, Jiang, Fleming, et al. Prediction of retinopathy progression using deep learning on retinal images within the Scottish screening programme. Br J Ophthalmol 2024; 108: 833–839. DOI: 10.1136/bjo-2023-323400.

14. Liu, Hsu, Lin, et al. Lung nodule malignancy classification with associated pulmonary fibrosis using 3D attention-gated convolutional network with CT scans. J Transl Med 2024; 22: 51. DOI: 10.1186/s12967-023-04798-w.

15. Mongan, Moy, Kahn Jr. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020; 2: e200029. DOI: 10.1148/ryai.2020200029.

16. Jiang, Shi, et al. A multi-label deep learning model with interpretable Grad-CAM for diabetic retinopathy classification. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020: 1560–1563. DOI: 10.1109/EMBC44109.2020.9175884.

17. Silva, Pimenta, Guimaraes LS. Small bowel obstruction: what to look for. Radiographics 2009; 29: 423–439. DOI: 10.1148/rg.292085514.

18. Schulwolf, Brower, Karam, et al. Clinical features vs CT findings to estimate need for surgery in small bowel obstruction. JAMA Netw Open 2023; 6: e2341376. DOI: 10.1001/jamanetworkopen.2023.41376.

19. Trésallet, Lebreton, Royer, et al. Improving the management of acute adhesive small bowel obstruction with CT-scan and water-soluble contrast medium: a prospective study. Dis Colon Rectum 2009; 52: 1869–1876. DOI: 10.1007/DCR.0b013e3181b35c06.

20. Behman, Nathens, Mason, et al. Association of surgical intervention for adhesive small-bowel obstruction with the risk of recurrence. JAMA Surg 2019; 154: 413–420. DOI: 10.1001/jamasurg.2018.5248.

21. Vanderbecq, Ardon, De Reviers, et al. Adhesion-related small bowel obstruction: deep learning for automatic transition-zone detection by CT. Insights Imaging 2022; 13: 13. DOI: 10.1186/s13244-021-01150-y.

22. Ryu, Shin, et al. Deep learning using computed tomography to identify high-risk patients for acute small bowel obstruction: development and validation of a prediction model: a retrospective cohort study. Int J Surg 2023; 109: 4091–4100. DOI: 10.1097/JS9.0000000000000721.

23. Cheng, Tejura, Tran, et al. Detection of high-grade small bowel obstruction on conventional radiography with convolutional neural networks. Abdom Radiol (NY) 2018; 43: 1120–1127. DOI: 10.1007/s00261-017-1294-1.

24. Cheng, Tran, Whang, et al. Refining convolutional neural network detection of small-bowel obstruction in conventional radiography. AJR Am J Roentgenol 2019; 212: 342–350. DOI: 10.2214/ajr.18.20362.

25. Kim, Wit, Thurston, et al. An artificial intelligence deep learning model for identification of small bowel obstruction on plain abdominal radiographs. Br J Radiol 2021; 94: 20201407. DOI: 10.1259/bjr.20201407.

Development and validation of deep learning models for bowel obstruction on plain abdominal radiograph
