Intelligent prediagnosis for nontraumatic acute abdomen with surface-level information using machine learning

Abstract

Objective

Prediagnosis of diseases plays a pivotal role in medical triage. However, only surface-level information is available at this stage of care. To address the challenge of prediagnosing nontraumatic acute abdomen (NTAA) with such limited information, an intelligent framework was proposed.

Methods

This retrospective study used data from patients with NTAA at the Affiliated Hospital of Zunyi Medical University. A machine learning framework comprising a series of combined binary classifiers tailored to various NTAA conditions was developed. Within this framework, disease information was recursively inferred across three tiers: primary categories (I-level), disease subtypes (II-level), and specific diseases (III-level). In model training, the RFECV (Recursive Feature Elimination with Cross-Validation) approach was employed for feature refinement, and five algorithms (Logistic Regression, Deep Neural Networks, Support Vector Machine, Random Forest, and eXtreme Gradient Boosting) were assessed. The data were split into training and testing datasets, with five-fold cross-validation and grid search for model optimization. Performance was evaluated using area under the receiver operating characteristic curve, accuracy, precision, specificity, and sensitivity. The Friedman test and Wilcoxon paired test were used to compare algorithm performance.

Results

I-Level disease identification metrics mostly surpassed 0.90. II-Level classification metrics generally exceeded 0.80. For III-level diseases, models maintained high recognition rates for several common conditions. Logistic regression showed consistent performance comparable to other algorithms.

Conclusion

The framework performed well in identifying both primary disease categories and their subtypes, indicating that NTAA prediagnosis based solely on surface-level information is achievable. Logistic regression proved sufficient for this task, with no significant benefit from more complex algorithms.

Keywords

Nontraumatic acute abdomen, machine learning, hierarchical prediagnosis, surface-level information

Introduction

Background

Acute abdomen is a group of abdominal diseases that present with sudden abdominal pain lasting seven days or less and require urgent treatment. It is characterized by rapid onset, frequent changes, rapid progression, and severe illness. Nontraumatic acute abdomen (NTAA) constitutes the predominant form of acute abdominal conditions; its onset is attributed to a diverse spectrum of etiologies, including pathologies of the digestive, urinary, and reproductive systems, as well as other systemic disorders. It has been reported that 5% to 10% of patients admitted to the emergency department present with acute abdominal pain,^1–3 and this proportion exceeds 20% among individuals over the age of 65.⁴ Most of these patients are diagnosed with NTAA. In hospitalized pregnant women, NTAA accounts for approximately 1.53% of acute abdominal pain cases.⁵

Given the diverse etiologies that can lead to NTAA, patients may exhibit varying locations and types of abdominal pain. Accurate prediagnosis and timely treatment are crucial for such patients, as misdiagnosis can lead to delays in treating underlying conditions.

In the Chinese healthcare system and comparable systems worldwide, patients often seek doctors directly at large hospitals. In this setting, the primary goal is to obtain an initial assessment of the disease and provide medical guidance services. An accurate prediagnosis helps direct patients quickly and correctly to the appropriate treatment department. However, under these circumstances, usually only surface-level information, such as physical signs, symptoms, medical history, and basic patient data, is available, and accessing more detailed information such as laboratory tests and radiological scans can be challenging and time-consuming. Moreover, medical professionals are often required to make rapid prediagnoses based on their clinical experience, so triage errors, missed diagnoses, and misdiagnoses are inevitable.^6–8 Additionally, the large volume of manual medical services places significant pressure on limited medical staff, further increasing triage errors.

Related works

In the field of intelligent NTAA prediagnosis, predicting the severity of the disease is a mainstream objective. Machine learning algorithms were employed to categorize patients with NTAA into those requiring emergency surgery and those who did not.⁹ Furthermore, machine learning techniques had the potential to estimate the emergency severity index (ESI-4) score for NTAA emergencies with precision,^10,11 and subsequently patients were referred to the intensive care unit, operation rooms, or general treatment areas according to the ESI-4 score.

However, as noted in the background, disease identification in medical triage is also crucial. Particularly in the Chinese healthcare system and similar systems globally, disease classification information is essential for the appropriate allocation of patients to relevant medical departments.

The specific diagnosis of NTAA diseases directly provides disease classification information. Research on the computer-aided diagnosis of specific NTAA diseases has been conducted since the last century.^12–18 Most of these studies used Bayesian probability estimation.^12–16 A computer-aided diagnosis accuracy of 91.8% for NTAA diseases was reported,¹² but most follow-up studies were unable to replicate this result.¹⁸ In the new century, with the development of artificial intelligence technology, machine learning methods have been adopted to diagnose NTAA diseases based on structured data. For example, the decision tree^19–23 and support vector machine^23–25 algorithms were tested for computer-aided diagnosis of NTAA diseases. One study implemented a scoring voting system that combined seven intelligent algorithms for the diagnosis of NTAA diseases.²⁶ Another study introduced a novel machine learning algorithm, hierarchical structured models, for the computer-aided diagnosis of NTAA diseases and compared its performance with that of other machine learning algorithms, such as support vector machines, neural networks, and K-nearest neighbors.²⁷

Although the intelligent diagnosis methods mentioned above could obtain precise disease information, they typically relied on structured data containing detailed clinical information, particularly comprehensive laboratory examination results. However, in triage settings, patient information is often incomplete and only surface-level information is available. In these scenarios, these intelligent diagnosis methods are often impractical.

Nonetheless, it is feasible to conduct a prediagnosis and provide an initial grasp of the diseases before medical triage.

The research objective in this article lies between disease risk assessment and definitive diagnosis. Compared to disease risk assessment research, the aim is to gain deeper clinical disease information. Compared to research focused on accurate diagnosis of specific diseases, the objective is only to obtain an understanding of the disease that can facilitate the allocation of patients to appropriate medical treatment departments.

Materials and methods

Aim

A three-tiered disease information system²⁸ was adopted, encompassing primary disease categories (I-level), disease subtypes (II-level), and specific diseases (III-level).

In the context of medical guidance, accessing information about disease subtypes at the II-level is sufficient. However, the direct inference of disease subtypes (II-level) presents a challenge due to the vast target space. To address this, a hierarchical prediagnosis strategy was investigated.

Although the main purpose was to infer II-level disease information, the identification of III-level diseases was also investigated as exploratory work, with the scope limited to a few representative common diseases.

Data collection

Electronic health records of patients with NTAA in the Affiliated Hospital of Zunyi Medical University were collected. Cases were selected based on the following criteria:

Inclusion criteria: Patients who were at least 18 years old and had experienced acute abdominal pain unrelated to trauma lasting less than 7 days were included.

Exclusion criteria: Patients with incomplete clinical data or who did not receive a definite diagnosis at discharge were excluded.

Table 1 provides detailed information regarding the sample sizes for different diagnoses in the included cases. As shown in Table 1, the main disease categories include digestive system diseases, obstetrics and gynecology diseases, and urinary system diseases. Disease subtypes consist of pancreas diseases, intestinal tract diseases, biliary tract diseases, gastric diseases, gynecological diseases, diseases of pregnancy and childbirth, kidney diseases, and ureteral diseases. Specific diseases encompass intestinal obstruction, appendicitis, gastric perforation, gallstones, ectopic pregnancy, and so on. Because patients may have multiple diseases, the total count of III-level cases exceeds that of II-level cases, and the total count of II-level cases exceeds that of I-level cases. The surface-level features collected for diagnosing NTAA diseases are listed in Table S1 in the supplemental material.

Table 1.

Sample sizes for different diagnoses

I-Level	II-Level	III-Level
Disease of digestive system (1861 cases)	Pancreas disease (427 cases)	Pancreatitis (427 cases)
	Intestinal tract disease (597 cases)	Intestinal obstruction (195 cases), appendicitis (333 cases), intestinal perforation (42 cases), duodenal ulcer (48 cases)
	Biliary tract disease (578 cases)	Gallstone (431 cases), choledocholithiasis (119 cases)
	Gastric disease (320 cases)	Gastroenteritis (93 cases), gastric perforation (85 cases), gastric ulcer (147 cases)
Disease of obstetrics and gynecology (471 cases)	Gynecological disease (135 cases)	Pelvic inflammatory disease (20 cases), corpus luteum rupture (37 cases), ovarian disease (54 cases)
	Disease of pregnancy and childbirth (337 cases)	Ectopic pregnancy (192 cases), threatened abortion (65 cases), threatened uterine rupture (57 cases)
Disease of urinary system (209 cases)	Kidney disease (123 cases)	Kidney stone (77 cases), pyelonephritis (56 cases), hydronephrosis (65 cases)
	Ureteral disease (111 cases)	Ureteral calculus (111 cases)
Total: 2541	Total: 2628	Total: 2654

This table provides a detailed breakdown of the composition of sample-associated diseases. As a single patient can be affected by multiple diseases, the total number of samples corresponding to I-level, II-level, and III-level diseases varies.

Data preprocessing

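Binary variables were binary-encoded, and multiclass categorical variables were transformed into one-hot codes.²⁹ Continuous and ordinal variables were min-max normalized as follows:

$x_{i}' = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}}$ (1)

where $x_{i}$ is the original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of that variable, and $x_{i}'$ is the normalized value.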
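The preprocessing steps above can be illustrated with a minimal sketch in plain Python. The feature names and values are hypothetical, and the handling of a constant column is a convention assumed here rather than one stated in the study.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] following equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula is undefined, so map everything to 0.0
        # (an assumed convention, not taken from the study).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """One-hot encode a multiclass categorical column (categories sorted)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ages = [18, 40, 62, 84]                             # hypothetical continuous feature
pain_site = ["upper", "lower", "diffuse", "lower"]  # hypothetical categorical feature

scaled_ages = min_max_normalize(ages)
encoded_sites = one_hot(pain_site)
```

Here `scaled_ages` runs from 0.0 to 1.0, and each row of `encoded_sites` has exactly one 1 in the column order diffuse, lower, upper.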

The prediagnosis framework

Figure 1 displays the proposed machine learning framework. The identification of I-level and II-level diseases was the key objective, while the identification of III-level diseases served as a supplementary function intended solely for common diseases. The framework mainly includes the following components.

Figure 1.

The intelligent prediagnosis framework. In this framework, different disease identification models are trained in parallel and independently, while I-level, II-level, and III-level disease information is inferred hierarchically during model utilization.

Input data: The training data.

Model training module: The identification models for disease information across I, II, and III levels were trained in this module. To address the challenge of NTAA disease prediagnosis, this study employed an approach that decomposed the problem into a series of conjunctive binary classifiers using the “one-vs-rest” strategy.

In alignment with the study diseases, the I-level diseases were diseases of obstetrics and gynecology, the digestive system, and the urinary system. As a result, three distinct I-level disease identification models were required. For II-level disease identification, two models were developed to distinguish gynecological disorders from conditions associated with pregnancy and childbirth, four models were developed to differentiate among pancreatic, intestinal, biliary, and gastric diseases, and two models were developed to distinguish kidney and ureteral diseases.
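As a minimal sketch of the one-vs-rest decomposition described above, each disease receives its own binary target vector, so a patient with several diagnoses is a positive case for several classifiers. The patient records and label names below are hypothetical placeholders, not study data.

```python
# Hypothetical patients, each with a set of I-level diagnoses (a patient may
# have more than one, as noted for Table 1).
patients = [
    {"digestive"},
    {"urinary"},
    {"digestive", "obstetrics_gynecology"},
    {"obstetrics_gynecology"},
]
I_LEVEL = ["digestive", "obstetrics_gynecology", "urinary"]


def one_vs_rest_targets(label_sets, diseases):
    """Return {disease: binary target vector}, one binary task per disease."""
    return {
        d: [1 if d in labels else 0 for labels in label_sets]
        for d in diseases
    }


targets = one_vs_rest_targets(patients, I_LEVEL)
```

Each entry of `targets` would then serve as the label vector for one binary classifier; the same construction applies to the eight II-level models.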

Within this framework, the identification of eight representative common diseases (III-level) was investigated, encompassing intestinal obstruction, appendicitis, gastric perforation, gallstones, ectopic pregnancy, ovarian diseases, rupture of corpus luteum, and pyelonephritis.

Given the large number of initial features collected, feature refinement was essential. The goal was to reduce the input data's dimensionality for model development while retaining informative features relevant to the targeted diseases. To this end, the RFECV (Recursive Feature Elimination with Cross-Validation)³⁰ method was used.

This process can be expressed as:

$R_{n} = \mathrm{RFECV}(F, y_{n}, cv), \quad n = 1, 2, 3, \dots, N$ (2)

where $y_{n}$ is the target disease; $cv$ is the number of folds for cross-validation; $F$ is the feature matrix, in which each column is an initially collected feature; and $R_{n}$ is a returned matrix indicating which features in $F$ should be selected for training the model for disease $y_{n}$. Let $P$ denote the total number of samples and $Q$ the number of initially included features. $F$ has dimensions $P \times Q$, where each row corresponds to a sample and each column to a feature. The dimension of $R_{n}$ is $Q \times k$, where $k$ is the number of selected features.
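One way to realize equation (2) is sketched below with scikit-learn's `RFECV`, using logistic regression as the base estimator, AUC as the scoring criterion, and five folds, as reported later in the article. The data are synthetic stand-ins generated with `make_classification`, and turning the boolean `support_` mask into a 0/1 selection matrix is an assumed reading of what $R_{n}$ contains, not a detail confirmed by the source.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the feature matrix F (P samples x Q features) and
# one binary one-vs-rest target y_n; this is not the study data.
F, y_n = make_classification(
    n_samples=200, n_features=12, n_informative=4, n_redundant=0,
    random_state=0,
)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,             # drop one feature per elimination round
    cv=5,               # five-fold cross-validation
    scoring="roc_auc",  # feature subsets are compared by AUC
)
selector.fit(F, y_n)

# Build the Q x k selection matrix R_n: column j holds a single 1 in the row
# of the j-th retained feature (assumed form of R_n).
Q = F.shape[1]
R_n = np.eye(Q)[:, selector.support_]
```

Multiplying `F` by `R_n` then keeps exactly the retained columns, which is the same operation as indexing with `selector.support_`.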

To mitigate issues stemming from sample imbalance, an additional “Data partition” step was incorporated into the model development for III-level diseases: the identification model for a III-level disease was developed using only sample data from its parent II-level disease. The models for I-level and II-level disease identification were trained on the full set of study samples.
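The data partition step can be sketched as a simple filter: only samples carrying the parent II-level label are kept before the III-level binary target is built. The record structure and the `partition_for` helper are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical records: each patient carries sets of II-level and III-level labels.
records = [
    {"id": 1, "II": {"intestinal tract"}, "III": {"appendicitis"}},
    {"id": 2, "II": {"intestinal tract"}, "III": {"intestinal obstruction"}},
    {"id": 3, "II": {"biliary tract"}, "III": {"gallstone"}},
    {"id": 4, "II": {"intestinal tract", "biliary tract"},
     "III": {"appendicitis", "gallstone"}},
    {"id": 5, "II": {"kidney"}, "III": {"pyelonephritis"}},
]


def partition_for(records, parent, child):
    """Keep only samples of the parent II-level disease and label the child."""
    subset = [r for r in records if parent in r["II"]]
    targets = [1 if child in r["III"] else 0 for r in subset]
    return subset, targets


subset, y_appendicitis = partition_for(records, "intestinal tract", "appendicitis")
subset_ids = [r["id"] for r in subset]
```

Restricting the training pool in this way raises the positive rate for the child disease. Using the counts in Table 1, appendicitis accounts for 333 of the 597 intestinal tract cases (about 56%), a far larger share than it would have in the full cohort.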

The development of models for identifying I-level, II-level, and III-level diseases was conducted in parallel and independently. Hierarchical prediagnosis was applied during the process of disease identification.

More specifically, five machine learning algorithms, namely LR (Logistic Regression), DNN (Deep Neural Networks), SVM (Support Vector Machine), RF (Random Forest), and XGBoost (Extreme Gradient Boosting), were tested and compared in model development to find the optimal base classifier. The machine learning classifiers were developed in Python (version 3.8.1) with scikit-learn (version 1.1.3).

Input features: Surface-level features were extracted from patients' self-reports and simple physical examinations. Deep data, such as detailed physical examinations requiring expensive professional equipment, laboratory tests, and radiological scans, were unavailable.

Disease identification module: A hierarchical identification process was designed. The task of the identification model for I-level diseases was to provide an initial classification outcome. Upon obtaining a result for an I-level disease, there were numerous identification models available to further determine the II-level disease within that category. Similarly, if a result for a II-level disease was obtained, there were several identification models accessible to further determine the III-level disease within this disease subtype.

In the disease identification module, “Input alignment” was a step that projected the input disease-related features onto the input vector of a selected model. Suppose the input feature matrix $F_{input}$ has the same feature columns as $F$. The input alignment procedure was as follows:

$X_{alignment} = F_{input} R_{n}$ (3)

where $X_{alignment}$ is the actual input to an identification model.
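Equation (3) amounts to a column selection written as a matrix product. The sketch below builds a $Q \times k$ selection matrix from a boolean mask and applies it in plain Python; the mask, the feature values, and the helper names `selection_matrix` and `matmul` are illustrative assumptions.

```python
def selection_matrix(mask):
    """Build the Q x k 0/1 matrix R_n from a boolean feature mask of length Q."""
    kept = [q for q, keep in enumerate(mask) if keep]
    return [[1 if q == j else 0 for j in kept] for q in range(len(mask))]


def matmul(A, B):
    """Plain-Python matrix product of two row-major nested lists."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
        for row in A
    ]


mask = [True, False, True, False]   # hypothetical selection outcome: Q = 4, k = 2
R_n = selection_matrix(mask)

F_input = [
    [0.2, 0.9, 1.0, 0.0],           # hypothetical normalized features, patient 1
    [0.5, 0.1, 0.0, 1.0],           # patient 2
]
X_alignment = matmul(F_input, R_n)  # shape: 2 x k
```

Only the first and third feature columns survive, so each identification model receives exactly the inputs it was trained on, regardless of how many features were collected.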

The recursive traversal approach for hierarchical disease inference is detailed in Table S2 within the supplementary material of this article.
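The hierarchical inference described in this module can be sketched as a recursive walk over a disease tree in which a child model is consulted only when its parent model fires. This is an illustrative reading and not a reproduction of the procedure in Table S2; the tree layout, the feature names, and the rule-like stand-in models are hypothetical and carry no clinical meaning.

```python
def infer(features, node, threshold=0.5):
    """Recursively collect every disease whose binary model fires."""
    found = []
    for name, child in node.items():
        if child["model"](features) >= threshold:
            found.append(name)
            # Descend only into the subtree of a positive parent.
            found.extend(infer(features, child.get("children", {}), threshold))
    return found


# Stand-in "models": functions returning a probability (placeholders only).
tree = {
    "digestive system": {
        "model": lambda f: 0.9 if f["abdominal_pain"] else 0.1,
        "children": {
            "intestinal tract": {
                "model": lambda f: 0.8 if f["right_lower_quadrant_pain"] else 0.2,
                "children": {
                    "appendicitis": {
                        "model": lambda f: 0.7 if f["rebound_tenderness"] else 0.3,
                    },
                },
            },
            "biliary tract": {
                "model": lambda f: 0.6 if f["right_upper_quadrant_pain"] else 0.1,
            },
        },
    },
    "urinary system": {
        "model": lambda f: 0.7 if f["flank_pain"] else 0.05,
    },
}

patient = {
    "abdominal_pain": True,
    "right_lower_quadrant_pain": True,
    "rebound_tenderness": True,
    "right_upper_quadrant_pain": False,
    "flank_pain": False,
}
result = infer(patient, tree)
```

Because every node is an independent binary classifier, the walk can return several branches at once, which accommodates patients with more than one disease.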

Data flow in model development and validation

Figure 2 shows the data-processing workflow. The dataset was divided into training and testing sets at an 80:20 ratio, ensuring complete separation and consistent class distribution between the two sets. During model training, five-fold cross-validation was used to optimize the machine learning classifiers, with grid search employed to fine-tune model parameters. The classifier achieving the highest AUC (area under the receiver operating characteristic curve) during cross-validation was selected as the optimal model and applied to the testing dataset. Bootstrap validation was incorporated in the testing phase to evaluate model stability and generalizability.
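The bootstrap step in the testing phase can be sketched as follows: the test set is resampled with replacement, the metric is recomputed on each resample, and the mean and standard deviation are reported. The number of rounds, the seed, and the toy labels are assumptions, since the article does not specify these details.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_metric(y_true, y_pred, metric, n_rounds=1000, seed=0):
    """Return (mean, std) of a metric over bootstrap resamples of the test set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical test-set labels and predictions (point accuracy = 0.8).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
mean_acc, std_acc = bootstrap_metric(y_true, y_pred, accuracy)
```

The resulting pair has the same "avg. ± std" form as the entries in the performance tables, although the exact resampling scheme used in the study may differ.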

Figure 2.

Classifier training and validation procedure in this study. The methodology depicted in this figure outlines the overall process of classifier development.

For the III-level common disease models, five-fold cross-validation was not employed because of the limited number of collected cases. The remaining processes were identical to those used in the development of models for identifying I-level and II-level disease information.

In the feature refinement process using RFECV, the performance of the feature subset was evaluated based on AUC in experiments, with a cv value set at 5. The base classifier selected in this process was consistent with that used in the subsequent model training procedures.

To assess the overall performance of the prediction models, the initial metrics considered were accuracy and AUC. Precision, specificity, and sensitivity were also evaluated.³¹ The specificity and sensitivity values were generated based on the default unbiased threshold of 0.5.
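For reference, the five metrics can be computed from predicted probabilities as in the sketch below, with the 0.5 threshold applied to obtain hard labels and AUC computed by pairwise comparison of positive and negative scores. The labels and probabilities are made up for illustration.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, specificity, and sensitivity at a fixed threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
    }


def auc(y_true, y_prob):
    """AUC as the share of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.2, 0.3]

metrics = binary_metrics(y_true, y_prob)
auc_value = auc(y_true, y_prob)
```

AUC is threshold-independent, whereas the other four values depend on the cut-off, which is why specificity and sensitivity can trade off sharply for imbalanced diseases even when AUC is high.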

Statistical analysis

Performance metrics were quantified via mean and standard deviation.³² The Friedman test³³ was employed to assess performance differences of different machine learning algorithms, while the Wilcoxon paired test compared paired metrics across training and testing datasets. Statistical significance was set at a P-value < 0.05.
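The Friedman test ranks the algorithms within each disease task and asks whether the rank sums differ more than chance would allow. The sketch below implements the basic statistic without tie correction, together with the closed-form upper-tail probability of a chi-square distribution with four degrees of freedom, which applies when five algorithms are compared. The AUC table is a deliberately extreme toy example, not study data, and statistical packages may apply a tie correction that this sketch omits.

```python
import math


def average_ranks(row):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for n tasks (rows) x k algorithms (columns)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return (12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums)
            - 3.0 * n * (k + 1))


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Toy table: 3 tasks x 5 algorithms with identical ordering in every row.
toy_auc = [
    [0.91, 0.92, 0.93, 0.94, 0.95],
    [0.81, 0.82, 0.83, 0.84, 0.85],
    [0.71, 0.72, 0.73, 0.74, 0.75],
]
stat = friedman_statistic(toy_auc)
p_value = chi2_sf_df4(stat)
```

With perfectly consistent rankings the statistic reaches its maximum of n(k - 1) = 12 for this table. As a check on the tail formula, a statistic of 6.283 with four degrees of freedom gives a probability of about 0.179, which agrees with the AUC row of Table 3.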

Results

Table 2 presents the performance of various classifiers in identifying I-level and II-level diseases in the testing dataset. Additionally, to assess model performance during training and evaluate their generalization ability, the performance metrics during the five-fold cross-validation in the training dataset are provided in Table S3 in the supplemental material.

Table 2.

Performance in the identification of I-level and II-level diseases in the testing dataset.

Information level	Diseases	Algorithms	AUC (avg. ± std)	Accuracy (avg. ± std)	Precision (avg. ± std)	Specificity (avg. ± std)	Sensitivity (avg. ± std)
I-Level	Disease of digestive system	SVM	0.964 ± 0.005	0.965 ± 0.004	0.965 ± 0.005	0.914 ± 0.011	0.986 ± 0.003
		DNN	0.973 ± 0.004	0.970 ± 0.004	0.965 ± 0.005	0.913 ± 0.011	0.993 ± 0.002
		LR	0.963 ± 0.005	0.954 ± 0.004	0.968 ± 0.005	0.922 ± 0.011	0.968 ± 0.004
		RF	0.978 ± 0.003	0.960 ± 0.004	0.952 ± 0.006	0.880 ± 0.014	0.993 ± 0.002
		XGBoost	0.951 ± 0.004	0.957 ± 0.004	0.961 ± 0.005	0.905 ± 0.011	0.979 ± 0.004
	Disease of urinary system	SVM	0.997 ± 0.003	0.995 ± 0.004	0.999 ± 0.001	0.999 ± 0.001	0.934 ± 0.048
		DNN	0.997 ± 0.003	0.997 ± 0.002	0.999 ± 0.001	0.999 ± 0.001	0.966 ± 0.032
		LR	0.998 ± 0.002	0.997 ± 0.003	0.999 ± 0.001	0.999 ± 0.001	0.966 ± 0.033
		RF	0.999 ± 0.001	0.995 ± 0.003	0.999 ± 0.001	0.999 ± 0.001	0.936 ± 0.043
		XGBoost	0.999 ± 0.001	0.998 ± 0.002	0.999 ± 0.001	0.999 ± 0.001	0.969 ± 0.030
	Disease of obstetrics and gynecology	SVM	0.999 ± 0.001	0.995 ± 0.004	0.986 ± 0.014	0.997 ± 0.003	0.986 ± 0.013
		DNN	0.999 ± 0.001	0.992 ± 0.005	0.972 ± 0.020	0.994 ± 0.005	0.985 ± 0.015
		LR	0.999 ± 0.001	0.990 ± 0.005	0.959 ± 0.022	0.991 ± 0.005	0.985 ± 0.015
		RF	0.999 ± 0.001	0.990 ± 0.005	0.958 ± 0.022	0.991 ± 0.005	0.986 ± 0.014
		XGBoost	0.999 ± 0.001	0.990 ± 0.005	0.959 ± 0.023	0.991 ± 0.005	0.986 ± 0.015
II-Level	Pancreatic disease	SVM	0.959 ± 0.014	0.944 ± 0.014	0.961 ± 0.029	0.991 ± 0.007	0.783 ± 0.052
		DNN	0.962 ± 0.013	0.940 ± 0.014	0.929 ± 0.036	0.982 ± 0.009	0.797 ± 0.052
		LR	0.965 ± 0.012	0.937 ± 0.015	0.899 ± 0.039	0.973 ± 0.011	0.812 ± 0.051
		RF	0.957 ± 0.014	0.936 ± 0.015	0.924 ± 0.035	0.981 ± 0.009	0.780 ± 0.053
		XGBoost	0.946 ± 0.013	0.938 ± 0.014	0.909 ± 0.039	0.976 ± 0.010	0.809 ± 0.050
	Biliary tract disease	SVM	0.959 ± 0.013	0.922 ± 0.016	0.874 ± 0.034	0.943 ± 0.016	0.876 ± 0.035
		DNN	0.955 ± 0.015	0.918 ± 0.016	0.882 ± 0.034	0.948 ± 0.015	0.852 ± 0.037
		LR	0.962 ± 0.013	0.906 ± 0.019	0.783 ± 0.040	0.870 ± 0.024	0.920 ± 0.028
		RF	0.964 ± 0.010	0.900 ± 0.017	0.825 ± 0.039	0.917 ± 0.020	0.862 ± 0.037
		XGBoost	0.961 ± 0.015	0.905 ± 0.017	0.804 ± 0.040	0.898 ± 0.021	0.920 ± 0.029
	Intestinal disease	SVM	0.970 ± 0.010	0.941 ± 0.014	0.901 ± 0.029	0.939 ± 0.019	0.944 ± 0.023
		DNN	0.972 ± 0.009	0.929 ± 0.016	0.888 ± 0.032	0.932 ± 0.019	0.924 ± 0.027
		LR	0.971 ± 0.009	0.911 ± 0.017	0.845 ± 0.034	0.899 ± 0.023	0.931 ± 0.024
		RF	0.971 ± 0.009	0.925 ± 0.015	0.887 ± 0.029	0.932 ± 0.018	0.914 ± 0.027
		XGBoost	0.967 ± 0.009	0.919 ± 0.016	0.859 ± 0.033	0.910 ± 0.021	0.934 ± 0.024
	Gastric disease	SVM	0.949 ± 0.018	0.909 ± 0.017	0.764 ± 0.067	0.956 ± 0.014	0.683 ± 0.066
		DNN	0.943 ± 0.021	0.922 ± 0.016	0.761 ± 0.059	0.949 ± 0.015	0.793 ± 0.059
		LR	0.945 ± 0.020	0.896 ± 0.018	0.634 ± 0.056	0.896 ± 0.020	0.894 ± 0.045
		RF	0.949 ± 0.016	0.912 ± 0.017	0.727 ± 0.063	0.774 ± 0.059	0.940 ± 0.016
		XGBoost	0.942 ± 0.021	0.896 ± 0.019	0.644 ± 0.057	0.901 ± 0.019	0.875 ± 0.051
	Obstetrical disease	SVM	0.940 ± 0.034	0.848 ± 0.043	0.955 ± 0.033	0.901 ± 0.072	0.827 ± 0.055
		DNN	0.928 ± 0.040	0.856 ± 0.042	0.956 ± 0.032	0.898 ± 0.072	0.840 ± 0.052
		LR	0.954 ± 0.031	0.830 ± 0.045	0.954 ± 0.032	0.901 ± 0.067	0.803 ± 0.056
		RF	0.947 ± 0.032	0.875 ± 0.038	0.958 ± 0.030	0.905 ± 0.064	0.863 ± 0.049
		XGBoost	0.948 ± 0.040	0.803 ± 0.049	0.951 ± 0.034	0.902 ± 0.067	0.763 ± 0.061
	Gynecological disease	SVM	0.955 ± 0.030	0.930 ± 0.029	0.857 ± 0.074	0.941 ± 0.032	0.902 ± 0.066
		DNN	0.934 ± 0.041	0.918 ± 0.034	0.825 ± 0.082	0.924 ± 0.038	0.900 ± 0.071
		LR	0.945 ± 0.034	0.943 ± 0.029	0.900 ± 0.069	0.961 ± 0.028	0.898 ± 0.069
		RF	0.949 ± 0.031	0.940 ± 0.028	0.889 ± 0.070	0.957 ± 0.028	0.898 ± 0.070
		XGBoost	0.925 ± 0.041	0.945 ± 0.029	0.905 ± 0.068	0.963 ± 0.027	0.897 ± 0.070
	Ureteral disease	SVM	0.771 ± 0.083	0.750 ± 0.075	0.763 ± 0.095	0.622 ± 0.133	0.839 ± 0.078
		DNN	0.773 ± 0.093	0.749 ± 0.076	0.788 ± 0.098	0.690 ± 0.132	0.789 ± 0.092
		LR	0.773 ± 0.089	0.721 ± 0.080	0.813 ± 0.103	0.769 ± 0.124	0.688 ± 0.103
		RF	0.774 ± 0.089	0.684 ± 0.080	0.764 ± 0.102	0.693 ± 0.122	0.678 ± 0.109
		XGBoost	0.724 ± 0.093	0.638 ± 0.079	0.719 ± 0.108	0.625 ± 0.139	0.647 ± 0.107
	Kidney disease	SVM	0.953 ± 0.045	0.808 ± 0.069	0.922 ± 0.078	0.934 ± 0.067	0.698 ± 0.113
		DNN	0.900 ± 0.060	0.908 ± 0.049	0.936 ± 0.062	0.931 ± 0.069	0.886 ± 0.078
		LR	0.905 ± 0.056	0.902 ± 0.052	0.935 ± 0.061	0.930 ± 0.067	0.879 ± 0.077
		RF	0.930 ± 0.051	0.874 ± 0.058	0.931 ± 0.067	0.930 ± 0.067	0.824 ± 0.095
		XGBoost	0.936 ± 0.060	0.873 ± 0.057	0.934 ± 0.067	0.936 ± 0.063	0.816 ± 0.094

The results in italics are obtained using LR.

AUC: Area Under the Receiver Operating Characteristic Curve; DNN: Deep Neural Networks; LR: Logistic Regression; RF: Random Forest; SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting.

Table 2 shows that the classifiers perform exceptionally well in identifying I-level diseases, with most metrics exceeding 0.93. For II-level diseases, AUC values are generally above 0.90, except for ureteral disease, where they range from 0.72 to 0.78. Other metrics for gastric and ureteral disease identification occasionally fall below 0.70. Despite these isolated instances, most metrics remain above 0.80, ensuring a commendable overall performance.

The primary objective of this intelligent classification framework was to identify I-level and II-level diseases. Table 3 presents the results of the Friedman test, comparing the performance metrics of various machine learning algorithms in Table 2. Additionally, Figure 3 illustrates the average metric values of different identification models for I-level and II-level diseases.

Figure 3.

Performance comparison of different machine learning models. This figure showcases the comparative performance of various machine learning models across different disease identification tasks.

Table 3.

Performance comparison in statistical tests across metrics.

Performance comparisons	Friedman test—performance of different algorithms	Wilcoxon paired test—LR performance in training and testing
Metrics	Statistics	Statistics (testing performance vs. performance during training)
AUC	$χ_{4}^{2} = 6.283, P = 0.179$	$Z = - 1.067, P = 0.286$
Accuracy	$χ_{4}^{2} = 9.427, P = 0.051$	$Z = - 0.312, P = 0.755$
Precision	$χ_{4}^{2} = 7.272, P = 0.122$	$Z = - 0.356, P = 0.722$
Specificity	$χ_{4}^{2} = 3.876, P = 0.423$	$Z = - 0.178, P = 0.859$
Sensitivity	$χ_{4}^{2} = 0.246, P = 0.993$	$Z = - 0.051, P = 0.959$

This table presents the results of statistical comparisons, including the evaluation of different machine learning algorithms for the identification of different diseases using the Friedman test, as well as the performance comparison of LR models across training and testing datasets. A P-value < 0.05 was considered statistically significant.

AUC: Area Under the Receiver Operating Characteristic Curve; LR: Logistic Regression.

Table 3 indicates that there were no significant differences in performance among different machine learning algorithms, as all P-values exceeded 0.05 across the observed metrics. Figure 3 also demonstrates comparable performance among these machine learning models.

Although LR was not the top-performing model, it exhibited consistent and reliable performance. Its interpretability is notably superior to that of other classifiers, particularly black-box models. The Wilcoxon paired test results in Table 3 indicate no significant differences (P > 0.05) between the LR models’ performance on the training and testing datasets. This implies that the LR models had strong generalization capability. Therefore, LR's stability and interpretability render it an ideal base classifier for the proposed framework.

Additionally, Supplemental Tables S4 and S5 detail the feature selection outcomes for identifying I-level and II-level diseases using the RFECV approach with LR in model development. Since the variables were normalized according to formula (1), feature importance was assessed using the regression coefficients from the LR models fitted to the training datasets. The top 20 features with the highest absolute coefficient values for identifying II-level diseases are presented in Figures S1 to S8 in the supplemental material.

LR coefficients reflect the influence of independent variables on the log-odds of the dependent variable. A positive coefficient indicates that a feature is positively associated with the disease, increasing its predicted log-odds, while a negative coefficient indicates a negative association.
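The relationship stated above can be written out explicitly. This is the standard logistic regression form; the symbols $p$, $\beta_{0}$, $\beta_{j}$, and $x_{j}$ are introduced here for illustration and do not appear in the source.

```latex
\log\frac{p}{1 - p} = \beta_{0} + \sum_{j=1}^{k} \beta_{j} x_{j},
\qquad
\frac{\mathrm{odds}(x_{j} = 1)}{\mathrm{odds}(x_{j} = 0)} = e^{\beta_{j}}
```

Here $p$ is the predicted probability of the target disease and $k$ is the number of selected features. Because continuous and ordinal features are scaled to $[0, 1]$ by formula (1), $e^{\beta_{j}}$ is the multiplicative change in the odds when feature $j$ moves from its minimum to its maximum with the other features held fixed. For instance, a hypothetical coefficient of $\beta_{j} = 0.7$ would roughly double the odds, since $e^{0.7} \approx 2.0$.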

Table 4 presents the results for III-level disease identification, which served as an exploratory function. The results indicate a decline in the performance of machine learning algorithms in pinpointing these diseases, with certain performance metrics dipping below 0.50 in some instances. Nonetheless, the models maintain a commendable level of performance for several common diseases. For example, in diagnosing intestinal obstruction, appendicitis, and gastric perforation, most metrics are above 0.90.

Table 4.

Performance of the identification of III-level diseases.

Diseases	Algorithms	AUC (avg. ± std)	Accuracy (avg. ± std)	Precision (avg. ± std)	Specificity (avg. ± std)	Sensitivity (avg. ± std)
Ovarian disease	SVM	0.741 ± 0.028	0.559 ± 0.028	0.803 ± 0.053	0.918 ± 0.024	0.269 ± 0.031
	DNN	0.738 ± 0.028	0.592 ± 0.028	0.834 ± 0.045	0.917 ± 0.024	0.333 ± 0.036
	LR	0.733 ± 0.029	0.666 ± 0.028	0.748 ± 0.037	0.749 ± 0.038	0.599 ± 0.039
	RF	0.717 ± 0.031	0.630 ± 0.028	0.728 ± 0.041	0.750 ± 0.037	0.535 ± 0.041
	XGBoost	0.611 ± 0.029	0.554 ± 0.028	0.668 ± 0.046	0.748 ± 0.037	0.401 ± 0.038
Pyelonephritis	SVM	0.933 ± 0.018	0.919 ± 0.016	0.900 ± 0.027	0.933 ± 0.018	0.898 ± 0.029
	DNN	0.919 ± 0.019	0.881 ± 0.018	0.818 ± 0.033	0.867 ± 0.025	0.902 ± 0.028
	LR	0.926 ± 0.020	0.880 ± 0.020	0.818 ± 0.035	0.867 ± 0.027	0.901 ± 0.028
	RF	0.935 ± 0.018	0.961 ± 0.011	0.911 ± 0.025	0.935 ± 0.018	0.999 ± 0.001
	XGBoost	0.946 ± 0.019	0.879 ± 0.019	0.767 ± 0.033	0.798 ± 0.030	0.999 ± 0.001
Corpus luteum rupture	SVM	0.907 ± 0.018	0.816 ± 0.022	0.668 ± 0.059	0.900 ± 0.021	0.574 ± 0.053
	DNN	0.779 ± 0.037	0.775 ± 0.024	0.999 ± 0.001	0.999 ± 0.001	0.139 ± 0.041
	LR	0.885 ± 0.019	0.739 ± 0.024	0.497 ± 0.046	0.749 ± 0.029	0.712 ± 0.052
	RF	0.950 ± 0.012	0.853 ± 0.021	0.638 ± 0.047	0.801 ± 0.027	0.999 ± 0.001
	XGBoost	0.897 ± 0.037	0.742 ± 0.026	0.502 ± 0.041	0.651 ± 0.033	0.999 ± 0.001
Intestinal obstruction	SVM	0.999 ± 0.001	0.983 ± 0.007	0.977 ± 0.013	0.986 ± 0.008	0.978 ± 0.014
	DNN	0.998 ± 0.001	0.975 ± 0.009	0.956 ± 0.019	0.972 ± 0.012	0.978 ± 0.014
	LR	0.998 ± 0.002	0.983 ± 0.007	0.979 ± 0.013	0.987 ± 0.008	0.978 ± 0.014
	RF	0.997 ± 0.002	0.975 ± 0.009	0.957 ± 0.019	0.973 ± 0.013	0.978 ± 0.013
	XGBoost	0.996 ± 0.001	0.984 ± 0.007	0.959 ± 0.018	0.974 ± 0.011	0.999 ± 0.001
Appendicitis	SVM	0.992 ± 0.005	0.983 ± 0.007	0.983 ± 0.011	0.984 ± 0.010	0.982 ± 0.011
	DNN	0.989 ± 0.005	0.983 ± 0.007	0.983 ± 0.010	0.984 ± 0.010	0.982 ± 0.011
	LR	0.990 ± 0.005	0.983 ± 0.008	0.983 ± 0.011	0.984 ± 0.010	0.983 ± 0.010
	RF	0.991 ± 0.005	0.975 ± 0.009	0.982 ± 0.012	0.983 ± 0.011	0.967 ± 0.014
	XGBoost	0.993 ± 0.005	0.975 ± 0.009	0.982 ± 0.011	0.984 ± 0.010	0.966 ± 0.015
Gastric perforation	SVM	0.979 ± 0.007	0.938 ± 0.015	0.942 ± 0.026	0.978 ± 0.010	0.844 ± 0.039
	DNN	0.980 ± 0.008	0.922 ± 0.016	0.888 ± 0.035	0.956 ± 0.014	0.840 ± 0.043
	LR	0.982 ± 0.007	0.938 ± 0.013	0.857 ± 0.034	0.934 ± 0.016	0.947 ± 0.025
	RF	0.967 ± 0.012	0.937 ± 0.014	0.942 ± 0.027	0.978 ± 0.010	0.838 ± 0.038
	XGBoost	0.973 ± 0.008	0.938 ± 0.014	0.943 ± 0.026	0.978 ± 0.010	0.841 ± 0.037
Gallstone	SVM	0.809 ± 0.031	0.836 ± 0.021	0.846 ± 0.022	0.483 ± 0.059	0.954 ± 0.014
	DNN	0.752 ± 0.038	0.791 ± 0.024	0.823 ± 0.025	0.416 ± 0.059	0.918 ± 0.018
	LR	0.821 ± 0.032	0.774 ± 0.024	0.865 ± 0.023	0.613 ± 0.059	0.827 ± 0.025
	RF	0.792 ± 0.031	0.810 ± 0.023	0.858 ± 0.022	0.549 ± 0.057	0.897 ± 0.020
	XGBoost	0.817 ± 0.038	0.749 ± 0.024	0.863 ± 0.024	0.622 ± 0.055	0.791 ± 0.027
Ectopic pregnancy	SVM	0.964 ± 0.012	0.927 ± 0.014	0.892 ± 0.020	0.816 ± 0.034	0.999 ± 0.001
	DNN	0.958 ± 0.012	0.882 ± 0.019	0.837 ± 0.026	0.703 ± 0.041	0.999 ± 0.001
	LR	0.959 ± 0.011	0.912 ± 0.017	0.872 ± 0.023	0.777 ± 0.038	0.999 ± 0.001
	RF	0.955 ± 0.015	0.912 ± 0.016	0.872 ± 0.023	0.780 ± 0.037	0.999 ± 0.001
	XGBoost	0.929 ± 0.017	0.898 ± 0.017	0.871 ± 0.023	0.780 ± 0.038	0.975 ± 0.011

This table lists the performance of different machine learning algorithms in identifying III-level diseases. These experiments are exploratory only. The results in italics are obtained using LR.

AUC: Area Under the Receiver Operating Characteristic Curve; DNN: Deep Neural Networks; LR: Logistic Regression; RF: Random Forest; SVM: Support Vector Machine; XGBoost: Extreme Gradient Boosting.
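As a reading aid for the threshold-based metrics in this table, the following minimal sketch shows how accuracy, precision, specificity, and sensitivity follow from the four confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Threshold-based metrics computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total,
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        # Share of true negatives that are correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Share of true positives that are correctly detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
    }


# Hypothetical counts: a conservative classifier that rarely predicts "positive".
m = binary_metrics(tp=14, fp=0, tn=300, fn=86)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

A classifier that almost never predicts the positive class can reach near-perfect precision and specificity while missing most true cases, so sensitivity is best read alongside the other columns.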

Discussion

Previous research has established the effectiveness of machine learning for the computer-aided diagnosis of nontraumatic acute abdomen using structured data.^19–27 This article provides further evidence that machine learning can reach a useful depth of understanding of a variety of NTAA diseases, even when working with limited surface-level information.

The identification of I-level diseases showed the strongest performance, which is intuitively expected because I-level disease information is more superficial and more readily identifiable. Performance remained good for the recognition of II-level diseases, suggesting that the proposed framework can extract disease information at a deeper level.
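The tiered inference discussed here can be pictured with a small sketch. It assumes one binary scorer per disease and a simple rule (take the highest-scoring disease at each tier and descend only if it clears a threshold); the source does not specify how the binary classifiers are combined, so this rule, the `Node` class, the feature names, and all scores below are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A scorer maps patient features to the probability of one disease.
Scorer = Callable[[Dict[str, float]], float]


class Node:
    """One disease label with its binary scorer and next-tier children."""

    def __init__(self, name: str, scorer: Scorer,
                 children: Optional[List["Node"]] = None):
        self.name = name
        self.scorer = scorer
        self.children = children or []


def infer(nodes: List[Node], patient: Dict[str, float],
          threshold: float = 0.5) -> List[str]:
    """Return the inferred path of labels, from I-level downward."""
    if not nodes:
        return []
    best = max(nodes, key=lambda n: n.scorer(patient))
    if best.scorer(patient) < threshold:
        return []  # stop when no disease at this tier is confident enough
    return [best.name] + infer(best.children, patient, threshold)


# Hypothetical stub scorers standing in for trained binary classifiers.
appendicitis = Node("appendicitis",
                    lambda p: 0.9 if p.get("right_lower_pain") else 0.1)
obstruction = Node("intestinal obstruction",
                   lambda p: 0.8 if p.get("no_flatus") else 0.2)
intestinal = Node("intestinal disease", lambda p: 0.7,
                  [appendicitis, obstruction])
digestive = Node("digestive system", lambda p: 0.95, [intestinal])
urinary = Node("urinary system", lambda p: 0.05)

path = infer([digestive, urinary], {"right_lower_pain": 1.0})
print(path)
```

Stopping when no scorer clears the threshold lets the sketch return a coarser label (for example, only the I-level category) when deeper tiers are uncertain.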

As inferred from Supplemental Table S5, most II-level diseases have their own significant symptoms. A notable exception was observed in kidney and ureteral diseases, which had far fewer significant features. This may suggest that precise diagnosis of II-level diseases of the urinary system requires more extensive laboratory examinations.

Without the support of detailed pathological examinations, the diagnosis of specific diseases is prone to errors; hence, this study attempted to diagnose only some common diseases. The findings in Table 4 show that in certain instances, such as the detection of intestinal obstruction and appendicitis, high-accuracy diagnoses were achieved even without the aid of detailed laboratory tests. However, for the identification of III-level diseases, this study presented only localized results, with the study samples confined to the II-level parent diseases. Whether these diseases can be accurately diagnosed solely from surface-level information warrants further investigation within a broader context.

Selecting LR as the benchmark classifier may not guarantee optimal performance across all scenarios, yet it offered the advantage of stability and superior interpretability when compared to other algorithms.

Feature importance was generally not consistent across classifiers, so feature refinement in our study was tied to each individual classifier, and the REFCV method was adopted.
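A minimal sketch of classifier-specific feature refinement is given below. It uses scikit-learn's `RFECV` as one common implementation of recursive feature elimination with cross-validation; the source does not state which library or settings were used, so the synthetic data, the ROC AUC scoring, and the fold configuration are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for encoded patient features (not the study data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=2, random_state=0)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),  # ranking uses this model's coefficients
    step=1,                                       # drop one feature per round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",                            # assumed selection criterion
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("selection mask:", selector.support_)
```

Because the ranking comes from the wrapped estimator, swapping `LogisticRegression` for another classifier that exposes coefficients or feature importances would generally yield a different retained feature set.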

The features shown in Supplemental Tables S4 and S5 were notable for their discriminative power in distinguishing between diseases, and the coefficients in Supplemental Figures S1 to S8 indicate the roles of these features in disease prediagnosis. A feature with a negative coefficient predicted a higher likelihood of a negative outcome. For example, in the identification of intestinal disease, pancreatic disease history was a significant feature of this kind: if a patient with NTAA has a history of pancreatic disease, there is a high probability that the NTAA is not caused by intestinal disease. Further causal analysis is needed to explore the deeper relationships between these features and the target diseases.
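The reading of a negative coefficient can be made concrete with a small numerical sketch for a logistic regression with a single binary feature. The intercept and coefficient values are hypothetical and are not taken from the supplemental figures.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds when a binary feature goes from 0 to 1."""
    return math.exp(coefficient)


def probability(intercept: float, coefficient: float, feature: int) -> float:
    """Logistic-regression probability for a single binary feature."""
    z = intercept + coefficient * feature
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for illustration only.
intercept, coef = 0.0, -1.5
p_without = probability(intercept, coef, 0)  # feature absent
p_with = probability(intercept, coef, 1)     # feature present

print(f"odds ratio: {odds_ratio(coef):.3f}")
print(f"P(disease | feature absent):  {p_without:.3f}")
print(f"P(disease | feature present): {p_with:.3f}")
```

An odds ratio below 1 means that the presence of the feature lowers the odds of the target disease, which is the sense in which a negative coefficient points toward a negative outcome.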

It is noteworthy that although Supplemental Tables S4 and S5 list numerous features, only a few questions are required to cover these features during consultations in practice. Moreover, current natural language processing technology^34 can efficiently and intelligently extract these features.

This study specified the location of pain based on the anatomical structure of the human abdomen and detailed the types and nature of pain. These improvements enhanced the diagnostic accuracy of II-level diseases to some extent.

This was a preliminary study, and as such, several limitations were present. Firstly, the study was conducted in a single medical center and the study data were limited to the local population, which restricts generalizability. Secondly, the sample size was limited, particularly for the identification of III-level diseases. Thirdly, only the features that assisted in disease differentiation were listed; the attributes of these features require further investigation. Fourthly, NTAA can stem from a multitude of I-level diseases, whereas this study focused on common situations covering digestive system disorders, urinary system ailments, and diseases of obstetrics and gynecology. Consequently, the trained models cannot be directly applied to other settings. Nevertheless, the proposed framework and methods are generalizable to alternative medical contexts and hold the potential to achieve comparable outcomes.

This study's most notable contribution was showing that NTAA prediagnosis is possible using only superficial information through the proposed framework and methods. In addition, the proposed framework can be used in conjunction with a medical triage system that predicts disease severity, achieving both precise department allocation and patient priority scheduling.

Conclusions

This study presented a machine learning framework designed to prediagnose NTAA diseases using only surface-level patient information. The results showed that the framework, using machine learning classifiers, effectively identified main NTAA disease categories and subtypes based on surface-level information. It has the potential to aid in the swift and accurate allocation of medical treatment departments in triage settings.

Supplemental Material

Supplemental material, sj-doc-1-sci-10.1177_00368504251350763 for Intelligent prediagnosis for nontraumatic acute abdomen with surface-level information using machine learning by Zhichen Liu, Qingping Ran and Xu Luo in Science Progress

Footnotes

ORCID iDs

Qingping Ran

Xu Luo

Ethics approval and consent to participate

The present study has been registered with the ethics committee of the Affiliated Hospital of Zunyi Medical University (KLLY-2021-060) and conducted in accordance with the Declaration of Helsinki. In view of the retrospective nature of this research, the Ethics Committee has waived the requirement for informed consent. Prior to the analysis, the patients’ data underwent anonymization and de-identification procedures.

Authors’ contributions

ZL contributed to the data acquisition and study design, while QR and XL primarily undertook the data analysis and interpretation. The initial draft of the manuscript was written jointly by ZL, QR, and XL and subsequently underwent significant revision. All authors have read and approved the final version of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant number 61861047), the Science and Technology Program of Guizhou Province (grant number CXTD (2023) 028), and the Science and Technology Program of the Guizhou Provincial Health Commission (grant number gzwkj2024-283).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The study data are available upon request to the corresponding author.

Supplemental material

Supplemental material for this article is available online.

References

1. Laméris, Van Randen, Van Es, et al. Imaging strategies for detection of urgent conditions in patients with acute abdominal pain: diagnostic accuracy study. BMJ 2009; 338: 29–33. doi:https://doi.org/10.1136/bmj.b2431

2. Macaluso, Mcnamara. Evaluation and management of acute abdominal pain in the emergency department. Int J Gen Med 2012; 5: 789–797. doi:https://doi.org/10.2147/IJGM.S25936

3. Kücükkartallar, Cakir, Tekin, et al. Estimation of the capacity of emergency surgery in Konya: nine-year multicenter study. Ulus Cerrahi Derg 2016; 32: 252–255. doi:https://doi.org/10.5152/UCD.2016.2797

4. Hustey, Meldon, Banet, et al. The use of abdominal computed tomography in older ED patients with acute abdominal pain. Am J Emerg Med 2005; 23: 259–265. doi:https://doi.org/10.1016/j.ajem.2005.02.021

5. Yan. Treatment principle of cyesis combined with acute abdominal syndrome. Heilongjiang Med J 2003; 27: 164–165. doi:https://doi.org/10.3969/j.issn.1004-5775.2003.03.002

6. Zhang, et al. A general outpatient triage system based on dynamic uncertain causality graph. IEEE Access 2020; 8: 93249–93263. doi:https://doi.org/10.1109/access.2020.2995087

7. Zan, Xie, Tan, et al. Application of six sigma management in reducing the misdiagnosis rate of acute abdomen pre-examination in emergency department. Chin J Pract Nurs 2020; 36: 757–760. doi:https://doi.org/10.3760/cma.j.cn211501-20190819-02355

8. Yang, Zhao. A qualitative study of the triage of patients with non-traumatic acute abdomen. J Clin Nurs Res 2023; 7: 79–88. doi:https://doi.org/10.26689/jcnr.v7i4.5176

9. Henn, Hatterscheidt, Sahu, et al. Machine learning for decision-support in acute abdominal pain–proof of concept and central considerations. Zentralbl Chir 2023; 148: 376–383. doi:https://doi.org/10.1055/a-2125-1559

10. Farahmand, Shabestari, Pakrah, et al. Artificial intelligence-based triage for patients with acute abdominal pain in emergency department; a diagnostic accuracy study. Adv J Emerg Med 2017; 1: e5. doi:https://doi.org/10.22114/AJEM.v1i1.11

11. Meng HH. Development and validation of a risk early-warning machine learning model for acute abdomen: A real-world-date study. Master Dissertation, Southern Medical University. 2020.

12. De Dombal, Leaper, Staniland, et al. Computer-aided diagnosis of acute abdominal pain. Br Med J 1972; 2: 9–13. doi:https://doi.org/10.1136/bmj.2.5804.9

13. Wilson, Horrocks, Lyndon, et al. Simplified computer-aided diagnosis of acute abdominal pain. Br Med J 1975; 2: 73–75. doi:https://doi.org/10.1136/bmj.2.5962.73

14. Lawrence, Clifford, Taylor. Acute abdominal pain: computer aided diagnosis by non-medically qualified staff. Ann R Coll Surg Engl 1987; 69: 233–234.

15. Kirkeby, Risø. Use of a computer system for diagnosing acute abdominal pain in a small hospital. Scand J Gastroenterol 2009; 22: 174–176. doi:https://doi.org/10.3109/00365528709090987

16. Adams, Chan, Clifford, et al. Computer aided diagnosis of acute abdominal pain: a multicenter study. Br Med J 1986; 293: 800–804. doi:https://doi.org/10.1136/bmj.293.6550.800

17. Orient. Evaluation of abdominal pain: clinicians' performance compared with three protocols. South Med J 1986; 79: 793–799. doi:https://doi.org/10.1097/00007611-198607000-00003

18. Fenyö. Computer-aided diagnosis and decision-making in acute abdominal pain. Dig Dis 1990; 8: 125–137. doi:https://doi.org/10.1159/000171246

19. Ohmann, Yang, Moustakis, et al. Machine learning techniques applied to the diagnosis of acute abdominal pain. In: Barahona, Stefanelli, Wyatt (eds) Artificial intelligence in medicine. Lecture Notes in Computer Science. Berlin, Germany: Springer, 1995, vol. 934, pp. 276–281.

20. Ohmann, Eich, Sippel. A data dictionary approach to multilingual documentation and decision support for the diagnosis of acute abdominal pain. Stud Health Technol Inform 1998; 52: 462–466. doi:https://doi.org/10.3233/978-1-60750-896-0-462

21. Wozniak. Three classifiers for acute abdominal pain diagnosis - comparative study. Paper presented at: The 5th International Conference on Computational Science, May 22–25, 2005, Atlanta, USA.

22. Khumrin, Ryan, Judd, et al. Diagnostic machine learning models for acute abdominal pain: towards an E-learning tool for medical students. Stud Health Technol Inform 2017; 245: 447–451. doi:https://doi.org/10.3233/978-1-61499-830-3-447

23. Zararsiz, Akyildiz, Göksülük, et al. Statistical learning approaches in diagnosing patients with nontraumatic acute abdomen. Turk J Electr Eng Comp Sci 2016; 24: 3685–3697. doi:https://doi.org/10.3906/elk-1501-181

24. Björnsdotter, Nalin, Hansson, et al. Support vector machine diagnosis of acute abdominal pain. Paper presented at: Biomedical Engineering Systems and Technologies: International Joint Conference, January 16–17, 2009, Porto, Portugal.

25. Vijayarani, Sivamathi, Tamilarasi. A hybrid classification algorithm for abdomen disease prediction. ASEAN J Sci Eng 2023; 3: 207–218. doi:https://doi.org/10.17509/ajse.v3i3.45677

26. Eich, Ohmann. Internet-based decision-support server for acute abdominal pain. Artif Intell Med 2000; 20: 23–26. doi:https://doi.org/10.1016/S0933-3657(00)00051-8

27. Butler, Kenney, et al. A prospective diagnostic support tool for the differentiation of abdominal pain in the adult emergency department population. Can J Emerg Med 2016; 18: S84. doi:https://doi.org/10.1017/cem.2016.194

28. World Health Organization. International classification of diseases, 11th revision. Geneva: World Health Organization, 2018. https://www.who.int/classifications/icd/en/ . Accessed February 11, 2025.

29. Karthiga, Usha, Raju. Transfer learning-based breast cancer classification using one-hot encoding technique. Paper presented at: The 2021 International Conference on Artificial Intelligence and Smart Systems, March 25–27, 2021, Coimbatore, India.

30. Sung, Han, Park, et al. Classification of stroke severity using clinically relevant symmetric gait features based on recursive feature elimination with cross-validation. IEEE Access 2022; 10: 119437–119447. doi:https://doi.org/10.1109/ACCESS.2022.3218118

31. Rainio, Teuho, Klén. Evaluation metrics and statistical tests for machine learning. Sci Rep 2024; 14: 6086. doi:https://doi.org/10.1038/s41598-024-66611-y

32. Brisimi, Wang, et al. Predicting chronic disease hospitalizations from electronic health records: an interpretable classification approach. Proc IEEE 2018; 106: 690–707. doi:https://doi.org/10.1109/JPROC.2017.2789319

33. Demšar. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 2006; 7: 1–30. doi:https://doi.org/10.5555/1248547.1248548

34. Locke, Bashall, Al-Adely, et al. Natural language processing in medicine: a review. Trends Anaesth Crit Care 2021; 38: 4–9. doi:https://doi.org/10.1016/j.tacc.2021.02.007
