Sage Journals: Discover world-class research

Abstract

Objective

Pathogens that cause infectious lung diseases include viral, bacterial, fungal, and parasitic agents. Early detection of these infections is critical for timely diagnosis and effective treatment. Several studies have proposed early detection solutions with varying performance, but these have limitations such as image type specificity, lack of generalizability, potential overfitting, and bias. Our model addresses these problems by using diverse image types, which improves robustness and generalizability across various contexts.

Methods

This study creates an early detection model that works with both CT scans and X-ray images. We applied a convolutional neural network (CNN) model with relatively few parameters, trained on large and diverse augmented datasets. We then used a generative adversarial network (GAN) to create synthetic images for validating the CNN model and assessing its generalization. The proposed model was trained primarily on COVID-19, pneumonia, and tuberculosis (TB) cases (n = 432,533 total augmented cases).

Results

The proposed model is lightweight and explainable and supports real-time detection. It achieved an average accuracy of 97.93% (standard deviation 0.97%), an average area under the curve (AUC) of 98.07%, an average sensitivity of 98.46%, an average specificity of 97.03%, an average precision of 97.45%, and an average F1 score of 97.95%.

Conclusion

The proposed CNN model generalizes across diverse image types, runs in real time, and was validated on synthetic images. We compared our model with state-of-the-art studies. Integrating our approach with other clinical systems and internet of things (IoT) devices is feasible.

Keywords

Infectious lung diseases; convolutional neural network; generative adversarial network; detection; CT scan images; X-ray images

Introduction

There were approximately 344 million incident cases of lower respiratory infections (LRIs) and 2.18 million deaths worldwide. This global burden of incidence and mortality from LRIs highlights the need for increased vaccine coverage, robust pathogen surveillance, and early detection approaches.¹ According to the American Lung Association, most lung infections are transmitted between people through direct or indirect contact with infected individuals. Infectious lung disease can be acute, caused by a single infection passed from person to person, or chronic, characterized by recurrent infections. Chronic infectious lung disease arises, for example, from (A) chronic obstructive pulmonary disease (COPD) or another chronic condition with persistent mucus retention and subsequent infections by one or more agents, or (B) an immunocompromised state.²

The lungs are essential organs, and COVID-19, tuberculosis, and pneumonia are among the most common lung infections.³ In November 2024, the WHO received reports of 201,454 COVID-19 cases and 3071 deaths.⁴ In 2023, an estimated 10.8 million people worldwide fell ill with tuberculosis, up from 10.7 million in 2022.⁵ In 2021, pneumonia killed 2.2 million people, including 502,000 children under five years of age and 152,000 newborns.⁶ Severe cases can lead to hospitalization, complications, and death. Early detection and accurate treatment limit the severity of these diseases, particularly in patients with high-risk conditions.

In addition to clinical evaluation, namely a full medical history and a physical examination including chest auscultation with a stethoscope, several traditional methods are used to diagnose these lung infections. CT and chest X-ray imaging deserve particular emphasis because they add objective information to the clinical picture and allow longitudinal follow-up.

Detection methods include laboratory tests such as molecular testing, sputum testing, the tuberculin skin test (TST), and blood tests. Computed tomography (CT) scans and chest X-rays are also used to detect these infections.⁷ However, manual interpretation of CT scans and X-ray images is time-consuming and error-prone, so modern automated detection methods are preferable. Such automated models require large, diverse, and standardized image datasets to be robust and generalizable.

Artificial intelligence (AI) improves the accuracy of detecting abnormalities in several lung infections and supports clinicians in the early diagnosis and precise treatment of these infections.⁸ Neural networks and machine learning (ML) techniques have been used in various studies to detect and analyze infectious diseases such as COVID-19, as well as chronic diseases such as heart disease, kidney disease, diabetes, breast cancer, Alzheimer disease, and Parkinson disease (PD).^9,10

Radiologists may misdiagnose lung diseases, particularly when interpreting ambiguous X-ray images.¹¹ Various AI models, ML algorithms, and image-processing techniques have been used to detect and classify lung diseases,¹² which can facilitate early diagnosis and treatment. Such tools can support diagnosis, particularly in low-resource countries.

Several studies have combined chest imaging with AI techniques to detect infectious diseases. Automatic detection of COVID-19 from chest CT¹³ and X-ray¹⁴ images using transfer learning with convolutional neural networks (CNNs) has been reported. In addition, a high-resolution network (HRNet) approach was applied to X-ray images.¹⁵ Another study¹⁶ combined a CNN with a transformer architecture using a self-attention mechanism for early detection of COVID-19 from X-ray images.

A novel learning by teaching (LBT) framework was developed¹⁷ to detect three types of pneumonia; it uses neural architecture search (NAS) to find the best convolutional architecture. Other work¹⁸ applied the MobileNetV2 model to detect pneumonia from chest X-ray images. PulmoNet was developed¹⁹ to classify COVID-19, bacterial pneumonia, and viral pneumonia, and it performed best in binary COVID-19 classification.

An iterative enhancement fusion-based cascaded model was developed²⁰ to detect and localize multiple diseases in chest X-ray images. Other studies have addressed COVID-19, tuberculosis, and pneumonia together. One such study²¹ used knowledge distillation with a deep convolutional neural network (DCNN), transferring knowledge from a large, complex model to a smaller, simpler one. Another²² implemented a lightweight deep neural network (DNN) to detect pulmonary abnormalities caused by infectious diseases in X-ray images.

This paper makes the following contributions:

The proposed study creates a generalized method that is based on diverse and large datasets for various infectious diseases.

It provides a real-time model for the early detection and classification of infectious diseases from either CT or X-ray images.

It enhances the performance of the gold-standard model.

It is a simple method to assist clinicians in decision-making.

The proposed model is lightweight, with a reduced number of parameters, layers, and epochs. It can differentiate between various infections and can be integrated with other clinical systems.

The deep learning model is verified and validated using synthetic data.

The system provides an explainable AI model that is understandable, transparent, and trusted by clinicians.

This paper is organized as follows. The next section explains the methodology of this study, Section 3 describes the results and discussion, and the final section presents the conclusions arising from this work.

Methods

Dataset description

This study uses six datasets covering three infectious lung diseases: COVID-19, tuberculosis, and pneumonia. These heterogeneous datasets represent several populations with different characteristics. We concentrated on datasets that previous publications have explored. We combine X-ray images and CT scans to ensure the generalizability of our approach; to the best of our knowledge, no other related study has used this combination to detect these lung diseases.

COVID-19

C1 dataset: This dataset consists of CT scan images.²³ It includes 2482 CT scan images from 120 patients: 1252 CT scans of 60 patients infected with COVID-19 and 1230 CT scans of 60 non-infected patients with other pulmonary diseases. The images vary in size.

C2 dataset: This X-ray dataset was collected²⁴ from 338 subjects with confirmed COVID-19 and contains high-quality images (1024 × 851 pixels).

C3 dataset: This dataset was collected and studied^25,26 to differentiate between COVID-19 and pneumonia. It has 1626 images for COVID-19, 1802 images for normal cases, and 1800 images for pneumonia.

Pneumonia

P1 dataset: This dataset contains 5863 X-ray images of both normal and pneumonia cases.²⁷ Of these, 4273 are pneumonia images and 1583 are normal chest X-ray images. We combined this dataset with the pneumonia images from the C3 dataset.

Tuberculosis (TB)

T1 dataset: This dataset was collected²⁸ and contains 700 publicly available X-ray images of TB-infected cases and 3500 normal images. From these datasets, we also synthesized an additional COVID-19 dataset of CT scans and X-ray images, which we used for validation.

Image preprocessing

Because we used images from multiple sources covering diverse infection types, we preprocessed them for consistency. The steps were image resizing, conversion to a common color space (grayscale), image normalization, image augmentation, and data splitting. We followed prior work²⁹ for these processing steps.

Image normalization

In digital images, pixel values are represented as integers from 0 to 255, where 0 is the minimum intensity and 255 is the maximum. In this step, we rescaled the pixel values of each image to a standard range between 0 and 1. Normalizing pixel values is useful when feeding image data into the AI model: it stabilizes gradients during training and ensures that all features are on a similar scale.³⁰ We applied a pixel-wise operation in which each pixel of the images in the training and testing sets is divided by 255, giving values from 0.0 to 1.0. For the lite model used with IoT devices, we resized the images to a size of 96 pixels.

We adopted an approach that combined the following two main steps:

Pixel-wise operation through the following equation:

x_{\text{normalized}} = \frac{x}{255} \qquad (1)

Here, $x$ is the original pixel value and $x_{\text{normalized}}$ is the resulting normalized pixel value.

Z-score normalization (standardization) is used to normalize data when the range is not known with certainty. The formula for z-score standardization involves the following steps:

Find the mean of the pixel values by:

\mu = \frac{1}{n} \sum_{i=1}^{n} x_{\text{normalized},i} \qquad (2)

Find the standard deviation of the pixel values in the image:

\sigma = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \left(x_{\text{normalized},i} - \mu\right)^{2}} \qquad (3)

Apply Z-score formula:

z = \frac{x_{\text{normalized}} - \mu}{\sigma} \qquad (4)

We thus scaled the images to range from 0 to 1 and then calculated the mean and standard deviation to perform z-score standardization.
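The two-stage normalization in Eqs. (1) to (4) can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: it assumes 8-bit input and per-image statistics (as Eq. (3) describes), and the function name `normalize_and_standardize` is hypothetical. The `ddof=1` argument gives the n − 1 denominator of Eq. (3).

```python
import numpy as np

def normalize_and_standardize(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixels to [0, 1] (Eq. 1), then z-score them (Eqs. 2 to 4)."""
    x_norm = image.astype(np.float64) / 255.0   # Eq. (1): 0..255 -> 0.0..1.0
    mu = x_norm.mean()                          # Eq. (2): mean of all n pixels
    sigma = x_norm.std(ddof=1)                  # Eq. (3): n - 1 denominator
    if sigma == 0:                              # constant image: avoid division by zero
        return np.zeros_like(x_norm)
    return (x_norm - mu) / sigma                # Eq. (4)

# Hypothetical 2 x 2 grayscale image
img = np.array([[0, 255], [0, 255]], dtype=np.uint8)
z = normalize_and_standardize(img)
```

Because the z-score is applied after the division by 255, the final values are centered on 0 rather than confined to [0, 1]; the [0, 1] range applies only to the intermediate $x_{\text{normalized}}$.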

Image transformation

Our next goal was to improve model performance and avoid overfitting caused by the imbalanced class distribution and by variations among images. We therefore applied transformations via augmentation. To maintain transparency, we adopted model-free image augmentation,^31,32 which modifies the spatial relationships between pixels. To ensure diversity and generalization of the dataset, we created an offline image data generator class focused on geometric transformations: rotation, shearing, and shifting of images in the training set. The transformations use the following equations, which map each pixel coordinate (x, y) in the original image to new coordinates (x', y').

Rotation formula:

\begin{aligned} x' &= x \cos\theta - y \sin\theta \\ y' &= x \sin\theta + y \cos\theta \end{aligned} \qquad (5)

Shearing formula:

\begin{aligned} x' &= x + sh_{x}\, y \\ y' &= sh_{y}\, x + y \end{aligned} \qquad (6)

Here, $sh_{x}$ is the horizontal shear factor and $sh_{y}$ is the vertical shear factor.

Shifting images formula:

\begin{aligned} x' &= x + t_{x} \\ y' &= y + t_{y} \end{aligned} \qquad (7)

Here, $t_{x}$ is the horizontal shift distance and $t_{y}$ is the vertical shift distance.
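Eqs. (5) to (7) act on pixel coordinates. The sketch below applies them to a single coordinate pair, assuming angles in radians and a hypothetical order of composition (rotate, then shear, then shift); all function names are illustrative. The equations rotate about the origin; augmentation tools usually place the origin at the image center and fill each output pixel by mapping it back through the inverse transform and interpolating, which this sketch does not show.

```python
import math

def rotate(x, y, theta):
    """Eq. (5): rotate the point (x, y) by theta radians about the origin."""
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def shear(x, y, shx, shy):
    """Eq. (6): shx is the horizontal and shy the vertical shear factor."""
    return x + shx * y, shy * x + y

def shift(x, y, tx, ty):
    """Eq. (7): translate by tx horizontally and ty vertically."""
    return x + tx, y + ty

def augment_coordinate(x, y, theta, shx, shy, tx, ty):
    # Hypothetical composition order: rotation, then shear, then shift
    x, y = rotate(x, y, theta)
    x, y = shear(x, y, shx, shy)
    return shift(x, y, tx, ty)

# Rotating (1, 0) by 90 degrees gives about (0, 1); shear and shift then give about (3.5, 0)
new_x, new_y = augment_coordinate(1.0, 0.0, math.pi / 2, 0.5, 0.0, 3.0, -1.0)
```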

Convolutional neural network (CNN) to classify images

The proposed CNN model was built from scratch and trained on the specified datasets to classify images as infected or not infected. CNNs^33,34 are widely used in deep learning to extract image features because they learn hierarchical representations automatically. CNNs are composed of several layer types, such as convolutional, pooling, and fully connected layers. We trained the proposed deep learning (DL) model on regions of interest (ROIs) selected from the images.

Convolutional Layers: Convolutional layers apply a set of trainable filters (also called kernels) to the input image. Each filter performs a convolution by sliding across the input image and computing a dot product at every position. This captures local patterns and produces feature maps that emphasize salient visual features.

(f \ast g)(x, y) = \sum_{i} \sum_{j} f(i, j)\, g(x - i, y - j) \qquad (8)

Here, f(i, j) is the input feature map, g is the convolution kernel (filter), and (x, y) are the spatial coordinates of the output feature map.
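Eq. (8) can be evaluated directly, as in the NumPy sketch below (the function name `conv2d_valid` is hypothetical, and only the "valid" region, where the kernel fits entirely inside the input, is computed). Eq. (8) is a true convolution, so the kernel is flipped. Deep learning frameworks, including the Keras `Conv2D` layer, compute cross-correlation instead (no flip); because the kernel weights are learned, the two are equivalent in what the network can represent.

```python
import numpy as np

def conv2d_valid(f: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Evaluate Eq. (8) over the 'valid' region.

    f: input feature map (H x W); g: kernel (kh x kw).
    """
    kh, kw = g.shape
    g_flipped = g[::-1, ::-1]          # true convolution flips the kernel
    out_h = f.shape[0] - kh + 1
    out_w = f.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            # Dot product between the flipped kernel and the input patch
            out[x, y] = np.sum(f[x:x + kh, y:y + kw] * g_flipped)
    return out

f = np.arange(9, dtype=float).reshape(3, 3)
g = np.array([[1.0, 0.0], [0.0, -1.0]])
feature_map = conv2d_valid(f, g)   # every entry is f[i+1, j+1] - f[i, j] = 4
```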

Activation Functions: Non-linear activation functions (such as ReLU, sigmoid, or tanh) are applied elementwise to the feature maps, introducing non-linearity that enables CNNs to model complex relationships in the data.

\operatorname{ReLU}(x) = \max(0, x) \qquad (9)

Pooling Layers: Pooling layers down-sample the feature maps, reducing their spatial dimensions. This provides a degree of spatial invariance and reduces the computational cost of subsequent layers. Typical pooling operations are max pooling, which takes the maximum value within each pooling window, and average pooling, which takes the mean. For the input values $x_{1}, x_{2}, \dots, x_{n}$ in a window, max pooling is defined as:

\operatorname{max\_pool}(x) = \max(x_{1}, x_{2}, \dots, x_{n}) \qquad (10)
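A NumPy sketch of Eq. (9) followed by Eq. (10) over non-overlapping windows. The 2 × 2 window with stride 2 is an assumption, since the pooling size is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Eq. (9): elementwise max(0, x)."""
    return np.maximum(0, x)

def max_pool_2x2(fmap: np.ndarray) -> np.ndarray:
    """Eq. (10) over non-overlapping 2 x 2 windows (stride 2).

    Trailing rows/columns that do not fill a whole window are dropped.
    """
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    trimmed = fmap[:h, :w]
    # View as (h/2, 2, w/2, 2) blocks and take the maximum of each block
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, -2, 3, 0],
                 [-1, 5, -6, 2],
                 [0, 0, 1, 1],
                 [7, -8, 2, -3]])
pooled = max_pool_2x2(relu(fmap))
```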

Fully Connected Layers: After a series of convolutional and pooling layers, the feature maps are flattened into a vector and passed to fully connected layers. These layers integrate high-level features and learn complex relationships between them.

y = f(Wx + b) \qquad (11)

Here, x is the input vector, W is the weight matrix, b is the bias vector, and f is the activation function. In the output layer, f is the sigmoid function because the task is binary classification.

Activation and Output Layers: Fully connected layers are commonly followed by activation functions and an output layer that depends on the task. Activation functions introduce non-linearity, and the output layer produces the format the task requires, such as softmax for classification or a linear activation for regression. Deep learning architectures such as VGGNet, ResNet, Inception, and EfficientNet have shown strong capabilities in extracting image features.

These architectures often comprise many layers and use techniques such as skip connections, residual blocks, and attention mechanisms to improve feature representation and model performance. In transfer learning, models pre-trained on large image datasets such as ImageNet are often used as feature extractors. They can be fine-tuned or used as fixed feature extractors on new tasks or datasets, transferring learned representations from a source domain to a target domain. Our classification model is simpler: it uses a sigmoid output for binary classification, the Adam optimizer, and binary cross-entropy loss. Figure 1 shows our proposed CNN model within the overall approach, which combines two models: a GAN that generates synthetic images and the main CNN model, which is fed with all real and synthetic datasets. Table 1 gives more details about the proposed CNN model.
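Eq. (11) with a sigmoid output and the binary cross-entropy loss mentioned above can be sketched as follows. The weights and inputs are hypothetical placeholders; in practice they are learned by the Adam optimizer inside a framework such as Keras, and the clipping constant `eps` is an assumption that keeps the logarithm finite.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dense(x, W, b, activation):
    """Eq. (11): y = f(Wx + b) for a single input vector x."""
    return activation(W @ x + b)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to keep log() finite."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

# Hypothetical 3-feature input and one sigmoid output unit with zero weights
x = np.array([0.2, -0.5, 1.0])
W = np.zeros((1, 3))
b = np.zeros(1)
p = dense(x, W, b, sigmoid)                      # sigmoid(0) = 0.5
loss = binary_cross_entropy(np.array([1.0]), p)  # -log(0.5), about 0.693
```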

Figure 1.

Proposed method with CNN model.

Table 1.

Layers and parameters in the proposed CNN model.

Layer	Number of Parameters
InputLayer	0
Conv2D	896
Conv2D	18,496
MaxPooling2D	0
Dropout	0
Conv2D	36,928
MaxPooling2D	0
Dropout	0
Conv2D	73,856
MaxPooling2D	0
Dropout	0
Flatten	0
Dense	1,179,712
Dropout	0
Dense	130
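The parameter counts in Table 1 follow from the standard formulas for convolutional and dense layers (pooling, dropout, and flatten layers have no trainable parameters). The configuration below is inferred from the counts rather than stated in the text: 3 × 3 kernels, 3 input channels, 32/64/64/128 filters, a 64-unit hidden dense layer, and 2 output units. The 12 × 12 × 128 flattened map is consistent with a 96 × 96 input, "same" padding, and three 2 × 2 poolings, but that input size is also an assumption. Under these assumptions the total is 1,310,018, the figure reported in Table 9.

```python
def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    # Each filter has kernel * kernel * in_channels weights plus one bias
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs: int, units: int) -> int:
    # Weight matrix (inputs x units) plus one bias per unit
    return inputs * units + units

# Assumed: 96 x 96 input, 'same' padding, three 2 x 2 poolings: 96 -> 48 -> 24 -> 12
side = 96 // 2 // 2 // 2

layers = {
    "conv1": conv2d_params(3, 3, 32),               # 3 x 3 kernels on a 3-channel input
    "conv2": conv2d_params(3, 32, 64),
    "conv3": conv2d_params(3, 64, 64),
    "conv4": conv2d_params(3, 64, 128),
    "dense1": dense_params(side * side * 128, 64),  # flattened 12 x 12 x 128 map
    "dense2": dense_params(64, 2),                  # 2 output units
}
total = sum(layers.values())
print(f"{total:,}")  # 1,310,018
```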

Generative adversarial network (GAN)

We used a generative adversarial network (GAN) to generate images similar to the real ones for validation. The GAN consists of two neural network components: a generator (G) and a discriminator (D).^35,36

The objective function is as follows:

\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right] \qquad (12)

Here, x is the real data and z is the input noise. D(x) is the probability that input x comes from the real data distribution, and G(z) is a generated data sample. The generator G transforms a random noise vector z into synthetic images, and the discriminator D is trained to distinguish real images from generated ones. Figure 2 shows our proposed GAN model, and Tables 2 and 3 give more details about its generator and discriminator.
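Table 4 lists cross-entropy as the GAN loss; the NumPy sketch below shows how that loss relates to Eq. (12). Given discriminator outputs on a batch of real and generated images, `gan_value` estimates V(D, G) by batch averages. The discriminator minimizes −V, and the generator loss shown is the common non-saturating variant, which may or may not match the authors' choice. All names are hypothetical.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Batch estimate of V(D, G) in Eq. (12).

    d_real: discriminator outputs D(x) on real images.
    d_fake: discriminator outputs D(G(z)) on generated images.
    Outputs are clipped away from 0 and 1 so the logarithms stay finite.
    """
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

def discriminator_loss(d_real, d_fake):
    # D maximizes V, i.e. minimizes -V: binary cross-entropy with
    # label 1 for real images and label 0 for generated ones
    return -gan_value(d_real, d_fake)

def generator_loss(d_fake, eps=1e-12):
    # Non-saturating variant: maximize log D(G(z)) instead of
    # minimizing log(1 - D(G(z))), which gives stronger early gradients
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(-np.mean(np.log(d_fake)))

half = np.array([0.5, 0.5])
v_at_equilibrium = gan_value(half, half)   # -log 4, about -1.386
```

When the generator matches the data distribution, the optimal discriminator outputs 0.5 everywhere and V(D, G) = −log 4.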

Figure 2.

The proposed GAN model.

Table 2.

Layers and parameters in the proposed generator

Layer	Number of Parameters
Dense	2113536
Reshape	0
Conv2DTranspose	1048832
LeakyReLU	0
Conv2DTranspose	1573248
LeakyReLU	0
Conv2DTranspose	3146240
LeakyReLU	0
Conv2DTranspose	5243520
LeakyReLU	0
Conv2DTranspose	17283
flatten (Flatten)	0
Dense	442369

Table 3.

Layers and parameters in the proposed discriminator.

Layer	Number of Parameters
MaxPooling2D	0
Conv2D	12544
BatchNormalization	1024
LeakyReLU	0
MaxPooling	0
Conv2DTranspose	12291

Results

We ran the proposed CNN and GAN models on the real datasets using an NVIDIA A100 GPU with 500 compute units in Google Colab, and we measured accuracy, area under the curve (AUC), sensitivity, precision, specificity, F1-score, the receiver operating characteristic (ROC) curve, and the Matthews correlation coefficient (MCC). We also used the false positive rate, false discovery rate, and false negative rate for validation.³⁷ We split the datasets in a 60:20:20 ratio for training, testing, and validation, respectively, and computed the confusion matrix for the testing sets. Because we had a large set of images from different sources and diseases, we performed simple random sampling on the augmented data, with a sampling rate of 0.5 for all datasets. Because the C3 and P1 datasets were large, we selected more than 0.5 sampling for their training, testing, and validation parts. Table 4 lists all parameters for the CNN, generator, and discriminator models. Table 5 lists the sizes of the datasets and the fixed parameters for each. Table 6 lists the performance metrics for each dataset.
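The threshold-based metrics listed above can all be derived from a 2 × 2 confusion matrix, as in the plain-Python sketch below (the function name and dictionary keys are hypothetical). AUC and the ROC curve are not included because they require per-image scores rather than a single confusion matrix. The last three rates are complements: FPR = 1 − specificity, FDR = 1 − precision, and FNR = 1 − sensitivity, which is consistent with the synthetic-image results reported later (specificity 0.9872 and FPR 0.0128).

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics named in the Results section, from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": accuracy, "sensitivity": sensitivity,
        "specificity": specificity, "precision": precision,
        "f1": f1, "mcc": mcc,
        "fpr": 1 - specificity,           # FP / (FP + TN)
        "fdr": 1 - precision,             # FP / (FP + TP)
        "fnr": 1 - sensitivity,           # FN / (FN + TP)
    }

# Hypothetical test-set confusion matrix
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```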

Table 4.

Parameters of the proposed models.

Model	Parameters
CNN	padding = same, activations = ReLU and sigmoid, regularization = hybrid of L1 and L2 regularizers, epochs = 5, batch size = 32, 50, and 100, optimizer = Adam, callback = early stopping, patience = 1, learning_rate = 0.00001
GAN	optimizer = Adam, loss = cross_entropy, random noise = one sample, activations = ReLU and tanh, generation resolution factor = 3, image_channels = 3, preview_rows = 1, preview_cols = 1, preview_margin = 16, seed_size = 100, epochs = 1000, batch_size = 100, buffer_size = 60000
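The hybrid L1/L2 regularization and the early-stopping callback in Table 4 follow simple rules, sketched below in plain Python. Keras exposes them as `tf.keras.regularizers.L1L2(l1=..., l2=...)` and `tf.keras.callbacks.EarlyStopping(patience=...)`; this hand-written version only mirrors the basic logic. The regularization coefficients are not reported, so the values in the usage example are hypothetical, as are the function names.

```python
def l1_l2_penalty(weights, l1, l2):
    """Hybrid penalty added to the training loss: l1 * sum|w| + l2 * sum(w^2)."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)

def early_stopping_epoch(val_losses, patience=1):
    """Return the epoch (1-based) at which training stops.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; otherwise it runs through every epoch given.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses over the 5 epochs of Table 4
losses = [0.50, 0.40, 0.45, 0.30, 0.20]
stop_at = early_stopping_epoch(losses, patience=1)   # stops after epoch 3
```

With patience = 1, a single epoch without improvement ends training, which helps explain why only 5 epochs were configured.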

Table 5.

Real and augmented data size.

Dataset	Real Data	Augmented data
C1	2481	29784
C2 and C3	3766	90399
C3 and P1	7656	423100 (sample of 211550)
T1	4200	100800
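Simple random sampling followed by the 60:20:20 split can be sketched as below. The seed, the use of sampling without replacement, and the rounding rule are assumptions, and the text does not say whether sampling was stratified by class. Applied to the 423,100 augmented C3 and P1 images with the 0.5 rate, the sketch yields the 211,550-image sample of Table 5; the four augmented sizes in Table 5 sum to the 432,533 images reported in the Abstract.

```python
import numpy as np

def sample_and_split(n_items, sampling_rate=0.5, ratios=(0.6, 0.2, 0.2), seed=0):
    """Simple random sampling, then a train/test/validation split.

    Returns three arrays of item indices. The validation part takes the
    remainder (ratios[2]), so rounding leftovers go there.
    """
    rng = np.random.default_rng(seed)
    n_sampled = round(n_items * sampling_rate)
    # Sample without replacement; Generator.choice shuffles the result by default
    picked = rng.choice(n_items, size=n_sampled, replace=False)
    n_train = round(n_sampled * ratios[0])
    n_test = round(n_sampled * ratios[1])
    train = picked[:n_train]
    test = picked[n_train:n_train + n_test]
    val = picked[n_train + n_test:]
    return train, test, val

# C3 and P1: 423,100 augmented images sampled at a rate of 0.5 (Table 5)
train, test, val = sample_and_split(423_100)

# Augmented sizes used for training (Table 5); their sum is the Abstract's total
augmented_sizes = {"C1": 29_784, "C2 and C3": 90_399, "C3 and P1": 211_550, "T1": 100_800}
total = sum(augmented_sizes.values())
```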

Table 6.

Performance metrics for each dataset.

Dataset	Accuracy	Sensitivity	Specificity	Precision	F1-score	MCC	AUC
C1	0.9862	0.9980	0.9747	0.9747	0.9862	0.9863	0.9863
C2 and C3	0.9871	0.9964	0.9773	0.9788	0.9875	0.9744	0.9875
C3 and P1	0.9631	0.9605	0.9649	0.9514	0.9559	0.9242	0.9615
T1	0.9806	0.9836	0.9643	0.9932	0.9884	0.9290	0.9875

Figure 3 shows the class distributions in the original and augmented datasets; augmentation balanced the classes. Balancing before training is important to prevent bias, improve model performance, avoid overfitting, and allow the proposed model to learn the data patterns accurately. Figure 4 shows the training and validation accuracy and loss, and the confusion matrix, for each dataset.

Figure 3.

Distribution of classes in original and augmented datasets.

Figure 4.

Results of the proposed CNN model on CT scan and X-ray datasets, training accuracy and loss (left), and confusion matrix for testing parts (right).

Discussion

To address the black-box problem of deep learning models and provide a simple, interpretable model that increases transparency, several techniques have been proposed, such as bootstrap simulation, SHapley Additive exPlanations (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME).^38,39 We used LIME to explain individual predictions. LIME approximates the local behavior of the prediction model around a specific instance. The classification results for all datasets are explained with LIME using the following optimization formula:

\xi(x) = \operatorname*{arg\,min}_{\xi \in \Pi} \; L(f, \xi, x) + \Omega(\xi) \qquad (13)

L(f, \xi, x) = \sum_{z} \pi_{x}(z)\, L_{\text{model}}\left(f(z), \xi(z)\right) \qquad (14)

Here, ξ(x) is the local, interpretable model that approximates the original model f around the input x, and Π is the family of candidate interpretable models. The loss L(f, ξ, x) measures the difference between the local model's predictions and the original model's predictions on perturbed samples z, each weighted by its proximity π_x(z) to x. Ω(ξ) is a regularization term that penalizes the complexity of ξ. (L1 and L2 regularization are implemented in our case for different layers.)
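A minimal NumPy sketch of the idea behind Eqs. (13) and (14), not the `lime` package itself: perturbed samples z are random on/off masks over interpretable features (superpixels, for images), π_x(z) is an exponential kernel on the fraction of features switched off, and ξ is a linear model fitted by weighted least squares, which corresponds to a squared-error L_model. The distance, kernel width, sample count, and function names are hypothetical. Here Ω(ξ) is handled only by keeping the K = 10 largest coefficients, mirroring the top-10 features in Figure 5; library implementations typically add explicit regularization as well.

```python
import numpy as np

def lime_weights(predict, n_features, n_samples=500, kernel_width=0.25, seed=0):
    """Fit a weighted linear surrogate xi around the instance x.

    predict: black-box f mapping a binary mask (which features are kept)
             to the model's score for the class being explained.
    Returns one coefficient per interpretable feature.
    """
    rng = np.random.default_rng(seed)
    # Perturbed samples z: random on/off masks; the first row is x itself
    Z = rng.integers(0, 2, size=(n_samples, n_features))
    Z[0] = 1
    y = np.array([predict(z) for z in Z])
    # pi_x(z): exponential kernel on the fraction of features switched off
    distance = 1.0 - Z.mean(axis=1)
    pi = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted least squares: minimize sum_z pi_x(z) * (f(z) - xi(z))^2
    A = np.hstack([Z, np.ones((n_samples, 1))])   # last column is the intercept
    sw = np.sqrt(pi)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]

def top_k(weights, k=10):
    # Indices of the k features with the largest positive contribution
    return list(np.argsort(weights)[::-1][:k])

# Hypothetical black box that is exactly linear in 12 superpixels
w_true = np.arange(12) / 10.0
w_hat = lime_weights(lambda z: 0.1 + z @ w_true, 12)
important = top_k(w_hat, 10)
```

Because this toy black box is linear, the surrogate recovers its coefficients exactly; for a real CNN, the coefficients only describe its behavior near x.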

The explainer's visualization highlights the image features that contribute most to the classification. Figure 5 illustrates the 10 most important features in the images. The top gray images show the important features, with all other regions hidden, when the class is infected with COVID-19. The images below show the critical features in color when the class is not positive for COVID-19.

Figure 5.

Explaining the 10 most important features in (a) the C1 dataset, (b) the C2 and C3 dataset, (c) the C3 and P1 dataset, and (d) the T1 dataset.

Figure 5 shows how the proposed model explains positive cases: gray marks the regions that are important for classifying an image as positive, and the discarded areas in black are not important.

We applied the GAN model to validate the performance of our proposed model. We randomly selected 499 images from the C1 CT scan dataset and 498 from the C2 and C3 X-ray datasets, and then used the GAN model to generate new COVID-19 and non-COVID-19 images from each dataset, obtaining 23,928 augmented images in total. Figure 6 shows the results obtained after 700 epochs for each case. We then augmented these images with our proposed method and applied our CNN model.

Figure 6.

Generated images using GAN model.

We then applied our CNN model to the 23,928-image augmented dataset, which contains both real and synthetic images, divided into 60% for training, 20% for testing, and 20% for validation. Figure 7 shows the results after training, including the confusion matrix for the testing part, and Figure 8 shows 5-fold cross-validation for both the synthetic and real datasets. For the synthetic images, we obtained an accuracy of 0.9925, sensitivity of 0.9979, specificity of 0.9872, precision of 0.9870, F1 score of 0.9924, MCC of 0.9850, false positive rate of 0.0128, false discovery rate of 0.0130, and false negative rate of 0.0021. For the real images, we obtained an accuracy of 0.9846, sensitivity of 0.9807, specificity of 0.9886, precision of 0.9888, F1 score of 0.9847, MCC of 0.9693, false positive rate of 0.0114, false discovery rate of 0.0112, and false negative rate of 0.0193. We also mixed 1000 CT scan and X-ray images (250 real and 250 synthetic images for each class) and still obtained good performance, as shown in Figures 7 and 8: an accuracy of 0.9693, sensitivity of 0.9533, specificity of 0.9865, precision of 0.9870, F1 score of 0.9699, MCC of 0.9392, false positive rate of 0.0135, false discovery rate of 0.0130, and false negative rate of 0.0467.
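The 5-fold cross-validation reported in Figure 8 can be sketched as follows. How the folds were built is not described, so the random permutation, the seed, and the absence of stratification are assumptions; `evaluate` is a hypothetical placeholder for training the CNN on one part and scoring it on the held-out fold. Each of the 23,928 images lands in exactly one held-out fold.

```python
import numpy as np

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, held_out_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    folds = np.array_split(order, k)   # fold sizes differ by at most one
    for i in range(k):
        held_out = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, held_out

def cross_validate(n_items, evaluate, k=5):
    """Mean and standard deviation of a score over k folds.

    evaluate(train_idx, test_idx) is a placeholder for fitting the model on
    train_idx and returning a metric (e.g. accuracy) on test_idx.
    """
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_items, k)]
    return float(np.mean(scores)), float(np.std(scores))

folds = list(k_fold_indices(23_928))
```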

Figure 7.

Results of the proposed CNN model on CT scan and X-ray images for the synthetic, real, and mixed image datasets.

Figure 8.

Results of 5-fold cross validation of the proposed CNN model for synthetic, real, and mixed image datasets.

All three experiments showed strong performance in detecting COVID-19 infection in CT scan and X-ray images: on real data, on synthetic data with added noise, and on mixed datasets.

To evaluate our proposed CNN model under conditions resembling real-world medical environments, we used a GAN model to generate synthetic images with noise and occlusion. We computed mean squared error (MSE) scores between the real and synthetic images, which ranged from 965.84 to 992.35 (Figure 9). Lower values indicate closer alignment with the real images, and higher values indicate greater differences. The proposed model showed robustness and generalizability under these conditions, which resemble clinical environments where image quality is not always ideal.
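MSE between a real image and a synthetic counterpart can be computed as below. The `degrade` helper is a hypothetical stand-in for producing noisy, occluded variants (the study generated them with a GAN, which is not reproduced here), and its noise level and occlusion box are arbitrary. The scale of the MSE depends on the pixel range, so this sketch assumes 0 to 255 intensities.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared pixel difference between two same-shape images (0 = identical)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))

def degrade(image, noise_std=10.0, box=(0, 0, 8, 8), seed=0):
    """Hypothetical noisy, occluded variant of an 8-bit image.

    box = (row, col, height, width) of a region set to black.
    """
    rng = np.random.default_rng(seed)
    out = image.astype(np.float64) + rng.normal(0.0, noise_std, image.shape)
    r, c, h, w = box
    out[r:r + h, c:c + w] = 0.0          # simulated occlusion
    return np.clip(out, 0, 255)          # stay within the 8-bit intensity range

real = np.full((16, 16), 128, dtype=np.uint8)
score = mse(real, degrade(real))
```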

Figure 9.

Results of MSE for real and synthetic images.

Table 7 shows that DenseNet121 generally outperforms VGG16, achieving higher accuracy and F1 scores on all four datasets, although VGG16 has higher precision on the C1, C3 and P1, and T1 datasets. For instance, DenseNet121 achieved 99.13% accuracy and a 99.21% F1 score on the C2 and C3 dataset, compared with 98.88% accuracy and a 98.96% F1 score for VGG16. Similarly, DenseNet121 performed slightly better on T1, with a 99.21% F1 score.

Table 7.

Performance metrics for base models.

Model	Dataset	Accuracy	Sensitivity	Specificity	Precision	F1-score
DenseNet121	C1	0.8996	0.9647	0.9647	0.8571	0.9077
VGG16	C1	0.8554	0.8248	0.8248	0.8616	0.8428
DenseNet121	C2 and C3	0.9913	0.9888	0.9944	0.9955	0.9921
VGG16	C2 and C3	0.9888	0.9907	0.9865	0.9885	0.9896
DenseNet121	C3 and P1	0.9687	0.9697	0.9685	0.8649	0.9143
VGG16	C3 and P1	0.9491	0.8779	0.9697	0.8935	0.8856
DenseNet121	T1	0.9869	0.9857	0.9928	0.9986	0.9921
VGG16	T1	0.9821	0.9796	1.0000	1.0000	0.9897

Our proposed model, on the other hand, has slightly lower accuracy than DenseNet121 on three of the four datasets (C2 and C3, C3 and P1, and T1), but its results are consistent across datasets: an average F1 score of 97.95%, an average accuracy of 97.93% (standard deviation: 0.97%), and an average sensitivity of 98.46%. DenseNet121 performs very well on some datasets, but the proposed model is lightweight and has strong overall performance, which makes it suitable for real-time detection.
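The averages in the Abstract can be reproduced from Table 6 as unweighted means over the four datasets, as in the short sketch below. The reported 0.97% standard deviation of accuracy corresponds to the population standard deviation (dividing by n = 4); the sample standard deviation (dividing by n − 1) would be about 1.11%.

```python
from statistics import mean, pstdev, stdev

# Values from Table 6, in the order C1, C2 and C3, C3 and P1, T1
table6 = {
    "accuracy":    [0.9862, 0.9871, 0.9631, 0.9806],
    "sensitivity": [0.9980, 0.9964, 0.9605, 0.9836],
    "specificity": [0.9747, 0.9773, 0.9649, 0.9643],
    "precision":   [0.9747, 0.9788, 0.9514, 0.9932],
    "f1":          [0.9862, 0.9875, 0.9559, 0.9884],
    "auc":         [0.9863, 0.9875, 0.9615, 0.9875],
}

# Unweighted mean over the four datasets (each dataset counts once)
averages = {name: mean(values) for name, values in table6.items()}

accuracy_sd = pstdev(table6["accuracy"])        # population SD, about 0.0097
accuracy_sd_sample = stdev(table6["accuracy"])  # sample SD, about 0.0111
```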

We compared our results for all datasets, for both CT scans and X-ray images, in Tables 8 and 9 and observed promising results relative to other related studies. The two benchmark studies^21,22 used the same datasets as this study. Our work also improves on the work of Kabir and colleagues,²¹ which used knowledge distillation, while Mahbub's lightweight DNN²² was comparable to ours. However, unlike our work, both studies used smaller datasets and only X-ray images. The study by Kabir and colleagues²¹ proposed a student model that relies heavily on the teacher model, so it cannot be applied to new data without first being retrained. Neither study performed external validation or tested on noisy images, which are common in real-world scenarios. Mahbub's study²² reported only accuracy, precision, and recall, whereas clinical decision-making requires additional metrics. For augmentation, one study²¹ used random rotation, translation, flipping, and zooming, while the other²² only resized the images. Both studies lack dataset diversity and are prone to overfitting. In our study, we incorporated additional datasets and various augmentation techniques, for a total of 432,533 images from both CT scans and X-rays, to ensure diversity for generalization and to prevent overfitting. We also carried out external validation on a newly generated GAN dataset that included a variety of noisy images.

Table 8.

Comparison between our results and other state-of-the-art studies.

Ref.	Disease	Dataset size	Results
¹³	COVID-19	4171 CT scans	Accuracy: 99.04%, Sensitivity: 99.08%, Precision: 99.08%, Area Under the ROC Curve (AUC): 99.7%
¹⁴	COVID-19	910 X-ray images	Accuracy: 99.26%, Sensitivity: 98.53%, Specificity: 98.82%, F1-score: 99.25%
¹⁵	COVID-19	6500 X-ray images	Accuracy: 94%, Sensitivity: 90%, Precision: 92%, F1 score: 90%
¹⁶	COVID-19 and viral pneumonia	15,153 X-ray images	Accuracy: 98.2%, AUC: 99.6%, F1 score: 97.6%, Recall: 97.3%, Precision: 97.8%
¹⁷	Pneumonia	5000 X-ray images	Accuracy: 97.0%, AUC: 97.6%, Sensitivity: 95.9%, Specificity: 96.7%, F1-score: 97.1%
¹⁸	Pneumonia	3000 X-ray images	Accuracy: 96%, Specificity: 95%, Sensitivity: 96%
¹⁹	Bacterial and Viral Pneumonia, and COVID-19	16,435 X-ray and CT scan images	Best performance for 2-class COVID-19 classification with Accuracy: 99.4%, Precision: 98.86%, Recall: 99.41%, F1-score: 98.46%.
²⁰	15 infections	2000 X-ray images for each disease	Sensitivity: 95.62%, Specificity: 96.23%
²¹	Covid-19, Pneumonia, and Tuberculosis (TB)	31,907 X-ray images	Average Accuracy: 99.72%, Average AUC: 100.00%, Average Sensitivity: 99.59%, Average Specificity: 99.97%, Average Precision: 99.97%, Average F1 Score: 99.78%
²²	Covid-19, Pneumonia, and Tuberculosis (TB)	26,848 X-ray images	Average Accuracy: 97%, Average Precision: 94%, Average Recall: 97%
Our study	Covid-19, Pneumonia, and Tuberculosis (TB)	432,533 augmented images for both CT scan and X-ray images	Average Accuracy: 97.93%, Average AUC: 98.07%, Average Sensitivity: 98.46%, Average Specificity: 97.03%, Average Precision: 97.45%, Average F1 Score: 97.95%

Table 9.

A comparative analysis between our study and other related studies.

Ref.	Advantages	Limitations
¹³	The study achieved good accuracy. The use of transfer learning models helps in the case of small datasets and reduces time of analysis.	The proposed work did not include fine-tuning. The dataset includes three classes, but the developed model classifies patients as either infected or not. The results cannot be generalized to other diseases.
¹⁴	Improved feature extraction process. Focusing on relevant lung regions that increase the robustness of the model.	Lack of generalizability. Concerns about bias and overfitting because of the small size of dataset. Varying quality of the images may introduce noise during model's training.
¹⁵	The study provides early detection of COVID-19, which enhances AI diagnostic tools.	Lack of generalizability. There is a risk of overfitting.
¹⁶	It provides a cost-effective method with high accuracy using X-ray images. It can handle unbalanced datasets.	Lack of generalizability. It requires more training time and computational resources compared to other CNN models. It requires more data for the transformer.
¹⁷	It provides strong pseudo-label generation and a robust model. It uses both human-labeled and pseudo-labeled datasets and is comparable with experienced radiologists.	If the model's complexity is high, there is a risk of overfitting. More computational resources are required for the architecture search process. Lack of generalizability.
¹⁸	It provides good accuracy for early diagnosis. It gives different comparative analysis with other models.	There is a possibility of overfitting. Limited dataset size may result in bias. Lack of generalizability.
¹⁹	Compared to other related studies presented in the paper, the proposed model performed exceptionally well in detecting three infections. The study applied augmentation and cross-validation to avoid overfitting and improve generalization, respectively. Classification time is suitable for clinical workflows.	The presented model is relatively complex with 26 layers. There is a deficiency in the diversity of datasets and the distribution of healthy and non-healthy classes. Lack of explainability of results.
²⁰	The proposed model uses a multilayer approach to detect multiple infections and to learn the effect of new variations.	Additional performance measures should be reported. Lack of interpretability. The required memory footprint and computing resources are not reported.
²¹	It provides a lightweight architecture suitable for mass screening.	Practical deployment of the proposed model is needed to confirm that it works with diverse infectious diseases. There is potential overfitting because the COVID-19, pneumonia, and TB datasets were first trained against healthy cases, and the same infected datasets were then trained against each other.
²²	It provides interoperability and reduces the complexity of integration with clinical workflows.	The developed model may produce high false positive rates. Lack of generalizability.
Our study	It uses augmentation to improve the robustness of the dataset, increase diversity, and prevent overfitting. It uses several different datasets. It uses both CT scans and X-ray images to ensure the generalizability of the proposed model. Normalization helps standardize the data, reduce bias, speed up model convergence, and make the model's outputs more interpretable. It has fewer parameters (1,310,018 in total) than the state-of-the-art study²² (3,026,497) and another study²¹ (13,791,684).	The model should be applied to other infectious diseases to confirm its performance. Because the dataset was large, training the proposed model took a very long time, even with a GPU.

Moreover, with 1.31 million parameters, our model strikes a balance between the models of both studies, making it robust while remaining resource efficient. It requires moderate memory compared with the high memory demand of model²¹ and the very low memory usage of the lightweight model.²² This allows it to extract deeper features without being overly resource intensive. The proposed model has a moderate inference speed suitable for real-time applications on mid-tier devices. It has four Conv2D layers, which make it better suited to extracting features for complex tasks than the shallow architectures used in the other two studies. Its three dropout layers provide strong regularization, reducing overfitting compared with the two previous models, which had few or no dropout layers. The two dense layers, one of which has 1.18 million parameters, enhance the model's classification power compared with the simpler dense layers of the previous lightweight models.
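The memory comparison above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that weights are stored as 32-bit floats (4 bytes each) and counts weights only, ignoring activations, optimizer state, and framework overhead. It also shows the standard parameter-count formulas for Conv2D and dense layers; the layer shapes in the last two lines are hypothetical and are not the architecture of the proposed model.

```python
# Parameter counts quoted in the comparison with refs. 21 and 22.
PARAM_COUNTS = {
    "Proposed CNN": 1_310_018,
    "Ref. 22": 3_026_497,
    "Ref. 21": 13_791_684,
}
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(n_params: int, dtype: str = "float32") -> float:
    """Storage for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


def conv2d_params(kernel_h: int, kernel_w: int, in_ch: int, out_ch: int) -> int:
    """Weights of a Conv2D layer plus one bias per output channel."""
    return kernel_h * kernel_w * in_ch * out_ch + out_ch


def dense_params(in_features: int, units: int) -> int:
    """Weights of a fully connected layer plus one bias per unit."""
    return in_features * units + units


for name, n in PARAM_COUNTS.items():
    print(f"{name:>12}: {weight_megabytes(n):6.2f} MB float32, "
          f"{weight_megabytes(n, 'int8'):6.2f} MB int8")

# Hypothetical shapes: a dense layer on a flattened feature map quickly
# outgrows a 3x3 convolution in parameter count.
print(conv2d_params(3, 3, 32, 64))  # → 18496
print(dense_params(4_096, 128))     # → 524416
```

Under these assumptions, the float32 weights occupy roughly 5.24 MB, 12.11 MB, and 55.17 MB respectively, and a quarter of that if stored as 8-bit integers.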

The proposed model combines a deeper architecture with dropout layers to achieve strong generalization. It achieves an average accuracy of 97.93%, comparable to that of the previous lightweight models, but with better robustness owing to its deeper architecture. It adapts well to moderately complex tasks, unlike the Student Model,²¹ which is better suited to simpler tasks, or the Lightweight DNN,²² which balances efficiency but lacks the depth needed for more demanding datasets. The proposed model is suitable for deployment on IoT devices, offering a practical solution for real-world applications, unlike the Teacher Model,²¹ which requires significant computational resources. Overall, the proposed CNN model is a flexible solution that offers strong performance, regularization, and adaptability without excessive computational cost.
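To illustrate the dropout regularization referred to above, the sketch below implements inverted dropout, the variant used by common frameworks (Keras's `Dropout` layer, for example, zeroes inputs at the given rate during training, scales the remaining inputs by 1/(1 - rate), and has no effect at inference). It is a generic, framework-free illustration; the 0.5 rate below is an assumption and is not the rate used in the proposed model.

```python
import random
from typing import List


def inverted_dropout(activations: List[float], rate: float,
                     training: bool, rng: random.Random) -> List[float]:
    """Zero each activation with probability `rate` during training and
    rescale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # fixed seed for reproducibility
x = [1.0] * 10_000
y = inverted_dropout(x, rate=0.5, training=True, rng=rng)
print(sum(v == 0.0 for v in y) / len(y))  # fraction dropped, close to 0.5
print(sum(y) / len(y))                    # mean stays close to 1.0
```

Because the surviving units are rescaled during training, no adjustment is needed at inference time, which is why the function returns its input unchanged when `training` is False.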

We trained, tested, and validated the proposed model on various datasets and imaging modalities for three common infections, and it performed well across them. This suggests that the model can be adapted for training and testing on additional conditions. However, as with other models in the literature, future studies should extend it to other diseases to evaluate its performance.

The model needs further optimization to be suitable for mobile devices. This can be done through quantization, which shrinks the model by storing weights at lower numerical precision, or pruning, which removes the less important weights to make the model smaller and inference faster. We plan to use TensorFlow Lite or ONNX as the deployment framework for the mobile application. The phone camera and the mobile application will serve as the tool for real-time processing, so that the application can process input images and return predictions promptly.
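The two compression steps mentioned above can be sketched in plain Python on a toy weight vector. Magnitude pruning zeroes the weights with the smallest absolute values, and symmetric per-tensor int8 quantization maps each remaining weight to an 8-bit integer multiplied by a shared scale. This is a simplified illustration under those assumptions: TensorFlow Lite and ONNX Runtime ship their own quantization tooling with more refined schemes (for example, per-channel scales and calibration data), and the weights and 50% sparsity below are hypothetical.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ≈ q * scale with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


# Toy weights (hypothetical, for illustration only).
w = [0.50, -0.02, 0.31, -0.77, 0.04, 0.12, -0.01, 0.66]
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
w_restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(w_pruned, w_restored))
print(w_pruned)
print(q, f"scale={scale:.5f}", f"max error={max_error:.5f}")
```

Pruned weights stay exactly zero after quantization, and the rounding error of each remaining weight is at most half the scale, which is the trade-off that makes int8 storage four times smaller than float32.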

AI decision-making in healthcare settings and the use of synthetic data have major ethical implications. Synthetic data must be created carefully to avoid introducing bias. In this study, synthetic data were used to validate the performance of the proposed model under conditions similar to real-world settings, where images may be noisy or unclear. AI decision-making also presents ethical challenges, especially with regard to false positives and false negatives. Strict validation, model explainability, and ongoing monitoring are necessary to reduce these risks. AI is a tool rather than a replacement, so clinicians must remain central to the decision-making process. Strong ethical standards will help ensure that AI systems prioritize patient safety and fair outcomes.

Conclusions

COVID-19 led to millions of infections and deaths worldwide. Pneumonia causes substantial morbidity and mortality, particularly in children and the elderly. Tuberculosis (TB) remains a global threat, especially in low- and middle-income countries. These common respiratory infections have social, economic, and public health impacts. Various studies have developed advanced models for the early detection of these diseases. Our study reports a lightweight, explainable, real-time CNN model with high performance and a reduced number of parameters. When the model is deployed in mobile health applications, clinicians can obtain quick, real-time assistance in classifying each patient's condition from CT scans or X-ray images. The results can be further integrated with other hospital systems and with electronic health records (EHRs). Cloud-based services could be used instead of a local GPU to accelerate processing. Future directions include creating standardized benchmark datasets, enhancing model interpretability, and using multimodal data (combining chest X-rays, CT scans, and patient symptoms).

Footnotes

Acknowledgements

We deeply regret the loss of Dr Ibraheem Assiri, a highly skilled AI expert, whose continuous search for knowledge and unwavering commitment to his students served as a source of inspiration for everyone acquainted with him. He made a significant contribution to the concept of this work. Dr Ibraheem died on June 20, 2024, at the age of 68.

Author contributions statement

Eman Alqaissi is the sole author of this article and is responsible for all parts of it.

Consent statement

This study uses publicly available datasets, and no identifiable or private information was collected. Therefore, informed consent was not required.

Data availability

This study utilized publicly available datasets from multiple sources, which were accessed and used in full compliance with their respective terms and conditions. The datasets and their sources are as follows:

Covid-19 Dataset C1: Available at https://www.kaggle.com/datasets/plameneduardo/sarscov2-ctscan-dataset, licensed under CC BY-NC-SA 4.0.

Covid-19 Dataset C2: Available at https://data.mendeley.com/datasets/xztwjmktrg/2, licensed under CC BY 4.0.

Covid-19 Dataset C3: Available at https://www.kaggle.com/pranavraikokte/covid19-image-dataset, licensed under CC BY-SA 4.0.

Tuberculosis Dataset: Available at https://www.kaggle.com/tawsifurrahman/tuberculosis-tb-chest-xray-dataset, used in accordance with the requirement to properly cite the authors' article.

Pneumonia Dataset: Available at , licensed under CC BY 4.0.

All datasets were accessed and utilized in accordance with their respective licensing agreements and applicable ethical guidelines. The researcher undertook due diligence to ensure compliance with the licensing terms and conditions at the time of use. However, the researcher does not assume responsibility for any future claims, disputes, or ambiguities arising from changes or unclear licensing terms.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Guarantor

Eman Alqaissi is the guarantor for this article and takes full responsibility for the integrity and accuracy of the research.

ORCID iD

Eman Alqaissi

References

1. Bender, Sirota, Swetschinski, et al. Global, regional, and national incidence and mortality burden of non-COVID-19 lower respiratory infections and aetiologies, 1990–2021: a systematic analysis from the global burden of disease study 2021. Lancet Infect Dis 2024; 24: 974–1002.

2. Yang, Zhang. Respiratory diseases. In: Chen, Liang, et al. (eds) Textbook of pathologic anatomy: for medical students. Singapore: Springer Nature Singapore, 2024, pp. 229–261.

3. Kumwichar, Chongsuvivatwong. COVID-19 pneumonia and the subsequent risk of getting active pulmonary tuberculosis: a population-based dynamic cohort study using national insurance claims databases. EClinicalMedicine 2023; 56: 1–10.

4. World Health Organization (WHO). WHO COVID-19 Dashboard.

5. World Health Organization (WHO). World Health Organization Global TB Report 2024.

6. Global Initiative for Asthma. World Pneumonia Day 2024.

7. Aggarwal, Mishra, Fatimah, et al. COVID-19 image classification using deep learning: Advances, challenges and opportunities. Comput Biol Med 2022; 144: 1–23.

8. Anderson, Tarder-Stoll, Alpaslan, et al. Deep learning improves physician accuracy in the comprehensive detection of abnormalities on chest X-rays. Sci Rep 2024; 14: 25151.

9. Segall, Sankarasubbu. Survey of recent applications of artificial intelligence for detection and analysis of COVID-19 and other infectious diseases. Int J Artif Intell Mach Learn 2022; 12: 1–30.

10. Benjamin, et al. Advances in artificial intelligence for infectious-disease surveillance. N Engl J Med 2023; 388: 1597–1607.

11. Zhang, Wen J-W, et al. Diagnostic error and bias in the department of radiology: a pictorial essay. Insights Imaging 2023; 14: 163.

12. Hussein, Lafta, Ahmed, et al. The Detection and Classification from the Multimodal Images using Artificial Intelligence for Lung Diseases. 2024.

13. Zazzaro, Martone, Romano, et al. A deep learning ensemble approach for automated COVID-19 detection from chest CT images. J Clin Med 2021; 10: 1–13.

14. Ahmed, Hossain, Hoque, et al. Automated COVID-19 Detection from Chest X-Ray Images: A High-Resolution Network (HRNet) Approach. SN Comput Sci 2021; 2: 1–17.

15. Abdul Gafoor, Sampathila, Madhushankara, et al. Deep learning model for detection of COVID-19 utilizing the chest X-ray images. Cogent Eng 2022; 9: 1–18.

16. Yang, Wang, et al. Covidvit: a novel neural network with self-attention mechanism to detect COVID-19 through X-ray images. Int J Mach Learn Cybern 2023; 14: 973–987.

17. Gupta, Sheth, Xie. Neural architecture search for pneumonia diagnosis from chest X-rays. Sci Rep 2022; 12: 1–12.

18. Ramana, Kavitha, Narayana GVL, et al. A comparison of pre-trained models for pneumonia disease prediction using chest images. J Adv Zool 2023; 44: 263–270.

19. Abdulahi ART, Ogundokun, Adenike, et al. PulmoNet: a novel deep learning based pulmonary diseases detection model. BMC Med Imaging 2024; 24: 1–19.

20. Vats, Sharma, Singh, et al. Iterative enhancement fusion-based cascaded model for detection and localization of multiple disease from CXR-images. Expert Syst Appl 2024; 255: 124464.

21. Kabir, Mridha, Rahman, et al. Detection of COVID-19, pneumonia, and tuberculosis from radiographs using AI-driven knowledge distillation. Heliyon 2024; 10: 1–16.

22. Mahbub MdK, Biswas, Gaur, et al. Deep features to detect pulmonary abnormalities in chest X-rays due to infectious diseaseX: COVID-19, pneumonia, and tuberculosis. Inf Sci (N Y) 2022; 592: 389–401.

23. Angelov, Soares. Towards explainable deep neural networks (xDNN). Neural Netw 2020; 130: 185–194.

24. Fraiwan, Khasawneh, Khassawneh, et al. A dataset of COVID-19 x-ray chest images. Data Brief 2023; 47: 109000.

25. Shastri, Kansal, Kumar, et al. CheXImageNet: a novel architecture for accurate classification of COVID-19 with chest x-ray digital images using deep convolutional neural networks. Health Technol (Berl) 2022; 12: 193–204.

26. Kumar, Shastri, Mahajan, et al. Litecovidnet: a lightweight deep neural network model for detection of COVID-19 using X-ray images. Int J Imaging Syst Technol 2022; 32: 1464–1480.

27. Kermany, Goldbaum, Cai, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018; 172: 1122–1131.e9.

28. Rahman, Khandakar, Kadir, et al. Reliable tuberculosis detection using chest X-ray with deep learning, segmentation and visualization. IEEE Access 2020; 8: 191586–191601.

29. Wang, Sourlos, Zheng, et al. Preparing CT imaging datasets for deep learning in lung nodule analysis: Insights from four well-known datasets. Heliyon 2023; 9: 1–14.

30. Chitradevi, Srimathi. An overview on image processing techniques. Int J Innov Res in Comput Commun Eng 2014; 2: 6466–6472.

31. Gracia Moisés, Vitoria Pascual, Imas González, et al. Data augmentation techniques for machine learning applied to optical spectroscopy datasets in agrifood applications: a comprehensive review. Sensors 2023; 23: 8562.

32. Yoon, Fuentes, et al. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit 2023; 137: 1–12.

33. Skansi. Convolutional Neural Networks 2018. https://api.semanticscholar.org/CorpusID:3719231

34. Ghosh, Sufian, Sultana, et al. Fundamental concepts of convolutional neural network. In: Balas, Kumar, Srivastava (eds) Recent trends and advances in artificial intelligence and internet of things. Cham: Springer International Publishing, 2020, pp. 519–567.

35. Salehi, Chalechale, Taghizadeh. Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments. ArXiv abs/2005.13178. 2020. https://api.semanticscholar.org/CorpusID:218900710

36. Dash, Wang. A review of generative adversarial networks (GANs) and its applications in a wide variety of disciplines: from medical to remote sensing. IEEE Access 2021; 12: 18330–18357.

37. Huang SY. Computation of the distribution of model accuracy statistics in machine learning: Comparison between analytically derived distributions and simulation-based methods. Health Sci Rep 2023; 6: 1–9.

38. Kumarakulasinghe, Blomberg, Liu, et al. Evaluating local interpretable model-agnostic explanations on clinical machine learning classification models. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS). 2020, pp. 7–12.

39. Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One 2023; 18: 1–15.

A novel and ultralight convolutional neural network model for real-time detection of infectious lung diseases
