Abstract

In numerous real-world contexts, the prevalence of abnormal instances in comparison to normal instances is markedly low. This phenomenon, characterized by an imbalanced data distribution, results in an increased likelihood of misclassifying the minority class as the majority. The publicly accessible collection of coronavirus disease-2019 (COVID-19) chest X-ray images has become significantly imbalanced due to the ramifications of the pandemic. To address this challenge, the proposed methodology integrates Histogram of Oriented Gradients (HOG) with the Synthetic Minority Over-sampling Technique (SMOTE) and Kernel-based Extreme Learning Machine (KELM). This process is executed in three phases: initially, the images are subjected to preprocessing, followed by the extraction of features using the HOG algorithm. These extracted features facilitate the generation of synthetic minority class samples, thereby achieving a more balanced dataset. The SMOTE-augmented dataset is subsequently employed for training the KELM, which demonstrates superior performance relative to existing state-of-the-art models. A comprehensive experimental analysis was conducted on four datasets comprising chest X-ray images of COVID-19, pneumonia, tuberculosis, and healthy lungs. The classification accuracies obtained are 96.72%, 96.38%, 97.15%, and 98.66% on Dataset-1, Dataset-2, Dataset-3, and Dataset-4, respectively.

Keywords

computer-aided diagnosis, extreme learning machine, COVID-19 classification, machine learning, class imbalance

1. Introduction

The World Health Organization declared coronavirus disease-2019 (COVID-19) a pandemic due to its rapid spread and severe symptoms, significantly impacting human life and causing major economic challenges globally (Carter et al., 2020). The diagnostic methods utilized for the identification of COVID-19 can be classified into two primary categories: diagnostic tests and antibody tests. Polymerase chain reaction (PCR) tests and nucleic acid amplification tests (NAATs) are both advanced diagnostic techniques employed for the early detection of COVID-19 infection. In contrast, antibody tests are designed to detect antibodies in the bloodstream that the immune system produces in response to the SARS-CoV-2 virus, yet these tests are not recommended as diagnostic tools for COVID-19. Throughout the pandemic, the demand for COVID-19 testing kits surged; however, supply shortages were prevalent. Both reverse transcription-PCR and NAAT processes are intricate and time-intensive, necessitating specialized equipment and trained personnel. Furthermore, there exists a risk of false-positive results arising from potential contamination during these testing procedures (Maleki & Hojati, 2022). Moreover, COVID-19 and other forms of pneumonia cannot be distinguished by these tests. To diagnose COVID-19, imaging methods such as computed tomography (CT) scans, X-rays, and ultrasound have been employed (Chung et al., 2020).

Recent developments in computer vision and machine learning have significantly impacted the advancement of automated computer-aided detection and diagnosis systems, facilitating the diagnosis of various medical conditions (Ali & Yadav, 2021). The development of such sophisticated diagnostic systems, designed to be efficient and less susceptible to errors, can leverage datasets containing chest scans of COVID-19 patients in addition to data relating to other pulmonary disorders, such as tuberculosis (TB) and pneumonia. In the context of today’s global healthcare challenges, it can be assumed that X-ray images offer a viable solution for rapid, straightforward, and scalable diagnostic procedures in the face of epidemics, such as COVID-19 (Amin et al., 2024). Manual analysis and interpretation of radiological images require considerable time and are associated with an increased risk of misdiagnosis. To effectively reduce misclassification in diagnostic processes, deep learning (DL) models are developed to identify and extract relevant features from data automatically. This advanced capability not only minimizes the dependence on human judgment, but also helps alleviate the negative influences that human factors can have on the accuracy and reliability of diagnostic results. Using these models, organizations can enhance the precision of their assessments and improve overall results (Ali & Yadav, 2023). Consequently, DL models, combined with image processing methods, have been extensively explored in recent years for the development of systems designed to automate the detection of diseases through the analysis of radiological images.

The quality and reliability of machine learning techniques are substantially influenced by class imbalance, a phenomenon that is prevalent in various real-world applications. A salient example of class-imbalanced datasets is found within the biomedical domain, where it poses significant challenges in disease identification. The primary contributor to the imbalance in medical datasets is the disparity between data originating from patients with medical disorders and those without such disorders. Typically, the majority class comprises data from healthy individuals, while data pertaining to rare disorders constitutes the minority class. For example, the COVID-19 dataset (Chowdhury et al., 2020; Rahman et al., 2021) of chest radiographs consists of 3,616 images of COVID-19-infected lungs, whereas 10,192 images are of healthy lungs. Learning from these unbalanced datasets can be challenging, and the application of nontraditional machine learning techniques may sometimes be necessary to achieve acceptable results, especially when addressing clinical illnesses or diseases with low prevalence.

Motivation: The precise diagnosis of thoracic diseases using chest X-ray (CXR) images remains a vital challenge in the realm of medical image analysis. A principal obstacle to the development of dependable machine learning models for this purpose is the disproportionate class distribution—specific conditions, such as pneumonia, TB, or COVID-19, are underrepresented relative to normal cases in publicly accessible datasets. This disparity predisposes classifiers to favor the majority class, resulting in diminished sensitivity and low recall for rare yet clinically critical conditions. Furthermore, medical images frequently display intricate structures, noisy artifacts, and variations in intensity arising from differences in acquisition parameters, patient anatomy, and pathological conditions. Conventional pixel-level features are susceptible to such variations and often lack the robustness required to capture invariant structural attributes essential for reliable classification.

To address these challenges, this work proposes a hybrid approach that combines Histogram of Oriented Gradients (HOG) for extracting structural features with the Synthetic Minority Over-sampling Technique (SMOTE) to balance class distributions within the feature space. HOG provides a concise and noise-resistant representation of anatomical and pathological patterns by encoding local edge orientations, making it effective for capturing abnormalities such as lung infiltrates, nodules, or lesions. SMOTE enhances the model’s ability to generalize across all classes by generating synthetic samples for minority classes based on existing feature vectors, thereby mitigating bias caused by class imbalance. By combining these two techniques, the proposed method aims to enhance the discriminative power and generalization ability of machine learning models for CXR classification, especially in environments with limited or unevenly distributed data.

This article introduces a cost-efficient methodology that integrates statistical feature extraction techniques with advanced computational modeling approaches. Our research offers a significant contribution, which can be concisely summarized as follows:

To address the dual challenge of feature noise and class imbalance in CXR datasets, a novel hybrid classification framework is proposed that integrates structural feature extraction, synthetic data augmentation, and kernel-based nonlinear learning. The key contribution of this work lies in the synergistic use of HOG, SMOTE-based oversampling, and Kernel-based Extreme Learning Machine (KELM) to build a robust, balanced, and efficient classifier for thoracic disease detection.

First, HOG descriptors are employed to extract local structural features from CXR images. These descriptors effectively capture edge patterns corresponding to anatomical boundaries and pathological regions such as nodules, infiltrates, or lesions, while being relatively invariant to illumination and noise. Compared to raw pixel-based features, HOG provides a more compact and noise-resilient representation, particularly suited for the heterogeneous appearance of medical imaging data.

Next, to mitigate the adverse effects of class imbalance, which is prevalent in real-world medical datasets, SMOTE is applied to the HOG feature vectors. This technique generates new synthetic samples for minority classes by interpolating between existing samples and their nearest neighbors, effectively balancing the class distribution. The application of SMOTE in the feature domain, rather than the image domain, ensures that generated samples preserve essential geometric properties while reducing computational cost.

Finally, the balanced HOG feature set is fed into a KELM classifier. An extreme learning machine (ELM) is known for its high-speed learning and good generalization, but conventional ELMs often fail to handle nonlinear decision boundaries adequately. The incorporation of a kernel function enables the model to project the features into a higher-dimensional space, allowing for effective separation of classes with overlapping distributions. The proposed KELM model combines the speed advantage of ELMs with the representational power of kernel methods, making it particularly effective in handling the complexity and diversity inherent in CXR images.

A comparative analysis has been performed between the proposed system and some of the latest advanced DL models available. The experimental results show that the proposed model is more accurate and efficient than most recent advanced DL models in terms of processing time, precision, recall, $F_{1}$-score, accuracy, and cost-effectiveness.

2. Related Work

Researchers and scientists have made significant strides in developing diagnostic systems for COVID-19, specifically designed for the analysis and categorization of this viral infection alongside various other lung conditions. These innovative systems leverage sophisticated technologies such as Convolutional Neural Networks (CNNs) and DL methodologies, which are cutting-edge approaches in the field of AI. Moreover, some researchers have harnessed the power of transfer learning to refine and train DL models for more accurate classification results. Despite the advancements made thus far, there remains a concerted effort to improve the efficiency and reliability of these models. This includes tackling complex challenges such as imbalanced data distribution, which can skew results, as well as preparing for future needs that may arise in medical diagnostics.

Ozturk et al. (2020) employed the DarkNet model as a classifier for the You Only Look Once (YOLO) system, which does not require any feature extraction technique. Minaee et al. (2020) conducted a comparative analysis of four pretrained DL networks trained on CXR images (ResNet18, ResNet50, SqueezeNet, and DenseNet-121); among them, SqueezeNet performed comparatively better. A Bayesian optimized support vector machine combined with CNNs has been proposed by Lakshmi et al. (2024) for distinguishing COVID-19 from other disorders using CXR images. The performance of machine learning models in this context is often hindered by limited availability of annotated CXR images due to privacy concerns and the complex phenotypes associated with COVID-19. Additionally, models of this type face challenges such as computational overhead, lack of global optima guarantees, parameter sensitivity, and scalability. Kernel-based support vector machines, in particular, may encounter difficulties in very high-dimensional feature spaces without proper dimensionality reduction or feature engineering. Khalif et al. (2024) propose a novel approach that combines Generative Adversarial Networks (GANs) with Deep Convolutional Neural Networks (DCNNs). This method improves the model’s ability to differentiate between COVID-19 patterns and healthy lung images, addressing the shortage of labeled COVID-19 data. By generating artificial images, GANs help alleviate issues caused by imbalanced datasets. A data augmentation technique using a Conditional-GAN with a DL model was proposed by Loey et al. (2025) for COVID-19 detection in chest CT scan images. They experimented with five DL networks, and ResNet50 with data augmentation using Conditional Generative Adversarial Networks (CGAN) outperformed the other DL models.

Jie et al. (2024) proposed a model that employs a pyramid Graph Neural Network (GNN) to tackle key challenges in the field. The approach involves dividing CXR images into multiple patches, which are subsequently processed through a CNN-based feature extractor. Specifically, the first five layers of ResNet50 are utilized for feature extraction. These patch-derived features function as nodes within the pyramid GNN framework. However, pyramid GNNs encounter limitations primarily associated with maintaining an appropriate balance between information retention and noise mitigation during the pooling process, scalability concerns for large graphs, and the potential loss of information arising from nonadjacent scale interactions.

Shazia et al. (2021) conducted a comparative analysis of contemporary DL models (VGG16, VGG-19, DenseNet121, Inception-ResNet-V2, InceptionV3, ResNet50, and Xception) to address the identification and categorization of coronavirus pneumonia from pneumonia patients. DenseNet121 outperformed the other models, displaying an accuracy rate of 99.48%. Albataineh et al. (2024) adapted a simple machine learning model using segmentation and three feature extraction methods: the ratio of white lesion regions to lung regions (ratio of infection), global statistical texture features (mean, standard deviation, skewness, kurtosis), and texture features from the gray-level co-occurrence matrix (GLCM) and gray-level run length matrix (GLRLM). To effectively categorize COVID-19 CXR images, Elaraby et al. (2024) develop a hybrid DCNN model in which features are retrieved using five distinct feature extraction techniques (speeded up robust features, GLCM, HOG, segmentation-based fractal texture analysis, and local binary patterns [LBPs]). These features are then integrated into a single feature vector.

Zhang et al. (2022) designed a deep ensemble dynamic learning network, which links a pretrained convolutional feature extractor and two-stage bagging dynamic learning network classifier in sequence. The concept proposed by Liu et al. (2024) is based on combining bi-directional long short-term memory modules with parallel deformable multilayer perceptrons. It faces issues with computational complexity, overfitting risks, and scalability. The dataset used to implement the model is small, making it difficult to assess its effectiveness when applied to larger datasets. Fan and Gong (2023) leverage an automated approach for COVID-19 that uses a dual-ended multiple attention learning model utilizing ResNet50 as the backbone network. Naz et al. (2024) developed a federated learning (FL) model using ResNet50 for training local and global models. FL enables training on decentralized datasets without central data integration, enhancing privacy. However, while FL handles larger datasets better than traditional DL, it may compromise accuracy to protect privacy. Since raw medical data remain local to each institution, and only model updates are shared, issues such as non-independent and identically distributed data distributions, limited communication, and the use of privacy-preserving mechanisms (e.g., differential privacy, secure aggregation) can affect convergence and reduce performance. Nonetheless, FL remains a promising framework for privacy-preserving collaboration in medical imaging tasks.

Hossain et al. (2024) adapted an optimized ensemble model for COVID-19 classification where VGG-19 and ResNet50 networks are used for feature extraction. The fused feature vector is optimized using principal component analysis, which is then used for training an ensemble model to categorize COVID-19. Mezina and Burget (2024) implemented a vision transformer-based neural network for COVID-19 classification. Global and local features are extracted using an Inception network and a combination of three Inception modules and a vision transformer network, respectively. The concatenated features are then utilized for classification. Tan et al. (2024) presented a new Self-Supervised Learning with Self-Distillation learning model. Two auxiliary tasks, image reconstruction and self-distillation modeling, comprise the pretraining portion of the entire model architecture. Following the completion of pretraining, the trained encoder weights are transferred to the optimized network, and COVID-19 classification is performed.

A variational autoencoder-based model utilizing the SMOTE–ENN (Edited Nearest Neighbor) approach to mitigate the class imbalance issue was adapted by Chatterjee et al. (2023). SMOTE–ENN is a hybrid approach that simultaneously uses oversampling of the minority examples and undersampling of the majority examples. Imbalanced class distribution is one of the significant challenges faced by classification models. Schaudt et al. (2023) employ a combination of random oversampling and several augmentation strategies to synthetically generate a balanced dataset. During training, specialized augmentation procedures are employed to enhance the image diversity of the minority classes. Calderon-Ramirez et al. (2021) suggest an efficient method for resolving data imbalance using the Self-supervised Deep Learning Mix-Match architecture. The technique applies extra weight to the underrepresented classes in the labeled dataset using loss-based imbalance correction. The Mix-Match method is utilized to resolve the unlabeled-data issue, where pseudo and augmented labels are generated for unlabeled instances. Chamseddine et al. (2022) experimented with some of the popular state-of-the-art CNN models (DenseNet201, chest X-ray network (CheXNet), MobileNetV2, ResNet152, VGG-19, and Xception) along with Weighted Categorical Loss (WCL) and then SMOTE for tackling the imbalanced data distribution problem. Compared to the other models tested, the combination of WCL and CheXNet achieved better performance. Javidi et al. (2021) propose a regularized cost-sensitive CapsNet in conjunction with a DenseNet model for balanced classification of COVID-19 CT scan images. After preprocessing, the CT image is passed into a deep neural network, where DenseNet integrates with CapsNet to retrieve essential features. In the capsule network, the cost-sensitive regularized loss function is taken into account to combat unbalanced data.

Amin et al. (2024) employed an ensemble-based machine learning approach, integrating models such as Light Gradient Boosting, Random Forest, and XGBoost for classification tasks. They extracted features from CXR images using HOG and LBP descriptors, which effectively captured salient image features. However, the use of HOG and LBP, both of which extract features related to spatial and structural patterns, could produce overlapping or redundant information, potentially leading to inefficiencies in classification. Also, combining these feature extraction methods resulted in a high-dimensional feature space, increasing computational costs and the risk of overfitting, especially given the small dataset size, thereby raising questions about its effectiveness on larger datasets. Al-Shourbaji et al. (2022) employed a batch-normalized CNN model inspired by the VGG architecture. The model consists of 18 layers, with 12 dedicated to feature extraction (organized into four repetitions of convolution, batch normalization, and max pooling) and six layers focused on classification, implemented as two repetitions of dense layers, batch normalization, and dropout. Despite its improved performance, the model faces several limitations, including high computational complexity, lack of feature reuse, potential overfitting, limited architectural innovation, and reduced robustness. Nayak et al. (2022) introduced LW-CORONet, a streamlined CNN model consisting of a series of convolutional, rectified linear unit (ReLU) activation, and pooling layers, followed by two fully connected layers. This architecture effectively captures significant features from CXR images while employing only five trainable layers. Lightweight CNNs are valuable for their efficiency in resource-constrained environments, but they often struggle with accuracy, generalization, feature extraction, and handling complex tasks. These limitations highlight the inherent compromises in designing compact neural networks. Therefore, selecting an appropriate lightweight CNN requires careful consideration of the specific task needs and dataset features. Shankar et al. (2022) utilized Wiener filtering at the preprocessing stage, and for feature extraction, a fusion-based approach combining GLCM, GLRLM, and LBP was utilized. The Salp Swarm Algorithm (SSA) was employed to select the most relevant feature subset. An artificial neural network (ANN) was trained to differentiate between infected and healthy patients. The combined use of SSA and ANN offers an innovative approach that leverages swarm intelligence for parameter optimization. Nonetheless, this method encounters challenges such as inefficiency during training, sensitivity to initial conditions, and difficulties handling complex or high-dimensional data.

Bhattacharyya et al. (2022) proposed a comprehensive three-step methodology for analyzing lung conditions through image processing and machine learning techniques. The process begins with the segmentation of raw X-ray images using a CGAN to precisely delineate lung regions. Next, these segmented images are processed via a novel pipeline that combines key point extraction methods with trained deep neural networks to identify features relevant for classification. In the final stage, various machine learning models are applied to differentiate between COVID-19, pneumonia, and normal lung images. Their approach achieved a maximum testing accuracy of 96.6% with the VGG-19 model integrated with the Binary Robust Invariant Scalable Key-points (BRISK) algorithm. Nonetheless, it is noted that CGANs can produce overly smooth segmentations due to the adversarial loss, potentially impairing the preservation of fine details and boundaries necessary for high-quality segmentation. Additionally, training CGANs for high-resolution images is computationally demanding, which poses challenges for large-scale dataset applications. Alshahrni et al. (2023) proposed a novel system that integrates two-step-AS (Analytic Server) clustering, ensemble bootstrap aggregation, and multiple neural networks, incorporating fractal-based and statistical texture feature extraction. This hybrid approach was designed to enhance discrimination among COVID-19, pneumonia, and normal cases, thereby improving robustness against dataset imbalance. The model reportedly achieved a classification accuracy of 98.06%, surpassing conventional CNN classifiers and illustrating the advantages of combining unsupervised clustering with supervised DL in medical image analysis. However, the selection of variables, the methods of scaling, and the distance metric utilized are all critical factors that can significantly impact the results of a two-step cluster analysis. Determining the optimal number of clusters presents a significant challenge. An inaccurate specification of cluster counts can adversely affect the efficacy of the neural network training process. Computational efficiency, scalability, and reliance on clustering quality reduce its practicality in many real-world scenarios.

3. Proposed Methodology

To address class imbalance in COVID-19 CXR datasets, this study proposes a hybrid framework that integrates the HOG feature descriptor (Dalal & Triggs, 2005), the SMOTE (Chawla et al., 2002) for data balancing, and the KELM (Huang et al., 2011) for classification. The primary objective of this integration is to enhance classification performance by mitigating class imbalance-induced bias, while preserving computational efficiency.

The proposed pipeline begins by preprocessing CXR images, after which HOG is employed to extract distinctive texture and structural features. HOG effectively captures the distribution of gradient orientations within localized regions of the image, emphasizing structural features—such as edges, contours, and shape information—and reducing the influence of pixel-level noise, texture inconsistencies, and illumination artifacts. This transformation yields a more compact and semantically meaningful feature space, in which samples of the same class tend to exhibit greater intraclass similarity and form more cohesive clusters. The application of SMOTE within this HOG-transformed feature space significantly enhances the probability of generating valid and class-consistent synthetic samples. The assumption of local linearity underlying SMOTE, that interpolated samples should lie within the true class distribution, is more reliable in the HOG space due to its denoising and structure-preserving properties. Consequently, even when the minority class exhibits a complex or sparse distribution in the original image space, the corresponding HOG feature space is smoother and more regular, rendering synthetic oversampling more effective and stable. Therefore, the integration of HOG and SMOTE enhances the oversampling process by ensuring that synthetic data are generated within a feature domain that is both resilient to noise and structurally consistent. The balanced feature set obtained after SMOTE is then used to train the KELM classifier. KELM is selected for its fast training, strong generalization, and noniterative learning mechanism. By incorporating a kernel function, KELM can efficiently handle nonlinear relationships in feature space, thereby improving classification performance.

In summary, the proposed methodology adheres to a sequential pipeline: image preprocessing, HOG feature extraction, SMOTE-based feature balancing, and KELM classification. This integrated method allows the model to leverage strong feature representation, balanced training data, and efficient kernel-based learning, thereby improving COVID-19 detection accuracy in CXR images. The comprehensive workflow of the proposed framework is depicted in Figure 1.

Figure 1. Outline of the Proposed Methodology.

4. Preliminaries

4.1. HOG Feature Extraction

In the realm of computer vision, effective feature extraction is pivotal for various applications, including object detection, image classification, and scene understanding. Among the many techniques developed for this purpose, HOG stands out for its robustness and effectiveness. This method has been instrumental in extracting meaningful features from images, particularly for object detection tasks such as pedestrian and vehicle detection (Dalal & Triggs, 2005). Figure 2 illustrates the steps involved in the extraction of HOG features from a CXR image. This figure shows the contours and edges in the radiographic data, which are essential for further analysis and interpretation.

Figure 2. Block Diagram Describing the HOG Feature Extraction Method. Note. HOG = Histogram of Oriented Gradients.

The steps required to extract HOG features are listed below:

(i) Gradient computation: The first stage involves calculating the gradient magnitude and direction for each pixel in the image. This is typically achieved using gradient filters like the Sobel operator, which highlights the changes in pixel intensity. These gradients emphasize areas where there are strong transitions in intensity, which often correspond to the edges of objects.

(ii) Cell division: The image is decomposed into small, discrete, nonoverlapping regions called cells. A histogram of gradient orientations is built inside each cell. This histogram reflects the distribution of gradient directions within the cell, which can be indicative of local texture and edge patterns.

(iii) Block normalization: To ensure that the feature descriptor remains consistent across varying lighting conditions and contrast levels, normalization is applied. Cells are grouped into larger regions called blocks. The histograms within each block are normalized to produce a feature vector that is less sensitive to changes in illumination and contrast.

(iv) Feature vector construction: In the last step, the feature vector for the image is created by concatenating the normalized histograms from each block. The HOG features captured in this vector can be applied to a variety of object recognition or machine learning applications.
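
The four steps above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this work. The function name `hog_features`, the cell size, the number of orientation bins, and the block size are illustrative assumptions, and two simplifications are made: central differences replace the Sobel operator, and each pixel votes into a single orientation bin without interpolation.

```python
import math

def hog_features(img, cell=4, bins=9, block=2, eps=1e-6):
    """Minimal HOG sketch for a grayscale image given as a list of rows.

    All parameter defaults are illustrative assumptions.
    """
    h, w = len(img), len(img[0])
    cy, cx = h // cell, w // cell            # number of cells along each axis
    hist = [[[0.0] * bins for _ in range(cx)] for _ in range(cy)]
    bin_width = 180.0 / bins                 # unsigned orientations in [0, 180)

    # (i) gradient per pixel, (ii) magnitude-weighted histogram per cell
    for y in range(cy * cell):
        for x in range(cx * cell):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            b = int(ang // bin_width) % bins  # guard against rounding to 180.0
            hist[y // cell][x // cell][b] += mag

    # (iii) L2 normalization over sliding blocks, (iv) concatenation
    features = []
    for by in range(cy - block + 1):
        for bx in range(cx - block + 1):
            vec = [v for dy in range(block) for dx in range(block)
                   for v in hist[by + dy][bx + dx]]
            norm = math.sqrt(sum(v * v for v in vec) + eps ** 2)
            features.extend(v / norm for v in vec)
    return features
```

For a 16 x 16 input with these defaults, the image splits into 4 x 4 cells and 3 x 3 overlapping blocks, so the descriptor has 3 x 3 x 4 x 9 = 324 entries. A production pipeline would more likely rely on an established library implementation.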

4.2. Synthetic Minority Over-Sampling Technique (SMOTE)

Unbalanced datasets have an adverse effect on the performance and reliability of machine learning models. The learning model’s approximation of the decision boundary can be skewed when there is a significant disparity in the number of samples in each class. Several data-balancing strategies have been developed over time; one of the most popular and widely accepted approaches is the Synthetic Minority Over-sampling Technique (SMOTE) (Chawla et al., 2002). As the name suggests, it oversamples the minority class by generating synthetic samples.

The steps involved in the implementation of the HOG–SMOTE balancing approach are:

(i) Random selection: A random instance is selected from the HOG feature space of the minority class.
(ii) Identifying $K$-nearest neighbors: SMOTE identifies the $K$-nearest neighbors of the selected instance from the minority feature space. The choice of $K$ (typically between 5 and 10) is a parameter that can be tuned.
(iii) Generating synthetic instances: For each instance, SMOTE generates synthetic examples by interpolating between the instance and its nearest neighbors. Specifically, it creates synthetic points by choosing random points along the line segment joining the instance and its neighbor (Le et al., 2019).
(iv) Incorporating synthetic instances: These newly generated synthetic instances are then added to the minority class dataset, effectively increasing its size and helping balance the class distribution.
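
The interpolation in steps (i) to (iv) can be sketched in a few lines of pure Python. This is a hedged illustration, not the implementation used in this work. The function name `smote`, the brute-force Euclidean neighbor search, and the fixed random seed are assumptions made for the example. Each synthetic vector is computed as $x_{new} = x + \lambda (x_{nn} - x)$ with $\lambda$ drawn uniformly from $[0, 1)$.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic vectors from minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)   # cannot use more neighbors than other samples
    synthetic = []
    for _ in range(n_new):
        # (i) select a random minority instance
        i = rng.randrange(len(minority))
        x = minority[i]
        # (ii) find its k nearest minority neighbors (brute-force Euclidean)
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # (iii) pick a random point on the segment from x to the neighbor
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbor)])
    # (iv) the caller appends the synthetic vectors to the minority class
    return synthetic
```

In the proposed setting, `minority` would hold the HOG feature vectors of the underrepresented class. As a worked figure based on the counts quoted in the Introduction, fully balancing 3,616 COVID-19 vectors against 10,192 healthy vectors would require 10,192 - 3,616 = 6,576 synthetic vectors.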

The synthetic samples are not exact replicas of existing instances but are instead created to fill gaps in the feature space, which helps in creating a more generalized decision boundary. SMOTE has advantages over random oversampling. Unlike simple duplication, SMOTE generates diverse examples that help models generalize better by providing a broader representation of the minority class, thus reducing the risk of overfitting by introducing variability in the synthetic samples. However, classic SMOTE might not be ideal for all types of data, especially when the minority class has a complex distribution or is noisy. The proposed variation, HOG–SMOTE, improves the performance of the classification model by generating new instances from HOG feature vectors. HOG extracts relevant features by calculating the gradient at each pixel and then decomposing the image into nonoverlapping cells. A histogram is built inside each cell to accumulate gradient orientations. The histograms within each block are normalized to produce a feature vector that is less sensitive to changes in illumination and contrast. Synthetic instances generated using HOG feature vectors are less prone to noise, and hence more efficient than traditional SMOTE.
4.3. Kernel-Based Extreme Learning Machine (KELM)

Feedforward networks have been extensively used in many applications, but their hyperparameters must be tuned to achieve strong generalization. Huang et al. (2004) introduced a fast and efficient learning algorithm called “Extreme Learning Machine (ELM)” for the single-hidden-layer feedforward network (SLFN). Compared to other well-known SLFN learning algorithms, ELM is distinguished by its ease of implementation, its tendency to reach the lowest training error and the smallest norm of weights, its strong generalization capability, and its fast training. For any activation function that is infinitely differentiable, such as the sigmoid, radial basis, or exponential function, the input weights and hidden-layer biases can be assigned randomly (Huang et al., 2006). Once the input weights have been selected at random, the output weights are found with the Moore–Penrose generalized inverse (Moore, 1920; Penrose, 1955). Let there be $S$ training samples and $\overset{`}{S}$ hidden neurons, such that $\overset{`}{S} \leq S$. For the $S$ training samples $(x_{i}, t_{i})$, with inputs $x_{i} = [x_{i 1}, x_{i 2}, \dots, x_{i n}]^{T} \in R^{n}$ and targets $t_{i} = [t_{i 1}, t_{i 2}, \dots, t_{i m}]^{T} \in R^{m}$, the output of an SLFN with activation function $g$ is

$$y_{j} = \sum_{i = 1}^{\overset{`}{S}} \gamma_{i} g (w_{i} x_{j} + b_{i}), \quad j = 1, \dots, S \tag{1}$$

where $w_{i} = [w_{i 1}, w_{i 2}, \dots, w_{i n}]^{T}$ is the input weight vector, $\gamma_{i} = [\gamma_{i 1}, \gamma_{i 2}, \dots, \gamma_{i m}]^{T}$ is the output weight vector, and $b_{i}$ is the bias of the $i$-th hidden neuron. The network approximates the $S$ training samples with zero error when

$$\sum_{i = 1}^{\overset{`}{S}} \gamma_{i} g (w_{i} x_{j} + b_{i}) = t_{j}, \quad j = 1, \dots, S \tag{2}$$

The above equation can be written compactly as

$$H \gamma = T \tag{3}$$

where $\gamma$ stacks the output weight vectors, $T$ stacks the targets, and $H$ is the hidden-layer output matrix

$$H = \begin{bmatrix} g (w_{1} x_{1} + b_{1}) & \dots & g (w_{\overset{`}{S}} x_{1} + b_{\overset{`}{S}}) \\ \vdots & \ddots & \vdots \\ g (w_{1} x_{S} + b_{1}) & \dots & g (w_{\overset{`}{S}} x_{S} + b_{\overset{`}{S}}) \end{bmatrix}_{S \times \overset{`}{S}}$$

The network output and the output weights are then given by

$$y (x) = h (x) \gamma \tag{4}$$

$$\gamma = H^{+} T \tag{5}$$

where $h (x) = [g (w_{1} x + b_{1}), \dots, g (w_{\overset{`}{S}} x + b_{\overset{`}{S}})]$ is the hidden-layer output vector for an input $x$ and $H^{+}$ is the Moore–Penrose generalized inverse of $H$.
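
Equations (1)–(5) translate almost directly into code. The following is a minimal sketch in Python with NumPy, assuming a sigmoid activation and input weights and biases drawn uniformly from [-1, 1]; the function names, the sampling range, and the XOR toy data are assumptions made for illustration and are not taken from the authors' MATLAB implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, seed=0):
    """Train a basic ELM: random hidden layer, output weights by pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights w_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases b_i
    H = sigmoid(X @ W + b)                                   # hidden-layer output matrix H
    gamma = np.linalg.pinv(H) @ T                            # Eq. (5): gamma = H^+ T
    return W, b, gamma

def elm_predict(X, W, b, gamma):
    return sigmoid(X @ W + b) @ gamma                        # Eq. (4): y(x) = h(x) gamma

# XOR toy problem with one-hot targets (hypothetical data).
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T_demo = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
W, b, gamma = elm_train(X_demo, T_demo, n_hidden=4)
predicted_class = elm_predict(X_demo, W, b, gamma).argmax(axis=1)
```

Only `gamma` is learned; the hidden layer is never updated, which is why training reduces to a single linear least-squares solve.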

The hidden layer maps the input space to an $\overset{`}{S}$-dimensional feature space. Because ELM does not require the hidden function to take a specific form, and most nonlinear piecewise continuous functions can serve as hidden-layer output functions, many extensions of ELM can be constructed (Huang et al., 2011). Kernel-based ELM extends ELM by incorporating kernel methods to handle nonlinearly separable data. Instead of computing an explicit feature mapping, a kernel function implicitly maps the data into a higher-dimensional space where the classes may be easier to separate. When $h (x)$ is unknown, Mercer’s condition can be applied to ELM to define the kernel matrix:

$$\Omega_{ELM} = H H^{T}, \qquad \Omega_{ELM_{i, j}} = h (x_{i}) \cdot h (x_{j}) = K (x_{i}, x_{j}) \tag{6}$$

The output function of KELM follows as

$$y (x) = h (x) H^{T} \left(\frac{I}{C} + H H^{T}\right)^{- 1} T \tag{7}$$

which, after substituting the kernel from Eq. (6), becomes

$$y (x) = \begin{bmatrix} K (x, x_{1}) \\ \vdots \\ K (x, x_{S}) \end{bmatrix}^{T} \left(\frac{I}{C} + \Omega_{ELM}\right)^{- 1} T \tag{8}$$

where $I$ is the identity matrix and $C$ is the regularization parameter.

The main advantage of KELM over traditional ELM is its ability to model complex, nonlinear relationships more effectively through the use of kernel functions. This leads to improved flexibility, generalization, and robustness in many cases, particularly when dealing with complex or noisy data.
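
The kernel form of the KELM output can be sketched in a few lines. The example below is an illustration in Python with NumPy, assuming a Gaussian RBF kernel with width `sigma` and a regularization parameter `C`; the function names, the parameter values, and the XOR toy data are assumptions for the example, not values reported in the text.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kelm_fit(X, T, C=1000.0, sigma=1.0):
    """Solve (I/C + Omega) alpha = T, where Omega is the kernel matrix of Eq. (6)."""
    omega = rbf_kernel(X, X, sigma)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kelm_predict(X_new, X_train, alpha, sigma=1.0):
    """Kernel output: [K(x, x_1), ..., K(x, x_S)] (I/C + Omega)^{-1} T."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha

# XOR toy problem with one-hot targets (hypothetical data).
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T_demo = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
alpha = kelm_fit(X_demo, T_demo, C=1000.0, sigma=0.5)
scores = kelm_predict(X_demo, X_demo, alpha, sigma=0.5)
predicted_class = scores.argmax(axis=1)
```

No hidden-layer weights appear anywhere: the kernel replaces the explicit mapping, so the only quantities to choose are the kernel parameter and `C`.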

5. Experimental Layout

5.1. Overview of Dataset

Only a limited number of publicly accessible CXR images of individuals affected by COVID-19 exist, primarily because the disease emerged relatively recently. Among the few available resources, four datasets are used in this study. Dataset-1 is the widely used COVID-19 Radiography Database developed by Chowdhury et al. (2020) and Rahman et al. (2021). This database compiles publicly accessible datasets from multiple sources, forming one of the most extensive collections of CXR images of coronavirus-infected cases. In this study, the second version of the database was used, comprising 3,616 COVID-19 cases, 1,345 instances of viral pneumonia, 10,192 normal images, and 6,012 images showing lung opacity. The lung opacity images were deliberately excluded, so the experiment uses three classes: COVID-19, viral pneumonia, and normal. As a result, the proposed model was specifically trained to differentiate between COVID-19 and the two remaining classes.

To evaluate the robustness of the proposed model, three more datasets were employed. Dataset-2 was sourced from the Kaggle repository and was assembled from CXR images obtained from publicly accessible resources (Cohen et al., 2020; Kermany et al., 2018). Dataset-3 was prepared by combining two publicly available databases from the Kaggle repository: normal and pneumonia CXR images were taken from Kermany et al. (2018), and COVID-19 images from Chowdhury et al. (2020) and Rahman et al. (2021). Dataset-4 introduces a new class, TB. Individuals infected with COVID-19 may be erroneously diagnosed with other pulmonary diseases, such as pneumonia or TB; consequently, this dataset comprises three similar classes of lung infection: COVID-19, pneumonia, and TB. It was prepared by combining images from three Kaggle repositories. The data distribution and repository links are given in Table 1.

Table 1.
Dataset Distribution with Repository Links.


Dataset	Classes	Repository Links
Dataset-1	COVID-19: 3,616	https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database
	Pneumonia: 1,345
	Normal: 10,192
Dataset-2	COVID-19: 576	https://www.kaggle.com/datasets/prashant268/chest-xray-covid19-pneumonia
	Pneumonia: 4,273
	Normal: 1,583
Dataset-3	COVID-19: 3,616	https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
	Pneumonia: 4,273
	Normal: 3,500
Dataset-4	COVID-19: 3,616	https://www.kaggle.com/datasets/raddar/tuberculosis-chest-xrays-shenzhen
	Pneumonia: 4,273	https://www.kaggle.com/datasets/tawsifurrahman/tuberculosis-tb-chest-xray-dataset
	TB: 1,362	https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database
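
The class counts in Table 1 make the degree of imbalance easy to quantify. The sketch below computes, for each dataset, the ratio of the largest to the smallest class and the number of synthetic samples each class would need if every class were oversampled up to the majority-class count. That target is an assumption made for illustration; the text does not state the oversampling ratio that was used.

```python
# Class counts copied from Table 1; the dataset containing the TB class
# is Dataset-4, as described in the text.
datasets = {
    "Dataset-1": {"COVID-19": 3616, "Pneumonia": 1345, "Normal": 10192},
    "Dataset-2": {"COVID-19": 576, "Pneumonia": 4273, "Normal": 1583},
    "Dataset-3": {"COVID-19": 3616, "Pneumonia": 4273, "Normal": 3500},
    "Dataset-4": {"COVID-19": 3616, "Pneumonia": 4273, "TB": 1362},
}

def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())

def synthetic_needed(counts):
    """Synthetic samples per class needed to match the majority class (assumed target)."""
    majority = max(counts.values())
    return {label: majority - n for label, n in counts.items()}

for name, counts in datasets.items():
    print(name, round(imbalance_ratio(counts), 2), synthetic_needed(counts))
```

For Dataset-1, for example, the normal class is about 7.58 times larger than the pneumonia class, and full balancing under this assumption would require 8,847 synthetic pneumonia samples and 6,576 synthetic COVID-19 samples.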

5.2. Data Preprocessing and Distribution

The raw images of each dataset are first resized to $128 \times 128$ pixels and then converted to grayscale. This image size was chosen to limit computational complexity and avoid memory overflow, which are frequently encountered when handling large datasets; images larger than $128 \times 128$ demand considerably more random-access memory when SMOTE balancing is applied. Before feature extraction and model training, the dataset needs to be split into training and test sets. The data can either be split immediately after preprocessing, with features then extracted from the training and test sets separately, or features can be extracted first and the resulting set of feature vectors split into training and test sets. The second option is preferred and is adopted in this experiment. The model is trained using various train–test proportions (60–40, 70–30, 80–20, and 90–10), and the results suggest that the 90–10 split generally gives the best results. To understand and visualize how classes are distributed in the dataset, the t-distributed Stochastic Neighbor Embedding (t-SNE) method is used (van der Maaten & Hinton, 2008). This method projects a high-dimensional dataset into a low-dimensional space and reveals class segregation or overlap by mapping similar data points close to one another. In Figure 3(a), the t-SNE plot shows the data distribution before balancing: distinct clusters correspond to individual classes, while overlapping clusters suggest that classes share similar feature distributions. The denser clusters represent the majority class, whereas the smaller or sparser clusters represent the minority classes. This clearly highlights the extent of imbalance within the dataset.

In the next stage of the experiment, HOG–SMOTE is used to create synthetic samples for the minority classes. After applying HOG–SMOTE, a t-SNE plot visualizes the impact of the synthetic samples on the class distribution of Dataset-1, as shown in Figure 3(b). In the postbalancing plot, the cluster sizes are comparable, with more data points and reduced overlap between classes.

Figure 3.

t-SNE Utilized for Visualizing Class Distribution in Datasets. To Prepare the Model and Evaluate its Performance, it Provides Information on Data Disparity, Class Overlap, and Clustering. Examining the t-SNE Plots Before (a) and After Applying HOG–SMOTE (b) Reveals Important Information About the Success of Data-Balancing Techniques and Ensures that the Minority Classes Have Been Adequately Enhanced Without Altering the Boundaries Between Classes. Note. t-SNE = t-distributed Stochastic Neighbor Embedding; HOG = Histogram of Oriented Gradient; SMOTE = Synthetic Minority Over-sampling Technique.

5.3. Performance Metrics

The effectiveness of the proposed method is evaluated through six performance metrics: accuracy, which measures the overall proportion of correctly classified instances; sensitivity, indicating the true positive rate; specificity, reflecting the true negative rate; precision, assessing the proportion of true positive results among all positive predictions; $F1$-score, the harmonic mean of precision and sensitivity; and the area under the receiver operating characteristic curve (AUC), which summarizes the model’s ability to distinguish between classes across various threshold settings. To gain a deeper understanding of the classifier’s performance, a receiver operating characteristic (ROC) curve is also plotted, illustrating the trade-off between sensitivity and specificity at different thresholds (Ali & Yadav, 2021).

The performance metrics mentioned above are defined as follows, where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:

$$\text{Classification Accuracy} = \frac{TP + TN}{TN + TP + FN + FP} \tag{9}$$

$$\text{Sensitivity or Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$F1\text{-Score} = \frac{2 \times \text{Sensitivity} \times \text{Precision}}{\text{Sensitivity} + \text{Precision}} \tag{13}$$

$$AUC = \frac{\text{Sensitivity} + \text{Specificity}}{2} \tag{14}$$
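
As a worked illustration of Eqs. (9)–(14), the sketch below evaluates all six metrics from a set of confusion-matrix counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow. For a multiclass problem, the counts would typically be obtained per class in a one-vs-rest fashion and the metrics then averaged; the averaging scheme is not stated in the text, so that step is left out here.

```python
def binary_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (9)-(14) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tn + tp + fn + fp)                     # Eq. (9)
    sensitivity = tp / (tp + fn)                                   # Eq. (10)
    specificity = tn / (tn + fp)                                   # Eq. (11)
    precision = tp / (tp + fp)                                     # Eq. (12)
    f1 = 2 * sensitivity * precision / (sensitivity + precision)   # Eq. (13)
    auc = (sensitivity + specificity) / 2                          # Eq. (14)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "auc": auc,
    }

# Hypothetical counts: 100 positive and 200 negative test samples.
metrics = binary_metrics(tp=90, tn=180, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy, sensitivity, specificity, and AUC all equal 0.9, while precision is 90/110 ≈ 0.8182 and the $F1$-score is 180/210 ≈ 0.8571, showing how the false positives lower precision without affecting sensitivity.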

5.4. System Configuration

The HOG–SMOTE kernel-based extreme learning machine (HS-KELM) is implemented on the MATLAB R2023b platform. All experiments were performed on a personal workstation with the following configuration: 13th Gen Intel Core i7-13700F CPU (2.10 GHz), 16 GB RAM, 1 TB SSD, and an NVIDIA GeForce RTX 3060 GPU (12 GB).

6. Results and Discussion

Before applying HOG–SMOTE for data balancing, experiments are first conducted on the unbalanced datasets outlined in Table 1. The datasets contain a mixture of RGB and grayscale images with varying dimensions. To standardize them, all CXR images are converted to grayscale and resized to $128 \times 128$ pixels. This resolution was chosen because larger images led to memory constraints, and it keeps computational complexity and processing time manageable. An ablation study was conducted to assess the impact of the data-balancing technique on overall model performance. Experiments were conducted using various data split ratios (60–40, 70–30, 80–20, 90–10), with each split being randomized. The results presented are the averages of 10 trials for each split ratio. Table 2 presents the classification outcomes obtained with and without balancing. The results demonstrate that training the model without balancing leads to biased learning, where the classifier predominantly favors majority-class samples and exhibits reduced sensitivity toward minority classes. This imbalance causes lower recall, poorer class separability, and unstable predictions. In contrast, applying SMOTE in the HOG-transformed feature space increases the likelihood of generating valid, class-consistent synthetic samples. The local linearity assumption of SMOTE, which requires that interpolated samples fall within the true class distribution, holds more reliably in the HOG space due to its denoising and structure-preserving properties. Thus, even when the minority class has a complex or sparse distribution in the original image space, the corresponding HOG feature space is smoother and more regular, making synthetic oversampling more effective and stable.

As a result, the balanced setup yields higher accuracy, improved recall for minority classes, and more consistent performance across all categories. These findings confirm that the balancing strategy plays a crucial role in enhancing robustness and ensuring fair classification in imbalanced datasets.

Table 2.
Classification Results: M-1: Without Any Balancing, and M-2: Model Based on HOG–SMOTE-Balancing Technique.

Method	Dataset	Split Ratio (%)	Error (%)	Accuracy (%)	Precision (%)	Sensitivity (%)	Specificity (%)	$F1$-Score (%)	AUC (%)
M-1	Dataset-1	60–40	9.3036	90.695	90.386	87.693	92.387	88.85	90.04
		70–30	8.4187	91.581	91.427	88.626	93.076	89.857	90.85
		80–20	8.363	91.637	91.996	88.538	92.965	90.062	90.75
		90–10	7.8019	92.198	91.701	89.485	93.767	90.502	91.63
	Dataset-2	60–40	6.765	93.235	93.995	89.918	94.469	91.820	92.19
		70–30	6.321	93.679	94.222	90.246	94.877	92.104	92.56
		80–20	6.242	93.758	94.607	90.515	94.818	92.404	92.66
		90–10	5.858	94.142	95.139	92.005	95.112	93.449	93.56
	Dataset-3	60–40	16.026	83.97	83.134	83.052	92.211	82.944	87.63
		70–30	12.642	87.35	86.67	86.581	93.826	86.506	90.203
		80–20	10.359	89.64	89.025	89.01	94.941	88.988	91.97
		90–10	10.01	89.99	89.387	89.37	95.111	89.365	92.24
	Dataset-4	60–40	9.946	90.054	89.379	80.75	94.502	82.973	87.626
		70–30	8.609	91.39	91.41	83.033	95.211	85.441	89.122
		80–20	7.676	92.324	92.828	84.774	95.668	87.277	88.801
		90–10	5.83	94.162	94.053	88.476	96.735	90.56	92.605
M-2	Dataset-1	60–40	3.857	96.143	96.199	94.699	96.938	95.43	95.82
		70–30	3.565	96.435	96.531	95.11	97.123	95.801	96.12
		80–20	3.304	96.697	96.895	95.421	97.328	96.138	96.37
		90–10	2.812	97.188	97.27	96.07	97.691	96.648	96.88
	Dataset-2	60–40	3.668	96.332	96.498	95.218	97.176	95.844	96.19
		70–30	3.606	96.394	96.534	95.675	97.323	96.096	96.49
		80–20	3.512	96.488	96.273	95.818	97.478	96.041	96.65
		90–10	3.328	96.672	96.893	95.887	97.426	96.369	96.66
	Dataset-3	60–40	3.55	96.45	96.27	96.22	98.27	96.23	97.245
		70–30	3.14	96.86	96.69	96.66	98.47	96.67	97.565
		80–20	2.86	97.135	96.978	96.95	98.603	96.962	97.78
		90–10	3.12	96.88	96.71	96.68	98.48	96.69	97.58
	Dataset-4	60–40	1.635	98.365	97.921	97.276	99.1702	97.589	98.223
		70–30	1.72	98.3	97.98	97.068	99.09	97.508	98.079
		80–20	1.43	98.57	98.33	97.62	99.25	97.97	98.435
		90–10	1.27	98.72	98.44	97.98	99.34	98.2	98.66

Note. AUC = area under the receiver operating characteristic curve.

In the second experiment, 10-fold cross-validation was conducted on Datasets 1–4 using the HOG–SMOTE-based balancing module. The results from each fold, along with the average outcomes, are presented in Table 3, and Table 4 gives the 95% confidence intervals for the accuracy attained on each dataset. The balancing module not only balances the dataset but also reduces disparities and overlaps among classes. After balancing and normalization, the dataset is used to train a KELM-based classification model. Class boundaries are separated using the radial basis function (RBF) kernel, which is widely regarded as an appropriate kernel function for the ELM model. The results of this study indicate that KELM is more effective and better suited to multiclass classification than conventional machine learning classifiers on the COVID-19 datasets considered. Combined with the HOG–SMOTE-balancing algorithm, the KELM-based classifier achieved accuracies of 96.72%, 96.38%, 97.15%, and 98.66% on Dataset-1, Dataset-2, Dataset-3, and Dataset-4, respectively. The performance evaluation plots for the three-class classification model, including the ROC curve, are displayed in Figure 4.

Figure 4.

Graphical Representation of Experimental Outcomes: (a) Comparison of Accuracy Metric Provided by KELM for Each Dataset: (Left Bar) Without Data Balancing, (Right Bar) With Balancing and (b) ROC of Proposed Model Based on KELM. Note. KELM = Kernel-based Extreme Learning Machine; ROC = receiver operating characteristic.

Table 3.

Foldwise Multiclass Classification Performance on Various Datasets.

Dataset	Fold	Accuracy	Precision	Sensitivity	Specificity	$F1$-Score	AUC
Dataset-1	1	0.96634	0.96219	0.95067	0.973290	0.95629	0.96198
	2	0.973615	0.970444	0.966473	0.979853	0.96842	0.97316
	3	0.96306	0.965071	0.953754	0.970936	0.959311	0.96234
	4	0.966358	0.968046	0.95325	0.97243	0.96044	0.96284
	5	0.96897	0.96664	0.95919	0.97698	0.96287	0.96809
	6	0.96765	0.97108	0.95260	0.97219	0.961479	0.9624
	7	0.97096	0.97535	0.95898	0.97694	0.96696	0.9679
	8	0.96832	0.96794	0.95152	0.97402	0.95951	0.96277
	9	0.96237	0.96087	0.95111	0.97065	0.95589	0.96088
	10	0.96435	0.96638	0.95255	0.97075	0.95924	0.96167
	AVG	0.9672	0.9674	0.95501	0.9738	0.96104	0.96441
Dataset-2	1	0.96112	0.9595	0.9469	0.97257	0.95298	0.95976
	2	0.95186	0.9578	0.93069	0.96085	0.94363	0.9458
	3	0.96273	0.95464	0.95576	0.97606	0.95509	0.9659
	4	0.96112	0.96549	0.96188	0.970835	0.96365	0.96636
	5	0.96423	0.96852	0.95218	0.97062	0.96002	0.9614
	6	0.96112	0.96664	0.96055	0.96998	0.9635	0.96526
	7	0.97045	0.97184	0.96062	0.9768	0.96611	0.9687
	8	0.9736	0.9775	0.961	0.97975	0.969	0.9704
	9	0.9658	0.9616	0.9494	0.97483	0.9554	0.96212
	10	0.9658	0.97315	0.95934	0.97203	0.96603	0.96568
	AVG	0.96378	0.9657	0.95384	0.97243	0.95954	0.96313
Dataset-3	1	0.9684	0.9668	0.9665	0.9845	0.9667	0.9755
	2	0.9728	0.97113	0.97106	0.98679	0.97107	0.97892
	3	0.9693	0.96759	0.96708	0.98503	0.96724	0.97605
	4	0.97805	0.97663	0.97694	0.98937	0.97677	0.98316
	5	0.97103	0.96991	0.96909	0.98582	0.96933	0.97746
	6	0.96576	0.96407	0.96344	0.98335	0.96356	0.9734
	7	0.96839	0.96681	0.96612	0.98459	0.96633	0.97536
	8	0.97629	0.97487	0.97484	0.98846	0.97485	0.98165
	9	0.96664	0.96467	0.96453	0.98373	0.96459	0.97414
	10	0.97805	0.97699	0.97657	0.98923	0.97677	0.9829
	AVG	0.97146	0.96995	0.96962	0.98609	0.96972	0.97785
Dataset-4	1	0.99459	0.99569	0.9909	0.99689	0.99329	0.99394
	2	0.97948	0.9766	0.96587	0.98914	0.97101	0.9775
	3	0.98811	0.99043	0.9761	0.99326	0.98286	0.98468
	4	0.98811	0.98555	0.98069	0.99376	0.98307	0.98723
	5	0.98703	0.98279	0.97824	0.99349	0.98047	0.98587
	6	0.9827	0.97596	0.97164	0.99147	0.97375	0.98155
	7	0.98378	0.97822	0.97103	0.99197	0.97449	0.9815
	8	0.98595	0.98507	0.97774	0.99257	0.98126	0.98516
	9	0.98919	0.98774	0.98007	0.99435	0.98378	0.98721
	10	0.98703	0.98609	0.98004	0.99308	0.98298	0.98656
	AVG	0.9866	0.98441	0.97724	0.9929	0.98069	0.98512

Note. AUC = area under the receiver operating characteristic curve.

Table 4.

Confidence Interval Based on $K$-Fold Validation.

Dataset	Mean Accuracy	SD	95% CI
Dataset-1	0.9672	0.0035	[0.9647, 0.9697]
Dataset-2	0.9638	0.0059	[0.9596, 0.9680]
Dataset-3	0.9715	0.0046	[0.9682, 0.9748]
Dataset-4	0.9866	0.0041	[0.9837, 0.9895]

Note. SD = standard deviation; CI = confidence interval.
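
The Dataset-1 row of Table 4 can be reproduced from the foldwise accuracies in Table 3. The sketch below assumes that the interval is a two-sided Student's t interval on the 10 fold accuracies with the sample standard deviation, using the critical value 2.262 for 9 degrees of freedom. The text does not state how the intervals were computed, so this is an assumption, although it matches the reported values.

```python
import math
import statistics

# Foldwise accuracies for Dataset-1, copied from Table 3.
fold_accuracy = [0.96634, 0.973615, 0.96306, 0.966358, 0.96897,
                 0.96765, 0.97096, 0.96832, 0.96237, 0.96435]

def t_confidence_interval(values, t_crit=2.262):
    """Mean, sample SD, and 95% t interval (t_crit assumes 10 values, i.e. 9 df)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                 # sample SD with n - 1 in the denominator
    half_width = t_crit * sd / math.sqrt(len(values))
    return mean, sd, (mean - half_width, mean + half_width)

mean, sd, (low, high) = t_confidence_interval(fold_accuracy)
print(f"mean={mean:.4f}, SD={sd:.4f}, 95% CI=[{low:.4f}, {high:.4f}]")
```

The same function applied to the fold accuracies of the other datasets gives their rows; for a different number of folds, `t_crit` would need to be replaced by the critical value for the corresponding degrees of freedom.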

Table 5 presents a comparative analysis of contemporary DL architectures employed for the classification of COVID-19 via CXR scans. The following categories of models are examined: CNNs, transfer learning-based models, and hybrid/ensemble models. As demonstrated in Table 5, our study surpasses the performance of most existing state-of-the-art models. Additionally, the variation in datasets employed in our research enhances the robustness of the proposed model, thereby better illustrating its effectiveness compared to alternative approaches. The outcome obtained for Dataset-1 is highly comparable to the CNN model based on VGG as proposed by Al-Shourbaji et al. (2022). Compared to the CNN model, the proposed model achieves exceptional performance without requiring high-end GPU hardware. Additionally, the processing time is significantly reduced, making it a more cost-effective solution.

Table 5.

Comparative Analysis of the Proposed Model with Recently Proposed Models.

Author	Methodology	Dataset with Class Distribution	Accuracy (%)
Lakshmi et al. (2024)	Bayesian optimization SVM kernel and CNN	COVID-19: 576, pneumonia: 4,273, normal: 1,583	96.2%
Jie et al. (2024)	Pyramid Graph Neural Network (GNN)	D1: Total: 188 (COVID-19 and normal) D2: Total: 5,911 (normal, bacterial, viral pneumonia, and COVID-19) D3: (Chowdhury et al., 2020; Rahman et al., 2021)	D1: 100%, D2: 88.5%, D3: 95%, combined: 76.4%
Zhang et al. (2022)	Deep ensemble dynamic learning network	D1: COVID-19: 1,200, viral pneumonia: 1,345, normal: 1,341; D2: COVID-19: 137, viral pneumonia: 90, normal: 90; both datasets combined	97.94%
Liu et al. (2024)	Fusion of parallel deformable multilayer perceptrons (MLPs) and Bi-directional Long Short-Term Memory (Bi-LSTM) modules	COVID-19: 183, pneumonia: 1,294, normal: 1,341	98%
Amin et al. (2024)	Ensemble-based machine learning model, HOG and LBP as feature descriptors	COVID-19: 150, pneumonia: 150, normal: 150	98%
Naz et al. (2024)	Federated averaging and federated learning using ResNet50 for training local and global models	COVID-19: 3,616, lung opacity: 6,012, normal: 10,192	95%
Al-Shourbaji et al. (2022)	CNN model based on VGG	COVID-19: 3,616, pneumonia: 1,345, normal: 10,192	96.84%
Alshahrni et al. (2023)	Two-step AS clustering with ensemble bootstrap aggregating training and multiple neural network (NN)	D1: COVID-19: 342, normal: 2,800, D2: COVID-19: 342, viral pneumonia: 1,495, D3: COVID-19: 342, bacterial pneumonia: 2,773	98.062%
Bhattacharyya et al. (2022)	CGAN: Segmentation, VGG-19 model for classification	COVID-19: 342, pneumonia: 347, normal: 341	96.6%
Nayak et al. (2022)	Lightweight CNN model called LW-CORONet	D1: COVID-19: 750, pneumonia: 750, normal: 750; D2: COVID-19: 2,358, pneumonia: 5,575, normal: 8,066	D1: 98.67%, D2: 95.67%
Shankar et al. (2022)	Salp Swarm algorithm (SSA) with ANN	SARS: 220, normal: 27, Streptococcus: 17	95.65%
Proposed approach	HS-KELM model	D1: COVID-19: 3,616, pneumonia: 1,345, normal: 10,192; D2: COVID-19: 576, pneumonia: 4,273, normal: 1,583; D3: COVID-19: 3,616, pneumonia: 4,273, normal: 3,500; D4: COVID-19: 3,616, pneumonia: 4,273, TB: 1,362	D1: 96.72%, D2: 96.38%, D3: 97.15%, D4: 98.66%

Note. SVM = support vector machine; CNN = Convolutional Neural Network; HOG = Histogram of Oriented Gradient; LBP = Local Binary Pattern; ANN = Artificial Neural Network; HS-KELM = HOG–SMOTE Kernel-based extreme learning machine; SMOTE = Synthetic Minority Over-sampling Technique.

7. Conclusion

In conclusion, the proposed medical image classification model integrates an effective imbalance-handling technique, HOG–SMOTE, with the efficient KELM classifier. The combination addresses the class imbalance inherent in medical datasets while delivering strong classification performance in terms of accuracy, sensitivity, and specificity, with accuracies between 96.38% and 98.66% across the four datasets. What distinguishes the model is its computational efficiency: unlike DL algorithms, which typically demand substantial computational resources, including high-end GPUs and long training times, the approach reduces both hardware requirements and processing time. As medical image datasets continue to grow in complexity and volume, efficient and scalable solutions become increasingly critical, and the proposed method offers a cost-effective and scalable classification system without compromising diagnostic accuracy. In summary, the HS-KELM-based medical image classification model is a resource-efficient alternative to traditional DL approaches and a promising candidate for adoption in medical diagnostics. Future work could explore further optimizations and adapt the model to more diverse and complex real-world datasets.

Footnotes

ORCID iDs

Nikhat Ali

Virendra P Vishwakarma

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Albataineh

Aldrweesh

Alzubaidi

M. A.

(2024). COVID-19 CT-images diagnosis and severity assessment using machine learning algorithm. Cluster Computing, 27(1), 547–562.

Ali

Yadav

(2021). Computer-aided detection and diagnosis of lung nodules using CT scan images: An analytical review. In Proceedings of second doctoral symposium on computational intelligence: DoSCI 2021 (pp. 545–558). Springer.

Ali

Yadav

(2023). Coronavirus disease identification using multi-subband feature analysis in DWT domain. Procedia Computer Science, 218, 574–584.

Alshahrni

M. M.

Ahmad

M. A.

Abdullah

Omer

Aziz

(2023). An intelligent deep convolutional network based COVID-19 detection from chest X-rays. Alexandria Engineering Journal, 64, 399–417.

Al-Shourbaji

Kachare

P. H.

Abualigah

Abdelhag

M. E.

Elnaim

Anter

A. M.

Gandomi

A. H.

(2022). A deep batch normalized convolution approach for improving COVID-19 detection from chest X-ray images. Pathogens, 12(1), 17.

Amin

S. U.

Taj

Hussain

Seo

(2024). An automated chest X-ray analysis for COVID-19, tuberculosis, and pneumonia employing ensemble learning approach. Biomedical Signal Processing and Control, 87, 105408.

Bhattacharyya

Bhaik

Kumar

Thakur

Sharma

Pachori

R. B.

(2022). A deep learning based approach for automatic detection of COVID-19 cases using chest X-ray images. Biomedical Signal Processing and Control, 71, 103182.

Calderon-Ramirez

Yang

Moemeni

Elizondo

Colreavy-Donnelly

Chavarria-Estrada

L. F.

Molina-Cabello

M. A.

(2021). Correcting data imbalance for semi-supervised COVID-19 detection using X-ray chest images. Applied Soft Computing, 111, 107692.

Carter

L. J.

Garner

L. V.

Smoot

J. W.

Zhou

Saveson

C. J.

Sasso

J. M.

Gregg

A. C.

Soares

D. J.

Beskid

T. R.

(2020). Assay techniques and test development for COVID-19 diagnosis.

10.

Chamseddine

Mansouri

Soui

Abed

(2022). Handling class imbalance in COVID-19 chest X-ray images classification: Using SMOTE and weighted loss. Applied Soft Computing, 129, 109588.

11.

Chatterjee

Maity

Bhattacharjee

Banerjee

Das

A. K.

Ding

(2023). Variational autoencoder based imbalanced COVID-19 detection using chest X-ray images. New Generation Computing, 41(1), 25–60.

12.

Chawla

N. V.

Bowyer

K. W.

Hall

L. O.

Kegelmeyer

W. P.

(2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.

13.

Chowdhury

M. E. H.

Rahman

Khandakar

Mazhar

Kadir

M. A.

Mahbub

Z. B.

Islam

K. R.

Khan

M. S.

Iqbal

Emadi

N. A.

Reaz

M. B. I.

Islam

M. T.

(2020). Can AI help in screening viral and COVID-19 pneumonia? IEEE Access, 8, 132665–132676.

14.

Chung

Bernheim

Mei

Zhang

Huang

Zeng

Cui

Yang

Fayad

Z. A.

(2020). CT imaging features of 2019 novel coronavirus (2019-NCOV). Radiology, 295(1), 202–207.

15.

Cohen

J. P.

Morrison

Dao

Roth

Duong

T. Q.

Ghassemi

(2020). COVID-19 image data collection: Prospective predictions are the future. arXiv 2006.11988. https://github.com/ieee8023/covid-chestxray-dataset

16.

Dalal

Triggs

(2005). Histograms of oriented gradients for human detection. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05) (Vol. 1, pp. 886–893). IEEE.

17.

Elaraby

M. E.

Ewees

A. A.

Anter

A. M.

(2024). A robust IoT-based cloud model for COVID-19 prediction using advanced machine learning technique. Biomedical Signal Processing and Control, 87, 105542.

18. Fan; Gong (2023). An improved COVID-19 classification model on chest radiography by dual-ended multiple attention learning. IEEE Journal of Biomedical and Health Informatics, 28(1), 145–156.

19. Hossain, M. M.; Walid, M. A. A.; Galib, S. S.; Azad, M. M.; Rahman; Shafi; Rahman, M. M. (2024). COVID-19 detection from chest CT images using optimized deep features and ensemble classification. Systems and Soft Computing, 6, 200077.

20. Huang, G.-B.; Zhou; Ding; Zhang (2011). Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42(2), 513–529.

21. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. In 2004 IEEE international joint conference on neural networks (IEEE Cat. No. 04CH37541) (Vol. 2, pp. 985–990). IEEE.

22. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501.

23. Javidi; Abbaasi; Naybandi Atashi; Jampour (2021). COVID-19 early detection for imbalanced or low number of data using a regularized cost-sensitive CapsNet. Scientific Reports, 11(1), 18478.

24. Jie; Jiming; Ying; Yanchun; Haodong (2024). A pyramid GNN model for CXR-based COVID-19 classification. The Journal of Supercomputing, 80(4), 5490–5508.

25. Kermany, D. S.; Goldbaum; Cai; Valentim, C. C. S.; Liang; Baxter, S. L.; McKeown; Yang; Yan; Dong; Prasadha, M. K.; Pei; Ting, M. Y. L.; Zhu; Hewett; Dong; Ziyar; Zhang (2018). Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5), 1122–1131.e9.

26. Khalif, K. M. N. K.; Chaw Seng; Gegov; Bakar, A. S. A.; Shahrul, N. A. (2024). Integrated generative adversarial networks and deep convolutional neural networks for image data classification: A case study for COVID-19. Information, 15(1), 58.

27. Lakshmi; Das; Manohar (2024). A new COVID-19 classification approach based on Bayesian optimization SVM kernel using chest X-ray datasets. Evolving Systems, 15(4), 1521–1540.

28. M. T.; Lee, M. Y.; Baik, S. W. (2019). A hybrid approach using oversampling technique and cost-sensitive learning for bankruptcy prediction. Complexity, 2019(1), 8460934.

29. Liu; Xing; Lin; Liu; Chow, T. W. (2024). A new classification method for diagnosing COVID-19 pneumonia via joint parallel deformable MLP modules and Bi-LSTM with multi-source generated data of CXR images. IEEE Transactions on Consumer Electronics, 70(1), 2794–2805.

30. Loey; Manogaran; Khalifa, N. E. M. (2025). A deep transfer learning model with classical data augmentation and CGAN to detect COVID-19 from chest CT radiography digital images. Neural Computing and Applications, 37(35), 29099–29111.

31. Maleki; Hojati (2022). A precise review on NAATs-based diagnostic assays for COVID-19: A motion in fast POC molecular tests. European Journal of Clinical Investigation, 52(11), e13853.

32. Mezina; Burget (2024). Detection of post-COVID-19-related pulmonary diseases in X-ray images using vision transformer-based neural network. Biomedical Signal Processing and Control, 87, 105380.

33. Minaee; Kafieh; Sonka; Yazdani; Soufi, G. J. (2020). Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning. Medical Image Analysis, 65, 101794.

34. Moore, E. H. (1920). On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society, 26, 394–395.

35. Nayak, S. R.; Nayak, D. R.; Sinha; Arora; Pachori, R. B. (2022). An efficient deep learning method for detection of COVID-19 infection using chest X-ray images. Diagnostics, 13(1), 131.

36. Naz; Phan; Chen, Y. P. P. (2024). Centralized and federated learning for COVID-19 detection with chest X-ray images: Implementations and analysis. IEEE Transactions on Emerging Topics in Computational Intelligence, 8(4), 2987–3000.

37. Ozturk; Talo; Yildirim, E. A.; Baloglu, U. B.; Yildirim; Acharya, U. R. (2020). Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine, 121, 103792.

38. Penrose (1955). A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society, 51, 406–413.

39. Rahman; Khandakar; Qiblawey; Tahir; Kiranyaz; Abul Kashem, S. B.; Islam, M. T.; Al Maadeed; Zughaier, S. M.; Khan, M. S.; Chowdhury, M. E. (2021). Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132, 104319.

40. Schaudt; Von Schwerin; Hafner; Riedel; Reichert; Von Schwerin; Beer; Kloth (2023). Augmentation strategies for an imbalanced learning problem on a novel COVID-19 severity dataset. Scientific Reports, 13(1), 18299.

41. Shankar; Perumal; Tiwari; Shorfuzzaman; Gupta (2022). Deep learning and evolutionary intelligence with fusion-based feature extraction for detection of COVID-19 from chest X-ray images. Multimedia Systems, 28(4), 1175–1187.

42. Shazia; Xuan, T. Z.; Chuah, J. H.; Usman; Qian; Lai, K. W. (2021). A comparative study of multiple neural network for detection of COVID-19 on chest X-ray. EURASIP Journal on Advances in Signal Processing, 2021, 1–16.

43. Tan; Meng; Liu (2024). Self-supervised learning with self-distillation on COVID-19 medical image classification. Computer Methods and Programs in Biomedicine, 243, 107876.

44. van der Maaten; Hinton (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

45. Zhang; Chen; Luo (2022). A deep ensemble dynamic learning network for corona virus disease 2019 diagnosis. IEEE Transactions on Neural Networks and Learning Systems, 35(3), 3912–3926.
