Abstract
Background
The emergence of monkeypox as a global health concern highlights the need for innovative detection methods that improve upon polymerase chain reaction, which is costly, time-consuming, and poses risks of contagion to healthcare personnel.
Purpose
This study proposed a lightweight deep learning framework to enhance monkeypox lesion detection using skin image data.
Methods
Data augmentation and a novel edge enhancement algorithm are applied, employing contrast-limited adaptive histogram equalization and bilateral filters to refine skin images. The framework is tested across six pretrained deep learning models and one novel hybrid deep model, DenseNet121 + ConvNeXt-Tiny (DN-CXT). Performance is evaluated using accuracy, F1-score, and precision, with optimization through Adam, root mean square propagation, and stochastic gradient descent.
Results
The proposed DN-CXT model achieved the highest performance, with a test accuracy of 97%, F1-score of 97%, and precision of 99%. Applied techniques such as DenseNet121, MobileNetV2, InceptionV3, and ConvNeXt-Tiny also showed exceptional results.
Conclusions
The proposed framework significantly advances medical image detection for monkeypox lesions.
Implications
These findings support the integration of artificial intelligence-driven methodologies into monkeypox detection workflows, potentially improving diagnostic efficiency, reducing risks to medical personnel, and enhancing healthcare response to emerging infectious diseases.
Highlights
A novel, lightweight deep learning (DL) framework is developed for monkeypox lesion detection. The hybrid DenseNet121 + ConvNeXt-Tiny (DN-CXT) model achieved 97% accuracy, 97% F1-score, and 99% precision. Image enhancement using contrast limited adaptive histogram equalization (CLAHE) and bilateral filters improved model generalization. Artificial intelligence (AI)-driven detection methods can strengthen healthcare efficiency and reduce risks to medical staff.
Introduction
When monkeypox re-emerged in 2022, it became a worldwide public health threat. The disease is caused by an Orthopoxvirus, 1 a member of the Poxviridae family, and resembles other dangerous diseases such as smallpox and cowpox. The virus was first discovered in monkeys and, in the second half of the 20th century, appeared in humans as well. It spreads through contact with an infected person or animal, or with any contaminated material. 2 Although the disease has been known for decades, cases in Nigeria increased from around 50 in 1990 to thousands in 2017, 3 and it is now a cause for alarm in many countries. Despite medical advances, no treatment that completely cures the disease exists to date. The main recourse is prevention, which includes early detection and isolating the infected individual from others. Recent surveys and studies confirm that there is still no complete cure for this life-threatening disease, although the Centers for Disease Control has approved Brincidofovir and Tecovirimat, 4 initially developed for smallpox, to reduce symptoms and make the virus less infectious. These antiviral drugs are approved for smallpox treatment and have shown some efficacy, but they are not a cure for monkeypox.
The diagnosis of monkeypox relies on polymerase chain reaction (PCR), a costly, time-consuming method that yields results through a multi-day procedure and requires a properly equipped laboratory to process samples taken from affected individuals and their skin lesions. Over the past few years, skin image data have been studied extensively, and many remarkable advances in Artificial Intelligence (AI) have emerged. 5 One study 6 developed a robust model with efficient results despite the small amount of available data. To compensate, image augmentation was applied to enlarge the dataset, along with visual explanation methods such as gradient-weighted class activation mapping (Grad-CAM) and local interpretable model-agnostic explanations (LIME). Another method combined the same augmentation with principal component analysis 7 to improve accuracy in this sensitive detection task, using models such as InceptionV3, ResNet50, VGG-16, SqueezeNet, and support vector machines (SVMs). A recent study 8 performed statistical feature extraction using a row-and-column grid; the Relief algorithm was applied to the obtained features, which were then classified using the k-nearest neighbors algorithm. This yields a lightweight method that does not employ DL. The monkeypox skin lesion data were used in both raw and augmented forms, and the results are comparable to or better than those of DL-based methods. Our proposed study presents a solution that detects the disease almost instantly and at an extremely low cost compared to PCR.
Other studies, including those on hybrid models, 9 have not only improved classification efficiency but also diversified the research paradigm, providing a base on which more efficient models can be built. Given the challenges of early detection, there is increasing interest in AI-based learning methods, which have shown a strong position in healthcare for tasks such as cancer detection, skin disease classification, and COVID-19 screening. 10 Prior work has used convolutional neural networks (CNNs), hybrid models, preprocessing, and hyperparameter tuning to improve model efficiency. However, a gap in these studies is the lack of a sufficient amount of monkeypox data. Because the available datasets are small, augmentation is applied to enlarge them, but augmentation alone is not enough for a problem as sensitive as the diagnosis of a life-threatening disease. Additional advanced fine-tuning and preprocessing methods are required to maximize the data’s adequacy and support reliable detection.
Our proposed study applies a novel, advanced edge enhancement method to an augmented skin image dataset comprising 3,192 images split into monkeypox and non-monkeypox classes, enabling models to train and detect more efficiently. CNNs and transfer learning have been used to achieve significant accuracies in image-based diagnosis. When advanced fine-tuning methods are omitted, however, augmented data alone can only support low-complexity CNNs. Pretrained models, such as MobileNet and VGGNet, have been used to detect skin conditions with accuracies ranging from 71% to 90%.7,11 This sensitive field requires finer methods to improve the quality of the augmented dataset. Our proposed study employs advanced fine-tuning together with two edge enhancement methods (CLAHE and bilateral filtering), which are combined to preserve the original character of the data. The data are then used to train six individual models and a hybrid model that uses attention and a softmax output.
Study contributions
Our main contributions to monkeypox detection are as follows:
A novel preprocessing method, an edge enhancement approach (CLAHE and bilateral filtering), is applied to augmented data to make image features more prominent before they are fed to the models.
The newly preprocessed dataset is then used to train six individual models and one novel hybrid model.
Seven advanced DL techniques are trained using three different optimizers: root mean square propagation (RMSprop), stochastic gradient descent (SGD), and Adam.
A novel hybrid deep model, DN-CXT, is trained by tuning the number of epochs and applying an attention mechanism with a softmax output layer and the Adam optimizer to enhance its effectiveness.
The remainder of the research is structured as follows: Section “Literature review” highlights the analysis of the literature review. Section “Proposed methodology” outlines our research design. Section “Results” assesses the outcomes of applied methods. Section “Discussion” incorporates the main findings of this research.
Literature review
The literature on monkeypox detection shows an increasing number of studies focused on developing reliable and effective identification methods and on making better use of machine-based medical tools, thereby minimizing the cost and time of detecting this harmful disease. These studies show that traditional approaches require more finely tuned, efficient methods, and they have served as a basis for innovation in model evaluation and improvement across many fields, especially medicine. The literature analysis includes state-of-the-art methods and their evaluation metrics, as shown in Table 1.
Summary of the literature analysis.
AI: artificial intelligence; Grad-CAM: gradient-weighted class activation mapping; LIME: local interpretable model-agnostic explanations; SGD: stochastic gradient descent; RMSprop: root mean square propagation; GPU: graphics processing unit; MLSD: monkeypox lesion detection dataset; CNN: convolutional neural network; DL: deep learning; ANOVA: analysis of variance.
DL-based convolutional neural network (CNN) models, a branch of machine learning (ML), have been used previously and remain in demand, as shown in a study 11 that trained models using advanced DL techniques to identify monkeypox infections. That study presented a working model of deep neural networks that can analyze medical images and clinical data to improve diagnostic accuracy, offering a promising approach to enhance disease surveillance and response efforts. Decision trees are widely used in many commercial and industrial fields. A previous study 6 highlights the application of SVM across multiple models, including ResNet50, VGG-16, SqueezeNet, and InceptionV3, to improve the diagnosis of monkeypox using AI. Since monkeypox symptoms can resemble those of other diseases, including chickenpox, accurate and precise detection is crucial. The authors trained a CNN on a dataset of photos of skin lesions from patients with monkeypox and other similar diseases. Standard performance metrics, including accuracy, recall, and precision, were used to evaluate the model, which demonstrated high effectiveness in correctly identifying monkeypox cases. Overall, the research reveals the potential of deep neural networks to help healthcare professionals manage and control monkeypox outbreaks more efficiently.
Another study 7 did not reach the highest accuracy but still obtained good results. Over the past two years, the monkeypox virus has emerged as a significant epidemic threat following COVID-19, with cases now reported in more than 40 countries outside Africa. When confirmatory PCR testing is unavailable, image analysis methods become critically important for detection. In this study, an open-source Kaggle Monkeypox Image dataset was utilized. To increase the dataset size, a data duplication approach was first applied to the images. The study compared the performance of EfficientNetB7, ResNet50, DenseNet121, Xception, and EfficientNetB3, all pretrained DL models, in detecting monkeypox. Their effectiveness was evaluated using metrics including F1 score, accuracy, precision, and a confusion matrix. The accuracy rates of these models were 90%, 75%, 72%, 73%, and 82%, respectively. These results demonstrate that DL-based image analysis can serve as a useful aid to physicians in rapid screening efforts.
Researchers 12 used multiple optimizers to determine whether fine-tuning with different optimizers yields different results. That study obtained less accurate results with RMSprop, whereas our study achieved good results with the same optimizer because the dataset is edge-enhanced. A previous study 13 demonstrates the classification of monkeypox at its initial stages using DL models, achieving 90% accuracy with MobileNetV2. Another study 15 used eight individual models with multistep preprocessing, yet the individual models still reached only 90% accuracy. Another study 16 successfully detected monkeypox using DL methods. In that work, classification was performed using the pretrained CNNs MobileNetV2, VGG16, and VGG19 on the monkeypox lesion image dataset, an open-source dataset released in 2022, and the models were compared on accuracy-based evaluation metrics. MobileNetV2 achieved 91.38% accuracy, 88.25% F1 score, 86.75% recall, and 90.5% precision. The VGG16 and VGG19 models showed 83.62% and 78.45% accuracy, respectively.
Another study 17 has trained the model with 95% training accuracy on the same dataset. The study highlights a key advancement using the same dataset, achieving superior performance metrics, including accuracy, F1-score, precision, recall, and area under the curve (AUC), compared with conventional pretrained models. This study’s primary innovation is MpoxSLDNet’s ability to achieve high detection accuracy while using notably less storage space than other models. MpoxSLDNet provides a viable solution for early identification and classification of monkeypox lesions in healthcare settings with limited resources, addressing the high storage requirements. We used the monkeypox lesion detection dataset (MLSD) in this work, which included 1764 skin photos of non-monkeypox lesions and 1428 skin images of monkeypox lesions. The model’s capacity to generalize to new situations may be impacted by the dataset’s restrictions. In contrast, the validation accuracy of the MpoxSLDNet model was 94.56%, compared to 84.38%, 86.25%, and 67.19% for DenseNet121, VGG16, and ResNet50, respectively.
A previous study 18 achieved a maximum accuracy of 93% on the augmented monkeypox (MLSD) dataset without any preprocessing; the other models in that study showed slightly lower accuracy. Another study 19 distinguished Mpox from chickenpox, measles, and healthy patients using SqueezeNet, residual networks, and transfer learning. The Mpox class achieved an F1-score of 92.55%. Researchers 21 have also used several benchmark models (VGG16, ResNet50, DenseNet121, MobileNetV2, EfficientNetB3, InceptionV3, and Xception) to achieve 83.5% overall accuracy. Additionally, a previous study 22 used the vision transformer technique, along with several well-known DL-based CNNs trained via transfer learning, and presented a comparative analysis of the results. The study achieved 78.16% F1 score, 81.91% accuracy, 74.12% recall, and 87.16% precision.
Researchers 23 also used 13 distinct pretrained DL models to identify monkeypox virus. For all of them, fine-tuning was first performed using universal custom layers, and the results were analyzed in terms of accuracy, precision, recall, and F1-score. The reported metrics are precision 85.44%, recall 85.47%, accuracy 87.13%, and F1-score 85.40%. The study proposed in Alharbi et al., 24 aims to identify skin lesions as signs of monkeypox during a pandemic by leveraging meta-heuristic optimization to enhance the effectiveness of feature selection and classification. The required features are extracted using DL and transfer learning techniques, with GoogLeNet as the DL framework for feature extraction. Feature selection is then performed using a binary implementation of the dipper-throated optimization algorithm, achieving 87% accuracy after implementing the proposed feature extraction method. The MLSD 23 was used as the dataset, in both its raw and augmented forms. The results are not clearly reported.
A previous study 25 introduced an ensemble DL framework for automated detection of monkeypox skin lesions using an amalgamation of CNN models, InceptionV3, Xception, and DenseNet169, enhanced through a beta function-based normalization scheme. The approach aggregates probabilistic outputs from multiple fine-tuned CNNs to improve classification accuracy and robustness. Evaluated on a publicly available monkeypox dataset with five-fold cross-validation, the model achieved strong results: 93.39% accuracy, 88.91% precision, 96.78% recall, and 92.35% F1-score, demonstrating the potential of ensemble-based architectures for reliable monkeypox diagnosis.
Research gap
Based on the thorough literature review mentioned above, we have determined the following research gaps:
Previously, classical ML and DL methods were used for monkeypox detection. There is a data gap, including a lack of diversity in the currently available data, which needs to be addressed. To overcome this, the data are augmented, but augmentation makes the data less reliable and the models more prone to overfitting. To make the data more reliable, it needs to be fine-tuned and preprocessed so that a robust, high-performing detection system can be built on which people and the field of medicine can rely. Single models have proven effective in predictive analytics, but the integration of multiple models into a hybrid framework offers enhanced performance. Recent research indicates a shift toward hybrid models, achieving accuracy rates close to 99%. This evidence supports our decision to pursue a hybrid approach in our study to attain superior predictive outcomes.
Proposed methodology
This section describes our proposed approach, illustrated in Figure 1. We conduct the study on a benchmark dataset. Preprocessing is applied to the original dataset to eliminate noise and encode the data; the technique applied is edge enhancement on augmented data. To our knowledge, this edge enhancement preprocessing approach has not previously been applied to improve the performance of monkeypox detection models. The transformed dataset is then split into a training portion (70%) and a held-out portion (30%), the latter divided equally into validation (15%) and testing (15%). We used unseen data to compare the performance of our tuned ML techniques. Furthermore, each method’s efficacy under hyperparameter tuning is validated by the per-optimizer performance metrics table. The best-performing hybrid model is then used for monkeypox detection.

The architecture of our research methodology.
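The 70/15/15 split described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the stratification by class, the seed, and the helper name `stratified_split` are assumptions; only the split ratios and the class sizes (1,428 and 1,764 images) come from the text.

```python
import random

def stratified_split(labels, train=0.70, val=0.15, seed=42):
    """Return (train_idx, val_idx, test_idx) with class ratios preserved.

    Hypothetical helper: the source does not say whether the split was stratified.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]  # remainder goes to the test set
    return train_idx, val_idx, test_idx

# Class sizes reported for the dataset: 1,428 monkeypox and 1,764 other images.
labels = ["monkeypox"] * 1428 + ["other"] * 1764
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting per class keeps the monkeypox-to-other ratio roughly equal across the three subsets, which matters when the classes are not perfectly balanced.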
Monkeypox lesion data collection
In this research, we have used a popular dataset, the MLSD, 26 to conduct the research. The MLSD provides comprehensive, up-to-date data for both monkeypox and non-monkeypox classes. The dataset contains 1,428 monkeypox skin images and 1,764 other skin images. Figure 2 contains the sample images.

The monkeypox lesion and other skin image analysis.
Image preprocessing and enhancement
To remove noise, preprocessing is applied to the dataset. The edge enhancement process is done in two sequential parts: CLAHE is applied first to improve contrast, and a bilateral filter is then applied to suppress noise while preserving edges.
This sequential approach ensures that the dataset benefits from improved contrast and clarity through CLAHE, while the bilateral filter further refines the image by suppressing unwanted noise and artifacts. Together, these steps prepare the data with enhanced edges and cleaner features, making it more suitable for reliable model evaluation and feature extraction. Figures 3 and 4 show the effect of edge enhancement and the resulting enhanced images.

The edge enhancement on data.

The edge enhancement steps on data.
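To make the two-step enhancement concrete, the sketch below applies a contrast-limited histogram equalization followed by a bilateral filter to a toy grayscale patch. It is a simplified stand-in, not the study's code: the equalization treats the whole image as a single tile (full CLAHE works tile by tile and interpolates between tiles), and the clip limit, radius, and sigma values are assumed for illustration. In practice, library routines such as OpenCV's `cv2.createCLAHE` and `cv2.bilateralFilter` would normally be used; the source does not state which implementation was employed.

```python
import math

def clipped_equalize(img, clip_limit=2.0, bins=256):
    """Contrast-limited histogram equalization on ONE tile (the whole image)."""
    pixels = [p for row in img for p in row]
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    # Clip each bin and spread the clipped excess evenly over all bins.
    limit = max(1.0, clip_limit * len(pixels) / bins)
    excess = sum(max(0.0, h - limit) for h in hist)
    hist = [min(h, limit) + excess / bins for h in hist]
    # Map intensities through the normalized cumulative histogram.
    total, running, lut = sum(hist), 0.0, []
    for h in hist:
        running += h
        lut.append(round(255 * running / total))
    return [[lut[p] for p in row] for row in img]

def bilateral(img, radius=1, sigma_space=1.0, sigma_color=25.0):
    """Edge-preserving smoothing: weights shrink with distance and intensity gap."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        diff = img[yy][xx] - img[y][x]
                        wgt = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2)
                                       - diff * diff / (2 * sigma_color ** 2))
                        num += wgt * img[yy][xx]
                        den += wgt
            out[y][x] = round(num / den)
    return out

# Toy 4x6 grayscale patch: a darker region (left) and a brighter region (right).
patch = [[100, 102, 101, 140, 141, 139],
         [101, 100, 103, 139, 142, 140],
         [102, 101, 100, 141, 140, 138],
         [100, 103, 102, 140, 139, 141]]
equalized = clipped_equalize(patch)
enhanced = bilateral(equalized)
```

The equalization stretches the intensity range, and the bilateral pass smooths within each region while leaving the boundary between the dark and bright regions largely intact.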
Applied learning techniques
The techniques applied have become useful tools for monkeypox lesion identification.6,7 These methods leverage DL, ML, and sophisticated algorithms to analyze vast amounts of data. Such AI-based systems can accurately distinguish monkeypox from non-monkeypox classes and efficiently detect lesions in images using supervised and unsupervised learning techniques.
DenseNet121
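DenseNet is a CNN 27 that encourages feature reuse and alleviates the vanishing gradient problem by using dense connections, in which each layer receives inputs from all previous layers. Formally, the output of the $\ell$-th layer is $x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$, where $[\cdot]$ denotes channel-wise concatenation of the preceding feature maps and $H_\ell$ is the layer’s composite function (batch normalization, ReLU, and convolution).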
EfficientNetB0
EfficientNetB0 is another powerful CNN-based model that uses compound scaling to scale depth, width, and resolution simultaneously, thereby maximizing performance and efficiency. Its fundamental building block is the mobile inverted bottleneck convolution (MBConv), which uses depthwise separable convolutions to reduce computation. The main operations within an MBConv block can be summarized as follows:
Pointwise convolution (PWConv) expands the input channels.
Depthwise convolution (DWConv) performs spatial filtering.
Squeeze-and-excitation (SE) adaptively recalibrates channel-wise feature responses.
DropConnect is used for regularization.
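As a rough illustration of compound scaling, the sketch below computes nominal depth, width, and resolution multipliers from a single coefficient phi. The base coefficients (1.2, 1.1, 1.15) are those reported in the original EfficientNet paper; the function name and the 224-pixel base resolution are assumptions for this example, and the published B1 to B7 variants use hand-adjusted resolutions that differ from these nominal values. EfficientNetB0 corresponds to phi = 0.

```python
# Coefficients reported in the EfficientNet paper for depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_resolution=224):
    """Nominal multipliers for a given compound coefficient phi (illustrative)."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth, width, resolution

# The coefficients are chosen so that cost roughly doubles per unit of phi.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(3):
    print(phi, compound_scale(phi))
print("approximate FLOPs growth per unit phi:", round(flops_factor, 2))
```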
InceptionV3
InceptionV3 is a CNN architecture28,29 widely recognized for its strong performance. It builds on the Inception V1 and V2 architectures, addressing their limitations and improving efficiency. The core of InceptionV3 is the inception module. These modules enable the network to learn features at multiple scales and resolutions by combining pooling layers and convolutional filters of different sizes.
In essence, InceptionV3’s design emphasizes parallel processing, multi-scale feature extraction, and efficient use of computational resources to achieve strong performance and speed. It is a significant advancement in CNN design for image recognition.
MobileNetV2
MobileNetV2 is a lightweight CNN architecture designed for efficient deployment on embedded and mobile devices. Developed by Google as an improved version of the original MobileNet architecture, it focuses on reducing computational expense without sacrificing accuracy.
In contrast to standard residual blocks, MobileNetV2 employs inverted residuals, in which the intermediate layer is enlarged, and the input and output are narrow bottleneck layers. This design enables the capture of more features with fewer parameters. Similar to MobileNetV1, depthwise separable convolutions are used to reduce the number of parameters and computations.
Using depthwise separable convolutions, linear bottlenecks, and inverted residuals, MobileNetV2 is a neural network architecture suited for embedded and mobile applications that delivers strong performance at low computational cost.
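The parameter saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below compares the weight count of a standard convolution with that of a depthwise plus pointwise pair; the layer sizes are arbitrary example values, and biases and normalization parameters are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64        # example layer sizes (assumed)
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 1))
```

For this example layer the separable form needs 2,336 weights instead of 18,432, roughly an eightfold reduction.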
ResNet50
Residual network (ResNet) is a deep CNN architecture introduced in 2015 to address the vanishing gradient problem in extremely deep networks. The core idea is residual learning, which uses skip connections to allow gradients to flow directly across the network, enabling the training of substantially deeper models. A residual block applies the transformation $y = F(x, \{W_i\}) + x$ to an input $x$, where $F$ is the residual mapping learned by the block’s stacked layers and $W_i$ are their weights.
ResNet101
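ResNet101 has a 101-layer architecture built from 33 bottleneck residual blocks, with most of the additional layers in the deeper stages. ResNet-101 has roughly 74% more parameters than ResNet-50 (about 44.5 million versus 25.6 million) due to the additional residual blocks, leading to increased capacity but also higher computational cost. For a convolutional layer, the number of multiply-accumulate operations is $H_{out} \cdot W_{out} \cdot C_{in} \cdot C_{out} \cdot K_h \cdot K_w$, and floating point operations (FLOPs) are commonly counted as twice this value.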
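A small helper makes the cost of a single convolutional layer concrete. This sketch uses one common convention, counting each multiply-accumulate as two FLOPs; the source does not specify which convention it follows, and the function name is hypothetical. The example layer is the standard 7×7, stride-2 ResNet stem applied to the 160×160 input used in this study, which gives an 80×80 output.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k_h, k_w, flops_per_mac=2):
    """FLOPs of one convolution layer (bias and activation ignored)."""
    macs = h_out * w_out * c_in * c_out * k_h * k_w
    return flops_per_mac * macs

# ResNet stem: 7x7 kernel, 3 -> 64 channels, 80x80 output for a 160x160 input.
stem_macs = conv2d_flops(80, 80, 3, 64, 7, 7, flops_per_mac=1)
stem_flops = conv2d_flops(80, 80, 3, 64, 7, 7)
print(stem_macs, stem_flops)
```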
ConvNeXt-Tiny
ConvNeXt is a modernized CNN architecture introduced by Facebook AI Research (FAIR) in 2022. The primary goal is to draw inspiration from vision transformers while maintaining the efficiency of CNNs. ConvNeXt-Tiny is the smallest version in the ConvNeXt family (Tiny, Small, Base, Large, XLarge), making it lightweight for practical use. ConvNeXt-Tiny processes an image in four stages (similar to ResNet, but modernized), preceded by a stem and followed by a classification head:
Stem layer: Patchify image using Conv2D.
Stage 1: Large-kernel depthwise convolutions + LayerNorm.
Stage 2: Downsampling + ConvNeXt blocks.
Stage 3: Downsampling + ConvNeXt blocks.
Stage 4: Downsampling + ConvNeXt blocks.
Head: Global average pooling + fully connected classification layer.
DN-CXT: Novel hybrid model
Our proposed novel hybrid model, DN-CXT, combines DenseNet121 and ConvNeXt-Tiny and is fine-tuned through hyperparameter tuning. Both constituent models individually yielded good results and trained to high accuracy. Table 2 contains the detailed parameters of our hybrid model.
Hybrid model parameters details.
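Let $f_D \in \mathbb{R}^{1024}$ and $f_C \in \mathbb{R}^{768}$ denote the globally average-pooled features produced by the DenseNet121 and ConvNeXt-Tiny branches for an input image. The hybrid feature vector is then obtained by concatenating the two feature representations:
$$f = [f_D;\ f_C] \in \mathbb{R}^{1792}$$
Subsequently, dropout regularization and dense transformations are applied:
$$h = \mathrm{Dropout}_{0.3}\big(\mathrm{ReLU}(W_1\,\mathrm{Dropout}_{0.5}(f) + b_1)\big), \quad h \in \mathbb{R}^{128}$$
Finally, the softmax layer produces the class probabilities:
$$\hat{y}_k = \frac{\exp(w_k^{\top} h + b_{2,k})}{\sum_{j=1}^{2} \exp(w_j^{\top} h + b_{2,j})}, \quad k \in \{1, 2\}$$
The model is trained using the categorical cross-entropy loss:
$$\mathcal{L} = -\sum_{k=1}^{2} y_k \log \hat{y}_k$$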
The architecture of our hybrid model, including the configuration details of the proposed model, is shown in Figure 5. The model follows a dual-branch architecture, with Branch 1 using DenseNet121 and Branch 2 using ConvNeXt-Tiny as a feature extractor. The input image, with dimensions (160, 160, 3), is fed into both branches simultaneously. In Branch 1, DenseNet121 (initialized without the top layer) extracts hierarchical features, followed by a global average pooling (GAP) layer that reduces the output to a 1024-dimensional vector. In parallel, Branch 2 applies ConvNeXt-Tiny (also excluding its top layer) to generate deep representations, which are subsequently pooled into a 768-dimensional vector. The outputs from both branches are concatenated to form a combined feature vector of size 1792. This fused representation is then passed through a series of fully connected layers with regularization: a dropout layer (rate = 0.5) is followed by a 128-unit layer with ReLU activation and then by another dropout layer (rate = 0.3). Finally, a dense softmax layer with two units performs the classification, assigning the input to one of two classes: monkeypox or other.

Architectural diagram of model DenseNet121 + ConvNeXt-Tiny (DN-CXT).
This dual-branch architecture leverages the complementary strengths of DenseNet121 and ConvNeXt-Tiny, enabling robust feature extraction and improving classification performance through ensemble-like feature fusion.
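The fusion head described above can be traced with a small, dependency-free forward pass. This is only a sketch under stated assumptions: the two backbone outputs are replaced by random stand-in vectors of the stated sizes (1024 and 768), the weights are randomly initialized rather than trained, and the helper names are hypothetical. Dropout is omitted because it is inactive at inference time.

```python
import math
import random

rng = random.Random(0)

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(z):
    m = max(z)                                   # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def init(n_out, n_in):
    """Random weights and zero biases (untrained, for shape checking only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

# Stand-ins for the pooled backbone outputs (random values, NOT real features).
f_dense = [rng.random() for _ in range(1024)]    # DenseNet121 branch after GAP
f_convnext = [rng.random() for _ in range(768)]  # ConvNeXt-Tiny branch after GAP

fused = f_dense + f_convnext                     # concatenation -> 1792 values
w1, b1 = init(128, len(fused))
w2, b2 = init(2, 128)

hidden = relu(dense(fused, w1, b1))              # 128-unit ReLU layer
probs = softmax(dense(hidden, w2, b2))           # two-class probabilities

target = [1.0, 0.0]                              # one-hot label, e.g. monkeypox
loss = -sum(t * math.log(p) for t, p in zip(target, probs))
print(len(fused), len(hidden), probs, loss)
```

In a deep learning framework the same head would typically be expressed as concatenate, dropout, dense, dropout, and softmax layers placed on top of the two pretrained backbones.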
Hyperparameter optimization
Hyperparameter optimization seeks to determine the optimal configuration of hyperparameters for models deployed in a monkeypox detection system and during model training, thereby maximizing their performance and minimizing false positives and negatives. By systematically tuning hyperparameters, we can develop more accurate and robust models. The following is a detailed list of optimizers used to train models individually, ensuring timely responses to detect monkeypox. The optimizers of the applied methods are described in Table 3.
Optimizers’ hyperparameters used in the proposed study.
SGD: stochastic gradient descent; RMSprop: root mean square propagation.
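To show how the three optimizers differ, the sketch below applies their standard update rules to a one-dimensional toy loss, (w - 3)^2. The learning rates, decay factors, and step counts are illustrative values chosen for this toy problem; they are not the settings listed in Table 3.

```python
import math

def grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd(w, steps, lr=0.1):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def rmsprop(w, steps, lr=0.05, rho=0.9, eps=1e-8):
    avg_sq = 0.0                                  # running mean of squared gradients
    for _ in range(steps):
        g = grad(w)
        avg_sq = rho * avg_sq + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w

def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0                                   # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, fn in [("SGD", sgd), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, fn(0.0, 300))
```

SGD scales the step directly by the gradient, whereas RMSprop and Adam normalize it by a running estimate of gradient magnitude, so their step sizes are governed mainly by the learning rate.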
Results
This section presents the findings of our study, including the results from experiments and evaluations. This part assesses the accuracy and performance metrics of the produced model in both hybrid and individual model analysis. The findings will highlight the advantages and disadvantages of the ML algorithms employed throughout the experiment.
Experimental environment
The experimental setup for our research on edge enhancement and individual model training is carried out on a local system, as shown in Table 4.
Experimental environment specifications for edge enhancement applications and individual model training.
The second part of our research includes building a hybrid model using the Google Colab notebook. 14 We have employed F1-score, accuracy, precision, and recall as metrics to assess the performance of the ML algorithms. The experimental setup parameters describe the characteristics of the environment used in the experiment. Table 5 presents the experimental environment for hybrid model training.
Experimental environment specification for hybrid model training.
Results with individual model training
A summary of the performance evaluation of the applied deep transfer learning methods, using the edge-enhanced features of the augmented dataset, is provided in the following tables for each model and optimizer. The assessment reports F1-score, recall, accuracy, and precision. Using the edge-enhanced feature dataset, the best individual model achieved a high accuracy of 0.96. These results show that sophisticated feature engineering strategies are required to improve the effectiveness of these methods for monkeypox detection. Table 6 presents a summary of all individual models’ scores with various optimizers.
Summary performance of applied DenseNet121, MobileNetV2, InceptionV3, ResNet101, ResNet50, EfficientNetB0, and ConvNeXt-Tiny models.
SGD: stochastic gradient descent; RMSprop: root mean square propagation.
The results show that ConvNeXt-Tiny, on its own, achieves a maximum accuracy of 96%, while DenseNet121 obtained high overall accuracy when trained with both the Adam and RMSprop optimizers, achieving an F1-score of 95%, which reflects strong efficiency and reliability. Similarly, the evaluation metrics for MobileNetV2 demonstrate highly competitive performance, with a 95% F1 score confirming its effectiveness. The InceptionV3 model also produced strong results, with RMSprop yielding 95% accuracy and an F1-score of 95%, indicating its robustness. On the other hand, the ResNet101 model achieved lower accuracy (75%) and an F1-score of 80%, which is still considered satisfactory. Additionally, the evaluation of ResNet50 yielded an F1-score of 78% and an accuracy of 73%, suggesting that the Adam and RMSprop optimizers improve performance. Finally, the EfficientNetB0 model was also evaluated, although its results are less effective than those of the top-performing architectures.
Results with hybrid model
The classification report of the proposed hybrid DN-CXT model is analyzed on the edge-enhanced dataset using both the training and validation sets, with accuracy and F1-scores as evaluation metrics. The Adam optimizer is used to train DN-CXT, and the corresponding performance metrics are shown in Table 7. The results indicate that the model achieved its best performance at epoch 12, at which point the final weights are saved in the checkpoints folder. At this stage, the model demonstrated the highest training and validation accuracy of 97%.
Summary performance table of DenseNet121 + ConvNeXt-Tiny (DN-CXT).
The hybrid model’s training and validation metrics indicate that its performance during training is consistent with its performance during testing, as illustrated in Figure 6.

During training, metrics of the model DenseNet121 + ConvNeXt-Tiny (DN-CXT).
The confusion matrix of the given fine-tuned model on test data is shown in Figure 7. The matrix analysis concludes that with optimizer Adam (

Confusion matrix of fine-tuned model DenseNet121 + ConvNeXt-Tiny (DN-CXT).
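The reported metrics all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts passed in are hypothetical values for illustration only and are not the entries of Figure 7.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only; NOT the values in Figure 7.
m = binary_metrics(tp=90, fp=5, fn=10, tn=95)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A macro F1-score, as used in the ablation study, is the unweighted mean of the F1-scores computed separately with each class treated as the positive class.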
Figure 8 presents the AUC–receiver operating characteristic (ROC) curve results, which show that true positives are detected at high rates. This places the hybrid model in the best-performing category, with an AUC above 0.99.

AUC–ROC curve of model DN-CXT. AUC: area under the curve; ROC: receiver operating characteristic; DN-CXT: DenseNet121 + ConvNeXt-Tiny.
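The AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the labels and scores are made-up toy values, not model outputs from this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = monkeypox, 0 = other; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

Here eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9; a value above 0.99 means that almost every such pair is ordered correctly.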
Comparative analysis of optimizers with all models trained
A bar chart-based comparative analysis in Figure 9 shows how each optimizer interacts with each model and makes it possible to identify which model performs better with which optimizer.

Comparative analysis of optimizers over the accuracy of each trained model.
Statistical validation and error analysis
To assess the generalization capability of the proposed hybrid model, a 5-fold cross-validation strategy is employed. Each fold is trained and evaluated independently on distinct subsets of the skin image data. As summarized in Table 8, the model achieved consistently high performance across all folds, with accuracy values ranging from 0.9687 to 0.9750 and F1-scores between 0.9682 and 0.9747. For error analysis, we have used confidence intervals and standard deviation as shown in Table 9. These results demonstrate that the hybrid architecture, combining DenseNet121’s strong feature propagation with ConvNeXt-Tiny’s efficient representation learning, generalizes well across monkeypox lesion detection.
Five-fold based statistical validation.
Error statistics with 95% confidence intervals across validations.
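As an illustration of how fold-level error statistics of this kind can be derived, the sketch below computes the mean, the sample standard deviation, and a 95% confidence interval from per-fold scores. It assumes a Student's t interval, which is a common choice for five folds; the article does not state which interval method was used. The function name `fold_summary`, the lookup table `T_CRIT_95`, and the example scores are hypothetical and are not the values reported in Tables 8 and 9.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only a few small-sample entries are listed; 4 degrees of freedom covers 5 folds.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def fold_summary(scores):
    """Return (mean, sample_std, (ci_low, ci_high)) for per-fold scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = T_CRIT_95[n - 1] * std / math.sqrt(n)
    return mean, std, (mean - half_width, mean + half_width)


# Hypothetical per-fold accuracies, for illustration only.
example_scores = [0.96, 0.97, 0.98, 0.97, 0.97]
mean_acc, std_acc, (ci_low, ci_high) = fold_summary(example_scores)
```

With only five folds, the t-based interval is wider than a normal-approximation interval, which is why it is the more conservative choice in this sketch.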
State-of-the-art comparison
Table 10 compares the performance of the proposed model with the state of the art. We compared it with a recently published article from 2023. The analysis shows that the proposed model outperformed the compared state-of-the-art studies in accuracy, and it suggests that the advanced preprocessing and edge enhancement applied to the data account for this improvement in monkeypox detection.
Comparison of the proposed study with the most recent related studies.
Ablation study analysis
The comparative analysis of the hybrid model trained without and with the edge-enhanced image method demonstrates the significant impact of preprocessing and hyperparameter optimization on model performance. As shown in Table 11, in the baseline scenario without edge enhancement, the proposed model achieved an overall test accuracy of 0.73 and a macro F1-score of 0.71, and performed worse on the monkeypox class (F1-score = 0.63). When edge enhancement is applied to improve generalization, the model’s performance improved dramatically, reaching an overall accuracy of 0.97 and a macro F1-score of 0.97, with substantial gains in precision and recall for both classes. These results indicate that combining preprocessing techniques with careful hyperparameter tuning can substantially enhance the predictive power and robustness of DL models for monkeypox detection.
Performance comparison without and with image enhancement.
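To make the metrics in this comparison concrete, the following sketch derives overall accuracy, per-class F1-scores, and the macro F1-score from a confusion matrix whose rows are true classes and whose columns are predicted classes. The helper names and the example counts are hypothetical; they are not the confusion matrix from this study.

```python
def per_class_f1(cm):
    """F1-score for each class of a square confusion matrix (rows = true, cols = predicted)."""
    n = len(cm)
    f1_scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, but true class differs
        fn = sum(cm[k]) - tp                       # true k, but predicted otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return f1_scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def accuracy(cm):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


# Hypothetical two-class counts, for illustration only.
example_cm = [[8, 2],
              [1, 9]]
example_f1 = per_class_f1(example_cm)
example_macro = macro_f1(example_cm)
example_acc = accuracy(example_cm)
```

Because the macro F1-score weights each class equally, a weak class such as the baseline monkeypox class (F1-score = 0.63) pulls the macro score below the overall accuracy, which matches the pattern in the baseline row.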
Discussion
The proposed hybrid model demonstrates superior performance in monkeypox lesion detection, reflecting the effectiveness of the combined DenseNet121 and ConvNeXt-Tiny architectures. Through extensive experimentation, the model achieved a mean accuracy of 97.21% and a mean F1-score of 97.18% across five-fold cross-validation, indicating consistent and reliable performance.
The integration of edge enhancement preprocessing contributed significantly to this improvement by enhancing feature clarity and model generalization. The comparative results between the enhanced and nonenhanced datasets confirm that edge enhancement not only improves visual feature representation but also leads to more balanced and accurate classification.
The proposed method’s novelty lies in the fusion of hybrid feature extraction with preprocessing-driven optimization, which collectively improves detection accuracy while maintaining computational efficiency.
Furthermore, the inclusion of
Study limitations
Despite the strong performance of the proposed framework, certain limitations remain. The dataset size, although enhanced through augmentation and edge preprocessing, remains relatively small compared to large-scale medical imaging benchmarks, potentially limiting the model’s exposure to diverse lesion variations. Additionally, while the hybrid model achieved high accuracy on the current dataset, its generalization to entirely new datasets from different acquisition sources should be further validated.
Future work will focus on extending this study by incorporating domain adaptation, using larger, more diverse datasets, and exploring explainable AI techniques to improve the interpretation of classification decisions. Moreover, further hyperparameter optimization and lightweight architecture tuning could make the model more suitable for real-time clinical deployment in low-resource environments.
Conclusions
Given our study’s design, we conducted a two-part analysis of the dataset, incorporating fine-tuning and multiple optimization analyses. This research proposed an advanced preprocessing technique that yielded improved results. In the first part, seven advanced ML models were compared individually for monkeypox detection; among these, DenseNet121 and ConvNeXt-Tiny achieved the highest accuracy of 95%. The second part led to the development of the hybrid DN-CXT model, which achieved a high test accuracy of 97%. The performance of each individual model was validated using three optimizers, while the hyperparameters of the hybrid model were tuned using the Adam optimizer.
Footnotes
Ethical approval
Not applicable.
Author contributions
Conceptualization: HA, SR, MMI, AR, FR, NLF, MS, and SWL. Methodology: HA, SR, MMI, AR, FR, NLF, MS, and SWL. Software: HA, FR, AR, and NLF. Validation: SR, AR, and MS. Formal analysis: HA, FR, AR, and NLF. Investigation: MMI and SR. Data curation: HA and AR. Writing—original draft: HA, FR, SR, AR, and NLF. Writing—review & editing: MMI, MS, and SWL. Visualization: HA and NLF. Supervision: SR, MS, and SWL. Funding acquisition: MS and SWL. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by Sungkyunkwan University and BK21 FOUR (Graduate School Innovation), funded by the Ministry of Education, Korea. This research was also supported by the Ministry of Education and Ministry of Science & ICT, Republic of Korea (grant numbers: NRF [2021-R1-I1A2 (059735)], RS [2024-0040 (5650)], RS [2024-0044 (0881)], RS [2019-II19 (0421)], and RS [2025-2544 (3209)]).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Guarantor
SWL.
