Sage Journals: Discover world-class research

Abstract

Background

Skin lesion segmentation plays a critical role in computer-aided diagnosis systems, serving as a foundation for the early detection and treatment of skin cancer. Nonetheless, obtaining accurate segmentation remains difficult because of inconsistencies in lesion visual features, texture, image sharpness, and the presence of indistinct edges.

Objective

To develop and evaluate a novel deep neural network (DNN)-based approach for robust and accurate segmentation of skin lesions from dermoscopic images using advanced pre-processing and post-processing techniques.

Methods

The proposed method integrates a DNN architecture with specialized pre-processing and post-processing modules. The pre-processing step enhances image quality by denoising and normalizing the lesion intensities. The DNN framework extracts hierarchical features, while the post-processing module refines segmentation masks by correcting boundary irregularities and removing artifacts. The model was tested using three widely recognized dermoscopic International Skin Imaging Collaboration (ISIC) image databases from the years 2016, 2017, and 2018 without extensive data augmentation. Statistical analysis, including the Wilcoxon signed-rank test, was conducted to compare performance with existing methods.

Results

The proposed method achieved Jaccard index scores of $89.91 \pm 0.099$ (ISIC 2016), $84.51 \pm 0.135$ (ISIC 2017), and $87.39 \pm 0.139$ (ISIC 2018), and Dice coefficients of $94.30 \pm 0.076$, $90.86 \pm 0.104$, and $92.53 \pm 0.102$, respectively. These results outperformed state-of-the-art methods such as U-shaped Convolutional Neural Network (U-Net), nested U-Net with dense skip connections (UNet++), and Swin-Unet in segmentation accuracy, consistency, and computational efficiency.

Conclusions

This study presents a high-performing, scalable solution for automated skin lesion segmentation. The proposed method effectively addresses critical challenges by integrating robust feature extraction and boundary refinement, making it well-suited for real-world clinical applications in skin cancer diagnosis and management.

Keywords

Skin lesion segmentation; deep neural network (DNN) framework; pre-processing and post-processing modules; International Skin Imaging Collaboration (ISIC) benchmark datasets; automated skin cancer diagnosis

Introduction

Skin diseases are a significant global health challenge, with melanoma, a type of skin cancer, recognized as among the most severe and dangerous variants. Based on data from the American Cancer Society¹ and the National Cancer Institute,² it was projected that approximately 1.9 million cancer cases would be identified in the United States, equating to roughly 5,250 diagnoses per day. Of these, skin cancer was estimated to account for around 108,000 new cases and nearly 12,000 deaths.³ On a global scale, each year sees an estimated 2 to 3 million occurrences of non-melanoma skin cancer, along with about 132,000 new diagnoses of melanoma.⁴

Skin cancers are broadly classified into two categories: melanoma and non-melanoma. Non-melanoma skin cancers, which include squamous cell carcinoma and basal cell carcinoma, are generally less aggressive and carry a lower mortality risk. Despite this, they represent a substantial healthcare burden due to their high prevalence. Melanoma, on the other hand, is notorious for its ability to metastasize rapidly, making early detection and treatment critical for patient survival.⁵

The rising incidence of skin cancers can be attributed to factors such as increased ultraviolet radiation exposure, aging populations, and heightened public awareness, leading to better detection rates.⁶ However, this trend also highlights the urgent need for innovative diagnostic tools, preventive strategies, and targeted therapies to combat the growing burden of skin cancer.⁷ Non-melanoma skin cancers, although less fatal than melanoma, often necessitate invasive treatment procedures that can be both painful and distressing⁸ for patients. By comparison, cutaneous melanoma is an exceptionally aggressive, malignant type of skin cancer, associated with elevated mortality rates due to its potential for rapid progression and metastasis. Early detection and timely intervention are critical for improving patient outcomes and survival rates.⁹ However, diagnostic practices that rely solely on visual inspection by dermatologists are prone to subjectivity, leading to inconsistencies in evaluations even among seasoned experts. This subjectivity underscores the pressing need for advanced diagnostic methods that can provide accurate and reproducible results.⁸

Automated segmentation techniques have emerged as a pivotal solution in addressing these diagnostic challenges. These methods enable precise and objective analysis of skin lesions, which is essential for enhancing the consistency and reliability of diagnostic processes. In computer-aided diagnosis (CAD) systems, precise segmentation of cutaneous abnormalities serves as an essential foundational step to enable efficient disease diagnosis and therapeutic planning. However, the segmentation of dermoscopic images presents unique challenges due to variations in lesion appearance, including diverse chromatic characteristics and interfering elements such as body hair, ruler marks, and ink stains. These complexities demand sophisticated algorithms capable of navigating the intricate nature of skin lesions and delivering accurate segmentation outcomes.¹⁰

Dermoscopy has become a cornerstone in dermatological diagnostics, offering a non-invasive method for examining skin lesions with enhanced clarity. By employing optical magnification and specialized lighting, dermoscopy allows for detailed visualization of subsurface structures, facilitating the identification and analysis of pigmented lesions. Despite its advantages, dermoscopic images often exhibit irregular and poorly defined lesion boundaries, making accurate delineation difficult.¹¹

Moreover, the subtle contrasts between lesions and surrounding healthy skin, coupled with the irregular shapes and color patterns of lesions, further complicate the segmentation process. Artifacts such as hair strands, blood vessels, and measurement markings introduce additional challenges, necessitating the development of robust segmentation methods. To overcome these obstacles, recent advancements in artificial intelligence (AI) and deep learning have been leveraged to create innovative segmentation algorithms. These techniques offer promising capabilities for addressing the diverse challenges associated with dermoscopic image segmentation. By integrating these advanced methods into CAD systems, healthcare practitioners can achieve more accurate and reliable diagnoses, ultimately improving the efficacy of skin cancer detection and treatment strategies. Conventional image segmentation techniques often depend on manually designed features, which can struggle to perform effectively when applied to complex dermoscopic images. These features require specialized domain knowledge and often fail to adapt to the diverse variations in lesion appearances. Our proposed methodology addresses these limitations by combining traditional enhancement methods with advanced automated approaches. The pre-processing module improves image quality by reducing noise and managing contrast variability through morphological operations and Wiener filtering, ensuring that essential features are emphasized for segmentation. The post-processing module uses a deep neural network (DNN)-based approach supported by Otsu’s thresholding for accurate binarization. This integrated approach combines the strengths of manual enhancements and automated learning, providing robust segmentation results suitable for clinical application, where reliability and precision are paramount.

U-shaped Convolutional Neural Network (U-Net) is a widely adopted architecture for medical image segmentation, particularly recognized for its encoder–decoder design with skip connections that capture intricate details effectively. Over time, several variants such as Attention U-Net, Recurrent Residual U-Net, and others have been introduced to address specific segmentation tasks. While these models have shown promising results, their reliance on large numbers of trainable parameters often leads to redundancy and inefficiency.¹²

Advanced techniques, including DAGAN with dual adversarial discriminators, attention-scale aggregation network (AS-Net) with spatial and channel attention, and feature adaptive transformer network (FAT-Net) utilizing transformer-based encoders, have enhanced segmentation accuracy (Acc) and contextual understanding. However, these approaches face challenges with computational demands and limited generalization, especially when applied to complex datasets such as International Skin Imaging Collaboration (ISIC) 2017. To address these limitations, we propose a hybrid methodology that integrates traditional pre-processing techniques with modern neural network advancements. The pre-processing module improves image quality by reducing noise and enhancing contrast through morphological operations and Wiener filtering, making the images more suitable for segmentation. The post-processing module employs a DNN-based Otsu’s thresholding, which automates the binarization process for lesion extraction.

Our methodology effectively balances parameter complexity and segmentation performance, avoiding issues such as overfitting. This balance enables the model to achieve robust and accurate results. Comparing the Jaccard index and precision against the number of trainable parameters underscores the advantages of our approach.

Figure 1 illustrates the tradeoff between segmentation performance and model complexity by comparing the Jaccard index (JI) against the number of trainable parameters for various state-of-the-art (SOTA) models. The proposed method achieves the highest Jaccard score while maintaining a significantly lower parameter count. This highlights the efficiency of the hybrid framework in achieving precise segmentation without the overhead of over-parameterized architectures.

Figure 1.

Jaccard index versus number of parameters: This plot compares the segmentation accuracy (Jaccard index, %) of various models with respect to the number of trainable parameters (in millions). The proposed model achieves top performance with significantly fewer parameters, demonstrating superior efficiency.

Unlike models such as DAGAN and FAT-Net, which require tens of millions of parameters to achieve moderate performance, the proposed design leverages classical image enhancement steps alongside a compact DNN to deliver competitive results. The lower parameter burden not only improves inference speed but also reduces memory consumption, making the framework more adaptable for real-time or resource-limited clinical applications.

These observations underscore the method’s balance between performance and computational efficiency, reinforcing its value for practical deployment.

The contributions of the proposed method are as follows:

The hybrid approach integrates traditional pre-processing with DNN-based post-processing. In the pre-processing stage, noise is minimized, and contrast is enhanced through morphological operations and Wiener filtering. The post-processing stage uses DNN-based coherence filtering to refine lesion boundaries and applies Otsu’s thresholding for accurate binary segmentation.

The method achieves a balance between parameter complexity and segmentation performance, avoiding redundancy and overfitting. Comparative evaluations using the JI show that the proposed method outperforms SOTA models, such as DAGAN and FAT-Net, achieving better Acc with fewer trainable parameters.

Precision evaluations across various models demonstrate the method’s consistent ability to accurately segment lesions, even with a reduced parameter count. This efficiency ensures its suitability for real-world scenarios with limited computational resources.

The proposed methodology exhibits strong generalization across multiple datasets, including ISIC 2016, ISIC 2017, ISIC 2018, and Pedro Hispano Hospital dataset (PH2). Notably, the model achieves high performance without relying on data augmentation, highlighting its robustness and adaptability for diverse clinical applications.

The structure of the paper is as follows: Section “Related work” provides a detailed review of related work, highlighting significant advancements and existing approaches in dermoscopic image segmentation. Section “Proposed method” describes the proposed methodology, presenting the hybrid framework, including the pre-processing module and DNN-based coherence filtering. Section “Skin lesion segmentation algorithm” elaborates on the experimental setup, detailing the datasets, evaluation metrics, and implementation specifics. Section “Dataset description and performance evaluation” presents the experimental results, showcasing the performance of the proposed method on multiple publicly available datasets. Section “Results and analysis” discusses the findings, including an analysis of segmentation performance, processing speed, along with limitations associated with the proposed methodology. Finally, section “Ablation study” concludes the study, summarizing the key contributions and outcomes while offering insights for future research directions.

Related work

In the past, traditional methods for segmenting skin lesions relied on designing handcrafted features to identify distinct patterns in images. These features were developed to separate skin lesions from surrounding tissues, often using histogram thresholding algorithms to establish intensity-based thresholds. Although such approaches provided foundational capabilities by recognizing intensity variations, they were heavily reliant on domain expertise and struggled to generalize across datasets with varying lesion appearances.¹³,¹⁴

Recent advancements in neural networks have significantly enhanced the efficiency and robustness of segmentation tasks. Convolutional neural networks (CNNs), for instance, have demonstrated the ability to autonomously extract meaningful features from datasets, thus improving segmentation performance for skin lesions.¹⁵ Unlike traditional approaches, CNN-based methods eliminate dependency on manual feature crafting, making them more adaptable and precise.¹⁶,¹⁷

Maji et al.¹⁸ proposed a generator architecture tailored to improve feature learning in decoding layers through leveraging multiple loss functions. This approach produces feature maps with higher semantic value and precision. Furthermore, the integration of attention gates allows selective processing of critical lesion regions, further improving segmentation Acc.

To improve skip connections, researchers have introduced various enhancements to refine feature representation. Spatial enhancement modules within skip connections enable networks to capture and utilize spatial details more effectively, improving segmentation Acc.¹⁹ Attention gates have been applied to address semantic ambiguities between encoder and decoder layers. For example, the attention U-Net selectively highlights important encoder features, providing refined guidance during decoding.²⁰

BCDU-Net enhances segmentation by combining U-Net with BConvLSTM and dense convolutions in skip connections, enabling nonlinear fusion of feature maps and capturing temporal dependencies. However, these additions lead to increased computational complexity, higher memory demands, and a greater risk of overfitting. Cross-scale parallel fusion network (CPFNet) addresses integration challenges by introducing a GPG module embedded within skip pathways to incorporate high-level contextual information along with a hierarchical-aware fusion component for multi-scale feature fusion. Nevertheless, CPFNet is limited by its heavy parameter requirements, which increase computational and storage demands.²¹

In Hafhouf et al.,²² the authors presented modifications to the U-Net architecture, applying enhancements to both encoding and decoding pathways. The encoding process integrated 10 convolutional layers from VGG16, a dilated convolutional block, and pyramid pooling, preserving spatial resolution and improving feature reliability. The decoding pathway was enhanced with dilated residual blocks, which improved the extraction of intricate features and generated more precise segmentation maps.

Recent studies have introduced advanced attention mechanisms and architectural innovations to improve skin lesion segmentation. For example, a self-attention mechanism was employed within the encoder–decoder framework to enhance contextual understanding and segmentation Acc.²³ Similarly, RA-Net applied region-aware attention to better focus on lesion-relevant areas during segmentation.¹⁶ The attention-based dilated residual network (AD-Net) model, also proposed by Naveed et al.,²⁴ integrates dilated convolutional residual blocks with an attention-based spatial feature enhancement block and a guided decoder strategy. This architecture enables robust feature extraction and improves segmentation performance across multiple skin lesion datasets, even without relying on data augmentation. In another approach, contextual feature fusion network (CFF-Net) combined global and local feature representations through a dual-branch encoder that integrates CNN and MLP modules.²⁵ SUNetDCP focused on efficient feature fusion and model compression to reduce parameter load while maintaining performance.¹⁵ Rolling matrix multi-scale local pattern (RMMLP) utilized adaptive matrix decomposition and rolling tensors to effectively integrate multi-scale contextual features for improved segmentation outcomes.²⁶

To address generalization limitations, recent studies have turned to transformer-based architectures, such as TransUNet V2 and MedNeXt,²⁷,²⁸ which integrate convolutional backbones with attention mechanisms to capture long-range dependencies. An enhanced multi-scale attention network for skin lesion segmentation was proposed in Wang et al.,¹⁷ while a multi-domain transformer segmentation model was introduced by Selvakumar and Sundararaj.²⁹ These models improve contextual understanding and adaptability across datasets. Vignesh and Rajalakshmi³⁰ designed a deep attention transformer network optimized for lesion complexity, and Fatima et al.³¹ proposed a hybrid classification framework combining self-attention with CNNs.

Additionally, Alhudhaif et al.¹⁴ developed a multipath fusion network with a specialized fusion loss that showed promising results on high-resolution lesion data, demonstrating strong boundary preservation and clarity.

While AD-Net²⁴ introduces an attention-guided architecture based on dilated residual blocks and an attention-based spatial feature enhancement module, our approach takes a fundamentally different route. Instead of incorporating deep attention mechanisms or guided decoding, we present a lightweight framework that combines classical image enhancement techniques—specifically morphological operations and Wiener filtering—with a streamlined DNN for segmentation. Furthermore, our post-processing stage employs Otsu thresholding and morphological reconstruction to refine lesion boundaries. This hybrid methodology reduces the need for large parameter sets while maintaining competitive segmentation Acc. Unlike AD-Net, which emphasizes architectural complexity to enhance learning, our method prioritizes computational efficiency, making it well-suited for real-world scenarios where resources may be constrained.

Despite these improvements, challenges remain, including high computational costs, memory-intensive architectures, and reliance on large annotated datasets. Over-parameterized models often face redundancy and generalization issues, especially in diverse clinical scenarios with limited training data.

To address these limitations, we propose a hybrid methodology that combines traditional image enhancement techniques with advanced neural network-based strategies. The pre-processing module employs morphological operations and Wiener filtering to reduce noise and improve contrast, ensuring well-prepared input images. The post-processing module integrates DNN-based coherence filtering to refine lesion boundaries and applies Otsu’s thresholding for automated binary segmentation. Our approach effectively balances parameter complexity and segmentation Acc, overcoming issues of overfitting and computational inefficiency found in past methods. By addressing challenges such as over-parameterization and reliance on large datasets, the proposed method achieves high segmentation performance while maintaining efficiency, making it suitable for real-world clinical applications.

Proposed method

The proposed methodology for skin lesion segmentation introduces a structured approach aimed at improving the precision and dependability of CAD systems. It comprises two main stages: pre-processing and post-processing, as shown in Figure 2. In the pre-processing module, the input image is subjected to morphological operations that emphasize structural features such as lesion edges and boundaries while simultaneously minimizing noise. Following this, Wiener filtering is applied to reduce high-frequency noise and refine the image quality, preserving critical details required for effective segmentation.

Figure 2.

Proposed workflow for skin lesion segmentation using pre-processing and post-processing modules.

The post-processing module begins with the processed image being passed through a DNN. This DNN is specifically designed to address the complexities of skin lesion segmentation, including irregular lesion shapes, variations in color and texture, and indistinct boundaries. Once the segmentation is performed, a double-thresholding method is utilized to further refine the results. This technique eliminates weak edges and artifacts, ensuring that the segmented regions of the lesion are distinct and well-defined. By combining these stages, the methodology produces an output image with enhanced segmentation Acc, effectively overcoming challenges such as noise, artifacts, and irregular lesion characteristics. This approach is a significant step toward improving the performance of CAD systems in detecting and managing skin cancer.

Pre-processing

Pre-processing plays a vital role in the analysis of skin lesion images by improving image quality and preparing it for accurate segmentation and classification. Images of skin lesions often include noise, uneven textures, and artifacts such as hair, shadows, or ruler markings, which can obscure critical features and reduce diagnostic Acc. The goal of pre-processing is to address these issues by enhancing important image features and eliminating unnecessary elements. Techniques such as morphological operations are used to highlight lesion edges and structural details, while filters such as Wiener filtering help reduce noise and preserve key image features. By refining the input image and standardizing its quality, pre-processing creates a reliable foundation for subsequent stages of analysis, enabling more precise segmentation and diagnosis. This step is essential in overcoming the variability and artifacts commonly found in skin lesion images, making it a cornerstone of automated diagnostic systems.

Enhancement of skin images: Morphological operations

Skin lesion images frequently display background intensity variations due to uneven illumination, which can obscure essential features and hinder accurate analysis. Correcting these variations is critical to clearly differentiate lesions from the surrounding skin tissue. The primary goal of this process is to eliminate background inconsistencies and enhance the visibility of lesion features for reliable analysis.

The proposed approach employs morphological operations to preprocess the green channel of the image, as this channel typically holds the most significant lesion-related details. These operations include top-hat and bottom-hat transformations, which are used to reduce background intensity variations and suppress noise. The top-hat transformation enhances brighter regions in the image, while the bottom-hat transformation highlights darker areas, resulting in improved contrast and better-defined features. These steps enhance the overall quality of the image, making lesion boundaries more prominent and facilitating accurate segmentation.

After the application of morphological operations, Wiener filtering is utilized to further refine the image by reducing noise and preserving critical details. The specific methodology for implementing Wiener filtering is discussed in the following section. The mathematical expressions for the top-hat and bottom-hat transformations are as follows:

T_{w} (f) = f - (f \circ b)

(1)

where $\circ$ represents the morphological opening operation and $b$ is the structuring element.

T_{b} (f) = (f \bullet b) - f

(2)

where $\bullet$ is the closing operation.

This pre-processing framework, combining morphological operations and noise reduction techniques, ensures a high-quality image with enhanced lesion features, serving as a solid foundation for subsequent segmentation and analysis.
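The top-hat and bottom-hat transforms in equations (1) and (2) can be sketched with NumPy alone. This is an illustrative sketch, not the authors' implementation: the flat square structuring element, its size `k`, the edge-replication padding, and the helper names (`erode`, `dilate`, `hat_transforms_green`) are all assumptions, since the text does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, k):
    """All k x k neighbourhoods of img, replicating edge pixels at the borders."""
    pad = k // 2
    return sliding_window_view(np.pad(img, pad, mode="edge"), (k, k))


def erode(img, k):
    # Grayscale erosion with a flat k x k structuring element (local minimum).
    return _windows(img, k).min(axis=(-2, -1))


def dilate(img, k):
    # Grayscale dilation with a flat k x k structuring element (local maximum).
    return _windows(img, k).max(axis=(-2, -1))


def top_hat(img, k):
    """Eq. (1): f minus its opening (erosion followed by dilation)."""
    f = np.asarray(img, dtype=np.float64)
    return f - dilate(erode(f, k), k)


def bottom_hat(img, k):
    """Eq. (2): closing of f (dilation followed by erosion) minus f."""
    f = np.asarray(img, dtype=np.float64)
    return erode(dilate(f, k), k) - f


def hat_transforms_green(rgb, k=15):
    """Apply both transforms to the green channel of an H x W x 3 image.

    k=15 is a placeholder value, not a setting reported in the paper.
    """
    green = np.asarray(rgb, dtype=np.float64)[..., 1]
    return top_hat(green, k), bottom_hat(green, k)
```

The top-hat output keeps bright structures smaller than the structuring element, and the bottom-hat output keeps dark ones, so `k` should be odd and chosen relative to the size of the details to be emphasized.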

Adaptive Wiener filtering for noise reduction in skin images

Dermoscopy images often suffer from high-frequency noise, uneven illumination, and low contrast between the lesion and surrounding skin. These issues degrade segmentation performance by obscuring critical boundary and texture information. To address this, the pre-processing pipeline in this study integrates adaptive Wiener filtering with intensity normalization to enhance image quality prior to segmentation.

Adaptive Wiener filtering is applied first to suppress noise while preserving structural details. This filter operates on a local neighborhood around each pixel and adjusts its behavior based on local image statistics. For a pixel located at $(x, y)$, the filtered output $g (x, y)$ is computed using the expression:

g (x, y) = m (x, y) + α (x, y) \cdot [f (x, y) - m (x, y)]

(3)

Here, $f (x, y)$ denotes the original pixel intensity at location $(x, y)$, $m (x, y)$ is the local mean computed over a $5 \times 5$ window centered at $(x, y)$, and $α (x, y)$ is the adaptive Wiener filtering coefficient, defined as:

α (x, y) = \frac{σ_{f}^{2} (x, y) - σ_{n}^{2}}{σ_{f}^{2} (x, y)}

(4)

Here, $σ_{f}^{2} (x, y)$ represents the local intensity variance within the same window, and $σ_{n}^{2}$ is the estimated global noise variance, calculated from background regions of the image. This formulation ensures that in flat regions where noise dominates and variance is low, the filter relies more on the local mean, yielding strong smoothing. Conversely, in edge-rich or high-contrast areas, the filter output remains close to the original pixel value, preserving boundaries and texture. The window size was fixed to $5 \times 5$ for all images in the dataset, balancing noise suppression with detail retention.
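Equations (3) and (4) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `adaptive_wiener`, the reflect padding at the image borders, and the clamping of $α$ to be non-negative (so that windows whose variance falls below the noise estimate reduce to the local mean) are not specified in the text. The noise variance is passed in as a parameter; how it is estimated from background regions is left outside the sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def adaptive_wiener(f, noise_var, k=5):
    """Pixel-wise adaptive Wiener filter following Eqs. (3) and (4)."""
    f = np.asarray(f, dtype=np.float64)
    pad = k // 2
    # k x k neighbourhood around every pixel (reflect padding is an assumption).
    win = sliding_window_view(np.pad(f, pad, mode="reflect"), (k, k))
    m = win.mean(axis=(-2, -1))    # local mean m(x, y)
    var = win.var(axis=(-2, -1))   # local variance sigma_f^2(x, y)
    # Eq. (4), with alpha clamped to [0, 1] and a guard for zero variance.
    safe_var = np.where(var > 0, var, 1.0)
    alpha = np.where(var > 0, np.maximum(var - noise_var, 0.0) / safe_var, 0.0)
    # Eq. (3): blend the local mean with the original pixel value.
    return m + alpha * (f - m)
```

With `noise_var = 0` the coefficient is 1 wherever the local variance is positive and the image passes through unchanged; as `noise_var` grows, the output approaches the local mean.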

Following denoising, intensity normalization is performed to address inter-image brightness variability and to enhance lesion-to-background contrast. Each grayscale image is normalized using z-score standardization, where the intensity at each pixel $(x, y)$ is transformed as:

I_{norm} (x, y) = \frac{I (x, y) - μ_{I}}{σ_{I}}

(5)

with $μ_{I}$ and $σ_{I}$ denoting the mean and standard deviation of the image’s pixel intensities, respectively. This process standardizes all inputs to zero mean and unit variance, which aids in stabilizing the segmentation network’s learning process.
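A short sketch of the z-score standardization in equation (5) is given here for concreteness. The function name and the small `eps` term, which guards against division by zero on a perfectly uniform image, are assumptions added for illustration.

```python
import numpy as np


def zscore_normalize(img, eps=1e-8):
    """Standardize a grayscale image to zero mean and unit variance (Eq. 5)."""
    img = np.asarray(img, dtype=np.float64)
    mu = img.mean()      # mu_I
    sigma = img.std()    # sigma_I
    # eps is an illustrative safeguard, not part of Eq. (5).
    return (img - mu) / (sigma + eps)
```

Because the statistics are computed per image, two images of the same lesion taken at different exposure levels map to similar normalized intensities.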

To further improve local contrast in images with low dynamic range, contrast-limited adaptive histogram equalization (CLAHE) is optionally applied. CLAHE enhances image contrast by dividing the input into small contextual regions (tiles) and applying histogram equalization within each tile. The tiles are then combined using bilinear interpolation to avoid boundary artifacts. In our implementation, CLAHE was configured with a tile grid size of $8 \times 8$, a clip limit of 2.0 to control contrast amplification, and 256 histogram bins. CLAHE is applied only to images where the global intensity variance falls below a threshold, indicating insufficient contrast for reliable segmentation.

In the evaluation phase, global histogram equalization was considered as a candidate for contrast enhancement. However, its application led to undesirable effects, particularly in images affected by uneven lighting or limited intensity variation. The method often exaggerated background brightness and introduced unnatural transitions, resulting in visible degradation of lesion boundaries. These distortions were more pronounced in regions with smooth texture, where global adjustments lacked the contextual sensitivity needed for fine detail preservation. In contrast, the CLAHE technique offered localized enhancement tailored to individual image regions. By limiting amplification in uniform areas and adapting to local intensity distributions, CLAHE maintained boundary integrity and produced more visually consistent outputs across diverse image conditions. As a result, CLAHE was selected as the preferred method in the normalization pipeline.

The combined application of adaptive Wiener filtering and intensity normalization resulted in pre-processed images with significantly reduced noise, improved contrast uniformity, and enhanced lesion visibility. These enhancements contribute to more reliable feature extraction and delineation in the subsequent segmentation stage. By addressing variations in illumination, suppressing artifacts, and improving boundary clarity, the pre-processing module plays a critical role in ensuring the Acc and consistency of the overall segmentation framework.

Post-processing module and DNN for skin image segmentation

The post-processing module is an essential stage in the skin image segmentation workflow, designed to refine the results generated during pre-processing and segmentation. It ensures precise lesion delineation by addressing challenges such as noise, artifacts, and irregular or blurry boundaries. A DNN forms the core of the segmentation framework, enabling the extraction of hierarchical features and the generation of accurate lesion masks. This DNN output, represented as a soft probability map, is subsequently refined through classical post-processing operations, including Otsu thresholding, morphological reconstruction, and noise filtering.

Figure 3 provides an overview of the complete segmentation pipeline, where the pre-processing module enhances the dermoscopic input image, the DNN extracts pixel-level features to predict lesion probability maps, and the post-processing module converts these maps into binary segmentation masks using adaptive thresholding and shape refinement.

Figure 3.

Overview of the proposed segmentation pipeline. The input image undergoes pre-processing (morphological filtering + Wiener denoising), followed by segmentation through the proposed deep neural network (DNN) model. Post-processing steps such as Otsu thresholding, morphological reconstruction, and noise removal refine the probability map into the final lesion mask.

Role of DNN in skin image segmentation

DNNs are powerful tools for lesion segmentation, as they can learn both global context and fine-grained texture from dermoscopic images. The proposed DNN architecture (as visualized in the center of Figure 3) is designed to handle varying lesion appearances, sizes, and textures using multiple convolutional layers with nonlinear activations and pooling operations.

The network function can be expressed as:

y (x) = f (W, b, x)

(6)

where $x$ is the input image, $W$ and $b$ are the network weights and biases, and $f$ represents the stacked operations (convolution, activation, etc.).

To optimize segmentation Acc, the network is trained by minimizing a pixel-wise mean squared error (MSE) loss:

L = \frac{1}{N} \sum_{i = 1}^{N} \lVert y_{i} - \hat{y}_{i} \rVert^{2}

(7)

where $N$ is the number of training samples, $y_{i}$ is the ground truth, and ${\hat{y}}_{i}$ is the predicted output.
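As an illustration of equation (7), the following minimal sketch computes the loss with NumPy. It assumes the ground truth and predictions are arrays whose first axis indexes the $N$ samples; the function name `pixelwise_mse` is hypothetical and not taken from the original implementation.

```python
import numpy as np

def pixelwise_mse(y_true, y_pred):
    """Mean over N samples of the squared L2 norm of (y_i - y_hat_i), as in Eq. (7)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    n = y_true.shape[0]
    # Flatten each sample so the norm is taken over all of its pixels.
    diff = (y_true - y_pred).reshape(n, -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

Note that the norm is summed over the pixels of each sample before averaging over samples; dividing by the pixel count as well would only rescale the loss by a constant.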

After inference, the soft probability map undergoes Otsu thresholding to adaptively separate foreground (lesion) and background. Morphological reconstruction connects weak edge responses and removes inner holes, while noise removal eliminates small, isolated blobs, yielding a clean and clinically meaningful segmentation mask.

Otsu and double thresholding for skin image segmentation

To improve the Acc and reliability of lesion segmentation, the output probability map generated by the DNN is further refined using a multi-stage post-processing strategy. This process begins with Otsu thresholding, followed by a double-threshold hysteresis mechanism, morphological reconstruction, and removal of small false-positive regions.

The first step in this refinement sequence uses Otsu’s method to convert the soft probability map into a binary mask. This approach determines a global threshold value that separates foreground (lesion) and background by maximizing the inter-class variance across all possible threshold levels. Let $τ$ represent a candidate threshold and let $σ_{B}^{2} (τ)$ denote the between-class variance associated with $τ$ . The optimal threshold $T$ is defined as:

T = \arg\max_{τ} σ_{B}^{2} (τ)

(8)

where the variance is computed based on the distribution of intensity values in the image. Otsu’s method requires no manual tuning and is particularly effective when there is a measurable difference in pixel intensity between the lesion and the surrounding tissue. In this context, it ensures consistent and adaptive separation of lesion regions, even under variable lighting and contrast conditions.
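The threshold search in equation (8) can be sketched as follows. This is an illustrative NumPy version, not the authors' code: it assumes the probability map lies in $[0, 1]$ and uses a 256-bin histogram, and the name `otsu_threshold` is hypothetical.

```python
import numpy as np

def otsu_threshold(prob_map, n_bins=256):
    """Return the threshold that maximizes the between-class variance (Eq. 8)."""
    values = np.asarray(prob_map, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
    p = hist / hist.sum()                      # normalized histogram
    centers = 0.5 * (edges[:-1] + edges[1:])   # representative value of each bin
    w0 = np.cumsum(p)                          # weight of the class at or below each bin
    w1 = 1.0 - w0                              # weight of the class above each bin
    m = np.cumsum(p * centers)                 # cumulative first moment
    mu_total = m[-1]
    # Between-class variance; candidates with an empty class are left at zero.
    valid = (w0 > 1e-12) & (w1 > 1e-12)
    sigma_b2 = np.zeros(n_bins)
    sigma_b2[valid] = (mu_total * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b2))
    return float(edges[k + 1])                 # upper edge of the best split bin
```

A binary mask then follows from `prob_map >= otsu_threshold(prob_map)`.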

To strengthen boundary preservation and reduce the risk of fragmented or incomplete segmentation, a double-thresholding approach is employed after binarization. This method introduces two fixed thresholds: a higher value ( $T_{h} = 0.6$ ) and a lower value ( $T_{l} = 0.3$ ). Pixels with confidence scores above $T_{h}$ are classified as definite lesion regions. Those with values between $T_{l}$ and $T_{h}$ are treated as potential lesion candidates and are retained only if they are directly connected to strong lesion regions identified in the high threshold mask. Any pixel with a value below $T_{l}$ is excluded from the final mask.

This thresholding mechanism is implemented through a morphological reconstruction process. Two binary images are created from the DNN output: one corresponding to the strong lesion areas (the marker), and another containing both strong and weak candidates (the mask). The reconstruction operation propagates the marker through the mask using geodesic dilation constrained by the intensity values in the mask. This technique ensures that weak candidates are only included in the final mask if they form a continuous structure with confidently segmented regions. Let $p (x, y)$ represent the probability map output, and define the two sets as follows:

M (x, y) = I \{p (x, y) \geq T_{h}\}

(9)

G (x, y) = I \{p (x, y) \geq T_{l}\}

(10)

where $I \{\cdot\}$ is an indicator function producing binary masks. The reconstruction proceeds iteratively, expanding $M$ within $G$ until no further changes occur. This method reinforces structural integrity and preserves lesion continuity while suppressing isolated responses.
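The marker-and-mask reconstruction described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions: the neighbourhood used for propagation is taken to be 8-connected (the text does not specify it), and the helper names `dilate8` and `hysteresis_reconstruct` are hypothetical.

```python
import numpy as np

def dilate8(binary):
    """One step of binary dilation with a 3x3 (8-neighbour) structuring element."""
    padded = np.pad(binary, 1, mode="constant", constant_values=False)
    out = np.zeros_like(binary)
    h, w = binary.shape
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def hysteresis_reconstruct(prob_map, t_high=0.6, t_low=0.3):
    """Grow the strong marker M (p >= t_high) inside the mask G (p >= t_low)."""
    p = np.asarray(prob_map, dtype=np.float64)
    marker = p >= t_high   # Eq. (9)
    mask = p >= t_low      # Eq. (10)
    current = marker
    while True:
        # Geodesic dilation: dilate, then constrain the result to the mask.
        grown = dilate8(current) & mask
        if np.array_equal(grown, current):
            return grown   # stable: no further changes occur
        current = grown
```

Weak pixels survive only if a chain of mask pixels links them to a strong pixel; isolated weak responses are never reached by the propagation.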

After the reconstruction step, the refined binary mask may still contain small disconnected components caused by residual noise or background artifacts. To remove these, connected component analysis is performed using 8-neighbor connectivity. Each component is evaluated based on its pixel count, and regions smaller than a predefined area threshold (set to 50 pixels in this work) are discarded. This final filtering step ensures that only spatially coherent and diagnostically relevant regions are retained in the segmentation output.
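The area-based filtering step can be illustrated with a breadth-first search over 8-connected neighbours. This is a sketch rather than the original implementation; the function name `remove_small_components` is hypothetical, and components with exactly the threshold area are assumed to be kept.

```python
import numpy as np
from collections import deque

def remove_small_components(mask, min_area=50):
    """Keep only 8-connected components with at least min_area pixels."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    visited = np.zeros_like(mask)
    cleaned = np.zeros_like(mask)
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    for sy, sx in zip(*np.nonzero(mask)):
        if visited[sy, sx]:
            continue
        # Collect one connected component starting from this seed pixel.
        visited[sy, sx] = True
        queue = deque([(sy, sx)])
        component = []
        while queue:
            y, x = queue.popleft()
            component.append((y, x))
            for dy, dx in neighbours:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        if len(component) >= min_area:
            ys, xs = zip(*component)
            cleaned[list(ys), list(xs)] = True
    return cleaned
```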

The combined effect of these post-processing steps is a robust segmentation mask that aligns more closely with actual lesion boundaries. This pipeline effectively suppresses background noise, enhances structural clarity, and maintains fidelity in low-contrast or irregular lesion scenarios. When used in conjunction with the pre-processing and segmentation modules, this refinement strategy significantly improves the overall quality and consistency of skin lesion segmentation, facilitating downstream diagnostic analysis in clinical applications.

Hyperparameter selection

The training configuration for the proposed model was determined through empirical tuning and iterative experimentation. Learning rate values were initially explored across a logarithmic scale from $10^{- 2}$ to $10^{- 5}$ . A rate of $10^{- 4}$ provided the most stable performance across epochs and was therefore selected for final training. The batch size was set to 16, offering a good tradeoff between convergence stability and memory efficiency on the available hardware.

The model was trained using the Adam optimizer, which provided smooth convergence without requiring momentum tuning. Weight decay was set to $10^{- 5}$ to discourage overfitting. The segmentation loss combined binary cross-entropy with Dice loss to account for both pixel-wise classification Acc and overlap quality. These hyperparameters were finalized based on validation results from the ISIC datasets and kept consistent across all other datasets to ensure unbiased evaluation. This systematic approach allowed the model to generalize well without excessive tuning for specific data distributions.
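A combined binary cross-entropy and Dice loss of the kind described above might be written as follows. The relative weighting of the two terms is not stated in the text, so this sketch assumes an unweighted sum; the smoothing constant and the name `bce_dice_loss` are likewise illustrative assumptions.

```python
import numpy as np

def bce_dice_loss(y_true, y_prob, eps=1e-7, smooth=1.0):
    """Unweighted sum of binary cross-entropy and (1 - soft Dice). Assumed form."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    # Clip probabilities so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64).ravel(), eps, 1.0 - eps)
    bce = -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    intersection = np.sum(y_true * y_prob)
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_prob) + smooth)
    return float(bce + (1.0 - dice))
```

The cross-entropy term penalizes per-pixel misclassification, while the Dice term rewards overlap and is less sensitive to the imbalance between lesion and background pixels.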

Skin lesion segmentation algorithm

Reliable segmentation of skin lesions is a fundamental requirement in automated dermatological analysis, forming the basis for early detection and treatment planning in skin cancer diagnostics. However, achieving precise lesion delineation remains challenging due to variability in lesion appearance across patients and imaging conditions. These challenges include inconsistencies in color, shape, size, texture, and boundary sharpness. Standard segmentation approaches often struggle with such variability, especially when confronted with low contrast, noise, or structural ambiguity in lesion borders.

To address these limitations, this work proposes a structured segmentation pipeline centered around a compact DNN integrated with robust image enhancement and refinement modules. The method is designed to preserve fine lesion structures, suppress irrelevant background noise, and standardize intensity representations across diverse input conditions.

The segmentation process begins with a pre-processing module that improves the image quality prior to learning. Specifically, the input image is first transformed by extracting the green color channel, which generally offers the highest lesion-to-background contrast in dermoscopic imaging. Illumination inconsistencies are corrected using morphological operations, and noise is suppressed using adaptive Wiener filtering. This filter adapts its smoothing strength locally, based on variance estimations, ensuring that edges and texture details remain intact.

After denoising, the image is converted to grayscale and undergoes intensity normalization. This step aligns the dynamic range across images using z-score standardization. For low-contrast inputs, CLAHE is optionally applied to enhance boundary visibility. Anisotropic diffusion filtering is then used to refine edge definition while further smoothing background noise.
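For reference, z-score standardization of a single image can be sketched as below. The small constant `eps` is an assumption added to guard against division by zero on constant images, and the function name is hypothetical.

```python
import numpy as np

def zscore_normalize(image, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit standard deviation."""
    img = np.asarray(image, dtype=np.float64)
    return (img - img.mean()) / (img.std() + eps)
```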

Once the image is pre-processed, it is passed through the DNN segmentation model. The network extracts multi-scale hierarchical features and produces a probability map that indicates the likelihood of each pixel belonging to the lesion class. This map is then binarized using Otsu’s method, which selects an optimal threshold by maximizing the inter-class variance between lesion and non-lesion regions in the intensity histogram.

Post-processing follows to enhance the spatial coherence of the segmented region. A double-thresholding strategy with morphological reconstruction is employed to preserve weak boundary regions that are connected to high-confidence lesion areas, while eliminating isolated responses. Finally, small disconnected components that fall below a fixed area threshold are removed to suppress residual artifacts.

The result of this sequential process is a clean, morphologically accurate binary lesion mask suitable for clinical interpretation and downstream analysis. The full algorithm is described step-by-step in Algorithm 1, and a visual overview is provided in Figure 4.

Figure 4.

Visual workflow of the segmentation process. The input image undergoes green channel extraction, morphological illumination correction, and Wiener filtering. Following grayscale conversion and intensity normalization, the deep neural network (DNN) predicts a lesion probability map. This is converted to a binary mask using Otsu thresholding and refined using morphological reconstruction and artifact suppression, producing the final segmentation output.

Algorithm 1

Skin lesion segmentation algorithm.

1: Input: Dermoscopic image I
2: Output: Final lesion segmentation mask S
3: Step 1: Pre-processing
4:     Extract green channel from I
5:     Apply morphological operations to correct uneven illumination
6:     Apply adaptive Wiener filtering for noise reduction
7: Step 2: Normalization and enhancement
8:     Convert to grayscale
9:     Normalize intensities using the z-score method
10:     Apply CLAHE if intensity variance is low
11:     Apply anisotropic diffusion to enhance edges
12: Step 3: Segmentation
13:     Use DNN to generate initial lesion probability map M
14:     Apply Otsu thresholding to obtain a binary mask
15: Step 4: Post-processing
16:     Apply double thresholding with morphological reconstruction
17:     Remove small connected components below threshold
18: Step 5: Output
19:     Return the refined binary mask S

Dataset description and performance evaluation

Dataset description

To evaluate the effectiveness of the proposed segmentation framework, a range of publicly available dermoscopic image datasets were used. These datasets offer considerable diversity in terms of lesion types, image quality, and acquisition settings, making them suitable for comprehensive performance assessment.

The ISIC 2016, 2017, and 2018 challenge datasets were selected as primary benchmarks. These datasets originate from the ISIC and have been widely adopted for skin lesion segmentation and classification research. The ISIC 2018 dataset contains 2,594 training images with corresponding lesion masks, and an additional 1,000 testing samples collected from multiple clinical institutions to ensure heterogeneity. The ISIC 2017 dataset includes 2,000 images in the training set, 150 for validation, and 600 in the test set. It supports multiple tasks, including segmentation and diagnosis. ISIC 2016, which introduced one of the earliest benchmark tasks in this domain, comprises 900 training images and 379 evaluation samples.

In addition to the ISIC datasets, the PH2 dataset was incorporated for external validation. PH2 contains 200 high-resolution dermoscopic images, collected under standardized conditions at Hospital Pedro Hispano, Portugal. The dataset includes expert-annotated binary lesion masks and clinical metadata for each case. Its consistency in image acquisition makes it a valuable resource for testing the generalizability of segmentation methods trained on more varied datasets.

To further assess robustness under diverse conditions, the Human Against Machine (HAM10000) dataset was also considered. This dataset comprises 10,015 dermoscopic images representing a broad spectrum of lesion types, including melanomas, nevi, and basal cell carcinomas. The data originate from multiple sites across Europe and Australia, encompassing various devices and imaging conditions. While the original HAM10000 dataset was intended for classification, subsequent efforts have made segmentation masks available, allowing its use in lesion boundary analysis.

Together, these datasets form a comprehensive evaluation suite. Their differences in scale, acquisition protocol, and lesion diversity provide a rigorous basis for testing the Acc and robustness of the proposed approach across both standardized and heterogeneous clinical settings.

Performance evaluation metrics

The proposed method for skin lesion segmentation is evaluated using five performance metrics: Acc, Sn, specificity (Sp), JI (intersection over union), and Dice coefficient (DC). These metrics, recommended by the ISIC challenge leaderboard, provide a comprehensive framework for assessing the segmentation quality. Acc measures the overall correctness of the predictions, while Sn and Sp focus on the model’s ability to correctly identify lesion and background pixels, respectively. The JI, a primary evaluation criterion, quantifies the overlap between the predicted and ground truth lesion regions by calculating the ratio of their intersection to their union. The DC complements this by evaluating the similarity between the predicted segmentation and the actual lesion region. These metrics are defined mathematically as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(11)

Sensitivity = \frac{TP}{TP + FN}

(12)

Specificity = \frac{TN}{TN + FP}

(13)

Jaccard index = \frac{TP}{TP + FP + FN}

(14)

Dice coefficient = \frac{2 \times TP}{2 \times TP + FP + FN}

(15)

Here, TP, TN, FP, and FN denote true positive, true negative, false positive, and false negative, respectively. By employing these metrics, the evaluation provides insights into both the segmentation Acc and its ability to differentiate between lesion and non-lesion areas. Emphasis on the JI ensures alignment with industry benchmarks and supports meaningful comparisons with other segmentation methods.
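Equations (11) to (15) can be computed directly from a pair of binary masks. The sketch below is illustrative; the function name and the dictionary keys are hypothetical, and undefined ratios (zero denominators) are returned as NaN by assumption.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Compute Acc, Sn, Sp, JI, and DC from binary prediction and ground-truth masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))     # lesion pixels correctly detected
    tn = int(np.sum(~pred & ~truth))   # background pixels correctly rejected
    fp = int(np.sum(pred & ~truth))    # background pixels marked as lesion
    fn = int(np.sum(~pred & truth))    # lesion pixels missed

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "jaccard": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

For a single mask pair the two overlap measures are linked by $DC = 2\,JI / (1 + JI)$, so the Dice coefficient is never lower than the Jaccard index.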

Results and analysis

Evaluation of the proposed method across ISIC datasets

The proposed segmentation framework was rigorously evaluated using the ISIC 2016, ISIC 2017, and ISIC 2018 benchmark datasets. These datasets include a broad spectrum of dermoscopic images representing different lesion types, acquisition conditions, and clinical variations. The evaluation was based on standard performance metrics, including JI, DC, Acc, Sn, and Sp, to comprehensively assess segmentation performance.

Table 1 presents the evaluation results. The proposed method attained high JI scores of 89.91%, 84.51%, and 87.39% across ISIC 2016, 2017, and 2018, respectively. Corresponding DC values reached 94.30%, 90.86%, and 92.53%, indicating strong overlap between predicted masks and ground truth annotations. Overall Acc remained consistently high, exceeding 95% across all datasets. Sn and Sp values further validate the model’s balanced ability to detect true lesion regions while minimizing FPs.

Table 1.

Performance of the proposed method on ISIC datasets. Metrics use the same scale and directional indicators ( $↑$ for higher-is-better).

Dataset	JI ( $↑$ )	DC ( $↑$ )	Acc ( $↑$ )	Sn ( $↑$ )	Sp ( $↑$ )
ISIC 2016	89.91	94.30	97.10	94.43	96.67
ISIC 2017	84.51	90.86	95.82	90.22	96.47
ISIC 2018	87.39	92.53	95.64	93.15	94.92

ISIC: International Skin Imaging Collaboration; JI: Jaccard index; DC: Dice coefficient; Acc: accuracy; Sn: sensitivity; Sp: specificity.

Figure 5 illustrates representative segmentation outcomes. Each row visualizes: (1) the original dermoscopic image, (2) the ground truth binary mask, (3) the predicted mask by the proposed method, and (4) an overlay of the ground truth contour on the original image for visual comparison. The model successfully delineates lesion boundaries with high precision, even under challenging conditions such as fuzzy edges, low contrast, or irregular borders.

Figure 5.

Visual comparison of skin lesion segmentation results. The first column represents original dermoscopic images, the second column shows ground truth masks, the third column presents the predicted segmentation masks generated by the proposed method, and the fourth column overlays the ground truth contours onto the original images.

These results confirm the robustness and generalization capability of the proposed approach across diverse image distributions. The high segmentation quality, as reflected in both quantitative and qualitative assessments, affirms the model’s readiness for deployment in automated skin lesion analysis pipelines. Its consistent performance across datasets demonstrates strong adaptability to varying lesion morphologies, supporting its utility in real-world clinical and tele-dermatology applications.

Ablation study

To better understand the contribution of each component in the proposed framework, an ablation study was conducted on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets. The model comprises three stages: (1) pre-processing, which enhances image quality using morphological operations and CLAHE; (2) segmentation using a lightweight DNN; and (3) post-processing involving adaptive thresholding and morphological refinement.

We tested four configurations:

DNN only: The core neural network is used without any pre- or post-processing.

Pre-processing + DNN: Only enhancement techniques are used before segmentation, without refinement.

DNN + post-processing: No image enhancement is performed prior to segmentation; only post-processing is applied.

Full model (proposed): Incorporates all three stages for complete end-to-end processing.

As summarized in Table 2, the inclusion of both enhancement and refinement stages significantly improves segmentation outcomes. Without enhancement, the network is more sensitive to low contrast and noise; without post-processing, boundary Acc suffers. The full pipeline consistently outperforms all reduced variants in Dice score, JI, and overall Acc across all datasets.

Table 2.

Ablation study results across ISIC datasets.

	ISIC 2016			ISIC 2017			ISIC 2018
Configuration	Dice	Jaccard	Accuracy	Dice	Jaccard	Accuracy	Dice	Jaccard	Accuracy
DNN only	0.89	0.82	94.1%	0.86	0.78	93.2%	0.87	0.80	94.0%
Pre + DNN	0.91	0.84	95.0%	0.88	0.81	94.0%	0.89	0.82	94.8%
DNN + Post	0.90	0.83	94.6%	0.87	0.80	93.7%	0.88	0.81	94.3%
Full model	0.94	0.90	97.1%	0.91	0.85	95.8%	0.93	0.87	95.6%

ISIC: International Skin Imaging Collaboration; DNN: deep neural network.

Note. Best results for each dataset are highlighted in bold.

These findings confirm that the full integration of enhancement, segmentation, and refinement stages leads to superior performance. Each module adds incremental value, supporting the effectiveness and efficiency of the proposed hybrid strategy, particularly in diverse real-world conditions.

Computational efficiency and runtime comparison

The proposed segmentation framework was developed with an emphasis on computational efficiency, scalability, and deployment feasibility. All experiments were conducted on a workstation equipped with an AMD Ryzen 7 5800H processor, 32 GB RAM, and an NVIDIA RTX 3060 graphics processing unit (GPU) with 12 GB VRAM.

The model was trained independently on the ISIC 2016, 2017, and 2018 datasets, each using a batch size of 16 over 100 epochs. Convergence was consistently achieved without requiring extended training or early stopping. External datasets, PH2 and HAM10000, were used exclusively for inference. This strategy was intended to evaluate generalization under real-world conditions, where models encounter previously unseen clinical images without additional tuning (Tables 3 to 5).

Table 3.

Average processing time per image across pipeline stages.

Stage	Average time (s)
Pre-processing	0.34
Segmentation	1.78
Post-processing	0.27
Total	2.39

Table 4.

Runtime comparison with benchmark models (inference on ISIC 2018).

Model	Avg. runtime (s)	Relative speed
Proposed method	2.39	1.00 $\times$
U-Net³²	3.21	0.74 $\times$
FAT-Net¹²	4.85	0.49 $\times$

ISIC: International Skin Imaging Collaboration; U-Net: U-shaped convolutional neural network; FAT-Net: feature adaptive transformer network.

Table 5.

Training and inference time summary across datasets.

Dataset	Training time (hh:mm)	Avg. inference time (ms/image)	Total evaluation time
ISIC 2016	01:48	140	$\sim$ 4 min
ISIC 2017	02:06	145	$\sim$ 5 min
ISIC 2018	02:30	150	$\sim$ 7 min
PH2 (inference only)	—	130	$\sim$ 3 min
HAM10000 (inference only)	—	160	$\sim$ 15 min

ISIC: International Skin Imaging Collaboration; PH2: Pedro Hispano Hospital dataset; HAM10000: Human Against Machine dataset.

The total average processing time per image, combining all stages—pre-processing, segmentation, and post-processing—was 2.39 s. Specifically, pre-processing operations (morphological enhancements and noise filtering) took 0.34 s; segmentation via the DNN required 1.78 s; and post-processing, which included Otsu thresholding and morphological reconstruction, averaged 0.27 s.

To contextualize these figures, runtime comparisons were conducted against U-Net and FAT-Net under identical hardware and inference settings. U-Net achieved an average runtime of 3.21 s per image, and FAT-Net required 4.85 s. These results confirm the proposed model’s advantage in terms of speed and computational efficiency, making it suitable for deployment in resource-constrained or real-time clinical applications.
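The figures in Tables 3 and 4 can be cross-checked with a few lines of arithmetic. The sketch assumes that the "Relative speed" column is the proposed method's runtime divided by each model's runtime, which is consistent with the tabulated values; the variable names are illustrative.

```python
# Per-stage times from Table 3 (seconds per image).
stage_times = {"pre-processing": 0.34, "segmentation": 1.78, "post-processing": 0.27}
total = round(sum(stage_times.values()), 2)

# Average runtimes from Table 4 (seconds per image).
runtimes = {"Proposed method": 2.39, "U-Net": 3.21, "FAT-Net": 4.85}

# Assumed definition: proposed runtime divided by the model's runtime.
relative_speed = {
    name: round(runtimes["Proposed method"] / t, 2) for name, t in runtimes.items()
}

print(total)
print(relative_speed)
```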

Statistical analysis and observations

To rigorously assess the reliability and statistical validity of the proposed segmentation method, a comprehensive analysis was conducted across the ISIC 2016, 2017, and 2018 datasets. The segmentation results were evaluated in terms of DC, JI, and Acc. A Shapiro–Wilk test was initially performed to assess the normality of the metric distributions, and results indicated that the assumption of normality was satisfied ( $p > 0.05$ ). Based on this, a paired t-test was applied to compare the performance of the proposed method against baseline models.

For each dataset—ISIC 2016 $(n = 379)$ , ISIC 2017 $(n = 600)$ , and ISIC 2018 $(n = 1000)$ —comparisons were performed using the full test set. A confidence level of 95% ( $α = 0.05$ ) was applied to assess statistical significance. As summarized in Table 6, the proposed method consistently outperformed the baseline across all datasets with improvements that were statistically significant ( $p < 0.05$ ). Notably, the DC improved by 6.8%, 8.3%, and 8.1% on ISIC 2016, 2017, and 2018, respectively, while the JI showed gains of 11.1%, 9.0%, and 8.8%. These results underscore the effectiveness of the proposed segmentation strategy in delivering meaningful performance enhancements beyond existing methods.

Table 6.

Statistical outcome metrics comparison for the developed method, including performance gains over the baseline.

Dataset	Method	Dice coefficient	Jaccard index	Accuracy	Gain in Dice	Gain in Jaccard
ISIC 2016	Baseline model	$0.88 \pm 0.12$	$0.81 \pm 0.14$	93.5%	–	–
	Proposed method	$0.94 \pm 0.08$	$0.90 \pm 0.10$	97.1%	$+$ 6.8%	$+$ 11.1%
ISIC 2017	Baseline model	$0.84 \pm 0.15$	$0.78 \pm 0.18$	92.9%	–	–
	Proposed method	$0.91 \pm 0.10$	$0.85 \pm 0.13$	95.8%	$+$ 8.3%	$+$ 9.0%
ISIC 2018	Baseline model	$0.86 \pm 0.14$	$0.80 \pm 0.16$	94.2%	–	–
	Proposed method	$0.93 \pm 0.10$	$0.87 \pm 0.14$	95.6%	$+$ 8.1%	$+$ 8.8%

ISIC: International Skin Imaging Collaboration.
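The gain columns in Table 6 are consistent with relative improvements over the baseline mean, that is, $100 \times (\text{proposed} - \text{baseline}) / \text{baseline}$. The short sketch below reproduces them from the tabulated means; the function name `relative_gain` is hypothetical, and the relative-gain definition is inferred from the table rather than stated in the text.

```python
def relative_gain(baseline, proposed):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline

# (baseline mean, proposed mean) pairs taken from Table 6.
dice = {"ISIC 2016": (0.88, 0.94), "ISIC 2017": (0.84, 0.91), "ISIC 2018": (0.86, 0.93)}
jaccard = {"ISIC 2016": (0.81, 0.90), "ISIC 2017": (0.78, 0.85), "ISIC 2018": (0.80, 0.87)}

dice_gain = {k: relative_gain(b, p) for k, (b, p) in dice.items()}
jaccard_gain = {k: relative_gain(b, p) for k, (b, p) in jaccard.items()}

for name in dice:
    print(f"{name}: Dice +{dice_gain[name]:.2f}%, Jaccard +{jaccard_gain[name]:.2f}%")
```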

Quantitatively, the DC exhibited low variability, with a standard deviation of 0.015 across test sets, suggesting stable and consistent segmentation performance under varying conditions. These results affirm the model’s capacity to generalize effectively and distinguish lesion structures with high precision.

In addition to Acc improvements, the observed gains can be attributed to the synergistic effect of the proposed pre- and post-processing steps. Morphological correction and adaptive Wiener filtering helped suppress artifacts and enhance contrast, while post-processing with Otsu thresholding and morphological reconstruction refined lesion boundaries. These steps collectively enhanced segmentation robustness, minimized false positives, and improved the delineation of irregular lesion shapes.

The findings support the statistical reliability and clinical relevance of the proposed method. Its consistent Acc, low variance, and statistically significant improvements make it a strong candidate for integration into computer-aided diagnostic systems for early skin cancer detection and treatment planning.

Comparison with existing methods

The proposed method was comprehensively evaluated against SOTA techniques, including U-Net, UNet++, and advanced models such as FAT-Net, Swin-Unet, and attention residual U-Net with global decoder (ARU-GD), on the ISIC 2016, ISIC 2017, and ISIC 2018 datasets. These datasets provided diverse dermoscopic images, ensuring robust evaluation across different lesion types and imaging conditions. Segmentation performance was assessed using JI, DC, Acc, Sn, and Sp. The detailed outcomes are summarized in Tables 7 and 8.

Table 7.

Comparison of the proposed method with state-of-the-art techniques on ISIC datasets.

Dataset	Model name	JI ( $↑$ )	DC ( $↑$ )	Acc ( $↑$ )	Sn ( $↑$ )	Sp ( $↑$ )
ISIC 2016	LeaNet³³	78.39	88.25	94.72	91.03	98.24
	CPFNet²¹	79.88	87.69	94.96	89.53	96.55
	DAGAN³⁴	84.42	90.85	95.82	92.28	95.68
	FAT-Net¹²	85.30	91.59	96.04	92.59	96.02
	CFF-Net²⁵	85.71	92.12	–	90.71	–
	AS-Net³⁵	80.51	88.07	94.66	89.92	95.72
	SLSN³⁶	83.73	90.54	96.47	91.00	–
	ADF-Net³⁷	87.40	92.89	96.53	94.45	96.41
	RMMLP²⁶	85.40	91.98	–	–	–
	Ms RED³⁸	83.86	90.33	96.45	91.10	–
	Swin-Unet³⁹	85.77	91.43	95.52	93.37	94.48
	ARU-GD¹⁸	85.12	90.93	94.38	89.86	94.65
	U-Net³²	81.38	88.24	93.31	87.28	92.88
	UNet++¹⁸	82.81	89.19	93.88	88.78	93.52
	Proposed	89.91	94.30	97.10	94.43	96.67
ISIC 2017	LeaNet³³	78.93	88.89	95.72	90.63	97.72
	DAGAN³⁴	75.94	84.25	93.26	83.63	97.25
	FAT-Net¹²	76.53	85.00	93.26	83.92	97.25
	CFF-Net²⁵	81.07	89.09	–	86.56	–
	AS-Net³⁵	80.51	88.07	94.66	89.92	95.72
	SLSN³⁶	83.73	90.54	96.47	91.00	–
	ADF-Net³⁷	87.40	92.89	96.53	94.45	96.41
	Ms RED³⁸	83.86	90.33	96.45	91.10	–
	Swin-Unet³⁹	80.79	88.27	94.53	89.35	94.96
	ARU-GD¹⁸	80.77	87.89	93.88	88.31	96.31
	U-Net³²	75.69	84.12	93.29	84.30	93.41
	UNet++¹⁸	78.58	86.35	93.73	87.13	94.41
	Proposed	84.51	90.86	95.82	90.22	96.47
ISIC 2018	LeaNet³³	78.39	88.25	94.72	91.03	98.24
	CPFNet²¹	79.88	87.69	94.96	89.53	96.55
	DAGAN³⁴	81.13	88.07	93.24	90.72	95.88
	FAT-Net¹²	82.02	89.03	95.78	91.00	96.99
	CFF-Net²⁵	82.55	90.08	–	88.63	–
	AS-Net³⁵	83.09	89.55	95.68	93.06	94.69
	SLSN³⁶	83.73	90.54	96.47	91.00	–
	ADF-Net³⁷	84.96	91.12	96.83	92.68	97.67
	RMMLP²⁶	85.40	91.98	–	–	–
	Ms RED³⁸	83.86	90.33	96.45	91.10	–
	Swin-Unet³⁹	85.31	91.39	94.77	91.89	93.31
	ARU-GD¹⁸	85.97	91.78	95.01	92.82	94.13
	U-Net³²	83.66	90.16	94.00	90.93	91.81
	UNet++¹⁸	84.83	90.86	94.38	91.72	93.17
	Proposed	87.39	92.53	95.64	93.15	94.92

ISIC: International Skin Imaging Collaboration; JI: Jaccard index; DC: Dice coefficient; Acc: accuracy; Sn: sensitivity; Sp: specificity; CPFNet: cross-scale parallel fusion network; FAT-Net: feature adaptive transformer network; CFF-Net: contextual feature fusion network; AS-Net: attention-scale aggregation network; SLSN: semi-supervised learning skin network; ADF-Net: attention-driven feature network; RMMLP: rolling matrix multi-scale local pattern; ARU-GD: attention residual U-Net with global decoder; U-Net: U-shaped convolutional neural network; UNet++: nested U-Net with dense skip connections.

Table 8.

Segmentation performance on external datasets (PH2 and HAM10000) without retraining.

Dataset	Dice	Jaccard	Sensitivity	Specificity	Accuracy
PH2	0.88	0.82	0.90	0.94	0.91
HAM10000	0.86	0.80	0.88	0.92	0.89

PH2: Pedro Hispano Hospital dataset; HAM10000: Human Against Machine dataset.

Note. Reference numbers are indicated beside model names (e.g. FAT-Net¹²). Arrows denote performance trends: $↑$ for higher-is-better metrics and $↓$ for lower-is-better metrics. Metrics include JI, DC, Acc, Sn, and Sp.

For the ISIC 2016 dataset, the proposed method achieved the highest JI, DC, and Acc among the compared models, with a JI of $89.91 \pm 0.099$ , a DC of $94.30 \pm 0.076$ , Acc of $97.10 \pm 0.061$ , Sn of $94.43 \pm 0.096$ , and Sp of $96.67 \pm 0.052$ . Its Sn and Sp were close to the best values listed in Table 7 (ADF-Net, $Sn = 94.45$ ; LeaNet, $Sp = 98.24$ ). These results emphasize the method’s ability to handle complex lesion boundaries effectively. Compared to other models, such as U-Net ( $JI = 81.38 \pm 0.127$ , $DC = 88.24 \pm 0.104$ ) and FAT-Net ( $JI = 85.30$ , $DC = 91.59$ ), the proposed method demonstrated a significant improvement in segmentation precision and overlap.

For the ISIC 2017 dataset, the proposed method outperformed most of the compared models, achieving a JI of $84.51 \pm 0.135$ , a DC of $90.86 \pm 0.104$ , Acc of $95.82 \pm 0.068$ , Sn of $90.22 \pm 0.097$ , and Sp of $96.47 \pm 0.085$ ; the main exception in Table 7 is ADF-Net ( $JI = 87.40$ , $DC = 92.89$ ). In comparison, Swin-Unet ( $JI = 80.79 \pm 0.158$ , $DC = 88.27 \pm 0.129$ ) and UNet++ ( $JI = 78.58 \pm 0.191$ , $DC = 86.35 \pm 0.159$ ) showed lower performance. The results highlight the robustness of the proposed method in handling diverse lesion appearances and maintaining high Acc.

On the ISIC 2018 dataset, the proposed method achieved exceptional results, with a JI of $87.39 \pm 0.139$ , a DC of $92.53 \pm 0.102$ , Acc of $95.64 \pm 0.074$ , Sn of $93.15 \pm 0.087$ , and Sp of $94.92 \pm 0.132$ . Compared to advanced models such as ARU-GD ( $JI = 85.97 \pm 0.134$ , $DC = 91.78 \pm 0.095$ ) and Swin-Unet ( $JI = 85.31 \pm 0.130$ , $DC = 91.39 \pm 0.099$ ), the proposed method demonstrated better segmentation consistency and precision.

The exceptional performance of the proposed method is attributed to its integrated pre-processing and post-processing modules, which reduce noise, normalize lesion intensities, and refine segmentation masks by addressing boundary irregularities and removing artifacts. This comprehensive approach ensures accurate and reliable segmentation, even under challenging conditions.

Cross-dataset evaluation on PH2 and HAM10000

To assess the generalization ability of the proposed segmentation framework beyond the training distribution, we conducted independent evaluations on two publicly available external datasets: PH2 and HAM10000. Unlike $k$ -fold cross-validation, which tests variability within the same dataset, cross-dataset validation provides a more rigorous measure of robustness by exposing the model to unseen distributions.

The PH2 dataset consists of 200 dermoscopic images acquired under controlled imaging conditions at a single clinical site. Each image includes expert-annotated binary masks, allowing for precise ground-truth comparisons. When applied to this dataset without any retraining or parameter adjustment, the model achieved a DC of 0.88 and a JI of 0.82. Sn and Sp were recorded at 0.90 and 0.94, respectively, while overall segmentation Acc reached 0.91. These results reflect the model’s effectiveness in handling well-structured, high-resolution dermoscopic images.

The HAM10000 dataset poses a greater challenge due to its scale and heterogeneity. It comprises more than 10,000 dermoscopic images collected from multiple clinical environments using varying devices, illumination conditions, and lesion types. Despite the increased variability, the proposed model maintained strong performance with a DC of 0.86, JI of 0.80, Sn of 0.88, Sp of 0.92, and Acc of 0.89. These results underscore the method’s robustness in segmenting lesions with irregular shapes, varying pigmentation, and background artifacts.

Notably, all evaluations were performed using the model trained solely on ISIC datasets, with no additional fine-tuning or adaptation to the external datasets. This independent testing strategy serves as a realistic proxy for cross-validation by emulating deployment in real-world clinical settings. The consistent results across distinct datasets confirm the proposed framework’s strong generalization capability.

Discussion

The proposed segmentation framework was designed to address practical challenges associated with dermoscopic image analysis, such as poor contrast, irregular lesion borders, and variability in imaging conditions. Rather than relying on increasingly complex deep learning models, the study explored a hybrid strategy that combines traditional image enhancement with a streamlined neural network architecture.

One of the most notable findings is that the model achieves high segmentation Acc with significantly fewer parameters than several recent SOTA methods. As shown in Table 7, it consistently delivered strong DC and JI values across the ISIC 2016, 2017, and 2018 datasets. These outcomes suggest that competitive Acc can be maintained without excessive model complexity, which is particularly valuable for low-latency applications or deployment on devices with limited computational resources.

Independent testing on the PH2 and HAM10000 datasets further demonstrated the model’s robustness. Despite being trained only on ISIC datasets, the method generalized well to external data with different acquisition settings and lesion types. This kind of cross-dataset validation provides confidence in the model’s real-world applicability, especially in clinical environments where consistent performance on unseen cases is critical.

In addition to Acc, the model’s design supports fast inference times and low memory consumption, making it a promising candidate for real-time use. Its ability to handle noise, glare, and boundary artifacts is attributable to pre-processing techniques such as morphological filtering and Wiener denoising, as well as post-processing steps such as Otsu thresholding and morphological reconstruction. These elements helped improve lesion visibility and sharpen segmentation output without significantly increasing computational overhead.
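As an illustration of the thresholding step mentioned above, the sketch below applies Otsu's method to a network output map to obtain a binary lesion mask. It is a generic NumPy implementation under stated assumptions, not the authors' code: the names `otsu_threshold` and `binarize`, the 256-bin histogram, and the use of a probability map as input are illustrative choices, and the morphological reconstruction step is not shown.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0

    # Cumulative pixel counts and intensity sums for the "low" class,
    # with the "high" class obtained as the remainder.
    weight_lo = np.cumsum(hist)
    weight_hi = weight_lo[-1] - weight_lo
    sum_lo = np.cumsum(hist * centers)
    sum_hi = sum_lo[-1] - sum_lo

    # Only splits that leave pixels on both sides are meaningful.
    valid = (weight_lo > 0) & (weight_hi > 0)
    if not valid.any():
        # Constant input: no split exists, so fall back to the mean value.
        return float(values.mean())

    mean_lo = sum_lo[valid] / weight_lo[valid]
    mean_hi = sum_hi[valid] / weight_hi[valid]
    between = weight_lo[valid] * weight_hi[valid] * (mean_lo - mean_hi) ** 2

    # Split after the best bin; the threshold is that bin's upper edge.
    best_bin = np.flatnonzero(valid)[np.argmax(between)]
    return float(edges[best_bin + 1])


def binarize(prob_map, bins=256):
    """Turn a probability map into a boolean mask using Otsu's threshold."""
    prob_map = np.asarray(prob_map, dtype=float)
    return prob_map >= otsu_threshold(prob_map, bins)


if __name__ == "__main__":
    demo = np.array([[0.05, 0.90], [0.85, 0.10]])
    print(binarize(demo))
```

A data-driven threshold of this kind avoids hand-tuning a fixed cut-off per dataset, which is one plausible reason such a step adds little computational overhead.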

While the results are promising, the approach is not without limitations. In particular, lesions with extremely subtle boundaries or very low contrast against the skin background can still pose challenges. The model’s performance may also vary slightly across lesion subtypes or when exposed to rare visual patterns not represented in the training data.

Overall, the findings support the use of a hybrid approach that favors simplicity and efficiency without compromising segmentation quality. This makes the framework particularly well-suited for use in clinical decision-support systems, mobile diagnostic tools, and other real-world healthcare settings. Further development could focus on increasing interpretability, expanding training diversity, and optimizing runtime for broader deployment scenarios.

Conclusion

Melanoma remains one of the most aggressive forms of skin cancer, where timely and precise diagnosis is essential for improving survival rates. In recent years, the use of automated image analysis has gained significant attention as a means to support dermatologists in detecting lesions early. However, dermoscopic image segmentation continues to face difficulties caused by varying lighting conditions, low contrast, irregular lesion borders, and diversity in lesion appearance. The present study introduced a hybrid skin lesion segmentation framework designed to address these practical challenges while maintaining computational efficiency.

The proposed approach integrates classical image enhancement techniques with a lightweight deep learning model in a three-stage process: pre-processing, segmentation, and post-processing. The pre-processing stage enhances contrast, removes noise, and strengthens boundary details; the segmentation stage employs a streamlined encoder–decoder network optimized for efficiency; and the post-processing stage refines the results using adaptive thresholding and morphological operations. This cooperative design enables the system to combine the strengths of conventional image processing and modern neural networks, producing clearer and more reliable segmentation maps.

Unlike recent studies that rely heavily on increasingly complex network architectures, this work emphasizes simplicity, modularity, and adaptability. The framework achieves a balanced compromise between model size, Acc, and generalization—an essential feature for clinical and mobile applications where high-end computational resources may not be available. Evaluation on the ISIC 2016, 2017, and 2018 datasets showed that the proposed method performs on par with, or better than, many well-known segmentation models, including U-Net variants and attention-based networks. The approach consistently achieved a strong Dice index and JI, high Sn and Sp, and stable results across varied lesion types.

To examine its robustness, the model was further validated on two independent datasets, PH2 and HAM10000, without retraining or parameter adjustment. The consistent performance across these datasets demonstrates good adaptability to different imaging conditions and patient populations. When compared with recent SOTA architectures such as U-Net++ and AD-Net, the proposed method reduced model parameters by roughly 45%–60% while improving the Dice and Jaccard metrics by up to 4.5% and 5.3%, respectively. These results confirm that reliable lesion segmentation can be achieved without depending on deep or computationally expensive networks.

The system also proved resilient to common dermoscopic artifacts such as hair, glare, and uneven illumination. Statistical testing with the Wilcoxon signed-rank test indicated that the observed improvements were significant. Nonetheless, a few limitations persist: the model occasionally struggles with lesions that have extremely unclear edges or minimal contrast with surrounding tissue, and further optimization is required to enable real-time operation in clinical environments.

Future research will focus on enhancing robustness, interpretability, and scalability:

Using generative adversarial models to create additional samples for rare or underrepresented lesion types, enhancing diversity in the training data.

Incorporating complementary information such as histopathological or clinical metadata to strengthen diagnostic relevance.

Employing model-compression and acceleration techniques—including pruning and quantization—to improve runtime efficiency.

Integrating explainable AI methods to clarify decision boundaries and foster clinician confidence.

Expanding validation through collaborations with dermatologists and multi-center clinical datasets.

Investigating adaptive data augmentation approaches to handle extreme imaging variability while preserving computational simplicity.

This work presents a practical and adaptable segmentation framework that bridges classical image processing and efficient deep learning. The model offers a favorable balance between precision, generalization, and speed, making it suitable for deployment in diverse clinical settings. Continued refinement, larger-scale validation, and integration with decision-support systems could further extend its utility for early melanoma detection and broader dermatological diagnostics.

Footnotes

ORCID iDs

Ahmed Ali

Muhammad Irfan

Muawia Abdelkafi Magzoub

Ethical approval

Not applicable.

Author contributions

The authors affirm their contributions to this study as follows: The overall study conception and design were carried out by Abdullah A. Asiri, Toufique A. Soomro, Khlood M. Mehdar, and Ahmed Ali. Toufique A. Soomro, Ahmed Ali, Faisal Bin Ubaid, and Sabah Elshafie Mohammed Elshafie were responsible for data collection and preparation. The analysis and interpretation of the results were conducted by Toufique A. Soomro, Muhammad Irfan, and Hanan T. Halawani. The draft manuscript was prepared by Toufique A. Soomro, Hanan T. Halawani, Aisha M. Mashraqi, and Muhammad Irfan. All authors reviewed the findings, contributed to critical revisions, and approved the final version of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Deanship of Graduate Studies and Scientific Research at Najran University through the Nama’a program (project code NU/GP/MRC/13/771-6).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Guarantor

Muhammad Irfan.

