Sage Journals: Discover world-class research

Abstract

Objective

Breast cancer detection is critical for timely and effective treatment, and automatic detection systems can significantly reduce human error and improve diagnosis speed. This study aims to develop an accurate and robust framework for classifying breast cancer into benign and malignant categories using a novel machine learning architecture.

Methods

We propose a dense-ResNet attention integration (DRAI) architecture that combines DenseNet and ResNet models with three attention mechanisms to enhance feature extraction from the BreakHis dataset. The attention mechanisms focus on regionally important features, improving classification accuracy. A triple-level ensemble (TLE) method combines the performance of multiple models, further enhancing prediction accuracy.

Results

The proposed DRAI architecture with TLE achieves an accuracy of 99.58% in classifying breast cancer into benign and malignant categories, surpassing existing methodologies. This high accuracy demonstrates the effectiveness of the fusion architecture and its ability to reduce manual errors in breast cancer diagnosis.

Conclusion

The DRAI architecture with TLE provides a robust, automated framework for breast cancer classification. Its exceptional accuracy lays a solid foundation for future advancements in automated diagnostics and offers a reliable method for aiding early breast cancer detection.

Keywords

Dense-ResNet attention integration (DRAI), transfer learning, attention mechanisms, triple-level ensemble (TLE), breast cancer classification, BreakHis

Introduction

Cancer is a disease in which some of the body’s cells grow uncontrollably and spread to other parts of the body.¹ Breast cancer, in its various forms, originates in the breast tissue when abnormal cells multiply uncontrollably, resulting in the formation of a mass or tumor, most commonly found within the milk ducts or lobules.² Tumors manifest in two distinct forms: non-cancerous, known as benign, or cancerous, referred to as malignant. Siegel et al.³ performed a study where they claim that the mortality of breast cancer is very high when compared to other types of cancer. In 2020, breast cancer diagnoses affected 2.3 million women worldwide, resulting in 685,000 fatalities. Based on information from the World Health Organization, by the close of that year, 7.8 million women previously diagnosed with breast cancer within the past 5 years were alive, solidifying its status as the most widespread cancer globally.⁴ Breast cancer incidence is noted to be higher in developed nations than in other parts of the world.⁵

One crucial factor contributing to the global disparity in mortality rates was the delay in diagnosis.⁶ It was found that timely detection of breast cancer significantly reduced its health impact.⁷,⁸ However, the available detection methods, including mammography, histopathological examinations, clinical/self-assessments, ultrasound, magnetic resonance imaging (MRI), molecular imaging, liquid biopsies, and 3D mammography, each had intrinsic limitations. Dense breast tissue elevated the risk of false results in mammography.⁹ Inherent errors were identified in histopathology, and clinical and self-assessments were subject to subjective judgment. Manual disease diagnosis necessitated a high level of clinician expertise.¹⁰ The effectiveness of ultrasound varied, MRI incurred high costs, and biopsies lacked absolute precision, collectively impeding comprehensive breast cancer diagnosis.

Physicians grappled with interpreting subtle indicators like dense breast tissue and experienced fatigue from exhaustive assessments, impacting timely and accurate detection. Pathologists meticulously analyzed tissue samples, causing delays in cancer detection. Diagnosis and interpretation errors stem from inadequate training, memory lapses, biases, and failures in secondary reviews.¹¹ The morphological standards employed to categorize histopathological images carried some subjectivity, resulting in an average diagnostic agreement among pathologists of around 75%.¹² The conventional treatment pathway, involving confirmatory tests and specialist consultations, prolonged the duration before commencing therapy.

Automatic detection systems, particularly computer-aided diagnosis (CAD), emerged as a cornerstone in addressing the limitations inherent in traditional diagnostic approaches for various diseases, including breast cancer. CAD leverages computer algorithms to improve the accuracy and efficiency of medical diagnoses.¹³,¹⁴ The manual analysis of complex histopathological images was found to be a time-consuming and tedious process prone to errors. To address the limitations of human interpretation, CAD systems were deemed essential in the breast cancer classification problem to facilitate the diagnosis process and improve survival chances.¹⁵ CAD algorithms utilize advanced technologies, such as artificial intelligence (AI) and machine learning (ML),¹⁶–¹⁹ to enhance the accuracy and efficiency of detection.²⁰,²¹ Supervised classification was often utilized for the quantitative analysis of biomedical imaging.²² The potential of AI to supplant traditional expert systems was acknowledged, enabling preliminary diagnoses to be reached quickly. By swiftly analyzing vast amounts of data from imaging scans and pathology samples, CAD systems identify subtle abnormalities that might be missed or misinterpreted by human assessment. This rapid and meticulous analysis significantly reduced the time required for diagnosis, enabling earlier detection and intervention.²³

Consequently, CAD not only enhanced accuracy but also expedited the diagnostic process, ultimately leading to improved patient outcomes.²⁴ Research demonstrated that the performance of radiologists could be enhanced by supplying them with the outcomes generated by a CAD system. In the proposed approach, a custom convolutional neural network (CCNN) model was utilized, integrating the robust architectures of DenseNet (DN) and ResNet (RN), to automate the classification of breast cancer images into distinct categories, particularly benign and malignant. This integration with advanced convolutional neural network (CNN) models significantly elevated the precision and efficiency of detection processes. By rapidly processing extensive datasets obtained from imaging scans and pathology samples, these CAD systems demonstrated a superior ability to discern subtle abnormalities that might evade human assessment. This heightened ability for rapid and meticulous analysis substantially reduced the time required for accurate diagnosis, facilitating early detection and intervention strategies. This acceleration in the diagnostic workflow ultimately translated to improved patient outcomes.

Motivated by the remarkable strides in computer vision and image processing facilitated by deep learning (DL), numerous researchers were prompted to employ this approach in histopathological image classification. The realm of ML and DL methodologies was explored through multiple studies, revealing substantial opportunities for advancement. Significant interest and potential for enhancing diagnostic applications in healthcare were demonstrated by advances in ML and DL methodologies.²⁵ In prior attempts to automate breast cancer detection, challenges were encountered in accurately discerning nuanced irregularities, handling diverse imaging datasets inefficiently, and relying on basic classification models ill-equipped to handle complex features, leading to a diminished accuracy. Furthermore, vulnerabilities were observed in certain techniques for managing variations in imaging quality and breast tissue compositions, emphasizing the demand for more refined models capable of intricate feature extraction and streamlined classification processes. These limitations paved the way for novel methods, notably the integration of sophisticated architectures such as DN and RN, aimed at addressing these shortcomings and propelling advancements in breast cancer detection.

In previous studies on breast cancer classification, researchers encountered a range of challenges, including imbalanced datasets that impacted the ability of models to generalize and make accurate predictions for minority classes. A notable gap existed in the careful tailoring of CNN architectures specifically for breast cancer images, potentially limiting the performance of models. Furthermore, there were limitations in the exploration of advanced techniques to achieve optimal accuracy in breast cancer classification.²⁶ Effective data pre-processing, such as selecting the most relevant features, was also often overlooked, impacting the overall model performance.²⁷ Sophisticated transfer learning (TL) methods were underutilized, and there was a lack of emphasis on integrating multiple data modalities, leading to a lack of comprehensive understanding. Moreover, transparency in the decision-making processes of models was lacking, and ethical considerations were not adequately addressed in previous research efforts within this domain. These gaps highlight the need for further investigation and improvement in breast cancer classification methodologies.

Ensembling techniques were found to be underutilized within this domain. However, multilevel ensembling emerged as a pivotal strategy for addressing challenges and enhancing the dependability and precision of classification models. Analogous to insights gathered from diverse perspectives, predictions were amalgamated from disparate sources through multilevel ensembling, thereby encapsulating a broader spectrum of patterns and features present in the data.²⁸ Consequently, this amalgamation fostered the development of more robust models with superior overall performance. Moreover, comprehension of the confidence and reliability associated with final classification decisions was facilitated by integrating predictions from multiple models through multilevel ensembling. Such transparency was deemed paramount in medical image analysis, wherein decisions had a tangible impact on patient care and treatment planning. Despite these benefits, ensembling techniques remain underutilized, and employing them could yield a model that generalizes better.

To address the above-mentioned limitations, our focus has been directed toward gaining a comprehensive understanding of the challenges inherent in breast cancer classification. Recognizing the subtleties associated with imbalanced datasets, the customization of CNN architectures, and the exploration of advanced techniques, targeted research questions were meticulously formulated. These questions served as a strategic pathway for overcoming the limitations observed in prior research. The first and foremost step is to address the major research inquiries listed below, which must be answered to establish a robust architecture and enhance the acceptability of the study.

Research Question 1 (RQ1): How can an equitable distribution of classes in the dataset be ensured for optimal training, considering variations in sample numbers among classes?—Variances in sample counts across classes might lead to majority class dominance, potentially impeding accurate predictions for minority classes. Therefore, striving for a balanced class distribution is crucial.²⁹

Research Question 2 (RQ2): What methods can be employed to prioritize essential features, particularly significant areas or regions within the images?—Certain image regions might contain irrelevant or redundant information affecting feature extraction, while others might significantly contribute to class identification, warranting focus on the crucial regions.³⁰

Research Question 3 (RQ3): Does the ensemble approach employing multiple models offer distinct advantages over a singular model in breast cancer classification?—Considering the diverse characteristics of breast cancer, exploring the effectiveness of ensembling multiple models versus relying on a singular model could potentially enhance classification accuracy and robustness in detecting various manifestations of the disease.³¹

Research Question 4 (RQ4): What are the advantages of employing a multilevel ensemble approach over a single-level ensemble in improving the reliability and robustness of breast cancer classification models?—When evaluating the effectiveness and applicability of classification models, comparing the utilization of multilevel ensembling to single-level ensembling in terms of enhancing overall performance and efficacy is essential.³²

To overcome the inherent limitations of previous research and effectively address the above-mentioned research questions, a multifaceted approach was developed, consisting of the following strategic steps:

Augmenting imbalanced dataset:

To address the issue of an imbalanced dataset, the training set was augmented using techniques such as rotating, shifting, zooming, and flipping.

A fill mode of “nearest” was used to ensure that the augmented images remained realistic and useful for training.

These augmentations helped to increase the diversity and size of the dataset, which in turn improved the model’s ability to generalize and perform better on minority classes.

Customization and fine-tuning of pre-trained models:

DN and RN were integrated into the model architecture, leveraging their pre-trained weights through TL.

TL allowed us to significantly reduce the amount of data and time required to train the models while improving their performance.

Fine-tuning and necessary customizations were applied to optimize these pre-trained models for the specific task, ensuring better accuracy and efficiency.

Integration of attention mechanisms for more distinguishable feature representation:

Attention mechanisms such as channel attention (CA), soft attention (SA), and squeeze-and-excitation attention (SEA) were implemented to identify important features from different regions of an image.

These attention mechanisms enhanced the model’s ability to focus on relevant features, improving feature extraction and overall model performance.

By incorporating these techniques, it was ensured that the model could better distinguish between significant and less significant features, leading to improved classification results.

Integration of ensemble approach for enhanced applicability:

To enhance overall performance, an ensemble approach was applied by combining the outputs of different models.

This ensemble method increased the reliability and robustness of the predictions, making the model more applicable to various scenarios.

By leveraging the strengths of multiple models, better generalization and higher accuracy on unseen data were achieved.

The remaining part of this article is organized as follows: the “Dataset description” section outlines the datasets employed. The “Method” section covers the research methodology. The experimental results are discussed in the “Results” section. Comparisons with recent studies and methodologies are presented in the “Literature review” section. The “Threats to validity” section addresses the study’s limitations. Lastly, the “Conclusion and future work” section presents the conclusion.

Literature review

The landscape of breast cancer detection witnessed significant progress in healthcare, particularly through the integration of tailored CNN models.³³ Recent advancements highlighted the effectiveness of CCNN architectures in breast cancer image classification, surpassing generic models by capturing specific diagnostic features. A pioneering hybrid CNN-RNN approach by Szegedy et al.³⁴ and Yosinski et al.,³⁵ utilizing pre-trained Inception-V3 through transfer learning, marked a promising stride in enhancing breast cancer detection accuracy. The integration of recurrent neural networks (RNNs), specifically long short-term memory (LSTM) networks,³⁶ retained contextual information, presenting a compelling pathway for refining breast cancer diagnosis. Additionally, DL methods³⁷ demonstrated remarkable results in various computer vision tasks.

In parallel, TL has emerged as a pivotal strategy in medical image analysis, utilizing pre-trained models from extensive annotated databases such as ImageNet. The primary benefits of TL include the reduction of training duration, enhancement of neural network performance, and the requirement for only a small amount of data.³⁸ The effectiveness of fuzzy TL, which leveraged neural networks to manage uncertainty, was demonstrated in applying model parameters across tasks, illustrating the potential for improving predictive accuracy in dynamic environments.³⁹ Byra et al.⁴⁰ employed the VGG19 pre-trained TL model in their work. Yan et al.⁴¹ extended TL to breast cancer classification, demonstrating that fine-tuning pre-trained models can achieve high accuracy comparable to expert performance, highlighting TL’s versatility across various classification tasks. Deniz et al.⁴² utilized the AlexNet⁴³ pre-trained model structure for feature extraction from histopathological images. Moreover, addressing the challenge posed by the large size of pathological images through segmentation into patches, combined with the use of RNNs for retaining contextual information, represents a substantial advancement in breast cancer detection.

The accuracy of cancer diagnosis, hindered by statistical, distributional, and human errors arising from the manual analysis of extensive tumor tissue slides by pathologists, has been a longstanding challenge. Momentum has been gained in the pursuit of automated and reproducible methodologies. A robust tool in medical image-based examinations, CAD, has been recognized for its cost-effective solutions and the minimization of unnecessary expenses.⁴⁴ However, the diagnostic process has been impeded by the laborious pre-processing, segmentation, and manual feature extraction required by traditional image processing and ML techniques.

The advent of CNNs marked a revolution in cancer diagnosis, with DL automating intricate feature extraction from medical images.⁴⁵ A significant milestone in large-scale image and video recognition was achieved using CNNs, showcasing remarkable success.⁴⁵ Spanhol et al.⁴⁶ applied the AlexNet model to categorize breast cancer pathology images as benign or malignant. Fuzzy logic was integrated into CNNs to enhance the model’s ability to handle ambiguities in genomic data, offering high interpretability and reasonable accuracy.⁴⁷ The ability of neuro-fuzzy systems to combine rule-based knowledge and neural network training was utilized to offer a mechanism for improving classification performance.⁴⁸ Recent successes, ranging from nucleus detection in colon cancer to mitosis detection in breast cancer images, have underscored the potential of diverse DL network architectures, including GoogLeNet,⁴⁹ AlexNet, VGG16, and ConvNet, in identifying breast cancer traits.

CCNN models have gained prominence in breast cancer detection due to their efficacy in analyzing medical images. A CNN model was likely first used for image recognition and classification in the ImageNet competition.⁵⁰ Masud et al.⁵¹ developed a custom shallow CNN model for breast cancer classification. Sohail et al.⁵² suggested a multi-phase deep CNN-based mitosis detection framework for H&E stained breast cancer images. Wang et al.⁵³ introduced a multiview CNN utilizing the Inception-V3 pre-trained model for the classification of breast lesions into malignant and benign categories using breast ultrasound images.

Efficiency in CAD was enhanced by the research paper on histopathological breast cancer image classification using multiple instance learning (MIL).⁵⁴ MIL was compared with single instance classification, and its advantages were highlighted, influencing histopathological image analysis and potentially advancing breast cancer diagnosis systems. The MIL approach was employed in the study by Sarath et al.,⁵⁵ wherein patch-level features were aggregated through a weighted average to facilitate a thorough image-level representation for predicting benign or malignant masses in mammograms. Furthermore, the ResHist model proposed by Gour et al.⁵⁶ offers a crucial solution for automated breast cancer diagnosis from histopathological images, addressing the growing need for accurate and swift diagnostic tools amid rising breast cancer cases worldwide. Its potential to revolutionize clinical practice is highlighted by its high accuracy, stability, and effectiveness in tumor classification.

To address the limitations identified in breast cancer classification, a comprehensive ML approach was employed. Data augmentation techniques were implemented to mitigate the challenges posed by the imbalanced dataset, ensuring a more balanced representation of both benign and malignant cases. Fine-tuning of pre-trained models was conducted to enhance performance, tailoring them specifically to the breast cancer dataset. Additionally, customized attention integration was incorporated, allowing the model to focus on critical cancer regions, significantly improving diagnostic accuracy while reducing human errors associated with manual slide analysis. For efficient feature extraction and the learning of complex patterns from large pathological images, DN and RN architectures were utilized. Their capacity to preserve feature flow and model deep relationships proved essential in improving classification outcomes. Finally, model ensembling was performed, combining multiple model predictions to overcome individual limitations and boost overall performance.

Dataset description

Despite the rapid advancements, histopathological image analysis was identified as a predominant method for diagnosing breast cancer.⁵⁷ The efficacy of CNNs depends on the availability of a comprehensive and diverse dataset that is accurately annotated.⁵⁸ The publicly available “BreakHis” dataset⁵⁹ was utilized in the study; it comprises 7909 breast tumor tissue images from 82 patients and serves as a pivotal resource for breast cancer histopathological image classification research. Its images at varying magnification levels (40$\times$, 100$\times$, 200$\times$, and 400$\times$) allowed for the exploration of breast tumor tissue at diverse scales and resolutions. The complete dataset can be retrieved from “Laboratório Visão Robótica e Imagem.”⁶⁰

Structured into benign and malignant categories, the dataset consisted of 2480 benign and 5429 malignant images. Each standardized image was sized at $700 \times 460$ pixels and was presented in the RGB color space with three channels, each channel having an 8-bit pixel depth. Compatibility with various image processing and ML tools was ensured by the PNG format.

The “BreakHis” dataset was meticulously curated in collaboration with the P&D Laboratory – Pathological Anatomy and Cytopathology, located in Parana, Brazil. The dataset’s credibility and relevance in the field of breast cancer research were underscored by this collaborative effort. The dataset had been designed with the utmost care to facilitate future benchmarking, research, and evaluation endeavors. Researchers consider the “BreakHis” dataset to be an invaluable resource for advancing the understanding of breast cancer histopathological images, providing a robust foundation for the development and evaluation of diagnostic and classification algorithms. Figures 1 and 2 showcase the sample images of the benign and malignant classes, respectively, sourced from the “BreakHis” dataset.

Figure 1.

Image samples of benign class.

Figure 2.

Image samples of malignant class.

The dataset was augmented, and the resulting augmented dataset was made publicly available; it is provided by Mohammad Sakif Alam.⁶¹ It is organized into three directories: train, test, and validation, each containing two classes: benign and malignant. Table 1 presents the distribution of images across these directories.

Table 1.

Augmented dataset image numbers by directory and class.

Directory	Class	Number of images
Train	Benign	7984
Train	Malignant	8000
Test	Benign	372
Test	Malignant	815
Validation	Benign	372
Validation	Malignant	814
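
The held-out counts in Table 1 can be checked against the class totals reported above (2480 benign and 5429 malignant images) and the 70%/15%/15% partition described under “Data preprocessing.” The short sketch below assumes that the training and validation counts are rounded down per class and that the test set receives the remainder; this rounding rule is an assumption made for illustration and is not stated in the article.

```python
# Class totals of the original BreakHis dataset, as reported in the text.
ORIGINAL_COUNTS = {"benign": 2480, "malignant": 5429}


def split_counts(n_images, train_pct=70, val_pct=15):
    """Per-class split sizes, rounding down; the test set takes what is left.

    Integer arithmetic is used so that no floating-point rounding is involved.
    """
    train = n_images * train_pct // 100
    validation = n_images * val_pct // 100
    test = n_images - train - validation
    return {"train": train, "validation": validation, "test": test}


for name, total in ORIGINAL_COUNTS.items():
    print(name, split_counts(total))
# benign {'train': 1736, 'validation': 372, 'test': 372}
# malignant {'train': 3800, 'validation': 814, 'test': 815}
```

Under this assumption the validation and test counts agree with Table 1, and the training subsets would contain 1736 benign and 3800 malignant images before augmentation raised them to the 7984 and 8000 images listed in the table.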

Method

The research methodology employed in this study encompassed a systematic process aimed at enhancing classification performance. Initially, raw data was collected and preprocessed to create a refined dataset. This processed dataset was then partitioned into training, test, and validation sets, with augmentation techniques applied to the training set to improve model generalization.

Our proposed approach, the DRAI architecture, integrates attention mechanisms into DN and RN models. Specifically, channel, soft, and squeeze-and-excitation attention mechanisms were incorporated to augment feature extraction and classification performance. Various pre-trained models including DenseNet-121, DenseNet-169, DenseNet-201, ResNet-50, ResNet-101, and ResNet-152 were utilized in this architecture.

Histopathological images, such as those in the “BreakHis” dataset, require models capable of capturing fine-grained features across varying resolutions. DN’s dense connections facilitate efficient feature reuse and gradient flow, effectively integrating low- and high-level features while reducing the risk of overfitting. ResNet’s skip connections resolve the vanishing gradient problem, enabling deeper architectures to capture hierarchical features essential for distinguishing between cancerous and non-cancerous tissues.
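
The difference between the two connection patterns can be illustrated with a toy example on plain lists. This is only a sketch of the general idea: the helper names and the two stand-in "layers" are hypothetical, and real DenseNet and ResNet layers are convolutions rather than the simple functions used here.

```python
def residual_step(x, transform):
    """ResNet-style skip connection: add the input back to the transformed signal."""
    return [a + b for a, b in zip(x, transform(x))]


def dense_step(features, new_features):
    """DenseNet-style connection: append new features to everything computed so far."""
    return features + new_features(features)


def halve(x):
    """Hypothetical stand-in for a layer that keeps the feature width."""
    return [0.5 * v for v in x]


def summarize(features):
    """Hypothetical stand-in for a layer that emits one new feature."""
    return [sum(features)]


x = [1.0, 2.0]
print(residual_step(x, halve))                          # [1.5, 3.0]
print(dense_step(dense_step(x, summarize), summarize))  # [1.0, 2.0, 3.0, 6.0]
```

The residual step keeps the width fixed and gives the input a direct additive path to the output, while each dense step widens the feature list so that later layers see all earlier features.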

Other models, such as VGG and Inception, were not chosen due to their limitations and suboptimal performance on the BreakHis dataset. VGG’s high parameter count increases the risk of overfitting on small datasets, while Inception’s complex architecture struggles to generalize effectively for texture-rich datasets like “BreakHis.” In contrast, DenseNet and ResNet demonstrated superior performance, and their hybrid architecture combines DN’s feature reuse with RN’s depth and robustness, offering an optimal balance of feature extraction and computational efficiency, making it particularly well-suited for histopathological image analysis using augmentation techniques.

Ensembling techniques were systematically employed across three layers to amalgamate predictions from various models, employing majority voting, softmax averaging, and weight averaging. Initially, independent ensembles of DN and RN models were constructed in the first layer. Subsequently, outputs from the initial layer were amalgamated and integrated with predictions from both DN and RN models in the subsequent layer. Finally, in the third layer, results from the preceding layer were consolidated utilizing majority voting, softmax averaging, and weight averaging techniques, culminating in the triple level ensemble (TLE) strategy, which produced the ultimate output classification.
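
The three combination rules named above can be sketched in a few lines of plain Python. The rules themselves (majority voting, averaging of softmax outputs, and averaging with per-model weights) are standard. How they are wired across the three levels in `triple_level_ensemble` is an assumption made for illustration, as are the function names, the uniform default weights, and the toy probabilities. The article's exact wiring is not specified in this section.

```python
from collections import Counter


def argmax(probs):
    """Index of the largest probability (lowest index wins ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def majority_vote(prob_sets):
    """Each model votes for its arg-max class; ties go to the lowest class index."""
    counts = Counter(argmax(p) for p in prob_sets)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def softmax_average(prob_sets):
    """Unweighted mean of the models' softmax vectors."""
    n = len(prob_sets)
    return [sum(p[c] for p in prob_sets) / n for c in range(len(prob_sets[0]))]


def weighted_average(prob_sets, weights):
    """Mean of softmax vectors using per-model weights (hypothetical, e.g. validation accuracy)."""
    total = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
        for c in range(len(prob_sets[0]))
    ]


def triple_level_ensemble(dn_probs, rn_probs, weights=None):
    # Level 1 (assumed): one averaged ensemble per backbone family.
    level1 = [softmax_average(dn_probs), softmax_average(rn_probs)]
    # Level 2 (assumed): pool level-1 outputs with the individual model predictions.
    pool = level1 + list(dn_probs) + list(rn_probs)
    weights = weights or [1.0] * len(pool)
    # Level 3 (assumed): apply all three rules to the pool and vote over their decisions.
    decisions = [
        majority_vote(pool),
        argmax(softmax_average(pool)),
        argmax(weighted_average(pool, weights)),
    ]
    return Counter(decisions).most_common(1)[0][0]


# Made-up softmax outputs for one image: [P(benign), P(malignant)].
dn = [[0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]
rn = [[0.1, 0.9], [0.6, 0.4], [0.3, 0.7]]
print(triple_level_ensemble(dn, rn))  # 1, i.e. malignant in this toy encoding
```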

To assess the effectiveness of our proposed methodology, comprehensive testing and validation processes were conducted. The combined TLE output classification served as the outcome of the model, representing a culmination of the integrated attention mechanisms and ensemble techniques employed in this study. This innovative approach was not only aimed at maximizing classification accuracy but also held promise for advancing the frontiers of medical image analysis, paving the way for more precise and reliable diagnostic tools in clinical practice. The sequential workflow is depicted in Figure 3.

Figure 3.

Sequential workflow of the research methodology.

Data preprocessing

Preprocessing played a vital role in both data and feature selection processes.⁶²,⁶³ In this study, the BreakHis dataset serves as the primary source of data and is preprocessed to make it suitable for training a DL model. The preprocessing steps are as follows:

Dataset partitioning: Initially, the original dataset undergoes partitioning into three distinct subsets: training, testing, and validation sets. This partitioning adheres to a predetermined ratio, allocating 70% of the dataset for training while earmarking 15% each for testing and validation. Such allocation ensures a balanced distribution of data for model training, performance evaluation, and generalization assessment. The split-folders library is adeptly employed to facilitate this partitioning process while preserving the inherent class distribution within the dataset.

Data augmentation: To imbue the training dataset with greater diversity and resilience, data augmentation methodologies are judiciously applied to the training images. The following data augmentation techniques were applied:

Rotation (up to 180°): Rotates images to introduce different orientations.

Width/height shift (0.1): Shifts images slightly to simulate position variation.

Zoom (0.1): Zooms in or out to create scale variations.

Horizontal/vertical flip: Flips images to mirror them in different directions.

Fill mode (nearest): Fills in gaps caused by transformations using the nearest pixels.

These augmentation techniques help balance the dataset by generating additional samples for both benign and malignant classes, addressing the class imbalance issue. By increasing the diversity of the training data, the model is better equipped to generalize to new, unseen images, reducing overfitting. This makes it more effective in recognizing subtle patterns across different tumor forms, thereby improving performance in breast cancer classification.

Image standardization: Before commencing the training process, all images within the dataset are uniformly resized to adhere to a standardized dimension of $224 \times 224$ pixels. This standardization of image size fosters uniformity in input dimensions across the dataset, thereby facilitating seamless integration with DL models predicated on fixed input dimensions.

Data normalization: Integral to the preprocessing pipeline is the normalization of pixel values within the images. This normalization process ensures that pixel values are constrained within a specific range, thereby engendering numerical stability throughout the training process. In the context of this study, normalization is orchestrated through the preprocessing_function parameter inherent within the ImageDataGenerator class, leveraging the prescribed preprocessing function intrinsic to the DN architecture.

Data batching: After preprocessing, the dataset is methodically organized into discrete batches, with each batch comprising a predetermined number of images, set at 16 images per batch. This batching methodology serves as a foundational component of the training regimen, affording operational efficiency and optimal memory utilization.

Through these preprocessing steps, the dataset is prepared for subsequent training of the DL model. The resulting preprocessed data offers greater diversity, standardization, and compatibility, laying a robust foundation for the subsequent stages of model training and evaluation.
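
The augmentation, resizing, normalization, and batching steps above map onto the Keras `ImageDataGenerator` class that the text mentions. The following is a minimal sketch under several assumptions:

- The directory arguments, the `class_mode` value, and the function name are hypothetical.
- The sketch applies the transforms on the fly, whereas the augmented dataset described earlier was generated and saved ahead of training.
- `ImageDataGenerator` is marked as deprecated in recent TensorFlow releases, so the import path may need adjusting.

```python
# Values taken from the preprocessing description in the text.
AUGMENTATION = dict(
    rotation_range=180,        # rotation up to 180 degrees
    width_shift_range=0.1,     # horizontal shift, fraction of width
    height_shift_range=0.1,    # vertical shift, fraction of height
    zoom_range=0.1,            # zoom in or out by up to 10%
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode="nearest",       # fill gaps with the nearest pixels
)
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 16


def make_generators(train_dir, eval_dir):
    """Build an augmented training iterator and a plain evaluation iterator.

    TensorFlow is imported lazily so the constants above can be inspected
    without it being installed.
    """
    from tensorflow.keras.applications.densenet import preprocess_input
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Normalization is delegated to the DenseNet preprocessing function.
    train_gen = ImageDataGenerator(preprocessing_function=preprocess_input, **AUGMENTATION)
    eval_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

    train_flow = train_gen.flow_from_directory(
        train_dir,
        target_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="categorical",  # assumption: one-hot labels for a softmax output
    )
    eval_flow = eval_gen.flow_from_directory(
        eval_dir,
        target_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="categorical",
        shuffle=False,             # keep a fixed order for evaluation
    )
    return train_flow, eval_flow
```

Only the training iterator receives the augmentation settings, which matches the statement that augmentation was applied to the training set.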

Proposed DRAI architecture

The proposed architecture integrated two renowned pre-trained models, DN and RN, each offering three distinct variants: DN-121, DN-169, DN-201 for DN, and RN-50, RN-101, and RN-152 for RN. However, as these models were primarily designed for ImageNet datasets, challenges arose in adapting them to diverse datasets due to inherent design biases. They were augmented with additional custom convolutional layers to mitigate these limitations and tailor the models to our dataset.

The architecture comprised two primary blocks of convolutional layers designed to extract and refine features from breast cancer images. Initially, the input tensor, shaped as (None, height, width, channels), underwent a series of four convolutional layers. Each layer used 128 filters, with kernel sizes of $7 \times 7$, $5 \times 5$, $3 \times 3$, and $1 \times 1$, respectively, to capture a range of spatial details. Batch normalization was applied after each convolution to stabilize and accelerate training by normalizing activations. Max-pooling layers were used to reduce the spatial dimensions, summarizing essential features and decreasing computational complexity. An attention module was incorporated to enhance the model’s focus on the most relevant parts of the image, further refining feature extraction.

The second block was designed to mirror the first, with four additional convolutional layers using 256 filters of the same kernel sizes, allowing for the extraction of more complex and high-level features. Similar to the first block, batch normalization, max-pooling, and attention modules were applied sequentially, ensuring that the model was able to learn from the most significant features while maintaining stability and efficiency.

The incorporation of attention modules was motivated by the necessity to enhance the model’s capacity to focus on relevant features within the data. These modules facilitated the reduction of redundant features, thus improving the efficiency of feature extraction. By selectively attending to informative features, the attention mechanisms aided in streamlining the learning process and enhancing the model’s generalization capabilities across diverse datasets.

Three distinct mechanisms, namely channel attention (CA), soft attention (SA), and squeeze-and-excitation attention (SEA), were integrated into the attention module to achieve these objectives. CA allowed the model to adaptively recalibrate channel-wise feature responses, emphasizing informative channels while suppressing irrelevant ones. More details about CA can be found in the “Channel attention” part. SA mechanisms enabled the model to dynamically attend to different regions of the input, effectively capturing spatial dependencies and enhancing feature representation. Further information about SA is provided in the “Soft attention” part. Meanwhile, the SEA mechanism helped the model learn to selectively emphasize important features by recalibrating channel-wise feature responses. Additional insights into the SEA mechanism can be found in the “Squeeze-and-excitation attention” part.

Activation functions,⁶⁴ crucial for handling nonlinear problems, were employed across all convolutional layers. In the proposed architecture, rectified linear unit (ReLU) activation functions were utilized to mitigate the vanishing gradient problem and enhance feature extraction capabilities.

The outputs from the second block were flattened and passed through three dense layers housing 1024, 512, and two neurons, respectively. The initial two fully connected layers employed ReLU activation to capture complex features, while the final layer utilized sigmoid activation for class probability representation. Figure 4 depicts a detailed architectural overview of the proposed CCNN model. The interaction between these layers ensures a comprehensive analysis of the input data, facilitating improved classification accuracy in breast cancer detection.

Figure 4.

Proposed dense-ResNet attention integration (DRAI) architecture.

Feature extraction process

We implemented a TL approach using the DN and RN models, which were pre-trained on ImageNet. The top fully connected layers were excluded (include_top $=$ False), and global average pooling (pooling $=$ “avg”) was applied. The extracted features were reshaped into a (16, 16, 8) tensor before passing through a sequence of convolutional layers with filters of sizes $7 \times 7$ , $5 \times 5$ , $3 \times 3$ , and $1 \times 1$ . Each convolution was followed by ReLU activation and batch normalization to improve training stability. Max-pooling layers were included to reduce spatial dimensions and emphasize key features. The resulting feature maps were then flattened and passed through fully connected layers with 1024 and 512 neurons, respectively, utilizing ReLU activations for capturing high-level representations. Finally, a dense output layer with sigmoid activation was employed to generate class probabilities.
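As a rough check on the size of this convolutional head, the sketch below counts the trainable convolution weights for the four kernel sizes. It assumes the four convolutions are stacked sequentially with 128 filters each (the filter count given for the first block of the architecture) and that the first layer sees the 8 channels of the reshaped tensor; batch-normalization and dense-layer parameters are left out. The helper name `conv2d_params` is illustrative only.

```python
def conv2d_params(kernel, in_channels, filters, bias=True):
    """Trainable parameters of one 2-D convolution: k*k*C_in*C_out (+ C_out biases)."""
    return kernel * kernel * in_channels * filters + (filters if bias else 0)

# The reshaped feature tensor (16, 16, 8) holds 2048 values per image.
feature_shape = (16, 16, 8)
n_features = feature_shape[0] * feature_shape[1] * feature_shape[2]

# Assumption: sequential stack, 128 filters per layer, kernels 7, 5, 3, 1.
kernels = [7, 5, 3, 1]
filters = 128
in_channels = feature_shape[-1]

per_layer = []
for k in kernels:
    per_layer.append(conv2d_params(k, in_channels, filters))
    in_channels = filters  # the output of one layer feeds the next

total_conv_params = sum(per_layer)
print(per_layer, total_conv_params)
```

Under these assumptions the $5 \times 5$ layer dominates the count, because it is the largest kernel applied to a 128-channel input.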

Figure 5 presents the feature map activations across different layers of the TL model. Each row corresponds to activations from a unique layer in the network:

Input layer (input_1): Displays the preprocessed input image, representing raw pixel data.

Zero padding (zero_padding2d): Shows feature maps after padding, preparing the input for convolution operations.

Convolution (conv2d): Highlights the activation maps after applying 64 convolutional filters, capturing patterns like edges and textures.

Batch normalization (batch_normalization): Normalized activations to improve training stability and ensure convergence.

ReLU activation (activation): Outputs after introducing nonlinearity through the ReLU function.

Max pooling (max_pooling2d): Downsampled feature maps, reducing spatial resolution while retaining important features.

Figure 5.

Feature extraction after activation of each layer (one image as example).

Each subplot visualizes up to five filters per layer, rendered with the “viridis” colormap for better clarity. The figure demonstrates how the model progressively processes and transforms input data through successive layers, extracting hierarchical features critical for classification. The dense layer activations provide class probabilities via the sigmoid function. While the example focuses on a single sample and a subset of layers, this methodology allowed the generation of thousands of feature maps, significantly improving algorithm performance.

Attention mechanisms

In our study, different combinations of three attention blocks were leveraged to underscore the importance of selectively emphasizing pertinent features in the input. Each attention mechanism is crucial as it extracts information from various regions of the input, as illustrated by the gradient-weighted class activation map (Grad-CAM) images.

Channel attention: CA refers to a neural network component that learns to allocate weights to individual channels within a feature map. This process prioritizes crucial channels by assigning them higher weights based on their importance.⁶⁵ The mechanism is defined by the following equation:

$$w_{c} = \sigma (W_{2}\, \delta (W_{1} x)) \tag{1}$$

where $x$ is the input feature map ( $C \times H \times W$ ), $W_{1}$ and $W_{2}$ are weight matrices, $\delta$ is the ReLU activation function, $\sigma$ is the sigmoid activation function, and $w_{c}$ represents the calculated attention weights.⁶⁶

$$y_{c} = w_{c} \odot x \tag{2}$$

where $\odot$ represents the element-wise multiplication.
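Equations (1) and (2) can be sketched in NumPy as follows. This is one possible reading, not the authors' implementation: it assumes $W_{1}$ and $W_{2}$ act along the channel axis at every spatial position (equivalent to two $1 \times 1$ convolutions), and the hidden width of 2 and the random weights are placeholders.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C_hidden, C); w2: (C, C_hidden)."""
    # delta(W1 x): mix channels at each position, then ReLU.
    hidden = np.maximum(0.0, np.einsum("kc,chw->khw", w1, x))
    # sigma(W2 ...): map back to C channels and squash to [0, 1].
    weights = sigmoid(np.einsum("ck,khw->chw", w2, hidden))
    # Equation (2): element-wise reweighting of the input.
    return weights * x, weights

rng = np.random.default_rng(0)
C, H, W, hidden_dim = 8, 4, 4, 2          # hypothetical sizes
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(hidden_dim, C))
w2 = rng.normal(size=(C, hidden_dim))
y, w_c = channel_attention(x, w1, w2)
print(y.shape, float(w_c.min()), float(w_c.max()))
```

Because the weights lie between 0 and 1, the output never exceeds the input in magnitude; the module can only attenuate responses, which is how less informative channels are suppressed.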

Figure 6 illustrates the region visualization for CA. The first image shows the original input from the dataset, while the second image shows the Grad-CAM visualization, highlighting the regions that the CA mechanism focuses on for feature extraction.

Figure 6.

Channel attention: (a) original image from the dataset, (b) Grad-CAM image highlighting the regions focused on by the CA mechanism. Grad-CAM: gradient-weighted class activation map; CA: channel attention.

Soft attention: SA attentively concentrates on specific input elements by assigning weights.⁶⁷ This mechanism can be mathematically represented as follows:

$$\alpha_{i} = \frac{\exp (e_{i})}{\sum_{j = 1}^{T} \exp (e_{j})} \tag{3}$$

where $\alpha_{i}$ is the attention weight, $T$ is the length of the input sequence, and $e_{i}$ is a scalar score.⁶⁸
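A minimal NumPy sketch of equation (3) is given below. The softmax itself follows the equation; the weighted sum of feature vectors in `attend` is an assumption about how the weights are used downstream, and the scores and features are made-up values.

```python
import numpy as np

def soft_attention_weights(scores):
    """Equation (3): softmax over the scalar scores e_1..e_T."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()   # subtracting the max leaves the softmax unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()

def attend(features, scores):
    """Weighted sum of T feature vectors; features has shape (T, D)."""
    alpha = soft_attention_weights(scores)
    return alpha @ np.asarray(features, dtype=float), alpha

features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # hypothetical inputs
context, alpha = attend(features, scores=[2.0, 0.0, 2.0])
print(alpha, context)
```

Elements with equal scores receive equal weight, and the low-scoring element still contributes a small amount, which is what distinguishes soft attention from a hard selection.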

Figure 7 illustrates the region visualization for SA. The first image shows the original input from the dataset, while the second image shows the Grad-CAM visualization, highlighting the regions that the SA mechanism focuses on for feature extraction.

Figure 7.

Soft attention: (a) original image from the dataset, (b) Grad-CAM image highlighting the regions focused on by the SA mechanism. Grad-CAM: gradient-weighted class activation map; SA: soft attention.

Squeeze-and-excitation attention: The SEA module comprises a two-step process: a squeeze operation and an excitation operation.⁶⁹ The operations are defined as follows:

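$$z = \mathrm{GlobalAveragePooling} (x) \tag{4}$$

$$s = \sigma (W_{2}\, \mathrm{ReLU} (W_{1} z)) \tag{5}$$

$$y = s \odot x \tag{6}$$

where $W_{1} \in \mathbb{R}^{C / r \times C}$ and $W_{2} \in \mathbb{R}^{C \times C / r}$ are the weight matrices, $\sigma$ is the sigmoid activation function, $r$ is the reduction ratio, and $\odot$ represents the element-wise multiplication.⁷⁰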

Hu et al.⁷⁰ proposed a “Squeeze-and-Excitation (SE)” network focusing on channel-wise attention. The SE module, depicted by the solid line box in Figure 8, comprises three components: squeezing, excitation, and scaling. Squeezing involves global average pooling (GAP) to produce a $1 \times 1 \times c$ vector from the $H \times W$ convolution feature map. Excitation entails a bottleneck structure with two fully connected layers to capture channel correlations. This compression-reconstruction process reduces model parameters and complexity between channels. Finally, the sigmoid function normalizes the weights to a range of 0 to 1, and these weights are applied to the features of each channel through scaling.
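The three components described above (squeezing, excitation, and scaling) can be sketched as a small NumPy function. This is an illustration under assumptions: bias terms are omitted, the channel count 16 and reduction ratio 4 are arbitrary, and the weights are random rather than learned.

```python
import numpy as np

def se_block(x, w1, w2):
    """x: (H, W, C); w1: (C // r, C); w2: (C, C // r)."""
    z = x.mean(axis=(0, 1))                     # squeeze: global average pooling -> (C,)
    hidden = np.maximum(0.0, w1 @ z)            # excitation, bottleneck layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # excitation, expansion layer + sigmoid
    return x * s, s                             # scaling, broadcast over H and W

rng = np.random.default_rng(0)
H, W, C, r = 7, 7, 16, 4                        # hypothetical sizes
x = rng.normal(size=(H, W, C))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
y, s = se_block(x, w1, w2)

n_weights = w1.size + w2.size                   # 2 * C * C / r
print(y.shape, s.shape, n_weights)
```

The bottleneck needs $2C^{2}/r$ weights instead of the $C^{2}$ of a single full layer, so it is cheaper whenever $r > 2$.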

Figure 8.

Squeeze and excitation block.

Figure 9 illustrates the region visualization for SEA. The first image shows the original input from the dataset, while the second image shows the Grad-CAM visualization, highlighting the regions that the SEA mechanism focuses on for feature extraction.

Figure 9.

Squeeze-and-excitation attention: (a) original image from the dataset, and (b) Grad-CAM image highlighting the regions focused on by the SEA mechanism. Grad-CAM: gradient-weighted class activation map; SEA: squeeze-and-excitation attention.

Triple-level ensemble

In this research, the pre-trained DN and RN models were integrated across three hierarchical levels using weighted averaging (WAvg), softmax averaging (SAvg), and majority voting (MV). This process produced the final dense-ResNet attention integration (DRAI) model.

In the first layer, each variant of DN and RN was paired with various attention modules and merged into a unified variant of DN or RN. For instance, DN-121 CCNN, channel attention CNN (CACNN), soft attention CNN (SACNN), and squeeze and excitation attention neural network (SEACNN) versions were combined into a single DN-121 version using majority voting, softmax averaging, and weighted averaging techniques. This procedure was replicated for every variant of DN and RN.

In the second layer, the outputs from the first layer of DN were aggregated, and the same procedure was applied to RN. For example, the outputs from DN-121 (MV, SAvg, and WAvg), DN-169 (MV, SAvg, and WAvg), and DN-201 (MV, SAvg, and WAvg) obtained from the first layer were merged into DN (WAvg), and similarly for all RN versions into RN (WAvg).

In the third layer, the outputs from the second layer were combined: DN and RN were merged to produce the final ensemble result, which is the model’s final prediction. This ensemble technique is referred to as the triple-level ensemble (TLE) and is illustrated in Figure 10.

Figure 10.

Triple-level ensemble (TLE) approach.
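The three combination rules and the three-level hierarchy can be sketched as follows. Several points are assumptions rather than details taken from the study: SAvg is interpreted as an unweighted mean of the class-probability vectors, the weights passed to WAvg are placeholders, ties in MV fall to the lowest class index, and the probabilities are random stand-ins for real model outputs. Only WAvg is used inside `triple_level_ensemble` to keep the sketch short.

```python
import numpy as np

def majority_vote(prob_list):
    """Hard voting: each model votes for its arg-max class."""
    n_samples, n_classes = prob_list[0].shape
    counts = np.zeros((n_samples, n_classes))
    rows = np.arange(n_samples)
    for probs in prob_list:
        counts[rows, probs.argmax(axis=1)] += 1
    # Ties resolve to the lowest class index (an arbitrary choice in this sketch).
    return counts.argmax(axis=1)

def soft_average(prob_list):
    """Unweighted mean of the class-probability vectors."""
    return np.mean(np.stack(prob_list), axis=0)

def weighted_average(prob_list, weights):
    """Mean of the probability vectors with normalized per-model weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(prob_list), axes=1)

def triple_level_ensemble(families, variant_weights):
    """families: {family: {backbone: [probabilities of the 4 attention variants]}}."""
    family_outputs = []
    for backbones in families.values():
        # Level 1: merge the attention variants of each backbone.
        level1 = [weighted_average(v, variant_weights) for v in backbones.values()]
        # Level 2: merge the backbones of one family (equal weights assumed).
        family_outputs.append(weighted_average(level1, np.ones(len(level1))))
    # Level 3: merge the DenseNet and ResNet family outputs.
    return weighted_average(family_outputs, np.ones(len(family_outputs)))

# Hypothetical outputs: 5 images, 2 classes, 4 variants per backbone.
rng = np.random.default_rng(0)

def fake_probs():
    return rng.dirichlet([1.0, 1.0], size=5)

families = {
    "DN": {n: [fake_probs() for _ in range(4)] for n in ("DN-121", "DN-169", "DN-201")},
    "RN": {n: [fake_probs() for _ in range(4)] for n in ("RN-50", "RN-101", "RN-152")},
}
final_probs = triple_level_ensemble(families, variant_weights=[1.0, 1.0, 1.0, 1.0])
final_labels = final_probs.argmax(axis=1)
print(final_probs.round(3), final_labels)
```

With equal weights, WAvg reduces to the unweighted mean, so any gain from WAvg over SAvg has to come from how the weights are chosen.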

Justification of our proposed architecture

The core aim of this research was to develop an ensemble-based architecture tailored for precise breast cancer diagnosis and classification. The strategic use of kernels with diverse sizes within the model was considered an effective approach, enabling the capture of crucial details present in sample images, even in instances where certain areas were small or less distinct. This methodology ensured the preservation of vital information, facilitating the model’s ability to identify a wide spectrum of features. Smaller kernels, such as ( $1 \times 1$ ) and ( $3 \times 3$ ), were found to exhibit proficiency in capturing intricate details, encompassing edges, vertices, and texture within the images. Not only did they adeptly capture fine details, but they also aided in reducing computational overheads and weight sharing, thereby minimizing the number of back-propagation weights. Conversely, larger filter sizes like ( $5 \times 5$ ) and ( $7 \times 7$ ) were observed to possess a broader receptive field, enabling the identification of broader and higher-level characteristics in breast tumor images.

The integration of batch normalization after each convolutional layer served to counteract internal covariate shifts, thereby ensuring proper centering and normalization of input data. This normalization facilitated faster and more robust training by standardizing the inputs and enhancing the network’s adaptability. The inclusion of learnable parameters for scaling and shifting within batch normalization contributed to the network’s flexibility and assisted in mitigating overfitting. Max-pooling was justified in this context as it aided in dimensionality reduction by capturing the most salient features, allowing the model to focus on essential elements while discarding less critical information. This selective downsampling enhanced computational efficiency and prevented overfitting by promoting translation invariance.

Adam optimization, known for its adaptive learning rates, iteratively adjusted the network’s weights during training to minimize the disparity between predicted and actual values. By amalgamating the benefits of AdaGrad and RMSprop, Adam optimally balanced the advantages of different optimization algorithms, enhancing the model’s convergence speed and performance. Furthermore, the incorporation of SA, CA, and SEA mechanisms into the breast tumor CCNN model significantly amplified overall performance. These mechanisms bolstered feature extraction, aided in cancer cell localization, and facilitated contextual understanding, thereby augmenting accuracy and robustness in classification tasks.

Performance evaluation measures

To evaluate the models’ effectiveness, various metrics have been utilized, encompassing accuracy, precision, recall (sensitivity), F1-score, and specificity. The mathematical expressions defining these measures are provided below for clarity and reference.

$$\mathrm{Accuracy\ (Acc)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$\mathrm{Precision\ (Pre)} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{Recall\ (Re)} = \frac{TP}{TP + FN} \tag{9}$$

$$\mathrm{F1\text{-}Score\ (F1)} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{10}$$

$$\mathrm{Specificity\ (Spe)} = \frac{TN}{TN + FP} \tag{11}$$
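Equations (7) to (11) translate directly into code. The sketch below is illustrative: the function name and the example counts are invented, and no guard against empty denominators is included.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (7)-(11) from the four confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }

# Made-up counts, chosen only to make the arithmetic easy to follow.
m = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For these counts the F1-score (about 0.857) lies between precision (about 0.818) and recall (0.900), as expected of a harmonic mean.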

Experimental setup

The architectural model was fully executed on a Kaggle notebook utilizing an NVIDIA TESLA P100 GPU operating at a frequency of 1327 MHz. After obtaining distinct images with an input size of (224, 224, 3), the dataset was partitioned into three distinct subsets: training set (70%), validation set (15%), and test set (15%). Throughout the training phase, a batch size of 16 was employed, in conjunction with the Adam optimizer set at a learning rate of 0.001 and an epsilon value of 0.1.
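The 70/15/15 partition and the batch size of 16 can be illustrated with the short sketch below. It assumes a plain random shuffle (whether the original split was stratified is not stated), and the sample count of 1000 is a placeholder rather than the size of the dataset.

```python
import math
import numpy as np

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle the indices and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def n_batches(n_items, batch_size=16):
    """Number of steps per epoch; the last batch may be smaller."""
    return math.ceil(n_items / batch_size)

train_idx, val_idx, test_idx = split_indices(1000)   # 1000 is a placeholder size
steps_per_epoch = n_batches(len(train_idx))
print(len(train_idx), len(val_idx), len(test_idx), steps_per_epoch)
```

With 700 training images and a batch size of 16, an epoch takes 44 steps, the last of which holds only 12 images.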

Table 2 provides a comprehensive overview of the model complexities, including the number of parameters for each model, the total number of epochs, and the total time taken for training. Each model utilized the categorical cross-entropy loss function and was designed to be trained for a duration of 50 epochs. However, due to the implementation of an early stopping mechanism, some models were trained for fewer than 50 epochs. This mechanism was introduced to halt training when validation accuracy did not exhibit improvement over a specified number of consecutive epochs.
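The early stopping rule described above can be sketched in a few lines. The patience value is an assumption, since the text only says "a specified number" of epochs, and the validation accuracies are invented.

```python
def early_stopping_epoch(val_accuracies, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training stops.

    Training halts once validation accuracy has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best + min_delta:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_accuracies)   # ran the full schedule

# Hypothetical validation curve; patience=3 is an assumed setting.
history = [0.90, 0.93, 0.95, 0.95, 0.94, 0.95, 0.94]
stopped_at = early_stopping_epoch(history, patience=3)
print(stopped_at)
```

This is why the epoch counts in Table 2 differ between models: a run whose validation accuracy keeps improving uses all 50 epochs, while one that plateaus stops early.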

Table 2.

Model complexities and time.

Model	Attention variant	Number of parameters	Total number of epochs	Total time taken
DN-121	CCNN	16,177,538	50	2.3 hours
	CACNN	16,220,130	42	2.55 hours
	SACNN	22,032,802	37	2 hours
	SEACNN	23,064,580	39	1.83 hours
DN-169	CCNN	18,700,418	50	2.65 hours
	CACNN	18,743,010	50	3 hours
	SACNN	21,409,954	50	3.13 hours
	SEACNN	25,587,460	32	1.7 hours
DN-201	CCNN	24,333,954	46	3.14 hours
	CACNN	24,376,546	44	3.26 hours
	SACNN	27,043,490	41	2.95 hours
	SEACNN	31,220,996	38	2.47 hours
RN-50	CCNN	32,783,362	43	1.73 hours
	CACNN	32,825,954	42	2 hours
	SACNN	38,638,626	32	1.43 hours
	SEACNN	39,670,404	50	2.3 hours
RN-101	CCNN	51,801,602	48	2.66 hours
	CACNN	51,844,194	43	2.9 hours
	SACNN	57,656,866	34	2 hours
	SEACNN	58,688,644	46	2.73 hours
RN-152	CCNN	67,468,290	37	2.78 hours
	CACNN	67,510,882	50	4.3 hours
	SACNN	74,355,332	40	3.21 hours
	SEACNN	74,355,332	42	3.52 hours

DN: DenseNet; RN: ResNet; CCNN: custom convolutional neural network; CACNN: channel attention convolutional neural network; SACNN: soft attention convolutional neural network; SEACNN: squeeze and excitation attention convolutional neural network.

The time taken for training varied across models and is reported in the table. This time reflects the total duration required for each model to complete its training across the specified epochs, illustrating the relationship between model complexity and computational time. The detailed analysis provided in the table allows for a better understanding of how varying model architectures impact both the number of parameters and the time required for training, thereby contributing to the optimization of computational resources in DL applications.

The time complexity of the individual DN and RN models varies before the blending process. Table 3 shows the time complexities before and after blending. The DN models (DN121, DN169, and DN201) have time complexities ranging from 160 to 226 ms/step, while the RN models (RN50, RN101, and RN152) range from 148 to 273 ms/step. These values reflect the computational demands of each model when evaluated individually.

Table 3.

Comparison of time complexity for models before and after blending.

Phase	Model name	Time complexity (ms/step)
Before blending	DN121	160
	DN169	193
	DN201	226
	RN50	148
	RN101	202
	RN152	273
After blending	Ensembled model	285

DN: DenseNet; RN: ResNet.

After blending, using an ensembling technique that combines the outputs of these models, the time complexity increases to 285 ms/step for the ensembled model. This increase is expected, as blending typically adds extra computational overhead due to the aggregation of predictions from multiple models. However, the improvement in accuracy or robustness gained from the ensemble may justify this additional time cost.

Results

This section presents a comprehensive analysis of the classification performance of the proposed model, substantiated by both numerical evidence and graphical representations. We provide detailed results and insights into the performance of different models and their ensemble combinations. The model comprises two main parts: the first part focuses on training the primary model using various pre-trained DN and RN architectures, enhanced through fine-tuning, customization, and the application of attention mechanisms to improve performance. The second part employs an ensembling approach to combine and generalize the results from the first part, thereby increasing robustness and overall improving accuracy, efficiency, and performance in the classification task. The proposed model utilizes a sophisticated three-level ensemble procedure to enhance classification accuracy. In the initial ensemble stage, three techniques—WAvg, SAvg, and MV—were employed. The outputs from these initial ensemble techniques were further refined through subsequent ensemble stages, where the highest values obtained were amalgamated using WAvg to achieve the final results.

Based on the classification performance metrics presented in Table 4, it is evident that the integration of CA and SA blocks with DenseNet-121 (DN121) does not yield significant performance improvements. Specifically, the DN121-CACNN variant, incorporating CA, shows a lower accuracy (97.39%) compared to the CCNN variant without any attention blocks, which achieves an accuracy of 98.99%. Similarly, the DN121-SACNN variant, which includes SA, also underperforms relative to the DN121-CCNN with an accuracy of 98.65%. In contrast, the DN121-SEACNN variant, which incorporates SEA, demonstrates superior performance with the highest accuracy of 99.07%, surpassing all other variants. This suggests that while the CA and SA blocks do not significantly enhance performance when combined with DN121, the SEA block effectively boosts the model’s accuracy and overall performance. These insights highlight the importance of selecting appropriate attention mechanisms to maximize the efficacy of DL models in classification tasks.

Table 4.

Classification performance of DN121 variants.

Model	Acc	Pre	Re	F1	Spe
DN121-CCNN	98.99	98.99	98.99	98.99	99.10
DN121-CACNN	97.39	97.42	97.39	97.36	94.86
DN121-SACNN	98.65	98.65	98.65	98.65	98.21
DN121-SEACNN	99.07	99.08	99.07	99.07	99.28

Acc: accuracy; Pre: precision; Re: recall; F1: F1-score; Spe: specificity; DN: DenseNet; CCNN: custom convolutional neural network; CACNN: channel attention convolutional neural network; SACNN: soft attention convolutional neural network; SEACNN: squeeze and excitation attention convolutional neural network.

According to the classification performance metrics in Table 5, integrating CA improves the DenseNet-169 (DN169) model most, with DN169-CACNN achieving the highest accuracy of 99.16%, along with the highest precision, recall, F1-score, and specificity. The DN169-SACNN variant, incorporating SA, also performs well with an accuracy of 98.99%. The DN169-SEACNN variant, which includes SEA, has a lower accuracy of 98.65%, indicating that it is less effective than the other attention variants. The DN169-CCNN variant without any attention mechanism achieves an accuracy of 98.48%, which is strong but slightly lower than the CA, SA, and SEA variants. These results suggest that CA is the most effective mechanism for improving DN169 performance, followed by SA, while SEA offers the smallest benefit.

Table 5.

Classification performance of DN169 variants.

Model	Acc	Pre	Re	F1	Spe
DN169-CCNN	98.48	98.49	98.48	98.48	97.12
DN169-CACNN	99.16	99.16	99.16	99.16	99.18
DN169-SACNN	98.99	98.99	98.99	98.99	98.52
DN169-SEACNN	98.65	98.65	98.65	98.65	98.21

According to the classification performance metrics provided in Table 6, the DenseNet-201 (DN201) variants demonstrate varying levels of effectiveness across different attention mechanisms. The DN201-SACNN variant, which incorporates SA, achieves the highest accuracy of 98.99%, indicating its strong performance. The DN201-CCNN variant without any attention mechanisms also performs well, with an accuracy of 98.82%. The DN201-CACNN variant, which includes CA, shows a slightly lower accuracy of 98.48%. The DN201-SEACNN variant, incorporating SEA, has the lowest accuracy of 98.06%, suggesting it is the least effective for this model. These results indicate that SA is the most beneficial for the DN201 architecture, followed closely by the model without any attention mechanisms, while CA and SEA reduce accuracy relative to that attention-free baseline.

Table 6.

Classification performance of DN201 variants.

Model	Acc	Pre	Re	F1	Spe
DN201-CCNN	98.82	98.82	98.82	98.82	98.73
DN201-CACNN	98.48	98.51	98.48	98.49	98.72
DN201-SACNN	98.99	98.99	98.99	98.99	98.52
DN201-SEACNN	98.06	98.06	98.06	98.06	97.07

Based on the classification performance metrics outlined in Table 7, the ResNet-50 (RN50) variants demonstrate varying effects of the attention mechanisms. The RN50-SEACNN variant, incorporating SEA, achieves the highest accuracy of 98.48%. The RN50-CCNN variant, without attention mechanisms, performs almost as well with an accuracy of 98.40%. In contrast, the RN50-CACNN variant, integrating CA, exhibits a slightly lower accuracy of 98.15%, and the RN50-SACNN variant, embedding SA, records the lowest accuracy of 97.56%, suggesting its limited efficacy for this model. These findings indicate that SEA is the most effective mechanism for the RN50 architecture, with the attention-free model remaining competitive, whereas CA and SA score below the attention-free baseline in this context.

Table 7.

Classification performance of RN50 variants.

Model	Acc	Pre	Re	F1	Spe
RN50-CCNN	98.40	98.40	98.40	98.40	97.95
RN50-CACNN	98.15	98.17	98.15	98.15	98.13
RN50-SACNN	97.56	97.61	97.56	97.57	97.72
RN50-SEACNN	98.48	98.49	98.48	98.49	98.29

Acc: accuracy; Pre: precision; Re: recall; F1: F1-score; Spe: specificity; RN: ResNet; CCNN: custom convolutional neural network; CACNN: channel attention convolutional neural network; SACNN: soft attention convolutional neural network; SEACNN: squeeze and excitation attention convolutional neural network.

By examining the results in Table 8, it is evident that the RN101-SEACNN variant, which incorporates SEA, achieves the highest accuracy of 98.90%, indicating its effectiveness. Similarly, the RN101-CACNN variant, integrating CA, also demonstrates strong performance with an accuracy of 98.82%. In contrast, the RN101-SACNN variant, utilizing SA, exhibits a slightly lower accuracy of 98.40%. Notably, the RN101-CCNN variant, devoid of any attention mechanisms, records the lowest accuracy of 96.88%. These findings suggest that both SEA and CA significantly enhance the performance of the RN101 model, underscoring the importance of attention mechanisms in improving classification accuracy. Conversely, SA appears to be less effective, while the absence of attention mechanisms leads to diminished performance.

Table 8.

Classification performance of RN101 variants.

Model	Acc	Pre	Re	F1	Spe
RN101-CCNN	96.88	96.88	96.88	96.88	95.65
RN101-CACNN	98.82	98.82	98.82	98.82	98.58
RN101-SACNN	98.40	98.40	98.40	98.40	97.95
RN101-SEACNN	98.90	98.91	98.90	98.91	98.62

The classification performance metrics provided in Table 9 reveal varying effectiveness among the ResNet-152 (RN152) variants with different attention mechanisms. Notably, the RN152-CACNN variant, integrating CA, achieves the highest accuracy of 98.99%, indicating superior performance. Conversely, the RN152-SEACNN and RN152-SACNN variants, embedding SEA and SA respectively, demonstrate strong yet slightly lower accuracies of 98.32% and 98.23%, compared to the CA variant. Remarkably, the RN152-CCNN variant, devoid of attention mechanisms, records the lowest accuracy of 97.64%. These findings underscore the significance of CA in optimizing the RN152 architecture, while also highlighting the contributions of SEA and SA, albeit to a lesser extent. Moreover, the absence of attention mechanisms correlates with diminished performance, emphasizing the importance of incorporating attention mechanisms for enhanced classification accuracy.

Table 9.

Classification performance of RN152 variants.

Model	Acc	Pre	Re	F1	Spe
RN152-CCNN	97.64	97.65	97.64	97.64	97.02
RN152-CACNN	98.99	99.99	98.99	98.99	98.52
RN152-SACNN	98.23	98.26	98.23	98.24	98.32
RN152-SEACNN	98.32	98.33	98.32	98.32	98.35

The tables above illustrate the results obtained for the 24 variants of the CNN models used in this study before any ensemble techniques were applied. These tables provide a comprehensive overview of key evaluation metrics, including accuracy, precision, recall, F1-score, and specificity, for each model variant. From the performance evaluation, it is evident that different attention mechanisms yield varying degrees of effectiveness across different models. For some models, CA demonstrated superior performance, while for others, SA or SEA proved more effective. Additionally, there are instances where models without any attention mechanisms performed better. Given this variability, it is clear that relying on a single model or one specific attention mechanism is not optimal. Instead, employing an ensemble approach is justified to combine outputs from various models, thereby creating a more generalized and robust model for breast cancer classification. The ensemble approach is executed in three levels using techniques such as MV, WAvg, and SAvg.

The performance metrics in Table 10 illustrate that for all DN and RN models, the first level of ensembling using MV, SAvg, and WAvg generally results in the same or improved accuracy compared to individual model performance. Notably, WAvg matches or outperforms MV and SAvg for every model across accuracy, precision, recall, F1-score, and specificity, which is why it was used exclusively in the second and third ensemble layers. Under WAvg, the highest accuracy was achieved by the DN121 ensemble (99.58%), while the RN50 ensemble had the lowest (98.99%).

Table 10.

Performance measure indices (first ensemble).

Model	Technique	Acc	Pre	Re	F1	Spe
DN-121	MV	99.58	99.58	99.58	99.58	99.37
	SAvg	99.58	99.58	99.58	99.58	99.37
	WAvg	99.58	99.58	99.58	99.58	99.52
DN-169	MV	99.16	99.16	99.16	99.16	98.59
	SAvg	99.16	99.16	99.16	99.16	98.59
	WAvg	99.33	99.33	99.33	99.33	99.25
DN-201	MV	99.33	99.33	99.33	99.33	99.25
	SAvg	99.33	99.33	99.33	99.33	99.25
	WAvg	99.49	99.49	99.49	99.49	99.33
RN-50	MV	98.99	99.00	98.99	98.99	99.10
	SAvg	98.99	99.00	98.99	98.99	99.10
	WAvg	98.99	99.00	98.99	98.99	99.10
RN-101	MV	98.99	98.99	98.99	98.99	98.81
	SAvg	98.99	98.99	98.99	98.99	98.81
	WAvg	99.24	99.24	99.24	99.24	99.22
RN-152	MV	98.74	98.74	98.74	98.74	98.69
	SAvg	98.74	98.74	98.74	98.74	98.69
	WAvg	99.16	99.16	99.16	99.16	98.88

Acc: accuracy; Pre: precision; Re: recall; F1: F1-score; Spe: specificity; DN: DenseNet; RN: ResNet; MV: majority voting; SAvg: softmax averaging; WAvg: weighted averaging.

The performance metrics presented in Table 11 indicate the results of the second level of ensembling for DN and RN models. The DN ensemble achieves a high accuracy of 99.58%, with precision, recall, and F1-score also at 99.58% and a specificity of 99.37%. The RN ensemble shows slightly lower but still strong performance, with accuracy, precision, recall, and F1-score of 99.16% and a specificity of 99.18%. These results confirm the effectiveness of the second ensemble layer in maintaining or improving model performance, with DN variants continuing to outperform RN variants in this stage of the ensemble process.

Table 11.

Performance measure indices (second ensemble).

Model	Acc	Pre	Re	F1	Spe
DenseNet	99.58	99.58	99.58	99.58	99.37
ResNet	99.16	99.16	99.16	99.16	99.18

Acc: accuracy; Pre: precision; Re: recall; F1: F1-score; Spe: specificity.

Confusion matrix elements are essential components in evaluating the performance of classification models. True negative (TN) represents the instances where the model correctly identifies negatives, indicating the absence of a condition or event. False positive (FP) occurs when the model incorrectly identifies negatives as positives, suggesting a false alarm. Conversely, false negative (FN) denotes the instances where the model incorrectly identifies positives as negatives, indicating a missed detection. True positive (TP) reflects cases where the model correctly identifies positives, accurately detecting the presence of a condition or event. Together, these elements provide a comprehensive understanding of the model’s predictive capabilities, enabling insights into its accuracy, precision, recall, and overall performance.
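The four confusion-matrix elements can be counted from label vectors as in the sketch below. Treating label 1 as the positive class is an assumption of this example, and the label vectors are invented.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1          # correctly detected positive
            else:
                fp += 1          # false alarm
        else:
            if truth == positive:
                fn += 1          # missed detection
            else:
                tn += 1          # correctly rejected negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
swapped = confusion_counts(y_true, y_pred, positive=0)
print(counts, swapped)
```

Swapping the positive class exchanges TP with TN and FP with FN, so the recall of one class equals the specificity of the other in a binary problem.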

Figures 11 to 13 depict the confusion matrices containing TN, FP, FN, and TP values at different ensemble layers.

Figure 11.

TN, FP, FN, and TP values after layer 2 DenseNet ensemble. TN: true negative; FP: false positive; FN: false negative; TP: true positive.

Figure 12.

TN, FP, FN, and TP values after layer 2 ResNet ensemble. TN: true negative; FP: false positive; FN: false negative; TP: true positive.

Figure 13.

TN, FP, FN, and TP values after layer 3 final ensemble. TN: true negative; FP: false positive; FN: false negative; TP: true positive.

Table 12 reports the performance of the final dense-ResNet ensemble for the benign and malignant classes, with an overall accuracy, precision, recall, and F1-score of 99.58%. The consistently high per-class values indicate that the ensemble classifies instances from both classes reliably. These results underscore the dense-ResNet ensemble’s contribution to improving classification accuracy.

Table 12.

Performance measure indices for dense-ResNet (final ensemble).

Evaluation measures	Benign class	Malignant class	Overall results
Accuracy	99.19	99.75	99.58
Precision	99.46	99.63	99.58
Recall	99.19	99.75	99.58
F1-score	99.33	99.69	99.58
Specificity	99.75	99.19	99.37

The performance of our model at each ensemble layer is depicted in nine visualizations, Figures 14 to 22. The first six (Figures 14 to 19) show the ROC curves for the first-layer ensembles of the individual DN and RN backbones, each integrating the CCNN, CACNN, SACNN, and SEACNN variants. Figure 20 shows the ROC curves for the second-layer ensemble of the DN models, and Figure 21 shows those for the second-layer ensemble of the RN models. Lastly, Figure 22 illustrates the ROC curves for the final ensemble, which combines the DN and RN architectures at the third layer. Together, these curves show how well the model separates the benign and malignant classes at each evaluation stage.
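For readers less familiar with ROC analysis, the following sketch shows one common way to obtain ROC points and the area under the curve from predicted scores. It is a generic illustration, not the plotting code used for Figures 14 to 22; the helper names and the six example scores are assumptions.

```python
def roc_curve_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score so each step accepts one more sample as positive.
    order = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(order):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(order) or order[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = malignant) and predicted malignant scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(labels, scores)
print(round(auc(points), 4))
```

In this toy example, eight of the nine malignant-benign pairs are ranked correctly, so the area equals 8/9.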

Figure 14.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled DenseNet-121.

Figure 15.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled DenseNet-169.

Figure 16.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled DenseNet-201.

Figure 17.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled ResNet-50.

Figure 18.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled ResNet-101.

Figure 19.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled ResNet-152.

Figure 20.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled DenseNet.

Figure 21.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled ResNet.

Figure 22.

Area under the receiver operating characteristic curve (ROC-AUC) of ensembled DenseNet and ResNet.

The tables presented earlier vividly demonstrated significant improvements following each successive ensemble iteration. Substantial enhancements were observed not only after the initial ensemble but also following subsequent second and third-level ensemble procedures. After the final (third) ensemble, this model achieved impressive performance metrics with accuracy, precision, recall, and F1-score reaching 99.58%. Additionally, the specificity score stood at 99.37%.

Statistical analysis for performance validation

To ensure the robustness of our proposed model, we conducted a random train-test splitting procedure, wherein the dataset was randomly partitioned into training, validation, and test sets three times. Each split generated a unique configuration of the training and testing data, allowing us to evaluate the model’s performance more comprehensively. After applying the model to each of the three splits (Dataset 1 to Dataset 3 in Table 13), we recorded the accuracy for both the benign and malignant classes. The model reached 100% accuracy on two of the splits. On the second split, benign accuracy was 100% and malignant accuracy was 98.5%, giving an overall accuracy of 99.25%. This consistency across splits supports the reliability of our model in classifying breast cancer images. Table 13 presents the accuracy obtained on each split.

Table 13.

Accuracy results for randomly split datasets.

Dataset	Accuracy (benign)	Accuracy (malignant)	Overall accuracy
Dataset 1	100%	100%	100%
Dataset 2	100%	98.5%	99.25%
Dataset 3	100%	100%	100%
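The variation across the three splits can be summarized with a mean and a sample standard deviation. The short sketch below does this for the overall accuracies listed in Table 13; the variable names are illustrative, and with only three splits the standard deviation should be read as a rough indication rather than a formal test.

```python
from statistics import mean, stdev

# Overall accuracy (%) per random split, copied from Table 13.
overall_accuracy = {"Dataset 1": 100.0, "Dataset 2": 99.25, "Dataset 3": 100.0}

values = list(overall_accuracy.values())
avg = mean(values)       # arithmetic mean across splits
spread = stdev(values)   # sample standard deviation (n - 1 in the denominator)
print(f"mean = {avg:.2f}%, sample std = {spread:.2f}%")
```

For these three values the mean is 99.75% and the sample standard deviation is about 0.43 percentage points.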

Answers to the research questions

Answer to Research Question 1: Achieving a balanced class distribution in breast cancer classification necessitated data augmentation, particularly for the minority class samples. This process was crucial as it enabled the deep CNN model to discern distinctive features effectively during training, thereby enhancing its ability to generalize patterns when presented with new, unseen test data. Failure to augment minority class samples might lead the model to exhibit bias toward the majority class during testing scenarios. It is advisable to apply data augmentation exclusively to the training dataset while preserving the originality of the validation and test sets.
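The recommendation to augment only the training data can be sketched as follows: split first, then oversample the minority class inside the training portion. This is a simplified illustration under stated assumptions. The function `split_then_balance`, the image identifiers, the class counts, and the `_aug` suffix (standing in for a real transform such as a flip or rotation) are all hypothetical, and the split here is a plain random split rather than the exact procedure used in the study.

```python
import random
from collections import Counter


def split_then_balance(samples, test_fraction=0.2, seed=0):
    """samples: list of (image_id, label). Returns (balanced_train, test)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Group the training portion by class; the test set is never touched.
    by_label = {}
    for item in train:
        by_label.setdefault(item[1], []).append(item)
    target = max(len(items) for items in by_label.values())

    balanced = list(train)
    for label, items in by_label.items():
        for k in range(target - len(items)):
            source_id, _ = items[k % len(items)]
            # Placeholder for a real augmentation of the source image.
            balanced.append((f"{source_id}_aug{k}", label))
    return balanced, test


# Hypothetical imbalanced dataset: 30 benign and 70 malignant images.
samples = ([(f"b{i}", "benign") for i in range(30)]
           + [(f"m{i}", "malignant") for i in range(70)])
train, test = split_then_balance(samples)
print(Counter(label for _, label in train), len(test))
```

Because augmentation happens after the split, no augmented copy of a test image can leak into training.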

Answer to Research Question 2: The attention mechanisms employed in this study, including CA, SA, and SEA, played a pivotal role in selectively highlighting significant regions within the input data. These mechanisms offered several advantages, such as enabling precise feature extraction, adapting to varying degrees of feature importance, filtering out noise, and enhancing model interpretability. As a result, these attention mechanisms significantly contributed to augmenting the overall robustness and generalization capabilities of the proposed architecture.

Answer to Research Question 3: The study indicates that adopting a multi-level ensemble approach yields superior performance compared to employing a single algorithm CNN model. For instance, while the DN169-CACNN achieved a commendable single-algorithm accuracy of 99.16%, the proposed ensemble architecture, even without an attention mechanism, achieved an even higher accuracy of 99.58%.

Answer to Research Question 4: Multi-level ensembling surpasses single-level approaches through its iterative refinement of features. By integrating ensembling across various layers, the model selectively preserves indispensable features, ensuring that subsequent layers receive the most pertinent information for accurate classification. In contrast, confining ensembling to a single layer risks overlooking critical features, potentially compromising the model’s classification efficacy. Multi-level ensembling therefore optimizes both feature selection and propagation. The more focused feature set passed to subsequent layers enhances classification accuracy and mitigates overfitting through the introduction of abstraction layers and the use of diverse learners, which in turn improves the model’s generalization and robustness.
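The three-level structure can be illustrated with a small sketch: the attention variants of each backbone are fused first, then the backbones within each family, and finally the two families. The plain averaging used at every level and all probability values are assumptions made for illustration; the combining rule actually applied at each level may differ.

```python
from statistics import mean

# Hypothetical P(malignant) for one image from the four variants
# (CCNN, CACNN, SACNN, SEACNN) of each backbone.
variant_scores = {
    "DenseNet": {"DenseNet-121": [0.70, 0.80, 0.60, 0.90],
                 "DenseNet-169": [0.65, 0.85, 0.75, 0.95],
                 "DenseNet-201": [0.55, 0.60, 0.70, 0.75]},
    "ResNet": {"ResNet-50": [0.40, 0.60, 0.50, 0.70],
               "ResNet-101": [0.80, 0.90, 0.70, 0.80],
               "ResNet-152": [0.60, 0.70, 0.80, 0.70]},
}


def triple_level_score(scores):
    # Level 1: fuse the attention variants of each backbone.
    level1 = {family: {name: mean(values) for name, values in nets.items()}
              for family, nets in scores.items()}
    # Level 2: fuse the backbones within each family.
    level2 = {family: mean(list(nets.values()))
              for family, nets in level1.items()}
    # Level 3: fuse the DenseNet and ResNet family outputs.
    final = mean(list(level2.values()))
    return level1, level2, final


level1, level2, final = triple_level_score(variant_scores)
print(level2, round(final, 4))
```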

Ensemble techniques are noted for their advantages over single algorithms. By amalgamating predictions from multiple models, these techniques effectively mitigate the impact of outliers and help alleviate overfitting tendencies. Additionally, ensemble architectures capitalize on the strengths of diverse models, fostering improved decision-making and enhanced generalization capabilities, contributing significantly to the higher accuracy observed.

Discussion

The innovative dense-ResNet model, an amalgamation of DN and RN architectures, exhibited superior performance in breast cancer classification compared to individual pre-trained models. Table 14 presents a comprehensive comparison of the dense-RN attention integration architecture with existing models documented in the literature. This evaluation focused on overall results, providing a broader perspective on the model’s effectiveness within the breast cancer classification domain.

Table 14.

Comparison of proposed dense-ResNet model with existing models.

Article	Acc	Pre	Re	Spe	F1
Deniz et al.⁴²	91.37	–	–	–	–
Spanhol et al.⁴⁶	86.30	–	–	–	–
Sudharshan et al.⁵⁴	92.10	–	–	–	–
Gour et al.⁵⁶	92.52	–	–	–	–
Bayramoglu et al.⁷¹	82.13	–	–	–	–
Motlagh et al.⁷²	93	–	–	–	–
Majumdar et al.⁷³	99.16	–	–	–	–
Sahu et al.⁷⁴	92.60	–	–	–	–
Simonyan et al.⁷⁵	91.37	–	–	–	–
Obayya et al.⁷⁶	96.77	–	–	–	–
Ours (DRAI)	99.58	99.58	99.58	99.37	99.58

Acc: accuracy; Pre: precision; Re: recall; F1: F1-score; Spe: specificity; DRAI: dense-ResNet attention integration.

Table 14 highlights the performance of the dense-RN attention integration architecture relative to existing models in the literature. For instance, while the highest accuracy reported in prior studies reached 99.16%,⁷³ our model achieved an accuracy of 99.58%. Beyond accuracy, dense-RN attained precision, recall, specificity, and F1-score values of 99.58%, 99.58%, 99.37%, and 99.58%, respectively. Because Table 14 lists only accuracy for the prior studies, a direct comparison is possible for accuracy alone; on that metric, none of the compared methods matches the proposed architecture. These results indicate that the dense-RN attention integration architecture performs consistently across multiple metrics, underscoring its potential as a solution for breast cancer classification.

The performance of the dense-RN model can be attributed to several key factors. First, the integration of attention mechanisms enhances the model’s ability to focus on critical features within the images, thereby improving the overall classification accuracy. By utilizing CA, SA, SE blocks, and other advanced techniques, our architecture can dynamically adjust its focus based on the contextual relevance of features, which is particularly beneficial in medical imaging scenarios where subtle distinctions can significantly impact diagnostic outcomes.
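To make the channel-reweighting idea concrete, the sketch below implements a squeeze-and-excitation style gate in plain Python, following the general scheme of squeeze-and-excitation networks.⁷⁰ It is an illustrative approximation only: the function name, the omission of biases, and the tiny hand-picked weights are assumptions, and the attention blocks used in the proposed architecture may be configured differently.

```python
import math


def squeeze_excite(feature_map, w1, w2):
    """feature_map: list of C channels, each an H x W list of floats.
    w1: (C/r) x C weights, w2: C x (C/r) weights (biases omitted)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    gate = [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]
    # Scale: reweight every channel by its gate value.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gate, feature_map)]
    return gate, scaled


# Two hypothetical 2 x 2 channels and hand-picked weights.
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]     # 1 hidden unit -> 2 channel gates
gate, scaled = squeeze_excite(fmap, w1, w2)
print([round(g, 4) for g in gate])
```

With these weights the first channel is emphasized and the second is suppressed, which is the selective behavior the paragraph above describes.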

Moreover, the ensemble approach employed in our methodology further contributes to the model’s effectiveness. By systematically amalgamating predictions from various model layers through techniques such as MV, SA, and WV, we were able to enhance the robustness of the predictions. This layered ensembling strategy not only reduces the potential for overfitting but also fosters a more comprehensive decision-making process, as it leverages the strengths of different architectures.
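The sketch below contrasts three common combining rules on one image. It assumes that MV and WV denote majority voting and weighted voting and that the third rule is a plain average of predicted probabilities; these readings, the helper names, the example probabilities, and the weights are assumptions for illustration rather than the exact rules used in the study.

```python
from collections import Counter

CLASSES = ["benign", "malignant"]


def argmax(vec):
    return max(range(len(vec)), key=vec.__getitem__)


def majority_vote(probs):
    """Each model votes for its top class; the most frequent class wins."""
    votes = [CLASSES[argmax(p)] for p in probs]
    return Counter(votes).most_common(1)[0][0]


def average_probs(probs):
    """Unweighted mean of the class probabilities across models."""
    return [sum(col) / len(probs) for col in zip(*probs)]


def weighted_vote(probs, weights):
    """Weighted mean of the class probabilities across models."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total
            for col in zip(*probs)]


# Hypothetical [P(benign), P(malignant)] from three models for one image.
probs = [[0.60, 0.40], [0.55, 0.45], [0.10, 0.90]]
weights = [0.2, 0.2, 0.6]  # hypothetical, e.g. derived from validation accuracy

mv = majority_vote(probs)
avg = CLASSES[argmax(average_probs(probs))]
wv = CLASSES[argmax(weighted_vote(probs, weights))]
print(mv, avg, wv)
```

In this example majority voting selects benign, whereas both probability-based rules select malignant, because the third model is far more confident than the other two. This shows why the choice of combining rule can matter.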

The findings from our study advocate for the potential of the dense-RN attention integration architecture as a promising solution for breast cancer classification. The impressive results demonstrate not only its accuracy but also its ability to serve as a reliable diagnostic tool, paving the way for advancements in medical image analysis. The insights gleaned from this research underscore the importance of integrating modern ML techniques with clinical practice, ultimately aiming to enhance the precision and reliability of breast cancer diagnostics.

Threats to validity

While the BreakHis dataset was leveraged in this study, and the dense-RN attention integration architecture was employed to proficiently classify breast cancer instances into benign and malignant categories, several potential limitations and threats to the study’s validity warrant consideration.

Limited dataset dependency: This study’s reliance solely on the BreakHis dataset poses a threat to the model’s generalizability beyond this specific dataset. The model’s performance on alternative independent datasets is yet to be explored, potentially limiting its applicability to broader clinical contexts.

Class diversity: The classification system in use within this study focuses exclusively on benign and malignant classes. The presence of various other breast cancer subtypes necessitates consideration for a more inclusive and comprehensive analysis. The absence of representation from these additional subtypes within the dataset may affect the model’s ability to generalize across diverse breast cancer types.

To mitigate these threats, future extensions of this research should focus on augmenting the dataset with additional breast cancer subtypes and integrating diverse datasets like curated subsets from DDSM or other repositories. Moreover, exploring metadata attributes related to breast cancer lesions or patients could enhance the model’s accuracy and facilitate a more comprehensive understanding of breast cancer classification.

Conclusion and future work

In conclusion, this study presents a groundbreaking approach to the automated classification of breast cancer using the innovative DRAI architecture. Through rigorous evaluation with TLE, our model achieved an exceptional accuracy rate of 99.58%, surpassing existing methodologies and demonstrating the effectiveness of our fusion architecture in distinguishing between benign and malignant cases. By addressing the challenges inherent in traditional diagnostic methods and leveraging advanced technologies such as AI and ML, our automated classification framework offers a robust and highly accurate methodology for breast cancer diagnosis.

This study’s limitations include reliance on the BreakHis dataset, which may hinder the model’s generalizability to other datasets. Additionally, the focus on only benign and malignant classes limits the analysis, as various breast cancer subtypes are not represented, potentially affecting the model’s broader applicability.

We have shown that augmenting imbalanced datasets, customizing and fine-tuning pre-trained models, integrating attention mechanisms, and employing ensemble approaches are effective strategies for improving the accuracy and reliability of breast cancer classification models. By ensuring a balanced class distribution, optimizing feature extraction, and enhancing model generalization, our approach not only mitigates manual errors but also accelerates the diagnostic process, ultimately leading to improved patient outcomes. Future research should prioritize augmenting datasets with diverse breast cancer subtypes and integrating metadata attributes for comprehensive analysis. Cross-dataset validation on independent datasets, such as DDSM subsets, is crucial to assess model generalization. Exploring advanced architectures and techniques tailored for histopathological image analysis, including attention mechanisms and ensemble strategies, holds promise for improving model accuracy. Additionally, investigating semi-supervised learning and TL can enhance model performance in resource-constrained settings. These endeavors will advance automated breast cancer diagnosis, yielding more reliable models with improved clinical utility and broader applicability.

Footnotes

Acknowledgements

We thank the Computer Science and Engineering Department of Rajshahi University of Engineering and Technology, Bangladesh for supporting the study process.

Contributorship

Mohammad Sakif Alam: conceptualization, methodology, software, validation, data analysis, visualization, and manuscript drafting. Anwar Hossain Efat: conceptualization, methodology, data curation, investigation, validation and manuscript drafting. SM Mahedy Hasan: conceptualization, resources, investigation, manuscript review, and supervision. Md Palash Uddin: investigation, manuscript review, manuscript finalization, and supervision.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

This research is not subject to ethical approval since the research did not have participants (humans or animals).

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Guarantor

MPU.

Consent statement

In accordance with the ICMJE guidelines, we confirm that patient consent was not applicable to this study. This study did not involve human subjects, and all data were obtained from publicly available sources.

Availability of data and materials

The dataset utilized in this study is available from the sources listed in references 59 and 60.

ORCID iD

Md Palash Uddin

References

1. What Is Cancer? [Internet]. Cancer.gov. Available at: https://www.cancer.gov/about-cancer/understanding/what-is-cancer (2021, accessed 22 June 2022).

2. Breastcancer.org. What Is Breast Cancer? [Internet]. Breastcancer.org. Available at: https://www.breastcancer.org/about-breast-cancer? (2021, accessed 22 June 2022).

3. Siegel, Miller, Jemal. Cancer statistics. CA: A Cancer J Clin 2018; 68: 7–30.

4. World Health Organization. Breast cancer [Internet]. Who.int. Available at: https://www.who.int/news-room/fact-sheets/detail/breast-cancer (2024, accessed 22 June 2022).

5. Sarkar, Mali. Firefly-SVM predictive model for breast cancer subgroup classification with clinicopathological parameters. Digital Health 2023; 9: 20552076231207203.

6. Gao, Wang, Huang. Designing a deep learning-driven resource-efficient diagnostic system for metastatic breast cancer: reducing long delays of clinical diagnosis and improving patient survival in developing countries. Cancer Inf 2023; 22: 11769351231214446.

7. Hameed, Zahia, Garcia-Zapirain, et al. Breast cancer histopathology image classification using an ensemble of deep learning models. Sensors 2020; 20: 4373.

8. Cai, Razmjooy, et al. Breast cancer diagnosis by convolutional neural network and advanced thermal exchange optimization algorithm. Comput Math Methods Med 2021; 2021: 5595180.

9. Huynh, Jarolimek, Daye. The false-negative mammogram. Radiographics 1998; 18: 1137–1154.

10. Arun Kumar, Sasikala. Review on deep learning-based CAD systems for breast cancer diagnosis. Technol Cancer Res Treat 2023; 22: 15330338231177977.

11. Raab, Swain, Smith, et al. Quality and patient safety in the diagnosis of breast cancer. Clin Biochem 2013; 46: 1180–1186.

12. Elmore, Longton, Carney, et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 2015; 313: 1122–1132.

13. Singh, Shrivastava. An enhanced and efficient approach for feature selection for chronic human disease prediction: a breast cancer study. Heliyon 2024; 10: 1–21.

14. Zhang, Gao, et al. A deep learning outline aimed at prompt skin cancer detection utilizing gated recurrent unit networks and improved orca predation algorithm. Biomed Signal Process Control 2024; 90: 105858.

15. Davoudi, Thulasiraman. Evolving convolutional neural network parameters through the genetic algorithm for the breast cancer classification problem. Simulation 2021; 97: 511–527.

16. Chen, Aljrees, Umer, et al. Cervical cancer detection using K nearest neighbor imputer and stacked ensemble learning model. Digital Health 2023; 9: 20552076231203802.

17. Altmannshofer, Flaucher, Beierlein, et al. A content-based review of mobile health applications for breast cancer prevention and education: characteristics, quality and functionality analysis. Digital Health 2024; 10: 20552076241234627.

18. Suzuki. Development of classifier of engagement in occupation with machine learning (CEOML) for quantifying context. Sage Open 2023; 13: 21582440231176998.

19. Şener, Terregrossa. A transcendental LASSO function for combining machine learning and statistical model forecasts. Sage Open 2024; 14: 21582440241262695.

20. Thawkar, Katta, Parashar, et al. Breast cancer: a hybrid method for feature selection and classification in digital mammography. Int J Imag Syst Technol 2023; 33: 1696–1712.

21. Han, Zhao, Yin, et al. Timely detection of skin cancer: an AI-based approach on the basis of the integration of echo state network and adapted seasons optimization algorithm. Biomed Signal Process Control 2024; 94: 106324.

22. Razmjooy, Sheykhahmad, Ghadimi. A hybrid neural network – world cup optimization algorithm for melanoma detection. Open Med 2018; 13: 9–16.

23. Sheykhahmad, Ghadimi, et al. Computer-aided diagnosis of skin cancer based on soft computing techniques. Open Med 2020; 15: 860–871.

24. Efat, Hasant, Jannat, et al. Inquisition of the support vector machine classifier in association with hyper-parameter tuning: a disease prognostication model. In: 2022 4th international conference on electrical, computer & telecommunication engineering (ICECTE), 2022, pp.131–134. IEEE.

25. Krupa, Dhanalakshmi, Lai, et al. An IoMT enabled deep learning framework for automatic detection of fetal QRS: a solution to remote prenatal care. J King Saud Univ-Comput Inf Sci 2022; 34: 7200–7211.

26. Singh, Khanna, Singh. An enhanced soft-computing based strategy for efficient feature selection for timely breast cancer prediction: Wisconsin diagnostic breast cancer dataset case. Multimed Tools Appl 2024; 83: 1–66.

27. Singh, Khanna, Singh. Efficient feature selection for breast cancer classification using soft computing approach: a novel clinical decision support system. Multimed Tools Appl 2024; 83: 43223–43276.

28. Efat, Hasan, Uddin, et al. A multi-level ensemble approach for skin lesion classification using customized transfer learning with triple attention. PLoS ONE 2024; 19: e0309430.

29. Jannat, Hasan, Srizon, et al. Efficient detection of crop leaf diseases: a lightweight convolutional neural network approach for enhanced agricultural productivity. In: 2023 International conference on information and communication technology for sustainable development (ICICT4SD), 21 September 2023, pp.99–103. IEEE.

30. Sikder, Efat, Hasan, et al. A triple-level ensemble-based brain tumor classification using dense-ResNet in association with three attention mechanisms. In: 2023 26th International conference on computer and information technology (ICCIT), 13 December 2023, pp.1–6. IEEE.

31. Haque, Efat, Hasan, et al. Revolutionizing pest detection for sustainable agriculture: a transfer learning fusion network with attention-triplet and multi-layer ensemble. In: 2023 26th international conference on computer and information technology (ICCIT), 13 December 2023, pp.1–6. IEEE.

32. Hossain Efat, Faysal Ferdous, Islam Nayem, et al. From data to diagnosis: a journey with machine learning, hyperparameter tuning, and ensemble learning for disease prognostication. In: International conference on trends in electronics and health informatics, 20 December 2023, pp.407–420. Singapore: Springer Nature Singapore.

33. Medsker, Jain, editors. Recurrent neural networks: design and applications. Boca Raton, FL: CRC Press, 1999.

34. Szegedy, Vanhoucke, Ioffe, et al. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp.2818–2826.

35. Yosinski, Clune, Bengio, et al. How transferable are features in deep neural networks? Adv Neural Inf Process Syst 2014; 27: 1–9.

36. Hochreiter. Long Short-term Memory. Cambridge, MA: Neural Computation MIT-Press, 1997.

37. LeCun, Bengio, Hinton. Deep learning. Nature 2015; 521: 436–444.

38. Neyshabur, Sedghi, Zhang. What is being transferred in transfer learning? Adv Neural Inf Process Syst 2020; 33: 512–523.

39. Wang, Tan, et al. Dynamic allocation strategy of VM resources with fuzzy transfer learning method. Peer-to-Peer Netw Appl 2020; 13: 2201–2213.

40. Byra, Galperin, Ojeda-Fournier, et al. Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med Phys 2019; 46: 746–755.

41. Yan, Ren, Wang, et al. Breast cancer histopathological image classification using a hybrid deep neural network. Methods 2020; 173: 52–60.

42. Deniz, Şengür, Kadiroğlu, et al. Transfer learning based histopathologic image classification for breast cancer detection. Health Inf Sci Syst 2018; 6: 1–7.

43. Krizhevsky, Sutskever, Hinton. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012; 25.

44. Giger, Suzuki. Computer-aided diagnosis. In: Biomedical information technology, 1 January 2008, pp.359–XXII. Academic Press.

45. Jiang, Chen, Zhang, et al. Breast cancer histopathological image classification using convolutional neural networks with small SE-ResNet module. PLoS ONE 2019; 14: 1–21.

46. Spanhol, Oliveira, Cavalin, et al. Deep features for breast cancer histopathological image classification. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC), 5 October 2017, pp.1868–1873. IEEE.

47. Zhang, Lai, et al. A novel centralized federated deep fuzzy neural network with multi-objectives neural architecture search for epistatic detection. IEEE Trans Fuzzy Syst 2024: 1–13.

48. Razmjooy, Ramezani, Ghadimi. Imperialist competitive algorithm-based optimization of neuro-fuzzy system parameters for automatic red-eye removal. Int J Fuzzy Syst 2017; 19: 1144–1156.

49. Szegedy, Liu, Jia, et al. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp.1–9.

50. Russakovsky, Deng, et al. ImageNet large scale visual recognition challenge. Int J Comput Vision 2015; 115: 211–252.

51. Masud, Eldin Rashed, Hossain. Convolutional neural network-based models for diagnosis of breast cancer. Neural Comput Appl 2022; 34: 11383–11394.

52. Sohail, Khan, Wahab, et al. A multi-phase deep CNN based mitosis detection framework for breast cancer histopathological images. Sci Rep 2021; 11: 6215.

53. Wang, Choi, et al. Breast cancer classification in automated breast ultrasound using multiview convolutional neural network with transfer learning. Ultrasound Med Biol 2020; 46: 1119–1132.

54. Sudharshan, Petitjean, Spanhol, et al. Multiple instance learning for histopathological breast cancer image classification. Expert Syst Appl 2019; 117: 103–111.

55. Sarath, Chakravarty, Ghosh, et al. A two-stage multiple instance learning framework for the detection of breast cancer in mammograms. In: 2020 42nd annual international conference of the IEEE engineering in medicine & biology society (EMBC), 20 July 2020, pp.1128–1131. IEEE.

56. Gour, Jain, Sunil Kumar. Residual learning based CNN for breast cancer histopathological image classification. Int J Imag Syst Technol 2020; 30: 621–635.

57. Veta, Pluim, Van Diest, et al. Breast cancer histopathology image analysis: a review. IEEE Trans Biomed Eng 2014; 61: 1400–1411.

58. Liu, Ghadimi. Hybrid convolutional neural network and flexible dwarf mongoose optimization algorithm for strong kidney stone diagnosis. Biomed Signal Process Control 2024; 91: 106024.

59. Spanhol, Oliveira, Petitjean, et al. A dataset for breast cancer histopathological image classification. IEEE Trans Biomed Eng 2015; 63: 1455–1462.

60. Breast Cancer Histopathological Database (BreakHis) – Laboratório Visão Robótica e Imagem [Internet]. Ufpr.br. Available at: https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/ (2016, accessed 22 June 2024).

61. Mohammad Sakif Alam. BREAKHis Augmented Dataset. Kaggle. Available at: https://www.kaggle.com/datasets/mohammadsakifalam/breakhis-aug/data.

62. Datta, Hasan, Faruk, et al. Improved diabetes prediction with reduced feature sets: evaluating feature selection techniques in machine learning. In: 2023 International conference on information and communication technology for sustainable development (ICICT4SD), 21 September 2023, pp.104–108. IEEE.

63. Datta, Hasan, Mitu, et al. Hyperparameter-tuned machine learning models for complex medical datasets classification. In: 2023 International conference on electrical, computer and communication engineering (ECCE), 23 February 2023, pp.1–6. IEEE.

64. Sharma, Athaiya. Activation functions in neural networks. Towards Data Sci 2017; 6: 310–316.

65. Shafin, Efat, Hasan, et al. Skin lesion classification through sequential triple attention denseNet: diverse utilization of the combination of attention modules. In: 2023 26th International conference on computer and information technology (ICCIT), 13 December 2023, pp.1–6. IEEE.

66. Woo, Park, Lee, et al. CBAM: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), 2018, pp.3–19.

67. Montashir Fahim, Efat, Mahedy Hasan, et al. Tri focus net: a CNN-based model with integrated attention modules for pest and insect detection in agriculture. In: International conference on trends in electronics and health informatics, 20 December 2023, pp.225–240. Singapore: Springer Nature Singapore.

68. Bahdanau. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. 2014.

69. Joy, Efat, Hasan, et al. Attention trinity net and DenseNet fusion: revolutionizing American Sign Language recognition for inclusive communication. In: 2023 26th international conference on computer and information technology (ICCIT), 13 December 2023, pp.1–6. IEEE.

70. Shen, Sun. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp.7132–7141.

71. Bayramoglu, Kannala, Heikkilä. Deep learning for magnification independent breast cancer histopathology image classification. In: 2016 23rd international conference on pattern recognition (ICPR), 4 December 2016, pp.2440–2445. IEEE.

72. Motlagh, Jannesari, Aboulkheyr, et al. Breast cancer histopathological image classification: a deep learning approach. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM), Madrid, Spain, 03–06 December 2018, pp.2405–2412. IEEE.

73. Majumdar, Pramanik, Sarkar. Gamma function based ensemble of CNN models for breast cancer detection in histopathology images. Expert Syst Appl 2023; 213: 119022.

74. Sahu, Tripathi, Gupta, et al. A CNN-SVM based computer aided diagnosis of breast cancer using histogram K-means segmentation technique. Multimed Tools Appl 2023; 82: 14055–14075.

75. Simonyan, Badejo, Weijin. Histopathological breast cancer classification using CNN. Mater Today 2024; 105: 268–275.

76. Obayya, Maashi, Nemri, et al. Hyperparameter optimizer with deep learning-based decision-support systems for histopathological breast cancer diagnosis. Cancers 2023; 15: 885.

Refining breast cancer classification: Customized attention integration approaches with dense and residual networks for enhanced detection
