A Hybrid 2D Gaussian Filter and Deep Learning Approach with Visualization of Class Activation for Automatic Lung and Colon Cancer Diagnosis

Abstract

Cancer is a significant public health issue due to its high prevalence and lethality, particularly lung and colon cancers, which account for over a quarter of all cancer cases. This study aims to improve the detection rate of lung and colon cancer by designing an automated diagnosis system. The system targets early detection through image pre-processing with a 2D Gaussian filter, while remaining simple enough to keep computational requirements and runtime low. Three Convolutional Neural Network (CNN) models (MobileNet, VGG16, and ResNet50) are used to classify histopathological images into five tissue classes: Colon Adenocarcinoma, Benign Colonic Tissue, Lung Adenocarcinoma, Benign Lung Tissue, and Lung Squamous Cell Carcinoma. A dataset of 25 000 histopathological images is used. To make the model's decisions interpretable, and therefore safer to rely on, Class Activation Mapping (CAM) is used for explanatory purposes. Experimental results indicate that the proposed system achieves a diagnostic accuracy of 99.38% for lung and colon cancers. These findings support the potential for early diagnosis of lung and colon cancers, which can facilitate timely therapeutic interventions and improve patient outcomes.

Keywords

lung and colon cancer, Gaussian (Blur) filter, ResNet50, deep learning

Introduction

Cancer is one of the most researched diseases that endanger human health and has the highest mortality rate among all diseases.¹ Tumors are generally of two types.² Benign tumors are not cancerous, can usually be removed with a simple surgical procedure, and are rarely dangerous because they do not recur frequently. Malignant tumors, on the other hand, are dangerous because they grow irregularly and uncontrollably and cause cancer.³ Cancer is a leading cause of death worldwide, with approximately 10 million deaths in 2020. The cancer types that caused the most deaths in 2020 were lung (1.80 million deaths), colon and rectum (935 000 deaths), liver (830 000 deaths), stomach (769 000 deaths) and breast (685 000 deaths).⁴ According to these statistics, lung and colon cancers were the two deadliest cancer types in 2020. Clinical studies have shown that cancer is more treatable when detected early. Therefore, early diagnosis of lung and colon cancers, as of other cancers, is extremely important for clinicians and researchers in this field. The two cancers are considered together here not because they tend to emerge simultaneously, but because the probability of metastasis between the two organs is comparatively high.

In recent years, researchers have developed many approaches for the early detection of cancer. One of these is medical imaging, which plays a very effective and important role in cancer detection.⁵ However, manual interpretation of medical imaging data is time-consuming, which makes early detection difficult. In addition, misinterpretation of medical images by experts delays diagnosis and lowers diagnostic accuracy.⁶ To overcome these problems, many machine learning-based algorithms have been employed in the literature, and different cancer types have been detected early with satisfactory accuracy.

Considering the studies in the literature, two main approaches based on machine learning,⁷ a sub-branch of artificial intelligence, have been used for the diagnosis of lung and colon cancers. The first covers Conventional Machine Learning (CML) algorithms. These algorithms are usually employed together with predefined features obtained by feature extraction methods.³ Some of the studies that diagnose colon and lung cancer with CML-based algorithms can be summarized as follows. Xu et al proposed a model to classify colon cancer images using classical feature extraction methods and a multi-class SVM classifier, and reported an average precision of 73.7%.⁸ Shi et al proposed a multimodal sparse representation-based classification technique for lung needle biopsy images and achieved an average accuracy of 88.1% in classifying different lung cancer types.⁹ Kuruvilla and Gunavathi proposed an approach for early detection of lung cancer that feeds statistical features to feed-forward and feed-forward back-propagation neural networks, with an accuracy of 93.3%.¹⁰ Hussain et al developed a tool to detect lung cancer automatically using multimodal features and various ML techniques (NB, DT and SVM), and reached a high accuracy rate for lung cancer detection.¹¹ Selvanambi et al diagnosed and predicted lung cancer using a Recurrent Neural Network (RNN) with the Levenberg–Marquardt and glowworm swarm optimization techniques, achieving a classification accuracy of 98%.¹² Naeem et al detected colon cancer using a genomic signal processing approach that treats cancer as a genetic disease; KNN and SVM were used to classify statistical features obtained from DNA sequences.¹³

Another efficient approach for the diagnosis of lung and colon cancers covers Deep Learning (DL) based algorithms. Unlike CML algorithms, DL algorithms do not need predefined features. This tends to give DL approaches better performance than CML approaches, since manually designed CML pipelines are inflexible, unstable, and time-consuming in lung and colon cancer detection. Thus, many CML algorithms have been replaced by DL algorithms in recent years.³ Some recent works on the diagnosis of lung and colon cancer using DL-based algorithms can be summarized as follows. Masud et al proposed a Convolutional Neural Network (CNN) based method to classify different types of lung and colon cancer tissues and obtained an overall classification accuracy of 96.33%.¹⁴ While their method gave good results for three disease groups, it achieved low classification success for the other two. In addition, the Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (DWT) used for feature extraction are computationally demanding and slow. Togacar proposed a DarkNet-19 model combined with Manta Ray Foraging and Equilibrium optimizations for the classification of lung and colon cancer types, with a classification accuracy of around 99%.¹⁵ As with the previous study, hardware requirements are high, here because of the optimization algorithms. Ali et al proposed a multi-input capsule network that employs two convolutional layer blocks to classify lung and colon tumors into five categories, and obtained an overall accuracy of over 99%.¹⁶ Although feeding two inputs to two convolutional layer blocks increased the success rate, the complexity of the overall system is a notable drawback.
Wang et al used a novel two-step approach for the diagnosis of colorectal cancer based on a CNN and transfer learning, and achieved an area under the curve (AUC) of 0.988 on a fairly large dataset.¹⁷ Garg et al used eight well-known pre-trained CNN models to predict lung and colon cancer from histopathological images.¹⁸ Mehmood et al detected malignancy in lung and colon histopathology images with a highly accurate and computationally efficient CNN model.¹⁹ Although their system is hardware efficient, transfer learning has the disadvantages of limited flexibility, overfitting and limited generalization. Mridha et al used four CNN models to detect lung and colon cancers.²⁰ Since each CNN model solves a separate classification problem, this amounts to binary classification rather than multi-class classification. Teramoto et al diagnosed three lung cancer types using a novel CNN model and obtained an overall accuracy of 71%;²¹ experiments on colon cancer diagnosis were not conducted. Kumar et al used a DenseNet-121 model and handcrafted features to classify lung and colon cancer, and obtained an overall accuracy of 98.6%.²² Researchers interested in lung and colon cancer diagnosis using DL techniques with histopathological images can also refer to two rich and up-to-date review papers.³,²³

Lung and colon cancers are among the most prevalent and lethal forms of cancer globally, underscoring the vital importance of prompt and precise diagnosis for enhancing survival outcomes. The current diagnostic methods, including traditional imaging techniques and manual assessment, are often time-consuming, susceptible to human error, and frequently necessitate the involvement of highly trained professionals. Notwithstanding the advances made in automated diagnostic systems, there remains a considerable gap in the development of efficacious and dependable models that are capable of integrating both feature extraction and interpretability, particularly in the context of intricate medical datasets. This study aims to address this gap by proposing a hybrid approach combining a two-dimensional Gaussian filter with deep learning, along with class activation visualization to enhance the model's interpretability. The objective of this approach is to provide an automated, accurate, and explainable solution for diagnosing lung and colon cancer, which is of great significance in improving clinical outcomes and facilitating early intervention. In addition, the motivation of this study is to diagnose lung and colon cancers from the histopathological images using DL models that have been trained through a fairly large dataset. Three deep learning models (MobileNet, VGG16 and ResNet50), which have been used and yielded very successful results many times in the literature in the diagnosis of medical diseases from medical images so far, have been used for the purpose of lung and colon cancer detection. In order for the DL models to yield more successful results, all histopathological images have been passed through a 2D Gaussian (Blur) filter and the images have been made more suitable for further processing. The regions of the image learned by the CNN network during the learning process have been visualized with the Class Activation Map (CAM) technique. 
This study is believed to be an important aid to doctors and radiologists in diagnosing lung and colon cancer.

When the experimental results obtained are considered together with similar studies in the literature, the main contributions of this study can be summarized as follows:

✓ The proposed system can speed up the detection of lung and colon cancer by enabling doctors or radiologists to examine a large number of patients in a shorter time and at a lower cost.

✓ The proposed computer-aided diagnosis (CAD) system is able to diagnose lung and colon cancer with a high accuracy of 99.38%.

✓ Using a 2D Gaussian filter as a pre-processing stage prior to training substantially increases the overall accuracy.

✓ Even in classifying images of the same organ, the ResNet50 model achieves high success.

✓ The proposed method can be used for mobile phone-based medical disease diagnosis as it uses the MobileNet model and is compatible with mobile vision.

The rest of this paper is organized as follows: Section II introduces the materials, methods and dataset used. Section III presents and discusses the experimental results, and Section IV concludes the paper.

Materials & Methods

Dataset Description & Data Augmentation

One of the most important factors for success in DL studies is the dataset used. In this study, the Lung and Colon Cancer Histopathological Images dataset (LC25000)²⁴ has been used. It is a freely available dataset that can be downloaded from a public database.²⁵ The dataset contains cancerous and benign lung and colon tissue images. The original collection comprises 1250 histopathological images in five groups: 250 benign lung tissue images, 250 lung adenocarcinomas, 250 lung squamous cell carcinomas, 250 benign colon tissue images and 250 colon adenocarcinomas. The images were collected at James A. Haley Veterans’ Hospital in Tampa, Florida. All images were acquired from pathology glass slides using a Leica Microscope MC190 HD Camera connected to an Olympus microscope, and they comply with the Health Insurance Portability and Accountability Act (HIPAA). The original 1024 × 768 images were resized to 768 × 768 to be more compatible as inputs to DL models. Images were carefully checked and annotated by an expert pathologist to ensure the quality of the images and annotations. Because 1250 images are not sufficient for a valid and successful machine learning study, the original set was augmented using vertical and horizontal flips (0.5 probability) and left and right rotations of up to 25 degrees, to increase the generalization ability of the DL models. After augmentation there are 25 000 images in total, 5000 in each group. Figure 1 shows sample images from the dataset and Table 1 shows the number of images before and after data augmentation.

Figure 1.

Sample images in the dataset.

Table 1.

Data Augmentation.

Image Type	Diagnosis Type	Number of Images before Data Augmentation	Number of Images after Data Augmentation
Colon Images	Colon Adenocarcinoma	250	5000
Colon Images	Benign Colonic Tissue	250	5000
Lung Images	Lung Adenocarcinoma	250	5000
Lung Images	Benign Lung Tissue	250	5000
Lung Images	Lung Squamous Cell Carcinoma	250	5000
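
The augmentation described above (horizontal and vertical flips applied with probability 0.5, and rotations of up to 25 degrees in either direction) could be sketched as follows. This is a minimal standard-library illustration on a grayscale image stored as a list of rows. The function names, the nearest-neighbour resampling, the zero fill for pixels rotated in from outside the frame, and the uniform sampling of the angle are all assumptions, not the authors' implementation.

```python
import math
import random


def hflip(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in img[::-1]]


def rotate(img, degrees, fill=0):
    """Rotate about the image centre with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def augment(img, rng, max_angle=25.0, flip_prob=0.5):
    """Apply random flips and a random rotation in [-max_angle, max_angle]."""
    if rng.random() < flip_prob:
        img = hflip(img)
    if rng.random() < flip_prob:
        img = vflip(img)
    return rotate(img, rng.uniform(-max_angle, max_angle))


def expand_class(images, target, seed=0):
    """Grow a list of images to `target` items by adding augmented copies."""
    rng = random.Random(seed)
    out = list(images)
    while len(out) < target:
        out.append(augment(rng.choice(images), rng))
    return out
```

Under these assumptions, a call such as `expand_class(images, target=5000)` would grow a class of 250 images to the 5000 images per class listed in Table 1.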

Proposed Method

Manual evaluation of medical images is time-consuming, requires specialists and is prone to error, which shows the importance of diagnosing lung and colon cancer with DL-based CAD systems. This study proposes a CAD system to diagnose lung and colon cancer from histopathological images using CNNs. A review of the literature on disease diagnosis from medical images shows that CNNs are among the most effective DL methods for this purpose,²⁶,²⁷ and many studies have successfully diagnosed diseases from medical images using CNN models.²⁸,²⁹ For this reason, CNNs have been used for the detection of lung and colon cancer in this study. Figure 2 shows the general layout of the proposed method. The method uses three popular CNN models: MobileNet, VGG16 and ResNet50. These models are first trained on a large dataset of 25 000 histopathological images. A 2D Gaussian (Blur) filter is used as a pre-processing stage to denoise the input images. An input image fed to the proposed algorithm undergoes feature extraction and classification, and is assigned to one of five classes: Colon Adenocarcinoma, Benign Colonic Tissue, Lung Adenocarcinoma, Benign Lung Tissue or Lung Squamous Cell Carcinoma. The performance of the proposed method is evaluated using metrics derived from the confusion matrix. Finally, the prediction made by the CNN is interpreted using the CAM visualization technique.

Figure 2.

Graphical representation of the proposed algorithm.
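
As an illustration of how evaluation metrics can be derived from a confusion matrix in a multi-class setting, the following sketch computes overall accuracy and per-class precision, recall and F1-score. The particular set of metrics, the function name and the toy matrix are assumptions made for illustration; the counts are not results from this study.

```python
def metrics_from_confusion(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually other
        fn = sum(cm[k]) - tp                       # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return {"accuracy": correct / total, "per_class": per_class}


# Toy 3-class matrix (hypothetical counts, for illustration only).
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
report = metrics_from_confusion(toy)
```

The same function applies unchanged to a 5 × 5 matrix for the five tissue classes considered here.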

2D Gaussian (Blur) Filter

The Gaussian blur filter³⁰ was selected for its capacity to efficiently diminish noise and detail in images while maintaining essential features, rendering it an optimal pre-processing instrument for medical image analysis. In the context of lung and colon cancer diagnosis, medical images frequently contain artefacts or irrelevant details that can impede the performance of deep learning models. The application of a Gaussian blur filter serves to mitigate the impact of minor inconsistencies, thereby enhancing the overall clarity of the image, particularly in regions of interest such as tumour boundaries. Furthermore, the application of a Gaussian blurring function introduces a controlled level of smoothing, which serves to emphasise the larger structures of interest without completely eliminating the edges that are critical for accurate feature extraction. This makes the filter particularly suitable when combined with deep learning models, as it enhances the model's ability to focus on relevant features, thereby potentially improving classification accuracy in complex medical datasets.

The two-dimensional Gaussian function that is used in filtering can be computed as follows:

G(x, y) = \frac{1}{2 \pi \sigma^{2}} e^{- \frac{x^{2} + y^{2}}{2 \sigma^{2}}}

(1)

where x and y are the horizontal and vertical distances from the kernel center and σ is the standard deviation of the function. Sampling this function produces a convolution matrix (kernel) that is applied to the input image. The Gaussian filter size used in this study is 11 × 11. Down-sampling an image to make it suitable for DL models can introduce unwanted high-frequency noise, such as sharp edges and dots. The Gaussian blur filter is therefore a suitable pre-processing step prior to the learning process. An original image and a sample 2D Gaussian filtered image are shown in Figure 3.

Figure 3.

(a) An original example image; (b) an example image filtered with the 2D Gaussian filter.
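
To make the filtering step concrete, the sketch below samples the two-dimensional Gaussian function on an 11 × 11 grid centred at the origin, normalizes the weights so that they sum to one, and applies the kernel to a small grayscale image. The value of σ is not stated in the text, so σ = 2.0 is an arbitrary assumption; the function names and the border handling (edge replication) are likewise illustrative choices rather than the authors' implementation.

```python
import math


def gaussian_kernel(size=11, sigma=2.0):
    """Sample G(x, y) on a size x size grid and normalize to sum to 1."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            g = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            row.append(g / (2.0 * math.pi * sigma ** 2))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def blur(img, kernel):
    """Filter a 2D list with the kernel, replicating edge pixels."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                sy = min(max(y + ky - half, 0), h - 1)
                for kx, kv in enumerate(krow):
                    sx = min(max(x + kx - half, 0), w - 1)
                    acc += kv * img[sy][sx]
            out[y][x] = acc
    return out


kernel = gaussian_kernel(size=11, sigma=2.0)
# A single bright pixel on a dark background is spread over its neighbours.
image = [[0.0] * 15 for _ in range(15)]
image[7][7] = 1.0
smoothed = blur(image, kernel)
```

Because the kernel is normalized, the filter preserves overall brightness: isolated dots are flattened into a smooth bump while uniform regions are left unchanged.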

Deep Learning Approaches

Advances in DL methods have made DL-based disease diagnosis more objective and faster than manual diagnosis by a doctor or radiologist. Such methods are especially important in remote areas where doctors and hospital facilities are scarce. As discussed in the literature review, CNN models are the most commonly used DL method for disease diagnosis. Despite the large number of CNN models developed so far, all of them essentially consist of two main stages: feature extraction and classification. These are implemented with five types of consecutive layers: the input layer, which receives the input images and makes them suitable for deep learning (resizing and denoising); the convolution and pooling layers, which perform feature extraction; and the fully connected and classification layers, which perform classification.

MobileNet

MobileNet³¹ is a popular lightweight, low-latency CNN model that requires little memory and computational effort. Although MobileNet was first developed mostly for DL-based mobile applications, its speed and low computational cost have made it popular for many other applications as well. MobileNet is based on the observation that the depth and spatial dimensions (height and width) of a convolutional filter can be separated; the main characteristic of the architecture is therefore a depthwise convolution followed by a pointwise (1 × 1) convolution. The MobileNet architecture has 28 layers in total when depthwise and pointwise convolutions are counted as separate layers. Considering the potential of cancer diagnosis on mobile devices, the MobileNet CNN model has been used in this study. The structural details of the MobileNet CNN architecture are given in Table 2.

Table 2.

Structural Details of MobileNet CNN Model.

* 224 × 224 × 3 input layer

* 1x convolutional layer with 32 channels and 1x convolutional layer with 64 channels.

* 1x depthwise convolutional, 2x batch normalization and 2x ReLU layers with 32 channels, and 1x batch normalization and 1x ReLU layers with 64 channels.

* 1x zero-padding layer and 1x depthwise convolutional layer with 64 channels, and 1x convolutional layer with 128 channels.

* 1x batch normalization and 1x ReLU layers with 64 channels, and 1x batch normalization and 1x ReLU layers with 128 channels.

* 1x depthwise convolutional, 2x batch normalization, 2x ReLU and 1x convolutional layers with 128 channels.

* 1x zero-padding layer and 1x depthwise convolutional layer with 128 channels, and 1x convolutional layer with 256 channels.

* 1x batch normalization and 1x ReLU layers with 128 channels, and 1x batch normalization and 1x ReLU layers with 256 channels.

* 1x depthwise convolutional, 2x batch normalization, 2x ReLU and 1x convolutional layers with 256 channels.

* 1x zero-padding layer and 1x depthwise convolutional layer with 256 channels, and 1x convolutional layer with 512 channels.

* 1x batch normalization and 1x ReLU layers with 256 channels, and 1x batch normalization and 1x ReLU layers with 512 channels.

* 5x depthwise convolutional, 10x batch normalization, 10x ReLU and 5x convolutional layers with 512 channels.

* 1x zero-padding layer and 1x depthwise convolutional layer with 512 channels, and 1x convolutional layer with 1024 channels.

* 1x batch normalization and 1x ReLU layers with 512 channels, and 1x batch normalization and 1x ReLU layers with 1024 channels.

* 1x depthwise convolutional, 1x batch normalization, 1x ReLU and 1x convolutional layers with 1024 channels.

* flatten layer, dense layer 1 and dense layer 2
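
The saving obtained by splitting a standard convolution into a depthwise and a pointwise step can be illustrated by counting weights. The sketch below compares the two for a 3 × 3 kernel. Biases are ignored, and the function names and the 512-to-512 channel example are chosen only for illustration.

```python
def standard_conv_weights(kernel, c_in, c_out):
    """Each of the c_out filters spans kernel x kernel x c_in."""
    return kernel * kernel * c_in * c_out


def separable_conv_weights(kernel, c_in, c_out):
    """One kernel x kernel filter per input channel (depthwise),
    then a 1 x 1 convolution that mixes channels (pointwise)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_weights(3, 512, 512)
sep = separable_conv_weights(3, 512, 512)
ratio = sep / std  # equals 1/c_out + 1/kernel**2
print(std, sep, round(ratio, 3))  # prints 2359296 266752 0.113
```

For this layer size the separable form needs roughly one ninth of the weights, which is consistent with the low memory and compute footprint attributed to MobileNet above.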

VGG16

Developed by the Visual Geometry Group (VGG), VGG16³² improves network performance by increasing network depth. The VGG16 CNN model consists of 21 layers, of which 13 are convolutional, 5 are max pooling, and 3 are fully connected (the 16 in the name counts only the 13 convolutional and 3 fully connected layers, which carry weights). VGG16 is a deep CNN model built from blocks formed by cascading convolutional and max pooling layers. The number of filters in the first block is 64, and this number doubles in each subsequent block until it reaches 512. As the number of filters increases with the model depth, the number of parameters in the later layers increases significantly. A small 3 × 3 kernel with a stride of 1 is used in all convolutional layers of the VGG16 CNN model. Therefore, VGG16 extracts sparse features remarkably well, making it an efficient model for diagnosing lung and colon cancers. The structural details of the VGG16 CNN architecture are shown in Table 3.

Table 3.

Structural Details of VGG16 CNN Model.

* 224 × 224 × 3 input layer

* 2x convolutional layers and 1x max pooling layer with 64 channels.

* 2x convolutional layers and 1x max pooling layer with 128 channels.

* 3x convolutional layers and 1x max pooling layer with 256 channels.

* 3x convolutional layers and 1x max pooling layer with 512 channels.

* 3x convolutional layers and 1x max pooling layer with 512 channels.

* flatten layer, dense layer 1 and dense layer 2
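
The growth in parameters with depth can be checked with a short calculation. The sketch below counts the weights and biases of the 13 convolutional layers of the standard VGG16 configuration (3 × 3 kernels, five blocks with 64, 128, 256, 512 and 512 filters). The block layout is taken from the original VGG16 design, and the dense layers are left out because their sizes are not given here; the function names are illustrative.

```python
# (number of conv layers, filters) for each VGG16 block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def vgg16_conv_params(in_channels=3):
    """Return the parameter count of each convolutional block."""
    per_block = []
    c_in = in_channels
    for n_layers, filters in VGG16_BLOCKS:
        block_total = 0
        for _ in range(n_layers):
            block_total += conv_params(c_in, filters)
            c_in = filters
        per_block.append(block_total)
    return per_block


per_block = vgg16_conv_params()
for (n_layers, filters), count in zip(VGG16_BLOCKS, per_block):
    print(f"{n_layers} conv x {filters:>3} filters: {count:>9,} parameters")
print(f"total: {sum(per_block):,}")
```

The first block holds fewer than 40 000 parameters, while the last block alone holds about 7 million, nearly half of the convolutional total.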

ResNet50

ResNet50³³ is a 50-layer network trained on the ImageNet dataset. Its residual blocks use a stack of 1 × 1, 3 × 3 and 1 × 1 convolutional layers instead of two 3 × 3 convolutional layers. Adding more layers does not always improve performance, because very deep networks are prone to performance degradation. The degradation problem appears once deep networks begin to converge: as the network depth increases, accuracy saturates and then declines rapidly. The ResNet model addresses this problem with residual blocks, which add shortcut connections that link shallower layers directly to deeper layers. This simple idea avoids degradation as the network deepens. The structural details of the ResNet50 CNN architecture are shown in Table 4.
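
The shortcut idea can be written as y = ReLU(F(x) + x), where F is the stack of layers that the shortcut bypasses. The sketch below uses a toy F built from two small fully connected layers instead of convolutions, purely to keep the example short; the function names and weights are hypothetical.

```python
def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + a for f, a in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) = 0, so the block passes x through unchanged.
# A plain (non-residual) stack with the same weights would output zeros.
y = residual_block(x, zeros, zeros)
```

Because a block can fall back to the identity in this way, stacking additional blocks need not make the network worse, which is one way to see why shortcuts help against degradation.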

Table 4.

Structural Details of ResNet50 CNN Model.

* 224 × 224 × 3 input layer

* 1x zero-padding layer with 3 channels

* 3x convolutional layers, 3x batch normalization layers and 3x ReLU layers with 64 channels

* 2x convolutional layers, 2x batch normalization layers, 1x ReLU layer and 1x add layer with 256 channels

* 2x convolutional layers, 2x batch normalization layers and 2x ReLU layers with 64 channels

* 1x convolutional layer, 1x batch normalization layer, 1x ReLU layer and 1x add layer with 256 channels

* 2x convolutional layers, 2x batch normalization layers and 2x ReLU layers with 64 channels

* 1x convolutional layer, 1x batch normalization layer, 1x ReLU layer and 1x add layer with 256 channels

* 2x convolutional layers, 2x batch normalization layers and 2x ReLU layers with 128 channels

* 2x convolutional layers, 2x batch normalization layers, 1x ReLU layer and 1x add layer with 512 channels

* 2x convolutional layers, 2x batch normalization layers and 2x ReLU layers with 128 channels

* 1x convolutional layer, 1x batch normalization layer, 1x ReLU layer and 1x add layer with 512 channels

* 2x convolutional layers, 2x batch normalization layers and 2x ReLU layers with 128 channels

* 1x convolutional layer, 1x batch normalization layer, 1x ReLU layer and 1x add layer with 512 channels

* 2x convolutional layers, 2x batch normalization layers and 2x ReLU layers with 128 channels

* 1x convolutional layer, 1x batch normalization layer, 1x ReLU layer and 1x add layer with 512 channels