
Abstract

Objective: Endoscopic submucosal dissection (ESD) is the preferred technique for treating early gastric cancers including poorly differentiated adenocarcinoma without ulcerative findings. The histopathological classification of poorly differentiated adenocarcinoma including signet ring cell carcinoma is of pivotal importance for determining the optimal further cancer treatment and clinical outcomes. Because conventional diagnosis by pathologists using microscopes is time-consuming and limited by available human resources, it is important to develop computer-aided techniques that can rapidly and accurately inspect large numbers of histopathological specimen whole-slide images (WSIs). Computational pathology applications that can assist pathologists in detecting and classifying gastric poorly differentiated adenocarcinoma from ESD WSIs would be of great benefit to the routine histopathological diagnostic workflow. Methods: In this study, we trained a deep learning model to classify poorly differentiated adenocarcinoma in ESD WSIs using transfer learning and weakly supervised learning approaches. Results: We evaluated the model on ESD, endoscopic biopsy, and surgical specimen WSI test sets, achieving an ROC-AUC of up to 0.975 in gastric ESD test sets for poorly differentiated adenocarcinoma. Conclusion: The deep learning model developed in this study shows highly promising potential for deployment in a routine gastric ESD histopathological diagnostic workflow as a computer-aided diagnosis system.

Keywords

deep learning, endoscopic submucosal dissection, poorly differentiated adenocarcinoma, whole slide image, weakly supervised learning, transfer learning

Introduction

According to the Global Cancer Statistics 2020, stomach cancer was responsible for 1 089 103 new cases (5.6% of all sites) and an estimated 768 793 deaths (7.7% of all sites) in 2020, ranking fifth for incidence and fourth for mortality globally.¹ The so-called diffuse type adenocarcinoma (ADC) and signet ring cell carcinoma (SRCC) of stomach are poorly differentiated cancers (ADCs) which are believed to show poor prognosis and aggressive behavior.² Histopathologically, a diffuse growth of ADC cells is observed, associated with extensive fibrosis and inflammation and often the entire gastric wall is involved. Although a foveolar (intramucosal) type of SRCC occurs, in many cases of this entity the gastric mucosa is less affected than the deeper layers.^3–5 Therefore, poorly differentiated ADCs can often be mistaken for a variety of nonneoplastic lesions including gastritis, xanthoma/foamy histiocytes, or reactive endothelial cells in granulation tissues. Curative rates of endoscopic treatment for poorly differentiated type early gastric cancer are lower than those for the differentiated type ADC.⁶ Endoscopic submucosal dissection (ESD) was developed in the late 1990s and has been widely used for early gastric cancer worldwide. ESD allows en bloc resection and precise histopathological inspection, while being a less invasive treatment than surgical resection.⁷ Endoscopic resection is considered for tumors that have a very low possibility of lymph node metastasis and are suitable for en bloc resection.⁸ According to the Japanese Gastric Cancer Treatment Guidelines 2018 (fifth edition),⁹ the absolute indication of ESD is a differentiated type ADC with (UL1) or without (UL0) ulcerative findings and the expanded indication is a poorly differentiated (undifferentiated) ADC without ulcerative findings (UL0). 
Even in differentiated type ADC-dominant ESD specimens, it is important to look for poorly differentiated ADC when deciding on the future therapeutic strategy.¹⁰ Therefore, surgical pathologists are always on the lookout for signs of poorly differentiated ADC when evaluating gastric ESD specimens.

In the field of computational pathology as a computer-aided detection (CADe) or computer-aided diagnosis (CADx), deep learning models have been widely applied in the histopathological cancer classification on whole-slide images (WSIs), cancer cell detection and segmentation, and the stratification of patient clinical outcomes.^11–24 Previous studies have looked into applying deep learning models for ADC classification in stomach,^24–26 and for gastric poorly differentiated ADC classification on WSIs.^25,26 However, the existing poorly differentiated ADC models did not classify poorly differentiated ADC well on gastric ESD WSIs.

In this study, we trained a deep learning model for the classification of gastric poorly differentiated ADC on ESD WSIs. We evaluated the trained model on ESD, endoscopic biopsy, and surgical specimen WSI test sets, achieving an ROC-AUC up to 0.975 in gastric ESD test sets, 0.960 in endoscopic biopsy test sets, and 0.929 in surgical specimen test sets. These findings suggest that deep learning models might be very useful as routine histopathological diagnostic aids for inspecting gastric ESD to detect poorly differentiated ADC precisely.

Materials and Methods

Clinical Cases and Pathological Records

This is a retrospective study. A total of 5103 H&E (hematoxylin & eosin) stained gastric histopathological specimens (2506 ESD, 1866 endoscopic biopsy, and 731 surgical specimens) of human poorly differentiated ADC, differentiated ADC, and nonneoplastic lesions were collected from the surgical pathology files of 6 hospitals: Sapporo-Kosei General Hospital (Hokkaido, Japan), Kamachi Group Hospitals (Wajiro, Shinyukuhashi, Shinkuki, and Shintakeo Hospitals) (Fukuoka, Japan), and International University of Health and Welfare, Mita Hospital (Tokyo, Japan) after histopathological review of those specimens by surgical pathologists. The cases were selected randomly, so as to reflect a real clinical setting as much as possible. The nonneoplastic lesions consisted of 1018 WSIs that were almost normal mucosa or regenerative mucosa and 2024 WSIs that were inflammatory lesions. Each WSI diagnosis was reviewed by at least 2 pathologists, with the final checking and verification performed by a senior pathologist. All WSIs were scanned at a magnification of $20\times$ using the same Leica Aperio AT2 Digital Whole Slide Scanner (Leica Biosystems, Tokyo, Japan) and were saved in SVS file format with JPEG2000 compression.

Dataset

Hospitals which provided histopathological specimen slides in the present study were anonymized (Hospital-A-F). Table 1 breaks down the distribution of training and validation sets of gastric ESD WSIs from Hospital-A. Validation sets were selected randomly from the training sets (Table 1). The test sets consisted of ESD, biopsy, and surgical specimen WSIs (Table 2). The patients’ pathological records were used to extract the WSIs’ pathological diagnoses and to assign WSI labels. Training set WSIs were not annotated, and the training algorithm only used the WSI diagnosis labels, meaning that the only information available for the training was whether the WSI contained gastric poorly differentiated ADC or was nonpoorly differentiated ADC (differentiated ADC and nonneoplastic lesion), but no information about the location of the cancerous tissue lesions. We have confirmed that surgical pathologists were able to diagnose test sets in Table 2 from visual inspection of the H&E stained slide WSIs alone.

Table 1.

Distribution of Gastric Endoscopic Submucosal Dissection (ESD) Whole-Slide Images (WSIs) in the Training and Validation Sets Obtained From the Hospital-A.

Hospital-A	Training Sets	Validation Sets	Total
Poorly differentiated ADC	140	10	150
Differentiated ADC	290	10	300
Nonneoplastic lesion	690	10	700
Total	1120	30	1150

Table 2.

Distribution of Gastric Endoscopic Submucosal Dissection (ESD), Endoscopic Biopsy, and Surgical Specimen Whole-Slide Images in the Test Sets Obtained From 6 Hospitals (A-F).

	ESD-test set-1 (719 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-A	133	243	343
	ESD-test set-2 (637 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-B	7	69	54
Hospital-C	3	53	67
Hospital-D	11	89	78
Hospital-E	19	105	82
	Biopsy-test set-1 (355 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-F	25	96	234
	Biopsy-test set-2 (516 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-B	54	55	407
	Biopsy-test set-3 (495 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-C	10	12	473
	Biopsy-SRCC-test set (500 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-F	54	55	391
	Surgical-test set (731 WSIs)
Supplier	Poorly differentiated ADC	Differentiated ADC	Nonneoplastic lesion
Hospital-B	251	32	102
Hospital-C	211	24	111

Abbreviations: ADC, adenocarcinoma; WSIs, whole-slide images.

Deep Learning Models

To train our models, we used weakly supervised learning and transfer learning. The former makes it possible to train a model using only slide-level labels without the need to do laborious cell-level annotations. The latter allows speeding up training by starting from a model with pre-trained weights on another task without having to start training from scratch.

We trained the models via transfer learning using the partial fine-tuning approach.²⁷ This is an efficient fine-tuning approach that consists of using the weights of an existing pre-trained model and only fine-tuning the affine parameters of the batch normalization layers and the final classification layer. For the model architecture, we used EfficientNetB1²⁸ starting with pre-trained weights of the previous Biopsy-poorly ADC model, which in turn had been initialized from ImageNet weights when it was fine-tuned. Figure 1 shows an overview of the training method. The training methodology that we used in the present study was the same as reported in our previous studies.^29–31 For completeness, we repeat the methodology here.

Figure 1.

Schematic diagrams of training methods. (A) A simple summary of the training method using transfer learning and weakly supervised learning for this study. During training (B), we iteratively alternated between inference and training. During the inference step, the model weights were frozen and the model was used to select tiles with the highest probability after applying it on the entire tissue regions of each whole-slide image (WSI). The top k tiles with the highest probabilities were then selected from each WSI and placed into a queue. During training, the selected tiles from multiple WSIs formed a training batch and were used to train the model.

We performed slide tiling by extracting square tiles from tissue regions of the WSIs. We started by detecting the tissue regions in order to eliminate most of the white background. We did this by performing a thresholding on a grayscale version of the WSIs using Otsu’s method.³² During prediction, we performed the tiling of the tissue regions in a sliding window fashion, using a fixed-size stride ($224 \times 224$ pixels). During training, we initially performed a random balanced sampling of tiles extracted from the tissue regions, where we tried to maintain an equal balance of each label in the training batch. To do so, we placed the WSIs in a shuffled queue such that we looped over the labels in succession (ie, we alternated between picking a WSI with a positive label and a negative label). Once a WSI was selected, we randomly sampled $\frac{\text{batch size}}{\text{num labels}}$ tiles from each WSI to form a balanced batch.
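The tissue detection and sliding-window tiling described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the pipeline used in the study: the function names, the nested-list grayscale image, and the `min_tissue` fraction are assumptions introduced here, and a real pipeline would typically threshold a downsampled thumbnail of the WSI.

```python
def otsu_threshold(gray_values):
    """Return the Otsu threshold for an iterable of 8-bit grayscale values."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]            # pixels with value <= t (dark class)
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # pixels with value > t (bright class)
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def tissue_tiles(gray, tile=224, stride=224, min_tissue=0.1):
    """Yield (row, col) corners of tiles whose dark-pixel fraction is >= min_tissue.

    `min_tissue` is a hypothetical cut-off; the paper does not state one.
    """
    height, width = len(gray), len(gray[0])
    t = otsu_threshold(v for row in gray for v in row)
    for r in range(0, height - tile + 1, stride):
        for c in range(0, width - tile + 1, stride):
            dark = sum(
                1
                for rr in range(r, r + tile)
                for cc in range(c, c + tile)
                if gray[rr][cc] <= t   # stained tissue is darker than background
            )
            if dark / (tile * tile) >= min_tissue:
                yield (r, c)
```

On a toy image whose left half is dark and right half is near-white, only the tiles covering the left half are kept, which mirrors how the white slide background is discarded before inference.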

To maintain the balance on the WSI, we oversampled from the WSIs to ensure the model trained on tiles from all of the WSIs in each epoch. We then switched to hard mining of tiles. To perform the hard mining, we alternated between training and inference. During inference, the CNN was applied in a sliding window fashion on all of the tissue regions in the WSI, and we then selected the $k$ tiles with the highest probability for being positive. This step effectively selects the tiles that are most likely to be false positives when the WSI is negative. The selected tiles were placed in a training subset, and once that subset contained $N$ tiles, the training was run. We used $k = 8$, $N = 224$, and a batch size of $32$.
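With $k = 8$ and $N = 224$, each training subset is filled by $224 / 8 = 28$ WSIs and is consumed as $224 / 32 = 7$ batches. The alternation between inference and training can be sketched as follows. This is an illustrative outline only: the function names and the `(probability, tile_id)` tuples are assumptions, and the tile probabilities are taken as already computed by the frozen model.

```python
import heapq


def top_k_tiles(tile_probs, k=8):
    """Return the k (probability, tile_id) pairs with the highest probability."""
    return heapq.nlargest(k, tile_probs)


def mine_training_subsets(wsi_tile_probs, k=8, subset_size=224):
    """Queue the top-k tiles of each WSI and yield a subset once N tiles are queued.

    `wsi_tile_probs` is an iterable of (wsi_id, [(probability, tile_id), ...]).
    """
    queue = []
    for wsi_id, tile_probs in wsi_tile_probs:
        for prob, tile_id in top_k_tiles(tile_probs, k):
            queue.append((wsi_id, tile_id, prob))
        while len(queue) >= subset_size:
            yield queue[:subset_size]      # a training step would run on this subset
            queue = queue[subset_size:]
```

For a negative WSI the selected tiles are the model's most confident mistakes, which is why this selection acts as hard-example mining.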

To obtain a single prediction for the WSIs from the tile predictions, we took the maximum probability from all of the tiles. This simple approach proved effective. Using a recurrent neural network would have been another approach for aggregating the probabilities.
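As a minimal sketch of this aggregation step, the slide-level probability is simply the largest tile probability. The function names and the 0.5 decision threshold below are assumptions for illustration; the operating threshold is not stated in this section.

```python
def slide_probability(tile_probs):
    """Max-pool tile probabilities into a single WSI-level probability."""
    return max(tile_probs)


def classify_slide(tile_probs, threshold=0.5):
    """Label a WSI positive if any tile reaches the (hypothetical) threshold."""
    return slide_probability(tile_probs) >= threshold
```

One consequence of max pooling is that a single high-scoring tile is enough to flag the whole slide, which favors sensitivity to small lesions at the cost of slide-level false positives.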

We used the Adam optimizer,³³ with the binary cross-entropy as the loss function, with the following parameters: $\beta_{1} = 0.9$, $\beta_{2} = 0.999$, a batch size of 32, and a learning rate of $0.001$ when fine-tuning. We used early stopping by tracking the performance of the model on a validation set, and training was stopped automatically when there was no further improvement on the validation loss for 10 epochs. We chose the model with the lowest validation loss as the final model.
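The early-stopping rule can be illustrated with a short framework-free sketch. The function name and the list of per-epoch validation losses are assumptions made for the example; in practice a training framework's built-in callback would normally play this role.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best (lowest-loss) epoch; the best epoch's weights are the final model.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before the patience window elapsed.
    return best_epoch, len(val_losses) - 1
```

For example, if the validation loss bottoms out at epoch 2 and never improves again, training halts at epoch 12 and the epoch-2 weights are kept.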

During training, we performed data augmentation on the fly during tile sampling. The augmentation consisted of randomly modifying the brightness, contrast, hue, and saturation of the tiles, as well as adding JPEG artifacts. This makes the model less sensitive to color and image quality and lets it focus more on content.

Software and Statistical Analysis

The deep learning models were implemented and trained using TensorFlow version 2.5.³⁴ AUCs were calculated in Python using the scikit-learn package³⁵ and plotted using matplotlib.³⁶ The 95% CIs of the AUCs were estimated using the bootstrap method³⁷ with 1000 iterations.
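A bootstrap confidence interval for the AUC can be sketched as below. This is a hedged illustration rather than the study's code: it assumes a percentile bootstrap over WSIs (the exact bootstrap variant is not stated), and it uses a small pure-Python AUC as a stand-in for scikit-learn's `roc_auc_score` so that the example is self-contained.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed variant), resampling WSIs."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined if a resample contains only one class
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo_idx = int(n_iter * alpha / 2)
    hi_idx = n_iter - lo_idx - 1
    return aucs[lo_idx], aucs[hi_idx]
```

Test sets with few positive WSIs (for example, 10 in Biopsy-test set-3) give wider intervals under this kind of resampling, which is consistent with the wide sensitivity intervals reported for those sets.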

Availability of Data and Material

The datasets generated during and/or analyzed during the current study are not publicly available due to specific institutional requirements governing the privacy protection but are available from the corresponding author on reasonable request. The datasets that support the findings of this study are available from Sapporo-Kosei General Hospital (Hokkaido, Japan), Kamachi Group Hospitals (Fukuoka, Japan), and International University of Health and Welfare, Mita Hospital (Tokyo, Japan), but restrictions apply to the availability of these data, which were used under a data use agreement which was made according to the Ethical Guidelines for Medical and Health Research Involving Human Subjects as set by the Japanese Ministry of Health, Labour and Welfare (Tokyo, Japan), and so are not publicly available. However, the data are available from the authors upon reasonable request for private viewing and with permission from the corresponding medical institutions within the terms of the data use agreement and if compliant with the ethical and legal requirements as stipulated by the Japanese Ministry of Health, Labour and Welfare.

Results

High AUC Performance of Gastric ESD, Biopsy, and Surgical Specimen WSI Evaluation of Gastric Poorly Differentiated ADC Histopathology Images

The aim of this retrospective study was to train a deep learning model for the classification of gastric poorly differentiated ADC in ESD WSIs. We have achieved high ROC-AUC performances in the ESD test sets (0.955 and 0.975) (Figure 2A and Table 3).

Figure 2.

ROC curves with AUCs from trained gastric endoscopic submucosal dissection (ESD) poorly differentiated ADC deep learning model (ESD-poorly ADC model) and existing biopsy model (Biopsy-poorly ADC model) on the seven test sets: (A) the newly trained gastric ESD poorly differentiated ADC classification model (ESD-poorly ADC model) with tile size 224 px and magnification at 20x; (B) the existing biopsy gastric poorly differentiated ADC classification model (Biopsy-poorly ADC model) with tile size 224 px and magnification at 20x. Abbreviations: ROC, Receiver Operator Characteristic; AUCs, the area under the curve; ADC, adenocarcinoma.

Table 3.

The Comparison of ROC-AUC and Log Loss Results for Poorly Differentiated Adenocarcinoma (ADC) Classification on Various Test Sets Between Trained Gastric Endoscopic Submucosal Dissection (ESD) Poorly Differentiated ADC Deep Learning Model (ESD-Poorly ADC Model) and Existing Biopsy Model (Biopsy-Poorly ADC Model).

	ESD-Poorly ADC Model
	ROC-AUC	log-loss
ESD-test set-1	0.975 [0.962-0.986]	0.235 [0.194-0.271]
ESD-test set-2	0.955 [0.915-0.981]	0.658 [0.592-0.736]
Biopsy-test set-1	0.953 [0.909-0.981]	0.352 [0.260-0.458]
Biopsy-test set-2	0.937 [0.892-0.969]	0.205 [0.155-0.264]
Biopsy-test set-3	0.960 [0.922-0.989]	0.156 [0.118-0.212]
Biopsy-SRCC-test set	0.941 [0.896-0.975]	0.185 [0.137-0.241]
Surgical-test set	0.929 [0.909-0.949]	0.572 [0.468-0.661]
	Biopsy-poorly ADC model
	ROC-AUC	log-loss
ESD-test set-1	0.899 [0.869-0.927]	0.857 [0.816-0.897]
ESD-test set-2	0.638 [0.541-0.722]	1.104 [1.049-1.155]
Biopsy-test set-1	0.959 [0.931-0.982]	0.587 [0.532-0.649]
Biopsy-test set-2	0.976 [0.951-0.993]	0.199 [0.181-0.220]
Biopsy-test set-3	0.975 [0.946-0.994]	0.418 [0.394-0.450]
Biopsy-SRCC-test set	0.980 [0.953-0.995]	0.190 [0.172-0.208]
Surgical-test set	0.885 [0.857-0.910]	0.518 [0.468-0.575]

Abbreviations: ROC, Receiver Operator Characteristic; AUCs, the area under the curve; SRCC, signet ring cell carcinoma.

The models were applied in a sliding window fashion with an input tile size and stride of $224 \times 224$ pixels (Figure 1). The transfer learning model (ESD-poorly ADC model), trained from the existing Biopsy-poorly ADC model,²⁵ has higher ROC-AUCs, accuracy, sensitivity, and specificity than the Biopsy-poorly ADC model in the ESD and surgical specimen test sets, and lower log losses in the ESD test sets (its log loss on the surgical test set is slightly higher, 0.572 vs 0.518). Its ROC-AUCs in the biopsy test sets are slightly lower than those of the Biopsy-poorly ADC model (Figure 2, Tables 3 and 4).

Table 4.

The Comparison of Scores of Accuracy, Sensitivity, and Specificity on the Various Test Sets Between Trained Gastric Endoscopic Submucosal Dissection (ESD) Poorly Differentiated ADC Deep Learning Model (ESD-Poorly ADC Model) and Existing Biopsy Model (Biopsy-Poorly ADC Model).

	ESD-poorly ADC Model
	Accuracy	Sensitivity	Specificity
ESD-test set-1	0.929 [0.912-0.949]	0.910 [0.860-0.957]	0.933 [0.915-0.954]
ESD-test set-2	0.940 [0.918-0.956]	0.925 [0.829-1.000]	0.941 [0.920-0.958]
Biopsy-test set-1	0.904 [0.870-0.935]	0.920 [0.795-1.000]	0.903 [0.869-0.936]
Biopsy-test set-2	0.882 [0.853-0.905]	0.889 [0.786-0.962]	0.881 [0.850-0.907]
Biopsy-test set-3	0.935 [0.911-0.954]	0.900 [0.667-1.000]	0.936 [0.911-0.955]
Biopsy-SRCC-test set	0.884 [0.856-0.912]	0.889 [0.786-0.964]	0.883 [0.853-0.912]
Surgical-test set	0.862 [0.837-0.889]	0.875 [0.846-0.903]	0.840 [0.795-0.884]
	Biopsy-poorly ADC model
	Accuracy	Sensitivity	Specificity
ESD-test set-1	0.837 [0.811-0.865]	0.790 [0.715-0.856]	0.848 [0.820-0.877]
ESD-test set-2	0.776 [0.744-0.805]	0.425 [0.256-0.567]	0.799 [0.766-0.830]
Biopsy-test set-1	0.856 [0.820-0.890]	0.960 [0.875-1.000]	0.849 [0.809-0.885]
Biopsy-test set-2	0.921 [0.897-0.944]	0.963 [0.904-1.000]	0.916 [0.889-0.940]
Biopsy-test set-3	0.869 [0.838-0.897]	0.900 [0.667-1.000]	0.868 [0.838-0.897]
Biopsy-SRCC-test set	0.936 [0.912-0.954]	0.963 [0.895-1.000]	0.933 [0.908-0.954]
Surgical-test set	0.835 [0.804-0.859]	0.855 [0.820-0.883]	0.799 [0.747-0.845]

Abbreviations: SRCC, signet ring cell carcinoma; ADC, adenocarcinoma.

Table 5.

The Number of True Positives and False Positives in Each Test Set for the Two Different Models.

	ESD-poorly ADC Model		Biopsy-poorly ADC Model
	True Positive WSIs	False Positive WSIs	True Positive WSIs	False Positive WSIs
ESD-test set-1	121	39	105	89
ESD-test set-2	37	35	17	120
Biopsy-test set-1	23	32	24	50
Biopsy-test set-2	48	55	52	39
Biopsy-test set-3	9	31	9	64
Biopsy-SRCC-test set	48	52	52	30
Surgical-test set	404	43	395	54

Abbreviations: ESD, endoscopic submucosal dissection; SRCC, signet ring cell carcinoma; ADC, adenocarcinoma; WSIs, whole-slide images.
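The point estimates in Table 4 follow from the counts in Tables 2 and 5. As a worked check, the sketch below recomputes them for the ESD-poorly ADC model on ESD-test set-1; the function name is an assumption introduced for illustration.

```python
def slide_metrics(tp, fp, n_pos, n_neg):
    """Sensitivity, specificity, and accuracy from WSI-level counts."""
    tn = n_neg - fp          # negatives not flagged by the model
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


# ESD-test set-1 (Table 2): 133 poorly differentiated ADC WSIs (positives),
# 243 differentiated ADC + 343 nonneoplastic WSIs (negatives).
# ESD-poorly ADC model (Table 5): 121 true positives, 39 false positives.
metrics = slide_metrics(tp=121, fp=39, n_pos=133, n_neg=243 + 343)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 0.91, 'specificity': 0.933, 'accuracy': 0.929}
```

These values agree with the ESD-test set-1 row of Table 4 (accuracy 0.929, sensitivity 0.910, specificity 0.933).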

True Positive Gastric Poorly Differentiated ADC Prediction on ESD WSIs

Our model (ESD-poorly ADC model) satisfactorily predicted gastric poorly differentiated ADC in ESD WSIs (Figure 3). According to the histopathological reports and additional pathologists’ reviewing, gastric poorly differentiated ADC cells were infiltrating in the neck area of gastric glands (Figure 3A and C). The heatmap image (Figure 3B) shows the true positive predictions of poorly differentiated ADC cells (Figure 3D) without false positive predictions in nonneoplastic areas. Histopathologically, there were poorly differentiated ADC cells which exhibited intramucosal invasive manners among differentiated type ADC with tubular structures (Figure 3E and G). The heatmap image (Figure 3F) shows true positive predictions of poorly differentiated ADC cells (Figure 3H).

Figure 3.

Two representative histopathological images of poorly differentiated adenocarcinoma (ADC) true positive prediction outputs on whole-slide images (WSIs) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). In the gastric poorly differentiated ADC of ESD specimen (A), poorly differentiated ADC cells were infiltrating in the neck area of gastric gland (C). The heatmap image (B) shows true positive predictions of gastric poorly differentiated ADC cells (D) which correspond respectively to H&E histopathology (C). According to the histopathological report, (E) has differentiated and poorly differentiated ADC (G). The heatmap image (F) shows true positive prediction of gastric poorly differentiated ADC cells (H) which correspond respectively to H&E histopathology (G). The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.

True Negative Gastric Poorly Differentiated ADC Prediction on ESD WSIs

Our model (ESD-poorly ADC model) shows true negative predictions of gastric poorly differentiated ADC in ESD WSIs (Figure 4A and B). Histopathologically, there was no evidence of the presence of poorly differentiated ADC cells in all tissue fragments (#1-#3) which were nonneoplastic lesions with gastritis with ulcer formation (Figure 4B) and were not predicted as gastric poorly differentiated ADC (Figure 4C).

Figure 4.

Representative true negative gastric poorly differentiated adenocarcinoma (ADC) prediction outputs on a whole slide image (WSI) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). Histopathologically, in (A), all tissue fragments (#1-#3) were nonneoplastic lesions with ulcerative gastritis (B). The heatmap image (C) shows true negative prediction of gastric poorly differentiated ADC. The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.

False Positive Gastric Poorly Differentiated ADC Prediction on ESD WSIs

According to the histopathological report and additional review by pathologists, there was no gastric poorly differentiated ADC in these ESD fragments (#1-#3), which were nonneoplastic specimens (Figure 5A). Our model (ESD-poorly ADC model) showed false positive predictions of poorly differentiated ADC (Figure 5B). The false positively predicted areas (Figure 5C and D) showed lymphatic tissue cells (eg, lymphocytes, tingible body macrophages, and follicular dendritic cells) in an artificially collapsed lymphoid follicle, which could be the primary cause of the false positives because of their morphological similarity to poorly differentiated ADC cells.

Figure 5.

A representative example of gastric poorly differentiated adenocarcinoma (ADC) false positive prediction outputs on a whole slide image (WSI) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). Histopathologically, all tissue fragments (#1-#3) in (A) were nonneoplastic lesions. In tissue fragment #2, the heatmap image (B) exhibits false positive predictions of gastric poorly differentiated ADC (D) on the lymphatic tissue cells (C) in a lymphoid follicle that was artificially collapsed during the specimen processing procedures. The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.

False Negative Gastric Poorly Differentiated ADC Prediction on ESD WSIs

According to the histopathological report and additional review by pathologists, there were foveolar-type signet ring cell carcinoma (SRCC) cells³ in the superficial layer of an ESD fragment (#1) (Figure 6A and C), which pathologists marked with red dots. Our model (ESD-poorly ADC model) did not predict poorly differentiated ADC cells (Figure 6B and D). For comparison, we applied our model (ESD-poorly ADC model) to 13 endoscopic biopsy WSIs containing SRCC cells that had been false negatively predicted by the existing SRCC model.²⁶ Interestingly, 4 of the 13 WSIs, which contained foveolar-type SRCC cells, were also false negatively predicted by our model (ESD-poorly ADC model). On the other hand, the remaining 9 of the 13 WSIs containing SRCC cells were true positively predicted as poorly differentiated ADC by our model (ESD-poorly ADC model). Therefore, our model (ESD-poorly ADC model) is limited in its ability to predict foveolar-type SRCC cells precisely.

Figure 6.

A representative example of gastric poorly differentiated adenocarcinoma (ADC) false negative prediction output on a whole slide image (WSI) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). Histopathologically, this case (A) has the foveolar-type signet ring cell carcinoma cell infiltration in the small area of superficial layer in the fragment #1 (C). The other 2 tissue fragments (#2 and #3) were nonneoplastic lesions (A). The heatmap image (B) exhibited no positive poorly differentiated ADC prediction (D). The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.

True Positive and True Negative Gastric Poorly Differentiated ADC Prediction on Surgical Specimen WSIs

In addition, we have applied the model (ESD-poorly ADC model) on the surgical specimen WSIs. Figure 7 shows an example of surgical gastric poorly differentiated ADC case with 5 serial section WSIs (#1-#5) (Figure 7A, C, E, G and I). We see the model (ESD-poorly ADC model) was capable of true positive (#1, #2, #4 and #5) (Figure 7A-D and G-J) and true negative (#3) (Figure 7E and F) poorly differentiated ADC detection on such section WSIs. Histopathologically, gastric poorly differentiated ADC cells invading areas (Figure 7K, M, O and Q) were visualized by heatmap images (Figure 7L, N, P and R).

Figure 7.

A representative surgically resected gastric poorly differentiated adenocarcinoma (ADC) case serial specimens. According to the histopathological report, serial specimen #1 (A), #2 (C), #4 (G), and #5 (I) have poorly differentiated ADC and #3 (E) is nonneoplastic lesion. The heatmap images show true positive predictions of gastric poorly differentiated ADC cells (B, D, H, J, L, N, P, and R) which correspond respectively to H&E histopathology (A, C, G, I, K, M, O, and Q) and true negative prediction (F). The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.

The Application of Deep Learning Models to Classify Gastric Poorly Differentiated ADC on Various Type of Specimen

Based on the findings in this study and the previous study,²⁵ we have summarized the possible application of our deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) for classification of gastric poorly differentiated ADC in various types of specimen WSIs (Figure 8). The ESD model (ESD-poorly ADC model) can be applied to all types of specimen (ESD, biopsy, and surgical specimens). For biopsy specimens, however, the biopsy model (Biopsy-poorly ADC model) achieved slightly better performance than the ESD model (ESD-poorly ADC model).

Figure 8.

The schematic diagram of possible application of deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) for classification of gastric poorly differentiated adenocarcinoma (ADC) in different type of specimens. ESD-poorly ADC model can be applied to classify gastric poorly differentiated ADC in biopsy, endoscopic submucosal dissection (ESD), and surgical specimen. Biopsy-poorly ADC model can be applied only in biopsy specimen. To classify gastric poorly differentiated ADC in biopsy specimen, both ESD-poorly ADC model and Biopsy-poorly ADC model can be used, however, Biopsy-poorly ADC model would be able to achieve higher ROC-AUC performance than ESD-poorly ADC model.

Discussion

In this study, we trained a deep learning model for the classification of gastric poorly differentiated ADC in gastric ESD WSIs. Indications for gastric ESD were determined by the presence or absence of a risk of nodal metastasis and according to the gastric cancer treatment guidelines.^9,10 As an expanded indication, the gastric poorly differentiated ADC without ulcerative findings (UL0) in which the depth of invasion is clinically diagnosed as T1a (cT1a) and the diameter is $\leq$ 2 cm can be endoscopically resected.^9,38 However, the mixed type tumors (poorly differentiated adenocarcinoma and SRCC) should not be considered for endoscopic resection due to the higher risk of nodal metastases.³⁹ According to the guidelines for ESD in early gastric cancer treatment by Japan Gastroenterological Endoscopy Society (JGES), differentiated and poorly differentiated mixed type ADCs measuring $\leq$ 2 cm in diameter with UL0 and cT1a are absolute indications for ESD.⁴⁰ Importantly, incidental gastric poorly differentiated ADCs are diagnosed at the time of ESD for differentiated type ADC, even though histopathological type was determined by endoscopic biopsy prior to ESD.¹⁰ The histopathological evaluation of gastric ESD specimens whether there are poorly differentiated ADC cells or not is important for future therapeutic strategy because of lower rate of adverse events and high rate of en bloc resection.¹⁰ Prior to training the deep learning model for ESD WSIs, we evaluated the ROC-AUC on gastric ESD test sets using existing poorly differentiated ADC model (Biopsy-poorly ADC model).²⁵ The existing model (Biopsy-poorly ADC model) achieved ROC-AUCs in the range of 0.638-0.899 on the 2 independent ESD test sets (Table 3). The existing model (Biopsy-poorly ADC model) has been trained using purely endoscopic biopsy specimen WSIs.²⁵ Endoscopic biopsy often yields samples that include muscularis mucosae, except in regions (eg, gastric body) where the mucosal folds are thick. 
On the other hand, the mucosa surrounding the lesion is circumferentially incised and the submucosal layer is dissected from the proper muscle layer by ESD procedures.⁹ ESD specimens usually consist of mucosa, muscularis mucosae, and submucosa with layered tissue architectures.⁴¹ Therefore, there are histopathological differences between endoscopic biopsy and ESD in terms of tissue and cellular components, which might be a primary cause of lower ROC-AUC values in ESD test sets as compared to biopsy test sets by the existing model (Biopsy-poorly ADC model) (Table 3). The deep learning model (ESD-poorly ADC model) was trained by the transfer learning approach from our existing model (Biopsy-poorly ADC model).²⁵ We used the partial fine-tuning approach²⁷ to train the model faster, as there are fewer weights to tune. We used only 1120 ESD WSIs (poorly differentiated ADC: 140 WSIs, differentiated ADC: 290 WSIs, nonneoplastic lesion: 690 WSIs) (Table 1) without manual drawing annotations to indicate cancerous tissue areas by pathologists.^24,42,43 After specifically training on ESD WSIs, the model (ESD-poorly ADC model) significantly improved prediction performance on ESD test sets (Table 2) compared to the existing model (Biopsy-poorly ADC model) (Tables 3 and 4).

If a biopsy specimen is diagnosed as well-differentiated adenocarcinoma but the ESD specimen from the same patient shows an admixture of poorly differentiated components, the final diagnosis is poorly differentiated adenocarcinoma. Therefore, a model that detects the presence of poorly differentiated adenocarcinoma in ESD specimens would be highly useful.^44–46 Importantly, the model (ESD-poorly ADC model) achieved a high ROC-AUC (0.929) on surgical specimen test sets (Tables 2 and 3) and precisely predicted the areas infiltrated by poorly differentiated ADC cells in serial surgical sections (Figure 7). This would be very useful for pathologists who must inspect a massive number of serial surgical sections for the presence or absence of poorly differentiated ADC in the routine clinical workflow. Moreover, the model (ESD-poorly ADC model) also achieved high ROC-AUC values on endoscopic biopsy test sets, close to those of the existing biopsy model (Biopsy-poorly ADC model) (Tables 2 and 3). Thus, for endoscopic biopsy specimens, both deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) can classify poorly differentiated ADC precisely. However, the biopsy model (Biopsy-poorly ADC model) achieved slightly better ROC-AUC values (Table 3), so it would be better to apply the biopsy model to endoscopic biopsy specimens in the routine workflow (Figure 8).
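
The model selection described above can be sketched as a simple lookup. This is a minimal sketch under stated assumptions: the mapping is inferred from the text of this section rather than taken from Figure 8, and the function name `select_model` and the specimen-type keys are hypothetical.

```python
# Specimen type -> model name, inferred from the discussion (assumed keys).
MODEL_BY_SPECIMEN = {
    "biopsy": "Biopsy-poorly ADC model",   # slightly better ROC-AUC on biopsies
    "esd": "ESD-poorly ADC model",
    "surgical": "ESD-poorly ADC model",
}


def select_model(specimen_type: str) -> str:
    """Return the name of the model to apply to a given specimen type."""
    key = specimen_type.strip().lower()
    if key not in MODEL_BY_SPECIMEN:
        raise ValueError(f"unknown specimen type: {specimen_type!r}")
    return MODEL_BY_SPECIMEN[key]


for specimen in ("biopsy", "ESD", "surgical"):
    print(specimen, "->", select_model(specimen))
```

Unknown specimen types raise an error rather than falling back to a default model, so that a WSI is never silently scored by a model that was not evaluated on that kind of specimen.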

One of the limitations of this study is that both deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) produced false negative predictions for poorly differentiated ADC cells (SRCC cells) in foveolar-type SRCC biopsy and ESD WSIs.³ In the early stage, SRCC cells proliferate predominantly in the proliferative zone (near the mucous neck cells),⁴⁷ and these cells were consistently missed by the models. To predict foveolar-type SRCC precisely, we need to collect a larger number of foveolar-type SRCC biopsy and ESD cases for additional training or active learning.⁴⁸ Another limitation is that this study primarily included specimens from a limited number of hospitals and suppliers in Japan; therefore, the model could be biased toward such specimens. Further validation on a wide variety of specimens from multiple origins, conducted as randomized trials, would be essential to ensure the robustness of the model.

Conclusion

The deep learning model established in the present study offers promising results, indicating that it could be beneficial as a screening aid for pathologists prior to observing gastric ESD histopathology on glass slides or WSIs. Used together, the deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) can precisely predict gastric poorly differentiated ADC across ESD, endoscopic biopsy, and surgical specimen WSIs. At the same time, the model could be used as a double-check tool to reduce the risk of missed poorly differentiated ADC cells. The most important advantage of a fully automated computational tool for computer-aided diagnosis is that it can systematically handle large numbers of WSIs without the potential bias caused by the fatigue commonly experienced by pathologists.

Footnotes

Acknowledgements

This study is based on results obtained from a project, JPNP14012, subsidized by the New Energy and Industrial Technology Development Organization (NEDO). We are grateful for the support provided by Professor Takayuki Shiomi at the Department of Pathology, Faculty of Medicine, International University of Health and Welfare; Dr Ryosuke Matsuoka at the Diagnostic Pathology Center, International University of Health and Welfare, Mita Hospital; and Dr Shigeo Nakano at Kamachi Group Hospitals (Fukuoka, Japan). We thank the pathologists who were engaged in reviewing cases for this study.

Authors’ Contribution

MT and FK designed the studies; MT and FK performed experiments and analyzed the data; MT and FK performed computational studies; MT and FK wrote the manuscript; MT supervised the project. All authors reviewed and approved the final manuscript.

Ethical Approval

The experimental protocol was approved by the institutional review boards of the Sapporo-Kosei General Hospital (No. 576), International University of Health and Welfare (No. 19-Im-007), and Kamachi Group Hospitals (No. 173). All research activities complied with all relevant ethical regulations and were performed in accordance with relevant guidelines and regulations at all of the hospitals mentioned above.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MT and FK are employees of Medmain Inc.

Informed Consent

Informed consent to use histopathological samples and pathological diagnostic reports for research purposes had been obtained from all patients prior to the surgical procedures at all hospitals, and the opportunity to refuse participation in research had been guaranteed in an opt-out manner.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is based on results obtained from a project, JPNP14012, subsidized by the New Energy and Industrial Technology Development Organization (NEDO).

ORCID iD

Masayuki Tsuneki

References

1. Sung, Ferlay, Siegel, Laversanne, Soerjomataram, Jemal, Bray. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021; 71(3): 209-249.

2. Kang, Kim, Moon, Lee, Kim, Sung, Lee, Jeong. Signet ring cell carcinoma of early gastric cancer, is endoscopic treatment really risky? Medicine. 2017; 96(33): e7532.

3. Sugihara, Hattori, Fukuda, Fujita. Cell proliferation and differentiation in intramucosal and advanced signet ring cell carcinomas of the human stomach. Virchows Archiv A. 1987; 411(2): 117-127.

4. Huang, Zou. Clinicopathology of early gastric carcinoma: an update for pathologists and gastroenterologists. Gastrointest Tumors. 2016; 3(3-4): 115-124.

5. Wang, Yang, Wang, Zhang, Han, Cao. Superficial flat-type early-stage gastric signet ring cell carcinoma in the atrophic background mucosa: two case reports, 2022.

6. Inuyama, Horiuchi, Yamamoto, Yoshimizu, Ishiyama, Yoshio, Hirasawa, Tsuchida, Igarashi, Fujisaki. Usefulness of magnifying endoscopy with narrow-band imaging for diagnosing mixed poorly differentiated gastric cancers. Digestion. 2021; 102(6): 938-945.

7. Kuroki, Oka, Tanaka, Yorita, Hata, Kotachi, Boda, Arihiro, Shimamoto, Chayama. Preceding endoscopic submucosal dissection in submucosal invasive gastric cancer patients does not impact clinical outcomes. Sci Rep. 2021; 11(1): 1-9.

8. Gotoda, Yanagisawa, Sasako, Ono, Nakanishi, Shimoda, Kato. Incidence of lymph node metastasis from early gastric cancer: estimation with a large number of cases at two large centers. Gastric Cancer. 2000; 3(4): 219-225.

9. Japanese Gastric Cancer Association (JGCA). Japanese gastric cancer treatment guidelines 2018. Gastric Cancer. 2020; 24(1): 1-21.

10. Fujimoto, Goto, Nishizawa, Ochiai, Horii, Maehata, Akimoto, Kinoshita, Sagara, Sasaki, et al. Gastric ESD may be useful as accurate staging and decision of future therapeutic strategy. Endosc Int Open. 2017; 5(02): E90-E95.

11. Zhang, Berry, Altman, Ré, Rubin, Snyder. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016; 7: 12474-12400.

12. Hou, Samaras, Kurc, Gao, Davis, Saltz. Patch-based convolutional neural network for whole slide tissue image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2424–2433, 2016.

13. Madabhushi, Lee. Image analysis and machine learning in digital pathology: challenges and opportunities. Med Image Anal. 2016; 33: 170-175.

14. Litjens, Sánchez, Timofeeva, Hermsen, Nagtegaal, Kovacs, Hulsbergen-Van De Kaa, Bult, Van Ginneken, Van Der Laak. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci Rep. 2016; 6: 26286.

15. Kraus, Frey. Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics. 2016; 32(12): i52-i59.

16. Korbar, Olofson, Miraflor, Nicka, Suriawinata, Torresani, Suriawinata, Hassanpour. Deep learning for classification of colorectal polyps on whole-slide images. J Pathol Inform. 2017; 8: 30.

17. Luo, Zang, Yang, Huang, Liang, Rodriguez-Canales, Wistuba, Gazdar, Xie, Xiao. Comprehensive computational pathological image analysis predicts lung cancer prognosis. J Thorac Oncol. 2017; 12(3): 501-509.

18. Coudray, Ocampo, Sakellaropoulos, Narula, Snuderl, Fenyö, Moreira, Razavian, Tsirigos. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med. 2018; 24(10): 1559-1567.

19. Wei, Tafe, Linnik, Vaickus, Tomita, Hassanpour. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep. 2019; 9(1): 1-8.

20. Gertych, Swiderska-Chadaj, Ing, Markiewicz, Cierniak, Salemi, Guzman, Walts, Knudsen. Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Sci Rep. 2019; 9(1): 1483.

21. Bejnordi, Veta, Van Diest, Van Ginneken, Karssemeijer, Litjens, Van Der Laak, Hermsen, Manson, Balkenhol, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017; 318(22): 2199-2210.

22. Saltz, Gupta, Hou, Kurc, Singh, Nguyen, Samaras, Shroyer, Zhao, Batiste, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 2018; 23(1): 181-193.

23. Campanella, Hanna, Geneslaw, Miraflor, Silva VWK, Busam, Brogi, Reuter, Klimstra, Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019; 25(8): 1301-1309.

24. Iizuka, Kanavati, Kato, Rambeau, Arihiro, Tsuneki. Deep learning models for histopathological classification of gastric and colonic epithelial tumours. Sci Rep. 2020; 10(1): 1-11.

25. Kanavati, Tsuneki. A deep learning model for gastric diffuse-type adenocarcinoma classification in whole slide images. arXiv preprint arXiv:2104.12478, 2021.

26. Kanavati, Ichihara, Rambeau, Iizuka, Arihiro, Tsuneki. Deep learning models for gastric signet ring cell carcinoma classification in whole slide images. Technol Cancer Res Treat. 2021; 20: 15330338211027901.

27. Kanavati, Tsuneki. Partial transfusion: on the expressive influence of trainable batch norm parameters for transfer learning. In: Medical Imaging with Deep Learning. PMLR, pp. 338–353, 2021.

28. Tan. EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning. PMLR, pp. 6105–6114, 2019.

29. Kanavati, Tsuneki. Breast invasive ductal carcinoma classification on whole slide images with weakly-supervised and transfer learning. bioRxiv, 2021.

30. Tsuneki, Kanavati. Deep learning models for poorly differentiated colorectal adenocarcinoma classification in whole slide images using transfer learning. Diagnostics. 2021; 11(11): 2074.

31. Tsuneki, Abe, Kanavati. A deep learning model for prostate adenocarcinoma classification in needle biopsy whole-slide images using transfer learning. Diagnostics. 2022; 12(3): 768.

32. Otsu. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979; 9(1): 62-66.

33. Kingma. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

34. Abadi, Agarwal, Barham, Brevdo, Chen, Citro, Corrado, Davis, Dean, Devin, Ghemawat, Goodfellow, Harp, Irving, Isard, Jia, Jozefowicz, Kaiser, Kudlur, Levenberg, Mané, Monga, Moore, Murray, Olah, Schuster, Shlens, Steiner, Sutskever, Talwar, Tucker, Vanhoucke, Vasudevan, Viégas, Vinyals, Warden, Wattenberg, Wicke, Zheng. TensorFlow: large-scale machine learning on heterogeneous systems. 2015. https://www.tensorflow.org/. Software available from tensorflow.org.

35. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos, Cournapeau, Brucher, Perrot, Duchesnay. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011; 12: 2825-2830.

36. Hunter. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007; 9(3): 90-95. DOI: 10.1109/MCSE.2007.55.

37. Efron, Tibshirani. An introduction to the bootstrap. Boca Raton, Florida, United States: CRC Press, 1994.

38. Takizawa, Takashima, Kimura, Mizusawa, Hasuike, Ono, Terashima, Muto, Boku, Sasako, et al. A phase II clinical trial of endoscopic submucosal dissection for early gastric cancer of undifferentiated type: Japan Clinical Oncology Group study JCOG1009/1010. Jpn J Clin Oncol. 2013; 43(1): 87-91.

39. Lee, Park, Gong, Yook, Kim. Applicability of endoscopic submucosal dissection for undifferentiated early gastric cancer: mixed histology of poorly differentiated adenocarcinoma and signet ring cell carcinoma is a worse predictive factor of nodal metastasis. Surg Oncol. 2017; 26(1): 8-12.

40. Ono, Yao, Fujishiro, Oda, Uedo, Nimura, Yahagi, Iishi, Oka, Ajioka, et al. Guidelines for endoscopic submucosal dissection and endoscopic mucosal resection for early gastric cancer. Dig Endosc. 2021; 33(1): 4-20.

41. Nagata, Shimizu. Pathological evaluation of gastrointestinal endoscopic submucosal dissection materials based on Japanese guidelines. World J Gastrointest Endosc. 2012; 4(11): 489.

42. Naito, Tsuneki, Fukushima, Koga, Higashi, Notohara, Aishima, Ohike, Tajiri, Yamaguchi, et al. A deep learning model to detect pancreatic ductal adenocarcinoma on endoscopic ultrasound-guided fine-needle biopsy. Sci Rep. 2021; 11(1): 1-8.

43. Kanavati, Ichihara, Tsuneki. A deep learning model for breast ductal carcinoma in situ classification in whole slide images. Virchows Arch. 2022; 480: 1-14.

44. Zheng, Takahashi, Murai, Cui, Nomoto, Miwa, Tsuneyama, Takano. Pathobiological characteristics of intestinal and diffuse-type gastric carcinoma in Japan: an immunostaining study on the tissue microarray. J Clin Pathol. 2007; 60(3): 273-277.

45. Yamashita, Sakuramoto, Katada, Futawatari, Moriya, Hirai, Kikuchi, Watanabe. Diffuse type advanced gastric cancer showing dismal prognosis is characterized by deeper invasion and emerging peritoneal cancer cell: the latest comparative study to intestinal advanced gastric cancer. Hepatogastroenterology. 2009; 56(89): 276-281.

46. Fujishiro, Yoshida, Matsuda, Narita, Yamashita, Seto. Updated evidence on endoscopic resection of early gastric cancer from Japan. Gastric Cancer. 2017; 20(1): 39-44.

47. Abe, Ushiku. Pathological diversity of gastric cancer from the viewpoint of background condition. Digestion. 2021; 103: 1-9.

48. Jin, Wang, Wen. Reducing the annotation cost of whole slide histology images using active learning. In: 2021 3rd International Conference on Image Processing and Machine Vision (IPMV). pp. 47–52, 2021.

Weakly Supervised Learning for Poorly Differentiated Adenocarcinoma Classification in Gastric Endoscopic Submucosal Dissection Whole Slide Images
