Abstract
Keywords
Introduction
According to the Global Cancer Statistics 2020, stomach cancer was responsible for 1 089 103 new cases (5.6% of all sites) and an estimated 768 793 deaths (7.7% of all sites) in 2020, ranking fifth for incidence and fourth for mortality globally. 1 The so-called diffuse type adenocarcinoma (ADC) and signet ring cell carcinoma (SRCC) of stomach are poorly differentiated cancers (ADCs) which are believed to show poor prognosis and aggressive behavior. 2 Histopathologically, a diffuse growth of ADC cells is observed, associated with extensive fibrosis and inflammation and often the entire gastric wall is involved. Although a foveolar (intramucosal) type of SRCC occurs, in many cases of this entity the gastric mucosa is less affected than the deeper layers.3–5 Therefore, poorly differentiated ADCs can often be mistaken for a variety of nonneoplastic lesions including gastritis, xanthoma/foamy histiocytes, or reactive endothelial cells in granulation tissues. Curative rates of endoscopic treatment for poorly differentiated type early gastric cancer are lower than those for the differentiated type ADC. 6 Endoscopic submucosal dissection (ESD) was developed in the late 1990s and has been widely used for early gastric cancer worldwide. ESD allows en bloc resection and precise histopathological inspection, while being a less invasive treatment than surgical resection. 7 Endoscopic resection is considered for tumors that have a very low possibility of lymph node metastasis and are suitable for en bloc resection. 8 According to the Japanese Gastric Cancer Treatment Guidelines 2018 (fifth edition), 9 the absolute indication of ESD is a differentiated type ADC with (UL1) or without (UL0) ulcerative findings and the expanded indication is a poorly differentiated (undifferentiated) ADC without ulcerative findings (UL0). 
Even in differentiated type ADC-dominant ESD specimens, it is important to check for poorly differentiated ADC, because its presence informs decisions on future therapeutic strategy. 10 Therefore, surgical pathologists are always on the lookout for signs of poorly differentiated ADC when evaluating gastric ESD specimens.
In computational pathology, deep learning models have been widely applied as computer-aided detection (CADe) or computer-aided diagnosis (CADx) tools for histopathological cancer classification on whole-slide images (WSIs), cancer cell detection and segmentation, and the stratification of patient clinical outcomes.11–24 Previous studies have looked into applying deep learning models for ADC classification in the stomach,24–26 and for gastric poorly differentiated ADC classification on WSIs.25,26 However, the existing poorly differentiated ADC models did not classify poorly differentiated ADC well on gastric ESD WSIs.
In this study, we trained a deep learning model for the classification of gastric poorly differentiated ADC on ESD WSIs. We evaluated the trained model on ESD, endoscopic biopsy, and surgical specimen WSI test sets, achieving an ROC-AUC up to 0.975 in gastric ESD test sets, 0.960 in endoscopic biopsy test sets, and 0.929 in surgical specimen test sets. These findings suggest that deep learning models might be very useful as routine histopathological diagnostic aids for inspecting gastric ESD to detect poorly differentiated ADC precisely.
Materials and Methods
Clinical Cases and Pathological Records
This is a retrospective study. A total of 5103 H&E (hematoxylin & eosin) stained gastric histopathological specimens (2506 ESD, 1866 endoscopic biopsy, and 731 surgical specimen) of human poorly differentiated ADC, differentiated ADC, and nonneoplastic lesions were collected from the surgical pathology files of 6 hospitals: Sapporo-Kosei General Hospital (Hokkaido, Japan), Kamachi Group Hospitals (Wajiro, Shinyukuhashi, Shinkuki, and Shintakeo Hospitals) (Fukuoka, Japan), and International University of Health and Welfare, Mita Hospital (Tokyo, Japan) after histopathological review of those specimens by surgical pathologists. The cases were selected randomly, so as to reflect a real clinical setting as much as possible. The nonneoplastic lesions consisted of 1018 WSIs that were almost normal mucosa or regenerative mucosa and 2024 WSIs that were inflammatory lesions. Each WSI diagnosis was reviewed by at least 2 pathologists, with the final checking and verification performed by a senior pathologist. All WSIs were scanned at a magnification of 20x.
Dataset
The hospitals that provided histopathological specimen slides in the present study were anonymized (Hospital-A-F). Table 1 breaks down the distribution of training and validation sets of gastric ESD WSIs from Hospital-A. Validation sets were selected randomly from the training sets (Table 1). The test sets consisted of ESD, biopsy, and surgical specimen WSIs (Table 2). The patients’ pathological records were used to extract the WSIs’ pathological diagnoses and to assign WSI labels. Training set WSIs were not annotated, and the training algorithm used only the WSI diagnosis labels. The only information available for training was therefore whether the WSI contained gastric poorly differentiated ADC or not (differentiated ADC and nonneoplastic lesion); no information about the location of the cancerous tissue lesions was provided. We have confirmed that surgical pathologists were able to diagnose test sets in Table 2 from visual inspection of the H&E stained slide WSIs alone.
Distribution of Gastric Endoscopic Submucosal Dissection (ESD) Whole-Slide Images (WSIs) in the Training and Validation Sets Obtained From Hospital-A.
Distribution of Gastric Endoscopic Submucosal Dissection (ESD), Endoscopic Biopsy, and Surgical Specimen Whole-Slide Images in the Test Sets Obtained From 6 Hospitals (A-F).
Abbreviations: ADC, adenocarcinoma; WSIs, whole-slide images.
Deep Learning Models
To train our models, we used weakly supervised learning and transfer learning. The former makes it possible to train a model using only slide-level labels without the need to do laborious cell-level annotations. The latter allows speeding up training by starting from a model with pre-trained weights on another task without having to start training from scratch.
We trained the models via transfer learning using the partial fine-tuning approach. 27 This is an efficient fine-tuning approach that consists of using the weights of an existing pre-trained model and only fine-tuning the affine parameters of the batch normalization layers and the final classification layer. For the model architecture, we used EfficientNetB1 28 starting with pre-trained weights of the previous Biopsy-poorly ADC model, which in turn had been initialized from ImageNet weights when it was fine-tuned. Figure 1 shows an overview of the training method. The training methodology that we used in the present study was the same as reported in our previous studies.29–31 For completeness, we repeat the methodology here.
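As a framework-free illustration of the partial fine-tuning rule described above, the following sketch marks which parameters would be updated when only the batch normalization affine parameters and the final classification layer are trained. The `Param` record, the layer names, and the `select_trainable` helper are hypothetical and are not taken from the authors' TensorFlow implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    layer: str  # hypothetical layer name
    kind: str   # "conv", "batch_norm", or "dense"
    name: str   # e.g. "kernel", "gamma", "beta"


def select_trainable(params: List[Param], head_layer: str) -> List[Param]:
    """Keep only batch-norm affine parameters and the classification head."""
    selected = []
    for p in params:
        is_bn_affine = p.kind == "batch_norm" and p.name in ("gamma", "beta")
        is_head = p.layer == head_layer
        if is_bn_affine or is_head:
            selected.append(p)
    return selected


# Toy model: one convolutional block followed by a classification layer.
toy_model = [
    Param("stem_conv", "conv", "kernel"),
    Param("stem_bn", "batch_norm", "gamma"),
    Param("stem_bn", "batch_norm", "beta"),
    Param("stem_bn", "batch_norm", "moving_mean"),
    Param("stem_bn", "batch_norm", "moving_variance"),
    Param("classifier", "dense", "kernel"),
    Param("classifier", "dense", "bias"),
]

trainable = select_trainable(toy_model, head_layer="classifier")
trainable_names = [f"{p.layer}/{p.name}" for p in trainable]
```

In this toy example the convolution kernel and the batch normalization running statistics stay frozen, so only a small fraction of the parameters receives gradient updates.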

Schematic diagrams of training methods. (A) A simple summary of the training method using transfer learning and weakly supervised learning for this study. During training (B), we iteratively alternated between inference and training. During the inference step, the model weights were frozen and the model was used to select tiles with the highest probability after applying it on the entire tissue regions of each whole-slide image (WSI). The top k tiles with the highest probabilities were then selected from each WSI and placed into a queue. During training, the selected tiles from multiple WSIs formed a training batch and were used to train the model.
We performed slide tiling by extracting square tiles from tissue regions of the WSIs. We started by detecting the tissue regions in order to eliminate most of the white background. We did this by thresholding a grayscale version of the WSIs using Otsu’s method. 32
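The tissue detection step can be illustrated with a minimal pure-Python version of Otsu's method, which picks the gray level that maximizes the between-class variance. This is a sketch on a flat list of 8-bit gray values; the function names and the toy pixel values are assumptions, and a real pipeline would more likely use an image library on a downsampled WSI.

```python
from typing import List, Sequence


def otsu_threshold(gray: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def tissue_mask(gray: Sequence[int], threshold: int) -> List[bool]:
    """Tissue is darker than the white slide background."""
    return [v <= threshold for v in gray]


# Toy pixels: three stained (dark) values among bright background values.
pixels = [250, 252, 255, 248, 120, 110, 130, 251]
threshold = otsu_threshold(pixels)
mask = tissue_mask(pixels, threshold)
```

Any threshold between the darkest background value and the brightest tissue value separates the two groups in this toy example; the sketch returns the lowest such value.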
During prediction, we performed the tiling of the tissue regions in a sliding window fashion, using a fixed-size stride (
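A sliding-window tiling of this kind can be sketched as a grid of top-left tile coordinates. The helper below is hypothetical; the tile size and stride values in the usage lines are illustrative only and are not the settings used in the study, and partial tiles at the right and bottom edges are simply dropped as a simplifying assumption.

```python
from typing import List, Tuple


def tile_origins(width: int, height: int, tile_size: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of square tiles on a regular sliding-window grid."""
    if tile_size <= 0 or stride <= 0:
        raise ValueError("tile_size and stride must be positive")
    origins = []
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            origins.append((x, y))
    return origins


# Stride equal to the tile size gives non-overlapping tiles.
non_overlapping = tile_origins(width=448, height=224, tile_size=224, stride=224)
# A smaller stride gives overlapping tiles over the same region.
overlapping = tile_origins(width=448, height=224, tile_size=224, stride=112)
```

In practice the grid would also be filtered so that only tiles overlapping the detected tissue regions are passed to the model.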
To maintain the balance on the WSI, we oversampled from the WSIs to ensure the model trained on tiles from all of the WSIs in each epoch. We then switched to hard mining of tiles. To perform the hard mining, we alternated between training and inference. During inference, the CNN was applied in a sliding window fashion on all of the tissue regions in the WSI, and we then selected the
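The inference half of this alternation, in which the highest-probability tiles of each WSI are collected into a queue and then drawn as a training batch, can be sketched as follows. The data layout, the helper names, the value of k, and the batch size are all assumptions made for illustration.

```python
import heapq
from collections import deque
from typing import Deque, Dict, List, Tuple

Origin = Tuple[int, int]
TileScore = Tuple[Origin, float]  # (tile origin, predicted probability)


def top_k_tiles(tile_scores: List[TileScore], k: int) -> List[TileScore]:
    """Return the k tiles with the highest predicted probability."""
    return heapq.nlargest(k, tile_scores, key=lambda item: item[1])


def pop_batch(queue: Deque[Tuple[str, Origin]], batch_size: int) -> List[Tuple[str, Origin]]:
    """Remove up to batch_size queued tiles to form one training batch."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch


# Hypothetical inference output with frozen weights: per-WSI tile probabilities.
slides: Dict[str, List[TileScore]] = {
    "wsi_a": [((0, 0), 0.1), ((224, 0), 0.9), ((448, 0), 0.4)],
    "wsi_b": [((0, 0), 0.7), ((0, 224), 0.2)],
}

queue: Deque[Tuple[str, Origin]] = deque()
for slide_id, tile_scores in slides.items():
    for origin, _prob in top_k_tiles(tile_scores, k=2):
        queue.append((slide_id, origin))

# Tiles from several WSIs end up in the same batch.
batch = pop_batch(queue, batch_size=3)
```

Each queued tile would inherit the slide-level label of its WSI when the batch is used for a training step.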
To obtain a single prediction for the WSIs from the tile predictions, we took the maximum probability from all of the tiles. This simple approach proved effective. Using a recurrent neural network would have been another approach for aggregating the probabilities.
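The max-pooling aggregation described above amounts to a one-line reduction over the tile outputs. In the sketch below, the tile probabilities and the decision threshold of 0.5 are illustrative assumptions, not values from the study.

```python
from typing import Sequence


def slide_probability(tile_probabilities: Sequence[float]) -> float:
    """Max-pool tile probabilities into one slide-level probability."""
    if not tile_probabilities:
        raise ValueError("no tissue tiles were scored for this slide")
    return max(tile_probabilities)


# Hypothetical tile outputs for two slides.
negative_slide = [0.02, 0.10, 0.07]
positive_slide = [0.03, 0.05, 0.97, 0.12]

THRESHOLD = 0.5  # illustrative only; not taken from the study
predictions = {
    "negative_slide": slide_probability(negative_slide) >= THRESHOLD,
    "positive_slide": slide_probability(positive_slide) >= THRESHOLD,
}
```

A consequence of this design is that a single high-scoring tile is enough to flag a slide, which suits small lesions but also means that one false positive tile makes the whole slide a false positive.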
We used the Adam optimizer, 33 with the binary cross-entropy as the loss function, with the following parameters:
During training we performed data augmentation on the fly, during tile sampling. The augmentation consisted of randomly modifying the brightness, contrast, hue, and saturation of the tiles, as well as adding JPEG artifacts. This makes the model less sensitive to color and image quality and lets it focus more on content.
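A color augmentation of this kind can be sketched with the Python standard library alone. The jitter ranges below are assumptions, contrast is scaled around mid-gray as a simplification, and the JPEG artifact step is omitted because it needs an image codec; none of this is the authors' actual augmentation code.

```python
import colorsys
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def _clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


def jitter_pixel(rgb: Pixel, brightness: float, contrast: float,
                 hue_shift: float, saturation: float) -> Pixel:
    # Contrast scales around mid-gray (a simplification); brightness is additive.
    r, g, b = (_clamp((c - 0.5) * contrast + 0.5 + brightness) for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0      # rotate hue on the color wheel
    s = _clamp(s * saturation)     # scale saturation
    return colorsys.hsv_to_rgb(h, s, v)


def augment_tile(tile: List[List[Pixel]], rng: random.Random) -> List[List[Pixel]]:
    # Draw one set of jitter parameters per tile; the ranges are illustrative.
    params = dict(
        brightness=rng.uniform(-0.1, 0.1),
        contrast=rng.uniform(0.8, 1.2),
        hue_shift=rng.uniform(-0.05, 0.05),
        saturation=rng.uniform(0.8, 1.2),
    )
    return [[jitter_pixel(px, **params) for px in row] for row in tile]


rng = random.Random(0)
tile = [[(0.8, 0.5, 0.7), (0.9, 0.9, 0.9)],
        [(0.6, 0.3, 0.6), (0.7, 0.4, 0.6)]]
augmented = augment_tile(tile, rng)
```

Drawing the parameters once per tile, rather than per pixel, keeps the perturbation consistent across the tile, which mimics slide-to-slide staining and scanner variation.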
Software and Statistical Analysis
The deep learning models were implemented and trained using TensorFlow version 2.5. 34 AUCs were calculated in Python using the scikit-learn package 35 and plotted using matplotlib. 36 The 95% CIs of the AUCs were estimated using the bootstrap method 37 with 1000 iterations.
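To make the bootstrap estimate concrete, the sketch below computes a percentile 95% CI for the AUC with 1000 resamples. It uses a small pure-Python AUC (the probability that a positive slide scores higher than a negative one) in place of scikit-learn so that it runs without dependencies; the function names, the toy labels and scores, the fixed seed, and the choice of the percentile method are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_iter: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]  # sample slides with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lo_idx = int(round(alpha / 2 * n_iter))
    hi_idx = min(n_iter - 1, int(round((1 - alpha / 2) * n_iter)) - 1)
    return aucs[lo_idx], aucs[hi_idx]


# Toy slide-level labels (1 = poorly differentiated ADC) and model scores.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.70, 0.85, 0.95]
auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_iter=1000)
```

With only a handful of slides the interval is wide; on test sets of the size used in the study the same procedure gives a much tighter interval.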
Availability of Data and Material
The datasets generated during and/or analyzed during the current study are not publicly available due to specific institutional requirements governing the privacy protection but are available from the corresponding author on reasonable request. The datasets that support the findings of this study are available from Sapporo-Kosei General Hospital (Hokkaido, Japan), Kamachi Group Hospitals (Fukuoka, Japan), and International University of Health and Welfare, Mita Hospital (Tokyo, Japan), but restrictions apply to the availability of these data, which were used under a data use agreement which was made according to the Ethical Guidelines for Medical and Health Research Involving Human Subjects as set by the Japanese Ministry of Health, Labour and Welfare (Tokyo, Japan), and so are not publicly available. However, the data are available from the authors upon reasonable request for private viewing and with permission from the corresponding medical institutions within the terms of the data use agreement and if compliant with the ethical and legal requirements as stipulated by the Japanese Ministry of Health, Labour and Welfare.
Results
High AUC Performance of Gastric ESD, Biopsy, and Surgical Specimen WSI Evaluation of Gastric Poorly Differentiated ADC Histopathology Images
The aim of this retrospective study was to train a deep learning model for the classification of gastric poorly differentiated ADC in ESD WSIs. We have achieved high ROC-AUC performances in the ESD test sets (0.955 and 0.975) (Figure 2A and Table 3).

ROC curves with AUCs from the trained gastric endoscopic submucosal dissection (ESD) poorly differentiated ADC deep learning model (ESD-poorly ADC model) and the existing biopsy model (Biopsy-poorly ADC model) on the seven test sets: (A) the newly trained gastric ESD poorly differentiated ADC classification model (ESD-poorly ADC model) with tile size 224 px and magnification at 20x; (B) the existing biopsy gastric poorly differentiated ADC classification model (Biopsy-poorly ADC model) with tile size 224 px and magnification at 20x. Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve; ADC, adenocarcinoma.
The Comparison of ROC-AUC and Log Loss Results for Poorly Differentiated Adenocarcinoma (ADC) Classification on Various Test Sets Between Trained Gastric Endoscopic Submucosal Dissection (ESD) Poorly Differentiated ADC Deep Learning Model (ESD-Poorly ADC Model) and Existing Biopsy Model (Biopsy-Poorly ADC Model).
Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve; SRCC, signet ring cell carcinoma.
The models were applied in a sliding window fashion with an input tile size and stride of 224
The Comparison of Scores of Accuracy, Sensitivity, and Specificity on the Various Test Sets Between Trained Gastric Endoscopic Submucosal Dissection (ESD) Poorly Differentiated ADC Deep Learning Model (ESD-Poorly ADC Model) and Existing Biopsy Model (Biopsy-Poorly ADC Model).
Abbreviations: SRCC, signet ring cell carcinoma; ADC, adenocarcinoma.
The Number of True Positives and False Positives in Each Test Set for the Two Different Models.
Abbreviations: ESD, endoscopic submucosal dissection; SRCC, signet ring cell carcinoma; ADC, adenocarcinoma; WSIs, whole-slide images.
True Positive Gastric Poorly Differentiated ADC Prediction on ESD WSIs
Our model (ESD-poorly ADC model) satisfactorily predicted gastric poorly differentiated ADC in ESD WSIs (Figure 3). According to the histopathological reports and additional review by pathologists, gastric poorly differentiated ADC cells were infiltrating the neck area of gastric glands (Figure 3A and C). The heatmap image (Figure 3B) shows the true positive predictions of poorly differentiated ADC cells (Figure 3D) without false positive predictions in nonneoplastic areas. Histopathologically, there were poorly differentiated ADC cells showing an intramucosal invasive pattern among differentiated type ADC with tubular structures (Figure 3E and G). The heatmap image (Figure 3F) shows true positive predictions of poorly differentiated ADC cells (Figure 3H).

Two representative histopathological images of poorly differentiated adenocarcinoma (ADC) true positive prediction outputs on whole-slide images (WSIs) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). In the gastric poorly differentiated ADC of ESD specimen (A), poorly differentiated ADC cells were infiltrating in the neck area of gastric gland (C). The heatmap image (B) shows true positive predictions of gastric poorly differentiated ADC cells (D) which correspond respectively to H&E histopathology (C). According to the histopathological report, (E) has differentiated and poorly differentiated ADC (G). The heatmap image (F) shows true positive prediction of gastric poorly differentiated ADC cells (H) which correspond respectively to H&E histopathology (G). The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.
True Negative Gastric Poorly Differentiated ADC Prediction on ESD WSIs
Our model (ESD-poorly ADC model) shows true negative predictions of gastric poorly differentiated ADC in ESD WSIs (Figure 4A and B). Histopathologically, there was no evidence of poorly differentiated ADC cells in any of the tissue fragments (#1-#3), which were nonneoplastic lesions showing gastritis with ulcer formation (Figure 4B) and were not predicted as gastric poorly differentiated ADC (Figure 4C).

Representative true negative gastric poorly differentiated adenocarcinoma (ADC) prediction outputs on a whole slide image (WSI) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). Histopathologically, in (A), all tissue fragments (#1-#3) were nonneoplastic lesions with ulcerative gastritis (B). The heatmap image (C) shows true negative prediction of gastric poorly differentiated ADC. The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.
False Positive Gastric Poorly Differentiated ADC Prediction on ESD WSIs
According to the histopathological report and additional review by pathologists, there was no gastric poorly differentiated ADC in these ESD fragments (#1-#3), which were nonneoplastic specimens (Figure 5A). Our model (ESD-poorly ADC model) showed false positive predictions of poorly differentiated ADC (Figure 5B). The false positively predicted areas (Figure 5C and D) showed lymphatic tissue cells (eg, lymphocytes, tingible body macrophages, and follicular dendritic cells) in an artificially collapsed lymphoid follicle, which could be the primary cause of the false positives because of their morphological similarity to poorly differentiated ADC cells.

A representative example of gastric poorly differentiated adenocarcinoma (ADC) false positive prediction outputs on a whole slide image (WSI) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). Histopathologically, all tissue fragments (#1-#3) in (A) were nonneoplastic lesions. In tissue fragment #2, the heatmap image (B) exhibits false positive predictions of gastric poorly differentiated ADC (D) on the lymphatic tissue cells (C) in a lymphoid follicle that was artificially collapsed during the specimen processing procedures. The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.
False Negative Gastric Poorly Differentiated ADC Prediction on ESD WSIs
According to the histopathological report and additional review by pathologists, there were foveolar-type signet ring cell carcinoma (SRCC) cells 3 in the superficial layer of an ESD fragment (#1) (Figure 6A and C), which pathologists marked with red dots. Our model (ESD-poorly ADC model) did not predict poorly differentiated ADC cells (Figure 6B and D). For comparison, we applied our model (ESD-poorly ADC model) to 13 endoscopic biopsy WSIs containing SRCC cells that had been false negatively predicted by the existing SRCC model. 26 Interestingly, 4 out of the 13 WSIs contained foveolar-type SRCC cells, and these were also false negatively predicted by our model (ESD-poorly ADC model). On the other hand, the remaining 9 out of 13 WSIs containing SRCC cells were true positively predicted as poorly differentiated ADC by our model (ESD-poorly ADC model). Therefore, our model (ESD-poorly ADC model) is limited in its ability to predict foveolar-type SRCC cells precisely.

A representative example of gastric poorly differentiated adenocarcinoma (ADC) false negative prediction output on a whole slide image (WSI) from gastric endoscopic submucosal dissection (ESD) test sets using the model (ESD-poorly ADC model). Histopathologically, this case (A) shows foveolar-type signet ring cell carcinoma cell infiltration in a small area of the superficial layer of fragment #1 (C). The other 2 tissue fragments (#2 and #3) were nonneoplastic lesions (A). The heatmap image (B) exhibited no positive poorly differentiated ADC prediction (D). The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.
True Positive and True Negative Gastric Poorly Differentiated ADC Prediction on Surgical Specimen WSIs
In addition, we applied the model (ESD-poorly ADC model) to the surgical specimen WSIs. Figure 7 shows an example of a surgically resected gastric poorly differentiated ADC case with 5 serial section WSIs (#1-#5) (Figure 7A, C, E, G and I). The model (ESD-poorly ADC model) was capable of true positive (#1, #2, #4 and #5) (Figure 7A-D and G-J) and true negative (#3) (Figure 7E and F) poorly differentiated ADC detection on these section WSIs. Histopathologically, the areas invaded by gastric poorly differentiated ADC cells (Figure 7K, M, O and Q) were visualized by heatmap images (Figure 7L, N, P and R).

Serial specimens from a representative surgically resected gastric poorly differentiated adenocarcinoma (ADC) case. According to the histopathological report, serial specimens #1 (A), #2 (C), #4 (G), and #5 (I) have poorly differentiated ADC and #3 (E) is a nonneoplastic lesion. The heatmap images show true positive predictions of gastric poorly differentiated ADC cells (B, D, H, J, L, N, P, and R) which correspond respectively to H&E histopathology (A, C, G, I, K, M, O, and Q) and true negative prediction (F). The heatmap uses the jet color map where blue indicates low probability and red indicates high probability.
The Application of Deep Learning Models to Classify Gastric Poorly Differentiated ADC on Various Types of Specimens
Based on the findings in this study and a previous study, 25 we have summarized the possible application of our deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) for classification of gastric poorly differentiated ADC in various types of specimen WSIs (Figure 8). The ESD-poorly ADC model can be applied to all types of specimens (ESD, biopsy, and surgical specimens). For biopsy specimens, however, the biopsy model (Biopsy-poorly ADC model) achieved slightly better performance than the ESD model (ESD-poorly ADC model).

The schematic diagram of possible application of deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) for classification of gastric poorly differentiated adenocarcinoma (ADC) in different types of specimens. The ESD-poorly ADC model can be applied to classify gastric poorly differentiated ADC in biopsy, endoscopic submucosal dissection (ESD), and surgical specimens. The Biopsy-poorly ADC model can be applied only to biopsy specimens. To classify gastric poorly differentiated ADC in biopsy specimens, both models can be used; however, the Biopsy-poorly ADC model would be able to achieve higher ROC-AUC performance than the ESD-poorly ADC model.
Discussion
In this study, we trained a deep learning model for the classification of gastric poorly differentiated ADC in gastric ESD WSIs. Indications for gastric ESD were determined by the presence or absence of a risk of nodal metastasis and according to the gastric cancer treatment guidelines.9,10 As an expanded indication, the gastric poorly differentiated ADC without ulcerative findings (UL0) in which the depth of invasion is clinically diagnosed as T1a (cT1a) and the diameter is
If the diagnosis of a case in a biopsy specimen is well-differentiated adenocarcinoma, but the ESD specimen from the same patient shows a mixture including poorly differentiated types, then the diagnosis is poorly differentiated adenocarcinoma. Therefore, a model that detects the presence of poorly differentiated adenocarcinoma in ESD specimens would be highly useful.44–46 Importantly, the model (ESD-poorly ADC model) achieved a high ROC-AUC (0.929) in surgical specimen test sets (Tables 2 and 3) and precisely predicted the areas infiltrated by poorly differentiated ADC cells in the serial surgical sections (Figure 7). This would be very useful for pathologists who must check for the presence or absence of poorly differentiated ADC in a massive number of surgical serial sections in the routine clinical workflow. Moreover, the model (ESD-poorly ADC model) also achieved high ROC-AUC values in endoscopic biopsy test sets when compared with the existing biopsy model (Biopsy-poorly ADC model) (Tables 2 and 3). Thus, for endoscopic biopsy specimens, both deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) can classify poorly differentiated ADC precisely. However, the biopsy model (Biopsy-poorly ADC model) achieved slightly better ROC-AUC values (Table 3), so it would be better to apply the biopsy model (Biopsy-poorly ADC model) to endoscopic biopsy specimens in the routine workflow (Figure 8).
One of the limitations of this study is that the deep learning models (both ESD-poorly ADC model and Biopsy-poorly ADC model) false negatively predicted poorly differentiated ADC cells (SRCC cells) in foveolar-type SRCC biopsy and ESD WSIs. 3 In the early stage, SRCC cells proliferate predominantly in the proliferative zone (near the mucous neck cells), 47 and these cells were consistently missed by the models (false negative predictions). To predict foveolar-type SRCC precisely, we need to collect a number of foveolar-type SRCC biopsy and ESD cases for additional training or active learning. 48 Another limitation of this study is that it primarily included specimens from a limited number of hospitals and suppliers in Japan, and, therefore, the model could potentially be biased toward such specimens. Further validation, conducted as randomized trials, on a wide variety of specimens from multiple different origins would be essential to ensure the robustness of the model.
Conclusion
The deep learning model established in the present study offers promising results that indicate it could be beneficial as a screening aid for pathologists prior to observing gastric ESD histopathology on glass slides or WSIs. Together, the two deep learning models (ESD-poorly ADC model and Biopsy-poorly ADC model) can cover the precise prediction of gastric poorly differentiated ADC in ESD, endoscopic biopsy, and surgical specimen WSIs. At the same time, the model could be used as a double-check tool to reduce the risk of missed poorly differentiated ADC cells. The most important advantage of using a fully automated computational tool for computer-aided diagnosis is that it can systematically handle large numbers of WSIs without potential bias due to the fatigue commonly experienced by pathologists.
Footnotes
Acknowledgements
This study is based on results obtained from a project, JPNP14012, subsidized by the New Energy and Industrial Technology Development Organization (NEDO). We are grateful for the support provided by Professor Takayuki Shiomi at Department of Pathology, Faculty of Medicine, International University of Health and Welfare; Dr Ryosuke Matsuoka at Diagnostic Pathology Center, International University of Health and Welfare, Mita Hospital; and Dr Shigeo Nakano at Kamachi Group Hospitals (Fukuoka, Japan). We thank pathologists who have been engaged in reviewing cases for this study.
Authors’ Contribution
MT and FK designed the studies; MT and FK performed experiments and analyzed the data; MT and FK performed computational studies; MT and FK wrote the manuscript; MT supervised the project. All authors reviewed and approved the final manuscript.
Ethical Approval
The experimental protocol was approved by the institutional review board of the Sapporo-Kosei General Hospital (No. 576), International University of Health and Welfare (No. 19-Im-007), and Kamachi Group Hospitals (No. 173). All research activities complied with all relevant ethical regulations and were performed in accordance with relevant guidelines and regulations in all of the hospitals mentioned above.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MT and FK are employees of Medmain Inc.
Informed Consent
Informed consent to use histopathological samples and pathological diagnostic reports for research purposes had previously been obtained from all patients prior to the surgical procedures at all hospitals, and the opportunity for refusal to participate in research had been guaranteed in an opt-out manner.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is based on results obtained from a project, JPNP14012, subsidized by the New Energy and Industrial Technology Development Organization (NEDO).
