
Abstract

Objective

Colorectal neuroendocrine tumors and polyps share similar endoscopic features, often resulting in misdiagnosis. As neuroendocrine tumors are rare, obtaining a sufficient number of images for deep learning models is challenging.

Methods

This study introduces a few-shot learning model for ternary classification of neuroendocrine tumors, serrated lesions and polyps, and traditional adenomas using endoscopic images. Three groups of images (56 serrated lesions and polyps, 86 adenomas, and 53 neuroendocrine tumors) were collected retrospectively and divided into Support Sets and Query Sets. The proposed few-shot learning model involved transfer learning using ResNet50 V2 pretrained on ImageNet and esophageal endoscopic images, followed by metric learning based on Euclidean distances and K-nearest neighbor classification.

Results

Evaluated across three rounds, the few-shot learning model outperformed conventional deep learning models and both junior and senior endoscopists in several metrics, achieving an average macro–area under the curve of 0.731, macro–F1-score of 0.674, Matthews correlation coefficient of 0.526, and Cohen’s kappa of 0.523. When identifying neuroendocrine tumors specifically, the model achieved the highest accuracy (0.823), sensitivity (0.653), precision (0.673), and F1-score (0.659).

Conclusions

The few-shot learning approach effectively addresses data scarcity issues and improves diagnostic accuracy, offering a promising tool for computer-aided diagnosis of rare gastrointestinal diseases.

Keywords

Neuroendocrine tumors, adenoma, serrated lesions and polyps, few-shot learning, artificial intelligence, colonoscopy

Introduction

Neuroendocrine tumors (NETs) are well-differentiated neuroendocrine neoplasms originating from neuroendocrine cells and are rarely found in the colorectum.¹ They are generally diagnosed and staged using endoscopic ultrasonography (EUS), imaging techniques such as computed tomography (CT), or biomarkers such as Ki-67 index.² White-light imaging (WLI) endoscopy alone cannot determine the nature or origin of lesions; therefore, it cannot be used to diagnose NETs. However, endoscopy serves as the basis for identifying submucosal tumors (SMTs), which allows for an initial assessment. The decision to perform EUS and subsequent imaging examinations often depends on the endoscopist’s preliminary judgment during endoscopy. If NETs are suspected, EUS is typically conducted to confirm the diagnosis.

Colorectal polyps, which protrude into the colorectal lumen, are commonly encountered in clinical practice. According to the 2019 World Health Organization (WHO) classification of digestive system tumors,³ colorectal polyps can be categorized into serrated lesions and polyps (SLPs) and traditional adenomas. SLPs are characterized by a serrated (sawtooth or stellate) architecture of the crypt epithelium and can be further classified histologically as hyperplastic polyps (HP), sessile serrated lesions (SSL), traditional serrated adenomas (TSA), and unclassified serrated adenomas. Traditional adenomas are benign, premalignant neoplasms composed of dysplastic epithelium and are classified into tubular adenomas, villous adenomas, and tubulovillous adenomas.

Colorectal NETs and polyps exhibit distinct features under endoscopy, enabling endoscopists to conduct a preliminary differential diagnosis. However, NETs <10 mm with atypical endoscopic findings, clear boundaries, and polyp-like protrusions are difficult to differentiate from polyps.⁴,⁵ Moreover, endoscopy is a labor-intensive procedure, which may lead to misdiagnosis of colorectal NETs as adenomas or atypical HP.⁶ A multicenter study reported that fewer than 20% of French endoscopists suspected NETs during endoscopic procedures, resulting in misdiagnosis.⁷ Therefore, it is necessary to develop an objective clinical diagnostic technique that can distinguish NETs from polyps under endoscopy to guide subsequent diagnosis and treatment strategies.

Deep learning (DL), which consists of multilayer neural networks, can transform low-level, simple features into high-level, complex features, making it suitable for image processing.⁸,⁹ Owing to its excellent performance in image segmentation, detection, and classification, DL has been widely applied in the field of digestive system imaging.¹⁰–¹² However, conventional supervised DL typically requires large, balanced, and well-annotated image datasets for training. For rare diseases such as colorectal NETs,¹³ collecting a sufficient number of endoscopic images is costly and time-consuming, and models trained on small datasets are prone to overfitting. Moreover, subtle interclass and intraclass overlap between NETs and polyps under white-light endoscopy (WLE) limits the benefits of standard augmentation alone.

To address these issues, few-shot learning (FSL) has been proposed. As the name implies, FSL represents a method that identifies and classifies novel, previously unseen samples by learning rapidly and accurately from a small number of base-labeled examples.¹⁴ It has shown considerable progress in overcoming small-sample and domain-generalization challenges in the medical field. This approach can be applied effectively with few per-class examples and can extend to new classes without full retraining. Therefore, we hypothesized that FSL would be suitable for classifying colorectal NETs and polyps under conditions of limited data availability.

This study aimed to develop an FSL-based model that integrates transfer learning with metric learning for the classification of colorectal SLPs, adenomas, and NETs based on endoscopic images, thereby enabling accurate identification of colorectal SMTs and polyps.

Materials and methods

Study design

The overall design of this study is summarized as follows:
1. Endoscopic images of SLPs, adenomas, and NETs were collected from multiple centers and preprocessed.
2. A three-way, three-shot dataset was constructed, comprising a Support Set (SS) that included three images each of SLPs, adenomas, and NETs and a Query Set (QS) that included 53 SLPs, 83 adenomas, and 50 NETs.
3. A residual network (ResNet50 V2), initially trained on ImageNet, was selected as the DL model for extracting the eigenvectors.
4. The model was subsequently pretrained using esophageal endoscopic images to capture more detailed endoscopic features, and its performance was visualized through Gradient-weighted Class Activation Mapping (Grad-CAM).
5. The dual-pretrained ResNet50 V2 model was then used to extract eigenvectors from the SS and QS.
6. The Euclidean distance was calculated between each QS image and the nine SS images based on the eigenvectors.
7. A K-nearest neighbor (KNN) algorithm was used for classification.
8. The FSL process was conducted in three rounds, and its performance was compared with that of the conventional ResNet50 V2 model as well as junior and senior endoscopists.

Steps 3–5 involved transfer learning, whereas steps 6 and 7 involved metric learning. The FSL model consisted of SS and QS based on the combination of transfer learning and metric learning. The study flowchart is presented in Figure 1. Institutional Review Board approval (approval number: 2022098) was obtained for the retrospective collection and analysis of data. Additional Institutional Review Board approval was not required because the data were de-identified. The study was conducted in accordance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM).

Figure 1.

The flowchart of the study. The three-way three-shot FSL model consists of SS and QS based on transfer learning and metric learning. FSL: few-shot learning; SS: Support Set; QS: Query Set.

Datasets and sets

Datasets

Endoscopic images of SLPs, adenomas, and NETs were obtained from the First Affiliated Hospital of Soochow University, Changshu No.1 People’s Hospital, and Yangchenghu People’s Hospital. All images were captured using Olympus endoscopy systems under white-light and nonmagnifying imaging and confirmed by senior endoscopists and pathologists. Each image corresponded to a single patient. To prevent image-level data leakage, we ensured that the dataset was divided at the patient level. All images were processed as red–green–blue (RGB) three-channel inputs. Each image file was decoded from the JPEG format, resized to 331 × 331 pixels, converted to a 32-bit floating-point tensor, and normalized to the [0, 1] intensity range by dividing by 255. No image augmentation was applied. The inclusion criteria are provided in Supplementary Method 1.

In addition, esophageal endoscopic images (normal (n = 429), early cancer (n = 489), and advanced cancer (n = 594)) were obtained under WLI at The First Affiliated Hospital of Soochow University for dual pretraining. These images were augmented and randomly divided into training and validation sets in a ratio of 8:2. The dataset of esophageal endoscopic images used for dual pretraining is shown in Figure 2(a). The definitions of esophageal early cancer and advanced cancer are presented in Supplementary Method 2.

Figure 2.

Datasets of endoscopic images. (a) Esophageal endoscopic images were randomly allocated into a training set and a validation set for subsequent pretraining of the DL model. (b) The SS consisted of three base classes of SLPs, adenomas, and NETs, with each class containing three images. The QS consisted of 186 images to be classified. DL: deep learning; SLPs: serrated lesions and polyps; NETs: neuroendocrine tumors; SS: Support Set; QS: Query Set.

SS and QS

FSL is a form of supervised machine learning (ML) composed of an SS used for training and a QS used for testing. The SS follows a K-way N-shot setting, where K represents the number of classes and N represents the number of images in each class. The QS contains the remaining images.

In this study, the SS consisted of three classes, with each class containing three images (SLPs, n = 3; adenomas, n = 3; and NETs, n = 3), forming a three-way, three-shot setting. The remaining images were assigned to the QS (SLPs, n = 53; adenomas, n = 83; and NETs, n = 50). The SS and QS settings used in the FSL model are shown in Figure 2(b). Additional shot settings were explored, including 3-way 5-shot, 3-way 7-shot, and 3-way 10-shot settings.
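The three-way, three-shot split can be illustrated with a minimal Python sketch. It assumes that support images are drawn at random within each class; the study does not state how the SS images were chosen, so the function name `split_support_query`, the random sampling, and the seed are illustrative assumptions. Only the class counts come from the text.

```python
import random
from collections import defaultdict

def split_support_query(labels, n_shot, seed=0):
    """Pick n_shot indices per class for the Support Set; the rest form the Query Set.

    Random selection is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    support = []
    for label in sorted(by_class):
        support.extend(rng.sample(by_class[label], n_shot))

    support_lookup = set(support)
    query = [i for i in range(len(labels)) if i not in support_lookup]
    return sorted(support), query

# Class counts reported in this study: 56 SLPs, 86 adenomas, 53 NETs.
labels = ["SLP"] * 56 + ["adenoma"] * 86 + ["NET"] * 53
support_idx, query_idx = split_support_query(labels, n_shot=3)
```

With `n_shot=3` this yields 9 support images and 186 query images (53 SLPs, 83 adenomas, and 50 NETs), matching the counts above. Passing `n_shot=5`, `7`, or `10` would give the other shot settings that were explored.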

Transfer learning

DL models trained on large datasets such as ImageNet (https://image-net.org/) are generally highly transferable to image classification and recognition tasks. In this study, because the images in the source and target datasets differed in type, transfer learning was performed to bridge the domain shift and enhance the feature extraction capability for endoscopic images. The transfer learning process, as illustrated in Figure 3, involved the original ResNet50 V2 model, dual pretraining, and feature extraction.

Figure 3.

The structure of transfer learning. (a) Structure of the dual-pretrained ResNet50 V2 model: The model employed residual blocks as the fundamental module with an output of a 1 × 64 eigenvector. In each residual block, the BN-ReLU-Conv structure is used. (b) Feature extraction: For nine images in SS (S₁–S₉), their corresponding eigenvectors FS₁–FS₉, each containing 64 features, were extracted through the dual-pretrained ResNet50 V2. For any image Qₙ (n = 1–186) in QS, its corresponding eigenvector FQₙ was extracted. SS: Support Set; QS: Query Set; BN: batch normalization; ReLU: rectified linear unit; Conv: convolutional layer.

ResNet50 V2, a DL model, was used to extract eigenvectors from endoscopic images.¹⁵ It was first pretrained on ImageNet and subsequently pretrained on a ternary classification task using esophageal endoscopic images to better identify endoscopic features. The last fully connected layer (FCL) and classification layer (1000 categories) of the single pretrained ResNet50 V2 model were truncated and replaced with a new FCL and classifier layer (three categories of real-world endoscopic images). For each image, a 1 × 64 eigenvector was obtained using the dual-pretrained ResNet50 V2 model. In the SS, nine images were labeled S₁–S₉, and their corresponding eigenvectors were labeled FS₁–FS₉. In the QS, 186 images were labeled Q₁–Q₁₈₆, with corresponding eigenvectors labeled FQ₁–FQ₁₈₆. The performance of the dual-pretrained model was further validated using Grad-CAM to visualize the extracted features.¹⁶
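The head replacement described above might look like the following TensorFlow/Keras sketch. It is not the authors' code. It assumes that the new FCL is a 64-unit dense layer whose activations serve as the 1 × 64 eigenvector, that global average pooling precedes it, and that ReLU is used; the layer names `embedding` and `classifier` and the function `build_models` are hypothetical. The example passes `weights=None` so that it runs without downloading anything, whereas `weights="imagenet"` would correspond to the first pretraining stage.

```python
import tensorflow as tf

def build_models(weights=None, input_size=331, embed_dim=64, n_classes=3):
    """Return (classifier, extractor) sharing one ResNet50 V2 backbone.

    classifier: trained on the three esophageal classes (second pretraining stage).
    extractor:  outputs the embed_dim-dimensional eigenvector used for metric learning.
    The 64-unit dense layer and global average pooling are assumptions.
    """
    backbone = tf.keras.applications.ResNet50V2(
        include_top=False,            # drop the original 1000-class head
        weights=weights,
        input_shape=(input_size, input_size, 3),
        pooling="avg",
    )
    embedding = tf.keras.layers.Dense(
        embed_dim, activation="relu", name="embedding")(backbone.output)
    probabilities = tf.keras.layers.Dense(
        n_classes, activation="softmax", name="classifier")(embedding)

    classifier = tf.keras.Model(backbone.input, probabilities)
    extractor = tf.keras.Model(backbone.input, embedding)
    return classifier, extractor

classifier, extractor = build_models(weights=None)
```

Because both models share layers, fitting `classifier` on the esophageal images would also update `extractor`, which can then be applied to the SS and QS images without further training.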

Metric learning

In image classification, selecting appropriate metrics to evaluate the distance or similarity between images is both essential and challenging. This has led to the development of metric learning.¹⁷ The objective of metric learning is to learn a similarity or distance metric that maximizes interclass variation and minimizes intraclass variation, allowing images to be classified based on their distance values or similarities. A classical prototypical network for FSL based on metric learning has been proposed,¹⁸ wherein the mean eigenvector of the SS images in each class is calculated as the prototype representation of that class. The similarity between each QS image and each prototype is then used for classification using a nearest neighbor classifier.
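For illustration, the prototype idea can be sketched in a few lines of plain Python. This shows the classical prototypical approach referred to above, not the KNN procedure used in this study; the function names and the two-dimensional toy vectors are hypothetical, and squared Euclidean distance is assumed as the similarity measure.

```python
def class_prototypes(support_vectors, support_labels):
    """Average the support eigenvectors of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(support_vectors, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def nearest_prototype(query_vector, prototypes):
    """Assign the query to the class whose prototype is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda lab: squared_distance(query_vector, prototypes[lab]))

# Toy 2-D eigenvectors (hypothetical); real eigenvectors would have 64 features.
support_vectors = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
support_labels = ["A", "A", "B", "B"]
prototypes = class_prototypes(support_vectors, support_labels)
predicted = nearest_prototype([3.0, 1.0], prototypes)
```

Here the prototypes are `[1.0, 0.0]` for class A and `[11.0, 10.0]` for class B, so the query `[3.0, 1.0]` is assigned to class A.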

In this study, the similarity between QS and SS images was calculated using Euclidean distance. For each QS image, nine Euclidean distance values (d₁, d₂, …, d₉) were computed based on the eigenvectors between the QS image and nine SS images, which were then used for further analysis. A classifier based on the KNN algorithm¹⁹,²⁰ was subsequently applied for classification according to similarity. For each QS image, its category was determined by the k-nearest SS images. Figure 4 illustrates the overall process of metric learning. A detailed description of the methodology and model parameters is provided in Supplementary Method 3.
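One reading of this procedure is a majority vote among the k SS images closest to the query in Euclidean distance, which the sketch below implements in plain Python. The exact configuration is given only in Supplementary Method 3, so the tie-breaking rule (the nearest neighbor among tied classes wins), the function names, and the two-dimensional toy vectors are assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two eigenvectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query_vec, support_vecs, support_labels, k):
    """Majority vote among the k nearest support images.

    Ties are broken in favour of the closest neighbour (an assumption).
    """
    distances = [euclidean(query_vec, s) for s in support_vecs]
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    neighbours = order[:k]
    votes = Counter(support_labels[i] for i in neighbours)
    top = max(votes.values())
    tied = {label for label, n in votes.items() if n == top}
    for i in neighbours:  # neighbours are already sorted by distance
        if support_labels[i] in tied:
            return support_labels[i]

# Nine toy support vectors, three per class (hypothetical 2-D stand-ins).
support_vecs = [[0, 0], [0, 1], [1, 0],      # SLPs
                [5, 5], [5, 6], [6, 5],      # adenomas
                [10, 0], [10, 1], [11, 0]]   # NETs
support_labels = ["SLP"] * 3 + ["adenoma"] * 3 + ["NET"] * 3
nine_distances = [euclidean([9, 1], s) for s in support_vecs]
predicted = knn_predict([9, 1], support_vecs, support_labels, k=4)
```

In this toy case the four nearest support images are the three NETs and one adenoma, so the query is predicted as a NET.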

Figure 4.

The process of metric learning. (a) Euclidean distance: For nine images S₁–S₉ in SS and one image Q in QS, the corresponding eigenvectors FS₁–FS₉ and FQ were obtained. The Euclidean distance $d_{1} = \sqrt{\sum_{i=1}^{64} \left( (F_{S_{1}})_{i} - (F_{Q})_{i} \right)^{2}}$ was then calculated. In metric space, FS₁ and FQ were projected as points, and the Euclidean distance d₁ was defined as the distance between them, as were d₂–d₉. (b) KNN algorithm: The nine distance-based features of each image in QS were input into a KNN classifier, and predictions were made according to the K value. In this figure, K was calculated as 4, and the four nearest SS images were S₂, S₇, S₈, and S₉, indicating that image Q was predicted as NETs. SS: Support Set; QS: Query Set; KNN: K-nearest neighbor; NETs: neuroendocrine tumors.

Statistical analysis

Python software (version 3.9) and TensorFlow (version 2.8.0) were used to fit the FSL model. Statistical analysis was performed using R (version 4.2.2).

A confusion matrix was used to visualize the classification performance of DL models and endoscopists. Accuracy, recall, specificity, precision, and F1-score were used to evaluate the classification performance for each category (SLPs, adenomas, and NETs). The Matthews correlation coefficient (MCC), macro–area under the curve (macro-AUC), macro–F1-score, and Cohen’s kappa²¹ were used to comprehensively evaluate and compare the performance between the FSL model and other models. Details of each indicator are provided in Supplementary Method 4.
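As an illustration of how these indicators relate to a confusion matrix, the sketch below computes the one-vs-rest class metrics, the macro-F1-score, the multiclass MCC, and Cohen's kappa in plain Python. It uses the standard textbook definitions, which may differ in detail from Supplementary Method 4. The function name and the 3 × 3 toy matrix are hypothetical, and macro-AUC is omitted because it requires prediction scores rather than counts. An undefined ratio is returned as 0; with that convention the macro-F1-score is the plain mean of the three class F1-scores, which agrees with the tables (for example, (0.653 + 0.810 + 0.752)/3 ≈ 0.738 for round #1).

```python
import math

def classification_metrics(cm):
    """cm[i][j] = number of images of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_totals = [sum(row) for row in cm]
    pred_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    correct = sum(cm[k][k] for k in range(n))

    def ratio(num, den):
        return num / den if den else 0.0  # undefined ratios are reported as 0

    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = true_totals[k] - tp
        fp = pred_totals[k] - tp
        tn = total - tp - fn - fp
        recall = ratio(tp, tp + fn)
        precision = ratio(tp, tp + fp)
        per_class.append({
            "accuracy": ratio(tp + tn, total),
            "recall": recall,
            "specificity": ratio(tn, tn + fp),
            "precision": precision,
            "f1": ratio(2 * precision * recall, precision + recall),
        })

    macro_f1 = sum(c["f1"] for c in per_class) / n
    cross = sum(t * p for t, p in zip(true_totals, pred_totals))
    mcc_den = math.sqrt((total ** 2 - sum(p * p for p in pred_totals))
                        * (total ** 2 - sum(t * t for t in true_totals)))
    mcc = ratio(correct * total - cross, mcc_den)
    p_observed = correct / total
    p_expected = cross / total ** 2
    kappa = ratio(p_observed - p_expected, 1 - p_expected)
    return per_class, macro_f1, mcc, kappa

# Toy confusion matrix (hypothetical counts, not study data).
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 2, 8]]
per_class, macro_f1, mcc, kappa = classification_metrics(toy_cm)
```

For this balanced toy matrix the MCC and Cohen's kappa coincide at 0.65; in general the two differ slightly, as in Table 2.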

Two endoscopists (one junior endoscopist with <5 years of experience and one senior endoscopist with ≥15 years of experience) were blinded to the diagnoses and independently classified the QS images into SLPs, adenomas, and NETs. McNemar’s test was used to compare the FSL model with both the conventional ResNet50 V2 model and endoscopists.
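McNemar's test compares two raters on the same images by counting only the discordant cases, that is, images that one rater classified correctly and the other did not. The sketch below is a plain-Python illustration under stated assumptions: the chi-square form with a continuity correction is used (the text does not say which variant was applied), the function name is hypothetical, and the correctness vectors are toy data.

```python
import math

def mcnemar(correct_a, correct_b, continuity=True):
    """McNemar chi-square statistic and p-value (1 degree of freedom).

    correct_a / correct_b: per-image booleans, True if the rater was correct.
    """
    b = sum(1 for a, c in zip(correct_a, correct_b) if a and not c)  # only A correct
    c = sum(1 for a, c in zip(correct_a, correct_b) if c and not a)  # only B correct
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data: 10 images only A got right, 2 only B got right, 8 concordant.
correct_a = [True] * 10 + [False] * 2 + [True] * 5 + [False] * 3
correct_b = [False] * 10 + [True] * 2 + [True] * 5 + [False] * 3
chi2, p_value = mcnemar(correct_a, correct_b)
```

With 10 and 2 discordant images the corrected statistic is (|10 − 2| − 1)²/12 = 49/12 ≈ 4.08.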

Results

Baseline characteristics of the collected NETs and polyps

A total of 195 eligible images (SLPs, n = 56; adenomas, n = 86; and NETs, n = 53) were included in this study. Among the SLPs, HP (n = 36), SSL (n = 18), and TSA (n = 2) accounted for 64.3%, 32.1%, and 3.6%, respectively. Among the adenomas, tubular (n = 35), villous (n = 21), and tubulovillous (n = 30) adenomas accounted for 40.7%, 24.4%, and 34.9%, respectively. Among the NETs, 71.7% (n = 38) were well-differentiated, 24.5% (n = 13) were moderately differentiated, and 3.8% (n = 2) were poorly differentiated. Detailed characteristics of the included images are provided in Supplementary Table 1.

Performance of the dual-pretrained ResNet50 V2 model

As shown in Figure 5, Grad-CAM heatmaps based on the dual-pretrained ResNet50 V2 model highlighted the abnormal regions of the endoscopic images, indicating that the extracted features focused on the lesions.

Figure 5.

Grad-CAM of dual-pretrained ResNet50 V2. Heatmaps of six examples generated using the Grad-CAM algorithm. The left panels show the original endoscopic images, the middle panels show heatmaps predicted by the dual-pretrained ResNet50 V2 model, and the right panels show combined original and predicted heatmap images. Grad-CAM: Gradient-weighted Class Activation Mapping.

Performance of the three-way three-shot FSL model

Three rounds were conducted to evaluate the performance of the proposed three-way three-shot FSL model. Figure 6(a) to (c) shows the confusion matrices of the FSL model in each round, which correctly identified an average of 29 SLPs, 67 adenomas, and 33 NETs.

Figure 6.

Confusion matrices of the DL models and endoscopists. (a) FSL model, round #1. (b) FSL model, round #2. (c) FSL model, round #3. (d) Traditional ResNet50 V2 model. (e) Senior endoscopist (experience ≥15 years) and (f) junior endoscopist (experience <5 years). FSL: few-shot learning; DL: deep learning.

When distinguishing among SLPs, adenomas, and NETs, the FSL model achieved average accuracies and F1-scores of 0.783 and 0.593, 0.785 and 0.772, and 0.823 and 0.659, respectively, as shown in Table 1.

Table 1.

Performance of the models and endoscopists in distinguishing between SLPs, adenomas, and NETs.

		Group	Accuracy	Recall	Specificity	Precision	F1-score
Dual-pretrained FSL	Round #1	SLPs	0.823	0.585	0.917	0.738	0.653
		Adenoma	0.828	0.819	0.835	0.800	0.810
		NETs	0.855	0.820	0.868	0.695	0.752
	Round #2	SLPs	0.763	0.491	0.872	0.605	0.542
		Adenoma	0.774	0.831	0.728	0.711	0.767
		NETs	0.817	0.620	0.890	0.674	0.646
	Round #3	SLPs	0.763	0.585	0.835	0.585	0.585
		Adenoma	0.753	0.783	0.728	0.699	0.739
		NETs	0.796	0.520	0.897	0.650	0.578
	Mean	SLPs	0.783	0.554	0.875	0.643	0.593
		Adenoma	0.785	0.811	0.763	0.736	0.772
		NETs	0.823	0.653	0.885	0.673	0.659
Traditional DL model	ResNet50 V2	SLPs	0.697	0	0.978	0	-
		Adenoma	0.538	0.872	0.275	0.487	0.625
		NETs	0.677	0.264	0.831	0.368	0.308
Endoscopists	Senior	SLPs	0.774	0.491	0.887	0.634	0.553
		Adenoma	0.629	0.735	0.544	0.565	0.639
		NETs	0.790	0.480	0.904	0.649	0.552
	Junior	SLPs	0.672	0.396	0.782	0.420	0.408
		Adenoma	0.538	0.590	0.495	0.485	0.533
		NETs	0.645	0.380	0.882	0.543	0.447

FSL: few-shot learning; SLPs: serrated lesions and polyps; NETs: neuroendocrine tumors; DL: deep learning.

The three-way three-shot FSL model was evaluated in three rounds and compared with the traditional ResNet50 V2 model, a senior endoscopist, and a junior endoscopist. The table presents the performance of the FSL model, traditional model, and endoscopists in identifying each class.

Figure 7 illustrates the comprehensive classification performance. The FSL model achieved an average macro-AUC of 0.731, macro–F1-score of 0.674, MCC of 0.526, and Cohen’s kappa of 0.523, as shown in Table 2. Performance under different shot settings is presented in Supplementary Table 2 and Supplementary Figure 1, which showed no consistent improvement compared with the three-way three-shot setting. In addition, correctly and incorrectly classified cases are presented in Supplementary Figure 2, along with the corresponding Grad-CAM heatmaps and annotations of potentially misclassified cases.

Figure 7.

Visual performance of the ternary classification. The FSL model achieved superior classification performance compared with the traditional ResNet50 V2, senior endoscopist, and junior endoscopist. FSL: few-shot learning.

Table 2.

Performance of the models and endoscopists during the ternary classification task.

		Macro-AUC	Macro–F1-score	MCC	Cohen’s kappa
Dual-pretrained FSL	Round #1	0.769	0.738	0.620	0.620
	Round #2	0.708	0.651	0.496	0.490
	Round #3	0.715	0.634	0.463	0.460
	Mean	0.731	0.674	0.526	0.523
Traditional DL model	ResNet50 V2	0.549	0.311	0.108	0.085
Endoscopists	Senior	0.631	0.581	0.363	0.350
	Junior	0.526	0.462	0.175	0.173

AUC: area under the curve; MCC: Matthews correlation coefficient; FSL: few-shot learning; DL: deep learning.

The three-way three-shot FSL model was evaluated in three rounds and compared with the traditional ResNet50 V2 model, a senior endoscopist, and a junior endoscopist. The comprehensive performance of the FSL model, traditional model, and endoscopists during the ternary classification task was evaluated using four indicators: macro-AUC, macro–F1-score, MCC, and Cohen’s kappa.

Comparison of the FSL model with the conventional ResNet50 V2 model and endoscopists

Figure 6(d) to (f) displays the confusion matrices of the conventional model and endoscopists. Table 1 presents their performance in distinguishing SLPs, adenomas, and NETs, while Figure 7 and Table 2 depict the comprehensive classification performance.

As shown in Table 1, when identifying SLPs, the best performance was achieved by the FSL model (accuracy: 0.783, recall: 0.554, precision: 0.643, and F1-score: 0.593), followed by senior endoscopist (accuracy: 0.774, recall: 0.491, precision: 0.634, and F1-score: 0.553). The FSL model also achieved the highest accuracy (0.785), specificity (0.763), precision (0.736), and F1-score (0.772) for adenoma identification. When identifying NETs, the FSL model achieved the highest accuracy (0.823), recall (0.653), precision (0.673), and F1-score (0.659).

As shown in Figure 7, the FSL model demonstrated better overall performance in terms of macro-AUC, macro–F1-score, MCC, and Cohen’s kappa than the conventional DL model and endoscopists. For the ternary classification task, the FSL model, conventional DL model, senior endoscopist, and junior endoscopist achieved macro-AUC values of 0.731, 0.549, 0.631, and 0.526; macro–F1-scores of 0.674, 0.311, 0.581, and 0.462; MCCs of 0.526, 0.108, 0.363, and 0.175; and Cohen’s kappa values of 0.523, 0.085, 0.350, and 0.173, respectively. McNemar’s test was used to compare the FSL model with both conventional DL model and endoscopists (Supplementary Table 3). Significant differences in classification performance were observed between the FSL model and conventional DL model (χ² = 7.21, p < 0.001) and between the FSL model and junior endoscopist (χ² = 4.21, p = 0.005). However, no significant difference was observed between the FSL model and senior endoscopist (χ² = 1.95, p = 0.051).

Discussion

To the best of our knowledge, this is the first study to use a three-way three-shot FSL strategy based on transfer learning and metric learning to distinguish between colorectal NETs and polyps. The ResNet50 V2 extractor was applied to obtain eigenvectors of endoscopic images, from which Euclidean distances to the SS images were computed. A KNN classifier was then used to categorize SLPs, adenomas, and NETs. Moreover, a dual pretraining method was adopted in this study to enhance the model’s ability to recognize and classify endoscopic images. Overall, the proposed method outperformed both the conventional DL model and endoscopists, demonstrating its feasibility for computer-aided diagnosis of NETs.

Differentiating colorectal NETs from polyps during endoscopy has long been a challenge in clinical practice, with important implications for subsequent diagnosis and treatment. The proposed FSL model provides a novel approach to assist endoscopists in the endoscopic diagnosis of colorectal NETs and polyps.

Endoscopic resection and biopsy are the preferred methods for diagnosing and treating polyps. Cold snare polypectomy (CSP) is an effective and safe resection option for small SLPs (<10 mm). For SLPs >10 mm, preoperative biopsy is necessary to determine the pathological type and assess malignancy before performing endoscopic mucosal resection (EMR). Endoscopic submucosal dissection (ESD) is recommended for larger SLPs (>20 mm), including those associated with dysplasia or invasive carcinoma.²²–²⁴ For traditional adenomas, the most important precancerous lesions of colorectal cancer, endoscopic resection and biopsy are recommended regardless of size.

In contrast to the treatment strategy for polyps, the management of NETs depends on accurate preoperative diagnosis and malignancy assessment. The invasive nature of preoperative submucosal biopsy may cause mucosal damage or adhesions, which increases the risk of perforation, bleeding, or tumor spread. Hence, it is not recommended for definitive diagnosis. Therefore, preoperative evaluation is needed for submucosal lesions.²⁵ Endoscopic resection is contraindicated if NETs invade the muscularis propria. Magnetic resonance imaging (MRI) is necessary for NETs ≥2 cm, particularly when muscularis propria invasion or metastasis is suspected.²⁶ There are no clear guidelines or randomized controlled trials that have formalized the therapeutic interventions for colorectal NETs. EMR is effective for low-risk NETs <0.5 cm. For NETs measuring 0.5–2 cm, ESD offers the advantage of wider and deeper resection margins for adequate removal. For NETs ≥2 cm or those with intermediate or high risk, surgical resection is generally performed. Formal partial colectomy is usually performed for colonic NETs, whereas right hemicolectomy is indicated for NETs near the ileocecal valve. Local transanal resection is recommended for lower rectal NETs (<5 cm from the anal verge), while transanal endoscopic microsurgery is used for middle and upper rectal lesions.²⁷–²⁹ Pathological evaluation is recommended for both surgical and endoscopic resections to assess vascular invasion, muscularis propria invasion, and margin status.²⁶,³⁰,³¹ Because NETs misdiagnosed as polyps may delay EUS evaluation and increase the risk of complications or metastasis, recall is a key performance metric. The proposed FSL model achieved higher recall than the conventional DL model and both senior and junior endoscopists, with only a small reduction in specificity, highlighting its potential clinical applicability.

Prior reviews have indicated that the application of DL to gastroenteropancreatic NETs remains limited.³²,³³ For instance, Udristoiu et al.³⁴ used a convolutional neural network and long short-term memory (CNN-LSTM) model to distinguish pancreatic NETs from ductal adenocarcinoma and chronic pseudotumoral pancreatitis, achieving an AUC >90%. For colorectal NETs, most studies have relied on structured clinical or biomarker data. Zhao et al.³⁵ developed a Surveillance, Epidemiology, and End Results (SEER)–based nomogram to predict 1-, 3-, and 5-year overall survival in patients with NETs. To the best of our knowledge, no prior study has applied DL to classify or detect colonoscopic images of colorectal NETs, likely due to the lack of large, curated image datasets for this rare disease. Furthermore, most studies applying DL to NETs have focused on CT or MRI using small, pancreas-predominant cohorts, leaving endoscopic imaging of colorectal NETs largely underexplored.³³ Our contribution is therefore application-oriented: we proposed an FSL framework based on transfer learning and metric learning to differentiate colorectal NETs and polyps on endoscopic images under conditions of real-world data scarcity, rather than introducing a new algorithm.

Currently, FSL strategies are mainly categorized into three types: transfer learning,³⁶ metric learning,¹⁸ and meta-learning.³⁷ Transfer learning utilizes models previously trained on a source domain and adapts them to a target domain to improve performance. Metric learning evaluates similarities between samples by calculating distances or optimizing a metric space, allowing each unlabeled image to be classified according to intrinsic differences or similarities. Meta-learning, which consists of meta-training and meta-testing phases, is a task-level optimization approach that transfers knowledge from base-learning tasks to target tasks through a two-loop optimization process.

Based on these strategies, FSL is well-suited for model training with limited available data, particularly for the identification, segmentation, and classification of rare diseases across medical datasets. One study³⁸ proposed an FSL technique to classify the severity of coronavirus disease 2019 (COVID-19), achieving an accuracy of 0.954. Lin et al.³⁹ developed a two-way three-shot framework to differentiate Crohn’s disease from intestinal tuberculosis, which outperformed both junior and senior endoscopists. Yin et al.⁴⁰ applied a metric learning–based FSL model to classify endoscopic images of gastric signet ring cell carcinoma, gastric adenocarcinoma, and gastric ulcers, achieving a classification accuracy of 0.794. In the field of polyp classification, FSL has also gained increasing attention. Krenzer et al.⁴¹ proposed a deep metric learning–based FSL pipeline for Narrow-band Imaging International Colorectal Endoscopic (NICE) type I versus type II polyp classification using narrow-band imaging (NBI), reporting an accuracy of 81% after triplet-loss pretraining and fine-tuning, which demonstrated the feasibility of the FSL paradigm for polyp classification in data-scarce environments. For polyp segmentation, Wang et al.⁴² addressed few-shot domain adaptation; on a cross-center external test set (CVC-300), their method achieved a Dice coefficient of 93.6% with 20-shot supervision, outperforming traditional segmentation models. Although differences in label space, modality, and protocol preclude a direct numerical comparison, these studies collectively demonstrate the potential and feasibility of FSL for endoscopic image analysis under real-world data scarcity, laying the foundation for our research.

The proposed FSL classifier can serve as a second-reader aid during colonoscopy by flagging frames that exhibit NET-like mucosal patterns and providing explanatory heatmaps to assist endoscopists in their review. Beyond lesion identification, these outputs can support clinical decision-making by triaging cases for targeted EUS assessment and by preselecting lesions suitable for EMR or other polypectomy procedures, subject to clinician judgment.

This study has several advantages. First, it is the first study to employ an FSL model to distinguish between colorectal NETs, SLPs, and adenomas and utilize unstructured data from colorectal NETs for DL model development. Second, we conducted a rigorous multicenter epidemiological analysis based on endoscopic images and pathological diagnoses. Third, we used a dual-pretrained method to enhance generalization and verify cross-domain transferability from ImageNet to endoscopic images. Finally, we used metric learning to reduce training costs and minimize the risk of overfitting.

However, this study has several limitations. First, colorectal NETs are rare in routine endoscopy, which limited our case accrual and statistical power. In addition, pathological confirmation relied on endoscopically obtained specimens (biopsy/polypectomy), leading to the exclusion of cases diagnosed only after surgical resection. Second, confidence intervals for class-wise and composite metrics were not reported due to design and data constraints. Third, because the final classifier is a nondifferentiable KNN operating on ResNet50V2 embeddings, the Grad-CAM visualizations were computed on the backbone only, reflecting feature saliency rather than the exact decision basis of the KNN head. Finally, this was a retrospective, in silico analysis of still endoscopic images; thus, the findings are hypothesis-generating and require prospective, multicenter, video-based validation for clinical application. In the future, we aim to enhance endoscopy image quality, perform analyses of additional subtypes, incorporate binomial and bootstrap confidence intervals for improved uncertainty quantification, and apply the FSL framework for real-time, video-based recognition during endoscopy. To strengthen generalizability, we plan to conduct a multicenter, multivendor prospective study with external validation.

Conclusions

In this study, an FSL model integrating transfer learning and metric learning was developed to classify colorectal NETs, SLPs, and adenomas on white-light colonoscopy under real-world data scarcity. The proposed model outperformed a conventional DL model and achieved performance comparable to that of endoscopists. It may serve as a second-reader aid during colonoscopy by highlighting NET-like mucosal patterns and supporting downstream clinical decision-making. Overall, these findings demonstrate the feasibility of applying FSL to computer-aided diagnosis of rare or low-incidence diseases; however, clinical effectiveness remains to be established. Future work will focus on assessing the model’s generalizability and effectiveness for real-time, video-based recognition in clinical practice.

Supplemental Material

sj-pdf-1-imr-10.1177_03000605251395564 and sj-pdf-2-imr-10.1177_03000605251395564: Supplemental material for Few-shot learning for the classification of colorectal neuroendocrine tumors and polyps on endoscopic images by Shiqi Zhu, Hailong Ge, Yu Wang, Jiaxi Lin, Rufa Zhang, Minyue Yin, Jinzhou Zhu and Chen Chao in Journal of International Medical Research

Footnotes

Acknowledgments

The authors are grateful to Elsevier Language Editing Services for their support.

Author contributions

Shiqi Zhu: Writing–original draft, Formal analysis, Data curation. Jiaxi Lin: Writing–original draft, Formal analysis. Rufa Zhang: Data curation. Minyue Yin: Formal analysis. Hailong Ge: Data curation. Yu Wang: Writing–review and editing. Jinzhou Zhu and Chen Chao: Funding acquisition, Conceptualization.

Data availability statement

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request. The training code is publicly available at .

Declaration of conflicting interests

The authors declare that they have no competing interests.

EQUATOR network guidelines

This study was performed in accordance with the Checklist for Artificial Intelligence in Medical Imaging (CLAIM).

Funding

This study was supported by the Science and Technology Projects of Jintan District Health Bureau (No. JTYXH-2025-1-02), the Medical Education Collaborative Innovation Fund of Jiangsu University (No. JDYY2023042), the Frontier Technologies of Science and Technology Projects of Changzhou Municipal Health Commission (No. QY202309), the Suzhou Clinical Center of Digestive Diseases (No. Szlcyxzx202101), and the Youth Program of Suzhou Health Committee (No. KJXW2019001).

Supplemental material

Supplemental material for this article is available online.
