Introduction
The incidence of thyroid nodules is increasing annually, and the pathological types of such nodules are complex. Common malignant thyroid nodules include papillary thyroid carcinoma (PTC), medullary thyroid carcinoma (MTC), and follicular thyroid carcinoma (FTC), whereas benign nodules include nodular goiters and adenomas. Benign thyroid nodules require only timely follow-up, while malignant nodules require timely surgery. 1 Because PTC is the most common malignant thyroid tumor, accurately distinguishing it from other types of thyroid nodules is of great significance. 2 However, PTC shares varying degrees of pathological similarity with MTC, FTC, adenomatous goiters, and adenomas and is easily misdiagnosed, 3 which in turn affects treatment. Therefore, improving the differential diagnosis of PTC from other types of thyroid nodules is a clinically important research topic.
Differential diagnosis using pathological images of thyroid nodules faces several challenges: (1) pathologists have different levels of professional knowledge; (2) different diagnostic results can be obtained even from the same pathological image; (3) skilled pathologists need long-term training, which conflicts with the rapidly increasing workload; and (4) overwork can result in fatigue, making pathologists more prone to misdiagnoses. The rapid development of artificial intelligence (AI) technology has alleviated these problems to a certain extent. In particular, deep neural network (DNN) models have recently been shown to be efficient for pathological diagnoses4,5 and can reduce misdiagnoses caused by pathologists' gaps in knowledge and fatigue. Some studies have confirmed that DNN models such as ResNet50 can effectively identify the pathology of different thyroid nodules, and such models play an increasingly prominent role in healthcare.
In recent years, the use of pathological or imaging data to train ResNet50 for diagnosis has become common, but the use of text information to train random forest (RF) models is rare. Therefore, this study used imaging data and text information to separately train a residual network (ResNet50) model and an RF model, and then integrated the two models to obtain a DS ensemble model. The three models were compared to explore the differential diagnostic efficiency of the ResNet50, RF, and DS ensemble models for PTC and other pathological types of thyroid nodules.
Materials and methods
Patients and data
The data included in this study were obtained from patients who underwent surgical treatment or aspiration biopsy for thyroid nodules at Shanghai Tenth People's Hospital between July 2014 and August 2021. All patients provided written informed consent. This was a retrospective study and received an exemption from the institution’s review board. The reporting of this study conforms with the STARD 2015 guidelines. 6 The inclusion criteria were as follows: (1) patients who underwent initial surgical treatment or aspiration biopsy for thyroid nodules; and (2) patients with clear pathology of thyroid nodules. The exclusion criteria were as follows: (1) patients without postoperative thyroid nodule pathology or with unclear pathology; (2) patients who had received 131I treatment; and (3) patients who had received antitumor therapy. The pathological images and auxiliary examination results of the patients were collected to construct the pathological image and text datasets. All patient details were de-identified.
All pathological sections were stained with hematoxylin and eosin (HE) and observed under a DM4000B LED microscope with intelligent automation (Leica, Wetzlar, Germany). Two senior pathologists selected the regions of interest (ROIs) and performed pathological diagnoses. The ROIs were the critical regions of the images in which the pathology could be identified. Images were acquired manually under the microscope, i.e., a Leica DFC495 microscope camera was used to directly capture pathological images. Pathological images that the pathologists considered controversial were excluded, and all remaining pathological images were classified. The image resolution was 3264 × 2448 pixels, and the pixel size was 264 nm/pixel (Figure 1).

Figure 1. Pathological images of thyroid nodules: (a) papillary thyroid carcinoma (PTC); (b) medullary thyroid carcinoma (MTC); (c) nodular goiter; (d) adenoma; (e) follicular thyroid carcinoma (FTC).
PTC patients were assigned to the PTC group, and patients with MTC, FTC, nodular goiters, and adenomas were grouped together in the “other” nodule group. The auxiliary examination results of patients included: (1) laboratory test results, such as thyroglobulin (Tg) content, thyroglobulin antibody (TgAb) content, and thyroid peroxidase antibody (TPOAb) content; and (2) ultrasound examination results, such as length of the left lymph node (mm), length of the right lymph node (mm), size of the thyroid nodule (mm), and the thyroid imaging reporting and data system (TI-RADS) classification. The reference ranges for laboratory tests at the treating hospital were as follows: Tg, 3.5 to 77 ng/mL; TgAb, <100 IU/mL; and TPOAb, <40 IU/mL. Each indicator was encoded as 0 when within the normal range and as 1 when outside the normal range.
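The 0/1 encoding of laboratory indicators described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the treatment of boundary values (for example, whether a Tg of exactly 3.5 ng/mL counts as normal) is an assumption, since the text does not specify it.

```python
def encode_indicator(name, value):
    """Return 0 if `value` lies within the reference range for `name`, else 1.

    Reference ranges are those quoted in the text; boundary handling is assumed.
    """
    if name == "Tg":          # ng/mL, normal range 3.5 to 77 (inclusive, assumed)
        normal = 3.5 <= value <= 77.0
    elif name == "TgAb":      # IU/mL, normal when below 100
        normal = value < 100.0
    elif name == "TPOAb":     # IU/mL, normal when below 40
        normal = value < 40.0
    else:
        raise KeyError(f"no reference range known for {name!r}")
    return 0 if normal else 1


def encode_patient(labs):
    """Encode a dict of laboratory values into 0/1 indicators."""
    return {name: encode_indicator(name, value) for name, value in labs.items()}


if __name__ == "__main__":
    # Hypothetical patient values, for illustration only.
    print(encode_patient({"Tg": 120.0, "TgAb": 35.0, "TPOAb": 40.0}))
```

A binary encoding of this kind discards the magnitude of the deviation; it is shown here only because it is the representation the text describes.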
Data enhancement
To improve the diagnostic efficacy of the models, we performed data enhancement on the pathological image dataset. In the dataset, random flipping (horizontal flips with 50% probability), random rotation (−10° to 10°), random scaling (100%–110%), and random brightness enhancement (0%–20%) were performed to increase the amount of training data. For each image, only one of the four transformations was randomly applied.
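As a rough illustration of the "one of four transformations" rule, the following Python sketch samples which transformation to apply and its parameter. It is not the authors' pipeline: the function name is hypothetical, brightness enhancement of 0%–20% is interpreted as a multiplicative factor between 1.0 and 1.2, uniform sampling is assumed, and the actual pixel operation would be delegated to an image library.

```python
import random


def sample_augmentation(rng):
    """Pick exactly one of the four transformations and draw its parameter.

    Returns a small dict describing the operation; applying it to pixels is
    left to whichever image library is in use (an assumption of this sketch).
    """
    kind = rng.choice(["flip", "rotate", "scale", "brightness"])
    if kind == "flip":
        # Horizontal flip, applied with 50% probability.
        return {"op": "flip", "apply": rng.random() < 0.5}
    if kind == "rotate":
        # Rotation angle in degrees, between -10 and 10.
        return {"op": "rotate", "degrees": rng.uniform(-10.0, 10.0)}
    if kind == "scale":
        # Scaling factor between 100% and 110%.
        return {"op": "scale", "factor": rng.uniform(1.00, 1.10)}
    # Brightness enhancement of 0% to 20% (interpreted as a factor of 1.0 to 1.2).
    return {"op": "brightness", "factor": rng.uniform(1.00, 1.20)}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for _ in range(3):
        print(sample_augmentation(rng))
```

Sampling the parameters separately from applying them makes it easy to log exactly which transformation each training image received.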
Network architecture
ResNet50 model
ResNet50 models have achieved breakthroughs in image classification. As the number of convolutional layers increases, the learning depth increases, and the performance of the model generally improves. The ResNet network is modified from the visual geometry group (VGG)19 network and is constructed by adding residual blocks through a shortcut (skip-connection) mechanism. The main function of the residual block is to establish a shortcut connection between its input and its output; therefore, when training the network, each block needs to learn only the residual relative to its input, rather than the entire mapping, which shortens the path between input and output and reduces the learning difficulty of the neural network.
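The residual idea can be shown with a toy Python sketch in which a block outputs F(x) + x. This is a conceptual illustration on plain lists, not the ResNet50 architecture itself (which uses convolutional layers); the function names are hypothetical.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where `residual_fn` plays the role of the learned F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("the residual branch must preserve the input shape")
    # The shortcut connection adds the unchanged input to the branch output.
    return [xi + fi for xi, fi in zip(x, fx)]


def zero_residual(x):
    """A branch that has learned nothing: the block reduces to the identity."""
    return [0.0 for _ in x]


def half_residual(x):
    """A branch that adds a correction of half the input (illustrative only)."""
    return [0.5 * xi for xi in x]


if __name__ == "__main__":
    print(residual_block([1.0, -2.0], zero_residual))
    print(residual_block([1.0, 2.0], half_residual))
```

With a zero residual the block passes its input through unchanged, which is why adding such blocks should not make the mapping harder to represent than a shallower network.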
RF model
RF models are highly flexible machine learning algorithms that were first proposed by Leo Breiman in 2001. This method uses bootstrap sampling with replacement to repeatedly and randomly select

Random forest (RF) structure.
The specific steps for an RF model are as follows: (1) bootstrap sampling with replacement is used to randomly select
where
DS ensemble model
Supposing that
where
Model training and testing
The RF, ResNet50, and DS ensemble models were all trained using 5-fold cross-validation. In each fold, 60% of the data was used for training, 20% for validation, and 20% for testing. During training, the model that performed best on the validation set was selected and evaluated on the test set. The ResNet50 model was trained using the pathological imaging dataset and used to diagnose pathological images in the test set. The RF model was trained using the text dataset and used to analyze the auxiliary examination results in the test set. The results of the two models were then integrated to obtain the DS ensemble model, which was then used to diagnose PTC and other types of nodules.
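One way to obtain a 60%/20%/20% split within 5-fold cross-validation is to rotate five equal folds so that, in each round, one fold is the test set, the next fold is the validation set, and the remaining three form the training set. The Python sketch below assumes this rotation scheme and a simple random shuffle; the text does not state whether the split was stratified by class or grouped by patient, and the function name is hypothetical.

```python
import random


def five_fold_splits(n_samples, seed=0):
    """Yield-style list of (train, validation, test) index lists for 5 folds.

    Assumed scheme: fold k is the test set, fold (k + 1) % 5 the validation
    set, and the other three folds the training set (60/20/20).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]  # five near-equal folds

    splits = []
    for k in range(5):
        val_k = (k + 1) % 5
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, val_k) for i in fold]
        splits.append((train, folds[val_k], folds[k]))
    return splits


if __name__ == "__main__":
    for train, val, test in five_fold_splits(610):
        print(len(train), len(val), len(test))
```

With the 610 images reported in the Results, each round would contain 366 training, 122 validation, and 122 test images, and every image would appear in a test set exactly once.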
Statistical analysis
SPSS 20.0 software (IBM Corp., Armonk, NY, USA) was used for all statistical analyses. Diagnoses of PTC and other types of nodules by the ResNet50, RF, and DS ensemble models were statistically analyzed. The receiver operating characteristic (ROC) curve was plotted, and the area under the ROC curve (AUC) was calculated. The diagnostic performance of the different models was compared using ROC analysis and summarized by the AUC.
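For readers unfamiliar with the AUC, it can be interpreted as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below computes it directly from that definition; it is an illustration with made-up scores and is not the SPSS procedure used in this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical labels (1 = positive) and model scores, for illustration only.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
    print(roc_auc(labels, scores))
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.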
Results
This study enrolled 559 patients, including 381 with PTC, 38 with MTC, 41 with FTC, 40 with nodular goiters, and 59 with adenomas. A total of 610 pathological images were collected, including 426 of PTC, 40 of MTC, 41 of FTC, 44 of nodular goiters, and 59 of adenomas.
The ResNet50 model correctly diagnosed 546 images and misdiagnosed 64, i.e., 35 cases of PTC were misdiagnosed as other types of nodules, and 16 cases of FTC, one case of MTC, one case of nodular goiter, and 11 cases of adenoma were misdiagnosed as PTC. The diagnostic accuracy was 89.51%, the sensitivity was 84.24%, the misdiagnosis rate was 10.49%, and the specificity was 91.78%. The diagnostic results are detailed in Table 1. ROC curve analysis showed an AUC of 0.955 (Figure 3).
Table 1. Diagnosis results of the ResNet50 model.

Figure 3. Receiver operating characteristic (ROC) curves of different deep neural network (DNN) models for the diagnosis of thyroid nodules.
Using auxiliary examination results, the RF model correctly diagnosed 529 cases and misdiagnosed 81, i.e., 16 cases of PTC were misdiagnosed as other types of nodules, and 18 cases of FTC, 26 cases of MTC, one case of nodular goiter, and 26 cases of adenoma were misdiagnosed as PTC. The diagnostic accuracy was 86.72%, the sensitivity was 64.67%, the misdiagnosis rate was 13.28%, and the specificity was 96.24%. These diagnostic results are detailed in Table 2. ROC curve analysis showed an AUC of 0.904 (Figure 3).
Table 2. Diagnosis results of the RF model.
The DS ensemble model correctly diagnosed 572 cases and misdiagnosed 38, i.e., 12 cases of PTC were misdiagnosed as other types of nodules, and 13 cases of FTC, four cases of MTC, and nine cases of adenoma were misdiagnosed as PTC. The diagnostic accuracy was 93.77%, the sensitivity was 85.87%, the misdiagnosis rate was 6.23%, and the specificity was 97.18%. The diagnostic results are detailed in Table 3. ROC curve analysis showed an AUC of 0.979 (Figure 3).
Table 3. Diagnosis results of the DS ensemble model.
Comparison of the diagnosis results of DNN models.
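The summary percentages reported above can be recomputed from the error counts with a short Python sketch. The function and key names are hypothetical. The counts are taken from the text: 426 PTC images and 184 images of other nodule types (40 + 41 + 44 + 59), with the number of misdiagnosed "other" cases for the RF model taken as 65 (81 total errors minus 16 PTC errors). Under these counts, the reported sensitivity values coincide with the proportion of "other" nodules classified correctly, and the reported specificity values with the proportion of PTC images classified correctly, which suggests that the "other" group was treated as the positive class.

```python
N_PTC, N_OTHER = 426, 184  # image counts reported in the Results


def summarize(ptc_missed, other_missed):
    """Recompute the summary percentages from per-group error counts."""
    total = N_PTC + N_OTHER
    errors = ptc_missed + other_missed
    return {
        "accuracy": round(100 * (total - errors) / total, 2),
        "misdiagnosis_rate": round(100 * errors / total, 2),
        # Share of PTC images classified as PTC.
        "ptc_correct_rate": round(100 * (N_PTC - ptc_missed) / N_PTC, 2),
        # Share of non-PTC images classified as "other".
        "other_correct_rate": round(100 * (N_OTHER - other_missed) / N_OTHER, 2),
    }


# (PTC called "other", "other" called PTC) for each model, from the text.
ERROR_COUNTS = {"ResNet50": (35, 29), "RF": (16, 65), "DS": (12, 26)}

if __name__ == "__main__":
    for model, (ptc_missed, other_missed) in ERROR_COUNTS.items():
        print(model, summarize(ptc_missed, other_missed))
```

For the ResNet50 model, for example, this gives an accuracy of 89.51%, a PTC-correct rate of 91.78%, and an other-correct rate of 84.24%.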
Discussion
With the progress of science and technology, AI is becoming increasingly capable and has achieved notable results in the medical field. In a retrospective study, Song et al. 8 used a DNN model to predict benign and malignant thyroid nodules using ultrasound images. Using pathological results as the gold standard, the sensitivity of the DNN model was 95.2% and the specificity was 61.8%. Wang et al. 9 studied 11,715 pathological images of 806 patients with thyroid nodules, and the accuracies of a convolutional neural network (CNN) in identifying normal tissues, anaplastic thyroid carcinoma (ATC), FTC, MTC, PTC, nodular goiter, and adenoma were 88.33%, 98.57%, 98.89%, 100%, 97.77%, 100%, and 92.44%, respectively. Recent studies have suggested that DNN models trained on imaging or pathological data can effectively identify the pathology of thyroid nodules, but few studies have used text information to train RF models. This study used both imaging data and text information to train models, and then formed an ensemble model from the results of the individual models to explore the differential diagnostic efficacy of the ResNet50, RF, and DS ensemble models for PTC. However, the models used in this study are relatively classic, and the ensemble model can be achieved by calculating the mean value. Newer models and other ensemble methods can be used in future research and may improve diagnostic efficiency.
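To make the idea of combining two models' outputs concrete, the following Python sketch contrasts a mean-value ensemble with a product-style combination. The second function assumes that "DS" refers to Dempster-Shafer evidence combination, which the text does not spell out, and it further assumes that each model assigns all of its belief to the two single classes (PTC and other). The function names and the example probabilities are hypothetical.

```python
def mean_fusion(p_image, p_text):
    """Average the two models' estimated probabilities of PTC."""
    return (p_image + p_text) / 2.0


def dempster_fusion(p_image, p_text):
    """Combine two P(PTC) estimates with Dempster's rule of combination.

    Assumes each source places all of its mass on {PTC} and {other}, so the
    rule reduces to a normalized product of the agreeing masses.
    """
    agree_ptc = p_image * p_text
    agree_other = (1.0 - p_image) * (1.0 - p_text)
    normalizer = agree_ptc + agree_other  # equals 1 minus the conflicting mass
    if normalizer == 0.0:
        raise ValueError("the two sources are in total conflict")
    return agree_ptc / normalizer


if __name__ == "__main__":
    # Hypothetical outputs: image model 0.8, text model 0.6 for PTC.
    print(mean_fusion(0.8, 0.6))
    print(dempster_fusion(0.8, 0.6))
```

When both models lean toward the same class, the product-style rule yields a more confident combined estimate than the mean, whereas the mean changes more gradually as either input changes.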
The performance of DNN models largely depends on the quantity and quality of the dataset. A complete pathological section includes tumor tissue, normal thyroid tissue, follicular cells, blood vessels, and muscles. Moreover, different preparation methods and imaging equipment may result in different tissue features in images. 10 Because the pathologists excluded pathological images with unclear diagnoses, the pathological images used by the DNN models in this study could be clearly diagnosed and had typical pathological manifestations. The DNN model was excellent at diagnosing these images, but a limitation of this approach is that the model may perform worse on atypical pathological images. In future studies, the sample size should be increased to include atypical pathological images. Owing to the limited number of patients, data enhancement was used to provide more data and expand the dataset.
The ResNet50 model was trained using pathological images and misdiagnosed 64 cases. An analysis of pathological images and a literature review suggested the following explanations: (1) a nodular goiter may be confused with PTC because of the nodular changes and occasional papillary structures that can be observed microscopically; 11 (2) adenomas often exhibit follicular enlargement and fusion, forming a cystic structure, while some PTC cases may also form a cystic structure, resulting in confusion between the two;12,13 (3) both FTC and PTC are differentiated thyroid carcinomas (DTCs) arising from follicular epithelial cells and have similar pathological manifestations; additionally, a special type of PTC called follicular papillary thyroid carcinoma (FPTC) exhibits manifestations similar to those of FTC, and FTC also has papillary structures, so DNN models easily confuse the two; 14 and (4) some tumor cells in MTC may be arranged in papillary or follicular shapes, causing DNN models to misdiagnose MTC as PTC. 15
The RF model was trained on the text dataset and misdiagnosed 81 cases on the basis of auxiliary examination results. An analysis of the misdiagnosed cases showed that the TgAb and Tg contents were the cause. Tg is a glycoprotein that is mainly secreted by the thyroid follicular epithelium, and its expression in the healthy human body is low.16,17 However, patients with thyroid cancer develop inflammatory responses due to thyroid tissue damage, which may induce the activation of thyroid epithelial cells, 18 thereby causing them to release more Tg. 19 Therefore, Tg is highly expressed in patients with thyroid cancer. 20 TgAb is an antibody against Tg, and an increase in Tg levels can cause an increase in TgAb levels. 21 In this study, most PTC patients had TgAb > 10 IU/mL and Tg > 3.5 ng/mL, while the 16 PTC patients who were misdiagnosed as having other types of nodules had Tg < 3.5 ng/mL; therefore, PTC patients with low Tg levels are prone to misdiagnosis. The 65 patients with other types of nodules who were misdiagnosed as having PTC had TgAb > 10 IU/mL and Tg > 3.5 ng/mL. Because the group with other types of nodules was smaller than the PTC group, so that PTC data accounted for a larger proportion of the dataset, the model may misdiagnose patients with other types of nodules who have TgAb > 10 IU/mL and Tg > 3.5 ng/mL as PTC patients.
The DS ensemble model incorporated both the fusion of features and ensemble learning techniques. 22 Feature extraction on the basis of correlation analysis ensured the rationality of model data, 23 and ensemble learning on the basis of different training sets improved the accuracy of the model; 24 therefore, the DS ensemble model had high sensitivity, specificity, accuracy, and AUC. This study demonstrated the effect of using imaging data and text information to train the ResNet50, RF, and DS ensemble models and compared the individual ResNet50 and RF models with their ensemble. In future studies, the sample size should be increased to improve the diagnostic efficiency of the model.
Footnotes
Ethics statement
The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All patients were required to sign informed consent forms before the related procedures.
Acknowledgements
The authors wish to thank the anonymous referees and editors of this special issue for their constructive comments.
Author contributions
(I) Conception and design: Chengwen Deng, Dongyan Han, Ming Feng, Dan Li; (II) administrative support: Zhongwei Lv, Dan Li; (III) provision of study materials or patients: Chengwen Deng, Dongyan Han, Ming Feng; (IV) collection and assembly of data: Chengwen Deng, Dongyan Han, Ming Feng; (V) data analysis and interpretation: Chengwen Deng, Dongyan Han, Ming Feng; (VI) manuscript writing: All authors; (VII) final approval of the manuscript: all authors.
Declaration of conflicting interest
The authors have no conflicts of interest to declare.
Funding
None.
