Abstract
Keywords
Introduction
Esophageal cancer is one of the most common malignant tumors, ranking seventh in global incidence and sixth in mortality in 2020, with over 544 000 deaths reported. 1 China has a particularly high incidence of esophageal cancer, accounting for more than half of all cases worldwide, 2 with 90% of these patients suffering from esophageal squamous cell carcinoma (ESCC). 3 Owing to the non-specific early symptoms of esophageal cancer, 80%-90% of patients are diagnosed at an advanced stage and require comprehensive treatment plans. 4 Based on the CROSS and NEOCRTEC5010 studies, neoadjuvant chemoradiotherapy (nCRT) combined with radical resection has become the standard treatment for patients with locally advanced resectable esophageal cancer.5-7 The CROSS study is the longest-running multicenter randomized controlled trial of nCRT for esophageal cancer. In patients with locally advanced esophageal cancer, the median overall survival was 48.6 months in the group that received nCRT combined with surgery, significantly longer than the 24 months in the group that received surgery alone, with a significant benefit in the ESCC population.5,6 The NEOCRTEC5010 study, a prospective randomized controlled clinical trial conducted at eight large esophageal cancer centers in China, further confirmed the findings of the CROSS study: nCRT combined with surgery significantly improved median overall survival and disease-free survival compared with surgery alone, making nCRT the preferred treatment for patients with locally advanced ESCC. 7
Some patients with ESCC who undergo surgery after nCRT still experience local recurrence and distant metastasis, at rates of up to 33.7%–48%.8,9 The pathological complete response (pCR) rate of nCRT is only 43.2%–49%,8,9 and the heterogeneity of treatment responses among patients reflects the complexity of esophageal cancer. This indicates that evidence-based treatment plans may not be well suited for all patients with ESCC, posing a considerable challenge for clinicians. Therefore, identifying patients with ESCC who are sensitive to nCRT would enable clinicians to provide timely and more specific interventions for patients at high risk of tumor control failure and avoid unnecessary treatment for low-risk patients.
Radiomics is an emerging image analysis method that performs digitalized, quantitative, high-throughput analysis of imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography-CT (PET/CT). Researchers delineate regions of interest (ROIs) and convert them into imaging features such as intensity, texture, shape, and wavelet-transformed features using machine learning technology. These feature data are then normalized and subjected to dimensionality reduction to select meaningful radiomics features. Finally, radiomic labels are established through linear or nonlinear machine learning methods, enabling comprehensive quantitative descriptions of the tumors. Radiomics has been applied to tumor molecular classification, differential diagnosis, efficacy assessment, and prognostic evaluation, providing important assistance in clinical decision-making in precision oncology.10-13 It can serve as a potential tumor biomarker, providing a more comprehensive description of tumor phenotypes by extracting imaging features. The main imaging methods for evaluating esophageal cancer include CT, MRI, and PET/CT. CT is currently the most commonly used method, playing an important role in the diagnosis, staging, treatment guidance, and follow-up of esophageal cancer. MRI is of great value in the diagnosis of mediastinal lymph node metastasis of esophageal cancer. However, owing to artifacts from cardiac motion and gastric motility, as well as a long acquisition time that requires high patient compliance, MRI is not currently recommended for routine imaging of esophageal cancer. PET/CT is also not routinely recommended, because of its high cost, radiation exposure, and inability to detect subtle metastases. Therefore, this study was based on CT radiomics.
To date, radiomics analysis based on CT, MRI, and PET/CT has shown good predictive value in terms of the efficacy, treatment response, prognosis, and lymph node metastasis of esophageal cancer.14-18 However, few studies have used CT radiomics combined with clinical parameters to predict sensitivity to nCRT in patients with ESCC.
The nCRT approach serves as a crucial therapeutic modality for ESCC. Predicting its efficacy is therefore vital for constructing personalized treatment plans. However, there is still a lack of an accurate and reliable predictive method to assess the sensitivity of patients to nCRT in clinical practice. Studies based on CT radiomics combined with clinical features can improve prediction accuracy. By extracting quantitative features and combining them with machine learning algorithms, CT radiomics can construct more precise predictive models. These features can reflect the heterogeneity within tumors and thus predict their responses to nCRT. Compared with relying solely on clinical features, combining CT radiomics features can significantly improve the accuracy and reliability of prediction. By predicting a patient’s sensitivity to nCRT, physicians can develop more personalized treatment plans for patients, and for those who are predicted to be sensitive, the current treatment regimen can be continued or intensified. Conversely, for those predicted to be insensitive to nCRT, treatment strategies can be adjusted to reduce unnecessary examinations and treatment costs while mitigating the risk of treatment delays and adverse effects.
Therefore, this retrospective study intended to establish a comprehensive model by extracting and analyzing the preoperative CT radiological features of patients with ESCC who received nCRT, combined with clinical parameters, to identify the optimal predictive model for patients with ESCC undergoing nCRT. The goal was to predict sensitivity to nCRT in cases of ESCC before treatment, with the aim of providing risk stratification and decision-making recommendations for clinical treatment plans, as well as offering valuable information for constructing personalized treatment plans.
Methods
Research Participants
Inclusion and Exclusion Criteria
This retrospective study included 102 consecutive patients with pathologically confirmed ESCC from Shandong Cancer Hospital between 2016 and 2021. All patients received standard nCRT before surgery. The study was approved by the Ethics Committee of Shandong Cancer Hospital and Institute, affiliated with Shandong First Medical University, which waived the requirement for informed consent because of the retrospective nature of the study. The workflow of the study is shown in Figure 1.
Figure 1. Workflow of the study.
The main inclusion criteria were as follows: ESCC confirmed by pathological tissue examination through endoscopy; patients evaluated by both the Radiation Oncology and Surgical Oncology Departments who were identified as high risk for surgery and were thus recommended to receive nCRT first; patients assessed to be feasible for surgical treatment after nCRT; patients with no prior radiation, chemotherapy, or other anti-tumor treatments before CT examination; those with availability of clear CT images before treatment; and those for whom complete clinical information was accessible through the medical record system.
The main exclusion criteria were as follows: patients who received only radiotherapy and chemotherapy and did not undergo surgical treatment later on; those with incomplete clinical case information (eg, missing baseline clinical data); and those for whom the CT imaging quality did not meet our required standards.
Treatment Protocol
The medical records of all included patients, who underwent radical surgery after nCRT, were reviewed. The radiotherapy techniques used were intensity-modulated radiotherapy and three-dimensional (3D) conformal radiotherapy. The radiation fractionation scheme was 1.8-2 Gy per fraction, administered once daily, 5 days per week. The chemotherapy regimen consisted of paclitaxel or 5-fluorouracil combined with platinum-based drugs. Each patient received 4-6 cycles of chemotherapy, with each cycle lasting 21 days. The dosage and total number of chemotherapy cycles were adjusted based on each patient’s condition, while adhering to the recommendations of the Chinese Society of Clinical Oncology and the National Comprehensive Cancer Network guidelines. Surgical procedures included either three-incision radical esophagectomy (neck, chest, and abdomen) or two-incision radical esophagectomy (chest and abdomen).
Grouping
Based on their pathological results after esophageal cancer resection surgery, the patients were divided into two groups: the chemoradiotherapy-sensitive group, which included patients who achieved pCR; and the chemoradiotherapy-resistant group, which included patients who did not achieve pCR. pCR was defined as the absence of residual invasive disease in all layers of the esophagus and the absence of positive lymph nodes in the surgical specimen after treatment (ypT0N0).
Clinical Model Construction
The patient data collected before treatment included the following parameters: sex, age, smoking history, alcohol history, height, body surface area, weight, Karnofsky performance status score, tumor location, clinical T stage, clinical N stage, clinical tumor, node, and metastasis (TNM) staging, radiation dose, neoadjuvant chemotherapy regimen, operative approach, white blood cell count, lymphocyte count, neutrophil count, platelet-to-lymphocyte ratio, platelet-to-monocyte ratio, lymphocyte-to-monocyte ratio, systemic immune-inflammation index, peripheral blood mononuclear cells, eosinophils, basophils, red blood cell count, hemoglobin level, platelet count, carcinoembryonic antigen (CEA) level, neutrophil-to-lymphocyte ratio (NLR), fibrinogen (FIB) level, cytokeratin 19 fragment level, prothrombin time, thrombin time, activated partial thromboplastin time, D-dimer level, serum aspartate aminotransferase to alanine aminotransferase ratio (S/L), total protein, albumin, globulin, albumin-to-globulin ratio, prealbumin, alkaline phosphatase, cystatin C, and lactate dehydrogenase level. Clinical T-stage and N-stage were assessed according to the eighth edition of the TNM staging system for esophageal and esophagogastric junction cancers by the Union for International Cancer Control and the American Joint Committee on Cancer. Univariate analysis was performed to assess the relationship between clinical features and sensitivity to nCRT in patients with ESCC. Clinical variables with P values <0.05 were used to develop a clinical model for predicting sensitivity to nCRT.
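Several of the parameters listed above are ratios derived from routine blood counts. The following is a minimal sketch of how such indices are commonly computed, assuming the conventional definitions (for example, SII as platelets × neutrophils / lymphocytes); the function names and input values are hypothetical illustrations and are not taken from the study data.

```python
def inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Derive ratio-type indices from absolute counts (all in 10^9 cells/L).

    Conventional definitions are assumed; the study does not spell them out.
    """
    return {
        "NLR": neutrophils / lymphocytes,              # neutrophil-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,                # platelet-to-lymphocyte ratio
        "PMR": platelets / monocytes,                  # platelet-to-monocyte ratio
        "LMR": lymphocytes / monocytes,                # lymphocyte-to-monocyte ratio
        "SII": platelets * neutrophils / lymphocytes,  # systemic immune-inflammation index
    }


def ast_alt_ratio(ast, alt):
    """S/L: serum aspartate aminotransferase divided by alanine aminotransferase (both in U/L)."""
    return ast / alt


# Hypothetical example values, for illustration only.
example = inflammatory_indices(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0)
example_sl = ast_alt_ratio(ast=30.0, alt=20.0)
```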
Radiomics Analysis
Image Segmentation
CT images taken within 1 month before nCRT were collected and uploaded to the RadCloud radiomics platform (https://radcloud.cn/). Two doctors with over 5 years of experience in chest imaging diagnosis independently used the 3D-Slicer software to delineate the ROIs on the CT images, outlining the primary esophageal tumor while avoiding vascular bundles, fat, calcifications, the esophageal lumen, and surrounding organs. For validation, 20 randomly selected patients were reassessed by two experienced radiologists for repeated segmentation. The inter-observer consistency coefficient was calculated based on the results of the two radiomics feature extractions, and an inter-observer consistency coefficient of >0.75 was considered to indicate good robustness and reproducibility. We performed a consistency check and found good agreement between the ROI delineations of the two radiologists.
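The consistency check described above can be sketched as an intraclass correlation computed per feature across the two readers. The exact coefficient used by the platform is not stated, so this example assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1); the function names and the toy values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table of ratings: one row per patient, one column per reader."""
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def robust_features(feature_table, cutoff=0.75):
    """Keep feature names whose inter-observer ICC exceeds the cutoff (0.75 in the text)."""
    return [name for name, ratings in feature_table.items() if icc_2_1(ratings) > cutoff]


# Toy data: each feature maps to [reader 1 value, reader 2 value] per patient.
toy = {
    "stable_feature": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]],
    "unstable_feature": [[1.0, 4.0], [2.0, 1.0], [3.0, 2.0], [4.0, 3.0]],
}
kept = robust_features(toy)
```

In this toy table only `stable_feature` passes the 0.75 cutoff, because the second feature is rated very differently by the two readers.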
Feature Extraction and Selection
Radiomics features were extracted from the ROIs using the RadCloud platform (https://radcloud.cn/). In total, 1688 radiomics features were extracted from the CT images of each patient. These features were classified into three categories. The first group (first-order statistics) consisted of 126 descriptors that quantitatively characterized the voxel intensity distribution within the CT images using common basic metrics. The second group (shape- and size-based features) included 14 3D features reflecting the shape and size of the region. The third group (textural features) comprised 525 features that quantified regional heterogeneity differences based on gray-level run length and gray-level co-occurrence matrix calculations.
After radiomics feature extraction, the features were normalized so that their magnitudes were comparable, and several feature selection methods were applied in sequence to reduce dimensionality and remove redundant features. Variance thresholding was used to remove features with variance values <0.8. The SelectKBest method, a univariate feature selection approach, was then used to test the relationship between each feature and the classification outcome, and all features with P values <0.05 were retained. Finally, a LASSO model was fitted with L1 regularization as the penalty term in the cost function, with the cross-validation parameter set to 5 and a maximum of 1000 iterations.
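As a rough illustration of the first filtering step only, the sketch below applies a variance threshold of 0.8 and a z-score normalization using the Python standard library. It assumes the population variance and a plain dictionary of feature columns; the function names and toy values are hypothetical, and the subsequent univariate and LASSO steps performed on the RadCloud platform are not reproduced here.

```python
from statistics import mean, pstdev, pvariance


def variance_filter(columns, threshold=0.8):
    """Drop features whose variance is below the threshold.

    columns maps a feature name to its list of values across patients.
    Population variance is an assumption; the platform's choice is not stated.
    """
    return {name: values for name, values in columns.items()
            if pvariance(values) >= threshold}


def z_score(values):
    """Rescale one feature to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Toy feature table with hypothetical names and values.
features = {
    "constant": [1.0, 1.0, 1.0, 1.0],   # variance 0.0, removed
    "spread": [0.0, 2.0, 0.0, 2.0],     # variance 1.0, kept
    "narrow": [0.0, 0.5, 1.0, 0.5],     # variance 0.125, removed
}
kept = variance_filter(features)
scaled = z_score(features["spread"])
```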
Construction of Machine Learning Models
The 102 patients with ESCC were randomly divided into training and validation sets in a 7:3 ratio. Based on the radiomics features selected using LASSO regression, a radiomics score (Rad-Score) was calculated by linearly weighting the corresponding coefficient values. The formula used for the radiomics model was:
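In general form (the fitted coefficient values are not reproduced in this text), a linearly weighted score of this kind can be written as below. The notation is assumed here rather than taken from the original: $\beta_0$ denotes an intercept, $\beta_i$ the LASSO coefficient of the $i$-th selected feature, $x_i$ its normalized value, and $n$ the number of selected features.

```latex
\text{Rad-Score} = \beta_0 + \sum_{i=1}^{n} \beta_i \, x_i
```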
In this study, eight classifiers—including K-nearest neighbors (KNN), support vector machine (SVM), extreme gradient boost (XGBoost), linear discriminant analysis (LDA), logistic regression (LR), Bernoulli Naive Bayes (BernoulliNB), multi-layer perceptron classifier (MLPClassifier), and stochastic gradient descent (SGD)—were used to construct the radiomics models. To evaluate the predictive performance of the models, area under the curve (AUC) values were calculated for both the training and validation datasets. Four indicators were used to evaluate the performance of the classifiers in this study: P (precision = true positives/[true positives + false positives]), R (recall = true positives/[true positives + false negatives]), F1-score (F1-score = P × R × 2/[P + R]), and support (total number in the test set).
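The indicators defined above can be computed directly from true labels, predicted labels, and predicted scores. The following is a minimal sketch with hypothetical function names and toy data; it counts support per class, as the number of true samples of that class in the evaluated set, and computes the AUC from its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = sum(1 for t in y_true if t == positive)
    return {"precision": precision, "recall": recall, "f1": f1, "support": support}


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = pCR (sensitive), 0 = non-pCR (resistant).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
```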
Construction of the Combined Model
The clinical variables with P values <0.05 selected through univariate analysis and Rad-Score were included in our model. A combined model was established based on binary logistic regression, and a nomogram was developed to present the predicted probabilities of the outcomes visually. The performance of the combined model was evaluated using receiver operating characteristic (ROC) and calibration curves.
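A binary logistic regression model of this kind maps a weighted sum of predictors to a predicted probability, which is what a nomogram displays graphically. The sketch below shows that mapping only; the intercept, coefficients, and patient values are hypothetical placeholders, not the fitted values of the study.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for illustration only (not from the study).
coefficients = {"rad_score": 1.2, "NLR": -0.3, "smoking": -0.5}
patient_a = {"rad_score": 0.8, "NLR": 2.0, "smoking": 0}
patient_b = {"rad_score": -0.4, "NLR": 3.5, "smoking": 1}

p_a = predicted_probability(0.1, coefficients, patient_a)
p_b = predicted_probability(0.1, coefficients, patient_b)
```

With these placeholder weights, `patient_a` receives a higher predicted probability of pCR than `patient_b`, because a higher Rad-Score raises the logit while a higher NLR and a smoking history lower it.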
Statistical Analysis
The radiomics statistical analysis in this study was conducted using the RadCloud platform. Statistical analysis of the clinical factors was performed using SPSS 27.0 (IBM Corp.). Continuous variables were tested for normality. Those that conformed to a normal distribution were compared using the independent-samples Student’s t test, whereas those that did not were compared using the Mann-Whitney U test. Categorical variables were compared between groups using the chi-squared test. Statistical significance was set at P < 0.05. Nomograms, ROC curves, and calibration curves were generated using R version 3.6.0. The reporting of this study follows the STROBE guidelines.
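For a categorical variable with two levels compared between two groups, the chi-squared test reduces to a 2 × 2 table. The following sketch computes the Pearson statistic and its P value with the standard library, assuming no continuity correction (whether the correction was applied in SPSS is not stated); the function name and counts are hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the table [[a, b], [c, d]], without continuity correction.

    With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = smokers / non-smokers, columns = pCR / non-pCR.
chi2, p_value = chi_squared_2x2(20, 10, 10, 20)
significant = p_value < 0.05
```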
Results
General Information
Comparison of Patients’ Characteristics Between Training Set and Validation Set.
Abbreviations: KPS score, Karnofsky performance status score; WBC, white blood cell; LYM, lymphocyte; NE, neutrophil; PLR, platelet-to-lymphocyte ratio; PMR, platelet-to-monocyte ratio; LMR, lymphocyte-to-monocyte ratio; SII, systemic immune-inflammation index; PBMC, peripheral blood mononuclear cell; EOS, eosinophils; BASO, basophils; RBC, red blood cell; HGB, hemoglobin; PLT, platelet; CEA, carcinoembryonic antigen; NLR, neutrophil-to-lymphocyte ratio; FIB, fibrinogen; CYFRA21-1, cytokeratin-19 fragment; PT, prothrombin time; TT, thrombin time; APTT, activated partial thromboplastin time; S/L, serum aspartate aminotransferase to alanine aminotransferase ratio; TP, total protein; ALB, albumin; GLB, globulin; A/G, albumin/globulin ratio; PA, prealbumin; Cys C, cystatin C; LDH, lactate dehydrogenase.
Extraction of Radiomics Features and Establishment of Rad-Score
From the 1688 radiomics features extracted from the CT images, we first removed low-variance features using the variance threshold method and then applied the SelectKBest method for further filtering. Finally, nine optimal features were selected using the LASSO algorithm (Figure 2), and a radiomics label (Rad-Score) was constructed from them (Table 2).
Figure 2. Feature selection using the LASSO algorithm. (A) LASSO path; (B) MSE path; (C) Coefficients in the LASSO model. Using the LASSO model, nine features corresponding to the optimal alpha value were selected.
Table 2. Descriptions of Selected Radiomics Features and Their Associated Feature Groups and Filters.
Evaluation Metrics for 8 Classifiers on Training Set and Validation Set.
Abbreviations: AUC, area under the curve; SVM, support vector machine; KNN, K-nearest neighbors; XGBoost, extreme gradient boost; LDA, linear discriminant analysis; LR, logistic regression; BernoulliNB, Bernoulli Naive Bayes; MLPClassifier, multi-layer perceptron classifier; SGD, stochastic gradient descent.
Clinical Characteristics
Univariate Analyses of Variables Linked to pCR in Patients With ESCC.
Values are presented as means. Abbreviations: KNN, K-nearest neighbors; SVM, support vector machine; XGBoost, extreme gradient boost; LDA, linear discriminant analysis; LR, logistic regression; BernoulliNB, Bernoulli naive Bayes; MLPClassifier, multi-layer perceptron classifier; SGD, stochastic gradient descent.
Construction and Validation of the Combined Model
Univariate analysis revealed a significant difference in Rad-Score between the two groups of patients (P < 0.05). The Rad-Score was combined with the clinical features that had P values <0.05 (smoking history, alcohol history, NLR, S/L, CEA, and FIB) to construct a combined model based on binary logistic regression. A nomogram (Figure 3) was created, and the performance of the model was evaluated using ROC and calibration curves (Figures 4 and 5). The AUCs of the combined model were 0.870 in the training set and 0.821 in the validation set. The calibration curve showed that the nomogram’s predictions were relatively close to the actual clinical observations.
Figure 3. The visual nomogram in the combined model.
Figure 4. ROC curve analysis for the nomogram. (A) Combined model in training set; (B) Combined model in validation set.
Figure 5. Calibration curve analysis for the nomogram. (A) Combined model in training set; (B) Combined model in validation set.
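A calibration curve compares predicted probabilities with observed outcome rates. The sketch below uses a simplified equal-width binning approach to show the idea; it is not the procedure used to draw the curves in R, and the function name and toy data are hypothetical.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Return (mean predicted probability, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for outcome, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        bins[index].append((outcome, prob))

    rows = []
    for group in bins:
        if group:
            mean_pred = sum(prob for _, prob in group) / len(group)
            observed = sum(outcome for outcome, _ in group) / len(group)
            rows.append((mean_pred, observed, len(group)))
    return rows


# Toy data: 1 = pCR, 0 = non-pCR; probabilities are illustrative only.
rows = calibration_table([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=2)
```

For a well-calibrated model, the mean predicted probability and the observed rate in each bin lie close to each other, which corresponds to points near the diagonal of the calibration plot.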


Discussion
In this study, we developed and validated a combined model based on CT radiomics and clinical parameters, which effectively predicted sensitivity to nCRT in patients with ESCC using machine learning methods. It may provide an effective, non-invasive, practical, and reliable method for predicting sensitivity to nCRT in patients with ESCC, thereby offering clinicians supporting information for clinical decision-making.
One of the hallmarks of malignant tumors is their heterogeneity, which is associated with their malignant biological behaviors.19,20 Radiomics can reveal microscopic tumor information that is undetectable in naked-eye analyses of two-dimensional (2D) medical images, quantifying the internal spatiotemporal heterogeneity of tumors.21,22 This information includes cell infiltration, necrosis of fine structures, and abnormal angiogenesis. These factors can also reflect tumor glucose metabolism and angiogenesis status to some degree.12,23-26 A large number of previous studies have demonstrated the application of radiomic features in predicting the treatment response of nCRT in patients with ESCC. Liu Y et al established a model to predict pCR to nCRT in ESCC by combining magnetic resonance radiomics and dynamic hemodynamics. 27 The research teams of Liu Y and Lu S both established MRI-based radiomic models to accurately predict the pathological response of ESCC patients after receiving nCRT.28,29 Kasai A et al developed a CT-based radiomic model combined with artificial intelligence (AI) technology to predict the response and prognosis of patients with ESCC to chemoradiotherapy. 30 For patients with locally advanced ESCC, when nCRT combined with esophagectomy is used as the standard treatment, many researchers have explored neoadjuvant immunotherapy. Wang JL et al established and validated a radiomic model based on enhanced CT images combined with clinical data to predict the major pathological response of patients with ESCC to neoadjuvant immunotherapy. 18 CT is the primary imaging modality used during clinical staging and therapeutic response assessment for esophageal cancer. This study primarily investigated CT-based radiomics in ESCC by extracting 1688 radiomics features from the ROIs on plain CT images, including first-order statistics, shape, size, and texture features. 
These features cover one-dimensional, 2D, and 3D characteristics and comprehensively describe the tumor characteristics in CT images. We analyzed the correlation between these radiomics features and the sensitivity of ESCC to nCRT. Nine relatively important radiomics features, including five first-order features and four gray-level size zone matrix features, were identified. First-order features represent the distribution of voxel intensities in the image, whereas the gray-level size zone matrix comprises texture features representing the spatial characteristics or voxel intensity distribution of the image gray levels, providing information on the relative positions of different gray levels in the image. After feature selection, we used eight classifiers to construct radiomics models, both to evaluate how well CT radiomics could predict the sensitivity of ESCC to nCRT and to select the most suitable algorithm for fitting the radiomics features. The results showed that all eight classifiers exhibited high AUC values, with SVM being the most effective classifier. These high AUC values suggest that the CT radiomics model has strong potential for clinical application as a tool for predicting patient response to nCRT.
Furthermore, this study identified six clinical factors associated with the sensitivity of ESCC to nCRT (NLR, S/L, CEA, FIB, and smoking and alcohol history), and a clinical model was established based on these factors. The NLR, the ratio of neutrophils to lymphocytes, reflects the balance between these two inflammation-related cell types and thus the inflammatory response in the body. Changes in this value may reflect alterations in the tumor microenvironment. Systemic inflammation is an important feature of malignant tumors that may significantly affect tumor initiation and progression, as well as their invasive and metastatic capabilities. Recent studies have found that an elevated NLR is associated with poor prognosis in many tumors 31 and can reflect the sensitivity of patients with tumors to treatment.32,33 S/L is the ratio of aspartate aminotransferase (AST) to alanine aminotransferase (ALT), two enzymes that are involved in various biochemical metabolic pathways of cells and are commonly used as indicators of liver function. Elevated S/L ratios indicate high oxidative stress and an inflammatory environment in the body. Oxidative stress and inflammation are closely related to cancer development.34,35 The S/L ratio is an independent prognostic factor for malignancies such as prostate and bladder cancer.36,37 CEA is an acidic glycoprotein present on the surfaces of cancer cells that exists as a membrane structural protein with human embryonic antigen characteristics. It is a broad-spectrum tumor marker that can be used for disease monitoring, therapeutic response evaluation, and the prognostic assessment of various malignancies.38-41
FIB, a plasma protein synthesized in the liver, is an important indicator of coagulation function. 42 Changes in FIB levels reflect abnormal activation of the coagulation system. A number of studies have shown that a hypercoagulable state is associated with tumor invasion and metastasis.43,44 An abnormal coagulation status can also affect the generation and function of immunosuppressive cells, leading to abnormal functions of regulatory T cells, tumor-associated macrophages, and other immunosuppressive cells—thus providing immune escape mechanisms for tumor cells.45,46 Smoking and alcohol consumption represent high risk factors for esophageal cancer. Tobacco contains carcinogens such as aromatic amines, aldehydes, phenols, and nitrosamines, which can affect the development of esophageal cancer when passing through the esophagus. 47 The main component of alcohol is ethanol, which is oxidized to acetaldehyde by alcohol dehydrogenase in the liver, then metabolized to acetic acid by acetaldehyde dehydrogenase. Ethanol and acetaldehyde can enter esophageal epithelial cells through local infiltration or systemic circulation, exerting direct carcinogenic effects. When smoking and alcohol consumption exceed certain levels, the risk of esophageal cancer increases significantly. 48
In this study, these six clinical factors were combined with the Rad-Score to construct a combined model, upon which a nomogram model was then developed. The predictive performance of the combined model was evaluated using ROC and calibration curves. The results showed that the combined model exhibited good predictive performance, with an AUC value of 0.870 in the training set and 0.821 in the validation set. The calibration curve demonstrated appropriate goodness of fit for the combined model. Overall, we have successfully developed a combined predictive model for assessing the sensitivity of patients with ESCC to nCRT and it has demonstrated good predictive performance in both training and validation sets.
Despite our promising results, this study had certain limitations. First, this was a retrospective study with a relatively small sample size, and no formal sample size calculation was performed. The applicability of the model requires further validation in a larger sample. Second, the radiomics model was built by identifying 9 features related to treatment efficacy from the 1688 extracted features and deriving a linear regression equation from them. Because the number of cases was small relative to the number of candidate features, this linear model may be overfitted, which would reduce its prediction accuracy. In addition, the relationship between these features and the biological characteristics of the tumor remains to be clarified. Finally, CT image data are not standardized across hospitals and equipment, which can affect the extraction of radiomics features and the generalization ability of the model. The current research is still at a preliminary stage and requires validation and application in larger-scale, multi-center clinical trials.
Conclusion
Our combined model based on CT radiomics and clinical parameters showed good performance for predicting the sensitivity of patients with ESCC to nCRT. The model may support risk stratification and inform decision-making for clinical treatment. By providing information tailored to each patient’s circumstances, it may help to personalize therapy.
Footnotes
Acknowledgments
We thank all of the participants who were enrolled in this study, as well as the RadCloud radiomics platform for its assistance.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Medical and Health Science and Technology Development Program of Shandong Province (202209031000), the Tianjin Key Medical Discipline (Specialty) Construction Project (TJYXZDXK-010A), the National Natural Science Foundation of China (U23A20461), the Clinical Medical Research Center of Shandong Province (2021LCZX04), the Major Basic Research Project of the Shandong Natural Science Foundation (ZR2022ZD31), the 2021 Shandong Medical Association Clinical Research Fund - Qilu Special Project (YXH2022DZX02002), and the National Key Laboratory of Advanced Drug Delivery and Release Systems.
