Abstract
Background
The factors influencing chemotherapy-induced myelosuppression in patients with esophageal cancer in central China are unclear. This study aimed to develop a model to predict the incidence of myelosuppression during chemotherapy among patients with esophageal cancer.
Methods
In this retrospective study, a total of 1446 patients with esophageal cancer who received one of five chemotherapy regimens between 2013 and 2020 at our institute were randomly assigned in a 7:3 ratio to training and validation data sets. Clinical and drug-related variables were used to develop the prediction model from the training data set by the machine learning method of random forest. Finally, the model was tested in the validation data set.
Results
The prediction model was established with 16 indispensable variables selected from 46 candidate variables. The model achieved an area under the receiver-operating characteristic curve of .883, with a prediction accuracy of 80.0%, a sensitivity of 77.8% and a specificity of 81.8%.
Conclusion
This new prediction model showed excellent ability to predict the incidence of myelosuppression, which may in turn support preventive measures for patients with esophageal cancer during chemotherapy.
Introduction
Esophageal cancer (EC) is one of the most malignant gastrointestinal tumors worldwide, ranking seventh in morbidity and sixth in mortality among all malignant tumors.1 China has a high incidence of EC, accounting for almost half of the new cases in the world every year, and EC ranks fifth in morbidity and fourth in mortality among Chinese men.2 Esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC) are the two main histological types. ESCC is the most common type of EC, occurring in younger individuals and accounting for approximately 80% of all EC globally.3,4 The prognosis of patients with advanced EC is poor, with five-year survival rates of 12.4% and 20.9% in Europe and China, respectively.5,6
Chemotherapy is the main treatment for patients with advanced EC. Chemotherapeutic drugs inhibit tumor growth by interfering with the proliferation of tumor cells. However, while killing tumor cells, they also inhibit or kill normal proliferating cells, such as hematopoietic stem and progenitor cells (HSPCs) in the bone marrow.7 Chemotherapy-induced myelosuppression (MS) is one of the most common dose-limiting complications in cancer therapy.8 The main manifestations of MS are leukopenia, neutropenia, anemia, and thrombocytopenia, which can often lead to severe infections, fatigue and excess bleeding.9,10 The discontinuation or interruption of chemotherapeutic drugs caused by MS not only reduces the curative effect of chemotherapy, but also results in a series of complications that endanger patients' lives. In an investigation of 301 cancer patients, approximately 79% underwent treatment for MS after chemotherapy, and chemotherapy was delayed, reduced or stopped due to MS in about 64% of patients.7 Therefore, accurately assessing the risk factors of MS and establishing a risk prediction model for MS are of great significance for the treatment of EC patients.
At present, most studies on EC focus on the search for risk factors and biomarkers.11 Studies have shown that the potential risk factors for being diagnosed with EC are male sex, gastroesophageal reflux disease (GERD), cigarette smoking, alcohol consumption and obesity.11-13 MiR-330-5p and miR-221 can be used as efficacy biomarkers in EC patients undergoing chemoradiotherapy.14,15 However, risk prediction models for the occurrence of MS are not well established, and quantitative risk assessment for chemotherapy-induced MS is also lacking. Therefore, the identification, exploration and intervention of all potential risk factors may assist in developing more effective risk prediction models for MS after chemotherapy for EC.
To the best of our knowledge, because of the regional specificity of EC, its incidence is mainly concentrated in northern China, while there are few studies in central China because of the limited number of EC cases. This study not only analyzed the risk factors of MS in EC patients receiving chemotherapy in central China, but also established a quantitative risk prediction model of MS and verified the accuracy of the model through a real-world retrospective study. This study will provide valuable evidence to help clinicians select an appropriate chemotherapy regimen for EC patients and reduce the incidence of MS.
Methods
Study Population
A prediction model was established using clinical data that capture all patients with esophageal cancer treated at our institute. Data from December 2013 to December 2020 were extracted and included for analysis (n = 1446).
Data Collection
According to our institute protocol, patients were included if they were treated with one of five chemotherapy regimens: lobaplatin (chemotherapy protocol 1, n = 714), paclitaxel (PTX) (chemotherapy protocol 2, n = 95), 5-fluorouracil (5-F) (chemotherapy protocol 3, n = 60), platinum chemotherapy drugs (PCD) & PTX (chemotherapy protocol 4, n = 444), and PCD & 5-F (chemotherapy protocol 5, n = 133). Taking the lobaplatin group as the reference level, the MS induced by the different chemotherapy protocols was compared. The exclusion criteria were: no chemotherapy or a chemotherapy protocol outside the five studied (n = 3639), more than 30% of variables missing (n = 180), a history of chemotherapy or radiation therapy outside our hospital (n = 193), and MS that had occurred before chemotherapy (n = 65). Demographic data (age and sex), laboratory findings and drug-related information were collected from the hospital information system. Laboratory test results, including routine blood tests, liver and renal function, electrolytes and myocardial enzymes, were recorded and compared. Patients were categorized according to the presence or absence of MS within 30 days after the first chemotherapy. The criteria for MS were: white blood cells (WBC) <4×10^9/L, neutrophils <2×10^9/L, platelets (PLT) <100×10^9/L, or hemoglobin (Hb) <110 g/L. A patient was judged to have MS when any one of the four criteria was met.
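The MS definition above can be sketched as a small classification rule. This is an illustrative sketch only, not the authors' code: the class name `BloodCount`, the function name `has_myelosuppression` and the constant names are hypothetical, and the cut-offs are the four values stated in the text.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """One blood test result (hypothetical container, not from the source)."""
    wbc: float          # white blood cells, x10^9/L
    neutrophils: float  # neutrophils, x10^9/L
    plt: float          # platelets, x10^9/L
    hb: float           # hemoglobin, g/L


# Cut-offs as stated in the Data Collection section.
WBC_CUTOFF = 4.0
NEUTROPHIL_CUTOFF = 2.0
PLT_CUTOFF = 100.0
HB_CUTOFF = 110.0


def has_myelosuppression(count: BloodCount) -> bool:
    """Return True when at least one of the four indicators is below its cut-off."""
    return (
        count.wbc < WBC_CUTOFF
        or count.neutrophils < NEUTROPHIL_CUTOFF
        or count.plt < PLT_CUTOFF
        or count.hb < HB_CUTOFF
    )


if __name__ == "__main__":
    # Toy values for illustration only.
    print(has_myelosuppression(BloodCount(wbc=6.1, neutrophils=3.9, plt=200, hb=123)))
    print(has_myelosuppression(BloodCount(wbc=3.5, neutrophils=3.9, plt=200, hb=123)))
```

In the study the rule is applied to results obtained within 30 days after the first chemotherapy; the time window is not modeled in this sketch.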
Statistical Analysis
Statistical analysis was performed using R version 3.6.1 (Free Software Foundation, Boston, Massachusetts), SPSS version 26.0 (IBM, Armonk, NY) and GraphPad version 7.00 (San Diego, CA). Indicators with more than 30% missing data were deleted, and multiple imputation was performed for those with less than 30% missing data. Continuous variables were presented as mean (standard deviation; SD) for normally distributed data and as median (interquartile range; IQR) for non-normally distributed data. Categorical variables were presented as n (%). Continuous variables were compared using analysis of variance for normally distributed data and the rank sum test for non-normally distributed data. Categorical variables were compared between patients with and without MS using Fisher's exact test (n < 5) or the χ² test (n ≥ 5).
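The 30% missing-data rule can be illustrated with a short sketch. It assumes each patient record is a dictionary in which `None` marks a missing value; the function name `drop_sparse_indicators` and the toy indicator names are hypothetical. The multiple imputation step is not reproduced here, since the text does not specify how it was configured.

```python
def drop_sparse_indicators(rows, max_missing=0.30):
    """Split indicator names into (kept, dropped) by their share of missing values.

    rows: list of dicts mapping indicator name -> value, with None for missing.
    An indicator is dropped when more than `max_missing` of its values are missing.
    """
    names = sorted({name for row in rows for name in row})
    n_rows = len(rows)
    kept, dropped = [], []
    for name in names:
        n_missing = sum(1 for row in rows if row.get(name) is None)
        if n_missing / n_rows > max_missing:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


if __name__ == "__main__":
    # Toy records for illustration only: "ck" is missing in 3 of 4 rows.
    toy_rows = [
        {"hb": 120, "ck": None},
        {"hb": None, "ck": None},
        {"hb": 110, "ck": 80},
        {"hb": 130, "ck": None},
    ]
    print(drop_sparse_indicators(toy_rows))
```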
Prediction Model Development
We developed the prediction model based on machine learning. The selected 1446 patients were randomly divided into 2 separate data sets: 70% of the patients (MS = 473, non-MS = 539) in our database were assigned to the training data set (the algorithm creation group), and the remaining 30% (MS = 203, non-MS = 231) were reserved as the validation set (validation group) to obtain unbiased estimates of correct classification rates and variable importance. Univariable analysis was performed on the training set for all indicators. Variables with P < .2 were included in the multivariable model. After all of these variables were entered, stepwise logistic regression (forward and backward) was used to remove variables that were not significant or had little influence on the prediction results, yielding the final prediction model. In the training set, variable importance was measured by the resulting deterioration in model quality, and the receiver-operating curve (ROC) was drawn to evaluate the performance of the model. The ROC of the validation set was then drawn, and the areas under the ROC (AUC) of the two sets were compared to evaluate whether they differed significantly. The predicted probability given by the model was divided into negative and positive results according to a threshold t.
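The 7:3 random assignment can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `split_train_validation` and the seed are hypothetical, and the split here is a plain shuffle. The text does not say whether the original split was stratified by MS status.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient identifiers into training and validation lists."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # 1446 placeholder identifiers standing in for the study cohort.
    train_ids, validation_ids = split_train_validation(range(1446))
    print(len(train_ids), len(validation_ids))
```

With 1446 patients this yields 1012 and 434 identifiers, matching the training and validation set sizes reported above.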
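The prediction performance was assessed by several criteria, including the overall prediction accuracy, sensitivity and specificity. With TP, TN, FP and FN denoting the numbers of true-positive, true-negative, false-positive and false-negative predictions, and MS taken as the positive class, the equations are as follows:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity and specificity give the percentage of correctly predicted MS and non-MS patients, respectively, while prediction accuracy is the percentage of all patients, MS and non-MS together, who were correctly predicted.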
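The three criteria can be computed from predicted probabilities and a threshold t as sketched below. This is an illustration, not the authors' implementation: the function name `classification_metrics` and the toy data are hypothetical, and treating a probability equal to t as positive is an assumption.

```python
def classification_metrics(y_true, y_prob, t):
    """Sensitivity, specificity and accuracy at threshold t (MS coded as 1)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= t else 0   # assumption: ties count as positive
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 0]
    probabilities = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    print(classification_metrics(labels, probabilities, t=0.466))
```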
Results
Demographic and Clinical Characteristics
A total of 5523 consecutive esophageal cancer inpatients in the medical record system from 2013 to 2020 were initially screened. Data extraction for this investigation occurred in March 2021. Patients who did not receive chemotherapy in our hospital were excluded. Patients were also excluded if their chemotherapy drugs or regimens were not included in this study, if MS had occurred before chemotherapy, or if more than 30% of their data were missing. Ultimately, 1446 patients were included in the final analysis (Figure 1). Clinical and demographic details for the 1012 esophageal cancer patients in the training set are reported in Supplementary Table S1. Of them, 473 (46.7%) developed MS. The median participant age was 61 (IQR, [54-66]) years, and 862 (85.2%) were male. For the variables that define MS: median WBC was 6.145 (IQR, [5.0-8.0]) ×10^9/L; median PLT was 200 (IQR, [59-253]) ×10^9/L; median neutrophil count was 3.92 (IQR, [2.97-5.86]) ×10^9/L; median Hb was 123 (IQR, [113-135]) g/L. Other variables are presented in Supplementary Table S1.
Figure 1. Participant flow diagram. n = number of participants.
Variables of Importance
In general, increasing the number of variables is not conducive to clinical practice. The variables with prominent features were chosen based on the univariable analysis. The final model included 16 indispensable features for MS prediction: times of chemotherapy, hemoglobin (Hb), red blood cells (RBC), creatinine (Cr), platelets (PLT), hematological drugs, blood urea nitrogen (BUN), creatine kinase (CK), age, antiemetic, liver protecting drug, chemotherapy protocol, analgesics, sex, urine protein, and Vit B12. A multivariable logistic regression model with forward and backward stepwise selection was used to obtain the prediction model and explore the risk factors associated with MS (Figure 2). The prediction model formula is as follows:
Figure 2. Risk factors associated with myelosuppression.
Table 1. Risk Factors Associated with Myelosuppression in Patients with Esophageal Cancer.

Figure 3. Ranking of risk indicators for myelosuppression after chemotherapy.
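As a sketch only, and assuming the final model is the multivariable logistic regression described in the Methods, the general form of such a model over the 16 selected variables is shown below. The symbols x_1, ..., x_16 (the selected variables) and β_0, ..., β_16 (intercept and coefficients) are placeholder notation introduced here for illustration; no fitted coefficient values are implied.

```latex
\[
P(\mathrm{MS} \mid x_1, \dots, x_{16})
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{16} \beta_i x_i\right)\right)}
\]
```

Under this form, a patient would be classified as positive for MS when the predicted probability reaches the threshold t.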
Classification Results
We obtained different values of sensitivity, specificity, and accuracy while changing the threshold t. The ROC was constructed from the sensitivity and specificity obtained at these thresholds. The AUC is often used as an additional performance index. The closer the AUC is to 1, the greater the predictive ability of the model. The ROC of a model without predictive ability coincides with the diagonal line. Figure 4 shows the ROC for this model, with an AUC of .883 and .881 for the training and validation sets, respectively. These results indicate that the prediction model clearly separated MS and non-MS patients. The ROCs of the validation set and training set basically coincided, and there was no statistical difference between them (P = .884), indicating that the performance of the prediction model was stable across the two sets. The high prediction accuracy suggests that this new model can be used effectively to predict MS.
Figure 4. The receiver-operating curve (ROC) of the training set and validation set.
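The AUC can also be read as the probability that a randomly chosen MS patient receives a higher predicted probability than a randomly chosen non-MS patient, with ties counted as one half. The sketch below computes it that way; the function name `auc` and the toy data are hypothetical, and this pairwise form is a standard equivalent of the area under the ROC rather than the procedure the authors report using.

```python
def auc(y_true, y_score):
    """Area under the ROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0      # positive ranked above negative
            elif pos_score == neg_score:
                wins += 0.5      # ties count as one half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels and scores for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    print(auc(labels, scores))  # 9 of 12 pairs are ordered correctly: 0.75
```

A model whose scores carry no information gives 0.5 under this definition, which corresponds to the diagonal line mentioned above.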
As shown in Figure 5, when patients were classified at the cut-off point (P = .466), the model predicted MS well across the different chemotherapy regimens. The accuracy in the validation set was 80.0%, the sensitivity was 77.8%, and the specificity was 81.8%. The decision curve analysis (DCA) (Figure 6) showed that the curve of the training set was higher than the extreme value curves over a large threshold range, indicating high clinical application value. The consistency evaluation showed that the curve of this prediction model and the diagonal basically overlapped, indicating good consistency of the model (Figure 7).
Figure 5. Four-quadrant scatter plot of the prediction model of myelosuppression in patients with esophageal cancer.
Figure 6. Decision curve analysis (DCA) of the prediction model of myelosuppression in patients with esophageal cancer.
Figure 7. Consistency evaluation of the prediction model for myelosuppression in patients with esophageal cancer.
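A DCA curve is usually built from the net benefit at each threshold probability. The sketch below uses the standard net-benefit definition, TP/n - (FP/n) × pt/(1 - pt), and assumes that the "extreme value" curves correspond to the usual reference strategies of treating all patients or none (the latter has a net benefit of 0). Function names and toy data are hypothetical; this is not the authors' code.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of acting on patients whose predicted probability is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(y_true)
    if n == 0:
        raise ValueError("no patients supplied")
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference curve in which every patient is treated as high risk."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(y_true)
    if n == 0:
        raise ValueError("no patients supplied")
    prevalence = sum(1 for y in y_true if y == 1) / n
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 0]
    probabilities = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    for pt in (0.2, 0.466, 0.7):
        print(pt, net_benefit(labels, probabilities, pt), net_benefit_treat_all(labels, pt))
```

A model is considered clinically useful over the range of thresholds where its net benefit lies above both reference curves.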


Discussion
In recent years, the incidence of cancer has been continuously increasing worldwide. With the progress of biomedical science, novel biotherapy and targeted therapy have provided more options for cancer treatment. However, the high cost of these therapeutic modalities and their restriction to specific patients, e.g., patients with specific receptors or genes, have limited their clinical application to some extent. With its relatively affordable price and broad-spectrum antitumor activity, chemotherapy remains the major option for most cancer types. While inhibiting and killing tumor cells, chemotherapeutic drugs inevitably damage normal cells and induce a variety of side effects.16,17 MS is one of the most common systemic toxicities of chemotherapy. The occurrence of MS may require a reduction in the dose of chemotherapy drugs, prolong the course of chemotherapy, and even lead to termination of chemotherapy. Moreover, MS increases the risk of visceral bleeding, infection, shock, etc., especially in elderly patients, which can be fatal and worsens the prognosis.18
The current clinical treatment for MS mainly focuses on symptomatic treatment. If MS could be predicted before chemotherapy, the economic and disease burden of patients would be greatly alleviated. Recently, real-world studies (RWS) have been widely applied in the prediction of drug-related adverse effects.19-21 However, due to the regional specificity of its incidence, esophageal cancer is relatively rare in central China, and there are few RWS on the adverse effects of chemotherapy for esophageal cancer. In this project, we established a prediction model that incorporates a comprehensive range of factors related to chemotherapy-induced MS in esophageal cancer. The AUC of the training and validation sets was .883 (.863-.904) and .881 (.85-.912), respectively, indicating good accuracy of this newly developed model.
MS after chemotherapy is largely determined by the chemotherapy regimen. Lobaplatin is a third-generation platinum compound developed in Germany. Compared with cisplatin, it reduces the occurrence of hair loss, nephrotoxicity, ototoxicity, etc., but hematological toxicity, especially thrombocytopenia, is more common.22 Among the chemotherapy drugs analyzed in this study, lobaplatin, PTX and 5-F all showed hematological toxicity, and lobaplatin had the most serious effect in inducing MS. It has been reported that a combination of multiple chemotherapeutic agents can improve therapeutic efficacy as well as reduce side effects.23 Compared with mono-chemotherapy, combination therapy with PTX+PCD and 5-F+PCD showed protective effects against MS. In addition, the results of this study showed that the number of chemotherapy courses affected MS. This could be because multi-cycle chemotherapy increases the cumulative toxicity of chemotherapeutic drugs, thereby raising the risk of MS.24
Blood parameters were important risk factors for MS. In the present research, the Hb, PLT and RBC counts of the MS group were lower than those of the non-MS group (Supplementary Table S1). According to Table 1, the risk of MS decreased significantly as Hb and PLT increased. Lower Hb and PLT counts before chemotherapy indicated a deficiency of hematopoietic function in the patient's bone marrow, which may be attributed to the patient's low hematopoietic reserve capacity.25 However, our research showed no significant impact of RBC count on MS, which could be attributed to the limited number of cases in this study.
To clarify the relationships between organ function and MS, we investigated the liver function, electrolytes, renal function and heart function of patients. After multivariable logistic regression analysis, BUN, Cr and urine protein for renal function and CK for heart function were included in the prediction model. Among the three markers for renal function, Cr played the most important role in the model, followed by BUN and urine protein (Figure 3). With increasing levels of Cr and urine protein, the risk of MS increased correspondingly. BUN showed no significant influence. Because chemotherapy agents are excreted through the kidney, impaired renal function diminishes renal clearance, reduces drug elimination and increases the in vivo drug exposure time, thereby aggravating systemic toxicities.26,27 Elevated CK indicates impaired heart function. According to our results, heart function showed no obvious impact on MS. This may be because the baseline CK values of the MS and non-MS groups were both in the normal range.
Other factors, including sex, concomitant use of drugs and age, were also in our model. Male patients were more prone to MS than female patients, which could be related to unhealthy lifestyle habits among men, such as smoking and drinking. The concomitant use of other drugs increased the risk of MS, while age showed no obvious effect.
The present study had several limitations. First, it was a retrospective study, which lacked standardized data collection, and the variables could not be pre-specified. Some factors, such as nutritional status, previous chemotherapy, drug dose intensity, dysphagia and hoarseness, are also important variables for MS but were not included in the analysis for various reasons. In addition, a previous study analyzed the cell lineages affected by MS separately to identify unique predictors for each lineage.28 In this study, MS indicators were not subdivided due to the limited number of patients. In fact, we had attempted to carry out a prospective study to obtain a better conclusion. However, owing to the limited number of esophageal cancer patients in the designed period of this study, the prospective study failed to yield a meaningful result. In the future, prospective studies with a wide time span and well-designed data collection could be performed to obtain more comprehensive results. Second, this study was carried out in a single center with a less than sufficient sample size. We collected all consecutive patients with esophageal cancer from 2013 to 2020 at our hospital, so the study population was still representative of central China. Multi-center studies could be performed to establish models with a wider range of application. Third, this study used an internal validation method, setting aside 30% of the collected data. External validation is still needed to verify the prediction model.
Although the findings of this study were restricted by the limited sample size and research setting, the predictive model could be tailored to individual patients, and countermeasures could be taken to prevent MS. Taking the patient's physiological and pathological state into consideration, a chemotherapy regimen with a lower risk of MS could be chosen among the available options. When patients receive chemotherapy, the medical staff should closely monitor them to ensure the therapeutic effect and quality of life to the greatest extent. For those at high risk, close monitoring of blood parameters and organ function before chemotherapy and follow-up laboratory tests during chemotherapy could be conducted for early assessment of the risk of MS. Moreover, prophylactic treatment with G-CSF, erythropoiesis-stimulating agents, platelet and blood transfusions, or new agents such as trilaciclib could be applied to promote the recovery of bone marrow function and prevent MS.
Conclusion
Herein, we successfully established a risk prediction model for MS. Based on 16 risk factors, the AUC of the model was .883 in the training set and .881 in the validation set, indicating good accuracy. Limited by the regional specificity of esophageal cancer, this model was established on single-center data. However, considering the scarcity of RWS on the adverse reactions of chemotherapy for esophageal cancer, the present study provides a method and basis for the prediction of MS after chemotherapy in the Chinese population, especially in the central region.
Supplemental Material
Supplemental Material for Predictive Model of Chemotherapy-Induced Myelosuppression for Patients with Esophageal Cancer by Ziming Zheng, Qilin Zhang, Yong Han, Tingting Wu, and Yu Zhang in Cancer Control
Footnotes
Acknowledgments
We thank Mo Lei from Shanghai Le9 Healthcare Technology Co., Ltd for his help in the statistical analysis of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the National Key R&D Program of China (2017YFC0909900).
Ethical Approval
This study protocol was approved by the institutional ethics board of the Union Hospital of Huazhong University of Science and Technology (No. 2018S333).
Informed Consent
This study was performed in accordance with the ethical guidelines of the Declaration of Helsinki and Strengthening the Reporting of Observational Studies in Epidemiology (STROBE).
Patient Information
This study was based on the clinical data of EC patients treated from 2013 to 2020 at Union Hospital of Tongji Medical College, Huazhong University of Science and Technology.
Data Availability
The datasets generated and/or analysed during the current study are not publicly available due to the relevant regulations of Union Hospital of Huazhong University of Science and Technology (Wuhan, China), but are available from the corresponding author on reasonable request.
References