Abstract
Objective
To develop and externally validate a prognostic nomogram to predict overall survival (OS) in patients with resectable colon cancer.
Methods
Data for 50,996 patients diagnosed with non-metastatic colon cancer were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. Patients were assigned randomly to the training set (n = 34,168) or validation set (n = 16,828). Independent prognostic factors were identified by multivariate Cox proportional hazards regression analysis and used to construct the nomogram. Harrell’s C-index and calibration plots were calculated using the SEER validation set. Additional external validation was performed using a Chinese dataset (n = 342).
Results
Harrell’s C-index of the nomogram for OS in the SEER validation set was 0.71, which was higher than that of the 7th edition of the American Joint Committee on Cancer TNM staging system (0.59). Calibration plots showed consistency between actual observations and predicted 1-, 3-, and 5-year survival. In the external validation set, Harrell’s C-index (0.72) and the calibration plots indicated good predictive accuracy.
Conclusions
We developed a nomogram to predict OS after curative resection for colon cancer. Validation using the SEER and external datasets revealed good discrimination and calibration. This nomogram may help predict individual survival in patients with colon cancer.
Introduction
Colon cancer is the fourth most common cancer and the fifth-leading cause of cancer-related deaths worldwide.1 Improvements in surgical techniques and the application of comprehensive treatments have improved the long-term survival of patients with colon cancer.2,3 However, the survival of patients after radical colon cancer surgery varies greatly among different tumor, node, metastasis (TNM) classifications.
The 7th edition of the American Joint Committee on Cancer (AJCC) TNM classification released in 2010 classified M0 colon cancer into eight groups based on the depth of invasion and the number of metastatic lymph nodes.4 However, although this staging system is widely used to predict prognosis, patients with the same stage often have different clinical prognoses, suggesting that survival is also affected by other factors. Previous studies accordingly found that clinicopathological features such as sex, tumor site, and serum biomarkers affected the survival of patients with colon cancer.5–7
A nomogram is a predictive tool that creates a simple graphical representation of a statistical predictive model to generate the numerical probability of a clinical event.8 Nomograms have been widely applied to many types of tumors, including colon cancer.9–11 However, few nomograms predicting the survival probability of patients with colon cancer have been validated in different populations.12 The current study therefore aimed to develop and validate a nomogram for colon cancer based on the Surveillance, Epidemiology, and End Results (SEER) database, which contains data for both western and eastern patients, and to additionally validate the nomogram using an external Chinese cohort.
Patients and methods
Patient selection
In this study, we extracted data from the SEER database using SEER*Stat software, version 8.3.6 (released 8 August 2019; http://seer.cancer.gov/data/options.html). We identified patients with colon cancer diagnosed between 2010 and 2015. The inclusion criteria were: (1) colon cancer stage I–III; (2) primary site code (C18.0, C18.2, C18.3, C18.4, C18.5, C18.6, and C18.7); (3) adenocarcinoma (histology codes 8140–8147, 8210–8211, 8220–8221, 8260–8263, 8480–8481, and 8490); and (4) histologically confirmed diagnosis. The exclusion criteria were: (1) cases with missing information; (2) no cancer-directed surgery; (3) receipt of radiation therapy; and (4) death or no follow-up within 30 days. The included patients were divided randomly into a training set and a validation set (ratio 2:1). The results were also validated using data for patients diagnosed at the First Affiliated Hospital of Bengbu Medical College (BBMC) between 2015 and 2019. A flow chart of patient selection is shown in Figure 1.

Figure 1. Flow chart of patient selection. (a) Surveillance, Epidemiology, and End Results database. (b) Bengbu Medical College database.
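The random 2:1 assignment to training and validation sets could be reproduced along the following lines. This is a minimal Python sketch under stated assumptions, not the procedure actually used in the study; the function name `split_two_to_one`, the seed, and the toy patient IDs are hypothetical and introduced only for illustration.

```python
import random


def split_two_to_one(patient_ids, seed=0):
    """Randomly assign IDs to a training and a validation set (about 2:1).

    Hypothetical helper: the name and the seed are illustrative assumptions.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    n_train = round(len(ids) * 2 / 3)  # two thirds go to the training set
    return ids[:n_train], ids[n_train:]


# Toy example with nine made-up patient IDs
train_ids, valid_ids = split_two_to_one(range(9), seed=42)
print(len(train_ids), len(valid_ids))
```

An exact two-thirds split of 50,996 patients would give about 33,997 training cases. The reported 34,168 is close to, but not exactly, that figure, so the randomization used in the study presumably differed in detail from this sketch.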
This study was conducted in accordance with the World Medical Association’s Declaration of Helsinki and approved by the Ethics Committee of Bengbu Medical College (approval number 2020-238). The study did not include any personally identifiable information, and the need for informed consent was therefore waived by our ethics committee.
Variables
We extracted data for 10 clinicopathological variables, including sex, age, race, primary site, grade, TNM stage, T stage, N stage, adjuvant chemotherapy, and carcinoembryonic antigen (CEA) level. Tumor location was categorized as right-sided colon cancer (RCC) or left-sided colon cancer (LCC). RCCs included cancers of the cecum, ascending colon, hepatic flexure of colon, and transverse colon, and LCCs included cancers of the splenic flexure of the colon, descending colon, and sigmoid colon. Race was categorized as Asian or Pacific Islander (API) or non-API (white and black). Tumor staging was performed in accordance with the 7th edition of the AJCC TNM classification. Age was converted to a categorical variable. The primary endpoint was overall survival (OS). Patients in the SEER database were followed up from diagnosis until death or 31 December 2016. Patients in the BBMC database were followed up from diagnosis until death or 30 June 2020.
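The sidedness grouping can be expressed as a simple lookup. The sketch below assumes that the primary site codes listed in the inclusion criteria correspond to the ICD-O-3 topography subsites in the order named in the text (C18.0 cecum through C18.7 sigmoid colon); the function name `tumor_side` is hypothetical.

```python
# Assumed ICD-O-3 topography pairing (not stated explicitly in the text):
# C18.0 cecum, C18.2 ascending colon, C18.3 hepatic flexure, C18.4 transverse colon
RIGHT_SIDED = {"C18.0", "C18.2", "C18.3", "C18.4"}
# C18.5 splenic flexure, C18.6 descending colon, C18.7 sigmoid colon
LEFT_SIDED = {"C18.5", "C18.6", "C18.7"}


def tumor_side(site_code):
    """Map a primary site code to RCC or LCC (hypothetical helper)."""
    if site_code in RIGHT_SIDED:
        return "RCC"
    if site_code in LEFT_SIDED:
        return "LCC"
    # Codes outside the inclusion criteria are rejected rather than guessed
    raise ValueError(f"site code not covered by the inclusion criteria: {site_code}")


print(tumor_side("C18.3"), tumor_side("C18.6"))
```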
Construction of the nomogram
OS was estimated using the Kaplan–Meier method, and survival curves were compared using the log-rank test. Variables that were significant in the univariate analysis were entered into the multivariate Cox proportional hazards analysis. All included variables satisfied the proportional hazards assumption based on log-log survival plots. Independent prognostic factors were identified using a forward stepwise Cox proportional hazards regression model. A nomogram based on the results of this model was established to predict 1-, 3-, and 5-year OS.
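For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines of Python. This is an illustrative implementation, not the SPSS or R routine used in the study; the function name `kaplan_meier` and the toy follow-up data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Toy data: four patients, the second one censored at time 2
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(km_curve)
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is how the method uses incomplete follow-up.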
Validation of the nomogram
The nomogram was validated in terms of discrimination and calibration. Discrimination was evaluated by Harrell’s concordance index (Harrell’s C-index), which accommodates censored data and estimates the probability that, of two comparable patients, the one with the higher predicted risk experiences the event first; a higher C-index indicates better discrimination. Agreement between predicted and actual survival was assessed using calibration plots. The nomogram was validated using an internal SEER validation set (n = 16,828) and an external BBMC validation set (n = 342).
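Harrell’s C-index can be illustrated with a direct pairwise count. The following Python sketch is a simplified version written for clarity (it ignores pairs with tied follow-up times and runs in quadratic time); the function name `harrell_c_index` and the toy data are assumptions, and the study itself used R.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk failed first: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored at time 6
c_index = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
print(c_index)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the nomogram (0.71) and TNM staging (0.59) are compared.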
Statistical analyses
Statistical analysis was carried out using IBM SPSS Statistics for Windows, version 23.0 (IBM Corp., Armonk, NY, USA). The nomogram was constructed and verified using R software (version 3.6.1).13 A P-value <0.05 was considered statistically significant, and all statistical tests were two-sided.
Results
A total of 50,996 patients from the SEER database were included in the present study. There were 34,168 patients in the training set, 16,828 patients in the SEER validation set, and 342 patients in the BBMC validation set. The clinicopathological characteristics of the patients are shown in Table 1.
Table 1. Demographic and clinicopathologic characteristics of included patients.
SEER, Surveillance, Epidemiology, and End Results; BBMC, Bengbu Medical College; CEA, carcinoembryonic antigen; API, Asian or Pacific Islander.
The potential variables in the training set were selected by forward stepwise selection. Univariate Cox regression analysis showed that age, race, primary site, grade, T stage, N stage, chemotherapy, and CEA level were associated with OS, and multivariate Cox regression analysis confirmed these eight variables as independent predictors of OS (Table 2). A nomogram predicting 1-, 3-, and 5-year OS was constructed based on these independent prognostic factors (Figure 2). Adding the points for each variable produced total scores predicting the probabilities of 1-, 3-, and 5-year OS.
Table 2. Univariate and multivariate analyses of overall survival in the training set.
HR, hazard ratio; CEA, carcinoembryonic antigen; API, Asian or Pacific Islander.

Figure 2. Nomogram predicting 1-, 3-, and 5-year overall survival (OS) of patients with colon cancer after curative resection.
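The way a nomogram converts Cox regression coefficients into points and a survival probability can be sketched as follows. This Python example assumes the common convention in which the variable with the widest coefficient range spans 0 to 100 points and survival is obtained as the baseline survival raised to the exponentiated linear predictor. All coefficients, level names, and the baseline survival value are hypothetical placeholders, not values from Table 2.

```python
import math

# HYPOTHETICAL log hazard ratios (reference level = 0.0); not the study's estimates
COEFS = {
    "age": {"younger": 0.0, "older": 0.8},
    "cea": {"normal": 0.0, "elevated": 0.4},
}


def points_table(coefs, max_points=100.0):
    """Scale each level's coefficient so the widest variable spans 0..max_points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return {
        var: {
            level: max_points * (beta - min(levels.values())) / widest
            for level, beta in levels.items()
        }
        for var, levels in coefs.items()
    }


def total_points(table, patient):
    """Sum the points for one patient's levels, as read off the nomogram axes."""
    return sum(table[var][level] for var, level in patient.items())


def survival_probability(coefs, patient, baseline_survival):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear predictor)."""
    lp = sum(coefs[var][level] for var, level in patient.items())
    return baseline_survival ** math.exp(lp)


table = points_table(COEFS)
high_risk = {"age": "older", "cea": "elevated"}
reference = {"age": "younger", "cea": "normal"}
print(total_points(table, high_risk))
# 0.9 is a made-up baseline survival at some fixed time point
print(survival_probability(COEFS, high_risk, baseline_survival=0.9))
```

Because points are a linear rescaling of the coefficients, a higher total score always corresponds to a lower predicted survival probability at every time point.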
The nomogram was validated using the SEER and BBMC validation sets. The clinicopathological characteristics of the patients in the BBMC validation set are shown in Table 1. Harrell’s C-index for the SEER validation set was 0.71 (95% confidence interval (CI), 0.65–0.77), which was higher than that for the 7th edition of the AJCC TNM stage (0.59; 95%CI, 0.55–0.63) (P<0.001). Calibration plots for OS in the SEER validation set showed satisfactory consistency between the actual observed and predicted probabilities (Figure 3). Harrell’s C-index for the BBMC validation set was 0.72 (95%CI, 0.65–0.79). The calibration plots for the BBMC validation set indicated no deviations from the reference line (Figure 4).

Figure 3. Calibration plots of the Surveillance, Epidemiology, and End Results validation set. X-axes represent nomogram-predicted probability of survival and y-axes represent actual survival. The reference line is gray. (a) 1-year, (b) 3-year, and (c) 5-year calibration plots.

Figure 4. Calibration plots of the Bengbu Medical College validation set. X-axes represent nomogram-predicted probability of survival and y-axes represent actual survival. The reference line is gray. (a) 1-year, (b) 3-year, and (c) 5-year calibration plots.
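The points on a calibration plot of this kind can be computed by grouping patients by predicted survival and comparing the mean prediction in each group with the observed Kaplan–Meier survival at the same time horizon. The Python sketch below illustrates that idea under simplifying assumptions (equal-sized groups, no bootstrap correction); the function names, the number of groups, and the toy data are hypothetical and are not taken from the study.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier survival probability at the given time horizon."""
    survival = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
    return survival


def calibration_points(predicted, times, events, horizon, n_groups=2):
    """Return (mean predicted, observed) survival for each risk group."""
    if len(predicted) < n_groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) // n_groups
    points = []
    for g in range(n_groups):
        start = g * size
        stop = (g + 1) * size if g < n_groups - 1 else len(order)
        idx = order[start:stop]  # last group absorbs any remainder
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = km_survival_at(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        points.append((mean_pred, observed))
    return points


# Toy data: two low-prediction patients die early, two high-prediction patients are censored
cal = calibration_points(
    predicted=[0.2, 0.3, 0.8, 0.9],
    times=[1, 2, 5, 6],
    events=[1, 1, 0, 0],
    horizon=3,
)
print(cal)
```

Points lying on the diagonal reference line indicate that predicted and observed survival agree; points below it indicate that the model is too optimistic for that group.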
Discussion
In this study, we developed a nomogram based on the SEER database to predict OS in patients with resectable colon cancer. We also validated this prognostic model in an independent external cohort. The nomogram showed better predictive accuracy than the traditional TNM staging system in patients with resectable colon cancer. The significant clinicopathological variables included in this novel nomogram were age, race, primary site, grade, T stage, N stage, chemotherapy, and CEA level.
The current nomogram had certain advantages and key features compared with previous nomograms for resectable colon cancer. First, the present population was not age-restricted, in contrast to some studies that only included elderly patients14 or younger patients.11 In addition, some previous studies only performed internal validation,15 whereas we validated the nomogram using both internal and external datasets, given that validation using a dataset from another country is generally considered the most stringent form of external validation.16
Several previous studies found associations between sex and survival. One study found that female sex was associated with a poor prognosis in patients with stage IIIC colon cancer,5 while another showed that female sex was associated with higher lymph node yield, which implied an increase in OS in stage I–III colorectal cancer.17 However, no previous studies have reported on the relationship between female sex and survival in patients with stage I–III colon cancer, and the current study found that sex was not an independent prognostic factor in patients with stage I–III colon cancer.
Elderly patients with stage I–III colon cancer have a higher risk of death than younger patients. Many studies have confirmed that postoperative mortality increases with increasing age.18,19 Elderly patients tend to have more comorbidities than younger patients, especially cardiovascular and respiratory diseases, and often also have large and locally invasive tumors.20 In addition, although adjuvant chemotherapy can improve the long-term survival of patients with stage III colon cancer, half of all elderly patients with stage III colon cancer do not receive adjuvant chemotherapy21 and have a higher risk of death. It is also possible that younger patients may derive greater OS benefit from adjuvant chemotherapy compared with elderly patients.22
The present study identified race as an independent prognostic factor in patients with colon cancer, with API patients having a better prognosis than non-API patients. Similar results have been reported in previous studies of patients with colon cancer and gastric cancer.23,24 Histological grade was also confirmed as an adverse prognostic factor.25 Patient prognosis could thus be predicted more precisely by incorporating race and histological grade into the nomogram.
In this study, we categorized the tumor location as LCC or RCC. RCC has been associated with a significantly increased risk of death, independent of stage, race, and adjuvant chemotherapy.6,26 CEA has also been identified as a prognostic factor for all stages of colon cancer, with patients with an elevated CEA level having a 62% increase in the risk of death (hazard ratio 1.62, 95%CI 1.53–1.74) compared with patients with a normal CEA level.27 As shown by the current nomogram, patients with RCC and high preoperative CEA tended to have a significantly poorer prognosis.
Accurate prognostic assessment is critical for both doctors and patients. The current nomogram was more precise than the 7th edition of the AJCC TNM staging system, allowing the development of individualized follow-up strategies. However, although our nomogram was verified externally and showed good accuracy, there were several limitations. First, although we knew whether or not patients in the SEER database had received chemotherapy, we did not know the specific chemotherapy regimen, and different regimens may lead to bias. Second, our nomogram was based on the SEER database, and all the analyses used data available in that database. However, information on some known survival predictors, such as BRAF, KRAS, and TP53 mutations, CpG island methylator phenotype, mismatch repair status, and chromosomal instability status, was not included.28 Comorbidity, performance status, socioeconomic status, smoking, and other lifestyle factors may also influence survival, and these data were also lacking in the SEER database. Finally, although we carried out external verification, the number of patients in the external cohort was small, and the model needs to be further verified using a larger sample size.
Conclusions
We developed a nomogram to predict 1-, 3-, and 5-year OS in patients with colon cancer following curative resection. Validation of the nomogram using the SEER and BBMC datasets revealed good discrimination and calibration. This nomogram may thus be helpful for predicting individual survival among patients with resected colon cancer.
Footnotes
Declaration of conflicting interest
The authors declare that there is no conflict of interest.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Innovation Team Project of Bengbu Medical College [grant number BYKC201909].
