Abstract
Introduction
Non-small cell lung cancer (NSCLC) accounts for 85% of lung cancers, and lung cancer remains the leading cause of cancer-related death worldwide. 1 Lung squamous cell carcinoma (LUSC) is a highly aggressive subtype of NSCLC, accounting for approximately 25% to 30% of all cases. 2 As with most patients with lung cancer, almost two-thirds of patients with LUSC are diagnosed at an advanced stage. 1 Compared with non-squamous NSCLC, LUSC is more frequently located in the proximal bronchus and is more likely to invade the large blood vessels. 3 In addition, the incidence of comorbidities such as chronic obstructive pulmonary disease and heart disease is higher.4,5 These characteristics limit first-line treatment options, which adversely affects survival outcomes, particularly for patients with metastatic LUSC. 6 Despite a large number of relevant studies, the prognosis of metastatic LUSC remains unsatisfactory (the 5-year survival rate is still less than 5%) owing to the limitations of early diagnosis and treatment.7,8
Currently, the eighth edition of the American Joint Committee on Cancer (AJCC) TNM staging system is the most widely used method to assess the prognosis of lung cancer patients. 9 Although the M stage is divided into M1a, M1b, and M1c in the latest staging system, it still has inherent limitations because it does not consider clinicopathological and treatment information. 10 Therefore, more accurate prognostic models are urgently needed to inform clinical decisions for metastatic LUSC patients. In recent years, nomograms have been widely used as predictive tools in oncology and have been shown to offer better predictive power than the traditional TNM classification.11–13 In our previous study, a nomogram model also exhibited favorable discriminative ability for stage I-III LUSC. 14
In this work, we developed a prognostic nomogram for metastatic LUSC patients based on the Surveillance, Epidemiology, and End Results (SEER) database and assessed its reliability and feasibility in an independent internal validation cohort. A dynamic web-based version of the nomogram was also built for clinical use.
Methods
Patient Selection
This retrospective study used the SEER database (http://seer.cancer.gov/), an authoritative population-based cancer registry that covers roughly 48.0% of the US population. 15 Data on patients diagnosed with metastatic LUSC from 2010 to 2015 were retrieved using SEER*Stat version 8.3.6 (user name: 21304-Nov2020). Patients who met the following criteria were included in the study: (1) microscopically confirmed LUSC (ICD-O-3 code: 8070/3, 8071/3, 8072/3, 8073/3, 8074/3, 8075/3, 8084/3) with metastasis; (2) older than 18 years with complete demographic data; (3) complete clinical staging information. Patients with other primary tumors or incomplete follow-up information were excluded. The selection criteria and screening process are illustrated in Figure S1.
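The inclusion and exclusion criteria above can be expressed as a single record-level check. The following is a minimal sketch only: the field names are hypothetical and are not SEER*Stat column names, and the ICD-O-3 codes are copied from the criteria listed in the text.

```python
# ICD-O-3 codes for squamous cell carcinoma, as listed in the inclusion criteria.
LUSC_CODES = {"8070/3", "8071/3", "8072/3", "8073/3", "8074/3", "8075/3", "8084/3"}


def is_eligible(rec):
    """Return True if a record passes the stated inclusion/exclusion criteria.

    All dictionary keys are hypothetical names chosen for this illustration.
    """
    return bool(
        rec["microscopically_confirmed"]
        and rec["icd_o_3"] in LUSC_CODES
        and rec["metastatic"]
        and 2010 <= rec["year_of_diagnosis"] <= 2015
        and rec["age"] > 18
        and rec["demographics_complete"]
        and rec["staging_complete"]
        and not rec["other_primary_tumor"]
        and rec["follow_up_complete"]
    )


# A made-up record used only to exercise the check.
example_record = {
    "microscopically_confirmed": True,
    "icd_o_3": "8070/3",
    "metastatic": True,
    "year_of_diagnosis": 2012,
    "age": 69,
    "demographics_complete": True,
    "staging_complete": True,
    "other_primary_tumor": False,
    "follow_up_complete": True,
}
print(is_eligible(example_record))
```

Applying such a check to every retrieved record would reproduce the screening logic in principle; the actual counts at each step are those reported in Figure S1.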
Data Collection
The following clinical variables were extracted in the present study: sex, age, race, marital status, laterality, grade, tumor size, tumor extension, T stage (restaged according to the eighth edition using information on tumor size and extension), N stage, metastatic status, treatment information, and follow-up data. The primary endpoint was overall survival (OS), which was defined as the time from diagnosis to death from any cause. Patients who had not died by the last follow-up date were treated as censored.
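To illustrate how the OS definition and censoring enter a survival estimate, the sketch below computes a Kaplan-Meier curve on a small made-up data set. The function name and the toy follow-up times are assumptions for illustration; they are not taken from the study data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times:  follow-up in months from diagnosis
    events: 1 = death from any cause, 0 = censored at last follow-up
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


# Toy data: six hypothetical patients, two of them censored.
toy_months = [2, 3, 3, 5, 8, 12]
toy_died = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_months, toy_died)
for month, prob in km_curve:
    print(month, round(prob, 4))
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is the practical meaning of censoring in the OS analysis.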
Statistical Analysis
R (Bell Laboratories, version 4.1.2) and SPSS (IBM, version 24.0) were used for statistical analysis. We divided patients into training and validation cohorts at a ratio of 7:3 using the “createDataPartition” function in the R “caret” package to ensure that the outcome events (death or survival) were randomly distributed between the 2 cohorts. The chi-square test was used to compare the baseline data of the 2 cohorts. In the training cohort, a univariate proportional hazards model was used to assess the ability of each parameter to predict OS. Factors with P < .05 were then included in a multivariate Cox regression analysis. The R “rms” package was used to build a nomogram based on the results of the multivariate analysis. Harrell’s C-index was used to assess the predictive accuracy and discriminative power of the nomogram, a calibration curve (1000 bootstrap resamples) was used to assess its calibration, and decision curve analysis (DCA) was used to evaluate its clinical utility. The total points of each patient were then calculated according to the established nomogram model, and X-tile software was used to divide patients into 2 groups with different prognostic risks based on these points. 16 Finally, we built a web-based version of the nomogram with the R “DynNom” package. All statistical tests were two-sided, and P values of less than .05 were considered statistically significant.
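The 7:3 split described above samples within outcome classes so that both cohorts contain a similar share of deaths. The sketch below shows that idea in plain Python. It is an analogue written for illustration, not the “createDataPartition” implementation, and the function name, seed, and toy outcome vector are assumptions.

```python
import random


def stratified_split(outcomes, train_fraction=0.7, seed=0):
    """Split indices into training/validation sets, sampling within each outcome class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(outcomes):
        by_class.setdefault(label, []).append(idx)
    train, valid = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        k = round(train_fraction * len(members))  # per-class training size
        train.extend(members[:k])
        valid.extend(members[k:])
    return sorted(train), sorted(valid)


# Toy outcome vector: 90 deaths and 10 survivors (hypothetical counts).
toy_outcomes = ["dead"] * 90 + ["alive"] * 10
train_idx, valid_idx = stratified_split(toy_outcomes)
print(len(train_idx), len(valid_idx))
```

Sampling inside each class keeps the event proportion nearly identical in the two cohorts, which is the property the baseline chi-square comparison then checks.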
Results
Demographic and Pathology Characteristics of the 2 Cohorts
We identified 9910 patients with LUSC diagnosed between 2010 and 2015 in the SEER database, all of whom had confirmed metastasis at the time of initial diagnosis. After the patients were split in a 7:3 ratio with the “createDataPartition” function, there were 6937 patients in the training cohort and 2973 in the validation cohort. Males outnumbered females in both the training cohort (65.4% vs 34.6%) and the validation cohort (65.5% vs 34.5%). The median age of all patients was 69 years (IQR: 62-77). Patients older than 60 years accounted for 79.4% of the study population, indicating that LUSC with metastasis predominantly occurs in elderly patients. These patients also tended to have a more advanced T stage (T4 in 41.7% of the training cohort and 41.8% of the validation cohort) and a more advanced N stage (N2-N3 in 67.1% of the training cohort and 67.0% of the validation cohort). Bone (32.1%) was one of the most common metastatic sites. The median follow-up time was 63 months. At the end of follow-up, death from any cause had occurred in 9585 of 9910 (96.72%), 6724 of 6937 (96.92%), and 2861 of 2973 (96.23%) patients in the total, training, and validation cohorts, respectively. The demographic and pathological characteristics are listed in Table 1; there were no significant differences between the 2 cohorts.
Table 1. Demographic Characteristics of the Training and Validation Cohorts.
Abbreviation: AJCC, American Joint Committee on Cancer.
Construction of Nomogram in the Training Cohort
Independent risk factors for inclusion in the nomogram were identified in the training cohort. Univariate Cox regression analysis suggested that the following 10 factors were significant prognostic factors: age, marital status, T stage, N stage, bone metastasis, brain metastasis, liver metastasis, surgery, chemotherapy, and radiotherapy. However, the effect of marital status on prognosis was attenuated in the multivariate analysis. We therefore selected the remaining 9 independent prognostic factors as variables in the nomogram model (Figure 1). The specific values of the clinicopathological factors in the nomogram in the training cohort are given in Table S1. The nomogram’s Akaike information criterion (AIC) was 104598.3, and Harrell’s C-index was 0.711 (95% CI: 0.705-0.717). The hazard ratio (HR), 95% CI, and P value of each prognostic factor are listed in Table 2.
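A nomogram converts each Cox coefficient into points. One common scaling convention gives the predictor with the widest effect range a span of 0 to 100 points and scales the others proportionally. The sketch below assumes that convention and binary predictors only; whether it matches the settings used for Figure 1 is an assumption. Only the bone metastasis HR (1.472) is quoted in the article; the other two HRs are hypothetical placeholders.

```python
import math

# Log hazard ratios. Only the bone-metastasis HR (1.472) appears in the article;
# the chemotherapy and surgery values are HYPOTHETICAL and for illustration only.
toy_coefs = {
    "bone_metastasis": math.log(1.472),
    "chemotherapy": math.log(0.40),  # hypothetical
    "surgery": math.log(0.60),       # hypothetical
}


def points_table(coefs):
    """Map each binary predictor level (0 or 1) to nomogram points.

    The predictor with the largest absolute coefficient spans 0-100 points;
    protective factors score their points when the factor is absent.
    """
    widest = max(abs(beta) for beta in coefs.values())
    table = {}
    for name, beta in coefs.items():
        lowest = min(0.0, beta)  # smallest contribution this predictor can make
        table[name] = {x: 100 * (beta * x - lowest) / widest for x in (0, 1)}
    return table


def total_points(patient, table):
    """Sum the points for one patient given as {predictor: 0 or 1}."""
    return sum(table[name][patient[name]] for name in table)


toy_table = points_table(toy_coefs)
toy_patient = {"bone_metastasis": 1, "chemotherapy": 0, "surgery": 0}
print(round(total_points(toy_patient, toy_table), 2))
```

Under this convention a higher total corresponds to a higher predicted hazard, so receiving a protective treatment lowers the total.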

Figure 1. Nomogram for predicting 6-month, 1-year, and 2-year survival in lung squamous cell carcinoma (LUSC) patients with metastasis. “Surgery” refers to surgery performed on the primary cancer site.
Table 2. Univariate and Multivariate Cox Analyses on Variables for the Prediction of Overall Survival of Training Cohort.
Abbreviations: AJCC, American Joint Committee on Cancer; CI, confidence interval; HR, hazard ratio.
Calibration and Validation of the Nomogram
We further calibrated and validated the nomogram. Harrell’s C-index was 0.711 (95% CI: 0.705-0.717) in the training cohort and 0.707 (95% CI: 0.697-0.717) in the validation cohort. Receiver operating characteristic (ROC) curves at 6 months, 1 year, and 2 years are plotted in Figure S1, and the time-dependent area under the curve (AUC) (Figure 2) was calculated in both cohorts. The AUC remained higher than 0.7 in both cohorts within 5 years, indicating favorable discrimination by the nomogram. Figure 3 shows the nomogram calibration curves for 6 months, 1 year, and 2 years, which confirm a high degree of agreement between the predicted and observed survival probabilities in the training and validation cohorts. Moreover, the decision curve analysis (DCA) for the nomogram and the TNM staging system is presented in Figure 4. The DCA demonstrated that, for predicting OS in patients with metastatic LUSC, the nomogram provided a greater net benefit than the TNM staging system across nearly all threshold probabilities at each time point. Taken together, these results demonstrate that the nomogram we constructed had good discriminative and calibration capabilities.
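Harrell’s C-index, reported above for both cohorts, is the proportion of comparable patient pairs in which the patient with the higher predicted risk dies first. The following minimal sketch computes it on made-up data; it skips pairs with tied follow-up times for brevity, and the toy values are assumptions rather than study data.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed death and a
    shorter follow-up time than patient j. Ties in predicted risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: five hypothetical patients; risk could be nomogram total points.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [5, 3, 4, 2, 1]
c_index = harrell_c(toy_times, toy_events, toy_risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of about 0.71 should be read.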

Figure 2. Time-dependent area under the curve (AUC) for training cohort (A) and validation cohort (B).

Figure 3. Calibration curves of training cohort (A-C) and validation cohort (D-F) for 6 months, 1 year, and 2 years.

Figure 4. Decision curve analysis (DCA) of training cohort (A-C) and validation cohort (D-F) for 6 months, 1 year, and 2 years.
Risk Stratification of Overall Survival by the Nomogram Model
To stratify patients by prognostic risk, we developed an OS risk classification system based on the nomogram total points of each patient, dividing all patients into a high-risk group and a low-risk group. The optimal cutoff value for total points was 236.41, which was determined using X-tile software version 3.6.1 with the minimal P value approach. 16 As depicted in Figure 5, the nomogram total points showed good prognostic classification for LUSC patients with metastasis in both the training cohort and the validation cohort.
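The minimal P value approach mentioned above can be sketched as a scan over candidate cutoffs, keeping the one whose two-group log-rank statistic is largest (with 1 degree of freedom, the largest chi-square corresponds to the smallest P value). This is an illustration of the idea and not the X-tile implementation; the function names and toy data are assumptions.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        n = sum(1 for ti in times if ti >= t)                       # at risk
        n_high = sum(1 for ti, h in zip(times, in_high) if ti >= t and h)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d_high = sum(1 for ti, ei, h in zip(times, events, in_high)
                     if ti == t and ei == 1 and h)
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def best_cutoff(points, times, events):
    """Return (cutoff, chi2) for the split 'points > cutoff' with the largest statistic."""
    candidates = sorted(set(points))[:-1]  # drop the maximum so both groups are non-empty
    scored = [(logrank_chi2(times, events, [p > c for p in points]), c)
              for c in candidates]
    chi2, cutoff = max(scored)
    return cutoff, chi2


# Toy data: hypothetical total points and follow-up (months); all patients died.
toy_points = [100, 120, 150, 200, 240, 260, 300, 320]
toy_months = [30, 28, 25, 20, 6, 5, 4, 3]
toy_died = [1, 1, 1, 1, 1, 1, 1, 1]
cutoff, chi2 = best_cutoff(toy_points, toy_months, toy_died)
print(cutoff, round(chi2, 3))
```

Patients whose total points exceed the selected cutoff would form the high-risk group and the remainder the low-risk group, as in the Kaplan-Meier comparison of Figure 5.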

Figure 5. Kaplan-Meier curves for the 2 risk groups defined by the nomogram total points in the whole population (A), the training cohort (B), and the validation cohort (C).
Development of Webserver for Easy Access of Our Nomogram Model
To make our nomogram easier to apply in the clinic and more convenient for doctors and patients, we built an online version of the nomogram as a web tool (https://lusc-nomogram.shinyapps.io/LUSC-nomogram/). The patient’s survival probability is calculated once the clinical variables displayed on the left side of the page have been selected.
Discussion
The poor prognosis of LUSC is mainly due to diagnosis at an advanced stage. 1 Although all M1 disease is classified as stage IV in TNM staging, the prognosis of these patients is still heterogeneous owing to differences in age, metastatic organs, and treatment methods.17–19 However, to date no prognostic model has been available for M1 stage LUSC. Using large population-based data, we developed and validated a prognostic nomogram for patients with M1 stage LUSC and stratified the patients into different risk subgroups. The factors included in the model are readily available from clinical data. Moreover, the model performed well across several validation measures. The online tool built on this model may also be convenient for clinicians and patients.
Bone is a prevalent site of metastasis in various cancers, including NSCLC, and 30% to 40% of patients with NSCLC develop bone metastasis during the course of the disease. 20 Bone metastasis can lead to bone-related adverse events and reduce patients’ quality of life. A previous study found that multiple bone metastases and alkaline phosphatase in NSCLC patients are associated with poor prognosis. 21 In the present study, the HR of bone metastasis was 1.472 (95% CI: 1.396-1.552). The presence of brain and liver metastasis also greatly increased the risk for LUSC patients. On the other hand, chemotherapy and radiotherapy are the basic treatments for NSCLC patients without driver gene mutations. Our study confirmed the importance of both treatments for advanced LUSC patients. As for surgery, the National Comprehensive Cancer Network guidelines do not recommend surgery for stage IV patients, but our study suggested that surgical resection of the primary site could bring OS benefits to patients. Bateni et al. 22 found that pneumonectomy was safe for stage IV patients. Of course, surgical indications and preoperative examination may need to be more rigorous for these patients.
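The HR and 95% CI quoted above for bone metastasis can be converted back to the Cox coefficient, its standard error, and a Wald z statistic. The sketch below assumes the interval is a Wald interval that is symmetric on the log scale, which is the usual form for Cox models but is not stated in the article; the function name is hypothetical.

```python
import math


def wald_from_hr(hr, lower, upper, z_crit=1.959964):
    """Recover (log HR, standard error, z) from an HR and its 95% CI.

    Assumes the CI was built as exp(beta +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return log_hr, se, log_hr / se


# Values quoted in the text for bone metastasis.
log_hr, se, z = wald_from_hr(1.472, 1.396, 1.552)
print(round(log_hr, 4), round(se, 4), round(z, 1))
```

An HR of 1.472 means that, with the other covariates held fixed, the estimated hazard of death is about 47% higher for patients with bone metastasis than for those without.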
Previous studies have proposed that some factors may affect OS in patients with advanced NSCLC, such as age, tumor size, lymph node metastasis, and treatment modality.23,24 These factors, which are generally considered to have a prognostic impact, were taken into full account in our nomogram. Lung adenocarcinoma (LUAD) and LUSC are the 2 largest subgroups of NSCLC. Nonetheless, increasing evidence confirms that there are significant differences between LUAD and LUSC in histopathology, clinicopathological features, transcriptomic profiles, driver genetic changes, and treatment response. 25 As a result, the prognostic factors for LUSC and LUAD are not entirely identical. At present, most studies on LUSC prognosis involve gene expression26,27 and methylation.28,29 Since only a minority of patients with advanced disease undergo surgery (only 5.6% in our study), it is difficult to obtain enough samples for genetic testing. Thus, a simple and convenient prognostic model is desirable for most advanced LUSC patients.
To the best of our knowledge, this is the first nomogram model for patients with advanced LUSC. We analyzed a large set of samples using the data from 18 medical centers registered in the SEER database, which represented populations in different regions. Harrell’s C-index, the time-dependent AUC, the calibration curves, and the DCA all suggested that the nomogram had potential clinical application value. In addition, high-risk patients can be distinguished from low-risk patients according to the total points of the nomogram. The Kaplan-Meier plot and log-rank analysis also showed significant differences in OS between the 2 groups. Consequently, close attention should be paid to patients with total points higher than 236.41.
Nevertheless, the current study has several limitations. First, as this was a retrospective study, selection bias is inevitable. Second, we did not calculate the sample size but included all eligible samples, as most studies do,30,31 which may affect the stability of the model. Third, treatment information in the SEER database is limited; for example, the specific chemotherapy regimen is not recorded. In addition, factors such as smoking history, Eastern Cooperative Oncology Group (ECOG) status, 32 PD-L1 expression level, 33 and other tumor markers that we consider to be prognostic factors were not available. Fourth, although we fully considered the metastatic organs, the SEER database lacks information on the number of metastatic lesions, so we could not incorporate this important factor into the model. In the future, a multicenter clinical trial is needed as an external validation to evaluate the utility of our nomogram.
Conclusion
In summary, we constructed a nomogram to predict OS in patients with metastatic LUSC. The nomogram offers high accuracy and good clinical application value.
Supplemental Material
Supplemental material, sj-docx-1-tct-10.1177_15330338221132035 for Construction and Validation of Prognosis Nomogram for Metastatic Lung Squamous Cell Carcinoma: A Population-Based Study by Yuting Liu, Min Sun, Ying Xiong, Xinyue Gu, Kai Zhang and Li Liu in Technology in Cancer Research & Treatment
Footnotes
Abbreviations
Authors’ Note
Yuting Liu and Min Sun contributed equally to this work. The authors state that this article does not contain any studies with human participants or animals and is therefore exempt from institutional review board approval. Informed consent from study participants was not required as SEER is an anonymized database that is open to the public. The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Publicly available datasets were analyzed in this study. These data can be found here:
.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant number 8207112731).
Supplemental Material
Supplemental material for this article is available online.
References
