Abstract

Importance:

Nomogram prognostic models can facilitate cancer patient treatment plans and patient enrollment in clinical trials.

Objective:

The primary objective is to provide an updated and accurate prognostic model for predicting the survival of advanced non-small-cell lung cancer (NSCLC) patients, and the secondary objective is to validate a published nomogram prognostic model for NSCLC using an independent patient cohort.

Design:

1817 patients with advanced NSCLC from the control arms of 4 Phase III randomized clinical trials were included in this study. Data from 524 NSCLC patients from one of these trials were used to validate a previously published nomogram and then used to develop an updated nomogram. Patients from the other 3 trials were used as independent validation cohorts of the new nomogram. The prognostic performances were comprehensively evaluated using hazard ratios, integrated area under the curve (AUC), concordance index, and calibration plots.

Setting:

General community.

Main outcome:

A nomogram model was developed to predict overall survival in NSCLC patients.

Results:

We demonstrated the prognostic power of the previously published model in an independent cohort. The updated prognostic model contains the following variables: sex, histology, performance status, liver metastasis, hemoglobin level, white blood cell counts, peritoneal metastasis, skin metastasis, and lymphocyte percentage. This model was validated using various evaluation criteria on the 3 independent cohorts with heterogeneous NSCLC populations. In the SUN1087 patient cohort, the continuous risk score output by the nomogram achieved an integrated area under the receiver operating characteristics (ROC) curve of 0.83, a log-rank P-value of 3.87e−11, and a concordance index of 0.717. In the SAVEONCO patient cohort, the integrated area under the ROC curve was 0.755, the log-rank P-value was 4.94e−6 and the concordance index was 0.678. In the VITAL patient cohort, the integrated area under the ROC curve was 0.723, the log-rank P-value was 1.36e−11, and the concordance index was 0.654. We implemented the proposed nomogram and several previously published prognostic models on an online Web server for easy user access.

Conclusions:

This nomogram model based on basic clinical features and routine lab testing predicts individual survival probabilities for advanced NSCLC and exhibits cross-study robustness.

Keywords

non-small-cell lung cancer, nomogram, clinical trial, data sharing

Introduction

Lung cancer is the leading cause of cancer-related death.¹ Prognostic models that integrate multiple clinical attributes offer greater precision in predicting outcomes and can also aid in defining patient enrollment criteria for clinical trials. Several prognostic models have been developed for advanced lung cancer, but these models did not undergo external validation against multiple independent data sets.^2-5 Given the variety of regimens considered appropriate treatment in advanced NSCLC,⁶ this calls into question the generalizability of these models. In addition, user-friendly online implementations of diagnostic/prognostic models have greatly enhanced patient care for breast cancer,⁷ but no such tools are available for lung cancer prognosis.

Recently, public accessibility of clinical trial data has led to a paradigm shift in clinical research. The Food and Drug Administration (FDA) Amendments Act passed in 2007 resulted in the registration and reporting of most clinical trials in the United States on ClinicalTrials.gov. The Trial and Experimental Studies Transparency Act passed in 2012 required that the results of interventional trials be reported to a publicly available online database.⁸ Shared databases are becoming a highly valuable resource for the construction, validation, and subsequent recalibration of prognostic models. One such data sharing initiative is Project Data Sphere, LLC, an independent, not-for-profit initiative of the CEO Roundtable on Cancer’s Life Sciences Consortium that broadly shares de-identified comparator arm data from late-phase oncology clinical trials with researchers.^9,10 How to best use such valuable shared clinical trial data to improve medical research and patient care is still in the exploration stage.

The first objective of this study was to use publicly available clinical trial data to perform external validation on a previously published nomogram prognostic model, which is most commonly used for patients with advanced NSCLC.¹¹ The second objective was to develop and validate a new model for advanced NSCLC, adhering to recommendations regarding transparency in methods and performing appropriate external validation.⁴ In addition, external validation was performed using data from multiple independent data sets to demonstrate generalizability of the model. The third objective was to develop a public online implementation of the new nomogram and a number of previously published prognostic models.^11-14

Patients and Methods

Patients

De-identified NSCLC patient data from Project Data Sphere were used in model construction and validation. Application for data access was submitted to Project Data Sphere through https://projectdatasphere.org/projectdatasphere/html/registration and was approved. We consented to and complied with the Data User Agreement. The data set included 1817 patients with advanced NSCLC from the comparator arms of 4 Phase III randomized clinical trials. The 4 trials used were as follows: CA031 (n = 524), a trial comparing nab-paclitaxel plus carboplatin to solvent-based paclitaxel plus carboplatin as first-line therapy for patients with advanced NSCLC;¹⁵ SUN1087 (n = 480), a trial comparing sunitinib plus erlotinib to erlotinib alone in patients with advanced NSCLC refractory to 1 or 2 chemotherapy regimens;¹⁶ SAVEONCO (n = 358), a trial assessing the efficacy of semuloparin sodium for prevention of venous thromboembolism in patients with a variety of advanced solid tumors;¹⁷ and VITAL (n = 455), a trial comparing (ziv-)aflibercept and docetaxel to docetaxel alone for advanced NSCLC refractory to treatment with platinum-based chemotherapy.¹⁸ The characteristics of the comparator arms of the 4 trials used for this study are shown in Table 1, with greater detail on inclusion and exclusion criteria in Supplementary Table 1. The 4 trials include patients receiving first- and/or second-line chemotherapy, with regimens varying between studies. Because this study focused on patient prognosis, only patients from the comparator arms of these trials were included. SAVEONCO includes patients with a variety of advanced stage cancers, and only patients with lung cancer were used for the analysis. For the other 3 studies, all patients from the comparator arms were used. Figure 1A shows how each of these data sets was used for both model development and validation in this study.

Table 1.

Summary of the comparator arms of the clinical trial data used in this study. Inclusion criteria are shown only in brief. Greater detail and information on exclusion criteria are shown in Supplementary Table 1.

Study	CA031	SUN1087	SAVEONCO	VITAL
ClinicalTrials.gov ID	NCT00540514	NCT00457392	NCT00694382	NCT00532155
Study dates	Nov 2007-Aug 2013	Jul 2007-Jul 2010	Jun 2008-Jan 2013	Sep 2007-Jan 2011
Stage	III, IV	IV	I-IV	I-IV
No. of control-arm NSCLC patients	524	480	358 of 1604 total patients	455
Patients used in analysis	466	477	269	364
Primary objective of study	Comparison of albumin-bound paclitaxel + carboplatin with paclitaxel + carboplatin	Comparison of erlotinib with erlotinib + SU011248 in patients previously treated with platinum-based chemotherapy	Comparison of Semuloparin sodium with placebo for prevention of VTE in patients at high risk of VTE initiating a new course of chemotherapy	Comparison of aflibercept + docetaxel with placebo + docetaxel
Primary outcome	Response to chemotherapy by radiologic findings (overall survival secondary outcome)	Overall survival	Time to VTE-related event (overall survival secondary outcome)	Overall survival
Inclusion criteria	Stage IIIB or IV NSCLC with no prior chemotherapy for metastatic disease	Locally advanced/metastatic NSCLC and prior treatment with no more than 2 chemotherapy regimens including a platinum-based regimen	Metastatic or locally advanced solid tumor of lung, pancreas, stomach, colon/rectum, or ovary initiating a new course of chemotherapy	Locally advanced or metastatic NSCLC with disease progression after 1 and only 1 prior chemotherapy treatment, which was platinum-based
Line of therapy	First line	Second or third line	First or second line	Second line
Cancer treatment	Paclitaxel + carboplatin	Erlotinib	Various chemotherapies	Docetaxel
Follow-up duration	18 months post-treatment	Median 22.0 months
Sponsor	Celgene	Pfizer	Sanofi	Sanofi
1-year survival rate	0.492	0.699	0.709	0.47
Role in model development	Training	Testing	Testing	Testing

Abbreviations: NSCLC, non-small-cell lung cancer; VTE, venous thromboembolism.

Figure 1.

Flowchart demonstrating use of various models and data sets in this study. (A) CA031 was used as a training set for the new nomogram. SUN1087, SAVEONCO, and VITAL were used as external validation sets for the new nomogram. All models involved in this study are implemented on a web portal. (B) Exploratory survival analysis for the 4 data sets used in this study.

Nomogram development

The new nomogram was developed using CA031¹⁵ as a training set and then validated in 3 independent validation sets: SUN1087,¹⁶ SAVEONCO,¹⁷ and VITAL.¹⁸ Clinical variables that were present in both CA031¹⁵ and at least 1 validation data set were selected as potential co-variables in the model. Overall survival was used as the primary outcome for the nomogram model. Univariate analyses were first used to establish the associations between potential predictors and overall survival in the training set. Co-variables with statistically significant associations with survival (P-value < 0.05) in the univariate analysis were then used as co-variables in a multivariate Cox model, using overall survival as the outcome variable. LASSO (Least Absolute Shrinkage and Selection Operator) penalty^19,20 was used in the multivariate Cox model to further remove possible redundancy (by reducing the number of variables) and improve the robustness of the model. The tuning parameter (cost parameter) for the LASSO penalty was determined by cross-validation within the training set. The fitted multivariate Cox model was used as the final nomogram model. All computations were conducted in the R environment. The fitted nomogram used the covariates as input and generated a risk score for each new patient in the testing sets.
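For reference, the LASSO-penalized Cox fit described above can be written in its standard textbook form. The notation is introduced here for illustration and does not come from the original analysis: $x_i$ is the covariate vector of patient $i$, $\delta_i = 1$ marks an observed death, $R(t_i)$ is the set of patients still at risk at time $t_i$, and $\lambda$ is the tuning parameter chosen by cross-validation. Software packages may scale the likelihood or penalty term differently.

```latex
\hat{\beta} \;=\; \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[
    x_i^{\top}\beta \;-\; \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
  \right]
  \;-\; \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\},
\qquad
\text{risk score}_i \;=\; x_i^{\top}\hat{\beta}
```

Covariates whose coefficients are shrunk exactly to zero drop out of the model, which is how the penalty reduces the number of variables.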

Survival analysis

Overall survival time was calculated from the date of randomization until death or the date of last follow-up. Survival curves were estimated using the Kaplan-Meier product-limit method.²¹ Differences in the survival curves were compared using a log-rank test. A univariate Cox proportional-hazards model²² was used to determine the association between a continuous variable and overall survival in univariate analysis.
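The product-limit estimate can be illustrated with a short, dependency-free Python sketch. This is not the code used in the study (which was run in R); the function name and the toy follow-up data are hypothetical, and patients censored at a death time are assumed to remain at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed death time.

    times  -- follow-up time for each patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up times in months (1 = died, 0 = censored).
toy_times = [2, 3, 3, 5, 8, 8, 12]
toy_events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

The estimate only changes at observed death times; censored patients contribute by shrinking the risk set for later steps.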

Validation criterion

For each study, patients whose histology did not demonstrate adenocarcinoma (AD), squamous cell carcinoma (SCC), or large cell carcinoma (LC) of the lung, such as those with uncertain histology, were excluded. For the purposes of validation, missing data were imputed as the population median. Four criteria were used to evaluate the prediction performance of Hoang et al’s¹¹ model and our new nomogram:

Survival difference between predicted high- and low-risk groups: first, risk score was calculated using the nomogram for each patient from the testing sets based on the patients’ covariate values. The patients in each testing set were assigned into a high- or low-risk group based on their risk score calculated by the nomogram, with the median risk score from the training set as the cut-off value. The log-rank P-value was used to determine whether survival differences existed between the high- and low-risk groups in each testing set.

Concordance index: concordance is defined as the probability of agreement for any 2 randomly chosen patients, where agreement means that the patient with the shorter survival time of the 2 also has the larger risk score. For survival data, some of the pairs are incomparable. For comparable pairs, patients may also be tied. The final concordance is $(\text{agree} + \text{tied}/2) / (\text{agree} + \text{disagree} + \text{tied})$.

Integrated area under the curve (AUC) of time-dependent receiver operating characteristics (ROC)^23,24: the area under the ROC curve was calculated at each month from the 6th to the 18th month, the span in which most patients’ follow-up times fell, and these time-dependent AUCs were integrated into a single score (integrated AUC). The time-dependent ROC curve was calculated by Inverse Probability of Censoring Weighted (IPCW) estimation, with weights computed from the Kaplan-Meier estimator of the censoring distribution.

Calibration plot: for classification models, a calibration plot is often used to help visualize how consistent the predicted probabilities (ie, the predicted survival probabilities in this case) are with observed event rates (the event in this case is defined as being alive). To construct the calibration plots, we used the calibration function from the R package caret. The following steps were used for each nomogram:

The patients in the testing data set were split into 20 roughly equal groups by their predicted survival probabilities.

The number of samples whose true status (alive or dead at the specified time point) equaled the event class (alive) was determined.

The event rate was determined for each bin.

The generated calibration plot is essentially a scatter plot of the observed event rate by the mid-point predicted probability value of the bins. The confidence intervals on the estimated proportions are constructed using the binomial test.
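The binning steps above can be sketched in plain Python. This is an illustrative reimplementation of the steps as written, not the caret function itself. The function name and toy data are hypothetical, every patient is assumed to have a known alive/dead status at the chosen time point (censoring is ignored), and the binomial confidence intervals are omitted.

```python
def calibration_points(pred_probs, alive, n_bins=20):
    """Return [(bin_midpoint, observed_alive_rate, n_in_bin)] per bin.

    pred_probs -- predicted survival probability for each patient
    alive      -- 1 if the patient was alive at the time point, else 0
    n_bins     -- number of roughly equal-sized groups (20 in the text)
    """
    if len(pred_probs) != len(alive):
        raise ValueError("pred_probs and alive must have the same length")
    # Sort patients by predicted probability, then cut into equal-sized groups.
    order = sorted(range(len(pred_probs)), key=lambda i: pred_probs[i])
    n = len(order)
    n_bins = min(n_bins, n)
    points = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        probs = [pred_probs[i] for i in idx]
        midpoint = (min(probs) + max(probs)) / 2.0
        observed_rate = sum(alive[i] for i in idx) / len(idx)
        points.append((midpoint, observed_rate, len(idx)))
    return points


# Hypothetical predictions and outcomes for 8 patients, split into 4 bins.
toy_pred = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_alive = [0, 0, 1, 0, 1, 0, 1, 1]
toy_points = calibration_points(toy_pred, toy_alive, n_bins=4)
for midpoint, rate, size in toy_points:
    print(f"predicted ~{midpoint:.2f}: observed alive rate {rate:.2f} (n={size})")
```

Plotting the observed rate against the bin midpoint gives the scatter plot described above; points near the diagonal indicate good calibration.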

Implementation of previously published models

To facilitate clinician, researcher, and patient utilization of the prognostic models published previously and of our new model, we created a user-friendly Web server for our model together with the 4 published prognostic models for lung cancer^11-14 shown in Supplementary Table 2 (http://lce.biohpc.swmed.edu/lungcancer/nomogram). Details of the implementations of the 4 published models are described in Supplementary Table 2.

Results

Clinical trial and patient population characteristics

Thirty-one patients from CA031¹⁵ were excluded for having a cancer type other than AD, SCC, and LC. Three patients were excluded from this study because of a lack of follow-up information or because >50% of covariates were missing. One patient was excluded from the VITAL cohort because survival information was missing. Kaplan-Meier plots for follow-up time and follow-up status for all 4 studies are shown in Figure 1B. The 1-year survival rates for these 4 studies range from 0.47 to 0.709. A summary of the data distribution for the 21 variables selected for evaluation is shown in Supplementary Table 3.

Validation of a previously published nomogram

As a first step, we validated the nomogram published by Hoang et al¹¹ in 2005 using data from CA031,¹⁵ which contains all variables used in the nomogram. Hoang et al’s prognostic model is the most cited and used prognostic model for advanced NSCLC, but no independent validations have been performed since its publication in 2005 due to a lack of validation cohorts. Supplementary Figure 1 shows that patients in the CA031 cohort in the high-risk group predicted by the Hoang model have significantly worse survival outcomes compared with those in the low-risk group (P = 4.58e−9, log-rank test), with the high- and low-risk groups’ 1-year survival rates being 0.332 and 0.549, respectively. The integrated area under the ROC curve was 0.646 and the concordance index was 0.611 (Supplementary Figure 2(a)). This shows that the Hoang prognostic model was valid in an independent NSCLC cohort.

Developing a new prognostic nomogram

Results of univariate analysis of the association between each eligible variable and patient overall survival within CA031¹⁵ are displayed in Table 2. In total, 12 variables had a P-value < 0.05 and were included in the LASSO Cox regression model, and 9 variables remained after variable selection using LASSO penalty. These variables were as follows: sex, Eastern Cooperative Oncology Group (ECOG) score, peritoneal metastasis, skin metastasis, liver metastasis, hemoglobin level, white blood cell count, lymphocyte percentage, and large cell histology. The nomogram was constructed by fitting a multivariate Cox proportional hazard model using these variables from CA031.¹⁵ Table 3 shows the hazard ratio (95% confidence interval) and P-value of each variable within this multivariate model, of which 5 P-values remained <0.05.

Table 2.

Univariate analysis of prognostic survival for the CA031 study. The columns are variable name, univariate likelihood ratio test P-value, and hazard ratio from a univariate test. Variables marked with an asterisk are logarithm transformed.

Variable	Univariate P-value	Hazard ratio (95% confidence interval)
Stage
IV vs III	1.53e−01	1.21 (0.93-1.58)
Sex
Men vs Women	5.99e−06	1.79 (1.37-2.34)
Histological type
LC vs AD	5.75e−01	1.22 (0.63-2.36)
SCC vs AD	3.20e−04	1.49 (1.20-1.85)
Body mass index (kg/m²)	2.35e−01	0.99 (0.96-1.01)
Age (years)	4.96e−01	1.00 (0.99-1.02)
ECOG score	1.33e−05	1.84 (1.37-2.48)
Brain metastasis	4.87e−01	0.80 (0.41-1.55)
Peritoneal metastasis	1.15e−06	1.88 (1.48-2.39)
Skin metastasis	2.05e−02	2.14 (1.20-3.82)
Bone metastasis	1.95e−01	1.19 (0.92-1.55)
Liver metastasis	8.35e−06	1.81 (1.41-2.32)
Glucose (mg/dL)*	3.28e−01	1.35 (0.74-2.46)
Alkaline phosphatase (U/L)*	6.84e−01	0.96 (0.80-1.16)
Alanine aminotransferase (U/L)*	5.04e−01	1.07 (0.87-1.32)
Aspartate aminotransferase (U/L)*	9.09e−02	1.31 (0.96-1.78)
Creatinine (mg/dL)	8.06e−02	1.65 (0.94-2.89)
Total bilirubin (mg/dL)*	2.16e−02	0.74 (0.58-0.96)
Hemoglobin (g/dL)	1.05e−03	0.89 (0.83-0.96)
White blood cell count (K/uL)*	9.10e−15	3.49 (2.54-4.79)
Neutrophil percentage (0%-100%)	3.22e−07	19.9 (6.16-64.3)
Lymphocyte percentage (0%-100%)	5.48e−08	0.03 (0.01-0.10)

Abbreviations: LC, large cell carcinoma; SCC, squamous cell carcinoma; AD, adenocarcinoma.

Table 3.

Hazard ratios (HR) and 95% confidence intervals of nomogram parameters. Variables marked by an asterisk are logarithm transformed.

Variable	HR (95% confidence interval)	P-value
Sex
Men vs Women	1.71 (1.30-2.26)	1.39e−4
Histological type
LC vs AD and SCC	1.46 (0.75-2.85)	2.70e−1
ECOG score	1.35 (0.99-1.82)	5.55e−2
Peritoneal metastasis	1.76 (1.36-2.28)	1.54e−5
Skin metastasis	1.78 (0.98-3.23)	5.97e−2
Liver metastasis	1.53 (1.19-1.98)	1.05e−3
Hemoglobin (g/dL)	0.90 (0.83-0.96)	3.32e−3
White blood cell count (K/uL)*	2.48 (1.74-3.53)	4.52e−7
Lymphocyte percentage (0%-100%)	0.28 (0.06-1.25)	9.48e−2

Abbreviations: LC, large cell carcinoma; SCC, squamous cell carcinoma; AD, adenocarcinoma; ECOG, Eastern Cooperative Oncology Group.
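To make the link between Table 3 and a risk score concrete, the sketch below computes a Cox linear predictor from the rounded hazard ratios in the table. It rests on several assumptions that the text does not confirm: binary variables are coded 0/1, ECOG score enters as a number, white blood cell count uses the natural logarithm, lymphocyte percentage enters as a fraction between 0 and 1, and no centering is applied. The dictionary keys, the function name, and both example patients are hypothetical. Because the baseline survival function is not given, this cannot reproduce the survival probabilities reported by the Web server.

```python
import math

# Hazard ratios copied from Table 3 (rounded); coefficient = ln(HR).
# Key names and variable codings are assumptions made for this illustration.
TABLE3_HR = {
    "male": 1.71,                 # 1 = man, 0 = woman
    "large_cell": 1.46,           # 1 = LC, 0 = AD or SCC
    "ecog": 1.35,                 # ECOG score, assumed numeric
    "peritoneal_met": 1.76,       # 1 = present, 0 = absent
    "skin_met": 1.78,
    "liver_met": 1.53,
    "hemoglobin_g_dl": 0.90,      # per g/dL
    "log_wbc": 2.48,              # natural log of count in K/uL (assumed)
    "lymphocyte_fraction": 0.28,  # fraction in [0, 1] (assumed scale)
}


def risk_score(patient):
    """Linear predictor sum(ln(HR_k) * x_k); a higher score means higher hazard."""
    return sum(math.log(hr) * patient[name] for name, hr in TABLE3_HR.items())


# Two hypothetical patients.
lower_risk = {
    "male": 0, "large_cell": 0, "ecog": 0, "peritoneal_met": 0,
    "skin_met": 0, "liver_met": 0, "hemoglobin_g_dl": 14.0,
    "log_wbc": math.log(6.0), "lymphocyte_fraction": 0.30,
}
higher_risk = {
    "male": 1, "large_cell": 0, "ecog": 1, "peritoneal_met": 0,
    "skin_met": 0, "liver_met": 1, "hemoglobin_g_dl": 10.0,
    "log_wbc": math.log(12.0), "lymphocyte_fraction": 0.10,
}
print(round(risk_score(lower_risk), 3), round(risk_score(higher_risk), 3))
```

Only the ordering of scores is meaningful here: a patient with more adverse features receives a larger score than one with fewer.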

Validation of the new prognostic nomogram

The proposed nomogram was developed using the CA031¹⁵ patient cohort as a training set and validated in 3 independent patient cohorts. The validation results are presented in Figure 2 and Supplementary Figure 2(b) to (d). In Figure 2, the risk groups are defined using the median value of the predicted 2-year survival probabilities. The Hoang et al nomogram was also validated on these 3 cohorts for comparison. In the SUN1087¹⁶ patient cohort, the continuous risk score output by the new nomogram achieved an integrated area under the ROC curve of 0.83 from the 6th month to the 18th month and a concordance index of 0.717 (Supplementary Figure 2(b)). The log-rank P-value was 3.87e−11, with the high- and low-risk groups’ 1-year survival rates being 0.456 and 0.773, respectively. In the SAVEONCO¹⁷ patient cohort, the integrated area under the ROC curve was 0.755 and the concordance index was 0.678 (Supplementary Figure 2(c)). The log-rank P-value was 4.94e−6, with the high- and low-risk groups’ 1-year survival rates being 0.568 and 0.785, respectively. In the VITAL¹⁸ patient cohort, the integrated area under the ROC curve was 0.723, the concordance index was 0.654 (Supplementary Figure 2(d)), and the log-rank P-value was 1.36e−11, with the high- and low-risk groups’ 1-year survival rates being 0.278 and 0.574, respectively. The integrated AUC, concordance index, and log-rank P-value measurements from the 3 independent testing cohorts showed that the proposed nomogram works well in the 3 test data sets, indicating the robustness of the prediction model. Supplementary Table 4 provides a summary of these results.
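The concordance indices reported here follow the pairwise definition given under Validation criterion. A minimal Python sketch of that definition is shown below. The function name and toy data are hypothetical, and as a simplification pairs with identical follow-up times are treated as incomparable, which may differ from the R implementation used in the study.

```python
def concordance_index(times, events, scores):
    """Concordance = (agree + tied / 2) / (agree + disagree + tied).

    times  -- follow-up time per patient
    events -- 1 if death observed, 0 if censored
    scores -- risk score per patient (higher = predicted worse survival)
    """
    agree = disagree = tied = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # Let a be the patient with the shorter follow-up time.
            a, b = (i, j) if times[i] <= times[j] else (j, i)
            # A pair is comparable only if the shorter time is an observed
            # death; equal times are skipped (simplifying assumption).
            if times[a] == times[b] or events[a] == 0:
                continue
            if scores[a] > scores[b]:
                agree += 1
            elif scores[a] < scores[b]:
                disagree += 1
            else:
                tied += 1
    comparable = agree + disagree + tied
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (agree + tied / 2) / comparable


# Hypothetical data: 3 agreeing pairs, 1 disagreeing, 1 tied, 1 incomparable.
toy_c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    scores=[0.9, 0.4, 0.5, 0.4],
)
print(toy_c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.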

Figure 2.

Evaluation of nomograms by log-rank test. Evaluation of previous nomogram and new nomogram on testing data sets including (A, B) the SUN1087 study, (C, D) the SAVEONCO study, and (E, F) the VITAL study. Each panel shows the separation of the Kaplan-Meier estimator by the dichotomized risk score for the testing patients, and P-values were calculated based on log-rank test for the statistical significance of survival time difference between the predicted high- and low-risk groups. The risk groups are classified based on the median of predicted survival rates.
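The comparison shown in each panel, a dichotomized risk score followed by a log-rank test, can be sketched in plain Python as follows. This is a textbook two-group log-rank statistic rather than the study's R code. The function names, the cut-off value, and the toy data are hypothetical, and the cut-off is supplied by the caller.

```python
import math


def split_by_cutoff(scores, cutoff):
    """Label each patient 1 (high risk) if score > cutoff, else 0 (low risk)."""
    return [1 if s > cutoff else 0 for s in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df."""
    data = list(zip(times, events, groups))
    death_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in death_times:
        n = sum(1 for u, _, _ in data if u >= t)              # at risk, both groups
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, high-risk group
        d = sum(1 for u, e, _ in data if u == t and e == 1)   # deaths at t
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the high-risk death count at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical risk scores and follow-up (all deaths observed).
toy_scores = [2.0, 1.5, 0.5, 0.2]
toy_groups = split_by_cutoff(toy_scores, cutoff=1.0)
toy_chi2, toy_p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], toy_groups)
print(toy_groups, round(toy_chi2, 3), round(toy_p, 3))
```

With only 4 patients the toy P-value is not small; the very small P-values reported in the validation cohorts reflect both the separation of the curves and the much larger sample sizes.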

We also evaluated the performance of our new nomogram and of the Hoang et al nomogram using calibration plots (Figure 3). Figures 2 and 3 together indicate that our new nomogram outperforms the Hoang model on these 3 independent data sets.

Figure 3.

Calibration plots. Calibration plots of previous nomogram and new nomogram were generated using 3 testing data sets: (A, B) the SUN1087 study, (C, D) the SAVEONCO study, and (E, F) the VITAL study.

Building a user-friendly Web server for the updated and previous nomograms

We have also provided an online version of this nomogram (Supplementary Figure 3) to facilitate its widespread use by physicians and researchers (http://lce.biohpc.swmed.edu/lungcancer/nomogram/index.php). Online implementations of several previously developed models are also available^11-14 (Supplementary Figure 4). Comparison of overall survival probabilities can be made between these nomograms by inputting patients’ clinical features and reading output generated by the Web server.

Discussion

The landscape of lung cancer patients and treatment has shifted over time. Therefore, it is of value to provide more updated nomograms given new data. We developed and validated a prognostic model for patients with advanced stage NSCLC treated with chemotherapy using up-to-date patient data collected after 2007. The nomogram was built using data from CA031¹⁵ and validated with 3 independent clinical trials. The new nomogram meets the guidelines for AJCC endorsement.²⁵ A wide range of clinical features have been incorporated for use in prognostic models in the past, and a detailed comparison with other published prognostic models is presented in Supplementary Table 2. Robust external validation demonstrated discriminatory power in the nomogram through 3 different statistical measures for each validation data set. First, dichotomization of patients into high- and low-risk subgroups based on their calculated nomogram score showed a statistically significant difference in the survival curves between groups. Second, the nomogram had a high c-index in each validation set, which means that for any given pair of patients, there is a high probability that the nomogram score correctly predicts which patient will have better survival. Third, the nomogram performed well by area under the ROC curve at every measured time-point, indicating that the nomogram had a strong ability to predict whether or not a patient would be alive at a given time after randomization. In addition, our analyses demonstrated the good calibration performance of our new nomogram.

Each of the validation data sets contained patients receiving different types of treatment, including first-, second-, and third-line treatments and both cytotoxic and targeted regimens. Survival analysis thus unsurprisingly demonstrated considerable heterogeneity in survival characteristics between the studies used to validate the nomogram. In spite of this, the proposed nomogram demonstrated accuracy and robust performance across multiple testing data sets. A major strength of this study, therefore, is that we have demonstrated the external validity of our model in a broad range of clinical scenarios. The 3 testing data sets include patients on both targeted and non-targeted therapies. This is in contrast to many previously published prognostic tools for advanced stage NSCLC, which did not undergo external validation and therefore have not been proven to be generalizable to the disparate situations that clinicians are likely to encounter.²

Moreover, in the modern treatment era, conventional cytotoxic chemotherapy remains a component of treatment for almost all patients with lung cancer, although immunotherapy and molecular-targeted therapy are increasingly common. Only a small minority of patients (about 20%) will have a druggable kinase alteration such as epidermal growth factor receptor (EGFR) or anaplastic lymphoma kinase (ALK). For example, the FDA recently limited the indication of the EGFR inhibitor erlotinib to only those cases with specific EGFR mutations. And, although immunotherapy has become a treatment option for patients, only about 25% of patients have high-level (⩾ 50%) PDL1 expression for which first-line immunotherapy is better than chemotherapy. Therefore, the variables included in our proposed model should be applicable to the general patient population. However, in the future, it will be interesting to consider the key biomarkers in NSCLC, including EGFR genotype, ALK genotype, and PDL1 status, together with clinical covariates for patient prognosis when such data sets with large sample sizes are available.

One challenge in using public data arose from missing data across cohorts. Performance status (ECOG or Karnofsky) is the only variable included in all 5 prognostic models, which underscores its importance for prognosis. The other variables differ considerably among the 5 models, which calls for a direct comparison of these models. Such a comparison is not possible within the scope of this study, however, because some clinical variables that are frequently used in prognostic modeling of NSCLC, such as serum lactate dehydrogenase, were not available from the public clinical trial data.

By implementing prognostic models on a Web server, we provide an easy way for researchers and clinicians to access the predicted overall survival probabilities and also to make straightforward comparisons between survival probabilities generated by different models. Our hope is that greater transparency in data reporting for clinical trials will be accompanied by greater access for clinicians to prognostic models that can enhance precise tailoring of patient management.

Supplemental Material

Supplemental material, Suppl_material for Development and Validation of a Nomogram Prognostic Model for Patients With Advanced Non-Small-Cell Lung Cancer by Tao Wang, Rong Lu, Sunny Lai, Joan H Schiller, Fang Liz Zhou, Bo Ci, Stacy Wang, Xiaohan Gao, Bo Yao, David E Gerber, David H Johnson, Guanghua Xiao and Yang Xie in Cancer Informatics

Footnotes

Acknowledgements

The authors acknowledge Tsung-Wei Ma for help with downloading the data and converting the format and Jessie Norris for proofreading the manuscript.

Funding:

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institutes of Health (1R01GM115473, 5R01CA152301, P50CA70907, 5P30CA142543, and 1R01CA172211), the National Cancer Institute (NCI) Midcareer Award in Patient-Oriented Research (K24CA201543-01; to D.E.G.), and the Cancer Prevention and Research Institute of Texas (RP120732 and RP180805).

Declaration of conflicting interests:

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author Contributions

TW and RL contributed equally.

Supplemental Material

Supplemental material for this article is available online.

References

1. Siegel, Miller, Jemal. Cancer statistics, 2016. CA Cancer J Clin. 2016;66:7–30.

2. Mahar, Compton, McShane, et al. Refining prognosis in lung cancer: a report on the quality and relevance of clinical prognostic tools. J Thorac Oncol. 2015;10:1576–1589.

3. Steyerberg, Moons, van der Windt, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10:e1001381.

4. Collins, Reitsma, Altman, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Br J Surg. 2015;102:148–158.

5. Moons, de Groot, Bouwmeester, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11:e1001744.

6. Masters, Temin, Azzoli, et al. Systemic therapy for stage IV non-small-cell lung cancer: American Society of Clinical Oncology clinical practice guideline update. J Clin Oncol. 2015;33:3488–3515.

7. Shachar, Muss HB. Internet tools to enhance breast cancer care. NPJ Breast Cancer. 2016;2:16011.

8. Drazen JM. Transparency for clinical trials: the TEST Act. N Engl J Med. 2012;367:863–864.

9. Hede. Project Data Sphere to make cancer clinical trial data publicly available. J Natl Cancer Inst. 2013;105:1159–1160.

10. Green, Reeder-Hayes, Corty, et al. The Project Data Sphere initiative: accelerating cancer research by sharing data. Oncologist. 2015;20:464–464.e20.

11. Hoang, Schiller, Bonomi, Johnson DH. Clinical model to predict survival in chemonaive patients with advanced non-small-cell lung cancer treated with third-generation chemotherapy regimens based on eastern cooperative oncology group data. J Clin Oncol. 2005;23:175–183.

12. Albain, Crowley, LeBlanc, Livingston RB. Survival determinants in extensive-stage non-small-cell lung cancer: the Southwest Oncology Group experience. J Clin Oncol. 1991;9:1618–1626.

13. Paesmans, Sculier, Libert, et al. Prognostic factors for survival in advanced non-small-cell lung cancer: univariate and multivariate analyses including recursive partitioning and amalgamation algorithms in 1,052 patients. J Clin Oncol. 1995;13:1221–1230.

14. Finkelstein, Ettinger, Ruckdeschel JC. Long-term survivors in metastatic non-small-cell lung cancer: an Eastern Cooperative Oncology Group Study. J Clin Oncol. 1986;4:702–709.

15. Socinski, Bondarenko, Karaseva, et al. Weekly nab-paclitaxel in combination with carboplatin versus solvent-based paclitaxel plus carboplatin as first-line therapy in patients with advanced non-small-cell lung cancer: final results of a phase III trial. J Clin Oncol. 2012;30:2055–2062.

16. Scagliotti, Krzakowski, Szczesna, et al. Sunitinib plus erlotinib versus placebo plus erlotinib in patients with previously treated advanced non-small-cell lung cancer: a phase III trial. J Clin Oncol. 2012;30:2070–2078.

17. Agnelli, George, Kakkar, et al. Semuloparin for thromboprophylaxis in patients receiving chemotherapy for cancer. N Engl J Med. 2012;366:601–609.

18. Ramlau, Gorbunova, Ciuleanu, et al. Aflibercept and Docetaxel versus Docetaxel alone after platinum failure in patients with advanced or metastatic non-small-cell lung cancer: a randomized, controlled phase III trial. J Clin Oncol. 2012;30:3640–3647.

19. Tibshirani, Bien, Friedman, et al. Strong rules for discarding predictors in lasso-type problems. J R Stat Soc Series B Stat Methodol. 2012;74:245–266.

20. Friedman, Hastie, Tibshirani. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1–22.

21. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53:457–481.

22. Collett. Modelling Survival Data in Medical Research. Boca Raton, FL: Chapman & Hall/CRC; 2003.

23. Heagerty, Lumley, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56:337–344.

24. Guinney, Wang, Laajala, et al. Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data. Lancet Oncol. 2017;18:132–142.

25. Kattan, Hess, Amin, et al. American Joint Committee on Cancer acceptance criteria for inclusion of risk models for individualized prognosis in the practice of precision medicine. CA Cancer J Clin. 2016;66:370–374.
