Abstract
Purpose
The aim of the present study was to develop a nomogram for prognostic prediction of patients with lung cancer in hospice.
Methods
Data were collected from 1106 patients with lung cancer in hospice between January 2008 and December 2018. The data were split into a training set and a testing set. The training set was used to identify the most important prognostic factors with the least absolute shrinkage and selection operator (LASSO) and to build the nomogram, and the testing set was used to validate the nomogram. The performance of the nomogram was assessed by the c-index, calibration curves, and decision curve analysis (DCA).
Results
A total of 1106 patients, including 835 (75%) in the training set and 271 (25%) in the testing set, were retrospectively analyzed in this study. Using LASSO regression, the 5 most important prognostic predictors (sex, Karnofsky Performance Scale (KPS), quality-of-life (QOL), edema, and anorexia) were selected out of 28 variables. The validated c-indexes of the training set at 15, 30, and 90 days were .778 [.737-.818], .776 [.743-.809], and .751 [.713-.790], respectively. Similarly, the validated c-indexes of the testing set at 15, 30, and 90 days were .789 [.714-.864], .748 [.685-.811], and .757 [.691-.823], respectively. The nomogram-predicted survival was well calibrated, as the predicted probabilities were close to the observed probabilities. Moreover, the DCA curves showed that the nomogram achieved superior standardized net benefit across a broad range of threshold probabilities.
Conclusions
The study used LASSO to select important predictors from clinical parameters and built a non-lab nomogram. It may be a useful tool that allows clinicians to easily estimate the prognosis of patients with lung cancer in hospice.
Introduction
Lung cancer is one of the malignant cancers that seriously threaten human health. According to the latest data from the International Agency for Research on Cancer, an estimated 2.09 million new lung cancer cases and 1.76 million lung cancer deaths occurred in 2018. 1 In China, lung cancer incidence and mortality are relatively high compared with most countries.2,3 More than 70% of patients with lung cancer are diagnosed with advanced tumors, so only 16.1% of patients survive more than 5 years after diagnosis, which is lower than in the developed countries of Europe and the United States (20.1%). 4
Increasing attention is being paid to patients with terminal-stage lung cancer in hospice in China. In China, the hospice is an institution, funded by the Li Ka-shing Foundation, that provides free hospice care for patients with advanced cancer who are in financial difficulty. To improve the management of patients in hospice, research on their survival time is indispensable. Several tools have been used to evaluate prognosis in terminally ill patients, such as the Palliative Prognostic Index (PPI), the Palliative Performance Scale (PPS), and the modified Glasgow Prognostic Score (mGPS).5-7 Zhou LJ et al constructed a simple Chinese Prognostic Scale (ChPS) to predict the survival of patients with terminal-stage cancer, but its predictive accuracy was not satisfactory. 8 Jing C et al developed a new prognostic scale based on the ChPS (new-ChPS Scale) through a prospective survey of prognostic factors. 9 However, none of these models was tailor-made for patients with advanced lung cancer, and their applicability still needs to be verified. Some of the scales incorporate biologic and imaging parameters, but patients in Chinese hospices cannot afford follow-up blood tests or imaging examinations. Therefore, our study proposes an effective and economical tool to evaluate the outcome of advanced cancer, which could also be used to triage patients and inform their family members.
The Cox proportional hazards model is the most common method for assessing the effects of various factors in survival analysis. However, when the number of independent variables is high and the sample size is low, the Cox proportional hazards model is subject to limitations such as multicollinearity, reduced estimation precision, and non-interpretability of the coefficients. 10 The least absolute shrinkage and selection operator (LASSO) is an advanced machine learning method that can overcome these problems by adding a penalty function to the estimation of the partial maximum likelihood. In this way, the coefficients of redundant variables become exactly zero and the most probable prognostic factors are retained in the model.11,12 Previous studies have applied the LASSO method and confirmed its superiority over traditional methods.13,14 In Viet-Huan Le’s study, a LASSO regression model was applied to identify the best CT-based radiomics features for predicting the overall survival (OS) of patients with lung cancer. 15
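As a sketch of the penalized estimation described above, the LASSO estimate for a Cox model is commonly written as follows. The notation is an assumption introduced here for illustration and is not taken from the study: x_i is the covariate vector of patient i, δ_i the event indicator, R(t_i) the set of patients still at risk at time t_i, p the number of candidate variables, and λ ≥ 0 the penalty weight.

```latex
\hat{\beta} = \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right) \right]
  \;-\; \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

The first term is the Cox log partial likelihood and the second is the L1 penalty. Larger values of λ set more coefficients exactly to zero. Software implementations may rescale the likelihood term (for example by the sample size), so numerical values of λ are not directly comparable across packages.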
In this study, by collecting and analyzing the data of patients with lung cancer in hospice, we sought to identify the most important prognostic factors using the LASSO method. We then constructed a user-friendly nomogram with the selected variables to help clinicians rapidly calculate and evaluate a patient’s prognosis. Predicting prognosis is meaningful for several reasons. First, medical staff in the hospice make follow-up strategies according to patients’ conditions. Patients with a poor prognosis need more frequent visits, and the prediction model can provide a reference. Second, in China, many patients are very concerned about their survival time, which affects whether they need to deal urgently with personal matters such as the disposal of property and the fulfillment of last wishes. Third, the prediction of survival time helps patients prepare psychologically and practically for future challenges. 16 A good prognosis can not only increase patients’ confidence in survival but also reduce their mental burden. 17
Material and Method
Study Population
We restricted our study cohort to 1106 patients who were diagnosed with primary lung cancer between January 2008 and December 2018. Patient information was obtained from the Hospice Unit of the First Affiliated Hospital of Shantou University. The study was approved by the Ethics Committee of the First Affiliated Hospital of Shantou University (approval number: B-2022-164). The requirement for informed consent was waived because the study was retrospective and the identity of all patients remained undisclosed.
Variables Extraction
The baseline demographics included age, gender, ethnicity, literacy, history of alcohol use, smoking, history and effect of analgesic treatment, awareness of the disease, and past medical history (hypertension or diabetes). The cancer-related information included metastasis, previous cancer treatment, duration of pain, concomitant symptoms, previous analgesic treatment, and its effect. The Karnofsky Performance Scale (KPS), translated into Chinese, was used to assess patients’ performance status. 18 The lowest KPS score is 0 and the highest is 100. The higher the score, the better the health status of the patient. The quality-of-life (QOL) scale used in the study was developed by Dr Sun Yan in the 1990s by adapting widely used international scales to a version suitable for China. 19 The QOL scale consists of 12 items (energy, sleep, appetite, activities of daily life, perception of cancer, attitude toward treatment, facial expression, fatigue, work relationships, pain, side effects of treatment, and family relationships), with a total score of 60. X-tile 3.6.1 software (Yale University, New Haven, CT, USA) was employed to determine the best cutoffs for grouping KPS and QOL. 20 KPS was categorized as 30 or less, 40, and 50 or more. QOL was divided into 3 levels: 30 or less, 31-35, and 36 or more. The numeric rating scale (NRS) score was used to evaluate the level of pain. A score of 0-3 indicates mild pain, a score of 4-7 moderate pain, and a score of 8-10 severe pain. 21 The survival time was defined as the number of days from registration to an event (death or suspension of service). All the information was collected and recorded by 2 qualified doctors during the first follow-up visit. Multiple imputation was used to handle the missing data. 22
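The groupings described above can be written down as a small sketch. The function names and group labels are hypothetical and chosen here for illustration. The cutoffs are the ones stated in the text, and the code assumes KPS is recorded in steps of 10 and NRS as an integer from 0 to 10.

```python
def kps_group(kps: int) -> str:
    """Map a KPS score (assumed to be 0-100 in steps of 10) to the three groups."""
    if not 0 <= kps <= 100 or kps % 10 != 0:
        raise ValueError("KPS is assumed to be a multiple of 10 between 0 and 100")
    if kps <= 30:
        return "<=30"
    if kps == 40:
        return "40"
    return ">=50"


def qol_group(qol: int) -> str:
    """Map a QOL total score (maximum 60) to the three levels."""
    if not 0 <= qol <= 60:
        raise ValueError("QOL total score is assumed to lie between 0 and 60")
    if qol <= 30:
        return "<=30"
    if qol <= 35:
        return "31-35"
    return ">=36"


def nrs_pain_level(nrs: int) -> str:
    """Map an NRS pain score (0-10) to mild, moderate, or severe pain."""
    if not 0 <= nrs <= 10:
        raise ValueError("NRS score must be between 0 and 10")
    if nrs <= 3:
        return "mild"
    if nrs <= 7:
        return "moderate"
    return "severe"
```

Encoding the cutoffs in one place keeps the categories identical when the same rules are applied to the training and testing sets.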
Statistical Analysis
The patients were randomly split, without replacement, into a training set and a testing set at a ratio of 3:1. To evaluate the differences between the training and testing sets, continuous variables with a normal distribution were presented as mean (± standard deviation) and compared using Student t-tests, while continuous variables with a skewed distribution were presented as median (interquartile range, IQR) and compared using the Mann-Whitney U test. Categorical variables were presented as frequency (proportion) and compared using chi-square tests. Kaplan-Meier curves with risk tables were used to display the survival of patients in the training and testing sets, respectively.
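A random split without replacement can be sketched as below. This is an illustration only: the function name, the seed, and the use of integer patient identifiers are assumptions, and the sketch does not attempt to reproduce the exact set sizes reported in the study.

```python
import random


def split_train_test(patient_ids, train_fraction=0.75, seed=2022):
    """Randomly partition patient IDs into training and testing sets.

    Shuffling a copy and cutting it once guarantees sampling without
    replacement: every patient lands in exactly one of the two sets.
    The seed is a hypothetical value used only for reproducibility.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Example with 1106 hypothetical patient identifiers (0..1105).
train_ids, test_ids = split_train_test(range(1106))
```

Fixing the seed makes the partition reproducible, which matters because all later steps (variable selection, model fitting, validation) depend on it.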
LASSO regression is a machine learning algorithm first proposed by Robert Tibshirani in 1996. In this study, LASSO regression was used to estimate the coefficients of the Cox regression model. With the LASSO method, the coefficients of unimportant variables are shrunk to zero and important variables are retained, which helps to limit overfitting and avoid extreme predictions. In our analysis, the LASSO method was used to screen out the most representative variables for subsequent multivariate Cox regression analysis and construction of a nomogram that predicts the 15-day, 30-day, and 90-day survival probability of patients.
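How an L1 penalty produces coefficients that are exactly zero can be illustrated with the soft-thresholding operator, the elementary update used in coordinate-descent LASSO solvers. The sketch below is not the study's fitting procedure. The variable names, unpenalized coefficients, and penalty value are all hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# HYPOTHETICAL unpenalized coefficients for four made-up variables.
unpenalized = {"var_a": 0.42, "var_b": -0.07, "var_c": 0.03, "var_d": -0.35}
lam = 0.1  # hypothetical penalty weight, not the study's lambda

shrunk = {name: soft_threshold(beta, lam) for name, beta in unpenalized.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
```

In this toy example only `var_a` and `var_d` survive, which mirrors how weak predictors drop out of the model while stronger ones are kept with shrunken coefficients.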
To evaluate the performance of the proposed nomogram, we assessed both calibration, using 1000 bootstrap resamples, and discrimination, using the concordance index (C-index). Furthermore, we applied decision curve analysis (DCA), a novel method that evaluates the nomogram from the perspective of clinical consequences by calculating the net benefit.
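The two quantities named above can be sketched in a few lines. The first function computes Harrell's concordance index for right-censored data. It is shown as a simple stand-in and may differ from the time-specific estimates reported in this study. The second computes the net benefit at a single threshold probability for a binary outcome, assuming each patient's status at the chosen time horizon is known (no censoring before the horizon). Function names and inputs are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable patient pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when patient i had the event and did so
    before patient j's observed time. Ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    flagged = [(y, p) for y, p in zip(outcomes, predicted_risks) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y)
    false_pos = len(flagged) - true_pos
    return true_pos / n - false_pos / n * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the model with the "treat all" and "treat none" strategies. Dividing by the event prevalence gives the standardized net benefit.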
All analyses were carried out with R (Version 3.6.2, R Foundation, Vienna, Austria) and R packages (‘survminer’, ‘glmnet’, ‘rms’, ‘timeROC’, ‘mice’, ‘ggDCA’). P-value < .05 was considered statistically significant.
Results
Table 1. Demographic, Clinical, and Tumor Characteristics of Patients with Lung Cancer in Hospice in the Training and Testing Sets. Values are presented as no. (%) or median (Q1, Q3).
Figure 1. Kaplan-Meier curves with risk table for patients with lung cancer in the training set and testing set.
Using LASSO regression, 5 prognostic predictors (sex, KPS, QOL, edema, and anorexia) were selected out of 28 variables that were potentially associated with OS in the training set (Figure 2). The optimal λ value for LASSO regression with 10-fold cross-validation was .1026. Based on the coefficients obtained from the LASSO method, we inferred that KPS was the most important factor in predicting survival probability. A nomogram with the 5 prognostic predictors selected by LASSO regression was then constructed on the basis of the Cox regression model to predict the survival rate (Figure 3). To use the nomogram, the score for each variable is obtained by matching its value to the points axis at the top. The scores of all variables are summed and marked on the total points axis, and a line is drawn downward to read the corresponding survival probability. To examine the performance of our predictive nomogram, we employed both discrimination and calibration assessments. As shown in Table 2, C-index analysis for the nomogram showed good discrimination at 15, 30, and 90 days in both the training set (C-index = .778 (95% CI .737-.818), .776 (95% CI .743-.809), and .751 (95% CI .713-.790), respectively) and the testing set (C-index = .789 (95% CI .714-.864), .748 (95% CI .685-.811), and .757 (95% CI .691-.823), respectively). The nomogram-predicted survival was well calibrated at 15, 30, and 90 days in the training and testing sets, and the predicted probabilities were close to the observed probabilities (Figure 4). Moreover, the DCA curve was used to assess the clinical utility of the nomogram by calculating the net benefit. The result showed that the nomogram achieved superior standardized net benefit across a broad range of threshold probabilities (Figure 5).
Figure 2. Selection of predictors using the LASSO regression analysis in patients with lung cancer. (A) Using 10-fold cross-validation, the dotted vertical lines were drawn at the optimal values by the minimum criteria and the 1-s.e. criteria. (B) LASSO coefficient profiles of the 28 variables. The vertical line was drawn at x = log(λ1-s.e.). At the optimal value λ1-s.e. = .1026, 5 variables (sex, anorexia, edema, QOL, and KPS) with nonzero coefficients were finally identified.
Figure 3. The nomogram for predicting 15-day, 30-day, and 90-day OS.
Table 2. The Concordance Index (C-index) with 95% CI at 15, 30, and 90 Days in Both Training Set and Testing Set.
Figure 4. Calibration curves for predicting overall survival rate by the nomogram in the training and testing sets. Calibration curves of the prognostic nomogram for 15-day overall survival (A), 30-day overall survival (C), and 90-day overall survival (E) in the training set; calibration curves for 15-day overall survival (B), 30-day overall survival (D), and 90-day overall survival (F) in the testing set.
Figure 5. Decision curve analysis of the prognostic nomogram in the training and testing sets.
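The point-scoring step used when reading a Cox-based nomogram can be sketched as follows. In the usual construction, each level of a predictor contributes a value to the linear predictor, and these contributions are rescaled so that the predictor with the widest range spans 0 to 100 points. The contribution values below are HYPOTHETICAL placeholders chosen for easy arithmetic. They are not the coefficients of this study, and only two predictors are shown.

```python
def nomogram_points(contributions):
    """Rescale per-level contributions so the widest predictor spans 0-100 points."""
    spans = {
        var: max(levels.values()) - min(levels.values())
        for var, levels in contributions.items()
    }
    widest = max(spans.values())
    if widest == 0:
        raise ValueError("at least one predictor must have a nonzero range")
    points = {}
    for var, levels in contributions.items():
        lowest = min(levels.values())
        points[var] = {
            level: 100.0 * (value - lowest) / widest
            for level, value in levels.items()
        }
    return points


def total_points(points, patient):
    """Sum the points of one patient's predictor levels."""
    return sum(points[var][level] for var, level in patient.items())


# HYPOTHETICAL log-hazard contributions, for illustration only.
example_contributions = {
    "KPS": {"<=30": 1.0, "40": 0.5, ">=50": 0.0},
    "edema": {"no": 0.0, "yes": 0.4},
}
example_points = nomogram_points(example_contributions)
example_total = total_points(example_points, {"KPS": "<=30", "edema": "yes"})
```

Converting the total points into a 15-day, 30-day, or 90-day survival probability additionally requires the baseline survival estimate of the fitted Cox model, which this sketch does not include.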



Discussion
Lung cancer currently ranks first in both incidence and mortality among all cancers reported worldwide. In China, lung cancer mortality is relatively high compared with most countries. 4 However, China’s hospice system was established late, which has resulted in limited research on patients with lung cancer in hospice. Most studies of hospice patients in China have relied on traditional survival analysis without offering a practical tool to evaluate the prognosis of these patients.
The current study collected the follow-up data of 1106 patients with lung cancer from the Hospice Unit of the First Affiliated Hospital of Shantou University Medical College. In contrast to the traditional Cox regression method for survival analysis, our study adopted LASSO, an advanced machine learning algorithm that can efficiently screen out key variables from many clinical indicators. Available evidence suggests that LASSO has better predictive performance than traditional models.13,14 Furthermore, based on the selected predictors, we constructed a nomogram that allows clinicians to quickly assess the prognosis of an individual patient with lung cancer. Considering that most patients in hospice could not afford follow-up blood tests and imaging examinations, we chose not to incorporate any laboratory indexes into our predictive model. Nevertheless, this economical and practical model still performed well in prediction.
In this study, we identified sex as a significant prognostic predictor for patients with lung cancer in hospice. According to World Health Organization (WHO) statistics, the cumulative mortality risk of lung cancer in 2018 was far higher in males than in females (3.19% vs 1.32%), which supports sex as a significant factor related to prognosis. 23 KPS and QOL are widely recognized as effective indicators for assessing the survival status of patients, and the results of this study were consistent with a previous study. 24 In the existing literature, KPS and QOL have also been used to construct prognostic models with good predictive performance.25-27 Moreover, some evidence suggests that certain symptoms have an important impact on the prognosis of patients with cancer.9,28,29 In our study, the symptoms edema and anorexia were selected by LASSO as key predictors in the model. According to existing studies, anorexia is one of the typical manifestations of cachexia, and cancer anorexia-cachexia syndrome (CACS) is present in 57-61% of patients with lung cancer and is directly responsible for 20% of cancer deaths. 30 It is therefore plausible that anorexia was selected as one of the predictors in our study. We recommend a unique and robust model consisting of prognostic scales and symptoms for survival analysis in hospice patients with lung cancer.
Our study has several limitations. Firstly, this was a retrospective study, which may introduce recall bias and limit the performance of our model. Secondly, most of our patients survived less than half a year, which may lead to inaccurate predictions for patients with longer survival times. To mitigate this problem, we adopted 10-fold cross-validation. Thirdly, since our data came from a single research center and external validation is lacking, this prediction model should be used with caution in other hospice care centers. Last but not least, our nomogram does not require any laboratory indicators, which probably prevented it from reaching excellent performance. However, the majority of home hospice care patients in China are low-income, and it is unrealistic for them to afford laboratory examinations. Given the situation of Chinese hospices, our non-lab nomogram could serve as a compromise and an economical tool to predict patients’ prognosis. In future, large-sample, multi-center studies still need to be carried out to improve our nomogram.
Conclusion
This study identified the most important non-lab prognostic factors using the LASSO method and built a nomogram for clinical use. Our findings might be an important contribution to prognostic prediction for patients with lung cancer in hospice, allowing clinicians to easily estimate the status of their patients and helping them adjust follow-up management.
Footnotes
Abbreviations
LASSO: least absolute shrinkage and selection operator
KPS: Karnofsky Performance Scale
QOL: quality-of-life
NRS: numeric rating scale
IQR: interquartile range
OS: overall survival
DCA: decision curve analysis
CI: confidence interval
Authors' Contributions
Yicheng Zeng, Xianbin Cai, Weihua Cao, and Xubin Jing contributed to the idea and design. Yicheng Zeng, Chaofen Wu, Muqing Wang, Yanchun Xie, and Wenxia Chen contributed to the data collection. Yicheng Zeng, Xi Hu, and Yanna Zhou contributed to the data analysis. Yicheng Zeng, Xubin Jing, and Xianbin Cai contributed to the manuscript writing and revision. All authors approve the final version of the manuscript.
Availability of Data and Material
The data used and analysed in the study are available from the corresponding author on reasonable request.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the Medical Scientific Research Foundation of Guangdong Province, China (grant no. 201811981727429), the Guangdong Science and Technology Department, China (grant no. 20200304–65), and the Li Ka Shing Foundation “Heart of Gold” National Hospice Services Program.
Ethical Approval and Ethical Standards
This study was approved by the ethical review board of the First Affiliated Hospital of Shantou University Medical College (approval number: B-2022-164) and was conducted in accordance with the standards of the Declaration of Helsinki. Informed consent was waived because of the retrospective nature of the study.
