Abstract
OBJECTIVE:
Early detection and diagnosis of lung cancer remain challenging but would improve patient prognosis. The goal of this study is to develop a model to estimate the risk of lung cancer for a given individual.
METHODS:
We conducted a case-control study to develop a predictive model to identify individuals at high risk for lung cancer. Clinical data from 500 lung cancer patients and 500 population-based age- and gender-matched controls were used to develop and evaluate the model. Associations between environmental variants together with single nucleotide polymorphisms (SNPs) of beta-catenin (ctnnb1) and lung cancer risk were analyzed using a logistic regression model. The predictive accuracy of the model was determined by calculating the area under the receiver operating characteristic (ROC) curve.
RESULTS:
Prior diagnosis of chronic obstructive pulmonary disease (COPD), pulmonary tuberculosis, family history of cancer, and smoking are lung cancer risk factors. The area under the curve (AUC) was 0.740, and the sensitivity, specificity, and Youden index were 0.718, 0.660, and 0.378, respectively.
CONCLUSION:
Our risk prediction model for lung cancer is useful for distinguishing high-risk individuals.
Introduction
Lung cancer is the leading cause of cancer mortality, accounting for approximately 28% of all cancer-related deaths [11]. According to GLOBOCAN 2012, 35.78% of all newly diagnosed lung cancer cases and 37.56% of lung cancer deaths occur in China. Most patients are diagnosed at an advanced stage, and their tumors cannot be surgically resected [4]. As a result, the overall 5-year survival rate is just 6–18% [16]. When the disease is diagnosed at an early stage, the 5-year survival rate is 67% [17]. Therefore, detection at an early stage, when treatment is more likely to be effective, would increase the survival rate associated with lung cancer.
Low-dose computed tomography (LDCT) was acclaimed as a major breakthrough in lung cancer screening in a high-risk population based on age and smoking status, and it has been recommended by various organizations [12, 13]. Although these results are encouraging, there are also potential negative consequences of screening, including an increase in false positives. While the need to specify a high-risk target population is well accepted, there has been increasing interest in developing methods for individual risk prediction to define a high-risk population for whom the potential benefits of LDCT screening would outweigh the negatives.
Several lung cancer risk-predicting models have been proposed [1, 2, 25], but most predictors focused on traditional risk factors such as age, gender, smoking status, family history of lung cancer, environmental exposure, pneumonia history, and a history of chronic obstructive pulmonary disease (COPD). In addition, most of the aforementioned models were based on data from developed countries, and data from China are limited. Recent advances in genetic epidemiology have led to the identification of genetic and molecular variants affecting the risk of disease, and genetic markers such as single-nucleotide polymorphisms (SNPs) can now be added to risk models for improved prediction of disease risk [29, 5]. SNPs can cause variations in the transcription and expression of certain genes and might affect individual susceptibility to carcinogenesis. SNPs of several genes such as xrcc1 have been shown to be useful predictive or prognostic markers in lung cancer [14, 30, 31]. In particular, assessing SNPs can improve prediction capability when combined with other epidemiologic factors, as shown in the Liverpool Lung Project (LLP) model [20].
Deregulation of Wnt signaling is reportedly involved in lung carcinogenesis [7, 22, 26]. Beta-catenin (ctnnb1) is critical for epithelial layer establishment and maintenance and is a key downstream component of the canonical Wnt signaling pathway. Abnormal CTNNB1 expression is reportedly associated with poor prognosis in lung cancer [6, 32]. Some studies also found a strong association between ctnnb1 SNPs and cancer prognosis. Shen Hongbing et al. [8] found that rs7629386 in ctnnb1 was associated with lung cancer prognosis in Chinese Han and Caucasian populations in a three-stage genome-wide association study. Wang et al. [27] reported that genetic variation in ctnnb1 was associated with gastric cancer susceptibility and prognosis in a Chinese population.
The present study was designed to test the hypothesis that SNPs of ctnnb1 are associated with lung cancer susceptibility and can be used in conjunction with epidemiologic factors to develop a lung cancer risk-predicting model for the Chinese population.
Materials and methods
Study population
This hospital-based, case-control study included 1000 subjects from northeastern China (Changchun city, Jilin province). All subjects were local residents of Han descent, including 500 patients clinically diagnosed with lung cancer (373 NSCLC patients and 127 SCLC patients) and 500 age- and gender-matched cancer-free controls. Eligible patients had histologically confirmed primary lung cancer with no previous cancer history and were not receiving radiotherapy or chemotherapy for other conditions. Control participants were randomly selected from individuals who underwent routine physical examinations in our hospital. They were frequency-matched to the cases according to age, gender, and residential area. The study was approved by the Ethics Committee of the First Hospital of Jilin Medical University and conducted according to the Declaration of Helsinki. All subjects provided written informed consent.
Diagnostic criteria and data collection
Standardized interviews were conducted by trained interviewers at the hospital or at the subjects’ homes. Risk factor information was collected for the period up to the index date (i.e., the time of diagnosis for cases and the interview date for controls), and peripheral blood lymphocytes were obtained from each subject.
Tag SNP selection and genotyping
We selected SNPs on the basis of the following principal criteria: (1) Tag SNPs identified using genotype data from the CHB (Chinese Han Beijing) population data of HapMap (HapMap Data Rel 27 PhaseII
Distributions of study-specific characteristics in the case and control groups
Hardy-Weinberg equilibrium (HWE) was assessed with a goodness-of-fit chi-square (
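As an illustration of the goodness-of-fit test named above, the following is a minimal sketch of a one-degree-of-freedom chi-square test for HWE at a biallelic SNP. The function name `hwe_chi_square` and the genotype counts in the usage lines are hypothetical and are not taken from the study data; the sketch assumes both alleles are observed, so that no expected count is zero.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Arguments are observed genotype counts for a biallelic SNP.
    Assumes both alleles are present (no expected count equals zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control-group genotype counts, for illustration only.
chi2_fit, p_fit = hwe_chi_square(245, 210, 45)      # counts matching HWE proportions
chi2_dev, p_dev = hwe_chi_square(300, 100, 100)     # marked heterozygote deficit
```

A large p-value in the control group is conventionally read as no evidence of departure from HWE; the significance threshold used in the study is not shown here.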
Results
Genotype distribution and characteristics of cases and controls
We recruited 500 incident cases of lung cancer and 500 population controls between 2010 and 2012 for this analysis. Table 1 shows the distribution of study-specific risk factors in the two groups. The genotype distributions of SNPs followed HWE in the control group (
Lung cancer risk model
The multivariate analysis revealed significant increases in risk associated with prior diagnosis of COPD, pulmonary tuberculosis, family history of cancer, and smoking (Table 2).
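To make the link between a fitted logistic regression and a risk estimate concrete, the sketch below shows how regression coefficients translate into odds ratios and a predicted probability. The intercept and coefficient values are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 2, and the names `INTERCEPT`, `COEFFICIENTS`, `odds_ratio`, and `predicted_probability` are assumptions of this sketch.

```python
import math

# Hypothetical log odds ratios, for illustration only.
# These are NOT the estimates reported in Table 2.
INTERCEPT = -1.0
COEFFICIENTS = {
    "copd": 0.9,
    "tuberculosis": 0.7,
    "family_history": 0.5,
    "smoking": 0.8,
}


def odds_ratio(beta):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(beta)


def predicted_probability(profile):
    """Logistic model output for a dict of 0/1 risk-factor indicators.

    Missing factors are treated as absent (0). With case-control data the
    intercept depends on the case-to-control sampling ratio, so the output
    is best read as a score for ranking individuals.
    """
    linear = INTERCEPT + sum(
        beta * profile.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


p_none = predicted_probability({})
p_all = predicted_probability({name: 1 for name in COEFFICIENTS})
```

Each additional risk factor adds its coefficient to the linear predictor, so the score rises monotonically with the number of factors present.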
Receiver operating characteristic (ROC) analysis
The classification ability of the model in Table 2 was evaluated by measuring the ROC area under the curve (AUC). Figure 1 shows the ROC curve derived from our model; the ROC AUC was 0.740. Furthermore, the sensitivity, specificity, and Youden index were 0.718, 0.660, and 0.378, respectively.
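The Youden index is defined as sensitivity + specificity − 1, so the reported values are internally consistent: 0.718 + 0.660 − 1 = 0.378. The sketch below illustrates, under stated assumptions, how the AUC and the Youden-optimal cut-off can be computed from model scores. The scores and labels are toy values invented for illustration, and the function names are hypothetical; this is not the software used in the study.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A subject is classified as high risk when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def best_cutoff(scores, labels):
    """Threshold that maximizes the Youden index."""
    return max(roc_points(scores, labels), key=lambda r: youden_index(r[1], r[2]))


# Toy scores (1 = case, 0 = control), for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_threshold, toy_sens, toy_spec = best_cutoff(toy_scores, toy_labels)
reported_j = youden_index(0.718, 0.660)  # values reported in the text
```

The pairwise definition of the AUC used here is equivalent to the area under the empirical ROC curve and avoids any dependence on how the curve is interpolated.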
Multivariate risk model
Abbreviations: aOR
The ROC AUC for the lung cancer risk prediction model was 0.740. The straight line represents the ROC curve expected by chance alone.
Discussion
Lung cancer is the world’s leading cause of cancer-related deaths. Early detection and diagnosis will improve the prognosis of lung cancer patients. LDCT screening significantly reduces lung cancer mortality by 20% in high-risk individuals [18, 19], and restricting screening to high-risk individuals will markedly reduce the cost of screening programs. Ideally, clinical information will be used to identify individuals who may benefit from increased lung cancer screening surveillance. Several lung cancer risk prediction models have been constructed to make screening more efficient and to identify high-risk individuals [3, 10, 21, 23, 24]. These models employ patient characteristics and epidemiologic, social, and clinical risk factors. In contrast to traditional risk factors, an SNP is an inherited genetic variation that offers the advantage of stability during the lifetime of the individual. Raji et al. [20] demonstrated that SNP rs663048 in gene sez6l improved the predictive ability of a lung cancer risk model (the LLP risk model). Thus, it is likely that identifiable genetic susceptibility will constitute an important factor in the selection of a more tightly defined risk group in the future.
Our aim was to develop a model to estimate the absolute risk of lung cancer for a given individual. This could be utilized for primary and secondary prevention, possibly to help identify those most likely to benefit from CT screening or as an additional resource for medical decision-making. Our model is specific to people of Han Chinese descent in northeastern China. We adopted several traditional factors such as male sex, prior diagnosis of COPD, pulmonary tuberculosis, family history of cancer, and smoking. We also explored the effect of SNPs in the ctnnb1 gene that may play a role in lung cancer pathogenesis. In our study, prior diagnosis of COPD, pulmonary tuberculosis, family history of cancer, and smoking were identified as lung cancer risk factors. These results are consistent with other studies [2, 25, 29].
Beta-catenin (ctnnb1) is critical for epithelial layer establishment and maintenance and is a key downstream component of the canonical Wnt signaling pathway. Abnormal CTNNB1 expression is reportedly associated with poor prognosis in lung cancer [6, 22]. Some studies also found a strong association between ctnnb1 SNPs and cancer risk or prognosis. Hu et al. [8] found that rs7629386 in ctnnb1 was associated with lung cancer prognosis in Chinese Han and Caucasian populations in a three-stage genome-wide association study. Wang et al. [27] reported that genetic variation in ctnnb1 was associated with gastric cancer susceptibility and prognosis in a Chinese population. Lee and colleagues evaluated the correlation between the rs4135385 polymorphism and breast cancer risk and found that the SNP was significantly associated with breast cancer risk [15]. Huang et al. [9] used tag SNPs of the CTNNB1 gene to predict the risk and recurrence of prostate cancer and found that the rs1798802, rs11564465, and rs2293303 polymorphisms had no significant association with prostate cancer risk; however, the GA/AA and CT/TT genotypes of SNPs rs1798802 and rs11564465, respectively, were significantly related to Gleason grade and PSA recurrence of prostate cancer. Although these studies found associations between ctnnb1 SNPs and cancer risk or prognosis, there has been no report on the association of CTNNB1 polymorphisms with the risk of lung cancer. In the present study, rs11564465, rs1798802, rs1880481, rs2293303, rs4135385, and rs7629386 were selected for analysis; however, none of these SNPs was associated with the risk of lung cancer. This discrepancy with earlier reports may reflect differences in the ethnicity of the study populations and in tumor type.
We constructed independent training and validation sets to avoid model overfitting. The ROC AUC of our model was 0.740, suggesting that this model can distinguish individuals at high and low risk of lung cancer, although this requires rigorous external validation in future studies. The predictive ability of our model is not directly comparable with that of other models because differences in the distributions of predictor factors can affect performance statistics [28]. In our model, we mainly included age, gender, benign lung disease, and family history of cancer. These data can be conveniently and accurately collected. We did not include dietary factors because the food frequency questionnaire was complicated and dietary habits are likely to vary over a long period, and we did not include occupational exposures (e.g., to asbestos) because they are typically studied in special populations. The simple form of our model makes it more easily and directly applicable in the primary care setting.
This study has several limitations. Our prediction tool is based on relative risk estimates derived from a single-center case-control study with a limited sample size. In addition, recall bias and other information biases could have influenced our results.
In conclusion, our model was constructed from traditional epidemiologic factors among Han Chinese subjects. The results of this study show that it can discriminate between individuals at high and low risk of lung cancer. Further studies are planned with larger cohorts of unselected cases and controls to validate the prediction model. If these findings are confirmed, our risk model could provide individuals and healthcare professionals with an easy estimate of lung cancer risk to guide discussions and decision making for prevention and surveillance.
Footnotes
Acknowledgments
We thank Li Deng for technical assistance with sample collection and preparation. This work was supported by Youth fund of the First Hospital of Jilin University (JDYY52015003), the National Natural Science Foundation of China grants (#81501962), Jilin Provincial Science and Technology Department Grant (3D512J243428), to X.W.; Jilin Provincial Science and Technology Department Grant (20111807, 20140414014GH, 20150101176, J.C.), the Key Clinical Project of the Ministry of Health of the People’s Republic of China Grant (2001133, W.L.), the Key Project of Science and Technology Research of the Ministry of Education (311015, J.C.), and the Bethune Program B (2012202, J.C.) of the Jilin University.
Conflict of interest
The authors report no conflicts of interest.
