Abstract
Lung cancer is among the major causes of cancer deaths, and the survival rate of lung cancer patients is extremely low. Recent studies have demonstrated that the gene CDKN3 is related to neoplasia, but the literature is sharply divided over whether it promotes cancer progression or, conversely, inhibits tumor growth. In this study, we investigated the expression of CDKN3 and its association with prognosis in lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC) using datasets in Lung Cancer Explorer (LCE; http://qbrc.swmed.edu/lce/). We found that CDKN3 was up-regulated in ADC and SCC compared to normal tissues. We also found that CDKN3 was expressed at a higher level in SCC than in ADC, which was further validated through meta-analysis (coefficient = 2.09, 95% CI = 1.50–2.67,
Introduction
Lung cancer is one of the most prominent types of cancer, both in the US and around the world. For the past two decades, it has been the second-most diagnosed type of cancer and the leading cause of cancer death in the US for both males and females. 1 On a global scale, the GLOBOCAN project of the International Agency for Research on Cancer (IARC) reported in 2012 that lung cancer had the highest incidence and mortality among all cancer types. 2 Despite massive research efforts, the 5-year survival rate of only 16.8% remains far lower than that of most other cancers. 3 Identifying effective prognostic factors is therefore critical, so that patients can be stratified by expected outcome and risk, and more appropriate treatments can be planned to improve survival. Because lung cancer subtypes tend to differ in their biology,4–7 one plausible route to better treatments lies in investigating prognostic markers in a subtype-specific manner.
Deregulated proliferative signaling is a keystone of cancer development. 8 Under normal conditions, the cell cycle is tightly controlled by three major checkpoints: the G1/S checkpoint, the G2/M checkpoint and the M checkpoint. The G1/S checkpoint, also called the restriction point, determines whether the cell enters a new round of DNA replication and cell division; once past this point, the cell is committed to completing the entire cell cycle. 9 The restriction point is therefore the checkpoint most critical to the proliferation rate. Components of G1/S transition signaling are frequently mutated in cancers, 10 which makes this pathway a hotspot for the discovery of drug targets and prognostic markers in various cancers, including lung, bladder and liver cancer.11–15
The CDKN3 gene, mapped to chromosome 14q22, 16 encodes a dual-specificity phosphatase acting at the G1/S transition; it interacts with Cdk2, dephosphorylates it at threonine 160 when the cyclin is dissociated or degraded, and thereby prevents its cyclin-dependent kinase activity.17–19 When CDKN3 is overexpressed in yeast or HeLa cells, cell cycle progression is retarded, 17 suggesting that CDKN3 is a negative regulator of cell proliferation. It has also been shown that CDKN3 is down-regulated in human glioblastoma 20 and that CDKN3 knock-down facilitates leukemia xenograft growth in mice. 21 Paradoxically, however, a larger body of evidence points to the opposite scenario: CDKN3 is overexpressed in hepatocellular carcinoma, cervical cancer and epithelial ovarian cancer.22–24 Overexpression of CDKN3 promotes cell proliferation in renal cancer cells and hepatocellular carcinoma cells,22,25 and CDKN3 depletion inhibits the growth of ovarian cancer cell lines, 24 which makes CDKN3 a potential therapeutic target. In addition to increasing cell proliferation, CDKN3 overexpression also renders renal cancer cells more tumorigenic, more invasive and more resistant to apoptosis. 25 Moreover, CDKN3 expression has been associated with poor survival in cervical cancer and may be a potential prognostic marker. 23 Therefore, whether CDKN3 is tumor-suppressive or oncogenic probably depends on the molecular context of each cancer type, which is still far from fully understood.
Non-small-cell lung cancer (NSCLC) is the most common cause of lung cancer death, accounting for up to 85% of deaths from lung cancer. Two subtypes, lung adenocarcinoma (ADC) and squamous cell carcinoma (SCC), account for 70% of NSCLC. To investigate whether CDKN3 is a potential prognostic marker in ADC and/or SCC, we examined CDKN3 differential expression and its association with patient outcomes in the clinical studies collected in our lung cancer database. We found that CDKN3 is differentially expressed among normal, ADC and SCC samples. Subsequent meta-analyses substantiated its differential expression between ADC and SCC and indicated that CDKN3 expression is a robust marker of poor prognosis in ADC patients but not in SCC patients.
Methods
Data Source and Preprocessing
The datasets included in the analysis were selected from our group's Lung Cancer Database (LCDB), which supports Lung Cancer Explorer (LCE; http://qbrc.swmed.edu/lce/). LCDB stores datasets selected from over sixty lung cancer gene expression microarray datasets in the public domain (GEO, ArrayExpress, etc.), as well as private data from the University of Texas Southwestern Medical Center. Thirty-seven datasets satisfying the following criteria were collected into LCDB: (1) clinical outcomes were available along with gene expression profiles; and (2) the genome-wide platform either had clear probe annotations or provided probe sequences that could be mapped by BLAST alignment in Probemapper. 26 When raw data were not available, data processed by the original authors were used; otherwise, data processed by our group from the raw data were used. The gene expression profile of each sample was log2-transformed and standardized to have zero median and unit variance.
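As a minimal Python sketch of the per-sample normalization described above (not the authors' R code), assume that raw values are positive intensities unless they are flagged as already on the log2 scale, and that "unit variance" refers to the sample variance across genes within one sample. The function name `standardize_sample` is hypothetical.

```python
import math
import statistics


def standardize_sample(values, already_log2=False):
    """Normalize one sample's expression profile across genes.

    Assumption: raw intensities are positive; author-processed data that are
    already on the log2 scale can be passed with already_log2=True.
    """
    logged = list(values) if already_log2 else [math.log2(v) for v in values]
    med = statistics.median(logged)
    centered = [v - med for v in logged]  # zero median
    sd = statistics.stdev(centered)       # sample SD; unaffected by the shift
    return [v / sd for v in centered]     # unit variance


raw_profile = [120.0, 85.0, 2300.0, 410.0, 60.0]
z = standardize_sample(raw_profile)
print([round(v, 3) for v in z])
```

Centering on the median rather than the mean keeps the location estimate robust to a few very highly expressed genes.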
Probe-level data were aggregated to the gene level in two steps: (1) Probes were mapped to genes using one of three probe-mapping information sources, in order of decreasing priority: BLAST alignment in Probemapper (weight = 1), the platform vendor's annotation, and Bioconductor. The highest-priority source available was used. (2) When multiple probes mapped to the same gene, the arithmetic mean of their probe-level expression values was used as the expression level of that gene.
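The two aggregation steps can be sketched in Python as follows. The source labels (`probemapper_blast`, `vendor`, `bioconductor`) and function names are hypothetical; the sketch assumes the highest-priority available source is chosen once per platform rather than per probe, and it does not model the "weight" mentioned above.

```python
import statistics

# Hypothetical labels for the three annotation sources, highest priority first.
SOURCE_PRIORITY = ("probemapper_blast", "vendor", "bioconductor")


def choose_mapping(mappings_by_source):
    """Return the probe->gene mapping from the highest-priority available source."""
    for source in SOURCE_PRIORITY:
        mapping = mappings_by_source.get(source)
        if mapping:
            return mapping
    raise ValueError("no probe annotation source available")


def aggregate_to_genes(probe_expr, probe_to_gene):
    """Average the values of all probes mapped to the same gene; drop unmapped probes."""
    per_gene = {}
    for probe, value in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            per_gene.setdefault(gene, []).append(value)
    return {gene: statistics.mean(vals) for gene, vals in per_gene.items()}


sources = {  # Probemapper output is missing for this toy platform
    "vendor": {"p1": "CDKN3", "p2": "CDKN3", "p3": "TP53"},
    "bioconductor": {"p1": "CDKN3"},
}
mapping = choose_mapping(sources)
gene_expr = aggregate_to_genes({"p1": 1.0, "p2": 2.0, "p3": -0.5, "p4": 3.0}, mapping)
print(gene_expr)
```

Here the vendor annotation is used because the Probemapper mapping is absent, and the two CDKN3 probes are averaged.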
Occasionally a patient had multiple tumor or normal samples. In such cases, the patient's tumor and normal expression levels were each computed as the average over the corresponding samples.
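A sketch of this patient-level averaging, assuming each sample is supplied as a (patient, condition, value) tuple for a single gene; the function name is hypothetical.

```python
import statistics


def average_by_patient(samples):
    """samples: iterable of (patient_id, condition, value) for one gene, where
    condition is 'tumor' or 'normal'. Returns {(patient_id, condition): mean}."""
    grouped = {}
    for patient, condition, value in samples:
        grouped.setdefault((patient, condition), []).append(value)
    return {key: statistics.mean(vals) for key, vals in grouped.items()}


samples = [
    ("P1", "tumor", 2.0), ("P1", "tumor", 4.0),  # two tumor samples for P1
    ("P1", "normal", -1.0),
    ("P2", "tumor", 0.5),
]
patient_expr = average_by_patient(samples)
print(patient_expr)
```

Keying on (patient, condition) keeps the tumor and normal averages separate, as the text requires.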
Selection of Datasets in Statistical Analyses
Datasets with patients annotated as ADC or SCC were selected from LCDB, and datasets without CDKN3 expression information were excluded. The included datasets were further filtered according to the criteria for each specific analysis (Table 1). Patient characteristics of each dataset are summarized in Table S1. For the meta-analysis of CDKN3 differential expression, datasets with both ADC and SCC patients were included. For the meta-analyses of CDKN3 hazard ratios (HR) in ADC or SCC, datasets containing ADC or SCC patients with overall survival time and vital status were included. When comparing CDKN3 expression levels within a single dataset, we required the sample size of each group (ADC, SCC, normal) to be larger than 20 to ensure robust results.
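The per-analysis filters above can be expressed as a small eligibility check. This is an illustrative sketch only: the `Dataset` fields and analysis labels are hypothetical, and it does not capture every criterion in Table 1 (for example, the requirement that both CDKN3 groups contain deceased patients before a Cox model can be fitted is not checked here).

```python
from dataclasses import dataclass


@dataclass
class Dataset:  # hypothetical summary of one LCDB dataset
    name: str
    n_adc: int
    n_scc: int
    n_normal: int
    has_cdkn3: bool
    has_survival: bool  # overall survival time and vital status recorded


def eligible_analyses(ds):
    """Return the analyses a dataset qualifies for under the criteria above."""
    if not ds.has_cdkn3 or (ds.n_adc == 0 and ds.n_scc == 0):
        return set()
    analyses = set()
    if ds.n_adc > 0 and ds.n_scc > 0:
        analyses.add("ADC vs SCC")
    if ds.has_survival and ds.n_adc > 0:
        analyses.add("ADC survival")
    if ds.has_survival and ds.n_scc > 0:
        analyses.add("SCC survival")
    if min(ds.n_adc, ds.n_scc, ds.n_normal) > 20:
        analyses.add("normal vs ADC vs SCC")
    return analyses


print(eligible_analyses(Dataset("toy", 30, 25, 22, True, True)))
```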
List of candidate datasets for the analyses.
Survival data for the corresponding cancer subtype exist, but either the high-CDKN3 or the low-CDKN3 group has no deceased patients, so the Cox proportional hazards model is not applicable.
More than 20 patients have normal samples.
Only one patient has survival time information.
Statistical Methods
All statistical analyses were performed in R (http://cran.us.r-project.org/). Cox proportional hazards models were fitted with the R package “survival”, and meta-analyses were conducted with the R package “meta”. For the meta-analysis of CDKN3 differential expression, a logistic regression of cancer subtype (SCC vs. ADC) on CDKN3 expression was fitted for each eligible dataset, and the regression coefficients were then pooled by meta-analysis. Similarly, the associations between CDKN3 expression and age, cancer stage and gender were investigated by combining linear regression coefficients or mean differences via meta-analysis (Figs. S8–S10).
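The per-dataset logistic regression can be illustrated with a self-contained Newton-Raphson fit. This sketch stands in for R's `glm(..., family = binomial)` and is not the authors' code; it assumes SCC is coded 1 and ADC 0, returns the slope on CDKN3 expression with its Wald standard error (the quantities pooled by meta-analysis), and does not handle complete separation.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Score (g0, g1) and information matrix [[a, b], [b, c]] of the
    logistic log-likelihood at (b0, b1)."""
    g0 = g1 = a = b = c = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        a += w
        b += w * xi
        c += w * xi * xi
    return g0, g1, a, b, c


def logistic_slope(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(SCC) = b0 + b1 * CDKN3 by Newton-Raphson.

    x: CDKN3 expression per patient; y: 1 for SCC, 0 for ADC.
    Returns (b1, Wald SE of b1). Assumes no complete separation.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, a, b, c = _score_and_information(b0, b1, x, y)
        det = a * c - b * b
        # Newton step: solve [[a, b], [b, c]] @ (d0, d1) = (g0, g1)
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    _, _, a, b, c = _score_and_information(b0, b1, x, y)
    return b1, math.sqrt(a / (a * c - b * b))  # [inverse information]_11


# Toy data with a binary predictor so the result can be checked by hand.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 0, 0, 1, 1, 1, 0]
coef, se = logistic_slope(x, y)
print(round(coef, 4), round(se, 4))
```

With a binary predictor the fitted slope equals the log odds ratio of the corresponding 2×2 table (here log 9), which makes a quick sanity check; with continuous CDKN3 expression the same routine returns the per-unit log odds of SCC.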
The prognostic value of CDKN3 was assessed with Cox proportional hazards models for ADC and SCC patients separately. Within each dataset, the median (or mean) CDKN3 expression among ADC or SCC patients was used as a cutoff to divide the patients into high-CDKN3 and low-CDKN3 groups. A Cox proportional hazards model was then fitted to estimate the HR of the high-CDKN3 group vs. the low-CDKN3 group and the corresponding 95% confidence interval (CI). Finally, the estimates of log(HR) from different datasets were combined by meta-analysis.
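The dichotomization and a Cox fit for a single binary covariate can be sketched as below, maximizing the partial likelihood by Newton-Raphson with the Breslow treatment of tied death times. This is an illustration, not the R `survival` package's `coxph` that the authors used (`coxph` defaults to the Efron method for ties, so results can differ slightly when death times are tied). Function names are hypothetical, and patients exactly at the cutoff are assigned to the low group as an assumption. The guard mirrors the Table 1 note: when one group has no deaths, the HR estimate diverges and the model is not applicable.

```python
import math
import statistics


def dichotomize(expr, use_mean=False):
    """1 = high CDKN3 (above the cutoff), 0 = low; ties with the cutoff go low."""
    cut = statistics.mean(expr) if use_mean else statistics.median(expr)
    return [1 if v > cut else 0 for v in expr]


def _cox_score_information(beta, times, events, groups):
    """Score and information of the Breslow partial log-likelihood for one
    binary covariate (so x**2 == x)."""
    score = info = 0.0
    for ti, di, xi in zip(times, events, groups):
        if not di:
            continue
        s0 = s1 = 0.0
        for tj, xj in zip(times, groups):
            if tj >= ti:  # patient j is still at risk at this death time
                r = math.exp(beta * xj)
                s0 += r
                s1 += xj * r
        m = s1 / s0
        score += xi - m
        info += m - m * m
    return score, info


def cox_log_hr(times, events, groups, max_iter=50, tol=1e-10):
    """Return (log HR of high vs. low CDKN3, SE). events: 1 = died, 0 = censored."""
    for g in (0, 1):
        if not any(e and x == g for e, x in zip(events, groups)):
            raise ValueError("a CDKN3 group has no deaths; Cox model not applicable")
    beta = 0.0
    for _ in range(max_iter):
        score, info = _cox_score_information(beta, times, events, groups)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _cox_score_information(beta, times, events, groups)
    return beta, 1.0 / math.sqrt(info)


expr = [2.1, -0.3, 1.5, 0.2]
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 1]
groups = dichotomize(expr)
log_hr, se = cox_log_hr(times, events, groups)
print(math.exp(log_hr), math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se))
```

Returning log(HR) and its standard error, rather than the HR itself, gives exactly the per-dataset inputs that the meta-analysis combines.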
Pooled values of logistic/linear regression coefficients, log(HR) or mean differences were calculated using the inverse-variance fixed effect model 27 or the DerSimonian-Laird random effects model.28,29 The fixed effect model assumes a common true effect for all studies in the meta-analysis, whereas the random effects model assumes that the true effect of each study is a normally distributed random variable. When inter-study heterogeneity is negligible, the fixed effect model is preferred for its higher statistical power. When the effects vary non-negligibly between studies, the fixed effect model inflates the type I error rate, so the random effects model should be used instead. 30
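Both pooling models, together with Cochran's Q and I², can be sketched in a few lines. This is a minimal standard-library illustration, not the R `meta` package: it takes per-study estimates (log(HR) or regression coefficients) with their standard errors and computes both models, leaving the choice between them to the heterogeneity rule described in the text. The p-value of Cochran's Q test, which would come from a chi-square distribution with k − 1 degrees of freedom, is omitted.

```python
import math


def pool(effects, ses):
    """Inverse-variance fixed effect and DerSimonian-Laird random effects pooling.

    effects: per-study estimates (e.g. log HR); ses: their standard errors.
    """
    k = len(effects)
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "fixed_se": math.sqrt(1.0 / sw),
        "random": random_est, "random_se": math.sqrt(1.0 / sum(w_re)),
        "Q": q, "I2": i2, "tau2": tau2,
    }


res = pool([0.0, 2.0], [1.0, 1.0])
ci = (res["random"] - 1.96 * res["random_se"], res["random"] + 1.96 * res["random_se"])
print(res, ci)
```

When tau² is estimated as zero, the random effects weights reduce to the fixed effect weights and both models give the same pooled estimate.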
The choice between the fixed effect and random effects models was based on testing between-study heterogeneity with the commonly used I² statistic and Cochran's Q. 31 If I² was higher than 50% and the
Results
CDKN3 is Differentially Expressed among Normal, ADC and SCC Tissues
Given the contradictory observations on whether CDKN3 tends to be up-regulated or down-regulated in tumors, and the possibility that its behavior is context-dependent, we compared the expression levels of CDKN3 among three tissue types: normal, ADC and SCC. We searched the preliminarily filtered datasets for those with sample size >20 for each of the three types, and the study by Hou et al. 34 was selected (Table 1, Y† in column “ADC >20 & SCC >20”). In this dataset, both ADC and SCC had significantly higher CDKN3 expression levels than normal lung tissue (Fig. 1), suggesting that CDKN3 is up-regulated in ADC and SCC. Intriguingly, there was also a marked difference in CDKN3 expression between ADC and SCC, with SCC significantly higher than ADC.

Box plots of CDKN3 expression levels in normal tissues, ADC samples and SCC samples. Comparison among the three groups was made with a commonly used parametric method, one-way ANOVA with Tukey's test, as well as a nonparametric method, the Kruskal-Wallis test (with pairwise Wilcoxon-Mann-Whitney tests).
Differential Expression of CDKN3 between ADC and SCC is Corroborated by Meta-Analysis
To confirm that the higher expression of CDKN3 in SCC compared to ADC was generalizable, we performed a meta-analysis of CDKN3 differential expression. Datasets with both SCC and ADC samples were used in this analysis (Table 1, column “ADC & SCC”). Here, I² = 61.9% > 50% and Cochran's Q-test

Forest plot displaying the results of the meta-analysis on CDKN3 differential expression, SCC vs. ADC. Coef > 0 indicates that CDKN3 is expressed at a higher level in SCC than in ADC.

Funnel plot for assessing publication bias. Begg's funnel plot and Egger's test show no significant publication bias regarding the meta-analysis for CDKN3 differential expression.

Box plots of CDKN3 expression levels in ADC samples and SCC samples. Comparison between the two groups was made by two-sample
CDKN3 Expression is Prognostic in ADC but not in SCC
Given the evident and consistent difference in CDKN3 expression between ADC and SCC across multiple studies, it is possible, though not certain, that the pathways connected to CDKN3 differ in status between ADC and SCC, which may in turn lead CDKN3 to behave quite differently in the two contexts. Under this general hypothesis, we performed two separate survival meta-analyses summarizing the HR of high vs. low CDKN3 expression (median as cutoff), one for ADC and the other for SCC. Datasets including ADC or SCC patients with available survival information are indicated by Y in the “ADC+survival” or “SCC+survival” column of Table 1. In the meta-analysis for ADC, I² was 0% and Cochran's test

Forest plot displaying the results of the meta-analysis on association between CDKN3 and ADC patient survival. HR >1 means high CDKN3 expression is associated with poor survival outcomes.

Funnel plot for assessing publication bias. Begg's funnel plot and Egger's test show no significant publication bias regarding the meta-analysis for the association between CDKN3 expression and ADC patient survival.
In SCC, the heterogeneity was also negligible (I² = 0% and Cochran's test

Forest plot displaying the results of the meta-analysis on association between CDKN3 and SCC patient survival. No significant association between CDKN3 expression and overall survival in SCC patients is observed.

Funnel plot for assessing publication bias. Begg's funnel plot and Egger's test show no significant publication bias regarding the meta-analysis for the association between CDKN3 expression and SCC patient survival.
The prognostic value of CDKN3 in ADC was also examined with Kaplan-Meier plots within individual datasets (Fig. S2 using the median and Fig. S5 using the mean as the cutoff). Seven and eight of the 12 datasets, respectively, did not show a significant survival difference between the two groups. Such inconsistency with the conclusion of the meta-analysis is expected, as a meta-analysis has much higher statistical power than any individual study. 35
Significant correlation between sample sizes and –log(
Discussion
In the present study, we compared CDKN3 expression in ADC and SCC to that in normal lung tissues and found that CDKN3 is up-regulated in these two subtypes of NSCLC, as already reported for several other cancers,22–24 but the mechanism of CDKN3 up-regulation is unclear. In prostate cancer, it was postulated that such up-regulation may be explained by chromosomal amplification, because 14q22 is amplified in the PC3 prostate cancer cell line.36,37 However, this may not be the case in ADC or SCC, since no significant chromosomal amplification was found at 14q22 by arrayCGH in SCC or by SNP array in ADC.38,39 In breast cancer cells treated with a glutaminase inhibitor, H3K4me3, the histone modification associated with active gene transcription, was decreased at several cancer-related genes, including CDKN3. 40 If high H3K4me3 could be shown to contribute to up-regulated CDKN3 expression and to be attenuated by glutaminase inhibition in lung cancer cells, it would strengthen the case for glutamine metabolism as a therapeutic target, given that several glutamine metabolism inhibitors are already in clinical trials for treating lung cancer. 41 Other potential upstream regulators of CDKN3 expression include nAChRs, 42 p21, 43 PPARγ, 44 DNMTs45,46 and YB-1, 47 but further work is needed to determine whether they regulate CDKN3 expression in NSCLC and whether such regulation is direct or indirect.
In the comparison of CDKN3 expression, we also observed its differential expression between ADC and SCC. Our meta-analysis further showed that SCC consistently expressed a higher level of CDKN3 than ADC. Motivated by the idea that ADC and SCC differ at the molecular level, so that each subtype may show distinct CDKN3-related behavior, we investigated the association between CDKN3 and overall survival in ADC and SCC patients separately, and the results suggest that CDKN3 is a marker of poor survival outcomes in ADC, but not in SCC.
To our knowledge, this is the first meta-analysis of the prognostic value of CDKN3. A common practice in meta-analyses is to search the literature for publications on a particular topic and summarize the published effects. One potential pitfall of this approach is that the results may be subject to publication bias. 48 For example, because significant positive results are more likely to be published, a meta-analysis of the published studies on the prognostic effect of a specific gene tends to be biased toward larger effects. In contrast, the studies in our lung cancer database were not selected for their findings: they were included in our meta-analysis as long as genome-wide microarray data of lung cancer were available, regardless of the specific topics the authors investigated and the conclusions they drew. Meta-analysis based on such genome-wide data increases the number of eligible studies and at the same time is less prone to publication bias, as can be seen in Figures 2B, 3B and 3D, and thus gives more reliable results than traditional meta-analysis. Nevertheless, our study has limitations. We did not use multivariate Cox models to investigate whether CDKN3 is independent of other prognostic indicators such as TNM stage and p53 mutations, 12 and further study is needed to determine whether the prognostic value of CDKN3 differs between patient subgroups defined by age, gender, ethnicity, smoking status, etc. A potential difficulty of taking more factors into account in survival analyses is that fewer studies will be eligible and each group will contain fewer patients, which makes the results less stable.
Instead, using the demographic and clinical variables available in LCDB, we performed meta-analyses of the association between CDKN3 expression and age, gender and cancer stage within ADC and SCC patients, and none of these analyses showed significant associations (Figs. S8–S10). Smoking status information is available for only three ADC datasets, so only comparisons within individual datasets were conducted (Fig. S11). Two of the three datasets showed positive associations between smoking and CDKN3 expression.
Because CDKN3 is expressed at a higher level in SCC than in ADC, a simple explanation for the difference in CDKN3's prognostic value between the two subtypes is that its expression in SCC patients is so high that its neoplastic effect is nearly saturated, whereas in ADC patients the lower overall expression leaves room for inter-patient variation in expression to have a larger effect on survival. If this were true, we would expect SCC patients to have generally worse overall survival than ADC patients. However, no significant difference in survival outcomes was observed between ADC and SCC (Fig. S12). For the three datasets (Matsuyama, Bild and Lee) with visible separation of survival curves between ADC patients with low and high CDKN3 expression, we compared the SCC patient group to the ADC groups with high and low CDKN3 expression (Fig. S13). In two of these three datasets (Matsuyama and Lee), the survival curve of the SCC group tends to coincide with that of the high-CDKN3 ADC group and to diverge, though not significantly, from that of the low-CDKN3 ADC group; this pattern is not observed in the Bild dataset, where the three curves appear visibly separated from one another. Still, we cannot exclude the possibility that CDKN3 acts differently in SCC than in ADC because of a unique molecular context in SCC. Having shown cytoplasmic expression of CDKN3 in renal cell carcinoma, Lai et al. 25 proposed that proliferation enhancement by CDKN3 could be due to its interaction with, and dephosphorylation of, unknown cytoplasmic substrates. If this hypothesis holds, different substrates and interaction partners in SCC and ADC may account for the different prognostic effects of CDKN3. Further study of the biology of CDKN3 under different pathological conditions is needed to characterize the dynamics of its interactome.
On the other hand, some authors have claimed that CDKN3 promotes oncogenesis through aberrant transcripts49–52; thus, it is also possible that SCC and ADC express different transcript variants or mutants of CDKN3.
Cyclin-dependent kinase inhibitors (CDKIs) are a family of proteins of great importance in cell cycle control. In addition to CDKN3, members of the CDKI family include CDKN1A, CDKN1B, CDKN1C, CDKN2A, CDKN2B, CDKN2C and CDKN2D. Although the vast majority of publications support their function as tumor suppressors, a few contrary observations also exist. For example, in thyroid cancer, CDKN2A is up-regulated in follicular adenomas, follicular carcinomas and papillary carcinomas compared to normal tissues. 53 Additionally, CDKN2D expression was found to be a negative prognostic factor in ovarian cancer patients. 54 Further investigation is needed to understand the complex functions of CDKIs in cancer.
In summary, CDKN3 was differentially expressed across normal tissues, ADC and SCC. Through meta-analysis, we identified CDKN3 as a marker of poor overall survival in ADC, suggesting that it may be valuable as a prognostic biomarker. Given that the effect of CDKN3 in SCC appears to differ from that in ADC, further research into its molecular mechanisms in various lung cancer subtypes is needed.
Author Contributions
Conceived and designed the experiments: XW, YX, GX, MC. Analyzed the data: XZ, YZ. Wrote the first draft of the manuscript: XZ, XW, YX, GX, MC. Contributed to the writing of the manuscript: YZ. Agree with manuscript results and conclusions: XZ, XW, YX, GX, YZ, MC. Jointly developed the structure and arguments for the paper: XZ, XW, YX. Made critical revisions and approved final version: GX, MC. All authors reviewed and approved the final manuscript.
Footnotes
Acknowledgement
We would like to thank Jessie Norris for helping us to proofread the manuscript.
