Abstract
High-grade serous ovarian cancer (HGSOC) is a complex disease in which initiation and progression have been associated with copy number alterations, epigenetic processes, and, to a lesser extent, germline variation. We hypothesized that, when summarized at the gene level, tumor methylation and germline genetic variation, alone or in combination, influence tumor gene expression in HGSOC. We used the Elastic Net (ENET) penalized regression method to evaluate these associations, adjusting for somatic copy number, in 3 independent data sets comprising tumors from more than 470 patients. Penalized regression models of germline variation, with or without methylation, did not reveal a role in HGSOC gene expression. However, we observed a significant association between regional methylation and expression of 5 genes (WDPCP, KRT6C, BRCA2, EFCAB13, and ZNF283). CpGs retained in the ENET models for BRCA2 and ZNF283 appeared enriched in several regulatory elements, suggesting that regularized regression may provide a novel utility for integrative genomic analysis.
Introduction
Epithelial ovarian cancer remains a disease with high mortality1 due in part to late stage at diagnosis and a high frequency of resistance to chemotherapeutic agents.2–4 In high-grade serous ovarian cancer (HGSOC), the most common histotype (~70% of cases), genetic variation, aberrant gene expression, and changes in methylation in certain genes have been implicated in etiology.5–13 Integrating multiple layers of genomic information offers a potential means to clarify the genomic architecture of HGSOC. Agnostic, genome-wide, gene-based omics integration methods foster hypothesis generation that is unachievable with candidate gene approaches based on a single data type. In 2015, Pineda et al14,15 described an innovative agnostic approach to combine genetic variation, gene expression, copy number variation, and methylation using a flexible penalized regression method that allows for simultaneous agnostic dimension reduction and effect estimation. This methodology can help elucidate the genomic complexities characteristic of HGSOC by selecting only genomic features predictive of gene expression. To uncover genetic and epigenomic regulation of expression in HGSOC, with the hope of improving HGSOC risk prediction models and identifying potential therapeutic targets, we applied this form of integrated analysis to 3 independent data sets totaling more than 470 patients.
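For orientation, the elastic net estimate in its widely used glmnet parameterization can be written as below. The notation is introduced here for illustration and is not taken from the text: y_i is expression of a given gene in tumor i, x_i is the vector of regional CpG and/or SNP measures for that gene, λ ≥ 0 sets the overall penalty strength, and α in [0, 1] balances the lasso term (which sets coefficients to zero, giving feature selection) against the ridge term (which stabilizes estimates for correlated features). The copy number adjustment is omitted from this sketch.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
  + \lambda\left[\alpha\,\lVert\beta\rVert_1
  + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right]
```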
Materials and Methods
Eligible patients were women with a primary diagnosis of HGSOC in 3 previously described studies: The Cancer Genome Atlas (TCGA) project (N = 339),16 the Australian Ovarian Cancer Study (AOCS, N = 78),17 and the Mayo Clinic Ovarian Cancer Case-Control Study (N = 54).13 The Cancer Genome Atlas cases from the Mayo Clinic were analyzed only as part of the Mayo Clinic set. Fresh frozen primary tumors were used to derive gene expression, DNA methylation, and copy number variation data, and blood was used as the source of DNA for germline genotyping. Data for each type, as described in Supplemental Table 1, were processed separately for each study.16–20
From among 57 773 genes (Ensembl GTF 75, human genome build 19 [hg19], GRCh37), we restricted analysis to 22 275 protein-coding genes. Quality control steps excluded genes with low expression, low expression variance, or extremely high expression values, resulting in 9727 genes available in all 3 data sets (Supplemental Figure 2). For each gene, we defined regional gene CpG and single-nucleotide polymorphism (SNP) sites as those residing within 500 kb upstream and downstream of the annotated start and stop positions, respectively. We analyzed data sets sequentially by sample size, starting with TCGA, followed by AOCS and then Mayo Clinic, evaluating only significant genes/models in subsequent data sets. In TCGA, we fit 3 models using ENET penalized regression as implemented in the R package “glmnet” (v2.0-4)21: methylation only, germline genotype only, and methylation combined with germline genotype (Supplemental Figure 1). Gene expression was the outcome variable in all 3 models; all analyses adjusted for copy number, which was estimated at the Ensembl coordinate midpoint of each gene. A detailed description of ENET parameters, cross-validation, and derivation of unadjusted and adjusted P values is given in the Supplemental Materials.
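The kind of penalized fit described above can be illustrated with a minimal sketch. The study used the R package glmnet; the Python code below is not that implementation, only a toy cyclic coordinate-descent solver for an elastic net objective of the same form. The function names (`soft_threshold`, `elastic_net`), the fixed `lam` and `alpha` values, and the toy data are assumptions made for illustration; cross-validation, P value derivation, and the copy number adjustment are omitted.

```python
# Toy elastic net solver (cyclic coordinate descent), standard library only.
# Assumptions for this sketch: y is centered, columns of X are centered,
# and lam/alpha are fixed by hand rather than chosen by cross-validation.

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_sweeps=500, tol=1e-10):
    """Minimize (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    n, p = len(y), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ms = [sum(v * v for v in c) / n for c in cols]  # mean square per column
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            c = cols[j]
            # Covariance of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = sum(c[i] * resid[i] for i in range(n)) / n + col_ms[j] * beta[j]
            denom = col_ms[j] + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= c[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data: expression depends only on the first feature (think of one
# informative CpG and one uninformative CpG; values are made up).
X = [[-1, 1], [-1, -1], [0, 1], [0, -1], [1, 1], [1, -1]]
y = [-2, -2, 0, 0, 2, 2]

coefs = elastic_net(X, y, lam=0.5, alpha=1.0)  # alpha=1 is the lasso limit
print([round(b, 4) for b in coefs])  # [1.25, 0.0]
retained = [j for j, b in enumerate(coefs) if b != 0.0]
print(retained)  # [0]
```

Features with nonzero coefficients play the role of the "retained" CpGs or SNPs in a gene's model; the uninformative feature is dropped, and the informative one is shrunk from its least-squares value of 2 to 1.25.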
Results
In TCGA analyses, results from the methylation-only model were significant after multiple testing correction, whereas germline variation (alone or in combination with DNA methylation) was not associated with tumor gene expression. In particular, methylation at 11 genes in TCGA data was associated with gene expression at a P value of <.05 after multiple testing correction (Table 1). In the AOCS data set, methylation at 8 of these 11 genes was associated with expression (uncorrected P value of <.1; Table 1), and, in the Mayo Clinic set, methylation at a subset of 5 genes was associated with expression (WDPCP, KRT6C, BRCA2, EFCAB13, ZNF283; uncorrected P value of <.1; Table 1; Supplemental Figure). The specific CpGs retained in the methylation model differed between AOCS and Mayo Clinic data sets, likely due to strong correlation among CpGs within each gene.
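The sequential design, in which only genes passing one data set are carried forward to the next, can be sketched as a simple staged filter. The thresholds follow the text (adjusted P < .05 in TCGA, uncorrected P < .10 in AOCS and Mayo Clinic); the function name `staged_replication`, the gene labels, and all P values below are hypothetical placeholders, not results from the study.

```python
# Staged replication filter: each stage tests only genes that passed
# every earlier stage. All gene labels and P values are invented.

def staged_replication(stages):
    """stages: list of (name, {gene: p_value}, threshold) in testing order.

    Returns {name: sorted list of genes with p < threshold at that stage}.
    """
    surviving = None  # None means "no earlier stage yet": test every gene
    passed = {}
    for name, pvals, threshold in stages:
        if surviving is None:
            candidates = list(pvals)
        else:
            candidates = [g for g in surviving if g in pvals]
        surviving = sorted(g for g in candidates if pvals[g] < threshold)
        passed[name] = surviving
    return passed


stages = [
    # (data set, {gene: P value}, significance threshold)
    ("TCGA", {"GENE_A": 0.01, "GENE_B": 0.03, "GENE_C": 0.20}, 0.05),
    ("AOCS", {"GENE_A": 0.04, "GENE_B": 0.50, "GENE_C": 0.01}, 0.10),
    ("Mayo", {"GENE_A": 0.08, "GENE_B": 0.01}, 0.10),
]

result = staged_replication(stages)
for name, genes in result.items():
    print(name, genes)
```

In this toy run GENE_C is never tested in AOCS despite its small P value there, because it did not pass the TCGA stage. The same logic is why the number of tested genes shrinks from one data set to the next.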
Table 1. Penalized regression (ENET) model(a) of gene-based DNA methylation association with gene expression in TCGA, AOCS, and Mayo Clinic data sets of high-grade serous ovarian cancer.
Abbreviations: AOCS, Australian Ovarian Cancer Study; ENET, Elastic Net; TCGA, The Cancer Genome Atlas.
(a) Adjusted for gene copy number.
P values adjusted for multiple testing. Only results with <.05 P value tested in AOCS data set.
P values not adjusted for multiple testing. Only results with <.10 P value tested in Mayo Clinic data set.
P values not adjusted for multiple testing.
— indicates not tested.
For the 5 genes showing association between regional methylation and gene expression in all 3 data sets, we examined regional regulatory features of retained CpGs in the AOCS and Mayo Clinic data sets, both of which used the Illumina 450k methylation array. Enrichment was assessed by comparing the distribution of retained and unretained CpGs in each model using a Fisher exact test for 4 genomic regulatory features: predicted enhancer elements and experimentally determined DNase I hypersensitivity sites, both determined by the ENCODE project; UCSC-defined CpG island features; and UCSC-defined gene region features. We observed no striking patterns across the 4 genomic regulatory features. However, CpGs retained in the ZNF283 methylation model were more likely to be located in the north shelf of CpG islands than unretained CpGs (11% vs 4%) in the Mayo Clinic data set (uncorrected P = .01, Supplemental Table 2). Also of note, CpGs retained in the BRCA2 methylation model were more likely to be located in the body of the gene than unretained CpGs (49% vs 30%, uncorrected P = .04) in the AOCS data set (Figure 1, Supplemental Table 2); the Mayo Clinic data set showed the same direction of effect (40% gene body vs 35% other gene locations), although the difference was not statistically significant.

Figure 1. Distribution of BRCA2 CpG locations by retention status in ENET methylation models.
Discussion
As variation in gene expression is affected by a complex network of correlated genetic and methylation variants, the ENET methodological approach to high-dimensional data expands our understanding of gene expression regulation in HGSOC. Combined with a gene-centric, genome-wide approach that summarizes the effect of multiple CpGs on the expression of each gene, thereby reducing the multiple testing burden, ENET modeling agnostically selects the most predictive CpGs and SNPs while accounting for their correlation structure. Although variation in genetics and in gene expression at several genes detected in our study (BRCA2, KRT6C, ZNF283) has been associated with risk of onset, recurrence, and chemoresistance in ovarian and breast cancers,17,22–24 agnostically evaluated gene-level methylation in these genes has not been previously reported to affect expression in HGSOC. We did not find a role for germline genetic variation, alone or jointly with DNA methylation, in altering gene expression in HGSOC, in contrast to the application of ENET in other cancers25 and the results of other single-variant expression quantitative trait loci methods in HGSOC.20,26 We cannot rule out small associations that could not be detected with our modest-sized data sets. To our knowledge, this is the first study to interrogate all protein-coding genes genome-wide for the impact of methylation on gene expression in HGSOC using the Elastic Net regularized regression method. We showed that DNA methylation in 5 genes was associated with gene expression in HGSOC. This method of genome-wide data integration has the potential to improve clinical risk prediction models and reveal novel therapeutic targets in HGSOC.
Footnotes
Acknowledgements
The authors thank Drs Jonathon Tyrer and Paul PD Pharoah for data sharing.
Author Contributions
YN, ME, and CW conceived and designed the experiments. YN, ME, CW, SMA, and MCL analyzed the data. YN, ME, and JMC wrote the first draft of the manuscript. YN, ME, JMC, SMA, and ELG contributed to the writing of the manuscript. YN, JMC, KRK, CW, SMA, MCL, SJW, and ELG agree with manuscript results and conclusions. YN, ME, JMC, SMA, SJW, and ELG jointly developed the structure and arguments for the paper. YN, ME, JMC, SMA, DWG, BLF, SJW, and ELG made critical revisions and approved final version. All authors reviewed and approved of the final manuscript.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by the National Institutes of Health grant, R25 CA92049 (Mayo Cancer Genetic Epidemiology Training Program).
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
