Abstract
We analyzed 198 chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) datasets and developed a methodology for identifying high-confidence enhancer and promoter regions from transcription factor (TF) ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Although the selected regions mark only ∼0.67% of the genome, 70.5% of our predicted binding regions fall within independently identified, strongly expression-correlated and histone-marked enhancer regions, which cover ∼8% of the genome (Ernst et al., Nature 2011, 473, 43–49). Even more remarkably, 85.6% of our selected regions overlap TF binding regions identified in evolutionarily conserved DNase1 hypersensitivity cluster regions, which cover 0.75% of the genome (Boyle et al., Genome Research 2011, 21, 456–464). P-values for these overlaps are effectively zero (Z-scores of 328 and 715, respectively). Furthermore, 62% of our selected regions overlap the intersection of the evolutionarily conserved DNase1 hypersensitivity-identified TF-binding regions of Boyle et al. (2011) with the histone-marked enhancers found to be strongly associated with transcriptional activity by Ernst et al. (2011). Two hundred thirty of our candidate cis-regulatory regions overlap cancer-associated variants reported in the Catalogue of Somatic Mutations in Cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/). We also identify 1,252 potential proximal promoters for the 7,561 disjoint lincRNA regions currently in the Human lincRNA Catalog (www.broadinstitute.org/genome_bio/human_lincrnas/). Our investigation used approximately half of the ENCODE ChIP-seq datasets currently available, suggesting that analysis of the full set is likely to yield further gains.
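The selection rule stated in the abstract (regions marked by binding peaks in 15 or more experiments) can be illustrated with a short sketch. The abstract does not say how regions are delimited, so the code below is an assumption rather than the authors' pipeline. It treats a high-confidence region as a maximal interval covered by peaks from at least `min_experiments` distinct experiments. The function names, the input layout, and the half-open coordinates are all hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def high_confidence_regions(peaks_by_experiment, min_experiments=15):
    """Return {chrom: [(start, end), ...]} covered by >= min_experiments experiments.

    peaks_by_experiment: {experiment_id: [(chrom, start, end), ...]}
    Coordinates are assumed half-open; this layout is illustrative only.
    """
    # deltas[chrom][pos] = net change in the number of covering experiments at pos
    deltas = defaultdict(lambda: defaultdict(int))
    for peaks in peaks_by_experiment.values():
        per_chrom = defaultdict(list)
        for chrom, start, end in peaks:
            per_chrom[chrom].append((start, end))
        # Merge within an experiment so it contributes at most 1 to the depth.
        for chrom, intervals in per_chrom.items():
            for start, end in merge_intervals(intervals):
                deltas[chrom][start] += 1
                deltas[chrom][end] -= 1

    regions = {}
    for chrom, chrom_deltas in deltas.items():
        depth = 0
        region_start = None
        found = []
        # Sweep left to right, tracking how many experiments cover each position.
        for pos in sorted(chrom_deltas):
            depth += chrom_deltas[pos]
            if depth >= min_experiments and region_start is None:
                region_start = pos
            elif depth < min_experiments and region_start is not None:
                found.append((region_start, pos))
                region_start = None
        if found:
            regions[chrom] = found
    return regions
```

With real data, each experiment's peak list would come from its peak-call file. The threshold of 15 is the one the abstract reports; how it was chosen is not described here.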
