Abstract
Elucidating the functional mechanisms underlying most of the phenome-genome associations uncovered by genome-wide association studies remains a challenging problem. Deep neural networks, which excel at learning features from sequential data, have recently emerged as a promising approach to this challenge because they can map sequence patterns in DNA to functional genomic events. Despite impressive progress, existing studies have largely been limited to one type of network architecture: simple stacks of convolutional layers whose filters share a uniform size. Such networks do not account for the specifics of mapping DNA sequences to functional genomic events, which impairs their learning efficiency. To address this problem, we propose the efficient DNA sequence learner (EDSL), a novel biologically informed architecture that (1) introduces filters of varying sizes in the first convolutional layer to enhance the learning of sequence patterns of diverse sizes and (2) uses dense connections so that sequence patterns at varying levels can participate in prediction. Results on both synthetic data and a dataset of 367 experimentally derived functional genomic profiles demonstrate the effectiveness of the proposed design choices and the superiority of the EDSL over existing networks in both prediction performance and sequence pattern learning. Moreover, our ablation study indicates that both design choices enhance learning and, importantly, that they do so in a differential and complementary manner.
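To make the two design choices concrete, the following is a minimal, untrained sketch in plain Python. It is not the authors' implementation. The filter widths (3, 7, 11), the number of filters per width, the number of dense layers, the growth rate, and the random weights are all illustrative assumptions, and every function name is hypothetical. It shows only how a first layer with filters of several widths and a densely connected stack might be wired together on a one-hot-encoded DNA sequence.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_same(x, kernel):
    """1-D convolution with zero padding followed by ReLU.

    x is an L x C feature map, kernel is a k x C weight matrix (k odd),
    and the result is a list of L non-negative activations.
    """
    k, half = len(kernel), len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j in range(k):
            pos = i + j - half
            if 0 <= pos < len(x):  # positions outside the sequence count as zero
                s += sum(w * v for w, v in zip(kernel[j], x[pos]))
        out.append(max(s, 0.0))
    return out

def random_kernel(width, channels, rng):
    """Hypothetical stand-in for learned weights: uniform random values."""
    return [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(width)]

def apply_filters(x, kernels):
    """Apply each kernel to x and return an L x len(kernels) feature map."""
    columns = [conv1d_same(x, kern) for kern in kernels]
    return [list(row) for row in zip(*columns)]

def concat(a, b):
    """Concatenate two L x C feature maps along the channel axis."""
    return [ra + rb for ra, rb in zip(a, b)]

def forward(seq, widths=(3, 7, 11), filters_per_width=2,
            dense_layers=2, growth=3, seed=0):
    rng = random.Random(seed)
    x = one_hot(seq)
    # Design choice (1): the first layer mixes filters of several widths,
    # so short and long sequence patterns are scanned in parallel.
    first = [random_kernel(w, 4, rng)
             for w in widths for _ in range(filters_per_width)]
    features = apply_filters(x, first)
    # Design choice (2): dense connections. Each layer sees the concatenation
    # of all earlier feature maps, and its output is appended to that stack,
    # so low-level and high-level patterns both reach the final predictor.
    for _ in range(dense_layers):
        channels = len(features[0])
        kernels = [random_kernel(3, channels, rng) for _ in range(growth)]
        features = concat(features, apply_filters(features, kernels))
    return features

feature_map = forward("ACGTACGTACGT")
print(len(feature_map), len(feature_map[0]))
```

With the assumed defaults, the feature map keeps one row per input base, and its channel count is the number of first-layer filters plus the growth rate times the number of dense layers. A real model would learn the weights and add pooling and an output layer for the functional genomic profiles.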
