Abstract
Controlled intranuclear organization of proteins is critical for sustaining correct cell function. Proteins and RNA are transported by passive diffusion and associate with nuclear compartments through diverse molecular interactions, which makes data-driven model building challenging. A growing inventory of proteins with known intranuclear destinations and the proliferation of molecular interaction data motivate an integrative method that leverages existing evidence to build accurate models of intranuclear trafficking. Kernel canonical correlation analysis (KCCA) enables the construction of predictors that operate on genomic sequence data while leveraging other knowledge sources during training. Specifically, the approach induces the protein sequence features and relations most pertinent to recovering nucleolar-associated protein-protein interactions. With success rates of about 78%, classification of nucleolar association from KCCA-induced features surpasses that of baseline approaches. We observe that combining protein-protein interaction data with sequence data improves the prediction of highly interconnected, key ribosomal and RNA-related nucleolar proteins. For supplementary material, see
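As a rough illustration of how KCCA can use a second data view during training while the resulting predictor needs only sequence data, the following is a minimal sketch and not the authors' implementation. It assumes a regularized KCCA formulation in which, with R = (K + κI)⁻¹K for each view, the canonical correlations are the singular values of Rx·Ry. The data are synthetic: a Gaussian (RBF) kernel on random vectors stands in for a sequence kernel, and a linear kernel on random vectors stands in for protein-protein interaction profiles. All helper names (`rbf_kernel`, `center_train`, `center_test`, `kcca`), the regularizer `kappa`, and the toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def center_train(K):
    """Center a square training kernel in feature space: H K H with H = I - 1/n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def center_test(K_test, K_train):
    """Center test-vs-train kernel rows using the uncentered training kernel."""
    n, m = K_train.shape[0], K_test.shape[0]
    one_mn = np.full((m, n), 1.0 / n)
    one_nn = np.full((n, n), 1.0 / n)
    return K_test - one_mn @ K_train - K_test @ one_nn + one_mn @ K_train @ one_nn


def kcca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized KCCA on centered n x n kernels Kx (sequence) and Ky (interactions).

    Rx = (Kx + kappa I)^-1 Kx and Ry = (Ky + kappa I)^-1 Ky; the canonical
    correlations are the singular values of Rx @ Ry (all < 1 when kappa > 0).
    Returns (correlations, alpha, beta), dual coefficients stored as columns.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Rx = np.linalg.solve(Kx + kappa * I, Kx)
    Ry = np.linalg.solve(Ky + kappa * I, Ky)
    U, s, Vt = np.linalg.svd(Rx @ Ry)
    alpha = np.linalg.solve(Kx + kappa * I, U[:, :n_components])
    beta = np.linalg.solve(Ky + kappa * I, Vt[:n_components].T)
    return s[:n_components], alpha, beta


# Hypothetical toy data: one latent signal shared by both views, plus noise.
rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=(n, 1))
X_seq = np.hstack([latent + 0.3 * rng.normal(size=(n, 1)), rng.normal(size=(n, 4))])
Y_ppi = np.hstack([latent + 0.3 * rng.normal(size=(n, 1)), rng.normal(size=(n, 3))])

Kx_raw = rbf_kernel(X_seq, X_seq)   # stand-in "sequence" kernel
Ky_raw = Y_ppi @ Y_ppi.T            # stand-in linear "interaction" kernel
Kx, Ky = center_train(Kx_raw), center_train(Ky_raw)
rho, alpha, beta = kcca(Kx, Ky, kappa=0.1)

# Both views shaped alpha during training, but projecting proteins needs only
# their sequence kernel rows against the training proteins.
train_features = Kx @ alpha
X_new = rng.normal(size=(5, 5))     # new proteins: sequence view only
new_features = center_test(rbf_kernel(X_new, X_seq), Kx_raw) @ alpha
```

In a setting like the one described, `train_features` and `new_features` would then feed an ordinary classifier for nucleolar association; that downstream step is omitted here.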
