Abstract
The reconstruction of ancestral non-coding RNA (ncRNA) sequences is particularly challenging because the main conservation forces act on the structure rather than on the sequence. Naively trying to preserve the structure during reconstruction tends to produce ancestors that are energetically better fitted to the structure than their descendants, a clear contradiction. While most sequences are associated with only one functional structure, RNA families have an old and complex history. It has been hypothesized that some ancestral RNAs combined multiple functions through multistable structures and that, following a duplication event, each copy subspecialized into a specific structure. To circumvent the bias introduced by reconstructing sequences when only one structure is conserved, we recently proposed an approach, based on substitution and base-pair costs, that simultaneously reconstructs the ancestor of two related ncRNA families under the assumption that they arose through this process of duplication followed by subspecialization. In this work, we improve on the previous approach by leveraging advances in tree decomposition algorithms to (1) incorporate more constraints and positions simultaneously in the reconstruction, which (2) allows the use of a more realistic energy model. Results on simulated datasets demonstrate significant improvements in the accuracy of ancestral sequence inference while reducing the number of optimal sequences inferred by several orders of magnitude. On real datasets of Rfam clans (Glm and FinP-traJ), we show that the new approach infers fewer optimal ancestral sequences than previous methods, and that these sequences are better fitted to both structures.
