Abstract
Gene duplication is a fundamental driver of species adaptation and the evolution of new functions, making the reconstruction of historical duplication events crucial for understanding evolutionary processes. Whole-genome duplications (WGDs), which duplicate all gene families simultaneously, have profoundly influenced the evolution of plants, yeast, and vertebrates. Genome-scale data, such as syntenic blocks and gene family counts, are commonly used to infer WGDs. However, detecting ancient WGDs remains challenging, as their genomic signatures are often obscured by extensive rearrangements and gene losses. Phylogenetic reconciliation between gene trees and a species tree offers a potential means of identifying such ancient events, but reconciliation methods frequently assume independence among gene families. Because the gene duplications produced by a WGD are inherently interdependent, this assumption can cause WGDs to be missed. Phylogenomics reconciliation addresses this challenge by reconciling multiple gene families at once. Unfortunately, existing models often constrain the space of possible reconciliations, overlook gene losses resulting from fractionation, or depend on conserved synteny across multiple species. These restrictions limit the number of genes that can be analyzed concurrently.
In this work, we explore a phylogenomics reconciliation model that does not rely on synteny, explicitly incorporates gene losses, and permits flexible remapping of duplications. Reconciliation under this model is NP-hard, and existing algorithms do not scale to large datasets. To address this need, we present novel algorithmic strategies that efficiently handle tens of thousands of gene trees, a level of scalability previously unattained. We also evaluate our approach against existing methods. Experiments on both simulations and real data show that traditional LCA-mapping can yield incorrect WGD predictions after fractionation, whereas our approach is more robust. By comparing predictions using true and reconstructed gene trees, we further show that reconstruction errors greatly affect method performance and that gene tree correction is necessary for reliable results. Tests on real data also reveal that our approach can recover WGDs missed by other reconciliation methods.
