Abstract
A fundamental assumption in phylogenetics and phylogenomics is that a single, global evolutionary model can adequately characterize the substitution processes operating across all sites in a molecular sequence alignment. However, this assumption is frequently violated in practice due to heterogeneity in evolutionary processes, leading to local model mis-specification and potential bias in downstream inference. While a variety of statistical and machine learning-based approaches have been developed to address this issue, these methods often rely on restrictive model assumptions or are designed for narrowly scoped applications, limiting their generalizability across diverse datasets and evolutionary contexts. Here, we present REVEAL (“REsampling and Visual EvALuation”), a general-purpose statistical framework for detecting and localizing model mis-specification in biomolecular sequence data. REVEAL operates without introducing additional assumptions beyond those inherent to standard global model-based analyses. It employs sequence-aware statistical resampling to construct a local support matrix along the sequence alignment, facilitating the identification of site-level model violations. Through extensive simulation experiments, we demonstrate that REVEAL achieves robust control of both type I and type II errors, with precision of
