Abstract
Gene family evolution is determined by microevolutionary processes (e.g., point mutations)
and macroevolutionary processes (e.g., gene duplication and loss), yet macroevolutionary
considerations are rarely incorporated into gene phylogeny reconstruction methods. We
present a dynamic program to find the most parsimonious gene family tree with respect
to a macroevolutionary optimization criterion, the weighted sum of the number of gene
duplications and losses. The existence of a polynomial delay algorithm for duplication/loss
phylogeny reconstruction stands in contrast to most formulations of phylogeny reconstruction,
which are NP-complete. We next extend this result to obtain a two-phase method for
gene tree reconstruction that takes both micro- and macroevolution into account. In the first
phase, a gene tree is constructed from sequence data, using any of the previously known
algorithms for gene phylogeny construction. In the second phase, the tree is refined by
rearranging regions of the tree that do not have strong support in the sequence data to
minimize the duplication/loss cost. Components of the tree with strong support are left intact.
This hybrid approach incorporates both micro- and macroevolutionary considerations,
yet its computational requirements are modest in practice because the two-phase approach
constrains the search space. Our hybrid algorithm can also be used to resolve nonbinary
nodes in a multifurcating gene tree. We have implemented these algorithms in a software
tool, NOTUNG 2.0, that can be used as a unified framework for gene tree reconstruction or
as an exploratory analysis tool that can be applied
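
The optimization criterion named above, the weighted sum of the number of gene duplications and losses, can be illustrated by scoring one fixed binary gene tree against a species tree. The sketch below is not the dynamic program described in this abstract and is not taken from NOTUNG; it assumes the standard LCA (lowest common ancestor) reconciliation for counting events. The function names, the toy trees, and the weights `c_dup` and `c_loss` are hypothetical placeholders chosen for illustration.

```python
# Species tree ((A,B)AB,C)ROOT stored as a child -> parent map (toy example).
SPECIES_PARENT = {"A": "AB", "B": "AB", "AB": "ROOT", "C": "ROOT"}
# Each gene-tree leaf is labelled with the species it was sampled from.
LEAF_SPECIES = {"a1": "A", "b1": "B", "c1": "C"}


def depth(parent, node):
    """Number of edges between node and the root of the species tree."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def lca(parent, a, b):
    """Lowest common ancestor of species-tree nodes a and b."""
    ancestors = {a}
    while a in parent:
        a = parent[a]
        ancestors.add(a)
    while b not in ancestors:
        b = parent[b]
    return b


def dl_cost(gene_tree, parent, leaf_species, c_dup=1.5, c_loss=1.0):
    """Return (duplications, losses, weighted cost) for a binary gene tree.

    Leaves are strings; internal nodes are 2-tuples of subtrees.
    The default weights are arbitrary illustrative values.
    """
    dups = 0
    losses = 0

    def visit(node):
        nonlocal dups, losses
        if isinstance(node, str):
            return leaf_species[node]
        left, right = (visit(child) for child in node)
        mapped = lca(parent, left, right)
        # A node is a duplication if it maps to the same species node as a child.
        is_dup = mapped in (left, right)
        if is_dup:
            dups += 1
        for child_map in (left, right):
            gap = depth(parent, child_map) - depth(parent, mapped)
            # Below a speciation, one species-tree edge is expected and is not a loss.
            losses += gap if is_dup else gap - 1
        return mapped

    visit(gene_tree)
    return dups, losses, c_dup * dups + c_loss * losses


congruent = dl_cost((("a1", "b1"), "c1"), SPECIES_PARENT, LEAF_SPECIES)
discordant = dl_cost((("a1", "c1"), "b1"), SPECIES_PARENT, LEAF_SPECIES)
```

In this toy case the gene tree that matches the species tree needs no events, while the discordant topology requires one duplication and three losses. A search procedure such as the one the abstract describes would prefer the lower-cost arrangement wherever the sequence data do not strongly support the alternative.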
