We introduce GenPhylo, a Python module that simulates nucleotide sequence data along a phylogeny without the restriction to continuous-time Markov processes. GenPhylo directly uses a general Markov model and therefore naturally accommodates heterogeneity across lineages. We solve the challenge of generating transition matrices with a prescribed expected number of substitutions (the branch length) by providing an algorithm that can also be incorporated into other simulation software.
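To make the matrix-generation problem concrete, the following is a minimal sketch and not GenPhylo's algorithm, which the abstract does not describe. It assumes that branch length is measured as l = -log(det M) / 4, a log-determinant measure that is common in the general Markov literature; the abstract does not say which measure GenPhylo uses. All function names are hypothetical. The sketch draws a random row-stochastic matrix R, then bisects on t in the mixture (1 - t) I + t R until the determinant matches the value implied by the requested length.

```python
import math
import random


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def random_stochastic(rng, n=4):
    """Row-stochastic matrix; each row is a uniform draw from the simplex."""
    rows = []
    for _ in range(n):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
        s = sum(g)
        rows.append([x / s for x in g])
    return rows


def mix(t, r):
    """(1 - t) * I + t * R, which stays row-stochastic for 0 <= t <= 1."""
    n = len(r)
    return [[(1.0 - t) * (i == j) + t * r[i][j] for j in range(n)]
            for i in range(n)]


def branch_length(m):
    """Assumed measure: l = -log(det M) / 4 for a 4-state model."""
    return -math.log(det(m)) / 4.0


def matrix_with_branch_length(length, rng, tol=1e-12):
    """Hypothetical helper: a Markov matrix whose assumed length is `length`."""
    if length < 0:
        raise ValueError("branch length must be non-negative")
    target = math.exp(-4.0 * length)  # determinant implied by the length
    while True:
        r = random_stochastic(rng)
        if det(r) < target:  # guarantees a sign change on [0, 1]
            break
    lo, hi = 0.0, 1.0  # det(mix(lo, r)) = 1 >= target > det(mix(hi, r))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if det(mix(mid, r)) > target:
            lo = mid
        else:
            hi = mid
    return mix(0.5 * (lo + hi), r)


def evolve(seq, m, rng, alphabet="ACGT"):
    """Substitute each site independently using the row of m for its state."""
    return "".join(rng.choices(alphabet, weights=m[alphabet.index(x)])[0]
                   for x in seq)


rng = random.Random(0)
M = matrix_with_branch_length(0.1, rng)
child = evolve("ACGTACGTAC", M, rng)
```

Because no rate matrix is involved, nothing forces `M` to be the exponential of one, which is the sense in which such a simulation steps outside continuous-time models. Drawing a separate matrix for each edge of a tree would give every lineage its own substitution process.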