Abstract
Introduction
Mycobacterium tuberculosis, a deadly pathogen
Evolutionary changes in Mtb: The case of resistance
Drug resistance often points to significant mutational events and hence evolution of Mtb. The emergence of extensively drug resistant (XDR) TB and MDR TB in KwaZulu-Natal, South Africa, was shown to have taken place via several independent evolutionary events which involved accumulation of stepwise resistance mutations over decades, for example, mutations in
Other studies have corroborated Mtb evolution to antibiotic resistance (AR). An investigation of the M strain MDR-TB outbreak that occurred in Buenos Aires, Argentina, found that the Mtb M strain evolved to XDR through accumulation of resistance mutations over the years. Many of these mutations were non-synonymous, suggesting positive selection. The ancestor of the M strain qualified for MDR status as early as 1973. In a time frame of approximately 4 decades, this MDR Mtb strain successfully spread and evolved resistance to several additional drugs, thus gaining pre-XDR and ultimately XDR status. 4
Very recently, whole genome sequencing (WGS) was used to demonstrate the microevolution of a parental Mtb strain into clonal variants. The main driver of the microevolution was drug selective pressure, favoured by intermittent antibiotic therapy. Within 7 months, the most recent common ancestor of the Mtb strain evolved from MDR to pre-XDR status as a result of point mutations and insertions-deletions (indels) that were responsible for phenotypic resistance to a wide range of antibiotics. 5 Mutations in the form of indels and single nucleotide polymorphisms (SNPs) were further shown to contribute to Mtb genome evolution via enrichment of AR genes. 6 Finally, de novo evolution of resistance, particularly to fluoroquinolones, was suggested to have driven the development and spread of XDR-TB in Belarus, a country known to have a high incidence of MDR-TB. 7
Molecular evolutionary rates of Mtb
Mtb is characterised by extensive mutation and evolution, necessary for adaptation to its host or to selection pressure imposed by antibacterial drugs. A number of prior studies corroborate this fact. For example, comparative analysis of the genomes of drug susceptible and drug resistant Mtb isolates suggests rapid evolution of Mtb. 8 In addition, within-host microevolution of Mtb is characterised by elevated rates of nucleotide substitution which have implications for the appearance of AR. 5 On a larger scale, high levels of mutation resulting in significant genomic diversity and the rapid evolution of Mtb have consistently been identified in different regions of the world, shedding light on the emergence of AR.3,4,8-11 Among the numerous Mtb virulence factors, PE/PPE genes have been shown to evolve at high rates owing to a particularly high frequency of mutations.12,13 At the same time, varying rates of mutation or evolution of Mtb have been demonstrated to occur during human infection, with decreasing mutation rates through latent infection 14 and high mutation rates during active disease. 9
As outlined above, the majority of studies on Mtb evolutionary rates have focussed on genes involved in colonisation or infection dynamics, virulence, and drug resistance. However, there is a paucity of information on the rates of molecular evolution of individual biological processes and molecular functions in Mtb, irrespective of the participation of these processes and functions in a specific pathogen-related mechanism. Here, we assess the evolutionary rates of select processes and functions in Mtb using the
Pan genome estimation of M. tuberculosis
We perform a large-scale molecular evolutionary analysis of the core genes of Mtb amongst 264 strains. Prior studies of Mtb evolutionary pangenomics have dealt with smaller numbers of strains. For example, in their core and pan genome analysis, Wan et al 15 used complete genomes of 21 Mtb strains to shed light on genetic diversity in and evolution of the Mtb Manila family. Other authors have investigated 36 strains to better understand how molecular evolution has shaped the primary (core) and secondary (accessory) genome of Mtb. 16 In another recent study, Zakham et al 17 probed the core genome and virulence determinants of 168 Mtb strains plus other human-adapted
Yang et al 16 coined the term ‘super core genes’ which are defined as core genes with 2 or more copies in more than 90% of Mtb strains, and they suggested the action of evolutionary forces that changed the copy numbers. Copy number variations in turn might favour adaptation of Mtb to a human host and contribute to phenotypic differences between Mtb and
Here, we present an analysis of the molecular evolutionary rates of 815 core Mtb protein-coding genes with the aim of uncovering the nature of the genes with high rates of evolution and the biological pathways in which they are involved.
Methods
Mtb data collation using TAGOPSIN
Data retrieval and collation were performed using TAGOPSIN. 19 In brief, TAGOPSIN is a Java command line programme for rapid and systematic retrieval of select data from 7 public biological databases relevant to comparative genomics and protein structure studies. The programme allows a user to retrieve organism-centred data and assemble them in a single local database in PostgreSQL (https://www.postgresql.org). This local database constitutes a useful resource for running specific queries easily. Table 1 gives the statistics of the dataset built by TAGOPSIN for Mtb.
The statistics of the dataset built by TAGOPSIN for Mtb.
Identification of proteins by Gene Ontology term
Gene Ontology 20 (GO, version 2021-07) terms of the ‘biological process’ (BP) and ‘molecular function’ (MF) namespaces that describe vital processes and functions in the cell were manually chosen using the ontology editor, Protégé. 21 The number of representative UniProtKB/Swiss-Prot proteins for each term and all its corresponding children terms was obtained from the local database. This task was done programmatically for all available strains of Mtb. Nine terms (Table 2) were selected for evaluation of the evolutionary rates of their associated genes. They are mainly terms that have at least 50 representative proteins (ie, Swiss-Prot entries) and/or that describe essential processes and functions in the cell.
GO terms and corresponding number of unique RefSeq gene products used in the calculation of
In addition, to identify proteins with low evolutionary rates that may be of use as drug target candidates, GO terms that describe functions associated with membrane proteins were chosen since the latter are amongst the first contacts a small molecule drug would have with the cell. Hence, children terms of ‘transmembrane transporter activity (GO:0022857)’ were chosen to cover membrane proteins or enzymes (Table 2), and this term incorporates roughly 100 proteins of all available Mtb strains. Mechanisms that commonly occur in bacteria like active and passive transport, but also efflux transport in drug-resistant bacteria, were considered here. Examples of gene products in these groups include adenosine triphosphate (ATP) synthase, protein translocase, and ATP binding cassette (ABC) transporter permease. Supplemental Table S1 lists the details of these terms, in particular their GO IDs and children or parent terms.
For each one of the 9 terms and its corresponding children terms, the representative protein entries from Swiss-Prot as well as RefSeq were identified. In this step, RefSeq entries were included so as to cover a maximum number of strains of Mtb for phylogenetic analysis. The numbers of unique RefSeq gene products identified for each term and used in downstream evolutionary analyses are given in Table 2. If gene products were common or overlapping among different GO terms, this was disregarded and they were included in their respective GO terms.
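The collection of gene products for a term and all its children terms can be pictured as a traversal of the ontology graph. The snippet below is a minimal sketch only: the `children` and `annotations` mappings, the child term labels and the product identifiers are hypothetical placeholders and do not reflect the schema of the local database or real annotations. Products shared between terms are simply counted under each term, in line with the approach described above.

```python
from collections import deque


def descendants(term, children):
    """Return the term itself plus all of its children terms (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unique_products(term, children, annotations):
    """Collect the unique gene products annotated to a term or any of its children."""
    products = set()
    for t in descendants(term, children):
        products.update(annotations.get(t, ()))
    return products


# Hypothetical toy ontology and annotations (placeholders, not real data).
children = {
    "GO:0022857": ["child_a", "child_b"],
    "child_a": ["child_c"],
}
annotations = {
    "GO:0022857": {"prot_1"},
    "child_a": {"prot_2", "prot_3"},
    "child_b": {"prot_3"},  # prot_3 is shared by two terms but counted once per query
    "child_c": {"prot_4"},
}

products = unique_products("GO:0022857", children, annotations)
print(len(products))
```

Using a set for the result means that a product annotated to several children of the same parent term contributes only once to that parent's count.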
Estimation of the rates of molecular evolution for select GO processes and functions
Next, maximum likelihood phylogenetic analysis was performed. In brief, for every GO term listed in Table 2, coding sequences (CDSs) were retrieved for all gene products of the 264 strains of Mtb from the local database. Because the genomic features are annotated differently, the reference genome for Mtb (genome AC ‘NC_000962’) was excluded from data selection so as to maintain consistency in the dataset and in all subsequent analyses. However, to cater for this, the genome AC ‘NC_018143’ was used for the reference laboratory strain Mtb H37Rv. Codon-based multiple sequence alignment (MSA) was then performed using PRANK 22 (version 170427). The alignment file was used to estimate a phylogenetic tree with IQ-TREE 23 (version 1.6.12). DNA models of nucleotide substitution as inferred by IQ-TREE were applied.
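The alignment and tree estimation steps can be sketched as two command lines assembled in Python. This is an illustration under stated assumptions, not the exact invocation used in this study: the file names are hypothetical, the PRANK output name `<prefix>.best.fas` is assumed from its usual naming, and `-m MFP` (automatic model selection in IQ-TREE) is assumed as one way of letting IQ-TREE infer the substitution model.

```python
import shlex


def build_commands(cds_fasta, prefix):
    """Build PRANK and IQ-TREE command lines for one set of coding sequences."""
    # PRANK codon-aware alignment of the input CDSs.
    prank_cmd = ["prank", f"-d={cds_fasta}", f"-o={prefix}", "-codon"]
    # Assumption: PRANK writes its final alignment to '<prefix>.best.fas'.
    alignment = f"{prefix}.best.fas"
    # Assumption: the model is chosen automatically by IQ-TREE via '-m MFP'.
    iqtree_cmd = ["iqtree", "-s", alignment, "-m", "MFP"]
    return prank_cmd, iqtree_cmd


# Hypothetical file names for one GO term.
prank_cmd, iqtree_cmd = build_commands("go_term_cds.fasta", "go_term")
print(shlex.join(prank_cmd))
print(shlex.join(iqtree_cmd))
```

The commands are only built and printed here; they could be passed to `subprocess.run` once the tools are installed and the file names are adapted.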
For comparative analysis, genes known to evolve or hypothesised as evolving at high rate based on known
Fast-evolving reference genes of Mtb used for calculation of
Slow-evolving reference genes of Mtb used for calculation of
Rv number refers to the annotation of reference genome NC_000962 corresponding to the genome of the laboratory strain, Mtb H37Rv. It is given here with reference to Comas et al 29 .
Median
Estimation of the rates of molecular evolution for core genes in Mtb
The core genome is defined here as the set of genes common to all 264 strains of Mtb including the reference laboratory strain Mtb H37Rv (genome AC ‘NC_000962’). This set of genes comprises a total of 815 gene products after preprocessing. For each gene product, codon-based MSA was performed in PRANK 22 (version 170427), and the alignment file was used to estimate a phylogenetic tree in IQ-TREE 30 (version 2.1.3).
GO BP terms were assigned programmatically to each one of the 815 gene products along with the
All gene products belonging to the same generic biological pathway were manually clustered based on the GO BP terms assigned in the previous step. GO terms that were ambiguous as well as clusters that contained less than 3 gene products were excluded from further analysis. A total of 338 gene products could thus be unambiguously classified into clusters. For each cluster of gene products, the distribution of
Identification of the pan genome
In addition to core genes, dispensable and strain-specific genes were identified so as to estimate the Mtb pan genome of 264 strains. Dispensable genes were defined as those genes that are present in at least 2 strains and at most 263 strains, while strain-specific genes were defined as those genes that are present in only 1 strain. The pan genome was estimated based on the total number of CDSs in each category and for each strain. These numbers were obtained by querying the local database by the gene product names.
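The three categories defined above follow directly from the number of strains in which a gene product is found. The snippet below is a minimal sketch of that rule, assuming a hypothetical `presence` mapping from gene product name to the set of strains carrying it; the toy example uses 4 strains instead of 264.

```python
def classify(presence, n_strains):
    """Assign each gene product to a pan genome category by strain count."""
    categories = {"core": set(), "dispensable": set(), "strain_specific": set()}
    for gene, strains in presence.items():
        count = len(strains)
        if count == n_strains:
            categories["core"].add(gene)             # present in all strains
        elif count >= 2:
            categories["dispensable"].add(gene)      # 2 to n_strains - 1 strains
        elif count == 1:
            categories["strain_specific"].add(gene)  # exactly 1 strain
    return categories


# Hypothetical presence data for 4 strains (placeholders, not real data).
presence = {
    "gene_a": {"s1", "s2", "s3", "s4"},
    "gene_b": {"s1", "s3"},
    "gene_c": {"s2"},
}
categories = classify(presence, n_strains=4)
print(categories)
```

With `n_strains=264`, the middle branch covers exactly the range of 2 to 263 strains used in the definition of dispensable genes.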
Results
Mtb processes evolve at different rates
Maximum likelihood estimates of

Box plots showing the distribution of
Median
Genes participating in the cellular amino acids (AA) metabolic process have a median ω higher than that of slow-evolving reference genes but smaller than that of fast-evolving reference genes (Kruskal-Wallis
Similarly, median ω of the carbohydrate metabolic process was larger than that of slow-evolving reference genes but smaller than that of fast-evolving reference genes (Kruskal-Wallis
Furthermore, although Kruskal-Wallis test indicates a significant difference in median ω among groups of genes for the cell wall organisation or biogenesis process (
Mtb kinase function evolves at significantly high rate
There was a statistically significant difference in median

Box plots showing distribution of
Evolutionary rate of Mtb peptidase function
Median
Evolutionary rates of Mtb transmembrane transporter functions
Maximum likelihood estimates of

Box plots showing distribution of
Similarly, the inorganic molecular entity transmembrane transporter activity has a median ω of zero, lower than that of either set of reference genes (Kruskal-Wallis

(A) Distribution of
Moreover, there was a statistically significant difference among groups of genes for the MFs ‘active transmembrane transporter activity’ (Kruskal-Wallis
Overall, these results suggest a significantly low rate of evolution of the inorganic molecular entity transmembrane transporter and passive transmembrane transporter activities. With median ω values of zero, these 2 functions possibly experience purifying selection. It is very likely that the median ω of the ‘efflux transmembrane transporter activity’ function is biased towards sample size. Indeed, the Wilcoxon test has little power with small samples. In fact, if a sample has 5 or fewer values, the Wilcoxon test will almost always give a
Rates of evolution of Mtb core genes by pathway
A total of 33 different clusters were obtained following manual clustering of gene products along with their
In general, pathways that are associated with low
We define a benchmark for interpretation, as used by Verma et al, 31 gene products with a
It is noteworthy that the generic pathways of AA biosynthetic process and AA metabolic process are associated with particularly high
Perhaps unsurprisingly, the general pathway of lipid biosynthetic/metabolic process has high variability around the median, as indicated by the large interquartile range (Figure 4B). This is conceivable since mycobacteria possess a distinctive mycolic acid cell wall. Many genes participating in this generic pathway show signs of positive diversifying selection (ω > 1), for example, genes coding for lipoyl synthase, CDP-diacylglycerol–glycerol-3-phosphate 3-phosphatidyltransferase, glycosyltransferase family 1 protein, rhamnosyl O-methyltransferase and (3R)-hydroxyacyl-ACP dehydratase subunit HadB (data not shown). The gene encoding the enzyme acyl-ACP thioesterase potentially experiences diversifying selection (ω > 0.70).
Classified as participating in ‘negative regulation of growth’, the ribonuclease VapC toxins VapC32, VapC39 and VapC46 are identified as Mtb core genes and are possibly undergoing positive diversifying selection with
Fast-evolving genes related to pathogenicity and resistance.
Other generic pathways that demonstrate high variability around the median include ‘catabolic process’, ‘post-translational modifications’, ‘cell cycle’, ‘cofactor/coenzyme biosynthetic process’, ‘translation’ and ‘DNA-templated transcription’. Their large interquartile range shows that
Pan genome estimation of 264 strains of Mtb
Figure 5 shows the distribution of core, dispensable and strain-specific genes across 264 strains of Mtb. Considering multiple copies of a gene, the average number of core genes is 3241 which gives a mean percentage of core genes of 76.5%. When disregarding multiple copies however, the average core gene count is 1113 and this is in fact constant throughout our Mtb population. The mean percentage of core genes in this case is 54.0%.

The distribution of core (A), dispensable (B) and strain-specific (C) genes across 264 strains of Mtb.
Our analysis also shows that dispensable genes occur in multiple copies. We obtain an average of 996 dispensable genes when multi-copy genes are taken into account, which represents a mean percentage of 23.5% of the pangenome. Excluding multiple copies of a gene, the average dispensable gene count is 948, which is 46.0% of the pangenome.
Our preliminary analysis indicates that there are 318 genes on average which have at least 2 copies in more than 90% of strains. Actually, we find that super core genes are present in all 264 strains utilised here, and they represent a mean percentage of 28.6% of the total single-copy core gene count, that is, 1113.
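The percentages reported in this section can be checked with simple arithmetic. The sketch below assumes that each share is computed against the sum of the average core and dispensable gene counts (strain-specific genes left out of the denominator); this is one reading that reproduces the reported figures, not a statement of the exact procedure used.

```python
def share(part, whole):
    """Percentage of `part` in `whole`, rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


# Average gene counts reported in the text.
core_multi, disp_multi = 3241, 996     # multiple copies counted
core_single, disp_single = 1113, 948   # multiple copies disregarded
super_core = 318                       # >= 2 copies in > 90% of strains

pct_core_multi = share(core_multi, core_multi + disp_multi)
pct_disp_multi = share(disp_multi, core_multi + disp_multi)
pct_core_single = share(core_single, core_single + disp_single)
pct_disp_single = share(disp_single, core_single + disp_single)
pct_super_core = share(super_core, core_single)

print(pct_core_multi, pct_disp_multi)    # 76.5 23.5
print(pct_core_single, pct_disp_single)  # 54.0 46.0
print(pct_super_core)                    # 28.6
```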
Co-occurrence of dispensable genes
In order to get better insight into the evolution of the Mtb species, we carried out a preliminary analysis of the dispensable genes since they can play important roles in phenotypic variation and genome evolution. The dispensable genes identified in the previous step were analysed by groups of strains belonging to the same geographical location. In particular, the KZN strains of Mtb are known to have originated from KwaZulu-Natal, South Africa, while the Beijing/Beijing-like strains originated from the Beijing area, China. Gene products common to the KZN strains for example were obtained from the local database. A total of 955 gene products of the dispensable genome co-occurred in Mtb KZN strains.
It is noteworthy that on one hand, CDSs of some of these gene products exist in multiple copies, and on the other hand, some gene products do not appear in all KZN strains. For example, the gene product ‘beta-ketoacyl-ACP synthase’ appears in 4 out of 5 KZN strains while the product ‘beta-ketoacyl-[acyl-carrier-protein] synthase II’ appears in only 1 out of 5 strains. Importantly, we note that among the genes that exist in multiple copies are those annotated as transposases, suggesting a potential role in the pathogenicity of Mtb KZN strains.
Certainly, the function of transposases and IS elements in the evolution of antibiotic resistance and virulence is well-established. There is for instance an average of 12 copies of the gene coding for ‘IS3-like element IS987 family transposase’ per KZN strain. Moreover, a prior study revealed a high rate of molecular evolution of the PPE38 genomic region and suggested that this region is antigenic and hypervariable. 13 Our results show that the CDS of PPE38 exists in multiple copies in Mtb KZN strains. Of all PPE family proteins, PPE38 is the only one to have a multi-copy CDS, a finding that could confirm its role as an Mtb virulence factor. Among other virulence factors, ‘VapC toxin family PIN domain ribonuclease’ exists in only 3 of the 5 KZN strains, pointing to a possible strain specificity of this toxin in conferring a pathogenic phenotype. The ESAT-6-like proteins EsxI and EsxK belong to the ESAT-6 family of virulence factors. Here, EsxI is present in 2 copies per KZN strain while EsxK is present in only 1 strain, suggesting a high rate of evolution and strain specificity respectively. Unsurprisingly, there are 2 copies of the gene product ‘multidrug-efflux transporter’ per Mtb KZN strain, which likely explains an AR phenotype.
Similarly, a total of 984 gene products co-occurring in Mtb Beijing and Beijing-like strains were obtained. In particular, the gene coding for ABC transporter ATP-binding protein/permease is present in 2 copies on average. This transporter is reported to be involved in Mtb virulence. We also note the presence of the following products in only 1 Beijing/Beijing-like strain (out of a total of 7 strains): ‘3-hydroxyacyl-thioester dehydratase HtdY’, ‘4Fe-4S dicluster domain-containing protein’, ‘AbrB/MazE/SpoVT family DNA-binding domain-containing protein’, ‘DNA or RNA helicase of superfamily II’, ‘DNA topoisomerase (ATP-hydrolysing) subunit A’, ‘IS110-like element IS1547 family transposase’, ‘PPE family protein PPE66’, ‘PPE family protein PPE67’, ‘antitoxin VapB47’, ‘multidrug transporter’, ‘ribonuclease VapC38’, among others. The majority of these products are present in one and the same strain, M. tuberculosis Beijing (genome AC ‘NZ_CP011510’). As with Mtb KZN, many transposases exist in multiple copies in the Beijing and Beijing-like strains, indicating a crucial role in the development of AR and virulence in Mtb Beijing. For example, each Mtb Beijing strain has on average 16 copies of the gene coding for IS3 family transposase. In addition, ESAT-6-like proteins EsxI and EsxK are present in 2 copies per Beijing or Beijing-like strain. In order to better understand which gene products confer specificity on KZN and Beijing strains, the products were distinctively identified.
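Identifying the products that are distinctive to each group of strains amounts to a set comparison. The snippet below is a minimal sketch, assuming each group is represented by the set of dispensable gene product names that co-occur in it; the product names are hypothetical placeholders rather than results of this study.

```python
def group_specific(group_a, group_b):
    """Return products unique to each group and those shared by both."""
    only_a = group_a - group_b
    only_b = group_b - group_a
    shared = group_a & group_b
    return only_a, only_b, shared


# Hypothetical product sets for two groups of strains (placeholders).
kzn = {"product_1", "product_2", "product_3"}
beijing = {"product_2", "product_3", "product_4", "product_5"}

kzn_only, beijing_only, shared = group_specific(kzn, beijing)
print(sorted(kzn_only), sorted(beijing_only), sorted(shared))
```

By construction, the size of each group equals its group-specific count plus the shared count, which gives a quick consistency check on the totals obtained for two groups.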
Discussion
These findings point to the continuous process of evolution in genes that are part of essential pathways such as transcription and translation, amino acid and lipid metabolism. This study investigated the evolutionary rates of sets of genes involved in different pathways as per the GO grouping. Genes for the salvage nucleotides synthesis pathway, the amino acids biosynthesis and those for kinases were found to have high
High rates of evolution linked to M. tuberculosis survival and pathogenicity
Here, we show that the cellular AA metabolic process and the kinase activity function likely evolve at a significantly high rate in the Mtb species. These findings correlate well with Mtb pathogenicity and survival. Indeed, several studies have highlighted the essentiality of amino acid biosynthesis and protein kinase activity during chronic infection and pathogenesis.33-38
Mycobacteria can synthesise all 20 amino acids. 39 They can evade the host defensive mechanism of histidine starvation by synthesising their own histidine through the de novo pathway. Amino acids are essential precursors for protein synthesis and for many other metabolic intermediates. Glutamate and glutamine are the main nitrogen sources for other molecules. Amino acid biosynthetic pathways and protein kinases are crucial for pathogenicity. 33 There could also be a potential link with the synthesis of virulence factors, such as PE/PPE proteins of Mtb, 40 which have a conserved N-terminal domain that incorporates Pro-Glu (PE) or Pro-Pro-Glu (PPE) residues. The glutamine family amino acid metabolic process was included in evolutionary rate estimation. This family of AAs comprises arginine, glutamate, glutamine and proline. It is possible that the cellular AA metabolic process evolves at a significantly high rate in Mtb for the synthesis of diverse forms of PE/PPE proteins. Phylogenetic analysis of several amino acid biosynthesis genes has shown that gene duplication and horizontal transfer contributed to the genomes of Corynebacteria. 41
The paralogues
There is convincing evidence that kinases also play an important role in Mtb physiology and pathogenicity. Kinases are essential for protein activation through phosphorylation and are particularly important in signal transduction pathways. They have been classified into His-, Tyr- and Ser/Thr-kinases as per the amino acid they phosphorylate. There are 11 Ser/Thr kinases in Mycobacteria: Pkn-A, B, D, E, F, G, H, I, J, K and L, which are differentially distributed among different species. They are organised into sub-domains and have conserved amino acid motifs. The activation loop of Ser/Thr kinases has been shown to be highly variable, indicating that they interact with different molecules. The loop is phosphorylated for enzyme activity. 42 The protein tyrosine kinase PtkA is essential for growth of Mtb in macrophages, the preferred niche of the pathogen. Likewise, PknB is a serine/threonine protein kinase that is necessary for survival of Mtb both in vitro and in vivo, and for host immune evasion. It is also required to establish an infection and cause disease. 37
Low rate of molecular evolution revealed
Our findings also reveal a significantly low rate of molecular evolution for the carbohydrate metabolic process, contrary to what has previously been reported about Mtb carbohydrate metabolism in vivo and in vitro. Mtb has been classified as an obligate aerobe whereby respiration is a vital component of its physiology and is subject to change in different host micro-environments. 43 Moreover, many enzymes involved in gluconeogenesis were considered in our evolutionary analysis, and during infection carbohydrate synthesis in Mtb occurs primarily through conversion of lipid and/or AA intermediates by gluconeogenesis. 44 A mechanism of carbon co-catabolism that enables flow of carbon via both glycolysis and gluconeogenesis has also been revealed. 45 We would thus expect a high rate of molecular evolution of the carbohydrate metabolic process in Mtb. Nonetheless, we hypothesise that our result of a low evolutionary rate might be relevant when the pathogen is in a state of latency.
Pangenome of Mtb
In general, our numbers of core and dispensable genes are lower than those found by other authors. This is because the addition of more strains shrinks the core gene count in favour of an increase in dispensable and strain-specific genes. However, the numbers of strain-specific genes obtained in this analysis are low compared with other studies. Here, the maximum number of strain-specific genes is 21 for the strain Mtb EAI5/NITR206. Conversely, Yang et al 16 for example estimated 97 strain-specific genes for that same strain. On the whole, they obtained higher numbers of strain-specific genes and lower numbers of dispensable genes for any given strain because they analysed far fewer strains than in our study. Notably, it can be hypothesised that genes identified as strain-specific by Yang et al have in fact spread throughout a larger population, possibly by horizontal gene transfer (HGT), and become dispensable genes. Certainly, other authors have demonstrated that HGT is a key mechanism that has shaped the evolution of the species via chromosomal DNA transfer. 46
A second original aspect described here is the identification of genes that potentially determine the respective specificities of Mtb KZN and Beijing/Beijing-like strains. A total of 104 and 133 gene products specific to Mtb KZN and Beijing respectively were thus identified. The PPE family proteins have been established as important Mtb virulence factors. Our analysis shows that PPE1, PPE17, PPE22, PPE32, PPE38, PPE44 and PPE50 are specific to the KZN strains and not to the Beijing or Beijing-like strains. Conversely, PPE66 and PPE67 possibly confer specificity on the Beijing/Beijing-like strains and contribute to their virulence. Moreover, among the ESAT-6 family of virulence factors, ESAT-6-like proteins EsxO, EsxP and EsxW are specific to Mtb KZN while EsxL is specific to Mtb Beijing/Beijing-like. These findings suggest that different selection pressures prevailing in different geographical regions could have caused varying levels of mutation and rates of evolution which in turn could have led to the appearance of distinct virulence factors of the PPE and ESAT-6 families in Mtb KZN and Beijing strains. It has been previously reported that PPE38 for instance is encoded by a genomic region that is characterised by rapid molecular evolution. 13 The emergence of strain-specific virulence factors is in addition to the numerous PPE and ESAT-6 proteins common to both groups of strains. A similar conclusion can be drawn for the toxins MazF3 and MazF8 which seem to be specific to Mtb KZN. Serine-threonine kinases have been shown to play important roles in Mtb pathogenicity.37,47,48 Here, a comparison of dispensable genes between Mtb KZN and Beijing/Beijing-like strains suggests that ‘Stk1 family PASTA domain-containing Ser/Thr kinase’ is specific to Mtb Beijing/Beijing-like.
Interplay between core genome and accessory genome
Our results indicate a high proportion of the dispensable genome (46.0% excluding multiple copies of a gene) relative to the core genome (54.0% excluding multiple copies). Other authors have identified a generally higher share of the core genome, albeit using fewer strains. We do obtain a high percentage of the core genome, that is, 76.5%, but only when considering multiple copies of a gene. In that case, there is a concomitant decrease in the percentage of dispensable genes (from 46.0% to 23.5% of the pangenome). It is thus conceivable that there is indeed interconversion between core genes and dispensable genes through copy number variations as proposed by Yang et al. 16 Interconversion may also happen via changes in the number of copies of the same gene within the same strain. Besides the relative share of core and dispensable genome with respect to the pangenome, our discovery of high rates of molecular evolution amongst core genes further supports this concept. It is possible that strain-specific genes also participate in such a dynamic process. Hence, during evolution of Mtb, we cannot rule out an interplay between the core genome and the accessory genome (dispensable + strain-specific) that potentially facilitates the emergence of pathogenic strains and the development of traits related to virulence. Some genes might evolve at a high rate in some strains to enable the pathogen to adapt to and survive in its host, but also to cause disease. The core genome is probably involved because it encodes functions required for housekeeping cellular processes and organismal survival. One hypothesis is that genes determining pathogenicity (which were formerly part of the accessory genome) and genes necessary for pathogen survival are co-located and are co-transcribed in the Mtb core genome. The region harbouring those genes could thus be undergoing rapid evolution.
Conclusion
Here, we presented the results of an investigation of the rates of molecular evolution of select biological processes and molecular functions in Mtb. We have shown that the cellular AA metabolic process and the kinase activity function evolve at a significantly high rate while the carbohydrate metabolic process evolves at a significantly low rate in the Mtb species. We have supported our findings with evidence reporting that the high rates of evolution correlate well with Mtb physiology and pathogenicity.
We also corroborated the findings of previous authors by showing that even genes of the Mtb core genome evolve at a high rate and encode virulence-associated traits, and that there is indeed an interplay between the core genome and the accessory genome that drives the evolution of Mtb. We have also shown that core genes participating in AA biosynthetic/metabolic process evolve at a high rate, and pathways like post-translational modifications, translation and DNA-templated transcription experience variable rates of molecular evolution. This study is however limited by the fact that groups of genes were analysed together under a GO term. Understanding the evolution at the single gene level would probably point to those genes that are mutating at differential rates under selection pressures. This can also be extended to an understanding of why some proteins (or parts of proteins) are more likely to bear structural changes than others. Furthermore, our investigation relied on existing GO annotations and therefore newly characterised genes from recent studies have not been included here. Future work should consider such transcribed products which are yet to be annotated. 49
Supplemental Material
sj-docx-1-evb-10.1177_11769343241239463 – Supplemental material for Large-scale Pan Genomic Analysis of Mycobacterium tuberculosis Reveals Key Insights Into Molecular Evolutionary Rate of Specific Processes and Functions
Supplemental material, sj-docx-1-evb-10.1177_11769343241239463 for Large-scale Pan Genomic Analysis of Mycobacterium tuberculosis Reveals Key Insights Into Molecular Evolutionary Rate of Specific Processes and Functions by Eshan Bundhoo, Anisah W Ghoorah and Yasmina Jaufeerally-Fakim in Evolutionary Bioinformatics
Footnotes
Acknowledgements
We acknowledge support from the University of Mauritius and the Human Heredity and Health in Africa Bioinformatics Network (H3ABioNet).
Author Contributions
YJF conceived the study. EB and AWG wrote scripts for data analysis and visualisation. EB designed and performed the analysis. EB wrote the original draft. AWG and YJF edited the manuscript. All authors read and approved the final manuscript.
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Higher Education Commission of Mauritius [scholarship to EB].
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
