Abstract
Gene duplication and loss play an important role in the evolution of novel functions and for shaping an organism's gene content. Recently, it was suggested that stress-related genes frequently are exposed to duplications and losses, while growth-related genes show selection against change in copy number. The fungal chitinase gene family constitutes an interesting case study of gene duplication and loss, as their biological roles include growth and development as well as more stress-responsive functions. We used genome sequence data to analyze the size of the chitinase gene family in different fungal taxa, which range from 1 in
Introduction
Chitin is a polymer which consists of N-acetylglucosamine monomers (GlcNAc), linked by β-1,4-glucosidic bonds. It is widely distributed in nature and it is a constituent of the exoskeleton of invertebrates, of zooplankton and of fungal cell walls. Chitinases (EC 3.2.1.14) hydrolyze the bonds between GlcNAc residues releasing oligomeric, dimeric (chitobiose) or monomeric (GlcNAc) products. Chitobiose can be further cleaved by N-acetylhexosaminidases (EC 3.2.1.52) into GlcNAc (Keyhani and Roseman, 1999). Chitinases are divided into two different glycoside hydrolase families (18 and 19) based on amino acid sequence similarity (Henrissat, 1991; Henrissat and Bairoch, 1993). These two families share limited similarity at the amino acid level and have different three-dimensional structures and modes of action (Iseli et al. 1996). These enzymes can display either exo- or endoactivity, depending on the structure of the catalytic site (Terwisscha van Scheltinga et al. 1994; van Aalten et al. 2000; van Aalten et al. 2001).
Growth and morphological development of fungi makes cell wall remodelling a necessity. Cell expansion and division, spore germination, hyphal branching and septum formation all depend on the activities of hydrolytic enzymes intimately associated with the fungal cell wall, among them chitinases (Adams, 2004). Chitinases are also implied in autolysis and recycling of older parts of the fungal mycelia (Duo-Chuan, 2006). Chitinases also have aggressive roles as fungal pathogenicity factors during infection of other fungi (mycoparasitism), insects and nematodes (Wattanalai et al. 2004; Duo-Chuan, 2006; Gan et al. 2007a). Furthermore, chitinases are involved in degradation of chitin for nutritional needs (Duo-Chuan, 2006). Lysis of the host cell wall and degradation of nematode egg shells are shown to be important steps in the mycoparasitic and nematophagous attack (Howell, 2003; Benitez et al. 2004; Gan et al. 2007a), and hence chitinases from various fungi used as biocontrol agents have been cloned and characterised (Felse and Panda, 1999; Hoell et al. 2005; Klemsdal et al. 2006; Gan et al. 2007a; Gan et al. 2007b; Dong et al. 2007).
The diversity of chitinase function during the fungal life cycle raises interesting questions regarding the evolution of this important gene family. Fungal chitinases belong to glycoside hydrolase family 18 (GH18) and they consist of discrete domains, which are variously arranged in different orders in different proteins (Gilkes et al. 1991; Warren, 1996; Henrissat and Davies, 2000). Besides the catalytic domain there is very often a substrate-binding domain present. These substrate-binding domains are not necessary for chitinolytic activity, although they seem to enhance the efficiency of the enzymes (Suzuki et al. 1999; Limon et al. 2001).
There is a large variation in the number of GH18 genes present in different fungal genomes, from 1 in
Identification of expansions as well as contractions of protein families in fungi with diverse ecological roles can aid in understanding relationships between function and phylogeny. The fungal GH18 gene family constitutes an interesting case study of gene duplication and loss, as their biological roles includes growth and development as well as more stress-responsive functions. Hence it is possible to test the hypothesis that growth-related genes display selection against changes in copy number while stress-related genes tolerate more duplications and losses, within a single gene family. Here we present a study where we use genome sequence data to analyze the size of the GH18 gene family in different fungal taxa and to infer their phylogenetic relationships. We also employ a stochastic birth and death model to test for non-random evolution of the fungal GH18 gene family. We show that the fungal GH18 gene family indeed evolves non-randomly and we identify fungal lineages where larger-than-expected expansions or contractions potentially indicate the action of adaptive natural selection.
Materials and Methods
Biomining of genome sequences
In order to avoid sampling bias our study was restricted to fungal species where genome sequence information and estimates of divergence times were available.
Phylogenetic analysis
Amino acid sequences of GH18 catalytic domains were determined by InterProScan (Zdobnov and Apweiler, 2001) or Conserved Domain Database searches (Marchler-Bauer et al. 2005). Sequences were manually trimmed and aligned with Clustal X (Thompson et al. 1997) and inspected using BioEdit (Hall, 1999). Amino-acid similarity between sequences was calculated using MegAlign, implemented in the DNASTAR program package (DNASTAR, Madison, WI). Phylogenetic analysis of catalytic domains was performed using maximum likelihood methods implemented in PhyML-aLRT 1.1 (Guindon and Gascuel, 2003; Anisimova and Gascuel, 2006). The JTT amino-acid substitution model (Jones et al. 1992) was used, the proportion of invariable sites was set to 0, one category of substitution rate was used and gaps were treated as unknown characters. The starting tree to be refined by the maximum likelihood algorithm was a distance-based BIONJ tree estimated by the program (Guindon and Gascuel, 2003). Statistical support for phylogenetic grouping was assessed by approximate likelihood-ratio tests based on a Shimodaira-Hasegawa-like procedure (SH-aLRT) (Anisimova and Gascuel, 2006) and by bootstrap analysis (500 resamplings).
Likelihood analysis of gene gain and loss
In order to statistically test whether the size of the fungal GH18 gene family is compatible with a stochastic birth and death model we used the program CAFE (Computational Analysis of gene Family Evolution) (De Bie et al. 2006), which is based on the probabilistic framework developed by Hahn et al. (2005). From a specified phylogenetic tree and the gene family size in extant species, we inferred the most likely gene family size at internal nodes, tested for accelerated rates of gene family expansions or contractions and identified the branches that are responsible for the non-random evolution.
The fungal GH18 gene family data in extant species that were used in the current analysis are found in Table 1. Fungal GH18s can be divided into three major phylogenetic clusters, A, B and C, and further subdivisions within these clusters are made (Seidl et al. 2005). Our phylogenetic analysis shows that cluster C GH18s can be merged with cluster A and therefore we analysed the data in three ways; cluster A GH18s separately, cluster B GH18s separately and all GH18s merged. CAFE assumes that the gene family under study is present in the most recent common ancestor of all taxa included in the analysis. Therefore
Number of chitinase genes in different fungal species.

Distribution of GH18 gain and loss among fungal lineages. Phylogenetic relationships among the fungal species used in the current study are shown, including divergence dates in millions of years (Taylor and Berbee, 2006). Circled numbers represent total number of GH18 genes in extant species and estimates of total number of GH18 genes for ancestral species. Boxed taxon names indicates a significant (
Alternative estimates of divergence times can be made by assuming that
The birth and death parameter (λ) was estimated from the data (De Bie et al. 2006) and was 0.001 for all datasets.
Results
There was considerable variation in the number of GH18 genes between different fungal species, ranging from 1 in

Phylogenetic relationships of fungal cluster A family 18 glycoside hydrolase catalytic domains. Phylogenetic analyses were performed using maximum likelihood methods as implemented in PhyML-aLRT, based on an alignment of family 18 glycoside hydrolase catalytic domain amino acid sequences. Branch support values (bootstrap proportions/approximate likelihood-ratio test probabilities) are associated with nodes, with a dash indicating that the support was <70%/0.70. The bar marker indicates the number of amino-acid substitutions. Protein identifiers include protein name, GenBank accession nos. or locus/protein ID from the respective genome projects. Group names are indicated, see text for reference. Cluster B GH18 Chi18–12 from

Phylogenetic relationships of fungal cluster B family 18 glycoside hydrolase catalytic domains. Phylogenetic analyses were performed using maximum likelihood methods as implemented in PhyML-aLRT, based on an alignment of family 18 glycoside hydrolase catalytic domain amino acid sequences. Branch support values (bootstrap proportions/approximate likelihood-ratio test probabilities) are associated with nodes, with a dash indicating that the support was <70%/0.70. The bar marker indicates the number of amino-acid substitutions. Protein identifiers include protein name, GenBank accession nos. or locus/protein ID from the respective genome projects. Group names are indicated, see text for reference. Cluster A GH18 06156 from
Mean intraspecific levels of % amino-acid identity for GH18 subgroups in
: no GH18 members present.
To investigate the evolutionary change of the number of GH18 genes in fungi, we estimated the number of genes in ancestral species and the number of gene gains and losses for each branch of the phylogenetic tree of the fungal species (Fig. 1). The analysis showed that the fungal GH18 gene family, analysing both cluster A and B together, as well as cluster A GH18s alone have evolved non-randomly (p < 0.001, Table 3). Six branches were identified as contributing to this non-random pattern, including 5 expansions and 1 contraction (Table 3). These branches included the ancestor to the Pezizomycotina clade, as well as extant species
Non-randomly evolving branches in the fungal GH18 gene family.
See De Bie et al. (2006) for reference.
Gene family size change as compared with the most recent ancestor.
Taylor and Berbee, (2006) also published two alternative estimates of divergence times of fungal taxa, although these alternative estimates resulted in more improbable age estimates when compared with age estimates in other phyla. Analyzing the evolution of the GH18 gene family using these alternative estimates was performed to assess the robustness of the analysis to differences in divergence dates. The two more recent estimates of divergence dates (estimated age of the fungal phylum at 495 or 923 millions of years) both showed that the GH18 family have evolved non-randomly (p < 0.001, Table S2), and identified the same branches as contributing to this non-random pattern (Table S2). The oldest estimate (estimated age of the fungal phylum at 1630 millions of years) gave no significant changes in size of the GH18 gene family (Table S2).
The involvement of GH18 subgroups C-I and C-II in both expansions and contractions in ascomycetes qualified them for further study. Based on the genome sequence, C-I members in

Partial alignment of
Discussion
Accurate estimates of divergence times for fungi are notoriously hard to obtain due to a very limited fossil record. In the current study we have used three different time estimates on our GH18 gene family data. The differences relates to whether the Devonian ascomycete
The number of GH18 genes in different fungal species varies considerably. An expansion in size of a particular gene family or subgroup within a gene family, such as GH18s, suggests that this gene family or subgroup has been important for the fitness of the species during evolution. The observed variation could possibly be attributed to differences in morphology, growth patterns, nutrient acquisition or antagonistic ability between species. Our approach can be used to establish links between phylogeny of the GH18 gene family with the ecological role of the species and to identify specific subgroups as important evolutionary targets in specific fungal lineages.
Filamentous ascomycetes generally possess larger number of GH18 genes as compared with other fungal groups. This larger GH18 gene family size can possibly be attributed to a larger gene copy-number in certain subgroups, but more importantly to the presence of several GH18 subgroups that appear to be unique for filamentous ascomycetes (A-II, C-I and C-II). Subgroups C-I and C-II are identified as the most likely target for the observed expansion in
Another intriguing result is the different evolutionary trajectories of the GH18 gene family between the human pathogen
The high interspecific sequence variability in C-I and C-II as compared with other GH18 subgroups can be interpreted as the result of diversifying selection. Diversifying evolution due to positive selection is reported for plant chitinases that function as defence proteins against invading fungal pathogens (Bishop et al. 2000; Tiffin, 2004) and in reproductive proteins, in animals and plants (Clark et al. 2006) as well as in fungi (Karlsson et al. 2008). It is possible that the high sequence variability in C-I and C-II represents an adaptation towards differences in cell wall composition in antagonistic species. The significant expansion of the fungal GH18 gene family in the ancestor of the Pezizomycotina probably reflects the emergence of the unique C-I and C-II GH18s in filamentous ascomycete fungi, although the ecological factors driving the selection for these subgroups remain obscure. Although very speculative, the emergence of C-I and C-II GH18s coincide with the estimated time of colonization of terrestrial environments by plants (Sanderson, 2003), which suggests the possibility that the emergence of terrestrial plants created new ecological niches where filamentous ascomycetes could expand into to compete for space and nutrients.
Another expansion took place in cluster B GH18s in
Species with yeast-like or monocentric growth styles such as
The analysis of the GH18 gene family size in basidiomycetes shows no conflict with a random process. However, 5 of the 8 GH18 genes in
Our analysis shows the usefulness of the combination of a stochastic birth and death model and phylogenetic information in a probabilistic framework for identification of lineages with unusually evolving gene families. However, the birth and death model assumes independence among individual genes. This means that any large-scale chromosome duplication, deletion or polyploidization that acts on several gene family members at once violates the assumption of the model (Hahn et al. 2005). Interpretation of gene family size differences between taxa that are separated by genome duplications should be made with caution. There are indications of a recent polyploidization in
Another violation against the assumption of independence among individual genes may be seen in
In the current study we have used fungal genome data in comparative work to infer phylogenetic relationships in the fungal GH18 gene family and to detect non-random expansions and contractions. This approach can be used to establish links between phylogeny of the GH18 gene family with the ecological role of the species, and identify specific GH18 subgroups as important evolutionary targets in specific fungal lineages. Within the fungal GH18 gene family we observe selection against changes in copy number in GH18s involved in growth and development as well as selection for increased copy number in GH18s involved in stress-related functions, supporting the idea of a bipolar principle that governs tolerance to duplications and losses (Wapinski et al. 2007). The results also indicate that antagonistic fungal-fungal interactions constitute an important evolutionary force in soil borne ascomycetes, but not for fungal species that are pathogenic in humans.
Footnotes
Acknowledgements
This work was supported by a grant from the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (Uppsala Microbiomics Center). Genome sequence data were produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/), The Fungal Genome Initiative at the BROAD Institute (http://www.broad.mit.edu/annotation/fungi/fgi/index.html) and Génolevures (
). We thank Timothy James, Åke Olson and Nils Högberg for critical reading of the manuscript.
