Abstract
ILs are part of the immune system and are involved in multiple biological activities. ILs have been shown to evolve under positive selection; however, little information exists regarding which codons are specifically selected. Using different codon-based maximum-likelihood (ML) approaches, we searched for signatures of positive selection in mammalian ILs. Sequences of 46 ILs were retrieved from publicly available databases of mammalian genomes to detect signatures of positive selection in individual codons. Evolutionary analyses were conducted under two ML frameworks, the HyPhy package implemented in the Data Monkey Web Server and CODEML implemented in PAML. Signatures of positive selection were found in 28 ILs: IL-1A and B; IL-2; IL-4 to IL-10; IL-12A and B; IL-14 to IL-17A and C; IL-18; IL-20 to IL-22; IL-25; IL-26; IL-27B; IL-31; IL-34; and IL-36A and G. The number of codons under positive selection per IL varied between 1 and 15. No evidence of positive selection was detected in IL-13; IL-17B and F; IL-19; IL-23; IL-24; IL-27A; or IL-29. Most mammalian ILs have sites evolving under positive selection, which may be explained by the multitude of biological processes in which ILs are involved. The results obtained raise hypotheses concerning IL functions, which should be pursued using mutagenesis and crystallographic approaches.
Introduction
Different cells of the immune system secrete regulatory proteins, namely cytokines (cks), in response to a variety of stimuli. In mammals, cks produced as part of the innate immune response can influence the extent and nature of the adaptive immune response, and are thus crucial for many aspects of immunity.1–6 When cks are secreted by leukocytes and act on other cells they are called ILs.3,7–10 ILs are polypeptides of low molecular mass involved in several biological activities, including immunity, inflammation, inflammatory diseases, hematopoiesis, oncogenesis, neurogenesis and fertility, among many others. Each IL is normally involved in multiple biological processes.9,11–13 Currently, 37 ILs have been identified in mammals, some with a variable number of variants. Classification of ILs has been based on sequence homology, receptor chain similarities, activity, and structural or functional features, leading to complex and intricate classifications.8,9,14–20 ILs act by binding to specific cell-surface receptors, components of which can be shared by several ILs, and which transmit intracellular signals through different signaling pathways.1,11,13,21 Sometimes, for optimal function, interaction between complementary ILs is necessary. 22 These interactions can be synergistic, additive or antagonistic, and exhibit both negative and positive regulatory effects.9,11,22–24
Owing to host–pathogen co-evolution, the immune system and its genes are under constant selective pressure to adapt: advantageous mutations are highly favored and deleterious mutations are quickly eliminated.5,25 ILs such as IL-2, IL-3, IL-4, IL-5, IL-13, IL-23A, IL-28A, IL-28B and IL-29 1,2,4,26–31 have been identified as some of the immune system genes under positive selection in different mammals. However, most of these studies do not specify which codons are positively selected,2,4,27,29–31 with the exception of IL-4 26 in which 15 residues were detected as being under positive selection and located at sites responsible for binding to receptors. With these results in mind, we have extended the search for signatures of positive selection to mammalian ILs 1–37 by using different maximum-likelihood (ML) approaches.
Materials and methods
Sequences
The sequences of the mammalian ILs used in the analyses were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/), Ensembl (http://useast.ensembl.org/index.html) and UniProt (http://www.uniprot.org/). For each IL, amino acid residues were numbered from the first human methionine residue, with signal peptides and propeptide amino acids included in the numbering. The number of sequences retrieved for each IL ranged from 14 to 47 (some of the IL genes were not found in all mammalian genomes) and included representatives of most mammalian groups (e.g. Artiodactyla, Carnivores, Lagomorphs, Primates, Rodents, etc.). The identification of the species used for each IL and the accession numbers are listed in Appendix 1 (Supplementary material). Each of the 37 ILs was aligned using ClustalW, as implemented in the software program BioEdit version 7.1.3, 32 and adjusted manually.
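The numbering convention described above (positions counted from the first human methionine, signal peptide included) can be illustrated with a minimal sketch. It assumes the human sequence is available as one row of the alignment and that gaps are written as "-"; the function name and the toy sequence are hypothetical and are not taken from the study.

```python
def human_numbering(aligned_human, gap="-"):
    """Map each alignment column to a 1-based human residue number.

    Columns where the human sequence has a gap get None, because they
    have no counterpart in the human protein.
    """
    numbering = []
    position = 0
    for residue in aligned_human:
        if residue == gap:
            numbering.append(None)
        else:
            position += 1
            numbering.append(position)
    return numbering


# Hypothetical aligned human sequence, for illustration only.
aligned = "M-KL--PV"
print(human_numbering(aligned))
```

With such a map, a codon reported by any method as an alignment column can be translated into the human residue number used in the text.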
Codon-based analyses of positive selection
Following Poon and collaborators, 6 and in order to reliably identify codons under positive selection, only ILs that were represented in at least 10 species were analyzed. Hence, IL-17D, IL-28A and B, IL-32, IL-36B and IL-37 were not considered. IL-3 and IL-33 were also excluded because the alignments produced were not reliable and could have led to false predictions of positive selection. 33 Signatures of positive selection are inferred if the ratio of non-synonymous substitutions per non-synonymous site (dN) to synonymous substitutions per synonymous site (dS) is significantly higher than 1, the value expected under neutrality. Here, to detect such signatures in individual codons of mammalian IL sequences, the dN/dS ratios (ω) were estimated using two ML frameworks, the HyPhy package implemented in the Data Monkey Web Server (http://www.datamonkey.org/) 6 and CODEML implemented in PAML version 4,34,35 and only results in which dN/dS was significantly higher than 1 were considered.
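For reference, the ratio used throughout can be written compactly as follows. This is the standard interpretation of ω; it restates the definition given in the text and adds no new assumption beyond the symbols d_N and d_S for dN and dS.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive selection}
\end{cases}
```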
In the Data Monkey Web Server, the best-fitting nucleotide substitution model was first determined using the automated tool available. Sequences of the IL genes were analyzed under three available models, single likelihood ancestor counting (SLAC), fixed-effect likelihood (FEL) and random effect likelihood (REL). The SLAC model is based on the reconstruction of ancestral sequences and counts the number of dS and dN changes at each codon position of the phylogeny. FEL estimates ratios of dN to dS changes for each site in an alignment. REL uses a flexible distribution and allows dS and dN to vary across sites independently.36–38
In CODEML two models, M7 and M8, were compared using likelihood ratio tests. M7 (the null model) assumes that ω ratios are distributed among sites according to a beta distribution, allowing codons to evolve neutrally or under purifying selection, whereas M8 extends M7 with an extra class of sites whose ω ratio is freely estimated from the data, allowing positive selection. M7 and M8 were compared by computing twice the difference in the natural logs of their likelihoods (2ΔlnL). This value was compared with a χ2 distribution with 2 degrees of freedom, and the null model was rejected when P < 0.05. Amino acids detected as under positive selection in M8 were identified using the Bayes Empirical Bayes (BEB) approach, with posterior probability > 95%. BEB is the preferred approach because it accounts for sampling errors in the ML estimates of model parameters.34,35,39–41 For each gene, a neighbor-joining tree was constructed in MEGA5 42 and used as the working topology, with p-distance as the substitution model and complete deletion as the treatment of gaps/missing data.
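The M7 versus M8 comparison can be sketched as below. The statistic is 2ΔlnL = 2(lnL_M8 − lnL_M7), tested against a chi-square distribution with 2 degrees of freedom; for exactly 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2), so no statistics library is needed. The function name and the log-likelihood values are hypothetical; in practice the values would be read from the CODEML output.

```python
import math


def lrt_m7_m8(lnl_m7, lnl_m8):
    """Likelihood ratio test of M8 against the null model M7.

    Returns (2*delta_lnL, p_value). The p-value uses the chi-square
    survival function for 2 degrees of freedom, exp(-x / 2).
    """
    # M8 nests M7, so the statistic should not be negative; small negative
    # values can arise from optimisation noise and are clipped to zero.
    statistic = max(2.0 * (lnl_m8 - lnl_m7), 0.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = lrt_m7_m8(lnl_m7=-2500.0, lnl_m8=-2490.0)
print(statistic, p_value, p_value < 0.05)
```

Only when the null model is rejected would the BEB posterior probabilities under M8 be inspected for individual codons.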
For a more conservative approach, and as used previously,43,44 only sites detected to be under positive selection in more than one ML method were considered.
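The conservative rule stated above, keeping only sites supported by more than one ML method, amounts to a simple vote across methods. The sketch below assumes each method's significant sites have already been extracted as a set of codon positions; the method labels and the site numbers are hypothetical.

```python
from collections import Counter


def consensus_sites(sites_by_method, min_methods=2):
    """Return the sorted sites reported by at least `min_methods` methods."""
    counts = Counter()
    for sites in sites_by_method.values():
        # set() guards against a site being listed twice by one method.
        counts.update(set(sites))
    return sorted(site for site, n in counts.items() if n >= min_methods)


# Hypothetical per-method results, for illustration only.
example = {
    "SLAC": {12},
    "FEL": {12, 40},
    "REL": {40, 57, 90},
    "M8_BEB": {57},
}
print(consensus_sites(example))
```

In this toy input, site 90 is dropped because only one method reports it.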
Results
Table 1. Phylogenetic tests of positive selection.
Table notes: Codons identified by more than one ML method are underlined. Codons located in close vicinity of N-glycosylation sites are shaded and codons related to disulfide bonds are boxed. Thresholds: codons with P-values < 0.05; codons with Bayes factor > 50; P < 0.05.
Alterations in coding regions of genes that lead to amino acid substitutions can induce changes in protein conformation. These changes may be conservative or radical, and may alter physicochemical properties of the proteins, such as charge and polarity. 33 From our physicochemical study of the amino acids under positive selection (see Appendix 2 in Supplementary material), we observed that changes in charge and polarity occur at the majority of codons under positive selection. For example, in IL-36G, amino acid positions 2 and 5 correspond to 13 and 12 amino acid possibilities, respectively, i.e. the number of different amino acids found at that position across all the species analyzed.
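The count of "amino acid possibilities" at a position can be sketched as the number of distinct residues in one alignment column. The example below also groups residues by charge, using a deliberately simple classification (K, R and H positive; D and E negative; everything else neutral) that is an assumption of this sketch and not necessarily the scheme used in Appendix 2. The toy alignment is hypothetical.

```python
# Simplified side-chain charge classes (an assumption of this sketch).
CHARGE = {"K": "+", "R": "+", "H": "+", "D": "-", "E": "-"}


def residues_at_position(aligned_seqs, column, ignore="-X?"):
    """Distinct amino acids found in one alignment column (0-based index)."""
    return {seq[column] for seq in aligned_seqs if seq[column] not in ignore}


def charge_classes(residues):
    """Charge classes represented among a set of residues."""
    return {CHARGE.get(residue, "0") for residue in residues}


# Hypothetical aligned protein sequences, for illustration only.
toy = ["MKTL", "MRTL", "MQ-L", "MKSL"]
observed = residues_at_position(toy, 1)
print(len(observed), sorted(charge_classes(observed)))
```

A column with more than one charge class is one where substitutions change charge across species, which is the kind of change discussed above.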
Discussion
Mammalian genes involved in the immune response are among the most rapidly evolving genes 25,45,46 as they include proteins with biological activities designed to protect the host (antibacterial, antiviral, antifungal or antiparasitic). As positive selection is likely correlated with sites of important activity, we sought to verify whether the positions under positive selection have any particular function that led to their evolution. For most ILs with sites under positive selection, a correlation between those sites and interaction with other molecules, such as receptors, pathogens and binding proteins (BP), was found. It is known that sites where interaction of the ILs with their receptors occurs are essential for their function.26,47 To verify a cause–effect relationship between ILs and their receptors, we extended our positive selection analyses to IL receptors (ILR), in particular focusing on sites that interact with ILs at sites detected as being under positive selection (Table 2). Of all the ILs in which positive selection was detected, that correlation was found only in IL-4 and IL-18. For IL-4, focusing on the residues under positive selection known to interact with IL-4 receptors, mainly IL-4Rα and IL-13Rα (see Figure 1), our results are in agreement with previous studies.48–51 Indeed, of the IL-4 residues detected under positive selection, Glu84 is located between sites of interaction with IL-4Rα and IL-13Rα, while Arg105 and Phe106 are located in interaction sites of IL-4 with IL-4Rα. No codons under positive selection were detected at the sites of IL-4:γc interaction, as suggested by others.1,26 Of the two residues of IL-4Rα under positive selection, only Ser95 has a direct interaction with IL-4 (Asn113).
IL-4 positively selected sites Arg105 and Phe106 are located near sites where IL-4 interacts with IL-4Rα.26,49,52 For IL-13Rα, none of the seven residues detected as under positive selection interacted with Glu84 of IL-4, but the positively selected Ile254 from IL-13Rα interacted with IL-4. The sites detected are in accordance with those described by Koyanagi et al., 26 but not with those detected by O’Connel et al. 1 One explanation for this discordance is the number of species used, which differs between studies. For IL-18, of the seven residues detected as under positive selection, only Leu45 is located near the site of interaction with IL-18Rα (Lys44) and IL-18BP (Glu42). The signatures of positive selection in these sites might also be involved in modulating signaling intensity. 26
Figure 1. Positively selected sites in the three-dimensional structures of IL-4-IL-4Rα-IL-13Rα (marked in yellow). IL-4 appears in pink, IL-4Rα in blue and IL-13Rα in brown.
Table 2. Phylogenetic tests of positive selection for receptors and binding proteins.
Table notes: Codons identified by more than one ML method are underlined. Thresholds: codons with P-values < 0.05; codons with Bayes factor > 50; P < 0.05.
In their evolution, pathogens have evolved an arsenal of immune-evading strategies, which include antagonists of host immune-related proteins 28 that target sites critical for these proteins’ functions. Thus, to escape from antagonists, the targeted sites need to evolve continuously so that the IL-ILR interaction remains functional. Supporting this concept, sites under positive selection for IL-4 have been suggested to be associated with escaping from pathogen-encoded antagonistic proteins. 26 BP are polypeptides that can interact with their corresponding proteins and neutralize them. Among the ILs studied, there are many examples of such proteins. Indeed, IL-18BP binds IL-18 with high affinity, preventing the IL-18–IL-18R interaction and, therefore, neutralizing the biological activities of IL-18. Some authors have demonstrated that IL-18BP prevents LPS-induced IFN-γ production.53–55 In some models, blocking IL-18 has been reported to reduce disease severity in conditions related to increased IL-18 production, namely injuries of the heart, kidneys and liver, arthritis, etc. 56 In IL-18, Leu45, which was detected as under positive selection, is located close to the site of interaction of IL-18 (Glu42) with IL-18BP. In addition, in IL-18BP, we detected that Cys131 is under positive selection, next to the site of interaction of IL-18BP (Lys130) with IL-18.
Nevertheless, for some ILs, the evidence of positive selection is less readily interpreted. When pathogens or stress are encountered, the initial response is via the innate immune system with activation of the inflammasome. Interestingly, Table 1 shows that two of the genes with the largest number of positively selected residues are IL-1A and IL-6. IL-1A is the initial cytokine produced upon inflammasome activation, which directly leads to increased expression of IL-6. 9 Given their critical importance in the early activation of the immune response, it is tempting to speculate that there has been selective pressure to maintain (rather than to change) the structure and function of these molecules from their early precursors. However, knowledge of the functional role of IL-1A and IL-6 derives mostly from studies in humans, with almost nothing known for other mammals. It is possible that, in other species, these ILs might have evolved for other tasks, leading to the observed divergence in sequence across species.
Codons under positive selection were also detected within sites of N-glycosylation or where disulfide bonds exist. Glycosylation is considered to be important for protein folding, oligomerization, intrinsic stability, solubility, capacity to diffuse throughout the organism, interaction with cell surface receptors and subsequent biological activity.57–59 This modification is an effective way to generate diversity and modulate protein properties due to inherent structural variations of the glycans.57,59,60 For IL-1A, IL-9, IL-22, IL-25 and IL-31, residues under positive selection were found in close proximity to sites where N-glycosylation occurs (Table 1). Disulfide bonds play an important role in the folding, stability and function of proteins and, according to some authors,61,62 are associated with functional differentiation of proteins. These bonds are thought to be well conserved in proteins. 61 Positive selection was detected at sites where disulfide bonds exist for IL-2 and IL-17A (Table 1), and substitutions at these sites may compromise their structure, stability and function. The changes in charge and polarity observed at the positively selected codons are likely to impose changes in protein conformation, with consequences for protein function.
Conclusions
Positive selection may function to maintain a host response to pathogens, diseases and environmental conditions. Here, we have detected positive selection in 28 ILs, with some of the identified codons associated with critical functions of these proteins. Several reasons might underlie our results, including the specific biological functions in which ILs are involved and the maintenance of critical structural features of the ILs. However, the limited knowledge of the role of specific codons in IL functions for most mammalian species hampers a complete understanding of our observations. Further functional and structural studies using mutagenesis and crystallographic approaches should be performed to fully understand the role of the observed variation in mammalian ILs.
Footnotes
Funding
The Portuguese Foundation for Science and Technology (FCT-Portugal) supported the doctoral fellowship of Fabiana Neves (SFRH/BD/81916/2011) and the post-doctoral fellowships of Joana Abrantes (SFRH/BPD/73512/2010) and Pedro J. Esteves (SPRH/BPD/27021/2006). This study was performed under a project funded by FCT (PTDC/BIA-BEC/103158/2008).
References
