LRRfinder2.0: A webserver for the prediction of leucine-rich repeats

Abstract

Leucine-rich repeats (LRRs) are versatile motifs present in more than 6000 proteins throughout the phylogenetic kingdom. Tandem LRRs generate a characteristic horseshoe structure with a diverse range of functions. Fulfilling a key role in the innate immune system, LRRs form the pathogen-recognition domain of Toll-like receptors (TLRs) and NOD-like receptors (NLRs). Host–pathogen interactions drive the LRRs involved in ligand recognition to diverge from the consensus motif. Most LRRs range between 21 and 30 residues; however, large insertions in certain TLRs can generate repeats of over 60 amino acids. This variability makes LRRs ideal for species-specific mediation of host–pathogen interactions. Teleost TLRs show large insertions, making cross-species alignments difficult without prior demarcation of their LRR motifs. We present LRRfinder2.0, a webserver for LRR prediction. LRRfinder2.0 utilizes scoring matrices comprising more than 60,000 LRR motifs from more than 200 species. The underlying TLR database tLRRdb contains more than 3500 manually annotated sequences, augmenting identification of irregular LRR motifs.

Keywords

Leucine-rich repeat, Toll-like receptor, LRRfinder

Introduction

Leucine-rich repeats (LRRs) have been identified in more than 6000 proteins, belonging to a more general class of solenoid structures.¹ Structurally stable LRR motifs provide a solution to the conservative nature of evolution, facilitating modifications to perform a diverse range of functions. LRR-containing proteins have been identified in many biologically important processes in plants, invertebrates and vertebrates, including extracellular matrix assembly, cell signalling and adhesion, neuronal development and host–pathogen interactions.^2–6

LRR domains comprise between 2 and 45 tandemly arranged LRR motifs, for which 7 classes have been proposed, characterized by differing lengths and motif consensus: ‘Cysteine-containing’, ‘RI-like’, ‘SDS22-like’, ‘Bacterial’, ‘Plant-specific’, ‘Treponema pallidum’ and ‘Typical’.⁷ Each motif of 20–30 amino acids can be separated into a highly conserved segment (LRRhs) and a variable segment (LRRvs), with the LRRhs sharing the consensus LxxLxLxxN/C(x)xL, where L represents Leu, Ile, Val or Phe, N stands for Asn, Thr, Ser or Cys, and x is any amino acid.¹
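As an illustration only, the LRRhs consensus can be written as a regular expression using one-letter amino acid codes. The sketch below is not the LRRfinder2.0 method, which relies on scoring matrices rather than a strict pattern. The function name `find_lrrhs_candidates` and the toy sequence are hypothetical, and the optional residue in N/C(x)xL is read here as one or two arbitrary residues before the final hydrophobic position.

```python
import re

# Consensus LxxLxLxxN/C(x)xL with L = [LIVF] and N = [NTSC] (one-letter codes).
# The zero-width lookahead lets overlapping candidates be reported.
LRRHS_PATTERN = re.compile(
    r"(?=([LIVF].{2}[LIVF].[LIVF].{2}[NTSC].{1,2}[LIVF]))"
)


def find_lrrhs_candidates(sequence):
    """Return (start, segment) pairs for consensus matches; start is 0-based."""
    return [
        (match.start(), match.group(1))
        for match in LRRHS_PATTERN.finditer(sequence.upper())
    ]


# Toy sequence (hypothetical) carrying a single consensus-like segment.
toy_sequence = "AAALKELDLSNNRLAAA"
candidates = find_lrrhs_candidates(toy_sequence)
```

A strict pattern of this kind will miss irregular repeats, such as those involved in ligand binding, which is one reason to prefer matrix-based scoring and curated motif annotations.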

The structure and tandem arrangement of LRRs in stretches of variable length provide a highly evolvable and versatile framework for binding myriad ligands.^4,8 Throughout the phylogenetic kingdom, LRRs are involved in pathogen recognition, as demonstrated by the vastly expanded repertoire of LRR-containing proteins in the sea urchin that respond to microbial-associated molecular patterns (MAMPs) in the absence of an adaptive immune system.² In mammals, MAMPs are recognized primarily by pattern recognition receptors (PRRs), which distinguish between self and microbial structures to initiate an innate immune response.⁹ Toll-like receptors (TLRs) and NOD-like receptors (NLRs) are key components of this process. TLRs and NLRs comprise an ‘extracellular’ LRR domain, which mediates pathogen recognition, and an ‘intracellular’ signal domain, which initiates cytokine production.^9,10 In TLRs, the extracellular LRR domain has been shown to be under positive selection, generating a diverse collection of motifs that forms an ideal basis for an LRR predictor.^11–15 As high-throughput technologies generate sequences at a rapid pace, improved tools are needed for the fast and precise identification of LRRs in innate immune receptors, to allow detailed ligand-binding analysis as well as comparative studies.

Here, we present LRRfinder2.0, a webserver for the prediction of LRRs. Based upon TLR sequences, the LRRfinder prediction method has been applied previously to LRR identification in both plant and mammalian immune receptors.¹⁶ A searchable database of domain- and motif-annotated TLR sequences has been extended to include more than 60,000 sequences from more than 200 species. The latest release offers improved prediction with more than 14,000 unique LRR motifs and incorporates predictions of post-translational modification sites, surface accessibility and structure. LRRfinder2.0 has a broad range of applications, including LRR demarcation for improved alignments in comparative modelling, identification of functionally important residues and scanning of novel genomes for immune-related proteins.

Material and methods

TLR database: tLRRdb

The LRRfinder2.0 prediction model is based upon a sequence-generated position-specific scoring matrix (PSSM), as described in Offord et al.¹⁶ The current release includes more than 3000 full-length and partial nucleotide sequences translated from the NCBI and Ensembl databases.^17,18 LRRhs motif positions were annotated manually using multiple Clustal¹⁹ alignments of known LRR-containing structures from the Protein Data Bank (PDB).²⁰

Most TLRs comprise up to six key regions. Signal peptide cleavage sites were predicted using SignalP and Phobius.^21,22 The region spanning the cleavage site and the first LRRhs was designated the LRR N-terminus (LRRNT) if it contained at least one cysteine residue. An annotated LRR C-terminus (LRRCT) domain includes the last LRR, containing the consensus CxC region, and ends at the transmembrane helix. The exceptions to this rule are the insect Toll proteins, which can include several LRRCT regions between LRR-containing domains. The transmembrane helix was predicted using a combination of TMHMM, MEMSAT and Phobius.^22–24 The downstream signalling Toll/interleukin-1 receptor (TIR) domain was annotated as any region following a transmembrane helix. Partial sequences consisting of only a TIR domain were annotated manually from multiple sequence alignments.
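The two positional rules above (an LRRNT only when the spacer contains a cysteine, and a TIR domain as whatever follows the transmembrane helix) can be sketched as follows. This is a simplified illustration, not the tLRRdb annotation code. The function `annotate_regions`, its 0-based end-exclusive coordinates and the toy sequence are assumptions, and the input positions stand in for the output of tools such as SignalP or TMHMM.

```python
def annotate_regions(sequence, cleavage_site, first_lrrhs_start, tm_start, tm_end):
    """Label regions from precomputed positions (0-based, end-exclusive)."""
    regions = []
    if cleavage_site > 0:
        regions.append(("signal peptide", 0, cleavage_site))
    # LRRNT: spacer between cleavage site and first LRRhs, kept only if it
    # contains at least one cysteine residue.
    spacer = sequence[cleavage_site:first_lrrhs_start]
    if "C" in spacer:
        regions.append(("LRRNT", cleavage_site, first_lrrhs_start))
    regions.append(("transmembrane helix", tm_start, tm_end))
    # TIR: any region following the transmembrane helix.
    if tm_end < len(sequence):
        regions.append(("TIR", tm_end, len(sequence)))
    return regions


# Toy sequence (hypothetical): signal peptide, spacer, one LRRhs, helix, tail.
toy = "MKLLA" + "QCPSG" + "LKELDLSNNRL" + "WWWWW" + "KRTIR"
toy_regions = annotate_regions(
    toy, cleavage_site=5, first_lrrhs_start=10, tm_start=21, tm_end=26
)
```

The sketch omits the LRR and LRRCT regions, whose boundaries depend on motif annotation rather than on a simple positional rule.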

The annotated sequences are stored in the TLR database tLRRdb and can be searched by accession, keyword, phylogeny, TLR, LRRhs motif, LRR length and sequence. BLAST²⁵ and Clustal have been implemented to allow users to identify and align the sequences most closely related to their input query. In addition, known LRR-containing PDB sequences are provided, which allows several alignment colour schemes to be offered: Clustal colours, similarity, secondary structure and domain annotation.

LRRfinder2.0 predictor: Features and description

The LRRfinder2.0 server accepts protein queries via four input methods: user-defined protein sequence, NCBI protein accession, NCBI nucleotide accession or user-defined nucleotide sequence. Nucleotide sequences are automatically translated prior to analysis.
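For readers unfamiliar with the translation step, a minimal sketch is given below. It assumes the standard genetic code and translation of the first reading frame only, stopping at the first stop codon; the frame handling used by the server is not described here, and `translate` and its parameters are hypothetical names.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
CODON_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: CODON_AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides, stop_at_stop=True):
    """Translate frame 1 of a DNA/RNA string; unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*" and stop_at_stop:
            break
        protein.append(residue)
    return "".join(protein)


toy_protein = translate("ATGCTGAAAGAACTGGATCTGTAA")
```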

Protein sequences are analysed using the protocol described in Figure 1. Several PSSM options are available to the user in this release. TLR- and taxon-specific matrices are provided to allow for the observed variation in amino acid usage between phylogenetic groups (data not shown). Because LRR-containing proteins are not limited to TLRs, we have also included a PSSM option based upon all LRR-containing PDB sequences and their related NCBI counterparts.

Figure 1.

Schematic layout of LRRfinder2.0 webserver protocol for LRR prediction. The LRRfinder2.0 sequence processing can be divided into six stages. Following LRR identification, the submitted sequence is passed to several applications for domain, post-translational modification, structural property and surface accessibility prediction. The information is then compiled and presented in subsections for user-friendly viewing.

LRRfinder2.0 prediction is applied using an 11-residue sliding window, a user-defined PSSM and annotated LRRhs motifs from tLRRdb. Known motifs take precedence over predictions because of the irregularity of ligand-binding repeats, and the tabular output provides details on all tLRRdb matches within the sequence. User-defined e-value thresholds categorize predicted LRRhs regions as either ‘significant’ or ‘insignificant’, and overlapping predictions are listed.
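The general idea of scoring an 11-residue sliding window against a PSSM can be sketched as below. This is a generic log-odds illustration under stated assumptions, not the LRRfinder2.0 implementation: the uniform background frequencies, the pseudocount, the toy training motifs and the function names `build_pssm` and `scan` are all hypothetical, and no e-value calculation is attempted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 11  # window length stated in the text


def build_pssm(motifs, pseudocount=1.0):
    """Build a log-odds matrix from aligned 11-mers (uniform background assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(WINDOW):
        column = [motif[pos] for motif in motifs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(sequence, pssm):
    """Score every 11-residue window; unknown residues contribute 0."""
    hits = []
    for start in range(len(sequence) - WINDOW + 1):
        window = sequence[start:start + WINDOW]
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        hits.append((start, window, score))
    return hits


# Toy training motifs and query (hypothetical, for illustration only).
toy_motifs = ["LKELDLSNNRL", "LTSLDLSHNKI", "LRVLNLSGNPL", "LQTLDVSYNEL"]
toy_pssm = build_pssm(toy_motifs)
toy_hits = scan("MSAAGLKELDLSNNRLTGE", toy_pssm)
best_start, best_window, best_score = max(toy_hits, key=lambda hit: hit[2])
```

In a real predictor, the raw window scores would still need to be converted into significance estimates and reconciled with known motif annotations, as described above.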

TLRs comprise several domains, including signal peptide and transmembrane regions, which are predicted by SignalP and TMHMM, respectively. LRR-domain capping regions and the TIR domain are identified via a BLAST comparison to tLRRdb, listing the top hits and similarity scores. Post-translational modifications, such as glycosylation and phosphorylation sites, are predicted using the CBS tools NetCGlyc, NetOGlyc, NetNGlyc and NetPhos.^26–28

Comparative protein modelling relies upon accurate alignments, assisted by the demarcation of domains and LRR identification. Secondary structure and surface accessibility predictions via PSIPRED²⁹ have been included to allow users to validate LRRhs predictions, identify functionally important residues and improve alignment of the LRRvs, which may vary in sequence, but maintain structural conservation.

Results and discussion

The evolution of innate immune receptors has been studied extensively. Phylogenetic analyses have shown that the vertebrate TLRs and NLRs evolved independently by gene duplication prior to the divergence of protostomes and deuterostomes,^30,31 with the occurrence of both gene loss and gene conversions.¹³ Although vertebrate TLRs have been cited as an example of evolutionary conservation and strong functional constraint,³⁰ recent studies have suggested positive selection in some TLRs in an extremely broad range of organisms, including primates, teleosts and birds.^11–15,32–34 Within innate immune receptors, the solenoid LRR domains often show higher rates of evolution than their intracellular counterparts.³⁵

Taking this evidence into consideration, the exact identification of LRRs is imperative. The relevance of LRRfinder2.0 in computational modelling of LRR-containing proteins has been shown for bovine TLR2, improving MODELLER³⁶ sequence alignments by influencing manual adjustments (Willcocks et al., submitted). The LRRfinder methodology has also been applied to the identification of LRR motifs in echinoderm, plant and mammalian immune-related proteins.^14,15,37–42 The results generated provide insight into the functional diversification of LRRs between innate immune receptors of different species. Indeed, using LRRfinder2.0 with sequences from urochordates, we were able to detect a variety of LRR-containing proteins. Urochordates are model organisms in comparative and evolutionary immunology. Genome-wide analysis of Ciona intestinalis has previously identified two TLR candidates capable of recognizing myriad MAMPs and inducing cytokine production.⁴³ A standalone version of the LRRfinder2.0 PSSM predictor was used to scan the C. intestinalis and C. savignyi Ensembl protein libraries, finding more than 500 and 800 LRR-containing proteins, respectively. LRRfinder2.0 was able to identify more LRRs within the two TLR candidates, improving on previous motif predictions by SMART,⁴⁴ as shown in Figure 2. The success of the current system in identifying LRR-containing proteins via genome scanning strongly suggests that the process can be applied to a much greater sample size spanning all LRR protein classes, and may therefore provide better insight into LRR evolution and the ancestral immune repertoire.

Figure 2.

Comparison of LRR prediction methods for Ci-TLR1 and Ci-TLR2. Sasaki et al.⁴³ identified 7 and 10 LRRs in the Ciona intestinalis receptors Ci-TLR1 (NP_001159599.1) and Ci-TLR2 (NP_001159600.1), respectively, using SMART. LRRfinder2.0 shows a significant increase in the number of predicted LRRs by comparison, identifying 18/19 and 17/18 motifs in Ci-TLR1 and Ci-TLR2 respectively.

The LRRfinder2.0 webserver provides a user-friendly application for the identification of LRRs and the prediction of post-translational modification sites, surface accessibility and secondary structure in LRR-containing proteins. LRRfinder2.0 has broad applications, including improving alignments for computational modelling, identifying functionally important residues and scanning novel genomes for LRR-containing proteins that have the potential to be involved in immune-related processes. Further investigation using our dataset could provide insight into the structural variations caused by naturally occurring indels and, together with available crystal structures of LRR-containing proteins, may provide explanations for the differences in protein–ligand interactions identified in the TLRs of different species.

Availability and requirements

Project home page: www.lrrfinder.com.

Operating system(s): Platform independent (web server).

Programming language(s): Perl, PHP, JavaScript, Ajax, CSS and HTML.

License: No license required. For information about a standalone version, please contact: vofford@rvc.ac.uk

Footnotes

Acknowledgements

We would like to thank Mr S. Thompson for infrastructure support during the development of LRRfinder2.0.

Funding

This work was supported by grant R.9VPR.OFFV of the RVC to DW. The manuscript represents publication number PID_00447 of the RVC.

References

1. Enkhbayar, Kamiya, Osaki. Structural principles of leucine-rich repeat (LRR) proteins. Proteins 2004; 54: 394–403.

2. Ghosh, Lun, Majeske. Invertebrate immune diversity. Dev Comp Immunol 2011; 35: 959–974.

3. Hocking, Shinomura, McQuillan. Leucine-rich repeat glycoproteins of the extracellular matrix. Matrix Biol 1998; 17: 1–19.

4. Kedzierski, Montgomery, Curtis, Handman. Leucine-rich repeats in host-pathogen interactions. Arch Immunol Ther Exp (Warsz) 2004; 52: 104–112.

5. Matsushima, Tachi, Kuroki. Structural analysis of leucine-rich-repeat variants in proteins associated with human diseases. Cell Mol Life Sci 2005; 62: 2771–2791.

6. DeYoung, Innes. Plant NBS-LRR proteins in pathogen sensing and host defense. Nat Immunol 2006; 7: 1243–1249.

7. Kobe, Kajava. The leucine-rich repeat as a protein recognition motif. Curr Opin Struct Biol 2001; 11: 725–732.

8. Mariuzza, Velikovsky, Deng. Structural insights into the evolution of the adaptive immune system: the variable lymphocyte receptors of jawless vertebrates. Biol Chem 2010; 391: 753–760.

9. Werling, Jungi. TOLL-like receptors linking innate and adaptive immune response. Vet Immunol Immunopathol 2003; 91: 1–12.

10. Inohara, Chamaillard, McDonald, Nunez. NOD-LRR proteins: role in host-microbial interactions and inflammatory disease. Annu Rev Biochem 2005; 74: 355–383.

11. Ferwerda, McCall, Alonso. TLR4 polymorphisms, infectious diseases, and evolutionary pressure during migration of modern humans. Proc Natl Acad Sci U S A 2007; 104: 16645–16650.

12. Jann, Werling, Chang. Molecular evolution of bovine Toll-like receptor 2 suggests substitutions of functional relevance. BMC Evol Biol 2008; 8: 288.

13. Temperley, Berlin, Paton. Evolution of the chicken Toll-like receptor gene family: a story of gene gain and gene loss. BMC Genomics 2008; 9: 62.

14. Areal, Abrantes, Esteves. Signatures of positive selection in Toll-like receptor (TLR) genes in mammals. BMC Evol Biol 2011; 11: 368.

15. Smith, Jann, Haig. Adaptive evolution of Toll-like receptor 5 in domesticated mammals. BMC Evol Biol 2012; 12: 122.

16. Offord, Coffey, Werling. LRRfinder: a web application for the identification of leucine-rich repeats and an integrative Toll-like receptor database. Dev Comp Immunol 2010; 34: 1035–1041.

17. Schuler, Epstein, Ohkawa, Kans. Entrez: molecular biology database and retrieval system. Methods Enzymol 1996; 266: 141–162.

18. Hubbard, Barker, Birney. The Ensembl genome database project. Nucleic Acids Res 2002; 30: 38–41.

19. Larkin, Blackshields, Brown. Clustal W and Clustal X version 2.0. Bioinformatics 2007; 23: 2947–2948.

20. Berman, Westbrook, Feng. The Protein Data Bank. Nucleic Acids Res 2000; 28: 235–242.

21. Petersen, Brunak, von Heijne, Nielsen. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 2011; 8: 785–786.

22. Kall, Krogh, Sonnhammer. Advantages of combined transmembrane topology and signal peptide prediction – the Phobius web server. Nucleic Acids Res 2007; 35: W429–W432.

23. Sonnhammer, von Heijne, Krogh. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol 1998; 6: 175–182.

24. Jones, Taylor, Thornton. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 1994; 33: 3038–3049.

25. Altschul, Gish, Miller. Basic local alignment search tool. J Mol Biol 1990; 215: 403–410.

26. Julenius. NetCGlyc 1.0: prediction of mammalian C-mannosylation sites. Glycobiology 2007; 17: 868–876.

27. Hansen, Lund, Tolstrup. NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconj J 1998; 15: 115–130.

28. Blom, Sicheritz-Ponten, Gupta. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004; 4: 1633–1649.

29. McGuffin, Bryson, Jones. The PSIPRED protein structure prediction server. Bioinformatics 2000; 16: 404–405.

30. Roach, Glusman, Rowen. The evolution of vertebrate Toll-like receptors. Proc Natl Acad Sci U S A 2005; 102: 9577–9582.

31. Hughes, Piontkivska. Functional diversification of the toll-like receptor gene family. Immunogenetics 2008; 60: 249–256.

32. Chen, Wang, Tzeng. Evidence for positive selection in the TLR9 gene of teleosts. Fish Shellfish Immunol 2008; 24: 234–242.

33. Wlasiuk, Nachman. Adaptation and constraint at Toll-like receptors in primates. Mol Biol Evol 2010; 27: 2172–2186.

34. Huang, Temperley, Ren. Molecular evolution of the vertebrate TLR1 gene family – a complex history of gene duplication, gene conversion, positive selection and co-evolution. BMC Evol Biol 2011; 11: 149.

35. Mikami, Miyashita, Takatsuka. Molecular evolution of vertebrate Toll-like receptors: evolutionary rate difference between their leucine-rich repeats and their TIR domains. Gene 2012; 503: 235–243.

36. Fiser, Sali. Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol 2003; 374: 461–491.

37. Biruma, Martin, Fridborg. Two loci in sorghum with NB-LRR encoding genes confer resistance to Colletotrichum sublineolum. Theor Appl Genet 2012; 124: 1005–1015.

38. Wang, Sun. Recognition of nucleic acid ligands by toll-like receptors 7/8: importance of chemical modification. Curr Med Chem 2012; 19: 1365–1377.

39. Martin, Biruma, Fridborg. A highly conserved NB-LRR encoding gene cluster effective against Setosphaeria turcica in sorghum. BMC Plant Biol 2011; 11: 151.

40. Buckley, Rast. Dynamic evolution of toll-like receptor multigene families in echinoderms. Front Immunol 2012; 3: 136.

41. Russell, Widdison, Leigh, Coffey. Identification of single nucleotide polymorphisms in the bovine Toll-like receptor 1 gene and association with health traits in cattle. Vet Res 2012; 43: 17.

42. Buckley, Rast. Characterizing immune receptors from new genome sequences. Methods Mol Biol 2011; 748: 273–298.

43. Sasaki, Ogasawara, Sekiguchi. Toll-like receptors of the ascidian Ciona intestinalis: prototypes with hybrid functionalities of vertebrate Toll-like receptors. J Biol Chem 2009; 284: 27336–27343.

44. Ponting, Schultz, Milpetz, Bork. SMART: identification and annotation of domains from signalling and extracellular protein sequences. Nucleic Acids Res 1999; 27: 229–232.