Abstract
Bacterial identification using genetic sequencing is fast becoming a confirmatory tool for microbiologists. Its application in veterinary diagnostic laboratories is still growing. In addition to availability of Sanger sequencing, pyrosequencing has recently emerged as a unique method for short-read DNA sequencing for bacterial identifications. Its ease of use makes it possible to diagnose infections rapidly at a low cost even in smaller laboratories. In the current study, pyrosequencing was compared with Sanger sequencing for identification of the bacterial organisms. Fifty-four bacterial isolates spanning 23 different bacterial families encountered in veterinary diagnostic microbiology laboratories were sequenced using 16S ribosomal RNA gene with pyrosequencing and Sanger sequencing. Pyrosequencing was able to identify 80% of isolates to the genus level, and 43% isolates to the species level. Sanger sequencing with approximately 500 bp performed better for both genus (100%) and species (87%) identification. Use of different sequence databases to identify bacteria isolated from animals showed relative importance of public databases compared to a validated commercial library. A time and limited cost comparison between pyrosequencing and genetic sequencing of 500 bp showed pyrosequencing was not only faster but also comparable in cost, making it a viable alternative for use in classifying bacteria isolated from animals.
Introduction
16S ribosomal (r)RNA gene analysis using sequencing as a taxonomic tool for bacterial identification is a common diagnostic tool in many clinical diagnostic laboratories.2,4 Sequencing is particularly helpful in identifying organisms that are difficult to classify based on conventional techniques. 6 Genotype-based identification is still not commonly used in many veterinary or environmental laboratories, mainly due to cost and time constraints.10,11 Pyrosequencing has emerged as an alternative to full or partial length 16S rRNA gene sequencing to identify bacterial species by signature matching using short-read sequences. 9 Pyrosequencing, like polymerase chain reaction (PCR), is much simpler, as it involves the biotinylation of one of the PCR primers and is accomplished by synthesis of the complementary strand using enzyme combinations, nucleotide bases, and a sequencing primer.
The current study describes the application of pyro-sequencing to identify animal bacterial pathogens using the 16S rRNA gene variable regions V1 and V6 4 (also previously referred to as the V3 region). 9 The identities were then compared with those obtained with more commonly used 500-bp sequencing for bacterial identification. 5 Finally, the cost and time of each sequencing technique was also compared, as well as the usefulness of various databases, both with public sequence databases as well as a commercial validated library, for performing genotypic identifications.
Materials and methods
Bacterial isolates and DNA extraction
Fifty-four bacterial isolates, from 23 bacterial families and 48 species, were included in the study. Seventeen isolates were type cultures, a and the remainder consisted of isolates from animals or specimens submitted to the Pennsylvania Veterinary Laboratory (Harrisburg, Pennsylvania) for diagnosis. These isolates were characterized using conventional biochemical methods. The isolates included in the current study covered a variety of organisms that were either recovered within the last year at the authors’ laboratory or were from a banked set from previous cases, and differed based on Gram stain, growth characteristics, and relative ease of identification using conventional identification techniques. DNA was extracted by boiling 3–5 colonies in 200 µl of molecular grade water.
Primers
For pyrosequencing, V1: 5’-(biotin) GAAGAGTTTGATCATGGCTCAG-3’and 5’-TTACTCACCCGTCCGCCACT-3’ (also used as sequencing primer); and V6: 5’-(biotin) GCAACGCGAAGAACCTTACC-3’and 5’-ACGACAGCCATGCAGCACCT-3’ (also used as sequencing primer) were used. 9 For Sanger sequencing, primers 5F and 531R were used. 6
Polymerase chain reaction
The PCR reaction was conducted b using 5 pmol of forward and reverse primers c in a 20-µl volume with 2.5 µl of DNA template using the following run conditions: 95°C for 10 sec, 35 cycles at 94°C for 0 sec and at 62°C for 15 sec, with final extension at 72°C for 1 min, and a 4°C final hold.
Pyrosequencing and 500-bp 16S rRNA sequencing
For pyrosequencing, 20 µl of PCR product was mixed with streptavidin Sepharose beads d in binding buffer, e then denatured and washed as recommended by the manufacturer’s protocol using an automated workstation. e Bead-immobilized PCR products were mixed with annealing buffer e containing 0.3 µM of sequencing primer heated at 80°C for 2 min and cooled to 25°C. The annealed products were then loaded into the workstation e as per manufacturer’s instructions with dispensation order set to 14 (CTGA). For 500-bp sequencing, the PCR products were purified, and sequencing reactions were performed following manufacturer’s recommendation, and analyzed b using the 5F (5 µM) primer.
Data, cost, and time analyses
Each sequence was analyzed using the Ribosomal Database Project’s Sequence Match (RDP; http://rdp.cme.msu.edu/seqmatch/seqmatch_intro.jsp), 3 the National Center for Biotechnology Information Basic Local Alignment Search Tool (NCBI BLAST; http://www.ncbi.nlm.nih.gov/blast/Blast.cgi) using GenBank, 1 and microbial identification software with a MicroSEQ® ID 16S rDNA 500 Library v2.2. b Similarity scores obtained were expressed as percent genetic identity for all databases. The isolates were categorized into either a species with ≥99% match, a genus with ≥95% match, or a higher taxon with <95% with criteria described previously. 2 Obtained sequences were assessed for quality with 2 different software programs,b,e and acceptable runs and data were included in the study. Isolates with noninterpretable sequence data were retested. Materials cost and time (without instrument and labor cost) for DNA extraction, PCR, and pyrosequencing for processing 12 isolates per batch run (24 reactions) and 16 isolates per batch (full capillary array) with Sanger sequencing were computed.
Results
The ability of 16S rRNA gene pyrosequencing or 500-bp sequencing to identify the isolates is shown in Table 1. Sanger sequencing generated high-quality reads, with a mean ± standard deviation length of 484 ± 50 base pairs (bp) and few ambiguous bases that could be manually called. Pyrosequencing generated a read length of 37 ± 7 bp. Manual base calling of the pyrosequencing data was not necessary except for a few isolates to ascertain the number of bases in homopolymeric peaks. Corynebacterium pseudotuberculosis for example showed a 92.7% match (with raw data) but percent identity score increased to 100% after correcting for homopolymers.
Identification of bacteria using pyrosequencing and 500-bp Sanger sequencing.*
No ID indicates that the isolate could not be identified to any level more specific than class. Higher taxa indicates a match to a family or order. All identity percent matches are with GenBank unless otherwise indicated. Numbers in parentheses after organism name indicate number of isolates analyzed if greater than one. RDP = Ribosomal Database Project; MicroSEQ = MicroSEQ ® ID 16S rDNA 500 Library v2.2.
American Type Culture Collection type cultures.
Sanger sequencing with 500-bp sequence was able to identify all the study isolates to at least the genus level and 87% of isolates to the species level. Isolates that could not be identified to a species, but could be identified to a genus included members of Bacillaceae, Aeromonas spp., Streptomyces spp., Vibrio parahaemolyticus, and Campylobacter jejuni. Pyrosequencing, on the other hand, was able to identify 43% isolates to the species level, 80% to at least the genus level, and 91% to at least a family or order (higher taxa). The remainder (9%) could not be identified to any level of clinical significance. Pyrosequencing was able to identify Streptococcus spp. isolates including S. equi and S. dysgalactiae to the species level, as well as differentiating Staphylococcus aureus from coagulase-negative staphylococci. The advantage of short-read sequences for bacterial identification obtained with pyrosequencing was evident from the ability of the V1 region to distinguish Enterococcus cecorum from S. bovis. Using multiple gene alignments with ClustalW within the V1 region, E. cecorum can be seen as genetically distinct from other streptococci and enterococci (Table 2). Another Gram-positive clinical isolate, Paenibacillus spp., which showed ambiguous Gram stain, could also be successfully resolved using pyrosequencing. Many members of the Pasteurellaceae, Enterobacteriaceae, Bacillaceae, and Brucellaceae families could only be identified to the genus level with pyrosequencing and in some cases (e.g., Aeromonas spp. or members of Leptospiraceae), pyrosequencing could not identify the study isolates.
Multiple alignments of nucleotides within the V1 region for Streptococcus spp. and Enterococcus spp.*
Consensus bases are underlined.
Generated with V1 pyrosequencing.
Source GenBank.
The role of different databases using V1, V6, combined V1 and V6, or 500-bp 16S rRNA gene regions for bacterial identification is shown in Table 3. Pyrosequencing using RDP could identify isolates to species level slightly better than using the NCBI database. The MicroSEQ library was not found to be useful for pyrosequencing. A number of isolates such as Campylobacter fetus, Moraxella bovoculi, Rhodococcus equi, Gallibacterium anatis, and Corynebacterium pseudotuberculosis were identified better with both V1 and V6 regions rather than with V1 or V6 alone (data not shown). In addition, isolates that could be resolved to a species with 500-bp Sanger sequencing showed a greater percent identity with the NCBI database than with RDP or MicroSEQ.
Taxonomic classification of bacteria (n = 54) with 16S ribosomal RNA gene region sequencing and sequencing databases (%).*
RDP = Ribosomal Database Project; NCBI = National Center for Biotechnology Information GenBank; MSQ = MicroSEQ® ID 16S rDNA 500 Library v2.2. Data are expressed as percentages and are not cumulative within each identification (ID) group.
The time needed to complete pyrosequencing was only 2.5 hr as opposed to more than 7.5 hr needed to complete 500-bp sequencing. The Sanger sequencing cost per isolate was (in U.S. dollars) $13 for extraction, amplification, clean-up, and sequencing through the Core facility. The consumables cost for 500-bp sequencing was $8 per isolate using standard 8 µl of dye-termination reaction mix ($3 for extraction, PCR, and clean-up, and $5 for polymer, buffer, and reagent mix for 16 capillary array run). The consumables cost per isolate for pyrosequencing was comparable at $7.50 ($3 for extraction and PCR; and $4.50 for 2 pyrosequencing reactions) in a batch run but was much higher ($35.00) if only a single sample was analyzed.
Discussion
Sequencing is particularly helpful in situations where organisms are difficult to characterize using standard culture methods.5,6 Several studies have previously shown successful use of 16S rRNA gene for bacterial species identifications in human clinical medicine.5,8 The current study showed that pyrosequencing helped identify 80% of the animal origin isolates to the genus level by using sequence data from just 2 variable regions within the 16S rRNA gene: V1 and V6. Sanger sequencing with approximately 500 bp was able to identify all the study isolates to the genus level and 87% to the species level pointing to the usefulness of longer reads within the 16S rRNA gene region for identifying bacterial isolates. The study also showed pyrosequencing was able to distinguish S. aureus from other staphylococci, Rhodococcus equi from the closely related genus Dietzia spp., and S. bovis from E. cecorum. Enterococcus cecorum has recently emerged as a poultry pathogen 12 and has been previously implicated in human infections. 7 Enterococcus cecorum is phenotypically different from S. bovis but many commercial rapid kits fail to identify it correctly. At the genotypic level, E. cecorum could be identified easily.
Most organisms in the present study could be classified correctly using previously described criteria. 2 However, to avoid misclassification, the analysis should be interpreted with caution because the intra- and interspecies variability has not yet been fully established for all the genera. 5 A few organisms could not be identified with pyrosequencing while still yielding good interpretable sequence data with 500-bp sequencing. This was likely due to amplification problems, but was not further investigated beyond retesting. The 16S rRNA gene, although unique and stable in most organisms, is in some cases (e.g., Aeromonas spp. where intragenomic heterogeneity has been described) problematic when sequencing-based identifications have been encountered. 8
Both pyrosequencing and 500-bp sequencing were able to identify many of the Gram-negative fastidious organisms (e.g., Actinobacillus spp., Campylobacter spp., Histophilus spp., Brucella spp.) or nonfermenting (GNNF) rods (e.g., Pseudomonas spp., Moraxella spp.) to the genus or species level. Some of these isolates are harder to identify with conventional identification methods. Pyrosequencing could not resolve Bordetella avium, one of the GNNF isolates, beyond the family level. Clinically relevant GNNF isolates can be studied further to see if inclusion of additional V regions can improve genetic identifications.
The use of sequencing databases showed GenBank and RDP databases each offered distinct advantages for sequence matching but were more or less similar in identifying organisms using Sanger sequencing. Organisms of veterinary importance were better identified to species level with GenBank using 500-bp sequences and with the RDP database using pyrosequencing data. MicroSEQ, on the other hand, is a validated library but it was not always useful for identifying isolates recovered from animals using 500-bp sequencing. MicroSEQ was better able to identify pathogens such as Escherichia coli, S. bovis, Enterobacter aerogenes, and Klebsiella pneumoniae, also encountered frequently as human pathogens. GenBank and RDP databases both had trouble distinguishing these isolates to species level because other strains within the same genus were found to share very close identity. MicroSEQ lacked information about some of the veterinary isolates (e.g., Ornithobacterium rhinotracheale, Mycoplasma corogypsi, Histophilus somni, and Clostridium septicum). Both RDP and GenBank were better in identifying these organisms. Also, MicroSEQ (both full-length and 500-bp libraries) algorithm has no provisions for comparing short input sequences and therefore did not yield any meaningful comparison for the pyrosequence data. To aid organism identifications for the veterinary laboratories using pyrosequence data, a database similar to MicroSEQ can be developed to store validated pyrosequences.
In conclusion, the current study shows the importance of pyrosequencing using V1 and V6 regions for bacterial identification. Further analysis of other V regions (V1–V9) or combination of regions even outside 16S including 23S and or rpoB may further improve pyrosequencing-based organism identifications.4,10 The present study also showed that the longer read sequences within the 16S rRNA obtained with 500 bp were superior when identifying microorganisms. A choice for integrating these technologies in the routine workflow can be made based on the need, access, availability, and cost for full-scale sequencing. For many veterinary diagnostic laboratories, the operational cost of maintaining in-house full-sequencing capabilities is often impractical. Pyrosequencing, because it is rapid and simple, can be used as a first step to classify bacteria and/or to support conventional identifications.
Footnotes
Acknowledgements
The authors acknowledge the technical support provided by Dawn Miller, Connie Carpenter, Ann Ohme, Corey Zellers, and Hershey Medical Center sequencing core laboratory (Joe Bednarczyk and Dr. Bruce Stanley).
a.
American Type Culture Collection, Manassas, VA.
b.
Veriti® 96-Well Fast Thermal Cycler, GeneAmp® Fast PCR Master Mix, ABI 3130xl analyzer, Base Scanner software; Applied Biosystems, Carlsbad, CA.
c.
Invitrogen Corp., Carlsbad, CA.
d.
GE Healthcare, Chalfont St. Giles, United Kingdom.
e.
PyroMark™ Q24 and instrument software, Qiagen Inc., Valencia, CA.
The authors declared that they had no conflicts of interests in their authorship and publication of this contribution.
The authors received no financial support for the research, authorship, and/or publication of this article.
