Sage Journals: Discover world-class research

Abstract

Clustering 16S rRNA amplicon sequences into operational taxonomic units (OTUs) is the most common bioinformatics approach for investigating microbial communities with high-throughput sequencing technologies. However, the reliability of existing OTU clustering algorithms still leaves room for improvement. Here we propose an improved method (bioOTU) that first assigns taxonomy to unique tags at the genus level, separating the error-free sequences of known species in the reference database from artifacts, and then clusters them into OTUs using different strategies. The remaining tags, which fail to be clustered in the previous step, are then subjected to independent OTU clustering by an optimized heuristic clustering algorithm. Performance tests on both mock and real communities showed that bioOTU is effective at recovering the underlying profiles of both microbial composition and abundance, and that it produces a comparable or smaller number of OTUs than the prevailing tools Mothur and UPARSE. bioOTU is implemented in C and Python, with source code freely available in a GitHub repository.
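The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not bioOTU's implementation: the function names (`identity`, `greedy_cluster`, `two_stage_cluster`), the ungapped identity measure, the abundance-sorted greedy strategy, and the 0.97 default threshold are all assumptions chosen for illustration, and the genus labels are assumed to come from an external classifier. The abstract does not specify which strategies are applied within genera, so the same greedy routine is reused for both stages here.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of matching positions between two equal-length tags.

    Toy, ungapped measure (an assumption); real tools use alignment.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(tags, threshold=0.97):
    """Abundance-sorted greedy clustering of {sequence: abundance}.

    Each tag joins the first existing centroid it matches at or above
    the threshold; otherwise it founds a new OTU.
    Returns a list of (centroid, members) pairs.
    """
    otus = []
    # Most abundant tags first; ties broken by sequence for determinism.
    for seq in sorted(tags, key=lambda s: (-tags[s], s)):
        for centroid, members in otus:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))
    return otus


def two_stage_cluster(tags, genus_of, threshold=0.97):
    """Cluster genus-assigned tags per genus, then the remaining tags.

    tags: {sequence: abundance}
    genus_of: {sequence: genus}; tags missing here count as unassigned.
    Returns a list of (genus_or_None, centroid, members) triples.
    """
    by_genus = defaultdict(dict)
    unassigned = {}
    for seq, count in tags.items():
        genus = genus_of.get(seq)
        if genus is None:
            unassigned[seq] = count
        else:
            by_genus[genus][seq] = count

    otus = []
    # Stage 1: tags with a genus-level assignment, clustered within genus.
    for genus in sorted(by_genus):
        for centroid, members in greedy_cluster(by_genus[genus], threshold):
            otus.append((genus, centroid, members))
    # Stage 2: remaining tags, clustered independently of taxonomy.
    for centroid, members in greedy_cluster(unassigned, threshold):
        otus.append((None, centroid, members))
    return otus
```

Calling `two_stage_cluster(tags, genus_of)` with a dictionary of unique tags and their abundances returns one triple per OTU; OTUs whose first element is `None` come from the taxonomy-independent second stage.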




bioOTU: An Improved Method for Simultaneous Taxonomic Assignments and Operational Taxonomic Units Clustering of 16s rRNA Gene Sequences
