Abstract
Clustering 16S rRNA amplicon sequences into operational taxonomic units (OTUs) is the most common bioinformatics approach for investigating microbial communities with high-throughput sequencing technologies. However, the reliability of existing OTU clustering algorithms still leaves room for improvement. Here we propose an improved method (bioOTU) that first assigns taxonomy to unique tags at the genus level, separating the error-free sequences of known species in the reference database from artifacts, and then clusters them into OTUs using different strategies. The remaining tags, which fail to be clustered in the previous step, are then subjected to independent OTU clustering by an optimized heuristic clustering algorithm. Performance tests on both mock and real communities showed that bioOTU is effective at recovering the underlying profiles of both microbial composition and abundance, and that it produces a comparable or smaller number of OTUs than the prevailing tools Mothur and UPARSE. bioOTU is implemented in C and Python, with source code freely available on GitHub.
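To make the idea of heuristic OTU clustering concrete, the following is a minimal sketch of a generic greedy, abundance-sorted centroid clustering of unique tags. It is not the bioOTU algorithm itself: the abstract does not give its details, and the taxonomy-assignment step is omitted here. The function names `identity` and `greedy_otus`, the 0.97 identity threshold, and the toy sequences are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Sequence identity defined as 1 - edit_distance / max(len(a), len(b))."""
    if not a and not b:
        return 1.0
    # Standard dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def greedy_otus(tags: Dict[str, int],
                threshold: float = 0.97) -> List[Tuple[str, int, List[str]]]:
    """Cluster unique tags (sequence -> abundance) into OTUs.

    Tags are visited from most to least abundant. Each tag joins the first
    existing centroid it matches at >= threshold identity; otherwise it
    becomes the centroid of a new OTU.
    """
    otus: List[list] = []  # each entry: [centroid, total abundance, members]
    for seq, count in sorted(tags.items(), key=lambda kv: (-kv[1], kv[0])):
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu[1] += count
                otu[2].append(seq)
                break
        else:
            otus.append([seq, count, [seq]])
    return [(c, n, m) for c, n, m in otus]


# Toy input (hypothetical data): one abundant tag, a one-mismatch variant
# of it, and an unrelated tag.
base = "ACGT" * 10
variant = base[:5] + "A" + base[6:]  # single substitution, 39/40 identity
other = "T" * 40
otus = greedy_otus({base: 100, variant: 3, other: 20})
```

In this toy input the one-mismatch variant is absorbed into the OTU of the abundant tag, while the unrelated tag seeds its own OTU. Processing tags in order of decreasing abundance reflects the common assumption that abundant tags are more likely to be error-free.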
