
Abstract

Current technologies allow microbial communities to be sequenced directly from the environment without prior culturing. A major problem in analyzing a microbial sample is taxonomically annotating its reads to identify the species it contains. The main difficulties of taxonomic analysis are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratios of species, and sequencing errors. Microbial communities can also be studied by clustering reads, a process referred to as genome binning. In this study, we present MetaProb 2, an unsupervised genome binning method based on read assembly and probabilistic k-mer statistics. The novelties of MetaProb 2 are the use of minimizers to efficiently assemble reads into unitigs and a community detection algorithm based on graph modularity to cluster unitigs and detect representative unitigs. The effectiveness of MetaProb 2 is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning tools such as MetaProb, AbundanceBin, Bimeta, and MetaCluster. On real datasets, it is the only tool capable of producing promising results while being parsimonious with computational resources.
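To make the notion of minimizers concrete, the following is a minimal sketch of how they can be computed for a read and used as cheap evidence that two reads may overlap. It is not the MetaProb 2 implementation: the lexicographic ordering of k-mers, the values of `k` and `w`, and the function names `minimizers` and `shared_minimizers` are assumptions made only for illustration.

```python
def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of seq.

    Assumption: the minimizer of a window is the lexicographically
    smallest of its w consecutive k-mers (leftmost on ties). Real tools
    often use a hash-based ordering instead.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window of w consecutive k-mers over the read.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost tie.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return selected


def shared_minimizers(read_a, read_b, k=4, w=3):
    """K-mers chosen as minimizers in both reads (hypothetical helper).

    Reads that share minimizers are candidates for an overlap; an
    assembler would still have to verify the overlap itself.
    """
    a = {kmer for _, kmer in minimizers(read_a, k, w)}
    b = {kmer for _, kmer in minimizers(read_b, k, w)}
    return a & b


if __name__ == "__main__":
    print(sorted(minimizers("ACGTACGT", k=3, w=2)))
    print(sorted(shared_minimizers("ACGTACGT", "TACGTTTT", k=3, w=2)))
```

Because only one k-mer per window is kept, far fewer k-mers need to be indexed than if every k-mer were stored, while two reads with a sufficiently long exact overlap still select the same minimizers inside it.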

MetaProb 2: Metagenomic Reads Binning Based on Assembly Using Minimizers and K-Mers Statistics
