Sage Journals: Discover world-class research

Abstract

Metatranscriptomic analysis provides information on how a microbial community reacts to environmental changes. Using next-generation sequencing (NGS) technology, biologists can study a microbial community by sampling short reads from a mixture of its mRNAs (metatranscriptomic data). Because the genome sequences of most microbes are unknown, de novo assembly of the mRNAs would seem to be required. However, NGS reads are short, mRNAs share many similar regions, and their abundance levels differ tremendously, all of which make de novo assembly challenging. The existing assembler IDBA-MT, designed specifically for metatranscriptomic data, performs well only on highly expressed mRNAs. This article introduces IDBA-MTP, which takes a novel approach to metatranscriptomic assembly: it exploits the existence of a database of millions of known protein sequences associated with mRNAs. Using this protein information effectively is nontrivial, both because of the size of the database and because different mRNAs might encode proteins with similar functions (different amino acids can have similar characteristics). IDBA-MTP tackles the problem effectively and efficiently by combining a similarity measure between mRNAs and protein sequences, dynamic programming techniques, and seed-and-extend heuristics. Experimental results show that IDBA-MTP outperforms existing assemblers, reconstructing 14% more mRNAs.
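The abstract names three ingredients: a similarity measure between mRNAs and proteins, dynamic programming, and seed-and-extend heuristics. It does not say how they are combined. The Python sketch below shows one common way such pieces fit together. It is not IDBA-MTP's actual method.

The sketch works in three steps:

1. It translates an mRNA in each of the three forward reading frames, using the standard genetic code.
2. It keeps a frame only if that frame shares an exact amino-acid k-mer seed with the protein.
3. It scores the surviving frames with Smith-Waterman local alignment. This is a full dynamic-programming alignment rather than a banded extension around the seed, which keeps the sketch short.

All function names are hypothetical, as are the scores, the seed length k = 3 and the gap penalty. The coarse amino-acid groups are also hypothetical; they stand in for a real substitution matrix such as BLOSUM62. The sketch assumes reads come from the sense strand.

```python
# Hypothetical sketch: score an mRNA fragment against a known protein.
# Not IDBA-MTP's actual method; all names, groups, and scores are illustrative.

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed coarse physicochemical groups covering all 20 amino acids,
# standing in for a real substitution matrix such as BLOSUM62.
GROUPS = ["AGST", "ILMV", "FWY", "DENQ", "HKR", "C", "P"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def translate(mrna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; unknown codons become 'X'."""
    seq = mrna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def score(a: str, b: str) -> int:
    """Toy substitution score: identical > same group > anything else."""
    if a == b and a in GROUP_OF:
        return 4
    if a in GROUP_OF and b in GROUP_OF and GROUP_OF[a] == GROUP_OF[b]:
        return 1
    return -2


def shares_seed(peptide: str, protein: str, k: int) -> bool:
    """Seed step: do the two sequences share an exact amino-acid k-mer?"""
    kmers = {protein[j:j + k] for j in range(len(protein) - k + 1)}
    return any(peptide[i:i + k] in kmers for i in range(len(peptide) - k + 1))


def smith_waterman(a: str, b: str, gap: int = -4) -> int:
    """Alignment step: best local alignment score via dynamic programming."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # (mis)match
                prev[j] + gap,                            # a[i-1] against a gap
                cur[j - 1] + gap,                         # b[j-1] against a gap
            )
            best = max(best, cur[j])
        prev = cur
    return best


def mrna_protein_similarity(mrna: str, protein: str, k: int = 3) -> int:
    """Max alignment score over forward frames that pass the seed filter."""
    best = 0
    for frame in range(3):
        peptide = translate(mrna, frame)
        if shares_seed(peptide, protein, k):
            best = max(best, smith_waterman(peptide, protein))
    return best


mrna = "ATGAAAACCGCCTATATTGCCAAACAGCGC"
print(translate(mrna))                              # MKTAYIAKQR
print(mrna_protein_similarity(mrna, "MKTAYIAKQR"))  # 40
print(mrna_protein_similarity(mrna, "MKTAYLAKQR"))  # 37
```

The seed test is a set lookup, while Smith-Waterman costs time proportional to the product of the two sequence lengths. A filter of this kind is what makes comparisons against a database of millions of proteins affordable. The toy groups score I and L as similar, so the single conservative substitution in the last example lowers the score only slightly (40 to 37). This mirrors the abstract's point that different amino acids can have similar characteristics.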



IDBA-MTP: A Hybrid Metatranscriptomic Assembler Based on Protein Information
