Abstract
The human intestinal microbiome is a microbial ecosystem that expresses as many as 100 times more genes than the human host, thereby constituting an important component of the human
The Microbiome
Microbiome research has greatly benefited from the technological breakthroughs that enabled the sequencing of the human genome less than two decades ago. Soon after the human genome was deciphered, attention shifted to the enormous prokaryotic gene pool within the human body, which far exceeds that of the human eukaryotic genome but whose contribution to human physiology remained elusive. The Nobel laureate Joshua Lederberg first termed
Collectively, these pioneering works led to the discovery that our microbiome consists of vast numbers of cells, with the latest estimates indicating approximately equal numbers of microbial cells and our own cells, 8 and that it expresses as many as 100 times more genes than the human eukaryotic gene pool. A great deal of heterogeneity was shown to exist between individuals in their microbiome composition. 6 While the basis of this heterogeneity is not fully understood, human microbiome structure and stability are thought to be influenced by a multifactorial array of host genetic, immune, and environmental factors.6,9 Later, it was realized that healthy individuals harbor a
The microbiota communities, of which the gut microbiome is the best studied, play multiple important roles in human physiology. They are important in controlling pathogen colonization 13 and in immune system development, 14 they aid digestion by producing enzymes that hydrolyze dietary compounds that could not otherwise be broken down, 15 and they produce vitamins, such as vitamins B12, B5, and K. 16 Changes in microbiota populations, termed dysbiosis, have been associated with a number of human conditions, such as inflammatory bowel disease, 17 obesity, 12 and nonalcoholic fatty liver disease. 18 Furthermore, dysbiosis has been shown to occur as a result of infection with pathogens, such as Human Immunodeficiency Virus (HIV) 19 and influenza virus. 20 General reviews of microbiome functions and associations with disease are available elsewhere.21–24
Characterization of the Microbiome
The taxonomy and phylogeny of the microbiota have been characterized in a large number of studies by sequencing the 16S ribosomal RNA (rRNA) gene.25–27 This gene contains regions that are conserved throughout bacterial species and hypervariable regions that are unique for specific genera; the hypervariable regions are targeted for sequencing and used for taxonomic characterization. The sequenced variable regions are then clustered into operational taxonomic units (OTUs), which form the basis for the taxonomic characterization of the community.28–30
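The clustering step described above can be illustrated with a minimal sketch. It assumes pre-aligned, equal-length reads and a 97% identity cutoff; neither the cutoff nor the greedy seed-based strategy is specified in the text, and the function names are hypothetical. Dedicated OTU-picking tools use proper alignment and abundance sorting, so this is only a toy illustration of the idea.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example expects pre-aligned, equal-length sequences")
    if not a:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otu_clustering(reads: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each read to the first OTU whose seed it matches at >= threshold identity.

    The 0.97 default is an assumed, commonly used cutoff, not a value from the text.
    """
    otus: List[List[str]] = []  # each OTU is a list of reads; element 0 is the seed
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            # No sufficiently similar seed was found: this read starts a new OTU.
            otus.append([read])
    return otus
```

In this sketch the result depends on read order, because the first read of each cluster becomes its seed; real pipelines usually sort reads by abundance first for that reason.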
Whole genome shotgun sequencing followed by metagenomic analysis adds a more detailed layer of information to the taxonomical characterization of a sample, by generating information on the gene composition of the bacteria present. 31 This information can in turn be used to discover new genes and to formulate putative functional pathways and modules, thus providing insight into functional and genetic microbiome variability. 32 Metagenomic analysis is carried out on genomic DNA isolated from the environment under study, but it does not distinguish whether this genomic DNA comes from cells that are viable or not or whether the predicted genes are actually expressed and under what conditions. 33 In addition, other
Characterization of the Metatranscriptome
Recent advances in sequencing technologies that have revolutionized metagenomic analyses have also advanced the approaches aimed at studying and understanding gene expression on a global scale. Understanding the critical roles that host gene expression plays at the cellular or tissue level has come a long way from the elegantly described differential display approach in the early days 39 to a global
Isolation and Processing of Microbiome mRNA
Typically, a metatranscriptome experiment of the microbiome involves isolation of total RNA from bacteria colonizing the area of interest (eg, gut, skin, and oral cavity). In eukaryotes, mRNA can be selected by synthesizing cDNA using oligo(dT) primers, taking advantage of the poly-A tail that characterizes mRNA species. Prokaryotic mRNA makes up only 1%-5% of total RNA species, with the majority being 16S and 23S rRNAs as well as tRNAs. 45 Moreover, in contrast to eukaryotic mRNA, prokaryotic mRNA lacks a poly-A tail, so poly-A-based selection during cDNA synthesis is not applicable. Therefore, various approaches have been developed and implemented to address this issue.46–48 Removal of rRNA with probes that target specific rRNA regions and are attached to magnetic beads represents an attractive option. The process involves annealing the probes to their target sequences (rRNA) and then removing the probe-bound rRNA with a magnet. 46 As with all methods involving RNA manipulation, this approach carries the challenge of avoiding degradation by contaminating ribonucleases. Maintaining standard laboratory practices for such sensitive protocols is important to avoid introducing ribonucleases, and incorporating RNase inhibitors into the procedure can be an effective protection strategy. What remains is an enriched population of mRNAs that are representative of transcriptionally active genes. For massively parallel sequence analysis, these RNAs are fractionated, cDNA is synthesized, and adapters are ligated to the cDNA ends (following end repair), generating a library that is amplified and then sequenced. Sequence reads are mapped to reference genomes, and the expressed genes are identified based on the sequence reads covering these regions. 49
Computational Analysis of Metatranscriptomics Data
A typical metatranscriptome dataset contains many millions of sequenced mRNA molecules, termed RNA-seq reads. Moreover, as metatranscriptome experiments are steadily increasing in size and number, automated, efficient, high-throughput analyses are essential to infer the biological meaning of these datasets.33,50 Several comprehensive analysis suites (eg, HUMAnN 51 and MG-RAST 52 ) have been developed over the past few years, are extensively used, and provide an end-to-end solution. These are applied alongside combinations of specialized bioinformatic tools (eg, Bowtie 53 and GEM 54 for mapping, Trimmomatic 55 for quality filtering, and Cuffdiff 56 for differential gene expression) to achieve the same overall goal of inferring gene expression levels, and changes in those levels, from the raw sequenced mRNA reads. A few analytic steps are essential in this process and are, therefore, present in all metatranscriptome analyses. These steps consist of filtering out non-mRNA reads as well as host reads, filtering and trimming low-quality reads and nucleotides (similar to the quality control process in high-throughput metagenomic analysis), identifying the open-reading frames, mapping the reads to a reference database, normalization, and calculation of gene expression levels along with other summary statistics. 41
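As an illustration of the normalization and expression-level steps listed above, the following minimal sketch converts per-gene read counts into transcripts per million (TPM). TPM is only one of several possible normalization schemes and is chosen here as an assumption; the text does not prescribe a specific method, and the gene names in the usage example are hypothetical.

```python
from typing import Dict


def tpm(read_counts: Dict[str, int], gene_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Convert raw per-gene read counts into transcripts per million (TPM)."""
    # Step 1: correct for gene length (reads per kilobase of gene).
    rpk = {
        gene: read_counts[gene] / (gene_lengths_bp[gene] / 1000.0)
        for gene in read_counts
    }
    # Step 2: correct for sequencing depth so that values sum to one million.
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        return {gene: 0.0 for gene in rpk}
    return {gene: value / total_rpk * 1_000_000 for gene, value in rpk.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 100}      # hypothetical mapped read counts
    lengths = {"geneA": 1000, "geneB": 2000}   # hypothetical gene lengths in bp
    for gene, value in tpm(counts, lengths).items():
        print(gene, round(value, 1))
```

With equal read counts, the shorter gene receives the higher value, because the same number of reads spread over fewer bases implies more transcript copies.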
An optional analytic step is the assembly of the reads into contigs, which can be executed after the initial filtering. If executed, the assembly step is followed by mapping the contigs to reference genomes, when these are available. While an assembly step is computationally challenging and requires higher quality sequencing data, it holds the potential to uncover information regarding gene expression that is not attainable without it, such as the relation between adjacent genes and the start and stop sites. Experimentally, assembly requires deeper sequencing, and therefore, commonly only highly abundant regions can be assembled from a larger set of reads. 57 An assembly step is essential in cases in which a reference genome and the corresponding gene annotations are not available. This is less common in the context of the gut microbiota but is relevant in RNA-seq of nonmodel organisms. In the event that a reference genome is not available, the annotations of the sequenced transcripts are usually obtained by sequence similarity to sequenced and annotated proteins. In other words, the assembled transcripts are aligned against large annotated protein databases with software, such as Blast2GO, 58 and if highly similar proteins are found, a similar biological function is usually inferred. Some suites for full transcriptome reconstruction have been developed and are based on extensive computational techniques, usually relying on graph-theoretic concepts (eg, Trinity). 59
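To give a sense of the graph-theoretic idea behind read assembly, the sketch below builds a toy de Bruijn graph from reads and walks an unambiguous path to produce a contig. It is a simplified illustration under stated assumptions (error-free reads, a single strand, no coverage information) and is not the algorithm of any particular assembler; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it in at least one read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix of the k-mer
    return {node: sorted(successors) for node, successors in graph.items()}


def extend_contig(graph: Dict[str, List[str]], start: str) -> str:
    """Extend from `start` while the path is unambiguous.

    The walk stops at a branch (more than one successor), a dead end,
    or when it would revisit a node (a cycle).
    """
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, [])) == 1:
        successor = graph[node][0]
        if successor in visited:
            break
        contig += successor[-1]  # each step adds one new base
        visited.add(successor)
        node = successor
    return contig


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTGCA"]  # two hypothetical overlapping reads
    graph = build_de_bruijn(reads, k=4)
    print(extend_contig(graph, "ATG"))
```

The two reads overlap by four bases, so the walk merges them into a single contig; with sequencing errors or repeats the graph would branch, which is one reason why assembly is computationally demanding.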
Another important issue in the analysis and inference of biological information from metatranscriptomics data is combining the analysis of the RNA-seq data with that of the whole DNA data, ie, metagenomics. Analyzing these two types of data simultaneously for a sample makes it possible to distinguish the genes that are actually expressed from those that are merely present. 43 Regardless of whether an assembly step is included, at the end of the RNA-seq analysis and normalization, the data are summarized as relative gene expression values and can then be analyzed further with statistical approaches similar to those used in 16S and metagenomic sequencing (eg, gene expression level within a sample, richness within samples, and similarity between samples).
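One simple way to relate the two data types is to compare, for each gene, its relative abundance in the RNA data with its relative abundance in the DNA data. The sketch below is a minimal illustration under the assumption that both datasets have already been reduced to per-gene read counts from the same sample; the ratio shown is a simplification rather than a method described in the text, and all names are hypothetical.

```python
from typing import Dict, Optional


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale per-gene counts so that they sum to 1 within a sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: count / total for gene, count in counts.items()}


def expression_ratio(
    rna_counts: Dict[str, int], dna_counts: Dict[str, int]
) -> Dict[str, Optional[float]]:
    """Ratio of RNA to DNA relative abundance for every gene seen in the DNA data.

    A value above 1 suggests the gene is transcribed more than its genomic
    abundance alone would predict; 0.0 means present but not detectably expressed.
    Genes with zero DNA reads get None because the ratio is undefined.
    """
    rna = relative_abundance(rna_counts)
    dna = relative_abundance(dna_counts)
    ratios: Dict[str, Optional[float]] = {}
    for gene, dna_fraction in dna.items():
        if dna_fraction == 0:
            ratios[gene] = None
            continue
        ratios[gene] = rna.get(gene, 0.0) / dna_fraction
    return ratios


if __name__ == "__main__":
    dna = {"geneA": 50, "geneB": 25, "geneC": 25}  # hypothetical metagenomic counts
    rna = {"geneA": 90, "geneB": 10}               # hypothetical metatranscriptomic counts
    print(expression_ratio(rna, dna))
```

In this hypothetical sample, geneC is encoded in the community but yields no transcripts, which is exactly the distinction between potential and actual expression that the combined analysis is meant to reveal.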
Utilization of Metatranscriptomics in Health and Disease
Assessment of Microbial Activity
Identifying functionally active bacteria within a mixed bacterial microbiome may highlight the disease-driving bacteria within a generally inactive microbial pool. Several strategies for determining transcriptionally active bacteria have been described.33,60 Gosalbes et al. 33 used the presence of 16S rRNA transcripts to determine the phylogenetic structure of active bacteria in the gastrointestinal tract, finding the phylum Firmicutes to be predominantly active, followed by Bacteroidetes. In healthy individuals, characterization of mRNA revealed activation of pathways involved in carbohydrate metabolism, cell component synthesis, and energy production.
The value of characterizing microbiota in a combined metagenomics–metatranscriptomics approach is highlighted by the effect of commensal bacteria on xenobiotics. As discussed earlier, gut microbiota plays an important role in metabolizing carbohydrates and proteins by producing and secreting an array of enzymes. These metabolic activities can potentially affect xenobiotic stability when the xenobiotics are substrates of these enzymes. There are over 40 xenobiotics (that are or have been on the market) that are affected by the gut microbiota without the underlying mechanisms being fully understood. 61 This effect is best shown in a tragic turn of events in which an antiviral drug marketed under the generic name sorivudine was converted by the gut microbiota into a compound that inhibited the metabolism of the anticancer drug 5-fluorouracil, causing its accumulation and leading to toxicity. Within a short period of time, 18 patients who were prescribed a combination of sorivudine and 5-fluorouracil died. 62 Although it is established that gut bacteria are responsible for the metabolism of many drugs, the exact bacteria involved and the molecular pathways implicated are frequently unknown. Recent studies are beginning to unravel the microbiota metabolic processes that influence drug metabolism. An elegant study by Maurice et al. 60 implemented a flow cytometry approach to isolate active bacterial populations from the gut that were then characterized by 16S rRNA gene sequencing and metatranscriptomics to determine the gene expression profiles in response to xenobiotics. This study showed that there are distinct sets of active bacteria in the gut (composed mainly of
Another important example can be seen in the case of the cardiac drug digoxin that can be inactivated by gut microbiome metabolism. Transcriptional profiling revealed that specific strains of the gut bacteria,
Assessment of Microbiome–Immune Interactions
The effects of the microbiome on the mucosal immune system are considered pivotal in affecting host physiology. Studies focusing on toll-like receptor 5 (TLR5) knockout (KO) mice are an interesting example of the use of metatranscriptomics to complement metagenomics and 16S rRNA characterization of such microbiota–immune interactions. TLR5 is expressed in the intestinal mucosa and recognizes flagellin, the principal component of the bacterial flagella. Mice lacking TLR5 were shown to develop metabolic syndrome 63 and colitis 64 and were characterized by dysbiosis of the gut microbiota.63,65 Another impairment noted in TLR5-deficient mice relates to the maintenance of barrier function. Indeed, the mucosal innate immune system plays diverse roles in microbial containment through mechanisms such as regulation of protective mucus production 66 and secretion of high concentrations of IgA enabling coating of
Studying Microbiome Antisense RNA
Although traditional metatranscriptome analysis involves characterizing the mRNA transcripts present under specific environmental conditions and determining from these data which metabolic pathways are activated, the bacterial transcriptome is considerably more complex. 70 As RNA-seq methods matured, allowing strand-specific libraries to be constructed, it was revealed that the bacterial transcriptome encodes a surprisingly large number of
Studying Microbiome Small Noncoding RNAs
The bacterial transcriptome includes small noncoding RNAs (sRNAs) that are generally between 50 and 500 nucleotides in size and are involved in gene regulation. 70 They do so by interacting, through base pairing, with the 5′-untranslated region (UTR) of target mRNA sequences, regulating the translation or stability of the transcript. 70 They are involved in regulating important processes in bacteria, such as iron metabolism, 78 virulence, 73 and quorum sensing, 79 and are important as they allow for rapid adaptation to changing environments. 80 sRNAs have been identified in multiple bacterial species. 81 The advent of next-generation sequencing methods has accelerated their identification for various bacterial species, such as
Limitations and Challenges of Metatranscriptomics Analysis
Several challenges associated with metatranscriptome analysis merit mention. The isolation of high-quality RNA from some biological samples (such as feces) can be a difficult if not daunting task. Experimental strategies have been developed to tackle some of these issues; 42 nevertheless, significant challenges remain. Host RNA contamination, which occurs to various degrees depending on the sample (eg, contamination is high in biopsy samples), can prove problematic. In these cases, host rRNA is not removed by probes that anneal to bacterial rRNA sequences and are pulled out with a magnet; it remains as a contaminant that can increase the overall processing costs and complicate downstream analysis of the data. Another issue to consider is that mRNA has a short half-life, and thus it may be hard to detect rapid or short-lived responses to environmental stimuli. 43 Furthermore, the presence of mRNA is not always synonymous with the presence of protein (or protein activity for that matter). As such, pipelines integrating metagenomics, metatranscriptomics, metabolomics, and metaproteomics datasets may potentially enable to gain a
On a final note, large-scale expression studies using methods such as microarrays and serial analysis of gene expression have traditionally been accompanied by validation of results by an independent technique, quantitative polymerase chain reaction (qPCR) being considered as the
Conclusion
Metatranscriptomics holds great potential to uncover biological information that may otherwise be obscured when using other genomic methodologies. It provides an accurate snapshot, at a given moment in time and under specific conditions, of the actual gene expression profile rather than its potential, as inferred from DNA-based shotgun metagenomic sequencing. As such, deciphering microbiome metatranscriptomics may better enable the elucidation of functional changes that dictate the microbiome functions in given contexts, its interactions with the host, and functional alterations that accompany the conversion of a
While metatranscriptomic analysis of the microbiome holds promise for enhancing our understanding of the complex community behavior of the microbiome, several challenges need to be met in order to improve the reproducibility and general applicability of the approach. Despite these challenges, metatranscriptomic analysis of the microbiome may be of great value in moving from a descriptive view of the microbiome to a deeper understanding of causality in the microbial contribution to homeostasis and disease susceptibility. As such, integrating metatranscriptomics into microbiome research may enable a better understanding of the diverse roles of the microbiome in mammalian physiology and help bring these data into the clinical world.
Author Contributions
Wrote the first draft of the manuscript: SB, GZS, EE. Agree with the manuscript results and conclusions: SB, GZS, EE. Jointly developed the structure and arguments for the paper: SB, GZS, EE. Made critical revisions and approved the final version: SB, GZS, EE. All authors reviewed and approved of the final manuscript.
Footnotes
Acknowledgments
We thank the members of the Elinav Lab for fruitful discussions. We apologize to authors whose relevant work was not included in this review owing to space constraints.
