Life Sciences Discovery and Technology Highlights

High-Throughput Biology

High-Throughput Screening Enhances Kidney Organoid Differentiation from Human Pluripotent Stem Cells and Enables Automated Multidimensional Phenotyping

Organoids derived from human pluripotent stem cells are a potentially powerful tool for high-throughput screening (HTS), but the complexity of organoid cultures poses a significant challenge for miniaturization and automation. In this article, Czerniecki et al. present a fully automated, HTS-compatible platform for enhanced differentiation and phenotyping of human kidney organoids. The entire 21-d protocol, from plating to differentiation to analysis, can be performed automatically by liquid-handling robots or alternatively by manual pipetting. High-content imaging analysis reveals both dose-dependent and threshold effects during organoid differentiation. Immunofluorescence and single-cell RNA sequencing identify previously undetected parietal, interstitial, and partially differentiated compartments within organoids and define conditions that greatly expand the vascular endothelium. Chemical modulation of toxicity and disease phenotypes can be quantified for safety and efficacy prediction. Screening in gene-edited organoids in this system reveals an unexpected role for myosin in polycystic kidney disease. Organoids in HTS formats thus establish an attractive platform for multidimensional phenotypic screening. (Czerniecki, S. M.; et al. Cell Stem Cell 2018, 22, 929–940)

Intestinal Metagenomes and Metabolomes in Healthy Young Males: Inactivity and Hypoxia-Generated Negative Physiological Symptoms Precede Microbial Dysbiosis

Sket et al. explore the metagenomic, metabolomic, and trace metal makeup of intestinal microbiota and environment in healthy male participants during the run-in (5 d) and the following three 21-d interventions: normoxic bedrest (NBR), hypoxic bedrest (HBR), and hypoxic ambulation (HAmb), which are carried out within a controlled laboratory environment (circadian rhythm, fluid and dietary intakes, microbial bioburden, oxygen level, exercise). The fraction of inspired O₂ (F_iO₂) and partial pressure of inspired O₂ (P_iO₂) are 0.209 and 133.1 ± 0.3 mm Hg for the NBR and 0.141 ± 0.004 and 90.0 ± 0.4 mm Hg (~4000 m simulated altitude) for HBR and HAmb interventions, respectively. Shotgun metagenomes are analyzed at various taxonomic and functional levels, ¹H- and ¹³C-metabolomes are processed using standard quantitative and human expert approaches, whereas metals are assessed using X-ray fluorescence spectrometry. Inactivity and hypoxia result in a significant increase in the genus Bacteroides in HBR and in genes coding for proteins involved in iron acquisition and metabolism, cell wall, capsule, virulence, defense, and mucin degradation, such as β-galactosidase (EC 3.2.1.23), α-L-fucosidase (EC 3.2.1.51), sialidase (EC 3.2.1.18), and α-N-acetylglucosaminidase (EC 3.2.1.50). In contrast, the microbial metabolomes, intestinal element and metal profiles, and the diversity of bacterial, archaeal, and fungal microbial communities are not significantly affected. The authors observe progressive decreases in defecation frequency and concomitant increases in electrical conductivity (EC) that precede or take place in the absence of significant changes at the taxonomic, functional gene, metabolome, and intestinal metal profile levels.
The fact that the genus Bacteroides and proteins involved in iron acquisition and metabolism, cell wall, capsule, virulence, and mucin degradation are enriched at the end of HBR suggests that both constipation and EC decreased intestinal metal availability, leading to modified expression of co-regulated genes in Bacteroides genomes. Bayesian network analysis is used to derive the first hierarchical model of initial inactivity-mediated deconditioning steps over time. The PlanHab wash-out period corresponds to a profound lifestyle change (i.e., reintroduction of exercise) that results in stepwise amelioration of the negative physiological symptoms, indicating that exercise apparently prevents the crosstalk between the microbial physiology, mucin degradation, and proinflammatory immune activities in the host. (Sket, R.; et al. Front Physiol. 2018, 9(198), 1–16)
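The inspired-oxygen figures quoted in the Sket et al. summary above can be cross-checked with the conventional relation P_iO₂ = F_iO₂ × (P_B − P_H₂O). The following is a minimal sketch, not the authors' calculation: it assumes the standard water vapor pressure of 47 mm Hg at body temperature, and the ambient barometric pressure P_B is not given in the summary, so it is back-calculated here from the normoxic values. The function names are hypothetical.

```python
# Standard saturated water vapor pressure at 37 °C (body temperature), in mm Hg.
# This constant is an assumption of the sketch; the summary does not state it.
WATER_VAPOR_MMHG = 47.0


def inspired_po2(fio2: float, barometric_mmhg: float) -> float:
    """P_iO2 = F_iO2 * (P_B - P_H2O), with all pressures in mm Hg."""
    return fio2 * (barometric_mmhg - WATER_VAPOR_MMHG)


def implied_barometric(fio2: float, pio2_mmhg: float) -> float:
    """Invert the relation above to recover the ambient pressure P_B."""
    return pio2_mmhg / fio2 + WATER_VAPOR_MMHG


# Normoxic values quoted in the summary: F_iO2 = 0.209, P_iO2 = 133.1 mm Hg.
# The resulting ambient pressure is back-calculated, not taken from the source.
ambient_mmhg = implied_barometric(0.209, 133.1)

# Predicted hypoxic P_iO2 at the same ambient pressure with F_iO2 = 0.141.
hypoxic_pio2 = inspired_po2(0.141, ambient_mmhg)

print(f"implied ambient pressure: {ambient_mmhg:.1f} mm Hg")
print(f"predicted hypoxic P_iO2:  {hypoxic_pio2:.1f} mm Hg (reported: 90.0 ± 0.4)")
```

Under these assumptions the predicted hypoxic value is about 89.8 mm Hg, which lies within the reported 90.0 ± 0.4 mm Hg, so the two quoted conditions are mutually consistent with a single ambient pressure.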

Next-Generation Sequencing and Precision Medicine

Challenges in the Setup of Large-Scale Next-Generation Sequencing Analysis Workflows

Although next-generation sequencing (NGS) can now be considered an established analysis technology for research applications across the life sciences, the analysis workflows still require substantial bioinformatics expertise. Typical challenges include the appropriate selection of analytical software tools, the speedup of the overall procedure using high-performance computing parallelization and acceleration technology, the development of automation strategies, data storage solutions, and finally the development of methods for full exploitation of the analysis results across multiple experimental conditions. Recently, NGS has begun to expand into clinical environments, where it facilitates diagnostics and enables personalized therapeutic approaches but is also accompanied by new technological, legal, and ethical challenges. There are probably as many overall concepts for the analysis of the data as there are academic research institutions. Among these concepts are, for instance, complex information technology architectures developed in-house, ready-to-use technologies installed on-site, as well as comprehensive everything-as-a-service (XaaS) solutions. In this mini-review, Kulkarni and Frommolt summarize the key points to consider in the setup of the analysis architectures, mostly for scientific rather than diagnostic purposes, and provide an overview of the current state of the art and challenges of the field. (Kulkarni, P.; Frommolt, P. Comput. Struct. Biotechnol. J. 2017, 15, 471–477)

Introduction to Single-Cell RNA Sequencing

During the past decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in bulk, and the data represent an average of gene expression patterns across thousands to millions of cells. This might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. The authors of this article present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. (Olsen, T. K.; Baryawno, N. Curr. Protoc. Mol. Biol. 2018, 122, e57)

Toward Precision Medicine: Discovering Novel Gynecological Cancer Biomarkers and Pathways Using Linked Data

Next-generation sequencing (NGS) is playing a key role in therapeutic decision making for cancer prognosis and treatment. NGS technologies are producing a massive amount of sequencing data sets. Often, these data sets are published from different isolated sequencing facilities. Consequently, the process of sharing and aggregating multisite sequencing data sets is thwarted by issues such as the volume of the data and the need to discover relevant data from different sources, build scalable repositories, automate data linkage, provide efficient querying mechanisms, and enable information-rich intuitive visualization.

In this report, Jha et al. present an approach to link and query different sequencing data sets (TCGA, COSMIC, REACTOME, KEGG, and GO) to indicate risks for four cancer types (ovarian serous cystadenocarcinoma, uterine corpus endometrial carcinoma, uterine carcinosarcoma, and cervical squamous cell carcinoma and endocervical adenocarcinoma), covering the 16 healthy tissue-specific genes from Human BodyMap 2.0 from Illumina (San Diego, CA). The differentially expressed genes from Human BodyMap 2.0 are analyzed together with the gene expressions reported in COSMIC and TCGA repositories and lead to the discovery of potential biomarkers for a tissue-specific cancer.

The authors analyze the tissue expression of genes, copy number variation, somatic mutation, and promoter methylation to identify associated pathways and find novel biomarkers. The authors discover 20 mutated genes and three potential pathways causing promoter changes in different gynecologic cancer types. The authors propose a data-interlinked platform called BIOOPENER (Galway, Ireland) that glues together heterogeneous cancer and biomedical repositories. The key approach is to find correspondences (or data links) among genetic, cellular, and molecular features across isolated cancer data sets, giving insight into cancer progression from normal to diseased tissues. The proposed BIOOPENER platform enriches mutations by filling in missing links from TCGA, COSMIC, REACTOME, KEGG, and GO data sets and provides an interlinking mechanism to understand cancer progression from normal to diseased tissues with pathway components, which in turn help to map mutations, associated phenotypes, pathways, and mechanisms. (Jha, A.; et al. J. Biomed. Semantics 2017, 8, 40)

Whole-Genome Sequencing Analysis for Cancer Genomics and Precision Medicine

Explosive advances in next-generation sequencing and computational analyses have enabled exploration of somatic protein-altering mutations in most cancer types, with coding mutation data intensively accumulated. However, there is limited information on somatic mutations in noncoding regions, including introns, regulatory elements, and noncoding RNA. Structural variants and pathogens in cancer genomes remain widely unexplored. Whole-genome sequencing (WGS) approaches can be used to comprehensively explore all types of genomic alterations in cancer and help us to better understand the whole landscape of driver mutations and mutational signatures in cancer genomes and elucidate the functional or clinical implications of these unexplored genomic regions and mutational signatures. This review describes recently developed technical approaches for cancer WGS and the future direction of cancer WGS and discusses its utility and limitations as an analysis platform and for mutation interpretation for cancer genomics and cancer precision medicine. Taking into account the diversity of cancer genomes and phenotypes, interpretation of abundant mutation information from WGS, especially noncoding and structural variants, requires the analysis of large-scale WGS data integrated with RNA-Seq, epigenomic, immunogenomic, and clinicopathological information. (Nakagawa, H.; Fujita, M. Cancer Sci. 2018, 109, 513–522)

Formalin-Fixed Paraffin-Embedded Sample Conditions for Deep Next-Generation Sequencing

Precision medicine is possible in oncology practice only if targetable genes in fragmented DNA, such as DNA from formalin-fixed paraffin-embedded (FFPE) samples, can be sequenced using next-generation sequencing (NGS). The aim of this study is to examine the quality and quantity of DNA from FFPE cancerous tissue samples from surgically resected and biopsy specimens. Methods include extraction of DNA from unstained FFPE tissue sections prepared from surgically resected specimens of breast, colorectal and gastric cancer, and biopsy specimens of breast cancer. A total of ⩾60 ng DNA from a sample is considered adequate for NGS. The DNA quality is assessed by Q-ratios, with a Q-ratio >0.1 considered sufficient for NGS.
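The two adequacy criteria stated above (total DNA of at least 60 ng and a Q-ratio above 0.1) can be expressed as a simple screening check. This is a minimal sketch under those stated thresholds only; the `FfpeSample` type, the function name, and the sample values are hypothetical illustrations, not data or code from the study.

```python
from dataclasses import dataclass

# Thresholds as stated in the summary above.
MIN_DNA_NG = 60.0    # total DNA yield considered adequate for NGS (>= 60 ng)
MIN_Q_RATIO = 0.1    # Q-ratio must exceed this value (> 0.1)


@dataclass
class FfpeSample:
    """Hypothetical record for one FFPE-derived DNA extraction."""
    sample_id: str
    dna_ng: float
    q_ratio: float


def is_adequate_for_ngs(sample: FfpeSample) -> bool:
    """Apply both criteria: enough DNA and sufficient DNA quality."""
    return sample.dna_ng >= MIN_DNA_NG and sample.q_ratio > MIN_Q_RATIO


# Illustrative values only; these are not measurements from the study.
samples = [
    FfpeSample("biopsy-01", dna_ng=75.0, q_ratio=0.35),
    FfpeSample("biopsy-02", dna_ng=42.0, q_ratio=0.40),     # too little DNA
    FfpeSample("resection-01", dna_ng=300.0, q_ratio=0.05), # too degraded
]

for s in samples:
    verdict = "adequate" if is_adequate_for_ngs(s) else "not adequate"
    print(f"{s.sample_id}: {verdict}")
```

Keeping quantity and quality as separate fields makes it easy to report which criterion a rejected sample failed.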

Results show the Q-ratio for DNA from FFPE tissue processed with neutral-buffered formalin is significantly better than that processed with unbuffered formalin. All Q-ratios for DNA from breast, colorectal, and gastric cancer samples indicate DNA levels sufficient for NGS. DNA extracted from gastric cancer FFPE samples prepared within the past 7 years is suitable for NGS analysis, whereas DNA from samples older than 7 years may not be suitable. The authors’ data suggest that adequate amounts of DNA can be extracted from FFPE samples, not only of surgically resected tissue but also of biopsy specimens.

In conclusion, the type of formalin used for fixation and the time since FFPE sample preparation affect DNA quality. Sufficient amounts of DNA can be extracted from FFPE samples of both surgically resected and biopsy tissue, thus expanding the potential diagnostic uses of NGS in a clinical setting. (Nagahashi, M.; et al. J. Surg. Res. 2017, 220, 125–132)

Precision Medicine Based on Surgical Oncology in the Era of Genome-Scale Analysis and Genome Editing Technology

Accumulated evidence suggests that multiple molecular and cellular interactions promote cancer evolution in vivo. Surgical oncology is of growing significance to a comprehensive understanding of malignant diseases for therapeutic application. Tanaka et al. analyze more than 1000 clinical samples from surgically resected tissue to identify molecular biomarkers and therapeutic targets for advanced malignancies. Cancer stemness and mitotic instability are then determined as the essential predictors of aggressive phenotype with poor prognosis. Recently, whole-genome/exome sequencing has revealed a mutational landscape underlying phenotype heterogeneity in cancers. In addition, integrated genomic, epigenomic, transcriptomic, metabolic, proteomic, and phenomic analyses elucidate several molecular subtypes that cluster in liver, pancreatic, biliary, esophageal, and gastroenterological cancers. Identification of each molecular subtype is expected to realize precision medicine’s goal of targeting subtype-specific molecules; however, there are obstacles and limitations to determining matching druggable targets or synthetic lethal interactions. Current breakthroughs in genome-editing technology can provide us with unprecedented opportunity to recapitulate subtype-specific pathophysiology in vitro and in vivo. Given this great potential, on-demand editing systems can design actionable strategies and revolutionize precision cancer medicine based on surgical oncology. (Tanaka, S. Ann. Gastroenterol. Surg. 2018, 2, 106–115)

Genome Editing

Small Molecules Promote CRISPR-Cpf1–Mediated Genome Editing in Human Pluripotent Stem Cells

Human pluripotent stem cells (hPSCs) have potential applications in biological studies and regenerative medicine. However, precise genome editing in hPSCs remains time-consuming and labor-intensive. Ma et al. demonstrate that the recently identified CRISPR-Cpf1 can be used to efficiently generate knockout and knockin hPSC lines. The unique properties of CRISPR-Cpf1, including shorter crRNA length and low off-target activity, are very attractive for many applications. In particular, the authors develop an unbiased drug selection–based platform feasible for high-throughput screening in hPSCs, and this screening system enables the identification of small molecules VE-822 and AZD-7762 that can promote CRISPR-Cpf1–mediated precise genome editing. Significantly, the combination of CRISPR-Cpf1 and small molecules provides a simple and efficient strategy for precise genome engineering. (Ma, X.; et al. Nat. Commun. 2018, 9, 1303)

MACBETH: Multiplex Automated Corynebacterium glutamicum Base Editing Method

CRISPR/Cas9 or Cpf1-introduced double-strand break dramatically decreases the bacterial cell survival rate, which hampers multiplex genome editing in bacteria. In addition, the requirement of a foreign DNA template for each target locus is labor demanding and may encounter more genetically modified organism–related regulatory hurdles in industrial applications. Wang et al. describe a multiplex automated Corynebacterium glutamicum base editing method (MACBETH) using CRISPR/Cas9 and activation-induced cytidine deaminase, without foreign DNA templates. It can achieve single-, double-, and triple-locus editing with efficiencies up to 100%, 87.2%, and 23.3%, respectively. In addition, MACBETH is applied to generate a combinatorial gene inactivation library for improving glutamate production, and a pyk&ldhA double inactivation strain is found to improve glutamate production by threefold. Finally, MACBETH is automated with an integrated robotic system, which could enable the generation of thousands of rationally engineered strains per month for metabolic engineering of C. glutamicum. As a proof-of-concept demonstration, the automation platform is used to construct an arrayed genome-scale gene inactivation library of 94 transcription factors with a 100% success rate. Therefore, MACBETH would be a powerful tool for multiplex and automated bacterial genome editing in future studies and industrial applications. (Wang, Y.; et al. Metab. Eng. 2018, 47, 200–210)

Computational Genomics and Machine Learning

Developing Novel Methods to Image and Visualize 3D Genomes

To investigate three-dimensional (3D) genome organization in prokaryotic and eukaryotic cells, three main strategies are employed, namely, nuclear proximity ligation-based methods, imaging tools (such as fluorescence in situ hybridization [FISH] and its derivatives), and computational/visualization methods. Proximity ligation-based methods are based on digestion and religation of physically proximal cross-linked chromatin fragments accompanied by massively parallel DNA sequencing to measure the relative spatial proximity between genomic loci. Imaging tools enable direct visualization and quantification of spatial distances between genomic loci, and advanced implementation of (super-resolution) microscopy helps to significantly improve the resolution of images. Computational methods are used to map global 3D genome structures at various scales driven by experimental data, and visualization methods are used to visualize genome 3D structures in virtual 3D space based on algorithms.

In this review, Ma et al. focus on the introduction of novel imaging and visualization methods to study 3D genomes. First, the authors introduce the progress made recently in 3D genome imaging in both fixed cells and live cells based on long-probe labeling, short-probe labeling, RNA FISH, and the CRISPR system. As the fluorescence-capturing capability of a particular microscope is very important for the sensitivity of bioimaging experiments, the authors also introduce two novel super-resolution microscopy methods, SDOM and low-power super-resolution STED, which have potential for time-lapse super-resolution live-cell imaging of chromatin. Finally, Ma et al. review some software tools developed recently to visualize proximity ligation-based data. The imaging and visualization methods are complementary to each other, and the three strategies are not mutually exclusive. These methods provide powerful tools to explore the mechanisms of gene regulation and transcription in cell nuclei. (Ma, T.; et al. Cell Biol. Toxicol. 2018, doi: 10.1007/s10565-018-9427-z)

Enhancing Next-Generation Sequencing-Guided Cancer Care through Cognitive Computing

Using next-generation sequencing (NGS) to guide cancer therapy has created challenges in analyzing and reporting large volumes of genomic data to patients and caregivers. Specifically, providing current, accurate information on newly approved therapies and open clinical trials requires considerable manual curation performed mainly by human molecular tumor boards (MTBs). The purpose of this study is to determine the utility of cognitive computing as performed by Watson for Genomics (WfG) compared with a human MTB.

A total of 1018 patient cases that previously underwent targeted exon sequencing at the University of North Carolina (UNC) and subsequent analysis by the UNCseq informatics pipeline and the UNC MTB between November 7, 2011, and May 12, 2015, were analyzed with WfG, a cognitive computing technology for genomic analysis. Using a WfG-curated actionable gene list, Patel et al. identify additional genomic events of potential significance (not discovered by traditional MTB curation) in 323 (32%) patients. Most of these additional genomic events are considered actionable based on their ability to qualify patients for biomarker-selected clinical trials. Indeed, the opening of a relevant clinical trial within 1 mo prior to WfG analysis provided the rationale for identification of a new actionable event in nearly a quarter of the 323 patients. This automated analysis took <3 min per case.

These results demonstrate that the interpretation and actionability of somatic NGS results are evolving too rapidly to rely solely on human curation. Molecular tumor boards empowered by cognitive computing could potentially improve patient care in the delivery of precision medicine by providing a rapid, comprehensive approach for data analysis and consideration of up-to-date availability of clinical trials. Patients and physicians who are considering enrollment in clinical trials may benefit from the support of such tools applied to genomic data. (Patel, N. M.; et al. Oncologist 2018, 23, 179–185)

Predicting Novel MicroRNA: A Comprehensive Comparison of Machine Learning Approaches

The importance of microRNAs (miRNAs) is widely recognized in the community because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being precursors of miRNAs (pre-miRNAs). The challenge to this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers and, if not properly addressed in the model and the experiments, can result in unrealistic performance reports and make the classifier unable to work properly for pre-miRNA prediction. Another important issue is that for most of the machine learning (ML) approaches already used (supervised methods), it is necessary to have both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with hairpin structures that do not contain a pre-miRNA.

This review provides a comprehensive study and comparative assessment of methods from the two ML approaches to predicting novel pre-miRNAs: supervised and unsupervised training. Stegmayer et al. present and analyze the ML proposals that have appeared during the past 10 y in the literature. They compare several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with the same features and data sets instead of just a review of published software tools. The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low- to mid-imbalance levels between classes, supervised methods can be the best. However, at very high-imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance. (Stegmayer, G.; et al. Brief Bioinform. 2018, doi: 10.1093/bib/bby037)
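The warning above that class imbalance can produce unrealistic performance reports is easy to see with a small calculation. The sketch below is an illustration only, not an experiment from the review: the confusion-matrix counts (100 true pre-miRNAs against 100,000 negative hairpins) are invented for the example, and the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data set: 100 positives, 100,000 negatives (1:1000 imbalance).

# A useless classifier that labels every sequence as negative.
always_negative = classification_metrics(tp=0, fp=0, tn=100_000, fn=100)

# A seemingly good classifier: 90% sensitivity and 99% specificity.
decent = classification_metrics(tp=90, fp=1_000, tn=99_000, fn=10)

for name, m in [("always negative", always_negative), ("decent", decent)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

In this invented setting the trivial classifier scores above 99.9% accuracy while finding no pre-miRNAs, and the seemingly good classifier still returns roughly eleven false positives for every true one. This is why imbalance-aware measures such as precision, recall, and F1 are more informative than accuracy for this task.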

Aberration Hubs in Protein Interaction Networks Highlight Actionable Targets in Cancer

Despite efforts for extensive molecular characterization of cancer patients, such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), the heterogeneous nature of cancer and our limited knowledge of the contextual function of proteins complicate the identification of targetable genes. In this report, the authors present Aberration Hub Analysis for Cancer (AbHAC) as a novel integrative approach to pinpoint aberration hubs (i.e., individual proteins that interact extensively with genes that show aberrant mutation or expression). Analysis of the breast cancer data of the TCGA and the renal cancer data from the ICGC shows that aberration hubs are involved in relevant cancer pathways, including factors promoting cell cycle and DNA replication in basal-like breast tumors, and Src kinase and vascular endothelial growth factor signaling in renal carcinoma. Moreover, the analysis uncovers novel functionally relevant and actionable targets, among which the authors have experimentally validated abnormal splicing of spleen tyrosine kinase as a key factor for cell proliferation in renal cancer. Thus, AbHAC provides an effective strategy to uncover novel disease factors that are identifiable only by examining mutational and expression data in the context of biological networks. (Karimzadeh, M.; et al. Oncotarget 2018, 9, 25166–25180)
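The idea of an aberration hub described above, a protein whose interaction partners include more aberrant genes than expected by chance, can be illustrated with a one-sided hypergeometric enrichment test. The summary does not say which statistic AbHAC uses, so the test is an assumption chosen for illustration and not the published method. The toy network, gene names, and function names below are all hypothetical.

```python
from math import comb


def hypergeom_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement from a
    population containing `successes` marked items."""
    denominator = comb(population, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return numerator / denominator


def hub_enrichment(network: dict, aberrant: set, universe: set) -> dict:
    """For each protein, test whether its neighbors are enriched for aberrant genes."""
    scores = {}
    for protein, neighbors in network.items():
        partners = neighbors & universe
        hits = len(partners & aberrant)
        scores[protein] = hypergeom_tail(
            hits, len(universe), len(aberrant & universe), len(partners)
        )
    return scores


# Toy data: 10 genes, 3 of them aberrant (mutated or misexpressed).
universe = {f"G{i}" for i in range(1, 11)}
aberrant = {"G1", "G2", "G3"}
network = {
    "HUB_A": {"G1", "G2", "G3", "G4"},  # 3 of 4 partners are aberrant
    "HUB_B": {"G4", "G5", "G6", "G7"},  # no aberrant partners
}

scores = hub_enrichment(network, aberrant, universe)
for protein, p in sorted(scores.items()):
    print(f"{protein}: p = {p:.4f}")
```

A small p-value flags a candidate hub. A real analysis would additionally need a curated interaction network and correction for testing many proteins, both of which are outside this sketch.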

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.