Knowledge from Small-Molecule Screening and Profiling Data

Abstract

We are pleased to work with the Journal of Biomolecular Screening (JBS) to present a special issue on generating Knowledge from Small-Molecule Screening and Profiling Data.

Since its inception as an avenue for probe- and drug-discovery activities, high-throughput screening (HTS) has produced large data sets with great potential to enrich our understanding of the interactions of chemical matter with biological systems. In most cases, such activities focus on discovering a “needle in a haystack” to perturb a particular cellular process (probe) or find a starting point (lead) from which a treatment (drug) might be developed for a disease.

High-throughput diversity screening is a well-established activity in pharmaceutical and agrochemical research, with a body of evidence to support its effectiveness.¹ Indeed, HTS can no longer be considered a novel or disruptive technology, given it has been practiced for around 20 years! That does not mean, however, that there is nothing to improve. In the realm of data analysis, this special issue illustrates how much there is still to do, and how much still to learn, particularly as screening technology allows us to study multiparametric cellular responses.

With the advent of chemical biology repositories like ChemBank,² PubChem,³ and ChEMBL,⁴ it rapidly became clear that small-molecule screening data sets on common compound collections could be more than the sum of their parts, especially when aggregated and publicly shared. The US National Institutes of Health (NIH) Roadmap formally recognized this reality 10 years ago by establishing the Molecular Libraries Program (initially the Molecular Libraries Screening Network [MLSCN] and subsequently the Molecular Libraries Probe Production Centers Network [MLPCN]). Increasingly, as data from this network’s activities have become available via PubChem,³ and later the BioAssay Research Database (BARD),⁵ it has become possible to imagine cross-sectional analyses that make use of data from multiple experiments simultaneously, even those performed by separate investigators worldwide.

This special issue on Knowledge from Small-Molecule Screening and Profiling Data opens with a review article⁵ that presents a perspective on the development of BARD, a fourth-generation repository and knowledge environment for small-molecule science. At its core, BARD aims to present the successful experiment in public probe discovery undertaken by the NIH Roadmap and to contextualize screening and follow-up data from multiple diverse Network Centers collected over multiple years. BARD also aims to pave the way for future work in chemical biology research using structured vocabularies to describe assays in a way that is amenable to rapid search, filtering, and computational analysis. Two additional reviews from Abraham et al.⁶ and Singh et al.⁷ provide an overview of multiparametric analyses and suggest that more information could be extracted from future high-content screens through better data analysis.

The original reports of this JBS special issue feature multiple perspectives on the maturation of high-throughput and high-content screening as technologies. Assay quality is still a key determinant of the effectiveness of screening and can be aided by advances in both process and statistics. Zhang et al.⁸ describe a novel approach to testing whether a screen is fit for its purpose, while Murie et al.⁹ propose a statistical method for dealing with screens that have a high (real) hit rate.

High-content or phenotypic screens are increasingly high throughput and often used directly for lead discovery. This expanded scope has highlighted the need for more sophisticated data-analysis methods to include multiparametric endpoints and imaging. Haney¹⁰ illustrates the importance of visualization and understanding the underlying distribution in high-content data sets. Smith and Horvath¹¹ offer a novel approach to the challenging area of phenotypic screening analysis, while Bornot et al.¹² show the value of using historical data to aid data analysis.

Once a primary screen is complete, hits often require further triage and prioritization. This step has often been performed through specificity or selectivity assays that disqualify a hit, but biophysical techniques now offer the possibility of confirming hits via direct binding methods. Genick et al.¹³ provide an account of the application of biophysics in a large pharmaceutical company screening environment. When molecule libraries and miniaturized assays mix, there is always the potential for false positives due to an unwanted mechanism, often very specific to the assay technology in use. Schorpp et al.¹⁴ provide a case study in the identification of frequent hitters for the AlphaScreen technology.

A very large study, across multiple assay technologies, is detailed by Hansson et al.,¹⁵ illustrating a number of trends that emerge across a large collection of molecules in many years’ worth of screening data. Many chemists will not be surprised to see an old friend, lipophilicity, appear as a cause of promiscuity in molecules. Large data sets such as these give algorithms ample opportunity to find interesting relationships, such as chemical scaffolds that appear enriched in a single screen or multiple screens. Two groups describe the application of such methods to screening or profiling data to identify small-molecule scaffolds (Wawer et al.¹⁶) or natural product motifs (Coma et al.¹⁷). Although most of the articles presented in this issue focus on HTS, Beresini et al.¹⁸ remind us there are other ways to use a collection of molecules and that screening a subset can often deliver what is required when full HTS is not possible.

As more data are produced to aid activities not directly associated with the original screen, Dancík et al.¹⁹ use the data to compute similarities in biological response between molecules that might have very different chemical structures, while Swamidass et al.²⁰ and Jaeger et al.²¹ describe how these data can be used to build network models that connect assays, phenotypes, and disease.

As experience with mature public chemical-biology data sets has shown,²⁻⁴ one of the key challenges in integrating data collected at multiple laboratories is connecting metadata—descriptions of experiments—across the many different ways researchers choose to describe their science. The review article on BARD⁵ provides one perspective on these challenges, and an original report from the Library of Integrated Network-Based Cellular Signatures (LINCS) Network of NIH-funded Centers²² provides a detailed account of how that network is addressing these issues. As the volume and complexity of screening and profiling data continue to accrue, additional work will be needed to ensure facile interoperability between data sets.

Again, we are delighted to present this special issue to you as a broad and diverse collection of research and perspectives on generating, mining, and interpreting data from high-throughput and high-content experiments directed at probe and drug discovery.

Darren V. S. Green, PhD
Computational Chemistry
GlaxoSmithKline
Stevenage, Herts (UK)

Paul A. Clemons, PhD
Chemical Biology Program
Broad Institute
Cambridge, MA (USA)

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

References

1. Macarron; Banks, M. N.; Bojanic; et al. Impact of High-Throughput Screening in Biomedical Research. Nat. Rev. Drug Discov. 2011, 10, 188–195.

2. Seiler, K. P.; George, G. A.; Happ, M. P.; et al. ChemBank: A Small-Molecule Screening and Cheminformatics Resource Database. Nucl. Acids Res. 2008, 36, D351–D359.

3. Wang; Xiao; Suzek, T. O.; et al. PubChem: A Public Information System for Analyzing Bioactivities of Small Molecules. Nucl. Acids Res. 2009, 37, W623–W633.

4. Gaulton; Bellis, L. J.; Bento, A. P.; et al. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucl. Acids Res. 2012, 40, D1100–D1107.

5. de Souza; Bittker, J. A.; Lahr, D. L.; et al. An Overview of the Challenges in Designing, Integrating, and Delivering BARD: A Public Chemical-Biology Resource and Query Portal for Multiple Organizations, Locations, and Disciplines. J. Biomol. Screen. 2014, 19, 614–627.

6. Abraham; Zhang; Parker. Multiparametric Analysis of Screening Data: Growing Beyond the Single Dimension to Infinity and Beyond. J. Biomol. Screen. 2014, 19, 628–639.

7. Singh; Carpenter, A. E.; Genovesio. Increasing the Content of High-Content Screening: An Overview. J. Biomol. Screen. 2014, 19, 640–650.

8. Zhang; Kang, Z. B.; Ardayfio; et al. Application of Titration-Based Screening for the Rapid Pilot Testing of High-Throughput Assays. J. Biomol. Screen. 2014, 19, 651–660.

9. Murie; Barette; Lafanechère; et al. Control-Plate Regression (CPR) Normalization for High-Throughput Screens with Many Active Features. J. Biomol. Screen. 2014, 19, 661–671.

10. Haney. Rapid Assessment and Visualization of Normality in High-Content and Other Cell-Level Data and Its Impact on the Interpretation of Experimental Results. J. Biomol. Screen. 2014, 19, 672–684.

11. Smith; Horvath. Active Learning Strategies for Phenotypic Profiling of High-Content Screens. J. Biomol. Screen. 2014, 19, 685–695.

12. Bornot; Blackett; Engkvist; et al. The Role of Historical Bioactivity Data in the Deconvolution of Phenotypic Screens. J. Biomol. Screen. 2014, 19, 696–706.

13. Genick; et al. Applications of Biophysics in HTS Hit Validation. J. Biomol. Screen. 2014, 19, 707–714.

14. Schorpp; Rothenaigner; Salmina; et al. Identification of Small-Molecule Frequent Hitters from AlphaScreen High-Throughput Screens. J. Biomol. Screen. 2014, 19, 715–726.

15. Hansson; Pemberton; Engkvist; et al. On the Relationship between Molecular Hit Rates in High-Throughput Screening and Molecular Descriptors. J. Biomol. Screen. 2014, 19, 727–737.

16. Wawer, M. J.; Jaramillo, D. E.; Dancík; et al. Automated Structure-Activity Relationship Mining: Connecting Chemical Structure to Biological Profiles. J. Biomol. Screen. 2014, 19, 738–748.

17. Coma; Bandyopadhyay; Diez; et al. Mining Natural-Products Screening Data for Target-Class Chemical Motifs. J. Biomol. Screen. 2014, 19, 749–757.

18. Beresini, M. H.; Liu; Dawes, T. D.; et al. Small-Molecule Library Subset Screening as an Aid for Accelerating Lead Identification. J. Biomol. Screen. 2014, 19, 758–770.

19. Dancík; Carrel; Bodycombe, N. E.; et al. Connecting Small Molecules with Similar Assay Performance Profiles Leads to New Biological Hypotheses. J. Biomol. Screen. 2014, 19, 771–781.

20. Swamidass, S. J.; Schillebeeckx, C. N.; Matlock; et al. Combined Analysis of Phenotypic and Target-Based Screening in Assay Networks. J. Biomol. Screen. 2014, 19, 782–790.

21. Jaeger; Min; Nigsch; et al. Causal Network Models for Predicting Compound Targets and Driving Pathways in Cancer. J. Biomol. Screen. 2014, 19, 791–802.

22. Vempati, U. D.; Chung; Mader; et al. Metadata Standard and Data Exchange Specifications to Describe, Model, and Integrate Complex and Diverse High-Throughput Screening Data from the Library of Integrated Network-Based Cellular Signatures (LINCS). J. Biomol. Screen. 2014, 19, 803–816.