The Communication of Hit Quality Using Natural History Visualizations (NHVs)

Abstract

High-throughput screening (HTS) often yields a list of compounds that requires prioritization before further work is performed. Prioritization criteria typically include activity, selectivity, physicochemical properties, and other absolute or calculated measurements of compound “value.” One critical method of compound prioritization is often not discussed in published accounts of HTS. We have referred to this oft-overlooked metric as “compound natural history.” These natural histories are observational evaluations of how a compound has been reported in the historical literature or compound databases. The purpose of this work was to develop a useful natural history visualization (NHV) that could form a standard, important part of hit reporting and evaluation. In this case report, we propose an efficient and effective NHV that will assist in the prioritization of active compounds and demonstrate its utility using a retrospective analysis of reported hits. We propose that this method of compound natural history evaluation be adopted in HTS triage and become an integral component of published reports of HTS outcomes.

Keywords

high-throughput screening assays triage visualization

Introduction

The goal of HTS (high-throughput screening) is to find compounds that can be optimized into efficient and selective modulators of a chosen target or phenotype. The major challenge to achieving this goal is the winnowing down of thousands to millions of compounds to the best 10–100 compounds via a carefully designed screening tree. This process is complicated by the fact that a large number of compounds poured into the screening funnel can interfere with the technology of the screening methods, the reagents, proteins present in the assay, or all of the above. This interference can confound the effective selection of good compounds from the majority of compounds that will be ineffective.

In this article, we will be referring to the properties and analysis of “hit” compounds. We define a hit compound as a compound identified from a primary HTS, whether virtual, biochemical, or phenotypic, which shows a favorable signal in comparison to the bulk of the screening signal. Best practices for this determination vary depending on the assay, but excellent discussion can be found in the National Institutes of Health’s (NIH) Assay Guidance Manual.¹ A hit compound is not a lead compound for further optimization, because it has not yet been validated as active at the target or phenotype under investigation. A rigorous screening tree (vide infra) is necessary to validate beyond hit status to further development.

Many biochemical and biophysical methods can assist in the hit prioritization and selection process. These should be standard operating procedures. Perhaps, however, the most straightforward method that can influence the prioritization of screening hits is neither biophysical nor biochemical, but data-driven. That is, an oft-overlooked tool in hit characterization and prioritization is the straightforward determination of hit “natural histories.” These natural histories are readily available using cheminformatic evaluation of the hits and searching of the literature for relevant references. Natural histories are the metadata that have been collected for compounds throughout the years in large, public compilations of screening data (e.g., PubChem) and in medicinal chemistry knowledge distilled into simple characterization models (e.g., Lipinski’s rule).² We propose herein a convenient natural history visualization (NHV) of compounds that will prove useful for hit evaluation and prioritization. This visualization conveys information related to the purity, promiscuity, quality, and literature reports of a compound. We recommend that researchers present NHVs of all hit compounds published in HTS reports in which no follow-up medicinal chemistry compound synthesis has been performed to validate the hits. The ready availability of the NHVs will enable editors and readers to immediately judge the potential of the hits that have been reported.

Many large pharmaceutical companies and mature academic HTS groups will have versions of these data that they already use in the evaluation of their screening campaigns; however, this information is not readily communicated in a simple format in literature-screening reports. With the expansion of HTS methodology to smaller start-ups and early-stage academic HTS groups, we propose this NHV model as an accessible way for groups at all levels of expertise to evaluate the quality of HTS hits.

Why “natural history”? Natural history “is a domain of inquiry involving organisms, including animals, fungi, and plants, in their natural environment, leaning more towards observational than experimental methods of study” (https://en.wikipedia.org/wiki/Natural_history). We modify this definition here to read that the natural history of a hit is the domain of inquiry involving “hits” in the laboratory environment, leaning not on experimental methods (as in the useful and important screening tree), but on what has been observed and recorded (metadata) with respect to the hit (compound) in its laboratory environment.

Methods

The Natural History Visualization

Our proposed visualization paradigm for natural histories is shown in Figure 1 . The natural histories components we chose to include in hit characterization include evaluations that are based on scientific evidence of value in the prioritization process: (Element 1) robustness of hit chemical purity and identity characterization; (Elements 2–3) potential for promiscuity that is extrapolated from publicly available screening data; (Element 4) a useful structural and physicochemical property flagging method based on high-quality, pharmaceutical company data; and (Element 5) an evaluation of previous references to the compound in the literature ( Fig. 1A ). These are ordered from left to right in order of their importance to judging hit quality. We chose the tools to generate the element values ( Fig. 1B ) using the following reasoning: (1) The tools should be readily accessible without specialized software and should be based on current, best triage science; (2) the tools report on orthogonal aspects of hit quality; and (3) the tools do not require excessive time or effort to generate the value for each element. Element 1 should be known (or reported, if building NHVs retrospectively). All other elements can be generated starting with the SMILES (simplified molecular-input line-entry system) string of the hit. Figure 1C shows the icons we chose to indicate the hit quality as gauged by each element tool. We chose to use black-and-white versions of icons (https://openmoji.org) to simplify the hit-quality presentation ( Fig. 1C ). An example of a completed NHV (natural history visualization) is shown in Figure 1D .

Figure 1.

The development of an NHV (natural history visualization) of a hit compound.

How We Generated the Values for Each Element in the Visualization

Purity Evaluation (Element 1)

The reported purity and identification (purity/id) of hit compounds is of paramount importance when considering their activity. This is because some targets and biochemical systems are extremely sensitive to the reagents that are often remaining from compound synthesis, even in commercial compound samples. We have previously outlined different levels of hit compound characterization, but have simplified this process as shown in Table 1 .³ These values can be assigned by the researchers who report the hits, or in this case, retrospectively by researchers who want to validate a set of hits that were reported.

Table 1.

Rules for Scoring for Each of the Elements in the NHV.

Element	Name	Warning	Caution	Good	Very Good

1	Purity/id (reported in manuscript)	Purchased/no purity listed	“Warning” criteria and purchased purity reported as ≥95%	“Caution” criteria and purity confirmed by HPLC/MS	“Good” criteria and ¹H and ¹³C NMR confirmed structures
2	PromiscuityHit Dexter 2.0	Promiscuous or highly promiscuous at a high confidence level	Highly promiscuous and/or promiscuous at low or medium confidence	Non-promiscuous at any confidence level	Not used
3	PromiscuityBadapple	pScore >300	100 < pScore ≤ 300	pScore <100	Not used
4	TieringAPT	Tiers 6–7	Tiers 4–5	Tiers 2–3	Tier 1
5	LiteratureSciFinder	≥10	5 ≤ #refs < 10	<5	Not used

[Promiscuity] scores naturally fall into three rankings, and the [literature] element is not well characterized at its low and high ends. #refs, Number of references (see the text for further details) to the hit and similar compounds in Scifinder; APT, Abbott Physicochemical Tiering; HPLC/MS, high-performance liquid chromatography–mass spectrometry; NA, not applicable—used in the case of virtual high-throughput screening when compounds are not tested in biochemical or biophysical assays; ND, not determined—can be used if an element of the list is not determined; NHV, natural history visualization; NMR, nuclear magnetic resonance; Not used, for these elements, we have chosen not to differentiate between good and very good; NR, not reported—either the value wasn’t reported or won’t be reported; pScore, promiscuity score; SAR, the purity of the compound was validated by structure–activity studies.

Hit Dexter (Element 2)

We assigned a value to the [promiscuity] or potential polypharmacology of hits based on their evaluation in the online tools Hit Dexter 2.0^4,5 and Badapple.⁶ Hit Dexter is a machine learning approach to estimate how likely a small molecule is to trigger a positive response in biochemical assays. The models were derived from a dataset of approximately 300,000 compounds [garnered from the PubChem Bioassay database of ~1 million compounds measured in ~1000 PSAs (primary screening assays)] with experimentally determined activity for at least 50 different protein groups. Hit Dexter uses machine learning models (binary classifiers) to identify compounds that have the potential to be either non-promiscuous, promiscuous, or highly promiscuous in PSA. Hit Dexter scoring is based on complete molecular structures. This sets it apart from the tool used to generate Element 3, Badapple, which uses scaffold-based promiscuity scoring. We use the “Assessment” offered by Hit Dexter to assign the value to this element of the NHV (https://nerdd.zbh.uni-hamburg.de/hitdexter/).

Badapple (Element 3)

Badapple (bioactivity data associative promiscuous pattern learning engine) calculates the promiscuity score (pScore) of a compound according to a formula based on each scaffold of the molecule. This model was built on approximately 430,000 compounds measured in a total of more than 800 different assays gathered from the NIH Roadmap Molecular Libraries Program (MLP) screening centers. The pScore cutoffs recommended by Badapple are used to assign the value to this element (https://datascience.unm.edu/tomcat/badapple/badapple).

Abbott Physicochemical Tiering (APT) (Element 4)

APT⁷ is a hit prioritization paradigm that places hits into tiers (from 1 = most favorable to 7 = least favorable) based on a consideration of their calculated physicochemical properties and structural features. This method includes the “consideration of parameters—lipophilicity (ClogP), molecular weight, number of hydrogen bond donors and acceptors, total polar surface area, number of aromatic rings (NAR), and the fraction of sp3 carbons (Fsp3). These parameters describe the overall greasiness (ClogP), size (MW), polarity (tPSA, HBD, HBA), aromaticity (NAR), and three-dimensionality, or level of saturation (Fsp3) of a molecule, and as such should adequately describe the overall physical behavior of a molecule.”⁷ This prioritization is included to point out the potential strengths or weaknesses of compounds that may not be represented in the datasets used in building the Hit Dexter 2.0 and Badapple models. We used a custom protocol in Pipeline Pilot to generate the values for this element, but the values could also be generated using values generated in SwissADME (http://www.swissadme.ch/index.php).

SciFinder (Element 5)

We have previously noted that HTS publications often do not cover literature precedence with enough detail to reveal data associated with hits that have already been reported.⁸ Therefore, we chose to include a simple-to-determine value of literature relevance in our NHV. This value is determined by a structural search in Scifinder (or equivalent) that returns analogs ≥85% similar to the hit compound, with the criteria that analogs be single substances, be commercially available, and fall into the “Studies: Biological” class. This is then further refined by only retrieving all of the “journal” references associated with the aforementioned analogs that mention “screening” as a research topic. The final count of references remaining after this filtering process determines the value of the [literature] element ( Table 1 ).

Results

To demonstrate the utility of these visualizations, we retrospectively applied these principles of NHV to three specific cases selected from the recent literature, and then individual, illustrative hits. We use [name of element] to indicate the elements of each visualization.

Case 1: Highly promiscuous hit compounds extracted by data analysis (Fig. 2). Figure 2 shows the NHV visualizations for hit compounds 1–9 that were identified in two analyses of the PubChem screening data.^9,10 In this case, the [purity/ID] element has a value of NA (not applicable). These were all shown to have the appropriate “warnings” or “cautions” in the [promiscuity] elements. The vast majority were labeled “good” or “very good” by [tiering].

Figure 2.

Natural history visualizations (NHVs) of highly promiscuous compounds.

Case 2: A retrospective evaluation of a recently published HTS targeting main protease (Mpro) from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Fig. 3). Figure 3 shows the NHVs of the hits (8, 10–14) in a high-profile, recently published article describing an HTS targeting Mpro from SARS-CoV-2.¹¹ Note that in all cases, the [purity/id] is listed as NR (not reported). These hits have “warning” and “caution” written (literally) all over them. These hits were all recently shown to be nonspecific, promiscuous SARS-CoV-2 main protease inhibitors.¹²

Figure 3.

Natural history visualizations (NHVs) of hits reported from a high-throughput screening (HTS) targeting main protease (Mpro) from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Case 3: A retrospective evaluation of a recently published virtual high-throughput screening (vHTS) targeting druglike small-molecule inhibitors of aspartate N-acetyltransferase (ANAT) (Fig. 4).¹³ Figure 4 shows hits targeting ANAT (15–19) identified in a virtual screen using AtomNet, a deep convolutional neural network for structure-based drug discovery. In this case, the authors provided high-performance liquid chromatography–mass spectrometry (HPLC/MS) confirmation of their hits [purity/id = good]; the hits were predicted to be non-promiscuous [promiscuity = good] by Hit Dexter but received warnings and cautions when analyzed in Badapple. [Tiering] was “cautionary” or “warning” in four out of five of the hits.

Figure 4.

Natural history visualizations (NHVs) of hits from a virtual high-throughput screening (vHTS; followed by assays) targeting druglike small-molecule inhibitors of aspartate N-acetyltransferase (ANAT).

Two further illustrative examples of NHVs are shown in Figure 5 . Compounds 20–21 were reported as hits in a high-throughput, phenotypic screen for selective appetite modulators.¹⁴ The NHV of 20 shows that it is predicted to be promiscuous and a poor-quality starting point by [tiering]. We have shown that these 4-acetyl-3-hydroxy-1-substituted-5-phenyl-1,5-dihydro-2H-pyrrol-2-ones can most likely denature proteins and are ALARM NMR (a La assay to detect reactive molecules by nuclear magnetic resonance) positive.¹⁵ Although compound 21 is labeled as “good” by Hit Dexter, it has a “warning” raised by the [literature] value and is confirmed to have a high pScore (pScore >300, a strong indication of promiscuity) when analyzed in Badapple.

Figure 5.

Further illustrative natural history visualizations (NHVs) from hits found in recent high-throughput screening (HTS).

Finally, we examined five articles in which the initial chemical matter was confirmed by SAR (structure–activity relationship) studies ( Fig. 6 ). Other than 24,¹⁶ none of these hits is without blemish in the NHV. Of these compounds, the NHV warns of potential promiscuity for 22¹⁷ and 23,¹⁸ which would need to be de-risked in the screening tree. Compound 26¹⁹ has a warning in [tiering], suggesting that the physicochemical properties of this hit may need to be optimized on the way to becoming an effective probe or drug candidate. Compound 25²⁰ is relatively free of warnings and appears to be an excellent starting point given its [tiering] value.

Figure 6.

Natural history visualization (NHV) of hits that were confirmed in [purity/id] by the synthesis of analogs.

Discussion

We have developed a natural history visualization protocol that is based on a hit’s known or reported purity/id, its historical pharmacology, its calculated physicochemical and structural tiering, and references of the hit compound and close analogs in the literature. We developed this model because many HTS publications fail to follow best practices in considering these data. We expect that this model will allow researchers to easily and clearly characterize hits from their screens and allow reviewers and readers to immediately put the hits into the context of known compound metadata. Ultimately, NHVs will enable researchers to report and recognize the best screening science.

We chose the elements of this model to be easy to assign based on known data, free software, simple physicochemical property calculations, and readily available literature searches. Each of these elements should be easily accessible to any screening operation. With our retrospective assignment of NHVs to 26 representative compounds, we have demonstrated that this protocol provides a simple, visual readout of hit quality that is an informative addition and complementary to the data produced by biochemical and biophysical methods.

Natural histories alone can’t qualify or disqualify a compound as a true modulator of a chosen target. This can be done only by a well-designed screening tree with the requisite counter-, interference, and orthogonal screens in place. NHVs can, however, provide guidance to a project team as to the risks that might be associated with following up on a particular compound series. These risks might be well-known in the primary literature. In addition, NHVs should be used for guidance, not as strict cutoffs. This is specifically why we didn’t propose a complex scoring system to generate an overall hit “rating.”

Our model alone does not completely address the issue of compounds that may have novel mechanisms of interference or are not similar enough to known database compounds to have an associated pharmacological history. For example, while 19 has few identifiable red flags raised by the NHV, it is flagged by Badapple, which considers substructures in its characterization of promiscuity. Notably, the follow-up of 21 is warned against in the [literature] assessment. This assessment is confirmed by Badapple but not by Hit Dexter, which is why we chose to have two elements to describe each hit with respect to potential promiscuity.

High-throughput screening remains the best method for finding chemical matter for novel targets. We propose that this, or similar methods, be used and reported to characterize the quality of compounds found in HTS campaigns and to prioritize compounds for further consideration. Adoption of these methods would also assist reviewers, assigned to judge the outcome of a particular screening campaign, to quickly assess the potential liabilities associated with each of the hits in the material under review. In the future, it would be helpful to have these or similar analysis tools under the purview of an organization such as NCATS (National Center for Advancing Translational Sciences), at least in the United States, to provide a curated and evergreen source of promiscuity and tier scoring. We are currently investigating artificial intelligence (AI)-based models to augment this process and will report these results in due course.

Supplemental Material

sj-pdf-2-jbx-10.1177_24725552211017518 – Supplemental material for The Communication of Hit Quality Using Natural History Visualizations (NHVs)

Supplemental material, sj-pdf-2-jbx-10.1177_24725552211017518 for The Communication of Hit Quality Using Natural History Visualizations (NHVs) by Kathryn M. Nelson and Michael A. Walters in SLAS Discovery

Research Data

sj-xlsx-1-jbx-10.1177_24725552211017518 – for The Communication of Hit Quality Using Natural History Visualizations (NHVs)

sj-xlsx-1-jbx-10.1177_24725552211017518 for The Communication of Hit Quality Using Natural History Visualizations (NHVs) by Kathryn M. Nelson and Michael A. Walters in SLAS Discovery

This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Footnotes

Acknowledgements

We would like to acknowledge seminal discussions with Daniel Erlanson (VP of Chemistry at Frontier Medicines) and Jonathan Baell (Professor of Medicinal Chemistry, Monash University) throughout the past 10 years regarding high-throughput screening triage.

Supplemental material is available online with this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Michael A. Walters

References

Markossian

Sittampalam

G. S.;

Grossman

; et al., Eds. Assay Guidance Manual [Online]. Eli Lilly & Company and the National Center for Advancing Translational Sciences: Bethesda, MD, 2004. https://www.ncbi.nlm.nih.gov/books/NBK53196/.

Lipinski

C. A.

Lombardo

Dominy

B. W.

, et al. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 1997, 23, 3–25.

Singh

Dahlin

J. L.

Walters

M. A.

Risk Management in Early Discovery Medicinal Chemistry. Methods Enzymol. 2018, 610, 1–25.

Stork

Chen

Šícho

, et al. Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters. J. Chem. Inf. Model. 2019, 59, 1030–1043.

Stork

Embruch

Šícho

, et al. NERDD: A Web Portal Providing Access to In Silico Tools for Drug Discovery. Bioinformatics 2020, 36, 1291–1292.

Yang

J. J.

Ursu

Lipinski

C. A.

, et al. Badapple: Promiscuity Patterns from Noisy Evidence. J. Cheminform. 2016, 8, 29.

Cox

P. B.

Gregg

R. J.

Vasudevan

Abbott Physicochemical Tiering (APT)—a Unified Approach to HTS Triage. Bioorg. Med. Chem. 2012, 20, 4564–4573.

Dahlin

J. L.

Walters

M. A.

The Essential Roles of Chemistry in High-Throughput Screening Triage. Future Med. Chem. 2014, 6, 1265–1290.

Jasial

Bajorath

Determining the Degree of Promiscuity of Extensively Assayed Compounds. PLoS One 2016, 11, e0153873.

10.

Bajorath

What Is the Likelihood of an Active Compound to Be Promiscuous? Systematic Assessment of Compound Promiscuity on the Basis of PubChem Confirmatory Bioassay Data. AAPS J. 2013, 15, 808–815.

11.

Jin

, et al. Structure of Mpro from SARS-CoV-2 and Discovery of Its Inhibitors. Nature 2020, 582, 289–293.

12.

Townsend

J. A.

, et al. Ebselen, Disulfiram, Carmofur, PX-12, Tideglusib, and Shikonin Are Nonspecific Promiscuous SARS-CoV-2 Main Protease Inhibitors. ACS Pharmacol. Transl. Sci. 2020, 3, 1265–1277.

13.

Stecula

Hussain

M. S.

Viola

R. E.

Discovery of Novel Inhibitors of a Critical Brain Enzyme Using a Homology Model and a Deep Convolutional Neural Network. J. Med. Chem. 2020, 63, 8867–8875.

14.

Jordi

Guggiana-Nilo

Bolton

A. D.

, et al. High-Throughput Screening for Selective Appetite Modulators: A Multibehavioral and Translational Drug Discovery Strategy. Sci. Adv. 2018, 4, eaav1966.

15.

Dahlin

J. L.

Nissink

J. W. M.

Francis

, et al. Post-HTS Case Report and Structural Alert: Promiscuous 4-Aroyl-1,5-Disubstituted-3-Hydroxy-2H-Pyrrol-2-One Actives Verified by ALARM NMR. Bioorg. Med. Chem. Lett. 2015, 25, 4740–4752.

16.

Pissot Soldermann

Simic

Renatus

, et al. Discovery of Potent, Highly Selective, and In Vivo Efficacious, Allosteric MALT1 Inhibitors by Iterative Scaffold Morphing. J. Med. Chem. 2020, 63, 14576–14593.

17.

Gopalsamy

Aulabaugh

A. E.

Barakat

, et al. PF-07059013: A Noncovalent Modulator of Hemoglobin for Treatment of Sickle Cell Disease. J. Med. Chem. 2020, 64, 326–342. https://doi.org/10.1021/acs.jmedchem.0c01518.

18.

Richard-Bildstein

Aissaoui

Pothier

, et al. Discovery of the Potent, Selective, Orally Available CXCR7 Antagonist ACT-1004-1239. J. Med. Chem. 2020, 63, 15864–15882.

19.

Peng

Y.-H.

Liao

F.-Y.

Tseng

C.-T.

, et al. Unique Sulfur-Aromatic Interactions Contribute to the Binding of Potent Imidazothiazole Indoleamine 2,3-Dioxygenase Inhibitors. J. Med. Chem. 2020, 63, 1642–1659.

20.

Martínez-Viturro

C. M.

Trabanco

A. A.

Royes

, et al. Diazaspirononane Nonsaccharide Inhibitors of O-GlcNAcase (OGA) for the Treatment of Neurodegenerative Disorders. J. Med. Chem. 2020, 63, 14017–14044.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.03 MB

0.01 MB