Abstract
Academic researchers looking for material to screen may benefit from plated compound collections provided at no cost except shipping by the U.S. National Cancer Institute (NCI). Four plated sets are available, two of which comprise diverse synthetic compounds. These collections, of ~900 and ~1500 compounds, are a convenient size to screen without automated equipment, and a great deal of data about the compounds is available that increases their usefulness. Despite these positive attributes, the collections contain a relatively large number of compounds that are pan-assay interfering and nonspecific (PAINS) or may have other chemical liabilities. Our experience with the compound collections suggests that, perhaps because they contain PAINS and other compounds with liabilities, the collections will yield hits in many assays. This makes them a valuable resource for testing primary screens and follow-up workflows, but by the same token means that hits might not be attractive leads for further development. The NCI sets have a great deal of value for academic researchers as a source of material for early screening. It might be possible, however, to create a better collection specifically for this purpose. One possibility is to pool ~5000–10,000 carefully selected lead-like compounds into ~1000 wells. A collection like this might also generate hits in a wide variety of assays but avoid the downside of those hits often having liabilities.
Introduction
Academic researchers who become interested in assay development and screening often face an immediate and obvious challenge: finding a source of compounds to screen. Testing effects of more than 1000–2000 compounds without automated equipment is laborious (and, if automated equipment were available, it would likely be provided by a screening center that would also have screening libraries available), which serves to set the size of a suitable compound collection. There are several commercially available plated compound sets of about this size. The Prestwick Compound Library and the Library of Pharmacologically Active Compounds (LOPAC) sets, both of which comprise off-patent drugs and other compounds of interest, are examples, and they are representative of a large number of similar sets available from multiple sources. For some, these may represent an appealing choice: there is always the hope of finding a hit already approved by the U.S. Food and Drug Administration (FDA) that can be repurposed for therapeutic purposes. Other kinds of collections can also be purchased, such as Selleckchem’s offering of 4208 lead-like compounds provided by Pfizer. Commercial collections can be quite expensive, however, and cherry picks of hits must also be purchased. This may place them beyond the reach of researchers in the early stages of assay development or other exploratory investigation prior to a grant submission. The purpose of this Perspective is to make academic researchers worldwide aware of another source of compounds available to them for screening: plated collections provided at very low cost by the U.S. National Cancer Institute (NCI).
Description of the Compound Sets
The assembly and history of the NCI compound collection, which at present consists of more than 280,000 compounds, ~140,000 of which are available for distribution, have been reviewed relatively recently. 1 Four plated compound sets culled from this collection are available, and samples of individual compounds in vials may also be ordered for retesting or early-stage hit expansion. The only cost for both plated sets and vialed compounds is shipping. The collections available and their current composition are the Approved Oncology Drugs Set (version VIII, 133 compounds, 2 plates), the Natural Products Set (version IV, 419 compounds), the Diversity Set (version VI, 1584 compounds, 20 plates), and the Mechanistic Set (version IV, 813 compounds, 11 plates). Plated sets are sent dissolved to a final concentration of either 1 mM or 10 mM in DMSO, except the Natural Products Set, which is provided in powder form. The Approved Oncology Drugs Set contains most of the currently available FDA-approved anticancer drugs. The Diversity Set was chosen to represent maximal structural diversity and a wide range of potential pharmacophores. The Mechanistic Set was chosen to display a broad range of effects on cell viability (measured in the NCI-60 Human Tumor Cell Lines Screen [NCI-60], described below). In all cases, compounds were selected to comprise samples of >90% purity, with a sufficient quantity available on hand to make resupply likely. For researchers interested in screening, the Diversity Set and Mechanistic Set are the obvious choices, since they comprise synthetic compounds and are the largest collections. Detailed descriptions of the compound sets and structure data files (SDFs) can be found at https://dtp.cancer.gov/organization/dscb/obtaining/available_plates.htm.
Data and Utilities Augment the Compound Collections
A great deal of data is available about the NCI compounds that adds significantly to their utility. Basic chemical data (2D and 3D structures, formula weight, SMILES [simplified molecular-input line-entry system], and trivial and systematic names [when available]) are available online, and they can be obtained for individual compounds or downloaded in bulk. In addition, the effects of many compounds on 48 h growth of 60 representative cancer cell lines (the NCI-60 cell lines2,3) selected to represent major human cancer types have been tested repeatedly using a sulforhodamine B assay measuring total protein. These data are available for download for individual compounds or in bulk. For some compounds, single-point responses were measured, while for others (including the majority of compounds in the plated sets), 5-point dose–response curves are available. Access to these data allows researchers to do several important things. First, it can provide a rapid means of assessing whether a compound is toxic to cells. Second, a utility called COMPARE, which analyzes patterns of growth inhibition among the 60 cell lines, has been developed that can be used to identify other compounds in the NCI collection that might have similar actions.4,5 Expression of most human genes has been examined in the NCI-60 cell lines, and COMPARE can cross-correlate these data with effects of compounds on proliferation to identify compounds that are likely to act on a specific gene product. One caveat to the use of this tool: Compounds must affect proliferation in a way likely to give rise to a meaningful signature. Compounds that do not affect proliferation in any cell line or that kill all cell lines potently, for example, are unlikely to generate useful matches.
Finally, links to PubChem data are available for many NCI compounds. PubChem provides data for “substances,” which are samples of compounds deposited by a particular vendor or organization such as the NCI, and also for “compounds,” which are all samples of a particular compound that have been deposited. Finding the PubChem compound ID associated with Cancer Chemotherapy National Service Center (NSC) compounds makes it possible to determine whether NSC compounds have been tested by others in assays beyond effects on growth, and to find structurally similar compounds. In some cases, samples of similar compounds will also be available from the NCI.
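The pattern matching that COMPARE performs on NCI-60 data can be illustrated with a minimal sketch. It assumes that similarity between two compounds is scored as the Pearson correlation of their per-cell-line responses; the details of the actual COMPARE implementation are not reproduced here. The compound names, the five-value profiles, and the function names are hypothetical and serve only as toy input.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length response profiles.

    Returns None when either profile has no variation, since the
    correlation is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)

def rank_similar(seed_profile, library):
    """Rank library compounds by correlation with the seed profile."""
    scored = []
    for name, profile in library.items():
        r = pearson(seed_profile, profile)
        if r is not None:  # skip profiles that carry no signature
            scored.append((name, r))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data (hypothetical): responses across five cell lines.
seed = [-6.0, -5.2, -7.1, -4.8, -6.5]
library = {
    "cmpd_A": [-6.1, -5.0, -7.3, -4.9, -6.4],  # similar pattern
    "cmpd_B": [-5.0, -6.8, -4.7, -7.0, -5.1],  # opposite pattern
    "cmpd_flat": [-4.0] * 5,                   # same effect in every line
}
ranking = rank_similar(seed, library)
```

In this sketch a compound with an identical response in every cell line is dropped from the ranking because its correlation is undefined, which mirrors the caveat that compounds without a differential effect on proliferation are unlikely to generate useful matches.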
There Is Some Bad News: The Sets Appear to Contain Many Unattractive Compounds
Taking all of the features described above into account, the NCI plated sets are a good source of material for academic researchers to use for preliminary screening. They may have a significant drawback, however: They were never intended as a source of lead compounds for screening, and so they include many compounds with apparent chemical liabilities that may make them unattractive for development as probes or drug leads. This can be a problem for academic researchers, particularly those new to screening and chemical biology, who may not be trained to recognize compounds with chemical liabilities and the difficulties they pose.6,7 It certainly was for my lab when we started using the NCI sets (see below for a brief summary of our experience with the sets).
One way to identify undesirable compounds is by using software filters. Pan-assay interfering and nonspecific compounds (PAINS),6,7 which were defined originally as compounds that either react with multiple target proteins via covalent interactions or interfere with AlphaScreen assay detection technology, can be identified with specific filters. These compounds may generate false positives in assays, or, if activity is genuine, are likely to be so broadly active as not to be useful. Additional filters that can be used to eliminate compounds with liabilities have since been developed, including the rapid elimination of swill (REOS) filter,8,9 which was developed at Vertex (Boston, MA) and combines a rules-based approach based on physicochemical properties like lipophilicity, molecular weight, and hydrogen-bonding characteristics with a list of undesirable chemical groups to flag compounds.
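The logic of a rules-based filter of this kind can be sketched in a few lines of Python. The property ranges, the group labels, the Compound record, and the two example compounds below are illustrative assumptions made for this sketch. They are not the actual REOS rules or the Canvas implementation, and a real filter would compute properties and match substructures directly from chemical structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Compound:
    """Precomputed descriptors for one compound (hypothetical record)."""
    name: str
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int
    groups: frozenset = frozenset()  # labels of annotated chemical groups

# Illustrative ranges only; assumed for this sketch.
PROPERTY_RANGES = {
    "mol_weight": (200.0, 500.0),
    "logp": (-5.0, 5.0),
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
}

# Illustrative list of unwanted groups; assumed for this sketch.
UNDESIRABLE_GROUPS = frozenset({"acyl halide", "aldehyde", "heavy metal"})

def reos_style_flags(compound):
    """Return the reasons a compound is flagged (empty list means it passes)."""
    reasons = []
    for prop, (low, high) in PROPERTY_RANGES.items():
        value = getattr(compound, prop)
        if not low <= value <= high:
            reasons.append(f"{prop}={value} outside [{low}, {high}]")
    for group in sorted(compound.groups & UNDESIRABLE_GROUPS):
        reasons.append(f"contains {group}")
    return reasons

# Hypothetical examples.
passing = Compound("example_ok", 350.0, 2.5, 2, 5)
failing = Compound("example_bad", 650.0, 6.2, 1, 4, frozenset({"heavy metal"}))

passing_flags = reos_style_flags(passing)
failing_flags = reos_style_flags(failing)
```

Returning the list of reasons rather than a bare pass/fail verdict makes it easier to see why a hit was flagged and to judge whether the flag matters for a given project.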
I examined the properties of the Mechanistic and Diversity sets using three PAINS filters and the REOS filter available in Schrödinger Suite’s Canvas software (Schrödinger, New York, NY). In version III of the Mechanistic Set, 17.5% (143/813) of compounds failed one or more of the three PAINS filters, and 65% (530/813) failed the REOS filter. The Diversity Set fared somewhat better: in version V, 11.0% (176/1593) of compounds failed one or more of the three PAINS filters, and 44.8% (714/1593) failed the REOS filter. Although detailed analysis comparing the NCI sets to other screening libraries is beyond the scope of this Perspective, based on the literature, the NCI sets seem to contain more PAINS and swill than other sets. Dahlin and Walters 10 analyzed the contents of four compound collections, and they found that ~5% failed the PAINS filters and ~25% failed the REOS filter. Huggins et al. analyzed eight sets and found that 25–50% failed REOS filtering. 9 The REOS filter thus appears to flag many more compounds than the PAINS filters, and it is reasonable to question whether such a broad filter is valuable. When I applied the filters to the Approved Oncology Drugs Set, no compound failed the PAINS filters but several failed the REOS filter, including cabozantinib (a nonspecific tyrosine kinase inhibitor [TKI]), plerixafor (a CXCR4 antagonist), alectinib (an ALK inhibitor), and osimertinib (an epidermal growth factor receptor [EGFR] TKI). Arguing in favor of the REOS filter’s utility, Walters and Namchuk found that applying REOS filtering to hits from a kinase screen reduced by ~60% the number of hits identified as unsuitable by a group of medicinal chemists. 11 Certainly, that so many compounds in the NCI sets fail these filters is a factor that must be taken into account when using the sets and evaluating hits. An Excel file containing the compounds in the Mechanistic and Diversity sets that fail the filters is available from the author on request.
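The failure rates quoted above can be recomputed from the reported counts with a short calculation. This sketch uses only the counts given in the text and assumes that the 714 Diversity Set REOS failures refer to the same 1593 compounds as the PAINS count. Recomputed to one decimal place, 143/813 comes to 17.6% and 530/813 to 65.2%, so small last-digit differences from the quoted percentages may reflect rounding.

```python
# Counts as reported in the text: (number failing, number tested).
# The denominator for the Diversity Set REOS count is assumed to be 1593.
counts = {
    ("Mechanistic Set", "PAINS"): (143, 813),
    ("Mechanistic Set", "REOS"): (530, 813),
    ("Diversity Set", "PAINS"): (176, 1593),
    ("Diversity Set", "REOS"): (714, 1593),
}

def percent_failed(failed, total):
    """Percentage of compounds failing a filter, to one decimal place."""
    return round(100.0 * failed / total, 1)

rates = {key: percent_failed(*value) for key, value in counts.items()}

for (set_name, filter_name), rate in rates.items():
    print(f"{set_name}, {filter_name}: {rate}% failed")
```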
Our Experience with the NCI Sets
The NCI sets have clearly seen use in screening by academics, but it is difficult to determine exactly how much and what was found. A PubMed search using (NCI[Text Word]) AND “diversity set”[Text Word] returned 100 results, many of which appear to be in silico virtual screens; the same search performed on the Mechanistic Set returned only three results. This suggests that the Diversity Set is being used more than the Mechanistic Set, which seems appropriate given the higher proportion of undesirable compounds in the Mechanistic Set.
We have used the Mechanistic and Diversity sets in several different screens. We used two in-cell assays based on stable expression of cyan fluorescent protein–yellow fluorescent protein (CFP-YFP) fluorescence resonance energy transfer (FRET) reporters in K562 erythroleukemia cells to screen the Mechanistic Set for compounds that affect metabolic processes. 12 Using a FRET sensor for intracellular adenosine triphosphate (ATP), we identified 14 inhibitors of nonglycolytic oxidative phosphorylation-dependent ATP production and 13 compounds that inhibited glycolysis, six of which also blocked nonglycolytic ATP production by >50%. All of these compounds were found active in orthogonal assays of their function. The hit rate in these screens was ~2%. Three of the inhibitors of nonglycolytic ATP production failed one or more PAINS filters and the REOS filter, however, while another five failed the REOS filter alone. Four of the glycolysis inhibitors failed one or more PAINS filters; three of these also failed the REOS filter, as did eight other compounds that were not PAINS. Ultimately, none of the compounds we found were considered suitable for further development. As part of this project, we also screened the Mechanistic Set for glucose transport inhibitors, identifying one confirmed active compound that passed both the PAINS and REOS filters. An important component of this study involved using NCI growth data on the compounds in the Mechanistic Set to provide evidence that glycolysis inhibition is likely to be an effective strategy for blocking proliferation. This highlights how useful the data available about the compounds can be.
We also screened the Mechanistic Set for compounds that act as either agonists or antagonists of C1 domains using a multiple read assay based on expression of a FRET reporter for diacylglycerol. 13 We confirmed that 16 compounds demonstrated antagonist activity in the primary screen, a hit rate again of ~2%. All of these compounds also blocked translocation of the sensor from cytosol to membrane as expected when measured using microscopy, which served as an orthogonal assay, but none demonstrated the expected inhibitory effects on activation of protein kinase C (PKC)—in fact, some activated PKC. Four of the hits failed a PAINS filter. Two of those compounds also failed the REOS filter, as did 11 other compounds. Only one compound passed both PAINS and REOS filters. Eight of the 16 compounds contained mercury, lead, gold, tin, or copper, and one contained selenium. None of the 16 compounds was considered suitable for further development.
Finally, we screened the Diversity Set for compounds that enhance lytic granule release by cytotoxic T lymphocytes using a high-throughput flow cytometry assay based on antibody binding to a lysosomal membrane protein that becomes exposed to the extracellular solution as a consequence of exocytosis. 14 Of six hits that we confirmed active in the primary assay after obtaining resupply, four failed the REOS filter. None failed a PAINS filter. We are currently working to confirm effects of these compounds in orthogonal assays, and a number of other hits in the primary screen have yet to be confirmed or assessed for lead-like qualities. It is not yet clear whether any of these compounds will be attractive leads for further development.
Summary and Recommendation
Our experiences described above suggest that the NCI sets are a valuable resource for academics. Perhaps because the libraries contain many nonspecifically reactive compounds, we obtained a relatively large number of hits in several different assays, many of which were confirmed active in orthogonal assays. This suggests that screening the sets will generate hits in many, if not most, assays. However, if, as our experience suggests, many of those hits are PAINS or swill, then they will not be terribly attractive either as probes or as candidates for further development, which is not ideal. I was not aware of the problems with chemical liabilities when we started working with the sets, and I believe academic researchers who use the sets need to be cognizant of the issue.
Even when hits have liabilities, however, our experience indicates that finding them serves to validate a screen, and they can still be very useful for developing, testing, and refining postscreening experimental workflows, including orthogonal and confirmatory assays. Taking into account factors such as cost, availability of resupply, information available about compounds, and the apparent likelihood of obtaining hits, I am not aware of a better alternative to the NCI sets available at present for academics to use in early-stage screening, and I hope that this Perspective encourages others to use them appropriately.
Having outlined the utility of the sets, however, I believe that it would be both possible and desirable to create something even better. In my opinion, an alternative should, like the NCI sets, be contained in 800–1500 wells, because that is a convenient size to work with in the absence of automated equipment. Compounds should be selected to avoid PAINS and swill, and thus be more likely to represent reasonable lead-like structures. If the reason that the NCI sets give so many hits in different assays is that they contain PAINS, swill, and the like, however, then the hit rate with such compounds would likely be much lower than with the current NCI sets, and many screens might fail to generate hits. One possible solution to this conundrum is to create a pooled library 15 in which each well contains 5–10 compounds. There are clear drawbacks to this approach: active compounds in a hit well would have to be identified after screening, and there is always the possibility of effects emerging from synergy between two or more compounds. The first of these seems to me to be manageable, and the second could be very interesting. Such a collection could be assembled from NCI compounds, in which case existing infrastructure could be used for distribution and support, or it could be created and supported by industry as a means of enhancing academic drug discovery efforts. Readers of this Perspective may disagree about the approach. I urge those with different or better ideas to act; academic researchers around the world would likely be grateful to them for creating a valuable tool.
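The arithmetic behind the pooled-library idea can be made concrete with a short sketch. It assumes a simple round-robin assignment of compounds to wells and treats each compound as independently active with the same probability, so that a well holding k compounds contains at least one active with probability 1 - (1 - p)^k. The compound identifiers, the library size of 8000, and the per-compound hit rate used in the example are hypothetical values chosen for illustration, and the independence assumption ignores the synergy effects mentioned above.

```python
def pool_compounds(compound_ids, n_wells):
    """Distribute compounds across wells in round-robin order."""
    wells = [[] for _ in range(n_wells)]
    for index, compound_id in enumerate(compound_ids):
        wells[index % n_wells].append(compound_id)
    return wells

def prob_well_has_active(per_compound_hit_rate, compounds_per_well):
    """Probability that a well contains at least one active compound,
    assuming compounds are active independently of one another."""
    return 1.0 - (1.0 - per_compound_hit_rate) ** compounds_per_well

# Hypothetical library: 8000 compounds pooled into 1000 wells.
compound_ids = [f"cmpd_{i:05d}" for i in range(8000)]
wells = pool_compounds(compound_ids, 1000)

# Hypothetical per-compound hit rate of 0.2% for a clean, lead-like library.
well_hit_probability = prob_well_has_active(0.002, len(wells[0]))
expected_hit_wells = well_hit_probability * len(wells)
```

Under these assumptions, pooling raises the chance that any given well scores as a hit roughly in proportion to the number of compounds per well, at the cost of a deconvolution step to find which compound in a hit well is responsible.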
Footnotes
Acknowledgements
I would like to thank David Barrett and Meghan Wyatt for helpful discussions. Supported by National Institutes of Health (NIH) Grant R01 AI 120169.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
