Selecting Approaches for Hit Identification and Increasing Options by Building the Efficient Discovery of Actionable Chemical Matter from DNA-Encoded Libraries

Abstract

Over the past 20 years, the toolbox for discovering small-molecule therapeutic starting points has expanded considerably. In addition to traditional high-throughput knowledge-based and diversity screening, pharmaceutical researchers can now choose from technologies that include the screening of fragment and fragment-like libraries, affinity selection mass spectrometry, and selection against DNA-encoded libraries (DELs). Each of these techniques has its own combination of advantages and limitations that makes it more, or less, suitable for different target classes or discovery objectives, such as the desired mechanism of action. Layered on top of this are the constraints of the drug hunters themselves, including budgets, timelines, and available platform capacity; each of these can play a part in dictating the hit identification strategy for a discovery program. In this article, we discuss some of the factors that govern how we build a hit identification roadmap for a program and describe the increasing role that DELs are playing in our discovery strategy. Furthermore, we share what we learned during our initial exploration of DELs and highlight the approaches we have evolved to maximize the value returned from DEL selections. Topics addressed include the optimization of library design and production, reagent validation, data analysis, and hit confirmation. We describe how our thinking in these areas has led us to build a DEL platform that has begun to deliver tractable chemical matter to our global discovery portfolio.

Keywords

DEL; DNA-encoded library; affinity selection; hit identification; library screening

Introduction

Modern drug discovery that is focused on delivering novel, small-molecule therapeutics to treat disease, via either target-based or phenotypic approaches, typically identifies bioactive starting points (“hits”) through the screening of a large library of molecules. This hit identification (hit ID) process is a fundamental step in launching new discovery campaigns and carries significant accountability, as it sets the initial direction for a project team, ultimately determining the path to the clinic and dictating the chemical characteristics of the molecules tested. In this article, we describe recent advances in hit ID technologies, consider the suite of options that are available, and discuss some of the influencing factors when designing a new hit ID strategy for a program in a large pharmaceutical organization such as Pfizer. Furthermore, we describe the decision to expand our hit ID platform by investing in DNA-encoded library (DEL) technology and share our experiences and current approaches to develop a robust and reliable source of high-quality lead matter for our project teams.

In the 1990s and early 2000s there was a surge in traditional high-throughput screening (HTS). We saw rapid advancements in screening methodologies and automation, including user-friendly fluorescent readouts, homogeneous assay protocols, miniaturization to standardized 384- and 1536-well formats, and increasingly sensitive image-based detection systems. An increased adoption of combinatorial chemistry and a crusade across the industry to expand compound files through thoughtful exploration of chemical space, often for specific target classes, fueled this expanded screening capacity. The trifecta was completed by the biological exuberance of the Human Genome Project¹ and advances in recombinant DNA, which provided the drive to propose mechanistic hypotheses for new targets to screen. Thus, for a period, the industry screened and waited for emerging metrics to demonstrate that testing larger numbers of compounds with high chemical diversity would lead to an enhanced ability to identify successful leads. There are indeed some examples that could be taken to justify this hope: in screens of 2.6–3 million compounds, only a single lead molecule emerged that effectively inhibited the secretion of PCSK9;² a single noncatechol hit was validated as a dopamine D1 receptor agonist for cAMP production;³ and within the single series that was identified, only one compound demonstrated the desired AMPK activator profile.⁴ The importance of diversity seems to hold true beyond the Pfizer compound file; for example, Han et al.⁵ from Roche describe identifying a singleton hit from a 1-million-compound phenotypic screen that sought inhibitors of hepatitis B surface antigen secretion. Additionally, Forma recently described the identification of tractable hits only after adding a newly synthesized DEL to an already extensive DEL collection.⁶

Overall, however, this golden age of screening appeared to be insufficient to offset the escalating costs of drug development and to solve the challenges of clinical attrition. Recent analyses based on 106 new drugs from 10 pharmaceutical firms calculate that overall investment in discovery and clinical development approaches $2.6 billion for each successful launch,⁷ and there is a clear trend for increasing costs in recent years. These costs not only cover an individual project’s discovery and development effort, but also must compensate for the expensive failure of other programs within the clinic where, consistently, efficacy has remained the reason for almost 50% of clinical failures in phase II and phase III since 2008.^8,9 Consequently, multiple strategies have been proposed and implemented across the length of the research and development (R&D) pipeline aimed at increasing success in pharmaceutical R&D.^10–13 At the earliest drug discovery stages, the evolution of screening strategies in hit ID, hit validation, and lead optimization, and in pharmacokinetic and safety profiling,¹⁴ has increased the breadth of the discovery screening funnel, enhanced its filtering efficiency and quality, and improved compound selection,¹⁵ thus yielding better positioned starting points from which to launch a medicinal chemistry program, and an enhanced ability to identify series with more favorable pharmacology, ADME (absorption, distribution, metabolism, excretion), and safety profiles. The end result is a shorter path to the clinic and an increased confidence that the new asset possesses the chemical and biological credentials to adequately test therapeutic hypotheses in vivo.

Within the realm of hit ID we have seen parallel strategies. The first has been to make assays more physiologically relevant, moving away from simple reductionist models and “bringing the patient to the dish,” in the hope of improving translation.^16,17 This effort naturally aligns with the resurgence of phenotypic screening, which provides opportunities to identify completely novel targets and links to disease, to illuminate pathway biology for optimal target selection, and to capitalize on the potential polypharmacology of test compounds.^18–21 Phenotypic programs, however, tend to require greater investment: for example, the cost and time required to build and culture disease-relevant cell models (potentially testing multiple donors), the challenges of hit triage²² and target deconvolution, the risks of investing in the novel, often poorly characterized targets that may emerge, and the liability of potential downstream safety hurdles when proceeding with unknown mechanisms of action.

A complementary approach has been to introduce rapid target-based biophysical screens such as affinity selection mass spectrometry (ASMS),^23–25 of which the automated ligand identification system (ALIS)^26–28 is one implementation, and DEL selection.²⁹ Although there are different methodologies for ASMS (reviewed in Andrews et al.³⁰), all rely on the same basic principle: incubating compound libraries (often >2500 compounds at one time) with a target, isolating the target along with bound compounds, and using accurate mass analysis to identify the small molecules present.²⁵

DELs take compound compression several orders of magnitude further, incubating a target simultaneously with billions of compounds, each encoded with a DNA tag. Ligands are identified by next-generation sequencing (NGS) of the DNA tags following separation of the target–ligand complex and the release of bound molecules. Once established, biophysical platforms can require considerably less investment per campaign and can provide early insight into druggability and tools to build target confidence, in addition to leads for chemistry efforts. ASMS and DEL can offer advantages in cost, speed, access to chemical space diversity and density, and the ability to test varied biological conditions in parallel, but with the obvious disadvantages of being unable to address function and of generally providing reduced physiological context (Table 1).

Table 1.

Differentiators for Hit ID Technologies.


Parameter	Knowledge-Based Screening	Diversity Screening	ASMS	DEL Screening	Fragment-Based Screening
Typical library size	<500,000	~500,000 to 4 million	~500,000 to 4 million	2–200 billion	2000–10,000
Library chemistry	Diverse	Diverse	Diverse	Synthesized on DNA, in water	Greater sampling of chemical space per compound
Compounds per test	1 compound	1–20 compounds	~2500 compounds	Entire library screened at once	1–10 fragments
Approach	Binding/functional; cell-based and phenotypic screens possible	Binding/functional; cell-based and phenotypic screens possible	Binding; not cell based	Binding; not cell based	Binding; not cell based
Protein requirement (50 kDa protein)	Medium/varied	High/varied; can range from >200 mg to 15 g	Low: ~2 mg for ~1 million compounds	Low: ~2 mg	Medium: 6–8 mg for binding and SPR; 5–10 mg for x-ray
Number of conditions tested	Typically 1	Typically 1	Typically low: 1–3	Multiple, frequently 3–12	Typically 1
Typical MW (Da)	~450	~450	~450	~400–800	<300
Hit K_D	~10⁻⁷ to 10⁻⁵ M	~10⁻⁷ to 10⁻⁵ M	~10⁻⁸ to 10⁻⁵ M	~10⁻⁸ to 10⁻⁷ M	~10⁻⁵ to 10⁻³ M
Kinetics	Depends on technology used	Depends on technology used	Tends to bias toward slow off-rates (elution step, 0.1–1 s)	Tends to bias toward slow off-rates (wash step)	Expected fast off-rates; methods designed accordingly
Portfolio tractability	Precedented targets including GPCRs, ion channels, kinases, and complexes	Full portfolio including GPCRs, ion channels, kinases, and complexes	~50% of portfolio; success with protein complexes and membrane proteins	~50% of portfolio; whole cells have been successful; some evidence for suitability for membrane proteins	~50% of portfolio; stabilized membrane proteins can be successful
Assay development	Can be complex	Can be complex	Simple if protein is well behaved	Simple if protein is well behaved	Little optimization required
Primary hit rate	Should be higher than diversity screening	Typically 0.5%–1%	0.01%–0.5%	Very low (in part because numbers followed up are typically low)	High: 5%–20%
Speed of confirmation	Rapid (compound reorder)	Rapid (compound reorder)	Rapid (compound reorder)	Slow; requires synthesis	Rapid (compound reorder)
Computational support needed	Significant requirements	Some requirements for triage	Low: support for hit list and some requirements for triage	Short-term high investment for hit list	Some; valuable for follow-up, and structural biology support is required for optimization
Campaign time (variable on investment)	6 months	6 months–1 year	3–4 months	6 months	6–9 months
Campaign cost (once library and platform in place)	Medium	High (≥$450,000)	Low	Low	Low
Capital investment	Medium capital for screening	High capital investment	Medium capital investment	Minimal capital investment beyond library; moderate if internal sequencing	Moderate capital investment

Layered onto these overarching platform themes we have seen an evolution in the substrate for hit ID campaigns that can render one approach more or less favorable. The desire for truly novel cures for disease drives priorities toward first-in-class targets and novel ways to modulate known pathways/targets that have high confidence in rationale (CIR) but have previously proved intractable across the industry. Hence, for example, we see investment in protein homoeostasis,^31–34 RNA targets,^35,36 and novel members of gene families that have precedence for druggability, for example, kinases, G-protein-coupled receptors (GPCRs), and solute transporters,³⁷ where companies can exploit prior medicinal design experience. Furthermore, we see a trend for efficiency gain through the multiplexing of endpoints or targets and a desire for the extraction of more parameters (high-content data) from each screen to increase the richness of the data set and thus the knowledge that can be derived from every experiment.^38–40

Typically, each hit ID strategy begins with a thorough assessment of the extent of target (or disease) knowledge, the competitive landscape, the desired mechanism of action as it relates to laboratory objectives or product profile, and the available reagents or methodologies to test the biological rationale and to find target modulators (see Box 1). The competitive landscape and the available chemical equity will often direct which regions of chemical space should be addressed. The full Pfizer HTS file of >3 million compounds is viewed internally with high regard in terms of chemical attractiveness, being a curated set acquired from multiple legacy organizations and project-directed synthetic efforts. It is accessible for traditional high-throughput diversity screening in either singleton or compressed formats and also for ASMS, which has the advantage of requiring relatively little assay development (provided the protein is well behaved), low protein requirements, a rapid turnaround time, low cost, and speedy access to hits for follow-up studies. However, the applicability of ASMS can be limited for proteins with low stability (for our needs, stability for 48 h at 4 °C in the presence of 4% DMSO is typically desired), and ASMS is potentially biased toward compounds with a half-life in the target–ligand complex of at least ~1 s (corresponding to a dissociation rate constant [k_off] of approximately 0.7 s⁻¹ or slower) because of the time taken for elution from the size exclusion column.^25,27
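The ~1 s half-life and ~0.7 s⁻¹ off-rate quoted above are linked by first-order dissociation kinetics, t½ = ln 2 / k_off. The short Python sketch below assumes simple first-order decay of a preformed complex with no rebinding during elution. It makes that conversion explicit and shows how much complex would survive a 1 s elution at different off-rates. The example off-rates are illustrative, not measured values.

```python
import math


def koff_from_half_life(t_half_s: float) -> float:
    """First-order dissociation: t1/2 = ln(2) / k_off, so k_off = ln(2) / t1/2."""
    if t_half_s <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / t_half_s


def fraction_remaining(k_off_per_s: float, t_s: float) -> float:
    """Fraction of preformed complex still intact after t seconds, assuming no rebinding."""
    return math.exp(-k_off_per_s * t_s)


threshold = koff_from_half_life(1.0)
print(f"k_off at t1/2 = 1 s: {threshold:.3f} s^-1")  # 0.693 s^-1

# Illustrative off-rates (s^-1) compared over a ~1 s size-exclusion elution.
for label, k_off in [("slow", 0.01), ("threshold", threshold), ("fast", 10.0)]:
    print(f"{label:>9}: {fraction_remaining(k_off, 1.0):.3f} of complex remains")
```

Under this simple model, a complex with a k_off of 10 s⁻¹ is essentially lost during elution. This is consistent with the slow-off-rate bias listed for ASMS in Table 1.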

At Pfizer we have a long history of traditional HTS and multiple examples of successful clinical candidates and marketed products (Xeljanz,⁴¹ Slentrol,⁴² Maraviroc,⁴³ PH-797804 [p38 kinase],⁴⁴ and PF-05180999 [PDE2A]⁴⁵). However, the investment required to deliver an HTS is frequently high in terms of assay development time and expertise, reagent costs, and timelines, and over the years we have run multiple HTS campaigns that had a low return on investment. A low-value HTS has typically resulted from running the screen too late in a project life cycle (i.e., when chemical lead matter is already available), running it too early (i.e., when biological understanding has yet to be fully developed), or selecting a screening technology that failed to deliver valid and viable hits for the mechanism in question. In our view, there are three key drivers in the decision to use a classical HTS approach: (1) CIR trajectory (or increasing likelihood of biology translation); (2) chemistry need, including the likelihood of finding equity; and (3) screening precedence (scalability and precedence for the technology being successful in the target class).
High CIR, such as a compelling case from human biology for causality and functionality in disease; a high chemical need for novel equity, such as an unprecedented target or lack of intellectual property space; and/or a well-precedented screening technology for valid hit ID would all support investment in large-scale HTS, and project teams often proceed with large-file HTS when at least two of these parameters are in their favor.¹⁴ It is also worth remembering that in almost all cases, some kind of high-throughput functional assay is necessary downstream of hit ID methods for orthogonal hit validation and/or to support chemistry structure–activity relationship (SAR) efforts, so the time invested into the development of assays that may be suitable for medium- to high-throughput screening is rarely wasted.
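The "at least two of three" rule of thumb above can be written down as a simple check. This is a deliberately reductive Python sketch: `ProgramProfile` and `favors_large_file_hts` are hypothetical names, and in practice each driver is a graded judgment rather than a yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class ProgramProfile:
    """Hypothetical yes/no summary of the three HTS decision drivers."""
    high_cir: bool                 # compelling human-biology case for causality in disease
    high_chemistry_need: bool      # e.g., unprecedented target or little IP space
    precedented_screen_tech: bool  # screening technology proven for this target class


def favors_large_file_hts(profile: ProgramProfile, min_favorable: int = 2) -> bool:
    """Return True when at least `min_favorable` of the three drivers are favorable."""
    favorable = sum([profile.high_cir,
                     profile.high_chemistry_need,
                     profile.precedented_screen_tech])
    return favorable >= min_favorable


# A novel target with strong human genetics but an unproven assay format.
novel_target = ProgramProfile(high_cir=True, high_chemistry_need=True,
                              precedented_screen_tech=False)
print(favors_large_file_hts(novel_target))  # True
```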

Fragment-based drug discovery (FBDD) has solidified its place in the hit ID toolbox in the past 20 years and has propelled ~30 drugs to various stages of the clinical pipeline.⁴⁶ Fragments are generally chemical groups with <20 nonhydrogen atoms, and fragment libraries are typically limited collections (<10,000 compounds) of low-molecular-weight (low-MW) compounds (150–300 Da), which, though small, explore chemical space efficiently. Fragments can bind to multiple target sites in a variety of poses with high binding energy per atom and may exhibit less steric hindrance or electrostatic repulsion in a binding site compared with drug-like molecules (MW ≈ 500 Da). HTS requires a biochemical or cell-based assay to measure an outcome, but these assays are not usually sensitive enough to detect the weak interactions between fragments and targets; because FBDD monitors direct binding to the macromolecule, no such assays are needed at the outset. Biophysical techniques with high detection sensitivity can be used to monitor these weak interactions, including surface plasmon resonance (SPR), microscale thermophoresis (MST), capillary electrophoresis, weak affinity chromatography, biolayer interferometry/ultrafiltration, native MS, and isothermal titration calorimetry, although x-ray crystallography and nuclear magnetic resonance (NMR) tend to afford the greatest throughput in an industrial setting.^47,48 Fragment screening generally offers higher hit rates, often on the order of 5%–20%,⁴⁸ compared with typically <1% for HTS; nevertheless, fragment hits are usually weak binders with fast off-rates and must be developed into higher-affinity, larger molecules to become leads, which is typically the bottleneck in the process.
Our experience is that FBDD programs are most successful when structure enabled and that applying orthogonal biophysical and biochemical assays is critical for validating hits, prioritizing them for x-ray crystallography, and building confidence for medicinal chemistry efforts.⁴⁸ As a result, the availability of a high-quality crystal structure, or a high likelihood of generating one, is a prerequisite for us to initiate an FBDD campaign.
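The "high binding energy per atom" of fragments is commonly quantified as ligand efficiency (LE). LE is the free energy of binding divided by the number of non-hydrogen atoms, with ΔG = RT ln K_D. The Python sketch below compares two hypothetical compounds whose affinities fall within the Table 1 ranges for fragment and HTS hits. The heavy-atom counts are illustrative assumptions, and LE is a widely used metric rather than one prescribed in this article.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Delta G of binding in kcal/mol (negative for favorable binding): RT ln K_D."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """Binding energy per non-hydrogen atom, -Delta G / N_heavy, in kcal/mol per atom."""
    return -binding_free_energy(kd_molar, temp_k) / heavy_atoms


# Hypothetical examples: a 12-heavy-atom fragment binding at 100 uM versus
# a 35-heavy-atom screening hit binding at 100 nM.
fragment_le = ligand_efficiency(1e-4, 12)
hit_le = ligand_efficiency(1e-7, 35)
print(f"fragment LE = {fragment_le:.2f}, HTS-style hit LE = {hit_le:.2f} kcal/mol per atom")
```

Although the fragment binds 1000-fold more weakly, it scores the higher LE (about 0.45 versus 0.27 kcal/mol per atom). This is why weak fragment hits can still be attractive starting points for structure-guided growth.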

DELs have the advantage of being orders of magnitude larger than HTS files, and a campaign of DEL selection and sequencing can be completed in only a few days to weeks.⁴⁹ Additionally, running multiple conditions is standard practice, for example, with or without known ligands, with binding complex partners, with mutated targets and selectivity targets, or with varying target protein concentrations, and this can provide unique insight into the molecular mode of action of ligands. However, the chemistries compatible with DEL builds are more restricted, since reactions are performed in water and must not significantly damage the DNA tag.⁵⁰ The largest libraries are typically formed by combining monomers over three or four cycles of chemistry, which tends to produce high-MW molecules that can be unattractive starting points for chemistry teams. In our experience, following up hits off DNA requires first confirming the identity of the true binder and then resynthesizing it, which (depending on available resources) can be an activation hurdle for chemistry engagement. In the experimental part of this article, we describe how we have built our internal DEL process to overcome some of these challenges and make our DEL platform a more appealing primary strategy for project teams.

Incorporating the factors described above, Table 1 provides a summary of many differentiators (based on both practical and fundamental scientific principles) that may serve as a useful reference to inform the choice of hit ID strategy. One final factor to bear in mind is that there is inevitably a constraint within any organization that comes with considering a portfolio of programs in a holistic manner and fitting them within the capacity of the available capital resources, flexible budget, and colleague expertise. We acknowledge that in an ideal situation, parallel approaches to hit ID will often lead to novel insights through integration of complementary data sets, an advantage that is well described by Leveridge et al.;⁵¹ however, in a world of restricted budgets and pressures to deliver clinical candidates with lean investments, we often find that we are required to limit the number of arms to our hit ID strategies and prioritize resources accordingly. With this in mind, we sought to enhance the impact of our DEL platform in order to fully exploit its unique combination of attributes.

Materials and Methods

General

DELs were prepared using a split-and-pool methodology at HitGen Ltd. (Chengdu, China) following a general strategy similar to that described in Kung et al.⁵² Bromodomain 1 of BRD4 (cat. 6x-His-tev-BRD4-1(44-170)) was purchased from XTAL Biostructures (Natick, MA). The magnetic affinity beads used were either Neutravidin SpeedBeads (GE Healthcare, Piscataway, NJ) or His-Tag DynaBeads (Thermo Fisher, Carlsbad, CA).

DEL Selection and Data Analysis

DEL affinity selection and chemoinformatic analysis were conducted essentially as described in Chen et al.⁵³ For Figures 2 and 3 below, the target contained a C-terminal Avi-tag that was biotinylated during expression, so that the purified protein carried a biotin residue. This protein was immobilized onto Neutravidin SpeedBeads, and the beads were blocked with biotin or biocytin prior to selection.

Bead-Assisted Ligand Isolation

BRD4 containing a His-Tag (0.75 µM nominal concentration) was incubated with a Cy5-conjugated analog of PFI-1⁵⁴ for 30 min in a total volume of 40 µL of assay buffer (50 mM HepesNa, 150 mM NaCl, 0.01% Tween 20, 1 mM TCEP, pH 7.4). Following this equilibration step, the samples were transferred to wells of a PCR plate that contained prewashed His-Tag DynaBeads. The beads were resuspended and allowed to incubate at room temperature for 10 min. Magnetic separation was performed with a ring magnet (Alpaqua Engineering, Beverly, MA). The supernatant was aspirated to a receiver plate, and the protein was eluted from the beads in assay buffer supplemented with EDTA (10 mM). Following transfer of the eluate to the receiver plate, the fractions were evaluated in an Infinite M1000 plate reader (Tecan Group, Männedorf, Switzerland).

Bead-Assisted Ligand Isolation–Mass Spectrometry

Experiments were performed according to the protocol above using JQ1⁵⁵ as a titrant with the following exceptions: the BRD4 protein concentration was increased to 1.0 µM, and after removal of the supernatant fraction, the beads were resuspended in high-performance liquid chromatography (HPLC)-grade water (20 µL). Bound compounds were eluted from the beads by adding an equal volume of 90% acetonitrile containing 1% formic acid. The samples were analyzed on a Sciex 6500+ triple quadrupole mass spectrometer fitted with a LeadSampler 1 autosampler (Sound Analytics, Niantic, CT) and an Agilent (Santa Clara, CA) 1290 HPLC system. As needed, the samples were diluted with 50% acetonitrile to bring them within the dynamic range of quantitation. Data were processed with LeadScape Analyst 1.7 (Sound Analytics).
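One way data from this bead-based experiment could be reduced to an active-site estimate is to compare the amount of JQ1 recovered from the beads with the nominal amount of BRD4 in the binding step. The Python sketch below uses the volumes and concentrations given in the two protocols above: a 40 µL binding volume, 1.0 µM BRD4, and 20 µL water plus an equal volume of eluent. It assumes saturating JQ1, 1:1 binding, and complete bead capture and elution. The function name and the example reading are hypothetical.

```python
def active_site_fraction(
    measured_nM: float,               # JQ1 concentration reported by LC-MS/MS
    dilution_factor: float = 1.0,     # extra dilution in 50% acetonitrile before injection
    eluate_volume_uL: float = 40.0,   # 20 uL water + 20 uL 90% acetonitrile/1% formic acid
    protein_uM: float = 1.0,          # nominal BRD4 concentration in the binding step
    binding_volume_uL: float = 40.0,  # binding-step volume from the bead-based protocol
    sites_per_protein: float = 1.0,   # assumed 1:1 ligand:protein stoichiometry
) -> float:
    """Fraction of nominal binding sites that captured ligand (1.0 = fully active)."""
    # nM x uL = fmol; uM x uL = pmol, so multiply by 1000 to express in fmol.
    bound_fmol = measured_nM * dilution_factor * eluate_volume_uL
    nominal_sites_fmol = protein_uM * binding_volume_uL * 1000.0 * sites_per_protein
    return bound_fmol / nominal_sites_fmol


# Hypothetical reading: 250 nM JQ1 measured after a twofold dilution.
print(f"{active_site_fraction(250.0, dilution_factor=2.0):.0%} of nominal sites active")
```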

Multiparameter Optimization Score Calculation

Our DEL multiparameter optimization score (MPO_score) is weighted heavily by MW. For every compound in the analysis, individual molecular properties are each assigned a score between 0 and 1 according to the descriptions below. The MPO score is calculated according to eq 1, where the sum of the cLogP, aromatic ring count, and hydrogen bond donor count scores is multiplied by the MW score to give a parameter ranging between 0 and 3. The individual and total scores are calculated as follows:

MW score (MW_score): Score of 1 for MW below 400; linear gradient between 1 and 0 for MW between 400 and 600; score of 0 for MW above 600.

Calculated LogP_score (cLogP_score): Score of 0 for cLogP below −1; linear gradient between 0 and 1 for cLogP between −1 and 1; score of 1 for cLogP between 1 and 3; linear gradient between 1 and 0 for cLogP between 3 and 5; score of 0 for cLogP above 5.

Aromatic ring count_score (ARC_score): Score of 1 for 2 or fewer aromatic rings; score of 0.5 for 3 aromatic rings; score of 0 for 4 or more aromatic rings.

Hydrogen bond donor count_score (HBD_score): Score of 1 for 2 or fewer hydrogen bond donors; score of 0.5 for 3 hydrogen bond donors; score of 0 for 4 or more hydrogen bond donors.

MPO_score calculation:

$$\mathrm{MPO_{score}} = \mathrm{MW_{score}} \times \left( \mathrm{cLogP_{score}} + \mathrm{ARC_{score}} + \mathrm{HBD_{score}} \right) \qquad (1)$$
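The scoring rules above translate directly into code. This Python sketch assumes the four descriptors (MW, cLogP, aromatic ring count, and hydrogen bond donor count) have already been computed by a cheminformatics toolkit, since the cLogP method is not specified here. The function names are illustrative.

```python
def _ramp(x: float, x0: float, x1: float, y0: float, y1: float) -> float:
    """Linear interpolation from (x0, y0) to (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def mw_score(mw: float) -> float:
    if mw <= 400:
        return 1.0
    if mw >= 600:
        return 0.0
    return _ramp(mw, 400, 600, 1.0, 0.0)


def clogp_score(clogp: float) -> float:
    if clogp <= -1 or clogp >= 5:
        return 0.0
    if clogp < 1:
        return _ramp(clogp, -1, 1, 0.0, 1.0)
    if clogp <= 3:
        return 1.0
    return _ramp(clogp, 3, 5, 1.0, 0.0)


def count_score(n: int) -> float:
    """Shared rule for aromatic rings and H-bond donors: <=2 -> 1, 3 -> 0.5, >=4 -> 0."""
    if n <= 2:
        return 1.0
    if n == 3:
        return 0.5
    return 0.0


def mpo_score(mw: float, clogp: float, aromatic_rings: int, hbd: int) -> float:
    """Eq 1: MW_score * (cLogP_score + ARC_score + HBD_score), range 0 to 3."""
    return mw_score(mw) * (clogp_score(clogp) + count_score(aromatic_rings) + count_score(hbd))


print(mpo_score(350, 2.0, 2, 1))  # 3.0
print(mpo_score(500, 4.0, 3, 1))  # 0.5 * (0.5 + 0.5 + 1.0) = 1.0
print(mpo_score(650, 2.0, 1, 1))  # 0.0: MW above 600 zeroes the score
```

Because the MW score multiplies the sum, a compound above 600 Da scores 0 regardless of its other properties. This is what "weighted heavily by MW" means in practice.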

Box 1:

Selecting Hit ID Approaches

Know the landscape
• Is the target precedented in the literature?
    • Prior success for diversity screening may be known
    • Target knowledge may support file mining and target family-focused subsets
    • If unprecedented, rapid approaches such as ASMS and DEL may provide insight into druggability
• How much chemical equity is in the literature?
    • Use literature matter to seed knowledge-based screening and ligand-based file mining
    • If equity is limited, ASMS or DEL may provide early tools
• Is the target structurally enabled, or is there a high probability of success for structure determination?
    • If yes, then FBDD is more feasible
    • If yes, then there may be an opportunity for structure-based virtual screening and file mining
• Is this a competitive space?
    • If yes, then consider higher-level investments in parallel and include some rapid-readout screens
Know the biology/mechanism of action desired
• Is a functional readout necessary, or will binding suffice?
    • ASMS and DEL are rapid options for binders
    • HTS for functional or phenotypic approaches
• Is there significant value in screening multiple conditions simultaneously?
    • Alternate protein conformations, active site mutations, varying states of activation, pH variations, binding complex partners, selectivity or competition with known binders, and so forth, can be explored by DEL (and to some extent by ASMS and FBDD)
• Is kinetic bias a concern?
    • Weak fragment binders typically have fast off-rates
    • ASMS is limited to slower off-rates; this likely applies to DEL as well
Know your target form
• Is it a purified protein or a cell-based system?
    • ASMS and DEL are rapid options for binders
• Is it a soluble protein, an integral membrane protein, or a protein complex?
    • There is less experience with ASMS/DEL for membrane proteins and complexes, but some examples have been successful^71–73
• How much stable, highly purified protein can be generated?
    • Typical protein requirements: DEL < ASMS < FBDD < HTS
• Is the protein suitable for both biophysical and biochemical assays?
    • Biophysical approaches will particularly complement binding screens (ASMS/DEL/FBDD)
    • Orthogonal approaches are valuable for downstream validation

Results and Discussion

Implementing a DEL Platform for Hit ID: Starting Point and DEL Design

DELs provide an exciting opportunity to build diverse chemical collections unbiased by an organization's previous therapeutic areas of focus and promise to enable the search for small-molecule ligands to first-in-class targets of academic and industrial importance. The concept of encoding small molecules was first introduced in 1992 by Sydney Brenner and Richard Lerner⁵⁶ and was subsequently reduced to practical routines that include the DNA encoding of multistep library synthesis and hit ID via affinity selection paired with high-throughput sequencing.^57,58 Early demonstrations garnered much attention, for example, the identification of small-molecule ligands to tumor necrosis factor α,⁵⁹ and in the same year, Praecis Pharmaceuticals reported their encoded library technology (ELT) industrialization of split-and-pool synthesis, along with the discovery of various Aurora A and p38 MAP kinase inhibitors from an 800 million-member DEL.⁴⁹ A summary of the subsequent evolution of ELT at GlaxoSmithKline (GSK), which acquired Praecis, can be found in Arico-Muendel.²⁹ Following the acquisition of Praecis by GSK, there was an explosion of ELT-based biotech companies (X-Chem, HitGen, Vipergen, Ensemble Therapeutics, and Philochem) and a rapid uptake of DEL by other pharmaceutical companies (e.g., Roche, Novartis, Lilly, AstraZeneca, and Pfizer). A description of different approaches to compound selection and detection is provided in Chan et al.,⁶⁰ and Salamon et al.⁶¹ provide a good overview with multiple examples of chemical tools successfully identified from DELs.

Over the past decade, Pfizer has strategically aligned with partners to explore the promise of ELT: Ensemble Discovery,⁶² X-Chem,⁶³ and HitGen.⁶⁴ Through the most recent partnership with HitGen, we leveraged our organizational expertise in both parallel medicinal chemistry and external collaborations to establish and apply DNA-compatible chemistry using our corporate building block collection to deliver DELs containing designed warheads that possess lead-like properties. Coupling this advantaged starting position with the speed of DEL hit ID campaigns is an attractive approach to accelerate our delivery of clinical assets. In addition to the benefit of speed, the ability to employ multiple selection conditions in parallel has been shown to provide qualitative assessments of selectivity⁶⁵ and affinity⁶⁶ through the inclusion of an antitarget or varying protein concentrations, generating a rich biological profile for each compound in the screen. This profile enables the prioritization for follow-up of compounds that possess the desired combination of biological attributes. However, while there is often a clear path to nominating excellent in vitro tools, their optimization into viable drug leads remains a formidable hurdle. Embracing this challenge, we have made molecular property optimization of DEL products the cornerstone of our DEL design philosophy; while not a new concept, its implementation has received little attention in the literature.
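As an illustration of how a target/antitarget comparison might be summarized, the Python sketch below computes a simple observed-over-expected enrichment per library member from sequencing counts. It adds a pseudocount for unobserved members and takes the target-to-antitarget ratio as a qualitative selectivity flag. This is a generic scheme with hypothetical counts, member names, and library size. It is not the analysis of Chen et al. used in this work, and production DEL analyses typically add statistical models of sequencing noise.

```python
def enrichment(counts: dict[str, int], members: list[str],
               total_reads: int, library_size: int,
               pseudocount: float = 1.0) -> dict[str, float]:
    """Observed/expected reads per member; 'expected' assumes uniform sampling of the library."""
    expected = total_reads / library_size
    return {m: (counts.get(m, 0) + pseudocount) / (expected + pseudocount) for m in members}


# Hypothetical decoded read counts from two selection conditions.
target_counts = {"BB12-BB305-BB7": 120, "BB3-BB41-BB88": 40, "BB9-BB2-BB150": 2}
antitarget_counts = {"BB12-BB305-BB7": 3, "BB3-BB41-BB88": 38}

members = sorted(set(target_counts) | set(antitarget_counts))
lib_size, reads = 1_000_000_000, 2_000_000
on_target = enrichment(target_counts, members, reads, lib_size)
off_target = enrichment(antitarget_counts, members, reads, lib_size)

for m in members:
    ratio = on_target[m] / off_target[m]
    print(f"{m}: target {on_target[m]:.1f}, antitarget {off_target[m]:.1f}, ratio {ratio:.1f}")
```

A member that enriches strongly against both the target and the antitarget (ratio near 1, as for the second hypothetical member) would be deprioritized if selectivity is a goal. The same comparison made across protein concentrations gives the kind of qualitative affinity ranking described above.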

At first glance, it is an attractive idea to raid the corporate building block collection to create large combinatorial crosses for synthetic designs that employ three or four cycles of chemistry. Such approaches have been described to produce DEL collections of unprecedented size, with the largest report describing 40 trillion discrete library members.⁶⁷ While the scale is staggering, our assessment of designs that employed four cycles of chemistry indicated that much of the designed matter possessed an MW far above what is considered ideal for drug-likeness. Additionally, our early project team experience found that truncations of high-MW DEL hits lost activity and yielded poor SAR. We have developed new (to DEL) chemistries that feature carbon–carbon bond-forming transformations, targeting more drug-like space with a higher sp3 fraction.^68–70 Building on these chemistries, our efforts have focused on library builds using chemistries that improve drug-like properties and that limit the warhead MW by incorporating only two or three building blocks. Even so, library designs can easily exceed a billion compounds, a count that makes full enumeration computationally burdensome. We have described a pragmatic solution to this problem,⁷⁴ in which properties are determined for representative sublibraries and these subsets are used to guide design. Even with these approaches, the complete combinatorial cross created through split-and-pool synthesis requires further diligence to ensure that the finalized library faithfully represents the property space of the original design.
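The arithmetic behind these trade-offs is simple. A full split-and-pool cross contains the product of the building-block counts at each cycle, while product MW grows roughly additively with each building block. The Python sketch below assumes 1000 building blocks per cycle, an average building-block MW of 170 Da, and loss of one water (18.015 Da) per coupling, as in amide bond formation. All three numbers are illustrative assumptions. The DNA linker is ignored because hits are resynthesized off DNA.

```python
from math import prod

WATER_DA = 18.015  # mass lost per condensation coupling (illustrative assumption)


def library_size(bb_counts: list[int]) -> int:
    """Members in a full split-and-pool cross: product of building-block counts per cycle."""
    return prod(bb_counts)


def approx_product_mw(bb_mws: list[float], loss_per_coupling: float = WATER_DA) -> float:
    """Additive MW estimate: sum of building-block MWs minus mass lost at each bond formed."""
    return sum(bb_mws) - loss_per_coupling * (len(bb_mws) - 1)


for n_cycles in (2, 3, 4):
    size = library_size([1000] * n_cycles)
    mw = approx_product_mw([170.0] * n_cycles)
    print(f"{n_cycles} cycles: {size:.0e} members, ~{mw:.0f} Da")
```

Under these assumptions, three cycles already reach a billion members at ~474 Da. A fourth cycle pushes the typical product past 600 Da, where the MW component of the MPO score described in Materials and Methods falls to 0.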

Reagent Validation

Critical to the success of any hit ID platform is the quality of the reagents used in the primary screening assay, and in DEL discovery the validation of these materials is no exception. It must be appreciated, however, that the effort expended to characterize protein reagents comes at the expense of speed, one of the most desirable characteristics of DEL. While an application has been described that leverages the technology as an unbiased approach to prioritize targets for ligandability,⁷⁵ this philosophy does not extend well to targets with a high CIR for their modulation. In these instances, the probability of clinical success takes priority, and our success as a hit ID platform is measured by our ability to deliver tractable starting points to these programs, regardless of their apparent doability.

In contrast to HTS, where the usefulness of a reagent can be greatly extended by designing an assay (e.g., with a highly specific ligand or substrate) that tolerates a poor folded fraction or even crude tissue extracts, affinity selection is driven by the concentration of the protein under study and thus places an absolute requirement on the material’s folded fraction. In our experience with enzymatic and receptor targets, a cavalier approach with limited validation risks generating unpredictive results: at best, few actionable ELT signals and, at worst, misleading signals arising from the affinity selection of library components by misfolded protein. In the former case, the ligandability of the target has not been adequately investigated; in the latter, considerable resources are expended in pursuit of false-positive signals.

With this in mind, we typically focus our reagent validation efforts on quantitatively assessing the active site content of our protein reagents, both before and after immobilization on affinity resin, as this metric takes precedence over any changes in apparent rate or signal size in a biochemical assay that may occur when enzymes are immobilized. Such changes may result from, for example, a loss of translational freedom in three-dimensional space or an incompatibility of detection reagents with the salmon sperm DNA or salt concentrations used in the selection buffer. One such experiment with BRD4 is shown in Figure 1A, where a considerable reduction in the fluorescence polarization (FP) signal window was observed for the protein after immobilization. While the pharmacology appeared to be reproduced with the immobilized protein, we could not confidently conclude that its functionality was preserved.

Figure 1.

Overcoming ambiguity in target validation with BALI and active site titration. (A) Assay interferences from magnetic beads or immobilization effects lead to a reduced signal window for this polarization-based assay. While both data sets provide similar IC₅₀ values for the test compound (i.e., apparent pharmacology is preserved), it is unknown whether the reduction in signal window results from a technical artifact (e.g., fluorescence attenuation) or from an effect of immobilization on the protein. (B) Diagram of our BALI assay platform, which is analogous to the affinity selection process used in DEL screening. Here, protein and small-molecule binding reactions are allowed to equilibrate, after which the protein and bound ligand molecules are captured with magnetic affinity beads and separated from the unbound fraction. Finally, the bound ligand is eluted, and both fractions (bound and unbound) are analyzed with an appropriate detector. (C) Representative data from a BALI fluorescence experiment in which a fluorescent tracer was titrated in parallel binding experiments at a constant input receptor concentration of 750 nM. Clear inflection points were observed in the supernatant and eluate fractions and indicate the stoichiometric titration of binding sites within the sample. (D) Data for an experiment analogous to C, this time with an unlabeled ligand and a mass spectrometer as the detector. Following the BALI experiment, ligand concentrations in the bound and unbound fractions were quantified by LC-MS and revealed results analogous to those of the fluorescence assay.

To resolve this, we developed an affinity selection platform that we refer to as bead-assisted ligand isolation (BALI), which, when appropriately configured, allows the active site content of an immobilized receptor sample to be quantified by titration. The design of the experiment, outlined in Figure 1B, is analogous to the DEL selection approach, whereby captured ligand is detected after separation from the bulk solution using an affinity matrix. For low-affinity ligands, this method provides a qualitative assessment of ligand capture, but when the receptor concentration exceeds the ligand K_D by a factor of approximately 10 or more, binding becomes stoichiometric and can be leveraged to titrate the active sites in the sample quantitatively.
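The stoichiometric regime described above follows from the standard 1:1 binding equilibrium. The sketch below assumes simple single-site binding and uses hypothetical concentrations; it solves the binding quadratic exactly so that ligand depletion is captured. With 500 nM of active sites and a 10 nM K_D, nearly every added ligand molecule is captured until the sites approach saturation, producing a sharp break in the eluate trace, whereas a 5000 nM K_D gives only weak, graded capture.

```python
import math


def bound_ligand(receptor_total: float, ligand_total: float, kd: float) -> float:
    """Receptor-ligand complex at equilibrium for simple 1:1 binding.

    Exact root of the binding quadratic (accounts for ligand depletion).
    All arguments share one concentration unit, e.g. nM.
    """
    b = receptor_total + ligand_total + kd
    return (b - math.sqrt(b * b - 4.0 * receptor_total * ligand_total)) / 2.0


if __name__ == "__main__":
    sites = 500.0  # hypothetical active-site concentration, nM
    for kd in (10.0, 5000.0):  # tight (sites/K_D = 50) vs weak (sites/K_D = 0.1)
        for dose in (100.0, 250.0, 500.0, 750.0, 1000.0):
            eluate = bound_ligand(sites, dose, kd)
            print(f"K_D={kd:>6.0f} nM dose={dose:>6.0f} nM "
                  f"bound={eluate:6.1f} unbound={dose - eluate:6.1f}")
```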

We conducted the BALI experiment with BRD4 using the same fluorescent tracer as in the FP experiments, such that the concentration of captured tracer was determined by measuring fluorescence intensity in a plate reader after elution from the beads. The results are shown in Figure 1C as the fluorescence intensities of the supernatant (unbound) and eluate (bound) fractions of the binding reaction. A clear accumulation of tracer in the BRD4 eluate fraction was observed up to 500 nM, above which further increases in tracer dose did not increase retention of the compound in the eluate. Correspondingly, tracer was absent from the supernatant fraction at substoichiometric doses until the dose approached the equivalence point, after which further increases in tracer led to linear accumulation in this unbound fraction. Taken together, these data indicate a 500 nM equivalence of competent binding sites. This is in good agreement with the nominal value of 750 nM, given that the nominal value was estimated by the reagent provider using the Bradford assay, a method that can carry considerable uncertainty when the test sample and reference standard do not have closely matched structures or amino acid compositions.
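The text does not state how the equivalence point was read from the titration traces; one generic way to formalize it is a two-segment linear fit that locates the dose at which the pre- and post-equivalence lines intersect. The sketch below uses idealized, noise-free data; the dose series and function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def _sse(xs, ys, slope, intercept):
    """Sum of squared residuals for one fitted segment."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def equivalence_point(doses, signal, min_points=2):
    """Dose at which two fitted straight lines (pre- and post-break) intersect.

    Tries every split of the (sorted) dose series and keeps the split with the
    lowest total squared error; works for eluate (rise then plateau) and
    supernatant (flat then rise) traces alike.
    """
    best = None
    for k in range(min_points, len(doses) - min_points + 1):
        m1, c1 = linear_fit(doses[:k], signal[:k])
        m2, c2 = linear_fit(doses[k:], signal[k:])
        err = _sse(doses[:k], signal[:k], m1, c1) + _sse(doses[k:], signal[k:], m2, c2)
        if m1 != m2 and (best is None or err < best[0]):
            best = (err, m1, c1, m2, c2)
    if best is None:
        raise ValueError("no split with distinct slopes found")
    _, m1, c1, m2, c2 = best
    return (c2 - c1) / (m1 - m2)


if __name__ == "__main__":
    doses = [0, 100, 200, 300, 400, 500, 600, 700, 800, 1000]  # hypothetical, nM
    eluate = [min(d, 500) for d in doses]  # idealized: 500 nM of active sites
    print(equivalence_point(doses, eluate))  # 500.0
```

Real traces carry noise and curvature near the break, so the fitted intersection is an estimate whose precision depends on how many doses fall on each side of it.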

We immediately recognized the elegance of this experimental design: the workflow was straightforward to execute in a microplate format without specialized liquid handling or washing procedures, and it eliminated weeks of laboratory work from the target validation process. We were keen to extend the method to other targets but were limited by the need for a fluorescent probe for each target, the design of which requires considerable a priori knowledge for fluorophore attachment, that is, known high-affinity ligands and a structural or SAR understanding of exit vectors. To address this, we sought to determine whether MS could serve as a detection approach to characterize the capture of unlabeled ligands and thereby eliminate the need to attach a fluorophore. We were delighted to repeat the experiment with excellent fidelity when we titrated BRD4 with unlabeled ligand and detected it via MS (Fig. 1D). Both the supernatant and eluate traces inflected at ~700 nM, each indicating the concentration of active sites in the sample, in good agreement with the previous fluorometric result once corrected for the modest increase in nominal BRD4 concentration in this follow-up study (1000 nM, up from 750 nM). We were also pleased with the additional depth of insight gained from MS detection. During MS analysis, the instrument was calibrated with an external standard curve such that the returned data were in units of analyte concentration. From these data, we determined that the asymptote of the eluate curve lies at 700 nM analyte, which further corroborated our biochemical findings.
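The conversion of MS responses to analyte concentrations via an external standard curve can be sketched as an ordinary least-squares line that is then inverted. This assumes a linear detector response over the calibrated range; the standard concentrations, peak areas, and function names below are hypothetical, and a production LC-MS method may instead use weighted regression or isotope-labeled internal standards.

```python
def fit_standard_curve(conc_nM, peak_area):
    """Least-squares line peak_area = slope * conc + intercept from external standards."""
    n = len(conc_nM)
    mx = sum(conc_nM) / n
    my = sum(peak_area) / n
    sxx = sum((x - mx) ** 2 for x in conc_nM)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_nM, peak_area))
    slope = sxy / sxx
    return slope, my - slope * mx


def area_to_conc(area, slope, intercept):
    """Invert the calibration line; only valid inside the calibrated range."""
    return (area - intercept) / slope


if __name__ == "__main__":
    standards = [0.0, 250.0, 500.0, 1000.0]      # hypothetical standards, nM
    areas = [10.0, 510.0, 1010.0, 2010.0]        # hypothetical peak areas
    slope, intercept = fit_standard_curve(standards, areas)
    print(area_to_conc(1410.0, slope, intercept))  # 700.0
```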

This MS detection-based technique has since become the foundation of our target validation procedure, as it obviates both assay development and the uncertainty arising from apparent fluctuations in activity. For targets that lack chemical matter of sufficient affinity to enable these studies, qualitative assessments with weak-affinity ligands (K_D up to 10 µM) have been sufficient to demonstrate that immobilized proteins can enrich such ligands and thus indicate a high probability of a successful screen. For targets without small-molecule ligands, similar equivalency studies for the capture of protein or nucleic acid binding partners have also sufficed to indicate a reagent’s suitability for progression to a DEL selection campaign.

On the Interexperiment Reproducibility of DEL Selection

When transferring plate-based assays between laboratories, it is common practice to repeat, at the new site, an experiment previously conducted at the site of origin to demonstrate equivalence of results between the two locations. We used a similar approach for the transfer of DEL selection capabilities from our partner’s lab in China to Pfizer’s U.S. lab (Groton, CT), choosing to repeat a screen whose results had translated to confirmed off-DNA ligands. Intuitively, one would expect the straightforward binding assay that underlies DEL selection to reproduce with a reasonable degree of fidelity, and this has been described, in a limited capacity, for a panel of intraexperimental replicates.⁷⁶ However, that report did not describe any data pertaining to a confirmed hit, and in the absence of a cubic plot showing the relative change of the experimental fingerprint, we could not extract a meaningful understanding of repeatability from it. As such, we had no benchmark for the expected experimental reproducibility and, for lack of a better metric, used the rediscovery of the line feature that provided the confirmed off-DNA hit as the indicator of success (Fig. 2A). Here, the feature that led to the confirmed off-DNA series is colored green to distinguish it from the other signals in the data set, which are blue. Two months after the HitGen team collected this initial data set, we repeated the selection campaign in China under the supervision of HitGen scientists. This gave rise to the cubic plot shown in Figure 2B, which is colored in the same manner. We observed excellent concordance between the experiments and successfully rediscovered the hit series.
Scrutinizing this line feature, we observe a remarkably similar pattern of signal diversity, which indicates that the relative signal intensities of individual compounds along the line feature, and thus their apparent activity, are preserved. This aspect is better assessed in the scatterplot comparing the two experiments (Fig. 2C), which is colored consistently with the cubic plots: data from the line feature are green against a background of other data in blue. We see excellent linear correlation for the hit compounds (green), demonstrating that they are well rank ordered, whereas the background data show a much less correlated scatter (blue). While it is tempting to postulate that this variation in signal intensity correlates with binding affinity, such that one could develop SAR for the chemical series, our experience has shown that many underlying factors contribute to a compound’s signal in DEL selection (e.g., its propensity to react productively as planned or to form side products during library production), and such interpretations tend to generate unproductive hypotheses.
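The contrast between the tightly correlated hit feature and the diffuse background in Figure 2C can be quantified with a correlation coefficient computed separately for each compound subset. The sketch below uses Pearson’s r on log-transformed read counts, a common but here assumed choice; the read-count dictionaries and function names are hypothetical. Consistent with the caution above, a high r indicates reproducible rank order, not a measure of affinity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_count_correlation(run_a, run_b, compound_ids):
    """Pearson r of log10(reads + 1) between two runs over chosen compound IDs.

    run_a, run_b: dicts mapping compound ID -> read count (missing = 0 reads).
    """
    xs = [math.log10(run_a.get(c, 0) + 1) for c in compound_ids]
    ys = [math.log10(run_b.get(c, 0) + 1) for c in compound_ids]
    return pearson(xs, ys)


if __name__ == "__main__":
    # Hypothetical counts: run B was sequenced roughly twice as deeply as run A.
    run_a = {"h1": 100, "h2": 50, "h3": 20, "h4": 10, "b1": 1, "b2": 3, "b3": 1, "b4": 2}
    run_b = {"h1": 200, "h2": 100, "h3": 40, "h4": 20, "b1": 3, "b2": 1, "b3": 2, "b4": 1}
    print(f"hit feature r = {log_count_correlation(run_a, run_b, ['h1', 'h2', 'h3', 'h4']):.3f}")
    print(f"background  r = {log_count_correlation(run_a, run_b, ['b1', 'b2', 'b3', 'b4']):.3f}")
```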

Figure 2.

Assessing the interexperimental reproducibility of DEL selection results. To assess and transfer DEL selection capabilities between sites, we repeated a DEL selection that had produced a strong line feature, from which small molecules were confirmed off DNA as ligands of the target. In A, B, and D, the cubic plot for each DEL selection is shown, with the x, y, and z coordinates representing the identity of the building block used in each bond-forming reaction of this three-cycle DEL. Data point size is proportional to the number of degenerate sequencing reads for each compound. Data points not on the line relating to the confirmed hit are blue, while those belonging to the feature of the confirmed hit are green. (A) Cubic plot for the initial DEL selection campaign, conducted by HitGen scientists in China, which led to the discovery of off-DNA ligands for the target arising from the signal of the green line feature. (B) Cubic plot of the results generated when a Pfizer scientist repeated the DEL selection in China under the supervision of HitGen scientists. (C) Scatterplot comparing the signals from A and B, following the same color scheme. (D) Cubic plot of results generated at Pfizer Labs in Groton. (E) Scatterplot of the results from D versus those of the initial DEL screen (A). The plot reveals a linear relationship for the hit line feature (green) that is offset from unity by ~1.8-fold, a result consistent with the greater sequencing depth of the follow-up experiment. (F) Scatterplot of results from D versus those of the intermediate screen in B. A correlation of signal for the hit feature (green) is apparent. As in E, the data are offset from unity toward the newer screen because of the greater sequencing depth of that experiment.

Following this successful replication by Pfizer scientists, we repeated the same selection experiment a third time in the United States, 8 weeks after returning from China. The cubic plot for this Groton experiment is shown in Figure 2D, again demonstrating robust enrichment of our hit feature (green line). Pleasingly, comparison of this run with the previous iterations from Chengdu revealed an overall recapitulation of the experiment with well-preserved correlation for the confirmed hit feature, albeit with deviation from the unity line due to increased signal in the Groton experiment (Fig. 2E,F). This global effect on the data was anticipated from the sequencing depth, defined here as the ratio of NGS reads assigned to a sample to the total size of the selection eluate population used for NGS sample preparation. Further analysis of these samples, in the form of NGS vendor evaluations in which the same samples were sequenced repeatedly at different sites, demonstrated the robustness of the NGS process: when sequencing depth exceeded 1, deviations in total compound reads were well distributed throughout the sample population, and these differences were completely resolved after removal of PCR duplicates using the unique molecular identifier contained within the DNA barcode (data not shown). Because we had previously found it challenging to confirm hit proposals from DEL selection by progressing directly to off-DNA synthesis and retesting, we were pleased by the quality and reproducibility of the DEL selection methodology revealed by these data, and this bolstered our enthusiasm for the technology.
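The sequencing depth definition and the UMI-based removal of PCR duplicates described above can be sketched as follows. The (barcode, UMI) pair format and the function names are hypothetical; real pipelines first parse both elements from the raw sequencing reads, and some also merge UMIs that differ by sequencing errors, which this sketch does not attempt.

```python
from collections import Counter, defaultdict


def sequencing_depth(reads_assigned, eluate_copies_sequenced):
    """NGS reads assigned to a sample divided by the eluate copies taken into NGS prep."""
    return reads_assigned / eluate_copies_sequenced


def umi_collapsed_counts(reads):
    """Count distinct UMIs per compound barcode, collapsing PCR duplicates.

    reads: iterable of (compound_barcode, umi) pairs (hypothetical pre-parsed format).
    """
    umis = defaultdict(set)
    for barcode, umi in reads:
        umis[barcode].add(umi)
    return {barcode: len(u) for barcode, u in umis.items()}


if __name__ == "__main__":
    # Hypothetical reads: compound A was amplified unevenly during PCR.
    reads = [("A", "u1"), ("A", "u1"), ("A", "u1"), ("A", "u2"), ("B", "u3")]
    print(dict(Counter(bc for bc, _ in reads)))  # raw: {'A': 4, 'B': 1}
    print(umi_collapsed_counts(reads))           # collapsed: {'A': 2, 'B': 1}
    print(sequencing_depth(5.0e8, 2.5e8))        # 2.0
```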

On Intraexperimental Variability and the Comparison of Between-Sample DEL Selection Variability

In the most common form of industrialized ELT, a collection of DELs frequently totals many billions of compounds. Others have demonstrated that the productivity of DEL selection can be enhanced by combining multiple DELs of different origins into a single pool, so that affinity selection of putative ligands occurs from one sample of the pooled libraries. Rather than a single affinity selection step, these hit discovery campaigns consist of an initial selection followed by iterative reselections that use the previous cycle’s output as input for the next; the campaign is terminated when the total compound count of the selection eluate (determined by PCR-based quantitation of DNA) is small enough to allow adequate sampling by NGS.⁷⁶ We sought to investigate the between-sample, within-experiment reproducibility of DEL selection. This study was built into our initial DEL selection experiment in Groton, where we conducted three parallel selections using the same treatment (i.e., DEL pool dose, sample volume, and target dose).

The three selection campaigns were each composed of an initial round of affinity selection (using an input of 1.5 × 10¹⁵ total copies of DNA-encoded molecules), followed by a single cycle of reselection. The selection outputs ranged from 2.0 to 3.7 × 10⁹ total copies per sample. A quarter of each output sample was processed for sequencing, and the NGS run was allocated to each sample proportionally based on the total population ratios of the campaign eluates.
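Proportional allocation of an NGS run, as described above, gives every sample the same number of reads per eluate copy. A minimal sketch follows, using the largest-remainder method (our choice for illustration, not necessarily the authors’) to keep integer read counts summing to the run total; the function name, the run size, and the middle eluate size (2.8 × 10⁹) are hypothetical, while 2.0 and 3.7 × 10⁹ are the reported range.

```python
def allocate_reads(eluate_copies, total_reads):
    """Split an NGS run across samples in proportion to eluate population size.

    eluate_copies: dict sample -> total copies in the selection eluate.
    Integer reads are assigned by the largest-remainder method so the
    allocations sum exactly to total_reads.
    """
    total = sum(eluate_copies.values())
    raw = {s: total_reads * c / total for s, c in eluate_copies.items()}
    alloc = {s: int(r) for s, r in raw.items()}
    leftover = total_reads - sum(alloc.values())
    # Hand any leftover reads to the samples with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


if __name__ == "__main__":
    copies = {"sample1": 2.0e9, "sample2": 2.8e9, "sample3": 3.7e9}  # sample2 hypothetical
    print(allocate_reads(copies, 400_000_000))  # hypothetical run size
```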

Given the interexperimental concordance we had observed in the studies described above, we expected the affinity selections to be reproducible between samples. Hence, we were intrigued by the nearly twofold variation in selection eluate population size between samples within this experiment.

The relative proportioning of the samples for NGS was intended to balance the sequencing runs and generate approximately equal read counts for hit compounds, as rationalized in Figure 3A, where each selection eluate is represented as a circle scaled to the total population size of the sample. In this model, we assume that affinity selection yields an approximately equal quantity of hit molecules in each sample, defined by the “true hits” inner circle, with varying amounts of background signal. The quarter portion processed for analysis is indicated by the darker shaded region. We consider the primary variable between samples to be the efficiency of washing during each round of selection, which determines the degree of nonbinding compound carryover; this nonbinding fraction constitutes the background signal. If this is correct, then the magnitude of true hit read signals should be approximately equal across the three replicates. The outcome of this experiment for the individual DEL described in Figure 2 is summarized in the scatterplots shown in Figure 3B-D. Here, the dashed orange line represents unity, and pleasingly, we saw strong agreement between the replicates when they were sequenced according to this strategy. Taken together, these data strongly support this model and indicate that the magnitude of DEL signal depends on the overall execution of the experiment (e.g., washing efficiency and the allocation of sequencing reads). There has been considerable focus in the field on normalizing DEL signals to enable their comparison, and while such normalization is useful for within-sample comparisons of N-synthon pharmacophores,^76–78 our observation indicates that any enrichment calculation that normalizes to the total population size of a sample will transfer the variability of this background into the hit compound signals.
Without a factor to correct for the transfer of this variability to the most interesting signals in the sample (i.e., hit signals), the effect will obfuscate any effort to quantify biological effects of differing treatments. Thus, a more involved normalization strategy will be required to correct for background as well as relative enrichment between samples.
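The argument above can be made concrete with an idealized, noise-free version of the Figure 3A model: each replicate captures the same number of copies of a true hit, and only the background differs. Under proportional sequencing, the hit’s raw reads are identical across replicates, whereas normalizing to each sample’s total population introduces a spread that reflects the background alone. All numbers and function names below are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def hit_signals(hit_copies, eluate_totals, reads_per_copy):
    """Idealized Figure 3A model (hypothetical numbers, no sampling noise).

    Every replicate captures the same number of copies of a true hit; only the
    background (and hence the eluate total) differs. Returns the hit's raw read
    counts and its counts normalized to each sample's total population.
    """
    raw = [hit_copies * reads_per_copy for _ in eluate_totals]
    normalized = [hit_copies / total for total in eluate_totals]
    return raw, normalized


if __name__ == "__main__":
    raw, norm = hit_signals(1.0e5, [2.0e9, 2.8e9, 3.7e9], reads_per_copy=0.05)
    print(f"CV of raw hit reads:        {coefficient_of_variation(raw):.3f}")
    print(f"CV of normalized hit signal: {coefficient_of_variation(norm):.3f}")
```

In this toy model the normalized signal varies by roughly 25% between replicates even though the binding event is identical, which is the variability that a background-aware normalization would need to remove.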

Figure 3.

Intraexperimental repeats demonstrate a high degree of repeatability. Within the experiment that provided the data in Figure 2D, two additional intraexperimental repeat samples were included to probe the within-experiment, between-sample variability of DEL selection. (A) The sequencing strategy for this experiment is diagrammed as a series of colored circles, with the diameter representing the total population of the selection eluate. If DEL selection is a robust binding experiment with high reproducibility, then each of these samples will enrich the same population of true hits (inner circle, orange, representing compound identity and copy count), and the differences in total population size at the end of the experiment arise from the magnitude of the background signal. For NGS library preparation, one quarter of the selection eluate from each sample was processed and sequenced. (B) Scatterplot of sample 2 versus sample 1 signal. (C) Scatterplot of sample 3 versus sample 1 signal. (D) Scatterplot of sample 3 versus sample 2 signal. In B to D, a strong correlation exists between the samples, as evidenced by their proximity to the line of unity (y = x). (E) Venn diagram of the overlap of individual trisynthon compounds observed in the three experiments. In total, 258,204 compounds were common to all three replicates, including 436/436 (100%) of the compounds found within the hit feature described in Figure 2.

Finally, the frequency with which compounds from this individual DEL appeared across the three samples is shown in Figure 3E. A total of 258,204 compounds were common to all three samples, and, excitingly, this number includes all 436 compounds observed in the data set out of the 440 possible compounds in the hit line feature. Conversely, very large portions of each individual sample were observed in only one replicate. This indicates a high reproducibility of DEL selection for true hits, and for efforts seeking to mine the lower discovery threshold of the technique, the inclusion of experimental replicates may considerably bolster confidence in weak signals.
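The replicate overlap summarized in Figure 3E amounts to simple set operations over the compound identities observed in each sample. The sketch below, with hypothetical compound IDs and function names, counts how many compounds appear in exactly k replicates and how many members of a feature are present in all of them.

```python
from collections import Counter


def occurrence_histogram(replicates):
    """Map k -> number of compounds observed in exactly k of the replicates."""
    seen = Counter()
    for rep in replicates:
        seen.update(set(rep))
    return Counter(seen.values())


def feature_recovery(replicates, feature):
    """(members of `feature` seen in every replicate, size of `feature`)."""
    common = set.intersection(*(set(r) for r in replicates))
    return len(common & set(feature)), len(set(feature))


if __name__ == "__main__":
    reps = [{"c1", "c2", "c3"}, {"c1", "c2", "c4"}, {"c1", "c2", "c5"}]
    print(sorted(occurrence_histogram(reps).items()))  # [(1, 3), (3, 2)]
    print(feature_recovery(reps, {"c1", "c2"}))         # (2, 2)
```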

Incorporation of Drug-Likeness into the Hit-Calling Process

In the context of small-molecule drug discovery, many physicochemical properties of a compound, aside from its inherent affinity for the target, determine its credentials as a good starting point for the development of a clinical candidate. Attributes such as MW, lipophilicity, aromatic ring count and sp3 fraction, polar surface area, permeability, and clearance are all critical determinants of the overall disposition of a chemotype. The speed at which hits can be translated into viable assets that adequately test therapeutic hypotheses in animal models of disease depends strongly on the quality of this starting point. Consequently, the decision to begin optimizing an individual series is multifaceted, and affinity alone cannot accurately capture the value of each series found within a hit discovery campaign.

While screening DELs and proposing hits from the resulting data using signal amplitude alone as a metric of value had proven straightforward, the hits prioritized in this manner in our initial DEL projects often had physicochemical properties that disadvantaged them relative to hits arising from other hit ID technologies that sourced chemical matter from our historical chemical collection. To improve this and enable data-driven decision-making, we have integrated multiparameter optimization (MPO) scoring, a concept often used to guide the medicinal chemistry optimization of leads (thoroughly described in Wager et al.⁷⁹). In applying this to DEL, we use an MPO score heavily weighted by MW that also includes three additional physicochemical properties (calculated partition coefficient [cLogP], aromatic ring count, and hydrogen bond donor count) as a visualization and sorting tool to highlight compounds with the best balance of signal and favorable properties. This allows us to select the chemical series and individual compounds with the best balance of drug-like properties for hit confirmation activities.
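An MPO score of the kind described above can be sketched as a weighted sum of desirability functions, each scoring one property from 1 (favorable) to 0 (unfavorable). The thresholds, the piecewise linear desirability shape, the 2:1 weighting of MW, and the names below are hypothetical placeholders; they are neither the authors’ actual parameters nor the published function of Wager et al.,⁷⁹ which uses a different property set.

```python
# Hypothetical (good, bad, weight) per property; MW weighted most heavily.
CRITERIA = {
    "mw": (350.0, 500.0, 2.0),
    "clogp": (3.0, 5.0, 1.0),
    "arom_rings": (2.0, 4.0, 1.0),
    "hbd": (1.0, 3.0, 1.0),
}


def desirability(value, good, bad):
    """1.0 at or below `good`, 0.0 at or above `bad`, linear in between."""
    if value <= good:
        return 1.0
    if value >= bad:
        return 0.0
    return (bad - value) / (bad - good)


def mpo_score(props):
    """Weighted sum of desirabilities; higher means a better property balance (max 5.0 here)."""
    return sum(w * desirability(props[name], good, bad)
               for name, (good, bad, w) in CRITERIA.items())


if __name__ == "__main__":
    print(mpo_score({"mw": 425.0, "clogp": 2.1, "arom_rings": 2, "hbd": 1}))  # 4.0
```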

An example of how this is applied is shown in Figure 4. Here, a line feature is shown for hits arising from a selection campaign with bromodomain 1 of the BRD4 protein. An MPO score is generated for the designed compound represented by each DNA tag to provide a drug-likeness metric, where increasingly positive values correspond to better property values. In this analysis, the size of each data point in the cubic plot indicates the relative signal amplitude for each compound, and a typical stoplight color scheme from low (red) to high (green) visualizes the MPO score. Taking in the entire feature (Fig. 4A), it is obvious that the largest signals correspond to compounds with orange and red MPO scores, which represent poor combinations of drug-like properties. For demonstrative purposes, we applied a filter based on this MPO score, resulting in the reduced feature in Figure 4B. Here, while the overall signal amplitude is reduced, the hits with good drug-likeness can be quickly identified. We find that incorporating molecular properties into DEL data analysis greatly enhances our ability to deliver chemical matter that our teams receive with greater enthusiasm and are more likely to take up into their projects.
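The stoplight coloring and filtering illustrated in Figure 4 can be expressed as a binning step followed by a threshold filter that ranks the survivors by signal. The color cutoffs (which assume a 0 to 5 score scale), the compound tuples, and the function names are hypothetical.

```python
import bisect

# Hypothetical cutoffs on an assumed 0-5 MPO scale (higher = more drug-like).
COLORS = ["red", "orange", "yellow", "green"]
CUTOFFS = [1.5, 2.5, 3.5]


def stoplight(score):
    """Map an MPO score to a stoplight color bin."""
    return COLORS[bisect.bisect_right(CUTOFFS, score)]


def prioritize(compounds, min_score):
    """Keep (compound_id, read_count, mpo_score) tuples meeting the MPO cutoff,
    then rank the survivors by read count, largest first."""
    kept = [c for c in compounds if c[2] >= min_score]
    return sorted(kept, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    hits = [("c1", 900, 1.0), ("c2", 400, 4.2), ("c3", 150, 3.8), ("c4", 700, 2.0)]
    for cid, reads, score in hits:
        print(cid, reads, stoplight(score))
    print([cid for cid, _, _ in prioritize(hits, 3.5)])  # ['c2', 'c3']
```

As in Figure 4B, the filter discards the highest-count compounds when their property balance is poor, so the surviving list trades signal amplitude for drug-likeness.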

Figure 4.

Demonstrating the identification of hits with better physicochemical properties through the visualization and filtering of multiparameter optimization scores. When approaching a DEL data set, one’s initial intuition is to focus on the compounds that generate the largest signal. While this can direct one to compounds that may possess the highest affinity or the highest yield during library synthesis (be it of the targeted product or a side product), these compounds may not make the best starting points for development. We have implemented a strategy that colors the data points in a cubic plot based on an MPO score calculated from multiple physicochemical properties. (A) A line feature generated from screening a three-cycle DEL with bromodomain 1 of BRD4; the data points are sized by the number of degenerate reads observed for each compound and colored by their MPO score. (B) The data were filtered by MPO score to reveal hits with a better balance of physicochemical properties than those that generated the largest counts in the screen.

Application of On-DNA Resynthesis and BALI-MS to Improve the Productivity of DEL Hit Confirmation

In this final section, we address our effort to resolve a poor conversion rate of NGS-identified hits to confirmed off-DNA ligands, a result that further complicated the realization of the value expected from the technology. An often underappreciated challenge in the application of DEL hit ID technology is the chemical complexity that arises from the monomer diversity employed in the construction of industrial DEL collections. Even with stringent requirements for productive rehearsal yields under library synthesis conditions, these preliminary tests with representative upstream intermediates do not capture the full spectrum of reactivity that can be encountered during production of a DEL according to the split-and-pool paradigm. As such, the combination of incomplete yields at different steps and the propensity for side product formation, followed by further diversification in downstream reactions, means that for any given barcode, a multitude of chemical species can be present. Following that logic, the DNA-recorded synthesis of a DEL is best thought of as a recipe for a mixture of compounds rather than for a single, discrete chemical entity.⁸⁰ We refer to this combination of possible entities as the “hit tree.” Given this, signal variability within a feature in a DEL selection data set could arise from the affinity of the designed compound (a structure–activity relationship) or from the propensity of certain building block combinations to react through alternate pathways to produce high-affinity side products. In this latter case, variation in signal amplitude would reflect a structure–reactivity relationship of the designed library compounds, exemplified by effects such as the reduced reactivity of secondary versus primary amines, the relative steric hindrance of reactive groups (e.g., α-monosubstituted vs α,α′-disubstituted carboxylic acids), electron-donating versus electron-withdrawing substituents, and so forth.
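The hit tree concept can be illustrated by enumerating, for a single barcode, every combination of per-cycle outcomes, including non-incorporation (truncation) and side reactions. The sketch below makes the simplifying, hypothetical assumption that each cycle’s outcome distribution is independent of earlier outcomes; as the structure–reactivity effects above imply, real reactivity depends on the intermediate, so the labels and yields here are illustrative only.

```python
from itertools import product


def hit_tree(cycle_outcomes):
    """Enumerate the species encoded by one barcode and their expected fractions.

    cycle_outcomes: one dict per cycle mapping an outcome label to its fractional
    yield; the label None means the building block was not incorporated.
    Hypothetical simplification: each cycle's outcomes are independent of the
    intermediate formed in earlier cycles.
    Returns {tuple of incorporated labels: fraction of the barcode's material}.
    """
    species = {}
    for combo in product(*(c.items() for c in cycle_outcomes)):
        labels = tuple(label for label, _ in combo if label is not None)
        fraction = 1.0
        for _, y in combo:
            fraction *= y
        species[labels] = species.get(labels, 0.0) + fraction
    return species


if __name__ == "__main__":
    cycles = [
        {"BB1": 0.5, None: 0.5},                # first coupling only half complete
        {"BB2": 0.9, None: 0.1},
        {"BB3": 0.6, "BB3x2": 0.3, None: 0.1},  # double addition as a side reaction
    ]
    for labels, frac in sorted(hit_tree(cycles).items(), key=lambda kv: -kv[1]):
        print(f"{'-'.join(labels) or '(headpiece only)'}: {frac:.3f}")
```

Even with these modest hypothetical losses, the designed product accounts for only 27% of the material behind the barcode, which is why resynthesis and testing of the actual mixture are informative.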

With this in mind, we took keen note of an early description of the impact of on-DNA resynthesis on the productivity of off-DNA hit confirmation,⁸⁰ an approach that removes the assumption of DEL warhead identity and instead critically assesses the complexity of the compound mixtures actually made from the individual monomer combinations under library synthesis conditions. We have now established a strategy for on-DNA hit resynthesis and testing, shown in Figure 5. To demonstrate this workflow, we returned to the BRD4 study and examined compounds in this line feature with a good balance of signal and properties (Fig. 5A), which nominated compound 1 for further study. In this library design, the synthetic scheme comprises acylation of the DNA with an amino acid (blue), followed by pyrazine template incorporation (black) and aromatic substitution reactions with the second (magenta) and third (green) building blocks, and was expected to yield product 1. We conducted on-DNA resynthesis of this compound following the library preparation route with a minimal headpiece–DNA template (hp-DNA) and characterized the reaction products of each step by MS; the chemical structures were identified indirectly from the parent ion of each peak in the LC trace.

Figure 5.

Demonstrating the value of on-DNA resynthesis and BALI-MS analysis to DEL hit confirmation. (A) The feature from the screen described in Figure 4 is shown again to demonstrate the signal generated by compound 1, which was selected for follow-up. (B) Following on-DNA resynthesis on the 50 nmol scale using library procedures, chemical analysis via UV-HPLC revealed a mixture of products, none of which were the designed compound. Instead, the major product 2 (75% yield) represented a truncation that lacked the first cycle building block (blue) and contained a second addition of building block 3 (green) to the 2-chloropyridine of building block 2 (magenta). Following a similar route, side product 3 (10% yield) represents a compound containing building block 1 with the same double addition of building block 3. (C) The resynthesized on-DNA mixture generated in B was subjected to BALI-MS affinity selection, and the relative enrichment of compound 2 is shown for beads containing or lacking the target protein (beads + BRD4 and beads only, respectively). The data clearly indicate an enrichment of 2 in this experiment. (D) Analysis of the sample in C but focusing on compound 3 demonstrated a robust enrichment of this compound as well.

The UV-HPLC trace for the on-DNA resynthesis of 1 is shown in Figure 5B. We observed a mixture of compounds, none of which was the expected product 1. Instead, the major component arose from direct conjugation of the pyrazine template to the hp-DNA, a result of failed incorporation of the first (amino acid) building block. While the incorporation of 5-aminomethyl-2-chloropyridine proceeded as expected in the second cycle of chemistry, the third bond-forming reaction provided the bis-N-methyl-isoindolinone products 2 and 3 (Fig. 5B), owing to a second addition of the third building block at the reactive chloropyridine of the second building block.

While resynthesis is informative about which compounds could be attached to the DEL barcode and be present to bind BRD4, it cannot by itself establish the identity of the true ligand(s) responsible for the DEL selection signal. In the original description,⁸⁰ cleavable linkers were employed during resynthesis to liberate on-DNA synthesis products from the minimal hp-DNA handle, but the exact structures of the linkers used were not disclosed. In our initial investigation, the top candidates from established solid-phase synthesis approaches proved unsuitable because they either did not cleave efficiently or produced a multiplicity of products, the latter further diversifying the product mixture. Recognizing that our LC-MS/MS platform for analyzing on-DNA chemical products had the sensitivity to quantify submicromolar concentrations of on-DNA products, we extended our BALI-MS experimental design to identify chemical structures within the on-DNA compound mixture that bind to the target when interrogated under the same affinity selection conditions used in the DEL hit ID campaign. An example of the workflow established on this premise is shown in Figure 5C,D, which provides the results of on-DNA testing of the compound mixture described in Figure 5B. We combined this compound mixture with an equimolar dose of internal standard hp-DNA and performed a selection experiment on the mixture using conditions that mimic the DEL discovery campaign (i.e., the same target concentration, to apply an equal “selective pressure”). After separating the beads from the binding reaction supernatant, we quantified the individual components of the reaction in the eluate via LC-MS and normalized the signals for 2 and 3 to that of the internal standard.
This normalization allows quantitative comparison of enrichment both between individual compounds in the mixture and between sample treatments. Here, compound 2 was enriched 5-fold (Fig. 5C) and compound 3 was enriched 33-fold (Fig. 5D) relative to their no-protein control, which contained only affinity beads. These results substantiate that, under biochemical conditions similar to those of the discovery campaign, each of these compounds would be enriched from the bulk library, and both likely contributed to the overall signal observed in the line feature produced in the discovery campaign. This hit confirmation strategy has given our project teams a streamlined way to identify hits in a desirable property space and to reassess them with a robust go/no-go decision after rapid on-DNA resynthesis, enabling a confident commitment of resources to downstream off-DNA synthesis when appropriate.
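The internal-standard normalization and fold-enrichment comparison described above amount to a ratio of ratios: the analyte signal divided by the internal standard signal in the target-containing eluate, divided by the same quantity in the no-protein (beads-only) control. The following is a minimal Python sketch under that assumption. It presumes each LC-MS readout is reduced to a pair of integrated peak areas (analyte and internal standard); the class name `EluatePeaks`, the helper functions, and all peak-area values are hypothetical illustrations, not the authors' software or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EluatePeaks:
    """Integrated LC-MS peak areas from one eluate (arbitrary units).

    Hypothetical container: `analyte` is the on-DNA compound of interest and
    `internal_standard` is the spiked-in hp-DNA reference.
    """
    analyte: float
    internal_standard: float


def normalized_signal(peaks: EluatePeaks) -> float:
    """Analyte signal expressed relative to the internal standard."""
    if peaks.internal_standard <= 0:
        raise ValueError("internal standard peak area must be positive")
    return peaks.analyte / peaks.internal_standard


def fold_enrichment(with_target: EluatePeaks, no_protein: EluatePeaks) -> float:
    """Normalized signal with target divided by normalized beads-only signal."""
    control = normalized_signal(no_protein)
    if control <= 0:
        raise ValueError("analyte not detected in control; enrichment undefined")
    return normalized_signal(with_target) / control


if __name__ == "__main__":
    # Made-up peak areas for two compounds from one on-DNA mixture:
    # (selection with target, no-protein control).
    samples = {
        "compound A": (EluatePeaks(900.0, 1000.0), EluatePeaks(180.0, 1200.0)),
        "compound B": (EluatePeaks(4000.0, 1000.0), EluatePeaks(125.0, 1000.0)),
    }
    for name, (target, control) in samples.items():
        print(f"{name}: {fold_enrichment(target, control):.1f}-fold")
```

With these invented numbers, compound B is enriched 32-fold and compound A 6-fold. For compound A, the raw analyte peak-area ratio would be only 5 (900/180); dividing by the internal standard corrects for the higher internal standard recovery in the control eluate, which is why normalization makes comparisons between samples and treatments meaningful.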

Concluding Remarks

As hit ID technologies have evolved, affinity selection techniques, including ASMS and DEL, have become increasingly popular options for project teams because of the lower investment they require, the clarity of their results, and the speed with which they can interrogate large swaths of chemical space. Over the last 4 years, we have established a DEL hit ID platform that, through the incorporation of the approaches described above, has begun to deliver value to our portfolio as a reliable source of actionable chemical matter. Through the assay transfer and repeatability studies detailed above, we became confident in the robustness of DEL selection as a binding assay, and we have learned to trust the data from multifaceted campaigns as a source of rich biological information that, when collected appropriately, can prioritize the most interesting compounds for initial follow-up. In our experience, with limited resources and increasing pressure to deliver on first-in-class targets, not every hit ID approach can be pursued in parallel for every project; in this article, we have therefore outlined the aspects that we consider when designing a hit ID campaign. Ultimately, it is the most appropriate pairing of technology with project biology and chemistry that will efficiently deliver clinical assets that affect the lives of patients. The prospect of this positive impact on people suffering from disease drives us to continually evaluate and improve our hit ID toolbox, in the hope that an enhanced ability to identify actionable chemical equity will deliver disease-modifying and curative medicines to those who need them in the not-too-distant future.

Footnotes

Acknowledgements

The authors wish to acknowledge Adam Gilbert, David Israel, Rob Stanton, Sylvie Sakata, and all members of the Pfizer/HitGen DEL team for their support.

Declaration of Conflicting Interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: T.L.F., W.B., M.E.F., B.K., J.I.M., A.S.R., H.Z., and M.-C.P. are employed by Pfizer Inc.; their research and authorship of this article were completed within the scope of their employment with Pfizer Inc. Q.C. and X.L. are employed by HitGen Inc.; their research and authorship of this article were completed within the scope of their employment with HitGen Inc.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Hongyao Zhu

References

1. Venter, J. C.; Adams, M. D.; Myers, E. W.; et al. The Sequence of the Human Genome. Science 2001, 291, 1304–1351.

2. Petersen, D. N.; Hawkins; Ruangsiriluk; et al. A Small-Molecule Anti-Secretagogue of PCSK9 Targets the 80S Ribosome to Inhibit PCSK9 Protein Translation. Cell Chem. Biol. 2016, 23, 1362–1371.

3. Gray, D. L.; Allen, J. A.; Mente; et al. Impaired Beta-Arrestin Recruitment and Reduced Desensitization by Non-Catechol Agonists of the D1 Dopamine Receptor. Nat. Commun. 2018, 9, 674.

4. Cameron, K. O.; Kung, D. W.; Kalgutkar, A. S.; et al. Discovery and Preclinical Characterization of 6-Chloro-5-[4-(1-Hydroxycyclobutyl)phenyl]-1H-Indole-3-Carboxylic Acid (PF-06409577), a Direct Activator of Adenosine Monophosphate-Activated Protein Kinase (AMPK), for the Potential Treatment of Diabetic Nephropathy. J. Med. Chem. 2016, 59, 8068–8081.

5. Han; Zhou; Jiang; et al. Discovery of RG7834: The First-in-Class Selective and Orally Available Small Molecule Hepatitis B Virus Expression Inhibitor with Novel Mechanism of Action. J. Med. Chem. 2018, 61, 10619–10634.

6. Lancia; O’Connell. Doubling Down: Betting on the Success of HTS & DEL Libraries in Parallel. SLAS2020 International Conference and Exhibition, San Diego, CA, Jan 25–29, 2020.

7. DiMasi, J. A.; Grabowski, H. G.; Hansen, R. W. Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs. J. Health Econ. 2016, 47, 20–33.

8. Dowden; Munro. Trends in Clinical Success Rates and Therapeutic Focus. Nat. Rev. Drug Discov. 2019, 18, 495–496.

9. Harrison, R. K. Phase II and Phase III Failures: 2013–2015. Nat. Rev. Drug Discov. 2016, 15, 817–818.

10. Hewitt; Campbell, J. D.; Cacciotti. Beyond the Shadow of a Drought: The Need for a New Mindset in Pharma R&D. https://www.oliverwyman.com/content/dam/oliver-wyman/global/en/files/insights/health-life-sciences/OW_EN_HLS_PUBL_2011_Beyond_the_Shadow_of_a_Drought(3).pdf (accessed Nov 27, 2020).

11. Morgan; Brown, D. G.; Lennard; et al. Impact of a Five-Dimensional Framework on R&D Productivity at AstraZeneca. Nat. Rev. Drug Discov. 2018, 17, 167–181.

12. Munos. Lessons from 60 Years of Pharmaceutical Innovation. Nat. Rev. Drug Discov. 2009, 8, 959–968.

13. Paul, S. M.; Mytelka, D. S.; Dunwiddie, C. T.; et al. How to Improve R&D Productivity: The Pharmaceutical Industry’s Grand Challenge. Nat. Rev. Drug Discov. 2010, 9, 203–214.

14. Peakman, M.-C.; Troutman; Gonzales; et al. Experimental Screening Strategies to Reduce Attrition Risk. In Attrition in the Pharmaceutical Industry: Reasons, Implications and Pathways Forward; Alex; Harris, C. J.; Smith, D. A., Eds.; John Wiley & Sons: Hoboken, NJ, 2014; pp 180–214.

15. Scannell, J. W.; Blanckley; Boldon; et al. Diagnosing the Decline in Pharmaceutical R&D Efficiency. Nat. Rev. Drug Discov. 2012, 11, 191–200.

16. Berg; Hallowell; Tibbetts; et al. High-Throughput Surface Liquid Absorption and Secretion Assays to Identify F508del CFTR Correctors Using Patient Primary Airway Epithelial Cultures. SLAS Discov. 2019, 24, 724–737.

17. Young; Margaron; Fernandes; et al. MyoScreen, a High-Throughput Phenotypic Screening Platform Enabling Muscle Drug Discovery. SLAS Discov. 2018, 23, 790–806.

18. Eder; Sedrani; Wiesmann. The Discovery of First-in-Class Drugs: Origins and Evolution. Nat. Rev. Drug Discov. 2014, 13, 577–587.

19. Moffat, J. G.; Vincent; Lee, J. A.; et al. Opportunities and Challenges in Phenotypic Drug Discovery: An Industry Perspective. Nat. Rev. Drug Discov. 2017, 16, 531–543.

20. Swinney, D. C.; Anthony. How Were New Medicines Discovered? Nat. Rev. Drug Discov. 2011, 10, 507–519.

21. Vincent; Loria; Pregel; et al. Developing Predictive Assays: The Phenotypic Screening “Rule of 3.” Sci. Transl. Med. 2015, 7, 293ps15.

22. Vincent; Loria, P. M.; Weston, A. D.; et al. Hit Triage and Validation in Phenotypic Screening: Considerations and Strategies. Cell Chem. Biol. 2020, 27, 1332–1346.

23. Kaur; McGuire; Tang; et al. Affinity Selection and Mass Spectrometry-Based Strategies to Identify Lead Compounds in Combinatorial Libraries. J. Protein Chem. 1997, 16, 505–511.

24. Muckenschnabel; Falchetto; Mayr, L. M.; et al. SpeedScreen: Label-Free Liquid Chromatography-Mass Spectrometry-Based High-Throughput Screening for the Discovery of Orphan Protein Ligands. Anal. Biochem. 2004, 324, 241–249.

25. O’Connell, T. N.; Ramsay; Rieth, S. F.; et al. Solution-Based Indirect Affinity Selection Mass Spectrometry—A General Tool for High-Throughput Screening of Pharmaceutical Compound Libraries. Anal. Chem. 2014, 86, 7413–7420.

26. Annis, D. A.; Nickbarg; Yang; et al. Affinity Selection-Mass Spectrometry Screening Techniques for Small Molecule Drug Discovery. Curr. Opin. Chem. Biol. 2007, 11, 518–526.

27. Annis, D. A.; Athanasopoulos; Curran, P. J.; et al. Affinity Selection–Mass Spectrometry Method for the Identification of Small Molecule Ligands from Self-Encoded Combinatorial Libraries: Discovery of a Novel Antagonist of E. coli Dihydrofolate Reductase. Int. J. Mass Spectrom. 2004, 238, 77–83.

28. Hurzy, D. M.; Henze, D. A.; Cabalu, T. D.; et al. Design, Synthesis and SAR of Substituted Indoles as Selective TrkA Inhibitors. Bioorg. Med. Chem. Lett. 2017, 27, 2695–2701.

29. Arico-Muendel, C. C. From Haystack to Needle: Finding Value with DNA Encoded Library Technology at GSK. Medchemcomm 2016, 7, 1898–1909.

30. Andrews, C. L.; Ziebell, M. R.; Nickbarg; et al. Mass Spectrometry-Based Screening and Characterization of Protein–Ligand Complexes in Drug Discovery. In Protein and Peptide Mass Spectrometry in Drug Discovery; Gross, M. L.; Chen; Pramanik, Eds.; Wiley: Hoboken, NJ, 2011; pp 253–286.

31. Bai; Zhou; et al. A Potent and Selective Small-Molecule Degrader of STAT3 Achieves Complete Tumor Regression In Vivo. Cancer Cell 2019, 36, 498–511.e17.

32. Burslem, G. M.; Crews, C. M. Proteolysis-Targeting Chimeras as Therapeutics and Tools for Biological Discovery. Cell 2020, 181, 102–114.

33. Gao; Sun; Rao. PROTAC Technology: Opportunities and Challenges. ACS Med. Chem. Lett. 2020, 11, 237–240.

34. Rietz; Quist, K. M.; et al. Discovery of a Small Molecule Probe That Post-Translationally Stabilizes the Survival Motor Neuron Protein for the Treatment of Spinal Muscular Atrophy. J. Med. Chem. 2017, 60, 4594–4610.

35. Costales, M. G.; Childs-Disney, J. L.; Haniff, H. S.; et al. How We Think about Targeting RNA with Small Molecules. J. Med. Chem. 2020, 63, 8880–8900.

36. Rizvi, N. F.; Santa Maria, J. P., Jr.; Nahvi; et al. Targeting RNA with Small Molecules: Identification of Selective, RNA-Binding Small Molecules Occupying Drug-Like Chemical Space. SLAS Discov. 2020, 25, 384–396.

37. Superti-Furga; Lackner; Wiedmer; et al. The RESOLUTE Consortium: Unlocking SLC Transporters for Drug Discovery. Nat. Rev. Drug Discov. 2020, 19, 429–430.

38. Leveridge; Buxton; Argyrou; et al. Demonstrating Enhanced Throughput of RapidFire Mass Spectrometry through Multiplexing Using the JmjD2d Demethylase as a Model System. J. Biomol. Screen. 2014, 19, 278–286.

39. Smith; Chase; Niswender, C. M.; et al. Application of Parallel Multiparametric Cell-Based FLIPR Detection Assays for the Identification of Modulators of the Muscarinic Acetylcholine Receptor 4 (M4). J. Biomol. Screen. 2015, 20, 858–868.

40. Unterreiner; Gabriel. When High Content Screening Meets High Throughput. https://www.ddw-online.com/screening/p149227-when-high-content-screening-meets-high-throughput.html (accessed Nov 27, 2020).

41. Flanagan, M. E.; Blumenkopf, T. A.; Brissette, W. H.; et al. Discovery of CP-690,550: A Potent and Selective Janus Kinase (JAK) Inhibitor for the Treatment of Autoimmune Diseases and Organ Transplant Rejection. J. Med. Chem. 2010, 53, 8468–8484.

42. Chang; Ruggeri, R. B.; Harwood, H. J., Jr. Microsomal Triglyceride Transfer Protein (MTP) Inhibitors: Discovery of Clinically Active Inhibitors Using High-Throughput Screening and Parallel Synthesis Paradigms. Curr. Opin. Drug Discov. Dev. 2002, 5, 562–570.

43. Dorr; Westby; Dobbs; et al. Maraviroc (UK-427,857), a Potent, Orally Bioavailable, and Selective Small-Molecule Inhibitor of Chemokine Receptor CCR5 with Broad-Spectrum Anti-Human Immunodeficiency Virus Type 1 Activity. Antimicrob. Agents Chemother. 2005, 49, 4721–4732.

44. Selness, S. R.; Devraj, R. V.; Devadas; et al. Discovery of PH-797804, a Highly Selective and Potent Inhibitor of p38 MAP Kinase. Bioorg. Med. Chem. Lett. 2011, 21, 4066–4071.

45. Helal, C. J.; Arnold; Boyden; et al. Identification of a Potent, Highly Selective, and Brain Penetrant Phosphodiesterase 2A Inhibitor Clinical Candidate. J. Med. Chem. 2018, 61, 1001–1018.

46. Singh; Tam; Akabayov. NMR-Fragment Based Virtual Screening: A Brief Overview. Molecules 2018, 23, 233.

47. Efremov, I. V.; Vajdos, F. F.; Borzilleri, K. A.; et al. Discovery and Optimization of a Novel Spiropyrrolidine Inhibitor of Beta-Secretase (BACE1) through Fragment-Based Drug Design. J. Med. Chem. 2012, 55, 9069–9088.

48. Huard; Ahn; Amor; et al. Discovery of Fragment-Derived Small Molecules for In Vivo Inhibition of Ketohexokinase (KHK). J. Med. Chem. 2017, 60, 7835–7849.

49. Clark, M. A.; Acharya, R. A.; Arico-Muendel, C. C.; et al. Design, Synthesis and Selection of DNA-Encoded Small-Molecule Libraries. Nat. Chem. Biol. 2009, 5, 647–654.

50. Ratnayake, A. S.; Flanagan, M. E.; Foley, T. L.; et al. A Solution Phase Platform to Characterize Chemical Reaction Compatibility with DNA-Encoded Chemical Library Synthesis. ACS Comb. Sci. 2019, 21, 650–655.

51. Leveridge; Chung, C. W.; Gross, J. W.; et al. Integration of Lead Discovery Tactics and the Evolution of the Lead Discovery Toolbox. SLAS Discov. 2018, 23, 881–897.

52. Kung, P. P.; Bingham; Burke, B. J.; et al. Characterization of Specific N-Alpha-Acetyltransferase 50 (Naa50) Inhibitors Identified Using a DNA Encoded Library. ACS Med. Chem. Lett. 2020, 11, 1175–1184.

53. Chen; Cheng; Zhang; et al. Exploring the Lower Limit of Individual DNA-Encoded Library Molecules in Selection. SLAS Discov. 2020, 25, 523–529.

54. Shin; Williams, C. M. M.; et al. Design and Chemoproteomic Functional Characterization of a Chemical Probe Targeted to Bromodomains of BET Family Proteins. Medchemcomm 2014, 5, 1871–1878.

55. Filippakopoulos; Picaud; et al. Selective Inhibition of BET Bromodomains. Nature 2010, 468, 1067–1073.

56. Brenner; Lerner, R. A. Encoded Combinatorial Chemistry. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 5381–5383.

57. Buller; Mannocci; Zhang; et al. Design and Synthesis of a Novel DNA-Encoded Chemical Library Using Diels-Alder Cycloadditions. Bioorg. Med. Chem. Lett. 2008, 18, 5926–5931.

58. Mannocci; Zhang; Scheuermann; et al. High-Throughput Sequencing Allows the Identification of Binding Molecules Isolated from DNA-Encoded Chemical Libraries. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 17670–17675.

59. Buller; Zhang; Scheuermann; et al. Discovery of TNF Inhibitors from a DNA-Encoded Chemical Library Based on Diels-Alder Cycloaddition. Chem. Biol. 2009, 16, 1075–1086.

60. Chan, A. I.; McGregor, L. M.; Liu, D. R. Novel Selection Methods for DNA-Encoded Chemical Libraries. Curr. Opin. Chem. Biol. 2015, 26, 55–61.

61. Salamon; Klika Skopic; Jung; et al. Chemical Biology Probes from Advanced DNA-Encoded Libraries. ACS Chem. Biol. 2016, 11, 296–307.

62. Morris. Ensemble Discovery Initiates Collaboration with Pfizer to Develop Novel Drugs against Protein-Protein Interaction Targets. Jan 6, 2010. https://www.fiercebiotech.com/biotech/ensemble-discovery-initiates-collaboration-pfizer-to-develop-novel-drugs-against-protein (accessed Nov 27, 2020).

63. Ezzeddine; Suda, M. L. X-Chem Enters into Multi-Target Collaboration with Pfizer Inc. 2014. http://www.x-chemrx.com/x-chem-enters-into-multi-target-collaboration-with-pfizer-inc/ (accessed Nov 27, 2020).

64. Morgan. HitGen and Pfizer Enter Research Collaboration and License Agreement to Build and Screen Novel DNA-Encoded Libraries. April 17, 2017. https://www.hitgen.com/enxiandao/index.php?s=/Home/Article/detail/id/410.html (accessed Nov 27, 2020).

65. Richter; Satz, A. L.; Bedoucha; et al. DNA-Encoded Library-Derived DDR1 Inhibitor Prevents Fibrosis and Renal Function Loss in a Genetic Mouse Model of Alport Syndrome. ACS Chem. Biol. 2019, 14, 37–49.

66. Belyanskaya, S. L.; Ding; Callahan, J. F.; et al. Discovering Drugs with DNA-Encoded Library Technology: From Concept to Clinic with an Inhibitor of Soluble Epoxide Hydrolase. Chembiochem 2017, 18, 837–842.

67. Franch; Simonsen, H. D. Nuevolution Technology Progress: Nuevolution Scales Its Compound Collection to 40 Trillion Using its Chemetics™ Drug Discovery Platform. Feb 14, 2017. https://www.prnewswire.com/news-releases/nuevolution-technology-progress-nuevolution-scales-its-compound-collection-to-40-trillion-using-its-chemetics-drug-discovery-platform-300406931.html (accessed Nov 27, 2020).

68. Kolmel, D. K.; Loach, R. P.; Knauber; et al. Employing Photoredox Catalysis for DNA-Encoded Chemistry: Decarboxylative Alkylation of Alpha-Amino Acids. ChemMedChem 2018, 13, 2159–2165.

69. Kolmel, D. K.; Meng; Tsai, M. H.; et al. On-DNA Decarboxylative Arylation: Merging Photoredox with Nickel Catalysis in Water. ACS Comb. Sci. 2019, 21, 588–597.

70. Kolmel, D. K.; Ratnayake, A. S.; Flanagan, M. E.; et al. Photocatalytic [2 + 2] Cycloaddition in DNA-Encoded Chemistry. Org. Lett. 2020, 22, 2908–2913.

71. Ahn; Kahsai, A. W.; Pani; et al. Allosteric “Beta-Blocker” Isolated from a DNA-Encoded Small Molecule Library. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, 1708–1713.

72. Brown, D. G.; Brown, G. A.; Centrella; et al. Agonists and Antagonists of Protease-Activated Receptor 2 Discovered within a DNA-Encoded Chemical Library Using Mutational Stabilization of the Target. SLAS Discov. 2018, 23, 429–436.

73. Graybill, T. L.; Zeng; et al. Cell-Based Selection Expands the Utility of DNA-Encoded Small-Molecule Library Technology to Cell Surface Drug Targets: Identification of Novel Antagonists of the NK3 Tachykinin Receptor. ACS Comb. Sci. 2015, 17, 722–731.

74. Zhu; Flanagan, M. E.; Stanton, R. V. Designing DNA Encoded Libraries of Diverse Products in a Focused Property Space. J. Chem. Inf. Model. 2019, 59, 4645–4653.

75. Machutta, C. A.; Kollmann, C. S.; Lind, K. E.; et al. Author Correction: Prioritizing Multiple Therapeutic Targets in Parallel Using Automated DNA-Encoded Library Screening. Nat. Commun. 2018, 9, 16227.

76. Kuai; O’Keeffe; Arico-Muendel. Randomness in DNA Encoded Library Selection Data Can Be Modeled for More Reliable Enrichment Calculation. SLAS Discov. 2018, 23, 405–416.

77. Faver, J. C.; Riehle; Lancia, D. R., Jr.; et al. Quantitative Comparison of Enrichment from DNA-Encoded Chemical Library Selections. ACS Comb. Sci. 2019, 21, 75–82.

78. McCarthy, K. A.; Franklin, G. J.; Lancia, D. R., Jr.; et al. The Impact of Variable Selection Coverage on Detection of Ligands from a DNA-Encoded Library Screen. SLAS Discov. 2020, 25, 515–522.

79. Wager, T. T.; Hou; Verhoest, P. R.; et al. Moving Beyond Rules: The Development of a Central Nervous System Multiparameter Optimization (CNS MPO) Approach to Enable Alignment of Druglike Properties. ACS Chem. Neurosci. 2010, 1, 435–449.

80. Franklin; Bai; Fan; et al. High-Throughput Binder Confirmation (HTBC): A New Non-Combinatorial Synthesis Platform Created to Enhance and Accelerate Hit ID. SLAS2018 International Conference and Exhibition, San Diego, CA, Feb 3–7, 2018.