Exploring the Lower Limit of Individual DNA-Encoded Library Molecules in Selection

Abstract

DNA-encoded library (DEL) technology has been used as an ultra-high-throughput screening approach for hit identification against drug targets. The process is an affinity-based selection and requires incubation of DEL molecules with the target. Currently, in most reported cases, the input (i.e., the copy number) of individual DEL molecules varies from 10⁵ to 10⁷. With ever-increasing DEL size and screening cost, lowering the input of DEL molecules while maintaining an appropriate signal-to-noise ratio in a selection is of paramount importance. In this article, we varied the input of individual DEL molecules from 10³ to 10⁵ copies in selections with two different protein targets to explore the lower limit of DEL molecule input. The results could facilitate the optimization of the DEL selection process and reduce costs related to library consumption.

Keywords

DNA-encoded library, DEL screening, selection, copy number of individual molecules

Introduction

DNA-encoded library (DEL) technology, inspired by and derived from phage display, is considered a powerful chemical tool, especially for its potential in small-molecule drug discovery. Typical DELs consist of mixtures of small molecules conjugated to unique DNA oligo tags, and the structural information of each small molecule is encoded in its corresponding DNA sequence.¹⁻³ When DELs are applied against proteins of pharmaceutical interest in a selection, the library members with higher affinity are enriched after multiple rounds of washing, and their structural identities are revealed by subsequent DNA sequencing.⁴⁻⁶ The DEL selection process has great advantages in speed, cost, and scale when compared with traditional high-throughput screening. To date, multiple pharmaceutical companies, such as GSK, Roche, and Amgen, are actively involved in DEL construction, selection services, or drug discovery activities based on the DEL platform. Compounds discovered by DEL technology have already progressed to the clinical phase, such as the receptor-interacting serine/threonine-protein kinase 1 (RIPK1) inhibitor GSK2982772⁷ and the soluble epoxide hydrolase inhibitor GSK2256294.⁸

As DEL technology has gradually matured over the past decades, the size of individual DELs has been constantly increasing, reaching trillions of compounds as claimed in the literature⁹ or by DEL service providers.¹⁰ To deal with this large number of DNA-encoded small molecules, appropriate parameters of the chosen analytical methodology (usually based on DNA sequencing) are essential for successful DEL selection as well as accurate data analysis. It was observed that the enrichment signal from a DEL selection decreased when the input of individual DEL molecules decreased,¹¹ yet the threshold at which the enrichment signal becomes indistinguishable from background noise was not rigorously examined. Previously, it was empirically suggested that each DEL member molecule should have at least 10⁵ copies in the library for one selection.¹²,¹³ In Table 1, we summarize several DEL selection practices from the literature in terms of their library input.

Table 1.

Comparison of Molecule Copy Numbers in DNA-Encoded Library (DEL) Selection Practices.

Organization/Year/Source	DEL Size	Library Input	Copy Number of Individual Molecules
GSK/2012¹⁴	4.05 × 10⁹	5 nmol (3.01 × 10¹⁵)	7.4 × 10⁵
GSK/2014¹⁵	4.1 × 10⁹	5 nmol (3.01 × 10¹⁵)	7.3 × 10⁵
GSK/2015¹⁶	4.1 × 10⁷	1.5 × 10¹⁵	3.6 × 10⁷
GSK/2015¹⁷	8.02 × 10⁸	5 nmol (3.01 × 10¹⁵)	3.8 × 10⁶
GSK/2017⁹	2.4 × 10¹²	2.5 nmol (1.5 × 10¹⁵)	100 copies as indicated in article, 627 as calculated
Duke/2017¹⁸	1.9 × 10⁸	5 × 10¹³	2.6 × 10⁵
ETH Zurich/2017¹¹	3.54 × 10⁷ (35,393,112)	6.9 × 10¹²	1.9 × 10⁵
GSK/2017⁸	8.02 × 10⁸ (802,160,640)	10¹⁵	1.25 × 10⁶
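
The last column of Table 1 follows from dividing the total number of molecules in the library input by the DEL size. As a minimal sketch of that arithmetic (the function name `copies_per_member` is ours and purely illustrative, and the input is assumed to be spread evenly over all library members):

```python
AVOGADRO = 6.022e23  # molecules per mole

def copies_per_member(input_nmol: float, library_size: float) -> float:
    """Average copy number of each library member for a given input.

    Assumes every member is equally represented in the pooled library.
    """
    total_molecules = input_nmol * 1e-9 * AVOGADRO
    return total_molecules / library_size

# GSK/2012 row of Table 1: 5 nmol of a 4.05e9-member library
print(f"{copies_per_member(5, 4.05e9):.2e}")

# GSK/2017 peptide library row: 2.5 nmol of a 2.4e12-member library
print(f"{copies_per_member(2.5, 2.4e12):.0f}")
```

For the GSK/2017 peptide library this gives roughly 627 copies, which matches the "as calculated" value in the table rather than the 100 copies indicated in the article.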

These practices, however, impose certain restrictions on the DEL selection process. For instance, to maintain this level of library input, one has to increase the amount of DEL used in one selection as the number of library members increases. Beyond a certain level, increasing the amount of DEL runs into technical and economic problems, including solubility, viscosity, and cost. Also, considering the sophisticated and expensive preparation process required for DELs, it would be economically significant if one could lower the input of a DEL for an individual selection without affecting its performance (i.e., the ability to determine the enrichment signal in a statistically significant manner). Therefore, determining the minimally required molecule copy number in an individual DEL selection under given conditions would allow researchers to use DELs with significantly higher efficiency to discover new chemical entities, especially those of pharmaceutical interest. It could also have a positive impact on DEL selection practice and guide the optimization of future DEL screening experiments.

In this article, we describe selections against two different soluble protein targets with inputs of individual DEL molecules ranging from 10³ to 10⁵ copies, to explore the lower limit of individual DEL molecules in selection and thus to facilitate the optimization of DEL selection.

Materials and Methods

Materials

Rho-associated protein kinase 2 (ROCK2, 19-417) was expressed in insect cells. Constructs were subcloned into the pFastBacHTA vector, and the Bac-to-Bac system was used to produce baculoviruses to infect Sf9 cells. The protein was purified by Ni-NTA affinity column chromatography (GE Healthcare, Chicago, IL), followed by size exclusion chromatography (HiLoad 16/600 Superdex 200 pg, GE Healthcare). The final purified protein was formulated in 20 mM Tris-HCl, pH 8.0, 150 mM NaCl, 20% glycerol, and 1 mM DTT and then stored at −80 °C. SAE is a heterodimer of SAE1 (SUMO-activating enzyme subunit 1) and SAE2 (SUMO-activating enzyme subunit 2); SAE1 (1-346) was cloned into pET-22b to encode a native polypeptide, and SAE2 (1-640) was cloned into pET-28a to encode an N-hexahistidine fusion protein. The constructs were coexpressed in Escherichia coli Rosetta (DE3) and purified by Ni-NTA affinity chromatography (GE Healthcare) followed by gel filtration (HiLoad 16/600 Superdex 200 pg, GE Healthcare). The protein was concentrated to 10 to 15 mg/mL in 50 mM Tris-HCl, pH 7.5, 350 mM NaCl, 1 mM DTT before freezing in liquid nitrogen and storage at −80 °C. The DEL (with 167 billion compounds) was from HitGen. Ni-charged MagBeads were purchased from GenScript (Piscataway, NJ; L00295), salmon sperm DNA was from Sigma (St. Louis, MO; 31149), and all other reagents were from commercial sources.

DEL Selection

Automated selection was carried out using a KingFisher Duo Prime Purification System (ThermoFisher, Waltham, MA) in a 96-well plate.¹⁹ After Ni-charged MagBeads were equilibrated with selection buffer (40 mM Tris, 20 mM MgCl₂, 50 µM DTT, 0.1% Tween-20, 0.3 mg/mL salmon sperm DNA, 10 mM imidazole, pH 7.5), the beads were transferred to a new row and incubated with 3 µM ROCK2 at room temperature (RT) for 30 min in selection buffer. Then, the immobilized protein along with the beads was incubated with the DEL sample in selection buffer at RT for 1 h, followed by nine 1 min washes at RT, each in 100 µL selection buffer. Retained DEL members were recovered by heat elution in elution buffer (40 mM Tris, 20 mM MgCl₂, pH 7.5) at 75 °C for 15 min. For the second round of selection, the heat-eluted output of the first round was used as the input and incubated with fresh protein. After each round, the output was quantified by quantitative PCR. After two rounds, the selection was complete, and the output was amplified by PCR and further purified with the QIAquick PCR Purification Kit (QIAGEN, Hilden, Germany; 28006) according to the manufacturer’s instructions. The PCR-amplified samples were then sequenced on an Illumina HiSeq 2500. SAE selection was performed under similar conditions in solution mode, in which 2.5 µM free protein was incubated with the DEL and then captured by Ni-charged MagBeads in selection buffer (50 mM Tris, 5 mM MgCl₂, 0.2 mM DTT, 0.01% Tween-20, 0.3 mg/mL salmon sperm DNA, 10 mM imidazole, pH 7.5).

Data Analysis

Samples were decoded, and the results were presented as cubes in DataWarrior (developed by Openmolecules), with each dimension representing one cycle of DEL construction. The enrichment of particular cycle(s) of DEL construction was defined as a “feature,” and groups were compared by feature intensity. Feature intensity is the enrichment score, calculated as the “sum of sequence count” of the feature divided by the average of the “sum of sequence count” of all possible parallel features in the library. Background subtraction was performed by removing signals that appeared in the no-target control of a previously performed selection. Enriched features were further examined by structure, and imidazole-like binders were excluded. Promiscuous features were also highlighted and cross-checked against other selections in the HitGen database.
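
The feature intensity definition above can be illustrated with a short sketch. This is not the analysis pipeline used in this study; the function name, the data layout (a dictionary of building-block tuples to sequence counts), and the toy numbers are hypothetical. It assumes that "all possible parallel features" means every combination of building blocks on the fixed cycles, including combinations with no reads.

```python
from collections import defaultdict

def feature_intensity(counts, fixed_axes, n_blocks):
    """Enrichment score for each feature defined by the fixed cycles.

    counts:     dict mapping (bb1, bb2, bb3) -> sequence count
    fixed_axes: indices of the cycles held fixed (two for a line feature,
                one for a plane feature)
    n_blocks:   number of building blocks in each cycle
    """
    # Sum of sequence count for every observed feature
    sums = defaultdict(int)
    for compound, count in counts.items():
        key = tuple(compound[axis] for axis in fixed_axes)
        sums[key] += count

    # Number of all possible parallel features (observed or not)
    n_parallel = 1
    for axis in fixed_axes:
        n_parallel *= n_blocks[axis]

    mean_sum = sum(sums.values()) / n_parallel
    if mean_sum == 0:
        return {}
    return {key: total / mean_sum for key, total in sums.items()}

# Toy three-cycle library with two building blocks per cycle
toy_counts = {(0, 0, 0): 5, (0, 0, 1): 3, (1, 1, 0): 2}
line_scores = feature_intensity(toy_counts, fixed_axes=(0, 1), n_blocks=(2, 2, 2))
print(line_scores)
```

In the toy example, the line with cycles 1 and 2 fixed at (0, 0) collects 8 of the 10 reads across 4 possible lines, so its score is 8 divided by 2.5.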

Results and Discussion

The experimental design of this study is outlined in Table 2. The HitGen library, containing 167 billion DEL compounds, was used for selections against two different soluble protein targets, ROCK2 and SAE. ROCK2 is a serine/threonine-protein kinase involved in the regulation of the cytoskeleton and cell polarity.²⁰ SAE functions as the E1 enzyme in the SUMOylation cascade and mediates the adenosine triphosphate–dependent SUMO ligation.²¹ The input of each DEL molecule was varied among 10⁵, 3.3 × 10⁴, 10⁴, and 10³ copies. After two rounds of selection, the recovered DEL molecules were PCR amplified and sent for sequencing. After decoding and data analysis, the DNA sequences enriched in selection were translated into chemical structures and visualized in DataWarrior. Each axis of the cube represented one cycle of the DEL synthesis; for four-cycle DELs, one of the cycles was fixed in advance in the cube. We use “feature” to denote a cluster enriched after selection: a “line feature” indicates that two cycles of the DEL were fixed and one was variable, whereas a “plane feature” indicates that one cycle was fixed and the other two were variable.

Table 2.

Experiment Design in DNA-Encoded Library Selection.

Sample	Target	Copy Number of Individual Molecules
1-1	ROCK2	10⁵
1-2	ROCK2	3.3 × 10⁴
1-3	ROCK2	10⁴
1-4	ROCK2	10³
2-1	SAE	10⁵
2-2	SAE	3.3 × 10⁴
2-3	SAE	10⁴
2-4	SAE	10³
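
To put the copy numbers in Table 2 in terms of material, the amount of library per selection can be estimated from the library size. The sketch below is a back-of-the-envelope calculation only; the amounts are derived here rather than reported in this study, and it assumes the 167 billion compounds are used as a single pool with equal representation of each member.

```python
AVOGADRO = 6.022e23      # molecules per mole
LIBRARY_SIZE = 167e9     # compounds in the library, as stated in the text

def required_input_nmol(copies_per_member, library_size=LIBRARY_SIZE):
    """Library amount (nmol) needed to reach a given copy number per member."""
    return copies_per_member * library_size / AVOGADRO * 1e9

for copies in (1e5, 3.3e4, 1e4, 1e3):
    print(f"{copies:.1e} copies per member: {required_input_nmol(copies):.2f} nmol")
```

Under these assumptions, 10⁵ copies per member corresponds to about 27.7 nmol of library per selection, whereas 10³ copies corresponds to about 0.28 nmol.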

As the selection of these two targets has been done previously at HitGen and validated active hits have been found from the HitGen library, our data analysis mainly focused on the identification of features representing the validated active hits.

For ROCK2, the intensity of one strong feature (Fig. 1A), in which several active hits with IC₅₀ values ranging from 50 nM to 860 nM in enzymatic assays were identified, decreased as the input decreased, with enrichment scores of 658.55, 548.49, 255.2, and 7.31 for the 10⁵, 3.3 × 10⁴, 10⁴, and 10³ inputs, respectively. This feature was still reproducible at the lowest input in our design, 10³. For one weak feature (Fig. 1B), in which active hits with an IC₅₀ of about 7 µM were identified, we were able to identify the feature with the 10⁴ input but failed to observe it with the 10³ input; the feature intensity was 546.74, 79.56, 71.38, and 4.17 for the four groups, respectively. Further analysis of these features revealed their corresponding molecules to be typical kinase hinge binders. Direct comparison of feature intensity supported the conclusion that strong features were reproducible at the 10³ input whereas weaker features were not. All validated features that showed up in the 10⁵ input group were reproducible in the 3.3 × 10⁴ and 10⁴ groups, indicating that the input could be decreased to 10⁴ in selection without losing significant enrichment signals.

Figure 1.

DataWarrior cube view of active features for the ROCK2 and SAE targets. (A, B) Features representing the binding of DNA-encoded library (DEL) compounds with tens to hundreds of nanomolar (A) or micromolar (B) potency for ROCK2. (C) Feature representing the binding of DEL compounds with micromolar potency for SAE. (D) A strong feature indicating the binding of compounds to SAE; the binding affinity of compounds in this feature was not determined.

For SAE, data analysis identified one plane feature containing active hits with IC₅₀ values of about 1 to 9 µM in enzymatic assays. This feature was readily reproducible, and its intensity decreased as the input decreased. When the input was lowered to 10⁴, the feature became very weak, and we failed to observe it at 10³ (Fig. 1C). The feature intensity of the four groups was 58.13, 20.12, 1.93, and 0.00, respectively. Another strong feature (Fig. 1D) was observed and exhibited the same trend, with feature intensities of 241.5, 101.32, 21.99, and 0.00, respectively.
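
One simple way to compare how the four validated features respond to lower input is to express each intensity as a fraction of its value in the 10⁵ sample. The sketch below uses only the feature intensities reported in the text; the normalization itself is an illustrative choice made here, not part of the analysis performed in this study.

```python
INPUTS = ("1e5", "3.3e4", "1e4", "1e3")

# Feature intensities reported in the text, ordered as in INPUTS
INTENSITIES = {
    "ROCK2 strong (Fig. 1A)": (658.55, 548.49, 255.2, 7.31),
    "ROCK2 weak (Fig. 1B)": (546.74, 79.56, 71.38, 4.17),
    "SAE plane (Fig. 1C)": (58.13, 20.12, 1.93, 0.00),
    "SAE strong (Fig. 1D)": (241.5, 101.32, 21.99, 0.00),
}

def fraction_retained(values):
    """Intensity at each input relative to the highest-input sample."""
    reference = values[0]
    return [value / reference for value in values]

for name, values in INTENSITIES.items():
    fractions = fraction_retained(values)
    summary = ", ".join(f"{inp}: {frac:.1%}" for inp, frac in zip(INPUTS, fractions))
    print(f"{name} -> {summary}")
```

Viewed this way, the strong ROCK2 feature keeps roughly 39% of its intensity at the 10⁴ input, whereas the SAE plane feature keeps only about 3%.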

Further analysis of all other features from the two selections described above indicated that, for ROCK2, the top 100 features in the 10⁵ sample were reproducible and well correlated in the 3.3 × 10⁴ and 10⁴ samples. However, seven features did not show up in the 10³ sample, and 49 features had an enrichment score below 4, which is very weak and gives little information regarding chemical diversity (Fig. 2A). Because fewer and weaker features were identified for SAE, the top 50 features were analyzed. Good correlation was found between the 10⁵ and 3.3 × 10⁴ samples, with five features having an intensity below 4 in the 3.3 × 10⁴ sample. In the 10⁴ sample, 5 of the 50 features exhibited a feature intensity greater than 4, and 23 features were not identified. In the 10³ input sample, no correlation was found, and most features disappeared (Fig. 2B).
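
The bookkeeping behind these counts can be sketched as follows. The threshold of 4 is the value used in the text to mark a very weak feature; the function name, the category labels, and the example data are hypothetical and serve only to show the tally.

```python
WEAK_THRESHOLD = 4.0  # enrichment score below which a feature is considered very weak

def tally_features(reference_top, sample_scores):
    """Classify top features of a reference sample by their score in another sample.

    reference_top: feature identifiers ranked highest in the reference (e.g., 1e5) sample
    sample_scores: dict mapping feature identifier -> enrichment score in the
                   lower-input sample; absent features count as not identified
    """
    tally = {"not identified": 0, "weak": 0, "retained": 0}
    for feature in reference_top:
        score = sample_scores.get(feature, 0.0)
        if score == 0:
            tally["not identified"] += 1
        elif score < WEAK_THRESHOLD:
            tally["weak"] += 1
        else:
            tally["retained"] += 1
    return tally

# Hypothetical data: three top features, one lost and one weak at lower input
example = tally_features(["f1", "f2", "f3"], {"f1": 10.0, "f2": 2.5})
print(example)
```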

Figure 2.

Feature intensity correlation. The top 100 features in ROCK2 (A) and the top 50 features in SAE (B) were compared and analyzed.

In our design, SAE was chosen to represent targets with relatively weak features, whereas ROCK2 represented targets with relatively strong features, based on past DEL screening practice at HitGen. The selections for the two targets were performed in the two modes typically used in DEL selection: immobilization mode and in-solution mode. The ROCK2 selection was done in immobilization mode; that is, the target protein was immobilized on Ni-charged MagBeads before incubation with the DEL molecules. For SAE, free protein was incubated with the DEL molecules, and then the protein along with the DEL binders was immobilized on the Ni-charged MagBeads. The two targets thus covered two different selection modes and both strong and weak features, allowing a more generalized conclusion on DEL selection.

Reduction of input may have an impact on the discrimination between hits and promiscuous binders. We examined the promiscuous binders in our database and found that their overall feature intensity was lower than that of real features. Reducing the input decreased both signals; however, because the promiscuous binders started from a lower intensity, their signal decreased faster than that of the real features. For targets in which the promiscuous binders are stronger than the real features, the trend may be different, but more cases are needed to confirm this.

In our test, we failed to identify some positive binders with 10³ copies of DEL molecules. In the published literature,⁹ hundreds of copies of each DEL molecule were used as the input in selection, active signals were enriched, and binding of the compounds was confirmed. The major reason for this difference is that the DEL in that study was a peptide library, and most active hits exhibited more than 50% binding to the target protein. Peptides may have higher binding affinity than small molecules from the HitGen library; in addition, only compounds with very high binding potency can be identified in such a low-input design.

It has been reported that selection performance dropped dramatically when the input was below 10⁵, and that an input of at least 10⁵ (and preferably 10⁶) should be used to ensure successful DEL screening.¹³ In that study, the selection was performed with carbonic anhydrase IX as the target, and different ligands with known affinity were conjugated with DNA tags and screened along with a DEL of 360,000 compounds to serve as positive controls. The major difference between the conclusions of the previous study and ours is that the ligands were spiked in and identified as single molecules in the previous article; however, in large-scale DEL selection, such as in our case (more than 100 billion compounds), the library is constructed by the “split and pool” strategy, in which the structures vary along the dimensions of the building blocks. In our study, the reproducibility of signals among different inputs was assessed not for individual compounds but for features, that is, the enrichment of a cluster of compounds with chemical similarities. For a single compound, a high sequence count is required for the compound to stand out among all background signals; however, enrichment along a certain dimension of building blocks stands out as lines or planes, which are much easier to identify in cube-style figures. In our case, an input of less than 10⁵ is therefore also compatible with feature identification.

On the other hand, DEL structures and selection strategies are nonnegligible factors affecting the result. In our case, the DELs were constructed by the “split and pool” strategy, with small molecules attached to double-stranded DNA tags, and the DNA served purely as a barcode for compound identification. The binding process is determined by the reversible interaction of the small molecules with the targets. In some studies, the DEL molecules carried covalent warheads²² or cross-linking ligands,²³ or the target proteins were conjugated with complementary DNA tags²⁴; the selection strategies in these cases were very different, and our conclusion does not apply.

In most publications, inputs of 10⁵ copies or greater were used in the selection. Theoretically, a higher input results in higher signals and more detail in the feature distribution. However, when a DEL with billions of compounds is screened, the cost of library preparation, solubility, and viscosity are fundamental factors to be considered. To our knowledge, our study is the first report to use a DEL of billion-scale size to explore the lower limit of individual DEL molecules required in selection. For targets with reported strong binders and well-known druggable pockets, a more stringent potency criterion is usually required. In such cases, as with the ROCK2 target here, lowering the input to 10³ to 10⁴ is feasible and can filter out the weak binders. For targets with few reported binders and flat or less druggable pockets, a higher input (no less than 3.3 × 10⁴) should be used. In summary, the input can be adjusted according to the target and the screening purpose. In general practice, we recommend using no less than 3.3 × 10⁴ copies in selection, which allows for the identification of both strong and weak binders while retaining all the details and possible structure-activity relationship information in the enriched features.

Footnotes

Acknowledgements

We thank Dr. Wei Cui for critical review of this article.

Declaration of Conflicting Interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All authors are employed by HitGen Inc., and their research and authorship of this article was completed within the scope of their employment with HitGen Inc.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Qiuxia Chen

References

1. Brenner; Lerner, R. A. Encoded Combinatorial Chemistry. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 5381–5383.

2. Gartner, Z. J. DNA-Templated Organic Synthesis and Selection of a Library of Macrocycles. Science 2004, 305, 1601–1605.

3. Buller; Mannocci; Scheuermann; et al. Drug Discovery with DNA-Encoded Chemical Libraries. Bioconjug. Chem. 2010, 21, 1571–1580.

4. Clark, M. A.; Acharya, R. A.; Arico-Muendel, C. C.; et al. Design, Synthesis and Selection of DNA-Encoded Small-Molecule Libraries. Nat. Chem. Biol. 2009, 5, 647–654.

5. Mannocci; Zhang; Scheuermann; et al. High-Throughput Sequencing Allows the Identification of Binding Molecules Isolated from DNA-Encoded Chemical Libraries. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 17670–17675.

6. Buller; Steiner; Scheuermann; et al. High-Throughput Sequencing for the Identification of Binding Molecules from DNA-Encoded Chemical Libraries. Bioorg. Med. Chem. Lett. 2010, 20, 4188–4192.

7. Harris, P. A.; Berger, S. B.; Jeong, J. U.; et al. Discovery of a First-in-Class Receptor Interacting Protein 1 (RIP1) Kinase Specific Clinical Candidate (GSK2982772) for the Treatment of Inflammatory Diseases. J. Med. Chem. 2017, 60, 1247–1261.

8. Belyanskaya, S. L.; Ding; Callahan, J. F.; et al. Discovering Drugs with DNA-Encoded Library Technology: From Concept to Clinic with an Inhibitor of Soluble Epoxide Hydrolase. ChemBioChem 2017, 18, 837–842.

9. Zhu; Shaginian; Grady, L. C.; et al. Design and Application of a DNA-Encoded Macrocyclic Peptide Library. ACS Chem. Biol. 2018, 13, 53–59.

10. Halford. How DNA-Encoded Libraries Are Revolutionizing Drug Discovery with the Bar-Coding Technology, Drugmakers Leverage the Chemistry of Large Numbers. Chem. Eng. News 2017, 28–33.

11. Zimmermann; Scheuermann; et al. Quantitative PCR Is a Valuable Tool to Monitor the Performance of DNA-Encoded Chemical Library Selections. ChemBioChem 2017, 18, 848–852.

12. Neri; Lerner, R. A. DNA-Encoded Chemical Libraries: A Selection System Based on Endowing Organic Compounds with Amplifiable Information. Annu. Rev. Biochem. 2018, 87, 479–502.

13. Sannino; Gabriele; Bigatti; et al. Quantitative Assessment of Affinity Selection Performance by Using DNA-Encoded Chemical Libraries. ChemBioChem 2019, 20, 955–962.

14. Deng; O’Keefe; Davie, C. P.; et al. Discovery of Highly Potent and Selective Small Molecule ADAMTS-5 Inhibitors That Inhibit Human Cartilage Degradation via Encoded Library Technology (ELT). J. Med. Chem. 2012, 55, 7061–7079.

15. Kollmann, C. S.; Bai; Tsai, C.-H.; et al. Application of Encoded Library Technology (ELT) to a Protein–Protein Interaction Target: Discovery of a Potent Class of Integrin Lymphocyte Function-Associated Antigen 1 (LFA-1) Antagonists. Bioorg. Med. Chem. 2014, 22, 2353–2365.

16. Graybill, T. L.; Zeng; et al. Cell-Based Selection Expands the Utility of DNA-Encoded Small-Molecule Library Technology to Cell Surface Drug Targets: Identification of Novel Antagonists of the NK3 Tachykinin Receptor. ACS Comb. Sci. 2015, 17, 722–731.

17. Ding; O’Keefe; DeLorey, J. L.; et al. Discovery of Potent and Selective Inhibitors for ADAMTS-4 through DNA-Encoded Library Technology (ELT). ACS Med. Chem. Lett. 2015, 6, 888–893.

18. Ahn; Kahsai, A. W.; Pani; et al. Allosteric “Beta-Blocker” Isolated from a DNA-Encoded Small Molecule Library. Proc. Natl. Acad. Sci. U.S.A. 2017, 114, 1708–1713.

19. Decurtins; Wichert; Franzini, R. M.; et al. Automated Screening for Small Organic Ligands Using DNA-Encoded Chemical Libraries. Nat. Protoc. 2016, 11, 764–780.

20. Amin; Dubey, B. N.; Zhang, S.-C.; et al. Rho-Kinase: Regulation, (Dys)Function, and Inhibition. Biol. Chem. 2013, 394, 1399–1410.

21. Lois, L. M.; Lima, C. D. Structures of the SUMO E1 Provide Mechanistic Insights into SUMO Activation and E2 Recruitment to E1. EMBO J. 2005, 24, 439–451.

22. Zhu; Grady, L. S. C.; Ding; et al. Development of a Selection Method for Discovering Irreversible (Covalent) Binders from a DNA-Encoded Library. SLAS Discov. 2019, 24, 169–174.

23. Denton, K. E.; Krusemark, C. J. Crosslinking of DNA-Linked Ligands to Target Proteins for Enrichment from DNA-Encoded Libraries. Medchemcomm 2016, 7, 2020–2027.

24. Chan, A. I.; McGregor, L. M.; Liu, D. R. Novel Selection Methods for DNA-Encoded Chemical Libraries. Curr. Opin. Chem. Biol. 2015, 26, 55–61.