Targeting RNA with Small Molecules: Identification of Selective, RNA-Binding Small Molecules Occupying Drug-Like Chemical Space

Abstract

Although the potential value of RNA as a target for new small molecule therapeutics is becoming increasingly credible, the physicochemical properties required for small molecules to selectively bind to RNA remain relatively unexplored. To investigate the druggability of RNAs with small molecules, we have employed affinity mass spectrometry, using the Automated Ligand Identification System (ALIS), to screen 42 RNAs from a variety of RNA classes, each against an array of chemically diverse drug-like small molecules (~50,000 compounds) and functionally annotated tool compounds (~5100 compounds). The set of RNA–small molecule interactions that was generated was compared with that for protein–small molecule interactions, and naïve Bayesian models were constructed to determine the types of specific chemical properties that bias small molecules toward binding to RNA. This set of RNA-selective chemical features was then used to build an RNA-focused set of ~3800 small molecules that demonstrated increased propensity toward binding the RNA target set. In addition, the data provide an overview of the specific physicochemical properties that help to enable binding to potential RNA targets. This work has increased the understanding of the chemical properties that are involved in small molecule binding to RNA, and the methodology used here is generally applicable to RNA-focused drug discovery efforts.

Keywords

RNA, noncoding RNA, drug screening, drug target, mass spectrometry

Introduction

The importance and functional role of RNA is becoming increasingly apparent. Although only 1%–2% of the human genome encodes proteins, 70%–90% is transcribed into RNA.¹ The remaining noncoding RNA (ncRNA) has been implicated in regulating transcription, translation, RNA modification, chromatin structure, and mRNA stability across biological processes.² RNAs have been implicated in a variety of human diseases, and many RNAs form complex three-dimensional structures, making them possible targets for small molecules.³ To date, various classes of ncRNA have been targeted by small molecules.⁴ Even so, the limited number of known RNA-targeting small molecules has left a dearth of knowledge about the chemical matter needed for RNA-targeted drug discovery.

Drug discovery has traditionally focused on the intersection of target- and phenotype-based approaches. While both approaches are necessary, a target-based approach enables rapid medicinal chemistry optimization of potency and selectivity by first pairing chemical matter with a validated target.⁵ The modulation of disease phenotype is independent of and (in target-based approaches) subsequent to defining a structure–activity relationship (SAR). Here, we take a target-based approach to identify small molecule binders to RNA targets, using the Automated Ligand Identification System (ALIS). ALIS is an affinity-selection mass spectrometry platform for the high-throughput screening (HTS) of small molecules binding to macromolecules⁶ that has recently been validated for identifying small molecule–RNA interactions, using a variety of bacterial riboswitches and their known ligands as test cases.^7,8 ALIS is an “indirect” affinity-selection mass spectrometry technique: a size-exclusion chromatography (SEC) step isolates the target–ligand complex from unbound components, and liquid chromatography–mass spectrometry (LC-MS) then releases and identifies the ligand. Indirect approaches differ from direct approaches, which require detection of the target–ligand complex in the gas phase by the mass spectrometer. Direct approaches can provide definitive evidence of complex formation, but interactions in the gas phase may not always reflect biological interactions that occur in solution.⁹ Indirect approaches such as ALIS avoid this issue of gas-phase detection and validation and are generally applicable to a wide variety of targets.⁹

We report the use of the ALIS platform for the identification of small molecule RNA binders across a broad range of disease-relevant, structured RNAs. First, 42 RNA targets were identified from the literature that represented various therapeutic areas and RNA classes, such as bacterial and viral ncRNA elements, mammalian lncRNA, structural elements in the 3′ or 5′ untranslated region (UTR) of mammalian mRNA, G-quadruplexes, domains of ncRNA known to bind to RNA-binding proteins for function, RNA repeat elements, small nucleolar RNA (snoRNA), and noncoding splice variants. Each of these RNA targets was screened against a Diversity Library (~50,000 chemically diverse compounds) and our internal tool compounds¹⁰ (herein called “Functionally Annotated Library”; ~5100 compounds intended for phenotypic screening). This approach was distinct from “library versus library” screening, as done using the Inforna method, in that it did not require knowledge of RNA folding and only required that small molecule libraries for testing meet general compatibility standards for LC-MS detection.¹¹ Machine learning based on the initial set of RNA binders from the primary screen was used to generate a compound collection enriched in RNA-binding properties (~4000 compounds), as shown by a higher hit rate for binding to RNA in subsequent screens. From this unprecedented set of small molecule–RNA-binding data, we analyzed the physicochemical properties and chemical features associated with selective RNA binding and contrasted these with those of protein-targeting compounds. We anticipate that lessons learned, such as features enriched in RNA-binding compounds, will facilitate the further identification of small molecules targeting RNA for therapeutic applications and drug discovery.

Materials and Methods

Selection of RNA

Human genome annotation (ENSEMBL) was screened for single-nucleotide polymorphisms (SNPs) mapping within ncRNA transcripts. Transcripts were prioritized for further study based on (1) an SNP linked to a human disease or condition through genome-wide association studies (GWAS); (2) literature, histone-mark, or RNA-seq evidence of expression of a discrete RNA or RNA domain of <2 kb from the locus; (3) literature evidence of expression of a functional RNA motif; and (4) literature evidence of an in vitro or in vivo assay for function of an RNA expressed from the locus. In addition, a diversity set of structured RNAs of disease relevance was selected by curation, including viral and bacterial regulatory RNAs, transcripts from candidate RNAopathies with Mendelian inheritance, and known regulatory motifs within human mRNA UTRs.

Preparation of RNA

RNA transcripts were prepared through in vitro transcription by Life Technologies (Carlsbad, CA) or generated by collaborators (Harvard Medical School) using the AmpliScribe T7 High Yield Transcription Kit (Lucigen Corporation, Middleton, WI) as described¹² and further purified by gel filtration in a Superdex 200 Increase 10/300 GL column.¹³ Transcript integrity and purity were verified using LabChip GX Touch/GX Touch HT (CLS137031, PerkinElmer, Inc., Branford, CT). Before ALIS screening, each RNA was prepared at 10 µM in annealing buffer (10 mM Tris, pH 7.4, 137 mM NaCl, 27 mM KCl, 2 mM MgCl₂) and annealed by heating to 95 °C for 5 min and then cooling to 25 °C at 3 °C/min in a thermocycler. Preliminary experiments were done to verify annealing conditions as previously reported^7,8 (i.e., MgCl₂ did not degrade the RNA during annealing; control ligands were optimized in annealing conditions), and an example is shown in Supplemental Figure S1. Known small molecule binders or cognate ligands were used as positive controls for RNA conformation where applicable.⁷

ALIS Experiments

ALIS Configuration

The ALIS 2D LC-MS system configuration used in these studies has been previously described.^{6–9,14–16} Samples were prepared and equilibrated as described below. Samples were placed into the ALIS system autosampler and chilled to 4 °C. SEC (column dimensions: 2.1 mm ID × 50 mm length, packed with proprietary media) was performed at 4 °C column temperature using 700 mM ammonium acetate (NH₄OAc) running buffer at pH 7.5. An isocratic pump (G1310A, Agilent, Santa Clara, CA) fitted with an online degasser (G1322A, Agilent) was used for eluent delivery at 300 µL/min for a ~20 s chromatography run. Then, a UV detector (G1314A, Agilent, with a G1313 microflow cell) was used to analyze the SEC eluent for RNA–ligand complexes at 230 nm, and a valving system directed the complex to a reverse phase chromatography (RPC) column (Targa-C18, Higgins Analytical, Mountain View, CA; column dimensions: 0.5 mm ID × 50 mm length, 5 μm packing), allowing for direct coupling of the SEC and RPC separations. Ligands were dissociated from the complex (low pH, 40 °C column temperature) and eluted into the mass spectrometer using a gradient of 0%–90% acetonitrile in water (0.2% formic acid) over 2.5 min using a capillary binary pump (G1376A, Agilent) for eluent delivery at 20 µL/min. A mass spectrometer (Exactive Orbitrap, Thermo Scientific, Waltham, MA) was utilized for detection, providing high resolution (100,000 resolving power) and high-accuracy m/z detection (mass error <5 ppm without internal calibration) and allowing for exact mass and formula confirmation for the previously bound compounds.
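The exact-mass confirmation step amounts to comparing each observed m/z with the expected m/z of pool members and accepting matches inside the instrument's mass-error window. The sketch below is a minimal illustration of that comparison only, assuming the expected m/z values (including the ion form, e.g., [M+H]⁺) are already known; the function names and m/z values are hypothetical, and only the 5 ppm tolerance comes from the text.

```python
# Sketch only: function names and example m/z values are hypothetical.
PPM_TOLERANCE = 5.0  # mass error bound quoted in the text (<5 ppm)


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_library(observed_mz: float, library: dict[str, float],
                  tol_ppm: float = PPM_TOLERANCE) -> list[str]:
    """Return IDs of library entries whose expected m/z lies within tol_ppm."""
    return sorted(cid for cid, mz in library.items()
                  if abs(ppm_error(observed_mz, mz)) <= tol_ppm)


# Hypothetical pool members: cpd_A is ~1.5 ppm away, cpd_B is ~5.1 ppm away.
pool = {"cpd_A": 195.0877, "cpd_B": 195.0890}
print(match_library(195.0880, pool))  # ['cpd_A']
```

A ppm window scales with mass, which is why it is preferred over a fixed tolerance in daltons when the pool spans a wide mass range.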

Sample Preparation for High-Throughput Mixture-Based ALIS Screening against Large Small Molecule Libraries

Compounds for screening were pooled into mixtures of 500 compounds at a concentration of 20 μM/compound in DMSO and then diluted to 1 μM/compound in the above annealing buffer. The compound mixtures (3 µL) were combined with the previously annealed RNA (3 µL of 10 μM) and equilibrated by incubation at room temperature for 30 min. Two rounds of mixture-based samples were run in ALIS with an injection volume of 2.0 µL/injection. Invertase served as a control target to measure nonspecific binding. Compounds yielding reproducible binding in both rounds and not producing signal in the invertase counterscreen were considered ALIS hits. Binding of hits was confirmed in ALIS as single, pure compounds.
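Working through the volumes above, each compound ends up at 0.5 µM and the RNA at 5 µM in the 6 µL equilibration mixture, a 10-fold RNA excess over each individual compound and roughly 250 µM total compound per pool (assuming volumes are additive). The hit rule can be written as set operations. This is a minimal sketch; the function names and compound IDs are hypothetical.

```python
# Sketch of the sample arithmetic and hit-calling rule described above.
# Volumes and concentrations come from the text; names and compound IDs are hypothetical.


def after_mixing(conc_uM: float, vol_uL: float, other_vol_uL: float) -> float:
    """Concentration of a species after adding other_vol_uL of a solution that lacks it."""
    return conc_uM * vol_uL / (vol_uL + other_vol_uL)


COMPOUNDS_PER_POOL = 500
per_compound_uM = after_mixing(1.0, 3.0, 3.0)  # 3 uL of 1 uM/compound mix + 3 uL RNA
rna_uM = after_mixing(10.0, 3.0, 3.0)          # 3 uL of 10 uM RNA + 3 uL compound mix
total_compound_uM = per_compound_uM * COMPOUNDS_PER_POOL


def call_hits(round1: set[str], round2: set[str], invertase: set[str]) -> set[str]:
    """Detected in both replicate rounds and absent from the invertase counterscreen."""
    return (round1 & round2) - invertase


print(per_compound_uM, rna_uM, total_compound_uM)  # 0.5 5.0 250.0
print(call_hits({"c1", "c2", "c3"}, {"c2", "c3", "c4"}, {"c3"}))  # {'c2'}
```

Compounds returned by `call_hits` are only candidates; as the text notes, each was then reconfirmed as a single, pure compound.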

Proteins analyzed in the high-throughput Protein-Array ALIS (PA-ALIS) approach were prepared in orthogonal mixtures of five proteins each at 5 μM individual concentration.^5,17 Each target was screened in duplicate as two “orthogonal pools” in which each target was mixed with different partner proteins. This approach enabled hits that selectively bind to a single target to be identified, because a selective binder appears in both orthogonal pools containing that target and in no other pool.
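The orthogonal-pool logic can be expressed as a lookup: every target maps to a unique pair of pools, so a compound detected in exactly that pair (and nowhere else) can be assigned to that target. The sketch below assumes a toy layout of two proteins per pool rather than the five used in the text; pool names, targets, and compound IDs are hypothetical.

```python
# Sketch: deconvoluting hits from orthogonal protein pools.
# Pool layout, target names, and compound IDs are hypothetical (text uses five proteins per pool).


def assign_selective_binders(pools: dict[str, set[str]],
                             detections: dict[str, set[str]]) -> dict[str, str]:
    """Map a compound to a single target when it was detected in exactly the
    pools containing that target and in no other pool."""
    pools_by_target: dict[str, set[str]] = {}
    for pool_id, targets in pools.items():
        for target in targets:
            pools_by_target.setdefault(target, set()).add(pool_id)
    assignments = {}
    for cpd, hit_pools in detections.items():
        candidates = [t for t, tp in pools_by_target.items() if tp == hit_pools]
        if len(candidates) == 1:  # orthogonal design: no two targets share both pools
            assignments[cpd] = candidates[0]
    return assignments


pools = {"P1": {"A", "B"}, "P2": {"C", "D"}, "P3": {"A", "C"}, "P4": {"B", "D"}}
# x: both pools containing A; y: only one pool; z: every pool (multi-target binder)
detections = {"x": {"P1", "P3"}, "y": {"P1"}, "z": {"P1", "P2", "P3", "P4"}}
print(assign_selective_binders(pools, detections))  # {'x': 'A'}
```

Compounds such as `y` (seen in a single pool) or `z` (seen everywhere) are left unassigned rather than guessed, which mirrors the selectivity requirement in the text.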

Construction of Naïve Bayes Models for RNA Binding and the RNA-Focused Library

Naïve Bayes models were built for binders of target RNAs detected in the initial screening collection using calculated physicochemical, topological (ECFP4¹⁸), and biological (HTSFP¹⁹) descriptors in Pipeline Pilot v17.0. HTSFPs were assembled using calculated z scores from 344 in-house assays. Model accuracy was assessed by calculating the area under the receiver operating characteristic curve (AUC ROC) after leave-one-out cross-validation (Suppl. Fig. S2). From the initial analysis, the ECFP4 naïve Bayes model was deemed suitable for selecting additional compounds from our company’s screening collection. We also used topological nearest neighbors (ECFP4) and biological nearest neighbors (HTSFP). Specifically, to assemble the RNA-focused screening compound collection, molecules were selected from our company’s compound collection using three methods, followed by filtering based on availability: (1) 10 nearest-neighbor molecules were selected for each detected RNA binder based on calculated ECFP4-based Tanimoto similarity, followed by a diverse selection (ECFP4 descriptors, “Diverse Molecules” component as implemented in Pipeline Pilot); (2) 10 nearest-neighbor molecules were selected for each detected RNA binder based on the Tanimoto similarity of HTSFPs (minimum 50 assays in common); and (3) the top-scoring compounds from the ECFP4-based naïve Bayes model were selected, followed by a diverse selection (ECFP4 descriptors, “Diverse Molecules” component as implemented in Pipeline Pilot).
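Nearest-neighbor expansion (method 1 above) reduces to ranking collection compounds by Tanimoto similarity to each binder and taking the top 10. The sketch below assumes fingerprints are already available as sets of integer feature IDs (in open-source tooling, RDKit Morgan fingerprints with radius 2 are a common stand-in for ECFP4); it does not reproduce the HTSFP comparison or the Pipeline Pilot “Diverse Molecules” step, and all compound IDs are hypothetical.

```python
import heapq


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of feature IDs."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(query: frozenset, collection: dict[str, frozenset], k: int = 10):
    """Top-k (similarity, compound_id) pairs from the collection, most similar first."""
    return heapq.nlargest(k, ((tanimoto(query, fp), cid) for cid, fp in collection.items()))


def expand_by_similarity(binders: dict[str, frozenset],
                         collection: dict[str, frozenset], k: int = 10) -> set[str]:
    """Union of the k nearest neighbors of every binder (before any diversity filtering)."""
    picked = set()
    for fp in binders.values():
        picked.update(cid for _, cid in nearest_neighbors(fp, collection, k))
    return picked


# Hypothetical collection and binders
collection = {"a": frozenset({1, 2, 3}), "b": frozenset({1, 2}), "c": frozenset({7})}
binders = {"hit1": frozenset({1, 2, 3}), "hit2": frozenset({7, 8})}
print(expand_by_similarity(binders, collection, k=1))  # {'a', 'c'} (set order may vary)
```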

Additional naïve Bayes models classifying selective binding to RNA targets were built from detected binders, where selective binding was defined as binding a single RNA target rather than multiple targets. For experimental practicality, the RNA-Focused Library was derived from primary hits against a subset of 32 of the original 42 RNA targets in the original screen.

Construction of Naïve Bayes Models to Compare RNA and Protein Binders

In order to compare the chemical properties enriched for binders of RNA versus protein targets, we built naïve Bayes models classifying binding to proteins or binding to RNAs using calculated physicochemical and topological descriptors. Protein-binding data for each compound were assembled from historical screens performed using the ALIS platform.^5,17 Each model output a probability weight for every learning feature present in the observed data, normalized by the total occurrences of that feature across the training dataset.¹⁷
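Pipeline Pilot’s Bayesian learner is generally described as using a Laplacian-corrected estimator, in which a feature’s weight compares the binders observed among compounds containing it (A_F) with the count expected from its total occurrences (T_F) and the overall binder rate p: weight(F) = log((A_F + 1) / (T_F·p + 1)). The sketch below implements that form as an illustration; whether it matches the exact normalization behind the weights reported here is an assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def laplacian_weights(feature_sets, labels):
    """Per-feature log weights, log((A_F + 1) / (T_F * p + 1)) (assumed formulation).

    feature_sets: one iterable of feature IDs per compound
    labels: 1 for binder, 0 for nonbinder
    T_F = compounds containing F, A_F = binders containing F, p = overall binder fraction.
    """
    p = sum(labels) / len(labels)
    total, active = Counter(), Counter()
    for feats, y in zip(feature_sets, labels):
        for f in set(feats):
            total[f] += 1
            active[f] += y
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}


def score(features, weights):
    """Compound score: sum of its feature weights (features unseen in training add 0)."""
    return sum(weights.get(f, 0.0) for f in set(features))


# Toy data: "x" only in binders, "y" only in nonbinders, "z" everywhere (uninformative)
w = laplacian_weights([{"x", "z"}, {"x", "z"}, {"y", "z"}, {"y", "z"}], [1, 1, 0, 0])
print(round(score({"x", "z"}, w), 3))  # 0.405
```

Positive weights mark features enriched among binders, negative weights features enriched among nonbinders, and a feature present at the background rate scores zero.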

PCA of Protein and RNA Binders

Principal component analysis (PCA) based on physicochemical properties can aid in understanding relationships between sets of compounds.²⁰ We trained a PCA using 14 physicochemical properties as descriptors (molecular polar surface area, molecular weight, number of atoms, number of positive atoms, number of negative atoms, number of rotatable bonds, number of rings, number of aromatic rings, number of ring assemblies, number of stereocenters, number of hydrogen bond donors, number of hydrogen bond acceptors, fraction sp³, ALogP), as previously described,²¹ and compared ALIS RNA binders, ALIS protein binders, and literature multivalent or small molecule RNA binders from R-BIND²² with a subset of approved drugs previously described.²¹
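A descriptor-based PCA of this kind standardizes each of the 14 descriptor columns, finds the leading eigenvectors of their covariance matrix, and projects every compound set onto those axes. The dependency-free sketch below assumes the PCA is fit on a reference set (for example, the approved-drug subset) and that other sets are projected with the same scaling; the actual model was built in Pipeline Pilot, and in practice a library such as scikit-learn would typically be used. The function names are hypothetical.

```python
import math
import random


def fit_scaler(rows):
    """Column means and sample standard deviations of the reference descriptor matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(m)]
    return means, sds


def apply_scaler(rows, means, sds):
    """Z-score rows with the reference statistics; constant columns map to 0."""
    return [[(x - mu) / sd if sd > 0 else 0.0 for x, mu, sd in zip(r, means, sds)] for r in rows]


def covariance(z):
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(m)] for i in range(m)]


def top_components(cov, k=2, iters=500, seed=0):
    """Leading (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    m = len(cov)
    c = [row[:] for row in cov]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(m)) for i in range(m))
        comps.append((lam, v))
        # Remove the found component so the next pass converges to the next-largest one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return comps


def project(z, comps):
    """PC scores (PC1, PC2, ...) for each standardized row."""
    return [[sum(x * vj for x, vj in zip(row, v)) for _, v in comps] for row in z]
```

Usage: `means, sds = fit_scaler(drug_rows)`, then `comps = top_components(covariance(apply_scaler(drug_rows, means, sds)))`, and project binder sets with `project(apply_scaler(binder_rows, means, sds), comps)` (the row lists are hypothetical inputs). Because each eigenvector’s sign is arbitrary, PC axes may appear mirrored relative to another tool’s output.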

PCA of RNA and Protein Targets

Dimensionality reduction starting with features extracted from naïve Bayes models of binders versus nonbinders has previously been used to visualize and cluster target families.²³ In order to compare RNA versus protein targets based on the chemical matter that binds them, we trained a PCA using ECFP4 descriptors as features. A multicategory model was built in Pipeline Pilot using protein and RNA targets, binders, and nonbinders as categories, and ECFP4 as descriptors. Features that were present ≥20 times and that had a normalized probability ≥1.5 (963 features total) were used to train the PCA.
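The feature filter and the resulting target-by-feature matrix might look like the sketch below. It assumes the per-target model output is available as (occurrence count, normalized probability) pairs and that a feature is kept if it passes both thresholds for at least one target; that reading of the criterion, the data layout, and all names are assumptions. Each row of the returned matrix describes one target and can then be fed to a PCA.

```python
MIN_COUNT = 20       # feature present >= 20 times (from the text)
MIN_NORM_PROB = 1.5  # normalized probability >= 1.5 (from the text)


def select_features(stats, min_count=MIN_COUNT, min_prob=MIN_NORM_PROB):
    """stats: dict target -> dict feature -> (count, normalized_probability).
    Keep a feature if it passes both thresholds for at least one target (assumed reading)."""
    keep = set()
    for per_target in stats.values():
        for feature, (count, prob) in per_target.items():
            if count >= min_count and prob >= min_prob:
                keep.add(feature)
    return sorted(keep)


def target_matrix(stats, features):
    """Rows: targets (sorted); columns: selected features; features absent for a target score 0.0."""
    targets = sorted(stats)
    rows = [[stats[t].get(f, (0, 0.0))[1] for f in features] for t in targets]
    return targets, rows


# Hypothetical model output for one RNA and one protein target
stats = {"RNA1": {"f1": (25, 2.0), "f2": (5, 3.0)},
         "P1": {"f2": (30, 1.2), "f3": (40, 1.6)}}
features = select_features(stats)  # f2 fails for both targets
print(features, target_matrix(stats, features))
```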

Results

Initial ALIS Screening and Building of a Small Molecule RNA Dataset

A set of 42 RNA targets (Fig. 1A, Suppl. Table S1) from nine different RNA categories, ranging from 75 to 2222 nucleotides in size, was identified. Each RNA target was transcribed in vitro, diluted to 10 µM, and refolded in a buffer containing monovalent and divalent ions to promote RNA folding.^24,25 Our previous work with riboswitches had shown that the buffer and RNA annealing conditions used here were optimal for the detection of ligand binding in the ALIS system.^7,8 Rather than demonstrate binding selectivity to a given target using a sequence-scrambled control RNA, as was done previously,⁷ we chose to compare compound–target binding across the complete set of RNA targets screened to determine which molecules bound selectively to each target.

Figure 1.

(A) Summary of the 42 RNA targets screened in ALIS, divided by RNA categories. (B) The hit rate (percent library bound = (number of binders/total library size) × 100) for each RNA target screened in ALIS against the Diversity Library (~50,000 compounds) and the Functionally Annotated Library (5100 compounds). (C) The hit rate for each RNA target screened in ALIS against the RNA-Focused Library (3700 compounds), with the Diversity and Functionally Annotated Libraries shown as a comparison. The average and median hit rate for each library are shown by a solid line and a dashed line, respectively. G-quadruplex RNA targets are shown on a different scale (right) with their own average and median.

Of the 42 RNA targets screened, only two targets (RNA18 and RNA26) failed to bind any compounds from the screening collections (Fig. 1B, Suppl. Table S1). All other RNA targets screened had one or more small molecule binders. It should be noted that binding interactions revealed by ALIS can span a range of affinities (Kd < 10 μM). On average (excluding G-quadruplexes), RNA targets screened in ALIS bound only 0.04% of the Functionally Annotated Library (median = 0.04%) and 0.01% of the Diversity Library (median = 0.008%) (Fig. 1B). In comparison, the average hit rate for protein targets screened in ALIS tended to be higher:⁵ 1.54% when screened against the Functionally Annotated Library and 0.05% with the Diversity Library (Suppl. Table S2). Interestingly, the G-quadruplex class of RNAs bound a significantly higher number of compounds, with average hit rates of 0.9% of the Functionally Annotated Library (median = 1.0%) and 0.7% of the Diversity Library (median = 0.5%). While these compounds may be binding with a range of affinities, the characteristic structure of G-quadruplexes likely contributed to the higher number of binding compounds. In fact, DNA G-quadruplexes previously screened in ALIS also yielded a high number of binding compounds.²⁶
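The hit rates above follow the definition in the Figure 1 caption (percent library bound = number of binders / total library size × 100), summarized as a mean and median across targets with G-quadruplexes excluded because their much higher rates would dominate the average. A minimal sketch with hypothetical per-target values:

```python
from statistics import mean, median


def hit_rate(n_binders: int, library_size: int) -> float:
    """Percent library bound = (number of binders / total library size) x 100."""
    return n_binders / library_size * 100


def summarize(rates: dict[str, float], exclude: set[str]) -> tuple[float, float]:
    """Mean and median hit rate across targets, skipping excluded IDs (e.g., G-quadruplexes)."""
    kept = [r for target, r in rates.items() if target not in exclude]
    return mean(kept), median(kept)


# Hypothetical per-target rates (%); "G1" stands in for a G-quadruplex target.
rates = {"R1": 0.02, "R2": 0.04, "R3": 0.06, "G1": 1.0}
avg, med = summarize(rates, exclude={"G1"})
print(round(avg, 3), round(med, 3))  # 0.04 0.04
```

For scale, a single binder in the ~5100-compound Functionally Annotated Library already corresponds to a hit rate of about 0.02%.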

Though the ALIS platform was able to identify small molecule binders for diverse RNA targets, we wondered whether the observed low hit rate across targets was due to inherent bias in the interrogated screening libraries, which are traditionally used for protein-targeted drug discovery. The Functionally Annotated Library, assembled using pharmacogenomic data, and the Diversity Library, assembled from synthetic and commercial acquisition efforts, have historically both been applied toward the discovery of protein-binding ligands. We hypothesized that applying cheminformatic approaches to our combined primary binding data (Fig. 1B) could determine chemical features important for RNA binding, which in turn could be used to assemble an improved screening library enriched for RNA-binding small molecules. We note that the traditional assembly of individual, target-centric focused libraries was precluded by the paucity of binders for any individual RNA.

We generated naïve Bayesian models trained on RNA binders versus nonbinders using calculated physicochemical properties, chemical fingerprints, and biological fingerprints as descriptors.^18,19 Model accuracy was assessed by calculating the AUC ROC score. High AUC ROC scores supported model accuracy for several descriptors, and the model based on ECFP4 fingerprints performed best (Suppl. Fig. S2, blue bars; see Materials and Methods). This model enabled us to identify chemical motifs prevalent among detected RNA binders and then score new compounds for predicted RNA-binding potential. We then used this model to select high-scoring molecules from our company’s compound collection for subsequent screening against the RNA targets. Because multiple expansion methods can uncover additional chemotypes,²⁷ we also included compounds that were nearest neighbors of the RNA binders based on chemical or biological similarity. A Tanimoto similarity comparison of the resulting set of ~3700 compounds, termed the “RNA-Focused Library,” with the initial screening collection revealed that the new set included similar compounds (high similarity) as well as distinct chemical moieties (low similarity), the latter largely introduced through biological fingerprint nearest-neighbor expansion and ECFP4 naïve Bayesian model expansion (Suppl. Fig. S3).

We evaluated the performance of the RNA-Focused Library in a second screen against a subset of 32 of the initial RNA targets (Fig. 1C). The new compound library exhibited an overall increase in the number of compounds that bound to an RNA target, with an average hit rate (excluding G-quadruplexes) of 0.32% (median = 0.20%), an 8-fold increase compared with the Functionally Annotated Library and a 32-fold increase compared with the Diversity Library. The number of compound binders to the G-quadruplexes was also enriched and was an order of magnitude higher than for all other RNA targets screened. Of the methodologies used to generate the RNA-Focused Library, selection by chemical similarity to initial hits contributed the majority of new RNA binders detected (Suppl. Fig. S4). Thus, our model-informed RNA-Focused Library successfully facilitated the discovery of new RNA–small molecule binding interactions.
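The fold increases quoted above are ratios of average hit rates (excluding G-quadruplexes): 0.32% / 0.04% for the Functionally Annotated Library and 0.32% / 0.01% for the Diversity Library. The short check below uses the rounded averages from the text, so the ratios are approximate; the function name is hypothetical.

```python
def fold_enrichment(new_rate_pct: float, reference_rate_pct: float) -> float:
    """Ratio of two average hit rates, both expressed in percent."""
    return new_rate_pct / reference_rate_pct


RNA_FOCUSED_AVG = 0.32  # % bound, excluding G-quadruplexes (from the text)
reference_avgs = {"Functionally Annotated": 0.04, "Diversity": 0.01}
for name, avg in reference_avgs.items():
    print(f"{name}: {fold_enrichment(RNA_FOCUSED_AVG, avg):.0f}-fold")
# Functionally Annotated: 8-fold
# Diversity: 32-fold
```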

The performance of the RNA-Focused Library, however, was variable across RNA targets. Of the 32 RNA targets rescreened, 24 yielded an enriched number of binders, while 4 (RNA17, RNA24, RNA33, RNA39) yielded a similar or worse hit rate with the RNA-Focused Library compared with the previous libraries screened. One RNA target (RNA26, not shown), which had no binders from the initial screen against the Diversity and Functionally Annotated Libraries, also had no binders from the new RNA-Focused Library. Because the RNA-Focused Library hit rate was often greater than, but uncorrelated with, the initial screening library hit rate across targets, we concluded that our machine learning models selected compound features that generally promoted binding to RNA and were not strongly biased toward any one target. Furthermore, this study aimed to identify compound features that promote general RNA binding rather than to optimize binding to each specific RNA target.

Selectivity of Small Molecule–RNA Interactions across RNA and Protein Targets

Using the entire set of binding data from the initial and expansion RNA screens, we examined how small molecule binding compared across the panel of 42 RNA targets screened. We defined an RNA-selective compound as one that bound to only 1 RNA target across our panel of 42 RNA targets. While many compounds bound multiple RNA targets (Fig. 2A, white bars), we found 944 compounds (gray bars) that were selective for only one RNA target out of the total 1424 compounds that bound across all RNA targets in ALIS (66.3% RNA selective; Fig. 2C). In fact, 30 of our 42 RNA targets had one or more RNA-selective binders that did not bind any other RNA target. Additionally, our screening set contained eight known RNA binders (yohimbine, kanamycin, sisomicin, gentamicin, phenolphthalein, ADP, neamine, geneticin),¹¹ but their respective RNA targets were excluded from our RNA target panel. Importantly, none of these eight compounds bound to any (off-target) RNA targets screened within the affinity range of ALIS detection (Kd < 10 μM). Aminoglycosides such as kanamycin and gentamicin are conformationally flexible enough to bind to a variety of RNAs (ribosomal A-site, HIV TAR, HIV RRE, Group I intron, RNase P, tmRNA, the eukaryotic A site), but their binding is known to require specific RNA sequence- and structure-dependent elements.²⁸ Such features may be lacking in our target set.
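The selectivity definition above (a compound is RNA selective if it bound exactly one RNA target in the panel) reduces to grouping detections by compound and counting distinct targets. The sketch below uses hypothetical detections; applied to the section’s totals, 944 selective compounds out of 1424 binders gives the reported 66.3%.

```python
from collections import defaultdict


def selectivity_summary(detections):
    """detections: iterable of (compound_id, rna_target_id) pairs; duplicates are ignored.

    Returns (selective compound IDs, percent selective, RNA targets with >= 1 selective binder).
    """
    targets_by_cpd = defaultdict(set)
    for cpd, target in detections:
        targets_by_cpd[cpd].add(target)
    selective = {c for c, ts in targets_by_cpd.items() if len(ts) == 1}
    pct = 100 * len(selective) / len(targets_by_cpd) if targets_by_cpd else 0.0
    targets_with_selective = {t for c in selective for t in targets_by_cpd[c]}
    return selective, pct, targets_with_selective


# Hypothetical detections: c1 (seen twice) and c3 are selective; c2 binds two RNAs.
pairs = [("c1", "RNA1"), ("c1", "RNA1"), ("c2", "RNA1"), ("c2", "RNA2"), ("c3", "RNA3")]
print(selectivity_summary(pairs))
print(round(100 * 944 / 1424, 1))  # 66.3
```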

Figure 2.

(A) Number of RNA binders that are selective (bind only the single indicated RNA target; green) versus nonselective (white) across RNA targets screened. The total number of compound binders is indicated above the bar. (B) Of the RNA-selective compounds from A, the number of compounds that also do not bind any known protein targets in ALIS (patterned). The total number of RNA-selective compounds is shown above the bar. (C) Summary of total RNA-selective and non-protein-binding compounds, total RNA-selective compounds, and total number of compounds (excluding duplicates) bound to all RNA targets. The total number of compounds for each category is shown, and the number of compounds excluding G-quadruplexes is shown in parentheses.

Having demonstrated the ability to identify binders with RNA selectivity within our panel, we next asked whether these RNA-selective compounds bound to any known proteins. We used historical internal ALIS binding data collected at our company to identify any protein targets for our RNA-selective compounds. We found that 545 of the 944 RNA-selective compounds also did not bind any known protein targets (57.7%; Fig. 2B, light gray bars). Of the 30 RNA targets with one or more RNA-selective compounds, 18 had RNA-selective binders that remained selective in protein space and did not bind to any known proteins. In fact, RNA-selective compounds were significantly less likely to bind to proteins than compounds binding to multiple RNA targets (Fig. 2C: all targets, one-tailed Fisher’s exact test p < 0.0001; non-G-quadruplexes, p = 0.056).
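The one-tailed Fisher’s exact test here compares two groups (RNA-selective vs. multi-RNA binders) on one outcome (no known protein binding vs. protein binding). For the selective row, the text gives 545 compounds without and 944 − 545 = 399 with protein binding; the counts for multi-target binders are not stated in this section, so the sketch below runs only on a toy table. It computes the hypergeometric upper tail directly (SciPy’s `scipy.stats.fisher_exact` with `alternative="greater"` should give the same one-sided value); the function name is hypothetical.

```python
from math import comb


def fisher_one_tailed_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: RNA-selective, multi-RNA binders; columns: no known protein binding, binds protein.
    Returns P(X >= a), where X is the top-left count under the hypergeometric
    null with all row and column totals fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy table (hypothetical counts): P = 17/70
print(round(fisher_one_tailed_greater(3, 1, 1, 3), 4))  # 0.2429
```

Python’s arbitrary-precision integers keep the binomial coefficients exact even for tables with totals in the thousands, as in this dataset.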

In order to determine whether binding promiscuity across RNA targets also indicated promiscuity across protein targets, we assessed the binding of compounds from our three libraries against protein targets screened in ALIS (internal historical data). A comparison of the number of RNA targets to the number of protein targets bound by each compound revealed that the degree of promiscuity across RNA targets was not correlated with the degree of promiscuity across protein targets (Suppl. Fig. S5). For example, the majority of RNA-binding compounds clustered in the top left quadrant of the heat map, indicating binding to few RNA and protein targets. Remarkably, one compound that was selective for a single RNA target bound to as many as 60 protein targets, while another compound that bound to no known proteins bound to 24 of our 42 RNAs.

It is important to note that we defined RNA selectivity as binding to a single RNA target in our panel and identified compounds that did not bind any known protein targets. However, binding interactions revealed by ALIS can span a range of affinities (Kd < 10 μM). Therefore, compounds may be binding with varying affinities to the identified RNA and protein targets. Determination of compound–target affinities may uncover “nonselective” compounds that in fact have a much higher affinity for a single target compared with all other targets. Furthermore, historically more protein targets than RNA targets have been screened in ALIS, making the selectivity criteria for proteins currently more rigorous than those for RNAs with this technique.

Properties of RNA Targets Favorable for Small Molecule Binding

For RNA to serve as a small molecule druggable target, RNA function must be mediated by secondary or tertiary structure,^2,29 as opposed to sequence. We found no correlation between RNA target size and the number of compound binders (Suppl. Fig. S6) across all RNA targets (R² = 0.01), nor within a single RNA class (0.06 < R² < 0.38, for RNA classes with five or more targets). This suggested that the compounds may not be binding in a bulk, linear size-dependent manner, but instead may be binding to more complex secondary or tertiary structural RNA elements. Although not all RNA target structures in our set are known, well-structured bacterial riboswitches, each known to bind a cognate ligand to modulate gene function,³⁰ yielded a similar overall hit rate and enrichment rate compared with other RNA targets, with no correlation to RNA size.
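The size analysis reports R² from a linear fit of binder count against RNA length, computed both across all targets and within each RNA class having at least five targets. For a simple least-squares line, R² equals the squared Pearson correlation, which the sketch below computes; the (class, length, binder count) record layout and the toy data are hypothetical.

```python
from collections import defaultdict


def r_squared(x, y):
    """R^2 of an ordinary least-squares line of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def r_squared_by_class(records, min_targets=5):
    """records: iterable of (rna_class, length_nt, n_binders); classes below min_targets are skipped."""
    groups = defaultdict(lambda: ([], []))
    for rna_class, length_nt, n_binders in records:
        xs, ys = groups[rna_class]
        xs.append(length_nt)
        ys.append(n_binders)
    return {cls: r_squared(xs, ys) for cls, (xs, ys) in groups.items() if len(xs) >= min_targets}


# Hypothetical records: class "A" has five targets, class "B" only two (skipped).
records = [("A", n, 2 * n) for n in range(1, 6)] + [("B", 100, 3), ("B", 200, 1)]
print(r_squared_by_class(records))  # {'A': 1.0}
```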

To further interrogate this point, we looked to the G-quadruplex class of RNA. These G-rich RNAs can form well-defined, stable structures in which four guanines interact through Hoogsteen bonding in a planar arrangement around a central monovalent cation, particularly K⁺, which coordinates between the G-quartets.^31,32 These structured RNAs are associated with several biological processes, including transcript processing and translational control. In our study, the relevance of the G-quadruplex structure for compound binding was investigated by screening these RNA targets under conditions that promote the G-quadruplex structure (i.e., high K⁺) as well as conditions that disfavor G-quadruplex formation (i.e., high Na⁺ in place of K⁺), as confirmed by circular dichroism under the same conditions.³² Interestingly, many of the compounds that bound to the folded G-quadruplex structures did not bind to the same sequence under conditions that disfavor quadruplex formation (Fig. 3). This suggests that a properly folded RNA structure was necessary for the majority of our identified compounds to bind. Additionally, this comparative screening under structure-favorable and structure-unfavorable conditions identified the compounds that bound under both conditions (Fig. 3, in parentheses) in a structure-independent manner. We also found a small subset of compounds that bound only under the quadruplex-disfavoring conditions.
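The folded-versus-unfolded comparison partitions binders into three groups: folded-only (candidate structure-dependent binders), both conditions (structure-independent binders, the parenthesized counts in Fig. 3), and unfolded-only. A minimal set-based sketch with hypothetical compound IDs:

```python
def compare_conditions(folded_binders, unfolded_binders):
    """Partition binders from the K+ (folded) and Na+ (unfolded) screens of the same sequence."""
    folded, unfolded = set(folded_binders), set(unfolded_binders)
    return {
        "folded_only": folded - unfolded,    # candidate structure-dependent binders
        "both": folded & unfolded,           # structure-independent binders
        "unfolded_only": unfolded - folded,  # bind only when quadruplex formation is disfavored
    }


groups = compare_conditions({"c1", "c2", "c3"}, {"c3", "c4"})
print({k: sorted(v) for k, v in groups.items()})
# {'folded_only': ['c1', 'c2'], 'both': ['c3'], 'unfolded_only': ['c4']}
```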

Figure 3.

The number of binding compounds for folded versus unfolded G-quadruplex RNA. The same RNA sequence was stabilized for G-quadruplex formation (or not) using different salt conditions (i.e., high K⁺ concentration for G-quadruplex formation; high Na⁺ concentration for destabilized conditions). These RNA targets were screened in ALIS against the Diversity, Functionally Annotated, and RNA-Focused Libraries. The number of overlapping compounds between the folded and unfolded states is indicated in parentheses.

Through this work, we found no correlation between linear RNA size and compound binding and identified a strong example of structure dependence for compound binding. While small molecules are known to bind the major and minor grooves of RNA via intercalation or base-pair complementarity, for target specificity and strong pharmacology, we aimed to target the tertiary structure of folded RNA by binding in the diverse pockets created by higher-order folding. The trends and requirements seen here suggest that we have, in fact, targeted structured RNA. However, further structural analysis is necessary to confirm these hypotheses.

Characteristics of RNA-Binding Compared with Protein-Binding Compounds

We next used our comprehensive small molecule–RNA dataset (Suppl. Fig. S2, green bars) to understand how the physicochemical determinants of RNA binding compared with those for protein binding. We examined the chemical space occupied by ALIS RNA-binding compounds as defined by a PCA trained using physicochemical descriptors such as molecular weight and number of atoms (Suppl. Fig. S7; see Materials and Methods). For comparison, we included approved drugs, ALIS protein binders, and literature small molecule and multivalent RNA binders from the RNA-Targeted Bioactive Ligand Database (R-BIND).²²

Consistent with the findings of Morgan et al.,²² we observe that the majority of RNA binders and protein binders overlap within drug-like chemical space, although a few RNA-binding and protein-binding compounds occupy space that is distinct from the drug-like compounds (Fig. 4). Likewise, the literature small molecule RNA binders also occupy drug-like chemical space. In contrast, the multivalent literature RNA binders occupy chemical space that is distinct from drug-like space (Fig. 4C, yellow). From analysis of the physicochemical properties that contribute the most to PC1 (Fig. 4D,E) and PC2 (Fig. 4F,G), we can deduce that these multivalent compounds are larger and more polar than the drug-like space delineated by our comparators (Fig. 4C, gray). Multivalent R-BIND ligands possess two binding moieties joined by a peptide linker, driving their higher ring number and molecular weight in comparison to our Diversity Library. We also note that the ~1400 RNA binders identified via ALIS envelop and saturate regions of physicochemical space occupied by the small molecule R-BIND library, which has 20-fold fewer members.

Figure 4.

Comparison of the physicochemical space occupied by protein binders, RNA binders, and drugs. In all plots, the same subset of approved drugs (gray) is depicted for comparison. (A) Protein binders detected by ALIS (red). (B) RNA binders detected by ALIS (blue). (C) Literature multivalent RNA binders (yellow) and literature small molecule RNA binders (green). For reference, the drugs plotted in A–C are colored from white to black by the properties that contribute most to PC1 (D,E) or PC2 (F,G), with black signifying a higher value of the given property.

We then further explored two properties of RNA and protein binders that contributed significantly to the PCA: molecular weight and AlogD. Using CHEMGENIE,¹⁰ our company's biochemical and pharmacogenomic database, we assembled historical protein ALIS binding data for the compounds in the RNA screening library. We then calculated physicochemical properties for each compound in the screening collection and trained naïve Bayesian classification models to identify key differences in feature weights for RNA binding versus protein binding (Fig. 5, Suppl. Fig. S8).³³ Each feature was assigned a normalized probability related to its enrichment within a model category of interest: a positive probability indicated that molecules possessing that feature tend to bind the target class, while a negative probability indicated that the feature is enriched among nonbinders. For most properties, the ranges that favored or disfavored binding were similar for the two target classes.
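The paper does not give the formula behind its normalized probabilities. A common choice for naïve Bayesian models over binary features is the Laplacian-corrected estimator sketched below, which is assumed here for illustration: for a feature present in T compounds, A of which are binders, with library-wide binder rate P, the weight is log((A + 1)/(T·P + 1)). Uninformative features score near zero, enriched features score positive, and depleted features score negative, matching the sign convention described above. Feature labels such as binned molecular weight or logD ranges are hypothetical.

```python
import math
from collections import Counter


def laplacian_weights(samples):
    """samples: list of (set_of_features, is_binder) pairs.

    Returns {feature: log normalized probability}; > 0 means enriched in binders.
    """
    n_total = len(samples)
    n_binders = sum(1 for _, is_binder in samples if is_binder)
    base_rate = n_binders / n_total  # P(binder) across the whole library
    total, active = Counter(), Counter()
    for features, is_binder in samples:
        for f in features:
            total[f] += 1
            if is_binder:
                active[f] += 1
    # Laplacian-corrected estimate normalized by the base rate:
    # (A + 1) / (T * P + 1) equals 1 (log 0) when A = T * P, i.e. no enrichment.
    return {f: math.log((active[f] + 1) / (t * base_rate + 1)) for f, t in total.items()}


def score(features, weights):
    """Naive Bayes score: sum of the weights of the features a compound carries."""
    return sum(weights.get(f, 0.0) for f in features)


# Hypothetical binned-property features for four compounds.
toy = [
    ({"mw_300_400", "logd_1_2"}, True),
    ({"mw_300_400", "logd_1_2"}, True),
    ({"mw_600_700", "logd_4_5"}, False),
    ({"mw_600_700", "logd_1_2"}, False),
]
toy_weights = laplacian_weights(toy)
```

Plotting such per-bin weights over a molecular weight by logD grid gives a heat map in the spirit of Fig. 5; training one model on RNA labels and another on protein labels allows the two maps to be compared.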

Figure 5.

Feature score maps for physicochemical modeling of protein and RNA binders. Screening library compounds were binned by calculated molecular weight (x axis) and LogD (y axis), and the color of each heat map cell represents the normalized model probability. Example compounds are displayed (1: a spirohydantoin inhibitor of prolyl hydroxylases;³⁴ 2: a triazolinone sulfonamide dual AT1/AT2 angiotensin II receptor antagonist³⁷ that does not bind RNA; 3: an indolocarbazole analog of NB-506³⁸ that had no detected protein binding in ALIS), and regions satisfying Lipinski's rule of 5³⁵,³⁶ are outlined with bold lines.

Figure 5 highlights example compounds with properties corresponding to model probabilities. Notably, positive-probability regions (those enriched in compounds that bind the target class), such as the region occupied by compound 1, an inhibitor of prolyl hydroxylases,³⁴ lie within a physicochemical regime that also satisfies Lipinski's rules for drug-likeness.³⁵,³⁶ The positive model probabilities in these regions suggest that it is plausible to discover RNA binders with chemical properties akin to those of orally bioavailable drugs. Meanwhile, compounds 2 and 3 (Fig. 5) are examples of protein-only and RNA-only binders, respectively, located in regions of physicochemical property space that differentially favored these binding behaviors.
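Placing compounds on a molecular weight versus logD grid and marking the cells that fall within rule-of-5 limits can be sketched as follows. The bin widths are hypothetical (the figure's bin edges are not stated), and treating logD as a stand-in for the logP criterion of Lipinski's rule is an assumption. The full rule also caps hydrogen-bond donors at 5 and acceptors at 10, as counted in `ro5_violations`.

```python
import math

# Hypothetical bin grid for a Fig. 5-style heat map (actual bin edges not given).
MW_START, MW_WIDTH = 0.0, 100.0
LOGD_START, LOGD_WIDTH = -5.0, 1.0

# Rule-of-5 limits; logD stands in for logP on the y axis (an assumption).
MW_MAX, LOGP_MAX = 500.0, 5.0


def heatmap_cell(mw, logd):
    """(column, row) indices of the fixed-width bin containing a compound."""
    return (math.floor((mw - MW_START) / MW_WIDTH),
            math.floor((logd - LOGD_START) / LOGD_WIDTH))


def cell_within_ro5(cell):
    """True if the whole cell lies inside MW <= 500 and logD <= 5."""
    i, j = cell
    mw_upper = MW_START + (i + 1) * MW_WIDTH
    logd_upper = LOGD_START + (j + 1) * LOGD_WIDTH
    return mw_upper <= MW_MAX and logd_upper <= LOGP_MAX


def ro5_violations(mw, logp, hbd, hba):
    """Number of Lipinski criteria violated; conventionally at most one is tolerated."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])
```

The cells for which `cell_within_ro5` is true are the ones that would be outlined in bold on such a map.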

The unique RNA–small molecule binding dataset that we generated gave us the opportunity to compare RNA and protein targets with respect to the chemical matter that binds (and does not bind) them. We built naïve Bayesian models for each of our RNA and protein targets based on the binders and nonbinders in our three small molecule libraries. We then extracted the chemical features from these models that were most enriched in binders for at least one target and used these features across all targets to train a PCA (Suppl. Fig. S9; see Materials and Methods). Although we observed a small degree of overlap among some targets, the PCA largely separated the two target types. These nonoverlapping regions indicate that the two target classes are bound by largely distinct chemical matter. We also inspected the PCA loadings to identify chemical features that were selective for binding to either RNA targets (Suppl. Fig. S9C,D) or protein targets (Suppl. Fig. S9E,F). Many of the RNA-selective features are aromatic amine-containing heterocycles or amidine-like motifs.
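Before the cross-target PCA, the per-target models must be reduced to a shared feature set. A minimal sketch of one way to do this, assuming each target's model is available as a dictionary of feature weights: take the top-k binder-enriched features from each target, pool them, and build a target-by-feature weight matrix whose rows could then be passed to a PCA. The value of `top_k`, the zero fill for absent features, and the feature names are hypothetical choices, not details from the paper.

```python
def select_features(target_weights, top_k=3):
    """Union of the top_k most binder-enriched (positive-weight) features per target."""
    selected = set()
    for weights in target_weights.values():
        ranked = sorted(weights, key=lambda f: weights[f], reverse=True)
        selected.update(f for f in ranked[:top_k] if weights[f] > 0)
    return sorted(selected)


def weight_matrix(target_weights, features):
    """Rows = targets (sorted), columns = selected features; absent features get 0."""
    targets = sorted(target_weights)
    return targets, [[target_weights[t].get(f, 0.0) for f in features] for t in targets]


# Hypothetical per-target model weights.
example = {
    "rna_A": {"amidine": 1.2, "aminopyrimidine": 0.9, "carboxylate": -1.0},
    "protein_X": {"sulfonamide": 0.8, "carboxylate": 0.5, "amidine": -0.4},
}
features = select_features(example, top_k=2)
targets, matrix = weight_matrix(example, features)
```

If RNA and protein rows occupy different regions after projection, that is the kind of separation described above; the loadings then point to the features driving it.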

To further investigate the specific features that are important for RNA binding, we built naïve Bayesian models trained on RNA binders versus nonbinders, using chemical features (ECFP4) as descriptors (cross-validated ROC AUC = 0.805). In this case, each chemical feature was assigned a normalized probability related to its enrichment within a model category of interest: a positive probability indicated that molecules possessing that feature tend to bind RNA, while a negative probability indicated that the feature is enriched among nonbinders. Analyzing the chemical features with the highest and lowest model weights, we found several features enriched for overall RNA binding (features i–xii; Fig. 6), along with several features prevalent in nonbinders (features xiii–xix). As expected, nitrogen-containing heterocycles were among the features most enriched in RNA binders, while moieties that are negatively charged at biological pH, including phosphates and carboxylic acids, were among the features most enriched in nonbinders (Fig. 6). Interestingly, benzimidazole,³⁹ previously identified as an RNA-binding chemotype, had a strongly negative model weight, and none of the 18 screening library compounds possessing this substructure were detected as binders for our target panel. Furthermore, computing the structural similarity of our RNA binders to the R-BIND ligands²² revealed little similarity (i.e., low Tanimoto scores), supporting our identification of novel chemotypes (Suppl. Fig. S10), although the scope of binding for these chemotypes may be limited to RNA targets similar to those in our screening set.
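Two quantities in this paragraph are straightforward to compute once fingerprints and model scores are in hand. The sketch below assumes ECFP4 fingerprints are already represented as Python sets of integer feature identifiers (generating them requires a cheminformatics toolkit and is out of scope here). `tanimoto` and `max_similarity` give the nearest-neighbour similarity of a binder to a reference set such as R-BIND, and `roc_auc` computes the area under the ROC curve as the probability that a randomly chosen binder outscores a randomly chosen nonbinder. The cross-validation scheme is not specified here, so no folds are shown.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints stored as sets of feature IDs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(query, reference_set):
    """Nearest-neighbour similarity of one compound to a reference library."""
    return max((tanimoto(query, ref) for ref in reference_set), default=0.0)


def roc_auc(scores_pos, scores_neg):
    """ROC AUC as P(random binder scores above random nonbinder); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise AUC loop is quadratic, which is fine for illustration; for libraries of tens of thousands of compounds, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be preferable.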

Figure 6.

Chemical substructures enriched in specific binders, promiscuous binders, and nonbinders.

We next asked which features were enriched in selective RNA binders. Surprisingly, many of the features scoring highest in the overall binding model were among the lowest-scoring features in the selective binding model, reflecting their tendency to enrich for general rather than selective binding. Nevertheless, we did identify several substructures enriched among specific RNA binders (features xi–xiv; Fig. 6).
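The difference between the overall and selective models comes down to how training labels are assigned. In the sketch below, which assumes each compound's ALIS results are available as sets of RNA and protein targets hit, a compound counts as selective if it binds at most `max_rna_targets` RNAs and no proteins; the threshold and function names are hypothetical. Because a selective model of this kind compares selective binders against other RNA binders, features that are common among promiscuous binders would receive low weights, which is one way the inversion described above could arise.

```python
def classify_binder(rna_hits, protein_hits, max_rna_targets=1):
    """Label one compound from the sets of RNA and protein targets it was detected binding.

    max_rna_targets is a hypothetical selectivity threshold, not a value from the paper.
    """
    if not rna_hits:
        return "RNA nonbinder"
    if len(rna_hits) <= max_rna_targets and not protein_hits:
        return "selective"
    return "non-selective"


def selective_training_labels(profiles, max_rna_targets=1):
    """Labels for a selective-binding model: RNA binders only, True if selective.

    profiles maps compound ID -> (set of RNA targets hit, set of protein targets hit).
    """
    labels = {}
    for cid, (rna_hits, protein_hits) in profiles.items():
        label = classify_binder(rna_hits, protein_hits, max_rna_targets)
        if label != "RNA nonbinder":
            labels[cid] = label == "selective"
    return labels
```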

In summary, our dataset and analyses reveal that although many of the physicochemical properties of the RNA-binding compounds are similar to those of the protein-binding compounds, the chemical features enriched in RNA binders are distinct from those enriched in protein binders. Furthermore, combinations of specific features may enable compounds to bind one RNA target preferentially over dozens of other targets.

Discussion

By using the ALIS affinity-selection mass spectrometry platform for HTS of 42 varied RNA targets, each against a total of ~60,000 compounds, we have generated millions of target–compound interaction data points, from which we have identified new drug-like small molecules that bind to RNA. Based on an analysis of small molecule binders of RNA, we have built an RNA-Focused Library of ~3700 small molecules that is enriched for RNA binders compared with previous compound libraries in our collection. Our analysis has revealed compounds that are selective for RNA targets in our panel as well as general RNA binders. Our selective compound set includes compounds that are selective for specific RNAs over other RNA targets as well as over protein targets. Because compound binding did not correlate with linear RNA target size, we reason that most of our compounds may bind in a structure-dependent manner. Importantly, comparative screening under conditions that favor and disfavor RNA folding identified compounds that specifically bind the physiologically relevant folded state.

Our cheminformatics approach has given us an initial understanding of the physicochemical properties and chemical substructures that lead to both general and specific RNA binding. Importantly, many of our identified RNA-targeting small molecules are classically drug-like in their physicochemical properties, suggesting their potential as starting points for RNA-targeting therapeutics. Although there are few examples of small molecules intentionally designed to target RNA, this work provides initial guidance in this regard and has uncovered selective RNA–small molecule interactions for focused small molecule drug discovery efforts.

The ALIS platform is a flexible approach that has historically provided entry points for drug leads. ALIS has been used routinely to screen large numbers of drug-like small molecules for binders to several types of protein targets, and, with additional follow-up studies, these binders have often led to functional lead compounds.⁵ Additional structural, cell-based, and functional assays are clearly needed to clarify the biology of the small molecule–RNA interactions identified in this study. Furthermore, RNA ligandability by small molecules is becoming increasingly important with the development of new technologies such as ribonuclease-targeting chimeras (RIBOTACs), which use a small molecule RNA binder to recruit a nuclease that degrades the RNA.⁴⁰ The ALIS technique used here complements other techniques that have been used to probe small molecule interactions with RNA, such as fluorescent indicator displacement (FID)⁴¹ and selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE).²⁹ The intent of this study is to establish the ligandability of RNA by small molecules and to assess the physicochemical properties and enriched features that govern RNA binding and selectivity. This work provides unique insights into RNA-targeted small molecule libraries and the identification of RNA ligands, and it adds to previous efforts to classify the factors involved in RNA–small molecule binding,⁴²,⁴³ thus aiding RNA-targeted drug discovery efforts toward modulating the function of previously undruggable pathways.

Supplemental Material

Supplementary_Data_final – Supplemental material for Targeting RNA with Small Molecules: Identification of Selective, RNA-Binding Small Molecules Occupying Drug-Like Chemical Space

Supplemental material, Supplementary_Data_final for Targeting RNA with Small Molecules: Identification of Selective, RNA-Binding Small Molecules Occupying Drug-Like Chemical Space by Noreen F. Rizvi, John P. Santa Maria, Ali Nahvi, Joel Klappenbach, Daniel J. Klein, Patrick J. Curran, Matthew P. Richards, Chad Chamberlin, Peter Saradjian, Julja Burchard, Rodrigo Aguilar, Jeannie T. Lee, Peter J. Dandliker, Graham F. Smith, Peter Kutchukian and Elliott B. Nickbarg in SLAS Discovery

Footnotes

Acknowledgements

The authors would like to thank Anne Mai Wasserman and Kerrie Spencer for their advice and support of this research.

Supplemental material is available online with this article.

Authors’ Note

Julja Burchard is currently affiliated with Sera Prognostics, Inc., Salt Lake City, UT, USA. Graham F. Smith is currently affiliated with AstraZeneca, Drug Safety and Metabolism, IMED Biotech Unit, Cambridge, UK.

Declaration of Conflicting Interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All authors except R.A. and J.T.L. are current or former employees of Merck & Co., Inc., and may hold stock or other financial interests in Merck & Co.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: J.T.L. received support from a Merck MINt award, and R.A. was funded by a Pew Latin American Fellowship. All other authors were supported by Merck & Co., Inc.

References

1. The ENCODE Project Consortium. Identification and Analysis of Functional Elements in 1% of the Human Genome by the ENCODE Pilot Project. Nature 2007, 447, 799–816.

2. Cech, T. R.; Steitz, J. A. The Noncoding RNA Revolution—Trashing Old Rules to Forge New Ones. Cell 2014, 157, 77–94.

3. Warner, K. D.; Hajdin, C. E.; Weeks, K. M. Principles for Targeting RNA with Drug-Like Small Molecules. Nat. Rev. Drug Discov. 2018, 17, 547.

4. Rizvi, N. F.; Smith, G. F. RNA as a Small Molecule Druggable Target. Bioorg. Med. Chem. Lett. 2017, 27, 5083–5088.

5. Kutilek, V. D.; Andrews, C. L.; Richards, M. P.; et al. Integration of Affinity Selection-Mass Spectrometry and Functional Cell-Based Assays to Rapidly Triage Druggable Target Space within the NF-kB Pathway. J. Biomol. Screen. 2016, 21, 608–619.

6. Annis, D. A.; Athanasopoulos; Curran, P. J.; et al. An Affinity Selection–Mass Spectrometry Method for the Identification of Small Molecule Ligands from Self-Encoded Combinatorial Libraries: Discovery of a Novel Antagonist of E. coli Dihydrofolate Reductase. Int. J. Mass Spec. 2004, 238, 77–83.

7. Rizvi, N. F.; Howe, J. A.; Nahvi; et al. Discovery of Selective RNA-Binding Small Molecules by Affinity-Selection Mass Spectrometry. ACS Chem. Biol. 2018, 13, 820–831.

8. Rizvi, N. F.; Nickbarg, E. B. RNA-ALIS: Methodology for Screening Soluble RNAs as Small Molecule Targets Using ALIS Affinity-Selection Mass Spectrometry. Methods 2019, 167, 28–38.

9. Annis, D. A.; Nickbarg; Yang; et al. Affinity Selection-Mass Spectrometry Screening Techniques for Small Molecule Drug Discovery. Curr. Opin. Chem. Biol. 2007, 11, 518–526.

10. Kutchukian, P. S.; Chang; Fox, S. J.; et al. CHEMGENIE: Integration of Chemogenomics Data for Applications in Chemical Biology. Drug Discov. Today 2017, 23, 151–160.

11. Disney, M. D.; Winkelsas, A. M.; Velagapudi, S. P.; et al. Inforna 2.0: A Platform for the Sequence-Based Design of Small Molecules Targeting Structured RNAs. ACS Chem. Biol. 2016, 11, 1720–1728.

12. Cifuentes-Rojas; Hernandez, A. J.; Sarma; et al. Regulatory Interactions between RNA and Polycomb Repressive Complex 2. Mol. Cell 2014, 55, 171–185.

13. Chillón; Marcia; Legiewicz; et al. Native Purification and Analysis of Long RNAs. Methods Enzymol. 2015, 558, 3–37.

14. Annis; Chuang, C.-C.; Nazef. ALIS: An Affinity Selection-Mass Spectrometry System for the Discovery and Characterization of Protein-Ligand Interactions. In Mass Spectrometry in Medicinal Chemistry: Applications in Drug Discovery; Wanner; Höfner; Mannhold; et al.; Wiley: Weinheim, 2007; pp 121–156.

15. Annis, D. A.; Shipps, G. W., Jr.; Deng; et al. Method for Quantitative Protein-Ligand Affinity Measurements in Compound Mixtures. Anal. Chem. 2007, 79, 4538–4542.

16. Andrews, C. L.; Ziebell, M. R.; Nickbarg; et al. Mass Spectrometry-Based Screening and Characterization of Protein-Ligand Complexes in Drug Discovery. In Protein and Peptide Mass Spectrometry in Drug Discovery; Gross, M. L.; Chen; Pramanik, B. N., Eds.; John Wiley & Sons: Hoboken, NJ, 2011; pp 253–286.

17. Santa Maria, J. P., Jr.; Park; Yang; et al. Linking High-Throughput Screens to Identify MoAs and Novel Inhibitors of Mycobacterium tuberculosis Dihydrofolate Reductase. ACS Chem. Biol. 2017, 12, 2448–2456.

18. Rogers; Hahn. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754.

19. Petrone, P. M.; Simms; Nigsch; et al. Rethinking Molecular Similarity: Comparing Compounds on the Basis of Biological Activity. ACS Chem. Biol. 2012, 7, 1399–1409.

20. Shelat, A. A.; Guy, R. K. The Interdependence between Screening Methods and Screening Libraries. Curr. Opin. Chem. Biol. 2007, 11, 244–251.

21. Kutchukian, P. S.; Dropinski, J. F.; Dykstra, K. D.; et al. Chemistry Informer Libraries: A Chemoinformatics Enabled Approach to Evaluate and Advance Synthetic Methods. Chem. Sci. 2016, 7, 2604–2613.

22. Morgan, B. S.; Forte, J. E.; Culver, R. N.; et al. Discovery of Key Physicochemical, Structural, and Spatial Properties of RNA-Targeted Bioactive Ligands. Angew. Chem. Int. Ed. 2017, 56, 13498–13502.

23. Kutchukian, P. S.; Wassermann, A. M.; Lindvall, M. K.; et al. Large Scale Meta-Analysis of Fragment-Based Screening Campaigns: Privileged Fragments and Complementary Technologies. J. Biomol. Screen. 2015, 20, 588–596.

24. Draper, D. E. A Guide to Ions and RNA Structure. RNA 2004, 10, 335–343.

25. Shiman; Draper, D. E. Stabilization of RNA Tertiary Structure by Monovalent Cations. J. Mol. Biol. 2000, 302, 79–91.

26. Flusberg, D. A.; Rizvi, N. F.; Kutilek; et al. Identification of G-Quadruplex-Binding Inhibitors of Myc Expression through Affinity Selection–Mass Spectrometry. SLAS Discov. 2019, 24, 142–157.

27. Kutchukian, P. S.; Warren; Magliaro, B. C.; et al. Iterative Focused Screening with Biological Fingerprints Identifies Selective Asc-1 Inhibitors Distinct from Traditional High Throughput Screening. ACS Chem. Biol. 2017, 12, 519–527.

28. Chittapragada; Roberts; Ham, Y. W. Aminoglycosides: Molecular Insights on the Recognition of RNA and Aminoglycoside Mimics. Perspect. Med. Chem. 2009, 3, 21–37.

29. Mustoe, A. M.; Busan; Rice, G. M.; et al. Pervasive Regulatory Functions of mRNA Structure Revealed by High-Resolution SHAPE Probing. Cell 2018, 173, 181–195.e118.

30. Roth; Breaker, R. R. The Structural and Functional Diversity of Metabolite-Binding Riboswitches. Annu. Rev. Biochem. 2009, 78, 305–334.

31. Bugaut; Balasubramanian. 5′-UTR RNA G-Quadruplexes: Translation Regulation and Targeting. Nucleic Acids Res. 2012, 40, 4727–4741.

32. Reddy; Zamiri; Stanley, S. Y. R.; et al. The Disease-Associated r(GGGGCC)n Repeat from the C9orf72 Gene Forms Tract Length-Dependent Uni- and Multimolecular RNA G-Quadruplex Structures. J. Biol. Chem. 2013, 288, 9860–9866.

33. Kutchukian, P. S.; Vasilyeva, N. Y.; et al. Inside the Mind of a Medicinal Chemist: The Role of Human Bias in Compound Prioritization during Drug Discovery. PLoS One 2012, 7, e48476.

34. Vachal; Miao; Pierce, J. M.; et al. 1,3,8-Triazaspiro[4.5]decane-2,4-diones as Efficacious Pan-Inhibitors of Hypoxia-Inducible Factor Prolyl Hydroxylase 1–3 (HIF PHD1–3) for the Treatment of Anemia. J. Med. Chem. 2012, 55, 2945–2959.

35. Lipinski, C. A.; Lombardo; Dominy, B. W.; et al. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 1997, 23, 3–25.

36. Smith, G. F. Medicinal Chemistry by the Numbers: The Physicochemistry, Thermodynamics and Kinetics of Modern Drug Design. In Progress in Medicinal Chemistry; Lawton, G.; Witty, D. R., Eds.; Elsevier: Amsterdam, 2009; Vol. 48, pp 1–29.

37. Ashton, W. T.; Chang, L. L.; Flanagan, K. L.; et al. Triazolinone Biphenylsulfonamide Derivatives as Orally Active Angiotensin II Antagonists with Potent AT1 Receptor Affinity and Enhanced AT2 Affinity. J. Med. Chem. 1994, 37, 2808–2824.

38. Ohkubo; Nishimura; Kawamoto; et al. Synthesis and Biological Activities of NB-506 Analogues Modified at the Glucose Group. Bioorg. Med. Chem. Lett. 2000, 10, 419–422.

39. Velagapudi, S. P.; Luo; Tran; et al. Defining RNA-Small Molecule Affinity Landscapes Enables Design of a Small Molecule Inhibitor of an Oncogenic Noncoding RNA. ACS Cent. Sci. 2017, 3, 205–216.

40. Costales, M. G.; Matsumoto; Velagapudi, S. P.; et al. Small Molecule Targeted Recruitment of a Nuclease to RNA. J. Am. Chem. Soc. 2018, 140, 6741–6744.

41. Wicks, S. L.; Hargrove, A. E. Fluorescent Indicator Displacement Assays to Identify and Characterize Small Molecule Interactions with RNA. Methods 2019, 167, 3–14.

42. Mehta; Sonam; Gouri; et al. SMMRNA: A Database of Small Molecule Modulators of RNA. Nucleic Acids Res. 2014, 42, D132–D141.

43. Juru, A. U.; Patwardhan, N. N.; Hargrove, A. E. Understanding the Contributions of Conformational Changes, Thermodynamics, and Kinetics of RNA–Small Molecule Interactions. ACS Chem. Biol. 2019, 14, 824–838.

