The Role of Historical Bioactivity Data in the Deconvolution of Phenotypic Screens

Abstract

A substantial challenge in phenotypic drug discovery is the identification of the molecular targets that govern a phenotypic response of interest. Several experimental strategies are available for this, the so-called target deconvolution process. Most of these approaches exploit the affinity between a small-molecule compound and its putative targets or use large-scale genetic manipulations and profiling. Each of these methods has strengths but also limitations such as bias toward high-affinity interactions or risks from genetic compensation. The use of computational methods for target and mechanism of action identification is a complementary approach that can influence each step of a phenotypic screening campaign. Here, we describe how cheminformatics and bioinformatics are embedded in the process from initial selection of a focused compound library from a large set of historical small-molecule screens through the analysis of screening results. We present a deconvolution method based on enrichment analysis and using known bioactivity data of screened compounds to infer putative targets, pathways, and biological processes that are consistent with the observed phenotypic response. As an example, the approach is applied to a cellular screen aiming at identifying inhibitors of tumor necrosis factor–α production in lipopolysaccharide-stimulated THP-1 cells. In summary, we find that the approach can contribute to solving the often very complex target deconvolution task.

Keywords

phenotypic drug discovery compound library target deconvolution data analysis data integration computational inference

Introduction

Despite a focus on target-based approaches for the past decade of drug discovery, most first-in-class new small molecules were discovered via phenotypic screening.¹ Consequently, phenotypic screening programs at the cell or organism level are of strategic importance to the pharmaceutical industry. The molecules identified through these screens can target one or more proteins/pathways, and the observed phenotypic effects often translate into disease. One drawback of these approaches is that the molecular mode of action (MOA) of identified compounds often needs to be characterized to enable efficient further optimization of the screening hit and to enable further research around the new MOA (e.g., safety considerations, functional annotation of target, drug development).^1,2 This characterization process, also called the target deconvolution process, is still a substantial challenge in phenotypic drug discovery.

Several types of complementary approaches currently can be used for the target deconvolution process: direct biochemical methods, genetic interaction methods, and computational knowledge-based methods.^2,3 A recent review gives a very comprehensive view of several complementary experimental target deconvolution methods.⁴ Chemoproteomics is one of the main approaches and is normally used in the absence of hypotheses for which target(s) lead to the phenotypic effect. The molecule of interest is attached to a linker that is used to pull out targets from a cell lysate, which are then profiled using proteomics. The putative target list might be further validated through complementary methods such as small interfering RNA (siRNA). Successful examples of target deconvolution with chemoproteomics are the identification of tankyrase as an essential part of the WNT signaling pathway⁵ and the monocarboxylate transporter MCT1 as the target of a novel class of immunosuppressant compounds.⁶

Complementarily, computational methods can help infer the targets/MOA of small molecules or can generate hypotheses about key targets involved in the phenotype of interest. Numerous methods are available to predict targets from the molecular structure.⁷ They are mainly based on a statistical comparison of chemical features shared with known ligands for a given target. A limitation of these methods, in addition to requiring a known small-molecule ligand, is that structural similarity does not always translate to equivalent bioactivity⁸; therein, false positives can be generated. Multiparametric phenotypic screens can enable computational methods to analyze/cluster the compounds according to their MOA. The comparison of phenotypic profiles between very well-known tool compounds and other compounds can then lead to the inference of targets and MOA for the latter under the assumption that compounds with similar profiles modulate similar targets or pathways.^9,10 A limit to this type of approach is identifying compounds of interest with a new MOA. In this context, interpreting high-dimensional phenotypic profiles from high-content screening is a challenge. To help with this interpretation, and since groups of highly correlated end points are likely to be measuring common underlying traits, Young et al.¹¹ proposed using standard methods for large data reduction to derive factors that are mechanistically interpretable (e.g., nuclear size, DNA replication levels, chromosome condensation). To further aid the target deconvolution process, these methods can be augmented with additional data sources (e.g., the above-mentioned in silico target prediction from compound structure).¹⁰ Other data sources include side effect profiles,¹² historical bioactivity data available in public databases,¹³ and phenotypic profiles available in ChemBank and PubChem.^2,14 As many compounds act via the modulation of several targets,^15,16 pathway/interactome data sources can also provide valuable insights into the MOA of compounds and support compound optimization strategies as well as the new target identification process.¹⁷ In cases where phenotypic screens only measure a single (i.e., 1D) response, the phenotypic profile for tool compounds is not sufficiently informative on its own, and the incorporation of these data integration strategies becomes critical to enable target deconvolution.

There are two different philosophies regarding selecting screening sets for phenotypic screening. Either a diversity screening approach is taken or a set of tool compounds with a known bioactivity profile is screened. A diversity set consists of compounds that should optimally cover the chemical space in a predefined way. Alternatively, a tool compound set consists of compounds that have been optimized for a specific target and also have been profiled in a selectivity panel. Both screening approaches have pro and cons. The primary advantage with screening a set of tool compounds is that relevant targets for the phenotypic effect can be identified very rapidly. Another advantage is that only a small set of compounds needs to be screened, allowing more complex, costly, and time-consuming assays to be used. A disadvantage is that it is difficult to identify a truly novel target since the tool compounds have been developed with a specific target in mind. An advantage with a diversity screen is the possibility of identifying completely novel targets after optimization of the screening hit and accompanying target deconvolution. Disadvantages with screening a diversity set are that a relatively large number of compounds need to be screened and that there is no guarantee that the optimization and target deconvolution parts will be successful.

In this study, we propose a strategy that combines cheminformatics and bioinformatics to integrate historical bioactivity and pathway data for the analysis of single read-out phenotypic screens. We describe how this data integration can support each step of a phenotypic screening campaign from the compound selection (experiment planning) through to the target deconvolution process. As an illustration and validation of the concept, the strategy is applied to a phenotypic screen measuring tumor necrosis factor (TNF)–α production in lipopolysaccharide (LPS)–stimulated THP-1 cells. An overview of the workflow for the calculations is given in Supplemental Material 1.

Materials and Methods

Bioactivity Data Resources

Activity data were extracted from the CHEMBL ( https://www.ebi.ac.uk/chembl/ ),¹⁸ the GoStar,¹⁹ and the BioPrint¹⁴ databases. In-house activity data were also used.²⁰ For all databases, the compound-target data were classified as active or inactive by applying a cutoff of a minimum percent effect of 50% at 10 µM. When conflicting results arose for a given compound-target pair, the active result was kept. The Entrez Gene identifier (EGID) was used to identify the targets. Compounds were identified by identical unique identifiers across different databases.

Screening Set Design for Target Enrichment Testing

Since Fisher’s exact test is used to test for enrichment of targets after screening (see the “Enrichment Analysis” section), one can design the library screened in a manner that increases the power of the test for enrichment. By assuming that compound activity on targets in the screen will be identical to their historical target activity, the contingency table will have only nonzero entries on the diagonal, and the significance of the test for target i based on the screening set, $S$ , is given by the hypergeometric distribution as

p_{i} (S) = a_{i}^{+} (S)! a_{i}^{-} (S)! / (a_{i}^{+} (S) + a_{i}^{-} (S))!,

where $a_{i}^{+} (S)$ and $a_{i}^{-} (S)$ are the number of known actives and known inactives against target $i$ , respectively (additional details in Supplemental Material 2). To test for enrichment for as many targets as possible, the screening set should be selected so that the cardinality of the set $A^{+} (S) = {i | p_{i} (S) < α}$ is maximized based on some threshold, α, for calling significance.

In practice, the size of the screening set is fixed and a search is conducted to optimize the composition using the following simulated annealing approach: (1) initially, S is drawn randomly from the compound collection, C; (2) randomly select a candidate compound for exclusion $i \in S$ and for inclusion $j \in C \ S$ ; (3) if more targets can be distinguished by the substitution, that is, if $| A^{+} (S) | < | A^{+} ((S \ {i}) \cup {j}) |$ or $(| A^{+} (S) | = | A^{+} ((S \ {i}) \cup {j}) |) \land (\max_{k \in A^{+}} p_{k} (S) > \max_{k \in A^{+}} p_{k} ((S \ {i}) \cup {j}))$ or the new state is randomly accepted, replace $i$ by $j$ in $S$ ; and (4) repeat from step 2 for a set number of iterations. The likelihood of accepting a new state by chance is reduced to zero over the course of the iteration.

Selection of Compounds with an Increased Probability of Impact on TNF-α Production

To increase the probability of screening compounds relevant to TNF-α production, a subset of compounds was selected to supplement the above approach according to their activity on proteins putatively related to TNF-α production in pathway databases. The Ingenuity IPA database (Redwood City, CA)²¹ was first queried to identify genes whose complete or partial deletion in mouse models leads to a modification of TNF-α levels. GeneGO MetaCore (Philadelphia, PA)²² was then mined to identify proteins upstream of TNF-α. The linear canonical pathways (noodles) containing TNF-α were extracted from MetaBase. In these noodles, TNF-α was affected through five different mechanisms: “influence on expression,” “transcription regulation,” “co-regulation of transcription,” “cleavage,” and “unspecified (but related to TNF-α production).” Consequently, all upstream proteins in these canonical noodles can affect TNF-α production. The distance to TNF-α was limited to five to select proteins most likely to be causally linked to effects on TNF-α production (as the distance increases, this causal link is expected to reduce due to feedback loops and redundancies).

Targets in the above lists not already well covered by the designed screening set were identified. For each target i identified, available tool compounds were found and included in the screening set S if, upon inclusion, $p_{i} (S) < α$ .

Assay

The assay employed a phenotypic, functional readout for the identification of inhibitors of LPS-induced TNF-α production using a surrogate monocytic cell line, THP-1 (TIB-202; ATCC, Manassas, VA). TNF-α production was measured by fluorescent-linked immunosorbant assay (FLISA).

The FLISA was constructed using goat anti–mouse (GAM) beads coated with a mouse anti–human TNF-α antibody. TNF-α was captured by the antibody and detected using a biotinylated anti–human TNF-α antibody coupled to streptavidin Cy5. Fluorescence was measured using the Applied Biosystems 8200 Cellular Detection System, which employs the Fluorometric Microvolume Assay Technology (FMAT; Applied Biosystems, Life Technologies Corp., Paisley, UK). All materials were purchased from Sigma (Sigma-Aldrich, St Louis, MO) unless otherwise stated.

Cell culture and preparation

THP-1 cells were maintained in THP-1 cell media (Roswell Park Memorial Institute media [RPMI-1640, R0883] containing 10% [v/v] fetal calf serum [FCS; CR1207] and 2 mM L-glutamine [G7513]) at a density between 2 × 10⁵ and 1 × 10⁶ cells/mL under typical cell culture conditions (37—°C, 5% CO₂, 95% humidity). For the assay, cells were centrifuged (300 g for 5 min), the supernatant discarded, and the cell pellet resuspended in assay media (Dulbecco’s modified Eagle’s medium [DMEM; D5671] containing 0.1% [v/v] FCS, 2 mM L-glutamine, and 100 U/mL penicillin/0.1 mg/mL streptomycin [P4333]). This was repeated, finally resuspending the cells at 5 × 10⁵ cells/mL in assay media in the presence of 3 nM phorbol 12-myristate 13-acetate (PMA; P1585, 100-µM stock in 100% [w/v] DMSO [Fisher Scientific, Loughborough UK SP/1044/99SS]).

Compound preparation

Compounds were supplied solubilized in DMSO at a stock concentration of 10 mM and were diluted 100-fold in assay media to give a 10× final assay concentration in 1% (v/v) DMSO.

Preparation of anti–TNF-α monoclonal antibody-coated GAM beads

GAM beads (Saxon, Spherotech, Lake Forest, IL; MPFc-60-5; 6 mm, 0.5% [w/v]) were washed twice with bead-coating buffer (phosphate-buffered saline [PBS; D8537] containing 0.01% [w/v] sodium azide [S-2002]). Then, 40 µL stock anti-TNF monoclonal antibody (R&D Systems, Minneapolis, MN; MAB610; 500 µg/mL) was added to 1 mL beads and incubated, mixing at 4—°C overnight. Beads were washed twice, centrifuged at 13,000 rpm for 1 min, and resuspended in bead-coating buffer, finally adding 2.5 mL to give a final 0.2% (w/v) beads concentration.

Preparation of FLISA bead mix (detection mix)

The detection mix consisted of PBS, 0.5% (v/v) FCS, 2% (v/v) penicillin/streptomycin, 0.002% (w/v) anti–TNF-α mAb-coated GAM beads, 60 ng/mL Biotin anti-TNF antibody (BAF210; R&D Systems), and 1 µg/mL Streptadvidin-Cy5 label (43-8316; Cambridge Biosciences, Cambridge, UK).

Preparation of recombinant human TNF-α (rhTNF-α) standard curve

A 500 µg/mL stock rhTNF-α (210-TA-010; R&D Systems) in PBS was used to produce a 10-point, double-diluted standard curve in assay media (8 ng/mL to 7.8 pg/mL rhTNF-α). Assay media was used as a zero standard control.

Assay build

In total, 40 µL of cells (20,000 cells/well) was added to black 384-well clear-bottom plates (assay plate; 781091; Greiner, Stonehouse, UK). Plates were incubated under typical cell culture conditions overnight. Then, 5 µL of each compound was transferred to the assay plates and incubated as above for 30 min. Next, 5 µL of 10 µg/mL LPS (Escherichia coli 0111:B4; L4130, diluted in assay media from a 3-mg/mL stock in RPMI-1640) was added to the assay wells, and 5 µL of assay media was added to unstimulated control wells. Plates were incubated as above for 4 h. Then, 25 µL of supernatant was removed and transferred to black clear-bottom 384-well plates (FMAT assay plate; Greiner 781091).

TNF-α determination by FLISA

In total, 25 µL of detection mix was added per well, to the FMAT assay plates containing the supernatants. Plates were incubated at ambient temperature overnight and read on the FMAT (mean fluorescence count, factory settings for the photomultiplier tubes and event cutoff = 20).

Data analysis

Data conversion of fluorescence units (FLU) readings to TNF-α concentrations was based on the standard curve and used for subsequent calculation of % inhibition. The effect of each compound was calculated as

% inhibition = 100 - [(SR-BR) \div (CR-BR) \times 100],

where control reading (CR) denotes LPS-stimulated cells and compound solvent, sample reading (SR) denotes LPS-stimulated cells and compound, and background reading (BR) denotes cells, no LPS, and compound solvent. Active compounds were considered those showing % inhibition ≥40% at the 10-µM test concentration. This cutoff was applied as it was the point at which it was considered that compounds showed activity beyond the noise of the assay, being approximately 1.5 interquartile ranges from the third quartile, and hence defined as an outlier. Compounds were routinely tested in duplicate in a single experiment and the maximum % inhibition achieved reported.

A selection of actives was taken into concentration-response confirmation; this was performed under the conditions described above for the single-concentration screening, with a maximum concentration of 30 µM.

Enrichment Analysis

Screening results were analyzed at the target, pathway, and biological process levels. Let S⁺ denote the subset of compounds found active in the screen and $S^{-} \equiv S \ S^{+}$ the remaining found inactive.

To assess the null hypothesis of no association between historical bioactivity data on target i and the screening results, Fisher’s exact test was applied to Table 1 .

Table 1.

Contingency Table.

		Historical Bioactivity
		Active	Inactive
Screen	Active	$α_{i}^{+} (S^{+})$	$α_{i}^{-} (S^{+})$
Screen	Inactive	$α_{i}^{+} (S^{-})$	$α_{i}^{-} (S^{-})$

To correct for multiple hypothesis testing, a Bonferroni correction was applied and significance was called at the $α / | A^{+} |$ ; $α = 0.05$ level.

Two types of pathway enrichment were assessed.

First, all canonical maps from GeneGO MetaBase were considered. For each map, all compounds active on any target belonging to this map were considered active, and compounds inactive on all targets from this map were considered inactive. The contingency table can thus be populated and the null of independence assessed as above for each pathway.

Second, a subnetwork enrichment considering the whole interactome available in GeneGO MetaBase was assessed. All targets were clustered using hierarchical clustering (Ward’s method²³) with a similarity measure, which was the number of common neighbors in the graph weighted by the inverse logarithm of their degrees.²⁴ The null of no association between a cluster and the screening results was assessed as above with the contingency table formed as the sum of contingency tables of all targets in the cluster.

Similarly, enrichment of GO biological processes²⁵ was determined by testing GO terms (including ancestors) for all targets in the design under the null of independence with the targets found to be enriched.

Results

Selection of the Compound Set for Screening

A library of approximately 650,000 compounds was used as a starting point to select the compound set for screening. For those 650,000 compounds, a total of 3,050,580 compound-target pairs were extracted from CHEMBL, GoStar, BioPrint, and in-house activity data across 3060 proteins.

Using the optimization method described in the Materials and Methods, a subset of 10,000 compounds maximizing the number of proteins for which enough data were available for the statistical analysis was selected using α = 10^–6. Using this value of α, chosen conservatively to allow some discrepancy with historical bioactivity data (see Supplemental Material 2), 648 targets had the potential of being identified as enriched. Relaxing the significance threshold to 5% with a Bonferroni adjustment over 3060 proteins increased the number of targets to 665.

To enrich this screening set with compounds active on targets potentially affecting TNF-α production, the pathway databases IPA and GeneGO were queried. Sixty-nine genes whose complete or partial deletion in mouse models leads to a modification of TNF-α levels were extracted from IPA, and 180 proteins upstream of TNF-α in canonical linear pathways were extracted from GeneGO MetaBase (from 4850 linear pathways containing TNF-α). The two lists of targets were complementary to each other, with only nine targets in both. Of these 240 targets, 178 were not in the 665 proteins that can be identified as enriched. An analysis of the historical bioactivity data available (see Materials and Methods) identified a few more targets where the coverage could be improved, and in total, 165 compounds were added to the screening set. This increased the number of targets that could potentially be identified as enriched to 672 and provided a final screening set of 10,165 compounds.

Screening and Enrichment Analysis

The assay performed as expected and within the performance expectations of a robust assay,²⁶ with a Z′ of 0.5 or greater and at least a 10-fold assay window. The detection range for rhTNF-α was approximately 60 to 2000 pg/mL, with a midpoint of 1 ng/mL. In total, 1045 of the 10,165 compounds selected were classified as active by the criteria described (see Supplemental Material 3 for primary screen distribution). Of the 1045 actives, 168 were selected for confirmation testing in concentration-response format, 92% were confirmed with an IC₅₀ of <30 µM, and 57% gave an IC₅₀ of <10 µM.

Integrating both the TNF-α production screen results and the historical bioactivity data from CHEMBL, GoStar, BioPrint, and AstraZeneca, an enrichment analysis was run across all targets for which activity data were available (2750 proteins for the 10,165 compounds selected). Figure 1 presents the p values and odds ratios obtained for each target. Sixty-four protein targets were highlighted as enriched (targets above the significance threshold—indicated by the green line—in Figure 1 and Table 2 ). Half were kinases (32 proteins). The five top-ranking targets were MET, GSK3B, JAK3, PAK1, and PTK2.

Figure 1.

Target enrichment results for tumor necrosis factor (TNF)–α production. Each dot represents a protein target. The green horizontal line is the Bonferroni threshold for the enrichment analysis. All targets above the green line are statistically enriched with compounds active on them and on TNF-α production. The higher and the further right a target is, the more it is associated with TNF-α production. The size of the dots corresponds to the number of compounds active on the target and on TNF-α production. The most significantly enriched targets have been tagged with their short names.

Table 2.

Targets Enriched in Compounds Both Active on Them and on TNF-α Production.

Rank	Target EGID	Target Code	Target Name	Enrichment p Value
1	4233	MET	Met proto-oncogene (hepatocyte growth factor receptor)	3.61E-34
2	2932	GSK3B	Glycogen synthase kinase 3β	9.65E-27
3	3718	JAK3	Janus kinase 3	7.74E-25
4	5058	PAK1	p21 protein (Cdc42/Rac)–activated kinase 1	5.26E-23
5	5747	PTK2	PTK2 protein tyrosine kinase 2	2.75E-22
6	3480	IGF1R	Insulin-like growth factor 1 receptor	6.34E-19
7	3551	IKBKB	Inhibitor of κ light polypeptide gene enhancer in B-cells, kinase β	6.28E-18
8	1230	CCR1	Chemokine (C-C motif) receptor 1	1.21E-16
9	729230	CCR2	Chemokine (C-C motif) receptor 2	9.03E-16
10	10298	PAK4	p21 protein (Cdc42/Rac)–activated kinase 4	7.78E-15
11	4986	OPRK1	Opioid receptor, κ1	1.02E-13
12	9891	NUAK1	NUAK family, SNF1-like kinase, 1	4.84E-13
13	5170	PDPK1	3-Phosphoinositide dependent protein kinase-1	8.59E-13
14	3643	INSR	Insulin receptor	7.71E-12
15	3717	JAK2	Janus kinase 2	1.22E-11
16	7272	TTK	TTK protein kinase	1.63E-11
17	5347	PLK1	Polo-like kinase 1	2.30E-11
18	23385	NCSTN	Nicastrin	7.49E-11
19	127933	UHMK1	U2AF homology motif (UHM) kinase 1	3.08E-10
20	1147	CHUK	Conserved helix-loop-helix ubiquitous kinase	5.12E-10
21	660	BMX	BMX nonreceptor tyrosine kinase	5.84E-10
22	4751	NEK2	NIMA (never in mitosis gene a)–related kinase 2	5.85E-10
23	6714	SRC	v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian)	6.27E-10
24	2357	FPR1	Formyl peptide receptor 1	1.10E-09
25	728	C5AR1	Complement component 5a receptor 1	1.18E-09
26	9475	ROCK2	Rho-associated, coiled-coil containing protein kinase 2	1.31E-09
27	1131	CHRM3	Cholinergic receptor, muscarinic 3	1.90E-09
28	2050	EPHB4	EPH receptor B4	4.59E-09
29	56950	SMYD2	SET and MYND domain containing 2	4.92E-09
30	5602	MAPK10	Mitogen-activated protein kinase 10	5.73E-09
31	7010	TEK	TEK tyrosine kinase, endothelial	6.56E-09
32	6850	SYK	Spleen tyrosine kinase	9.45E-09
33	348	APOE	Apolipoprotein E	9.50E-09
34	2740	GLP1R	Glucagon-like peptide 1 receptor	1.00E-08
35	51135	IRAK4	Interleukin-1 receptor-associated kinase 4	5.03E-08
36	2642	GCGR	Glucagon receptor	5.60E-08
37	5604	MAP2K1	Mitogen-activated protein kinase kinase 1	6.79E-08
38	9451	EIF2AK3	Eukaryotic translation initiation factor 2α kinase 3	8.78E-08
39	1017	CDK2	Cyclin-dependent kinase 2	1.08E-07
40	207	AKT1	v-akt murine thymoma viral oncogene homolog 1	1.15E-07
41	3716	JAK1	Janus kinase 1	1.67E-07
42	5566	PRKACA	Protein kinase, cAMP-dependent, catalytic, α	2.95E-07
43	151	ADRA2B	Adrenergic, α-2B-, receptor	6.08E-07
44	1956	EGFR	Epidermal growth factor receptor	7.42E-07
45	1235	CCR6	Chemokine (C-C motif) receptor 6	7.89E-07
46	2908	NR3C1	Nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)	8.46E-07
47	5599	MAPK8	Mitogen-activated protein kinase 8	8.59E-07
48	3757	KCNH2	Potassium voltage-gated channel, subfamily H (eag-related), member 2	1.13E-06
49	8325	FZD8	Frizzled family receptor 8	1.30E-06
50	3791	KDR	Kinase insert domain receptor (a type III receptor tyrosine kinase)	1.32E-06
51	4638	MYLK	Myosin light chain kinase	2.29E-06
52	1436	CSF1R	Colony-stimulating factor 1 receptor	2.71E-06
53	558	AXL	AXL receptor tyrosine kinase	3.13E-06
54	7852	CXCR4	Chemokine (C-X-C motif) receptor 4	3.20E-06
55	6300	MAPK12	Mitogen-activated protein kinase 12	4.51E-06
56	51289	RXFP3	Relaxin/insulin-like family peptide receptor 3	4.56E-06
57	7376	NR1H2	Nuclear receptor subfamily 1, group H, member 2	5.37E-06
58	8989	TRPA1	Transient receptor potential cation channel, subfamily A, member 1	5.80E-06
59	4921	DDR2	Discoidin domain receptor tyrosine kinase 2	6.00E-06
60	3932	LCK	Lymphocyte-specific protein tyrosine kinase	9.89E-06
61	6788	STK3	Serine/threonine kinase 3	9.93E-06
62	6530	SLC6A2	Solute carrier family 6 (neurotransmitter transporter, noradrenalin), member 2	1.46E-05
63	93	ACVR2B	Activin A receptor, type IIB	1.62E-05
64	5753	PTK6	PTK6 protein tyrosine kinase 6	1.75E-05

EGID, Entrez gene identifier; TNF-α, tumor necrosis factor–α.

To capture targets for which compounds were not available or for which historical data were insufficient for the target enrichment analysis, a canonical pathway map enrichment was carried out as described in the Materials and Methods based on canonical maps from GeneGO. Of the 862 maps available, 504 maps were found enriched. This high number of enriched pathways was likely due to the high number of screened compounds that were active on various kinases and on the high number of kinases in the signaling cascades presented in the GeneGO canonical pathways. Nevertheless, among the enriched maps, 47 included TNF-α, and 83 were related to the immune response or to inflammation. Three of the five top-ranking maps (see Table 3 : “Activation of PKC via G-protein coupled receptor,” “cAMP signaling,” and “VEGF signaling via VEGFR2—generic cascades”) represented generic signal transduction cascades involving mainly the protein kinase C (PKC) family and various MAP kinases (including p38). They also included GSK3B, SRC, and PAK1, for instance, all identified during the previous enrichment analysis. Finally, they involved CREB1, NF-κB, c-Jun, and Elk-1, with all four being recorded as potential regulators of TNF-α production in GeneGO Metacore.²⁷ The two remaining top-ranking canonical maps concerned the immune response and embryonal myogenesis. They also included relevant targets for TNF production such as CREB1, the PKC family, p38, NF-κB, and PAK1 (the latter also identified in the previous target enrichment analysis).

Table 3.

Top-Ranking Canonical Pathways from the Enrichment Analyses.

GeneGO Map ID	GeneGO Map Name	Enrichment p Value
Five top-ranking canonical maps not including TNF-α
453	Signal transduction_Activation of PKC via G-protein coupled receptor	3.87E-97
3290	Stem cells_Early embryonal hypaxial myogenesis	1.67E-93
660	Signal transduction_cAMP signaling	2.91E-92
533	Development_VEGF signaling via VEGFR2—generic cascades	1.32E-80
620	Immune response_CD28 signaling	1.91E-71
Five top-ranking canonical map including TNF-α
3017	Some pathways of EMT in cancer cells	1.45E-54
5299	Hypertrophy of asthmatic airway smooth muscle cells	1.18E-48
2373	PGE2 pathways in cancer	1.43E-45
6241	IgE- and MGF-induced Lyn-mediated production of cytokines and arachidonic acid metabolites in lung mast cells in asthma	1.10E-44
6317	Regulation of mast cell survival in asthma	6.87E-43

TNF-α, tumor necrosis factor–α.

The five top-scoring canonical maps including TNF-α, respectively, ranked 17, 35, 49, 53, and 60 ( Table 3 ). Those maps contained most of the general signaling cascades and targets highlighted above. The map with GeneGO map ID 6241 (“IgE- and MGF-induced Lyn-mediated production of cytokines and arachidonic acid metabolites”), for instance, presents several paths that were affected by the compounds screened and that were observed to have an effect on TNF-α production, providing a mechanistic rationale with regard to how TNF-α and inflammation can be affected in mast cells in asthma.

The network enrichment based on interactome clustering is summarized in Table 4 . To limit the size of the networks considered for enrichment, only clusters with a merging cost less than 0.1, equivalent to approximately the first 1/10th mergers and for which agglomerations led to improved p values, were included. A total of 15 subnetworks were identified, with JAK1-3, PTPN6/11, and STAT5B being most significant.

Table 4.

Subnetworks Enriched in Compounds Both Active on Them and TNF-α Production.

EGIDs	Symbol	p Value
3716 3717 3718 5777 5781 6777	JAK1 JAK2 JAK3 PTPN6 PTPN11 STAT5B	1.90E-67
207 983 2932 5566 5595 81649 116590 852457	AKT1 CDK1 GSK3B PRKACA MAPK3 Mapk14 Mapk1	4.33E-36
25 2185 2885 5747 13649 17973 25467 281987 396442	ABL1 PTK2B GRB2 PTK2 Egfr Nck1 Irs1	1.27E-27
1147 3551	CHUK IKBKB	8.36E-26
2064 4233	ERBB2 MET	5.19E-25
5058 5062	PAK1 PAK2	3.78E-23
2321 3791	FLT1 KDR	1.10E-12
3643 16001	INSR Igf1r	2.02E-12
4149 5347 9682	MAX PLK1 KDM4A	5.17E-11
5601 5602 116554	MAPK9 MAPK10 Mapk8	3.31E-10
1445 2534 4067 7535 25155 508890	CSK FYN LYN ZAP70 Syk	4.47E-08
1111 11200	CHEK1 CHEK2	5.05E-07
4221 4297	MEN1 MLL	3.06E-06
7376 10062	NR1H2 NR1H3	9.26E-06
6790 7153 9212	AURKA TOP2A AURKB	2.42E-05

EGID, Entrez gene identifier; TNF-α, tumor necrosis factor–α.

The biological process analysis identified 71 enriched GO terms from a total of 4358 terms containing the targets in the design (see Supplemental Material 4). Protein phosphorylation (GO:0042327) was identified as most enriched, and most terms were in fact related to kinase activity. MAPK activity (GO:0000165, GO:0051403) was, however, also identified as enriched. None of the terms directly related to TNF-α reached significance.

Discussion

The LPS-stimulated TNF-α assay selected for this screen was initiated to identify compounds that could be optimized for treatment of respiratory disease as bacterial infections have been implicated in exacerbations of both asthma and chronic obstructive pulmonary disease.²⁸ The pathway by which LPS binds to TLR4 and stimulates production of proinflammatory cytokines such as TNF-α and interleukin (IL)–6 via a MyD88-dependent mechanism has been well characterized.²⁹ Several of the 64 protein targets that were identified by the enrichment analysis are known to be involved in the pathway of LPS-stimulated TNF-α production ( Table 2 and Fig. 1 ). These include IRAK4, which interacts with MyD88, CHUK, and IKBKB (IKKα and β), which are upstream of NF-κB transcription, and MAPK8 and MAPK10 (JNK1 and 3), which are involved in the activation of the transcription factor AP-1. The transcription factors NF-κB and AP-1 are known to be required for expression of inflammatory cytokines, including TNF-α. In addition, the target enrichment approach highlighted MAP2K1 (MEK1), which activates the MAP kinases ERK1 and ERK2, leading to enhanced TNF-α transcription via Egr-1 binding to the TNF-α promoter. LPS has also been shown to stimulate other pathways leading to TNF-α production, and members of these pathways, such as AKT1, which is downstream of PI3-kinase, and SYK, which can activate MAP2K1 and JNK, were also identified. Interestingly, a number of proteins that have been implicated in the negative regulation of the pathway were also identified, such as GSK3B and the JAK kinases JAK1, 2, and 3. This may be because the enrichment analysis only considered association of the target with the pathway and not the direction of effect, so that both agonists and antagonists of a target would be included. Cross-talk between pathways may also account for the inclusion of some proteins on the target list such as the focal adhesion kinase FAK (PTK2), which has been shown to interact with the MyD88 pathway, and the insulin receptor INSR and insulin-like growth factor receptor IGF1R, which activate the PI3K/Akt and ERK/MAPK pathways. Further targets such as PAK1 and the hepatocyte growth factor receptor MET are known to be activated by LPS signaling. It is perhaps surprising that p38 MAP kinase (MAPK14) was not identified as a target from the enrichment analysis as p38 is involved in the transcriptional and posttranscriptional control of TNF-α production, and p38 inhibitors block LPS-stimulated TNF-α production in human monocytes.

The information obtained from pathway map enrichment is broadly supportive of the results of the target enrichment analysis ( Table 3 and Fig. 2 ). The top-ranking canonical pathways not including TNF-α comprise multiple signaling molecules known to be involved in LPS-stimulated TNF-α production such as GSK3B, ERK, JNK, and p38 MAPK ( Table 3 ). Although the target enrichment did not highlight the cAMP signaling pathway as important, this was identified by the pathway analysis as the third ranking pathway, which is consistent with literature showing that agents that elevate cAMP suppress TNF-α production. This also raises the question of why the β2 adrenergic receptor, as a G-protein–coupled receptor (GPCR) coupled to intracellular elevation of cAMP, was not identified on the target list. A possible explanation is that in the target enrichment analysis, the statistical significance of an inhibitory effect of an agonist for a given target in the assay would be reduced by lack of effect of an antagonist for the same target. An enhancement of the approach taken in the current study would be to separate the analysis on the basis of whether compounds are agonists or antagonists of the target.

Figure 2.

Canonical pathway map enrichment results for tumor necrosis factor (TNF)–α production. (A) General view of the results: each square/star represents a pathway map. The green horizontal line is the Bonferroni threshold for the enrichment analysis. All maps above the green line are statistically enriched with compounds active on them and on TNF-α production. The higher and the further right a map is, the more it is associated with TNF-α production. The size of the squares/stars corresponds to the number of compounds both active on the map and on TNF-α production. Red squares/stars correspond to maps including TNF-α; blue squares/stars correspond to maps not including TNF-α. Stars correspond to maps including lipopolysaccharide (LPS), and squares correspond to maps not including LPS. (B) Same as A but filtered to highlight only the maps including TNF-α. (C) Same as A but filtered to highlight only the maps including LPS.

The pathways identified that include TNF-α also incorporate many signaling molecules involved in LPS-stimulated TNF-α production such as AKT, GSK3B, and ERK ( Table 3 and Fig. 2C ). Interestingly the top-ranking pathways including TNF-α have a strong linkage to respiratory disease and include pathways involved in hypertrophy of airway smooth muscle cells and mast cell mediator production and survival. The overlap between the pathways gives the potential to identify targets and compounds from the analysis of the LPS-stimulated TNF-α production assay in monocytes that might have activity against multiple cell types implicated in the asthma phenotype. A further analysis to show pathways that include LPS ( Fig. 2B ) identified pathways that are predominantly involved in the innate immune response to bacterial infection. Several of the pathways are linked to cystic fibrosis (CF), suggesting that there may be common mechanisms and targets between asthma and CF that are worth further investigation. A high proportion of the pathways involving LPS also include TNF-α (8 of 13), which is due to the strong linkage of LPS signaling to TNF-α production in addition to the similarity of the signaling pathways by which TNF-α binding to the TNF receptors and LPS signaling via TLR4/MyD88 stimulate NF-κB activation.

The network enrichment builds on the target and pathway enrichment by clustering targets into molecules with similar neighborhoods in the interactome and, thus, potentially with similar functions. The results of this analysis are also consistent with the known signaling pathways for LPS-stimulated TNF-α production and highlight the JAKs, MAP kinases, IKKs, and PAKs as families of key signaling molecules. Similarly, the biological process analysis is very strongly enriched for processes related to protein phosphorylation and regulation of protein kinase activity (see Supplemental Material 4). This relates to the dependence of the signaling pathway on kinases and the bias toward compounds with kinase inhibitory activity in the assay.

The approach described in this article has the advantage of combining historical bioactivity data for multiple compounds to strengthen the evidence that a particular pathway is involved in the biological effect under investigation. The relatively high active rate of about 10% in the screen was expected since the screening set was specifically designed to enable investigation of activity against TNF-α, not only because of including compounds with activity on targets upstream of TNF-α but, more important, because of selecting compounds with activity profiles allowing coverage for enrichment analysis of almost 700 targets and with recorded activity on 2111 proteins.

In addition, since many of the compounds in the set exhibit polypharmacology, the enrichment analysis can help to identify compounds that show efficacy through inhibition of multiple pathways. These compounds could be used as starting points for lead optimization and combined with a more traditional chemoproteomics approach to identify the interacting proteins to fully deconvolute the target(s). Understanding the mechanism of action of a compound is important in the drug discovery process to assess potential target-related side effects and selectivity to related targets, as well as facilitate rational drug design. However, there can be limitations of the chemoproteomics approaches due to low abundance of the target protein, low potency of the interaction between compound and relevant target, and identification of a large number of interacting proteins. In this situation, combining the output of the chemoproteomics with the target and pathway enrichment analysis can aid the identification of the relevant target protein(s).

The enrichment analysis reported was based on data from an assay in which compounds were screened at a single concentration and activity determined based on a fixed threshold for inhibition (40% at 10 µM). In the same way, a threshold was applied to historical bioactivity data: a compound is considered active on a target if 10 µM or less is sufficient for a minimum percentage effect of 50%. Fifty percent activity was selected for the historical data as this was considered an appropriate cutoff to apply across a range of targets to ensure the result was outside of the noise of the assays. This binarization of the data may not be suitable for all compounds or targets (e.g., for some targets, a threshold of 1 µM might make more sense). An improvement of this strategy might be to take into account compound-target affinity to perform correlation analyses linking compound effects on TNF-α and compound affinity for specific targets. This could provide a more accurate view of the targets affecting TNF-α. This could also decrease the number of false-positive targets in the enrichment analyses due to compound polypharmacology. If two targets have a similar binding site (therein a similar set of active compounds) and if one of the targets is affecting TNF-α production, the second protein will also be identified as an enriched target with the approach described herein. The use of experimentally determined potency values could help capture more subtle differences between targets and thus improve this limitation of the method. A further refinement would be to perform secondary screens such as a cytotoxicity assay and an LPS-stimulated TNF-α production assay in primary human monocytes to increase confidence in the relevance of the compound activity to the biological effect of interest.

One disadvantage of the approach described herein is that it is limited by the availability of historical data for compounds and targets. A target without enough bioactivity data cannot be identified as enriched. The amount of data available per target could be increased by including activity data from orthologs as well as from predicted target activity based on chemical structure. This might also help reduce the bias due to known “druggable” targets such as kinases and GPCRs as compounds with activities against these target classes are overrepresented in the compound collection. A phenotypic screen on a broader set of compounds that was either not subject to bioactivity-based preselection criteria or selected on the basis of chemical diversity would increase the potential to identify a compound with a completely novel mechanism of action. However, in this case, a relatively large number of compounds would need to be screened, thus demanding an assay format that is amenable to high throughput. In addition, the identified hit might be only weakly active.

As already highlighted, computational target deconvolution is a first step in the process of phenotypic drug discovery. The identified targets can be validated by siRNA techniques or sometimes pharmacologically by adding a compound that would antagonize the effect of the screening hit. In conclusion, we have described a novel screening set design and target deconvolution method using historical bioactivity data to identify targets and pathways involved in a phenotypic response. We also illustrated that these methods can provide valuable targets in the TNF-α pathway that could be progressed for further target validation.

Footnotes

Acknowledgements

The authors acknowledge Dr. John Unitt for the phenotypic screening design concept and implementation and Philip Rawlins for his support and input into the experimental methodology. The authors also thank Dr. Mike Firth and Dr. Péter Várkonyi for their support during the extraction of historical activity data. Dr. Firth also supported the extraction of linear pathways from MetaBase.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Supplementary material for this article is available on the Journal of Biomolecular Screening Web site at .

References

Swinney

D. C.

Anthony

How Were New Medicines Discovered?

Nat. Rev. Drug Discov. 2011, 10, 507–519.

Schenone

Dancik

Wagner

B. K.

. Target Identification and Mechanism of Action in Chemical Biology and Drug Discovery. Nat. Chem. Biol. 2013, 9, 232–240.

Azad

M. A.

Wright

G. D.

Determining the Mode of Action of Bioactive Compounds. Bioorg. Med. Chem. 2012, 20, 1929–1939.

Ziegler

Pries

Hedberg

. Target Identification for Small Bioactive Molecules: Finding the Needle in the Haystack. Angew. Chem. Int. Ed Engl. 2013, 52, 2744–2792.

Huang

S. M.

Mishina

Y. M.

Liu

. Tankyrase Inhibition Stabilizes Axin and Antagonizes Wnt Signalling. Nature 2009, 461, 614–620.

Murray

C. M.

Hutchinson

Bantick

J. R.

. Monocarboxylate Transporter MCT1 Is a Target for Immunosuppression. Nat. Chem. Biol. 2005, 1, 371–376.

Koutsoukas

Simms

Kirchmair

. From In Silico Target Prediction to Multi-Target Drug Design: Current Databases, Methods and Applications. J. Proteomics 2011, 74, 2554–2574.

Muchmore

S. W.

Debe

D. A.

Metz

J. T.

. Application of Belief Theory to Similarity Data Fusion for Use in Analog Searching and Lead Hopping. J. Chem. Inf. Model. 2008, 48, 941–948.

Towne

D. L.

Nicholl

E. E.

Comess

K. M.

. Development of a High-Content Screening Assay Panel to Accelerate Mechanism of Action Studies for Oncology Research. J. Biomol. Screen. 2012, 17, 1005–1017.

10.

Feng

Mitchison

T. J.

Bender

. Multi-Parameter Phenotypic Profiling: Using Cellular Effects to Characterize Small-Molecule Compounds. Nat. Rev. Drug Discov. 2009, 8, 567–578.

11.

Young

D. W.

Bender

Hoyt

. Integrating High-Content Screening and Ligand-Target Prediction to Identify Mechanism of Action. Nat. Chem. Biol. 2008, 4, 59–68.

12.

Campillos

Kuhn

Gavin

A. C.

. Drug Target Identification Using Side-Effect Similarity. Science 2008, 321, 263–266.

13.

Cheng

Wang

. Identifying Compound-Target Associations by Combining Bioactivity Profile Similarity Search and Public Databases Mining. J. Chem. Inf. Model. 2011, 51, 2440–2448.

14.

Krejsa

C. M.

Horvath

Rogalski

S. L.

. Predicting ADME Properties and Side Effects: The BioPrint Approach. Curr. Opin. Drug Discov. Dev. 2003, 6, 470–480.

15.

Jalencas

Mestres

On the Origins of Drug Polypharmacology. MedChemComm 2013, 4, 80–87.

16.

Bornot

Bauer

Brown

. Systematic Exploration of Dual-Acting Modulators from a Combined Medicinal Chemistry and Biology Perspective. J. Med. Chem. 2013, 56, 1197–1210.

17.

Hopkins

A. L.

Network Pharmacology: The Next Paradigm in Drug Discovery. Nat. Chem. Biol. 2008, 4, 682–690.

18.

Gaulton

Bellis

L. J.

Bento

A. P.

. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40, D1100–7.

19.

GOSTAR (GVK BIO Online Structure Activity Relationship Database) GVK Biosciences Private Limited, Plot No. 28 A, IDA Nacharam, Hyderabad, India, 2010. https://gostardb.com/gostar/

20.

Muresan

Petrov

Southan

. Making Every SAR Point Count: The Development of Chemistry Connect for the Large-Scale Integration of Structure and Bioactivity Data. Drug Discov. Today 2011, 16, 1019–1030.

21.

Ingenuity Systems. www.ingenuity.com

22.

Bureeva

Zvereva

Romanov

. Manual Annotation of Protein Interactions. Methods Mol. Biol. 2009, 563, 75–95.

23.

Ward

J. H. J.

Hierarchical Grouping to Optimize an Objective Function. JASA 1963, 58, 236–244.

24.

Adamic

L. A.

Adar

Friends and Neighbors on the Web. Soc. Networks 2003, 25, 211–230.

25.

Ashburner

Ball

C. A.

Blake

J. A.

. Gene Ontology: Tool for the Unification of Biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25, 25–29.

26.

Zhang

J. H.

Chung

T. D.

Oldenburg

K. R.

A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays. J. Biomol. Screen. 1999, 4, 67–73.

27.

Sato

Watanabe

Tanaka

. Regulation of the Human Tumor Necrosis Factor–Alpha Promoter by Angiotensin II and Lipopolysaccharide in Cardiac Fibroblasts: Different cis-Acting Promoter Sequences and Transcriptional Factors. J. Mol. Cell. Cardiol. 2003, 35, 1197–1205.

28.

Toews

G. B.

Impact of Bacterial Infections on Airway Diseases. Eur. Respir. Rev. 2005, 14, 62–68.

29.

Lee

M. S.

Kim

Y. J.

Signaling Pathways Downstream of Pattern-Recognition Receptors and Their Cross Talk. Annu. Rev. Biochem. 2007, 76, 447–480.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

3.12 MB

0.00 MB