Abstract
A severe drawback in the high-throughput screening (HTS) process is the unintentional (random) presence of false positives and negatives. Their rates depend, among others, on the screening process being applied and the target class. Although false positives can be sorted out in subsequent process steps, their occurrence can lead to increased project cost. More fundamentally, it is not possible to rescue false nonhits. In this article, we investigate the prediction of the primary hit rate, hit confirmation rate, and false-positive and false-negative rates. Results for approximately 2800 compounds are considered that are tested as a pilot screen ahead of the primary screening work. This pilot screen is done at several concentrations and in replicates. The rates are predicted as a function of the proposed hit threshold by having the replicates serve as each other’s confirmers, and confidence limits to the prediction are attached by means of a resampling scheme. A comparison of the rates resulting from the resampling with the primary hit rate and the confirmation rates obtained during the screening campaign shows how accurate this method is. Hence, the “optimal” compound concentration for the screen as well as the optimal hit threshold corresponding to low false rates can be determined prior to starting the subsequent screening campaign.
Keywords
Introduction
In the past decade, screening of large compound collections has gained a very high importance in the pharmaceutical industry. Today screening is an essential step in the drug discovery process. The goal is to find among many hundreds of thousands of molecules those few that show an activity against a given target. This process requires strong expertise in many scientific areas, particularly the conception and design of robust biological systems, their automation, and the subsequent data analysis.
Because measurements are subject to random errors, the true but unknown activity of compounds is masked by unwanted effects and cannot always be determined with satisfactory accuracy. Depending on the extent to which noise influences the measurements, two kinds of errors are observed. Type I errors occur where inactive compounds are identified as hits. These falsely classified hits are called false positives. Conversely classifying actually active compounds as inactives is known as type II error (false negatives).
Many studies on the field of assay quality and hit selection procedures have appeared in the past years.1-6 Probably the most frequently used criterion for assessing the quality of assays was introduced by Zhang et al. 1 On the basis of the Gaussian distribution of the positive and negative controls, the authors defined a statistical measure called Z′, which reflects the assay separation window and the variability. Sui and Wu 2 went on to establish a link between high Z′ values and the power of identifying (actually) active compounds by means of (true) hits. A hit selection method based on structural clustering techniques was described in Gagarin et al. 3 The hit confirmation rate as a quality criterion for the hit selection procedure and the assay was discussed extensively in the literature. For instance, Zhang et al. 4 showed that under some assumptions (including Gaussian distribution of the measurement errors and a constant standard deviation on the whole activity range), the lower limit for the confirmation rate is 50%.
This article describes the benefit of having access to primary hit rate and confirmed hit rate information prior to starting the primary screening process within a project.
Materials and Methods
Running a Pilot Screen
Prior to starting the primary screen of a compound library, it is a routine to conduct a pilot campaign beforehand. A structurally representative subset of the screening library is tested in replicates at one or many fixed compounds concentrations. The purpose is to determine the compound concentration appropriate for the screen and to gain information on the prospective hit rate. A high compound concentration may result in a too large number of hits for the subsequent project phases and considered a flawed experiment. The analysis of the results from the pilot screen can be expanded by taking the compound structures into consideration. A relationship between compound structure and potential issues such as aqueous solubility can be observed, quantified, and potentially rectified ahead of the screen.
The results presented in this article are based on the routine measurement of approximately 2800 representative lead-like compounds in several assays. This represents roughly 1% of the compounds we typically apply during primary screening, per project and target.
Hit Selection Strategy
Throughout this article, the assay response is normalized with respect to positive and negative controls. This yields normalized assay signal values (% inhibition or % activation) in the range between 0% (no activity) and 100% (full activity).
Besides compound, positive, and negative controls, DMSO wells (sample negative controls) are distributed uniformly on each assay plate. They undergo the same process steps as the compounds. Therefore, they can be thought of as additional controls for the assay performance. In many studies on hit selection and plate pattern correction (see, e.g., Brideau et al.
5
), it is assumed that most compounds on an assay plate are not active. Computations of location and scale parameters are conducted by means of robust statistics (median and median absolute deviation) to eliminate the effect of outliers. This assumption may fail to be valid, for instance, in the case of focused libraries on 96-well assay plates, where a high concentration of active molecules on the plate and hence a high hit rate are expected. Furthermore, there is some circular dependency. In fact, the compound results are used to compute the location and scale parameters, which are again used to classify the same compounds as active or nonactive. To overcome this problem, the hit threshold was calculated by means of the “
Definitions
In the process of hit finding, a primary screen is usually conducted, where a large library containing hundreds of thousands of compounds is tested in singlicate. A hit list is obtained after applying a hit selection procedure. The next step in the process consists of confirming these hits by testing them in replicates. Definitions that are necessary for derivation of statistical parameters are given below. Most of them are in Zhang et al. 4 or are slightly modified to fit with this context.
The definitions used here are based on the following argument: A robust assay is one that produces few false positives and at the same time enables the classification of actually inactive compounds as negatives with high confidence. Hence, the false-positive rate (calculated after the hit confirmation testing has been completed) also depends on the number of negatives among the compound set.
The definitions of false-positive and false-negative rates given above correspond to
The following formulas are derived from Table 1 .
Decomposition of the Total Number of Compounds According to the Actual Activity and the Observed Activity
After the primary screen, only
The difficulty in applying equations (1) to (4) is that not all numbers appearing in them are known. In equation (1), only
The Resampling Scheme
In many applications, one of the objectives is the estimation of unknown parameters that may be of interest. Depending on the complexity of the estimated parameter and the lack of realistic assumptions, the assessment of the statistical accuracy of the parameters can be challenging.
To address this family of questions, Efron7,8 invented the bootstrap. It is a computer-based method that was extensively studied and tested during the past three decades, taking advantage of continuous revolution in computing with the goal to assign measures of accuracy to statistical estimates. The main advantage of bootstrap methods is that they do not require knowledge of the exact statistical distribution of the parameter estimator, a topic that involves highly sophisticated mathematics in most of the cases. The idea behind resampling methods and especially the bootstrap is to generate
Within the framework of high-throughout screening (HTS), we consider a pilot screening set consisting of
Let
Define the observation matrix:
Each row
Now considering the indicator function:
A constant hit threshold is assumed for illustration purposes. Based on the whole observation matrix
We are now in the position to describe the steps of the resampling method.
Create the observation matrix
Using equation (6), compute the actual set of “confirmed hits” based on a fixed hit threshold
For each compound, select one measurement at random out of the
False positives are those primary hits (from step 3) that do not belong to the actual set of “confirmed hits” obtained in step 2. According to equation (7), the estimator of the false-positive rate is computed as the ratio of the number of false positives to the number of compounds
False negatives are the so-called missed hits. They consist of the “confirmed hits” (from step 2) that do not belong to the set of primary hits obtained in step 3. According to equation (8), the estimator of the false-negative rate is computed as the ratio of the number of false negatives to the number of compounds
For each compound, discard one measurement at random from the remaining
For a fixed hit threshold
The results from steps 3 to 7 are based on a fixed hit threshold
Results
Herein, the methodology described above (under Materials and Methods) was applied to three experimental screening data sets. These data sets were analyzed as follows.
Comparison of the predicted values with the experimental values obtained during the screen
Computation of confidence bands for the primary hit rate and the confirmation rate
Recommendation on how to select the optimal hit threshold
Recommendation on how to choose the optimal screening concentration when the compounds have been tested at more than one concentration
The analysis of the data was conducted with the statistical package S-Plus, Version 8.1.1 for Linux (TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit), a commercially available software. 10
Example 1: Protein Kinase Target
A structurally representative subset of the EVOTEC compound library (Marker Library) consisting of 2816 molecules was screened against a protein kinase target. This pilot campaign was conducted at a compound concentration of 10 µM, the objective being to assess the key statistics and, if necessary, adjust the compound concentration. The compound measurements were replicated four to five times. The pilot screen was carried out on 1536-well plates and revealed a mean Z′ and a mean signal-to-background ratio of 0.60 and 2, respectively. The distribution of the results from compound wells, sample negative control wells, and positive control wells is depicted in Figure 1 .

Kernel density estimations
9
of the activity values of compounds (solid curve), sample negative controls (dashed curve), and positive controls (dot-dashed curve) in the pilot screen. The vertical solid lines are drawn at 0% and 100%. The vertical dotted line is an example of a hit threshold at 30%. The short dashes along the
The purpose of the analysis of the collected data was to achieve a compromise between the false-positive rate, false-negative rate, and activity of the confirmed hits. This suggested computing the hit threshold by using the multiplier 5 for the standard deviation instead of 3, as demonstrated in Zhang et al. 1 To obtain the assay plate–dependent hit threshold, the sum from the mean activity of sample negative control wells and five times their standard deviation was calculated.
After the pilot screen, it was decided to run the primary screen of 260 000 compounds and the subsequent hit confirmation screen at 10 µM. Both campaigns revealed a mean Z′ of 0.62 and a mean signal-to-background ratio of 2. A primary hit rate of 1.5% and a hit confirmation rate of 65% were achieved.
The statistics obtained by applying the resampling method are depicted in Figure 2 . The primary hit rate, confirmation rate, and false-positive and false-negative rates were predicted after the completion of the pilot campaign. After the hit confirmation campaign, the huge triangle and circle (observed primary hit rate and hit confirmation rate, respectively) were added to the plot for comparison with the predicted results based on the pilot screen.

Predictions of the primary hit rate, confirmation rate, and false-positive and false-negative rates by increasing hit threshold. Confidence bands for the primary hit rate and confirmation rates are drawn around the corresponding curves. The huge triangle and the huge circle on the upper part of the figure are the primary hit rate (1.5%) and the confirmation rate (65%), respectively, obtained in the subsequent primary and confirmation screening campaigns. The multiplier 5 for the hit threshold computation was used.
Example 2: G-Protein-Coupled Receptor Target
The Marker Library consisting of 2816 molecules was screened against a G-protein-coupled receptor target (GPCR target) at a compound concentration of 10 µM. Similar to example 1, the goal was to assess the key statistics and, if necessary, adjust the compound concentration. The compound measurements were replicated four to five times.
Based on Figure 3 , the multiplier 8 for the standard deviation was chosen to compute the hit thresholds. Again, the reason was to achieve a compromise between the false-positive rate, false-negative rate, activity of the hits, and number of hits that can be processed in the subsequent hit confirmation campaign.

Kernel density estimations
9
of the activity values of compounds (solid curve), sample negative controls (dashed curve), and positive controls (dot-dashed curve) in the pilot screen. The vertical solid lines are drawn at 0% and 100%. The vertical dotted line is an example of a hit threshold at 15%. The short dashes along the
The pilot screen was conducted on 1536-well plates and revealed a mean Z′ and a mean signal-to-background ratio of 0.80 and 5.2, respectively.
After the pilot screen, the decision was made to run the primary screen of 300 000 compounds and the subsequent hit confirmation screen at 10 µM. Both campaigns showed comparable performance, translating to a mean Z′ of 0.85 and a mean signal-to-background ratio of 7. Using the multiplier 8 for the hit threshold calculation, a primary hit rate of 0.7% and a hit confirmation rate of 81% were obtained.
Figure 4 depicts the results obtained by applying the resampling method. The primary hit rate, confirmation rate, and false-positive and false-negative rates were predicted immediately after the pilot campaign. After the completion of the hit confirmation campaign, the huge triangle und circle (observed primary hit rate and hit confirmation rate, respectively) were added to the plot for comparison.

Predictions of the primary hit rate, hit confirmation rate, and false-positive and false-negative rates by increasing hit threshold based on the pilot screen results. Confidence bands for the primary hit rate and the hit confirmation rate are drawn around the corresponding curves. The huge triangle and the huge circle on the upper part of the figure are the primary hit rate (0.7%) and the confirmation rate (81%), respectively, obtained in the subsequent primary and confirmation screening campaigns. The multiplier 8 for the hit threshold computation was used. GPCR, G-protein-coupled receptor.
Example 3: Protease Target
A Marker Library consisting of 2815 molecules from the EVOTEC collection was screened against a protease target. The two compound concentrations of 10 µM and 25 µM were tested on 2080-well plates. Three to four measurements were replicated for each compound at the respective concentrations. A mean Z′ of 0.82 and a mean signal-to-background ratio of 2 were achieved. The screening results at 10 µM showed a very low primary hit rate, so the focus will be on the results at 25 µM. Figure 5 shows the distribution of the results from compound wells, sample negative control wells, and positive control wells.

Kernel density estimations
9
of the activity values of compounds (solid curve), sample negative controls (dashed curve), and positive controls (dot-dashed curve) in the pilot screen. The vertical solid lines are drawn at 0% and 100%. The vertical dotted line is an example of a hit threshold at 15%. The short dashes along the
Arguments similar to those in examples 1 and 2 suggested the choice of the multiplier 4.5 for the hit threshold. During the primary screen, 280 000 compounds from the EVOTEC collection were tested in singlicate at a concentration of 25 µM. The multiplier 4.5 for the hit threshold calculation resulted in a primary hit rate of 0.6% and a hit confirmation rate of 56%.
The outcome of the resampling method is shown in Figure 6 . Again, the huge triangle and circle (primary hit rate and hit confirmation rate, respectively) were added to the plot after the completion of the primary and hit confirmation screens.

Predictions of the primary hit rate, hit confirmation rate, and false-positive and false-negative rates by increasing hit threshold based on the pilot screen results. Confidence bands for the primary hit rate and the hit confirmation rate are drawn around the corresponding curves. The huge triangle and the huge circle on the upper part of the figure are the primary hit rate (0.6%) and the confirmation rate (56%), respectively, obtained in the subsequent primary and confirmation screening campaigns. The multiplier 4.5 for the hit threshold computation was used.
Discussion
One of the exercises in the early drug discovery process is the development of robust assays that enable a reliable identification of active compounds against a given target. Cost considerations require the minimization of false positives during the primary screen. Increasing the hit threshold is one possibility to account for the reduction of the number of false positives, but at the same time, false negatives are generated.
In this article, a method for an upfront prediction of the primary hit rate, hit confirmation rate, and false-positive and false-negative rates has been proposed and validated with real data. The resampling method uses the information generated in a pilot campaign, where a representative set of the screening library is tested in replicate. In addition to the results in examples 1 to 3, many other campaigns were analyzed. Similar results (not shown in this article) to those of examples 1 to 3 were achieved. In the examples above, the primary hit rate and hit confirmation rate obtained after the corresponding primary and confirmation campaigns are within the confidence bands. The main idea of the method presented in this article consists of capturing the data variability. In other words, there is a link between a robust assay (meaning an assay in which the agreement of the individual measurements for the compounds is very good) and the rate of false positives and negatives.
Another application of the prediction method consists of warning the assay developer in cases where the predicted confirmation rate is low. In this situation, the reasons for the low confirmation rate have to be investigated, and in the majority of cases, a further optimization of the assay toward robustness is essential. It is important to note that the predicted parameters are only useful if the assay performance and quality during the pilot campaign, the primary screen, and the hit confirmation are the same. The sporadic appearance of errors such as edge effects on assay plates may contribute to the discrepancy between the predicted and the observed parameters. This shows again how crucial it is to generate high-quality data.
It is also important to mention an ambiguity that can be caused by the number of replicates
Footnotes
Acknowledgements
We thank all members of the screening operation group at EVOTEC for running the assays on the screening devices. We are grateful to our colleagues Dr. Christian Kirchhoff, Dr. Insa Winzenborg, Dr. Alexander Böcker, Dr. Annett Müller, Dr. Mark Slack, and Dr. Dennis Wegener for helpful discussions and input to the manuscript. We also thank two anonymous referees for their valuable comments and their constructive suggestions.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
