Abstract
The genomics revolution, coupled to advances in computational power, informatics and robotics, is driving drug discovery programs to produce drug candidates faster. This need has resulted in advances in high throughput methods for performing organic chemistry such as combinatorial and parallel synthesis. Yet there has not been a corresponding advance in the ability to collect quantitative information on the reactions that can be used to produce these drug candidates. This lack of an efficient and robust analytical method has resulted in a significant chemistry bottleneck. This work outlines a set of methods that helps address this bottleneck by using analytical constructs to detect and quantify reaction outcomes. To accomplish this, an integrated experimental-cheminformatics platform has been developed which couples an experimental design system, automated high throughput parallel and combinatorial synthesis methodology, sample processing, quantitative mass spectroscopy and automated data analysis. This platform is being used to optimize single reactions and the syntheses of whole libraries of compounds, and to generate large databases on specific reaction classes.
INTRODUCTION
A major challenge in generating a library of homologous compounds is discovering the most general set(s) of reaction conditions to produce the compounds at high yield. These conditions are identified by screening the reactivity of a few representative monomers and then applying these optimized results to the production run of a library (Figure 1). Even when solid-phase combinatorial methods are utilized, this reaction optimization still consumes a significant fraction of the time in the overall library design and production process.

Flow chart displaying the typical steps in a library synthesis.
The practical problem of library process optimization is exacerbated by the lack of convenient analytical methods for examining reactions on solid support. 1 Without a viable analytical technique, it is difficult to leverage combinatorial chemistry to produce high quality libraries. As a result, conventional solid-phase combinatorial chemistry increases the number of compounds that can be produced, but the absence of an analytical methodology prevents the generation of well-characterized and optimized libraries. These limitations have motivated many researchers in this area to move away from conventional combinatorial chemistry approaches and adopt a quality-over-quantity philosophy in library design and production. 2
In this work we describe an automated platform and process (Figure 2) for reaction identification and optimization that addresses these limitations. The platform allows for the generation of high quality libraries of potential lead compounds at high throughput and low cost. The experimentally derived information that is required to generate these synthetic methods can only be obtained cost-effectively using combinatorial and parallel approaches. Thus we have implemented a suite of technologies that allow us to encode and decode reactions and analyze reaction outcomes quantitatively (in terms of their yield) in a reasonable time and at a low cost. This suite of techniques is a solution to the quality issue described above that does not sacrifice the quantity of results produced.

Platform for the use of analytical constructs for high throughput production of chemical information. This process chart displays the various processes and steps in utilizing analytical constructs.
The underlying suite of technologies that supports this work is derived from the rapidly evolving field of combinatorial chemistry, high throughput parallel synthesis, mass spectroscopic analysis and signal processing. These technologies rely on the use of single bead analysis and analytical constructs3–8 to provide facile methods for carrying out chemistry and the automated analysis of a large number of reactions. Encoded combinatorial synthesis on isotopically labeled analytical constructs enables us to run collections of over ten thousand unique chemical reactions at a time. 3 Each construct contains a distinct isotopically labeled tag that can be ‘decoded’ by mass spectrometry, and each tag is associated with a unique supported reactant attached to the end of the analytical construct (see Figure 3).

Analytical constructs. I. Schematic diagram of an analytical construct a. polymer support, b. cleavable linker, c. MS charge leveler, d. code position e. peak split, f. supported reagent. II. The structures of a construct containing a sample code, acid cleavable linker and the supported reactant are all attached to a solid support resin. The construct is a physical link between the resin bead and the site of synthesis. The linking construct enables the encoding of a chemical library and the facile quantitation by mass spectroscopic procedures.
EXPERIMENTAL PLATFORM FOR THE GENERATION OF REACTION DATA
Analytical constructs are a class of functional materials that have several unique properties (see next section). Engineered into the construct is a sequence of covalently bound functionalities with properties that enable encoding, 3 purification, detection 6,9,10 and quantification of reactions of molecules to which the constructs are attached.7,1 Specifically, the analytical constructs make it possible to determine the kinetics, products and yields of chemical reactions involving the attached molecules rapidly and reliably using electrospray mass spectroscopy (ES-MS). The components of these analytical constructs are described in Figure 3 and Table 1. Originally, the technology of analytical constructs was developed to build encoded screening libraries for drug discovery.
Structure-Property-Function relationships for the analytical construct that is displayed in Figure 3.
Building on the work of Geysen and others, we have constructed a series of automated methods and processes that enable the high throughput use of analytical constructs in library generation, reaction screening and reaction optimization. These processes function as an integrated system that supports both high and low levels of encoding, single bead processing and sample tracking, data analysis and data integrity. The steps for utilizing analytical constructs are summarized below:
experimental design,
analytical construct synthesis,
running the reactions,
bead handling, 12
severing the parts of the analytical construct from the solid support (cleavage),
ES-MS analysis of the samples, and finally
spectral processing and feature extraction.
The overall process has automated steps for experimental design, construct synthesis, resin pooling, making solutions for reactions, bead handling and manipulation of analytical samples. The steps for utilizing analytical constructs in conjunction with sample processing and data analysis algorithms are shown in Figure 2.
The utilization of analytical constructs begins with an experimental design that seeks to answer a specific chemistry problem. Our experimental designs often include a combinatorial exploration of reaction parameters such as time, temperature, concentration and reagent. A full combinatorial search of reaction parameters is performed in order to rapidly determine fruitful conditions to effect the desired transformations (see Example of Reaction Screening and Optimization below). The next step involves synthesis of an analytical construct with the correct properties to find a solution to the chemical problem at hand. After the constructs are synthesized as a set of codes, the master patterns for these codes must be collected for use by the feature extraction software. These constructs are functionalized with starting material monomers and are then combined to generate a resin pool. The resin is then split and dried in small reactors and the constructs are subjected to the pre-determined reaction conditions specified in the experimental design.
All of the information pertaining to the resin pool and experimental design is tracked in a customized, web-delivered laboratory information management system which is integrated with a sample tracking system. The reactions are then run using parallel synthesis in concert with standard liquid handling techniques. This results in samples that contain information about the reaction that was performed. The resulting pooled constructs are washed and prepared for bead picking and cleavage. At this step of the process the resin beads, each carrying a unique isotopic code, are separated by an automated bead handler into microtiter trays, one bead per well, for spectral analysis. 12 The resin is then subjected to the cleavage reaction in small arrays of samples.
The molecular blocks of the construct, as generated by the cleavage process, are dissolved in solvent and each sample is analyzed by electrospray mass spectroscopy. At this step raw mass spectral data files are created that go through a number of feature extraction steps to integrate the signals, read codes and find peak split patterns (see Signal Processing and Feature Extraction below). Finally, further data processing steps validate the statistical consistency of the data set, identify any errors, calculate yields and present the data to the researcher. Some of these issues are discussed in more detail below.
ANALYTICAL CONSTRUCTS
An analytical construct is a functional material that is composed of a series of molecular components that are assembled in a defined order and attached to a supported reactant. Each of these molecular components has a specific function that, when working in concert, enables rapid identification and quantification of a reaction in terms of product masses and yield. The analytical constructs consist of a reference block, a supported reactant block and two cleavable linkers.
To ensure that both components of the construct will be detected by electrospray mass spectroscopy, a basic center (charge leveler) is attached to both the reference block and the reactant block. The supported reactant block MS signal is split into an unnatural isotopic pattern called a peak split for reasons of signal processing. The reference block also contains an unnatural isotopic pattern called a code. Different codes can be created by varying the molar ratios of the isotopomers that are attached at a single position in the construct. 3 This results in readily detectable patterns in the mass spectra (see Figure 3, III). Encoding thus can be performed not only in the amplitude (ion intensity) dimension but also in the mass (m/z) dimension of the spectra. Similarly, code sets can be readily constructed with as few as a handful of codes and as many as hundreds of distinct codes, depending on the experimental requirements. 3
The number of chemically distinct encoded constructs is virtually unlimited due to the large number of possible component combinations. Therefore, constructs can be engineered that have differing physical and chemical properties. One important consideration in using analytical constructs is that the construct itself must not react under the chemical conditions being explored in a given experimental design. The number of available cleavable linkers, however, allows for the construct to be customized to the synthetic challenge at hand.
After the constructs are made and prepared for use in a resin pool, supported reactants (starting material monomers) must be attached in order to examine the relevant chemistry. The supported reactants are carefully chosen as part of an overall experimental design that addresses a particular chemical problem. Typical examples of supported reactants that are routinely utilized as starting materials in reaction screens are sets of carboxylic acids, amines, aldehydes, ketones and alcohols. By pooling the supported reagents, the number of unique combinations of solution-phase and supported reagents that can be run in a single day in our laboratory is over 20,000.
The analytical constructs described in Figure 3 utilize ES-MS as the primary analytical method. The analytical construct can be broken down into a reference block and a supported reactant block. Because the molar ratio of the reference block to the reactant block is 1:1, the yields of detected reactions can be quantified automatically (see Quantitation section below). Others have used UV or visible spectroscopic methods coupled to LC-MS to allow for yield determination.6,8 ES-MS is among the most sensitive and highest throughput analytical techniques; when it is coupled with automation that effectively tunes the mass spectrometer 13 and with a chemically robust analytical construct, it allows for the efficient production of accurate reactivity data. For example, a reactor array of 200 reactors and a resin pool containing 100 supported reactants can be used to run 20,000 reactions in one day. With an analysis throughput of 500 samples per mass spectrometer, six mass spectrometers can analyze 3,000 of these samples in the same time frame. Therefore, by using the analytical constructs in conjunction with robotics and ES-MS, many reactions can be run and analyzed in a single day in an automated fashion.
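The throughput figures quoted above follow from simple multiplication. The short sketch below only restates that arithmetic; the variable names are ours and are not part of the platform software.

```python
# Throughput arithmetic using the numbers quoted in the text.
reactors = 200                       # reactors in the array
supported_reactants_per_pool = 100   # encoded reactants pooled in each reactor
reactions_per_day = reactors * supported_reactants_per_pool

samples_per_spectrometer = 500       # analysis throughput per instrument
spectrometers = 6
samples_analyzed = samples_per_spectrometer * spectrometers

# Fraction of the day's reactions that can be analyzed in the same time frame.
fraction = samples_analyzed / reactions_per_day

print(reactions_per_day, samples_analyzed, fraction)
```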
SIGNAL PROCESSING AND FEATURE EXTRACTION
Analytical constructs produce unique digital signatures in mass spectra that are readily distinguished from each other and from contaminant peaks and noise (Figure 3). Upon cleavage the analytical construct is broken into two molecular blocks (Figure 3). The use of these two blocks can be seen in the reaction described in Figure 4. The reference block has a digital signature that is called a code (Figure 5, III) and the supported reactant block has a digital signature that is called a peak split (Figure 5, II). Encoding is used to identify each compound in the pool of supported reactants. A code is read by integrating each peak in the code region of the mass spectrum and constructing from these integrals an experimental one-norm pattern, i.e. a pattern whose peak areas are scaled to sum to one. This pattern is then compared to the set of master patterns for the codes used in the resin pool, and the master pattern that best fits the experimental pattern is taken to be the correct code.
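The code-reading step described above can be illustrated with a minimal sketch. It assumes that the fit between the experimental and master patterns is scored by the summed absolute difference of the two one-norm patterns; that scoring rule, the function names, the three master patterns and the peak areas are all hypothetical and chosen only for illustration.

```python
def one_norm(areas):
    """Scale peak areas so that they sum to 1 (a one-norm pattern)."""
    total = sum(areas)
    if total <= 0:
        raise ValueError("code region contains no signal")
    return [a / total for a in areas]


def read_code(code_region_areas, master_patterns):
    """Return (code name, distance) for the closest master pattern.

    The distance is the summed absolute difference between the two
    one-norm patterns; this metric is an assumption of the sketch.
    """
    observed = one_norm(code_region_areas)
    best_name, best_dist = None, float("inf")
    for name, master in master_patterns.items():
        if len(master) != len(observed):
            raise ValueError("pattern lengths differ for " + name)
        reference = one_norm(master)
        dist = sum(abs(o - r) for o, r in zip(observed, reference))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


# Hypothetical master patterns for three codes (relative isotopomer ratios).
MASTER_PATTERNS = {
    "code_A": [1, 1, 0],
    "code_B": [1, 0, 1],
    "code_C": [0, 1, 1],
}

# Integrated peak areas from the code region of one bead (made-up numbers).
name, dist = read_code([5200.0, 300.0, 4900.0], MASTER_PATTERNS)
print(name, round(dist, 3))  # code_B 0.058
```

In practice a threshold on the best distance would be needed as well, so that a bead whose code region matches no master pattern is flagged rather than assigned to the nearest code.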

Test reaction of an isocyanate with a supported primary alcohol. This reaction was run at two temperatures. The concentration of the isocyanate was 200 mM.

Mass spectra of the starting materials and products of the reaction of 4-nitrophenyl isocyanate with a primary alcohol. I. An overlay plot of the three spectra. Two of the spectra were generated from samples where the reaction temperature was 25 and 50°C. The control is the unreacted alcohol 3 with a mass of 317 amu; the product 4 pattern appears at 439 amu. II. This plot displays the supported reactant block (alcohol) and its peak split pattern. The one-norm pattern for this alcohol is 38%, 7%, 3%, 38%, 7%, 3%. III. This plot shows the code pattern for the reference block for all three samples. The intensity of the reference block varies because of variability in bead size.
The product peaks can be identified in a similar way, except that one does not know a priori at what mass the product peaks will be found in the spectrum. However, the presence of a product can be ascertained because the pattern resulting from the peak split (see Figure 3 and Figure 5, II) can be calculated from the molecular formula and then searched for in the spectrum. To search for the product patterns in the spectrum, a series of patterns are calculated from nominal formulas based on pattern modifying elements (e.g. 12C/13C and 79Br/81Br) and the peak split. These patterns are searched for at each 1 amu increment across the whole integrated spectrum, excluding the code region of the spectrum that is reserved for the encoding. The pattern matching is performed by utilizing a measure function (eqn. 1). Figure 6 illustrates a measure function-based scan of the spectrum. Note the error function is displayed on the Y2 axis.

Feature extraction from spectra. This double y-axis plot displays the 25°C spectrum from Figure 5 on the left axis and, on the right axis, the error function resulting from searching that spectrum with the calculated isotopic pattern for the carbamate 4. The error function for the pattern (38%, 7%, 3%, 38%, 7%, 3%) drops below 100 at two positions, indicating the presence of a peak-split pattern at each.
Generally, the measure function, which can be calculated at every mass increment in the spectrum, is a large number (>1,000) because the theoretical pattern and the data in the spectrum differ substantially. When this deviation is small, eqn. 1 approaches zero and one can be confident that a pattern has been found, particularly if it is repeatable across several spectra and the area of the whole pattern is a significant fraction of the total reference block area. Figure 6 shows that the starting material alcohol and the product peaks are readily identified using this method.
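The scan described above can be sketched as follows. The error function used here, a sum of squared differences in percent units between the normalized window and the normalized theoretical pattern, is an assumption standing in for eqn. 1 and is not necessarily the measure function used on the platform. The sketch also assumes that the six pattern positions fall on consecutive 1 amu bins, and the spectrum, the excluded code region and the function names are synthetic and hypothetical.

```python
def measure(window, pattern):
    """Sum of squared differences, in percent units, between the
    normalized window and the normalized theoretical pattern.
    (An assumed error function, standing in for eqn. 1.)"""
    total = sum(window)
    if total <= 0:
        return float("inf")
    p_total = sum(pattern)
    return sum(
        (100.0 * w / total - 100.0 * p / p_total) ** 2
        for w, p in zip(window, pattern)
    )


def scan(spectrum, pattern, excluded=()):
    """Slide the pattern across a spectrum binned at 1 amu.

    spectrum: dict mapping integer mass -> integrated area.
    excluded: (low, high) mass ranges to skip, e.g. the code region.
    Returns a dict mapping window start mass -> error value.
    """
    n = len(pattern)
    errors = {}
    for start in range(min(spectrum), max(spectrum) - n + 2):
        masses = range(start, start + n)
        if any(lo <= m <= hi for m in masses for lo, hi in excluded):
            continue
        window = [spectrum.get(m, 0.0) for m in masses]
        errors[start] = measure(window, pattern)
    return errors


PEAK_SPLIT = [38, 7, 3, 38, 7, 3]  # pattern quoted for Figure 5, II

# Synthetic spectrum: flat baseline plus the pattern at 317 and 439 amu.
spectrum = {m: 1.0 for m in range(300, 460)}
for start, scale in ((317, 40.0), (439, 25.0)):
    for offset, p in enumerate(PEAK_SPLIT):
        spectrum[start + offset] += scale * p

# The excluded range is a made-up stand-in for the code region.
errors = scan(spectrum, PEAK_SPLIT, excluded=[(300, 310)])
hits = sorted(m for m, e in errors.items() if e < 100.0)
print(hits)  # [317, 439]
```

With this choice of error function a flat baseline window scores roughly 1,500 and a matched window scores close to zero, which mirrors the qualitative behaviour described for Figure 6.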
QUANTITATION
Mass spectroscopy is generally not viewed as an analytical tool that can be used to quantify the yield of a reaction [8]. However, by using analytical constructs that contain a charge leveler and have an internal standard engineered into the reference block, quantitative estimation of the yield of a reaction by ES-MS is now possible. 11 Therefore, the reason these constructs are described as “analytical constructs” is that they generally allow for quantitation of the yield of the reaction.
The inclusion of a charge leveler guarantees that the analyte will be charged, and thus allows one to readily detect the products of a reaction by ES-MS. Nonetheless, significant differences in ionization coefficients exist between construct-supported products. Since there exists an internal standard with a constant ionization coefficient (the reference block), one can write a series of mass balance equations that allow for yield estimation 11 (see eqn. 2 & 3). These equations state that the sum of the concentration of each product is equal to the concentration of the reference block. If two or more spectra are obtained that have varying ratios of products (see Figure 7), a series of linear equations can be constructed. Solving the equations for the unknown ionization coefficients allows for the product yields to be calculated.

Absolute quantitation using analytical constructs. By running a reaction to differing degrees of completion (by either sampling at different times or changing other condition parameters) two spectra with different levels of a and c can be generated. Using the numeric data generated, the ionization coefficients can be calculated and thus the yields can be determined.
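A small numerical sketch may help make the mass balance concrete. As an assumption made only for this illustration, the balance for spectrum j is written as R_j = A_j1/k_1 + A_j2/k_2, where R_j is the total area of the reference block (code), A_ji is the pattern area of species i and k_i is the ionization coefficient of species i relative to the reference block. The function names and the peak areas are hypothetical, and the two-species, two-spectrum case is solved exactly by Cramer's rule.

```python
def relative_ionization_coefficients(ref_areas, species_areas):
    """Solve R_j = A_j1/k_1 + A_j2/k_2 for two spectra (j = 1, 2).

    ref_areas: [R_1, R_2], total code (reference block) areas.
    species_areas: [[A_11, A_12], [A_21, A_22]], pattern areas of
        species 1 and 2 in each spectrum.
    Returns (k_1, k_2), coefficients relative to the reference block.
    """
    (a11, a12), (a21, a22) = species_areas
    r1, r2 = ref_areas
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("spectra must contain different ratios of species")
    # Cramer's rule for the unknowns x_i = 1/k_i.
    x1 = (r1 * a22 - a12 * r2) / det
    x2 = (a11 * r2 - r1 * a21) / det
    if x1 <= 0 or x2 <= 0:
        raise ValueError("areas are inconsistent with the assumed mass balance")
    return 1.0 / x1, 1.0 / x2


def yields(ref_area, areas, coefficients):
    """Mole fraction of each species relative to the reference block."""
    return [a / k / ref_area for a, k in zip(areas, coefficients)]


# Made-up areas for starting material (SM) and target product (TP)
# from the same reaction sampled at two degrees of completion.
ref_areas = [1000.0, 800.0]
species_areas = [[350.0, 600.0],    # spectrum 1: SM, TP
                 [80.0, 1280.0]]    # spectrum 2: SM, TP

k_sm, k_tp = relative_ionization_coefficients(ref_areas, species_areas)
y_sm, y_tp = yields(ref_areas[1], species_areas[1], (k_sm, k_tp))
print(round(k_sm, 3), round(k_tp, 3), round(y_sm, 3), round(y_tp, 3))
```

If the two spectra contain the same ratio of species the determinant vanishes and the coefficients cannot be resolved, which is why spectra with differing product ratios are required. With more spectra than unknowns the same balance could be solved by least squares instead.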
The mass balance equations that are used to calculate the yields for a given reaction outcome are developed from two or more sets of spectra acquired from a chemical reaction that has been run to varying degrees of completion (Figure 7). A necessary requirement of these spectra is that they result from differing ratios of products, thus allowing the system of equations that are generated to be over-determined. Equation 2 states that the product of the total area of the code [...]. Equation 3 is identical to equation 2 except that the index refers to another spectrum that is obtained for the same reaction run under a different condition. By solving the system of simultaneous equations (Equations 2 and 3), the relative ionization coefficients to the reference block for each compound are obtained. Equation 4 shows how the yield of the products can be written as a function of the product intensities and the ionization coefficients, where [...]
Figure 8 displays the yields of the carbamate synthesis that is described in Figure 4. Since the reactions were performed at two temperatures, it was possible to obtain these results by applying Equations 1, 2 and 3 to the spectra in Figure 5. Variations in the intensities of the products exist between mass spectrometers, indicating that the measured ionization coefficient depends not only on the physical properties of the analyte but also on the mass spectrometer. To account for these differences, samples are scheduled as a set and run on the same mass spectrometer for internal comparison. Success in this process required the development of standards, automation and analysis software to internally calibrate the mass spectrometers, making the data more consistent from day to day and from instrument to instrument. 13

Quantitation of the reaction from Figure 4 on three different mass spectrometers. This figure displays a plot of the mass of the found product pattern versus the calculated yield for each found product.
EXAMPLE OF REACTION SCREENING AND OPTIMIZATION
Although general methods for making a new target compound may be intuitive to a chemist, the exact reaction conditions to make a compound efficiently are generally unknown and require iterative experimentation to develop suitable methods. Analytical constructs can be used to rapidly survey reaction conditions (i.e. parameters such as solvent, reagent, temperature, time, stoichiometry and order of addition) and determine if a particular method results in a desired reaction outcome.
We now provide an example of how the platform was applied to rapidly optimize a Suzuki reaction (Figure 9). The experimental conditions included two solvents (DMF, DME), two times (2 and 18 h), and two catalyst systems at two different stoichiometries resulting in a total of 16 (2 × 2 × 2 × 2) unique reaction condition combinations. Figure 10 is a reaction map that graphically displays the results of this study by mapping where the reaction condition effectively generates the target product and where it does not. The results of the experiment are categorized according to a reaction outcome classification system displayed on the Y2 axis.

Experimental design for the combinatorially-constructed reaction conditions for a Suzuki coupling reaction screen.
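A full factorial design of this kind can be enumerated in a few lines. In the sketch below the solvents and reaction times are those given in the text; the catalyst and stoichiometry labels are placeholders, since the specific values are defined in Figure 9, and the dictionary layout is hypothetical.

```python
from itertools import product

SOLVENTS = ["DMF", "DME"]
TIMES_H = [2, 18]
# Placeholder labels: the actual catalyst systems and stoichiometries
# are those defined in Figure 9.
CATALYSTS = ["catalyst system 1", "catalyst system 2"]
STOICHIOMETRIES = ["stoichiometry 1", "stoichiometry 2"]

# One dictionary per unique reaction condition (2 x 2 x 2 x 2 = 16).
design = [
    {"solvent": s, "time_h": t, "catalyst": c, "stoichiometry": q}
    for s, t, c, q in product(SOLVENTS, TIMES_H, CATALYSTS, STOICHIOMETRIES)
]

print(len(design))  # 16
print(design[0])
```

Adding a further factor, such as temperature, only requires one more list in the `product` call, which is what makes a combinatorial search of conditions straightforward to specify.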

Reaction map of the Suzuki reaction screen. This is a multidimensional representation of the collected data for the experimental design described in Figure 9. The small dots represent reaction outcomes for a reaction time of 2 hours and the large dots represent 18 hours. The left hand y-axis displays the four stoichiometries described in Figure 9. Based on the spectra collected, the outcome for each reaction condition was grouped into a ranked category describing how well the reaction worked. These categories are displayed on the right hand y-axis: 1. target product (TP) 7 only detected, 2. TP and starting material (SM) detected, 3. TP and side product (SP) detected, 5. SP and SM detected. There are other reaction outcome categories we have defined but were not observed in this study.
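The ranked outcome categories listed in this caption can be expressed as a simple lookup. The sketch below covers only the four categories named in the caption; the function name is hypothetical, and combinations that the caption does not define return None rather than a guessed category number.

```python
def classify_outcome(tp_detected, sm_detected, sp_detected):
    """Map detected species to the outcome categories named in the caption.

    tp: target product, sm: starting material, sp: side product.
    Returns the category number, or None for combinations that the
    caption does not define.
    """
    categories = {
        # (TP, SM, SP): category
        (True, False, False): 1,   # TP only detected
        (True, True, False): 2,    # TP and SM detected
        (True, False, True): 3,    # TP and SP detected
        (False, True, True): 5,    # SP and SM detected
    }
    key = (bool(tp_detected), bool(sm_detected), bool(sp_detected))
    return categories.get(key)


print(classify_outcome(True, False, False))   # 1
print(classify_outcome(False, True, True))    # 5
print(classify_outcome(False, True, False))   # None
```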
We have automated the generation of reactivity maps of this type from the spectral data by defining the target product and starting material masses of the reactions and applying the pattern recognition and statistical analysis software systems. The total time to generate this map of 18 data points was three days. In other examples, hundreds of reaction conditions have been surveyed and analyzed in a 3 to 5 day period. 13 One can readily see that in this experiment Pd(OAc)2 in DMF is the best condition for producing 7. The high density of data points in Category 1 [target product (TP) only] strongly indicates that this chemistry is robust, i.e. that the reaction outcome is not dependent on stoichiometry.
The key features of reactivity maps generated with this method are:
they are generated with one method in one lab,
they enable a chemist to effectively view a large amount of data,
they can be used to optimize the production of a library when combined with encoding, and
because they capture all the products and side products, they may uncover unexpected and unreported chemistry.
These maps can be used as a risk assessment tool to rapidly survey a chemistry to determine if it is likely to produce the desired product. If a search of a sufficiently diverse but reasonable set of conditions shows no target product then one can generally assume that time could be better spent on other synthetic approaches.
CONCLUSION
In this work we describe an automated platform for rapid reaction identification and optimization that addresses the analysis bottlenecks of traditional high throughput parallel synthesis and combinatorial chemistry. The use of analytical constructs in concert with the overall experimental platform allows for the automated design, experimentation and data analysis of chemical reactions. The platform can be applied to solving specific synthesis challenges for single compounds or library synthesis. A unique feature of the platform is that it enables the collection of information on the products, side products and yields of all of the chemical reactions as well as cases where starting materials are untransformed by a chemical condition. Thus, in addition to solving specific synthesis challenges, capturing information of this type makes it possible to build databases on classes of chemical reactions.
MATERIALS AND METHODS
Materials. The resin used in these studies was aminomethyl polystyrene resin, 200 μm in diameter (loading 1.2 mmol/g AMNH2), purchased from RappPolymere. The anhydrous solvents N,N-dimethylformamide (DMF), tetrahydrofuran (THF) and ethylene glycol dimethylether (DME) were purchased from Aldrich Chemical in Sure/Seal™ bottles and used without further purification. Triethylsilane (TES) and trifluoroacetic acid (TFA) were used freshly distilled. Picopure water was degassed with nitrogen for 20 minutes inside a modified Schlenk flask and then taken into the glove box. 3-Bromopyrimidine, tris(dibenzylideneacetone) dipalladium(0), palladium acetate, tri-o-tolylphosphine and anhydrous potassium carbonate were also purchased from Aldrich Chemical and used directly. Reactions utilizing analytical constructs were shaken in J-Chem heating blocks on Lab-Line shaker tables at 200 RPM. Synthesis of the Rink-Rink construct, compounds 3 and 5, will be reported elsewhere.
General Procedures. Reactions were performed in an Innovative Technology glove box using Kimble 1 mL glass vials with polypropylene caps lined with PTFE septa. The vials were silanized using a 10% solution of trimethylsilyl chloride in dichloromethane by placing 1 mL of solution into the vials. This solution was decanted 5 min later and the vials were dried in an oven. The Rink-Rink analytical constructs described herein were prepared on an ACT-357 automated peptide synthesizer (Advanced Chemtech, Louisville KY). Bead picking was performed using an automated device (Cartesian Technologies, Irvine, CA; www.cartesiantech.com). The samples were cleaved using volatile reagents and then evaporated in a vacuum centrifuge (Genevac, Ipswich, UK; www.genevac.co.uk). Liquid handling was performed in an array format using a 96-well liquid handler (Hydra, Robbins). Cleavage of the Rink-Rink construct was accomplished by incubating the bead in 40 μL TES/DCM (10/90 v/v) for 15 min followed by the addition of 40 μL of TFA. The samples were concentrated by evaporation in a Genevac vacuum centrifuge with heating. The compounds resulting from the cleavage were then dissolved in a solvent suitable for positive mode ES-MS (MeOH/H2O/AcOH 80/20/1 v/v/v). The samples (V = 50 μL) were then injected into a single quad FIA-MS (Sciex, 150EX) system equipped with a Gilson autosampler at 40 μL/min. The source parameters were: nebulizer gas flow of 3, curtain gas flow of 10, ion spray voltage 5300 V. The turbo ion spray settings were as follows: T = 200 °C, N flow rate 5 L/min. The compound parameters (these may deviate slightly from instrument to instrument) were as follows: declustering potential 20 V, focusing potential 85 V, and entrance potential −4 V. The detector parameters were set at the following values: deflector voltage −400 V, and CEM voltage 2 kV. The step size for each run was 0.106 amu, with a dwell time of 1 ms and a scanned mass range of 100–800 amu. The ion energy and resolution offsets for each instrument were calibrated to a custom standard using a tuning algorithm. 13
Acknowledgement
We are indebted to Dr. James Nelson, Dr. Frank Schoenen, Dr. David Wagner and the other scientists at the GlaxoSmithKline, Inc. Diversity Sciences Department, RTP, NC for their contributions to the science and technology surrounding analytical constructs and mass encoding. The authors thank Ron Bonner and others at ABI-Sciex for helpful discussions regarding mass spectroscopy. We also thank NIST ATP for financial support (200-00-4189A).
