Abstract

Recently we have developed new technologies to synthesize and assay biological molecules in a highly miniaturized format. The method uses light to direct the combinatorial chemical synthesis of biopolymers on a solid support. The identity and location of each biopolymer are known, and its interaction with a molecular binding agent can be measured. These miniature biological arrays, or chips, can then be used for a variety of multiplexed biochemical assays, including epitope mapping, the development of chemical analogues, genetic diagnostics and nucleic acid sequencing analysis.
In order to fully implement this technology, two key methodologies were developed (Fodor et al, Science 251:767, 1991; Pease et al PNAS 91:5022, 1994; Fodor et al, Nature 364:555, 1993). The first, light-directed combinatorial chemical synthesis, enables the synthesis of tens to hundreds of thousands of compounds in precise locations on the chip. The second technology, laser confocal fluorescence scanning, facilitates the measurement of molecular binding events at the individual sites on the array. The combination of these two technologies forms the core of new instrumentation for a miniaturized, multiplexed assay format that can be utilized for research, diagnostics and bio-analytical studies.
DNA Chips
An exciting general application of this technology is in nucleic acid sequence analysis. An array of oligonucleotides complementary to subsequences of a target sequence can be used to identify a target sequence, measure its amount or relative expression level, or detect differences between the target and a reference sequence. Many different arrays can be designed for these purposes, and the applications appear to be limited only by imagination. The system consists of chips, a hybridization station to control hybridization, and a reader and software to access the chip data. Specific chip products for expression analysis and gene re-sequencing are already on the market. Two versions of commercial readers are available: a first-generation system from Molecular Dynamics and a recently released high-performance system from Hewlett-Packard. Chip production is now in a scalable format. We are now producing ∼5,000 to 10,000 chips per month, and we are planning for a large increase in production in the near future.
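The idea of an array of oligonucleotides complementary to subsequences of a target can be sketched in a few lines of Python. This is only an illustration: the function names and the 20-base probe length are assumptions chosen for the example, not details of the actual chip designs.

```python
# Minimal sketch: one probe per overlapping window of a target sequence.
# Names and the default probe length are hypothetical, for illustration only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_len: int = 20) -> list[tuple[int, str]]:
    """Return (offset, probe) pairs covering every window of the target.

    Each probe is the reverse complement of target[offset:offset + probe_len],
    so it would hybridize to that subsequence of the target.
    """
    if probe_len <= 0 or probe_len > len(target):
        raise ValueError("probe_len must be between 1 and len(target)")
    return [
        (offset, reverse_complement(target[offset:offset + probe_len]))
        for offset in range(len(target) - probe_len + 1)
    ]


if __name__ == "__main__":
    for offset, probe in tile_probes("ACGTTGCA", probe_len=4):
        print(offset, probe)
```

Because the offset of every probe is recorded, a hybridization signal at a given array location can be traced back to the subsequence of the target that produced it.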
Gene Expression
To fully understand gene expression, gene function, and the subtleties of regulation of the ∼100,000 genes in the Human Genome, the quantitative levels of expressed genes under various conditions must be assayed. In addition, if quantitative “snapshots” of gene expression can be captured, the dynamics of cellular pathways can then be deciphered. Recently, Lockhart et al (Nature Biotechnol. 14:1675, 1996) published methods for the quantitative parallel measurement of cellular messenger RNA for genes represented on the chip, with probes designed solely from primary sequence data. RNAs present at a frequency of 1:300,000 were unambiguously detected with a quantitative assay spanning three to four orders of magnitude in concentration. Lockhart and colleagues have now developed chips containing the complete open reading frames from the yeast genome, a series of “custom” chips with hundreds to thousands of full-length genes or fragments from various databases, and “standard” chips containing more than 6,500 genes. An expression chip with more than 50,000 expressed sequence tags is currently in development.
Large-scale Sequence Analysis
Understanding the relationship between genotype and phenotype is a critical technical bottleneck in modern genetics. For example, consider examining 50 kilobases (kb) of coding sequence for 1000 individuals. The gene sequences are known, but the prevalence, location, and identity of polymorphisms are not. The methods of conventional gel-based sequencing that are so effective in the initial gene sequencing of the Human Genome are not efficient for this task. Comparative gel-based sequencing is experimentally indistinguishable from de novo sequencing, so the full sequencing reaction must be carried out on all 50 kb for each of the 1000 individuals, or roughly 50 Mb of sequence in total.
Recently we have shown how variations from a baseline sequence in the entire human mitochondrial genome can be detected with high accuracy in a single hybridization experiment (Chee et al, Science 274:610, 1996). A total of ∼135,000 oligonucleotide probes were used to check the sequence of ∼33 kb (forward and reverse strands) of the mitochondrial genome in one reaction. In addition, 179 of the 180 polymorphisms present in control samples were correctly detected. Two-color comparative sequence analysis experiments were performed that demonstrated how mutations or polymorphisms could be detected on a very large scale, making it now possible to use the technology for large-scale polymorphism screening efforts. At the current state of development, 1.28-cm × 1.28-cm chips can contain enough probes to scan anywhere from 32 kb to several hundred kilobases of sequence, depending on the specific chip design and accuracy requirements of the screen. Put in the context of the previously posed experiment, 1000 chips each containing 50 kb could easily and quickly perform the comparative sequence analysis.
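The probe count quoted above can be roughly reproduced with a short sketch. It assumes a tiling in which each interrogated base is covered by four probes, identical except at a central position that carries each of the four possible nucleotides. That assumption is consistent with ∼135,000 probes for ∼33 kb, but the function names, the 15-base probe length, and the omission of any control probes are illustrative choices, not a description of the actual chip layout.

```python
# Minimal sketch of a four-probes-per-base tiling (an assumption, see lead-in).
# Probes are written as reference-strand sequences for readability.

BASES = "ACGT"


def interrogation_probes(reference: str, pos: int, probe_len: int = 15) -> dict[str, str]:
    """Return four sequences centered on `pos`, one per possible central base."""
    half = probe_len // 2
    start = pos - half
    end = start + probe_len
    if start < 0 or end > len(reference):
        raise ValueError("probe window falls outside the reference sequence")
    window = reference[start:end]
    # Substitute each base in turn at the central position of the window.
    return {base: window[:half] + base + window[half + 1:] for base in BASES}


def probes_needed(bases_interrogated: int, probes_per_base: int = 4) -> int:
    """Total probes required to interrogate every base once."""
    return bases_interrogated * probes_per_base


if __name__ == "__main__":
    # ~33 kb (both strands of the mitochondrial genome) at four probes per base.
    print(probes_needed(33_000))   # 132000, close to the ~135,000 reported
    # The 50 kb per individual of the comparative sequencing example.
    print(probes_needed(50_000))   # 200000 probes per chip under this assumption
```

Under this assumption, the probe matching the sample at the central position is expected to give the strongest signal of the four, so a change in which probe is brightest indicates a difference from the reference at that base.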
Genotyping
Designing arrays to detect specific allelic variation is relatively straightforward. In addition to using chip designs appropriate to scan for variations (as in the polymorphism application), blocks of probes can be dedicated to the specific detection of known allelic variation. Chip-based assays have been designed to detect multiple mutations in the CFTR gene (Cronin et al, Hum. Mutat. 7:244, 1996), in targeted regions of HIV (Kozal et al, Nature Med. 2:753, 1996), and in the BRCA1 gene (Hacia et al, Nature Genet. 14:441, 1996). A number of new designs are in development or on sale for examination of p53 and cytochrome P450, and for microbial identification and antibiotic resistance. The amount of data coded on the array is limited only by the number of probes used per data point, the available synthesis area, and the synthesis resolution.
In collaboration with Lander's group at the Whitehead Institute (Cambridge, MA), Lipshutz, Fan, and co-workers are developing a single nucleotide polymorphism (SNP) mapping chip (Lander, Science 274:536, 1996). The immediate objective is to identify the common polymorphisms (those of ∼20 to 50% frequency) contained within the mapped sequence tag site collection at the Whitehead Institute. These then form the basis set of biallelic markers that can be amplified from genomic DNA and applied to a probe array. Similar to the design used to detect allelic variation in the CFTR gene, blocks of probes are dedicated to each polymorphic form of the marker. This allows a straightforward detection of whether the sample is homozygous or heterozygous for each marker. These experiments offer enormous savings in time and labor compared to standard gel-based microsatellite methods. Currently, prototype mapping chips containing ∼500 markers are being produced, with plans to expand to a 2000-marker chip by the end of the year. These chips will be used for a number of applications, including linkage, association, and loss of heterozygosity measurements.
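How dedicated probe blocks could distinguish homozygous from heterozygous samples can be illustrated with a simple sketch. It assumes each biallelic marker yields one summed fluorescence intensity per allele block. The thresholds, the function name, and the allele labels A and B are hypothetical and would need to be calibrated against real data; this is not the scoring method used on the mapping chips.

```python
# Minimal sketch of a genotype call from two allele-specific probe blocks.
# All thresholds are illustrative assumptions, not calibrated values.

def call_genotype(
    intensity_a: float,
    intensity_b: float,
    het_band: tuple[float, float] = (0.3, 0.7),
    min_signal: float = 100.0,
) -> str:
    """Classify a marker as 'AA', 'AB', 'BB', or 'no call'.

    intensity_a, intensity_b: summed signal from the probe blocks for each allele.
    het_band: range of the allele-A signal fraction treated as heterozygous.
    min_signal: minimum total signal required to attempt a call.
    """
    total = intensity_a + intensity_b
    if total < min_signal:
        return "no call"
    fraction_a = intensity_a / total
    low, high = het_band
    if fraction_a > high:
        return "AA"
    if fraction_a < low:
        return "BB"
    return "AB"


if __name__ == "__main__":
    for a, b in [(900.0, 50.0), (500.0, 480.0), (30.0, 800.0), (10.0, 20.0)]:
        print(a, b, call_genotype(a, b))
```

A marker with signal in only one block is called homozygous for that allele, comparable signal in both blocks is called heterozygous, and very weak total signal is left uncalled rather than guessed.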
As the structure of the genome is elucidated, the chip technology will allow enormous quantities of genetic information to be stored on and read from the surface of a single chip.
