Abstract
In agriculture, the associations between genes and traits revealed by high-throughput approaches such as transcription profiling could be used to select more environmentally friendly chemicals for plant protection and to develop plants with increased grain yield, better nutritional value, greater disease resistance and greater tolerance to abiotic stress. However, one of the major challenges in applying such approaches is the limited genomic information available for most of the highly diverse crop species. We developed multiple strategies and platform technologies to address this issue. Here we report our improvements to these technologies for large-scale transcription profiling and their applications in agricultural gene discovery.
INTRODUCTION
Genomics has changed the way biologists conduct their research. Over the past few years, whole genome sequence information from various organisms has become available. These organisms include not only model systems, but also economically important species, such as crops and pathogens of humans and plants. With the aid of genomics, the focus of the post-genome era will be to use systematic approaches to accelerate gene discovery by associating phenotypes with gene sequence and expression information in a high-throughput manner. Microarrays are among the most widely used technologies for parallel gene expression analysis, because of the number of genes they can monitor and the throughput of data generation (Fig. 1). The GeneChip high-density oligonucleotide probe arrays (Lipshultz et al., 1999) and custom spotted cDNA microarrays (Duggan et al., 1999) are the two most commonly used platforms for gene expression analysis. GeneChip microarrays are extensively used for large-scale genome profiling because of their reproducibility, accuracy and medium-throughput sample processing potential (Zhu and Wang, 2000). cDNA microarrays, on the other hand, are frequently used for more targeted expression monitoring and other custom applications because of their flexibility and low cost. This essay describes improvements to these two technology platforms and illustrates their applications in large-scale transcription profiling for agricultural gene discovery.

A diagram depicting the throughput and gene coverage of different gene expression analysis technologies.
GENE DISCOVERY THROUGH TRANSCRIPTION PROFILING OF PLANT MODEL SYSTEMS
Using GeneChip technology, the transcriptome of plant model systems can be profiled on a large scale. Pair-wise comparison of samples suggests possible associations between a specific trait and the genes whose expression changes in the corresponding biological process. In addition, data mining across different experiments in a database helps to reveal gene expression regulatory networks and to assign potential functions to genes of unknown function. These analyses thus provide numerous targets for plant improvement through traditional and molecular techniques.
In agricultural genomics, model systems are routinely used for gene discovery studies. The two model plants most widely used for this purpose are Arabidopsis and rice. Arabidopsis is a dicotyledonous weed with a very small genome and a short life span, and it is very amenable to genetic manipulation; it has therefore been used extensively in molecular genetics research. The recently completed genome map provided the first insight into plant genome organization and its regulation (Arabidopsis Genome Initiative, 2001). Similarly, rice has a relatively small genome, with high sequence similarity and synteny to other monocotyledonous plants, especially cereal crops. The recently completed rice genome sequencing project provides the opportunity not only to explore gene functions in this important food crop, but also to apply the discovered genes directly in other cereal species such as wheat, maize, and barley (Davenport, 2001).
DESIGN OF HIGH CAPACITY GENECHIP GENOME MICROARRAYS
Taking advantage of the available genome information, two high-density GeneChip oligonucleotide probe arrays have been designed. The Arabidopsis genome array contains ∼160,000 perfect and mismatch probe pairs for 8300 genes, representing one-third of the genome (Zhu and Wang, 2000). Each probe is a 25mer oligonucleotide located in a 24 μm² area. This enables all of the probes to be arrayed in a 1.28 cm² area. This array has been used to identify circadian regulated plant genes (Harmer et al., 2000); to study global transcription pattern and identify constitutive and organ specific promoters (Zhu et al., 2001); and to dissect the signaling pathways of the photoreceptor phytochrome A (Tepperman et al., 2001).
Increasing the number of genes covered per array not only makes whole genome expression analysis possible, but also reduces the amount of sample required for such microarray experiments, as well as the associated cost and labor. Currently, genome expression analysis requires a set of arrays to cover a genome, especially for organisms with more complex genomes. To increase the capacity of the GeneChip array, we conducted a simulation study comparing results obtained with and without mismatch probes in the probe set. Because similar results were obtained, the design of the rice genome array was modified accordingly. In the rice GeneChip array, the mismatch probes were eliminated, and the feature size of each probe was reduced from 24 to 20 μm². These modifications allow probe sets for approximately 24,000 rice genes (approximately half of the rice genome) to be located in the same array area (Table 1).
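The comparison behind this design change can be illustrated with a small sketch. It is not the simulation used in the study. It assumes, for illustration only, that a probe-set signal is summarized either as the mean of the perfect match minus mismatch (PM − MM) differences or, without mismatch probes, as the mean PM intensity minus a global background estimate. All function names and intensity values are hypothetical.

```python
from statistics import mean


def signal_with_mm(pm, mm):
    """Probe-set signal as the mean PM - MM difference (assumed summary)."""
    return mean(p - m for p, m in zip(pm, mm))


def signal_pm_only(pm, background):
    """Probe-set signal as mean PM minus a global background (assumed summary)."""
    return mean(pm) - background


def fold_change(treated, control, floor=1.0):
    """Ratio of two signals; both are floored to avoid zero or negative values."""
    return max(treated, floor) / max(control, floor)


# Hypothetical intensities for one probe set under two conditions.
pm_control = [220, 260, 240, 280]
mm_control = [100, 120, 110, 130]
pm_treated = [520, 560, 540, 580]
mm_treated = [105, 115, 110, 130]
background = 115  # hypothetical global background estimate

fc_mm = fold_change(signal_with_mm(pm_treated, mm_treated),
                    signal_with_mm(pm_control, mm_control))
fc_pm = fold_change(signal_pm_only(pm_treated, background),
                    signal_pm_only(pm_control, background))
print(f"fold change with MM: {fc_mm:.2f}, PM only: {fc_pm:.2f}")
```

When the mismatch signal is close to a uniform background, as in these made-up numbers, the two summaries report the same fold change. That is the kind of agreement that would justify dropping the mismatch probes.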
Technical comparison of the Arabidopsis and rice genome arrays. Note that the rice genome array consists of only perfect match probes in the probe sets.
The performance of the microarray directly affects the confidence level of results for global patterns and individual genes. To ensure consistent performance of these GeneChip arrays, the sensitivity, specificity and dynamic range of the arrays were characterized (Table 1). In addition, the quality of the GeneChip arrays was examined by reproducibility studies. Hybridizing the same biological samples, prepared in parallel during labeling, to different arrays demonstrated high reproducibility (Fig. 2A). The false positive rate, that is, the percentage of genes on the array scored as significantly changed between such replicate hybridizations, ranged from 0.2 to 0.5% in studies conducted using over 30 array pairs (Table 1, Zhu and Wang, 2000).
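As a rough illustration of how such a false positive rate can be computed, the sketch below counts the genes whose signals differ by at least a given fold between two replicate hybridizations of the same sample. It assumes a simple fold-change cutoff is the only criterion for "significantly changed", which may be less strict than the criteria actually used. The function name, the signal floor and the example values are hypothetical.

```python
def false_positive_rate(rep_a, rep_b, fold=2.0, floor=1.0):
    """Percentage of genes differing by >= `fold` between two replicate arrays.

    Both arrays were hybridized with the same sample, so any gene that
    passes the cutoff is counted as a false positive.
    """
    if len(rep_a) != len(rep_b) or not rep_a:
        raise ValueError("replicates must be non-empty and of equal length")
    changed = 0
    for a, b in zip(rep_a, rep_b):
        # Floor low signals so that ratios stay finite and positive.
        a, b = max(a, floor), max(b, floor)
        ratio = a / b if a >= b else b / a
        if ratio >= fold:
            changed += 1
    return 100.0 * changed / len(rep_a)


# Hypothetical replicate pair: 1000 genes, three of which differ >= 2-fold.
replicate_1 = [100.0] * 1000
replicate_2 = [110.0] * 997 + [250.0, 40.0, 200.0]
print(f"{false_positive_rate(replicate_1, replicate_2):.2f}% false positives")
```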

Scatter plots showing the reproducibility of the GeneChip genome array. A. Data from the rice genome array hybridized with samples prepared in parallel. B. Data from the Arabidopsis genome array hybridized with samples prepared by the standard single tube reaction and in 96-well format. Note the tight correlation between the duplicate data sets.
Microarray data were also validated using other gene expression analysis methods such as quantitative RT-PCR and Northern analysis. In most cases, the microarray measurements correlated well with the results from other gene expression analysis methods (Fig. 3).

Correlation of data from GeneChip experiments and Northern blot analysis. Expression of a kinase gene in wt and mt backgrounds was monitored in parallel under 9 different conditions. Data from the GeneChip experiments are plotted, with relative expression levels indicated, in the upper panel. The corresponding Northern results are shown in the middle panel. A constitutively expressed gene was used as a control in the Northern analysis, as shown in the lower panel.
IMPROVEMENTS IN SAMPLE QUALITY AND SAMPLE PREPARATION THROUGHPUT
The quality of microarray data is largely determined by the quality of the original RNA samples. For a centralized facility, controlling the quality of samples from different laboratories is one of the biggest challenges. To address this issue, biological samples are subjected to rigorous quality controls. Different RNA extraction protocols are known to yield samples of different quality and can therefore affect the microarray results (Tepperman et al., 2001). To ensure the quality of the data, a standardized sample preparation and shipment procedure was developed and recommended to our collaborators worldwide. RNA samples received were checked by both spectrophotometer assay and electrophoresis, using either conventional gel electrophoresis or a BioAnalyzer 2000 (Agilent Technologies). Additional quality assay steps were implemented to monitor the remaining steps, from setting up biochemical reactions to data archiving. Furthermore, detailed sample information was collected through a web-based interface and archived along with the expression data.
While microarrays have proven highly useful for profiling studies, their throughput is relatively low because of the lengthy labeling and amplification procedures. To increase sample preparation throughput, a parallel sample preparation method was developed. Using this method, cDNA and cRNA synthesis and purification can be conducted in a 96-well format, with the capacity to process 192 samples or more at a time, which dramatically improves the throughput of sample preparation. The false positive rate at the two-fold level is 0.25%, similar to that obtained using the standard preparation method (Fig. 2B). The average correlation coefficient between duplicates prepared by this method is 98.2%, based on the results from six pairs of duplicates. Because the whole process remains in 96-well plates, it has great potential for automation using liquid handling robots.
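The duplicate correlation reported above can be sketched as follows. The text does not state which correlation coefficient was used or whether signals were transformed first. This example assumes a Pearson correlation on log2-transformed signals, averaged over the duplicate pairs. The function names and the signal floor are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def mean_duplicate_correlation(pairs, floor=1.0):
    """Average Pearson r of log2 signals over (duplicate_1, duplicate_2) pairs."""
    coefficients = []
    for dup_1, dup_2 in pairs:
        # Floor low signals so the logarithm is always defined.
        log_1 = [math.log2(max(v, floor)) for v in dup_1]
        log_2 = [math.log2(max(v, floor)) for v in dup_2]
        coefficients.append(pearson(log_1, log_2))
    return sum(coefficients) / len(coefficients)


# Hypothetical signals for two duplicate pairs of four genes each.
pairs = [
    ([120.0, 950.0, 40.0, 3100.0], [130.0, 900.0, 45.0, 3300.0]),
    ([80.0, 1500.0, 60.0, 2200.0], [75.0, 1600.0, 70.0, 2000.0]),
]
print(f"mean duplicate correlation: {mean_duplicate_correlation(pairs):.3f}")
```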
GENE DISCOVERY THROUGH TRANSCRIPTION PROFILING OF CROPS
The main challenge in applying microarray technology to agricultural discovery is the limited available genomic information, especially for non-model crops. Three strategies were used in our research to overcome these barriers.
To identify conserved genes from model systems:
Many genes that are essential or important to plants are conserved; therefore, model systems can be used to identify such genes. Through sequence similarity searches, putative orthologs of these genes can then be identified and isolated in economically important crops. This approach was used to identify genes controlling bolting in many vegetable crops, such as sugar beets and Brassicas (Provart et al., unpublished).
To develop heterologous microarrays for closely related non-model systems:
The microarray is a hybridization-based technology. A microarray developed for a model system can therefore be used for closely related species through cross-species hybridization. The balance between specific and non-specific hybridization determines the feasibility of such a heterologous system. The hybridization efficiency, which is used to indicate the abundance of the hybridized transcripts, is determined by both the sequence similarity and the abundance of the hybridizing targets. To eliminate the noise contributed by sequence variation, genomic DNA can be labeled (Winzeler et al., 1998) and applied to an array to identify usable probes, namely those that hybridize to both the model species and its close relatives (Fig. 4). Using this approach, a large portion of the probes on the rice GeneChip array was found to be usable for detecting gene expression in maize (90%, Fig. 4) and barley (82%). Alterations in the expression of landmark genes, such as those encoding GAMyb and α-amylases, were detected in barley aleurone cells after application of plant hormones such as gibberellin or abscisic acid. Using genome arrays for non-model crops thus broadens their applications.
Identification of usable probes for heterologous microarray experiments. Labeled genomic DNA from maize and from rice was hybridized to separate rice GeneChip arrays. Images were pseudo-colored and superimposed. Features in yellow indicate the usable probes, which can hybridize to both species. Note that the probes were randomly distributed in this high-density array.
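The probe selection by genomic DNA hybridization described above can be sketched as a simple filter. It assumes that a probe counts as usable when its genomic DNA signal reaches a chosen threshold in both species, and that the usable percentage is taken relative to the probes detected in the model species. The text states neither the threshold nor the denominator, so both are assumptions, and all names and values are hypothetical.

```python
def usable_probes(model_signal, relative_signal, threshold):
    """Indices of probes whose genomic DNA signal passes in both species."""
    return [
        i
        for i, (m, r) in enumerate(zip(model_signal, relative_signal))
        if m >= threshold and r >= threshold
    ]


def usable_fraction(model_signal, relative_signal, threshold):
    """Usable probes as a fraction of the probes detected in the model species."""
    detected_in_model = sum(1 for m in model_signal if m >= threshold)
    if detected_in_model == 0:
        raise ValueError("no probe passes the threshold in the model species")
    usable = usable_probes(model_signal, relative_signal, threshold)
    return len(usable) / detected_in_model


# Hypothetical genomic DNA hybridization signals for five probes.
rice_gdna = [900, 850, 40, 700, 950]
maize_gdna = [800, 60, 30, 650, 900]
keep = usable_probes(rice_gdna, maize_gdna, threshold=200)
print(keep, usable_fraction(rice_gdna, maize_gdna, threshold=200))
```

Expression data from the related species would then be analyzed using only the probes in `keep`, so that sequence divergence is not mistaken for low transcript abundance.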
To develop custom cDNA microarrays for specific crops:
Spotted cDNA microarrays are useful for non-model systems because they do not require previous knowledge of genomic sequences. While genome expression analysis may still be difficult with this technology, owing to limitations in the source clones and in fabrication, it provides a quick way to analyze gene expression at the transcription level in parallel. In addition, for species with a high polymorphism frequency in their genomes, such as maize, the longer probes used in spotted cDNA microarrays are superior to the oligonucleotide probes used in GeneChip microarrays when comparing expression data among different varieties. A system with high-throughput potential was established for custom cDNA microarrays in our facility. This system includes Tecan robotic liquid handling systems for probe normalization and re-arraying; a GeneMachines OmniGrid arrayer with plate hotels and a server arm for high-speed printing; a Genomic Solutions GeneTAC automatic hybridization station for consistent hybridization and washing; a Packard ScanArray 5000 with automatic slide loader for slide scanning; and BioDiscovery's ImaGene software for image processing with batch mode capacity. With this system, a set of high-density maize genome arrays was developed and used for various projects. The set consists of two arrays, each with 20,000 elements representing 10,000 genes. The performance of these arrays is summarized in Table 2.
Technical specifications of the maize cDNA microarrays
GENE DISCOVERY AND VALIDATION BY CUSTOM MICROARRAYS
Genome-scale profiling provides opportunities for global, unbiased surveys of gene expression. Validation of the targets identified, however, benefits greatly from the flexibility of spotted custom cDNA microarrays. In addition to the high-density arrays, we have therefore used cDNA microarray technology to develop custom low to medium density arrays. One example is depicted in Fig. 5. This maize array consists of duplicate probes for 860 maize genes and was designed to identify tissue or organ specific promoters. The quality of the experiments is demonstrated by the low false positive rate of 1% at the 2-fold level.

A maize cDNA microarray designed to identify genes with tissue specific expression. Genes preferentially expressed in leaf are indicated in red, and genes preferentially expressed in root are indicated in green.
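One way duplicate spots can support tissue-preference calls on such an array is sketched below. It assumes that a gene is called leaf- or root-preferential only when every duplicate spot shows at least a 2-fold difference in the same direction. This rule, the function name and the signal values are illustrative assumptions, not the procedure used for Fig. 5.

```python
import math


def classify_gene(leaf_spots, root_spots, fold=2.0):
    """Return 'leaf', 'root' or 'none' from duplicate-spot signals.

    leaf_spots and root_spots hold the signals of the duplicate spots for
    one gene in the leaf and root samples, in matching order.
    """
    if len(leaf_spots) != len(root_spots) or not leaf_spots:
        raise ValueError("need matching, non-empty duplicate measurements")
    if min(leaf_spots) <= 0 or min(root_spots) <= 0:
        raise ValueError("signals must be positive")
    cutoff = math.log2(fold)
    log_ratios = [math.log2(l / r) for l, r in zip(leaf_spots, root_spots)]
    # Require all duplicates to agree before making a call.
    if all(x >= cutoff for x in log_ratios):
        return "leaf"
    if all(x <= -cutoff for x in log_ratios):
        return "root"
    return "none"


# Hypothetical duplicate-spot signals for three genes.
genes = {
    "gene_A": ([800.0, 760.0], [200.0, 180.0]),
    "gene_B": ([100.0, 110.0], [450.0, 500.0]),
    "gene_C": ([300.0, 310.0], [280.0, 700.0]),
}
for name, (leaf, root) in genes.items():
    print(name, classify_gene(leaf, root))
```

Requiring agreement between duplicates trades some sensitivity for fewer spurious calls, which matters when the calls feed into promoter selection.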
Another application of the cDNA microarray is the efficient confirmation of differentially expressed genes identified by other transcription profiling technologies. cDNA fingerprinting has relatively broad gene coverage and high sensitivity, and it does not depend on knowledge of genome sequences. Because of these advantages, it has been widely used for non-model systems. However, confirming that the identified gene fragments are differentially expressed remains a time-consuming process. To accelerate confirmation, we took advantage of the massively parallel nature of the cDNA microarray: the differentially expressed gene fragments identified were cloned and spotted onto a glass slide. The results showed a good correlation between the two technologies. Among 48 randomly selected differentially expressed gene fragments, 41 were confirmed by the cDNA microarray and three were marginally confirmed. The consistency of the results from these two different technologies validates not only the results, but also the technologies used in the study.
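The confirmation counts above translate into rates as follows. This short calculation uses only the numbers given in the text; the function name is hypothetical.

```python
def confirmation_summary(total, confirmed, marginal):
    """Confirmation rates, in percent, for a set of tested gene fragments."""
    if total <= 0 or confirmed < 0 or marginal < 0 or confirmed + marginal > total:
        raise ValueError("inconsistent counts")
    return {
        "confirmed_pct": 100.0 * confirmed / total,
        "confirmed_or_marginal_pct": 100.0 * (confirmed + marginal) / total,
        "unconfirmed": total - confirmed - marginal,
    }


# 48 fragments tested, 41 confirmed, 3 marginally confirmed (from the text).
summary = confirmation_summary(total=48, confirmed=41, marginal=3)
print(f"confirmed: {summary['confirmed_pct']:.1f}%")
print(f"confirmed or marginal: {summary['confirmed_or_marginal_pct']:.1f}%")
print(f"not confirmed: {summary['unconfirmed']} fragments")
```

About 85% of the fragments were confirmed outright, and about 92% when the marginal cases are included, leaving four fragments unconfirmed.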
In conclusion, microarrays are a powerful tool for gene discovery in agriculture. This will be especially true once whole-genome expression analysis is enabled. Genome expression analysis provides a broad, unbiased survey of global gene expression, which can be used to associate economic traits with genes and to dissect gene regulation pathways. The effectiveness of genome expression analysis will increase greatly as the gene coverage of microarrays rises, and the technology can then be used effectively in gene discovery programs for a variety of crops and pathogens.
Acknowledgements
We thank Todd Moughmer and Darrell Ricke for their bioinformatics support, and Yen Tran, Jesse Campbell, Devon Brown and Bi-yu Li for their technical assistance.
