Abstract
We report a method,
Keywords
Introduction
Chromosomal aberrations are frequently observed in cancer, and whole-genome analysis of copy number change in tumor cells has become a useful tool for tumor classification, tumor marker discovery, and the study of tumorigenesis. The initial application of chromosomal comparative genomic hybridization (CGH), co-hybridizing differentially labeled tumor and normal genomic DNA to normal metaphase spreads, identified genomic regions of deletion and amplification in various tumor samples and cell lines (1), allowing copy number estimation at around 10 megabase resolution. Recent advances in microarray technology have provided higher resolution tools for genome-wide copy number estimation. An early array-based study used a spotted chromosome-specific library or cloned genomic fragments to investigate copy number changes in tumor samples (2). Later developments, using microarrays derived from genomic clones (3), cDNA (4), BAC clones (5) and oligonucleotides (6–12), provided higher resolution analyses. Using high-density SNP oligonucleotide microarrays, Bignell et al. (8) described an assay and algorithm for copy number analysis of various cancer cell lines to identify homozygous deletions and high-level amplification. Other oligonucleotide-based microarray studies used longer oligos, 60- or 70-mers, to identify copy number changes in cancer cells (6, 7).
Gene expression profiles have been used successfully to classify tumors (13, 14), including gastrointestinal stromal tumors (15). To better understand the role of DNA copy number aberration in tumorigenesis, efforts have been made to correlate gene expression patterns with specific genomic alterations (16–22). While genes in the altered genomic regions are not necessarily regulated by DNA dosage, copy number aberrations may influence genome-wide gene expression patterns. If both genomic DNA and RNA are available from the same sample, copy number analysis and RNA expression analysis can be performed on the same type of array. Thus, it is possible to assess whether a probeset lies in a region that is both amplified and over-expressed. Such regions may be of greater interest for further study, both to understand the pathogenesis of disease and to explore the possibility of discovering diagnostic biomarkers.
Gastrointestinal stromal tumor (GIST) is the most common mesenchymal tumor of the intestinal tract (23). GISTs express KIT protein and show in a significant number of cases activating mutations in either
In this study, we describe an assay for detecting copy number changes by hybridizing genomic DNA to oligonucleotide microarrays designed for RNA expression profiling. We applied this approach to examine the genomic copy number changes among various cell lines and GIST tumors. Algorithm and method development were performed on cell lines containing various numbers of X chromosomes and known deletions and amplifications. This method, Expression-Microarray Copy Number Analysis (ECNA), allowed us to readily identify genes that showed copy number alterations starting with as little as 5 ng genomic DNA. ECNA was validated on GIST tumors in which previously described as well as novel copy number aberrations were identified.
Materials and Methods
Cell Lines and DNA
DNA samples used in this study fall into 3 categories: DNA extracted from cell lines, from normal human blood, and from GIST tumors. DNA from cell lines carrying different numbers of X chromosomes, 1X (NA01723A), 2X (NA09899), 3X (NA04626), 4X (NA01416), and 5X (NA06061), and from a chromosome 4 deletion cell line (NA04126) was purchased from the National Institute of General Medical Sciences (NIGMS) Human Genetic Cell Repository, Coriell Institute for Medical Research (Camden, NJ). A human breast cancer cell line, SK-BR-3, was obtained from the American Type Culture Collection (ATCC, Manassas, VA). DNA was extracted from the cultured cells using the DNA Maxi Kit (Qiagen, Inc., Valencia, CA). DNA from normal blood was obtained from AllCells, LLC (Emeryville, CA). GIST sample DNA was obtained using a standard organic phenol-chloroform procedure. There were 5 GIST samples from 4 patients. Three of the samples were taken from the primary tumor resection, and in one patient two abdominal recurrences removed at different time points were analyzed (GIST#159, 199). The diagnosis was confirmed by pathologic review and immunoreactivity for KIT. Three samples had a
Whole Genome Amplifications, Purification, Fragmentation and Labeling
Genomic DNA (5–25 ng) was amplified using the QIAGEN REPLI-g® kit (Qiagen) for 16 hours at 30 °C, according to the protocol provided by the manufacturer. Reaction volumes were between 150 and 200 μL (2–5 μg/μL yield). DNA from GIST tumors had previously been used to obtain reliable genotyping results. The amplification products were purified by Qiagen genomic-tip (Qiagen) and quantified using a NanoDrop® spectrophotometer (NanoDrop Technologies, Wilmington, DE) at 260 nm. Fragmentation of purified DNA samples (100 μg) was carried out by adding 0.2 units of DNase I (DNA Fragmentation Reagent, Affymetrix, Inc.) in 1X Fragmentation Buffer (Affymetrix, Inc.), followed by incubation at 37 °C for 30 min. The fragmentation reaction was terminated by incubation at 95 °C for 10 min. The fragmentation products were then terminally biotinylated with DNA Labeling Reagent (Affymetrix, Inc.) at 37 °C for 5 hours. The labeled fragments were then concentrated on YM-3 Microcon columns (Millipore, Billerica, MA).
DNA Hybridization, Wash, Staining and Scanning
Labeled DNA fragments (0.5 μg/μl) were added to a hybridization mix containing 1X HYB Mix, 2.5X Denhardt's Solution (Sigma, St. Louis, MO), 0.125 μg/μl human Cot-1 DNA (Roche, Basel, Switzerland), 0.06 nM Oligo B2 (Affymetrix) and 10% DMSO (Sigma). The hybridization mix was heated at 95 °C for 5 min, followed by immediate cooling, and then hybridized to the GeneChip® Human Genome U133 Plus 2.0 arrays (Affymetrix) at 48 °C for 16 hours. After hybridization, arrays were washed with 3M TMACl in 0.4 × SSPE and 0.01% Tween-20 solution for 30 min, then washed extensively with 0.1M NaCl in 0.6 × SSPE and 0.01% Tween-20 prior to staining. Arrays were first stained with streptavidin, then with a biotin-conjugated anti-streptavidin antibody, and finally with phycoerythrin-streptavidin. Arrays were scanned using the Affymetrix GeneChip Scanner 3000 (Affymetrix). Image analysis was performed with GeneChip Software GCOS, version 1.2.
RNA Hybridization to Affymetrix U133A Arrays
RNA from 10 tumor samples (4 GISTs and 6 leiomyosarcomas) was analyzed on Affymetrix Human Genome U133A arrays. Leiomyosarcomas, which are malignant mesenchymal neoplasms of smooth muscle derivation that closely resemble GIST morphologically but are genetically distinct from it, were used as a control reference. RNA was isolated using the protocol accompanying the RNAwiz™ RNA Isolation Reagent from Ambion (Austin, TX), and all samples were treated on the column with RNase-free DNase (Qiagen, Valencia, CA) according to the manufacturer's instructions. Twenty-five to 50 nanograms of total RNA were tested for quality on an RNA 6000 Nano Assay (Agilent, Palo Alto, CA) using a Bioanalyzer 2100. RNA with an OD260/280 ratio greater than 1.8 was chosen for expression profiling experiments. Two micrograms of high quality total RNA was then labeled according to protocols recommended by the manufacturer. Briefly, after reverse transcription with an oligo-dT-T7 primer (Genset), double-stranded cDNA was generated with the Superscript double stranded cDNA synthesis custom kit (Invitrogen Life Technologies, Carlsbad, CA). In an in vitro transcription step with T7 RNA polymerase (MessageAmp™ RNA kit from Ambion), the cDNA was linearly amplified and labeled with biotinylated nucleotides. Ten micrograms of labeled and fragmented cRNA were then hybridized onto a test array and a Human Genome U133A expression array (Affymetrix, containing probesets representing 18,000 transcripts and variants). Post-hybridization staining and washing were performed according to the manufacturer's protocol (Affymetrix). Finally, chips were scanned with a GC3000 laser confocal scanner.
Data Analysis of Copy Number Change
The copy number analysis workflow is summarized in Figure 1. The following steps were carried out sequentially: data normalization, data filtering, chromosome data mapping, reference and validation data set selection, DNA copy number estimation by computing a Z score and Stouffer Z score for each probe set, method validation and GIST samples copy number estimation. Details of the method are described in the Supplementary Materials and Methods.

Data Analysis of RNA Samples Hybridized to U133A Arrays
Analysis was performed using the Affymetrix PLIER algorithm (Affymetrix Technical Note 1) (30). Principal Component Analysis (PCA) and one-way ANOVA were performed using Partek software. Data visualization was rendered either by Partek® Genomics Suite, version 6.2 or SpotFire® DecisionSite™ version 8.1 software. One-way ANOVA was used to compare the 4 GIST samples with a reference group of 6 leiomyosarcoma tumors. Data mapping between RNA expression data and DNA copy number estimation was done by matching probe set names (U133 Plus 2.0 vs. U133A array) or by corresponding chromosomal locations (U133 Plus 2.0 vs. Mapping 100K array). Each step is described in detail in the Supplementary Materials and Methods.
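The name-based mapping between the two data sets can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the pipeline used in this study: the function name, the input layout, the probe set identifiers, and the significance cut-off of 0.05 are all hypothetical, and the p-values are assumed to come from a separate one-way ANOVA.

```python
def concordant_probesets(copy_calls, expression, alpha=0.05):
    """Pair copy number calls with expression results by probe set name.

    copy_calls: dict mapping probe set name -> "gain", "loss" or "no change"
    expression: dict mapping probe set name -> (log2 fold change, ANOVA p-value)
    alpha:      hypothetical significance cut-off (not taken from the study)
    """
    hits = []
    # Only probe sets present on both array types can be compared.
    for probeset in sorted(copy_calls.keys() & expression.keys()):
        call = copy_calls[probeset]
        log2_fc, p_value = expression[probeset]
        if p_value >= alpha:
            continue
        # Keep probe sets whose expression change follows the copy number change.
        if (call == "gain" and log2_fc > 0) or (call == "loss" and log2_fc < 0):
            hits.append((probeset, call, log2_fc, p_value))
    return hits


# Made-up probe set names and values, for illustration only.
copy_calls = {"ps_1": "gain", "ps_2": "loss", "ps_3": "gain", "ps_4": "no change"}
expression = {"ps_1": (1.8, 0.001), "ps_2": (0.4, 0.01),
              "ps_3": (2.0, 0.30), "ps_5": (-1.0, 0.001)}
hits = concordant_probesets(copy_calls, expression)
print(hits)
```

In this toy input only ps_1 is retained: ps_2 is lost but more highly expressed, ps_3 is not significant, and ps_4 and ps_5 are not shared between the two inputs or show no copy number change.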
100K Human Mapping Arrays
For data validation, genomic DNA from the same 4 GIST samples was hybridized to the 100K Human Mapping Arrays (31). The data were analyzed with Affymetrix GDAS software, and DNA copy number was subsequently estimated with the Affymetrix DNA copy number tool, CNAT (Affymetrix Technical Note 2) (32).
Fluorescence in Situ Hybridization (FISH)
FISH was performed in 3 GIST cases, using fresh frozen touch preparations from all 3 tumors, as well as paraffin sections for 2 of these tumors. BAC clones for PRKAR2B and PEX11B were obtained from BACPAC Resources. The PRKAR2B probe comprised two overlapping BAC clones, RP11-120N6 and RP11-258L19, labeled by nick translation with Spectrum Orange (Vysis, Abbott Laboratories, IL). A chromosome 7 centromeric plasmid probe (p7t1) (33) labeled with Spectrum Green (Vysis, Abbott Laboratories, IL) was used as reference. The PEX11B probe was a single BAC clone, RP11-315I20, labeled by nick translation with Spectrum Orange. The reference probe was a chromosome 1 centromeric plasmid probe (pSD1-1) (34) labeled with Spectrum Green. FISH was done according to standard procedures. Briefly, touch preparations were fixed in 3:1 methanol/acetic acid and then pretreated with pepsin-HCl at 37 °C for 3 to 5 minutes, rinsed in PBS, fixed in 1% formaldehyde, then rinsed, dehydrated, and air-dried. Paraffin sections were de-waxed in xylenes, microwaved in 10 mM sodium citrate (pH 6–6.5) solution for 5 to 10 minutes, cooled to room temperature, rinsed and dehydrated. The slides were then denatured in 70% formamide at 68 °C for 2 to 4 minutes. Approximately 100 ng of labeled BAC DNA and 2 μg Cot-1 DNA (Invitrogen) were ethanol-precipitated and resuspended in hybridization buffer. The probe mix was then denatured at 70 °C for 10 minutes, followed by pre-annealing at 37 °C for 30 minutes. The reference probe was denatured separately, without pre-annealing, and combined with the denatured BAC probe mix on the slide for overnight incubation at 37 °C. After standard post-hybridization washes, the slides were stained with 4′,6-diamidino-2-phenylindole (DAPI) and mounted in VECTASHIELD® antifade mounting medium (Vector Laboratories). Analysis was done using a Nikon E800 epifluorescence microscope with MetaSystems Isis 3 imaging software.
A minimum of 100 cells was scanned over separate regions for each slide. Image z-stacks were captured using a Zeiss Axioplan 2 motorized microscope controlled by Isis 5 software (Metasystems).
Results
Detection of Copy Number Changes
To confirm that differences in signal are proportional to the differences in copy number, we performed the assay on cell lines with variable numbers of X chromosomes ranging from 1 to 5 copies. The probesets on the X chromosome show a proportional increase in signal (Fig. 2A) when each cell line is compared to a 1X cell line. The Z score, which provides a point estimation of copy number for each probeset, is derived by comparing the signal of each probeset in a sample to that of a reference sample set (Fig. 2B). Chromosomes other than the X chromosome were analyzed and found not to have copy number variation in the samples tested (data not shown).
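As a rough illustration of the per-probeset Z score described above, the following Python sketch standardizes a sample's signal against a reference sample set. It assumes the Z score is the sample signal minus the reference mean, divided by the reference standard deviation. The exact normalization and reference selection are given in the Supplementary Materials and Methods, so the function name, data layout and values here are hypothetical.

```python
from statistics import mean, stdev


def probeset_z_scores(sample, reference):
    """Standardize each probeset signal against a reference sample set.

    sample:    dict mapping probeset name -> signal in the test sample
    reference: dict mapping probeset name -> list of signals, one per
               reference sample (at least two values per probeset)
    """
    z = {}
    for probeset, signal in sample.items():
        ref_signals = reference[probeset]
        sd = stdev(ref_signals)  # sample standard deviation (n - 1)
        # A probeset with no spread in the reference set has no defined Z score.
        z[probeset] = (signal - mean(ref_signals)) / sd if sd > 0 else float("nan")
    return z


# Made-up signals for two hypothetical probesets.
reference = {"ps_a": [8.0, 10.0, 12.0], "ps_b": [10.0, 10.5, 11.0]}
sample = {"ps_a": 14.0, "ps_b": 10.5}
z = probeset_z_scores(sample, reference)
print(z)
```

Here ps_a lies two reference standard deviations above the reference mean, while ps_b sits exactly at the reference mean.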

In chromosomal copy number estimations, a range of values is typically seen. It is important to choose thresholds or cut-off values, above or below which a region may be called amplified or deleted. The 69 samples that passed the 67% present call rate cut-off value and were used in these analyses (Supplementary Table 1) have known numbers of X chromosomes (Fig. 3). In this study, Z scores in windows of 500,000 bp were used to compute Stouffer Z values. The tighter distribution seen with the Stouffer Z sliding window approach reflects the reduction in noise (Figs. 3A, B), and this approach was used for final copy number estimations. A clear separation was observed between the median Stouffer Z scores for each of the 5 sample sets, bearing 1 to 5 X-chromosome copies, and 2-fold changes could be distinguished by this method (Fig. 3C). In this model, a 2-fold change between 2X and 4X was easier to determine than the change between 1X and 2X. However, 3-fold or greater changes can be distinguished much more easily than smaller changes. When assessing copy number changes in unknown samples, it is important to use thresholds with defined levels of confidence. Since the median and mean Stouffer Z values were highly similar (Fig. 3D), in subsequent analyses we chose to use the mean values plus or minus 2 S.D. as the threshold values to identify chromosomal deletions and amplifications. These threshold values are highlighted in Figure 3D.
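The windowed combination and thresholding described above can be sketched as follows. Stouffer's method combines k Z scores as their sum divided by the square root of k. Everything else in this Python sketch is an assumption made for illustration: the window is centered on each probeset, one chromosome is handled at a time, and the reference mean and S.D. passed to the calling function are placeholders rather than the values shown in Figure 3D.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
from math import sqrt


def stouffer_z_windows(positions, z_scores, window_bp=500_000):
    """Combine probeset Z scores in a sliding window along one chromosome.

    For each probeset, all Z scores within window_bp / 2 on either side
    (assumed centered window) are combined as sum(z) / sqrt(k).
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pos = [positions[i] for i in order]
    # Prefix sums give each window's Z total without re-adding every value.
    csum = [0.0] + list(accumulate(z_scores[i] for i in order))
    half = window_bp / 2
    combined = [0.0] * len(pos)
    for rank, i in enumerate(order):
        lo = bisect_left(pos, pos[rank] - half)
        hi = bisect_right(pos, pos[rank] + half)
        combined[i] = (csum[hi] - csum[lo]) / sqrt(hi - lo)
    return combined


def call_copy_number(stouffer_z, ref_mean, ref_sd, n_sd=2.0):
    """Label values outside ref_mean +/- n_sd * ref_sd as gain or loss."""
    upper = ref_mean + n_sd * ref_sd
    lower = ref_mean - n_sd * ref_sd
    return ["gain" if s > upper else "loss" if s < lower else "no change"
            for s in stouffer_z]


# Made-up positions (bp) and Z scores on a single chromosome.
positions = [100_000, 200_000, 300_000, 2_000_000]
z_scores = [1.0, 1.0, 1.0, -3.0]
combined = stouffer_z_windows(positions, z_scores)
calls = call_copy_number(combined, ref_mean=0.0, ref_sd=1.0)
print(combined, calls)
```

In this toy input the three neighboring probesets reinforce one another, each combining to about 1.73, while the isolated probeset keeps its own value of -3.0 and is the only one called as a loss.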

Validation of Known Deletions and Amplifications
Applying the cutoffs listed above, the known deletion on chromosome 4 (4p16.3) in the NA04126 cell line, derived from an individual with Wolf-Hirschhorn syndrome, was detected, with a number of probesets falling below the 2 S.D. line (Fig. 4). Of these, 4 probesets (shown as blue dots in the inset image) map to the


Copy Number Changes in GIST
The next step was to apply this methodology to tumor samples. GIST is an ideal tumor model for testing the sensitivity of this system, since it has relatively few copy number changes, and these are well-documented using both low and high resolution approaches (29, 36). We therefore analyzed 5 GIST genomic DNA samples for a global assessment of segmental gains and losses. Figure 6 shows the genomic view of the ECNA data for 3 of these tumors. Some chromosomal regions are clearly changed in all three GIST samples compared to the control sample (Fig. 6A). For example, in all tumors, the majority of probe sets in 1p are well below the 2X copy number line (0 on the y-axis). The 1p arm appears to fall at about a 1X copy number, indicating loss of 1p, while 1q appears to be gained in these samples.

The 2 samples (GIST#159, 199) originating from the same patient from two subsequent recurrences, at 14-month intervals, showed very good overall concordance in copy number changes (Figs. 6C and 6D, and Supplementary Table 2). Interestingly, GIST#199, the later recurrence, showed additional losses at 5q23–35 and 8p12–23 as compared to GIST#159, suggesting the possibility of deletions of candidate tumor suppressor genes involved in tumor progression.
The copy number changes detected by ECNA are listed in Supplementary Table 2. Briefly, the majority of GIST tumors showed losses of 14q (4/5 samples), 22q (3/5 samples) and 1p (5/5 samples). Furthermore, smaller regions of loss were consistently noted, such as 1p36 (seen in 4/5 GIST samples), 13q34 (4/5 samples) and 21q22 (seen in 3/5 GISTs). The two GIST samples harboring mutations in
Seen in ≥2 GIST tumors; bolded regions are those that were identified by at least 2 CGH methods.
To further validate our assay, we performed a comparison with another copy number technique of similarly high resolution. Figure 7 shows the concordance of our results on chromosome 1 with copy number analysis performed on the GeneChip Human Mapping 100K arrays. Results are similar between the two methods; both reveal deletions on chromosome 1p and gains on 1q (Fig. 7A and B). The U133 Plus 2.0 arrays are gene-centric, whereas the SNP arrays span coding as well as non-coding regions of the genome. This complementarity of coverage is evident in the distribution of probesets or SNPs on the respective arrays. The SNP arrays additionally provide allele-specific information, as illustrated by the loss of heterozygosity (LOH) results in Figure 7C.

Comparison of Copy Number Changes with Expression Data and Validation Using FISH
One-way ANOVA identified probesets that were significantly over- or under- expressed in GIST samples compared to leiomyosarcoma tumor samples. Some regions were identified that showed both copy number change and a corresponding difference in expression as indicated by a significant p-value. Some of these low p-value probesets were mapped back to the chromosomes and regions were selected that showed both copy number change as well as significant difference in expression. Of these, two genes,

Discussion
In this study we designed a method to estimate chromosomal copy number by hybridizing genomic DNA to Human Genome U133 Plus 2.0 arrays, typically used to study levels of RNA expression. An important advantage of our novel assay is that it requires only a very limited amount of DNA, i.e. as little as 5 ng starting material. Additionally, many of the advantages of an established array platform such as whole genome representation, probe set annotations and algorithms to estimate probe set signals were available to us by using this expression array approach.
We developed this method by taking samples with known differences in X-chromosome copy number. We chose a sliding window approach to generate Stouffer Z scores that were used to estimate copy number changes. The approach was then applied to and confirmed on cell lines with known chromosomal abnormalities. Finally, we used this approach to assess copy number changes in gastrointestinal stromal tumor (GIST) samples. GISTs are known to have copy number aberrations, some of which have been identified by other techniques.
In a recent study, Auer et al. (37) performed a similar gene-resolution analysis of copy number variation using the Affymetrix U133 Plus 2.0 arrays. The authors likewise conclude that this approach provides more reproducible results than custom-made BAC CGH arrays, and that the results can be compared among different laboratories and combined with gene expression data from the same platform. Their results show good concordance between the copy number changes detected by a 19k BAC high-density microarray platform and the Affymetrix expression arrays. As in our approach, the authors chose various cell lines with known amplifications or deletions, such as neuroblastoma cell lines, to validate the variations in gene copy number. However, they do not extend the use of this application to routine clinical tumor samples.
The most common findings in GIST, which have been reported by both conventional and BAC-array CGH, include losses of part or all of chromosome 14, loss of chromosome 22, and loss of 1p (26, 29) (Table 1). Our method confirmed these results, showing a high incidence of 14q, 22q and 1p losses. Furthermore, we provide evidence that the increased resolution of the current platform facilitated the identification of small alterations that were missed by a lower resolution BAC-array CGH platform. Three areas of interest were pinpointed by this method, namely losses of 1p36, 13q34 and 21q22; the first two were previously highlighted by BAC array CGH, while the third locus is a novel finding. In addition, novel gains of
Several copy number analysis methods are now available. The amount of DNA needed varies among the different methods, from as little as 5 ng as used in this study, to 400 ng (8) or 2 μg (7). ECNA uses whole genomic DNA without complexity reduction followed by amplification with φ29 DNA polymerase, in contrast to the WGSA method used on SNP arrays (8, 10, 31). We analyzed the same GIST samples with the SNP array method. Despite the fact that the assay and array designs are distinct, the results obtained are highly similar (Fig. 7). While the SNP arrays have very dense genomic coverage, the HG U133 Plus 2.0 arrays are gene-centric, and so may have representation in regions where SNP coverage is limited or absent. Thus, in addition to showing good concordance, these methods are complementary. Most copy number analysis methods described thus far generate a list of genomic regions undergoing copy number alterations, without further details on their impact on gene expression. ECNA promises to link the areas of loss or gain with information related to the expression level of their corresponding probe sets, as RNA from the same sample can be used to analyze levels of RNA expression on the same platform.
A clear advantage of this approach is that copy number alterations may be studied in other species, such as mouse, rat, and other model organisms for which expression arrays are available and for which SNPs have not yet been mapped. This assay has been successfully used by other researchers to detect copy number changes on HG U133 Plus 2.0 arrays (38). Additionally, experimental evidence has shown that this assay may be used on Affymetrix tiling arrays (K. Wu, unpublished data) to assess copy number changes. We also believe that this approach will prove valuable in studying copy number aberrations in clinical samples, because such samples often yield only relatively small amounts of starting material.
Footnotes
Acknowledgements
The authors would like to acknowledge the assistance of Gianfranco de Feo, Jacques Retief, Michael Shapero, Dione Bailey and Giulia Kennedy from Affymetrix and Agnes Viale and the MSKCC Genomic Core Lab, Lei Zhang from Molecular Cytogenetics Core Facility, MSKCC and Nicholas Socci from Computational Biology, MSKCC. Supported in part by: ACS MRSG CCE-106841 (CRA).
Supplementary Material
Sutton, A., K. Abrams, D. Jones, T. Sheldon, and F. Song. 2000.
