Abstract
The interactions between genetic variants in estrogen receptor (ER) have been identified to be associated with an increased risk of breast cancer. Available evidence indicates that genetic variance within a population plays a crucial role in the occurrence of breast cancer. Thus, the comparison and identification of ER-related gene expression profiles in breast cancer of different ethnic origins could be useful for the development of genetic variant cancer therapy. In this study, we performed microarray experiment to measure the gene expression profiles of 59 Taiwanese breast cancer patients; and through comparative bioinformatics analysis against published U.K. datasets, we revealed estrogen-receptor (ER) related gene expression between Taiwanese and British patients. In addition, SNP databases and statistical analysis were used to elucidate the SNPs associated with ER status. Our microarray results indicate that the expression pattern of the 65 genes in ER+ patients was dissimilar from that of the ER- patients. Seventeen mutually exclusive genes in ER-related breast cancer of the two populations with more than one statistically significant SNP in genotype and allele frequency were identified. These 17 genes and their related SNPs may be important in population-specific ER regulation of breast cancer. This study provides a global and feasible approach to study population-unique SNPs in breast cancer of different ethnic origins.
Introduction
Breast cancer is one of the most common cancers for women in the world as it ranks number one among other cancers in developed countries, and ranks fourth in Taiwan. Current research suggests that interactions between genetic variants and a wide range of environmental factors may contribute to the development of breast cancer. Available evidence indicates that genetic variance within the population plays a role in the probability of breast cancer development, with a low incidence in certain groups of Asian women to the highest in Caucasian women (Hsiao et al. 2004).
With microarray technique, large amounts of gene expression data can be obtained in a short period of time. Gene expression profiling is a powerful tool for identifying gene activity patterns, which enables the distinction among various subtypes of breast cancer (including luminal subtypes A and B, and ERBB2 between basal and normal) (Sorlie et al. 2001). According to the data from both clinical and animal studies, estrogen is crucial to the development and progression of breast cancer. Estrogen mediates its effects through the estrogen receptor (ER), which serves as the basis for many therapeutic interventions (Deroo and Korach, 2006). More than two-thirds of breast cancers show estrogen receptor expression at the time of diagnosis, and immunohistochemical detection of estrogen receptor expression is routinely used in making decisions on hormonal therapy for breast cancer (Holst et al. 2007). Gene variants in steroid hormone related genes, ESR1, ESR2, PGR, and HSD17B1 have been identified to be associated with either an increased or decreased risk of breast cancer (Feigelson et al. 2006; Gold et al. 2004); however, the exact associations remain unclear. Furthermore, ER-α allelic variants have been reported to be associated with the risk for breast cancer (Gold et al. 2004) in Caucasians and in Taiwanese (Hsiao et al. 2004). Certain single nucleotide polymorphisms (SNPs) may influence the regulation of ERs and coregulators on tumor development and progression. In this study, we developed an approach to find out population-unique SNPs in breast cancer of different ethnic origins by comparing the gene expression profiles of two different populations and related SNP data.
Materials and Methods
Tumor Tissue Samples and Examination of ER
Surgical specimens of breast cancer tumor tissue were freshly collected and snap frozen from patients who underwent surgery at National Taiwan University Hospital (NTUH) between 2002 and 2005. Cancer samples containing relatively pure tumor, as defined by greater than 50% tumor cells per high-power field examined in a section adjacent to the tissue used, were included in this study. All the paraffin sections of breast cancer specimens (3–5 m in thickness) on slides were processed in Ventana's automated staining system (BenchMark â LT) (Ventana Medical System Inc., Tucson, AZ, U.S.A.) for the immunohistochemical stain (IHC). Firstly the slides were probed with CONFIRMTM anti-Estrogen Receptor (SP1) rabbit monoclonal primary antibody (Catalog # 790-4325, Ventana Medical System Inc.). Secondly, to localize and visualize ER protein within the specimen, iVIEW TM DAB Detection kit (Catalog # 760-091, Ventana Medical System Inc.) was applied. The negative control slides for tumor specimens were solely stained using iVIEWTM DAB Detection kit (Catalog # 760-091, Ventana Medical System Inc.).
RNA Extraction and Oligo Microarray
Total RNA was extracted by Trizol® Reagent (Invitrogen, U.S.A.), followed by RNeasy Mini Kit (Qiagen, Germany). Purified RNA is quantified at OD260nm by a ND-1000 spectrophotometer (Nanodrop Technology, U.S.A.) and quality-controlled by Bioanalyzer 2100 (Agilent Technology, U.S.A.). A human reference RNA pooled from 10 cell lines (Stratagene, U.S.A.) was used to serve as reference in microarray comparison. 0.5 g of total RNA was amplified by a Low RNA Input Fluor Linear Amp kit (Agilent Technologies) and labeled with Cy3 or Cy5 (CyDye, PerkinElmer, U.S.A.) during the
Microarray Data Analysis
In this study, we used microarray technique to profile the gene expression of 59 breast cancer patients in Taiwan with primary invasive breast carcinoma. For the comparison of the gene expression profiles between Taiwanese and U.K. patients, we used U.K. breast cancer data from the National Cancer Institute (NCI) which were obtained from the online supplementary materials of Sotiriou et al. (Sotiriou et al. 2003). Detailed information and clinical characteristics for breast cancer patients in the U.K. and Taiwan are shown in Table 1. The gene expression datasets from our microarray results and that of NCI were grouped into ER+ and ER– respectively according to their clinical prognosis variables shown in Figure 1. To identify differentially expressed genes between the two groups and to increase the accuracy of significant gene selection, combination of Significance Analysis of Microarrays (SAM) (Tusher et al. 2001) and Optimal Discovery Procedure (ODP) (Storey, 2005) were jointly used for the selection of differentially expressed genes.
Summary of microarray information and clinical characteristics of NTUH and NCI datase.
ER: estrogen receptor.

Hierarchical Clustering
Hierarchical clustering was processed by Multiple-Experiment Viewer (MeV) 4.0, an open source software which is part of the TM4 Software Suite (Saeed et al. 2003) created by The Institute for Genomic Research (TIGR, Rockville, MD). Euclidean distance and average linkage were used to measure the distance of gene expression.
SNP Search and Statistical Analysis
SNP information was retrieved from the Perlegen Genotype Browser (http://genome.perlegen.com/browser/index_v2.html) and dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/). Perlegen Genotype Browser and dbSNP contain SNP information of three ethnic populations, African American, European American, and Chinese. For the purpose of this study, we focused on the SNPs of European American and Chinese to represent the United Kingdom (U.K.) and Taiwanese populations respectively for further statistical analysis. SNPs in the 67 candidate genes were initially searched and browsed through the Perlegen Genotype Browser. The SNPs with differences in allele frequencies between European American and Chinese were manually screened for further analysis. In order to collect detailed information on the rest of the SNPs, we referred to dbSNP and retrieved the genotype and allele frequencies of the two populations. Through Pearson's chi-square tests, the SNPs of our candidate genes that were significantly differentiated in both genotype and allele frequencies between European American and Chinese were identified. We utilized R software for the statistical analysis of SNPs.
Results
Differentially Expressed Genes in ER+ /ER– Breast Cancer Subgroups
First, we used immunohistochemical stain to examine ER+/ER– breast cancer tissue subgroups (Fig. 2). The microarray data were grouped into ER+ and ER– groups based on the ER status of breast cancer patients. In order to select the differentially expressed genes, we combined two algorithms, Significance Analysis of Microarray (SAM) (Tusher et al. 2001) and Optimal Discovery Procedure (ODP) (Storey, 2005). Using two-class unpaired SAM supervised analysis, and setting the imputation engine at 10


Comparison of differentially Expressed Genes in Our and NCI Data
The microarrays used in our and NCI datasets have 3197 gene probes in common, and were designated as our candidate gene pool for analysis. Shown in Figure 4, with regard to our candidate gene pool, in the NTUH dataset, 9 of 65 significant genes fell in this block region (3197 common genes for both NTUH and NCI datasets); while 68 of 88 genes fell in the same block in the NCI dataset. Five genes found in the overlapping region were the common genes among the significant genes from the two datasets, implying these genes were differentially expressed in both Taiwanese and U.K. patients. They are basic transcription factor 3 (BTF3), cyclin-dependent kinase inhibitor 2A (CDKN2A), estrogen receptor 1 (ESR1), GATA binding protein 3 (GATA3), and trefoil factor 3 (TFF3). All of these five genes have already been reported to be associated with ER status in human breast cancers. There were 67 differentially expressed candidate genes appearing exclusively in either the Taiwanese or the U.K. patients, as depicted by the Venn diagram (Fig. 4). These 67 mutually exclusive genes drew our attention since they might reflect the differences between the two populations, and may potentially influence the ER status in each population. Further details about these genes are shown in Table 2.

Subsets of differentially expressed genes in NTUH and NCI datasets.
The expression values of these genes were detected by two or more probes in the microarrays from either NTUH or NCI dataset. Hence two or more p-values and q-values were calculated and shown together.
p-value and q-value were calculated using the software, EDGE. Numbers have been rounded off.
T for NTUH dataset and U for NCI dataset. ↑, higher expression in ER+ samples; ↓lower expression in ER- samples.
Association Analysis of SNPs in Different Ethnic Origins
Since the SNP information of many human genes is available, we searched for possible SNP variations of these 67 genes between the representative populations of Taiwanese and U.K. patients in public databases. These results, namely the population-unique SNPs in the mutually exclusive genes, may imply the associations of these SNPs with ER status in breast cancer of different ethnic origins. Perlegen Genotype Browser and dbSNP, two public SNP databases, were useful for our association analysis of SNPs. Perlegen Genotype Browser includes SNP information on three ethnic populations, African American, European American, and Chinese. We chose the SNPs of European American and Chinese to represent the U.K. and Taiwanese populations respectively for further statistical analysis. Using a cutoff p-value of 0.001, Pearson's chi-square test identified 17 candidate genes which could be potential focuses in breast cancer development, including damage-specific DNA binding protein 2 (DDB2), ATP-binding cassette, sub-family D (ALD) and member 3 (ABCD3), ATPase, Na+/K+ transporting, beta 3 polypeptide (ATP1B3), sphingosine kinase type 1 interacting protein (SKIP), developmentally regulated GTP binding protein 1 (DRG1), interleukin-1 receptor-associated kinase 1 (IRAK1), keratin 7 (KRT7), dipeptidyl-peptidase 6 (DPP6), E2F transcription factor 3 (E2F3), fucosyltransferase 8 (FUT8), hydroxysteroid (17-beta) dehydrogenase 4 (HSD17B4), lipin 1 (LPIN1), myosin VI (MYO6), nuclear factor I/B (NFIB), protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1), syndecan 4 (SDC4), and seven in absentia homolog 2 (SIAH2). These genes had more than one statistically significant SNP (p-value < 0.001) in genotype and allele frequency between our and NCI datasets. A total of 83 SNPs among the 17 genes were identified. Genes with the most statistically significant SNPs between European American and Chinese are as follows: DPP6 (19 SNPs), followed by HSD17B4 (13 SNPs), ABCD3 (10 SNPs), and FUT8 (9 SNPs). The remaining genes have fewer than five significant SNPs. In Table 3, the differential genes with their possible related SNPs are presented.
List of identified genes and SNPs.
Genotype frequency: AA, AC, CC. Allele frequency: A, C.
Genotype frequency: AA, AG, GG. Allele frequency: A, G.
Genotype frequency: AA, AT, TT. Allele frequency: A, T.
Genotype frequency: CC, CG, GG. Allele frequency: C, G.
Genotype frequency: CC, CT, TT. Allele frequency: C, T.
Genotype frequency: GG, GT, TT. Allele frequency: G, T.
Functional Study of Identified Genes
In order to understand the functions of the 17 identified genes and their possible relationships with ER expression, we used DAVID database (Database for Annotation, Visualization and Integrated Discovery) (Dennis et al. 2003) to perform this analysis. These genes were functionally classified into the following six categories: alternative splicing (DPP6, PTPNS1, NFIB, MYO6, SKIP, IRAK1, DDB2, and FUT8), signal-anchor (DPP6, ATP1B3, FUT8), sh3-binding (PTPNS1, FUT8), disease mutation (HSD17B4, MYO6, ABCD3, DDB2), peroxisome (HSD17B4, ABCD3), and nucleotide-binding (MYO6, ABCD3, IRAK1, DRG1). As expected, all of the observed functions are either directly or indirectly involved in the regulation of estrogen receptors, in particular the genes closely associated with alternative splicing, sh3-binding, disease mutation, and peroxisome (Bonofiglio et al. 2005; Pfeffer et al. 1996; Troester et al. 2006; Zhou et al. 2006).
Discussion
According to the result of SNP analysis, 83 SNPs from the 17 differentially expressed genes in either the Taiwanese or U.K. ER+/ER– breast cancers were identified (Table 3). The findings indicate these 17 genes and their related SNPs may be important in the ER regulation in breast cancer of different populations. Since many reports have showed the association of genetic variations of estrogen receptor and breast cancer (Iwase H, 1996; Kang et al. 2002; Roodi et al. 1995), it is worthy to presume that the 83 identified SNPs in ER-related genes may very well have either direct or indirect influence on the ER status in breast cancer of different ethnic populations. Here we used SNPs to represent the polymorphisms between Taiwan and U.K. populations since they are good indicators for measuring the genetic differences between two different ethnic origins. Moreover, among the 17 significant genes mentioned above, only DDB2 was from our data, while the rest of the 16 genes were from the NCI data. DDB2 is known for its function in DNA binding while it also acts as a tumor suppressor. A recent study provides further evidence that rs830083 polymorphisms in DDB2 may contribute to the etiology of lung cancer in Chinese population (Hu et al. 2006). This corresponds to our study in that the SNPs we identified in DDB2 may play a role in specific regulation of estrogen receptor in Chinese breast cancer patients.
Except for DDB2, the expressions of the other 16 genes, ABCD3, ATP1B3, SKIP, DRG1, IRAK1, KRT7, DPP6, E2F3, FUT8, HSD17B4, LPIN1, MYO6, NFIB, PTPNS1, SDC4, and SIAH2, are different in ER-positive and ER-negative U.K. patients. ATP1B3 is derived from the primary differentiation event during mammalian development (Adjaye et al. 2005). IRAK1 has been proposed that one SNP within it, when combined with high-risk genotype at TLR6-1-10, conferred a significant increase in the risk for prostate cancer, suggesting synergistic effects between sequence variants in IRAK1 and the TLR 6-1-10 gene cluster (Sun et al. 2006). LPIN1 is reported to be a candidate gene for human lipodystrophy syndromes as common SNPs in LPIN1 of lipodystrophy patients have been identified (Cao and Hegele, 2002). Limited research has been done on the genetic variants in these genes, but the correlations of these ER-related genes and tumorigenesis are likely to correspond to an increase in susceptibility for breast cancer. Cytokeratin 7 (encoded by KRT7/CK7) is found in the majority of type 1 papillary renal cell carcinomas and chromophobe renal cell carcinomas, and its expression profile alteration is particularly associated with tumorigenesis of primary adenocarcinoma of the small intestine (Chen and Wang, 2004; Mazal et al. 2005). SDC4 is a cell-adhesion molecule related to the enhanced adhesion of cancer cells to fibronectin (Koike et al. 2004), and functions as a receptor in intracellular signaling. A study has showed SKIP to be a protein likely to participate in the regulation of SPHK1 activity modulation, but much about its functions remain unknown. The association between estrogen and SIAH2 has been illustrated by a mechanism in which the estrogen-ER complex markedly reduces the level of N-CoR through a process related to the up-regulation of SIAH2 and the subsequent targeting of N-CoR for proteasomal degradation (Frasor et al. 2005).
The genes, BTG2, ISL1, MCP (also known as CD46), SIAH2, and XBP1, in the list of mutually exclusive subset shown in Table 2 were previously known to be associated with ER status (Frasor et al. 2005; Gay et al. 2000; Kawakubo et al. 2006; Lacroix and Leclercq, 2004; Rushmere et al. 2004). These five ER-regulated genes were only observed to be differentially expressed in the NCI dataset, but not in our dataset. Intriguingly, the majority of published reports used breast cancer cell lines extracted from Caucasian patients in their studies. One possible explanation for this phenomenon is that the association of these five genes and estrogen is solely limited to Caucasian breast cancers patients.
In addition, five genes, BTF3, CDKN2A, ESR1, GATA3, and TFF3, fell in the overlapping subset as shown in Figure 4, meaning they were identified to be differentially expressed in both our and the NCI datasets. All of these genes have been identified and studied to play roles in ER regulation (Doane et al. 2006; Green et al. 2007; Milde-Langosch et al. 2001; Oh et al. 2006). According to a study, specific interaction between BTF3 and ERalpha has been verified
The SNPs identified in our selected genes may be involved in determining whether ER expression causes disparities between Chinese and Caucasian breast cancer patients. Our work can provide possible SNPs associated with ER status in breast cancer of different ethnic origins and a set of potential gene expression signatures for novel targeted therapeutic strategies.
Footnotes
Acknowledgements
We thank Li-Yun Chang and Dr. HC Lien for Immunochemical stain of ER and Jason Lee for proofreading the manuscript. We acknowledge the grant supported in part by Grants NSC95-2221-E-002-183, NSC94-2314-B-002-118 and NSC96-2311-B-002-018- from the National Science Council and NTU-Frontier and Innovative Research Projects-96R0107 from National Taiwan University.
