Abstract
Background:
Gene expression profiles from early-onset breast cancer and normal tissues were analyzed to explore the genes and prognostic factors associated with breast cancer.
Methods:
GSE109169 and GSE89116 were obtained from the database of Gene Expression Omnibus. We firstly screened the differentially expressed genes between tumor samples and normal samples from patients with early-onset breast cancer. Based on database for annotation, visualization and intergrated discovery (DAVID) tool, functional analysis was calculated. Transcription factor-target regulation and microRNA-target gene network were constructed using the tool of transcriptional regulatory relatitionships unraveled by sentence-based text mining (TRRUST) and miRWalk2.0, respectively. The prognosis-related survival information was compiled based on The Cancer Genome Atlas breast cancer clinical data.
Results:
A total of 708 differentially expressed genes from GSE109169 data sets and 358 differentially expressed genes from GSE89116 data sets were obtained, of which 122 common differentially expressed genes including 102 uniformly downregulated genes and 20 uniformly upregulated genes were screened. Protein–protein interaction network with a total of 83 nodes and 157 relationship pairs was obtained, and genes in protein–protein interaction, such as peroxisome proliferator-activated receptor γ,
Conclusions:
Introduction
In China, the incidence rate of breast cancers increased markedly in the past decades, and it has been reported that the rate would keep increasing in the next few years. Meanwhile, among older patients, the mortality of breast cancers showed significant upward trends. 1 In the United States, younger than 40 years, about 6.6% of humans have been diagnosed with breast cancer. 2,3 Those data imply that breast cancer will become a leading global public health problem as the increasing incidence rate of the disease in the next few decades.
With the development of bioinformatics technology, most researchers focused on exploring molecular biomarkers associated with the development of breast cancers. 4 -6 Potential genes associated with early-onset breast cancer development have been put forward, such as growth arrest specific 7 and breast cancer 1/2. 7 Meanwhile, a lot of prognostic markers have also been put forward. For example, the 21-gene recurrence score has been used in clinical for prognosis of patients with breast cancer. 8 Ellegård and his colleagues reported that copy numbers of ERBB2 and PTPN2 was one of significant prognostic factors in breast cancer after trastuzumab treatment. 9 However, for early-onset breast cancer, the recent biomarkers were still limited, and the novel biomarkers identification was also urgently needed.
In order to explore the prognostic factors in early-onset breast cancers, microarray data of GSE109169 and GSE89116 were downloaded, and the differentially expressed genes (DEGs) between tumor samples and normal samples were screened by classic Bayesian method in limma package. Transcription factor (TF)–target regulation forecast and microRNA (miRNA)-target gene network were constructed using the tool of transcriptional regulatory relatitionships unraveled by sentence-based text mining (TRRUST) and miRWalk2.0, respectively. Moreover, the prognosis-related survival information was compiled based on The Cancer Genome Atlas (TCGA) breast cancer clinical data. The workflow of analysis is shown in Figure 1.

The workflow of analysis.
Materials and Methods
Data Sources
Gene expression profiles of GSE109169 and GSE89116 were downloaded from the database of Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/).
10
GSE109169 was deposited by Chang
Differentially Expressed Genes Screening
Cell format profile data of GSE109169 were downloaded, the microarray data were then converted into expression matrix using Oligo package in R software (version 1.38.0, http://www.bioconductor.org/packages/release/bioc/html/oligo.html). After that, the methods based on the robust multiarray average were used to preprocess the microarray data, and the data were calculated following by the processes of background correction, normalization, and expression calculation. GSE89116 was downloaded as the standardized probe matrix file GSE89116-GPL6947_series_matrix.TXT from GEO. After that, the probes from both microarray data were further annotated according to platform annotation information. The probes not mapped to the gene symbol were deleted. The medial value would be considered as the final gene expression value if multiple probes mapped to the same gene symbol.
The DEGs between tumor samples and normal samples were screened by classic Bayesian method in limma package (version 3.10.3, http://www.bioconductor.org/packages/2.9/bioc/html/limma.html) in R software. All genes were analyzed to obtain the corresponding
After obtaining the DEGs in the 2 data sets, there would be some common DEGs that were upregulated (downregulated) in one data set but downregulated (upregulated) in the other data set. In order to ensure the accuracy of the results, the DEGs that were simultaneously upregulated or simultaneously downregulated in the 2 sets of data were screened for subsequent analysis.
Functional Enrichment Analysis of DEGs
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis were performed using the tool of database for annotation, visualization and intergrated discovery (DAVID) (version6.8, https://david-d.ncifcrf.gov/) based on hypergeometric algorithm.
13,14
A
Protein–Protein Interaction Network and Module Construction
The database of Search Tool for Retrieval of Interacting Genes (STRING) is an online tool evaluating the network of protein–protein interaction (PPI). Using STRING (version 10.0, http://www.string-db.org/) database, 15 the PPI of DEGs was analyzed. The input genes were set as DEGs and the species was set as human beings. The PPI score was set as 0.4 to create subsets of medium-confidence human PPI networks. The tool of Cytoscape (version: 3.2.0, http://www.cytoscape.org/) was used to visualize the predicted PPI network.
The topology properties of the node network were analyzed by the CytoNCA (version 2.1.6, http://apps.cytoscape.org/apps/cytonca), and the parameter of the calculation was set as without weight. 16 The score of nodes would be obtained, and the importance of nodes in network of PPI would be sequenced by the score. Hub protein was defined as important node involved in protein interaction in PPI networks.
Transcription Factor–Target Regulation Forecast
Transcription factor prediction was performed using the tool of TRRUST (version 2; http://www.grnpedia.org/trrust/).
17
The species was set as human beings, and
MicroRNA-Target Network Construction
For DEGs, miRWalk2.0 18 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/) was used to synthesize results from commonly used databases, including miRWalk, miRanda, miRDB, PITA, RNA22, and Targetscan. If the predicted miRNAs appeared in all above 6 databases, it would be considered as the miRNA with high reliability. Then, miRNA–messenger RNA (mRNA) relationship pair was obtained. The miRNAs simultaneously regulated 3 or more target genes were selected. Finally, miRNA–mRNA network was mapped using Cytoscape software.
Prognostic Survival Analysis of Differential Genes
The data used for survival analysis were obtained from the University of California, Santa Cruz (http://xena.ucsc.edu/) database, 19 which contained TCGA-related data. Clinical survival information and gene expression data for breast cancer (log2 [FPKM + 1]) were downloaded. Finally, 1058 cancer samples with survival information (−01A) were obtained through one-to-one correspondence.
The prognosis-related survival information was compiled based on TCGA breast cancer clinical data, including overall survival (OS) and OS status. The above DEGs were selected as candidate genes. Kaplan-Meier (K-M) survival analysis was then performed combining the gene expression value (log2 [FPKM + 1]) with prognostic information in the TCGA database. Two subgroups were obtained based on the median value of their expression values, including high expression group and low expression group, and K-M survival curve was drawn. The significance of
Results
Data Preprocessing
A total of 14 726 genes were annotated in GSE109169, and 18 906 genes were annotated in GSE89116. Moreover, from GSE109169, 708 DEGs were screened, including 313 upregulated and 395 downregulated DEGs, and 358 DEGs were obtained from GSE89116 data sets, including 66 upregulated and 292 downregulated DEGs. As shown in Figure 2, the bidirectional clustering heat map and volcano map were constructed. From the heat map, samples in predesigned groups could be clearly distinguished by DEGs. The DEGs of the 2 data sets were intersected, and the DEGs that were co-upregulated or co-downregulated were screened for further analysis. As shown in Figure 3, a total of 122 common DEGs were obtained. Among these DEGs, 102 were uniformly downregulated and 20 were uniformly upregulated.

The double hierarchical clustering heat map of the GSE89116 (A) and GSE109169 (B) data sets. Volcano map of the GSE89116 (C) and GSE109169 (D) data sets.

Venn map of GSE89116 and GSE109169.
Functional Enrichment of Common DEGs
Biological process of GO and KEGG pathway functional enrichment analysis was conducted on the common DEGs. According to the threshold set by the method, 170 biological processes and 4 KEGG pathways were obtained. The top 10 biological process and 4 KEGG pathways are shown in Figure 4.

The Gene Ontology (GO) functional enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGGE) pathway of differentially expressed genes (DEGs) that related to prognosis in data set.
These common DEGs were significantly associated with biological processes of GO, such as responses to organic substance, hormone stimulus, endogenous stimulus, peptide hormone stimulus, protein stimulation, steroid hormone stimulus, and other hormone-stimulating biological functions. The 4 KEGG pathways included PPAR signaling pathway, phenylalanine metabolism, adipocytokine signaling pathway, and histidine metabolism.
Protein–Protein Interaction Network
According to the “Method” section, the PPI network with a total of 83 nodes and 157 relationship pairs was obtained. As shown in Figure 5, the node connectivity of the differential genes in the network was calculated using the CytoNCA passage in Cytoscape software. The degree scores of the top 12 are given in Table 1, and the connectivity of 12 genes including peroxisome proliferator-activated receptor γ (

Protein–protein interaction network of common differentially expressed genes of GSE89116 and GSE109169. The yellow node indicates the up, the blue node indicates the down, and the node size indicates the size of the degree.
Top 12 Genes With Highest Degrees in Protein–Protein Interaction Network.
Abbreviations: ADIPOQ, adiponectin; PPARG, peroxisome proliferator-activated receptor γ.
In total, 66 TF-target relationship pairs were calculated, and these pairs included 16 TFs and 33 target genes. Among 16 TFs, PPARG was the only one downregulated TF. The network of TF target is shown in Figure 6A. The PPARG could regulate 7 genes, such as Kruppel-like factor 4 (

The regulatory network of transcription factor (TF)–target (A) and microRNA-target (B).
MicroRNA Predictive Analysis and miRNA-Target Network Construction
From the network of miRNA-target genes, 368 miRNA-target relationship pairs were calculated, and the relationship pairs included 27 miRNAs and 52 target genes, of which 21 miRNAs could regulate 3 or more target genes. Figure 6B shows that miRNA-target network contains 64 miRNA-target relationship pairs, including 21 miRNAs and 27 targets.
Prognostic Survival Analysis of Common DEGs
After prognostic survival analysis, we obtained a matrix of expression values of 121 DEGs in 1058 breast cancer samples. Table 2 shows a total of 16 common DEGs, including 2 upregulations and 14 downregulations, were significantly related to the prognosis of the disease. The K-M survival curves of 16 prognosis-related genes are shown in Table 2. Moreover, one upregulated (squalene epoxidase,
Sixteen Differentially Expressed Genes Associated With Prognosis.
Abbreviation: PPARG, peroxisome proliferator-activated receptor γ.

Kaplan-Meier survival curve of SQLE (A) and peroxisome proliferator-activated receptor γ (PPARG) (B).
Discussion
In our study, a total of 122 common DEGs were obtained, of which 102 genes were uniformly downregulated and 20 genes were uniformly upregulated in both GSE109169 and GSE89116. The PPI network with a total of 83 nodes and 157 relationship pairs was obtained, and genes in PPI such as
Previous studies have demonstrated that obesity played a major role in breast cancer pathogenesis.
20,21
The gene of
Adiponectin was also a key node in PPI network of breast cancer, which was exclusively expressed in adipose tissue. The gene was an adipocytokine secreted by adipocytes, which was demonstrated as a gene negatively regulating cancer cell growth. Previous studies demonstrated that the gene could regulate the processes associated with cell growth, angiogenesis, and tissue remodeling mediated by various growth factors.
24
Chung
PDK4 is a key enzyme in the process of glucose metabolism and highly expressed in breast cancers. Our data showed the gene was one of key nodes of PPI network in breast cancer, and it has a targeted binding relationship with more than 5 miRNAs. Previous evidence demonstrated inverse correlation between expression of miR-211 and PDK4, and targeting miR-211 to inhibit PDK4 was recognized as a valuable potential therapeutic strategy in breast cancers. 29 In our study, the correlations between PDK4 and 20 miRNAs were demonstrated, such as miR-211, miR-602, and miR-16. Although no clinical data have been published on these correlations, these findings might be useful for future advances in breast cancer treatment.
In summary,
Footnotes
Authors’ Note
Our study did not require an ethical board approval because it did not contain human or animal trials. Zhun Yu, Qi He and Guoping Xuare also affiliated with Shanghai Key Laboratory of Embryo Original Diseases, Shanghai, China and Shanghai Municipal Key Clinical Specialty, Shanghai, China.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
