Observation of significant biomarkers in osteosarcoma via integrating module-identification method with attract

Abstract

OBJECTIVE:

Osteosarcoma (OS) is the most frequent type of bone malignancy and has a poor prognosis. We aimed to identify significant genes related to OS by integrating a module-identification method with the attract approach.

METHODS:

OS-related microarray data (E-GEOD-36001) were obtained from the ArrayExpress database, and the protein-protein interaction (PPI) networks of the normal and OS groups were re-weighted using the Spearman correlation coefficient (SCC). Next, maximal cliques were detected from the re-weighted PPI networks using a clustering approach based on maximal cliques. Highly overlapped cliques were then merged according to their interconnectivity, followed by identification of candidate modules and seed modules. The attract method, proposed by Mar et al., extracts and annotates the gene sets that can distinguish between disease and control samples; the resulting differences among the expression profiles of samples are defined as attractors. We therefore applied the attract method to extract differential modules from the seed modules, and these differential modules were defined as attractors. The genes in the attractors were designated attractor genes.

RESULTS:

After eliminating the maximal cliques with fewer than 4 nodes, 1,884 and 528 maximal cliques remained in the normal and OS PPI networks, respectively, and these were used for module analysis. A total of 60 and 19 candidate modules were obtained in the normal and OS PPI networks, respectively. Comparison between the two groups yielded 2 seed module pairs with similar gene composition. Based on the attract method, both modules were differential. Each module contained 4 genes. Of note, the genes CCNB1 and KIF11 appeared in both attractors.

CONCLUSIONS:

We identified two attractors by integrating a module-identification method with the attract approach. Attractor genes such as CCNB1 and KIF11 might play pathophysiological roles in OS development and progression.

Keywords

Osteosarcoma, attract, modules

Abbreviations

OS: osteosarcoma

PPI: protein-protein interaction

SCC: Spearman correlation coefficient

DEGs: differentially expressed genes

RMA: robust multiarray average

WID: weighted interaction density

WIC: weighted inter-connectivity

MCD: module correlation density

FDR: false discovery rate

1. Introduction

Osteosarcoma (OS), characterized by neoplastic cells that directly generate immature osteoid [1], is a common primary malignant bone tumor in children and young adolescents. The five-year survival rate is about 60–65% for OS patients without metastasis [2]. Unfortunately, survival rates have reached a plateau, and further improvements likely depend on novel biology-based therapies. Moreover, the mechanisms underlying the rapid growth and chemo-resistance of OS remain poorly elucidated. Thus, there is an urgent need to reveal the mechanisms of OS growth and progression.

Recently, gene expression biomarkers based on microarray technology have been used to predict the risk of OS. Nevertheless, the significant gene changes identified in one study are seldom replicated in another [3, 4]. More importantly, many of these signatures are not functionally related to OS. To identify robust, functionally relevant disease biomarkers, it is crucial to discover gene biomarkers that are consistent across data sources. A complex disease such as OS gives rise to many differentially expressed genes (DEGs), which together can be used to construct a “disease module” network whose members function as highly synergetic or coordinated groups [5]. The DEGs that contribute directly to the disease phenotype are called “driver” genes. Expression changes in the driver genes lead to a cascade of changes in other genes, initially in their first-degree interaction neighbors; genes whose changes accumulate in cells yet do not contribute to cancer development are termed “passengers” [6, 7]. Of note, it is a challenge to separate the disease modules from the passenger genes for a given disease [8]. Thus, identification of core modules is a crucial step in understanding the molecular mechanisms underlying OS, which can further aid in the effective diagnosis, treatment and prognosis of OS patients.

Gene module analysis attempts to study combined effects by identifying groups of genes that are coordinately expressed [9, 10]. The attract method, proposed by Mar et al. [11], extracts and annotates the gene sets that can distinguish disease from control samples; the resulting differences among the expression profiles of samples are defined as attractors. Another study has also used the attract method to detect “core pathway modules” [12]. Attractors can identify well-defined ensembles of networks whose statistical features match those of real cells and organisms [4]. In the present study, we used a module-identification method together with the attract method to screen attractors within PPI networks in order to investigate the pathogenesis of OS progression. In brief, the gene expression profile E-GEOD-36001 was obtained from the EMBL-EBI database, followed by re-weighting of the PPI networks of the normal and OS groups on the basis of the Spearman correlation coefficient (SCC). Then, seed modules were identified by computing the module correlation density (MCD) between any pair of candidate modules, which were derived from the re-weighted PPI networks using a clique-merging algorithm. Subsequently, the attract method was employed to detect differential modules (named attractors) from the seed modules between the OS and normal groups, and the genes in the differential modules were defined as attractor genes.

2. Materials and methods

2.1 Gene expression profile, quality control and PPIs data

OS-related gene expression data with accession number E-GEOD-36001 were obtained from the ArrayExpress database. E-GEOD-36001, generated on the Illumina Human-6 v2 Expression BeadChip platform, comprised 19 OS samples and 6 normal samples. Before analysis, we preprocessed the E-GEOD-36001 dataset. In detail, background correction was performed using RMA [13], followed by quantile normalization [14]. Then, PM/MM correction was performed via MAS [15]. Finally, expression values were summarized by median polish. Afterwards, probe-level data were mapped to gene symbols using the annotate package [16]. In total, 19,032 genes were obtained for further analysis.
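The normalization step can be illustrated with a minimal Python sketch. It assumes that the normalization cited from reference [14] is quantile normalization, and it uses a tiny invented matrix rather than the E-GEOD-36001 data. The function name `quantile_normalize` is hypothetical, and ties are broken by position, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    # Sort every sample, then average the sorted values rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n)]
    normalized = []
    for col in columns:
        # Indices of this sample's values in ascending order.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples (columns) measured on three probes (invented values).
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(samples)
```

After normalization both toy samples share the same set of values, so their distributions are identical and only the ordering of probes within each sample differs.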

All predicted human PPI information was collected from the STRING 9 database [17]. Proteins without expression values were removed, and repeated IDs for a given gene were reduced to a single one. To minimize the false-positive rate, only protein interactions with a combined score $\geqslant$ 0.8 were kept to construct the PPI networks. In the current study, 51,258 interactions among 8,282 nodes were preserved. By intersecting with the gene expression data, a seed PPI network with 8,238 nodes and 51,218 interactions was obtained.
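The filtering described above can be sketched as follows. This is only an illustration under stated assumptions: interactions are given as (protein A, protein B, score) triples with scores on a 0 to 1 scale, the function name `build_seed_network` is hypothetical, and the placeholder genes G3 and G4 and all scores are invented.

```python
def build_seed_network(interactions, expressed_genes, min_score=0.8):
    """Keep high-confidence edges whose two proteins both have expression data."""
    seen = set()
    edges = []
    for a, b, score in interactions:
        if score < min_score or a == b:
            continue  # drop low-confidence edges and self-loops
        if a not in expressed_genes or b not in expressed_genes:
            continue  # drop proteins without expression values
        key = frozenset((a, b))  # treat A-B and B-A as one undirected edge
        if key not in seen:
            seen.add(key)
            edges.append((a, b, score))
    return edges


# Toy input (invented scores; G3 and G4 are placeholder gene names).
toy_interactions = [
    ("CCNB1", "KIF11", 0.95),
    ("KIF11", "CCNB1", 0.95),  # duplicate in reverse orientation
    ("CCNB1", "G3", 0.50),     # below the 0.8 threshold
    ("KIF11", "G4", 0.90),     # G4 has no expression value
]
toy_expressed = {"CCNB1", "KIF11", "G3"}
seed_edges = build_seed_network(toy_interactions, toy_expressed)
```

Only the single CCNB1-KIF11 edge survives in this toy case, because the other entries are a duplicate, a low-score edge, and an edge to a gene without expression data.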

2.2 Identification of modules

Based on the above information, we used the clique-merging method to detect potential modules. The identification procedure comprised three steps: establishing the conditional-specific PPI networks for the normal and OS groups; detecting candidate modules from the OS-specific and normal-specific PPI networks using the clique-merging algorithm; and identifying seed modules from the candidate modules based on MCD and module pair matching.

2.2.1 Inferring normal and OS PPI networks

The weight values denote the reliabilities of interactions, and interactions with low weights are likely false positives [18]. In our analysis, we used SCC to measure the strength of the association between two interacting proteins. SCC is frequently used to measure the strength of association between two co-expressed variables and ranges from $-1$ to 1 inclusive [19, 20]. Based on SCC, the interactions in the seed PPI network obtained above were re-weighted. The weight of an edge was defined as the absolute value of the SCC of the corresponding interaction. A positive SCC indicates a positive monotonic association between the expression of the two proteins. In our work, only interactions with significant weight values ($P <$ 0.05) were selected to construct the conditional-specific PPI networks of the normal and OS groups.
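The re-weighting step can be sketched in plain Python as below. The sketch computes SCC as the Pearson correlation of rank vectors and uses its absolute value as the edge weight. The helper names and the toy expression values are hypothetical. The significance filter ($P <$ 0.05) is deliberately left out; a library routine such as `scipy.stats.spearmanr`, which returns both the coefficient and a p-value, could supply it.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either profile is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def reweight(edges, expression):
    """Attach |SCC| as the weight of every edge (a, b)."""
    return {(a, b): abs(spearman(expression[a], expression[b]))
            for a, b in edges}


# Toy expression profiles over five samples (invented values).
toy_expression = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "C": [2.0, 1.0, 4.0, 3.0, 5.0],
}
toy_weights = reweight([("A", "B"), ("A", "C")], toy_expression)
```

In the toy data, A and B are perfectly anti-correlated, so the edge still receives the maximum weight of 1 because the absolute value is taken.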

2.2.2 Detecting modules and refining modules

In the present analysis, the conditional-specific modules were detected using modules-identification algorithm in Genelibs (http://www.genelibs.com/gb/) based on clique-merging method [21]. This algorithm included two steps:

Firstly, we discovered all the maximal cliques in the re-weighted PPI networks, and the cliques were sorted on the basis of the weighted interaction density (WID). Secondly, because many maximal cliques might overlap with each other, the highly overlapped maximal cliques were eliminated or merged to reduce the size of the result. Weighted inter-connectivity (WIC) between 2 cliques was calculated to determine whether two overlapping cliques were merged. The maximal cliques were listed in descending order of their WIC values and denoted $C_{1}$, $C_{2}$, $C_{3}$, …, $C_{m}$. In short, for every maximal clique $C_{i}$, we iteratively examined whether there was a clique $C_{j}$ whose overlap ratio with $C_{i}$ was greater than the overlap threshold $t_{0}$ (herein $t_{0} = 0.5$). If such a clique existed, $C_{j}$ was merged into $C_{i}$ to produce a candidate module; otherwise, the maximal clique $C_{j}$ was discarded.
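The two steps can be illustrated with a simplified Python sketch. It is not the Genelibs implementation and rests on several assumptions: the density of a clique is taken as the mean weight over all its node pairs, the overlap ratio is taken as $|C_{i} \cap C_{j}| / |C_{j}|$, merging is decided on overlap alone (the WIC criterion is omitted), and cliques that do not overlap a seed are kept as later seeds. All function names and the toy network are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def weighted_density(clique, weights):
    """Mean edge weight over all node pairs of the clique (assumed definition)."""
    pairs = list(combinations(sorted(clique), 2))
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def merge_cliques(cliques, weights, min_size=4, t0=0.5):
    """Drop small cliques, then merge cliques that overlap a denser seed."""
    queue = sorted((c for c in cliques if len(c) >= min_size),
                   key=lambda c: weighted_density(c, weights), reverse=True)
    modules = []
    while queue:
        seed, rest = queue[0], []
        merged = set(seed)
        for other in queue[1:]:
            if len(seed & other) / len(other) > t0:
                merged |= other      # highly overlapped: merge into the seed
            else:
                rest.append(other)   # kept as a possible later seed
        modules.append(frozenset(merged))
        queue = rest
    return modules


# Toy weighted network built from three 4-node groups (invented weights).
weights = {}
for group, w in (("abcd", 0.9), ("bcde", 0.6), ("wxyz", 0.7)):
    for u, v in combinations(group, 2):
        weights.setdefault(frozenset((u, v)), w)

adj = {}
for edge in weights:
    u, v = tuple(edge)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

modules = merge_cliques(maximal_cliques(adj), weights)
```

In the toy network the cliques {a, b, c, d} and {b, c, d, e} share three of four nodes, so they are merged into one candidate module, while {w, x, y, z} stays separate.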

Table 1
Distribution of weight values of interactions in normal and osteosarcoma (OS) groups

Range of weight values	Number of interactions (normal group)	Number of interactions (OS group)
0.9–1	5032	9
0.8–0.9	4616	93
0.7–0.8	0	408
0.6–0.7	0	1160
0.5–0.6	0	2471
0.4–0.5	0	1678
Sum	9648	5819
Mean weight value	0.904	0.563

Figure 1.

The distribution of weighted interaction density of the maximal cliques in osteosarcoma (OS) and normal groups.

Table 2

Properties of normal and OS modules

Module set	Number of candidate modules	Average module size	Correlation (max)	Correlation (min)	Correlation (avg)
Normal	60	31.40	0.473285	0.424693	0.464375
OS	19	27.79	0.357087	0.259365	0.317541

Note: Max, maximum value; min, minimum value; avg, average value.

2.2.3 Comparison of modules between OS and normal conditions

Let $H_{O}$ and $H_{N}$ denote the PPI networks of the OS and normal samples, respectively. The module sets $O = \{O_{1}, O_{2}, \ldots, O_{k}\}$ and $N = \{N_{1}, N_{2}, \ldots, N_{n}\}$ were identified from $H_{O}$ and $H_{N}$, respectively. The MCD of the candidate modules in the two conditions was calculated. After that, Jaccard similarity was used to detect the set of module pairs having either similar or identical gene composition (seed modules). In our analysis, the Jaccard score threshold was set to 0.7.
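The module-pair matching can be sketched as follows. The sketch assumes that a pair is kept when its Jaccard score is at least 0.7 (the text does not say whether the comparison is strict), the function names are hypothetical, and the placeholder genes G1 to G10 are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty gene sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def match_modules(normal_modules, os_modules, threshold=0.7):
    """Return (normal index, OS index, score) for every pair above the threshold."""
    pairs = []
    for i, n_mod in enumerate(normal_modules):
        for j, o_mod in enumerate(os_modules):
            score = jaccard(n_mod, o_mod)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


# Toy modules (G1..G10 are placeholder gene names).
normal_modules = [{"CCNB1", "KIF11", "G1", "G2"}, {"G5", "G6", "G7", "G8"}]
os_modules = [{"CCNB1", "KIF11", "G1", "G2"}, {"G5", "G6", "G9", "G10"}]
seed_pairs = match_modules(normal_modules, os_modules)
```

The first toy pair has identical gene composition and is kept; the second shares only 2 of 6 distinct genes (score about 0.33) and is rejected.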

2.3 Attractor analysis within modules

On the basis of the attractor theory developed by Mar et al. [11], the attract method was used to detect differentially expressed modules related to OS from the seed modules identified above. To test the module data, GSEA-ANOVA was used as the gene set enrichment method, which differs from other methods in that it handles multiple classes [11]. The resulting differences among the expression profiles of samples were defined as attractors [11]. Using an ANOVA model, Fisher's F-test was performed for each gene in the modules to examine its expression level across groups. After that, a t-test was applied to the F-statistic values. Next, false discovery rate (FDR) correction was applied to the P values using the Benjamini-Hochberg method [22]. Modules were ranked according to the significance of the difference, and modules with FDR $<$ 0.05 were considered attractors.
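Two building blocks of this procedure, the per-gene one-way ANOVA F-statistic and the Benjamini-Hochberg adjustment, can be sketched in plain Python. This is not the attract package itself (which is distributed through Bioconductor for R) and the t-test on the F-statistics is not reproduced here. The function names, expression values and p-values below are invented for illustration.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for one gene measured in several groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (between / (k - 1)) / (within / (n - k))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy gene: three normal and three OS expression values (invented).
f_stat = anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Toy module-level p-values (invented), one per seed module.
module_p = [0.01, 0.04, 0.03, 0.5]
module_fdr = benjamini_hochberg(module_p)
attractors = [i for i, q in enumerate(module_fdr) if q < 0.05]
```

With these toy p-values only the first module passes the FDR $<$ 0.05 cut-off, even though three raw p-values are below 0.05, which shows why the correction matters when several modules are tested.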

Figure 2.

Venn diagram of candidate modules in OS and normal samples.

Figure 3.

Two attractors, each involving 4 nodes and 6 interactions. Circular nodes represent proteins and grey lines represent interactions. Nodes CCNB1 and KIF11 appeared in both attractors.

3. Results

3.1 Disruptions analysis in PPI networks

After re-weighting the original PPI networks using SCC, we obtained the conditional-specific PPI networks of the normal and OS groups. The number of interactions differed between the two groups: 9648 interactions were retained in the normal re-weighted PPI network and 5819 in the OS re-weighted PPI network. The mean weight values of the normal and OS networks were 0.904 and 0.563, respectively. The distribution of weight values among interactions in the two groups is listed in Table 1. The weight values of interactions in the normal PPI network ranged from 0.8 to 1, whereas those in the OS PPI network ranged from 0.4 to 1. Moreover, in the weight range 0.8–1, the normal PPI network contained far more interactions than the OS network (9648 versus 102), whereas in the range 0.4–0.6 it contained far fewer (0 versus 4149).

3.2 Detection of modules

Using the clique-merging method, a total of 6,921 and 4,808 maximal cliques were detected in the PPI networks of the normal and OS groups, respectively. Figure 1 shows the difference in the WID distribution of the maximal cliques between the normal and OS groups. The cliques in the normal group were mainly distributed in the WID range of 0.4–0.5, whereas the OS group had more cliques when WID ranged from 0.3 to 0.45. Then, the cliques with fewer than 4 nodes were discarded, leaving 1,884 and 528 maximal cliques in the normal and OS PPI networks, respectively. We used these maximal cliques to conduct module analysis. As shown in Fig. 2, 60 and 19 candidate modules were identified in the normal and OS PPI networks, respectively. Next, we performed a comparative analysis of the normal and OS modules to understand the disruptions at the module level. The mean module size of the normal group (31.40) was slightly greater than that of the OS group (27.79). Similarly, the maximum, minimum and average WID were higher in the normal group than in the OS group. Detailed information is shown in Table 2. Then, we calculated the set of matching modules in the normal and OS groups. With a Jaccard score of 0.7, we identified 2 module pairs with similar gene composition between the two groups (Fig. 2).

3.3 Attractor analysis

Using the module pairs described above, we applied the attract method to extract differential modules, and the differential modules obtained were defined as attractors. Based on the cut-off criterion of FDR $<$ 0.05, two attractors (Module 1 and Module 2) were identified, as displayed in Fig. 3. Both attractors contained 4 nodes. Of note, the genes CCNB1 and KIF11 appeared in both attractors.

4. Discussion

OS is the most frequent type of bone malignancy and has a poor prognosis. About 15–25% of OS patients present with metastasis at diagnosis, and approximately 45% of patients develop distant metastasis, which is the main cause of death in OS [23]. Hence, there is an urgent need to identify molecular targets to prevent OS progression and thereby improve the prognosis of OS patients. In the present study, to explore the pathogenesis of OS, we analyzed the gene expression data E-GEOD-36001 to identify significant genes that might be involved in OS progression by integrating a module-identification method with the attract method. A total of 2 attractors were detected, and CCNB1 and KIF11 appeared in both.

Cancer is characterized by unscheduled cell proliferation induced by dysregulation of the cell cycle [24]. Our results point to an important role for loss of cell-cycle control in the pathogenesis of OS, and several candidate genes were identified, for example CCNB1. CCNB1, one of the principal mitotic cyclins [25], is believed to be essential for the G2-M transition of the cell cycle [26]. Regulating the G2-M transition might therefore be a viable target for controlling the proliferation of cancer cells [27]. More importantly, CCNB1 is frequently over-expressed in human cancers and is associated with tumor aggressiveness and poor clinical outcome [28, 29]. In addition, Wang et al. [30] have indicated that CCNB1 plays important roles in the progression of OS by regulating the cell cycle. Hence, our results further highlight the potential use of CCNB1 as a biomarker in the diagnosis and treatment of OS through its control of cell cycle progression.

KIF11 is a plus-end-directed kinesin needed for the separation of duplicated centrosomes and for spindle formation in metaphase [31, 32]. Centrosome amplification may result in the formation of aberrant mitotic spindles with multiple spindle poles, leading to abnormal cell divisions and aneuploidy. Cancer cells commonly exhibit genomic instability and chromosomal instability [33]. KIF11 inhibitors are known to cause collapse of the bipolar spindle with consequent formation of a monopolar spindle, which blocks the cell cycle [34]. Moreover, KIF11 mRNA expression has been reported to be elevated in many tumor samples derived from breast, colon, lung, ovary, rectum, glioblastoma, and uterus [35, 36]. Thus, we infer that KIF11 might play important roles in the progression of OS, partially through the regulation of genomic and chromosomal instability.

In conclusion, the present study performed a comprehensive module analysis and obtained two attractors. Our results provide evidence that candidate genes such as CCNB1 and KIF11 might play important roles in the pathogenesis of OS. We believe that these findings can provide theoretical guidance for future clinical work, and that the use of specific genes in OS may offer new insights into therapeutic and preventive methods. However, our results were obtained using bioinformatic methods and lack experimental verification in vivo or in vitro; future studies will have to validate these preliminary findings.

References

1. Picci, Osteosarcoma (Osteogenic sarcoma), Orphanet Journal of Rare Diseases 2 (2007), 1–4.

2. Posthumadeboer, Witlox M.A., Kaspers G.J.L. and Royen B.J.V., Molecular alterations as target for therapy in metastatic osteosarcoma: a review of literature, Clinical & Experimental Metastasis 28 (2011), 493–503.

3. Reis A.H.O., Vargas F.R. and Lemos, More epigenetic hits than meets the eye: microRNAs and genes associated with the tumorigenesis of retinoblastoma, Frontiers in Genetics 3 (2012), 287–290.

4. Ganguly and Shields C.L., Differential gene expression profile of retinoblastoma compared to normal retina, Molecular Vision 16 (2010), 1292–1303.

5. Ghiassian S.D., Network Medicine: A Network-based Approach to Human Diseases, Dissertations & Theses – Gradworks (2015).

6. Greenman, Stephens, Smith, Dalgliesh G.L., Hunter, Bignell, Davies, Teague, Butler, Stevens et al., Patterns of somatic mutation in cancer genomes, Nature 446 (2007), 153–158.

7. Beyer, Bandyopadhyay and Ideker, Integrating physical and genetic maps: from genomes to interaction networks, Nature Reviews Genetics 8 (2007), 699–710.

8. Lenferink A.E.G., Deng, Collins, Cui, Purisima E.O., O’Connor-McCourt M.D. and Wang, Identification of high-quality cancer prognostic markers and metastasis network modules, Nature Communications 1 (2010), 1–8.

9. Stuart J.M., Segal, Koller and Kim S.K., A gene-coexpression network for global discovery of conserved genetic modules, Science 302 (2003), 249–255.

10. Ben-David and Shifman, Networks of neuronal genes affected by common and rare variants in autism spectrum disorders, PLoS Genetics 8 (2012), e1002556.

11. Mar J.C., Matigian N.A., Quackenbush and Wells C.A., attract: A Method for Identifying Core Pathways That Define Cellular Phenotypes, PLoS ONE 6 (2011), e25445.

12. Mar J.C., Matigian N.A., Mackay-Sim, Mellick G.D., Sue C.M., Silburn P.A., McGrath J.J., Quackenbush and Wells C.A., Variance of Gene Expression Identifies Altered Network Constraints in Neurological Disease, PLoS Genetics 7 (2011), e1002207.

13. Irizarry R.A., Hobbs, Collin, Beazer-Barclay Y.D., Antonellis K.J., Scherf and Speed T.P., Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics 4 (2003), 249–264.

14. Bolstad B.M., Irizarry R.A., Åstrand and Speed T.P., A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics 19 (2003), 185–193.

15. Pepper S.D., Saunders E.K., Edwards L.E., Wilson C.L. and Miller C.J., The utility of MAS5 expression summary and detection call algorithms, BMC Bioinformatics 8 (2007), 1.

16. Zhu L.J., Gazin, Lawson N.D., Pagès, Lin S.M., Lapointe D.S. and Green M.R., ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data, BMC Bioinformatics 11 (2010), 1–10.

17. Szklarczyk, Franceschini, Kuhn, Simonovic, Roth, Minguez, Doerks, Stark, Muller and Bork, The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored, Nucleic Acids Research 39 (2011), D561–D568.

18. Liu, Wong and Chua H.N., Complex discovery from weighted PPI networks, Bioinformatics 25 (2009), 1891–1897.

19. Hauke and Kossowski, Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data, Quaestiones Geographicae 30 (2011), 87–93.

20. Mukaka M.M., Statistics corner: A guide to appropriate use of correlation coefficient in medical research, Malawi Medical Journal 24 (2012), 69–71.

21. Srihari and Leong H.W., A survey of computational methods for protein complex prediction from protein interaction networks, Journal of Bioinformatics and Computational Biology 11 (2013), 1230002.

22. Benjamini and Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society. Series B (Methodological) (1995), 289–300.

23. Buddingh E.P., Anninga J.K., Versteegh M.I., Taminiau A.H., Egeler R.M., van Rijswijk C.S., Hogendoorn P.C., Lankester A.C. and Gelderblom, Prognostic factors in pulmonary metastasized high-grade osteosarcoma, Pediatric Blood & Cancer 54 (2010), 216–221.

24. Urrego, Tomczak A.P., Zahed, Stuhmer and Pardo L.A., Potassium channels in cell cycle and cell proliferation, Philos Trans R Soc Lond B Biol Sci 369 (2014), 20130094.

25. Murphy, Stinnakre M.G., Senamaud-Beaufort, Winston N.J., Sweeney, Kubelka, Carrington, Bréchot and Sobczak-Thépot, Delayed early embryonic lethality following disruption of the murine cyclin A2 gene, Nature Genetics 15 (1997), 83–86.

26. Krek and Nigg E.A., Differential phosphorylation of vertebrate p34cdc2 kinase at the G1/S and G2/M transitions of the cell cycle: identification of major phosphorylation sites, EMBO Journal 10 (1991), 305–316.

27. Tyagi A.K., Singh R.P., Agarwal, Chan D.C. and Agarwal, Silibinin strongly synergizes human prostate carcinoma DU145 cells to doxorubicin-induced growth inhibition, G2-M arrest, and apoptosis, Clinical Cancer Research 8 (2002), 3512–3519.

28. Nam H.J. and van Deursen J.M., Cyclin B2 and p53 control proper timing of centrosome separation, Nature Cell Biology 16 (2014), 538–549.

29. Suzuki, Urano, Miki, Moriya, Akahira J.I., Ishida, Horie, Inoue and Sasano, Nuclear cyclin B1 in human breast carcinoma as a potent prognostic factor, Cancer Science 98 (2007), 644–651.

30. Wang D.W., S.Y., Cao, Yang, Liu X.Q., Yao G.J. and Bi Z.G., Identification of CD20, ECM, and ITGA as Biomarkers for Osteosarcoma by Integrating Transcriptome Analysis, Medical Science Monitor 22 (2016), 2075–2085.

31. Wojcik E.J., Buckley R.S., Richard, Liu, Huckaba T.M. and Kim, Kinesin-5: Cross-bridging mechanism to targeted clinical therapy, Gene 513 (2013), 133–149.

32. Brier, Lemaire, Debonis, Forest and Kozielski, Identification of the protein binding region of S-trityl-L-cysteine, a new potent inhibitor of the mitotic kinesin Eg5, Biochemistry 43 (2004), 13072–13082.

33. Fukasawa, Oncogenes and tumour suppressors take on centrosomes, Nature Reviews Cancer 7 (2007), 911–924.

34. Coleman P.J. and Fraley M.E., Inhibitors of the mitotic kinesin spindle protein, Expert Opinion on Therapeutic Patents 14 (2005), 1659–1667.

35. Koller, Propp, Zhang, Zhao, Xiao, Chang, Hirsch S.A., Shepard P.J., Koo and Murphy, Use of a chemically modified antisense oligonucleotide library to identify and validate Eg5 (kinesin-like 1) as a target for antineoplastic drug development, Cancer Research 66 (2006), 2059–2066.

36. Valensin, Ghiron, Lamanna, Kremer, Rossi, Ferruzzi, Nievo and Bakker, KIF11 inhibition for glioblastoma treatment: reason to hope or a struggle with the brain? BMC Cancer 9 (2009), 1–14.