Construction and Clarification of Dynamic Gene Regulatory Network of Cancer Cell Cycle via Microarray Data

Abstract

Background

The cell cycle is an important clue to the mechanisms of cancer cells. cDNA microarray expression profiles of the cancer cell cycle are now available and carry information about the dynamic interactions among cancer cell cycle-related genes. It is therefore appealing to construct a dynamic model of the gene regulatory network of the cancer cell cycle from microarray data, in order to gain more insight into the gene regulatory mechanisms of cancer cells.

Results

Based on a dynamic gene regulatory model and microarray data, we construct the whole dynamic gene regulatory network of the cancer cell cycle. We trace back the upstream regulatory genes of each target gene to infer the regulatory pathways of the gene network by the maximum likelihood estimation method. Finally, based on the dynamic regulatory network, we analyze the regulatory abilities and sensitivities of the regulatory genes to clarify their roles in the mechanism of the cancer cell cycle.

Conclusions

Our study presents a systematic, iterative approach to discern and characterize the transcriptional regulatory network of the HeLa cell cycle from raw expression profiles. The inferred network is also supported by experimental findings reported in the literature. Based on our study and the literature, we can predict and clarify the E2F target genes in G1/S phase, which are crucial for regulating cell cycle progression and tumorigenesis. From the results of the network construction and literature confirmation, we infer that MCM4, MCM5, CDC6, CDC25A, UNG and E2F2 are E2F target genes in the HeLa cell cycle.

Introduction

Loss of cellular regulation gives rise to most cases of cancer. In cells, intricate genetic control systems regulate the balance between cell survival and death in response to growth signals, growth-inhibiting signals, and death signals. When errors in these control systems cause cells to proliferate continuously, a tumor arises (Badawi et al. 2005).

To proliferate, a cancer cell must pass through the cell cycle (Whitfield et al. 2002; Valsesia-Wittmann et al. 2004), an ordered series of macromolecular events that leads to cell division and the production of two daughter cells. The cell cycle is therefore central to the proliferation of cancer cells.

Expression levels of thousands of genes fluctuate throughout the cancer cell cycle (Cho et al. 2001; Ishida et al. 2001; Whitfield et al. 2002). Functional genes show periodic transcription to reflect cell growth, DNA synthesis, spindle pole body duplication and migration through the cell cycle (Cho et al. 1998). These processes and their regulation have been extensively investigated at the molecular level (Stillman, 1996; Nurse, 2000; Shah and Cleveland, 2000; Hinchcliffe and Sluder, 2001; Chen et al. 2004). Systems biology can be described as “integrative biology” with the ultimate goal of being able to predict de novo biological outcomes given the list of the components involved (Liu, 2005). Hence it is the coordinated study by (1) investigating the components of cellular networks and their interactions, (2) applying experimental high-throughput and whole-genome techniques, and (3) integrating computational methods with experimental effort (Klipp et al. 2005). In this situation, characterization of the genome-wide transcriptional program of the cell division cycle in mammalian cells is a critical step toward understanding the basic cell cycle processes and their roles in cancer. Therefore, it is worth investigating how these periodic patterns are regulated in the gene regulatory network of the cancer cell cycle from the systems biology perspective.

Gene expression data for the HeLa cell cycle have been collected (Whitfield et al. 2002) and analyzed with many clustering methods to determine which genes are associated with the cell cycle (Whitfield et al. 2002; Cho et al. 2001). In principle, the cell cycle network can be reverse engineered if we take the cDNA expression levels of target genes as the outputs of the gene expression network and the cDNA expression levels of transcription factors as its inputs. To understand how genes are regulated by transcription factors, we must also understand the interactions between target genes and their transcription factors (which transcription factor binds to which promoter). With this information and an interactive dynamic model, we obtain clues for piecing together the gene expression regulatory network of the HeLa cell cycle.

In this study, we devise an interactive dynamic model to characterize the transcriptional regulatory network of the HeLa cell cycle from cDNA expression data of the human tumor cell cycle (Whitfield et al. 2002). Based on our dynamic regulatory network, we not only predict the upstream regulators but also characterize their significance by quantifying their regulatory abilities through the corresponding biochemical kinetic parameters. First, we construct a discrete-time dynamic model and calculate its kinetic parameters, which serve as regulatory abilities, from the expression data (Whitfield et al. 2002) using a system identification method (Johansson, 1993). Second, based on the interactive dynamic model, we detect the transcriptional regulatory function of each target gene by the maximum likelihood parameter estimation method. Third, we trace back the group of upstream genes that act as transcriptional regulators of the target genes in the HeLa cell cycle by relating the expression profiles of candidate regulators to the detected transcriptional regulation of each target gene. The pathway kinetic parameters of the transcriptional regulatory network are also estimated from the cDNA expression profiles of the target genes and their upstream regulatory genes. Finally, these upstream regulatory genes are themselves treated as target genes, and their upstream regulators are found in the same way. Iterating this procedure constructs the whole gene regulatory network of the HeLa cell cycle.

We applied our method to a publicly available HeLa cell cycle microarray data set (Whitfield et al. 2002) to identify the transcriptional regulators of the cell cycle and to characterize their regulatory abilities on specific target genes. Through quantitative system analysis of the transcriptional regulatory network of HeLa cell cycle genes, several transcription factors were identified and their regulatory abilities were determined. Further, some genes suspected of being regulators of the HeLa cell cycle are predicted here to act synergistically in harmonizing gene expression. Our algorithm provides a novel way to gain insight into the gene regulatory network of HeLa cells from gene expression data, using a system identification technique and a discrete-time dynamic model. Furthermore, by combining the constructed HeLa cell cycle dynamic network with published experimental findings and research on E2F binding sites (Ren et al. 2002), we not only confirm the reliability of the dynamic network but also find the E2F target genes in HeLa cell cycle progression. Finally, from the results of this study, we infer that E2F directly regulates MCM4, MCM5, CDC6, CDC25A, UNG and E2F2 in HeLa cell cycle progression.

Unlike statistical clustering methods, our approach provides an interactive dynamic model that deciphers the complex signal transduction pathways regulating gene expression in the HeLa cell cycle and also predicts potential regulators that have not yet been identified. We can also quantify the regulatory abilities of the transcription factors through their kinetic activities on their target genes in the HeLa cell cycle regulatory network. This allows us to construct the cell cycle regulatory network of HeLa cells quantitatively and to discuss the sensitivity of the gene regulatory network to its regulatory genes from a system analysis perspective.

System Model and Network Identification

The construction of the transcriptional regulatory network of Hela cell cycle can be divided into two steps. First, the transcriptional regulation should be extracted from the gene expression data by the dynamic discrete-time model. Second, the upstream regulators will be traced back by correlating the transcriptional regulation of target genes with the expression profiles of possible regulators of Hela cell cycle. In this study, 64 transcription regulators, as shown in Table 1, are used as candidates of upstream regulators to each target gene. Finally, the kinetic parameters of gene transcriptional regulatory network of Hela cell are estimated by the cDNA expression profiles of target genes and their upstream regulatory genes.

Table 1

CloneIDs (Whitfield et al. 2002) of the 64 human cell cycle transcription factors selected as candidate regulators of downstream target genes.

Gene names are given in brackets after the CloneIDs. IEA denotes Inferred from Electronic Annotation in NCBI.
IMAGE:173309 (BCLAF1)	IMAGE:564803 (FOXM1) IEA	IMAGE:590774 (MAPK13) Knebel et al.	IMAGE:202704 (SCML1) IEA
IMAGE:241474 (BRCA1) IEA	IMAGE:149809 (GATA2) IEA	IMAGE:1536451 (MHC2TA)	IMAGE:782622 (SP1)
IMAGE:815287 (BRD8) Monden et al.	IMAGE:135688 (GATA2) IEA	IMAGE:809731 (MNT) Meroni et al.	IMAGE:80318 (SP1)
IMAGE:825210 (C14orf106) IEA	IMAGE:780958 (GTF3C4)	IMAGE:197520 (NCOA3) IEA	IMAGE:840636 (SRF)
IMAGE:726588 (C14orf106) IEA	IMAGE:291827 (GTF3C4)	IMAGE:884438 (NFE2L2) IEA	IMAGE:545503 (STAT1)
IMAGE:915416 (CDK7) Shiekhattar et al.	IMAGE:344049 (HCFC1) Wysocka et al.	IMAGE:1455463 (NFIC)	IMAGE:134120 (STAT5B)
IMAGE:130242 (CDK7) Shiekhattar et al.	IMAGE:897806 (HIF1A)	IMAGE:265874 (NFIC)	IMAGE:712840 (STAT5B)
IMAGE:268652 (CDKN1A) IEA	IMAGE:878184 (HMG20B) IEA	IMAGE:271198 (NR3C1) Takahashi et al.	IMAGE:132857 (STAT5B)
IMAGE:240367 (CTCF) Filippova et al.	IMAGE:1842250 (HMGB2) Shirakawa et al.	IMAGE:245517 (NR5A2) IEA	IMAGE:272192 (TCERG1)
IMAGE:490728 (DMTF1)	IMAGE:363103 (HMGB2) Shirakawa et al.	IMAGE:43229 (PCNA) IEA	IMAGE:137387 (TFAP2A)
IMAGE:487797 (DR1)	IMAGE:242952 (ILF2) Reichman et al.	IMAGE:789182 (PCNA) IEA	IMAGE:868630 (TGFB1I4) IEA
IMAGE:566760 (DR1)	IMAGE:838829 (JARID1B) IEA	IMAGE:30114 (PHTF2)	IMAGE:366414 (UHRF1) IEA
IMAGE:884462 (DSCR1) Fuentes et al.	IMAGE:345056 (KIAA1404) IEA	IMAGE:1947972 (PKNOX1) Chen et al.	IMAGE:1550739 (UHRF1) IEA
IMAGE:236142 (E2F1)	IMAGE:510381 (KLF6) Ratziu et al.	IMAGE:2018976 (PTTG1) Dominguez et al.	IMAGE:246869 (ZNF207) Pahl et al.
IMAGE:768260 (E2F1)	IMAGE:302549 (KLF9) Imataka et al.	IMAGE:781089 (PTTG1) Dominguez et al.	IMAGE:296429 (ZNF24) IEA
IMAGE:293331 (E2F2)	IMAGE:35147 (LASS6) Banerjee-Basu et al.	IMAGE:845502 (RBPSUH) Hsieh et al.	IMAGE:280750 (ZNF281) Law et al.

Dynamic signaling regulatory model

A second-order difference equation is used to describe the dynamic system that arises from the causal action of the gene regulatory function. Let X_i(k) denote the expression profile of the i-th gene at time point k. The following second-order difference equation is proposed to model the cDNA expression level of the i-th gene,

\begin{array}{l} X_{i} (k) + a_{i} X_{i} (k - 1) + b_{i} X_{i} (k - 2) \\ = G_{i} (k) + ε_{i} (k) \end{array}

(1)

where G_i(k) is the upstream transcriptional regulatory function through which regulatory genes influence the expression profile X_i(k) of the i-th gene, a_i and b_i are parameters that characterize inherent dynamic properties of the gene such as degradation and oscillation, and ε_i(k) is the random noise of the microarray data or the residual of the model. Second-order difference equations are widely used in physics and engineering to model discrete-time dynamic systems because they efficiently capture damping and resonance. The reason is that the roots of the characteristic equation of a second-order equation may be a real double root, two distinct real roots or a pair of complex conjugate roots, so that, depending on the coefficients, the equation can describe undamped, overdamped, critically damped, underdamped or oscillatory behavior (Kreyszig, 1993). These characteristics cannot easily be described by a first-order equation. The second-order stochastic equation is therefore employed to characterize the biochemical processing of gene expression.
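The link between the coefficients and the qualitative behavior can be checked numerically. The following is a minimal sketch, not the implementation used in this study: the function names and the coefficient values in the usage note are assumptions chosen for illustration. It computes the roots of the characteristic equation z² + a_i z + b_i = 0 associated with Equation (1) and iterates the equation for a given input G_i(k).

```python
import cmath


def characteristic_roots(a, b):
    """Roots of z**2 + a*z + b = 0, the characteristic equation of
    X(k) + a*X(k-1) + b*X(k-2) = G(k)."""
    disc = cmath.sqrt(a * a - 4.0 * b)
    return (-a + disc) / 2.0, (-a - disc) / 2.0


def classify(a, b, tol=1e-12):
    """Name the root pattern from the sign of the discriminant a**2 - 4*b."""
    disc = a * a - 4.0 * b
    if disc > tol:
        return "two distinct real roots"
    if disc < -tol:
        return "complex conjugate roots (oscillatory)"
    return "real double root"


def simulate(a, b, g, x0, x1):
    """Iterate X(k) = -a*X(k-1) - b*X(k-2) + G(k) for k = 2, ..., len(g) - 1,
    starting from the two initial values x0 and x1 (0-based indexing)."""
    x = [x0, x1]
    for k in range(2, len(g)):
        x.append(-a * x[k - 1] - b * x[k - 2] + g[k])
    return x
```

For instance, the hypothetical values a = -1.0 and b = 0.5 give complex conjugate roots of modulus √0.5 < 1, that is, a damped oscillation, whereas a = -1.5 and b = 0.5 give the two real roots 1.0 and 0.5.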

The transcriptional regulatory function G_i(k) controls the synthesis rate of cDNA, and the clues to the upstream regulatory pathway are contained in G_i(k). We therefore focus on how to detect the upstream regulatory function G_i(k) from the expression data X_i(k) and the dynamic model in Equation (1). In general, it is not easy to extract G_i(k) directly. To extract the input regulatory function, we apply the Fourier decomposition method and express G_i(k) as a sum of harmonic sinusoidal functions. This reduces the extraction problem to a simple parameter estimation problem, in which G_i(k) is approximated by the following Fourier series.

G_{i} (k) \approx \sum_{n = 0}^{N} [α_{n} \cos (n k) + β_{n} \sin (n k)]

(2)

We then need to estimate the parameters of the Fourier series, α_n and β_n, which are the magnitudes of the harmonic sinusoidal functions cos(nk) and sin(nk) for n = 0, …, N. From an engineering standpoint, the Fourier series is a good tool for synthesizing finite-energy functions from harmonic functions.
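As a small illustration of Equation (2), the sketch below evaluates the truncated Fourier series for given coefficients. The function name and the coefficient values in the usage note are hypothetical and serve only to show how the harmonics combine.

```python
import math


def fourier_regulation(alpha, beta, k):
    """Evaluate sum_{n=0}^{N} alpha[n]*cos(n*k) + beta[n]*sin(n*k),
    where N = len(alpha) - 1 and k is the time point."""
    return sum(a * math.cos(n * k) + b * math.sin(n * k)
               for n, (a, b) in enumerate(zip(alpha, beta)))
```

For example, `fourier_regulation([1.0, 0.5], [0.0, 0.25], 0)` evaluates to 1.5. Note that the n = 0 sine term sin(0·k) is identically zero, so β₀ has no effect on G_i(k) and cannot be determined from data; only α₀ contributes the constant level.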

Extraction of the transcriptional regulatory function G_i(k)

Since G_i(k) has been decomposed, we combine equations (1) and (2) to get the following dynamic model equation for the expression profile of the i-th gene,

\begin{matrix} X_{i} (k) = - a_{i} X_{i} (k - 1) - b_{i} X_{i} (k - 2) \\ + \sum_{n = 0}^{N} [α_{n} \cos (n k) + β_{n} \sin (n k)] + ε_{i} (k) \end{matrix}

(3)

In the above dynamic model equation, the parameters a_i, b_i, α_n, and β_n are estimated from the time profile of the expression data of the i-th gene in linear scale, i.e. they are chosen so that the simulated output X_i(k) of the dynamic model in Equation (3) matches the expression profile of the i-th gene. The maximum likelihood estimation method described in Methods is used to estimate a_i, b_i, α_n, and β_n in Equation (3) from the expression profile X_i(k).

After the parameters α_n and β_n of the regulatory function G_i(k) have been estimated as described in Methods, the detected regulation Ĝ_i(k) is given by

{\hat{G}}_{i} (k) = \sum_{n = 0}^{N} [{\hat{α}}_{n} \cos (n k) + {\hat{β}}_{n} \sin (n k)]

(4)

where ${\hat{α}}_{n}$ and ${\hat{β}}_{n}$ are the estimates of α_n and β_n, respectively.

The input transcriptional regulatory function G_i(k) of a target gene in the HeLa cell cycle is usually related to the binding of transcription factors or to interactions with upstream regulators. In the next step, we trace back the corresponding regulatory genes from the extracted regulatory function Ĝ_i(k) of the target gene.

Iterative algorithm for constructing gene regulatory network

In biology, specific biochemical reactions usually depend on the concentrations of specific products. We therefore use the following sigmoid function to describe the binding and nonbinding of transcription factors to motif binding sites (Chen et al. 2004; Klipp et al. 2005)

{\tilde{X}}_{j} (k) = \frac{1}{1 + e^{- γ (X_{j} (k) - M_{j})}}

(5)

where γ is the transition rate and M_j is the mean expression of the j-th regulatory gene's profile X_j(k).
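The transformation in Equation (5) can be sketched as follows. The function name and the default value γ = 1.0 are assumptions made for this example; the text above does not specify the value of γ.

```python
import math


def sigmoid_profile(x, gamma=1.0):
    """Map an expression profile X_j(k) into (0, 1) using
    1 / (1 + exp(-gamma * (X_j(k) - M_j))), with M_j the profile mean."""
    mean = sum(x) / len(x)
    return [1.0 / (1.0 + math.exp(-gamma * (v - mean))) for v in x]
```

A time point whose expression equals the profile mean M_j maps to 0.5, values above the mean map toward 1 (binding) and values below the mean map toward 0 (nonbinding); a larger γ makes the transition sharper.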

We determine the regulatory genes whose regulatory signals X̃_j(k), j = 1, …, R_i, are most strongly correlated with the expression profile X_i(k) of the i-th target gene. We can then reconstruct the gene regulatory network by tracing back the upstream regulators from the extracted regulatory function Ĝ_i(k), to which these R_i regulatory genes contribute, via the following biochemical kinetic relationship,

{\tilde{G}}_{i} (k) = c_{i 0} + \sum_{j \in R_{i}} c_{i j} {\tilde{X}}_{j} (k) + e_{i} (k)

(6)

where c_ij is the pathway kinetic parameter from regulatory gene j to target gene i, and R_i denotes the set of upstream regulatory genes found for target gene i. A candidate is selected when the absolute value of the correlation coefficient between its expression profile and that of the target gene exceeds 0.8, a threshold based on the 95% confidence level of the normalized correlation coefficients among the expression profiles of all cell cycle-related genes in Whitfield et al. (2002). The constant c_i0 is the basal level, representing regulation other than that by upstream regulatory genes (for example, post-transcriptional regulation), and e_i(k) is the error or noise of the network model.
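The selection rule for R_i can be sketched as a simple correlation filter. This is an illustrative sketch only: the function names are hypothetical, and the candidate names used in the usage note are placeholders rather than genes from Table 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.
    A constant profile has zero variance and is not handled here."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_regulators(target, candidates, threshold=0.8):
    """Return the names of candidate regulators whose absolute correlation
    with the target profile exceeds the threshold (0.8 in the text)."""
    return [name for name, profile in candidates.items()
            if abs(pearson(target, profile)) > threshold]
```

With a target profile `[1, 2, 3, 4, 5]` and candidates `{"TF_A": [2, 4, 6, 8, 10], "TF_B": [5, 4, 3, 2, 1], "TF_C": [1, -1, 1, -1, 1]}`, the filter keeps `TF_A` (correlation 1) and `TF_B` (correlation -1) and drops `TF_C` (correlation 0). The sign of the correlation is not used for selection; activation or repression is decided later by the sign of the estimated c_ij.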

Using the maximum likelihood algorithm in Methods to estimate the parameters c_i0 and c_ij from Ĝ_i(k) and X̃_j(k), the regulation from the upstream regulators is identified as

{\hat{G}}_{i} (k) = {\hat{c}}_{i 0} + \sum_{j \in R_{i}} {\hat{c}}_{i j} {\tilde{X}}_{j} (k)

(7)
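One way to read the estimation step behind Equation (7), offered as a sketch rather than as the derivation used in this study: if the residual e_i(k) in Equation (6) is assumed to be independent zero-mean Gaussian noise (an assumption mirroring the one made for the model noise in Methods), the maximum likelihood estimate coincides with the least-squares solution. Stack the m time points, let Ψ_i be the matrix whose k-th row is [1, X̃_{j_1}(k), …, X̃_{j_R}(k)] for the regulators j_1, …, j_R in R_i, and let θ_i = [c_{i0}, c_{ij_1}, …, c_{ij_R}]^T (Ψ_i and θ_i are notation introduced only for this illustration). Then

{\hat{θ}}_{i} = (Ψ_{i}^{T} Ψ_{i})^{- 1} Ψ_{i}^{T} {\hat{G}}_{i} , \quad {\hat{G}}_{i} = [{\hat{G}}_{i} (1) , … , {\hat{G}}_{i} (m)]^{T}

which requires Ψ_i^T Ψ_i to be invertible, i.e. at least as many time points as unknown coefficients and no collinear regulator profiles.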

By combining Equation (1) and the above equation, the dynamics of the transcriptional regulatory network of the HeLa cell cycle can be represented by the following identified difference equation,

\begin{matrix} X_{i} (k) = - {\hat{a}}_{i} X_{i} (k - 1) - {\hat{b}}_{i} X_{i} (k - 2) \\ + {\hat{c}}_{i 0} + \sum_{j \in R_{i}} {\hat{c}}_{i j} {\tilde{X}}_{j} (k) \end{matrix}

(8)

where i = 1,2,… for all profile target genes in Hela cell cycle.

Equation (8) contains much information for exploring the regulatory network of each target gene of the HeLa cell cycle. The regulatory genes in the set R_i are the potential upstream regulators of target gene i. The estimated kinetic parameter ĉ_ij characterizes the type and intensity of the influence of the j-th regulatory gene on the i-th target gene: a positive sign indicates activation, a negative sign indicates repression, and the magnitude is defined as the regulatory ability. After the regulatory pathways of the i-th target gene have been constructed by tracing back its upstream regulatory genes, the upstream regulators in R_i are in turn treated as target genes and their own upstream regulatory genes are traced back. Iterating this procedure constructs the whole gene regulatory network of the HeLa cell cycle. The goal of reverse engineering a gene regulatory network is to deduce the possible set of regulators and to identify their regulatory abilities from the available data, from a dynamic system perspective. For this purpose, we devise a novel algorithm based on the dynamic gene expression model that searches for possible upstream regulators and then identifies the relevant regulatory abilities ĉ_ij according to Equation (8).
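The layer-by-layer trace-back described above can be sketched as a breadth-first traversal. This is an assumed organization of the iteration, not the code used in this study: `build_network`, `find_regulators` and `max_layers` are hypothetical names, and `find_regulators` stands in for the whole per-gene procedure (extracting Ĝ_i(k), selecting R_i and estimating ĉ_ij).

```python
from collections import deque


def build_network(targets, find_regulators, max_layers=2):
    """Breadth-first trace-back: the regulators found in one layer
    become the target genes of the next layer.

    find_regulators(gene) must return a list of (regulator, c_ij) pairs.
    Returns a dict mapping each processed gene to that list."""
    edges = {}
    seen = set(targets)
    queue = deque((gene, 1) for gene in targets)
    while queue:
        gene, layer = queue.popleft()
        regulators = find_regulators(gene)
        edges[gene] = regulators
        if layer < max_layers:
            for regulator, _ in regulators:
                if regulator not in seen:   # process each gene only once
                    seen.add(regulator)
                    queue.append((regulator, layer + 1))
    return edges
```

In the returned dictionary, a positive coefficient on an edge is read as activation and a negative one as repression. Tracking the `seen` set keeps feedback loops (a regulator that is itself regulated by one of its targets) from causing endless iteration, and `max_layers=2` corresponds to combining a first-layer and a second-layer network.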

Results

Data processing and analysis

Data were extracted by superimposing a grid over each array using GenePix 3.0 software (Axon Instruments). Spots of poor quality, determined by visual inspection, were removed from further analysis. Data of HeLa cell collected for each array were stored in the Stanford Microarray Database (SMD) and are available from SMD at http://genome-www5.stanford.edu/ (Sherlock et al. 2001; Whitfield et al. 2002).

As target genes, we take 775 cell cycle-related genes, combining the human HeLa cell cycle-regulated genes classified by Whitfield et al. (2002) with the human cell cycle-regulated genes of the traditional classification. From these 775 cell cycle genes we then select 64 transcription factors (Table 1) (Whitfield et al. 2002).

The raw data were transformed from the original log ratio into a linear scale before our approach was applied. Following the dynamic model in Equation (8), the parameters that characterize the dynamic regulatory mechanism were estimated for each target gene in the pathway. Fig. 1 compares the simulation results of the dynamic expression model in Equation (8) with the experimental expression profiles for some important cell cycle-related genes regulated by the E2F family (Bracken et al. 2004), such as CDC25A (Stanelle et al. 2002; Muller and Helin, 2000; Ren et al. 2002), MCM6 (Ren et al. 2002; Polager et al. 2002; Ishida et al. 2001), E2F1 (Bracken, 2004), CDC6 (Stanelle et al. 2002; Ren et al. 2002), E2F2 (Ren et al. 2002; Muller et al. 2001), MCM5 (Ren et al. 2002; Ishida et al. 2001), MCM4 (Ren et al. 2002; Ishida et al. 2001), PCNA (Muller and Helin, 2000; Ren et al. 2002; Polager et al. 2002; Ishida et al. 2001), RFC4 (Ren et al. 2002; Polager et al. 2002), and DHFR (Ishida et al. 2001). The regulatory functions Ĝ_i(k) of these genes, extracted from their expression profiles by the maximum likelihood algorithm in Methods, are shown in Fig. 2. The extracted regulatory functions Ĝ_i(k) in Fig. 2 are then used to estimate the kinetic parameters of the gene regulatory network in Equation (8) by the parameter estimation scheme in Methods. Our iterative algorithm can find the most likely regulatory genes that participate in the expression program of HeLa cell cycle genes.

Figure 1

The second-order dynamic model fitting of E2F target genes.

Figure 2

The extracted upstream regulatory functions from expression profiles (Whitfield et al. 2002) and their fitting by upstream regulatory genes.

Inference of the regulatory pathway

For illustration, the inference strategy is applied to the E2F target genes (Bracken et al. 2004) in the HeLa cell cycle pathways to identify their upstream regulatory genes. E2F transcription factors are well studied owing to their importance in the cell cycle (Muller and Helin, 2000; Nevins, 2001). Their regulatory abilities appear in the upstream regulatory functions Ĝ_i(k) of the dynamic equations in Table 2. The parameters of the regulatory functions Ĝ_i(k) in Table 2 represent the regulatory abilities and sensitivities of the corresponding transcription factors. Notably, E2F1 and E2F2 are found to be active regulators of most E2F target genes listed in Table 2, which agrees well with previous results (Cam and Dynlacht, 2003; Ivey-Hoyle et al. 1993; Lang et al. 2001). The regulatory abilities of the regulators, which imply different degrees of influence, are drawn as red lines for positive regulation (activation) and black lines for negative regulation (inhibition) of each target gene. Then, based on the dynamic regulatory equations in Table 2 (see details in Supplementary Table S1), the pathways of the E2F target genes in the HeLa cell cycle regulatory system are described in Fig. 3. The coefficients of these dynamic regulatory equations represent the kinetic activities of the regulatory genes. A regulatory gene with a large kinetic parameter in the dynamic regulatory equation plays an important role in the HeLa cell cycle, and the expression of the target gene is more sensitive to it.

Table 2

Upstream regulatory TFs and their regulatory function Ĝ_i (k) on E2F target genes in cancer cell cycle. The positive sign implies activations while the negative sign implies inhibitions for each target gene. The magnitudes indicate their regulatory abilities to the downstream target genes.

Supplementary Table S1

A miniature dynamic model network with the identified upstream regulators and their downstream target genes in the pathway of E2F target genes in cancer cell cycle. The coefficients characterize the corresponding regulatory abilities and sensitivities of the transcription regulations. The positive sign implies activations while the negative sign implies inhibitions for each target gene.

Figure 3

The regulatory pathways of E2F target genes in cancer cell cycle based on the dynamic regulatory modeling in Table 2.

Based on the dynamic regulatory modeling, 152 E2F target genes are found and shown in Table 3. Of these 152 E2F target genes, 6 match the E2F target genes found by Elkon et al. (2003) among the 124 E2F target genes edited by Ren et al. (2002), and 17 match the E2F target genes found by Elkon et al. (2003) among the 872 periodic genes edited by Whitfield et al. (2002). Ren et al. (2002) found 124 E2F target genes with E2F-binding promoters. The comparisons of these results are shown in Fig. 9.

Table 3

The 152 E2F target genes in the gene regulatory network based on the dynamic regulatory modeling.

The marked signs are the matched E2F target genes found by Elkon et al. (2003).
BAIAP2	TOP3A	ACYP1	BIVM	ADCK2	OGT	C20orf55
SSR3	MED31	CDC6^*^#	FLJ10154	FANCG	PILRB	MET
SLBP^#	NASP	ADCY6	MGC15716	APEX2	FLJ11021	NR5A2
MNT	CDC25A^*^#	ANKRD10	ABCA7	LOC197336	ASF1B	CDKN2D
LOC221955	UNG^*^#	BARD1	FLJ13912	RBBP8	SFRS5^#	CENPE
LOC90110	MGC3207	E2F2^*	Pfs2	MCM8	C22orf18	USP16
FBXL20	UBQLN2	MCM5^*^#	EGFL5	C20orf111	LOC58486	ARHGAP8
ANKRD25	RAMP	CCRK	NKTR	ABCC5	MAP3K2	PSEN1
KIAA1529	MCM6^*^#	TREX1	KIAA0092	MLF1IP	EZH2	C6
PTOP	DMTF1^#	RAB23^#	LOC200895	FANCA	CENTB5	USP6NL
TRIM26	KIAA0738	FLJ10618	HSPB8	DNAJC6	HIST3H2A	LOC51334
ZMYND19	PNN	OACT1	C14orf130	RAD18	BRCA1	AKAP13
CDCA7	MAP2K6	ORC1L^#	KIAA1586	SP1	FLJ35740	AGPAT3
CASP8AP2^#	GMNN	MCM4^#	HELLS	USP1	HSPC150	ASAM
PLCXD1	LOC283596	MSH2	CLSPN	MYCBP2	ORC3L	RAB3A
ZNF367	CASP2	LCHN	RFC4	TOPBP1^#	DNAJB4	HIST2H2BE
RNP	C9orf42	FLJ20280	DONSON	DHFR	RRM1	HIST2H2AA
USP53	E2F1^#	PCNA	AP4B1	DKFZP434B168^#	RHOBTB3	H2AFY
SKP2	CCNE2^#	MBD4	MGC2610	DLNB14	KIAA0841	H3F3A
PDXP	SCML1	TLOC1	ATAD2	RFC2	NFE2L2	HIST1H3D
FLJ20530	SERPINB3	SDC1	RPA2^#	NSUN3	BBS2
PANK2	UHRF1	PRPS2	FEN1	FKBP6	FANCD2

^* E2F target genes found by Elkon et al. (2003) from 124 E2F target genes edited by Ren et al. (2002).

^# E2F target genes found by Elkon et al. (2003) from 872 periodic genes edited by Whitfield et al. (2002).

After constructing the first-layer network of E2F target genes (Bracken et al. 2004), we take their upstream regulators as target genes and, using the same method, find the upstream regulatory genes of these new targets. In the second-layer network, the regulatory abilities, which imply different degrees of influence, are drawn as pink lines for positive regulation (activation) and blue lines for negative regulation (inhibition). We then combine the first-layer and second-layer networks to form a more complete network of E2F target genes in the HeLa cell cycle, as described in Fig. 4. Iterating this procedure constructs the higher layers and completes the gene regulatory network of the HeLa cell cycle.

Figure 4

The network of E2F target genes in cancer cell cycle based on the dynamic regulatory modeling.

Discussion

Loss of cellular regulation gives rise to most cases of cancer. Transcription factors are crucial for regulating cell cycle progression and may be related to the development of cancer. To understand these gene regulatory processes, we therefore need to unravel the regulatory mechanisms of these transcription factors in the cell cycle. Our study presents a systematic, iterative approach to discern and characterize the transcriptional regulatory network of 775 cell cycle-related genes from the raw expression profiles of HeLa cells (Whitfield et al. 2002). Because the transcriptional regulatory network of 775 cell cycle-related genes is very complicated, two miniature gene regulatory networks, one for the E2F family during G1/S phase (Fig. 4) and one for another family during G2/M phase (Fig. 8), are given to illustrate the regulatory mechanism of HeLa cells.

Our approach also offers the following advantages. First, based on the dynamic regulatory model, a gene regulatory network of cancer cells can be constructed from the upstream regulatory functions extracted from microarray data. Second, the regulatory ability identified for each regulator measures its contribution: a positive sign stands for activation, a negative sign for repression, and the magnitude represents the significance. These advantages should help the analysis keep pace with the rapidly growing body of human microarray data.

BRCA1 is an important cell cycle-related gene that acts as a transcriptional repressor during cell cycle progression (Kennedy et al. 2005). This finding matches the gene regulatory network constructed by the dynamic regulatory model (Fig. 4 and Fig. 5). E2F clearly regulates the expression of a host of factors that function during the G1/S transition and S phase, and even throughout the whole cell cycle (Bracken et al. 2004). E2F is best known for its role in regulating the transcription of genes that positively affect cell cycle progression (Ortega et al. 2002; Sherr and Roberts, 1999; Di et al. 2003; Stott et al. 1998; Trimarchi and Lees, 2002; Hayashi et al. 2006). Our results largely agree with this finding (Fig. 4 and Fig. 5). Hayashi et al. (2006) found that E2F1 activates human MCM8. In our study, we found that E2F1 activates human MCM5 and MCM6 (Fig. 4 and Fig. 5). Elkon et al. (2003) and Ren et al. (2002) found that human MCM5 and MCM6 are E2F target genes (Table 3). Taken together, these studies lead us to infer that human MCM5 and MCM6 may be directly and positively regulated by E2F1 in the HeLa cell cycle.

Accumulating evidence indicates that CDC25A possesses oncogenic properties. Overexpression of CDC25A has been found in many breast, head and neck cancers (Wu et al. 1998). CDC6 plays a critical role in regulating the onset of DNA replication in eukaryotic cells, and CDC6 expression is down-regulated in prostate cancer (Robles et al. 2002). From the results in Table 3, we infer that E2F directly regulates MCM4, MCM5, CDC6, CDC25A, UNG and E2F2 in HeLa cell cycle progression. Fig. 5 shows some E2F target genes and the importance of the E2F transcription factor in HeLa cell cycle progression. In Fig. 7, we also predict the probable transcriptional regulation of some target genes expressed in the G2/M phase of the human cell cycle, as shown in the regulatory network of Fig. 8. Further, we could construct not only the E2F-related regulatory network but also the whole HeLa cell cycle network given genome-wide microarray data and ChIP data. At present, however, more evidence and ChIP experiments in the same cellular system are still needed to confirm our gene regulatory network of the HeLa cell cycle (Bracken et al. 2004).

Figure 5

The miniature cancer cell cycle network.

Based on the dynamic regulatory modeling, the 152 E2F target genes are found and shown in Table 3. These target genes may be regulated by E2F directly or indirectly. In these 152 E2F target genes, 6 genes match the E2F target genes found by Elkon et al. (2003) from 124 E2F target genes with E2F promoter edited by Ren et al. (2002) and 17 genes match the E2F target genes found by Elkon et al. (2003) from 872 periodic genes in Hela cell cycle edited by Whitfield et al. (2002).

Finally, to validate the proposed approach, an independent validation was performed by randomly reshuffling the time order of the microarray experiment while keeping the same choices of target and regulatory genes (Fig. 6). As stated above, BRCA1 acts as a transcriptional repressor during cell cycle progression (Kennedy et al. 2005) (Fig. 5). In the reshuffled results in Fig. 6, BRCA1 becomes a transcriptional activator. The proposed HeLa cell cycle regulatory pathway in Fig. 5 is thus clearly destroyed by reshuffling the experimental data, which supports the reliability of the proposed method.
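The reshuffling step can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the profiles in the usage note are synthetic sinusoids rather than microarray data, and it is assumed that the same random permutation of time points is applied to every gene.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lag1_autocorrelation(x):
    """Correlation between X(k) and X(k-1), a simple measure of temporal order."""
    return pearson(x[:-1], x[1:])


def reshuffle_jointly(profiles, seed=0):
    """Apply one random permutation of the time points to every profile."""
    m = len(next(iter(profiles.values())))
    order = list(range(m))
    random.Random(seed).shuffle(order)
    return {name: [p[k] for k in order] for name, p in profiles.items()}
```

For two synthetic profiles such as `[math.sin(0.5 * k) for k in range(24)]` and `[math.cos(0.5 * k) for k in range(24)]`, joint reshuffling leaves the same-time correlation between the two profiles unchanged, whereas `lag1_autocorrelation` typically drops sharply. The lagged terms X_i(k - 1) and X_i(k - 2) that the dynamic model in Equation (8) relies on are therefore scrambled, which is consistent with the regulatory pathway no longer being recovered after reshuffling.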

Figure 6

The miniature cancer cell cycle network in Fig. 5 is repeated as independent validation by randomly reshuffling the time order of microarray experiment but with the same choices of target and regulatory genes.

Figure 7

The regulatory pathways of target genes expressed in the G2/M phase of the human cell cycle, based on the dynamic regulatory modeling.

Figure 8

The gene regulatory network in the G2/M phase of the cancer cell cycle, based on the dynamic regulatory modeling and the interactions of the regulatory pathways in Fig. 7.

Methods

Maximum likelihood Estimation of a_i, b_i, α_n and β_n

The dynamic model in Equation (3) must match the expression profile at all time points, so it is arranged in a vector difference form. The vector form of the equation is then applied to the m time points of the expression profile.

\begin{array}{l} X_{i m} = - a_{i} X_{i m - 1} - b_{i} X_{i m - 2} + \sum_{n = 0}^{N} [α_{n} \cos (n) + β_{n} \sin (n)] + ε_{i m}, \\ X_{i m} = \left[\begin{matrix} X_{i} (3) \\ X_{i} (4) \\ ⋮ \\ X_{i} (m) \end{matrix}\right], \quad X_{i m - 1} = \left[\begin{matrix} X_{i} (2) \\ X_{i} (3) \\ ⋮ \\ X_{i} (m - 1) \end{matrix}\right], \quad X_{i m - 2} = \left[\begin{matrix} X_{i} (1) \\ X_{i} (2) \\ ⋮ \\ X_{i} (m - 2) \end{matrix}\right] \end{array}

(M.1)

where

\cos (n) = \left[\begin{matrix} \cos (n \cdot 3) \\ \cos (n \cdot 4) \\ ⋮ \\ \cos (n \cdot m) \end{matrix}\right], \quad \sin (n) = \left[\begin{matrix} \sin (n \cdot 3) \\ \sin (n \cdot 4) \\ ⋮ \\ \sin (n \cdot m) \end{matrix}\right], \quad ε_{i m} = \left[\begin{matrix} ε_{i} (3) \\ ε_{i} (4) \\ ⋮ \\ ε_{i} (m) \end{matrix}\right].

Next, we translate equation (M.1) into a matrix form,

Y_{i} = A_{i} Φ_{i} + E_{i}

(M.2)

where Y_i = X_im, Φ_i = [a_i b_i α₀ β₀ … α_N β_N]^T and E_i = ε_im are vectors, and A_i = [-X_im-1 -X_im-2 cos(0) sin(0) … cos(N) sin(N)] is a matrix whose columns are the vectors defined in (M.1).

Then we use maximum likelihood estimation to derive the optimal parameter estimate ${\hat{Φ}}_{i}$ (Johansson, 1993).

We assume that each element of the error vector, ε_i(k), k = 3, …, m, is an independent random variable with a normal distribution with zero mean and variance σ², and we estimate the parameter vector ${\hat{Φ}}_{i}$ by the maximum likelihood method. The likelihood of the m − 2 observations in Y_i is

p (Y_{i} | Φ_{i}, σ^{2}) = \frac{1}{{(2 π σ^{2})}^{(m - 2) / 2}} \exp \left\{ - \frac{{[Y_{i} - A_{i} Φ_{i}]}^{T} [Y_{i} - A_{i} Φ_{i}]}{2 σ^{2}} \right\}

(M.3)

The log-likelihood function for the given m data points is then

\log L (Φ_{i}, σ^{2}) = - \frac{m - 2}{2} \ln [2 π σ^{2}] - \frac{1}{2 σ^{2}} {[Y_{i} - A_{i} Φ_{i}]}^{T} [Y_{i} - A_{i} Φ_{i}]

(M.4)

We expect the log-likelihood function to attain its maximum at $Φ = \hat{Φ}$ and $σ^{2} = {\hat{σ}}^{2}$ . The necessary conditions for the maximum likelihood estimates $\hat{Φ}$ and ${\hat{σ}}^{2}$ are as follows (Johansson, 1993).

\begin{array}{l} {\frac{\partial \log L (Φ, σ^{2})}{\partial Φ} |}_{Φ = \hat{Φ}} = 0 \\ {\frac{\partial \log L (Φ, σ^{2})}{\partial σ^{2}} |}_{σ^{2} = {\hat{σ}}^{2}} = 0 \end{array}

(M.5)
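As a brief sketch of the step from (M.5) to the estimates, assuming the quadratic form in (M.4) is read as the sum of squared residuals over k = 3, …, m, differentiating the log-likelihood gives

\begin{array}{l} \frac{\partial \log L}{\partial Φ_{i}} = \frac{1}{σ^{2}} A_{i}^{T} (Y_{i} - A_{i} Φ_{i}) = 0 \;\Rightarrow\; A_{i}^{T} A_{i} {\hat{Φ}}_{i} = A_{i}^{T} Y_{i} \\ \frac{\partial \log L}{\partial σ^{2}} = - \frac{m - 2}{2 σ^{2}} + \frac{1}{2 σ^{4}} {(Y_{i} - A_{i} {\hat{Φ}}_{i})}^{T} (Y_{i} - A_{i} {\hat{Φ}}_{i}) = 0 \;\Rightarrow\; (m - 2) {\hat{σ}}^{2} = {(Y_{i} - A_{i} {\hat{Φ}}_{i})}^{T} (Y_{i} - A_{i} {\hat{Φ}}_{i}) \end{array}

The first line is the set of normal equations, which can be solved for ${\hat{Φ}}_{i}$ when $A_{i}^{T} A_{i}$ is invertible.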

The estimated parameters $\hat{Φ}$ and ${\hat{σ}}^{2}$ are shown below.

{\hat{Φ}}_{i} = {(A_{i}^{T} A_{i})}^{- 1} A_{i}^{T} Y_{i}

(M.6)

{\hat{σ}}^{2} = \frac{1}{m - 2} {(Y_{i} - A_{i} {\hat{Φ}}_{i})}^{T} (Y_{i} - A_{i} {\hat{Φ}}_{i})

(M.7)
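The estimation in (M.1) to (M.7) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the function names, the synthetic test profile and the small number of harmonics are hypothetical. Because the sin(0) column of A_i is identically zero, the sketch calls a least-squares solver, which returns the minimum-norm solution, rather than forming the inverse in (M.6) explicitly.

```python
import numpy as np


def build_regression(x, n_harmonics):
    """Build Y_i and A_i of (M.2) from one expression profile X_i(1..m)."""
    x = np.asarray(x, dtype=float)
    m = x.size
    k = np.arange(3, m + 1)            # time indices 3, ..., m as in (M.1)
    y = x[2:]                          # X_i(3), ..., X_i(m)
    cols = [-x[1:-1], -x[:-2]]         # -X_i(k-1) and -X_i(k-2)
    for n in range(n_harmonics + 1):   # harmonics n = 0, ..., N
        cols.append(np.cos(n * k))
        cols.append(np.sin(n * k))
    return y, np.column_stack(cols)


def fit_ml(y, A):
    """Return the estimate of Phi_i (M.6) and of sigma^2 (M.7)."""
    phi, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ phi
    sigma2 = resid @ resid / y.size    # y.size equals m - 2
    return phi, sigma2


if __name__ == "__main__":
    # Hypothetical noise-free profile generated by the model itself.
    m, a, b = 48, -1.2, 0.72
    x = np.zeros(m)
    x[0], x[1] = 1.0, 0.5
    for idx in range(2, m):
        k = idx + 1
        x[idx] = -a * x[idx - 1] - b * x[idx - 2] + 0.3 + 0.5 * np.cos(k)
    phi_hat, sigma2_hat = fit_ml(*build_regression(x, n_harmonics=2))
    print(phi_hat[:2], sigma2_hat)
```

The first two entries of the returned vector are the estimates of a_i and b_i; the remaining entries are the Fourier coefficients in the order α₀, β₀, …, α_N, β_N.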

In theory, E_i is just the noise of the gene expression profile from the microarray chips, but it also absorbs the modeling and approximation errors of Equation (2). Accounting for these errors brings the dynamic model closer to the actual situation. The number of harmonics N in the Fourier series is determined by a tradeoff between the computational complexity of the parameter estimation in equation (M.6) and the accuracy of the approximation in Equation (2). For our expression data, we choose N = 16 so that the synthesis of these harmonics gives the best approximation to the expression data.

Parameter Estimation of c_i0 and c_ij

To estimate the pathway kinetic parameters c_ij in Equation (6) from Ĝ_i (k) and $\tilde{X}$ _j(k), using m time points of the upstream regulatory expression profiles, Equation (6) is written in algebraic form as follows,

\begin{array}{l} {\hat{G}}_{i} = B_{i} Ω_{i} + V_{i}, \\ {\hat{G}}_{i} = \left[\begin{matrix} {\hat{G}}_{i} (3) \\ {\hat{G}}_{i} (4) \\ ⋮ \\ {\hat{G}}_{i} (m) \end{matrix}\right], \quad B_{i} = \left[\begin{matrix} 1 & {\tilde{X}}_{1} (3) & \dots & {\tilde{X}}_{R_{i}} (3) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & {\tilde{X}}_{1} (m) & \dots & {\tilde{X}}_{R_{i}} (m) \end{matrix}\right] \end{array}

(M.8)

where

Ω_{i} = \left[\begin{matrix} c_{i 0} \\ c_{i 1} \\ ⋮ \\ c_{i R_{i}} \end{matrix}\right] \quad \mathrm{and} \quad V_{i} = \left[\begin{matrix} e_{i} (3) \\ e_{i} (4) \\ ⋮ \\ e_{i} (m) \end{matrix}\right].

We assume that each element of the error vector, e_i(k), k = 3, …, m, is an independent random variable with a normal distribution with zero mean and variance $σ_{e_{i}}^{2}$ . Then, by a procedure similar to that leading to (M.6) and (M.7), the estimates of Ω_i and $σ_{e_{i}}^{2}$ are shown below.

{\hat{Ω}}_{i} = {(B_{i}^{T} B_{i})}^{- 1} B_{i}^{T} {\hat{G}}_{i}

(M.9)

{\hat{σ}}_{e_{i}}^{2} = \frac{1}{m - 2} {({\hat{G}}_{i} - B_{i} {\hat{Ω}}_{i})}^{T} ({\hat{G}}_{i} - B_{i} {\hat{Ω}}_{i})

(M.10)
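The estimates in (M.9) and (M.10) can be sketched in the same way. The example below is a minimal illustration, assuming the extracted regulatory function and the regulator profiles are already available as arrays over k = 3, …, m; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pathway_weights(g_hat, x_reg):
    """Estimate Omega_i = [c_i0, c_i1, ..., c_iR] and the error variance.

    g_hat : array of shape (m - 2,), the values G_i(3), ..., G_i(m).
    x_reg : array of shape (m - 2, R), one column per upstream regulator.
    """
    g_hat = np.asarray(g_hat, dtype=float)
    x_reg = np.asarray(x_reg, dtype=float)
    # B_i of (M.8): a leading column of ones for c_i0, then the regulators.
    B = np.column_stack([np.ones(g_hat.size), x_reg])
    omega, *_ = np.linalg.lstsq(B, g_hat, rcond=None)
    resid = g_hat - B @ omega
    sigma2 = resid @ resid / g_hat.size   # g_hat.size equals m - 2
    return omega, sigma2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_reg = rng.normal(size=(46, 3))      # three hypothetical regulators
    true_omega = np.array([0.2, 1.5, -0.8, 0.0])
    g_hat = true_omega[0] + x_reg @ true_omega[1:]
    print(fit_pathway_weights(g_hat, x_reg))
```

The first entry of the returned vector is the estimate of c_i0 and the remaining entries are the estimates of c_ij for the R_i upstream regulators, in column order.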

Footnotes

Acknowledgements

We thank the National Science Council, Taiwan, for grant NSC 95-2627-B-007-011.

References

Badawi R.A., Birns, and Watson. 2005. Growth factors and their relationship to neoplastic and paraneoplastic disease. Eur. J. Intern. Med., 16: 83–94.

Bracken A.P., Ciro, and Cocito. 2004. E2F target genes: unraveling the biology. Trends Biochem. Sci., 29: 409–17.

Banerjee-Basu, and Baxevanis A.D. 2001. Molecular evolution of the homeodomain family of transcription factors. Nucleic Acids Res., 29(15): 3258–69.

Cam, and Dynlacht B.D. 2003. Emerging roles for E2F: beyond the G1/S transition and DNA replication. Cancer Cell, 3: 311–6.

Chen H.C., Lee H.C., and Lin T.Y. 2004. Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle. Bioinformatics, 20: 1914–27.

Chen, Rossier, and Nakamura. 1997. Cloning of a novel homeobox-containing gene, PKNOX1, and mapping to human chromosome 21q22.3. Genomics, 41(2): 193–200.

Cho R.J., Campbell M.J., and Winzeler E.A. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell, 2: 65–73.

Cho R.J., Huang, and Campbell M.J. 2001. Transcriptional regulation and function during the human cell cycle. Nat. Genet., 27: 48–54.

Di Stefano, Jensen M.R., and Helin. 2003. E2F7, a novel E2F featuring DP-independent repression of a subset of E2F-regulated genes. EMBO J., 22: 6289–98.

Dominguez, Ramos-Morales, and Romero. 1998. hpttg, a human homologue of rat pttg, is overexpressed in hematopoietic neoplasms. Evidence for a transcriptional activation function of hPTTG. Oncogene, 17(17): 2187–93.

Filippova G.N., Fagerlie, and Klenova E.M. 1996. An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Mol. Cell. Biol., 16(6): 2802–13.

Fuentes J.J., Pritchard M.A., and Planas A.M. 1995. A new human gene from the Down syndrome critical region encodes a proline-rich protein highly expressed in fetal brain and heart. Hum. Mol. Genet., 4(10): 1935–44.

Hayashi, Goto, and Haga. 2006. Comparative genomics on MCM8 orthologous genes reveals the transcriptional regulation by transcription factor E2F. Gene, 367: 126–34.

Hinchcliffe E.H., and Sluder. 2001. “It takes two to tango”: understanding how centrosome duplication is regulated throughout the cell cycle. Genes Dev., 15: 1167–81.

Hsieh J.J., Zhou, Chen, Young D.B., and Hayward S.D. 1999. CIR, a corepressor linking the DNA binding factor CBF1 to the histone deacetylase complex. Proc. Natl. Acad. Sci. U.S.A., 96(1): 23–80.

Imataka, Sogawa, and Yasumoto. 1992. Two regulatory proteins that bind to the basic transcription element (BTE), a GC box sequence in the promoter region of the rat P-4501A1 gene. EMBO J., 11(10): 3663–71.

Ishida, Huang, and Zuzan. 2001. Role for E2F in control of both DNA replication and mitotic functions as revealed from DNA microarray analysis. Mol. Cell. Biol., 21: 4684–99.

Ivey-Hoyle, Conroy, and Huber H.E. 1993. Cloning and characterization of E2F-2, a novel protein with the biochemical properties of transcription factor E2F. Mol. Cell. Biol., 13: 7802–12.

Johansson. 1993. System modeling and identification. Prentice-Hall. p. 113–9.

Kennedy R.D., Gorski J.J., and Quinn J.E. 2005. BRCA1 and c-Myc associate to transcriptionally repress psoriasin, a DNA damage-inducible gene. Cancer Res., 65(22): 10265–72.

Klipp, Herwig, and Kowald. 2005. Systems Biology in Practice: Concepts, Implementation and Application. Wiley-VCH.

Kreyszig. 1993. Advanced Engineering Mathematics. 7th ed. John Wiley, New York.

Knebel, Morrice, and Cohen. 2001. A novel method to identify protein kinase substrates: eEF2 kinase is phosphorylated and inhibited by SAPK4/p38delta. EMBO J., 20(16): 4360–9.

Lang S.E., McMahon S.B., and Cole M.D. 2001. E2F transcriptional activation requires TRRAP and GCN5 cofactors. J. Biol. Chem., 276: 32627–34.

Law D.J., Du, and Law G.L. 1999. ZBP-99 defines a conserved family of transcription factors and regulates ornithine decarboxylase gene expression. Biochem. Biophys. Res. Commun., 262(1): 113–20.

Liu E.T. 2005. Systems Biology, Integrative Biology, Predictive Biology. Cell, 121: 505–6.

Meroni, Reymond, and Alcalay. 1997. Rox, a novel bHLHZip protein expressed in quiescent cells that heterodimerizes with Max, binds a non-canonical E box and acts as a transcriptional repressor. EMBO J., 16(10): 2892–906. Erratum in: EMBO J., 16(19): 6055.

Monden, Wondisford F.E., and Hollenberg A.N. 1997. Isolation and characterization of a novel ligand-dependent thyroid hormone receptor-coactivating protein. J. Biol. Chem., 272(47): 29834–41.

Muller, and Helin. 2000. The E2F transcription factors: key regulators of cell proliferation. Biochim. Biophys. Acta, 1470: M1–12.

Muller, Bracken A.P., and Vernell. 2001. E2Fs regulate the expression of genes involved in differentiation, development, proliferation, and apoptosis. Genes Dev., 15: 267–85.

Nevins J.R. 2001. The Rb/E2F pathway and cancer. Hum. Mol. Genet., 10: 699–703.

Nurse. 2000. A long twentieth century of the cell cycle and beyond. Cell, 100: 71–8.

Ortega, Malumbres, and Barbacid. 2002. Cyclin D-dependent kinases, INK4 inhibitors and cancer. Biochim. Biophys. Acta, 1602: 73–87.

Pahl P.M., Hodges Y.K., and Meltesen. 1998. ZNF207, a ubiquitously expressed zinc finger gene on chromosome 6p21.3. Genomics, 53(3): 410–2.

Polager, Kalma, and Berkovich. 2002. E2Fs up-regulate expression of genes involved in DNA replication, DNA repair and mitosis. Oncogene, 21: 437–446.

Ratziu, Lalazar, and Wong. 1998. Zf9, a Kruppel-like transcription factor up-regulated in vivo during early hepatic fibrosis. Proc. Natl. Acad. Sci. U.S.A., 95(16): 9500–5.

Reichman T.W., Muniz L.C., and Mathews M.B. 2002. The RNA binding protein nuclear factor 90 functions as both a positive and negative regulator of gene expression in mammalian cells. Mol. Cell. Biol., 22(1): 343–56.

Ren, Cam, and Takahashi. 2002. E2F integrates cell cycle progression with DNA repair, replication, and G(2)/M checkpoints. Genes Dev., 16: 245–56.

Robles L.D., Frost A.R., and Davila. 2002. Down-regulation of Cdc6, a cell cycle regulatory gene, in prostate cancer. J. Biol. Chem., 277: 25431–8.

Shah J.V., and Cleveland D.W. 2000. Waiting for anaphase: Mad2 and the spindle assembly checkpoint. Cell, 103: 997–1000.

Sherlock, Hernandez-Boussard, and Kasarskis. 2001. The Stanford Microarray Database. Nucleic Acids Res., 29: 152–5.

Sherr C.J., and Roberts J.M. 1999. CDK inhibitors: positive and negative regulators of G1-phase progression. Genes Dev., 13: 1501–12.

Shiekhattar, Mermelstein, and Fisher R.P. 1995. Cdk-activating kinase complex is a component of human transcription factor TFIIH. Nature, 374(6519): 283–7.

Shirakawa, and Yoshida. 1992. Structure of a gene coding for human HMG2 protein. J. Biol. Chem., 267(10): 6641–5.

Stanelle, Stiewe, and Theseling C.C. 2002. Gene expression changes in response to E2F1 activation. Nucleic Acids Res., 30: 1859–67.

Stillman. 1996. Cell cycle control of DNA replication. Science, 274: 1659–1664.

Stott F.J., Bates, and James M.C. 1998. The alternative product from the human CDKN2A locus, p14(ARF), participates in a regulatory feedback loop with p53 and MDM2. EMBO J., 17: 5001–5014.

Takahashi, Wakui, and Gustafsson J.A. 2000. Functional interaction of the immunosuppressant mizoribine with the 14-3-3 protein. Biochem. Biophys. Res. Commun., 274(1): 87–92.

Trimarchi J.M., and Lees J.A. 2002. Sibling rivalry in the E2F family. Nat. Rev. Mol. Cell. Biol., 3(1): 11–20.

Valsesia-Wittmann, Magdeleine, and Dupasquier. 2004. Oncogenic cooperation between H-Twist and N-Myc overrides failsafe programs in cancer cells. Cancer Cell, 6: 625–30.

Whitfield M.L., Sherlock, and Saldanha A.J. 2002. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell, 13: 1977–2000.

[…], Fan Y.H., and Kemp B.L. 1998. Overexpression of cdc25A and cdc25B is frequent in primary non-small cell lung cancer but is not associated with overexpression of c-myc. Cancer Res., 58: 4082–5.

Wysocka, Myers M.P., and Laherty C.D. 2003. Human Sin3 deacetylase and trithorax-related Set1/Ash2 histone H3–K4 methyl-transferase are tethered together selectively by the cell-proliferation factor HCF-1. Genes Dev., 17(7): 896–911.

Yuh C.H., and Davidson E.H. 1996. Modular Cis-Regulatory Organization of Endo16, a Gut-Specific Gene of the Sea Urchin Embryo. Development, 122: 1069–1082.
