
Abstract

Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady-state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady-state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.

Keywords

gene networks, network reconstruction, time series, VAR, equilibrium

1 Introduction

With continuing advances in gene expression measurement technologies, more and more expression data are becoming available for analysis. One of the important uses for these data has been in inferring gene regulatory networks. These networks identify relationships between genes and give insight into the complex inner workings of the cell.

Even with the increasing amounts of data, constructing gene regulatory networks is difficult in practice. The large number of genes involved, generally in the thousands, makes methods developed for smaller networks computationally infeasible. Time-series genetic data, with the same population of cells observed at multiple times, is useful in inferring the direction of the relationships (Yeung et al., 2011; Young et al., 2014). Knockdown and over-expression data can also be used to infer directionality of relationships (Young et al., 2016). However, most data take the form of perturbation screens or steady-state data at a single timepoint. This makes it harder to identify the direction of the relationship between two genes without prior knowledge.

There have been many methods developed for the analysis of steady-state data. Mutual information and correlation-based methods are commonly used, although the resulting networks tend to lack directionality (Basso et al., 2005; Faith et al., 2007; Margolin et al., 2006; Meyer et al., 2007; Tusher et al., 2001). Regression-based methods have also been applied to steady-state data to infer network structure (Omranian et al., 2016; Singh and Vidyasagar, 2016). Bayesian network methods (Friedman et al., 2010) can result in directed graphs, but the networks generated are, of necessity, acyclic. This means that any cycles, such as those that are found in real biological systems, will not be captured, so the resulting networks will be unrealistic in this respect. Shojaie et al. (2014) developed a method combining information from both perturbation and steady-state data to increase the accuracy of the inferred network.

Here, we propose a new method based on an implicit multivariate time-series model that uses steady-state data and has the capability to test the existence of edges in directed networks. This method is not restricted to acyclic networks, but can be used to infer information about cycles in the network. We present proofs of consistency and efficiency in a constrained case as well as a likelihood ratio test for the existence of an edge. We also give simulation results and apply the method to a synthetic dataset, showing that the method performs well in practice.

2 First-order vector autoregressive model

To build a model for equilibrium data, first consider a first-order vector autoregressive, or VAR(1), model for time-series data for a system with $p$ genes with no correlation in the error term between genes (Lütkepohl, 2005). We can write the model as

\begin{matrix} x_{1} & = & ε_{1}, \\ x_{t} & = & A x_{t - 1} + ε_{t} \quad \text{for } t > 1, \\ ε_{t} & \sim & N (0, D), \end{matrix}

where $D$ is diagonal, $x_{t}$ and $ε_{t}$ are vectors of length $p$, while $A$ and $D$ are $p \times p$ matrices. Here, $A$ identifies the relationships between the genes. A non-zero $(i, j)$-th element of $A$ indicates that there is an edge from gene $j$ to gene $i$.

Autoregressive models have been broadly used to infer gene networks from time-series data, with approaches ranging from penalized regression (LASSO, elastic net) to Bayesian model averaging (BMA) to dynamic Bayesian networks (DBN). Michailidis and d'Alché-Buc (2013) review many of these approaches to modelling time series using autoregressive models.

At first glance, the VAR(1) model does not appear to be applicable to steady-state data. However, if the eigenvalues of $A$ are less than 1 in absolute value, then as $t \to \infty$, $x_{t}$ converges to a stable equilibrium distribution (Anderson, 2000). We can use this equilibrium distribution as the model for steady-state data. This can be applied to either a set of experiments with the same perturbation or to the control experiments.

2.1 Equilibrium distribution

Since this is a Gaussian VAR model, we can write the equilibrium distribution as

x_{\infty} \sim N (0, Σ),

where $Σ$ is also a $p \times p$ matrix. To find $Σ$, we use an iterative relationship between the variances at consecutive time points:

\begin{matrix} Var [x_{1}] & = & D, \\ Var [x_{t + 1}] & = & A Var [x_{t}] A^{T} + D, \\ Σ & = & \sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i} . \end{matrix}

We will call this the summation identity (Sheppard, 2013).

Another expression for the asymptotic variance uses the identity $vec (A X B) = (B^{T} \otimes A) vec (X)$ :

\begin{matrix} Σ & = & \sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i}, \\ vec (Σ) & = & vec (\sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i}), \\ & = & \sum_{i = 0}^{\infty} vec (A^{i} D (A^{T})^{i}), \\ & = & \sum_{i = 0}^{\infty} (A^{i} \otimes A^{i}) vec (D), \\ & = & \sum_{i = 0}^{\infty} {(A \otimes A)}^{i} vec (D), \\ & = & {(I - A \otimes A)}^{- 1} vec (D) . \end{matrix}

This Kronecker identity is convenient for calculating the asymptotic variance from a given $A$ and $D$ since it eliminates the infinite sum (Sheppard, 2013). We can also write out a recursive identity for $Σ$ since, at the equilibrium, the variance does not change from one time step to the next:

Σ = A Σ A^{T} + D .

(2.1)
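As a numerical illustration of the recursive identity, the following minimal sketch iterates $Σ \leftarrow A Σ A^{T} + D$ starting from $Σ = D$ until the matrix stops changing, and then checks the identity. It is plain Python; the helper names (`matmul`, `transpose`, `equilibrium_cov`) and the two-dimensional example values of $A$ and $D$ are assumptions made for illustration. The iteration is only expected to converge when the eigenvalues of $A$ are below 1 in absolute value.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def equilibrium_cov(A, D, tol=1e-12, max_iter=100_000):
    """Fixed point of Sigma = A Sigma A^T + D, found by repeated substitution."""
    p = len(D)
    Sigma = [row[:] for row in D]  # Var[x_1] = D
    At = transpose(A)
    for _ in range(max_iter):
        ASA = matmul(matmul(A, Sigma), At)
        new = [[ASA[i][j] + D[i][j] for j in range(p)] for i in range(p)]
        step = max(abs(new[i][j] - Sigma[i][j]) for i in range(p) for j in range(p))
        Sigma = new
        if step < tol:
            return Sigma
    raise RuntimeError("no convergence; A may not be stable")


A = [[0.9, 0.0], [0.7, 0.8]]  # example values (assumption)
D = [[1.0, 0.0], [0.0, 1.0]]
Sigma = equilibrium_cov(A, D)

# Residual of Sigma = A Sigma A^T + D; it should be numerically zero.
ASA = matmul(matmul(A, Sigma), transpose(A))
residual = max(abs(Sigma[i][j] - ASA[i][j] - D[i][j]) for i in range(2) for j in range(2))
print(Sigma, residual)
```

For larger $p$, solving the linear system from the Kronecker identity is likely to be preferable to repeated substitution, since convergence slows as the eigenvalues of $A$ approach 1 in absolute value.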

3 Identifiability

When looking at steady-state data as opposed to time-series data, we must rely on the relationship between $(A, D)$ and $Σ$ in order to infer anything about $A$ and $D$. This raises an issue of identifiability. As a simple example, substituting $- A$ for $A$ in any of the earlier variance identities does not change $Σ$, so the sign of $A$ cannot be identified from equilibrium data alone. More generally, let $Q$ be any $p \times p$ orthogonal matrix, that is, any matrix such that $Q Q^{T} = I$. For any such $Q$, if $\tilde{A}$ is defined as

\tilde{A} = A Σ^{1 / 2} Q Σ^{- 1 / 2}

then $\tilde{A}$ satisfies the recursive identity with the same $Σ$ and $D$ (Tong et al., 1992).
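Both ambiguities can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses a two-dimensional example, and it replaces the symmetric square root $Σ^{1 / 2}$ by a Cholesky factor $L$ with $Σ = L L^{T}$, which works equally well because $\tilde{A} Σ \tilde{A}^{T} = A L Q Q^{T} L^{T} A^{T} = A Σ A^{T}$. All helper names are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def equilibrium_cov(A, D, tol=1e-12, max_iter=100_000):
    """Fixed point of Sigma = A Sigma A^T + D by repeated substitution."""
    p = len(D)
    Sigma = [row[:] for row in D]
    for _ in range(max_iter):
        ASA = matmul(matmul(A, Sigma), transpose(A))
        new = [[ASA[i][j] + D[i][j] for j in range(p)] for i in range(p)]
        step = max(abs(new[i][j] - Sigma[i][j]) for i in range(p) for j in range(p))
        Sigma = new
        if step < tol:
            return Sigma
    raise RuntimeError("no convergence; A may not be stable")


A = [[0.9, 0.0], [0.7, 0.8]]  # example values (assumption)
D = [[1.0, 0.0], [0.0, 1.0]]
Sigma = equilibrium_cov(A, D)

# Sign ambiguity: -A yields the same equilibrium covariance.
neg_A = [[-a for a in row] for row in A]
Sigma_neg = equilibrium_cov(neg_A, D)

# Rotation ambiguity, with a 2 x 2 Cholesky factor standing in for Sigma^{1/2}.
l11 = math.sqrt(Sigma[0][0])
l21 = Sigma[1][0] / l11
l22 = math.sqrt(Sigma[1][1] - l21 ** 2)
L = [[l11, 0.0], [l21, l22]]
L_inv = [[1 / l11, 0.0], [-l21 / (l11 * l22), 1 / l22]]
theta = 0.3  # arbitrary rotation angle
Q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
A_tilde = matmul(matmul(matmul(A, L), Q), L_inv)

# A_tilde satisfies Sigma = A_tilde Sigma A_tilde^T + D up to rounding.
ASA = matmul(matmul(A_tilde, Sigma), transpose(A_tilde))
residual = max(abs(Sigma[i][j] - ASA[i][j] - D[i][j]) for i in range(2) for j in range(2))
print(A_tilde, residual)
```

In this example `A_tilde` is no longer lower-triangular, which hints at why constraints on the zero pattern of $A$ can help to restore identifiability.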

We can state the identifiability problem as follows. Under what conditions is the map

(A, D) ⟷ Σ

bijective? As a necessary condition we can count parameters.

Proposition A necessary condition for identifiability in the VAR equilibrium model of dimension $p$, with connected nodes and non-zero diagonal elements of $A$, is that the number of non-zero off-diagonal elements of $A$ be no more than $p (p - 3) / 2$ and that $p \geq 5$.

Proof The number of parameters in $Σ$ for a $p$-dimensional model is $p (p + 1) / 2$. For identifiability, the total number of independently estimable parameters in $A$ and $D$ must not exceed that number. $D$ is diagonal and thus accounts for $p$ parameters. Similarly, the diagonal of $A$ is assumed to be non-zero, implying that the expression level of each gene at timepoint $t + 1$ is dependent on the expression level of that gene at timepoint $t$ for all genes. This accounts for an additional $p$ parameters. Thus, a maximum of $p (p + 1) / 2 - 2 p = p (p - 3) / 2$ off-diagonal elements of $A$ may be non-zero. Now, the graph defined by $A$ must be connected. If it is not, then it can be decomposed into two smaller, independent components, and each of those must be identifiable. We need at least $p - 1$ off-diagonal elements of $A$ for it to be connected. Thus, $p$ must be at least 5, because if $p = 4$, only a maximum of 2 off-diagonal elements is allowed, which is not enough to connect the 4 nodes.
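The counting argument is easy to tabulate. The following short sketch, with hypothetical function names, computes the bound $p (p - 3) / 2$, finds the smallest $p$ for which a connected graph can satisfy it, and checks a given edge count against both requirements. Passing this check is only a necessary condition.

```python
def max_offdiag_edges(p):
    """Largest number of non-zero off-diagonal entries of A allowed by counting."""
    # p(p+1)/2 parameters in Sigma, minus p for D and p for the diagonal of A.
    return p * (p + 1) // 2 - 2 * p


def passes_counting(p, n_edges):
    """True if n_edges can connect p nodes without exceeding the counting bound."""
    return p - 1 <= n_edges <= max_offdiag_edges(p)


# Smallest dimension for which at least p - 1 edges are allowed.
smallest_p = next(p for p in range(2, 50) if max_offdiag_edges(p) >= p - 1)

for p in (4, 5, 10):
    print(p, max_offdiag_edges(p), passes_counting(p, p - 1))
```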

Parameter counting yields a necessary condition, but it is not a sufficient condition. We have not found sufficient conditions in the most general case. We have, however, been able to examine identifiability in two specific examples, one acyclic and one cyclic. To assess the identifiability of a specific model, we used the recursive identity,

Σ = A Σ A^{T} + D .

If a given model is identifiable, then there will be only one

(A, D)

pair which satisfies this equation.

The first example is a five-dimensional model with a simple network structure, given in Figure 1. This model has four off-diagonal non-zero elements in $A$. The maximum allowable by parameter counting is 5, so we are close to but not at that limit. Writing out the recursive identity element by element yields a system of equations, the first two of which are:

\begin{matrix} σ_{11} & = & a_{11}^{2} σ_{11} + 2 a_{11} a_{12} σ_{12} + a_{12}^{2} σ_{22} + d_{1}, \\ σ_{12} & = & a_{11} a_{22} σ_{12} + a_{12} a_{22} σ_{22} + a_{11} a_{23} σ_{13} + a_{12} a_{23} σ_{23}, \\ \dots \end{matrix}

where $a_{ij}$ is the entry of $A$ in the $i$-th row and $j$-th column, and $d_{i}$ is the $i$-th diagonal entry in $D$. Next, we solve the system of equations to eliminate as many variables as possible. Then we perform a grid search on the free variables to find all potential solutions of the recursive identity. In this example, we were able to solve the system down to a single free parameter, $a_{55}$.

To test the parameter solutions, we used the following model parameters:

\begin{matrix} A & = & [\begin{matrix} 0.6 & 0.3 & 0 & 0 & 0 \\ 0 & 0.4 & - 0.4 & 0 & 0 \\ 0 & 0 & 0.8 & 0.5 & 0 \\ 0 & 0 & 0 & 0.5 & - 0.3 \\ 0 & 0 & 0 & 0 & 0.7 \end{matrix}], \\ D & = & I_{5} . \end{matrix}

The grid search was performed by allowing $a_{55}$ to vary between 0 and 1 in increments of 0.0001. This is the allowable range for $a_{55}$ since $A$ and $- A$ both result in the same $Σ$. For each prospective value of $a_{55}$, the other elements of $A$ and $D$ were calculated. These values were then used to evaluate the objective function $\log \| Σ - A Σ A^{T} - D \|_{2}^{2}$. Figure 2 shows the results. From this it is clear that the only value of $a_{55}$ that satisfies the recursive identity is the true value, 0.7, where the objective function is minimized. The flat region of the objective curve to the left of $a_{55} \approx 0.65$ reflects the fact that there is no valid solution for $A$ and $D$ with that starting value for $a_{55}$, either because some element of $A$ is not a real value or because a diagonal element of $D$ is negative. Also note that there is a degenerate solution at $a_{55} = 1$, such that $A = I$ and $D = 0$. Since $D$ represents the variance of the noise added to the VAR system, its diagonal elements must be positive, which rules out this degenerate solution. So in this example the model is identifiable.
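The objective used in the grid search can be sketched as follows. This is a simplified illustration and not the procedure described above: it holds every entry except $a_{55}$ at its true value instead of re-solving the other elements of $A$ and $D$ for each grid point, it uses a coarse grid, it assumes the norm is the Frobenius norm, and it returns the squared residual without taking the logarithm. Helper names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def equilibrium_cov(A, D, tol=1e-12, max_iter=100_000):
    """Fixed point of Sigma = A Sigma A^T + D by repeated substitution."""
    p = len(D)
    Sigma = [row[:] for row in D]
    for _ in range(max_iter):
        ASA = matmul(matmul(A, Sigma), transpose(A))
        new = [[ASA[i][j] + D[i][j] for j in range(p)] for i in range(p)]
        step = max(abs(new[i][j] - Sigma[i][j]) for i in range(p) for j in range(p))
        Sigma = new
        if step < tol:
            return Sigma
    raise RuntimeError("no convergence; A may not be stable")


def sq_residual(A, D, Sigma):
    """Squared Frobenius norm of Sigma - A Sigma A^T - D."""
    ASA = matmul(matmul(A, Sigma), transpose(A))
    p = len(D)
    return sum((Sigma[i][j] - ASA[i][j] - D[i][j]) ** 2 for i in range(p) for j in range(p))


A_true = [
    [0.6, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.4, -0.4, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, -0.3],
    [0.0, 0.0, 0.0, 0.0, 0.7],
]
D = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
Sigma = equilibrium_cov(A_true, D)


def with_a55(value):
    """Copy of A_true with only the (5, 5) entry replaced."""
    A = [row[:] for row in A_true]
    A[4][4] = value
    return A


grid = [round(0.05 * k, 2) for k in range(1, 20)]  # coarse grid on (0, 1)
best = min(grid, key=lambda v: sq_residual(with_a55(v), D, Sigma))
print(best, sq_residual(with_a55(best), D, Sigma))
```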

Figure 1:

The network structure for the acyclic identifiability example

Figure 2:

Grid search for solutions of the acyclic identifiability example. The objective function is minimized at the true value of $a_{55} = 0.7$ , showing that the model is identifiable in this example

Our second example involves a cyclic network. We added a single edge to the previous network, making a 3-node cycle, as shown in Figure 3. We used the following model parameters:

\begin{matrix} A & = & [\begin{matrix} 0.3 & 0.6 & 0 & 0 & 0 \\ 0 & 0.1 & - 0.3 & 0 & 0 \\ 0 & 0 & 0.7 & 0.2 & 0 \\ 0 & 0.3 & 0 & 0.4 & - 0.6 \\ 0 & 0 & 0 & 0 & 0.5 \end{matrix}], \\ D & = & I_{5} . \end{matrix}

For this model, we were able to solve down to two free parameters, $a_{45}$ and $a_{55}$. We again performed a grid search over the appropriate parameter space ($a_{45} \in (- 1, 1)$, $a_{55} \in (0, 1)$) with an initial grid increment of 0.001 and evaluated the objective function at each point. Figure 4 shows the results of the grid search, both in the larger space as well as zoomed in around the truth, where a much smaller grid increment of $10^{- 11}$ was used. There are large areas where a valid solution is not achieved, indicated by the white areas. Again, however, the original solution is the only valid one, indicating that this model is also identifiable.

Figure 3:

The network structure for the cyclic identifiability example

Figure 4:

Grid search for solutions of the cyclic identifiability example showing the value of the objective function at each point searched. The left plot shows the grid search over the entire model space, while the right plot is zoomed in on the true parameter value. The white areas indicate parameter values that result in invalid solutions. The true solution at $(- 0.6, 0.5)$ is the only valid solution

3.1 Simple case

The parameter counting condition is only necessary, and the two examples above establish identifiability only for specific models, so a more general identifiability result is desirable.

We now find such a condition within a specific subset of VAR(1) models. We do this by restricting the model considerably to make the analysis more tractable. First, we constrain $D$ to be known. Second, we require that the model be acyclic, and thus the nodes can be ordered such that $A$ is lower-triangular. Finally, we require that the diagonal entries of $A$ be less than 1 in absolute value and that their signs be known. With these constraints, the model is identifiable and has good asymptotic properties.

Consider the model as constrained above. We order the nodes in the network topologically, meaning that there are no edges from node $i$ to node $j$ if $i > j$. This creates an $A$ that is lower-triangular. We will use the recursive identity for the equilibrium variance:

A Σ A^{T} + D = Σ,

(3.1)

A Σ A^{T} = Σ - D .

(3.2)

We solve this identity recursively by looking at the leading principal submatrices, an approach inspired by Drton et al. (2011). First, define $S_{i}$ to be the leading $i \times i$ principal submatrix of $Σ$, that is, the matrix made of the elements in the first $i$ rows and columns of $Σ$. Similarly, define $B_{i}$ to be the leading principal submatrix of $B = Σ - D$ and $Γ_{i}$ to be the leading principal submatrix of $A$. Also, define $s_{i + 1}$ to be the vector containing the first $i$ elements of the $(i + 1)$-th column of $Σ$, which coincides with the same column of $B$ because $D$ is diagonal. That is, $s_{i + 1} = [σ_{1, i + 1}, \dots, σ_{i, i + 1}]^{T}$. Similarly, $γ_{i + 1}^{T}$ is the vector containing the first $i$ elements of the $(i + 1)$-th row of $A$. So,

\begin{matrix} S_{i + 1} & = & [\begin{matrix} S_{i} & s_{i + 1} \\ s_{i + 1}^{T} & s_{i + 1, i + 1} \end{matrix}], \quad s_{ii} = σ_{ii}, \\ B_{i + 1} & = & [\begin{matrix} B_{i} & s_{i + 1} \\ s_{i + 1}^{T} & b_{i + 1, i + 1} \end{matrix}], \quad b_{ii} = σ_{ii} - d_{ii}, \\ Γ_{i + 1} & = & [\begin{matrix} Γ_{i} & 0 \\ γ_{i + 1}^{T} & γ_{i + 1, i + 1} \end{matrix}], \quad γ_{ii} = a_{ii} . \end{matrix}

Now, for $i = 1$, we get

\begin{matrix} γ_{11}^{2} s_{11} & = & b_{11}, \\ a_{11} & = & \sqrt{\frac{σ_{11} - d_{11}}{σ_{11}}}, \end{matrix}

where the root is taken with the known sign of $a_{11}$ (shown here as positive). This gives us $Γ_{1}$. Now, suppose we know $Γ_{i}$. Then, for $i + 1$, we have

\begin{matrix} B_{i + 1} & = & Γ_{i + 1} S_{i + 1} Γ_{i + 1}^{T}, \\ [\begin{matrix} B_{i} & s_{i + 1} \\ s_{i + 1}^{T} & b_{i + 1, i + 1} \end{matrix}] & = & [\begin{matrix} Γ_{i} S_{i} Γ_{i}^{T} & Γ_{i} S_{i} γ_{i + 1} + γ_{i + 1, i + 1} Γ_{i} s_{i + 1} \\ - & γ_{i + 1}^{T} S_{i} γ_{i + 1} + 2 γ_{i + 1, i + 1} s_{i + 1}^{T} γ_{i + 1} + γ_{i + 1, i + 1}^{2} s_{i + 1, i + 1} \end{matrix}] . \end{matrix}

We solve for $γ_{i + 1, i + 1}$ and $γ_{i + 1}$ to get

\begin{matrix} γ_{i + 1, i + 1} & = & \sqrt{\frac{b_{i + 1, i + 1} - s_{i + 1}^{T} (Γ_{i} S_{i} Γ_{i}^{T})^{- 1} s_{i + 1}}{s_{i + 1, i + 1} - s_{i + 1}^{T} S_{i}^{- 1} s_{i + 1}}}, \\ γ_{i + 1} & = & (Γ_{i} S_{i})^{- 1} (I - γ_{i + 1, i + 1} Γ_{i}) s_{i + 1} . \end{matrix}
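For $p = 2$ every quantity in the recursion is a scalar, so it can be checked directly. The sketch below is an illustration under assumptions: it uses an example $A$ with positive diagonal, $D = I$, and hypothetical helper names. It computes $γ_{22}^{2} = (b_{22} - σ_{12}^{2} / (γ_{11}^{2} σ_{11})) / (σ_{22} - σ_{12}^{2} / σ_{11})$ and $γ_{21} = (1 - γ_{22} γ_{11}) σ_{12} / (γ_{11} σ_{11})$, which follow from equating the blocks of $B_{2} = Γ_{2} S_{2} Γ_{2}^{T}$.

```python
import math


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def equilibrium_cov(A, D, tol=1e-12, max_iter=100_000):
    """Fixed point of Sigma = A Sigma A^T + D by repeated substitution."""
    p = len(D)
    Sigma = [row[:] for row in D]
    for _ in range(max_iter):
        ASA = matmul(matmul(A, Sigma), transpose(A))
        new = [[ASA[i][j] + D[i][j] for j in range(p)] for i in range(p)]
        step = max(abs(new[i][j] - Sigma[i][j]) for i in range(p) for j in range(p))
        Sigma = new
        if step < tol:
            return Sigma
    raise RuntimeError("no convergence; A may not be stable")


def recover_A_2d(Sigma, D):
    """Recover a lower-triangular 2 x 2 A from Sigma and known D.

    Assumes both diagonal entries of A are positive (signs known).
    """
    s11, s12, s22 = Sigma[0][0], Sigma[0][1], Sigma[1][1]
    b11, b22 = s11 - D[0][0], s22 - D[1][1]
    if b11 <= 0 or b22 <= 0:
        raise ValueError("equilibrium variance must exceed the noise variance")
    g11 = math.sqrt(b11 / s11)                   # step i = 1
    num = b22 - s12 ** 2 / (g11 ** 2 * s11)      # b22 - s^T (G S G^T)^{-1} s
    den = s22 - s12 ** 2 / s11                   # s22 - s^T S^{-1} s
    g22 = math.sqrt(num / den)
    g21 = (1.0 - g22 * g11) * s12 / (g11 * s11)  # (G S)^{-1} (I - g22 G) s
    return [[g11, 0.0], [g21, g22]]


A_true = [[0.9, 0.0], [0.7, 0.8]]  # example values (assumption)
D = [[1.0, 0.0], [0.0, 1.0]]
A_hat = recover_A_2d(equilibrium_cov(A_true, D), D)
print(A_hat)
```

For general $p$ the same steps apply with matrix inverses in place of the scalar divisions.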

Notice here that the expression for $γ_{i + 1, i + 1}$ has two solutions, the positive and negative square roots. These elements are the diagonal entries of $A$, which is why the signs of the diagonal elements must be specified. If they are not specified, there are $2^{p}$ possible solutions. Once the signs of these elements are specified, we have a unique solution for $A$ given $Σ$ and $D$. However, there still exist positive definite $Σ$ that do not map back to a valid $A$ given $D$. As a simple example, if $σ_{ii} < d_{ii}$, then no valid $A$ exists, since the equilibrium variance would be smaller than the error variance added at each step. Now we show that the maximum likelihood estimator (MLE) for $A$ in this model is both consistent and efficient:

Theorem Suppose the $p$-dimensional vectors $x_{1}, \dots, x_{n}$ are independent and are observed from the distribution

X | A \sim N (0, \sum_{i = 0}^{\infty} A^{i} D (A^{T})^{i}),

where $D$ is known and $A$ is lower-triangular with the signs of the diagonal elements known, and the diagonal elements less than 1 in absolute value. Then $\hat{A}$, the MLE of $A$, is consistent and asymptotically efficient with

\sqrt{n} (\hat{a} - a) \to N (0, F (a)^{- 1}),

where $a$ is the vectorization of the non-zero elements of $A$ and $F (a)$ is the Fisher information matrix, whose elements are

F (a)_{ij} = E [(\frac{\partial}{\partial a_{i}} log f (x | A)) (\frac{\partial}{\partial a_{j}} log f (x | A))] .

Proof We will show that the conditions for Cramér's theorem, as found in Ferguson (1996, p. 121), are satisfied.

The appendix contains derivations of the first and second partial derivatives and shows that they exist and are continuous. For example, the first partial derivatives take the form

\begin{matrix} \frac{\partial σ_{ij}}{\partial a_{mn}} & = & \frac{1}{1 - a_{ii} a_{jj}} [\begin{matrix} (A Σ)_{in} 1_{[j = m, j \geq n]} + (A Σ)_{jn} 1_{[i = m, i \geq n]} \\ + \sum_{k = 1}^{i} \sum_{l = 1}^{j} a_{ik} a_{jl} \frac{\partial σ_{kl}}{\partial a_{mn}} 1_{[(k, l) \neq (i, j)]} \end{matrix}] . \end{matrix}

Both the first- and second-partial derivatives can be calculated iteratively.

Now let

\begin{matrix} Ψ (x, a) = \frac{\partial}{\partial a} log f (x | a), & \text{a } k \text{ vector}, \\ \dot{Ψ} (x, a) = \frac{\partial^{2}}{\partial a^{2}} log f (x | a), & \text{a } k \times k \text{ matrix} . \end{matrix}

Note that the components of $\dot{Ψ} (x, a)$, as shown in the appendix, are all of the form

\sum_{i, j} c_{ij} (x x^{T})_{ij},

where the $c_{ij}$’s are constant. Further, the expectations of the absolute values of the components are finite since $E [| x x^{T} |_{ij}] < \infty$ (Li and Wei, 2012). Now let

K (x) = \sum_{i, j} m_{ij} | x x^{T} |_{ij},

where $m_{ij}$ is the maximum of all the $| c_{ij} |$’s across all components. Then each component of $\dot{Ψ} (x, a)$ is bounded by $K (x)$ in absolute value and $K (x)$ has finite expectation.

These results, along with the earlier proof of identifiability and the form of the constraints on $A$, satisfy the conditions of Cramér's theorem. Thus, the MLE of $A$ is consistent and asymptotically efficient.

In addition to asymptotic consistency and efficiency, we also can form a likelihood-ratio test for the existence of an edge.

Corollary Let $A_{1}$ be the set of all valid $A$ in the VAR equilibrium model, possibly with some entries constrained to be 0. Let $A_{0} \subset A_{1}$ with a difference in dimension between the two spaces of $c$ . This corresponds to $c$ additional entries of $A$ constrained to be 0. Then the log-likelihood ratio test statistic

λ = - 2 log (\frac{{sup}_{A \in A_{0}} L (A | x)}{{sup}_{A \in A_{1}} L (A | x)})

has a $χ_{c}^{2}$ distribution, asymptotically.

Proof The conditions that satisfy Cramér's theorem also satisfy Wilks' theorem concerning the likelihood-ratio test statistic (Ferguson, 1996, p. 145).
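For a single tested edge ($c = 1$), the test statistic and its asymptotic $p$-value can be computed with the standard library alone, using the fact that $P (χ_{1}^{2} > x) = \operatorname{erfc} (\sqrt{x / 2})$. The sketch below is illustrative: the function names are hypothetical and the two log-likelihood values are made-up placeholders, not results from this article.

```python
import math


def lrt_statistic(loglik_null, loglik_alt):
    """-2 log of the likelihood ratio, from maximized log-likelihoods."""
    return -2.0 * (loglik_null - loglik_alt)


def chi2_sf_1df(x):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


# Placeholder log-likelihoods for the constrained and unconstrained models.
loglik_null, loglik_alt = -1234.5, -1232.0
lam = lrt_statistic(loglik_null, loglik_alt)
p_value = chi2_sf_1df(lam)
print(lam, p_value)
```

For $c > 1$ a general chi-squared tail function would be needed in place of `chi2_sf_1df`.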

3.2 Simulation results: Asymptotic variance

We have an asymptotic distribution for the MLE, and now we assess the quality of its approximation in finite samples via simulation. Because of the complexity of the calculations involved in finding the Fisher information matrix, we will restrict ourselves to a two-dimensional model. This model has three parameters for $A$ ,

A = [\begin{matrix} a_{11} & 0 \\ a_{21} & a_{22} \end{matrix}],

so $a = [a_{11}, a_{21}, a_{22}]^{T}$ and the Fisher information matrix will be 3 by 3, with elements

F (a)_{ij} = E [(\frac{\partial}{\partial a_{i}} log f (x | A)) (\frac{\partial}{\partial a_{j}} log f (x | A))] .

For our simulations, we used

\begin{matrix} A & = & [\begin{matrix} 0.9 & 0 \\ 0.7 & 0.8 \end{matrix}], \\ D & = & I_{2} . \end{matrix}

Using these values, the asymptotic variance of the MLE for $A$ is calculated to be

F (a)^{- 1} = [\begin{matrix} 0.022 & - 0.126 & 0.029 \\ - 0.126 & 3.094 & - 0.850 \\ 0.029 & - 0.850 & 0.251 \end{matrix}] .

To test this, we performed a simulation where we generated $n$ samples from the equilibrium distribution. We then found the value of $\hat{A}$ that maximized the likelihood. To do this, we initialized $A$ using the recursive algorithm described earlier and used the function optim in R to optimize the log-likelihood of the model, utilizing the BFGS method (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970). We repeated the initialization and optimization procedure 1 000 times and used the resulting $\hat{A}$ values to calculate an empirical variance. We used three different sample sizes $n$, resulting in the following empirical variances:

\begin{matrix} n = 1 000, \\ Var (\hat{A}) = [\begin{matrix} 0.021 & - 0.077 & 0.009 \\ - 0.077 & 2.633 & - 0.717 \\ 0.009 & - 0.717 & 0.219 \end{matrix}], \end{matrix}

\begin{matrix} n = 10 000, \\ Var (\hat{A}) = [\begin{matrix} 0.023 & - 0.074 & 0.007 \\ - 0.074 & 2.705 & - 0.746 \\ 0.007 & - 0.746 & 0.230 \end{matrix}], \end{matrix}

\begin{matrix} n = 100 000, \\ Var (\hat{A}) = [\begin{matrix} 0.023 & - 0.091 & 0.012 \\ - 0.091 & 2.757 & - 0.737 \\ 0.012 & - 0.737 & 0.220 \end{matrix}] . \end{matrix}

These results show that the asymptotic variance approximates the finite sample variances quite well, and so provide support for the asymptotic theory.
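The log-likelihood being maximized in this simulation depends on the data only through the sample second-moment matrix $S = n^{- 1} \sum_{k} x_{k} x_{k}^{T}$, since for a zero-mean Gaussian $\ell (A) = - \frac{n}{2} [p \log 2 π + \log | Σ (A) | + tr (Σ (A)^{- 1} S)]$. The sketch below evaluates this function for the two-dimensional model. It is an illustration under assumptions: helper names are hypothetical, the population covariance stands in for $S$ so that the example is deterministic, and no optimizer is included (the simulation above used optim in R).

```python
import math


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def equilibrium_cov(A, D, tol=1e-12, max_iter=100_000):
    """Fixed point of Sigma = A Sigma A^T + D by repeated substitution."""
    p = len(D)
    Sigma = [row[:] for row in D]
    for _ in range(max_iter):
        ASA = matmul(matmul(A, Sigma), transpose(A))
        new = [[ASA[i][j] + D[i][j] for j in range(p)] for i in range(p)]
        step = max(abs(new[i][j] - Sigma[i][j]) for i in range(p) for j in range(p))
        Sigma = new
        if step < tol:
            return Sigma
    raise RuntimeError("no convergence; A may not be stable")


def loglik(a, S, n, D):
    """Zero-mean Gaussian log-likelihood for a = (a11, a21, a22), given S and n."""
    a11, a21, a22 = a
    Sig = equilibrium_cov([[a11, 0.0], [a21, a22]], D)
    det = Sig[0][0] * Sig[1][1] - Sig[0][1] * Sig[1][0]
    # trace of Sig^{-1} S, written out for the 2 x 2 case (S symmetric)
    trace = (Sig[1][1] * S[0][0] - 2.0 * Sig[0][1] * S[0][1] + Sig[0][0] * S[1][1]) / det
    return -0.5 * n * (2.0 * math.log(2.0 * math.pi) + math.log(det) + trace)


D = [[1.0, 0.0], [0.0, 1.0]]
a_true = (0.9, 0.7, 0.8)
# Population covariance used in place of a sample second-moment matrix (assumption).
S = equilibrium_cov([[0.9, 0.0], [0.7, 0.8]], D)
n = 1000

ll_true = loglik(a_true, S, n, D)
ll_off = loglik((0.9, 0.6, 0.8), S, n, D)
print(ll_true, ll_off)
```

With real samples, $S$ would be computed from the data and the maximizer would differ from the true value by sampling error.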

3.3 Simulation results: Coverage

Another way to assess the accuracy of the asymptotic approximation for finite samples is to look at the coverage of the resulting confidence intervals. To do this, we used the same two-dimensional model as earlier. In each single simulation, $a_{11}$ and $a_{22}$ were chosen at random from a Uniform(0.1, 0.9) distribution and $a_{21}$ was chosen at random from a Uniform($- 0.9$, 0.9) distribution. We then generated $n$ samples from the equilibrium distribution and used those samples to obtain an MLE for $A$ by maximizing the log-likelihood as before. We then used the empirical asymptotic variance of the MLE to obtain confidence intervals and checked whether the true values were covered. We repeated this 1 000 times with the same $A$ to get coverage results.

We carried out this procedure 10 times each for $n$ equal to 100, 1 000 and 10 000, generating a different $A$ each time. The coverage results are shown in Figures 5–8. The x-axis is the true value of $a_{11}$. This value matters more than the other $A$ values because it propagates to all the values of $Σ$. As $n$ increases, the coverage results clearly improve. The coverage is essentially correct for large $n$, both marginally and jointly.

We note that with small $n$ and $a_{11}$ , the observed variance of $x_{1}$ can be smaller than that of the known $D$ . This can cause problems with optimizing $A$ .

Figure 5:

Joint coverage for all parameters of $A$ using the 95% confidence interval from the multivariate normal asymptotic distribution of the MLE

Figure 6:

Marginal coverage for $a_{11}$ using the 95% confidence interval from the asymptotic distribution of the MLE

Figure 7:

Marginal coverage for $a_{21}$ using the 95% confidence interval from the asymptotic distribution of the MLE

Figure 8:

Marginal coverage for $a_{22}$ using the 95% confidence interval from the asymptotic distribution of the MLE

4 Likelihood ratio test

We now assess the validity of the likelihood-ratio test for the simple VAR equilibrium model. To test this, we used a three-dimensional system with

A = [\begin{matrix} 0.6 & 0 & 0 \\ - 0.3 & 0.4 & 0 \\ 0 & - 0.5 & 0.8 \end{matrix}] .

For this system, $A_{0}$ has the correct constraint that the lower-left element is zero, while $A_{1}$ does not have that constraint. For a dataset generated from the model, we find the MLE under both hypotheses and construct the likelihood ratio test statistic.

To obtain the empirical distribution of the test statistic, we generated 10 000 samples from the equilibrium distribution and calculated the likelihood ratio statistic for those data. We repeated this 1 000 times to get a distribution of values of the statistic. Figure 9 shows the distribution compared with a $χ^{2}$ distribution with one degree of freedom. The distributions are close, indicating that the asymptotic $χ^{2}$ distribution of the likelihood-ratio test statistic provides a good approximation in this simulated example.

Figure 9:

Comparison of the empirical distribution of the likelihood ratio statistic to test for an additional edge in the VAR equilibrium problem with a $χ^{2}$ distribution with one degree of freedom. The left-hand plot shows the density of the test statistic compared with the $χ^{2}$ distribution, while the right-hand plot shows the empirical CDF comparison

5 Application to synthetic data

We have shown by simulation that the asymptotic distributions of estimators and likelihood-ratio test statistics provide good approximations to finite-sample distributions in several example scenarios. What happens when we try to apply the method in situations where the model is not known to be correct? The matrix $D$ is not likely to be known a priori in a real data setting, and in gene networks the graph is not necessarily restricted to be acyclic. That the general VAR equilibrium model does not require an acyclic graph is a potential advantage of our method over Bayesian network methods, which do require a directed acyclic graph.

5.1 GeneNetWeaver data

To look at the method in a more realistic setting, we need a reasonable size network where the truth is known. For this we used networks from the DREAM4 in silico network challenge competition (http://dreamchallenges.org/project/dream4-in-[silico-network-challenge/]). The 10-gene networks from the competition are a good size for analysis, include cycles, and are known. The original data used in the competition include time-series data simulating a perturbation to the network. The first and last time points do correspond to a stable equilibrium for the network, but these cannot be used for our purposes because the original data do not include enough separate time series from which to get samples.

However, the software used to generate the data, GeneNetWeaver, is available online at http://gnw.sourceforge.net/ (Marbach et al., 2009; Schaffter et al., 2011). The software can be configured to generate any number of independent time series from a given network model. The data are generated according to a dynamical model using ordinary differential equations (ODEs) with added stochastic noise. For the time-series data, the network begins at equilibrium, and then the network is artificially perturbed for the first half of the time course. At the midpoint of the time course, the perturbation is removed and the network is allowed to return to its equilibrium state by the end of the time course. The GeneNetWeaver software is designed to provide a rich simulation of the dynamics of biological regulatory networks. The ODEs include terms simulating transcription and translation rates as well as protein degradation rates, and they simulate mRNA levels and protein levels simultaneously. Noise is added to the system to simulate both random fluctuations in the actual levels of mRNA and protein and measurement error.

We used GeneNetWeaver to generate time-series data using the network model used in the first 10-gene network from the DREAM4 competition. The GeneNetWeaver software has this network available as a pre-configured model from which data can be generated. The network has 10 nodes and 15 non-self edges, including a cycle involving 3 genes. All settings for data generation were the same as those used in the DREAM4 competition, including the amount of added noise to the system and the model of noise in microarrays. We used GeneNetWeaver to generate 100 random time series of 21 points each from the model and took the first and last time points as samples from the equilibrium distribution.

5.2 Method application

To see how the VAR equilibrium method works on these data, we used the known network structure as our initial model for $A$ , which includes cycles and is not lower-triangular. We did not assume a known $D$ , but included it as additional parameters for the model. This resulted in a total of 35 parameters for the model: 10 from $D$ , 10 diagonal elements of $A$ and 15 off-diagonal elements of $A$ , reflecting the 15 non-self edges in the network. The matrix $Σ$ has 55 parameters, so this is well within the parameter counting requirement. This does not guarantee identifiability, but does suggest that it is likely to hold.

To find the MLE for the model, we randomly initialized $A$ and $D$. The diagonal elements of $A$ were initialized randomly from a Uniform(0, 0.9) distribution to reflect the belief that the autoregressive parameter for the effect of any gene's current expression level on its level at the next time point is positive. The off-diagonal elements were initialized from a Uniform($- 0.9$, 0.9) distribution. A requirement on $D$ is that each element be less than the variance of the corresponding gene in the equilibrium data, so each element was initialized to be the variance of the corresponding gene times a draw from a Uniform(0.1, 0.9) distribution.
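The initialization scheme just described can be sketched as follows. This is an illustration in Python rather than the R code used for the analysis; the function name, the toy three-gene edge list and the variances are hypothetical. An edge from gene $j$ to gene $i$ is stored at position $(i, j)$ of $A$, following the convention of Section 2.

```python
import random


def init_params(p, edges, variances, rng):
    """Random starting values for A (p x p) and the diagonal of D.

    edges: list of (source, target) pairs, 0-based, with no self-loops.
    variances: per-gene sample variances from the equilibrium data.
    """
    A = [[0.0] * p for _ in range(p)]
    for i in range(p):
        A[i][i] = rng.uniform(0.0, 0.9)        # positive autoregressive term
    for source, target in edges:
        A[target][source] = rng.uniform(-0.9, 0.9)
    # Each noise variance is a fraction of the observed variance, so it stays below it.
    d = [v * rng.uniform(0.1, 0.9) for v in variances]
    return A, d


rng = random.Random(0)                          # fixed seed for reproducibility
toy_edges = [(0, 1), (1, 2), (2, 0)]            # hypothetical 3-gene cycle
toy_variances = [2.0, 1.5, 3.0]                 # hypothetical sample variances
A0, d0 = init_params(3, toy_edges, toy_variances, rng)
print(A0, d0)
```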

After $A$ and $D$ were initialized, the log-likelihood of the data was optimized using the optim function in R with the BFGS method. The optimization scheme may get stuck in a local mode since the likelihood is not convex in $A$ and $D$, so the random initialization and optimization were repeated 1 000 times to provide a better search of the model space. The best $A$ and $D$ were kept as the approximate MLE.

This gave an estimate of the parameters for the true model, but in order to test individual edges, we needed to find the MLE for models adjacent to the true model. These models are defined by taking the original network structure and adding or removing a single edge. When removing edges, we needed to ensure that the connectivity of the graph was not broken. Three of the edges in the original network, if removed, would isolate a single node. This leaves 12 of the original 15 edges available to be tested. Adding an edge does not create problems, and thus we were able to test all 75 extra edges. With the results from the true model and the 87 testable models, we could see which edges the likelihood ratio test identified as significant.
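Enumerating the adjacent models is a small graph exercise. The sketch below uses hypothetical function names and a toy four-node graph, not the DREAM4 network; it lists every edge that can be added and every edge that can be removed without disconnecting the graph. Connectivity is checked ignoring edge direction, which is an assumption about how connectivity was defined.

```python
def weakly_connected(p, edges):
    """True if the p nodes form one component when edge direction is ignored."""
    neighbours = {i: set() for i in range(p)}
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for other in neighbours[node]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == p


def adjacent_models(p, edges):
    """Edges that can be added, and edges that can be removed safely."""
    edges = set(edges)
    additions = [(s, t) for s in range(p) for t in range(p)
                 if s != t and (s, t) not in edges]
    removals = [e for e in sorted(edges) if weakly_connected(p, edges - {e})]
    return additions, removals


toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]    # hypothetical graph
additions, removals = adjacent_models(4, toy_edges)
print(len(additions), removals)

# With p nodes and m non-self edges there are p (p - 1) - m candidate additions.
n_additions_10_gene = 10 * 9 - 15
```

In the toy graph, removing the edge into node 3 would isolate that node, so it is excluded from `removals`, mirroring the three untestable edges mentioned above.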

Running the above optimization scheme for each of the 88 models resulted in an approximate MLE for each model. Because these optimizations were run independently for each model and the model space is not necessarily fully searched, this resulted in some mis-ordering of the models. That is, the MLE found for one model resulted in a likelihood that was worse than that of a model in which some of the parameters of the first model are constrained to be zero. This cannot happen for true MLEs, since the parameters of the smaller model are also valid for the larger model and give the same likelihood.

To fix the ordering issue, we performed a second optimization step for each target model. We initialized optim with the parameter values from each of the other models. In some cases, this involved discarding parameter values for extra edges or using zeros from smaller models. The target model was then optimized from that starting point. Further, if the starting point contained zeros from a smaller model, we re-randomized just those zeros a small number of times to give better search coverage. We iterated over this step until the models were appropriately ordered, which happened within a few iterations.
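A check for the ordering problem can be written directly from its definition: a model whose edge set is a strict subset of another's should never have the higher maximized log-likelihood. The following is a minimal sketch with hypothetical model names and log-likelihood values.

```python
def ordering_violations(models):
    """Find (small, large) pairs where a nested model beats its superset.

    models: dict mapping a name to (frozenset of edges, best log-likelihood).
    """
    bad = []
    for small, (e_small, ll_small) in models.items():
        for large, (e_large, ll_large) in models.items():
            # e_small < e_large tests for a strict subset of edges.
            if e_small < e_large and ll_small > ll_large:
                bad.append((small, large))
    return bad

# Hypothetical values for illustration only.
models = {
    "true":       (frozenset({(0, 1), (1, 2)}), -512.4),
    "minus_edge": (frozenset({(0, 1)}), -515.0),
    "plus_edge":  (frozenset({(0, 1), (1, 2), (2, 0)}), -513.1),
}
violations = ordering_violations(models)
```

Here the larger model "plus_edge" has a lower log-likelihood than the nested model "true", so it would be re-optimized starting from the parameters of "true" with the extra edge set to zero or re-randomized.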

5.3 Results

We applied the optimization scheme described earlier to both the initial and final time point data generated by GeneNetWeaver. For each model explored, we obtained the best log-likelihood value for comparison. We then compared the true model with each other model, which had either one fewer or one more edge, and computed the likelihood ratio statistic for testing the edge. From the likelihood ratio statistic we computed a p-value by comparing the value to a $χ^{2}$ distribution with one degree of freedom.
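For one degree of freedom the $χ^{2}$ tail probability has a closed form in terms of the complementary error function, so the p-value can be computed without a statistics library. The sketch below uses hypothetical log-likelihood values; the function name `lrt_pvalue` is illustrative.

```python
import math

def lrt_pvalue(loglik_small, loglik_large):
    """Likelihood ratio statistic and p-value for one extra parameter."""
    stat = 2.0 * (loglik_large - loglik_small)
    stat = max(stat, 0.0)   # guard against small negative values from optimization error
    # Chi-square(1) tail: P(Z^2 > stat) = erfc(sqrt(stat / 2)) for standard normal Z.
    return stat, math.erfc(math.sqrt(stat / 2.0))

stat, p = lrt_pvalue(-515.0, -512.4)   # hypothetical log-likelihoods
```

With these values the statistic is 5.2 and the p-value falls below 0.05, so the edge would be declared significant at that level.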

Ideally, the p-values corresponding to the 12 testable edges in the true network would be low and the p-values for the 75 edges added to the true model would be high. Figure 10 shows the results for both the first and the last time point. Looking at the circles, we find that, across the two time points, 7 of the 150 tests of edges added to the true network (75 per time point) identified the edge as a true edge at $p = 0.05$. This is a Type I error rate of approximately 5%, showing that the test is producing the correct level of false positive results. Five of the 24 tests of true edges (12 per time point) identified the edge as a true edge, yielding a power of 21%, which is quite promising for this area of research.
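The two rates quoted above follow from pooling the tests over both time points. As a quick arithmetic check (variable names are illustrative):

```python
n_time_points = 2
false_edges_tested = 75 * n_time_points   # edges added to the true network
true_edges_tested = 12 * n_time_points    # removable edges of the true network

type_i_rate = 7 / false_edges_tested      # rejections among false edges
power = 5 / true_edges_tested             # rejections among true edges

print(f"Type I error rate: {type_i_rate:.3f}, power: {power:.3f}")
```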

Figure 10:

p-values for each tested edge in the GeneNetWeaver data. Circles correspond to edges added to the true network (false edges) and triangles correspond to edges removed from the true network (true edges). The left panel uses data from the first time point and the right panel uses data from the last time point. The dashed line corresponds to a $p$-value of 0.05.

6 Discussion

Steady-state gene expression data present a challenge in inferring directional edges. We have presented a model which can be used to test for the existence of edges in such data. We have derived asymptotic properties of estimators and tests for this model in a constrained, but still reasonably large subset of the possible models, and we found that the resulting asymptotic approximations performed well in some finite-sample settings. The derivation of necessary and sufficient conditions for identifiability in the fully general case is a topic for future research.

The identifiability problem is also found in structural equation modeling (SEM) (Bentler and Weeks, 1980). SEM is similar to a VAR model in that it specifies relationships among a set of variables. SEM, however, relates the variables without considering the time element. If $X$ is the collection of random variables observed, then a linear structural equation model with Gaussian noise can be written as

\begin{matrix} X & = & A X + ε, \\ ε & \sim & N (0, D) . \end{matrix}

Seeking identifiability conditions has been the subject of a number of papers (Brito and Pearl, 2012; Drton and Weihs, 2016; Drton et al., 2011), and general conditions for this class of models are not known.
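To make the parallel concrete, the covariance implied by the structural equation model can be written out. The following is a short derivation under the assumption that $I - A$ is invertible; the VAR(1) form shown for comparison, $X_t = A X_{t-1} + ε_t$ with $ε_t \sim N(0, D)$, is assumed to match the model used earlier in the article.

```latex
% SEM: solve X = A X + \varepsilon for X, assuming I - A is invertible.
X = (I - A)^{-1} \varepsilon, \qquad
\operatorname{Cov}(X) = (I - A)^{-1} D \, (I - A)^{-\top}.

% VAR(1) at equilibrium: the stationary covariance \Sigma satisfies
\Sigma = A \Sigma A^{\top} + D.
```

In both cases the data inform only a single covariance matrix, and the question is whether the pair $(A, D)$ can be recovered uniquely from it.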

We have shown that even in cases where the theory has not been validated, the VAR equilibrium method can still identify true relationships among genes. This is evident in the analysis of the GeneNetWeaver data, where true edges were identified more consistently than extraneous edges.

If the network is partially known, then the likelihood ratio statistic can be used to test for extra edges and thus build up a more comprehensive view of the network. As an example, we may want to learn about a network perturbed with a certain drug. If the unperturbed network is known, we can test edges in and around that network using the perturbation data to learn about the changes induced by the drug. Since we may not expect the entire network to change, the VAR equilibrium method takes advantage of prior knowledge about the network structure.

Applying the VAR equilibrium-based likelihood ratio test method to steady-state data without a known network to start from will take further development. One way to go about this would be to take a subset of genes and look at all possible network structures for those genes. For each model, the steady-state data would be used to maximize the likelihood. One could then combine the models using a Bayesian model averaging (BMA) approach. To do this, the integrated likelihood needs to be approximated for each model. As a starting point, the Bayesian Information Criterion (BIC) could be used. The BIC is equal to $-2 \log \hat{L} + k \log n$, where $\hat{L}$ is the maximized likelihood for the model, $k$ is equal to $2p$ plus the number of edges in the network, and $n$ is the number of observations. Averaging over all models would result in a posterior probability for each edge in the network.
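The BIC-based averaging step can be sketched as follows. This is an illustration rather than a proposed implementation: it assumes equal prior probability for every model and uses the standard approximation that a model's posterior weight is proportional to $\exp(-\mathrm{BIC}/2)$. The candidate models and log-likelihood values are hypothetical.

```python
import math

def bic(loglik, n_edges, p, n):
    """BIC = -2 log L + k log n, with k = 2p + number of edges."""
    k = 2 * p + n_edges
    return -2.0 * loglik + k * math.log(n)

def edge_posteriors(models, p, n):
    """Approximate posterior edge probabilities from BIC weights.

    models: list of (set of edges, maximized log-likelihood).
    Assumes equal prior probability for every model.
    """
    bics = [bic(ll, len(edges), p, n) for edges, ll in models]
    best = min(bics)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    weights = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(weights)
    probs = {}
    for (edges, _), w in zip(models, weights):
        for e in edges:
            probs[e] = probs.get(e, 0.0) + w / total
    return probs

# Hypothetical candidate models for three genes and 100 observations.
models = [
    ({(0, 1)}, -300.0),
    ({(0, 1), (1, 2)}, -296.0),
    ({(1, 2)}, -305.0),
]
probs = edge_posteriors(models, p=3, n=100)
```

Each edge's posterior probability is the total weight of the models that contain it, so an edge that appears in every well-supported model receives a probability close to one.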

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

This research was supported by NIH grants U54-HL127624, R01-HD054511 and R01-HD070936. Raftery's research was also partly supported by the Center for Advanced Study in the Behavioral Sciences at Stanford University.

References

Anderson (2000) A note on a vector-variate normal distribution and a stationary autoregressive process. Journal of Multivariate Analysis, 72, 149–150.

Basso, Margolin, Stolovitzky, Klein, Dalla-Favera and Califano (2005) Reverse engineering of regulatory networks in human B cells. Nature Genetics, 37, 382–390.

Bentler and Weeks (1980) Linear structural equations with latent variables. Psychometrika, 45, 289–308.

Brito and Pearl (2012) Graphical condition for identification in recursive SEM. arXiv preprint arXiv:1206.6821.

Broyden (1970) The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6, 76–90.

Drton, Foygel and Sullivant (2011) Global identifiability of linear structural equation models. Annals of Statistics, 39, 865–886.

Drton and Weihs (2016) Generic identifiability of linear structural equation models by ancestor decomposition. Scandinavian Journal of Statistics, 43, 1035–1045.

Faith, Hayete, Thaden, Mogno, Wierzbowski, Cottarel, Kasif, Collins and Gardner (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biology, 5, e8.

Ferguson (1996) A Course in Large Sample Theory. Vol. 49. London: Chapman & Hall.

Fletcher (1970) A new approach to variable metric algorithms. The Computer Journal, 13, 317–322.

Friedman, Hastie and Tibshirani (2010) Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.

Goldfarb (1970) A family of variable-metric methods derived by variational means. Mathematics of Computation, 24, 23–26.

Wei (2012) A Gaussian inequality for expected absolute products. Journal of Theoretical Probability, 25, 92–99.

Lütkepohl (2005) New Introduction to Multiple Time Series Analysis. Berlin: Springer Science & Business Media.

Marbach, Schaffter, Mattiussi and Floreano (2009) Generating realistic in silico gene networks for performance assessment of reverse engineering methods. Journal of Computational Biology, 16, 229–239.

Margolin, Nemenman, Basso, Wiggins, Stolovitzky, Favera and Califano (2006) ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7, 1–10.

Meyer, Kontos, Lafitte and Bontempi (2007) Information-theoretic inference of large transcriptional regulatory networks. EURASIP Journal on Bioinformatics and Systems Biology, 2007, 1–9.

Michailidis and d'Alché-Buc (2013) Autoregressive models for gene regulatory network inference: Sparsity, stability and causality issues. Mathematical Biosciences, 246, 326–334.

Omranian, Eloundou-Mbebi, Mueller-Roeber and Nikoloski (2016) Gene regulatory network inference using fused LASSO on multiple data sets. Scientific Reports, 6, 20533.

Schaffter, Marbach and Floreano (2011) GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 27, 2263–2270.

Sheppard (2013) Financial econometrics notes. URL http://www.kevinsheppard.com/MFE (last accessed on 4 May 2018).

Shojaie, Jauhiainen, Kallitsis and Michailidis (2014) Inferring regulatory networks by combining perturbation screens and steady state gene expression profiles. PLoS One, 9, e82393.

Singh and Vidyasagar (2016) bLARS: An algorithm to infer gene regulatory networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 13, 301–314.

Tong and Liu (eds) (1992) Identifiability of a set of quadratic equations with unknown coefficients. In Circuits and Systems, 1992. ISCAS'92. Proceedings, 1992 IEEE International Symposium on. Vol. 1, pages 292–295. IEEE. doi: 10.1109/ISCAS.1992.229956

Tusher, Tibshirani and Chu (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116–5121.

Yeung, Dombek, Mittler, Zhu, Schadt, Bumgarner and Raftery (2011) Construction of regulatory networks using expression time-series data of a genotyped population. Proceedings of the National Academy of Sciences, 108, 19436–19441.

Young, Raftery and Yeung (2014) Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Systems Biology, 8, 47.

Young, Raftery and Yeung (2016) A posterior probability approach for gene regulatory network inference in genetic perturbation data. Mathematical Biosciences and Engineering, 13, 1241–1251.

Identifying dynamical time series model parameters from equilibrium samples, with application to gene regulatory networks
