Sage Journals: Discover world-class research

Abstract

The network autocorrelation model has been the workhorse for estimating and testing the strength of theories of social influence in a network. In many network studies, different types of social influence are present simultaneously and can be modeled using various connectivity matrices. Often, researchers have expectations about the order of strength of these different influence mechanisms. However, currently available methods cannot be applied to test a specific order of social influence in a network. In this article, the authors first present flexible Bayesian techniques for estimating network autocorrelation models with multiple network autocorrelation parameters. Second, they develop new Bayes factors that allow researchers to test hypotheses with order constraints on the network autocorrelation parameters in a direct manner. Concomitantly, the authors give efficient algorithms for sampling from the posterior distributions and for computing the Bayes factors. Simulation results suggest that frequentist properties of Bayesian estimators on the basis of noninformative priors for the network autocorrelation parameters are overall slightly superior to those based on maximum likelihood estimation. Furthermore, when testing statistical hypotheses, the Bayes factors show consistent behavior with evidence for a true data-generating hypothesis increasing with the sample size. Finally, the authors illustrate their methods using a data set from economic growth theory.

Keywords

network autocorrelation model hypothesis testing Bayes factor order hypotheses empirical Bayes

1. Introduction

Social network research plays an important role in understanding how countries, organizations, and persons influence one another’s behavior, decision making, and well-being. The network autocorrelation model (Doreian 1981; Ord 1975) has been the workhorse for estimating and testing the strength of social influence with respect to a variable of interest in a given network (Fujimoto, Chou, and Valente 2011). In the network autocorrelation model, actors’ behavior, decision making, or well-being is assumed to be correlated, and a network autocorrelation parameter $ρ$ is estimated, representing the strength of a social influence mechanism in the network. The network autocorrelation model has been used to analyze network influence on individual behavior across many different fields, such as criminology (Tita and Radil 2011), ecology (McPherson and Nieswiadomy 2005), economics (Kalenkoski and Lacombe 2008), geography (Mur, López, and Angulo 2008), organization studies (Mizruchi and Stearns 2006), political science (Gimpel and Schuknecht 2003), and sociology (Burt and Doreian 1982).

Even though the network autocorrelation model has yielded many useful findings, the standard, or first-order, specification of the model implicitly assumes the presence of only a single network influence mechanism. However, this may be too restrictive in many cases, as different types of social influence are likely to be present simultaneously. For example, an actor is often a member of multiple distinct but potentially overlapping networks, such as a collaboration network, a friendship network, or an information-sharing network. Similarly, ties need not only be defined by social interaction but can also refer to geographical proximity, joint memberships, or money flows. Each of these networks may have some connection to the variable of interest; hence, a model that ignores multiple influence mechanisms might be overly simplistic. Besides the fact that individuals are often members of multiple, potentially overlapping, networks, it is also the case that many networks are characterized by subgroups. For example, children in school classes may belong to different social classes. We might ask if, with respect to school performance, children of socially disadvantaged backgrounds influence one another on the basis of the same influence mechanism, say friendship, stronger than do children from more privileged backgrounds. Another example of grouping can be found in economic growth theory: with respect to economic growth, central nations are expected to be subject to different processes than are peripheral developing nations (Dall’erba, Percoco, and Piras 2009; Leenders 1995).

The network autocorrelation model can be straightforwardly extended to include multiple influence mechanisms and different subgroups within a network. Anselin (2001), Badinger and Egger (2013), Elhorst, Lacombe, and Piras (2012), Gupta and Robinson (2015), Hepple (1995b), Lee and Liu (2010), and Leenders (1995) provided theoretical discussions of and frequentist estimation procedures for these higher-order network autocorrelation models, and empirical applications can be found in, for example, Beck, Gleditsch, and Beardsley (2006), Dall’erba et al. (2009), Lacombe (2004), LeSage and Pace (2008), Lin, Wu, and Lee (2006), and Tita and Radil (2011). There has been a steady evolution of Bayesian estimation and model selection techniques for the first-order network autocorrelation model as well as the encompassing spatial Durbin model over the past 25 years (for a taxonomy of spatial regression models, see Elhorst 2010), most notably in Dittrich, Leenders, and Mulder (2017a, 2017b), Hepple (1995a, 1995b), LeSage (1997a), LeSage and Pace (2007), and LeSage and Parent (2007), but comparable advances for higher-order models are nearly nonexistent. Only recently, Han, Hsieh, and Lee (2017) developed an efficient Bayesian sampler for higher-order network autocorrelation models on the basis of the exchange algorithm in Murray, Ghahramani, and MacKay (2006), providing a fast Bayesian estimation of the model when networks are large. However, despite the usefulness of the sampler in Han et al. (2017), their approach has two important limitations: first, it does not permit researchers to include substantial prior information about the network autocorrelation parameters to their analyses; second, it is restricted to testing nested hypotheses on these parameters only, which prohibits researchers from testing more intricate hypotheses on the network effects.

In this article, we develop a fully Bayesian framework that allows the inclusion of external prior information for estimating higher-order network autocorrelation models and for simultaneously testing multiple non-nested constraints on the relative order of network effects, such as $H_{0} : ρ_{1} = ρ_{2} = 0$ , $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , or $H_{3} : ρ_{1} = ρ_{2} > 0$ , where $ρ_{1}$ and $ρ_{2}$ quantify the strength of different influence mechanisms, respectively. Using a Bayesian approach for estimating and testing higher-order network autocorrelation models has several advantages compared with classical methods such as maximum likelihood estimation and null hypothesis significance testing. First, in contrast to maximum likelihood estimation of higher-order models, Bayesian estimation does not rely on asymptotic theory for computing standard errors of the network autocorrelation parameters, potentially resulting in a more accurate quantification of parameter uncertainty in small networks. Second, unlike null hypothesis significance testing, Bayes factors allow researchers to quantify relative evidence in the data in favor of the null, or any other, hypothesis against another hypothesis (Kass and Raftery 1995), and Bayes factors can be easily extended to test more than two hypotheses against each other simultaneously (Raftery, Madigan, and Hoeting 1997). Hence, this enables researchers to precisely test multiple network operationalizations against one another. Third, Bayes factors have been proven to be very effective for testing hypotheses with order constraints on the parameters of interest (Braeken, Mulder, and Wood 2015; Klugkist, Laudy, and Hoijtink 2005; Mulder 2016; Mulder and Wagenmakers 2016).

For example, in a simple research design in which regions are divided into higher-productivity and lower-productivity regions, it would be reasonable to expect differing levels of influence within and between the two sets of regions. In such a setup, one could hypothesize that higher-productivity regions might influence one another’s policies more strongly than lower-productivity regions would influence one another. Moreover, one could argue that the influence of higher-productivity regions on lower-productivity ones is likely higher than that of lower-productivity regions on higher-productivity regions. Of course, one could formulate competing hypotheses, such as that all network autocorrelations are zero, that they are nonzero but equal to one another, or that there is neither influence of lower-productivity regions on higher-productivity ones nor influence among lower-productivity regions themselves. This cannot be done using classical tests and is of particular importance in higher-order network autocorrelation models, as in this setting, researchers often have expectations about the order of strength of different network effects. Such expectations are implicit in most research and Bayes factors permit researchers to state them as actual hypotheses and then test them in a precise and straightforward manner. Beyond allowing researchers to test the more interesting hypotheses they may already have, we hope the availability of this approach will also stimulate researchers to theorize more creatively and more precisely about social influence phenomena, knowing their hypotheses can be easily and correctly tested against one another.

Thus, we propose Bayes factors for testing multiple hypotheses on the relative importance of network influence in a given network. The presented methodology not only allows a researcher to conclude if there is evidence in the data for, or against, nonzero network autocorrelations in the network, but it also grants the researcher the opportunity to simultaneously test any number of competing hypotheses on the relative strength of the network effects against one another. Subsequently, we conduct an extensive simulation study to investigate and show the desirable numerical properties of the new procedures, which we then use to reanalyze a data set from the economic growth literature. In addition to motivating and introducing new methodology, another main goal of this article is to make the methods easily available to researchers by providing ready-to-use R code.

We proceed as follows. In the next section, we present higher-order network autocorrelation models in detail before introducing Bayesian estimation and hypothesis testing techniques for the model in Sections 3 and 4. Concomitantly, we provide efficient implementations for estimating higher-order network autocorrelation models and for computing Bayes factors involving order hypotheses on the network autocorrelation parameters. We assess the numerical behavior of the proposed methods in Section 5. In Section 6, we illustrate our approaches with an empirical example, and Section 7 concludes.

2. The Network Autocorrelation Model

2.1. The First-Order Network Autocorrelation Model

Building on a standard linear regression model, the network autocorrelation model relaxes the assumption of independence of observations and allows correlation between them by explicitly using the underlying network structure. More precisely, an actor’s response is modeled as the weighted sum of the actor’s neighbor responses and a linear combination of actor attributes. In mathematical notation, the first-order network autocorrelation model is given by

y = ρ W y + X β + ε, ε ~ N (0_{g}, σ^{2} I_{g}),

(1)

where $y$ is a vector of length $g$ containing the observations for a variable of interest for the $g$ actors in a network, $X$ is a standard $(g \times k)$ design matrix (possibly including a vector of ones in the first column for an intercept term), $β \in R^{k}$ is a vector of $k$ regression coefficients as in standard linear regression, $ε \in R^{g}$ comprises the error terms that are assumed to be independent and identically normally distributed with zero mean and variance of $σ^{2}$ , $0_{g}$ is a vector of zeros of length $g$ , and $I_{g}$ denotes the $(g \times g)$ identity matrix. Furthermore, $W$ is a $(g \times g)$ connectivity matrix, where a nonzero entry $W_{ij}$ amounts to the influence of actor $j$ on actor $i$ and $W_{ii} = 0$ for all $i \in {1, . . ., g}$ . Typically, $W$ is row-standardized; that is, all rows sum to 1, which in this case means that the term $W y$ represents the vector of the actors’ neighbor average responses. Finally, $ρ$ is the network autocorrelation parameter and quantifies the magnitude of the network influence with respect to a variable of interest in a given network as induced by $W$ . For a substantive interpretation of the model, see Leenders (1995, 2002).

The model’s likelihood is multivariate normal and can be written as

f (y | ρ, σ^{2}, β) = | \det (A_{ρ}) | {(2 π σ^{2})}^{- \frac{g}{2}} \exp (- \frac{1}{2 σ^{2}} {(A_{ρ} y - X β)}^{T} (A_{ρ} y - X β)),

(2)

where $A_{ρ} : = I_{g} - ρ W$ (see, e.g., Doreian 1981). Usually, the parameter space of $ρ$ is chosen as the interval around $ρ = 0$ for which $A_{ρ}$ is nonsingular (Hepple 1995a; LeSage and Parent 2007; Smith 2009). The bounds of this feasible range of $ρ$ are determined by the eigenvalues of $W$ with the smallest and largest real part, respectively, which means that $ρ$ must be contained in $(1 / Re (λ_{g} [W]), 1 / Re (λ_{1} [W]))$ , where $λ_{1} [W], . . ., λ_{g} [W]$ denote the eigenvalues of $W$ with $Re (λ_{1} [W]) \geq . . . \geq Re (λ_{g} [W])$ (Hepple 1995a). The model’s overall parameter space of $θ : = (ρ, σ^{2}, β)$ is then given by $Θ : = Θ_{ρ} \times Θ_{σ^{2}} \times Θ_{β} = (1 / Re (λ_{g} [W]), 1 / Re (λ_{1} [W])) \times (0, \infty) \times R^{k}$ .¹

2.2. Higher-Order Network Autocorrelation Models

The standard, or first-order, network autocorrelation model in equation (1) is limited to a single network autocorrelation parameter $ρ$ and a single connectivity matrix $W$ . Hence, in this model the network influence is assumed to be homogeneously distributed across the network on the basis of a single influence mechanism. Extending the first-order model to higher-order network autocorrelation models allows a richer dependence structure by including multiple connectivity matrices, representing different influence mechanisms (e.g., geographic adjacency and social similarity).² This amounts to the functional form

y = \sum_{r = 1}^{R} ρ_{r} W_{r} y + X β + ε, ε ~ N (0_{g}, σ^{2} I_{g}),

(3)

where ${W_{r}}_{r}$ are distinct connectivity matrices, and the corresponding network autocorrelation parameters ${ρ_{r}}_{r}$ denote the strength of the different influence mechanisms.

In practice, there can be overlap between connectivity matrices; that is, different connectivity matrices may share common ties. Partially overlapping connectivity matrices do not pose identification problems as long as there is no complete overlap (Elhorst et al. 2012), but overlap does make interpretability of the network autocorrelation parameters more difficult (Elhorst et al. 2012; LeSage and Pace 2011). In particular, partial overlap may result in empirically unlikely negative network autocorrelations (Dittrich et al. 2017a; Elhorst et al. 2012). We analyze the numerical effect of overlapping connectivity matrices on the estimation of and hypothesis tests on $ρ : = (ρ_{1}, . . ., ρ_{R})$ in more detail in a simulation study in Section 5.

Higher-order network autocorrelation models not only allow one to consider multiple influence mechanisms, but they also allow researchers to partition a network into several subgroups. In the latter case, possible heterogeneity in network influence strength is included in the model by allowing for different levels of network autocorrelation within and between subgroups for a given influence mechanism (e.g., geographic adjacency). Dividing actors in a network into $S$ subgroups, with sizes $g_{1}, . . ., g_{S}$ and $\sum_{s = 1}^{S} g_{s} = g$ , a model with multiple subgroups can be expressed using the representation in equation (3) by writing

\begin{matrix} y = [\begin{matrix} y_{1} \\ \dots \\ y_{S} \end{matrix}] = [\begin{matrix} ρ_{11} W_{11} & \dots & ρ_{1 S} W_{1 S} & \dots & \dots & \dots \\ ρ_{S 1} W_{S 1} & \dots & ρ_{SS} W_{SS} \end{matrix}] [\begin{matrix} y_{1} \\ \dots \\ y_{S} \end{matrix}] + X β + ε \\ = (ρ_{11} [\begin{matrix} W_{11} & 0 & \dots \\ 0 & \dots & 0 \end{matrix}] + \dots + ρ_{S S} [\begin{matrix} 0 & \dots & 0 \\ 0 & \dots & W_{S S} \end{matrix}]) [\begin{matrix} y_{1} \\ \dots \\ y_{S} \end{matrix}] + X β + ε, \end{matrix}

where $y_{s}$ is a vector of length $g_{s}$ containing the observations for the $g_{s}$ actors in the $s$ th subgroup of the network, $W_{ss'}$ is a $(g_{s} \times g_{s'})$ connectivity matrix defining the influence relationships between members of subgroup $s'$ and members of subgroup $s$ , and $ρ_{ss'}$ is a network autocorrelation parameter representing the strength of the network influence of the actors in subgroup $s'$ on the actors in subgroup $s$ . Because the sizes of the $S$ subgroups potentially differ, each $W_{ss'}$ is typically row-standardized separately, which removes scale effects and eases direct comparison between the network autocorrelation parameters (McMillen, Singell, and Waddell 2007).

The structure of the likelihood function of higher-order network autocorrelation models remains the same as in the first-order model in equation (2), with $A_{ρ}$ being replaced by $A_{ρ} : = I_{g} - \sum_{r = 1}^{R} ρ_{r} W_{r}$ . As in the first-order model, we define the $R$ -dimensional parameter space of $ρ = (ρ_{1}, . . ., ρ_{R})$ as the space containing the origin for which $A_{ρ}$ is nonsingular. Elhorst et al. (2012) provided a simple procedure for checking if a point $ρ^{*} \in ℝ$ ^R, given $W_{1}, . . ., W_{R}$ , lies in the corresponding feasible parameter space $Θ_{ρ}$ .³ Figure 1 shows two exemplary feasible parameter spaces of $(ρ_{1}, ρ_{2})$ in a second-order network autocorrelation model for simulated data based on nonoverlapping matrices (left) and with 40 percent overlap (right).

Figure 1.

Feasible two-dimensional parameter space $Θ_{(ρ_{1}, ρ_{2})}$ for simulated data based on nonoverlapping connectivity matrices (left) and connectivity matrices with a 40 percent overlap (right).

2.3. Application of a Higher-Order Network Autocorrelation Model: Economic Growth of Labor Productivity

In this subsection, we introduce a data set from the economic growth literature that prompts questions that can readily be answered using the proposed Bayes factor approach. Here, we merely describe the data set and the research questions; in Section 6, we provide solutions to the posed questions.

Dall’erba et al. (2009) used a second-order network autocorrelation model to explain the growth rates of labor productivity in the service industry across 188 European regions in 12 countries from 1980 to 2003. To adequately deal with interregional spillovers, the authors introduced two different spatial weight matrices, $W_{1}$ and $W_{2}$ , “under the assumption that economic interactions decrease very substantially when a national border is passed” (p. 337). They constructed $W_{1}$ using a region’s three nearest neighbors within the same country, and $W_{2}$ was based on the three nearest neighbors in bordering countries. The authors then row-normalized these raw binary connectivity matrices. In addition to an intercept term, Dall’erba et al. (2009) considered four more explanatory variables: the growth rate of market service output in a region, the initial labor productivity gap between the region and the leading region, a measure of the region’s urbanization, and a measure of the region’s accessibility. Thus, their model is given by

y = ρ_{1} W_{1} y + ρ_{2} W_{2} y + β_{1} X_{\cdot 1} + β_{2} X_{\cdot 2} + β_{3} X_{\cdot 3} + β_{4} X_{\cdot 4} + β_{5} X_{\cdot 5} + ε, ε ~ N (0, σ^{2} I),

(4)

where $y \in R^{188}$ is the vector of growth rates of labor productivity in the service industry across the 188 regions, $β \in R^{5}$ represents the vector of the four regression coefficients plus an intercept term, $X \in R^{188 \times 5}$ contains the values for the explanatory variables for the 188 regions, where $X_{. i}, i \in {1, . . ., 5}$ , denotes the $i$ th column of $X$ , $0 \in R^{188}$ is a vector of zeros, and $I \in R^{188 \times 188}$ represents the corresponding identity matrix.

The authors found that the estimate of $ρ_{1}$ , reflecting interactions within the same country, is positive and statistically significant, indicating the presence of positive spatial within-country spillover effects. On the other hand, the estimate of $ρ_{2}$ is very close to zero and statistically not significant. Dall’erba et al. (2009) concluded by saying that “the results obtained also confirm the hypothesis that economic interactions decrease very substantially when a national border is passed (indeed, the coefficient reflecting external spillovers is not statistically significant)” (p. 342). However, to draw this conclusion, one needs to directly test a corresponding hypothesis, for example, $H_{1} : ρ_{1} > ρ_{2} = 0$ , against a (set of) competing hypothesis (hypotheses), such as $H_{0} : ρ_{1} = ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , or (and) $H_{3} : ρ_{1} = ρ_{2} > 0$ . These four hypotheses correspond to the notion of “no network effects” ( $H_{0}$ ), “a positive within-country network effect only” ( $H_{1}$ ), “positive but decreasing network effects after a national border is passed” ( $H_{2}$ ), and “positive and equally strong within-country and between-country network effects” ( $H_{3}$ ). Currently, no formal statistical method is available to directly test such hypotheses on multiple network autocorrelations. In the remainder of this article, we develop a Bayesian framework for testing and quantifying the evidence in the data for such hypotheses involving equality as well as order constraints on the network effects. We come back to this empirical example and test these hypotheses against one another using Bayes factors in Section 6. Finally, Dall’erba et al. (2009) stated that “there is evidence that the coefficients in a growth model are potentially varying for different subsets of the total sample” (p. 342). In Section 6, we investigate if there is such evidence in this data set by considering a network autocorrelation model with two subgroups, allowing for differing levels of network autocorrelation within and between the two subgroups.

3. Bayesian Estimation of Higher-Order Network Autocorrelation Models

3.1. Prior Specification

Bayesian estimation starts with formulating prior expectations about the parameters in a model in terms of prior distributions, or priors. These priors summarize the (lack of) information about the model parameters before observing the data. If such prior information is available (e.g., on the basis of previous literature), informative priors for the parameters of interest can be formulated. For the first-order network autocorrelation model, Dittrich et al. (2017a) performed a literature study, looking at the distribution of reported network autocorrelations across many different fields. On the basis of their results, most of the analyzed data in the literature exhibit positive network autocorrelation between 0 and 0.5, and it seems highly unlikely to observe negative network autocorrelation estimates (as previously noted by, e.g., Neuman and Mizruchi 2010). This information could then be used to formulate an informative prior for $ρ$ in a first-order network autocorrelation model, as Dittrich et al. (2017a, 2017b) did. On the other hand, if such prior information is missing, or a researcher deliberately refrains from adding additional information to the model through the prior, noninformative priors are often used (Gelman et al. 2003). In the network autocorrelation model, $σ^{2}$ and $β$ are commonly assigned the standard noninformative priors $p (σ^{2}) \propto 1 / σ^{2}$ and $p (β) \propto 1$ (Hepple 1995a; Holloway, Shankar, and Rahman 2002; LeSage 1997b). These priors assume that all possible values for $\log (σ^{2})$ and $β$ are equally likely a priori. We also do so throughout this article. Note that these priors are not proper in the sense that they do not integrate to a finite value, which does not affect estimation of the model.

We use a general $R$ -variate normal prior for $ρ$ , $p (ρ) = ϕ_{μ, Σ} (ρ) 1_{Θ_{ρ}} (ρ) c^{- 1}$ , where $ϕ_{μ, Σ} (\cdot)$ denotes the probability density function of a multivariate normal distribution with prior mean $μ$ and prior covariance matrix $Σ$ , $1_{\cdot} (\cdot)$ is the standard indicator function, and $c : = \int_{Θ_{ρ}} ϕ_{μ, Σ} (ρ) d ρ$ is a normalizing constant representing the probability mass of $ϕ_{μ, Σ} (\cdot)$ contained in the network autocorrelation parameters’ space $Θ_{ρ}$ . If researchers have sufficient prior information about the network autocorrelations, they can specify $μ$ and $Σ$ directly. Alternatively, when specifying $Σ$ vaguely enough, that is, with very large diagonal elements, the prior becomes essentially identical to a proper uniform distribution for $ρ$ on the bounded parameter space $Θ_{ρ}$ .⁴

In summary, we use the following priors for the model parameters, which we assume to be a priori independent from one another:

p (ρ) = ϕ_{μ, Σ} (ρ) 1_{Θ_{ρ}} (ρ) c^{- 1},

(5)

p (σ^{2}) \propto 1 / σ^{2},

(6)

\begin{matrix} p (β) & \propto 1, \\ p (ρ, σ^{2}, β) & = p (ρ) \times p (σ^{2}) \times p (β) . \end{matrix}

(7)

3.2. Posterior Computation

After having specified a prior distribution for the model parameters, the information contained in the observed data $y$ is used to update the prior distribution and to arrive at the posterior distribution, or simply posterior. The posterior is used for all Bayesian inference in the model, for example, to obtain point estimates of model parameters (the posterior mean or the posterior median), to construct Bayesian credible intervals (i.e., intervals in the domain of the posterior), or to determine other statistics of interest, such as the probability that one network effect is stronger than another one for given data, $p (ρ_{1} > ρ_{2} | y)$ . In this subsection, we specify the posterior in higher-order network autocorrelation models on the basis of the priors from Section 3.1, and we provide an automatic and efficient scheme to sample from this posterior.

First, Bayes’ theorem gives that the posterior is proportional to the prior multiplied by the likelihood, more precisely

\begin{matrix} p (ρ, σ^{2}, β | y) & = \frac{p (ρ, σ^{2}, β) f (y | ρ, σ^{2}, β)}{\int_{Θ_{ρ}} \int_{Θ_{σ^{2}}} \int_{Θ_{β}} p (ρ, σ^{2}, β) f (y | ρ, σ^{2}, β) d β d σ^{2} d ρ} \\ \propto p (ρ, σ^{2}, β) f (y | ρ, σ^{2}, β) . \end{matrix}

(8)

The denominator of equation (8) is called the marginal likelihood and ensures that the posterior integrates to unity. The marginal likelihood does not depend on any model parameters and can be ignored in Bayesian estimation of the model. On the other hand, when testing hypotheses, the marginal likelihood does play a central role, as it quantifies how plausible the data are under a specific hypothesis, which we discuss in the following section.

Next, using the priors in equations (5), (6), and (7), and the likelihood function in equation (2), we can express the posterior $p (ρ, σ^{2}, β | y)$ in higher-order network autocorrelation models as

\begin{matrix} p (ρ, σ^{2}, β | y) \propto | A_{ρ} | {(σ^{2})}^{- \frac{g}{2} - 1} \\ \exp (- \frac{1}{2} {(ρ - μ)}^{T} Σ^{- 1} (ρ - μ) - \frac{1}{2 σ^{2}} {(A_{ρ} y - X β)}^{T} (A_{ρ} y - X β)) . \end{matrix}

(9)

However, the posterior in equation (9) does not belong to a family of known probability distributions, so we cannot directly infer its posterior mean, its quantiles, or other quantities of interest.⁵ In this case, it is common to sample random draws from the posterior and to use these posterior draws to approximate any desired statistic. An efficient method is to sequentially draw from the conditional posteriors, that is, the posterior of one parameter (block) given the remaining parameters and the data (Gelfand and Smith 1990; Geman and Geman 1984).⁶ Extending the proposed method for the first-order network autocorrelation model in Dittrich et al. (2017a) to higher-order models, we sample the model parameters according to the following blocks: $(ρ, β_{1}), σ^{2}$ , and $\tilde{β}$ , where $β_{1}$ denotes the model’s intercept and $\tilde{β} : = (β_{2}, . . ., β_{k})$ contains the remaining regression coefficients. By simultaneously sampling $ρ$ and $β_{1}$ , we can better capture potential posterior correlation between the network effects as well as potential correlation between the network effects and the intercept (Dittrich et al. 2017a). The conditional posteriors for the proposed blocks are then given by (see, e.g., LeSage 1997a)

\begin{matrix} p (ρ, β_{1} | σ^{2}, \tilde{β}, y) \\ \propto | A_{ρ} | \exp (- \frac{1}{2} {(ρ - μ)}^{T} Σ^{- 1} (ρ - μ) - \frac{1}{2 σ^{2}} {(A_{ρ} y - X β)}^{T} (A_{ρ} y - X β)), \end{matrix}

(10)

p (σ^{2} | ρ, β_{1}, \tilde{β}, y) ~ IG (\frac{g}{2}, \frac{{(A_{ρ} y - X β)}^{T} (A_{ρ} y - X β)}{2}),

(11)

p (\tilde{β} | ρ, β_{1}, σ^{2}, y) ~ N (μ_{\tilde{β}}, Σ_{\tilde{β}}),

(12)

where $IG (\cdot, \cdot)$ denotes the inverse gamma distribution, and $μ_{\tilde{β}}$ and $Σ_{\tilde{β}}$ are given in Appendix A.

Drawing from the conditional posteriors in equations (11) and (12) can be done using standard statistical software. In contrast, the conditional posterior in equation (10) does not have a well-known form and cannot be directly sampled from. Instead, we use the Metropolis-Hastings algorithm (Hastings 1970; Metropolis et al. 1953) to generate draws from the conditional posterior for $(ρ, β_{1})$ . In short, the algorithm generates candidate values for the conditional posterior from a candidate-generating distribution that can be easily sampled from and subsequently accepts, or rejects, the draws with a certain probability. The algorithm’s efficiency mainly depends on the shape of the proposed candidate-generating distribution; if possible, exploiting the form of the conditional posterior and specifying a candidate-generating distribution that closely approximates it results in efficient solutions (Chib and Greenberg 1994, 1995, 1998).

We first approximate $\log (| A_{ρ} |)$ by a quadratic polynomial in $ρ$ by virtue of Jacobi’s formula and the Mercator series (see Appendix A). Next, we observe that the logarithm of the exponential in equation (10) can also be written as a quadratic polynomial in $(ρ, β_{1})$ . Hence, the logarithm of the conditional posterior itself can be approximated by a quadratic polynomial in $(ρ, β_{1})$ . Finally, by equating coefficients of this quadratic polynomial with the log-kernel of the probability density function of an $(R + 1)$ -variate normal distribution, the density in equation (10) can be approximated by an $(R + 1)$ -variate normal candidate-generating density for $(ρ, β_{1})$ that is tailored to the conditional posterior for $(ρ, β_{1})$ .⁷ All details and the full sampling scheme can be found in Appendix A.

We implemented our proposed approach in R (R Core Team 2017) and compared its performance with a sampling scheme that does not block the network autocorrelation parameters and the intercept but uses one-dimensional random walk algorithms to generate draws for each network effect sequentially, as in Zhang et al. (2013). Figure 2 shows exemplary trace plots of posterior draws for $ρ_{1}$ and $ρ_{2}$ on the basis of the two sampling schemes and the data in Dall’erba et al. (2009) with model (4). We see that our method results in a more efficient implementation than drawing each network effect separately, as it generates Markov chains that explore the corresponding parameter space of $(ρ_{1}, ρ_{2})$ much faster. Finally, our approach is fully automatic in the sense that there are no parameters to be tuned in the Metropolis-Hastings algorithm, such as the variance of candidate-generating distributions.

Figure 2.

Trace plots of posterior draws for $ρ_{1}$ and $ρ_{2}$ on the basis of our proposed scheme (top row) and a random walk algorithm (bottom row) for the data in Dall’erba et al. (2009) with model (4).

To conclude, the presented sampling algorithm allows researchers to automatically and efficiently draw from the posterior on the basis of a general multivariate normal prior for the network autocorrelation parameters, including informative as well as noninformative specifications. Such efficient sampling is essential for performing any Bayesian estimation of the model, which solely relies on the generated posterior draws. An R program that implements our scheme is available at https://github.com/DittrichD/BayesNAM.

4. Bayesian Hypothesis Testing in Higher-Order Network Autocorrelation Models

In many network studies, researchers have competing theories about the specific order of different network effect strengths. These theories can be formulated as hypotheses on the network autocorrelation parameters, for example, as $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , or $H_{3} : ρ_{1} = ρ_{2} > 0$ , and can include as many network autocorrelation parameters as relevant to one’s theory. The focus of interest then lies on which substantive theory, or hypothesis, is most plausible and most supported by the data and how strongly. In this section, we consider $T \geq 2$ constrained hypotheses on the network effects, where a hypothesis $H_{t}$ , $t \in {0, . . ., T - 1}$ , contains $q_{t}^{E}$ equality and $q_{t}^{I}$ inequality constraints on $ρ$ , that is,

H_{t} : = {\begin{matrix} R_{t}^{E} ρ = r_{t}^{E} \\ R_{t}^{I} ρ > r_{t}^{I}, \end{matrix}

(13)

where $R_{t}^{E}$ and $r_{t}^{E}$ are a $(q_{t}^{E} \times R)$ matrix and a vector of length $R$ , respectively, containing the coefficients of the $q_{t}^{E}$ equality constraints under hypothesis $H_{t}$ . Equivalently, the $(q_{t}^{I} \times R)$ matrix $R_{t}^{I}$ and the vector $r_{t}^{I}$ contain the coefficients of the $q_{t}^{I}$ inequality constraints. For example, the constraints induced by the three hypotheses $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , and $H_{3} : ρ_{1} = ρ_{2} > 0$ can be represented by equation (13) as⁸

H_{1} : R_{1}^{E} = (0, 1), r_{1}^{E} = 0, R_{1}^{I} = (1, 0), r_{1}^{I} = 0,

H_{2} : R_{2}^{I} = (\begin{matrix} 1 & - 1 \\ 0 & 1 \end{matrix}), r_{2}^{I} = (0, 0),

H_{3} : R_{3}^{E} = (1, - 1), r_{3}^{E} = 0, R_{3}^{I} = (1, 0), r_{3}^{I} = 0 .

Bayes factors for testing hypotheses of the form in equation (13) have also been proposed for other types of parameters, such as group variances (Böing-Messing et al. 2017), intraclass correlations (Mulder and Fox 2018), and regression effects (Mulder, Hoijtink, and Klugkist 2010).

4.1. The Bayes Factor

The Bayes factor is a comparative Bayesian hypothesis testing criterion that directly quantifies the relative evidence in the data in favor of a hypothesis. The Bayes factor of hypothesis $H_{t}$ against hypothesis $H_{t'}$ , $t, t' \in {0, . . ., T - 1}$ , is defined as the ratio of the marginal likelihoods under the two hypotheses, that is, in the network autocorrelation model as

\begin{matrix} B_{tt'} = \frac{m_{t} (y)}{m_{t'} (y)} \\ = \frac{\int_{Θ_{ρ_{t}}} \int_{0}^{\infty} \int_{R^{k}} p_{t} (ρ_{t}) p (σ^{2}) p (β) f (y | ρ_{t}, σ^{2}, β) d β d σ^{2} d ρ_{t}}{\int_{Θ_{ρ_{t'}}} \int_{0}^{\infty} \int_{R^{k}} p_{t'} (ρ_{t'}) p (σ^{2}) p (β) f (y | ρ_{t'}, σ^{2}, β) d β d σ^{2} d ρ_{t'}}, \end{matrix}

(14)

where $ρ_{t}$ are the network autocorrelation parameters under hypothesis $H_{t}$ , $p_{t} (ρ_{t})$ denotes their prior density, and $Θ_{ρ_{t}}$ is the corresponding parameter space (Kass and Raftery 1995). We assume common priors for $σ^{2}$ and $β$ under both hypothesis $H_{t}$ and hypothesis $H_{t'}$ as they are seen as nuisance parameters in the presented framework. The exact form of the priors for these nuisance parameters typically does not alter the magnitude of the Bayes factor (Kass and Raftery 1995).

The marginal likelihood under hypothesis $H_{t}$ , $m_{t} (y)$ , is a weighted average likelihood over the parameter space under hypothesis $H_{t}$ , with the prior $p_{t} (ρ_{t})$ under hypothesis $H_{t}$ acting as a weight function. Therefore, it can be interpreted as the probability that the data were observed under hypothesis $H_{t}$ . Hence, the Bayes factor, as the ratio of two marginal likelihoods, quantifies the relative evidence that the data were observed under hypothesis $H_{t}$ rather than hypothesis $H_{t'}$ . For example, when $B_{tt'} = 5$ , this indicates that the data are five times more likely to have occurred under hypothesis $H_{t}$ compared with hypothesis $H_{t'}$ . Conversely, when $B_{tt'} = 1 / 5$ , it is five times more likely to have observed the data under hypothesis $H_{t'}$ than under hypothesis $H_{t}$ .

To facilitate interpretation of the Bayes factor, Jeffreys (1961) proposed a classification scheme that groups Bayes factors into different categories (see Table 1). For example, there is “strong” evidence in the data for hypothesis $H_{t}$ , relative to hypothesis $H_{t'}$ , when $B_{tt'} > 10$ and, equivalently, “strong” relative evidence in the data for hypothesis $H_{t'}$ when $B_{tt'} < 1 / 10$ . This grouping provides verbal descriptions and rules of thumb when speaking of relative evidence in the data in favor of a hypothesis, but it is still somewhat arbitrary. Ultimately, the interpretation of the magnitude of a Bayes factor should hinge on the context of the research question (Kass and Raftery 1995). For some introductory texts on Bayes factor testing in social science research, we refer the interested reader to Braeken et al. (2015), Raftery (1995), van de Schoot et al. (2011), or Wagenmakers (2007).

Table 1.

Evidence Categories for the Bayes Factor $B F_{tt'}$ as Given by Jeffreys (1961)

$B F_{tt'}$	$\log (B F_{tt'})$	Interpretation
>100	>4.61	Decisive evidence for hypothesis $H_{t}$
30 to 100	3.40 to 4.61	Very strong evidence for hypothesis $H_{t}$
10 to 30	2.30 to 3.40	Strong evidence for hypothesis $H_{t}$
3 to 10	1.10 to 2.30	Substantial evidence for hypothesis $H_{t}$
1 to 3	0 to 1.10	Not worth more than a bare mention
1/3 to 1	−1.10 to 0	Not worth more than a bare mention
1/10 to 1/3	−2.30 to −1.10	Substantial evidence for hypothesis $H_{t'}$
1/30 to 1/10	−3.40 to −2.30	Strong evidence for hypothesis $H_{t'}$
1/100 to 1/30	−4.61 to −3.40	Very strong evidence for hypothesis $H_{t'}$
<1/100	<–4.61	Decisive evidence for hypothesis $H_{t'}$

4.2. Bayes Factor Computation

In this section, we present efficient methods to compute marginal likelihoods and Bayes factors in higher-order network autocorrelation models. Using a multivariate normal prior for $ρ_{t}$ under hypothesis $H_{t}$ , $p_{t} (ρ_{t}) = ϕ_{μ_{t}, Σ_{t}} (ρ_{t}) 1_{Θ_{ρ_{t}}} (ρ_{t}) c_{t}^{- 1}$ , $c_{t} : = \int_{Θ_{ρ_{t}}} ϕ_{μ_{t}, Σ_{t}} (ρ_{t}) d ρ_{t}$ , the noninformative prior $p (σ^{2}, β) \propto 1 / σ^{2}$ for the nuisance parameters $σ^{2}$ and $β$ , and after analytically integrating out $σ^{2}$ and $β$ , the Bayes factor of hypothesis $H_{t}$ against hypothesis $H_{t'}$ in equation (14) reduces to

\begin{matrix} B_{tt'} = \frac{m_{t} (y)}{m_{t'} (y)} \\ = \frac{c_{t'} {| 2 π Σ_{t} |}^{- \frac{1}{2}}}{c_{t} {| 2 π Σ_{t'} |}^{- \frac{1}{2}}} \\ \frac{\int_{Θ_{ρ_{t}}} | A_{ρ_{t}} | \exp (- \frac{1}{2} {(ρ_{t} - μ_{t})}^{T} Σ_{t}^{- 1} (ρ_{t} - μ_{t})) y^{T} A_{ρ_{t}}^{T} M A_{ρ_{t}} y^{- \frac{g - k}{2}} d ρ_{t}}{\int_{Θ_{ρ_{t'}}} | A_{ρ_{t'}} | \exp (- \frac{1}{2} {(ρ_{t'} - μ_{t'})}^{T} Σ_{t'}^{- 1} (ρ_{t'} - μ_{t'})) y^{T} A_{ρ_{t'}}^{T} M A_{ρ_{t'}} y^{- \frac{g - k}{2}} d ρ_{t'}}, \end{matrix} (15)

(15)

where $M : = I_{g} - X {(X^{T} X)}^{- 1} X^{T}$ .⁹

The normalizing constants $c_{t}$ and $c_{t'}$ in equation (15) correspond to the prior probabilities that the unconstrained priors for $ρ_{t}$ under hypothesis $H_{t}$ and for $ρ_{t'}$ under hypothesis $H_{t'}$ , $N (μ_{t}, Σ_{t})$ and $N (μ_{t'}, Σ_{t'})$ , are in agreement with the constraints imposed under the two hypotheses. They can be approximated by simple rejection sampling, that is, by sampling draws from the unconstrained priors and recording the proportions of draws that are in agreement with the constraints. The remaining integrals in the numerator and denominator of equation (15) do not have closed-form solutions and have to be evaluated numerically. For this purpose, we rely on an importance sampling procedure (Owen and Zhou 2000) that is explained next.

Let $h_{t} (ρ_{t}) : = | A_{ρ_{t}} | \exp (- \frac{1}{2} {(ρ_{t} - μ_{t})}^{T} Σ_{t}^{- 1} (ρ_{t} - μ_{t})) y^{T} A_{ρ_{t}}^{T} M A_{ρ_{t}} y^{- \frac{g - k}{2}}$ denote the integrand in the numerator of equation (15) (all steps apply equivalently to $h_{t'} (ρ_{t'})$ ). Then, we can write for the numerator of equation (15)

\begin{matrix} I_{t} : = \int_{Θ_{ρ_{t}}} h_{t} (ρ_{t}) d ρ_{t} = \int_{Θ_{ρ_{t}}} q_{t} (ρ_{t}) \frac{h_{t} (ρ_{t})}{q_{t} (ρ_{t})} d ρ_{t} = E [\frac{h_{t} (P)}{q_{t} (P)}] \\ \approx N^{- 1} \sum_{i = 1}^{N} \frac{h_{t} (ρ_{i})}{q_{t} (ρ_{i})} : = {\hat{I}}_{t}, \end{matrix}

(16)

where $P$ is a random variable with probability density function $q_{t} (\cdot)$ known as the importance density, $E [h_{t} (P) / q_{t} (P)]$ denotes the expected value for $h_{t} (P) / q_{t} (P)$ , and $ρ_{i}$ are draws from $q_{t} (\cdot)$ , forming realizations of $P$ . The specification of the importance density is crucial for the algorithm’s efficiency, where we aim to construct a density that closely follows the actual integrand but has heavier tails than the latter and is easy to sample from (Owen and Zhou 2000).

As in Section 3.2, we approximate $\log (| A_{ρ_{t}} |)$ by a second-order polynomial in $ρ_{t}$ at its maximum, the origin. This results in a normal approximation of $| A_{ρ_{t}} |$ . We apply the same rationale to the third term in $h_{t} (ρ_{t})$ , $y^{T} A_{ρ_{t}}^{T} M A_{ρ_{t}} y^{- (g - k) / 2}$ . Hence, $h_{t} (ρ_{t})$ can be approximated by the product of three multivariate normal densities that itself is a multivariate normal density, which we use as importance density in equation (16).¹⁰ Finally, as $ρ_{t}$ approaches the boundary of $Θ_{ρ_{t}}$ , the proposed normal importance density has heavier tails than $h_{t} (ρ_{t})$ , because in this case $| A_{ρ_{t}} |$ decreases toward zero, but the normal importance density does not. This ensures a finite variance of the importance sampling estimate ${\hat{I}}_{t}$ and reliable estimation of the associated Bayes factors. All details can be found in Appendix B.

4.3. A Default Prior for $ρ$

When testing multiple hypotheses against one another, a prior for the tested model parameters has to be specified under each hypothesis. Arguably, eliciting a prior under each hypothesis directly can become difficult and cumbersome, especially with a large number of hypotheses at hand. As an alternative, we propose an automatic empirical Bayes procedure (Carlin and Louis 2000) for constructing a default prior $p_{t} (ρ_{t})$ under each hypothesis $H_{t}$ such that the marginal likelihood under every hypothesis $H_{t}$ is maximized.

First, we center the multivariate normal default prior $p_{t} (ρ_{t})$ under hypothesis $H_{t}$ around the origin. The motivation for this choice is that the origin is located at the boundary of typical (in)equality constrained hypotheses in the network autocorrelation model, such as $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , or $H_{3} : ρ_{1} = ρ_{2} > 0$ , and previous literature on order constrained hypothesis testing suggests “there is a gain of evidence for the inequality constrained hypothesis that is supported by the data when the unconstrained prior is located on the boundary” (Mulder 2014:452). Second, in contrast to Bayesian estimation, assigning very large values to the diagonal elements of the prior’s covariance matrix $Σ_{t}$ is not feasible in hypothesis testing. In hypothesis testing, we need to explicitly calculate the normalizing constant $c_{t} = \int_{Θ_{ρ_{t}}} ϕ_{μ_{t}, Σ_{t}} (ρ_{t}) d ρ_{t}$ , and a vague formulation of $Σ_{t}$ makes this computation either unstable or tremendously time consuming because of the fairly small parameter space $Θ_{ρ_{t}}$ .¹¹ Instead, we set the prior covariance matrix $Σ_{t}$ of the free network autocorrelation parameter(s) under a hypothesis, for example, $ρ_{1}$ under hypothesis $H_{1} : ρ_{1} > ρ_{2} = 0$ , to the product of the corresponding asymptotic variance-covariance matrix of the maximum likelihood estimate of $ρ_{t}$ and a hypothesis-specific scaling factor $σ_{t}^{2}$ , similar to Zellner’s g-prior (Zellner 1986). In mathematical notation, $Σ_{t} = σ_{t}^{2} I {(ρ_{t})}^{- 1}$ , where $I (ρ_{t})$ denotes the submatrix of the network autocorrelation model’s Fisher information matrix $I (ρ_{t}, σ^{2}, β)$ . Hence, there is only one free parameter in the prior specification of $Σ_{t}$ left, $σ_{t}^{2}$ . Following Hansen and Yu (2001) and Liang et al. (2008), we use a local empirical Bayes approach and choose $σ_{t}^{2}$ such that the associated marginal likelihood $m_{t} (y)$ is maximized, avoiding arbitrary prior specification. Because there is no analytic solution to this maximization problem, one way to approximate the maximum of $m_{t} (y)$ is to compute the marginal likelihood on a grid of increasing values for $σ_{t}^{2}$ until a stopping rule is reached, for example, until the marginal likelihood is not increasing anymore, or until it is not increasing by more than some tolerance factor.¹²

Figure 3 shows the marginal likelihoods under the three constrained hypotheses $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , and $H_{3} : ρ_{1} = ρ_{2} > 0$ ; the marginal likelihood under an unconstrained hypothesis $H_{u} : (ρ_{1}, ρ_{2}) \in Θ_{(ρ_{1}, ρ_{2})}$ ; and the logarithm of the Bayes factors of the three constrained hypotheses against the unconstrained hypothesis $H_{u}$ as a function of $σ_{t}^{2}$ , $t \in {1, 2, 3, u}$ , for the data in Dall’erba et al. (2009) with model (4). All of the marginal likelihoods sharply increase for smaller values for $σ_{t}^{2}$ before they gradually decrease after having reached their respective maxima. The associated Bayes factors, in which we are ultimately interested, appear fairly robust to the choice of $σ_{t}^{2}$ , except for extremely small values for $σ_{t}^{2}$ . For the vast majority of data sets we looked at, we observed essentially the same pattern, with almost all the optimal values for $σ_{t}^{2}$ lying between 2 and 10.

Figure 3.

Marginal likelihoods $m_{t} (y)$ , $t \in {1, 2, 3, u}$ , under the hypotheses $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , $H_{3} : ρ_{1} = ρ_{2} > 0$ , and $H_{u} : (ρ_{1}, ρ_{2}) \in Θ_{(ρ_{1}, ρ_{2})}$ as a function of $σ_{t}^{2}$ (left), and the logarithm of the Bayes factors $\log (B F_{1 u}), \log (B F_{2 u}),$ and $\log (B F_{3 u})$ as a function of $σ_{t}^{2}$ (right) for the data in Dall’erba et al. (2009) with model (4).

In summary, we showed how Bayes factors can be used to quantify the evidence in the data for hypotheses with order constraints on the network autocorrelation parameters. In addition, we provided methodology to efficiently compute such Bayes factors without any need to subjectively elicit priors for the network effects. This ultimately allows network scholars to test and verify any kind of expectations they have about the strength of different network effects. An R program that implements our methodology will be available at https://github.com/DittrichD/BayesNAM.

5. Simulation Study

We performed a simulation study to investigate the performance of the proposed Bayesian estimator and Bayes factors in a second-order network autocorrelation model. First, we compared the Bayesian estimator from Section 3.2 with the maximum likelihood estimator in terms of bias of the network effects and frequentist coverage of the corresponding credible and confidence intervals. Here, we use the term coverage to indicate the proportion of times in which the true, that is, data-generating, network effects were contained in the credible and confidence intervals. Second, because researchers are generally interested in testing whether (some) network effects are zero or whether one network effect is larger than another, we considered a multiple hypothesis test with the following five hypotheses: $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , $H_{3} : ρ_{1} = ρ_{2} > 0$ , $H_{4} : 0 < ρ_{1} < ρ_{2}$ , and $H_{5} : 0 = ρ_{1} < ρ_{2}$ . We investigated if and how fast the different Bayes factors converge to a true data-generating hypothesis and how robust these findings are to various degrees of overlap between two connectivity matrices.

5.1. Study Design

In our simulation study, we generated data $y$ via $y = A_{(ρ_{1}, ρ_{2})}^{- 1} (X β + ε), ε ~ N (0_{g}, I_{g})$ , for four network sizes $g$ ( $g \in {50, 100, 200, 400}$ ), three levels of overlap between $W_{1}$ and $W_{2}$ (0 percent, 20 percent, and 40 percent), and both $W_{1}$ and $W_{2}$ having an average degree of four. We simulated random nonsymmetric binary connectivity matrices using the rgraph() function from the sna package in R (Butts 2008), randomly rearranged ties when accounting for overlap, and subsequently row-standardized the raw connectivity matrices. Furthermore, we drew independent values from a standard normal distribution for the elements of $X \in R^{g \times 4}$ (excluding the first column which is a vector of ones), $β \in R^{4}$ , and $ε \in R^{g}$ .

In our first experiment, we set the two network effects to $(ρ_{1}, ρ_{2}) = (0.2, 0.2)$ and simulated 1,000 data sets for each of the 12 scenarios (four network sizes × three levels of overlap × one network effects size).¹³ For the Bayesian estimator, we used the standard improper prior $p (σ^{2}, β) \propto 1 / σ^{2}$ for the nuisance parameters and a noninformative bivariate normal prior for $(ρ_{1}, ρ_{2})$ , $p (ρ_{1}, ρ_{2}) \propto N (0_{2}, 100 \times I_{2})$ , which essentially corresponds to a uniform prior for $(ρ_{1}, ρ_{2}) \in Θ_{(ρ_{1}, ρ_{2})}$ . We drew 1,000 realizations from the resulting posteriors relying on the methods described in Section 3.2, taking the maximum likelihood estimate of $((ρ_{1}, ρ_{2}), σ^{2}, β)$ as the starting value in the sampling algorithm (see Appendix A). We used the marginal posterior median as point estimator and the 95 percent equal-tailed credible interval for coverage analysis. We obtained the maximum likelihood estimates as well as their standard errors and associated asymptotic confidence intervals applying the lnam() function from the sna package in R.

In our second experiment, we considered 41 network effects sizes $(ρ_{1}, ρ_{2})$ ( $(ρ_{1}, ρ_{2})$ $\in {(0.40, 0), (0.39, 0.01), \dots, (0, 0.40)}$ ) and simulated 100 data sets for each of the 492 scenarios (four network sizes × three levels of overlap × 41 network effects sizes). Figure 4 shows the trajectory of the network effects and depicts the five tested hypotheses, $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , $H_{3} : ρ_{1} = ρ_{2} > 0$ , $H_{4} : 0 < ρ_{1} < ρ_{2}$ , and $H_{5} : 0 = ρ_{1} < ρ_{2}$ . We specified the prior under each of the five hypotheses on the basis of the proposed empirical Bayes procedure in Section 4.3.¹⁴ To compute the normalizing constants $c_{2}$ and $c_{4}$ , we generated draws from the unconstrained bivariate normal prior for $(ρ_{1}, ρ_{2})$ until we obtained 1,000 draws in agreement with the constraints imposed under hypothesis $H_{2}$ and hypothesis $H_{4}$ , respectively. Then, we approximated the normalizing constants by the reciprocals of the proportion of the total number of draws in agreement with the constraints. For the hypotheses with only one free network autocorrelation parameter, that is, $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{3} : ρ_{1} = ρ_{2} > 0$ , and $H_{5} : 0 = ρ_{1} < ρ_{2}$ , we directly obtained the corresponding normalizing constants by using the pnorm() function in R, as the bounds of the feasible range of a single free network autocorrelation parameter are known exactly (see Section 2.1). Finally, for all hypotheses we drew 1,000 realizations from their (unconstrained) importance densities and computed the logarithm of the Bayes factor of each constrained hypothesis against an unconstrained reference hypothesis $H_{u} : (ρ_{1}, ρ_{2}) \in Θ_{(ρ_{1}, ρ_{2})}$ .¹⁵

Figure 4.

Admissible subspaces of $(ρ_{1}, ρ_{2})$ under the five constrained hypotheses and the trajectory of the data-generating network effects (dashed line).

5.2. Simulation Results

Table 2 shows the average estimates and root mean squared errors of $ρ_{1}$ and $ρ_{2}$ for the Bayesian as well as the maximum likelihood estimator. Overall, the two estimators yield nearly identical results for all considered scenarios. As expected, the (negative) bias in the estimation of the network effects and the associated root mean squared errors are decreasing with the network size, the bias being virtually nonexistent for $g = 400$ . Introducing 20 percent and 40 percent overlap between two connectivity matrices does not appear to affect the estimation results, even if there is mild negative correlation between the estimated network effects in these cases (see Figure 5).

Table 2.

Average Posterior Median Estimates and MLEs of $(ρ_{1}, ρ_{2}) = (0.2, 0.2)$ and Corresponding Average RMSEs for 1,000 Simulated Data Sets

	0 Percent Overlap				20 Percent Overlap				40 Percent Overlap
	Estimate		RMSE		Estimate		RMSE		Estimate		RMSE
	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$
$g = 50$
Bayes	0.149	0.152	0.155	0.157	0.160	0.148	0.163	0.151	0.159	0.166	0.168	0.168
MLE	0.157	0.160	0.156	0.157	0.167	0.154	0.164	0.151	0.164	0.172	0.168	0.169
$g = 100$
Bayes	0.180	0.182	0.107	0.104	0.178	0.184	0.111	0.107	0.188	0.183	0.107	0.107
MLE	0.182	0.184	0.108	0.104	0.179	0.186	0.111	0.107	0.189	0.185	0.108	0.108
$g = 200$
Bayes	0.189	0.187	0.074	0.074	0.198	0.190	0.076	0.075	0.194	0.195	0.077	0.077
MLE	0.190	0.188	0.074	0.074	0.198	0.190	0.076	0.075	0.194	0.195	0.078	0.077
$g = 400$
Bayes	0.196	0.196	0.052	0.051	0.197	0.197	0.050	0.051	0.198	0.196	0.054	0.053
MLE	0.196	0.196	0.052	0.051	0.197	0.197	0.050	0.051	0.198	0.196	0.053	0.053

Note: MLE = maximum likelihood estimate; RMSE = root mean squared error.

Figure 5.

Posterior median estimates (black) of $(ρ_{1}, ρ_{2}) = (0.2, 0.2)$ (gray) for 1,000 simulated data sets.

Table 3 reports the empirical frequentist coverage of Bayesian equal-tailed 95 percent credible intervals and asymptotic 95 percent maximum likelihood-based confidence intervals for $ρ_{1}$ and $ρ_{2}$ . The coverage of Bayesian credible intervals is very close to the nominal 0.95 for all considered scenarios, and the coverage of confidence intervals is below nominal for network sizes of 50. These observations are in line with the subpar coverage of maximum likelihood-based confidence intervals for small samples in the first-order network autocorrelation model reported by Dittrich et al. (2017a).

Table 3.

Empirical Frequentist Coverage of 95 Percent Credible and Confidence Intervals for $ρ_{1}$ and $ρ_{2}$ for 1,000 Simulated Data Sets

	0 Percent Overlap		20 Percent Overlap		40 Percent Overlap
	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$	$ρ_{1}$	$ρ_{2}$
$g = 50$
Bayes	0.953	0.949	0.936	0.946	0.948	0.943
MLE	0.928	0.928	0.912	0.937	0.923	0.930
$g = 100$
Bayes	0.949	0.958	0.949	0.946	0.957	0.961
MLE	0.936	0.950	0.942	0.936	0.951	0.955
$g = 200$
Bayes	0.943	0.948	0.947	0.953	0.954	0.937
MLE	0.934	0.943	0.950	0.951	0.951	0.932
$g = 400$
Bayes	0.954	0.948	0.964	0.940	0.953	0.943
MLE	0.955	0.949	0.964	0.946	0.953	0.943

Note: MLE = maximum likelihood estimate.

On the basis of results from our first simulation experiment, we draw two main conclusions. First, we recommend using the Bayesian estimator over the maximum likelihood estimator, as both estimators yield nearly identical network effect estimates, but the coverage of Bayesian credible intervals appears accurate, whereas for smaller network sizes the coverage of maximum likelihood-based confidence intervals is much less accurate. Second, estimating second-order network autocorrelation models with moderately overlapping connectivity matrices, that is, with up to 40 percent shared ties, does not affect estimation of the network effects. This second finding is of particular importance to social network researchers who often encounter distinct but partially overlapping networks in empirical practice.

Figure 6 displays the average logarithm of the Bayes factor of hypothesis $H_{1}$ (thick solid line), $H_{2}$ (thick dashed line), $H_{3}$ (dotted line), $H_{4}$ (dashed line), and $H_{5}$ (solid line) against an unconstrained reference hypothesis, $H_{u}$ , respectively, as a function of network effects $(ρ_{1}, ρ_{2})$ from $(0.40, 0)$ to $(0, 0.40)$ . Overall, the results indicate that the Bayes factor shows consistent behavior, that is, there is the most evidence for the data-generating hypothesis if the network size is large enough. This evidence monotonically increases with network size. In particular, there is little discrimination between the five hypotheses for $g = 50$ , whereas there is clear support for the data-generating hypothesis for network sizes of 200 and 400. Two lines in Figure 6 are discontinued for numerical reasons: when computing the Bayes factor, we need to calculate the probability mass of the unconstrained importance density contained in the parameter space imposed by the constraints under a hypothesis (see Appendix B). For the Bayes factors involving the purely inequality constrained hypotheses $H_{2} : ρ_{1} > ρ_{2} > 0$ and $H_{4} : 0 < ρ_{1} < ρ_{2}$ , we approximated these probabilities numerically by the proportion of 1,000 draws from the unconstrained importance densities that were in agreement with hypothesis $H_{2}$ and hypothesis $H_{4}$ , respectively. For some data sets, however, none of the draws were in agreement with hypothesis $H_{2}$ , or hypothesis $H_{4}$ , in which case we set the corresponding marginal likelihood to $- \infty$ . If this happened for at least one of the 100 simulated data sets, then the average logarithm of the Bayes factor was $- \infty$ as well. Finally, as in our first simulation experiment, these findings are robust to moderate degrees of overlap between two connectivity matrices.

Figure 6.

Average logarithm of the Bayes factors $\log (B_{tu})$ , $t \in {1, 2, 3, 4, 5}$ , of the hypotheses $H_{1} : ρ_{1} > ρ_{2} = 0$ (thick solid line), $H_{2} : ρ_{1} > ρ_{2} > 0$ (thick dashed line), $H_{3} : ρ_{1} = ρ_{2} > 0$ (dotted line), $H_{4} : 0 < ρ_{1} < ρ_{2}$ (dashed line), and $H_{5} : 0 = ρ_{1} < ρ_{2}$ (solid line) against $H_{u} : (ρ_{1}, ρ_{2}) \in Θ_{(ρ_{1}, ρ_{2})}$ as a function of network effects $(ρ_{1}, ρ_{2})$ from $(0.40, 0)$ to $(0, 0.40)$ for 100 simulated data sets.

6. Application Revisited

In this section, we reanalyze a data set from the economic growth literature initially studied by Dall’erba et al. (2009) and address the questions raised in Section 2.3. First, we reestimated the second-order network autocorrelation model in equation (4) on the basis of noninformative priors for all model parameters and compared the results with those coming from maximum likelihood estimation. Second, we used Bayes factors to quantify the relative evidence in the data for different competing hypotheses of interest with respect to this data set. Finally, we considered a network autocorrelation model with two subgroups, assuming only one dominant common influence mechanism within and between the two subgroups.

6.1. Bayesian Estimation of a Second-Order Network Autocorrelation Model

Table 4 displays the results of a Bayesian estimation of the second-order model in equation (4), along with the corresponding maximum likelihood estimates.¹⁶ The Bayesian and the maximum likelihood estimates of all parameters are similar to each other, in line with results from our simulation study in Section 5.2. In particular, the (Bayesian) estimate of $ρ_{1}$ , reflecting interactions within the same country, is of large positive magnitude (0.350), and the (Bayesian) estimate of $ρ_{2}$ , reflecting spillovers from regions in neighboring countries, is much smaller and close to zero (–0.058).¹⁷ Dall’erba et al. (2009) concluded by saying that “the results obtained also confirm the hypothesis that economic interactions decrease very substantially when a national border is passed (indeed, the coefficient reflecting external spillovers is not statistically significant)” (p. 342).

Table 4.

Posterior Median Estimates and MLEs and Associated 95 Percent Bayesian Credible and Confidence Intervals (in Parentheses) for the Data in Dall’erba et al. (2009) with Model (4)

Parameter	Bayes	MLE
$ρ_{1}$	0.350 $(0.238, 0.464)$	0.348 $(0.235, 0.460)$
$ρ_{2}$	−0.058 $(- 0.169, 0.052)$	−0.058 $(- 0.168, 0.052)$
Intercept	−0.682 $(- 0.887, - 0.451)$	-0.696 $(- 0.924, - 0.469)$
Market service growth	0.484 $(0.384, 0.585)$	0.483 $(0.385, 0.580)$
Productivity gap	0.212 $(0.122, 0.296)$	0.218 $(0.127, 0.309)$
Urbanization	$3.094 \times 10^{- 5}$ $(4.728 \times 10^{- 6}, 5.703 \times 10^{- 5})$	$3.119 \times 10^{- 5}$ $(5.894 \times 10^{- 6}, 5.648 \times 10^{- 5})$
Accessibility	$1.052 \times 10^{- 5}$ $(- 3.937 \times 10^{- 6}, 2.516 \times 10^{- 5})$	$1.164 \times 10^{- 5}$ $(- 3.132 \times 10^{- 6}, 2.642 \times 10^{- 5})$

Note: MLE = maximum likelihood estimate.

6.2. Bayesian Hypothesis Testing in a Second-Order Network Autocorrelation Model

Using Bayes factors, we quantified the evidence in the data for two hypotheses representing the notion of decreasing economic interactions once a national border is passed, $H_{1} : ρ_{1} > ρ_{2} = 0$ and $H_{2} : ρ_{1} > ρ_{2} > 0$ , and tested them against two competing hypotheses, $H_{0} : ρ_{1} = ρ_{2} = 0$ and $H_{3} : ρ_{1} = ρ_{2} > 0$ .¹⁸ We also included a hypothesis $H_{c} : \neg (H_{0} \lor H_{1} \lor H_{2} \lor H_{3})$ that represents the complement of all other possible hypotheses on $(ρ_{1}, ρ_{2})$ except hypotheses $H_{0}, H_{1}, H_{2}$ , and $H_{3}$ ; that is, hypothesis $H_{c}$ contains all the orders of network effects we did not hypothesize.

Table 5 provides the Bayes factors for every pair out of the set of the five considered hypotheses above using the prior specifications from Sections 4.2 and 4.3. Notably, $H_{1} : ρ_{1} > ρ_{2} = 0$ is the hypothesis most supported by the data; it is approximately $5.3 \times 10^{6}$ , 160.7, $3.2 \times 10^{5}$ , and 14.1 times more supported than hypothesis $H_{0}$ , $H_{2}$ , $H_{3}$ , and $H_{c}$ , respectively. Moreover, we see the least evidence in the data in favor of the null. Consequently, regardless of the specification of alternative expectations about $ρ_{1}$ and $ρ_{2}$ , the hypothesis that both network effects are zero has to be strongly rejected. Although these implications seem in line with the authors’ claim that network effects decrease after a national border is passed, using Bayes factors provides us with much more extensive conclusions about the characteristic evidence in the data. Hence, we can now quantify how much more likely these conclusions are than competing conclusions (hypotheses) and how (un)likely it is that an entirely different mechanism $H_{c}$ generated the data. Ultimately, this data set contains the most and very strong evidence for a positive within-country network effect only.

Table 5.

Bayes Factors $B_{tt'}$ , $t, t' \in {0, 1, 2, 3, c}$ , for Hypotheses $H_{0} : ρ_{1} = ρ_{2} = 0$ , $H_{1} : ρ_{1} > ρ_{2} = 0$ , $H_{2} : ρ_{1} > ρ_{2} > 0$ , $H_{3} : ρ_{1} = ρ_{2} > 0$ , and $H_{c} : \neg (H_{0} \lor H_{1} \lor H_{2} \lor H_{3})$ for the Data in Dall’erba et al. (2009) with Model (4)

Hypothesis	$H_{0}$	$H_{1}$	$H_{2}$	$H_{3}$	$H_{c}$
$H_{0}$	—	$1.893 \times 10^{- 7}$	$3.043 \times 10^{- 5}$	0.059	$2.966 \times 10^{- 6}$
$H_{1}$	$5.283 \times 10^{6}$	—	160.746	$3.117 \times 10^{5}$	14.085
$H_{2}$	$3.286 \times 10^{4}$	$6.220 \times 10^{- 3}$	—	$1.939 \times 10^{3}$	0.089
$H_{3}$	16.945	$3.208 \times 10^{- 6}$	$5.157 \times 10^{- 4}$	—	$4.539 \times 10^{- 5}$
$H_{c}$	$3.732 \times 10^{5}$	0.071	11.359	$2.203 \times 10^{4}$	—

6.3. Bayesian Hypothesis Testing in a Fourth-Order Network Autocorrelation Model

Dall’erba et al. (2009) pointed to potentially asymmetric growth rates across the regions, depending on a region’s initial productivity level. Thus, the authors proceeded by dividing the sample into two clusters: 111 initially more productive regions and 77 initially less productive regions, implying a core-periphery pattern (see Figure 7).¹⁹ Next, they separately estimated two second-order network autocorrelation models for the two clusters. Here, for illustrative purposes, we allowed for varying levels of network autocorrelation within and between the two clusters and consider a model with two subgroups instead. For example, we could expect network effects within regions of the same subgroup to be larger than network effects between regions of different subgroups, or we could expect initially more productive regions to influence initially less productive ones more strongly than the other way around.

Figure 7.

Spatial distribution of productivity levels in 1980 across the 188 regions.

Our analyses in Section 6.2 suggest that there is very strong evidence in the data for a positive within-country network effect only; that is, $ρ_{1} > 0$ and $ρ_{2} = 0$ . Thus, we merely considered spillover effects within the same country, in other words, we assume only $W_{1}$ plays a role, not $W_{2}$ . We denote by $ρ_{hh}, ρ_{hl}, ρ_{lh}$ , and $ρ_{ll}$ the network effect within regions with initially higher productivity levels, the network effect of initially less productive regions on initially more productive regions, the network effect of initially more productive regions on initially less productive ones, and the network effect within regions with initially lower productivity levels, respectively. Accordingly, $y_{h} \in R^{111}$ and $y_{l} \in R^{77}$ contain the growth rates of labor productivity of the initially more and less productive regions, respectively, and we partitioned $W_{1}$ , the unstandardized connectivity matrix using the region’s three nearest neighbors within the same country, into the four submatrices $W_{hh} \in R^{111 \times 111}$ , $W_{hl} \in R^{111 \times 77}$ , $W_{lh} \in R^{77 \times 111}$ , and $W_{ll} \in R^{77 \times 77}$ , representing ties within and between the two subgroups.²⁰ This results in the following fourth-order network autocorrelation model

\begin{array}{l} y = [\begin{matrix} y_{h} \\ y_{l} \end{matrix}] = [\begin{matrix} ρ_{h h} W_{h h} & ρ_{h l} W_{h l} \\ ρ_{l h} W_{l h} & ρ_{l l} W_{l l} \end{matrix}] [\begin{matrix} y_{h} \\ y_{l} \end{matrix}] + β_{1} X_{\cdot 1} + β_{2} X_{\cdot 2} + β_{3} X_{\cdot 3} + β_{4} X_{\cdot 4} + β_{5} X_{\cdot 5} + ε \\ \begin{array}{l} = (ρ_{h h} [\begin{matrix} W_{h h} & 0 \\ 0 & 0 \end{matrix}] + ρ_{h l} [\begin{matrix} 0 & W_{h l} \\ 0 & 0 \end{matrix}] + ρ_{l h} [\begin{matrix} 0 & 0 \\ W_{l h} & 0 \end{matrix}] + ρ_{l l} [\begin{matrix} 0 & 0 \\ 0 & W_{l l} \end{matrix}]) [\begin{matrix} y_{h} \\ y_{l} \end{matrix}] \\ + β_{1} X_{\cdot 1} + β_{2} X_{\cdot 2} + β_{3} X_{\cdot 3} + β_{4} X_{\cdot 4} + β_{5} X_{\cdot 5} + ε . \end{array} \end{array}

(17)

We generally expect the network effects within the two subgroups to be larger than the network effects between subgroups, that is, ${ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}$ , where the “>” sign holds pairwise for any two elements of the first and second set, respectively. Furthermore, hypotheses of substantial interest might be based on expectations of positive network effects within both subgroups but with potentially differing magnitudes. We translated these expectations to three hypotheses, $H_{1}, H_{2}$ , and $H_{3}$ , and supplemented them with the hypothesis of no network effects and the complement of all the orders of network effects we did not have hypotheses for.²¹ Formally,

\begin{matrix} H_{0} : ρ_{hh} = ρ_{hl} = ρ_{lh} = ρ_{ll} = 0, \\ H_{1} : {ρ_{hh} > ρ_{ll} > 0} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}, \\ H_{2} : {ρ_{hh} = ρ_{ll} > 0} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}, \\ H_{3} : {0 < ρ_{hh} < ρ_{ll}} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}, \\ H_{c} : \neg (H_{0} \lor H_{1} \lor H_{2} \lor H_{3}) . \end{matrix}

Table 6 shows the Bayes factors for every pair out of the set of the five considered hypotheses. We see that $H_{2} : {ρ_{hh} = ρ_{ll} > 0} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}$ is the hypothesis most supported by the data; it receives approximately $3.1 \times 10^{5}$ , 3.2, 3.7, and 55.6 times more support than do hypothesis $H_{0}$ , $H_{1}$ , $H_{3}$ , and $H_{c}$ , respectively. Hence, there is no evidence in the data for differing network effects within the initially more and less productive regions, but there is very strong evidence that network effects within the two subgroups are larger than network effects between subgroups.

Table 6.

Bayes Factors $B_{tt'}$ , $t, t' \in {0, 1, 2, 3, c}$ , for Hypotheses $H_{0} : ρ_{hh} = ρ_{hl} = ρ_{lh} = ρ_{ll} = 0$ , $H_{1} : {ρ_{hh} > ρ_{ll} > 0} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}$ , $H_{2} : {ρ_{hh} = ρ_{ll} > 0} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}$ , $H_{3} : {0 < ρ_{hh} < ρ_{ll}} \land {{ρ_{hh}, ρ_{ll}} > {ρ_{hl}, ρ_{lh}}}$ , and $H_{c} : \neg (H_{0} \lor H_{1} \lor H_{2} \lor H_{3})$ for the Data in Dall’erba et al. (2009) with Model (17)

Hypothesis	$H_{0}$	$H_{1}$	$H_{2}$	$H_{3}$	$H_{c}$
$H_{0}$	—	$1.023 \times 10^{- 5}$	$3.176 \times 10^{- 6}$	$1.177 \times 10^{- 5}$	$1.800 \times 10^{- 4}$
$H_{1}$	$9.773 \times 10^{4}$	—	0.310	1.151	17.241
$H_{2}$	$3.149 \times 10^{5}$	3.222	—	3.704	55.556
$H_{3}$	$8.497 \times 10^{4}$	0.869	0.270	—	14.925
$H_{c}$	$5.563 \times 10^{3}$	0.058	0.018	0.067	—

7. Conclusions

In this article, we developed Bayesian techniques for estimating and, primarily, testing higher-order network autocorrelation models with multiple network autocorrelations. In particular, we provided default Bayes factors that enable researchers to test hypotheses with order constraints on the network effects in a direct manner. The proposed methods allow researchers to simultaneously test any number of competing hypotheses on the relative strength of network effects against one another and to quantify the amount of evidence in the data for each hypothesis. This has not yet been possible using currently available statistical techniques for network autocorrelation models. Our proposed methods can straightforwardly be extended to test hypotheses on network autocorrelation parameters in heteroskedastic network autocorrelation models (LeSage 1997a) and network disturbances models (Leenders 2002).

We ran a simulation study to evaluate the numerical behavior of the presented Bayesian procedures for a number of different network specifications, including varying network sizes and network overlap. Our simulation study showed that, first, the Bayesian estimator and the maximum likelihood estimator yield similar parameter estimates of the network autocorrelation parameters for all scenarios. This was expected because we relied on noninformative priors for the network autocorrelation parameters. As a next step, it would be interesting to explore the use of (weakly) informative priors for multiple network autocorrelation parameters. Such priors can either be derived from published estimates in previous literature (similar to what Dittrich et al. [2017a] did for the first-order network autocorrelation model) or by eliciting experts on anticipated network effects in a given case study. Given previous findings in the literature when using a weakly informative prior for estimating a single network autocorrelation parameter (Dittrich et al. 2017a), we expect a carefully specified weakly informative prior for multiple network autocorrelation parameters will also decrease the negative bias of network autocorrelation parameters associated with maximum likelihood estimation of the model. Second, the Bayesian estimator exhibits nominal coverage of credible intervals and is more accurate than the maximum likelihood estimator, which is a strong argument in favor of the Bayesian approach, even when noninformative priors are used. Third, we found that the proposed Bayes factors always result in the largest evidence for the true data-generating hypothesis, with this evidence increasing further with network size. In other words, the proposed Bayes factors tend to point researchers to the correct (or best) hypothesis out of a set of competing hypotheses; these hypotheses can represent highly complex relations between the autocorrelation parameters, and the set of hypotheses tested against each other simultaneously can, in principle, be arbitrarily large.

The practical tools needed to perform the methods developed in this article are, or will be, freely accessible in an R package. Given the many, often implicit, expectations researchers have about the relative importance of different network effects, we hope that by enabling researchers to test these expectations directly and explicitly, higher-order network autocorrelation models will bring forth a more thorough understanding of social contagion processes that goes beyond the current state of the art.

Footnotes

Appendix A: Posterior Sampling

We outlined the procedure for sampling from the full posterior p ( ρ , σ 2 , β | y ) in higher-order network autocorrelation models in Section 3.2. However, the exact form of the candidate-generating distribution for the conditional posterior p ( ρ , β 1 | σ 2 , β ~ , y ) and the expressions μ β ~ and Σ β ~ in equation (12) remain to be specified.

Appendix B: Bayes Factor Computation

In the following, we show how the integral I t = ∫ Θ ρ t h t ( ρ t ) d ρ t in equation (16) can be effectively approximated by its importance sampling estimate I ^ t ,

(B1)

I ^ t = N − 1 ∑ i = 1 N h t ( ρ i ) q t ( ρ i ) = N − 1 ∑ i = 1 N | A ρ i | exp ( − 1 2 ( ρ i − μ t ) T Σ t − 1 ( ρ i − μ t ) ) y T A ρ i T M A ρ i y − g − k 2 q t ( ρ i ) , ( B 1 )

where ρ i are draws from a suitable importance density q t ( · ) . We specify q t ( · ) such that it closely follows the integrand h t ( ρ t ) but has heavier tails than the latter, which ensures a reliable estimation of I t .

As in Appendix A, we approximate log ( | A ρ t | ) by a quadratic polynomial in ρ t at its maximum value, the origin. This results in a normal approximation of | A ρ t | , that is, | A ρ t | ≈ N ( 0 R , T − 1 ) , where T rr ′ : = tr ( W r W r ′ ) , r , r ′ ∈ { 1 , . . . , R } . In the case that T is not positive definite, we use the nearest positive definite matrix to T instead. The second term in the denominator of equation (B1) already equals the kernel of the probability density function of the normal distribution N ( μ t , Σ t ) . Finally, we also approximate the logarithm of the third term in h t ( ρ t ) by a second-order Taylor polynomial at its maximum. It follows that y T A ρ t T M A ρ t y − ( g − k ) / 2 ≈ N ( μ 3 , Σ 3 ) , where μ 3 = ( y T W · T M W · y ) − 1 y T M W · y , Σ 3 = ( y T W · T M W · y ) − 1 ( y T A ρ t T M A ρ t y ) / ( g − k ) and ( y T W · T M W · y ) rr ′ : = y T W r T M W r ′ y , ( y T M W · y ) r : = y T M W r y . Thus, h t ( ρ t ) can be approximated by the product of three multivariate normal densities that is multivariate normal itself, so q t ( ρ t ) : = ϕ μ I S t , Σ I S t ( ρ t ) 1 Θ ρ t ( ρ t ) c I S t − 1 , c I S t : = ∫ Θ ρ t ϕ μ I S t , Σ I S t ( ρ t ) d ρ t , with Σ I S t = ( T + Σ t − 1 + Σ 3 − 1 ) − 1 , μ I S t = Σ I S t ( Σ t − 1 μ t + Σ 3 − 1 μ 3 ) .

Calculating I ^ t directly might result in underflow in R, which is why we next show how to compute its logarithm only. We can write

log ( I ^ t ) = log ( N − 1 ∑ i = 1 N h t ( ρ i ) q t ( ρ i ) ) = − log ( N ) + log ( ∑ i = 1 N h t ( ρ i ) q t ( ρ i ) ) = − log ( N ) + log ( ∑ i = 1 N exp ( log ( h t ( ρ i ) q t ( ρ i ) ) + d − d ) ) = − log ( N ) − d + log ( ∑ i = 1 N exp ( log ( h t ( ρ i ) q t ( ρ i ) ) + d ) ) = − log ( N ) − d + log ( ∑ i = 1 N exp ( log ( | A ρ i | ) − g − k 2 log ( y T A ρ i T M A ρ i y ) − 1 2 ( ρ i − μ t ) T Σ t − 1 ( ρ i − μ t )

− log ( ϕ μ I S t , Σ I S t ( ρ i ) 1 Θ ρ t ( ρ i ) c I S t ) + d ) ) ≈ − log ( N ) − d + log ( ∑ i = 1 N exp ( log ( | A ρ i | ) − g − k 2 log ( y T A ρ i T M A ρ i y ) − 1 2 ( ρ i − μ t ) T Σ t − 1 ( ρ i − μ t ) − log ( ϕ μ I S t , Σ I S t ( ρ i ) 1 Θ ρ t ( ρ i ) ∑ i = 1 N 1 Θ ρ t ( ρ i ) N ) + d ) ) = − 2 log ( N ) + log ( ∑ i = 1 N 1 Θ ρ t ( ρ i ) ) − d + log ( ∑ i = 1 N exp ( log ( | A ρ i | ) − g − k 2 log ( y T A ρ i T M A ρ i y ) − 1 2 ( ρ i − μ t ) T Σ t − 1 ( ρ i − μ t ) − log ( ϕ μ I S t , Σ I S t ( ρ ι ) 1 Θ ρ t ( ρ i ) ) + d ) ) ,

where ρ i are draws from the unconstrained importance density N ( μ I S t , Σ I S t ) and d is an auxiliary constant, for example, d = − g − k 2 min i ∈ { 1 , . . . , N } ( y T A ρ i T M A ρ i y ) , which is added to prevent the marginal likelihood to become too small to be distinguished from zero in R. The auxiliary constant d is set in advance after generating the N draws from the unconstrained importance density N ( μ I S t , Σ I S t ) first.

Acknowledgements

We thank Sandy Dall’erba, Marco Percoco, and Gianfranco Piras for sharing their data with us.

Funding

J.M. was supported by a Veni Grant (451.13.011) provided by the Netherlands Organization for Scientific Research.

ORCID iD

Dino Dittrich

Notes

Author Biographies

Dino Dittrich holds a PhD in statistics from Tilburg University. His work centers on Bayesian estimation and hypothesis-testing techniques for social network models. Currently, he works as a data scientist at Health Care Systems GmbH.

Roger Th. A. J. Leenders is a professor at the Jheronimus Academy of Data Science and in the Department of Organization Studies at Tilburg University. He holds a PhD in sociology from the University of Groningen. He has published broadly on social network analysis, teams, innovation, and organization behavior in leading journals such as Organization Science, the Journal of Applied Psychology, the Journal of Product Innovation Management, Social Networks, and the Academy of Management Journal.

Joris Mulder is an associate professor in the Department of Methodology and Statistics at Tilburg University. He holds a PhD in applied Bayesian statistics from Utrecht University. His research focuses on Bayesian model selection and social network modeling.

References

Anselin

Luc

. 2001. “Rao’s Score Test in Spatial Econometrics.”Journal of Statistical Planning and Inference 97(1):113–39.

Badinger

Harald

Egger

Peter

. 2013. “Estimation and Testing of Higher-Order Spatial Autoregressive Panel Data Error Component Models.”Journal of Geographical Systems 15(4):453–89.

Bartlett

M. S.

1957. “A Comment on D. V. Lindley’s Statistical Paradox.”Biometrika 44(3–4):533–34.

Bates

Douglas

Maechler

Martin

. 2017. “Matrix: Sparse and Dense Matrix Classes and Methods.” Retrieved March 9, 2020. http://CRAN.R-project.org/package=Matrix.

Beck

Nathaniel

Gleditsch

Kristian S.

Beardsley

Kyle

. 2006. “Space Is More Than Geography: Using Spatial Econometrics in the Study of Political Economy.”International Studies Quarterly 50(1):27–44.

Bivand

Roger

Keitt

Tim

Rowlingson

Barry

. 2017. “rgdal: Bindings for the ‘Geospatial’ Data Abstraction Library.” Retrieved March 9, 2020. http://CRAN.R-project.org/package=rgdal.

Böing-Messing

Florian

van Assen

Marcel A.

Hofman

Abe D.

Hoijtink

Herbert

Mulder

Joris

. 2017. “Bayesian Evaluation of Constrained Hypotheses on Variances of Multiple Independent Groups.”Psychological Methods 22(2):262–87.

Braeken

Johan

Mulder

Joris

Wood

Stephen

. 2015. “Relative Effects at Work: Bayes Factors for Order Hypotheses.”Journal of Management 41(2):544–73.

Burt

Ronald S.

Doreian

Patrick

. 1982. “Testing a Structural Model of Perception: Conformity and Deviance with Respect to Journal Norms in Elite Sociological Methodology.”Quality and Quantity 16(2):109–50.

10.

Butts

Carter T.

2008. “Social Network Analysis with sna.”Journal of Statistical Software 24(6):1–51.

11.

Carlin

Bradley C.

Louis

Thomas A

. 2000. “Empirical Bayes: Past, Present and Future.”Journal of the American Statistical Association 95(452):1286–89.

12.

Chib

Siddharta

Greenberg

Edward

. 1994. “Bayes Inference in Regression Models with ARMA (p, q) Errors.”Journal of Econometrics 64(1–2):183–206.

13.

Chib

Siddharta

Greenberg

Edward

. 1995. “Understanding the Metropolis-Hastings Algorithm.”American Statistician 49(4):327–35.

14.

Chib

Siddharta

Greenberg

Edward

. 1998. “Analysis of Multivariate Probit Models.”Biometrika 85(2):347–61.

15.

Dall’erba

Sandy

Percoco

Marco

Piras

Gianfranco

. 2009. “Service Industry and Cumulative Growth in the Regions of Europe.”Entrepreneurship and Regional Development 21(4):333–49.

16.

Dittrich

Dino

Leenders

Roger Th.

Mulder

Joris

. 2017a. “Bayesian Estimation of the Network Autocorrelation Model.”Social Networks 48:213–36.

17.

Dittrich

Dino

Leenders

Roger Th.

Mulder

Joris

. 2017b. “Network Autocorrelation Modeling: A Bayes Factor Approach for Testing (Multiple) Precise and Interval Hypotheses.”Sociological Methods and Research 48(3):642–76.

18.

Doreian

Patrick

. 1981. “Estimating Linear Models with Spatially Distributed Data.”Sociological Methodology 12:359–88.

19.

Elhorst

J. Paul

. 2010. “Applied Spatial Econometrics: Raising the Bar.”Spatial Economic Analysis 5(1):9–28.

20.

Elhorst

J. Paul

Lacombe

Donald J.

Piras

Gianfranco

. 2012. “On Model Specification and Parameter Space Definitions in Higher Order Spatial Econometric Models.”Regional Science and Urban Economics 42(1–2):211–20.

21.

Fujimoto

Kayo

Chou

Chih-Ping

Valente

Thomas W

. 2011. “The Network Autocorrelation Model Using Two-Mode Data: Affiliation Exposure and Potential Bias in the Autocorrelation Parameter.”Social Networks 33(3):231–43.

22.

Gelfand

Alan E.

Smith

Adrian F

. 1990. “Sampling-Based Approaches to Calculating Marginal Densities.”Journal of the American Statistical Association 85(410):398–409.

23.

Gelman

Andrew

Carlin

John B.

Stern

Hal S.

Rubin

Donald B

. 2003. Bayesian Data Analysis. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC Press.

24.

Geman

Stuart

Geman

Donald

. 1984. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.”IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6):721–41.

25.

Gimpel

James G.

Schuknecht

Jason E

. 2003. “Political Participation and the Accessibility of the Ballot Box.”Political Geography 22(5):471–88.

26.

Gupta

Abhimanyu

Robinson

Peter M

. 2015. “Inference on Higher-Order Spatial Autoregressive Models with Increasingly Many Parameters.”Journal of Econometrics 186(1):19–31.

27.

Hall

Brian C.

2003. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. New York: Springer.

28.

Han

Xiaoyi

Hsieh

Chih-Sheng

Lee

Lung-Fei

. 2017. “Estimation and Model Selection of Higher-Order Spatial Autoregressive Model: An Efficient Bayesian Approach.”Regional Science and Urban Economics 63:97–120.

29.

Hansen

Mark H.

Bin

. 2001. “Model Selection and the Principle of Minimum Description Length.”Journal of the American Statistical Association 96(454):746–74.

30.

Hastings

W. K.

1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications.”Biometrika 57(1):97–109.

31.

Hepple

Leslie W.

1995a. “Bayesian Techniques in Spatial and Network Econometrics: 1. Model Comparison and Posterior Odds.”Environment and Planning A: Economy and Space 27(3):447–69.

32.

Hepple

Leslie W.

1995b. “Bayesian Techniques in Spatial and Network Econometrics: 2. Computational Methods and Algorithms.”Environment and Planning A: Economy and Space 27(4):615–44.

33.

Higham

Nicholas J.

2008. Functions of Matrices: Theory and Computation. Philadelphia: Society for Industrial and Applied Mathematics.

34.

Holloway

Garth

Shankar

Bhavani

Rahman

Sanzidur

. 2002. “Bayesian Spatial Probit Estimation: A Primer and an Application to HYV Rice Adoption.”Agricultural Economics 27(3):383–402.

35.

Jeffreys

Harold

. 1961. Theory of Probability. 3rd ed. Oxford, UK: Oxford University Press.

36.

Kalenkoski

Charlene M.

Lacombe

Donald J

. 2008. “Effects of Minimum Wages on Youth Employment: The Importance of Accounting for Spatial Correlation.”Journal of Labor Research 29(4):303–17.

37.

Kass

Robert E.

Raftery

Adrian E

. 1995. “Bayes Factors.”Journal of the American Statistical Association 90(430):773–95.

38.

Klugkist

Irene

Laudy

Olav

Hoijtink

Herbert

. 2005. “Inequality Constrained Analysis of Variance: A Bayesian Approach.”Psychological Methods 10(4):477–93.

39.

Lacombe

Donald J.

2004. “Does Econometric Methodology Matter? An Analysis of Public Policy Using Spatial Econometric Techniques.”Geographical Analysis 36(2):105–18.

40.

Lee

Lung-Fei

Liu

Xiaodong

. 2010. “Efficient GMM Estimation of High Order Spatial Autoregressive Models with Autoregressive Disturbances.”Econometric Theory 26(1):187–230.

41.

Leenders

Roger Th

. 1995. Structure and Influence: Statistical Models for the Dynamics of Actor Attributes, Network Structure and Their Interdependence. Amsterdam, the Netherlands: Thela Thesis.

42.

Leenders

Roger Th

. 2002. “Modeling Social Influence through Network Autocorrelation: Constructing the Weight Matrix.”Social Networks 24(1):21–47.

43.

LeSage

James P.

1997a. “Bayesian Estimation of Spatial Autoregressive Models.”International Regional Science Review 20(1–2):113–29.

44.

LeSage

James P.

1997b. “Regression Analysis of Spatial Data.”Journal of Regional Analysis and Policy 27(2):83–94.

45.

LeSage

James P.

Pace

R. Kelley

. 2007. “A Matrix Exponential Spatial Specification.”Journal of Econometrics 140(1):190–214.

46.

LeSage

James P.

Pace

R. Kelley

. 2008. “Spatial Econometric Modeling of Origin-Destination Flows.”Journal of Regional Science 48(5):941–67.

47.

LeSage

James P.

Pace

R. Kelley

. 2011. “Pitfalls in Higher Order Model Extensions of Basic Spatial Regression Methodology.”Review of Regional Studies 41(1):13–26.

48.

LeSage

James P.

Parent

Olivier

. 2007. “Bayesian Model Averaging for Spatial Econometric Models.”Geographical Analysis 39(3):241–67.

49.

Liang

Feng

Paulo

Rui

Molina

German

Clyde

Merlise A.

Berger

Jim O

. 2008. “Mixtures of g Priors for Bayesian Variable Selection.”Journal of the American Statistical Association 103(481):410–23.

50.

Lin

Tse-Min

Chin-En

Lee

Feng-Yu

. 2006. “‘Neighborhood’ Influence on the Formation of National Identity in Taiwan: Spatial Regression with Disjoint Neighborhoods.”Political Research Quarterly 59(1):35–46.

51.

McMillen

Daniel P.

Singell Jr.

Larry D.

Jr. Waddell

Glen R

. 2007. “Spatial Competition and the Price of College.”Economic Inquiry 45(4):817–33.

52.

McPherson

Michael A.

Nieswiadomy

Michael L

. 2005. “Environmental Kuznets Curve: Threatened Species and Spatial Effects.”Ecological Economics 55(3):395–407.

53.

Metropolis

Nicholas

Rosenbluth

Arianna W.

Rosenbluth

Marshall N.

Teller

Augusta H.

Teller

Edward

. 1953. “Equations of State Calculations by Fast Computing Machines.”Journal of Chemical Physics 21(6):1087–92.

54.

Mizruchi

Mark S.

Stearns

Linda B

. 2006. “The Conditional Nature of Embeddedness: A Study of Borrowing by Large U.S. Firms, 1973–1994.”American Sociological Review 71(2):310–33.

55.

Morey

Richard D.

Rouder

Jeffrey N

. 2011. “Bayes Factor Approaches for Testing Interval Null Hypotheses.”Psychological Methods 16(4):406–19.

56.

Mulder

Joris

. 2014. “Prior Adjusted Default Bayes Factors for Testing (In)equality Constrained Hypotheses.”Computational Statistics and Data Analysis 71:448–63.

57.

Mulder

Joris

. 2016. “Bayes Factors for Testing Order-Constrained Hypotheses on Correlations.”Journal of Mathematical Psychology 72:104–15.

58.

Mulder

Joris

Fox

Jean-Paul

. 2018. “Bayes Factor Testing of Multiple Intraclass Correlations.”Bayesian Analysis 14(2):521–52.

59.

Mulder

Joris

Hoijtink

Herbert

Klugkist

Irene

. 2010. “Equality and Inequality Constrained Multivariate Linear Models: Objective Model Selection Using Constrained Posterior Priors.”Journal of Statistical Planning and Inference 140(4):887–906.

60.

Mulder

Joris

Wagenmakers

Eric-Jan

. 2016. “Editors’ Introduction to the Special Issue ‘Bayes Factors for Testing Hypotheses in Psychological Research: Practical Relevance and New Developments.’” Journal of Mathematical Psychology 72:1–5.

61.

Mur

Jesús

López

Fernando

Angulo

Ana

. 2008. “Symptoms of Instability in Models of Spatial Dependence.”Geographical Analysis 40(2):189–211.

62.

Murray

Iain

Ghahramani

Zoubin

MacKay

David J

. 2006. “MCMC for Doubly-Intractable Distributions.” Pp. 359–66 in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence. Arlington, VA: AUAI Press.

63.

Neuman

Eric J.

Mizruchi

Mark S

. 2010. “Structure and Bias in the Network Autocorrelation Model.”Social Networks 32(4):290–300.

64.

Nocedal

Jorge

Wright

Stephen J

. 2006. Numerical Optimization. 2nd ed. New York: Springer.

65.

Ord

Keith

. 1975. “Estimation Methods for Models of Spatial Interaction.”Journal of the American Statistical Association 70(349):120–26.

66.

Owen

Art

Zhou

. 2000. “Safe and Effective Importance Sampling.”Journal of the American Statistical Association 95(449):135–43.

67.

R Core Team. 2017. “R: A Language and Environment for Statistical Computing.” Retrieved March 9, 2020. http://www.R-project.org/.

68.

Raftery

Adrian E.

1995. “Bayesian Model Selection in Social Research.”Sociological Methodology 25:111–63.

69.

Raftery

Adrian E.

Madigan

David

Hoeting

Jennifer A

. 1997. “Bayesian Model Averaging for Linear Regression Models.”Journal of the American Statistical Association 92(437):179–91.

70.

Smith

Tony E.

2009. “Estimation Bias in Spatial Models with Strongly Connected Weight Matrices.”Geographical Analysis 41(3):307–32.

71.

Tita

George E.

Radil

Steven M

. 2011. “Spatializing the Social Networks of Gangs to Explore Patterns of Violence.”Journal of Quantitative Criminology 27(4):521–45.

72.

Trautmann

Heike

Steuer

Detlef

Mersmann

Olaf

Bornkamp

Björn

. 2015. “truncnorm: Truncated Normal Distribution.” Retrieved March 9, 2020. http://CRAN.R-project.org/package=truncnorm.

73.

van de Schoot

Rens

Mulder

Joris

Hoijtink

Herbert

van Aken

Marcel A.

Dubas

Judith S.

Castro

Bram Orobio de

Meeus

Wim

Romeijn

Jan-Willem

. 2011. “An Introduction to Bayesian Model Selection for Evaluating Informative Hypotheses.”European Journal of Developmental Psychology 8(6):713–29.

74.

Wagenmakers

Eric-Jan

. 2007. “A Practical Solution to the Pervasive Problems of p Values.”Psychonomic Bulletin and Review 14(5):779–804.

75.

Zellner

Arnold

. 1986. “On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions.” Pp. 233–43 in Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, edited by Goel

P. K.

Zellner

. Amsterdam, the Netherlands: North-Holland.

76.

Zhang

Bin

Thomas

Andrew C.

Doreian

Patrick

Krackhardt

David

Krishnan

Ramayya

. 2013. “Contrasting Multiple Social Network Autocorrelations for Binary Outcomes, with Applications to Technology Adoption.”ACM Transactions on Management Information Systems 3(4):18:1–21.

Network Autocorrelation Modeling: Bayesian Techniques for Estimating and Testing Multiple Network Autocorrelations

Abstract

Keywords

1. Introduction

2. The Network Autocorrelation Model

2.1. The First-Order Network Autocorrelation Model

2.2. Higher-Order Network Autocorrelation Models

2.3. Application of a Higher-Order Network Autocorrelation Model: Economic Growth of Labor Productivity

3. Bayesian Estimation of Higher-Order Network Autocorrelation Models

3.1. Prior Specification

3.2. Posterior Computation

4. Bayesian Hypothesis Testing in Higher-Order Network Autocorrelation Models

4.1. The Bayes Factor

4.2. Bayes Factor Computation

4.3. A Default Prior for ρ

5. Simulation Study

5.1. Study Design

5.2. Simulation Results

6. Application Revisited

6.1. Bayesian Estimation of a Second-Order Network Autocorrelation Model

6.2. Bayesian Hypothesis Testing in a Second-Order Network Autocorrelation Model

6.3. Bayesian Hypothesis Testing in a Fourth-Order Network Autocorrelation Model

7. Conclusions

Footnotes

Appendix A: Posterior Sampling

Appendix B: Bayes Factor Computation

Acknowledgements

Funding

ORCID iD

Notes

Author Biographies

References

4.3. A Default Prior for $ρ$