
Abstract

Relational event models (REMs) are the primary choice for the analysis of relational-event network data. However, the standard REM assumes static parameters, which hinders the modeling of time-varying dynamics. This assumption might be too restrictive in real-life scenarios, making a model that allows for time-varying parameters more valuable. We introduce a state-space extension of the relational event model as a way to tackle this problem. The model has three main attributes. First, it provides a statistical framework of the temporal change of the parameters. Second, it enables the forecasting of future parameter values (which can be utilized to simulate new networks that can account for temporal dynamics in out-of-sample predictions). Third, it requires smaller data structures to be loaded into computer memory compared to the standard REM; this makes the model easily scalable to large networks. We conduct empirical analyses on bike-sharing data, corporate communications, and interactions among socio-political actors to illustrate model usage and applicability.

Keywords

Relational event model, state-space model, dynamic linear model, data streams, social networks

Introduction

Relational event data are a representation of social network data that contain information on actors’ interactions over time (Butts and Marcum, 2017) by encoding who interacts with whom at what point in time. This data type can provide a wealth of insights for social science researchers interested in unveiling the intricacies of social interactions. With the advent of new technologies, such as web applications that allow for instant messaging, these data have become increasingly prevalent. Research analyzing relational event data has appeared in a variety of disciplines and topics, ranging from logistics (Vu et al., 2017) to finance (Zappa and Vu, 2021), and from social hierarchies (Redhead and Power, 2022) to friendships (Goodreau et al., 2009; Meijerink-Bosman et al., 2023).

The relational event model (REM) introduced by Butts (2008) is the stepping stone of most of the research published in this field. This model has been extended in many different ways. Perry and Wolfe (2013) have proposed a partial likelihood approach for modeling the receiver given the sender. Extensions have been proposed to model the memory decay of past interactions (Arena et al., 2023a). Karimova et al. (2023) applied regularization methods to perform variable selection in relational event models.

The relational event model is dynamic in the sense that its log-linear regression structure allows the introduction of time-varying covariates. This allows the rate at which specific network interactions occur to vary over time. However, the REM does not allow the regression parameters themselves to vary over time, which might be a restrictive assumption, given the dynamic nature of relational event data. Although the REM allows for the rate of dyadic interactions to be different at different points in time, the exact way in which it depends on previous interactions is assumed to remain constant. In real-life social interaction data, the interaction rate between actors can be highly dynamic (Boschee et al., 2015; Klimt and Yang, 2004; O’Mahony and Shmoys, 2015) and the way in which actors respond to different levels of transitivity or subgroup membership can change over time. Social network researchers are interested in understanding what drives these dynamics. For example, it could be the case that the average event rate (captured by the intercept) changes over time, while the effects of social interaction structure (e.g. inertia, reciprocity, participation shifts) remain roughly constant. Conversely, it could also be that the intercept remains constant and the structural interaction mechanisms vary over time. Either way, it may often be desirable to allow the values of the parameters in the relational event model to vary over time, rather than only the values of the (exogenous and endogenous) statistics.

In this paper, we propose a state-space model for relational event data. The state-space model is the gold standard for dynamic modeling (West et al., 1985; West and Harrison, 1989, 2006), but it has not been applied in the relational event model literature. In traditional state-space models, the parameters change after every observation (Petris et al., 2009). However, this is not realistic within the REM framework. First, a single event says very little about the global network dynamics. Second, the REM uses sufficient statistics in its specification. These are computed as functions of past events, and having only a single event precludes the computation of most network statistics. In order to obtain insight into real-world network dynamics, it often suffices to understand how the parameters change on a slightly wider temporal scale. Thus, to take this into account, we split the event sequence into blocks of meaningful time units (e.g. seconds, minutes, hours) and assume the parameters to be fixed throughout that specific period only. Then, we assume a state-space model for these relational event sequence blocks.

This model has the following properties: (i) it provides a formal framework to model the dynamics governing the time-varying parameters; (ii) it allows for the estimation of parameters for future time points (this can be used to simulate future networks that take temporal dynamics into account in out-of-sample predictions); (iii) by separating the relational event sequence into blocks, only a smaller portion of the data matrix needs to be loaded into computer memory, which makes the model much more scalable to large networks than the traditional REM. Other approaches have also been proposed for time-varying REM parameters (e.g. Kamalabad et al., 2023; Mulder and Leenders, 2019; Vu et al., 2011) but they do not possess the same attractive properties as the state-space model. To illustrate the model, we conduct empirical analyses on bike sharing in New York City (O’Mahony and Shmoys, 2015), corporate email communications (Klimt and Yang, 2004), and interactions among socio-political actors in India (Boschee et al., 2015).

The remainder of the text is structured as follows: Section 2 briefly describes the relational event model; Section 3 presents the state-space model and describes algorithms used for model estimation and prediction; Section 4 features the application of the proposed model to three different empirical data sets; and Section 5 concludes the paper with a discussion of the main points and offers future research routes.

The relational event model

A relational event sequence is defined as an ordered sequence of dyadic events among a finite set of actors along a time scale (Butts and Marcum, 2017). In its simplest form, a dyadic event entails a sender of an interaction, a receiver, and the time point at which the interaction occurred. Thus, let a network contain $N$ actors and $M$ events observed in a time window $(0, τ]$ , with $τ > 0$ . A relational event sequence is then defined by

\begin{matrix} E = \{ e_{m} = (t_{m}, s_{m}, r_{m}) : (s_{m}, r_{m}) \in R (t_{m}), \\ 0 < t_{1} < t_{2} < \dots < t_{M} < τ \}, \end{matrix}

where $e_{m}$ is the $m^{th}$ event in the sequence, $t_{m}$ is the timestamp of this event and $(s_{m}, r_{m})$ is the pair of actors involved in the interaction, with sender $s \in \{1, 2, \dots, N\}$ and receiver $r \in \{1, 2, \dots, N \mid r \neq s\}$. The set $R (t)$ is called the risk set and it comprises all possible dyads without self-loops at a particular point in time; the size of the risk set is $N (N - 1)$.

Butts (2008) assumes a piecewise constant exponential model for these data (Friedman, 1982), where the rate of events between actors $s$ and $r$ has a log-linear form $λ_{sr} (t | E) = \exp \{ {x'}_{sr} (t) β \}$. The vector $x_{sr} (t)$ is a row of the data matrix $X$ concerning actors $s$ and $r$ at time $t$ and $β$ is a vector of regression parameters. Note that $x_{sr} (t)$ can also contain time-varying covariates. Typically, most of these covariates will be endogenous network statistics such as inertia, reciprocity, or participation shifts (see Meijerink-Bosman et al. (2021) for an introduction to relational event modeling). These endogenous network statistics are computed as functions of past events. For instance, the inertia statistic for dyad $(s, r)$ at time $t$ can be computed as the volume of past events from $s$ to $r$ up until time $t$. Correspondingly, a positive parameter for the inertia effect would imply that, on average, dyads that have been active in the past are more likely to be active in the (near) future. Moreover, exogenous statistics can also be included, which could reflect actor (or dyad) attributes (e.g. same gender). All of these statistics can then be used to test for the presence of specific behavioral patterns (e.g. activity rate, popularity, etc.) and their direction (positive or negative).
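As a rough illustration of the log-linear rate, the sketch below computes an inertia statistic from a toy event history and turns it into dyadic rates. The function names, the toy events, and the parameter values are all hypothetical, and raw counts are used for the statistic purely for brevity.

```python
import numpy as np

def inertia_counts(events, n_actors):
    """Count past events per directed dyad: counts[s, r] = number of s -> r events."""
    counts = np.zeros((n_actors, n_actors))
    for _, s, r in events:
        counts[s, r] += 1
    return counts

def event_rates(counts, beta_intercept, beta_inertia):
    """Log-linear rate: lambda_sr = exp(beta_0 + beta_1 * inertia_sr)."""
    rates = np.exp(beta_intercept + beta_inertia * counts)
    np.fill_diagonal(rates, 0.0)  # self-loops are not part of the risk set
    return rates

# Hypothetical toy history: (time, sender, receiver), actors labeled 0, 1, 2.
past_events = [(0.5, 0, 1), (1.2, 0, 1), (2.0, 1, 2)]
counts = inertia_counts(past_events, n_actors=3)
rates = event_rates(counts, beta_intercept=-2.0, beta_inertia=0.5)
```

With a positive inertia parameter, the dyad (0, 1), which was active twice, receives a higher rate than a dyad with no history, in line with the interpretation given above.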

The likelihood function of this model is given by

\begin{matrix} p (E | β) \\ = \prod_{m = 1}^{M} \left[ λ_{s_{m} r_{m}} (t_{m} | E) \prod_{(s, r) \in R (t_{m})} \exp \{ - (t_{m} - t_{m - 1}) λ_{sr} (t_{m} | E) \} \right] . \end{matrix}

(1)

In this paper, we will write $E | β \sim \mathrm{REM} (β)$ to represent an event sequence $E$ fitted to a relational event model. Although the REM allows for time-varying covariates, we will refer to this model as the static REM due to the time-invariant nature of the regression parameters $β$.
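The logarithm of the likelihood in equation (1) can be evaluated in a few lines. The following is a minimal sketch, assuming the statistics are stored in an array of shape (events, dyads, covariates) and that $t_{0} = 0$; the function name and argument layout are illustrative choices, not the interface of any existing package.

```python
import numpy as np

def rem_log_likelihood(beta, X, dyad_index, times):
    """
    Log of equation (1).
    beta:       (P,) regression parameters
    X:          (M, D, P) statistics for every event m and every dyad in the risk set
    dyad_index: (M,) position (0..D-1) of the dyad observed at each event
    times:      (M,) increasing event times, with t_0 = 0 assumed
    """
    log_rates = X @ beta                               # (M, D) log-rates per dyad
    waits = np.diff(times, prepend=0.0)                # t_m - t_{m-1}
    observed = log_rates[np.arange(len(times)), dyad_index]
    survival = waits * np.exp(log_rates).sum(axis=1)   # summed over the risk set
    return float(np.sum(observed - survival))

# Toy check: two events, two dyads, one covariate, all rates equal to exp(0) = 1.
X_toy = np.ones((2, 2, 1))
ll = rem_log_likelihood(np.array([0.0]), X_toy, np.array([0, 1]), np.array([1.0, 3.0]))
```

In the toy check, each event contributes $\log 1 = 0$ for the observed dyad and the survival terms are $1 \times 2$ and $2 \times 2$, so the log-likelihood equals $-6$.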

A state space approach for the relational event model

In traditional state-space models, the parameters change after every observation (Petris et al., 2009). Thus, assuming a state-space model for the relational event sequence, for every event $m = 1, 2, \dots, M$ , we would have

\begin{matrix} e_{m} | β_{m}, e_{1}, \dots, e_{m - 1} \sim \mathrm{REM} (β_{m}) \\ β_{m} = A_{m} β_{m - 1} + w_{m}, \end{matrix}

(2)

where $β_{m}$ is the parameter vector that generated the $m^{th}$ event and $A_{m}$ is a pre-specified weight matrix that reflects how the parameters at time $t_{m}$ are affected by the parameters at time $t_{m - 1}$. Moreover, $w_{m} \sim N (0, W_{m})$ is a vector of innovations. Innovations play the role of residuals and capture the unexplained variability in the data after accounting for the systematic components of the model $A_{m} β_{m - 1}$. The matrix $W_{m}$ is the covariance matrix of the innovations. This setup represents a state-space model for the relational event sequence in which parameters change after every successive event.

Generally, in relational event modeling, the interest is in fairly broad temporal dynamics that are unlikely to be captured by changing the parameters after every event. For instance, Mulder and Leenders (2019) considered relational event data of email exchanges among colleagues in a company’s department. They were interested in analyzing how the information-sharing behavior changes over a period of several months. Thus, they analyzed the temporal changes on monthly intervals, even though the timing of the emails themselves is much more fine-grained (in their dataset milliseconds).

In this study, we consider the case where it can be assumed that the effects represented by $β_{m}$ only change gradually, rather than after every successive event. Therefore, to reduce model dimensionality and avoid computational issues, we assume that $β$ only changes between pre-specified time intervals (e.g. seconds, minutes, hours, etc.). Let $E_{t}$ be a batch containing an ordered sequence of $M_{t}$ events in time interval $t$, such that $E_{t} = (e_{1}, e_{2}, \dots, e_{M_{t}})$. Then, we have

\begin{matrix} E_{t} | β_{t} \sim \mathrm{REM} (β_{t}) \\ β_{t} = A_{t} β_{t - 1} + w_{t}, \end{matrix}

(3)

where $A_{t}$ is the weight matrix for the batch of events at time $t$, and $w_{t} \sim N (0, W_{t})$ is a vector of innovations for that same batch.
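Splitting the event sequence into batches $E_{t}$ can be sketched as follows. This is only one possible convention, assuming numeric timestamps and fixed-length windows; the helper name is hypothetical, windows without events are simply skipped, and an event falling exactly on a boundary is assigned to the later window.

```python
from collections import defaultdict

def split_into_batches(events, window):
    """Group (time, sender, receiver) events into consecutive windows of fixed length."""
    batches = defaultdict(list)
    for t, s, r in events:
        batches[int(t // window)].append((t, s, r))
    # Return the non-empty batches in temporal order.
    return [batches[k] for k in sorted(batches)]

# Hypothetical events with times in seconds, grouped into hourly batches.
events = [(10.0, 0, 1), (3599.0, 1, 2), (3600.0, 2, 0), (7300.0, 0, 2)]
batches = split_into_batches(events, window=3600.0)
```

Each element of `batches` would then be fitted with its own REM, so that only the statistics for that batch need to be computed and held in memory at a time.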

Vieira et al. (2023) have shown that a REM can be approximated by a Gaussian distribution in case the size of the batch $M_{t}$ is sufficiently large. Then, the first level of equation (3) can be rewritten, resulting in the following dynamic linear model

\begin{matrix} {\hat{β}}_{t} = β_{t} + N (0, {\hat{Ω}}_{t}) \\ β_{t} = A_{t} β_{t - 1} + w_{t}, \end{matrix}

(4)

where ${\hat{β}}_{t}$ and ${\hat{Ω}}_{t}$ are the maximum likelihood estimate obtained from the relational event sequence $E_{t}$ and its error covariance matrix, respectively. The first line in equation (4) is called the observation equation, where ${\hat{β}}_{t}$ is the measurement and ${\hat{Ω}}_{t}$ is the measurement error covariance at time $t$. The second line is the state equation; thus, the parameter $β_{t}$, $t = 1, 2, \dots, T$, can be viewed as a latent variable called the state. The goal is to estimate $β_{t}$ by combining the information from the observation equation and the state equation.

Computationally, one of the main advantages of the setup in equation (4) is that it saves considerable amounts of computer memory. For the static REM, the data matrix $X$ is a 3-dimensional object with dimensions $M \times N (N - 1) \times P$, where $M$ is the number of events, $N (N - 1)$ is the size of the risk set and $P$ is the number of covariates in the model. The state-space model, however, analyzes the whole network in smaller batches $E = \{E_{1}, E_{2}, \dots, E_{T}\}$, containing $M_{1}, M_{2}, \dots, M_{T}$ events, respectively. Thus, given that $M \gg M_{t}$, for $t = 1, 2, \dots, T$, this model requires (much) smaller data structures to be loaded into computer memory, making it easily scalable to large networks (Vieira et al., 2023). This is a big advantage when compared to the static REM, especially for researchers working with limited computer resources.

In summary, the main idea of the state-space model is that for every discrete time point, there exists an underlying temporal process in nature (e.g. the natural conditions through which actors interact between themselves), which we call the state process $β_{t}, t = 1, 2, \dots, T$ , that generates the relational event sequence $E_{t}$ . However, given that one can only observe the relational events, a relational event model for $E_{t}$ can be specified from which the estimates are used to make inferences about nature. Therefore, this model provides a formal framework to model the dynamics of network parameters that reflect the relative magnitude of social mechanisms and how these change over time.

Weight matrix specification

The matrix $A_{t}$ in equation (4) weighs how the parameters in the previous period affect the parameters at the next time point. As is common in the state-space modeling literature, one can either estimate the elements in the matrix from the data or pre-specify them based on the context in the application at hand. Estimating $A_{t}$ in every step would require Markov chain Monte Carlo algorithms and is very computationally intensive, which is particularly problematic in the current setup because fitting basic relational event models is already very computationally demanding by itself (Lerner and Lomi, 2020; Vieira et al., 2023). Moreover, when large instantaneous changes (such as change-points, see Kamalabad et al., 2023) of the parameters are unlikely and changes are fairly smooth on average, the estimation of the $A_{t}$ matrices would make the algorithm unnecessarily complex. Therefore, the weight matrix $A_{t}$ will be pre-specified. Since the operation is $A_{t} β_{t - 1}$, the values in every row will determine how a specific component of $β_{t}$ changes in relation to $β_{t - 1}$. A natural choice for the weight matrices would be to use identity matrices. This choice implies that, on average, our best guess for the REM parameters in the next segment is the values of these parameters from the previous segment. Hence, for the remainder of this paper, the matrices $A_{t}$ are considered to be known and equal to the identity matrix. We shall, however, keep the general notation with $A_{t}$ to reinforce the idea that alternative choices can be made for the weight matrices depending on the researcher’s expectations regarding the behavior of the event sequence.

The state process innovation covariance matrix

From equation (4), it is clear that the residual covariance matrix of the state process equation $W_{t}$ is unknown. Estimating this matrix from the data would require carrying out a sampling scheme using Markov chain Monte Carlo (MCMC) (Petris et al., 2009, see Chapter 4), which can considerably slow down model estimation. Because there is already a heavy computational burden when analyzing relational event data using REMs (Vieira et al., 2023), we propose an alternative, computationally efficient approach.

First, note that the residual state process covariance matrix is responsible for balancing the passing of information from $β_{t - 1}$ to $β_{t}$ . If $W_{t}$ is large, most of the information used to estimate $β_{t}$ comes from ${\hat{β}}_{t}$ . When $W_{t}$ is small, it mainly comes from $β_{t - 1}$ . In the Kalman filter, the uncertainty of $β_{t - 1}$ is given by $P_{t - 1}$ . Moving from $β_{t - 1}$ to $β_{t}$ results in an increased uncertainty resulting from the state equation definition (see equation (4)). This uncertainty is quantified by

P_{t} = A_{t} P_{t - 1} {A'}_{t} + W_{t} .

(4)

If there is no error in the estimation of the state $β_{t}$ , then $P_{t} = A_{t} P_{t - 1} {A'}_{t}$ . If there is some error, then $P_{t}$ is increased by $W_{t}$ . Therefore, the main role of $W_{t}$ is the quantification of information loss from $t - 1$ to $t$ . The usual way to specify this matrix is by utilizing a discount factor (West and Harrison, 1989), such that

W_{t} = \frac{1 - δ}{δ} A_{t} P_{t - 1} {A'}_{t},

(5)

where $0 < δ \leq 1$ . In other words, $W_{t}$ is a proportion of $A_{t} P_{t - 1} {A'}_{t}$ . Empirically, it is customary to choose $δ$ such that $0.9 < δ \leq 0.99$ (Petris et al., 2009). In case out-of-sample predictions are of interest, a grid search can be used to select the model with the $δ$ that provides the best predictive performance.
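To make the role of the discount factor concrete, substituting equation (5) into the propagated uncertainty $A_{t} P_{t - 1} {A'}_{t} + W_{t}$ gives the short derivation below. It is a worked illustration of the formulas above, using the same symbols, rather than an additional modeling assumption.

```latex
\begin{aligned}
P_{t} &= A_{t} P_{t-1} {A'}_{t} + W_{t} \\
      &= A_{t} P_{t-1} {A'}_{t} + \frac{1 - δ}{δ} A_{t} P_{t-1} {A'}_{t} \\
      &= \frac{1}{δ} A_{t} P_{t-1} {A'}_{t} .
\end{aligned}
```

Hence the discount factor inflates the propagated covariance by the factor $1 / δ$: for $δ = 0.99$ this is approximately $1.0101$ (about a 1% increase per step), for $δ = 0.9$ it is approximately $1.1111$, and for $δ = 1$ no information is lost between time points.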

Model estimation: The Kalman filter

The Kalman filter is a recursive procedure for state estimation and prediction in dynamic systems observed with noise (Kalman, 1960; Kalman and Bucy, 1961). It leverages the Gaussian structure of the model in equation (4), updating its estimate of the state as new measurements become available over time. The steps to estimate this model via the Kalman filter are given in Algorithm 1.

Algorithm 1: Kalman filter for relational event model estimates

Data: $\{E_{1}, E_{2}, \dots, E_{T}\}$ and $\{X_{1}, X_{2}, \dots, X_{T}\}$
Result: $\{{\bar{β}}_{t}, {\bar{P}}_{t}\}$, for $t = 1, 2, \dots, T$
Initialize the state estimate ${\bar{β}}_{0} = 0$
Initialize the covariance matrix ${\bar{P}}_{0} = s^{2} I$
for $t \leftarrow 1$ to $T$ do
    Estimate the parameters ${\hat{β}}_{t}$ and standard error ${\hat{Ω}}_{t}$ from $E_{t}$
    (1) Compute the predicted state: ${\tilde{β}}_{t} \leftarrow A_{t} {\bar{β}}_{t - 1}$
    (2) Compute the value for $W_{t}$: $W_{t} \leftarrow ((1 - δ) / δ) A_{t} {\bar{P}}_{t - 1} {A'}_{t}$
    (3) Compute the predicted state covariance matrix: ${\tilde{P}}_{t} \leftarrow A_{t} {\bar{P}}_{t - 1} {A'}_{t} + W_{t}$
    (4) Compute the Kalman gain: $K_{t} \leftarrow {\tilde{P}}_{t} ({\tilde{P}}_{t} + {\hat{Ω}}_{t})^{- 1}$
    (5) Compute the estimated current state: ${\bar{β}}_{t} \leftarrow {\tilde{β}}_{t} + K_{t} ({\hat{β}}_{t} - {\tilde{β}}_{t})$
    (6) Update the current state error matrix: ${\bar{P}}_{t} \leftarrow (I - K_{t}) {\tilde{P}}_{t}$

The initial state ${\bar{β}}_{0}$ is initialized as a vector of zeros and the covariance matrix ${\bar{P}}_{0}$ is initialized as a diagonal matrix with $s^{2}$ large, to reflect the lack of information at $t = 0$ (given that measurements are yet to be observed). The parameters ${\bar{β}}_{t}$ and ${\bar{P}}_{t}$ are the estimated states and their respective error covariance matrices, for $t = 1, 2, \dots, T$. The Kalman gain $K_{t}$ is a quantity that balances the uncertainty in the measurement, as reflected by its error covariance matrix ${\hat{Ω}}_{t}$, and the uncertainty in the predicted state, reflected by ${\tilde{P}}_{t}$. It determines how much weight will be given to the new measurement relative to the predicted state. Matrix $W_{t}$ is the residual covariance matrix of the state process, as specified in equation (5). Hence, this procedure allows iterative updating of estimates as new information streams in, serving as a tool for real-time inference. Due to the Gaussian approximation of the state-space REM, the proofs of the Kalman filter (Becker, 2018; Petris et al., 2009) directly apply. Code to estimate this model is available on https://github.com/Fabio-Vieira/MIO-24-0035.
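The recursion in Algorithm 1 can be sketched in Python as follows. This is not the code from the repository above; it is a minimal illustration that assumes the per-batch estimates ${\hat{β}}_{t}$ and ${\hat{Ω}}_{t}$ have already been obtained elsewhere, and that uses a single time-invariant weight matrix (the identity by default). The function and argument names are hypothetical.

```python
import numpy as np

def kalman_filter(beta_hat, omega_hat, delta=0.99, s2=100.0, A=None):
    """
    beta_hat:  (T, P) per-batch REM estimates (the measurements)
    omega_hat: (T, P, P) their error covariance matrices
    Returns the filtered states (T, P) and their covariances (T, P, P).
    """
    T, P = beta_hat.shape
    A = np.eye(P) if A is None else A        # assumption: same A for every t
    beta_bar = np.zeros(P)                   # initial state estimate
    P_bar = s2 * np.eye(P)                   # vague initial covariance
    states = np.empty((T, P))
    covs = np.empty((T, P, P))
    for t in range(T):
        beta_tilde = A @ beta_bar                        # (1) predicted state
        APA = A @ P_bar @ A.T
        W = (1.0 - delta) / delta * APA                  # (2) discount-factor W_t
        P_tilde = APA + W                                # (3) predicted covariance
        K = P_tilde @ np.linalg.inv(P_tilde + omega_hat[t])      # (4) Kalman gain
        beta_bar = beta_tilde + K @ (beta_hat[t] - beta_tilde)   # (5) updated state
        P_bar = (np.eye(P) - K) @ P_tilde                # (6) updated covariance
        states[t], covs[t] = beta_bar, P_bar
    return states, covs

# Toy usage with made-up measurements for T = 3 batches and P = 2 parameters.
toy_beta = np.array([[-2.0, 0.4], [-1.8, 0.5], [-1.9, 0.45]])
toy_omega = np.stack([0.05 * np.eye(2)] * 3)
states, covs = kalman_filter(toy_beta, toy_omega)
```

As a sanity check on the sketch, setting $δ = 1$ removes the innovation variance, so the filter reduces to sequential Bayesian updating of a constant parameter under a $N (0, s^{2} I)$ prior.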

Predicting future events

One of the main advantages of the state-space model is that it provides a formal framework to predict future events while incorporating temporal dynamics. Algorithm 2 describes the steps to generate future events. This algorithm works by leveraging the state equation to generate predictions for future values of the network states. Subsequently, it generates new networks using that predicted state as the true value for the parameters in the model.

Algorithm 2: Prediction algorithm

Data: $X_{T}$ and $\{{\bar{β}}_{T}, {\bar{P}}_{T}\}$
Result: $H$ steps ahead simulated network
for $h \leftarrow 1$ to $H$ do
    Compute predicted state: ${\tilde{β}}_{T + h} \leftarrow A_{T + h} {\bar{β}}_{T}$
    Set time window of the analysis $τ$
    while $t_{m} < τ$ do
        (1) Compute event rate: $λ_{sr} (t_{m}) \leftarrow \exp \{ {x'}_{sr} (t_{m}) {\tilde{β}}_{T + h} \}$
        (2) Sample inter-event time of new event: $(t_{m} - t_{m - 1}) \sim \mathrm{Exp} (\sum_{(s, r) \in R (t_{m})} λ_{sr} (t_{m}))$
        (3) Sample new dyad weighted by $λ_{sr} (t_{m}) / \sum_{(s, r) \in R (t_{m})} λ_{sr} (t_{m})$
        (4) Update statistics with the newly generated event

Here, $\mathrm{Exp} (λ)$ is an exponential distribution with rate $λ$ and $τ$ is a pre-specified value that represents the original time scale of the network. For instance, if events were observed in seconds and the analysis was conducted in hourly windows, then we need to generate events until $τ = 3600$ seconds (i.e. the number of seconds in 1 hour).
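The inner loop of Algorithm 2 can be sketched as below for a single predicted state. To keep the example short, it assumes a model with only an intercept and an inertia statistic based on raw counts; the function name, the toy parameter values, and this reduced specification are illustrative assumptions and not the specification used in the empirical sections.

```python
import numpy as np

def simulate_block(beta, counts, tau, rng):
    """
    Simulate events in (0, tau) for one block.
    beta:   (intercept, inertia) taken as the predicted state
    counts: (N, N) array, counts[s, r] = past s -> r events (inertia statistic)
    """
    n = counts.shape[0]
    counts = counts.copy()
    off_diag = ~np.eye(n, dtype=bool)        # risk set: dyads without self-loops
    dyads = np.argwhere(off_diag)            # same row-major order as counts[off_diag]
    events, t = [], 0.0
    while True:
        rates = np.exp(beta[0] + beta[1] * counts[off_diag])  # (1) event rates
        total = rates.sum()
        t += rng.exponential(1.0 / total)                     # (2) inter-event time
        if t >= tau:
            break
        k = rng.choice(rates.size, p=rates / total)           # (3) sample the dyad
        s, r = dyads[k]
        counts[s, r] += 1                                     # (4) update statistics
        events.append((t, int(s), int(r)))
    return events

# Toy usage: 3 actors, empty history, 10 time units, hypothetical parameters.
rng = np.random.default_rng(1)
simulated = simulate_block(np.array([-1.0, 0.05]), np.zeros((3, 3)), 10.0, rng)
```

Note that `rng.exponential` is parameterized by the scale, so the reciprocal of the total rate is passed. With raw counts and a large positive inertia parameter the total rate can grow quickly, so scaled statistics may be preferable in practice.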

Finally, to generate prediction intervals, the covariance matrix of the newly predicted state ${\tilde{β}}_{T + h}$ is needed. This matrix is given by ${\tilde{P}}_{T + h} = A_{T + h} {\bar{P}}_{T} {A'}_{T + h} + W_{T + h}$ . One must compute intervals for the predicted state and repeat the steps in Algorithm 2 to generate new networks that will represent the prediction intervals. Generating networks covering the distribution of the predicted state provides a range of scenarios regarding the likely future configurations of the network.
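A minimal sketch of the interval for the predicted state is given below, under several stated assumptions: a single step ahead ($h = 1$), identity weight matrix, $W_{T + 1}$ specified through the discount factor, and elementwise normal intervals. The helper name and the default values are hypothetical.

```python
import numpy as np

def predicted_state_interval(beta_bar_T, P_bar_T, delta=0.99, z=1.96):
    """One-step-ahead predicted state with elementwise normal bounds (A = I)."""
    # With A = I and the discount-factor W, A P A' + W simplifies to P / delta.
    P_tilde = P_bar_T / delta
    se = np.sqrt(np.diag(P_tilde))
    return beta_bar_T - z * se, beta_bar_T, beta_bar_T + z * se

# Toy usage with a made-up filtered state and covariance.
lower, point, upper = predicted_state_interval(
    np.array([-2.0, 0.4]), np.diag([0.04, 0.01])
)
```

Networks simulated with Algorithm 2 at the lower bound, the point prediction, and the upper bound would then give a range of scenarios for the next time point.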

Empirical illustrations

In this section, we provide three empirical examples to illustrate the model’s ability to capture time series relational event patterns. The goal is to show how the model can be used to study the temporal dynamics of social mechanisms and also to disentangle what explains the change in interaction rates in applications on bike sharing (O’Mahony and Shmoys, 2015), email communications (Klimt and Yang, 2004), and interactions among socio-political actors (Boschee et al., 2015). In all illustrations, we employed a discount factor of $δ = 0.99$ (we also considered $δ = 0.9$ and $δ = 0.95$ and found results that were virtually the same for all cases). The parameter $s^{2}$, used to initialize the covariance matrix ${\bar{P}}_{0}$ at $t = 0$, was fixed at 100.

Bike sharing in New York City

The sharing of Citi bikes in the city of New York is registered in this data set with timestamps, and start and end stations of each bike ride (O’Mahony and Shmoys, 2015). These data are freely available at https://citibikenyc.com/system-data. Table 1 shows the first five rows of the relational event sequence; the columns for sender and receiver represent the station where the ride started and the station where it ended, respectively. The data were analyzed hourly from 9 AM to 6 PM and represent the first 10 days of August 2023. The data contain 111 stations, of which the 50 most active were selected. In total, 13,830 events were analyzed.

Table 1.

First five rows of the bike sharing data.

Time	Sender	Receiver
2023-08-01 00:06:33	JC014	JC006
2023-08-01 00:08:48	JC008	HB201
2023-08-01 00:11:27	JC014	JC008
2023-08-01 00:18:25	JC066	JC009
2023-08-01 00:22:14	JC008	JC014

Figure 1 displays the number of events per hour, where every bar represents an hour. The time series displays strong seasonality with a spike in the number of events for the last 3 hours of the time window. The number of events per batch (i.e. per hour) ranges from a minimum of 31 to a maximum of 329 events. The number of dyads at risk at each point in time is 50 × 49 = 2450. Hence, the largest data matrix loaded into computer memory has dimensions 329 × 2450 × 5, as opposed to 13,830 × 2450 × 5 required by the static REM.
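The memory comparison above can be made explicit with a short calculation. The sketch assumes that every entry of the data array is stored as an 8-byte floating-point number, which is an assumption about the storage format and not something reported in the text.

```python
def n_entries(n_events, n_dyads, n_stats):
    """Number of cells in an events x dyads x statistics array."""
    return n_events * n_dyads * n_stats

n_dyads = 50 * 49                                   # risk set size, 2450
largest_batch = n_entries(329, n_dyads, 5)          # largest hourly batch
static_rem = n_entries(13830, n_dyads, 5)           # full sequence at once

bytes_per_entry = 8                                 # assumption: float64 storage
largest_batch_mb = largest_batch * bytes_per_entry / 1e6
static_rem_mb = static_rem * bytes_per_entry / 1e6
```

Under this assumption, the largest batch holds 4,030,250 entries (about 32 MB) against 169,417,500 entries (about 1.36 GB) for the static REM, a reduction by a factor of roughly 42.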

Figure 1.

Number of events (bike rides) per hour from 9 AM to 6 PM for the first 10 days of August 2023.

We specified the rate of our model with an intercept and four network effects (out-degree sender, in-degree receiver, reciprocity, and inertia). The out-degree sender represents the number of bikes that started a ride at a certain station. The in-degree receiver is the number of bikes that finished a ride at a certain station. Reciprocity is the number of times that station A received a bike from station B, after sending a bike to station B. Inertia is the number of times that someone started a bike ride at station A and ended at station B. Thus, substantively, this model describes the bicycle traffic in terms of activity (out-degree) and popularity (in-degree) of stations. Moreover, inertia and reciprocity are also important in this context. For instance, if we think simply in terms of spatial distribution of stations, it is likely that commuters will use bicycles every day to go from a station close to their homes to a station close to their work, accentuating the effect of inertia in the data. Then, they would take the inverse route to go back home, accentuating the effect of reciprocity.

Figure 2 shows the predicted state per hour in the first 10 days of August 2023. The shaded area is a $95\%$ confidence interval. We can see that all effects present a stationary behavior with strong seasonality, which is very similar to the pattern displayed by the series of the number of events in Figure 1. Not all effects present spikes in the last 3 hours of the time window analyzed, which are mostly present in the intercept and out-degree sender. Inertia and in-degree receiver present seasonality without the spikes. Reciprocity presents spikes in the first hours of the period analyzed.

Figure 2.

Predicted states per time point in the first 10 days of August 2023, for the bike sharing application.

Finally, Figure 3 displays a plot of the number of events with a one-day-ahead out-of-sample prediction, for the 10-day window analyzed. As can be seen in Table 1, the events are recorded with a precision down to the second. Since we analyzed the relational event sequence in hourly intervals, we specified $τ$, from Algorithm 2, as equal to 3600 (the number of seconds in 1 hour). The blue line represents the point estimates, the shaded area is the $95\%$ prediction interval (PI) and the red line is the observed out-of-sample values. In Table 2, which displays the numerical values of the predictions, prediction intervals, observed values and errors (observed minus predicted), it can be seen that all observed values are contained in the prediction intervals.

Figure 3.

Number of events (bike rides) per hour from 9 AM to 6 PM for the first 10 days of August 2023 with predictions for the next day.

Table 2.

Predictions of number of events with 95% confidence interval and prediction error for the Bike sharing data.

One day ahead prediction—Bike sharing
Hour	$2.5\%$	Prediction	$97.5\%$	Observed	Error	Covered by PI
AM	88	100	154	123	23	✓
AM	90	89	118	104	15	✓
AM	106	115	164	104	−11	✓
PM	105	130	153	129	−1	✓
PM	97	124	182	146	20	✓
PM	116	130	158	138	8	✓
PM	123	130	160	126	−4	✓
PM	136	165	235	208	43	✓
PM	241	273	352	272	−1	✓
PM	204	275	367	223	−52	✓

The columns $2.5\%$ and $97.5\%$ give the bounds of the prediction interval. The last column contains a check mark (✓) if the observed value fell into the $95\%$ prediction interval and a cross mark (✗) if the observation was not covered by the prediction interval.

Email communications in corporate networks

Enron was a company in the energy sector that went bankrupt after a series of events involving dubious accounting practices. The data set containing email communications among its employees has been featured in the literature in the past (Klimt and Yang, 2004; Perry and Wolfe, 2013). In this illustration, we follow Perry and Wolfe (2013) and analyze only those events that had five or fewer receivers. We consider the email data in the years 2000 and 2001; this is the period in which the “Enron scandal” was unveiled. For instance, according to Bondarenko (2004), Enron’s share went from around $90 in mid-2000 to less than $12 by the end of 2001. This time frame contains 153 actors and 34,180 events; we analyzed the events in monthly intervals. Our interest is in the general change of the email interaction structure over the 2-year period; intervals of 1 month result in batches with a sufficient number of events to use the Gaussian approximation approach (Vieira et al., 2023). The number of events in each batch ranges from 363 to 3403. Table 3 displays the first five rows in the relational event sequence.

Table 3.

First five rows of the Enron email data.

Time	Sender	Receiver
2000-01-03 01:47:00	67	147
2000-01-04 00:51:00	46	113
2000-01-04 02:18:00	34	117
2000-01-04 02:39:00	138	65
2000-01-04 04:11:00	34	117

Figure 4 displays the number of events per month during the period analyzed; every bar represents 1 month. We can clearly see a growing trend in the number of emails across these 2 years. We specified a model with an intercept and six network effects (out-degree sender, in-degree sender, out-degree receiver, in-degree receiver, reciprocity, and inertia). The model captures the activity of senders and receivers through out-degree sender and out-degree receiver, and their popularity through in-degree sender and in-degree receiver. We included reciprocity to capture the tendency of email messages to be reciprocated, and inertia to reflect the fact that senders in a company tend to email the same receivers repeatedly (e.g. lower-level employees reporting to managers via email).
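To make the six network effects concrete, the following minimal sketch computes them as unscaled counts over the past event history for one directed dyad. The function name `network_statistics` and the use of raw counts are assumptions made for illustration; the statistics used in the analysis may be scaled or standardized differently.

```python
from collections import Counter

def network_statistics(history, sender, receiver):
    """Unscaled count statistics for the directed dyad (sender -> receiver).

    `history` is a list of (sender, receiver) tuples observed so far.
    """
    sent = Counter(s for s, _ in history)       # events sent per actor
    received = Counter(r for _, r in history)   # events received per actor
    dyads = Counter(history)                    # events per directed dyad
    return {
        "out_degree_sender": sent[sender],
        "in_degree_sender": received[sender],
        "out_degree_receiver": sent[receiver],
        "in_degree_receiver": received[receiver],
        "reciprocity": dyads[(receiver, sender)],  # past receiver -> sender events
        "inertia": dyads[(sender, receiver)],      # past sender -> receiver events
    }

# Sender/receiver pairs from the first five rows of Table 3.
history = [(67, 147), (46, 113), (34, 117), (138, 65), (34, 117)]
stats = network_statistics(history, sender=34, receiver=117)
print(stats)
```

For the dyad 34 → 117, inertia and out-degree sender both equal 2 because actor 34 sent both of its past emails to actor 117, while reciprocity is 0 because actor 117 has not yet replied.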

Figure 4.

Number of events (emails) per month from January 2000 to December 2001.

Figure 5 presents the series of predicted states for this model, with shaded areas representing 95% confidence intervals. The intercept captures the growing trend displayed in the series of the number of events in Figure 4. Communication seems to be driven mainly by the out-degree sender effect in 2000 and then by the in-degree sender effect, which became the main driver in 2001. The other effects display a fairly constant pattern across the 2-year period, with inertia the only one showing a slight downward trend. Interestingly, after an initial increase at the beginning of the observational period, inertia gradually decreases toward 0 as time progresses toward 2002 (when Enron was officially bankrupt).

Figure 5.

Predicted states per month from January 2000 to December 2001 for the Enron email data set.

Finally, as seen in Table 3, the events are recorded with timestamps precise to the second. Thus, given that we analyzed events monthly, we set $τ$ from Algorithm 2 to τ = 60 × 60 × 24 × 30 = 2,592,000 seconds (approximately the number of seconds in a month). Figure 6 displays the time series of the number of events with an out-of-sample prediction for January 2002. The shaded area is a 95% prediction interval (PI), the blue line represents the prediction, and the red line represents the observed value. Table 4, which shows the numerical values of the prediction and the prediction error, confirms that the observed value is contained in the prediction interval.
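The arithmetic behind this value of τ can be checked with a few lines of Python. This is only an illustrative sketch: the variable names and the choice of 1 January 2002 as the start of the prediction window are assumptions, not part of Algorithm 2.

```python
from datetime import datetime, timedelta

# Window length for a 30-day "month", expressed in seconds.
tau = int(timedelta(days=30).total_seconds())

# Assumed start of the prediction window: right after the observed period.
window_start = datetime(2002, 1, 1)
window_end = window_start + timedelta(seconds=tau)

print(tau)         # 2592000
print(window_end)  # 2002-01-31 00:00:00
```

Because January has 31 days, the 30-day window ends one day before the calendar month does, which is why the value is described as approximate.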

Figure 6.

Number of events (emails) per month from January 2000 to December 2001 with a prediction for January 2002.

Table 4.

Predicted number of events with 95% prediction interval and prediction error for the Enron email data.

One month ahead prediction—Enron email
Month–year	2.5%	Prediction	97.5%	Observed	Error	Covered by PI
Jan–2002	1434	1658	1822	1533	−125	✓

Here, 2.5% and 97.5% denote the lower and upper bounds of the interval. The last column contains a check mark (✓) if the observed value fell within the 95% prediction interval and a cross mark (✗) if it did not.

Interactions between socio-political actors within news outlets

The Integrated Crisis Early Warning System (ICEWS) is a database consisting of dyadic interactions among socio-political actors collected from news outlets (Boschee et al., 2015). This system is used by the United States military to provide early warnings of conflict that can help decision-makers allocate resources and draw up effective mitigation plans (O’Brien, 2010). In this illustration, we analyze the network of monthly events among Indian socio-political actors from 2011 to 2018. The network contained a total of 1612 actors. Here, only the 50 most active actors were selected, and in total there were 419,274 events. The number of events in each month ranges from 1397 to 8368. Table 5 shows the first five rows of the relational event sequence.

Table 5.

First five rows of the ICEWS event history of interactions among Indian political actors.

Time	Sender	Receiver
2011-01-01	Manmohan Singh	Citizen (India)
2011-01-01	Manmohan Singh	Citizen (India)
2011-01-01	Bharatiya Janata	Party Member (India)
2011-01-01	Activist (India)	India
2011-01-01	Police (India)	Citizen (India)

Figure 7 shows the distribution of the number of events from January 2011 to December 2018. These data display a strong upward trend and a very slight seasonality. We specify a model with an intercept and six network effects. We analyze the activity and popularity of senders and receivers by including the network effects out-degree sender (activity) and in-degree receiver (popularity). We also include inertia and reciprocity to capture the tendency of actors to forge communication partnerships and to reciprocate events, which can be important in the context of interactions among political entities. Finally, the participation shifts ABBA and ABXA were included in the model. The former indicates that after actor A sends an event to actor B, actor B immediately reciprocates. ABBA differs from reciprocity in that ABBA captures the immediate reciprocation of the AB event by B, whereas reciprocity captures the tendency of B to respond to the accumulated past AB events. The latter indicates that after actor A sends an event to actor B, another actor X (different from B) sends an event to A. These participation shifts can be important in this context, given that politicians often refer to other politicians when addressing the press and giving speeches (Bhatia, 2006).
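The two participation shifts can be illustrated by labeling pairs of consecutive events. The sketch below follows the verbal definitions given above; the function name `classify_pshift` and the toy event sequence are hypothetical and are not taken from the ICEWS data.

```python
def classify_pshift(previous, current):
    """Label the participation shift formed by two consecutive events.

    Each event is a (sender, receiver) pair. Returns "ABBA", "ABXA", or None.
    """
    a, b = previous
    sender, receiver = current
    if receiver != a:
        return None      # the second event is not directed at A
    if sender == b:
        return "ABBA"    # B immediately replies to A
    if sender != a:
        return "ABXA"    # another actor X (not B) addresses A
    return None          # A addressing itself is not a shift considered here

# Toy sequence of four events among actors A, B, C, and D.
events = [("A", "B"), ("B", "A"), ("C", "B"), ("C", "D")]
labels = [classify_pshift(p, c) for p, c in zip(events, events[1:])]
print(labels)  # ['ABBA', 'ABXA', None]
```

Only the immediately preceding event matters for these labels, which is what separates ABBA from the reciprocity statistic computed over the whole accumulated history.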

Figure 7.

Number of events among Indian political actors extracted from news outlets per month from January 2011 to December 2018.

Figure 8 shows the predicted states for the effects in this model. The shaded areas are the 95% confidence intervals. As expected, the intercept presents a strong upward trend. A slight seasonality is visible in the intercept, out-degree sender, and in-degree receiver, which indicates that these effects mirror the slight seasonality displayed in the series of the number of events. The other effects display stationary behavior (except for inertia, which displays a slight downward trend). Overall, the dynamics seem to be fairly stable for reciprocity, ABBA, and ABXA. Out-degree sender, in-degree receiver, and inertia display patterns that might point toward a future change in the level of these effects, with out- and in-degree likely to increase slightly at the end of the observed time window and inertia more likely to decrease slightly.

Figure 8.

Predicted states per month from January 2011 to December 2018 for the ICEWS data set of interactions among Indian political actors.

Finally, Figure 9 shows the number of events with a 1-year-ahead out-of-sample prediction for 2019. For these data, as opposed to the previous data sets, we do not have the timestamps of the events (see Table 5). All we know is the order in which the events occurred. This makes it hard to specify the $τ$ parameter required by Algorithm 2. In this case, a grid search was conducted using the root-mean-square error (RMSE) to select the best value for $τ$, comparing the model predictions with the true observations for 2019. During this experiment, we found that decreasing $τ$ to under 30 or increasing it to over 100 produced excessively large RMSEs. Thus, we only consider RMSE values for $τ \in \{40, 60, 80\}$. The RMSE values were $RMSE (τ = 40) = 1946.52$, $RMSE (τ = 60) = 1044.28$, and $RMSE (τ = 80) = 3483.11$.
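The selection step of such a grid search amounts to picking the candidate with the smallest score. In the sketch below, `select_tau` and its `score` argument are hypothetical names; the scores themselves are the three RMSE values reported above rather than the output of a refitted model.

```python
def select_tau(candidates, score):
    """Return the candidate with the smallest score, plus all scores.

    `score` is a hypothetical callable mapping a value of tau to an
    out-of-sample RMSE.
    """
    scores = {tau: score(tau) for tau in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# RMSE values reported in the text for the three candidate values of tau.
reported_rmse = {40: 1946.52, 60: 1044.28, 80: 3483.11}
best_tau, scores = select_tau([40, 60, 80], reported_rmse.get)
print(best_tau)  # 60
```

Since the scores are computed against the 2019 observations, the selected value is tuned to the same year on which the forecast is later judged.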

Figure 9.

Number of events among Indian political actors extracted from news outlets per month from January 2011 to December 2018 with a prediction for the year 2019.

Hence, Figure 9 presents the predictions derived from $τ = 60$. The shaded area is a 95% prediction interval (PI), the blue line represents the predictions, and the red line represents the observed values. Though the predictions seem consistent with the pattern of the time series in the years analyzed, the errors are relatively large in the second half of the predicted period, although the observation is again close to the predicted region at the end. Table 6 shows the numerical values for the predictions, observed values, and errors. From this table, we see that the only observations covered by the prediction intervals are those for March, April, and May. This is probably a reflection of how difficult it is to specify the parameter $τ$ when only the order of the events is known. It might even be the case that, for this relational event sequence, every month should have its own specific $τ$. Developing strategies to determine the value of $τ$ for ordinal event sequences is a challenging but highly important task, which is left for future research.

Table 6.

Predicted number of events with 95% prediction interval and prediction error for the ICEWS data.

One year ahead prediction—ICEWS
Month	2.5%	Prediction	97.5%	Observed	Error	Covered by PI
Jan	6949	6945	7403	6556	−389	✗
Feb	7458	7672	8167	6756	−916	✗
Mar	7002	7115	7898	7289	174	✓
Apr	7434	7845	8226	8118	273	✓
May	7944	8145	8961	8282	137	✓
Jun	7262	7238	7813	6835	−403	✗
Jul	8561	8934	9564	7235	−1699	✗
Aug	8224	8569	9096	6888	−1681	✗
Sep	7924	8346	8635	7007	−1339	✗
Oct	7689	8124	8448	6408	−1716	✗
Nov	7484	7908	8408	6843	−1065	✗
Dec	7460	7893	8119	8368	475	✗
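The derived columns of Table 6 can be recomputed from the interval bounds, predictions, and observed values. The sketch below assumes that the error is defined as observed minus predicted (which matches the signs in the table) and that coverage means the observed value lies between the 2.5% and 97.5% bounds; the variable names are chosen for illustration.

```python
import math

# (month, lower bound, prediction, upper bound, observed), copied from Table 6.
rows = [
    ("Jan", 6949, 6945, 7403, 6556),
    ("Feb", 7458, 7672, 8167, 6756),
    ("Mar", 7002, 7115, 7898, 7289),
    ("Apr", 7434, 7845, 8226, 8118),
    ("May", 7944, 8145, 8961, 8282),
    ("Jun", 7262, 7238, 7813, 6835),
    ("Jul", 8561, 8934, 9564, 7235),
    ("Aug", 8224, 8569, 9096, 6888),
    ("Sep", 7924, 8346, 8635, 7007),
    ("Oct", 7689, 8124, 8448, 6408),
    ("Nov", 7484, 7908, 8408, 6843),
    ("Dec", 7460, 7893, 8119, 8368),
]

# Error = observed - predicted; coverage = observed inside [lower, upper].
errors = {month: obs - pred for month, lower, pred, upper, obs in rows}
covered = [month for month, lower, pred, upper, obs in rows if lower <= obs <= upper]
rmse = math.sqrt(sum(e ** 2 for e in errors.values()) / len(errors))

print(covered)         # ['Mar', 'Apr', 'May']
print(round(rmse, 2))  # 1044.28
```

The recomputed RMSE agrees with the value reported for $τ = 60$, and the covered months are the three marked with a check mark in the table.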

Discussion

The relational event model (REM) has been the main choice in the literature for analyzing relational event data. However, even though this data type is inherently dynamic, the standard REM does not support the modeling of time-varying parameters. The static nature of the parameters in this model can be considered a restrictive assumption in most applications (see Section 4). This paper presented a state-space extension of the relational event model. This model has three main advantages over the traditional REM. First, it is equipped with a formal rule that drives the parameter changes over time and is flexible enough to capture time series patterns (e.g. trend and seasonality) in the estimates without requiring additional modeling effort. This allows us to better understand what drives the changes in social interaction rates in the data. Second, it provides a framework to predict future states, which can be used to produce out-of-sample predictions that take the temporal dynamics into account. Third, it requires smaller data structures, making it more easily scalable to larger networks than the static REM. The use of these smaller structures is a big advantage, particularly for researchers who do not have vast computational resources at their disposal.

We presented three empirical illustrations to showcase the method. These applications concerned a varied set of data on bike sharing in New York City (O’Mahony and Shmoys, 2015), email communications in corporate networks (Klimt and Yang, 2004), and interactions between Indian socio-political actors in news outlets (Boschee et al., 2015). The examples show that it can be quite important to model the time-varying dynamics in relational event networks. If the static REM had been employed, we would have learned only average effects and missed the temporal information revealed by the state-space model.

On the other hand, there may be cases where the proposed state-space relational event model is not preferred. First, there may be sudden changes in the interaction behavior between the actors, which would cause abrupt changes in the model parameters. In this case, the state of the model parameters before the change could result in bad predictions right after the change, due to violations of the pre-specified values in the weight matrices $A_{t}$. It may then be preferable to estimate the weight matrix, even though this could be highly computationally intensive (see Section 3.1). Change point approaches for relational event models may also then be preferred (Kamalabad et al., 2023). Second, there might be rare instances where the interaction behavior is static in the sense that the model parameters remain approximately constant over the observational period. Using the state-space relational event model would then result in statistical overfitting, where noise could incorrectly be identified as signal, potentially leading to poor predictive performance. This may especially be the case when fitting relational event models with a very large number of possible drivers of social interaction behavior (Karimova et al., 2023).
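The effect of an abrupt change on one-step-ahead predictions can be illustrated with a scalar Kalman filter on a toy series. This is a deliberately simplified sketch, not the model of this paper: it assumes a single state, a fixed scalar weight `a = 1` standing in for a pre-specified $A_{t}$, and arbitrary variances `q` and `r`.

```python
def kalman_one_step_errors(ys, a=1.0, q=0.01, r=1.0, m0=0.0, p0=1.0):
    """Scalar Kalman filter; returns |y_t - one-step-ahead prediction| per t.

    State:       x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,          v_t ~ N(0, r)
    """
    m, p = m0, p0
    errors = []
    for y in ys:
        m_pred = a * m                  # predicted state mean
        p_pred = a * a * p + q          # predicted state variance
        errors.append(abs(y - m_pred))  # error before y is used for updating
        gain = p_pred / (p_pred + r)    # Kalman gain
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
    return errors

# Noise-free toy series with an abrupt level shift at t = 20.
ys = [0.0] * 20 + [5.0] * 20
errors = kalman_one_step_errors(ys)
print(errors[19], errors[20])  # 0.0 5.0
```

The prediction error jumps to the full size of the shift at the change point and then shrinks only gradually, because each update moves the state estimate by a fraction (the gain) of the remaining gap.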

We leave for future research the investigation of the choice of batch sizes. Most of the time, a researcher would be interested in a particular time unit (e.g. hours, days, or weeks), in which case the partition of the network is natural. However, partitioning according to a particular time unit might result in batches that are too small to use the Gaussian approximation. Thus, providing rules of thumb for the choice of batch sizes can aid researchers in deriving meaningful inferences from this model. Moreover, it is also useful to develop model evaluation metrics for the out-of-sample prediction. In the empirical illustrations, we only predicted the future number of events (within a certain time period). For this purpose, metrics such as the root-mean-square error (RMSE) suffice for model evaluation. However, to obtain an estimate of the future number of events, multiple networks were generated. Therefore, there is a need to evaluate how well those simulated networks reflect the actual future observations. Although out-of-sample prediction has been used routinely to assess the fit of a relational event model, it is not yet clear how it would best be used when model parameters change over time and the model might fit specific parts of the observation period better than others. Assessing model fit for dynamic models is a generally important issue and will need to be developed further in future research. Finally, another issue that needs to be tackled concerns the definition of the time window in which predictions will be generated, defined by $τ$ in Algorithm 2, for ordinal data.
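A simple way to inspect the consequences of a chosen time unit is to count the events per batch and flag the batches that fall below a minimum size. In the sketch below, the function name `flag_small_batches` and the threshold `min_events` are hypothetical; no rule of thumb for the threshold is implied. The timestamps are those of the first five rows of Table 3.

```python
from collections import Counter
from datetime import datetime

def flag_small_batches(timestamps, min_events, unit_format="%Y-%m"):
    """Count events per time unit and list the units below `min_events`.

    `unit_format` is a strftime pattern defining the batch, e.g. "%Y-%m"
    for months or "%Y-%m-%d" for days.
    """
    counts = Counter(t.strftime(unit_format) for t in timestamps)
    small = sorted(label for label, n in counts.items() if n < min_events)
    return dict(counts), small

timestamps = [
    datetime(2000, 1, 3, 1, 47),
    datetime(2000, 1, 4, 0, 51),
    datetime(2000, 1, 4, 2, 18),
    datetime(2000, 1, 4, 2, 39),
    datetime(2000, 1, 4, 4, 11),
]

monthly_counts, monthly_small = flag_small_batches(timestamps, min_events=2)
daily_counts, daily_small = flag_small_batches(
    timestamps, min_events=2, unit_format="%Y-%m-%d"
)
print(monthly_counts, monthly_small)
print(daily_counts, daily_small)
```

Switching from monthly to daily batches splits the same five events into one batch of one event and one batch of four, which shows how a finer time unit can push individual batches below a chosen minimum.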

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Netherlands Organization for Scientific Research (NWO) Grant to JM and FV (452-17-006).

ORCID iD

Fabio Vieira

Data availability statement

The data associated with the manuscript is available upon request.

Author biographies

Fabio Vieira is a PhD candidate in the Department of Statistics and Methodology at Tilburg University, the Netherlands. He obtained his Master's degree in Statistics from the Federal University of Rio de Janeiro, Brazil. His research focuses on Bayesian multilevel models for dynamic social network data.

Roger Leenders is a professor at the Jheronimus Academy of Data Science and in the Department of Organization Studies at Tilburg University. He holds a PhD in sociology from the University of Groningen. He has published broadly on social network analysis, teams, innovation, and organization behavior in leading journals such as Organization Science, the Journal of Applied Psychology, the Journal of Product Innovation Management, Social Networks, and the Academy of Management Journal.

Joris Mulder is an associate professor in the Department of Methodology and Statistics at Tilburg University. He holds a PhD in applied Bayesian statistics from Utrecht University. His research focuses on Bayesian model selection and social network modeling.

References

Arena, Mulder and Leenders RTA (2023a) A Bayesian semi-parametric approach for modeling memory decay in dynamic social networks. Sociological Methods & Research 53: 1201–1251.

Arena, Mulder and Leenders RTA (2023b) How fast do we forget our past social interactions? Understanding memory retention with parametric decays in relational event models. Network Science 11(2): 267–294.

Becker (2018) Kalman filter. Retrieved July 1, 2020.

Bhatia (2006) Critical discourse analysis of political press conferences. Discourse & Society 17(2): 173–203.

Bondarenko (2004) Enron scandal. Encyclopedia Britannica.

Boschee, Lautenschlager, O’Brien, et al. (2015) ICEWS Coded Event Data.

Butts (2008) A relational event framework for social action. Sociological Methodology 38(1): 155–200.

Butts and Marcum (2017) A relational event approach to modeling behavioral dynamics. In: Pilny and Poole (eds) Group Processes. Cham: Springer, pp.51–92.

Friedman (1982) Piecewise exponential models for survival data with covariates. The Annals of Statistics 10(1): 101–113.

Goodreau, Kitts and Morris (2009) Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography 46(1): 103–125.

Kalman (1960) A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82(1): 35–45.

Kalman and Bucy (1961) New results in linear filtering and prediction theory. Journal of Basic Engineering 83(1): 95–108.

Kamalabad, Leenders and Mulder (2023) What is the point of change? Change point detection in relational event models. Social Networks 74: 166–181.

Karimova, Leenders RTA, Meijerink-Bosman, et al. (2023) Separating the wheat from the chaff: Bayesian regularization in dynamic social networks. Social Networks 74: 139–155.

Klimt and Yang (2004) Introducing the Enron corpus. In: CEAS.

Lerner and Lomi (2020) Reliability of relational event model estimates under sampling: How to fit a relational event model to 360 million dyadic events. Network Science 8(1): 97–135.

Meijerink-Bosman, Arena, Karimova, et al. (2021) remstats: Computes Statistics For Relational Event History Data. R package version 3.0.0.

Meijerink-Bosman, Back, Geukes, et al. (2023) Discovering trends of social interaction behavior over time: An introduction to relational event modeling: Trends of social interaction. Behavior Research Methods 55(3): 997–1023.

Mulder and Leenders RTA (2019) Modeling the evolution of interaction behavior in social networks: A dynamic relational event approach for real-time analysis. Chaos, Solitons & Fractals 119: 73–85.

O’Brien (2010) Crisis early warning and decision support: Contemporary approaches and thoughts on future research. International Studies Review 12(1): 87–104.

O’Mahony and Shmoys (2015) Data analysis and optimization for (citi) bike sharing. In: Proceedings of the AAAI conference on artificial intelligence, vol. 29.

Perry and Wolfe (2013) Point process modelling for directed interaction networks. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75(5): 821–849.

Petris, Petrone and Campagnoli (2009) Dynamic Linear Models With R. New York, NY: Springer Science & Business Media.

Redhead and Power (2022) Social hierarchies and social networks in humans. Philosophical Transactions of the Royal Society B 377(1845): 20200440.

Vieira, Leenders and Mulder (2023) Fast meta-analytic approximations for relational event models: Applications to data streams and multilevel data. arXiv preprint arXiv:2312.07177.

Vu, Hunter, Smyth, et al. (2011) Continuous-time regression models for longitudinal networks. In: Advances in Neural Information Processing Systems, pp.2492–2500.

Vu, Lomi, Mascia, et al. (2017) Relational event models for longitudinal network data with an application to interhospital patient transfers. Statistics in Medicine 36(14): 2265–2287.

West and Harrison (1989) The dynamic linear model. In: Bayesian Forecasting and Dynamic Models. Springer, pp.105–141.

West and Harrison (2006) Bayesian Forecasting and Dynamic Models. Springer Science & Business Media.

West, Harrison and Migon (1985) Dynamic generalized linear models and Bayesian forecasting. Journal of the American Statistical Association 80(389): 73–83.

Zappa and Vu (2021) Markets as networks evolving step by step: Relational event models for the interbank market. Physica A: Statistical Mechanics and its Applications 565: 125557.

A state-space relational event modeling approach for learning dynamic social interaction behavior
